class: center, middle, inverse, title-slide

# Labour reallocation and remote work
## Evidence from millions of GitHub users during COVID-19
### Grant McDermott and Ben Hansen
### University of Oregon

---
class: inverse, center, middle
name: prologue

<style type="text/css">
/* CSS for including pauses in printed PDF output (see bottom of lecture) */
@media print {
  .has-continuation {
    display: block !important;
  }
}
</style>

<!-- # Table of contents -->
<!-- 1. [Prologue](#prologue) -->
<!-- --- -->

# Introduction

---

# Preview

We analyse the impact of COVID-19 on tech sector productivity using data from: **GitHub** ([www.github.com](https://www.github.com))

- Several terabytes of data at incredibly high temporal resolution.
- For a subset of users, we are also able to match location, job description, and gender.

Main findings:

- COVID-19 led to large and distinct changes in workplace habits at the global level.
- More work online overall and more done outside of traditional office hours.
- However, global trends mask quite a lot of heterogeneity at the local level.

---

# Related literature

--
count:false

Look, there's a lot of it. _Many_ journals with COVID-19 special issues and colloquia. (E.g. [JPubE](https://www.sciencedirect.com/journal/journal-of-public-economics/special-issue/10JWB645FT5).)

--
count:false

To single out a few favourites and the (most) closely related.

Alternative data:

- Google Trends (e.g. Brodeur et al., [JPubE 2021](https://www.sciencedirect.com/science/article/pii/S0047272720302103))
- Electricity (e.g. Cicala, [NBER WP 2020](https://www.nber.org/digest-202012/working-homes-impact-electricity-use-pandemic))
- Cellphone (e.g. Goolsbee & Syverson, [JPubE 2021](https://www.sciencedirect.com/science/article/pii/S0047272720301754); Harris, [NBER WP 2020](https://www.nber.org/papers/w28132); Bravata et al., [NBER WP 2021](https://www.nber.org/papers/w28645))
- Consumer spending, job postings, etc. (e.g.
Chetty et al., [NBER WP 2020](https://www.nber.org/papers/w27431); Forsythe et al., [JPubE, 2021](https://www.sciencedirect.com/science/article/abs/pii/S004727272030102X))

Gendered outcomes:

- Deryugina et al. ([AEA P&P, 2021](https://secureservercdn.net/166.62.111.84/0zv.ccf.myftpupload.com/wp-content/uploads/2021/02/NBER-Working-Paper-Deryugina_Shurchkov_Stearns_2021.pdf))
- Albanesi and Kim ([NBER WP, 2021](https://www.nber.org/papers/w28505))

Labour market adjustment and WFH:

- Barrero et al. ([NBER WP, 2021](https://www.nber.org/papers/w28731))
- Bartik et al. ([PNAS, 2020](https://www.pnas.org/content/early/2020/07/09/2006991117); [Brookings, 2021](https://www.brookings.edu/wp-content/uploads/2020/06/Bartik-et-al-conference-draft.pdf); [NBER WP, 2020](https://www.nber.org/papers/w27422))

---

# Just yesterday

.pull-left[
<img src="pics/leonhardt-tweet-covid-schools.png" width="95%" style="display: block; margin: auto;" />
]

.pull-right[
Source: https://twitter.com/DLeonhardt/status/1389196436177358852

Article: https://www.nytimes.com/2021/05/03/briefing/schools-reopening-working-mothers.html
]

---
class: inverse, center, middle
name: github

# GitHub

---

# What is Git(Hub)?

### Git

- A distributed version control system (created by Linus Torvalds to help manage contributions to the Linux kernel).
- "Imagine that Dropbox and the 'Track Changes' feature in MS Word had a baby. Git would be that baby."
- Git is a program that lives on your computer.

### GitHub

- Cloud service that builds on top of Git, providing an array of additional features. Easy to share and contribute to code online, host files and websites, etc.
- There are other, similar platforms. But GitHub is by far the largest.
- More than 30 million user accounts (including 2.5 million with some form of geographic identification).

---

# Data access

Live session example(s).
- [GitHub](https://github.com/grantmcdermott) | [API](https://api.github.com/users/grantmcdermott) | [BigQuery](https://console.cloud.google.com/bigquery?project=mcd-lab&p=githubarchive&d=year&t=2020&page=table)

--
count:false

Key takeaways: Git(Hub) is a version control platform that allows millions of users to carefully track and edit code.

- Activity is documented _to the second_ and logged via the GitHub API.

--
count:false

Key data sources: [GH Archive](https://www.gharchive.org/) and [GH Torrent](https://ghtorrent.org/).

- Both provide data access via Google BigQuery.

--
count:false

Our primary measure of activity is GitHub "[events](https://docs.github.com/en/developers/webhooks-and-events/github-event-types)".

- _Pushes_ (can contain multiple _commits_), _Pulls_, _Issues_, _Comments_, _Pull Requests_, etc.
- We also measure "users": how many people logged on at least once during the relevant time period (day, hour, etc.).
- Limitations: We only see public repos. We don't have geographic/demographic info for everyone (and it is mostly self-reported where we do).

---

# Top cities by geo-tagged users

.pull-left[
<img src="pics/top-cities.png" width="95%" style="display: block; margin: auto;" />
]

.pull-right[
In the slides that follow, I'm going to focus on the top 5 plus Seattle (no. 7).
]

---
class: center, middle

(Are you ready for lots of plots?)
---
class: inverse, center, middle
name: raw

# Raw event data

---

# Global

<img src="slides_files/figure-html/g_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---

# London

<img src="slides_files/figure-html/lon_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---

# NYC

<img src="slides_files/figure-html/nyc_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---

# SFO

<img src="slides_files/figure-html/sfo_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---

# Beijing

<img src="slides_files/figure-html/bei_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---

# Bengaluru

<img src="slides_files/figure-html/blr_daily_diff-1.png" width="95%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
name: reallocation

# Labour reallocation

---

# Global activity on weekends

<img src="slides_files/figure-html/g_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# Global activity on weekends

<img src="slides_files/figure-html/g_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---
name: cities

# Cities

Okay, so a big shift in work habits at the global level. What about individual cities?

- Let's take a look.

--
count:false

At the same time, for users in a common geography/timezone, we can drill down further to look at labour reallocation *within a day*.

- Are people working more outside of regular work hours? (Here: defined as **9 am to 6 pm**.)
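
A minimal sketch of how the two reallocation measures (share of activity on weekends, share outside office hours) could be computed from event timestamps. This is an illustration only, not the pipeline behind the plots: the function names and sample timestamps are hypothetical, timestamps are assumed to be in local time already, and restricting the office-hours measure to weekdays is an assumption.

```python
from datetime import datetime


def weekend_share(timestamps):
    """Proportion of events that fall on a Saturday or Sunday."""
    if not timestamps:
        return 0.0
    weekend = sum(1 for ts in timestamps if ts.weekday() >= 5)
    return weekend / len(timestamps)


def out_of_hours_share(timestamps, start_hour=9, end_hour=18):
    """Proportion of weekday events outside [start_hour, end_hour).

    Defaults follow the 9 am to 6 pm definition of office hours.
    Limiting the calculation to weekdays is an assumption of this sketch.
    """
    weekday = [ts for ts in timestamps if ts.weekday() < 5]
    if not weekday:
        return 0.0
    outside = sum(
        1 for ts in weekday if not (start_hour <= ts.hour < end_hour)
    )
    return outside / len(weekday)


# Hypothetical local-time event timestamps (illustrative only).
events = [
    datetime(2020, 3, 16, 10, 30),  # Monday, within office hours
    datetime(2020, 3, 16, 21, 5),   # Monday, evening
    datetime(2020, 3, 18, 8, 45),   # Wednesday, early morning
    datetime(2020, 3, 21, 14, 0),   # Saturday
]

print(weekend_share(events))
print(out_of_hours_share(events))
```

In practice, UTC event times would first need converting to each city's local timezone, which is why this comparison only makes sense for users in a common geography.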
---

# London: Activity on weekends

<img src="slides_files/figure-html/lon_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# London: Activity on weekends

<img src="slides_files/figure-html/lon_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# London: Activity outside office hours

<img src="slides_files/figure-html/lon_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# London: Activity outside office hours

<img src="slides_files/figure-html/lon_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# Beijing: Activity on weekends

<img src="slides_files/figure-html/bei_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# Beijing: Activity on weekends

<img src="slides_files/figure-html/bei_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# Beijing: Activity outside office hours

<img src="slides_files/figure-html/bei_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# Beijing: Activity outside office hours

<img src="slides_files/figure-html/bei_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# Bengaluru: Activity on weekends

<img src="slides_files/figure-html/blr_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# Bengaluru: Activity on weekends

<img src="slides_files/figure-html/blr_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# Bengaluru: Activity outside office hours

<img src="slides_files/figure-html/blr_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# Bengaluru: Activity outside office hours

<img src="slides_files/figure-html/blr_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# US tech hubs: Activity on weekends

<img
src="slides_files/figure-html/cities_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---

# US tech hubs: Activity outside office hours

<img src="slides_files/figure-html/cities_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# NYC: Activity on weekends

<img src="slides_files/figure-html/nyc_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# NYC: Activity on weekends

<img src="slides_files/figure-html/nyc_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# NYC: Activity outside office hours

<img src="slides_files/figure-html/nyc_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# NYC: Activity outside office hours

<img src="slides_files/figure-html/nyc_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# SFO: Activity on weekends

<img src="slides_files/figure-html/sfo_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# SFO: Activity on weekends

<img src="slides_files/figure-html/sfo_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# SFO: Activity outside office hours

<img src="slides_files/figure-html/sfo_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# SFO: Activity outside office hours

<img src="slides_files/figure-html/sfo_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---

# SEA: Activity on weekends

<img src="slides_files/figure-html/sea_prop_wends_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# SEA: Activity on weekends

<img src="slides_files/figure-html/sea_prop_wends-1.png" width="95%" style="display: block; margin: auto;" />

---

# SEA: Activity outside office hours

<img src="slides_files/figure-html/sea_prop_whours_guess-1.png" width="95%" style="display: block; margin: auto;" />

---
count: false

# SEA: Activity outside office hours

<img src="slides_files/figure-html/sea_prop_whours-1.png" width="95%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
name: demographics

# User demographics

---

# GitHub 🤝 LinkedIn

<img src="pics/github-linkedin.png" width="95%" style="display: block; margin: auto;" />

---

# A Pyrrhic victory

<img src="pics/eff-hiq-vs-linkedin.png" width="95%" style="display: block; margin: auto;" />

---

# Matched dataset for Seattle GitHub + LinkedIn users

After some very tedious web scraping, we match about **1,800** Seattle GitHub users to their LinkedIn profiles.

Impute age based on education (degree and attendance dates).

- `<30` (16%)
- `30--39` (26%)
- `40--49` (14%)
- `>50` (7%)
- Unmatched (38%)

Impute gender using a mix of automated name-matching (good for Anglo names) and grad student crowd-sourcing (good for non-Anglo names).

- `male` (90%)
- `female` (7%)
- Unmatched (3%)

--

In the plots that follow, I'm going to mark three "treatment" dates: 1) Microsoft and Amazon mandate remote work, 2) school closures, and 3) mayoral stay-at-home order.
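
For the age imputation mentioned above, a rough sketch of one way it could work. Everything here is an assumption for illustration: the helper names are hypothetical, the rule that a first degree starts at age 18 is a guess rather than the rule used for the matched dataset, and placing age 50 in the top bracket is a choice of this sketch.

```python
def impute_age(first_degree_start_year, reference_year=2020, assumed_start_age=18):
    """Rough age estimate from education history.

    Assumes (hypothetically) that the first degree began at `assumed_start_age`.
    """
    return reference_year - first_degree_start_year + assumed_start_age


def age_bracket(age):
    """Map an imputed age to the bracket labels listed on this slide."""
    if age < 30:
        return "<30"
    if age < 40:
        return "30--39"
    if age < 50:
        return "40--49"
    # Age 50 itself is placed in the top bracket (an assumption of this sketch).
    return ">50"


# Hypothetical profile: first degree started in 2010.
age = impute_age(2010)
print(age, age_bracket(age))
```

Profiles with no usable education dates would fall into the "Unmatched" group.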
---

# Age × gender effects: males, under 30

<img src="slides_files/figure-html/age_gender_mu30-1.png" width="95%" style="display: block; margin: auto;" />

---

# Age × gender effects: females, under 30

<img src="slides_files/figure-html/age_gender_fu30-1.png" width="95%" style="display: block; margin: auto;" />

---

# Age × gender effects: males, 30--39

<img src="slides_files/figure-html/age_gender_m30_39-1.png" width="95%" style="display: block; margin: auto;" />

---

# Age × gender effects: females, 30--39

<img src="slides_files/figure-html/age_gender_f30_39-1.png" width="95%" style="display: block; margin: auto;" />

---

# Age × gender effects: males, 40--49

<img src="slides_files/figure-html/age_gender_m40_49-1.png" width="95%" style="display: block; margin: auto;" />

---

# Age × gender effects: females, 40--49

<img src="slides_files/figure-html/age_gender_f40_49-1.png" width="95%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
name: conclusion

# Conclusions

---

# Aside: DiD example (London time of day)

<img src="pics/lon-did.png" width="95%" style="display: block; margin: auto;" />

---

# Conclusions

We use a ~~novel~~ criminally underused observational dataset from the tech sector: **GitHub** ([www.github.com](https://www.github.com))

- The first of (hopefully) many projects using these data.

We see a pronounced reallocation of work habits at the global level.

- Overall increase in activity and more work on weekends.
- These trends persist for most of 2021, although with some reversion to baseline.

Global trends mask considerable heterogeneity at the local level.

- Changes generally coincide with local lockdowns and remote work mandates, but different work "cultures" and demographic features yield differential results, sometimes of the opposite sign.

Panel-based analysis is potentially complicated (SUTVA violations), although year-on-year and BSTS comparisons confirm what is directly observable in the data.