A researcher has to adjust the standard errors (SEs) for a regression model that she has already run. Maybe this is to appease a journal referee. Or, maybe it’s because she is busy iterating through the early stages of a project. She’s still getting to grips with her data and wants to understand how sensitive her results are to different modeling assumptions.
Does that sound familiar? I believe it should, because something like that has happened to me on every single one of my empirical projects. I end up estimating multiple versions of the same underlying regression model — even putting them side-by-side in a regression table, where the only difference across columns is slight tweaks to the way that the SEs were calculated.
Confronted by this task, I’m willing to bet that most people do the following:
Run their model under one SE specification (e.g. iid).
Re-run their model a second time under another SE specification (e.g. HC robust).
Re-run their model a third time under yet another SE specification (e.g. clustered).
Etc.
While this is fine as far as it goes, I’m here to tell you that there’s a better way. Rather than re-running your model multiple times, I’m going to advocate that you run your model only once and then adjust SEs on the backend as needed. This approach — what I’ll call “on-the-fly” SE adjustment — is not only safer, it’s much faster too.
Let’s see some examples.
Example 1: sandwich
To the best of my knowledge, on-the-fly SE adjustment was introduced to R by the sandwich package (Achim Zeileis et al.). This package has been around for well over a decade and is incredibly versatile, providing an object-oriented framework for recomputing variance-covariance (VCOV) matrix estimators — and thus SEs — for a wide array of model objects and classes. At the same time, sandwich just recently got its own website to coincide with some cool new features. So it’s worth exploring what that means for a modern empirical workflow. In the code that follows, I’m going to borrow liberally from the introductory vignette. But I’ll also tack on some additional tips and tricks that I use in my own workflow. (UPDATE (2020-08-23): The vignette has now been updated to include some of the suggestions from this post. Thanks Achim!)
Let’s start by running a simple linear regression on some sample data; namely, the “PetersenCL” dataset that comes bundled with the package.
```r
library(sandwich)

data("PetersenCL")

m = lm(y ~ x, data = PetersenCL)
summary(m)
```
```
Call:
lm(formula = y ~ x, data = PetersenCL)

Residuals:
    Min      1Q  Median      3Q     Max
-6.7611 -1.3680 -0.0166  1.3387  8.6779

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02968    0.02836   1.047    0.295
x            1.03483    0.02858  36.204   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.005 on 4998 degrees of freedom
Multiple R-squared:  0.2078,    Adjusted R-squared:  0.2076
F-statistic:  1311 on 1 and 4998 DF,  p-value: < 2.2e-16
```
Our simple model above assumes that the errors are iid. But we can adjust these SEs by calling one of the many alternate VCOV estimators provided by sandwich. For example, to get a robust, or heteroscedasticity-consistent (“HC3”), VCOV matrix we’d use:
```r
vcovHC(m)
```

```
              (Intercept)             x
(Intercept)  8.046458e-04 -1.155095e-05
x           -1.155095e-05  8.072475e-04
```
To actually substitute the robust VCOV into our original model — so that we can print it in a nice regression table and perform statistical inference — we pair sandwich with its companion package, lmtest. The workhorse function here is lmtest::coeftest and, as we can see, this yields an object that is similar to a standard model summary in R.
```r
library(lmtest)

coeftest(m, vcov = vcovHC)
```
```
t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.028366  1.0463   0.2955
x           1.034833   0.028412 36.4223   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
To recap: We ran our base m model just the once and then adjusted for robust SEs on the backend using sandwich/coeftest.
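As a quick aside, the SEs themselves are just the square roots of the diagonal of the relevant VCOV matrix. So it's easy to pull them out and compare the two sets directly. A minimal sketch:

```r
## SEs are the square roots of the VCOV matrix diagonal
cbind(
  "iid" = sqrt(diag(vcov(m))),
  "HC3" = sqrt(diag(vcovHC(m)))
)
```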
Now, I’ll admit that the benefits of this workflow aren’t super clear from my simple example yet. We did cut down on copying and pasting duplicate code, which automatically helps to minimize user error. (Remember: DRY!) But we can easily scale things up to get a better sense of its power. For instance, we could imagine applying a whole host of alternate VCOV estimators to our base model.
```r
## Calculate the VCOV (SEs) under a range of different assumptions
vc = list(
  "Standard"              = vcov(m),
  "Sandwich (basic)"      = sandwich(m),
  "Clustered"             = vcovCL(m, cluster = ~ firm),
  "Clustered (two-way)"   = vcovCL(m, cluster = ~ firm + year),
  "HC3"                   = vcovHC(m),
  "Andrews' kernel HAC"   = kernHAC(m),
  "Newey-West"            = NeweyWest(m),
  "Bootstrap"             = vcovBS(m),
  "Bootstrap (clustered)" = vcovBS(m, cluster = ~ firm)
)
```
You could, of course, print the vc list to screen now if you so wanted. But I want to go one small step further by showing you how easy it is to create a regression table that encapsulates all of these different models. In the next code chunk, I’m going to create a list of models by passing vc to an lapply() call.¹ I’m then going to generate a regression table using msummary() from the excellent modelsummary package (Vincent Arel-Bundock).
```r
library(modelsummary) ## For great-looking regression tables (among other things)

## Adjust our model SEs on the fly
lm_mods = lapply(vc, function(x) coeftest(m, vcov = x))

## Print the regression table
msummary(lm_mods)
```
|             | Standard | Sandwich (basic) | Clustered | Clustered (two-way) | HC3     | Andrews' kernel HAC | Newey-West | Bootstrap | Bootstrap (clustered) |
|-------------|----------|------------------|-----------|---------------------|---------|---------------------|------------|-----------|-----------------------|
| (Intercept) | 0.030    | 0.030            | 0.030     | 0.030               | 0.030   | 0.030               | 0.030      | 0.030     | 0.030                 |
|             | (0.028)  | (0.028)          | (0.067)   | (0.065)             | (0.028) | (0.044)             | (0.066)    | (0.031)   | (0.075)               |
| x           | 1.035    | 1.035            | 1.035     | 1.035               | 1.035   | 1.035               | 1.035      | 1.035     | 1.035                 |
|             | (0.029)  | (0.028)          | (0.051)   | (0.054)             | (0.028) | (0.035)             | (0.048)    | (0.029)   | (0.056)               |
| Num.Obs.    | 5000     | 5000             | 5000      | 5000                | 5000    | 5000                | 5000       | 5000      | 5000                  |
| AIC         | 31141.2  | 31141.2          | 31141.2   | 31141.2             | 31141.2 | 31141.2             | 31141.2    | 31141.2   | 31141.2               |
| BIC         | 63714.1  | 63714.1          | 63714.1   | 63714.1             | 63714.1 | 63714.1             | 63714.1    | 63714.1   | 63714.1               |
If you’re the type of person — like me — who prefers a visual representation, then printing a coefficient plot is equally easy with modelsummary::modelplot(). This creates a ggplot2 object that can be further manipulated as needed. In the code chunk below, I’ll demonstrate this fairly simply by flipping the plot orientation.
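A minimal sketch of that chunk might look as follows. (I'm assuming the lm_mods list from earlier; dropping the intercept via coef_omit is an optional extra to reduce clutter.)

```r
library(ggplot2) ## For coord_flip()

## Coefficient plot across all nine VCOV specifications, with the default
## orientation flipped
modelplot(lm_mods, coef_omit = "Intercept") + coord_flip()
```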
And there you have it: An intuitive and helpful comparison across a host of specifications, even though we only “ran” the underlying model once. Simple!
Note
UPDATE (June 2021): You can now automate all of the steps that I show above in newer versions of modelsummary, thanks to the incredibly flexible vcov argument. See the documentation here.
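For instance, a single msummary() call along the following lines should now produce the same kind of multi-SE table directly from our one fitted model. (A sketch; see the modelsummary documentation for the full set of accepted vcov values.)

```r
## One fitted model in, multiple SE specifications out
msummary(m, vcov = list("classical", "robust", ~ firm, ~ firm + year))
```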
Example 2: fixest
While sandwich covers a wide range of model classes in R, it’s important to know that a number of libraries provide their own specialised methods for on-the-fly SE adjustment. The one that I want to show you for this second example is the fixest package (Laurent Bergé).
If you follow me on Twitter or have read my lecture notes, you already know that I am a huge fan of this package. It’s very elegantly designed and provides an insanely fast way to estimate high-dimensional fixed effects models. More importantly for today’s post, fixest offers automatic support for on-the-fly SE adjustment. We only need to run our model once and can then adjust the SEs on the backend via a call to summary(..., vcov = 'vcov_type').²
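Schematically, that looks something like the following sketch, where mod is a placeholder for any fitted fixest model. (See the footnote and package documentation for the valid shorthand strings.)

```r
## Placeholder examples; 'mod' stands in for a fitted fixest model
summary(mod, vcov = "hetero")   ## heteroskedasticity-robust SEs
summary(mod, vcov = "twoway")   ## two-way cluster on the first two fixed effects
summary(mod, vcov = ~ id)       ## cluster on an arbitrary variable, here 'id'
```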
To demonstrate, I’m going to run some regressions on a subsample of the well-known RITA air traffic data. I’ve already downloaded the dataset from Revolution Analytics and prepped it for the narrow purposes of this blog post. (See the data appendix below for code.) All told we’re looking at 9 variables extending over approximately 1.8 million rows. So, not “big” data by any stretch of the imagination, but my regressions should take at least a few seconds to run on most computers.
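Assuming you've run that prep code, reading the data back into R is a single line. (I'll use data.table::fread() to match the appendix; '~/air.csv' is where the appendix script writes the file.)

```r
library(data.table)

## Read in the prepped RITA flight data (created by the appendix script)
air = fread(path.expand('~/air.csv'))
```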
The actual regression that I’m going to run on these data is somewhat uninspired: namely, how does arrival delay depend on departure delay, conditional on the time of day?³ I’ll throw in a bunch of fixed effects to make the computation a bit more interesting/intensive, but it’s fairly standard stuff. Note that I am running a linear fixed-effects model by calling fixest::feols().
But, really, I don’t want you to get sidetracked by the regression details. The main thing I want to focus your attention on is the fact that I’m only going to run the base model once, i.e. for mod1. Then, I’m going to adjust the SEs for two more models, mod2 and mod3, on the fly via respective summary() calls.
```r
library(fixest)

## Start timer; we'll use for benchmarking later
pt = proc.time()

## Run the model once and once only!
## By default, fixest::feols will cluster the SEs by the first FE (here: month)
mod1 = feols(
  arr_delay ~ dep_tod / dep_delay |
    month + year + day_of_week + origin_airport_id + dest_airport_id,
  data = air
)

## Adjust SEs on the fly: two-way cluster
mod2 = summary(mod1, vcov = ~ month + origin_airport_id)

## Adjust SEs on the fly: three-way cluster. Note that we can even include
## cluster vars (e.g. tail_num) that weren't present in the original regression
mod3 = summary(mod1, vcov = ~ month + origin_airport_id + tail_num)

## Stop timer and save results
time_feols = (proc.time() - pt)[3]
```
Before I get to benchmarking, how about a quick coefficient plot? I’ll use modelsummary::modelplot() again, focusing only on the key “time of day × departure delay” interaction terms.
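A sketch of that plot call follows. The list names are cosmetic, and the coef_omit regex assumes that the interaction coefficient names all contain "dep_delay" (a negative lookahead drops everything else):

```r
## Compare the interaction terms across our three SE specifications
mods = list(
  "Cluster (month)"   = mod1,
  "Two-way cluster"   = mod2,
  "Three-way cluster" = mod3
)
modelplot(mods, coef_omit = "^(?!.*dep_delay)")
```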
Great, it worked. But did it save time? To answer this question I’ve benchmarked against three other methods:
feols(), again from the fixest package, but this time with each of the three models run separately.
felm() from the lfe package.
reghdfe from the reghdfe package (Stata).
You can find the benchmarking code for these other methods in the appendix. (Please let me know if you spot any errors.) In the interests of brevity, here are the results.
There are several takeaways from this exercise. For example, fixest::feols() is the fastest method even if you are (inefficiently) re-running the models separately. But — once more, dear friends — the key thing that I want to emphasise is the additional time savings brought on by adjusting the SEs on the fly. Indeed, we can see that the on-the-fly feols approach takes only around a third of the time that running the three models separately does. This means that fixest is recomputing the SEs for models 2 and 3 pretty much instantaneously.
To add one last thing about benchmarking: the absolute difference in model run times was not that huge for this particular exercise. There are maybe two minutes separating the fastest and slowest methods. (Then again, not trivial either…) But if you are like me and find yourself estimating models where each run takes many minutes, hours, or even days and weeks, then the time savings quickly become enormous.
Conclusion
There comes a point in almost every empirical project where you have to estimate multiple versions of the same model. Which is to say, the only difference between these multiple versions is how the standard errors were calculated: robust, clustered, etc. Maybe you’re trying to satisfy a referee request before publication. Or, maybe you’re trying to understand how sensitive your results are to different modeling assumptions.
The goal of this blog post has been to show you that there is often a better approach than manually re-running multiple iterations of your model. Instead, I advocate that you run the model once and then adjust your standard errors on the backend, as needed. This “on-the-fly” approach will save you a ton of time if you are working with big datasets. Even if you aren’t working with big data, you will minimize copying and pasting of duplicate code. All of which will help to make your code more readable and cut down on potential errors.
What’s not to like?
P.S. There are a couple of other R libraries with support for on-the-fly SE adjustment, e.g. clubSandwich. Since I’ve used it as a counterfoil in the benchmark, I should add that lfe::felm() provides its own method for swapping out different SEs post-estimation; see here. Similarly, I’ve focused on R because it’s the software language that I use most often and — as far as I am aware — is the only one to provide methods for on-the-fly SE adjustment across a wide range of models. If anyone knows of equivalent methods or canned routines in other languages, please let me know in the comments.
P.P.S. Look, I’m not saying it’s necessarily “wrong” to specify your SEs in the model call. Particularly if you’ve already settled on a single VCOV to use. Then, by all means, use the convenience of Stata’s , robust syntax or the R equivalent lm_robust() (via the estimatr package).
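For example, here's a quick sketch using the PetersenCL data from earlier; lm_robust()'s se_type and clusters arguments control the VCOV at estimation time:

```r
library(estimatr)

## SEs baked in at estimation time, rather than adjusted afterwards
lm_robust(y ~ x, data = PetersenCL, se_type = "HC3")  ## HC3 robust SEs
lm_robust(y ~ x, data = PetersenCL, clusters = firm)  ## clustered by firm
```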
Appendices
Flight data download and prep
```r
if (!file.exists(path.expand('~/air.csv'))) {

  ## Download and extract the data
  URL = 'https://packages.revolutionanalytics.com/datasets/AirlineSubsetCsv.tar.gz'
  dest_file = path.expand('~/AirlineSubsetCsv.tar.gz')
  download.file(URL, dest_file, mode = "wb")
  untar(dest_file, exdir = path.expand('~'))

  ## Bind together and do some data cleaning
  library(data.table)
  csvs = list.files(path.expand('~/AirlineSubsetCsv/'), full.names = TRUE)
  air = rbindlist(lapply(csvs, fread))
  names(air) = tolower(names(air))
  int_vars = c('arr_delay', 'dep_delay', 'dep_time', 'arr_time')
  air[, (int_vars) := lapply(.SD, as.integer), .SDcols = int_vars]

  ## Create a departure 'time of day' factor variable, dividing the day in
  ## four quarters
  air[, dep_tod := fcase(dep_time <= 600, 'am1',
                         dep_time <= 1200, 'am2',
                         dep_time <= 1800, 'pm1',
                         dep_time > 1800, 'pm2')]

  ## Subset
  air = air[!is.na(arr_delay),
            .(year, month, day_of_week, tail_num, origin_airport_id,
              dest_airport_id, arr_delay, dep_delay, dep_tod)]

  ## Write to disk
  fwrite(air, path.expand('~/air.csv'))

  ## Clean up
  file.remove(c(dest_file, csvs))
  file.remove(path.expand('~/AirlineSubsetCsv/')) ## empty dir too
}
```
1. You can substitute with a regular for loop or purrr::map() if you prefer.
2. See the package documentation for details about valid vcov arguments (e.g. "iid", "hc1", "twoway", "threeway", etc.).
3. Note that I'm going to use a dep_tod / dep_delay expansion on the RHS to get the full marginal effect of the interaction terms. Don't worry too much about this if you haven't seen it before (click on the previous link if you want to learn more).