R package for estimating Extended Two-Way Fixed Effects (ETWFE).
Motivation
One of the most popular causal inference tools for social scientists is the two-way fixed effects (TWFE) estimator, especially in the context of difference-in-differences (DiD) designs. However, it turns out that vanilla TWFE is often biased in real-life situations, such as under staggered treatment rollout. Wooldridge (2023, 2025) points out that this isn’t the fault of TWFE per se, but rather due to the fact that we haven’t specified an appropriately flexible model. His recommended solution involves saturating the model with cohort×time interaction terms, yielding a specification that he calls extended TWFE (ETWFE). The Wooldridge solution is intuitive and elegant, but rather tedious and error prone to code up manually.
The good news is that the etwfe R package is here to help. It provides a set of convenience functions and diagnostic tools that do the work for you. Please see below for some quickstart examples. Full documentation is available on the etwfe homepage.
Python users should take a look at Armand Kapllani’s port of this package.
Installation
You can install etwfe from CRAN.
install.packages("etwfe")Or, you can grab the development version from R-universe.
install.packages("etwfe", repos = "https://grantmcdermott.r-universe.dev")Quickstart example
A detailed walkthrough of etwfe is provided in the introductory vignette (available online, or by typing vignette("etwfe") in your R console). But here’s a quickstart example to demonstrate the basic syntax.
Start by loading the package and some data.
library(etwfe)
# install.packages("did")
data("mpdta", package = "did")
head(mpdta, 2)
#> year countyreal lpop lemp first.treat treat
#> 866 2003 8001 5.896761 8.461469 2007 1
#> 841 2004 8001 5.896761 8.336870 2007 1Step 1: Run etwfe() to estimate a model with full saturated interactions.
mod = etwfe(
fml = lemp ~ lpop, # outcome ~ controls
tvar = year, # time variable
gvar = first.treat, # group variable
data = mpdta, # dataset
vcov = ~countyreal # vcov adjustment (here: clustered)
)
mod
#> OLS estimation, Dep. Var.: lemp
#> Observations: 2,500
#> Fixed-effects: first.treat: 4, year: 5
#> Standard-errors: Clustered (countyreal)
#> Estimate Std. Error t value Pr(>|t|)
#> lpop 1.065461 0.021824 48.821102 < 2.2e-16 ***
#> first.treat::2004:lpop 0.050982 0.037756 1.350320 0.177525
#> first.treat::2006:lpop -0.041095 0.047390 -0.867183 0.386259
#> first.treat::2007:lpop 0.055518 0.039212 1.415838 0.157447
#> year::2004:lpop 0.011014 0.007554 1.458043 0.145458
#> year::2005:lpop 0.020733 0.008104 2.558268 0.010814 *
#> year::2006:lpop 0.010535 0.010816 0.974084 0.330487
#> year::2007:lpop 0.020921 0.011808 1.771708 0.077053 .
#> ... 14 coefficients remaining (display them with summary() or use argument n)
#> ... 10 variables were removed because of collinearity
#> (.Dtreat:first.treat::2006:year::2004, .Dtreat:first.treat::2006:year::2005 and 8 others
#> [full set in $collin.var])
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.537131 Adj. R2: 0.871722
#> Within R2: 0.869464Step 2: Pass to emfx() to recover the ATTs of interest. In this case, an event-study example.
emfx(mod, type = "event")
#>
#> event Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 0 -0.0332 0.0134 -2.48 0.013 6.3 -0.0594 -0.00702
#> 1 -0.0573 0.0171 -3.34 <0.001 10.2 -0.0910 -0.02373
#> 2 -0.1379 0.0308 -4.48 <0.001 17.0 -0.1982 -0.07753
#> 3 -0.1095 0.0323 -3.39 <0.001 10.5 -0.1729 -0.04620
#>
#> Term: .Dtreat
#> Type: response
#> Comparison: TRUE - FALSEAcknowledgements
- Jeffrey Wooldridge for the underlying ETWFE theory (1, 2).
- Laurent Bergé (fixest) and Vincent Arel-Bundock (marginaleffects) for maintaining the two wonderful R packages that do most of the heavy lifting under the hood here.
-
Fernando Rios-Avila for the
jwdidStata module, which has provided a welcome foil for unit testing and whose elegant design helped inform my own choices for this R equivalent.