Skip to contents

Post-estimation treatment effects for an ETWFE regressions.


  type = c("simple", "group", "calendar", "event"),
  by_xvar = "auto",
  collapse = "auto",
  post_only = TRUE,



An etwfe model object.


Character. The desired type of post-estimation aggregation.


Logical. Should the results account for heterogeneous treatment effects? Only relevant if the preceding etwfe call included a specified xvar argument, i.e. interacted categorical covariate. The default behaviour ("auto") is to automatically estimate heterogeneous treatment effects for each level of xvar if these are detected as part of the underlying etwfe model object. Users can override by setting to either FALSE or TRUE. See the section on Heterogeneous treatment effects below.


Logical. Collapse the data by (period by cohort) groups before calculating marginal effects? This trades off a loss in estimate accuracy (typically around the 1st or 2nd significant decimal point) for a substantial improvement in estimation time for large datasets. The default behaviour ("auto") is to automatically collapse if the original dataset has more than 500,000 rows. Users can override by setting either FALSE or TRUE. Note that collapsing by group is only valid if the preceding etwfe call was run with "ivar = NULL" (the default). See the section on Performance tips below.


Logical. Drop pre-treatment ATTs? Only evaluated if (a) type = "event" and (b) the original etwfe model object was estimated using the default "notyet" treated control group. If conditions (a) and (b) are met then the pre-treatment effects will be zero as a mechanical result of ETWFE's estimation setup. The default behaviour (FALSE) is thus to drop these nuisance rows from the dataset. The post_only argument recognises that you may still want to keep them for presentation purposes (e.g., plotting an event-study). Nevertheless, be forewarned that enabling that behaviour via TRUE is strictly performative: the "zero" treatment effects for any pre-treatment periods is purely an artefact of the estimation setup.


Additional arguments passed to marginaleffects::slopes. For example, you can pass vcov = FALSE to dramatically speed up estimation times of the main marginal effects (but at the cost of not getting any information about standard errors; see Performance tips below). Another potentially useful application is testing whether heterogeneous treatment effects (i.e. the levels of any xvar covariate) are equal by invoking the hypothesis argument, e.g. hypothesis = "b1 = b2".


A slopes object from the marginaleffects package.

Performance tips

Under most situations, etwfe should complete very quickly. For its part, emfx is quite performant too and should take a few seconds or less for datasets under 100k rows. However, emfx's computation time does tend to scale non-linearly with the size of the original data, as well as the number of interactions from the underlying etwfe model. Without getting too deep into the weeds, the numerical delta method used to recover the ATEs of interest has to estimate two prediction models for each coefficient in the model and then compute their standard errors. So, it's a potentially expensive operation that can push the computation time for large datasets (> 1m rows) up to several minutes or longer.

Fortunately, there are two complementary strategies that you can use to speed things up. The first is to turn off the most expensive part of the whole procedure---standard error calculation---by calling emfx(..., vcov = FALSE). Doing so should bring the estimation time back down to a few seconds or less, even for datasets in excess of a million rows. While the loss of standard errors might not be an acceptable trade-off for projects where statistical inference is critical, the good news is this first strategy can still be combined our second strategy. It turns out that collapsing the data by groups prior to estimating the marginal effects can yield substantial speed gains of its own. Users can do this by invoking the emfx(..., collapse = TRUE) argument. While the effect here is not as dramatic as the first strategy, our second strategy does have the virtue of retaining information about the standard errors. The trade-off this time, however, is that collapsing our data does lead to a loss in accuracy for our estimated parameters. On the other hand, testing suggests that this loss in accuracy tends to be relatively minor, with results equivalent up to the 1st or 2nd significant decimal place (or even better).

Summarizing, here's a quick plan of attack for you to try if you are worried about the estimation time for large datasets and models:

  1. Estimate mod = etwfe(...) as per usual.

  2. Run emfx(mod, vcov = FALSE, ...).

  3. Run emfx(mod, vcov = FALSE, collapse = TRUE, ...).

  4. Compare the point estimates from steps 1 and 2. If they are are similar enough to your satisfaction, get the approximate standard errors by running emfx(mod, collapse = TRUE, ...).

Heterogeneous treatment effects

Specifying etwfe(..., xvar = <xvar>) will generate interaction effects for all levels of <xvar> as part of the main regression model. The reason that this is useful (as opposed to a regular, non-interacted covariate in the formula RHS) is that it allows us to estimate heterogeneous treatment effects as part of the larger ETWFE framework. Specifically, we can recover heterogeneous treatment effects for each level of <xvar> by passing the resulting etwfe model object on to emfx().

For example, imagine that we have a categorical variable called "age" in our dataset, with two distinct levels "adult" and "child". Running emfx(etwfe(..., xvar = age)) will tell us how the efficacy of treatment varies across adults and children. We can then also leverage the in-built hypothesis testing infrastructure of marginaleffects to test whether the treatment effect is statistically different across these two age groups; see Examples below. Note the same principles carry over to categorical variables with multiple levels, or even continuous variables (although continuous variables are not as well supported yet).


# \dontrun{
# We’ll use the mpdta dataset from the did package (which you’ll need to 
# install separately).

# install.packages("did")
data("mpdta", package = "did")

# Basic example

# The basic ETWFE workflow involves two steps:

# 1) Estimate the main regression model with etwfe().

mod = etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none)
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered by county)

# mod ## A fixest model object with fully saturated interaction effects.

# 2) Recover the treatment effects of interest with emfx().

emfx(mod, type = "event") # dynamic ATE a la an event study
#>     Term                 Contrast event Estimate Std. Error     z Pr(>|z|)    S
#>  .Dtreat mean(TRUE) - mean(FALSE)     0  -0.0332     0.0134 -2.48    0.013  6.3
#>  .Dtreat mean(TRUE) - mean(FALSE)     1  -0.0573     0.0172 -3.34   <0.001 10.2
#>  .Dtreat mean(TRUE) - mean(FALSE)     2  -0.1379     0.0308 -4.48   <0.001 17.0
#>  .Dtreat mean(TRUE) - mean(FALSE)     3  -0.1095     0.0323 -3.39   <0.001 10.5
#>    2.5 %   97.5 %
#>  -0.0594 -0.00701
#>  -0.0910 -0.02373
#>  -0.1982 -0.07751
#>  -0.1729 -0.04619
#> Columns: term, contrast, event, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response 

# Etc. Other aggregation type options are "simple" (the default), "group"
# and "calendar"

# Heterogeneous treatment effects

# Example where we estimate heterogeneous treatment effects for counties 
# within the 8 US Great Lake states (versus all other counties). 

gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27,
        "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55)

mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls

hmod = etwfe(
   lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, 
   vcov = ~countyreal,
   xvar = gls           ## <= het. TEs by gls

# Heterogeneous ATEs (could also specify "event", etc.) 

#>     Term                 Contrast .Dtreat   gls Estimate Std. Error     z
#>  .Dtreat mean(TRUE) - mean(FALSE)    TRUE FALSE  -0.0637     0.0376 -1.69
#>  .Dtreat mean(TRUE) - mean(FALSE)    TRUE  TRUE  -0.0472     0.0271 -1.74
#>  Pr(>|z|)   S  2.5 %  97.5 %
#>    0.0906 3.5 -0.137 0.01007
#>    0.0817 3.6 -0.100 0.00594
#> Columns: term, contrast, .Dtreat, gls, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response 

# To test whether the ATEs across these two groups (non-GLS vs GLS) are 
# statistically different, simply pass an appropriate "hypothesis" argument.

emfx(hmod, hypothesis = "b1 = b2")
#>   Term Estimate Std. Error      z Pr(>|z|)   S  2.5 % 97.5 %
#>  b1=b2  -0.0164     0.0559 -0.294    0.769 0.4 -0.126  0.093
#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  response 

# Nonlinear model (distribution / link) families

# Poisson example

mpdta$emp = exp(mpdta$lemp)

   emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, 
   vcov = ~countyreal,
   family = "poisson"   ## <= family arg for nonlinear options
   ) |>
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and eight others have been removed because of collinearity (see $collin.var).
#>     Term                 Contrast event Estimate Std. Error       z Pr(>|z|)
#>  .Dtreat mean(TRUE) - mean(FALSE)     0   -25.35       15.9 -1.5942    0.111
#>  .Dtreat mean(TRUE) - mean(FALSE)     1     1.09       41.8  0.0261    0.979
#>  .Dtreat mean(TRUE) - mean(FALSE)     2   -75.12       22.3 -3.3696   <0.001
#>  .Dtreat mean(TRUE) - mean(FALSE)     3  -101.82       28.1 -3.6234   <0.001
#>     S  2.5 % 97.5 %
#>   3.2  -56.5   5.82
#>   0.0  -80.9  83.09
#>  10.4 -118.8 -31.43
#>  11.7 -156.9 -46.75
#> Columns: term, contrast, event, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response 
# }