I apologise for the title of this post and will show myself the door shortly.
I wanted to quickly follow up on my last post about efficient simulations in R. If you recall, in that post we used data.table and some other tricks to run 40,000 regressions (i.e. 20k simulations with 2 regressions each) in just over 2 seconds. The question before us today is: Can we go even faster using only base R? And it turns out that the answer is, yes, we can.
My motivation for a follow-up is partially the result of this very nice post by Benjamin Elbers, who replicates my simulation using Julia. In so doing, he demonstrates some of Julia’s killer features; most notably the fact that we don’t need to think about vectorisation — e.g. when creating the data — since Julia’s compiler will take care of that for us automatically.1 But Ben also does another interesting thing, which is to show the speed gains that come from defining our own (super lean) regression function. He uses Cholesky decomposition and it’s fairly straightforward to do the same thing in R. (Here is a nice tutorial by Issac Lee.)
I was halfway on my way to doing this myself when I stumbled on a totally different post by R Core member, Martin Maechler. Therein he introduces the .lm.fit() function (note the leading dot), which incurs even less overhead than the lm.fit() function I mentioned in my last post. I’m slightly embarrassed to say I had never heard about it until now2, but a quick bit of testing “confirms” Martin’s more rigorous benchmarks: .lm.fit yields a consistent 30-40% improvement over even lm.fit.
Now, it would be trivial to amend my previous simulation script to slot in .lm.fit() and re-run the benchmarks. But I thought I’d make this a bit more interesting by redoing the whole thing using only base R. (I’ll load the parallel package, but that comes bundled with the base distribution so hardly counts as cheating.) Here’s the full script with benchmarks for both sequential and parallel implementations at the bottom.
# Data generation ---------------------------------------------------------gen_data =function(sims=1) {## Total time periods in the the panel = 500 tt =500## x1 covariates x1_A =1+rnorm(tt*sims, 0, 1) x1_B =1/4+rnorm(tt*sims, 0, 1) x1 =c(x1_A, x1_B)## Add second, nested x2 covariates for each country x2_A =1+ x1_A +rnorm(tt*sims, 0, 1) x2_B =1+ x1_B +rnorm(tt*sims, 0, 1) x2 =c(x2_A, x2_B)## Outcomes (notice different slope coefs for x2_A and x2_B) y_A = x1_A +1*x2_A +rnorm(tt*sims, 0, 1) y_B = x1_B +2*x2_B +rnorm(tt*sims, 0, 1) y =c(y_A, y_B)## Group variables (id and sim) id =as.factor(c(rep('A', length(x1_A)), rep('B', length(x1_B)))) sim =rep(rep(1L:sims, each = tt), times =length(levels(id)))## Demeaned covariates x1_dmean = x1 -ave(x1, list(sim, id), FUN = mean) x2_dmean = x2 -ave(x2, list(sim, id), FUN = mean)## Bind in a matrix mat =cbind('sim'= sim, 'id'= id,'y'= y,'intercept'=1, 'x1'= x1, 'x2'= x2, 'x1:x2'= x1*x2, 'x1_dmean:x2_dmean'= x1_dmean * x2_dmean)## Set order i.t.o simulations mat = mat[order(mat[, 'sim']), ]return(mat)}## How many simulations do we want?n_sims =2e4## Generate them all as one big matrixd =gen_data(n_sims)## Create index list (for efficient subsetting of the large data matrix)## Note that each simulation is (2*500=)1000 rows long.ii =lapply(1L:n_sims-1, function(i) 1L:1e3L +rep(1e3L*(i), each=1e3L))# Benchmarks --------------------------------------------------------------library(microbenchmark) ## For high-precision timinglibrary(parallel)n_cores =detectCores()## Convenience function for running the two regressions and extracting the ## interaction coefficients of interest (saves having to retype everything).## The key bit is the .lm.fit() function.get_coefs =function(dat) { level =coef(.lm.fit(dat[, c('intercept', 'x1', 'x2', 'x1:x2', 'id')], dat[, 'y']))[4] dmean =coef(.lm.fit(dat[, c('intercept', 'x1', 'x2', 'x1_dmean:x2_dmean', 'id')], dat[, 'y']))[4]return(cbind(level, dmean))}## Run the benchmarks for both sequential and parallel versionsmicrobenchmark(sims_sequential =lapply(1:n_sims, function(i) { index = ii[[i]]get_coefs(d[index, ]) }),sims_parallel =mclapply(1:n_sims, function(i) { index = ii[[i]]get_coefs(d[index, ]) }, mc.cores = n_cores ),times =1)
Unit: milliseconds
expr min lq mean median uq max
sims_sequential 2749.2756 2749.2756 2749.2756 2749.2756 2749.2756 2749.2756
sims_parallel 988.4186 988.4186 988.4186 988.4186 988.4186 988.4186
neval
1
1
There you have it. Down to less than a second for a simulation involving 40,000 regressions using only base R.3 On a laptop, no less. Just incredibly impressive.
Conclusion
No grand conclusion today… except a sincere note of gratitude to the R Core team (and Julia devs and so many other OSS maintainers) for providing us with such an incredible base to build from.
P.S. Achim Zeileis (who else?) has another great tip for speeding up simulations where the experimental design is fixed here.
I think Dirk Eddelbuettel had mentioned it to me, but I hadn’t grokked the difference.↩︎
The timings from this knitted Quarto document are actually about 20% slower than when I run the script directly in my R console. But we’re really splitting hairs now.↩︎