Fast R code

Posted by Doug on July 24, 2009
Technology

I have been using R for a few weeks, and now that I have a feel for how to get things done, I am starting to explore how to get those things done faster. On the suggestion of a friend, I picked up Data Manipulation with R by Phil Spector. Once you get the hang of R syntax, it is a great book for learning how to actually get things done; I am most of the way through it, and would highly recommend it.

One thing he introduces tangentially in chapter 9 is the “system.time” function; feed it a code block, and it will tell you the user, system, and elapsed time it took to run. So, it is the perfect test bench for finding out which of two methods runs faster; he uses it to show that you get a 4x speedup using the built-in “colSums” function over “apply”. A loop that works on a whole column at a time is just as fast as “apply”; without vectorization, however, looping element by element gives you a 60x slowdown.
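Here is a rough sketch of that kind of comparison. The 1000 x 1000 matrix “m” and the variable names are my own made-up test setup, not the book's example, and the timings will differ from machine to machine, so I am not quoting any numbers here.

```r
# Hypothetical test data: a 1000 x 1000 matrix of random numbers.
set.seed(1)
m <- matrix(runif(1e6), nrow = 1000)

# Built-in column sums.
t_builtin <- system.time(s1 <- colSums(m))

# Same result via apply over the columns (MARGIN = 2).
t_apply <- system.time(s2 <- apply(m, 2, sum))

# Loop over columns, with a vectorized sum inside each iteration.
t_loop_vec <- system.time({
  s3 <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) s3[j] <- sum(m[, j])
})

# Element-by-element loop, no vectorization at all.
t_loop_elem <- system.time({
  s4 <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    for (i in seq_len(nrow(m))) s4[j] <- s4[j] + m[i, j]
  }
})

# All four approaches should agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(s1, s2)),
          isTRUE(all.equal(s1, s3)),
          isTRUE(all.equal(s1, s4)))

# Elapsed seconds for each approach, side by side.
print(c(builtin   = t_builtin[["elapsed"]],
        apply     = t_apply[["elapsed"]],
        loop_vec  = t_loop_vec[["elapsed"]],
        loop_elem = t_loop_elem[["elapsed"]]))
```

Checking that the results match before comparing the timings is worth the extra line; a fast method that gives a different answer is not a speedup.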

This points to the key to speed in R: don’t lie to R. Tell it everything up front, so that it can allocate memory efficiently; allocation is by far your slowest task. Take this example of applying “cbind” across a list of data frames. Previously, I had thought “functional programming”, and naturally went with “Reduce”. However, this book motivated me to find the “do.call” solution. Observe the results.

> df <- lapply( 1:1000, function(z) data.frame( runif( 1000 )))
> system.time( a <- do.call( cbind, df ))
   user  system elapsed
   0.54    0.01    0.59
> system.time( b <- Reduce( cbind, df ))
   user  system elapsed
  40.16    0.80   42.61

That’s right, the “Reduce” method took roughly 70x longer than the “do.call” method (42.61 seconds elapsed versus 0.59). I am going to change some of my code right now…
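My reading of why this happens: “Reduce” calls “cbind” on two things at a time, so it builds a slightly larger intermediate result at every step, while “do.call” hands the whole list to a single “cbind” call. The same pattern shows up with plain vectors. The sketch below is my own illustration, not something from the book; “n”, “x”, and “y” are made-up names, and I am assuming the growing vector is a fair stand-in for what “Reduce” does here.

```r
n <- 20000

# Growing: R is never told the final length, so each c() call
# builds a new, longer vector and copies the old contents into it.
t_grow <- system.time({
  x <- c()
  for (i in seq_len(n)) x <- c(x, i^2)
})

# Preallocated: the full length is declared up front, and the loop
# only fills in slots that already exist.
t_prealloc <- system.time({
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i^2
})

# Both loops produce exactly the same vector.
stopifnot(identical(x, y))

# Elapsed seconds for each approach.
print(c(grow = t_grow[["elapsed"]], prealloc = t_prealloc[["elapsed"]]))
```

On any machine I would expect the growing version to fall further behind as “n” gets larger, since the amount of copying grows with the length of the vector at each step.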
