Posted by Doug
on August 17, 2009
City Streets /
No Comments
Following up on my previous post about my new bike, everything has been going quite peachy. Who knew that buying everything new helps?
Well, it’s no silver bullet. If there’s anything I have learned about fixed gears so far, it is that you have to get the crank bolts just right, and check them frequently. There should not be any noise, and there should not be any binding of the chain. When I first got the repairs done, an awful rattling developed in the chainring; it turned out the bolts were too long and could not be tightened enough. After a couple of weeks of riding around with a new set of properly sized bolts, the rattling came back. I got a wrench and discovered everything had come loose. I tightened the bolts, and the chainring went back to being almost totally quiet.
Posted by Doug
on July 24, 2009
Technology /
No Comments
I have been using R for a few weeks now, and now that I have a feel for how to get things done, I am starting to explore how to get those things done faster. On the suggestion of a friend, I picked up Data Manipulation with R by Phil Spector. Once you get the hang of R syntax, it is a great book to show you how to actually get things done; I am most of the way through it, and would highly recommend it.
One thing he introduces tangentially in chapter 9 is the “system.time” function; feed it an expression, and it will tell you the user, system, and elapsed time it took to evaluate. So, it is the perfect test bench for checking which of two methods runs faster; he uses it to show that you get a 4x speedup using the built-in “colSums” function over “apply”. With vectorization, a loop is just as fast as “apply”; without it, element-by-element looping gives you a 60x slowdown.
This points to the key to R speed: don’t lie to R. Tell it everything up front, so that it can allocate memory efficiently, since memory allocation is by far your slowest task. Take this example of applying “cbind” across a list of data frames. Previously, I had thought “functional programming” and naturally went with “Reduce”. However, this book motivated me to find the “do.call” solution. Observe the results.
> df <- lapply(1:1000, function(z) data.frame(runif(1000)))
> system.time(a <- do.call(cbind, df))
   user  system elapsed
   0.54    0.01    0.59
> system.time(b <- Reduce(cbind, df))
   user  system elapsed
  40.16    0.80   42.61
That’s right, the “Reduce” method took more than 70x longer than the “do.call” method (42.61 versus 0.59 seconds elapsed). I am going to change some of my code right now…
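One way to see why the gap is so large is a rough cost model. The sketch below is Python rather than R, and it only counts work units; nothing is timed. It assumes (and this is an assumption, not a measurement of R internals) that each bind costs roughly in proportion to the number of columns in its result. The function names are made up for illustration.

```python
def pairwise_bind_cost(n_frames: int) -> int:
    """Assumed cost of folding n one-column frames together two at a time (Reduce-style)."""
    cost = 0
    width = 1  # the accumulator starts out as the first frame
    for _ in range(n_frames - 1):
        width += 1     # bind one more column onto the accumulator
        cost += width  # assumption: building the result touches every column in it
    return cost


def single_bind_cost(n_frames: int) -> int:
    """Assumed cost of binding all n frames in one call (do.call-style)."""
    return n_frames  # assumption: each column is touched exactly once


n = 1000
print(pairwise_bind_cost(n))  # 500499
print(single_bind_cost(n))    # 1000
```

The measured gap above is smaller than this model predicts, so treat it as a picture of the shape of the problem rather than a forecast: under the assumption, pairwise binding grows with the square of the number of frames, while a single call grows linearly.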
Posted by Doug
on July 18, 2009
Excursions /
1 Comment
I work in Acton, and I live in Cambridge. Given that the driving distance is about 20 miles, and the biking distance about 25, it was only a matter of time until I rode the route. This week I did it in both directions, although not on the same day. All I can say is, thank goodness for the Minuteman Bikeway.
Posted by Doug
on July 12, 2009
City Streets /
1 Comment
Finally, a picture of my bike. Love it! The bag under the seat has my flat fix kit, keys, etc. That’s a cable lock I have stuck on the handlebars.

I had everything below the frame replaced. See my last post.

Push on pedals A to rotate chain ring B, driving chain C, rotating rear cog D, and pushing wheel E forward.
Posted by Doug
on July 10, 2009
City Streets /
No Comments
This is actually very limited advice on bike repair. But first, the background.
Posted by Doug
on July 05, 2009
Technology /
No Comments
On the theme I’m using (Big City), the WordPress default for showing counts next to categories looks awful. The line breaks after the link, so it will say something like “Technology ::break:: (31)”. The solution I found was basically just to hack the code where it prints the count, and move that up. This stuff is hard-coded in WordPress, and I couldn’t find a CSS solution to my problem (I do suck at CSS, but that’s another story).
Here are the changes as of WP 2.8.
In “general-template.php”, locate the lines that say
$after = ' ('.$arcresult->posts.')' . $afterafter;
and change them to
$text .= <...>
There are a number of such lines, so this will make them all consistent. I think I counted 4 such changes.
The other change to make is in “classes.php”. Break line 1336, which looked something like this before I got to it:
$link .= $cat_name . "</a>";
so that it becomes
$link .= $cat_name;
$link .= "</a>";
Now move up this block between those two lines:
if ( isset($show_count) && $show_count )
    $link .= ' (' . intval($category->count) . ')';
This will change your formatting slightly if you were using the feed code, and I don’t have a good answer for you on that.
Posted by Doug
on July 05, 2009
Excursions /
No Comments
July 4 was awesome, but I won’t recount it here. I just wanted to point out this very cool bike group out on the street. Disco ball bike crowd riding around at 2 am? Yes, that’s sweet.
Today I went riding out around Cambridge to improve my mental map of the place. My realization is that there are a few very old areas (Harvard, Kendall Square, and Porter Square in Cambridge, plus Concord, Lexington, and of course Boston) with direct roads between them. These roads are generally named after the larger place (at least locally), which explains the numerous Cambridge, Harvard, Concord, and Lexington streets, which are rarely in the place for which they’re named.
I also discovered the source of the Minuteman Bikeway. Although the maps show the starting point as Alewife Station, there is actually a pleasant extension from past Davis Square not marked there. (See also my previous post on my ride on the bikeway.)
Posted by Doug
on June 20, 2009
Technology /
No Comments
Add another to my list of fantastic software you should be using (Firefox, Dropbox, Adblock, XMarks): Zotero. It is an organization tool for your research that exists as a Firefox add-on (I don’t know why it only supports FF). You can bookmark pages (imagine that!), sort them, and tag them. However, you can also store files with your bookmarks, file bookmarks in several places, and add notes, citations, and annotations. Basically, it is your bookmarks on super steroids. You can back the bookmarks up to their server, and the stored files to your own server, so everything can be duplicated across machines. (You know how I feel about that!)
Posted by Doug
on June 17, 2009
Technology /
No Comments
For unknown or inadequate reasons, I bought a copy of Office 2007 the other day and loaded it up for work. (As an aside, Amazon has Office 2007 Home & Student for $80 and Excel 2007 for $105, and it says people are buying the individual program after comparing the two?) For the past few weeks I have been developing code that pulls financial data from Yahoo and populates a template sheet, among other things. Because of the way I add the data to the sheet (the fastest way Excel supports, setting a range’s value equal to a VBA array), some summary functions at the top of the sheet have to account for the fact that the data could be of an arbitrary height.
My solution to this problem in the original version, written and debugged in Excel 2003, was to have those summary cells reference the range A23:A65536 and take a count, etc. to get the number or summary of those cells. This was fine. My code would take 30 seconds per run, most of which I thought was spent in VBA.
Enter Excel 2007. On a whim, I saved my spreadsheet in 2007 format and ran it. It was slow. I mean, mind-bogglingly slow. I thought it was just the network taking a long time to download the data, but my coworker ran it a few times and it locked up his computer. Then it dawned on me: the formulas had been rewritten to reference A23:A1048576. Excel was touching over 1 million rows every time the sheet calculated, which was often (he was using an RTD link, so it recalculated constantly). A simple solution: write a UDF that would count exactly the size of my data array, and use that instead.
This was the surprise: since I only had about 1,000 data points (I didn’t want to hardcode that number, which is why I had referenced every row on the sheet), my running time, even using Excel 2003, went down from 30 seconds to about 6 seconds. The original slowness I had mostly chalked up to Excel handling data poorly; instead, it was purely a function of my forcing it to handle the data poorly, a problem only exposed by migrating to a version with far larger sheets. The moral of the story is that if it doesn’t scale, it may be your fault, not theirs.
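To put rough numbers on that, here is a small Python sketch (Python only because the arithmetic is language-neutral; it is not the VBA from the workbook). It counts the rows covered by each whole-column reference and compares them with the roughly 1,000 rows of real data. The helper name is made up for illustration, and the 1,000 figure is the approximate count mentioned above.

```python
def rows_in_range(first_row: int, last_row: int) -> int:
    """Number of rows in an inclusive reference such as A23:A65536."""
    return last_row - first_row + 1


excel_2003_rows = rows_in_range(23, 65_536)     # A23:A65536
excel_2007_rows = rows_in_range(23, 1_048_576)  # A23:A1048576
actual_rows = 1_000  # approximate: "about 1,000 data points"

print(excel_2003_rows)                       # 65514
print(excel_2007_rows)                       # 1048554
print(round(excel_2003_rows / actual_rows))  # 66
print(round(excel_2007_rows / actual_rows))  # 1049
```

Even in the 2003 format, each summary formula was referencing about 66 times more rows than held data, and the 2007 format pushed that to roughly a thousand times. How much of that Excel actually scans on each recalculation is not something this sketch measures.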
Posted by Doug
on June 14, 2009
Excursions /
No Comments
I rode out on the Minuteman Bike Path from Cambridge to Walden Pond, and then came back on a southern route through Waltham and Watertown. Lots of roads involved, but not too well-traveled on a Sunday. I had a great partner, who asked directions at exactly the right time to keep us from getting hopelessly lost.