Add to my list of fantastic software you should be using (Firefox, Dropbox, Adblock, XMarks) another: Zotero. It is an organization tool for your research that exists as a Firefox add-in (I don’t know why it only supports FF). You can bookmarket pages (imagine that!), sort them, tag them. However, you can also store files in with your bookmarks, refile bookmarks in several places, add notes, citations, and annotations. Basically, it is your bookmarks on super steroids. You can back the bookmarks up to their server, and the stored files to your own server, so everthing can be duplicated across machines. (You know how I feel about that!)
Archive for June, 2009
For unknown or inadequate reasons, I bought a copy of Office 2007 the other day and loaded it up for work. (As an aside, Amazon has Office 2007 Home & Student for $80 and Excel 2007 for $105 — and it says people are buying the individal program after comparing the two?). For the past few weeks I have been developing code that pulls financial data from Yahoo and populates a template sheet, among other things. Because of the way I add the data to the sheet (the fastest way excel supports, setting a range’s value equal to a VBA array), some summary functions at the top of the sheet have to account for the fact that the data could be of an arbitrary height.
My solution to this problem in the original version, written and debugged in Excel 2003, was to have those summary cells reference the range A23:A65536 and take a count, etc. to get the number or summary of those cells. This was fine. My code would take 30 seconds per run, most of which I thought was spent in VBA.
Enter Excel 2007. On a whim, I saved my spreadsheet in 2007 format and ran it. It was slow. I mean, mind bogglingly slow. I thought it was just the network taking a long time to download the data, however my coworker ran it a few times and it locked up his computer. Then it dawned on me: the formulas were rewritten to now reference A23:A1048576. Excel was touching 1 million rows over and over again every time the sheet calculated, which was often (he was using an RTD link, so it recalculated constantly). A simple solution: write a UDF that would count excatly the size of my data array, and use that instead.
This was the surprise: since I only had about 1,000 data points (but I didn’t want to hardcode to that number, and so I referenced all rows on the sheet), my running time, even using Excel 2003, went down from 30 seconds to about 6 seconds. The original slowness I had mostly chalked up to Excel handling data poorly; instead, it was purely a function of my forcing it to handle the data poorly, a problem only exposed by migrating to a higher-throughput device. The moral of the story is that if it doesn’t scale, it may be your fault, not theirs.
I rode out on the Minuteman Bike Path from Cambridge to Walden Pond, and then came back on a southern route through Waltham and Watertown. Lots of roads involved, but not too well-traveled on a Sunday. I had a great partner, who asked directions at exactly the right time to keep us from getting hopelessly lost.
