The Project That Would Not Die


Hallowe’en is coming, which must be why the Bookalator has been stirring again, gathering its strength to once more stalk the earth.

Back in May, I made a summer “to do” list which was long enough to keep me busy for the next ten years. Among the proposed tasks was “finishing” the perpetually stalled Bookalator project. I then, of course, proceeded to spend the next three months mostly sitting around feeling bad about not doing stuff. But the constant demand for Ideas, Ideas, Ideas, and my constant lack of same, has made spring’s doleful mess look like fall’s Golden Ticket.

First, I needed something data-y to play with in Crafting with Data this semester, and, though I originally thought I’d make some gadget to help remotely manage my cats (who get into all sorts of trouble when nobody is there to remind them that they should be sleeping), today I decided that I would revive Bookalator for week 5’s REST assignment in Understanding Networks. So all day I was sitting in class trying to think of ways I could Webize and RESTify the Bookalator (which until now has mostly consisted of some tatty scraps of Java run from the command line), and I thought one thing that might be relatively worthwhile would be to make a Web interface for looking up words in the Bookalator corpus, cross-referenced to the nifty little frequency charts on Wordnik. And I thought I might be able to do something with it for Mashups, in which we’re also futzing around with such things.

I was assuming I’d have to scrape all this Webness together, so imagine my delight when, shortly after class, I saw on the Twitter that Wordnik was launching a public API. I immediately went to the website to sign up, and although I was tempted to just fill out the application form with, “I love Erin!” I thought I should take advantage of this opportunity to try to explain the project as it currently stands, in writing, to someone who’s not required to listen to me. So the following is what I said (minus some stupid typos that I missed, writing in a text box 1.5 inches high):

I’m a graduate student at NYU’s Interactive Telecommunications Program and have been working since February on a project that attempts to in some way address the question, “(how) has (American, English) vocabulary changed in the last hundred or so years?”

The seed for the project was a pair of figures from Harper’s August 2000 index:

“Average number of words in the written vocabulary of a 6- to 14-year-old American child in 1945: 25,000
“Average number today: 10,000”
(http://is.gd/iWaF)

Cecil Adams has since discredited this comparison for The Straight Dope (http://is.gd/iWbZ), but I still wondered how I might explore the underlying question of whether we use more or fewer words than we used to, and whether our vocabulary has changed in any surprising ways. My original goal was to compare the words used in two sets of novels–one group from around 1900 to 1910 and the other from approximately the last ten years. I wished to create a browser (provisionally called “the Bookalator”) that would allow one to see which words were shared between sets or individual books, which were unique, and–inspired by Wordnik–how specific words were used in the actual texts.

The project has evolved over the course of the year (some relevant blog posts are at http://is.gd/4wyCZ) and is no longer limited to comparison over time (for example, I’d also like to allow comparison by author’s region, age, and gender), but one element that I have always hoped could become a part of it was some sort of link to the fascinating usage frequency graphs at Wordnik. I had also—as recently as this afternoon!—considered creating a component that would check to see whether Wordnik’s word of the day was included in the set of books used in this project.

With the advent of Wordnik’s API, the path to Bookalator-Wordnik integration is now so much shorter and less gravelly. Won’t you please grant me an access key?

Thank you for your consideration.

API key arrived a few hours later. Yay! Now I have to figure out what to do with it.
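
In the meantime, the core comparison described in the application (which words two sets of books share, and which are unique to each) can be sketched in a few lines. This is only a rough illustration in Python, not the actual Bookalator code (which is Java); the function names `vocabulary` and `compare` and the sample sentences are made up for the example, and the tokenizer is a deliberately crude assumption that ignores hyphenation, curly apostrophes, and anything beyond plain a-z letters.

```python
import re


def vocabulary(text):
    """Return the set of distinct lowercase words in a text.

    Crude tokenizer (an assumption for this sketch): runs of a-z,
    optionally followed by a straight apostrophe and more letters.
    """
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def compare(old_text, new_text):
    """Split two vocabularies into shared and unique words."""
    old_words = vocabulary(old_text)
    new_words = vocabulary(new_text)
    return {
        "shared": old_words & new_words,    # words in both texts
        "only_old": old_words - new_words,  # words only in the older text
        "only_new": new_words - old_words,  # words only in the newer text
    }


# Made-up sample sentences, not drawn from the real corpus.
old_sample = "The motor-car rattled down the lane."
new_sample = "The car sped down the highway."
result = compare(old_sample, new_sample)

for label, words in result.items():
    print(label, sorted(words))
```

A real version would read whole novels rather than single sentences, and would probably want counts rather than bare sets, so that a word’s frequency in the corpus could sit next to Wordnik’s chart for the same word.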

Update: Erin McKean announced the Wordnik API in her presentation at the Web 2.0 Summit 09 (via @wordnik).

Photo: Andreas Vesalius – De humani corporis fabrica (On the Workings of the Human Body) Huntington Library by brewbooks / J Brew; some rights reserved.
