A boring line chart

What’s up with the Booknik?

I have been wrestling with the Google Chart API for days, trying to get it to show me a line chart with two freaking lines, but clearly there’s something I’m missing. In the meantime, though, a reasonable portion of Booknik is working, so you can go poke at that.

The broken chart is the one on the author detail page, and the image above shows what it’s currently spewing out. And if it weren’t for James B. Allen’s handy Google Chart GUI, I wouldn’t even have that (Allen’s tool generates a beautiful chart, but from hard-coded data; it’s my dynamic conversion that’s busted). I really want to get that chart working, so that I can view each author’s vocabulary usage over the course of his or her career. (You can see the raw numbers now, but those aren’t particularly glamorous.) I’m especially interested in Agatha Christie, since (a) she and Ian Fleming are the authors for whom I have the most books in the system (nine each), and (b) there’s a real research study of her work being done at the University of Toronto,1 and I’d like to see if the results of my relatively low-tech analysis correlate with theirs.2
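For what it’s worth, here is a minimal sketch of how a two-line chart URL might be assembled from dynamic data. It is written in Python rather than the site’s PHP, and it is not the code behind Booknik. It assumes the image-chart URL format with `cht=lc` for a line chart, `chd=t:` text-encoded data with series separated by `|`, and values scaled into the default 0 to 100 range. The base URL, the function names, and the two example series ("distinct" and "unique") are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Google image chart service; adjust if yours differs.
CHART_BASE = "http://chart.apis.google.com/chart"


def scale(values, top):
    """Scale raw counts into the 0-100 range that text encoding uses by default."""
    if top <= 0:
        return [0.0 for _ in values]
    return [round(100.0 * v / top, 1) for v in values]


def line_chart_url(series, labels, size="400x200", colors=("FF0000", "0000FF")):
    """Build a line-chart URL with one line per entry in `series`.

    `series` is a list of equal-length lists of numbers (hypothetical data,
    e.g. one value per book in publication order).
    """
    # Scale every series against the same maximum so the lines are comparable.
    top = max(max(s) for s in series)
    encoded = "|".join(
        ",".join(str(v) for v in scale(s, top)) for s in series
    )
    params = {
        "cht": "lc",                              # line chart
        "chs": size,                              # width x height in pixels
        "chd": "t:" + encoded,                    # text encoding, series split by |
        "chco": ",".join(colors[: len(series)]),  # one color per line
        "chdl": "|".join(labels),                 # legend labels
    }
    # Keep ":", "," and "|" readable instead of percent-encoding them.
    return CHART_BASE + "?" + urlencode(params, safe=":,|")


url = line_chart_url([[10, 20, 40], [5, 10, 20]], ["distinct", "unique"])
print(url)
```

Two things that commonly go wrong with this kind of conversion are forgetting the `|` between series (which yields a single line) and sending raw counts above 100 without scaling them, so those are the two steps the sketch makes explicit.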

There’s more not yet done than done on Booknik, but right now I think the most interesting part is the book detail page (see, for instance, Casino Royale), which shows a list of all the distinct words the book contains (or, rather, the first fifty words the query returns; I haven’t yet put in any pagination links), as well as a list of all the words that are unique to that book when compared with the other ninety-odd books in the Booknik database. The lists are sortable alphabetically or by frequency, and when one does the latter, one sees that although most of the unique words are proper names (or typos), there are some everyday words in there as well, some of them mildly surprising. You can then click on any of those words and see a detail page that includes, among many other things, a graph of the frequency of that word’s occurrence in Wordnik’s corpus. Ultimately I’d like to chart frequency counts from my own database side by side with Wordnik’s, and maybe somehow work in the Google Book Search results, too, but given my difficulties in comprehending Google Chartese, I’ve been reluctant to start that frustrating process.
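The "unique to this book" list boils down to a set difference: take one book’s word counts and drop every word that appears in any other book. Booknik does this with a database query; the following is only an in-memory Python sketch of the same idea, with a deliberately crude tokenizer and hypothetical function names.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words in a text (crude tokenizer, for illustration only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def unique_words(target, others):
    """Return the words in `target` that occur in none of the `others`."""
    seen_elsewhere = set()
    for counts in others:
        seen_elsewhere.update(counts)  # adds the words (the Counter's keys)
    return {w: n for w, n in target.items() if w not in seen_elsewhere}


def by_frequency(counts):
    """Sort (word, count) pairs by descending count, then alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def alphabetically(counts):
    """Sort (word, count) pairs by word."""
    return sorted(counts.items())


book = word_counts("The spy ordered a martini. The martini was dry.")
rest = [
    word_counts("The detective ordered tea."),
    word_counts("A dry wit was all she had."),
]
print(by_frequency(unique_words(book, rest)))
```

Sorting by frequency is what pushes the one-off proper names and typos to the bottom, which is why the everyday words become easier to spot in that view.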

One of the main challenges so far, besides the chart thing, has been building a modular, RESTful framework in PHP. Right now I’ve got a single index.php file that reads in the URL and figures out which chunks of code it needs to import to display the requested page. I’ve built in a superprimitive sort of debugging console (which you can’t see, because it’s hidden to anyone who’s not visiting from my current workstation; all you get is an enigmatic statement of your IP address at the bottom of every page), and I’m starting to peel off sections that get reused, such as the breadcrumbs code.
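The single-entry-point arrangement described above is usually called a front controller. As a rough illustration only (in Python, not the actual index.php), the sketch below splits a request path into segments, picks a handler from the first segment, and derives breadcrumbs from the same segments. The URL layout (`/author/...`, `/book/...`, `/word/...`), the handler table, and the return strings are all hypothetical.

```python
def split_path(path):
    """Drop any query string and return the non-empty path segments."""
    return [part for part in path.split("?")[0].strip("/").split("/") if part]


def breadcrumbs(path):
    """Return (label, link) pairs from the site root down to the current page."""
    parts = split_path(path)
    crumbs = [("home", "/")]
    for i, part in enumerate(parts):
        crumbs.append((part, "/" + "/".join(parts[: i + 1])))
    return crumbs


# Hypothetical page handlers; a real site would render templates here.
HANDLERS = {
    "author": lambda slug: f"author page for {slug}",
    "book": lambda slug: f"book page for {slug}",
    "word": lambda slug: f"word page for {slug}",
}


def dispatch(path):
    """Choose which chunk of code handles the requested path."""
    parts = split_path(path)
    if not parts:
        return "home page"
    handler = HANDLERS.get(parts[0])
    if handler is None or len(parts) < 2:
        return "404 not found"
    return handler(parts[1])


print(dispatch("/book/casino-royale"))
print(breadcrumbs("/book/casino-royale"))
```

Keeping the path parsing in one helper is what lets reusable pieces such as the breadcrumbs share it with the dispatcher instead of each re-parsing the URL.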

There are a lot of things on the to-do list (too many for me to have the patience to list here right now), but per a recommendation from Andrew Styer in Understanding Networks last week, I think the next big thing is to figure out how to get some of the Java-based text-parsing code that I’ve been running from the command line into a form I can use online, on the fly, so that I can get more books in there and get more information out of them. I had started trying to convert that code to Processing, at least, but got stuck on the fact that Processing uses Java 1.4 while the code I have takes advantage of features introduced in Java 1.5. Also, I hate Java. But I like PHP! I’d never made anything sizable from scratch before in PHP, mostly just edited WordPress templates and plug-ins and such, and I’ve really appreciated how easy it is to make it do stuff. So I’m thinking maybe I can rewrite those Java functions in PHP. Or something. Think how educational that would be!

Last and least, regarding the name change from “Bookalator” to “Booknik,” Monzy pointed out that -ator is really the wrong suffix for what the thing is intended to do. Per Collins:

suffix forming nouns
a person or thing that performs a certain action

  • agitator
  • escalator
  • radiator

from Latin -ātor; see -ATE¹, -OR¹

The thing I’m making does not book in the sense that an agitator agitates or an escalator escalates or a radiator radiates. It would be better termed a bookalyzer, but I didn’t like the sound of that, so I went for -nik, which isn’t really any more appropriate but at least ties the project more to its BFF, Wordnik.

Anyway, that’s the name until I come up with a better one. Suggestions welcome.

  1. Brought to my attention by my kind classmate Alex Kauffmann.
  2. “Vocabulary Changes in Agatha Christie’s Mysteries as an Indication of Dementia: A Case Study,” by Ian Lancashire and Graeme Hirst, University of Toronto, Department of English and Department of Computer Science (respectively): at-a-glance poster (PDF; 3.6MB); five-page paper (PDF; 112K).