Markov bookshelf


Oh, well. We’re back to kill-me-now territory.

This week’s assignment was:

Get some XML from a web service. Extract some interesting information from the XML and use it as input to one of your homework assignments from previous weeks.

You could do this either by piping the output of your XML parsing program to the input of your previously implemented program, or by using a class (e.g., instantiate and use the MarkovChain class).

So I thought, cool, I’ll scramble the New York Times Best Seller titles using their shiny new API. Okay, so, I got an API key and managed to pull in lists of titles using—

[java]import org.dom4j.Document;
import org.dom4j.io.SAXReader;
import org.dom4j.Element;
import java.util.List;

public class BestSeller
{
    public static void main(String[] args) throws Exception
    {
        /* Input should be one of the NY Times Best Seller list categories.
           Available categories:
             hardcover-fiction
             hardcover-nonfiction
             hardcover-advice
             paperback-nonfiction
             paperback-advice
             trade-fiction-paperback
             picture-books
             chapter-books
             paperback-books
             series-books
             mass-market-paperback
        */
        String listName = args[0];

        SAXReader reader = new SAXReader();
        EasyHTTPGet getter = new EasyHTTPGet(
            "http://api.nytimes.com/svc/books/v2/lists/" + listName + ".xml?api-key=f25de92cf1d2615621b68d2d31f81b63:4:1703697"
        );

        Document document = reader.read(getter.responseAsInputStream());
        List bookTitle = document.selectNodes("//results/book/book_details/book_detail/title");

        // If I wanted to scramble the authors, . . .
        // List bookAuthor =
        // document.selectNodes("//results/book/book_details/book_detail/author");

        for (Object o: bookTitle)
        {
            Element elem = (Element)o;
            String text = elem.getText();
            System.out.println(text);
        } // end for
    } // end main
} // end class[/java]

I had tried to make an array of category names and then loop through them, but I got lost after the first bit—

[java]import org.dom4j.Document;
import org.dom4j.io.SAXReader;
import org.dom4j.Element;
import java.util.List;

public class BestSellerAll
{
    public static void main(String[] args) throws Exception
    {
        /* There are 11 categories of NY Times Best Seller lists. I want to download all of them, but the Times makes you download them individually.
        */

        String [] book_cats = new String [] {
            "hardcover-fiction", "hardcover-nonfiction", "hardcover-advice", "paperback-nonfiction", "paperback-advice", "trade-fiction-paperback", "picture-books", "chapter-books", "paperback-books", "series-books", "mass-market-paperback"};

        // String listName = args[0];

        SAXReader [] reader = new SAXReader() [];
        EasyHTTPGet [] getter = new EasyHTTPGet [];
        for (int i = 0; i < 11; i++)
        {
            getter[i] = "http://api.nytimes.com/svc/books/v2/lists/" + book_cats[n] + ".xml?api-key=f25de92cf1d2615621b68d2d31f81b63:4:1703697";
        } // end for i

        // I got sadly confused somewhere in this block:
        Document [] document = new Document [];
        for (int j = 0; j < 11; j++)
        {
            document[j] = reader.read(getter[j].responseAsInputStream());
            List bookTitle = document.selectNodes("//results/book/book_details/book_detail/title");
        } // end for j

        // If I wanted to scramble the authors, . . .
        // List bookAuthor =
        // document.selectNodes("//results/book/book_details/book_detail/author");

        for (Object o: bookTitle)
        {
            Element elem = (Element)o;
            String text = elem.getText();
            System.out.println(text);
        } // end for bookTitle
    } // end main
} // end class[/java]

So, semimanually, then, I ran this on one category list at a time and used cat to agglomerate them. So then I had my list of all the current best sellers' titles:

HANDLE WITH CARE
CORSAIR
THE ASSOCIATE
THE HOST
RUN FOR YOUR LIFE
PROMISES IN DEATH
DEAD SILENCE
HEART AND SOUL
ONE DAY AT A TIME
NIGHT AND DAY
THE GUERNSEY LITERARY AND POTATO PEEL PIE SOCIETY
WHITE WITCH, BLACK CURSE
PATHS OF GLORY
TERMINAL FREEZE
FOOL
THE HELP
DON’T LOOK TWICE
FAULT LINE
TRUE COLORS
STORM FROM THE SHADOWS
OUTLIERS
HOUSE OF CARDS
THE YANKEE YEARS
OUT OF CAPTIVITY
DEWEY
THE LOST CITY OF Z
A LION CALLED CHRISTIAN
A BOLD FRESH PIECE OF HUMANITY
MY BOOKY WOOK
THE UNFORGIVING MINUTE
INSIDE THE REVOLUTION
MELTDOWN
ARE YOU THERE, VODKA? IT’S ME, CHELSEA
JESUS, INTERRUPTED
JOKER ONE
NO ANGEL
LORDS OF FINANCE
THE NEXT 100 YEARS
MULTIPLE BLESSINGS
PICKING COTTON
ACT LIKE A LADY, THINK LIKE A MAN
THE LAST LECTURE
THE POWER OF SOUL
THE SECRET
THE ULTRAMIND SOLUTION
FLAT BELLY DIET!
THE GREAT DEPRESSION AHEAD
PEAKS AND VALLEYS
UNCOMMON
THE SURVIVORS CLUB
MAGNIFICENT MIND AT ANY AGE
THE TOTAL MONEY MAKEOVER
EMOTIONAL FREEDOM
THE 4 DAY DIET
FIGHT FOR YOUR MONEY
THREE CUPS OF TEA
THE MIDDLE PLACE
I HOPE THEY SERVE BEER IN HELL
DREAMS FROM MY FATHER
THE TIPPING POINT
EAT, PRAY, LOVE
THE AUDACITY OF HOPE
MY HORIZONTAL LIFE
90 MINUTES IN HEAVEN
TEAM OF RIVALS
SAME KIND OF DIFFERENT AS ME
BLINK
MARLEY & ME
THE FORGOTTEN MAN
THE OMNIVORE’S DILEMMA
THE ZOOKEEPER’S WIFE
BEAUTIFUL BOY
A WHOLE NEW MIND
ANIMAL, VEGETABLE, MIRACLE
INFIDEL
THE LOVE DARE
WHAT TO EXPECT WHEN YOU’RE EXPECTING
EMERGENCY
SUZE ORMAN’S 2009 ACTION PLAN
NATURALLY THIN
TWILIGHT
SKINNY BITCH
THE FIVE LOVE LANGUAGES
THE POWER OF NOW
HAPPY FOR NO REASON
THE PURPOSE-DRIVEN LIFE
A NEW EARTH
THE BIGGEST LOSER 30-DAY JUMP START
HE’S JUST NOT THAT INTO YOU
THE BIGGEST LOSER FAMILY COOKBOOK
THE SHACK
THE READER
FIREFLY LANE
AMERICAN WIFE
SUNDAYS AT TIFFANY’S
PEOPLE OF THE BOOK
A THOUSAND SPLENDID SUNS
TAKE ONE
THE ALCHEMIST
SARAH’S KEY
THE MIRACLE AT SPEEDY MOTORS
THE BRIEF WONDROUS LIFE OF OSCAR WAO
REVOLUTIONARY ROAD
WATER FOR ELEPHANTS
THE WHITE TIGER
STILL ALICE
THE ELEGANCE OF THE HEDGEHOG
LOVING FRANK
THE KITE RUNNER
LUSH LIFE
THE HOUSE IN THE NIGHT
THE COMPOSER IS DEAD
BLUEBERRY GIRL
LISTEN TO THE WIND: THE STORY OF DR. GREG AND “THREE CUPS OF TEA”
LADYBUG GIRL AND BUMBLEBEE BOY
GALLOP!
CAT
SWING!
NAKED MOLE RAT GETS DRESSED
ALL IN A DAY
MILES TO GO
THE GRAVEYARD BOOK
THIRTEEN REASONS WHY
SCAT
THE HUNGER GAMES
FADE
THREE CUPS OF TEA
3 WILLOWS
SEEKERS: GREAT BEAR LAKE
THE MYSTERIOUS BENEDICT SOCIETY AND THE PERILOUS JOURNEY
EVERMORE
THE BOY IN THE STRIPED PAJAMAS
THE BOOK THIEF
THREE CUPS OF TEA: YOUNG READERS EDITION
TWEAK
WICKED: WITCH AND CURSE
CORALINE
THE MYSTERIOUS BENEDICT SOCIETY
THE TALE OF DESPEREAUX
SLAM
THE TWILIGHT SAGA
HOUSE OF NIGHT
DIARY OF A WIMPY KID
THE 39 CLUES
THE CLIQUE
PERCY JACKSON & THE OLYMPIANS
HARRY POTTER
NIGHT WORLD
KISSED BY AN ANGEL
INKHEART
THE WHOLE TRUTH
HOLD TIGHT
BONES
THE GRAND FINALE
PLAGUE SHIP
LOST SOULS
MONTANA CREEDS: DYLAN
DANGER IN A RED DRESS
THE APPEAL
THE MACKADE BROTHERS: RAFE AND JARED
MAVERICK
SMALL FAVOR
ANGELS AND DEMONS
THE READER
SECRETS
FIRST COMES MARRIAGE
CHASING DARKNESS
SHADOW COMMAND
TEMPTATION RIDGE
CONFESSIONS OF A SHOPAHOLIC
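
For the record, the loop I got lost in probably doesn't need arrays of readers and getters at all: one parser can be reused for each category in turn. Here is a minimal sketch of that idea, not the code I actually ran. It assumes the JDK's built-in XML parser and XPath instead of dom4j and EasyHTTPGet, so that it stands alone. The class name BestSellerLoop, the helpers urlFor and extractTitles, and taking the API key from the command line are all made up for illustration. The URL pattern and the XPath are the ones from my single-list program.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class BestSellerLoop {
    // The 11 list categories from the post.
    public static final String[] BOOK_CATS = {
        "hardcover-fiction", "hardcover-nonfiction", "hardcover-advice",
        "paperback-nonfiction", "paperback-advice", "trade-fiction-paperback",
        "picture-books", "chapter-books", "paperback-books", "series-books",
        "mass-market-paperback"
    };

    // Hypothetical helper: builds the same URL pattern used in the post.
    public static String urlFor(String listName, String apiKey) {
        return "http://api.nytimes.com/svc/books/v2/lists/" + listName
            + ".xml?api-key=" + apiKey;
    }

    // Hypothetical helper: pulls every title out of one list's XML.
    public static List<String> extractTitles(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(xml);
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "//results/book/book_details/book_detail/title",
            doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java BestSellerLoop <api-key>");
            return;
        }
        // One pass per category; each response is parsed and printed in turn.
        for (String cat : BOOK_CATS) {
            try (InputStream in = URI.create(urlFor(cat, args[0])).toURL().openStream()) {
                for (String title : extractTitles(in)) {
                    System.out.println(title);
                }
            }
        }
    }
}
```

Run with the key as its only argument, something like this should print every list's titles in one go, no cat required.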

And then, all I wanted to fucking do was run the Markov code from week 5 and make some crazy new titles. I had it working earlier in this process, when I was working with just the hardcover fiction list, and came up with these completely uninteresting results:

TERMINAL FREEZE
TERMINAL FREEZE
TERMINAL FREEZE
THE ASSOCIETY
ONE DAY AT A TIME
NIGHT AND SOUL
ONE DAY AT A TIME
FOOL
STORM FROM THE HELP
FOOL
TERMINAL FREEZE
CORSAIR
ONE DAY AT A TIME
TRUE COLORS
FOOL
ONE DAY AT A TIME

Clearly, the set of words was too small to do anything really interesting with, which is when I decided to make one file of all the lists, to get more words. But then . . . I couldn’t get the Markov code to work anymore. Kept getting this stupid error:

java.lang.StringIndexOutOfBoundsException: String index out of range: 4
at java.lang.String.substring(String.java:1765)
at Markov.feedLine(Markov.java:21)
at MarkovFilter.eachLine(MarkovFilter.java:12)
at com.decontextualize.a2z.TextFilter.internalRun(TextFilter.java:326)
at com.decontextualize.a2z.TextFilter.run(TextFilter.java:208)
at MarkovFilter.main(MarkovFilter.java:6)

And it took me one million years to determine that this was because the code was choking on the colon in

LISTEN TO THE WIND: THE STORY OF DR. GREG AND “THREE CUPS OF TEA”

and then the quotation marks, and then something else, and then I still don’t know what. It will not work if I include any lines after “GALLOP!” So, with that frustrating exception noted, here, finally, are the new titles:

I HOPE THE STORM FROM MY FATHERE, VODKA? IT’S ME, CHELSEA
THE FIVE LANE
MY HORIZONTAL LIFE OF DIFFERENT MIND THE LOVE DARE
THE BRIEF WONDROUS LIFE OF THERE, VODKA? IT’S WIFE
INSIDE THE REVOLUTION AHEAD
THE 4 DAY JUMP START
MAGNIFICENT AS ME, CHELSEA
THE ELEGANCE OF SOUL
THE ELEPHANTS
THE UNFORGOTTEN TO THE YANKEE YEARS
THE HEDGEHOG
THE COLORS
THE BOY
PROMISES IN HEAVEN
I HOPE THE TOTAL LIFE OF Z
HE’S DILEMMA
PROMISES IN THE TIPPING MIND SPLENDID SUNS
THE ZOOKEEPER’S JUST NOT THAT TO EXPECT WHEN YOUR LIFE OF NOW
TWILIGHT ANY AGE
A THOUSE OF DIFFERENT MINUTES IN HEAVEN
A BOLD FRESH PIECE OF OSCAR WAO
THE KIND OF HUMANITY
OUT OF GLORY
SAME KITE TIGER
FIGHT ANY AGE
MAGNIFICENT MIND SOUL
SAME KIND OF HUMANITY
THE FIVE LANGUAGES
THE MIDDLE WITH CARDS
TERMINAL FRESH PIE SOCIATE
90 MINUTES IN DEAD SILENCE
HE’S 2009 ACTIONARY AND SOLUTIONAL FRESH PIECE OF DIFFERENT MIND SOUL
NIGHT FOR ELEGANCE OF THERE, VODKA? IT’S WIFE
STORM FROM THERE, VODKA? IT’S JUST NOT THAT INTO YOUR LIFE OF THE LOVE DAY JUMP START
THE NIGHT ANY AGE
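
A guess about that exception, with the caveat that the week 5 Markov code isn't shown here: the line right after GALLOP! in my list is CAT, which is three characters long. If feedLine takes four-character substrings (an assumption, though "String index out of range: 4" hints at it), then substring(0, 4) on a three-character line would fail in just this way, and the colon and quotation marks might be innocent. Below is a minimal sketch of a character-level Markov generator that skips lines shorter than its order. It is a stand-in, not the course's Markov class; TitleMarkov and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class TitleMarkov {
    private static final char END = '\n'; // marks the end of a title
    private final int order;              // n-gram length in characters
    private final Random random;
    // n-gram -> every character seen right after it (END if the title stopped)
    private final Map<String, List<Character>> followers = new HashMap<>();
    private final List<String> starts = new ArrayList<>();

    public TitleMarkov(int order, Random random) {
        this.order = order;
        this.random = random;
    }

    // Returns false (and learns nothing) when the line is shorter than the order,
    // instead of letting substring() run past the end of the string.
    public boolean feedLine(String line) {
        String text = line.trim();
        if (text.length() < order) {
            return false;
        }
        starts.add(text.substring(0, order));
        for (int i = 0; i + order <= text.length(); i++) {
            String gram = text.substring(i, i + order);
            char following = (i + order < text.length()) ? text.charAt(i + order) : END;
            followers.computeIfAbsent(gram, k -> new ArrayList<>()).add(following);
        }
        return true;
    }

    // Starts from a random title opening and keeps appending observed followers.
    public String generate(int maxLength) {
        if (starts.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(starts.get(random.nextInt(starts.size())));
        while (out.length() < maxLength) {
            List<Character> options = followers.get(out.substring(out.length() - order));
            if (options == null) {
                break;
            }
            char c = options.get(random.nextInt(options.size()));
            if (c == END) {
                break;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TitleMarkov markov = new TitleMarkov(4, new Random());
        String[] titles = {"GALLOP!", "CAT", "SWING!", "THE HELP", "THE HOST"};
        for (String title : titles) {
            if (!markov.feedLine(title)) {
                System.err.println("skipped (too short): " + title);
            }
        }
        for (int i = 0; i < 5; i++) {
            System.out.println(markov.generate(80));
        }
    }
}
```

Skipping short lines loses CAT, which is a shame, but it would let the rest of the list through. Padding short titles or lowering the order would be other ways to go.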
