{"id":573,"date":"2009-03-31T07:59:12","date_gmt":"2009-03-31T12:59:12","guid":{"rendered":"http:\/\/itp.indiamos.com\/blog\/?p=573"},"modified":"2009-04-14T01:36:50","modified_gmt":"2009-04-14T06:36:50","slug":"markov-bookshelf","status":"publish","type":"post","link":"https:\/\/itp.indiamos.com\/blog\/2009\/03\/31\/markov-bookshelf\/","title":{"rendered":"Markov bookshelf"},"content":{"rendered":"<p><img loading=\"lazy\" src=\"https:\/\/i1.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/2009\/03\/bestseller-blog-image.jpg?resize=400%2C172\" alt=\"best-seller covers\" title=\"best-seller covers\" width=\"400\" height=\"172\" class=\"alignnone size-full wp-image-630\" data-recalc-dims=\"1\" \/><\/p>\n<p>Oh, well. We&#8217;re back to kill-me-now territory.<\/p>\n<p>This week&#8217;s assignment was<\/p>\n<blockquote><p>Get some XML from a web service. Extract some interesting information from the XML and use it as input to one of your homework assignments from previous weeks.<\/p>\n<p>You could do this either by piping the output of your XML parsing program to the input of your previously implemented program, or by using a class (e.g., instantiate and use the MarkovChain class).<\/p><\/blockquote>\n<p>So I thought, cool, I&#8217;ll scramble the <cite>New York Times<\/cite> Best Seller titles using their <a href=\"http:\/\/developer.nytimes.com\/docs\/best_sellers_api\/\">shiny new API<\/a>. 
Okay, so, I got an API key and managed to pull in lists of titles using this:<\/p>\n<p>[java]import org.dom4j.Document;<br \/>\nimport org.dom4j.io.SAXReader;<br \/>\nimport org.dom4j.Element;<br \/>\nimport java.util.List;<\/p>\n<p>public class BestSeller<br \/>\n{<br \/>\n  public static void main(String[] args) throws Exception<br \/>\n  {<\/p>\n<p>\/*  Input should be one of the NY Times Best Seller list categories.<br \/>\n    Available categories:<br \/>\n    hardcover-fiction<br \/>\n    hardcover-nonfiction<br \/>\n    hardcover-advice<br \/>\n    paperback-nonfiction<br \/>\n    paperback-advice<br \/>\n    trade-fiction-paperback<br \/>\n    picture-books<br \/>\n    chapter-books<br \/>\n    paperback-books<br \/>\n    series-books<br \/>\n    mass-market-paperback<br \/>\n*\/<br \/>\n    String listName = args[0];<\/p>\n<p>    SAXReader reader = new SAXReader();<br \/>\n    EasyHTTPGet getter = new EasyHTTPGet(<br \/>\n      \"http:\/\/api.nytimes.com\/svc\/books\/v2\/lists\/\" + listName + \".xml?api-key=f25de92cf1d2615621b68d2d31f81b63:4:1703697\"<br \/>\n    );<\/p>\n<p>    Document document = reader.read(getter.responseAsInputStream());<br \/>\n    List bookTitle = document.selectNodes(\"\/\/results\/book\/book_details\/book_detail\/title\");<\/p>\n<p>\/\/  If I wanted to scramble the authors, . . 
.<br \/>\n\/\/  List bookAuthor =<br \/>\n\/\/  document.selectNodes(\"\/\/results\/book\/book_details\/book_detail\/author\");<\/p>\n<p>    for (Object o: bookTitle)<br \/>\n    {<br \/>\n      Element elem = (Element)o;<br \/>\n      String text = elem.getText();<br \/>\n      System.out.println(text);<br \/>\n    } \/\/ end for<br \/>\n  } \/\/ end main<br \/>\n} \/\/ end class[\/java]<\/p>\n<p>I had tried to make an array of category names and then loop through them, but I got lost after the first bit:<\/p>\n<p>[java]import org.dom4j.Document;<br \/>\nimport org.dom4j.io.SAXReader;<br \/>\nimport org.dom4j.Element;<br \/>\nimport java.util.List;<\/p>\n<p>public class BestSellerAll<br \/>\n{<br \/>\n  public static void main(String[] args) throws Exception<br \/>\n  {<\/p>\n<p>\/*  There are 11 categories of NY Times Best Seller lists. I want to download all of them, but the Times makes you download them individually.<br \/>\n*\/<\/p>\n<p>    String [] book_cats = new String [] {<br \/>\n    \"hardcover-fiction\", \"hardcover-nonfiction\", \"hardcover-advice\", \"paperback-nonfiction\", \"paperback-advice\", \"trade-fiction-paperback\", \"picture-books\", \"chapter-books\", \"paperback-books\", \"series-books\", \"mass-market-paperback\"};<\/p>\n<p>\/\/    String listName = args[0];<\/p>\n<p>    SAXReader [] reader = new SAXReader() [];<br \/>\n    EasyHTTPGet [] getter = new EasyHTTPGet [];<br \/>\n    for (int i = 0; i < 11; i++)\n    {\n        getter[i] = \"http:\/\/api.nytimes.com\/svc\/books\/v2\/lists\/\" + book_cats[n] + \".xml?api-key=f25de92cf1d2615621b68d2d31f81b63:4:1703697\";\n    } \/\/ end for i\n\n\/\/ I got sadly confused somewhere in this block:\n    Document [] document = new Document [];\n    for (int j = 0; j < 11; j++)\n    {\n        document[j] = reader.read(getter[j].responseAsInputStream());\n        List 
bookTitle = document.selectNodes(\"\/\/results\/book\/book_details\/book_detail\/title\");\n    } \/\/ end for j\n\n\/\/  If I wanted to scramble the authors, . . .\n\/\/  List bookAuthor = \n\/\/  document.selectNodes(\"\/\/results\/book\/book_details\/book_detail\/author\");\n\n    for (Object o: bookTitle) \n    {\n      Element elem = (Element)o;\n      String text = elem.getText();\n      System.out.println(text);\n    } \/\/ end for bookTitle\n  } \/\/ end main\n} \/\/ end class[\/java]\n\nSo, semimanually, then, I ran this on one category list at a time and used cat to agglomerate them. So then I had my list of all the current best sellers' titles:\n\n\n\n<blockquote>HANDLE WITH CARE<br \/>\nCORSAIR<br \/>\nTHE ASSOCIATE<br \/>\nTHE HOST<br \/>\nRUN FOR YOUR LIFE<br \/>\nPROMISES IN DEATH<br \/>\nDEAD SILENCE<br \/>\nHEART AND SOUL<br \/>\nONE DAY AT A TIME<br \/>\nNIGHT AND DAY<br \/>\nTHE GUERNSEY LITERARY AND POTATO PEEL PIE SOCIETY<br \/>\nWHITE WITCH, BLACK CURSE<br \/>\nPATHS OF GLORY<br \/>\nTERMINAL FREEZE<br \/>\nFOOL<br \/>\nTHE HELP<br \/>\nDON&#8217;T LOOK TWICE<br \/>\nFAULT LINE<br \/>\nTRUE COLORS<br \/>\nSTORM FROM THE SHADOWS<br \/>\nOUTLIERS<br \/>\nHOUSE OF CARDS<br \/>\nTHE YANKEE YEARS<br \/>\nOUT OF CAPTIVITY<br \/>\nDEWEY<br \/>\nTHE LOST CITY OF Z<br \/>\nA LION CALLED CHRISTIAN<br \/>\nA BOLD FRESH PIECE OF HUMANITY<br \/>\nMY BOOKY WOOK<br \/>\nTHE UNFORGIVING MINUTE<br \/>\nINSIDE THE REVOLUTION<br \/>\nMELTDOWN<br \/>\nARE YOU THERE, VODKA? 
IT\u2019S ME, CHELSEA<br \/>\nJESUS, INTERRUPTED<br \/>\nJOKER ONE<br \/>\nNO ANGEL<br \/>\nLORDS OF FINANCE<br \/>\nTHE NEXT 100 YEARS<br \/>\nMULTIPLE BLESSINGS<br \/>\nPICKING COTTON<br \/>\nACT LIKE A LADY, THINK LIKE A MAN<br \/>\nTHE LAST LECTURE<br \/>\nTHE POWER OF SOUL<br \/>\nTHE SECRET<br \/>\nTHE ULTRAMIND SOLUTION<br \/>\nFLAT BELLY DIET!<br \/>\nTHE GREAT DEPRESSION AHEAD<br \/>\nPEAKS AND VALLEYS<br \/>\nUNCOMMON<br \/>\nTHE SURVIVORS CLUB<br \/>\nMAGNIFICENT MIND AT ANY AGE<br \/>\nTHE TOTAL MONEY MAKEOVER<br \/>\nEMOTIONAL FREEDOM<br \/>\nTHE 4 DAY DIET<br \/>\nFIGHT FOR YOUR MONEY<br \/>\nTHREE CUPS OF TEA<br \/>\nTHE MIDDLE PLACE<br \/>\nI HOPE THEY SERVE BEER IN HELL<br \/>\nDREAMS FROM MY FATHER<br \/>\nTHE TIPPING POINT<br \/>\nEAT, PRAY, LOVE<br \/>\nTHE AUDACITY OF HOPE<br \/>\nMY HORIZONTAL LIFE<br \/>\n90 MINUTES IN HEAVEN<br \/>\nTEAM OF RIVALS<br \/>\nSAME KIND OF DIFFERENT AS ME<br \/>\nBLINK<br \/>\nMARLEY &#038; ME<br \/>\nTHE FORGOTTEN MAN<br \/>\nTHE OMNIVORE\u2019S DILEMMA<br \/>\nTHE ZOOKEEPER\u2019S WIFE<br \/>\nBEAUTIFUL BOY<br \/>\nA WHOLE NEW MIND<br \/>\nANIMAL, VEGETABLE, MIRACLE<br \/>\nINFIDEL<br \/>\nTHE LOVE DARE<br \/>\nWHAT TO EXPECT WHEN YOU\u2019RE EXPECTING<br \/>\nEMERGENCY<br \/>\nSUZE ORMAN\u2019S 2009 ACTION PLAN<br \/>\nNATURALLY THIN<br \/>\nTWILIGHT<br \/>\nSKINNY BITCH<br \/>\nTHE FIVE LOVE LANGUAGES<br \/>\nTHE POWER OF NOW<br \/>\nHAPPY FOR NO REASON<br \/>\nTHE PURPOSE-DRIVEN LIFE<br \/>\nA NEW EARTH<br \/>\nTHE BIGGEST LOSER 30-DAY JUMP START<br \/>\nHE\u2019S JUST NOT THAT INTO YOU<br \/>\nTHE BIGGEST LOSER FAMILY COOKBOOK<br \/>\nTHE SHACK<br \/>\nTHE READER<br \/>\nFIREFLY LANE<br \/>\nAMERICAN WIFE<br \/>\nSUNDAYS AT TIFFANY\u2019S<br \/>\nPEOPLE OF THE BOOK<br \/>\nA THOUSAND SPLENDID SUNS<br \/>\nTAKE ONE<br \/>\nTHE ALCHEMIST<br \/>\nSARAH\u2019S KEY<br \/>\nTHE MIRACLE AT SPEEDY MOTORS<br \/>\nTHE BRIEF WONDROUS LIFE OF OSCAR WAO<br \/>\nREVOLUTIONARY ROAD<br \/>\nWATER FOR ELEPHANTS<br \/>\nTHE 
WHITE TIGER<br \/>\nSTILL ALICE<br \/>\nTHE ELEGANCE OF THE HEDGEHOG<br \/>\nLOVING FRANK<br \/>\nTHE KITE RUNNER<br \/>\nLUSH LIFE<br \/>\nTHE HOUSE IN THE NIGHT<br \/>\nTHE COMPOSER IS DEAD<br \/>\nBLUEBERRY GIRL<br \/>\nLISTEN TO THE WIND: THE STORY OF DR. GREG AND &#8220;THREE CUPS OF TEA&#8221;<br \/>\nLADYBUG GIRL AND BUMBLEBEE BOY<br \/>\nGALLOP!<br \/>\nCAT<br \/>\nSWING!<br \/>\nNAKED MOLE RAT GETS DRESSED<br \/>\nALL IN A DAY<br \/>\nMILES TO GO<br \/>\nTHE GRAVEYARD BOOK<br \/>\nTHIRTEEN REASONS WHY<br \/>\nSCAT<br \/>\nTHE HUNGER GAMES<br \/>\nFADE<br \/>\nTHREE CUPS OF TEA<br \/>\n3 WILLOWS<br \/>\nSEEKERS: GREAT BEAR LAKE<br \/>\nTHE MYSTERIOUS BENEDICT SOCIETY AND THE PERILOUS JOURNEY<br \/>\nEVERMORE<br \/>\nTHE BOY IN THE STRIPED PAJAMAS<br \/>\nTHE BOOK THIEF<br \/>\nTHREE CUPS OF TEA: YOUNG READERS EDITION<br \/>\nTWEAK<br \/>\nWICKED: WITCH AND CURSE<br \/>\nCORALINE<br \/>\nTHE MYSTERIOUS BENEDICT SOCIETY<br \/>\nTHE TALE OF DESPEREAUX<br \/>\nSLAM<br \/>\nTHE TWILIGHT SAGA<br \/>\nHOUSE OF NIGHT<br \/>\nDIARY OF A WIMPY KID<br \/>\nTHE 39 CLUES<br \/>\nTHE CLIQUE<br \/>\nPERCY JACKSON &#038; THE OLYMPIANS<br \/>\nHARRY POTTER<br \/>\nNIGHT WORLD<br \/>\nKISSED BY AN ANGEL<br \/>\nINKHEART<br \/>\nTHE WHOLE TRUTH<br \/>\nHOLD TIGHT<br \/>\nBONES<br \/>\nTHE GRAND FINALE<br \/>\nPLAGUE SHIP<br \/>\nLOST SOULS<br \/>\nMONTANA CREEDS: DYLAN<br \/>\nDANGER IN A RED DRESS<br \/>\nTHE APPEAL<br \/>\nTHE MACKADE BROTHERS: RAFE AND JARED<br \/>\nMAVERICK<br \/>\nSMALL FAVOR<br \/>\nANGELS AND DEMONS<br \/>\nTHE READER<br \/>\nSECRETS<br \/>\nFIRST COMES MARRIAGE<br \/>\nCHASING DARKNESS<br \/>\nSHADOW COMMAND<br \/>\nTEMPTATION RIDGE<br \/>\nCONFESSIONS OF A SHOPAHOLIC<\/p><\/blockquote>\n<p>And then, all I wanted to fucking do was run the Markov code from week 5 and make some crazy new titles. 
I had it working earlier in this process, when I was working with just the hardcover fiction list, and came up with these completely uninteresting results:<\/p>\n<blockquote><p>TERMINAL FREEZE<br \/>\nTERMINAL FREEZE<br \/>\nTERMINAL FREEZE<br \/>\nTHE ASSOCIETY<br \/>\nONE DAY AT A TIME<br \/>\nNIGHT AND SOUL<br \/>\nONE DAY AT A TIME<br \/>\nFOOL<br \/>\nSTORM FROM THE HELP<br \/>\nFOOL<br \/>\nTERMINAL FREEZE<br \/>\nCORSAIR<br \/>\nONE DAY AT A TIME<br \/>\nTRUE COLORS<br \/>\nFOOL<br \/>\nONE DAY AT A TIME<\/p><\/blockquote>\n<p>Clearly, the set of words was too small to do anything really interesting with, which is when I decided to make one file of all the lists, to get more words. But then . . . I couldn&#8217;t get the Markov code to work anymore. Kept getting this stupid error:<\/p>\n<blockquote><p><code>java.lang.StringIndexOutOfBoundsException: String index out of range: 4<br \/>\n\tat java.lang.String.substring(String.java:1765)<br \/>\n\tat Markov.feedLine(Markov.java:21)<br \/>\n\tat MarkovFilter.eachLine(MarkovFilter.java:12)<br \/>\n\tat com.decontextualize.a2z.TextFilter.internalRun(TextFilter.java:326)<br \/>\n\tat com.decontextualize.a2z.TextFilter.run(TextFilter.java:208)<br \/>\n\tat MarkovFilter.main(MarkovFilter.java:6)<\/code><\/p><\/blockquote>\n<p>And it took me one million years to determine that this was because the code was choking on the colon in<\/p>\n<blockquote><p>LISTEN TO THE WIND: THE STORY OF DR. GREG AND &#8220;THREE CUPS OF TEA&#8221;<\/p><\/blockquote>\n<p>and then the quotation marks, and then something else, and then I still don&#8217;t know what. It will not work if I include any lines after &#8220;GALLOP.&#8221; So, with that frustrating exception noted, here, finally, are the new titles:<\/p>\n<blockquote><p>I HOPE THE STORM FROM MY FATHERE, VODKA? IT\u2019S ME, CHELSEA<br \/>\nTHE FIVE LANE<br \/>\nMY HORIZONTAL LIFE OF DIFFERENT MIND THE LOVE DARE<br \/>\nTHE BRIEF WONDROUS LIFE OF THERE, VODKA? 
IT\u2019S WIFE<br \/>\nINSIDE THE REVOLUTION AHEAD<br \/>\nTHE 4 DAY JUMP START<br \/>\nMAGNIFICENT AS ME, CHELSEA<br \/>\nTHE ELEGANCE OF SOUL<br \/>\nTHE ELEPHANTS<br \/>\nTHE UNFORGOTTEN TO THE YANKEE YEARS<br \/>\nTHE HEDGEHOG<br \/>\nTHE COLORS<br \/>\nTHE BOY<br \/>\nPROMISES IN HEAVEN<br \/>\nI HOPE THE TOTAL LIFE OF Z<br \/>\nHE\u2019S DILEMMA<br \/>\nPROMISES IN THE TIPPING MIND SPLENDID SUNS<br \/>\nTHE ZOOKEEPER\u2019S JUST NOT THAT TO EXPECT WHEN YOUR LIFE OF NOW<br \/>\nTWILIGHT ANY AGE<br \/>\nA THOUSE OF DIFFERENT MINUTES IN HEAVEN<br \/>\nA BOLD FRESH PIECE OF OSCAR WAO<br \/>\nTHE KIND OF HUMANITY<br \/>\nOUT OF GLORY<br \/>\nSAME KITE TIGER<br \/>\nFIGHT ANY AGE<br \/>\nMAGNIFICENT MIND SOUL<br \/>\nSAME KIND OF HUMANITY<br \/>\nTHE FIVE LANGUAGES<br \/>\nTHE MIDDLE WITH CARDS<br \/>\nTERMINAL FRESH PIE SOCIATE<br \/>\n90 MINUTES IN DEAD SILENCE<br \/>\nHE\u2019S 2009 ACTIONARY AND SOLUTIONAL FRESH PIECE OF DIFFERENT MIND SOUL<br \/>\nNIGHT FOR ELEGANCE OF THERE, VODKA? IT\u2019S WIFE<br \/>\nSTORM FROM THERE, VODKA? IT\u2019S JUST NOT THAT INTO YOUR LIFE OF THE LOVE DAY JUMP START<br \/>\nTHE NIGHT ANY AGE<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Oh, well. We&#8217;re back to kill-me-now territory. This week&#8217;s assignment was Get some XML from a web service. Extract some interesting information from the XML and use it as input to one of your homework assignments from previous weeks. 
You could do this either by piping the output of your XML parsing program to the &hellip; <a href=\"https:\/\/itp.indiamos.com\/blog\/2009\/03\/31\/markov-bookshelf\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Markov bookshelf<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26,4,35],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3qY10-9f","_links":{"self":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/573"}],"collection":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/comments?post=573"}],"version-history":[{"count":4,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/573\/revisions"}],"predecessor-version":[{"id":631,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/573\/revisions\/631"}],"wp:attachment":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/media?parent=573"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/categories?post=573"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/tags?post=573"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}