{"id":560,"date":"2009-03-24T05:48:43","date_gmt":"2009-03-24T10:48:43","guid":{"rendered":"http:\/\/itp.indiamos.com\/blog\/?p=560"},"modified":"2009-10-22T22:09:22","modified_gmt":"2009-10-23T03:09:22","slug":"comparalator","status":"publish","type":"post","link":"https:\/\/itp.indiamos.com\/blog\/2009\/03\/24\/comparalator\/","title":{"rendered":"Comparalator"},"content":{"rendered":"<p><a href=\"http:\/\/www.flickr.com\/photos\/nypl\/3109979241\/\"><img loading=\"lazy\" src=\"https:\/\/i2.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/2009\/03\/date_merchant.jpg?resize=383%2C348\" alt=\"date merchant\" title=\"date merchant\" width=\"383\" height=\"348\" class=\"alignnone size-full wp-image-562\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>As you may recall, for my midterm project, I got <a href=\"\/2009\/03\/10\/a2z-midterm-vocabu-lame\/\">stumped<\/a> on several seemingly simple tasks. One of those&#8212;the most important, since upon it depends my semester-long assignment for Mainstreaming Information&#8212;was figuring out a way to compare one list of words to another and pull out the words that were unique to one of those lists. In my head, I can see very easily how this would be done. Given my special way of haphazardly flailing through code, however, I just couldn&#8217;t get it to work.<\/p>\n<p>Until today!<\/p>\n<p>In fiddling with the <a href=\"http:\/\/www.decontextualize.com\/teaching\/a2z\/bayesed-and-confused\/\">Bayesian comparison code<\/a> for this week&#8217;s homework, I finally pulled out a list of unique words. Of course, this is a completely perverse misuse of that code&#8212;like using a steamroller to kill a pillbug&#8212;but as long as it works, I don&#8217;t fucking care.<\/p>\n<p>So, here&#8217;s what I did. 
In BayesClassifier.java, I replaced the last two <code>for<\/code> loops with the following:<\/p>\n<pre><code>\/\/ For each category, print the words from standard input that it never saw.\nfor (BayesCategory bcat: categories)\n{\n  println(\"---The following words were not found in \" + bcat.getName());\n  for (String word: uniqueWords)\n  {\n    \/\/ relevance() now returns the word's raw count in this category,\n    \/\/ or 0.001 if the word is absent, so anything below 1 means \"not found\"\n    double wordProb = bcat.relevance(word, categories);\n    if (wordProb &lt; 1)\n    {\n      println(word);\n    }\n  } \/\/ end for word\n} \/\/ end for bcat<\/code><\/pre>\n<p>And in BayesCategory.java, I replaced the <code>percentage<\/code> and <code>relevance<\/code> methods with:<\/p>\n<pre><code>\/\/ Returns the raw count of the word in this category (not a percentage),\n\/\/ or 0.001 if the category never saw it.\npublic double percentage(String word)\n{\n  if (count.containsKey(word))\n  {\n    return count.get(word);\n  } \/\/ end if\n  else\n  {\n    return 0.001;\n  } \/\/ end else\n} \/\/ end percentage\n\n\/\/ No longer a true relevance score: it simply passes through this\n\/\/ category's count. The categories argument is kept so the call in\n\/\/ BayesClassifier.java still compiles.\npublic double relevance(String word, ArrayList&lt;BayesCategory&gt; categories)\n{\n  return percentage(word);\n} \/\/ end relevance<\/code><\/pre>\n<p>So now, if I run the command<\/p>\n<blockquote><p><code>$ java BayesClassifier A2_unique.txt &lt; B1_unique.txt | sort &gt; results.txt<\/code><\/p><\/blockquote>\n<p>I get a list of words that are in B1_unique.txt (<cite><a rel=\"nofollow\" href=\"http:\/\/www.amazon.com\/gp\/product\/0765313383?ie=UTF8&amp;tag=indink-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0765313383indink-20\" >The Masada Scroll<\/a><\/cite> by Paul Block and Robert Vaughan, 2007) but not in A2_unique.txt (<cite><a href=\"http:\/\/books.google.com\/books?id=ko0NAAAAYAAJ\">Zuleika Dobson or, An Oxford Love Story<\/a><\/cite> by Max Beerbohm, 1911). 
For example,<\/p>\n<blockquote><p>Akbar, Allah, Allahu, Apostolic, Ariminum, Arkadiane, Asmodeus, Astaroth, Barabbas, Beelzebub, Bellarmino, Blavatsky, Brandeis, Breviary, Byzantine, Caiaphas, Calpurnius, Catacombs, Charlemagne, Clambering, DNA, Diavolo, Franciscan, Freemasons, GPS, Gymnasium, Haddad, Hades, IDs, IRA, Jettisoning, Kathleen, Lefkovitz, MD, MRI, Masada, Masonic, Muhammad, Muhammadan, Nazarene, Nazareth, Olympics, Orthodoxy, Palatine, Palazzi, Palestine, Palestinian, Palestinians, Petrovna, Pleasant, Plenty, Plunge, Pocketing, Pontiff, Pontifical, Pontius, Praetorian, Prissy, Professors, Protestants, Rasulullaah, Ratsach, Revving, Rosicrucians, Satan, Scrolls, Seder, Shakespeare, Syracuse, Tacitus, Theosophical, Torah, Trastevere, Turkish, USB, Uzi, VAIO, VCR, Yeah, Yechida, Yeetgadal, Yiddish, adrenalin, agita, airliner, airport, ankh, awesome, bitch, bomb, bookstores, braked, breastplate, briefcase, broadsword, broiler, brotherhood, bulrushes, cellular, checkpoint, chuckling, chutzpah, combatant, computer, dashboard, database, departmental, desktop, divorce, dysentery, electricity, enabling, entrepreneurs, firearms, firestorm, fishtailed, flagon, forensics, goatskin, groggily, gunfire, gunman, gunshots, handbag, handball, handbrake, handgun, helicopter, helmets, highwaymen, hijinks, homeland, homeless, homespun, hometown, innkeeper, internship, journalist, kebob, kidnappers, kilometers, lab, laptop, lyre, mawkish, monitor, muezzin, nickname, nightfall, nonbeliever, northeaster, notebook, notepad, notepaper, numerology, paganism, password, pastries, phone, photo, photocopies, photocopy, photograph, photos, pig, pigeons, pistol, playback, police, quintessentially, recycles, redialed, roadblock, roadway, sandwich, screensaver, site, sites, submachine, superheating, synagogue, taped, taxi, terrorism, terrorist, terrorists, thousandfold, thrashing, toga, tortured, trigonometry, universe, unto, vegetables, vehicles, video, videotape, vinegar, violence, 
warehouses, waterfall, welfare, wholeheartedly, whoosh, whore, windshield, worker, workstation, worldwide, yardstick, yarmulkes, yeetkadash, zooming<\/p><\/blockquote>\n<p>And if I run the comparison in the opposite direction, I come up with words such as<\/p>\n<blockquote><p>Abernethy, Abiding, Abimelech, Abyssinian, Academically, Academy, Accidents, Achillem, Adam, Adieu, Admirably, Age, Agency, Agents, Alas, Albert, Alighting, America, Atlantic, Australia, Balliol, Baron, Baronet, Britannia, Broadway, Brobdingnagian, Colonials, Cossacks, Crimea, Devon, Dewlap, Duchess, Duke, Dukedom, Earl, Edwardian, Egyptians, Elizabethan, Englishmen, Englishwoman, Europe, Holbein, Ireland, Iscariot, Isis, Japanese, Kaiser, Liberals, London, Madrid, Meistersinger, Messrs, Monsieur, Napoleon, Novalis, Papist, Parnassus, President, Prince, Professor, Prussians, Romanoff, Segregate, Slavery, Socrates, Switzerland, Tzar, Victoria, Wagnerian, Waterloo, Whithersoever, Zeus, absinthes, acolyte, adventures, affrights, affront, afire, afoot, aforesaid, aggravated, album, analogy, anarchy, ankle, ape, aright, aristocracy, ataraxy, automatically, avalanche, avow, balustrade, bandboxes, bank, beastliest, beau, beauteous, billiards, biography, bodyguard, bosky, boyish, broadcast, bruited, bulldog, businesslike, bustle, calorific, casuistry, catkins, chaperons, chidden, cigarettes, clergyman, cloven, comet, compeers, coquetry, cricket, crinolines, custard, dandiacal, dapperest, decanter, devil, dialogue, diet, dipsomaniacal, disemboldened, disinfatuate, drunken, ebullitions, equipage, exigent, eyelashes, eyelids, farthingales, female, femininity, fishwife, fob, forefather, forerunners, freemasonry, furbelows, gallimaufry, goodlier, gooseberry, gorgeous, gypsy, haberdasher, halfpence, handicapped, handicraft, handiwork, handwriting, hearthrug, helpless, hip, hireling, honeymoon, housemaid, housework, hoyden, hussy, idiotic, impertinent, impudence, inasmuch, incognisant, insipid, insolence, 
insouciance, item, keyboard, landau, legerdemain, loathsome, luck, maid, maidens, manhood, manumission, matador, maunderers, model, mushroom, nasty, newspaper, noodle, nosegay, novel, oarsmen, omnisubjugant, ostler, otiose, parasol, pinafore, poetry, poltroonery, postprandially, prank, prestidigitators, propinquity, queer, romance, sackcloth, salad, sardonic, saucy, schoolmaster, seraglio, sex, skimpy, skirt, snuff, socialistic, streetsters, surcease, surcoat, swooned, teens, telegram, telegraphs, thistledown, thither, thou, threepenny, tomboyish, toys, tradesmen, treacle, ugly, uncouthly, unvexed, vassalage, waylay, welter, wigwam, witchery, withal, woe, woebegone, womanly, womenfolk, wonderfully, wonderingly, wretchedness, wrought, yacht, yesternight, zounds<\/p><\/blockquote>\n<p>Exciting!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As you may recall, for my midterm project, I got stumped on several seemingly simple tasks. One of those&#8212;the most important, since upon it depends my semester-long assignment for Mainstreaming Information&#8212;was figuring out a way to compare one list of words to another and pull out the words that were unique to one of those &hellip; <a href=\"https:\/\/itp.indiamos.com\/blog\/2009\/03\/24\/comparalator\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Comparalator<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26,46,19,4,35,36,16],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3qY10-92","_links":{"self":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/560"}],"collection":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/comments?post=560"}],"version-history":[{"count":11,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/560\/revisions"}],"predecessor-version":[{"id":730,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/560\/revisions\/730"}],"wp:attachment":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/media?parent=560"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/categories?post=560"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/tags?post=560"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}