Comparalator

date merchant

As you may recall, for my midterm project, I got stumped on several seemingly simple tasks. One of those—the most important, since upon it depends my semester-long assignment for Mainstreaming Information—was figuring out a way to compare one list of words to another and pull out the words that were unique to one of those lists. In my head, I can see very easily how this would be done. Given my special way of haphazardly flailing through code, however, I just couldn’t get it to work.

Until today!

In fiddling with the Bayesian comparison code for this week’s homework, I finally pulled out a list of unique words. Of course, this is a completely perverse misuse of that code—like using a steamroller to kill a pillbug—but as long as it works, I don’t fucking care.

So, here’s what I did. In BayesClassifier.java, I replaced the last two for loops with the following:

for (String word: uniqueWords) 
    {
      for (BayesCategory bcat: categories) 
      {
        double wordProb = bcat.relevance(word, categories);
        if (wordProb < 1)
        {
        println(word);
        }
        else {}
      } // end for bcat
    } // end for word

    for (BayesCategory bcat: categories) 
    {
      double score = bcat.score(uniqueWords, categoryWordTotal);
      println("---The following words were not found in " + bcat.getName());
    } // end for bcat

And in BayesCategory.java I replaced the percentage and relevance blocks with

 public double percentage(String word) 
  {
    if (count.containsKey(word)) 
    {
      return count.get(word);
    } // end if
    else 
    {
      return 0.001;
    } // end else
  } // end percentage

  public double relevance(String word, ArrayList<BayesCategory> categories) 
  {
    double percentageSum = 0;
    for (BayesCategory bcat: categories) 
    {
      percentageSum += bcat.percentage(word);
    } // end for bcat
    return percentage(word);
  } // end relevance

So now, if I run the command

$ java BayesClassifier A2_unique.txt < B1_unique.txt | sort >results.txt

I get a list of words that are in B1_unique.txt (The Masada Scroll by Paul Block and Robert Vaughan, 2007) but not in A2_unique.txt (Zuleika Dobson or, An Oxford Love Story by Max Beerbohm, 1911). For example,

Akbar, Allah, Allahu, Apostolic, Ariminum, Arkadiane, Asmodeus, Astaroth, Barabbas, Beelzebub, Bellarmino, Blavatsky, Brandeis, Breviary, Byzantine, Caiaphas, Calpurnius, Catacombs, Charlemagne, Clambering, DNA, Diavolo, Franciscan, Freemasons, GPS, Gymnasium, Haddad, Hades, IDs, IRA, Jettisoning, Kathleen, Lefkovitz, MD, MRI, Masada, Masonic, Muhammad, Muhammadan, Nazarene, Nazareth, Olympics, Orthodoxy, Palatine, Palazzi, Palestine, Palestinian, Palestinians, Petrovna, Pleasant, Plenty, Plunge, Pocketing, Pontiff, Pontifical, Pontius, Praetorian, Prissy, Professors, Protestants, Rasulullaah, Ratsach, Revving, Rosicrucians, Satan, Scrolls, Seder, Shakespeare, Syracuse, Tacitus, Theosophical, Torah, Trastevere, Turkish, USB, Uzi, VAIO, VCR, Yeah, Yechida, Yeetgadal, Yiddish, adrenalin, agita, airliner, airport, ankh, awesome, bitch, bomb, bookstores, braked, breastplate, briefcase, broadsword, broiler, brotherhood, bulrushes, cellular, checkpoint, chuckling, chutzpah, combatant, computer, dashboard, database, departmental, desktop, divorce, dysentery, electricity, enabling, entrepreneurs, firearms, firestorm, fishtailed, flagon, forensics, goatskin, groggily, gunfire, gunman, gunshots, handbag, handball, handbrake, handgun, helicopter, helmets, highwaymen, hijinks, homeland, homeless, homespun, hometown, innkeeper, internship, journalist, kebob, kidnappers, kilometers, lab, laptop, lyre, mawkish, monitor, muezzin, nickname, nightfall, nonbeliever, northeaster, notebook, notepad, notepaper, numerology, paganism, password, pastries, phone, photo, photocopies, photocopy, photograph, photos, pig, pigeons, pistol, playback, police, quintessentially, recycles, redialed, roadblock, roadway, sandwich, screensaver, site, sites, submachine, superheating, synagogue, taped, taxi, terrorism, terrorist, terrorists, thousandfold, thrashing, toga, tortured, trigonometry, universe, unto, vegetables, vehicles, video, videotape, vinegar, violence, warehouses, waterfall, welfare, wholeheartedly, whoosh, whore, windshield, worker, workstation, worldwide, yardstick, yarmulkes, yeetkadash, zooming

And if I run the comparison in the opposite direction, I come up with words such as

Abernethy, Abiding, Abimelech, Abyssinian, Academically, Academy, Accidents, Achillem, Adam, Adieu, Admirably, Age, Agency, Agents, Alas, Albert, Alighting, America, Atlantic, Australia, Balliol, Baron, Baronet, Britannia, Broadway, Brobdingnagian, Colonials, Cossacks, Crimea, Devon, Dewlap, Duchess, Duke, Dukedom, Earl, Edwardian, Egyptians, Elizabethan, Englishmen, Englishwoman, Europe, Holbein, Ireland, Iscariot, Isis, Japanese, Kaiser, Liberals, London, Madrid, Meistersinger, Messrs, Monsieur, Napoleon, Novalis, Papist, Parnassus, President, Prince, Professor, Prussians, Romanoff, Segregate, Slavery, Socrates, Switzerland, Tzar, Victoria, Wagnerian, Waterloo, Whithersoever, Zeus, absinthes, acolyte, adventures, affrights, affront, afire, afoot, aforesaid, aggravated, album, analogy, anarchy, ankle, ape, aright, aristocracy, ataraxy, automatically, avalanche, avow, balustrade, bandboxes, bank, beastliest, beau, beauteous, billiards, biography, bodyguard, bosky, boyish, broadcast, bruited, bulldog, businesslike, bustle, calorific, casuistry, catkins, chaperons, chidden, cigarettes, clergyman, cloven, comet, compeers, coquetry, cricket, crinolines, custard, dandiacal, dapperest, decanter, devil, dialogue, diet, dipsomaniacal, disemboldened, disinfatuate, drunken, ebullitions, equipage, exigent, eyelashes, eyelids, farthingales, female, femininity, fishwife, fob, forefather, forerunners, freemasonry, furbelows, gallimaufry, goodlier, gooseberry, gorgeous, gypsy, haberdasher, halfpence, handicapped, handicraft, handiwork, handwriting, hearthrug, helpless, hip, hireling, honeymoon, housemaid, housework, hoyden, hussy, idiotic, impertinent, impudence, inasmuch, incognisant, insipid, insolence, insouciance, item, keyboard, landau, legerdemain, loathsome, luck, maid, maidens, manhood, manumission, matador, maunderers, model, mushroom, nasty, newspaper, noodle, nosegay, novel, oarsmen, omnisubjugant, ostler, otiose, parasol, pinafore, poetry, poltroonery, postprandially, prank, prestidigitators, propinquity, queer, romance, sackcloth, salad, sardonic, saucy, schoolmaster, seraglio, sex, skimpy, skirt, snuff, socialistic, streetsters, surcease, surcoat, swooned, teens, telegram, telegraphs, thistledown, thither, thou, threepenny, tomboyish, toys, tradesmen, treacle, ugly, uncouthly, unvexed, vassalage, waylay, welter, wigwam, witchery, withal, woe, woebegone, womanly, womenfolk, wonderfully, wonderingly, wretchedness, wrought, yacht, yesternight, zounds

Exciting!

Leave a Reply

Your email address will not be published. Required fields are marked *