{"id":534,"date":"2009-02-24T07:30:47","date_gmt":"2009-02-24T12:30:47","guid":{"rendered":"http:\/\/itp.indiamos.com\/blog\/?p=534"},"modified":"2009-04-14T01:21:17","modified_gmt":"2009-04-14T06:21:17","slug":"kill-me-now","status":"publish","type":"post","link":"https:\/\/itp.indiamos.com\/blog\/2009\/02\/24\/kill-me-now\/","title":{"rendered":"Kill me now."},"content":{"rendered":"<p><a href=\"http:\/\/www.flickr.com\/photos\/blah_oh_well\/1507400172\/\"><img loading=\"lazy\" src=\"https:\/\/i1.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/2009\/02\/stabby_mcknife.jpg?resize=460%2C251\" alt=\"Stabby McKnife\" title=\"Stabby McKnife\" width=\"460\" height=\"251\" class=\"alignnone size-full wp-image-535\" srcset=\"https:\/\/i0.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/stabby_mcknife.jpg?w=460&amp;ssl=1 460w, https:\/\/i0.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/stabby_mcknife.jpg?w=400&amp;ssl=1 400w\" sizes=\"(max-width: 460px) 100vw, 460px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>Oh, honestly.<\/p>\n<p>For once I actually set out to be a slacker on the homework. The assignment began,<\/p>\n<blockquote><p>Modify, augment, or replace one of the in-class examples. A few ideas, in order of increasing complexity:<\/p>\n<ul>\n<li>Make Unique.java insensitive to case (i.e., \u201cFoo\u201d and \u201cfoo\u201d should not count as different words).<\/li>\n<li>Modify WordCount.java to count something other than just words (e.g., particular characters, bigrams, co-occurrences of words, etc.). . . .<\/li>\n<\/ul>\n<\/blockquote>\n<p>So I thought, &#8220;Today I feel like doing the easy thing. I&#8217;ll just take that first option.&#8221;<\/p>\n<p>Yeah, right. Many hours later, after trying several extremely complicated methods of doing this really fucking simple thing, I finally found the rat-simple method that I&#8217;d been looking for all along and had all but given up hope of. 
Goddamnit.<\/p>\n<p>The original code was this:<\/p>\n<p>[java]import java.util.HashSet;<br \/>\nimport com.decontextualize.a2z.TextFilter;<\/p>\n<p>public class Unique extends TextFilter {<br \/>\n  public static void main(String[] args) {<br \/>\n    new Unique().run();<br \/>\n  }<\/p>\n<p>  private HashSet&lt;String&gt; uniqueWords = new HashSet&lt;String&gt;();<\/p>\n<p>  public void eachLine(String line) {<br \/>\n    String[] tokens = line.split(&quot;\\\\W+&quot;);<br \/>\n    for (String t: tokens) {<br \/>\n      uniqueWords.add(t);<br \/>\n    }<br \/>\n  }<\/p>\n<p>  public void end() {<br \/>\n    for (String word: uniqueWords) {<br \/>\n      println(word);<br \/>\n    }<br \/>\n  }<br \/>\n}[\/java]<\/p>\n<p>and what I came up with after way too much beating my head against the desk is this:<\/p>\n<p>[java]import java.util.HashSet;<br \/>\nimport com.decontextualize.a2z.TextFilter;<\/p>\n<p>public class UniqueCI extends TextFilter<br \/>\n{<br \/>\n  public static void main(String[] args)<br \/>\n  {<br \/>\n    new UniqueCI().run();<br \/>\n  } \/\/ end main<\/p>\n<p>  \/\/ Words to print, in the casing in which each was first seen.<br \/>\n  private HashSet&lt;String&gt; uniqueWords = new HashSet&lt;String&gt;();<br \/>\n  \/\/ Lowercased copy of every word seen so far, used for the case-insensitive check.<br \/>\n  private HashSet&lt;String&gt; lowercaseWords = new HashSet&lt;String&gt;();<\/p>\n<p>  public void eachLine(String line)<br \/>\n  {<br \/>\n    String[] tokens = line.split(&quot;\\\\W+&quot;);<br \/>\n    for (String t: tokens)<br \/>\n    {<br \/>\n      String tLower = t.toLowerCase();<\/p>\n<p>      \/\/ Keep t only if its lowercased form has not been seen before.<br \/>\n      if (!lowercaseWords.contains(tLower))<br \/>\n      {<br \/>\n        uniqueWords.add(t);<br \/>\n        lowercaseWords.add(tLower);<br \/>\n      } \/\/ end if<br \/>\n    } \/\/ end for<br \/>\n  } \/\/ end eachLine<\/p>\n<p>  public void end()<br \/>\n  {<br \/>\n    for 
(String word: uniqueWords) {<br \/>\n      println(word);<br \/>\n    } \/\/ end for<br \/>\n  } \/\/ end end<br \/>\n} \/\/ end class UniqueCI[\/java]<\/p>\n<p>If you put this in,<\/p>\n<blockquote><p>It is a truth universally acknowledged, that a single man in possession of a large fortune must be in want of a wife.<br \/>\nIt Is A Truth Universally Acknowledged, That A Single Man In Possession Of A Large Fortune Must Be In Want Of A Wife.<\/p><\/blockquote>\n<p>you get this out:<\/p>\n<blockquote>\n<pre>of\r\npossession\r\nwife\r\ntruth\r\nbe\r\nlarge\r\nIt\r\nfortune\r\nuniversally\r\nsingle\r\nthat\r\nacknowledged\r\nman\r\na\r\nmust\r\nwant\r\nis\r\nin<\/pre>\n<\/blockquote>\n<p>Big whoop. I wish I could say I learned a lot from this, but I think all I learned is that I&#8217;m much more lost than I thought I was.<\/p>\n<p><span style=\"color:gray; font-size:smaller\">Photo: <a href=\"http:\/\/www.flickr.com\/photos\/blah_oh_well\/1507400172\/\">The Downward Knife<\/a> by Jill Greenseth; <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/deed.en\">some rights reserved<\/a>.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Oh, honestly. For once I actually set out to be a slacker on the homework. The assignment began, Modify, augment, or replace one of the in-class examples. A few ideas, in order of increasing complexity: Make Unique.java insensitive to case (i.e., \u201cFoo\u201d and \u201cfoo\u201d should not count as different words). 
Modify WordCount.java to count something &hellip; <a href=\"https:\/\/itp.indiamos.com\/blog\/2009\/02\/24\/kill-me-now\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Kill me now.<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26,4,35],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3qY10-8C","_links":{"self":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/534"}],"collection":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/comments?post=534"}],"version-history":[{"count":9,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/534\/revisions"}],"predecessor-version":[{"id":615,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/534\/revisions\/615"}],"wp:attachment":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/media?parent=534"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/categories?post=534"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/tags?post=534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}