{"id":507,"date":"2009-02-13T07:21:14","date_gmt":"2009-02-13T12:21:14","guid":{"rendered":"http:\/\/itp.indiamos.com\/blog\/?p=507"},"modified":"2009-04-14T01:23:25","modified_gmt":"2009-04-14T06:23:25","slug":"jane-says","status":"publish","type":"post","link":"https:\/\/itp.indiamos.com\/blog\/2009\/02\/13\/jane-says\/","title":{"rendered":"\u201cHave you seen my wig around?\u201d"},"content":{"rendered":"<p><a rel=\"nofollow\" href=\"http:\/\/www.amazon.com\/gp\/product\/B000002LEE?ie=UTF8&#038;tag=indink-20&#038;linkCode=as2&#038;camp=211189&#038;creative=374929&#038;creativeASIN=B000002LEEindink-20\" ><img loading=\"lazy\" src=\"https:\/\/i0.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/2009\/02\/janesays.jpg?resize=460%2C163\" alt=\"Jane Says\" title=\"Jane Says\" width=\"460\" height=\"163\" class=\"alignnone size-full wp-image-512\" srcset=\"https:\/\/i2.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/janesays.jpg?w=460&amp;ssl=1 460w, https:\/\/i2.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/janesays.jpg?w=400&amp;ssl=1 400w\" sizes=\"(max-width: 460px) 100vw, 460px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>This week&#8217;s A2Z assignment was as follows:<\/p>\n<blockquote><p>Make a program that creatively transforms or performs analysis on a text using regular expressions. The program should take its input from the keyboard and send its output to the screen (or redirect to\/from a file). 
Your program might (a) filter lines from the input, based on whether they match a pattern; (b) match and display certain portions of each line; (c) replace certain portions of each line with new text; or (d) any combination of the above.<\/p>\n<p>Sample ideas: Replace all words in a text of a certain length with a random word; find telephone numbers or e-mail addresses in a text; locate words within a word list that will have a certain score in Scrabble; etc.<\/p>\n<p>Bonus challenge 1: Use one or more features of regular expression syntax that we didn\u2019t discuss in class. Reference here.<\/p>\n<p>Bonus challenge 2: Use one or more features of the Pattern or Matcher class that we didn\u2019t discuss in class. Of particular interest: regex flags (CASE_INSENSITIVE, MULTILINE), \u201cback references\u201d in replaceAll. Matcher class reference here.\n<\/p><\/blockquote>\n<p>So, what I made is a program that tries to find the proper names in the input text (my preferred input being the first scene of <cite>Pride and Prejudice<\/cite>) and replace them with names of people in our class. For my purposes, a proper name is any capitalized word that (a) follows an honorific, such as Mr., Mrs., Sir, Lady, or Miss, <em>or<\/em> (b) does not immediately follow a carriage return or chunk of terminal punctuation (i.e., a period, exclamation point, or question mark, with or without a closing quotation mark thereafter), and (c) is neither a day of the week, a month of the year, nor the name of a holiday (though the only holiday it&#8217;s really looking for is Michaelmas, as that&#8217;s what appears in my P&#038;P extract).<\/p>\n<p>I tried for hours to do this in a compact way, by looking for the <em>absence<\/em> of certain words or characters (terminal punctuation, honorifics, days of the week, holidays) combined with the <em>presence<\/em> of other patterns, but it just would not fly. 
So the result requires a lot of intermediate steps, to avoid changing words that have been identified as not likely to be names. In a vain attempt to make testing my regular expressions quicker, I worked them out in BBEdit first, using the grep option in the search and replace panel. Saved me from having to recompile the thing fifty thousand times. And the expressions <em>worked<\/em> in BBEdit, for the most part. They do <em>not<\/em> work so well translated into Java, unfortunately: the success rate at finding names is much lower, and after ten straight hours, I just didn&#8217;t have the patience to troubleshoot it anymore. But these glitches just make the results funnier, which is, of course, the point.<\/p>\n<p>So, here is the code:<\/p>\n<p>[java]import java.util.regex.*;<br \/>\nimport com.decontextualize.a2z.TextFilter;<\/p>\n<p>public class RegexNames extends TextFilter<br \/>\n{<br \/>\n  public static void main(String[] args)<br \/>\n  {<br \/>\n    new RegexNames().run();<br \/>\n  } \/\/ end main(String[] args)<\/p>\n<p>  \/\/ An honorific followed by a capitalized word; group 2 is the presumed name.<br \/>\n  private String search1  = &quot;(Mr\\\\.|Mrs\\\\.|Lord|Lady|Sir) ([A-Z]\\\\w+)&quot;;<br \/>\n  private String replace1 = &quot;##NAME##&quot;;<\/p>\n<p>  \/\/ A capitalized word right after terminal punctuation or a carriage return; probably not a name.<br \/>\n  private String search2  = &quot;(\\\\.\u201d |\\\\?\u201d |!\u201d |\\\\. |\\\\? |! |\\\\r|\\\\r\u201c)([A-Z]\\\\w+)&quot;;<\/p>\n<p>  \/\/ Honorifics, days, months, and Michaelmas; never names.<br \/>\n  private String search3  = &quot;( |\u201c|\\\\r)(Mr\\\\.|Mrs\\\\.|Lord|Lady|Sir|Miss|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|January|February|March|April|May|June|July|August|September|October|November|December|Michaelmas)&quot;;<\/p>\n<p>  \/\/ Any other capitalized word after a space or opening quotation mark; treated as a name.<br \/>\n  private String search4  = &quot;( |\u201c)([A-Z][a-z]+)&quot;;<\/p>\n<p>  private String [] class_names = new String [] {<br \/>\n    &quot;Adam&quot;, &quot;Alejandro&quot;, &quot;Andrew&quot;, &quot;Bryan&quot;, &quot;Caroline&quot;, &quot;Dimitris&quot;, &quot;Jonathan&quot;, &quot;Joseph&quot;, &quot;Martin&quot;, &quot;Michael&quot;, &quot;Ozge&quot;, &quot;Sanjay&quot;, &quot;Steven&quot; };<br \/>\n  public void begin()<br \/>\n  {<br \/>\n    println(&quot;* * *&quot;);<br \/>\n  }<\/p>\n<p>  public void eachLine(String line)<br \/>\n  {<br \/>\n    String line_new;<\/p>\n<p>    Pattern p1 = Pattern.compile(search1);<br \/>\n    Matcher m1 = p1.matcher(line);<br \/>\n    if (m1.find())<br \/>\n    {<br \/>\n      line = line.replaceAll(m1.group(2), replace1);<br \/>\n    }<\/p>\n<p>    line_new = line;<\/p>\n<p>    Pattern p2 = Pattern.compile(search2);<br \/>\n    Matcher m2 = p2.matcher(line_new);<br \/>\n    if (m2.find())<br \/>\n    {<br \/>\n      line_new = line_new.replaceAll(m2.group(2), &quot;##&quot; + m2.group(2) + &quot;##&quot;);<br \/>\n    }<\/p>\n<p>    Pattern p3 = Pattern.compile(search3);<br \/>\n    Matcher m3 = p3.matcher(line_new);<br \/>\n    if (m3.find())<br \/>\n    {<br \/>\n      line_new = line_new.replaceAll(m3.group(2), &quot;##&quot; + m3.group(2) + &quot;##&quot;);<br \/>\n    }<\/p>\n<p>    Pattern p4 = Pattern.compile(search4);<br \/>\n    Matcher m4 = p4.matcher(line_new);<br \/>\n    if (m4.find())<br \/>\n    {<br \/>\n      line_new = line_new.replaceAll(m4.group(2), &quot;##NAME##&quot;);<br \/>\n    }<\/p>\n<p>    \/\/ Pick one class name at random for every ##NAME## marker on this line, then strip the ## guards.<br \/>\n    line_new = line_new.replace( &quot;##NAME##&quot;, class_names[(int)(Math.random() * class_names.length)] );<\/p>\n<p>    line_new = line_new.replace( &quot;##&quot;, &quot;&quot; );<\/p>\n<p>    println(&quot;\\t&quot; + line_new);<br \/>\n  } \/\/ end eachLine(String line)<\/p>\n<p>  public void end()<br \/>\n  {<br \/>\n    println(&quot;* * *&quot;);<br \/>\n  }<\/p>\n<p>} \/\/ end RegexNames extends TextFilter[\/java]<\/p>\n<p>And here is the output (you can find the original text at <a href=\"http:\/\/www.gutenberg.org\/catalog\/world\/readfile?fk_files=907737\">Project Gutenberg<\/a>):<\/p>\n<blockquote><p>* * *<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;It is a truth universally acknowledged, that a single man in possession of a large fortune must be in want of a wife.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of someone or other of their daughters.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cAdam dear Mr. Adam,\u201d said his lady to him one day, \u201chave you heard that Netherfield Park is let at last?\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mr. Bryan replied that he had not.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cSteven it is, returned she; \u201dfor Mrs. Steven has just been here, and she told me all about it.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mr. Michael made no answer.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cDimitris you not want to know who has taken it?\u201d cried his wife impatiently.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cYOU want to tell me, and I have no objection to hearing it.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This was invitation enough.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cSteven, my dear, you must know, Mrs. 
Steven says that Netherfield is taken by a young man of large fortune from the north of England; that he came down on Monday in a chaise and four to see the place, and was so much delighted with it, that he agreed with Mr. Morris immediately; that he is to take possession before Michaelmas, and some of his servants are to be in the house by the end of next week.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cJoseph is his name?\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cAdam.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cDimitris he married or single?\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cBryan! Single, my dear, to be sure! A single man of large fortune; four or five thousand a year. What a fine thing for our girls!\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cHow so? How can it affect them?\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cBryan dear Mr. Bryan,\u201d replied his wife, \u201chow can you be so tiresome! You must know that I am thinking of his marrying one of them.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cOzge that his design in settling here?\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cCaroline! Nonsense, how can you talk so! But it is very likely that he MAY fall in love with one of them, and therefore you must visit him as soon as he comes.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cI see no occasion for that. You and the girls may go, or you may send them by themselves, which perhaps will be still better, for as you are as handsome as any of them, Mr. Michael may like you the best of the party.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cMichael dear, you flatter me. I certainly HAVE had my share of beauty, but I do not pretend to be anything extraordinary now. 
When a woman has five grown-up daughters, she ought to give over thinking of her own beauty.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cJonathan such cases, a woman has not often much beauty to think of.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cMichael, my dear, you must indeed go and see Mr. Michael when he comes into the neighbourhood.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cSteven is more than I engage for, I assure you.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cJonathan consider your daughters. Only think what an establishment it would be for one of them. Sir Jonathan and Lady Lucas are determined to go, merely on that account, for in general, you know, they visit no newcomers. Indeed you must go, for it will be impossible for US to visit him if you do not.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cSanjay are over-scrupulous, surely. I dare say Mr. Sanjay will be very glad to see you; and I will send a few lines by you to assure him of my hearty consent to his marrying whichever he chooses of the girls; though I must throw in a good word for my little Lizzy.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cI desire you will do no such thing. Lizzy is not a bit better than the others; and I am sure she is not half so handsome as Michael, nor half so good-humoured as Lydia. But you are always giving HER the preference.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cAndrew have none of them much to recommend them,\u201d replied he; \u201cthey are all silly and ignorant like other girls; but Lizzy has something more of quickness than her sisters.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cMr. Bryan, how CAN you abuse your own children in such a way? You take delight in vexing me. You have no compassion for my poor nerves.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\u201cCaroline mistake me, my dear. I have a high respect for your nerves. They are my old friends. 
I have heard you mention them with consideration these last twenty years at least.\u201d<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mr. Dimitris was so odd a mixture of quick parts, sarcastic humour, reserve, and caprice, that the experience of three-and-twenty years had been insufficient to make his wife understand his character. HER mind was less difficult to develop. Dimitris was a woman of mean understanding, little information, and uncertain temper. When she was discontented, she fancied herself nervous. The business of her life was to get her daughters married; its solace was visiting and news.<br \/>\n* * *<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>This week&#8217;s A2Z assignment was as follows: Make a program that creatively transforms or performs analysis on a text using regular expressions. The program should take its input from the keyboard and send its output to the screen (or redirect to\/from a file). Your program might (a) filter lines from the input, based on whether they &hellip; <a href=\"https:\/\/itp.indiamos.com\/blog\/2009\/02\/13\/jane-says\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">\u201cHave you seen my wig around?\u201d<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26,4,35],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3qY10-8b","_links":{"self":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/507"}],"collection":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/comments?post=507"}],"version-history":[{"count":13,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/507\/revisions"}],"predecessor-version":[{"id":619,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/507\/revisions\/619"}],"wp:attachment":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/media?parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/categories?post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/tags?post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}