{"id":401,"date":"2009-01-23T04:30:26","date_gmt":"2009-01-23T09:30:26","guid":{"rendered":"http:\/\/itp.indiamos.com\/blog\/?p=401"},"modified":"2009-04-14T01:13:26","modified_gmt":"2009-04-14T06:13:26","slug":"sieve-our-ships","status":"publish","type":"post","link":"https:\/\/itp.indiamos.com\/blog\/2009\/01\/23\/sieve-our-ships\/","title":{"rendered":"Sieve Our Ships"},"content":{"rendered":"<p><a href=\"http:\/\/books.google.com\/books?id=4HcBAAAAQAAJ&#038;pg=PA159\"><img loading=\"lazy\" src=\"https:\/\/i1.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/2009\/01\/life_of_nelson.png?resize=450%2C277\" alt=\"H.M.S. CAPTAIN boarding the SAN NICOLAS and SAN JOSEPH\" title=\"H.M.S. CAPTAIN boarding the SAN NICOLAS and SAN JOSEPH\" width=\"450\" height=\"277\" class=\"alignnone size-full wp-image-404\" srcset=\"https:\/\/i2.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/life_of_nelson.png?w=450&amp;ssl=1 450w, https:\/\/i2.wp.com\/itp.indiamos.com\/blog\/wp-content\/uploads\/life_of_nelson.png?w=400&amp;ssl=1 400w\" sizes=\"(max-width: 450px) 100vw, 450px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>For our first homework assignment in <a href=\"http:\/\/a2z.decontextualize.com\/\">Programming A\u2013Z<\/a>, we were asked to write a terminal command that would take some text in, do something to it, and spit the text out again.<br \/>\n<!--more--><\/p>\n<blockquote><p>Use a combination of the UNIX commands discussed in class (along with any other commands that you discover) to compose a text. Your \u201csource code\u201d for this assignment will simply consist of what you executed on the command line. Indicate what kind of source text the \u201cprogram\u201d expects, and give an example of what text it generates. Use <code>man<\/code> to discover command line options that you might not have known about (<code>grep -i<\/code> is a good one).<\/p><\/blockquote>\n<p>My default text to mess with is, for reasons that are too dorky to get into, <a href=\"http:\/\/www.gutenberg.org\/etext\/947\">Southey&#8217;s <em>Life of Nelson<\/em><\/a>, and I decided that it would be nifty to try to pull out a list of the ships&#8217; names, which are relatively easy to find because in the Gutenberg plain text edition, they&#8217;re typed in all caps.<\/p>\n<p>The first hitch is that there&#8217;s a lot of other stuff in the file that&#8217;s also in all caps, much of it contained in Project Gutenberg&#8217;s header and footer text. It shouldn&#8217;t be hard to trim those segments off, leaving just the contents of the book. Very handily, the text is preceded by a line that begins &#8220;<code>*** START<\/code>&#8221; and succeeded by a line that begins &#8220;<code>*** END<\/code>.&#8221; I&#8217;m sure there&#8217;s some way to find those lines and clip the excess using <code>vi<\/code>, but it didn&#8217;t seem to be something that could be accomplished in a single line of code. So I just pulled out <em>all<\/em> the lines containing a sequence of three or more capital letters.<\/p>\n<p>Second, it&#8217;d be nice to pull out a list of just the chunks of shouty text, without the nonshouty words around them. How do you do that? I don&#8217;t know yet. Some of the ships&#8217; names contain more than one word, so how do you keep those together? Probably using some more complex regular expression than what I&#8217;m currently capable of assembling. At the same time, some lines contain more than one ship name; how do you split those up?<\/p>\n<p>Eventually, I&#8217;d like to be able to solve these problems, alphabetize the list, and list the number of times each name appears. And I&#8217;d like to be able to replace the all-caps text with U&#038;lc, and then wrap each in some kind of tags, such as &lt;i&gt;&lt;\/i&gt;. I&#8217;ll probably fiddle with this again later, when I haven&#8217;t been huffing a lot of epoxy fumes or whatever it was that was reeking up NYCR this evening. In the meantime, however, here&#8217;s my command:<\/p>\n<blockquote><p><code>curl ftp:\/\/ibiblio.org\/pub\/docs\/books\/gutenberg\/9\/4\/947\/947.zip -s | gzip -d | grep -n '[A-Z]\\{3,\\}' >nelsoncaps.txt<\/code><\/p><\/blockquote>\n<p>The <code>-s<\/code> on the <code>curl<\/code> command keeps it from outputting a chatty report. The <code>-d<\/code> on <code>gzip<\/code> makes it uncompress instead of compressing; I could have just used <code>gunzip<\/code> instead. The <code>-n<\/code> on <code>grep<\/code> makes it print the line numbers, and the regular expression looks for any occurrence of three or more adjacent capital letters.<\/p>\n<p>And here&#8217;s some of what it pulls out:<\/p>\n<blockquote><p><code>2560:MINERVE engaged the former, which was commanded by D. Jacobo Stuart,<br \/>\n2562:during which the Spaniards lost 164 men, the SABINA struck. The Spanish<br \/>\n2564:board the MINERVE, when another enemy's frigate came up, compelled her<br \/>\n2568:came in sight. The BLANCHE, from which the CERES had got off, was far<br \/>\n2569:to windward, and the MINERVE escaped only by the anxiety of the enemy to<br \/>\n2617:I will have a long GAZETTE to myself. I feel that such an opportunity<br \/>\n2630:pendant on board the CAPTAIN, seventy-four, Captain R.W. Miller; and<br \/>\n2643:then joined, and the CULLODEN had parted company. Upon this information<br \/>\n2688:brought him into action with the SANTISSIMA TRINIDAD, one hundred and<br \/>\n2689:thirty-six; the SAN JOSEPH, one hundred and twelve; the SALVADOR DEL<br \/>\n2690:MUNDO, one hundred and twelve; the SAN NICOLAS, eighty; the SAN ISIDRO,<br \/>\n2692:in the CULLODEN, immediately joined, and most nobly supported him; and<br \/>\n2693:for nearly an hour did the CULLODEN and CAPTAIN maintain what Nelson<br \/>\n2696:derive from them. The BLENHEIM then passing between them and the enemy,<br \/>\n2698:SALVADOR DEL MUNDO and SAN ISIDRO dropped astern, and were fired into in<br \/>\n2699:a masterly style by the EXCELLENT, Captain Collingwood. The SAN<br \/>\n2700:ISIDRO struck; and Nelson thought that the SALVADOR struck also. \"But<br \/>\n2704:situation;\" for the CAPTAIN was at this time actually fired upon by<br \/>\n2705:three first-rates--by the SAN NICOLAS, and by a seventy-four, within<br \/>\n2706:about pistol-shot of that vessel. The BLENHEIM was ahead, the CULLODEN<br \/>\n2708:just astern, passed within ten feet of the SAN NICOLAS, giving her a<br \/>\n2709:most tremendous fire, then passed on for the SANTISSIMA TRINIDAD. The<br \/>\n2710:SAN NICOLAS luffing up, the SAN JOSEPH fell on board her, and Nelson<br \/>\n2711:resumed his station abreast of them, and close alongside. The CAPTAIN<br \/>\n2721:from the spritsail-yard, which locked in the SAN NICOLAS's main rigging.<\/code><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>For our first homework assignment in Programming A\u2013Z, we were asked to write a terminal command that would take some text in, do something to it, and spit the text out again.<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26,4],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3qY10-6t","_links":{"self":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/401"}],"collection":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/comments?post=401"}],"version-history":[{"count":13,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions"}],"predecessor-version":[{"id":610,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions\/610"}],"wp:attachment":[{"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/media?parent=401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/categories?post=401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itp.indiamos.com\/blog\/wp-json\/wp\/v2\/tags?post=401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}