Sieve Our Ships

H.M.S. CAPTAIN boarding the SAN NICOLAS and SAN JOSEPH

For our first homework assignment in Programming A–Z, we were asked to write a terminal command that would take some text in, do something to it, and spit the text out again.

Use a combination of the UNIX commands discussed in class (along with any other commands that you discover) to compose a text. Your “source code” for this assignment will simply consist of what you executed on the command line. Indicate what kind of source text the “program” expects, and give an example of what text it generates. Use man to discover command line options that you might not have known about (grep -i is a good one).

My default text to mess with is, for reasons that are too dorky to get into, Southey’s Life of Nelson, and I decided that it would be nifty to try to pull out a list of the ships’ names, which are relatively easy to find because in the Gutenberg plain text edition, they’re typed in all caps.

The first hitch is that there’s a lot of other stuff in the file that’s also in all caps, much of it contained in Project Gutenberg’s header and footer text. It shouldn’t be hard to trim those segments off, leaving just the contents of the book. Very handily, the text is preceded by a line that begins “*** START” and followed by a line that begins “*** END”. I’m sure there’s some way to find those lines and clip the excess using vi, but it didn’t seem to be something that could be accomplished in a single line of code. So I skipped the trimming and just pulled out all the lines containing a sequence of three or more capital letters.
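For the record, the trim does seem to fit in one pipeline if sed does the work instead of vi. What follows is a minimal sketch, not what was actually run for the assignment: the sample text is a made-up stand-in for the Gutenberg file, the variable names are invented, and the only detail taken from the real file is that the marker lines begin with "*** START" and "*** END".

```shell
# Hypothetical stand-in for the Gutenberg file: header, body, footer.
sample='PROJECT GUTENBERG HEADER TEXT
*** START OF THE PROJECT GUTENBERG EBOOK ***
pendant on board the CAPTAIN, seventy-four
then joined, and the CULLODEN had parted company
*** END OF THE PROJECT GUTENBERG EBOOK ***
PROJECT GUTENBERG FOOTER TEXT'

# First sed: print only the range from the START marker to the END marker.
# Second sed: drop the first and last lines, i.e. the markers themselves.
body=$(printf '%s\n' "$sample" |
  sed -n '/^\*\*\* START/,/^\*\*\* END/p' |
  sed '1d;$d')

printf '%s\n' "$body"
```

In the real command, the two sed calls would presumably slot in between gzip -d and grep.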

Second, it’d be nice to pull out a list of just the chunks of shouty text, without the nonshouty words around them. How do you do that? I don’t know yet. Some of the ships’ names contain more than one word, so how do you keep those together? Probably using some more complex regular expression than what I’m currently capable of assembling. At the same time, some lines contain more than one ship name; how do you split those up?
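One possible answer to these questions, offered as a sketch rather than a finished solution: grep's -o option (available in GNU and BSD grep) prints only the matched text, one match per line, which both drops the nonshouty words and splits up lines that hold several names. Extending the pattern to allow a space followed by another run of capitals keeps multi-word names together. The sample lines are copied from the output later in this post with the line numbers stripped; the variable names are made up.

```shell
# Three lines from the book, as they appear in the Gutenberg text.
sample='thirty-six; the SAN JOSEPH, one hundred and twelve; the SALVADOR DEL
MUNDO, one hundred and twelve; the SAN NICOLAS, eighty; the SAN ISIDRO,
for nearly an hour did the CULLODEN and CAPTAIN maintain what Nelson'

# -o prints each match on its own line; -E enables {3,} and ( )* without
# backslashes. LC_ALL=C makes [A-Z] mean plain ASCII capitals in any locale.
names=$(printf '%s\n' "$sample" |
  LC_ALL=C grep -oE '[A-Z]{3,}( [A-Z]{3,})*')

printf '%s\n' "$names"
# Prints:
# SAN JOSEPH
# SALVADOR DEL
# MUNDO
# SAN NICOLAS
# SAN ISIDRO
# CULLODEN
# CAPTAIN
```

The leftover problem is visible in the output: grep works one line at a time, so a name that wraps across a line break (SALVADOR DEL MUNDO) comes out in two pieces.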

Eventually, I’d like to be able to solve these problems, alphabetize the list, and list the number of times each name appears. And I’d like to be able to replace the all-caps text with U&lc, and then wrap each in some kind of tags, such as <i></i>. I’ll probably fiddle with this again later, when I haven’t been huffing a lot of epoxy fumes or whatever it was that was reeking up NYCR this evening. In the meantime, however, here’s my command:

curl ftp://ibiblio.org/pub/docs/books/gutenberg/9/4/947/947.zip -s | gzip -d | grep -n '[A-Z]\{3,\}' >nelsoncaps.txt

The -s on the curl command keeps it from outputting a chatty report. The -d on gzip makes it uncompress instead of compressing; I could have just used gunzip instead. (gzip can unpack a .zip as long as the archive contains a single deflate-compressed file.) The -n on grep makes it print the line numbers, and the regular expression looks for any occurrence of three or more adjacent capital letters. The > at the end writes the results to nelsoncaps.txt.

And here’s some of what it pulls out:

2560:MINERVE engaged the former, which was commanded by D. Jacobo Stuart,
2562:during which the Spaniards lost 164 men, the SABINA struck. The Spanish
2564:board the MINERVE, when another enemy's frigate came up, compelled her
2568:came in sight. The BLANCHE, from which the CERES had got off, was far
2569:to windward, and the MINERVE escaped only by the anxiety of the enemy to
2617:I will have a long GAZETTE to myself. I feel that such an opportunity
2630:pendant on board the CAPTAIN, seventy-four, Captain R.W. Miller; and
2643:then joined, and the CULLODEN had parted company. Upon this information
2688:brought him into action with the SANTISSIMA TRINIDAD, one hundred and
2689:thirty-six; the SAN JOSEPH, one hundred and twelve; the SALVADOR DEL
2690:MUNDO, one hundred and twelve; the SAN NICOLAS, eighty; the SAN ISIDRO,
2692:in the CULLODEN, immediately joined, and most nobly supported him; and
2693:for nearly an hour did the CULLODEN and CAPTAIN maintain what Nelson
2696:derive from them. The BLENHEIM then passing between them and the enemy,
2698:SALVADOR DEL MUNDO and SAN ISIDRO dropped astern, and were fired into in
2699:a masterly style by the EXCELLENT, Captain Collingwood. The SAN
2700:ISIDRO struck; and Nelson thought that the SALVADOR struck also. "But
2704:situation;" for the CAPTAIN was at this time actually fired upon by
2705:three first-rates--by the SAN NICOLAS, and by a seventy-four, within
2706:about pistol-shot of that vessel. The BLENHEIM was ahead, the CULLODEN
2708:just astern, passed within ten feet of the SAN NICOLAS, giving her a
2709:most tremendous fire, then passed on for the SANTISSIMA TRINIDAD. The
2710:SAN NICOLAS luffing up, the SAN JOSEPH fell on board her, and Nelson
2711:resumed his station abreast of them, and close alongside. The CAPTAIN
2721:from the spritsail-yard, which locked in the SAN NICOLAS's main rigging.
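The wish list from earlier (alphabetize, count, convert to U&lc, wrap in tags) could be sketched along the following lines. This is only one way to go about it, under a few assumptions: the input is four lines from the sample output above with grep's line-number prefixes removed, grep supports -o (GNU and BSD grep do), and the variable names are invented for the example. The awk script spells out the pattern without {3,} because not every awk supports interval expressions.

```shell
# Four lines from the sample output, without the "NNNN:" prefixes.
sample='three first-rates--by the SAN NICOLAS, and by a seventy-four, within
just astern, passed within ten feet of the SAN NICOLAS, giving her a
most tremendous fire, then passed on for the SANTISSIMA TRINIDAD. The
SAN NICOLAS luffing up, the SAN JOSEPH fell on board her, and Nelson'

# Alphabetize the names and count how many times each one appears.
counts=$(printf '%s\n' "$sample" |
  LC_ALL=C grep -oE '[A-Z]{3,}( [A-Z]{3,})*' |
  LC_ALL=C sort | uniq -c)
printf '%s\n' "$counts"

# Rewrite each all-caps name in upper-and-lowercase and wrap it in <i></i>.
tagged=$(printf '%s\n' "$sample" | LC_ALL=C awk '{
  out = ""
  # Consume the line one match at a time, from left to right.
  while (match($0, /[A-Z][A-Z][A-Z]+( [A-Z][A-Z][A-Z]+)*/)) {
    n = split(substr($0, RSTART, RLENGTH), word, " ")
    name = ""
    for (i = 1; i <= n; i++) {
      sep = (i > 1) ? " " : ""
      name = name sep substr(word[i], 1, 1) tolower(substr(word[i], 2))
    }
    out = out substr($0, 1, RSTART - 1) "<i>" name "</i>"
    $0 = substr($0, RSTART + RLENGTH)
  }
  print out $0
}')
printf '%s\n' "$tagged"
```

On this sample, the count comes out as one SAN JOSEPH, three SAN NICOLAS, and one SANTISSIMA TRINIDAD. Anything else in capitals gets swept up too, so on the full text a word like GAZETTE would be counted and italicized right alongside the ships.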

One thought on “Sieve Our Ships”

  1. (Wow. And I just realized that I totally inadvertently matched my illustration to the fight that begins at the end of the sample output. I am so good!)
