Prosaic: A New Approach to Computer Poetry

On Kawara, detail of "One Million Years" (2012)

Working manually with cut-up phrases, humans tend toward contrivance and missed details. Our subconscious will pick one phrase over another, or tie certain words together, according to some plan or idea our conscious selves may not be fully aware of. Often this is fine, but there is a wealth of beauty that only the dead logic of code can dredge up for us.

It is in recognition of this human poetic shortcoming that computer-assisted poetic authorship, often known as computer poetry, has existed for decades.

In 1962, the Laboratory for Automata Research publicized its Auto-Beatnik, a program which was essentially a glorified Mad Libs. Humans prepared a grammatical form (like a narrative with blanks) and a word-list and used an algorithm to randomly produce works, like "Roses," through their combination.

Auto-Beatnik, 1962

Few fingers go like narrow laughs.
An ear won’t keep few fishes,
Who is that rose in that blind house?
And all slim, gracious, blind planes are coming,
They cry badly along a rose,
To leap is stuffy, to crawl was tender.

In a way, this was worse than Mad Libs: the word-list was chosen and bound up front instead of spontaneously chosen by a player. Any interesting result was thanks mainly to the clever construction of grammatical forms and word lists and just partly to the computer's randomness. Poor computer: it could have been replaced by a disciplined writer and some dice (similar to John Cage's aleatoric compositions).
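The Mad Libs mechanism described above is simple enough to sketch in a few lines of Python. This is only an illustration: the grammatical form, the word lists, and the fill_form helper are invented here and are not taken from Auto-Beatnik itself.

```python
import random
import re

# Hypothetical word lists and grammatical form, made up for this sketch.
WORDS = {
    "adj": ["narrow", "blind", "slim", "tender"],
    "noun": ["rose", "house", "ear", "plane"],
    "verb": ["leap", "crawl", "cry", "keep"],
}
FORM = "To {verb} is {adj}, to {verb} was {adj}."


def fill_form(form, words, rng):
    """Replace each {slot} in the form with a random word from that slot's list."""
    return re.sub(r"\{(\w+)\}", lambda m: rng.choice(words[m.group(1)]), form)


print(fill_form(FORM, WORDS, random.Random()))
```

Everything interesting about the output is decided by whoever writes FORM and WORDS; the program contributes only the dice.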

Later years saw systems like RACTER and McGONAGALL. The former was hardly different from Auto-Beatnik. The latter is a tremendous feat of engineering and an important contribution to artificial intelligence, but at its root it is still an extension of the Mad Libs format: human-prepared grammar, human-prepared word lists, and computerized randomization.

These approaches produce works that routinely fail to compare to fully-original human-authored poetry. The grammatical forms are too limiting, the word lists are too contrived, and the practice of producing works too mechanical and formal.

Running parallel to these attempts is the venerable Markov Chain Generator technique. This process takes an input text—a book, essay, poem, or combinations thereof—and produces a jumbled output using substrings of various length. It’s an approach that has been adopted far more eagerly by technically-inclined poets for its ease of use and surprising results.

Markov generation works with n-grams, or chunks of text of length n. The generator breaks its input into n-grams and outputs a jumbled version of the same text in which those n-grams have been shuffled.

Where n = 1, the individual characters of the input are shuffled. This is not very interesting. However, for higher values of n, the output begins to resemble its input in form, often preserving its grammatical coherence while introducing new semantic meaning.
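One common way to implement this is a character-level Markov chain, sketched below. It is a generic formulation, not the code behind Gnoetry or any other tool named here, and the function names are my own. Each n-gram is treated as a context of n - 1 characters plus the character that followed it. With n = 1 the context is empty, so characters are drawn independently, which is roughly the shuffling described above.

```python
import random
from collections import defaultdict


def build_model(text, n):
    """Map each (n-1)-character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        model[gram[:-1]].append(gram[-1])
    return model


def generate(text, n, length, rng):
    """Emit up to `length` characters, each drawn from those seen after the current context."""
    model = build_model(text, n)
    out = text[:n - 1]  # seed with the opening context of the input
    while len(out) < length:
        context = out[len(out) - (n - 1):] if n > 1 else ""
        followers = model.get(context)
        if not followers:  # dead end: this context only occurs at the very end of the input
            break
        out += rng.choice(followers)
    return out[:length]


sample = "the rose is in the blind house and the rose is blind"
print(generate(sample, 4, 60, random.Random()))
```

Every n-gram in the output also occurs somewhere in the input, which is why higher values of n preserve more of the source's form.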

Typically with Markov Chains one works with a small, focused selection of works and performs many acts of revision over the results. One can get far with this approach. Tools such as Gnoetry (and jGnoetry) allow a human poet to work very closely with this process and produce compelling works. "Wholesome Intercourse," by Eric Elshtain, was composed using Gnoetry v0.2:

Wholesome Intercourse
Eric Elshtain & Gnoetry, 2013

It needed no expert eye
to the sweet
and wholesome

intercourse of men in my
own till mine
too heavy

grew, yet now he was the least
push of my
hatchet. He

was the captain’s refrain. He
never heard
that, said I.

Montgomery intimated
that was thine
before; I

missed mine. What an afternoon
for me lay
in torment!

My work in computer poetry, embodied in software called Prosaic, differs from both of these family trees in that it was created in response to the staggering amount of content on the Internet. Humans are incapable of even imagining the amount of textual data available online, let alone processing or reading even a thousandth of it.

Prosaic directly implements the cut-up technique popularized by William S. Burroughs as a way to make sense of and produce beautiful works from this gluttony of text. Prosaic lives out an assumption that, given enough text, a poet can produce meaningful works from the (possibly) mundane through juxtaposition and combination.

Historically, cut-up is a manual process. An author takes some printed text—either written by her or found—and physically cuts it, creating piles of single words, phrases, sentences or paragraphs. These are then rearranged however the author sees fit. Often the reassembly is somehow randomized.
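In its simplest randomized form, the manual process above can be imitated in a few lines. This sketch is only an analogy for scissors and paper: the cut_up function and its fixed piece length are assumptions of mine, and it is not how Prosaic works.

```python
import random


def cut_up(text, rng, piece_len=3):
    """Cut text into pieces of `piece_len` words and reassemble them in random order."""
    words = text.split()
    pieces = [words[i:i + piece_len] for i in range(0, len(words), piece_len)]
    rng.shuffle(pieces)
    return "\n".join(" ".join(piece) for piece in pieces)


page = "an author takes some printed text and physically cuts it into piles of phrases"
print(cut_up(page, random.Random()))
```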

Burroughs used the technique to produce prose narratives like The Nova Trilogy. He also produced poems, such as "Cancer Men... These Individuals Are Marked Foe..."

Cancer Men. . . These Individuals Are Marked Foe. . .
William S. Burroughs, 1959

at land coccus germs
by a bacilmouth Jersy phenicol bitoics
the um vast and varied that
specific target was the vast popul – - – -

the vast
cancers that surgery and Xrays C
In the United States the Americi
is considered well worth our feet. . .

Ociety racks up the score like
sons will become new cancer pee
a third of them. . .

Surgery & Radiation be saved
this leaves 225 000
resistant o rso widely
surgeons and radiologists

These individuals are marked foe. . .

“For these the
the opinion of Dr. Robert P
Dushinski with £ fluoro
he helped synthesize
cancer men
growth in some cases
is worth 12,000 dollars $$

cancer men. $

. . . these individuals are marked foe. . .

Burroughs saw this process as an act of divination: a way to get at meaning within a text (or multiple texts) that would be missed upon a linear read. One is limited, however, by the amount of text one can physically cut up and rearrange. Prosaic exists to apply this process to massive corpora.

Prosaic's process begins with plain text files containing English text. For a given file, Prosaic extracts phrases: roughly, whole sentences and sentence clauses that can stand on their own. For each of these phrases, Prosaic observes and records a number of characteristics:

  • number of syllables
  • end rhyme sound
  • word stems
  • source file name
  • line position offset in source

These marked-up phrases are stored in a database.
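To make the mark-up step concrete, here is a rough sketch of what such a record might look like. It should not be read as Prosaic's actual code: the Phrase class and every helper are hypothetical, the syllable, rhyme, and stem functions are crude spelling-based heuristics (a pronouncing dictionary and a real stemmer would do far better), and a plain list stands in for the database.

```python
import re
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    syllables: int
    rhyme: str
    stems: frozenset
    source: str
    line_no: int


def count_syllables(word):
    # Crude heuristic: one syllable per run of vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def rhyme_sound(word):
    # Crude heuristic: the spelling from the last vowel run to the end of the word.
    match = re.search(r"[aeiouy]+[^aeiouy]*$", word.lower())
    return match.group(0) if match else word.lower()


def stem(word):
    # Crude heuristic: strip a few common English suffixes.
    return re.sub(r"(ing|ed|es|s)$", "", word.lower()) or word.lower()


def mark_up(text, source):
    """Split text into rough clauses and record the characteristics of each one."""
    phrases = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for chunk in re.split(r"[.;:!?,]", line):  # assumption: punctuation marks clause breaks
            words = re.findall(r"[A-Za-z']+", chunk)
            if not words:
                continue
            phrases.append(Phrase(
                text=" ".join(words).lower(),
                syllables=sum(count_syllables(w) for w in words),
                rhyme=rhyme_sound(words[-1]),
                stems=frozenset(stem(w) for w in words),
                source=source,
                line_no=line_no,
            ))
    return phrases


for phrase in mark_up("Their cries echoed dismay.\nNervous laughter echoed through the bay.", "demo.txt"):
    print(phrase)
```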

The actual poetic process involves templates written by a human. They are not like the grammatical templates from RACTER or similar, but are instead line-by-line descriptions of a desired poetic output. They are very flexible and can approximate traditional forms like sonnet, haiku or limerick or be used to create new forms.

For each line, a poet can combine zero or more rules from the following:

  • number of syllables
  • exact keyword (phrase contains word)
  • fuzzy keyword (phrase is near other phrases containing keyword)
  • position in rhyme scheme

Prosaic takes such a template and, for the chosen marked-up corpus, attempts to piece together a poem that corresponds to the rules chosen for each line.
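A template and the matching step might be sketched like this. The rule names, the dict-based template format, and the compose function are invented for illustration and are not Prosaic's real syntax. The corpus is a handful of hand-marked phrases, and fuzzy keywords are left out to keep the sketch short.

```python
import random

# Hypothetical templates: one dict of rules per line of the desired poem.
HAIKU = [
    {"syllables": 5},
    {"syllables": 7, "keyword": "novels"},
    {"syllables": 5},
]
COUPLET = [{"rhyme": "A"}, {"rhyme": "A"}]

# Hand-marked phrases standing in for the database; rhyme labels are illustrative.
CORPUS = [
    {"text": "given enough text", "syllables": 5, "rhyme": "ext"},
    {"text": "from a corpus of novels", "syllables": 7, "rhyme": "els"},
    {"text": "to crawl is tender", "syllables": 5, "rhyme": "er"},
    {"text": "their cries echoed dismay", "syllables": 6, "rhyme": "ay"},
    {"text": "nervous laughter echoed through the bay", "syllables": 9, "rhyme": "ay"},
]


def matches(phrase, rules, rhymes):
    """Check one phrase against every rule given for a line."""
    if "syllables" in rules and phrase["syllables"] != rules["syllables"]:
        return False
    if "keyword" in rules and rules["keyword"] not in phrase["text"].split():
        return False
    letter = rules.get("rhyme")
    if letter in rhymes and phrase["rhyme"] != rhymes[letter]:
        return False  # this rhyme letter is already bound to a different sound
    return True


def compose(template, corpus, rng):
    """Pick one unused matching phrase per template line, or None if nothing matches."""
    poem, rhymes = [], {}
    for rules in template:
        candidates = [p for p in corpus
                      if matches(p, rules, rhymes) and p["text"] not in poem]
        if not candidates:
            poem.append(None)
            continue
        choice = rng.choice(candidates)
        if "rhyme" in rules:
            # The first line to use a rhyme letter fixes its sound for later lines.
            rhymes.setdefault(rules["rhyme"], choice["rhyme"])
        poem.append(choice["text"])
    return poem


print("\n".join(str(line) for line in compose(HAIKU, CORPUS, random.Random())))
```

A line with an empty dict of rules matches any phrase, which corresponds to combining zero rules.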

Inevitably, Prosaic will fail to find an exact match for a line. When this happens, the algorithm "weakens" one of the rules and continues its search. For example, if Prosaic is looking for a seven-syllable phrase that contains the word "thunderclap" and cannot find one, it may instead accept a six- or eight-syllable phrase, or a seven-syllable phrase that occurs near a phrase with the word "thunderclap" in it.

If all of the rules are exhausted, Prosaic simply picks a random line from the corpus.
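The weakening search can be sketched as a list of progressively looser attempts with a random pick as the last resort. The order in which rules are weakened, the two-line "nearness" window, and all names below are assumptions made for this sketch. The description above does not specify them.

```python
import random

# Hand-marked phrases standing in for the database; all values are illustrative.
CORPUS = [
    {"text": "a thunderclap rolled in", "syllables": 6, "source": "a.txt", "line_no": 10},
    {"text": "the glowing girl said", "syllables": 5, "source": "a.txt", "line_no": 11},
    {"text": "another dozen heartbeats", "syllables": 7, "source": "a.txt", "line_no": 12},
    {"text": "and then he was up", "syllables": 5, "source": "b.txt", "line_no": 3},
]


def has_keyword(phrase, word):
    return word in phrase["text"].split()


def near_keyword(phrase, word, corpus, window=2):
    """Fuzzy keyword: some phrase within `window` lines in the same source has the word."""
    return any(
        other["source"] == phrase["source"]
        and abs(other["line_no"] - phrase["line_no"]) <= window
        and has_keyword(other, word)
        for other in corpus
    )


def find_line(corpus, syllables, keyword, rng):
    """Try the strictest rules first, weaken them step by step, then fall back to random."""
    def exact(p):
        return has_keyword(p, keyword)

    def fuzzy(p):
        return near_keyword(p, keyword, corpus)

    attempts = [  # assumed ordering, strictest first
        ("exact", 0, exact),
        ("syllables +/-1", 1, exact),
        ("fuzzy keyword", 0, fuzzy),
        ("fuzzy keyword, syllables +/-1", 1, fuzzy),
    ]
    for label, slack, keyword_ok in attempts:
        found = [p for p in corpus
                 if abs(p["syllables"] - syllables) <= slack and keyword_ok(p)]
        if found:
            return rng.choice(found)["text"], label
    return rng.choice(corpus)["text"], "random"


print(find_line(CORPUS, 7, "thunderclap", random.Random()))
```

With this toy corpus, a request for seven syllables containing "thunderclap" settles for the six-syllable phrase, while a keyword that appears nowhere falls through to a random line.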

Working with Prosaic is a process of trial and error. The poet writes a template, does several runs, tweaks the template and continues until something inspiring comes out.

In my work with Prosaic I edit lightly. Line re-ordering is the most frequent operation, followed by line-editing (dropping words), pronoun-agreement changes and punctuation fixes. In some circumstances I will write original lines that finish a rhyme or add some extra coherence.

The Terminal is Red Over Black
Nathaniel Smith & Prosaic, 2013. From a corpus of 31 Cyberpunk novels.

the eurocops know who he is:
their blue fatigues were spotless,

i’d begun to choke.
but i won’t be dead,
the metamartians claim.

another dozen heartbeats,
the glowing girl said,
and then he was up.

A Treacherous Line
Nathaniel Smith & Prosaic, 2012. From a corpus of 200,000 phrases from Project Gutenberg.

from these to emanate
their cries echoed dismay
“We are sure that he cannot reincarnate.”
nervous laughter echoed through the bay.

the other two were Exeter
they felt the double strain and tug;
he will be there next to her,
the treacherous line smug.

her very choice:
new jersey.
she’ll read joyce
on the anniversary.

half choked with sewer gas
none save the rats will pass.

Article Haiku
Nathaniel Smith & Prosaic, 2013. From this article.

given enough text
from a corpus of novels
to crawl is tender

yet he looked mundane
his word lists were too contrived
and then he was up

wholesome intercourse:
he will be there next to her

Prosaic succeeds in maintaining the original voice and structure of its input since it works at the phrase level. Moreover, it performs better—in other words, produces more interesting and varied output—as its corpus grows. Most importantly, Prosaic enables a human poet to accomplish feats impossible without a computer.

Humans are also lacking when it comes to comprehending large swaths of text. When we use a tool like Prosaic on internet-scale corpora, we accomplish what would otherwise take teams of people years of work. One billion tweets, every book in the Library of Congress, every book in every library on Earth, the entirety of the Huffington Post mixed with the Drudge Report: given enough storage space and computing power, software like Prosaic can uncover works of art hidden within these corpora.

What Burroughs did with a typewriter, scissors, and glue to newspapers and manuscripts, Prosaic does to gigabytes' worth of digital text.


Nathaniel Smith

