Punctuation and Structure in Marking Up Direct Speech

18th May, 2019 / James Tauber

As work continues on the markup of The Lord of the Rings, many of the issues discussed previously with regard to The Hobbit apply. A first pass is almost done, but there is an interesting challenge with Gandalf’s reading of the inscription on Balin’s tomb.

Before we get to that, though, there is a more general issue of direct speech which I want to discuss first and which applies to The Hobbit as well. The marking up of direct speech is an important prelude to many analyses of interest and so it’s an important part of what we’re doing.

Consider the following from Farewell to Lórien, chapter 8 of book 2 of The Lord of the Rings.

There was a silence. ‘They all resolved to go forward,’ said Galadriel looking in their eyes.

‘As for me,’ said Boromir, ‘my way home lies onward and not back.’

‘That is true,’ said Celeborn, ‘but is all this Company going with you to Minas Tirith?’

‘We have not decided our course,’ said Aragorn. ‘Beyond Lothlórien I do not know what Gandalf intended to do. Indeed I do not think that even he had any clear purpose.’

The extract above begins with silence and then Galadriel speaks the sentence:

They all resolved to go forward.

Notice that, although I’ve put one here, there is no actual period in the text. This reflects the standard (although not universal) typographical practice of replacing the final period with a comma before the “said [character name]” and placing that comma inside the quotation marks. If we were building a corpus of utterances by Galadriel, the sentence would have a period, but that period is not represented in the text itself. The structure of the text is:

  • open single quotation mark indicating the start of direct speech
  • the spoken words but with a final period omitted
  • a comma and closing single quotation mark
  • “said” [character name]
  • (in this case) a participle phrase modifying the verb “said”
  • a period

The next three paragraphs, however, involve the spoken words being interrupted by the “said [character name]”. In the first two (from Boromir and Celeborn) we get the structure:

  • open single quotation mark indicating the start of direct speech
  • the initial part of the spoken words
  • a comma and closing single quotation mark
  • “said” [character name]
  • another comma
  • another open single quotation mark
  • the rest of the utterance, this time ending in a period or question mark
  • a closing quotation mark

Boromir’s overall utterance is thus something like:

As for me, my way home lies onward and not back.

and Celeborn’s:

That is true, but is all this Company going with you to Minas Tirith?

Note that my use of a comma in each case is interpretive: the comma in the corresponding place in the text is ambiguous, as we’ll soon see.

But now consider Aragorn’s direct speech. It is not a single sentence. As made clear by the capitalised “Beyond…”, his utterance is:

We have not decided our course. Beyond Lothlórien I do not know what Gandalf intended to do. Indeed I do not think that even he had any clear purpose.

Notice that, as in the Galadriel line, the first period is dropped when reporting this direct speech. This time, however, a period is used after “said Aragorn”:

...course,’ said Aragorn. ‘Beyond...

in contrast to:

...true,’ said Celeborn, ‘but...

The pattern is that if X says:

A, B.

it is typeset:

‘A,’ said X, ‘B.’

whereas if X says:

A. B.

it is typeset:

‘A,’ said X. ‘B.’
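As a minimal sketch of the pattern above, the following Python function recovers the utterance from a paragraph of interrupted direct speech. The names `INTERRUPTED` and `reconstruct` are hypothetical, and the regular expression assumes exactly the “‘A,’ said X, ‘B.’” shape with no apostrophes or nested quotations complicating things. It also assumes that the typographic comma after A stands for a real comma in the “said X,” case, which is an interpretive choice rather than something the text guarantees.

```python
import re
from typing import Tuple

# Hypothetical pattern for one paragraph of the form
#   ‘A,’ said X, ‘B.’    or    ‘A,’ said X. ‘B.’
INTERRUPTED = re.compile(
    r"^‘(?P<a>.*?),’ said (?P<speaker>[^,.]+)(?P<sep>[,.]) ‘(?P<b>.*)’$"
)


def reconstruct(paragraph: str) -> Tuple[str, str]:
    """Return (speaker, utterance) for an interrupted-speech paragraph."""
    m = INTERRUPTED.match(paragraph)
    if m is None:
        raise ValueError("paragraph does not fit the interrupted-speech pattern")
    a, b = m.group("a"), m.group("b")
    if m.group("sep") == ".":
        # ‘A,’ said X. ‘B.’  ->  A. B.  (the dropped period is restored)
        utterance = f"{a}. {b}"
    else:
        # ‘A,’ said X, ‘B.’  ->  A, B.  (assumption: the comma is spoken)
        utterance = f"{a}, {b}"
    return m.group("speaker"), utterance


speaker, utterance = reconstruct(
    "‘That is true,’ said Celeborn, ‘but is all this Company "
    "going with you to Minas Tirith?’"
)
print(speaker, "->", utterance)
```

Only the punctuation after “said X” decides between the two outputs, which is why that punctuation has to survive in whatever markup is chosen.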

The question for the person doing the markup is how best to convey this in such a way that the actual utterance can be extracted (with the correct punctuation and structure), if that is one of the goals.

Before we come to the example of Gandalf translating the inscription on Balin’s tomb, there is one other point to come back to in the Farewell to Lórien opening.

It actually begins:

‘Now is the time,’ he said, ‘when those who wish to continue the Quest must harden their hearts to leave this land. ...

Here, Celeborn’s utterance begins:

Now is the time when those who wish to continue the Quest must harden their hearts to leave this land.

This highlights the ambiguity of the first comma in the text. In the previous example from Celeborn and the one from Boromir, there is arguably a comma that should be supplied when extracting the utterance from the text. It is harder to argue for one here. And so we have an ambiguity.

In other words,

‘A,’ said X, ‘B.’

could mean that X said

A, B.

or that X said

A B.

Again, the key question is: can we mark up the text in such a way that these subtleties are captured for accurate extraction of the actual direct speech intended while also retaining the actual text appearing on the page?
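To make the question concrete, here is one possible way of thinking about it, sketched in Python rather than in any actual markup scheme. Each stretch of characters is tagged with whether it appears on the page, belongs to the utterance, or both. The `Segment` class and the `page_text` and `utterance` functions are hypothetical names of my own, and the decision about which punctuation belongs to the speech is made by hand by the annotator, not computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str        # the characters themselves
    on_page: bool    # True if they appear in the printed text
    in_speech: bool  # True if they belong to the utterance


def page_text(segments: List[Segment]) -> str:
    """Reassemble the text exactly as printed."""
    return "".join(s.text for s in segments if s.on_page)


def utterance(segments: List[Segment]) -> str:
    """Reassemble what the character actually said."""
    return "".join(s.text for s in segments if s.in_speech)


# Galadriel: the period is supplied, the comma is typographic only.
galadriel = [
    Segment("‘", True, False),
    Segment("They all resolved to go forward", True, True),
    Segment(".", False, True),
    Segment(",’ said Galadriel looking in their eyes.", True, False),
]

# Celeborn: the annotator judges that no comma is spoken here,
# so only a space is supplied between the two halves.
celeborn = [
    Segment("‘", True, False),
    Segment("Now is the time", True, True),
    Segment(",’ he said, ‘", True, False),
    Segment(" ", False, True),
    Segment("when those who wish to continue the Quest must harden "
            "their hearts to leave this land.", True, True),
]

print(page_text(galadriel))
print(utterance(galadriel))
print(utterance(celeborn))
```

The point of the sketch is only that both views can be derived from one annotated source. Whether this is best expressed with elements, attributes or something else is exactly the open question.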

* * *

A more complex example arises in Gandalf’s reading of the inscription on Balin’s tomb. In this case I’ve actually taken a photo of the page in the printed text:

[Photo of the printed page]

It’s clear from our discussion already that “used of old in Moria” is the end of a sentence, even though a comma is used and not a period. This is not only because the continuation “Here is written…” is a new sentence and so requires it, but also because “said Gandalf” ends with a period.

But it is what follows that I want to draw attention to now. We have direct speech from Gandalf that includes a translation of an inscription. The complexity is that the visual indication of direct speech and the visual indication of the inscription interfere with each other in an interesting way.

The direct speech is indicated with single quotation marks. A period at the end (within the quote, of course) indicates the end of Gandalf’s sentence beginning “Here is…”. The period is part of Gandalf’s direct speech, and not (presumably) the inscription.

The inscription is indicated by centered, small caps text with a blank line above and below.

Both visual devices seem straightforward until you try to mark up what is going on in XML. The complexity essentially amounts to overlapping markup. Remember that the period is part of the direct speech and the quotation mark indicates the end of that direct speech. But based on the centering and the following blank line, the visual indication of the inscription doesn’t end until after the direct speech. So we effectively have:

  • opening single quotation mark indicating the start of direct speech
  • the beginning of Gandalf’s spoken sentence
  • the blank line, centering and switch to small caps to indicate an inscription Gandalf is translating
  • the period at the end of Gandalf’s spoken sentence
  • the closing single quotation mark indicating the end of direct speech
  • the blank line and end of centered small caps indicating the inscription

Trying to represent that as-is with element structure is the equivalent, for those of you who know HTML, of doing something like <i><b>...</i></b> which is, of course, malformed XML.
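The problem is easy to confirm with any XML parser. The short Python check below uses the standard library’s `xml.etree.ElementTree`. The helper name `is_well_formed` and the element names `said` and `inscription` are hypothetical, chosen only for illustration, and the “...” stands in for the actual words.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Overlapping tags, as in the HTML analogy.
overlapping = "<p><i><b>...</i></b></p>"

# The same shape with hypothetical element names: the direct speech
# closes before the inscription that opened inside it.
speech_overlap = (
    "<p><said>Here is written ... "
    "<inscription>...</said></inscription></p>"
)

# Properly nested tags parse, but no longer describe the page.
nested = "<p><said>Here is written ... <inscription>...</inscription></said></p>"

for label, doc in [("overlapping", overlapping),
                   ("speech_overlap", speech_overlap),
                   ("nested", nested)]:
    print(label, is_well_formed(doc))
```

The nested version is accepted by the parser, but it claims the inscription ends before the closing quotation mark, which is not what the typography shows.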

So how do we capture it? I’m keen to hear from others marking up novels and similar texts.

Everything I’ve discussed about the presentation in the text is perfectly in keeping with standard typographical conventions in novels. There’s nothing strange or unusual about it in that sense. A reader would never be confused. The challenge is how best to describe it with markup in a way that makes it clear what the original text is, what the individual components are, and which enables subsequent extraction and analysis.

Many thanks to Paul O’Rear and Francesco Mambrini for discussions of the issues dealt with here.