A key ingredient in the sort of digital classics I do on a daily basis is a citation scheme for unambiguously referring to specific passages in a canonical text. Once you have a text marked up structurally, addressing into that structure becomes fairly easy: you just need to map how the structure and the citation scheme relate.

If you say 5.3, what does that mean? The 3rd what in the 5th what? For a particular book of the New Testament it might mean “chapter five verse three” (although note there are multiple versification schemes for the Bible). For Homer it might be “the third line of the fifth book”.

But what would it mean in a novel? The third paragraph of the fifth chapter? That seems reasonable. But what does it mean once the novel contains poetry, or letters? It comes down to just deciding what chunks you want to count.

I had done that earlier with The Hobbit, as had L. F. S. Alden before me. There were a couple of differences, to which I’ll return in another long-promised post. I was keen to do the same for The Lord of the Rings and The Silmarillion (and eventually other texts as they are digitised and marked up). An agreed-upon citation scheme is crucial to doing stand-off annotation so it’s important that whatever is developed works for multiple parties.

I recently found out about Erik Mueller-Harder’s work on the LR Citation System (henceforth, LRC). Back in February, Tolkien scholar and friend Luke Shelton had said to me:

“I was wondering if you were familiar with [Erik’s] work? […] I get the feeling the two of you would get on pretty well!”.

I had the sincere pleasure of spending time with Erik at Tolkien 2019. We did get along well, and we talked a lot about how to collaborate.

I suggested that one of the first things I should do is see how well the structural markup I’d done for The Lord of the Rings mapped to the LRC. I have not yet marked up the appendices or front matter, so everything from this point refers just to the main text across the six books of The Lord of the Rings.

LRC references look like 1.08.084, which means Book 1, Chapter 8, “chunk” 84, where, roughly speaking, each new paragraph or verse stanza is a new “chunk”. This scheme is pretty much identical to what I’d done for The Hobbit and envisaged for The Lord of the Rings, so I didn’t foresee many problems. In fact, at first it looked like what Erik considered a “chunk” aligned completely with my own judgement.
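To make the shape of a reference concrete, here is a minimal sketch in Python of parsing and formatting strings like 1.08.084. The names `LRCRef` and `parse_lrc` are hypothetical, made up for illustration, and the zero-padding widths (one digit for the book, two for the chapter, three for the chunk) are simply read off the examples in this post rather than taken from any formal specification.

```python
import re
from dataclasses import dataclass

# Assumed shape, inferred from examples such as "1.08.084":
# one-digit book, two-digit chapter, three-digit chunk.
LRC_PATTERN = re.compile(r"^(\d)\.(\d{2})\.(\d{3})$")


@dataclass(frozen=True, order=True)
class LRCRef:
    """A book/chapter/chunk reference (hypothetical helper class)."""
    book: int
    chapter: int
    chunk: int

    def __str__(self) -> str:
        # Re-apply the zero padding used in the written form.
        return f"{self.book}.{self.chapter:02d}.{self.chunk:03d}"


def parse_lrc(ref: str) -> LRCRef:
    """Parse a string like '1.08.084' into its three numeric parts."""
    match = LRC_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a book.chapter.chunk reference: {ref!r}")
    book, chapter, chunk = (int(part) for part in match.groups())
    return LRCRef(book, chapter, chunk)


ref = parse_lrc("1.08.084")
print(ref.book, ref.chapter, ref.chunk)  # 1 8 84
print(ref)                               # 1.08.084
```

Because the dataclass is declared with `order=True`, references compare in reading order (book, then chapter, then chunk), which is handy when sorting annotations.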

One difference is that the LRC doesn’t include chapter titles. But I’d personally numbered them as chunk 000 of the chapter anyway, and Erik started with 001, which we both took to be the first paragraph of the chapter after the title, so it turned out not to be a problem.

But then I hit a number of problems. In some cases it was a straight mistake on my part. Sometimes a page break confused things, but a check of other editions proved the LRC correct. In at least one case, Hammond and Scull’s corrigenda had a correction to the printed text I was checking, and the LRC had taken that correction into account.

Here’s a full list of the differences (with LRC reference) that resulted in me changing my XML.

  • 1.09.075 stanza break I did not have (is this a mistake in the 50th anniversary edition?)
  • 2.04.183 stanza break I had that wasn’t one (confusion due to page break)
  • 2.04.188 stanza break I had that wasn’t one (confusion due to page break)
  • 2.08.081 stanza break I did not have (confusion due to page break)
  • 3.02.065 paragraph break I did not have
  • 3.02.222 paragraph break I did not have
  • 3.03.006 paragraph break I wrongly had (due to page break)
  • 3.05.086 paragraph break I did not have
  • 3.07.105 paragraph break I wrongly had (due to page break)
  • 3.11.080 paragraph break I did not have
  • 3.11.081 paragraph break I did not have
  • 5.01.068 paragraph break I wrongly had (corrected in Hammond and Scull’s corrigenda)
  • 6.01.064 stanza break I did not have (due to page break)
  • 6.02.079 paragraph break I did not have
  • 6.08.111 paragraph break I did not have
  • 6.08.112 LRC treats the Horn-cry as a single chunk despite the line breaks
  • 6.09.060 LRC treats the book title as a single chunk despite the line breaks

There were three other changes that just had to do with me tweaking what I considered a “chunk” for counting purposes:

  • 1.02.074 LRC treats ring inscription as its own chunk
  • 1.10.073–1.10.082 how to chunk Gandalf’s letter
  • 6.09.060–6.09.062 how to chunk the title and subtitles Frodo added to Bilbo’s book

These didn’t require me to change my XML at all, just the code that generates references for a given chunk of XML. One of the things that’s nice about having code to do this is I have a clearly defined, deterministic specification of how to get the references from a text.
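As a rough illustration of what “code that generates references for a given chunk of XML” can look like, here is a minimal sketch in Python. It is not my actual code, and the markup is hypothetical: the element names `book`, `chapter`, `title`, `p` and `lg`, and the `n` attribute, are assumptions made for the example. The chapter title gets chunk 000, and each paragraph or stanza after it gets the next number.

```python
import xml.etree.ElementTree as ET

# Assumption: which child elements of a chapter count as a "chunk".
# Changing this set changes the references without touching the XML.
CHUNK_TAGS = {"p", "lg"}

# Hypothetical markup with placeholder text.
SAMPLE = """
<book n="1">
  <chapter n="8">
    <title>Chapter title</title>
    <p>First paragraph.</p>
    <lg><l>First line of a stanza</l><l>Second line of a stanza</l></lg>
    <p>Second paragraph.</p>
  </chapter>
</book>
"""


def generate_refs(xml_text):
    """Return (reference, element tag) pairs in document order."""
    root = ET.fromstring(xml_text)
    refs = []
    for book in root.iter("book"):
        for chapter in book.findall("chapter"):
            book_n = int(book.get("n"))
            chapter_n = int(chapter.get("n"))
            counter = 0
            for child in chapter:
                if child.tag == "title":
                    chunk_n = 0  # chapter title is chunk 000
                elif child.tag in CHUNK_TAGS:
                    counter += 1
                    chunk_n = counter
                else:
                    continue  # anything else is not counted
                ref = f"{book_n}.{chapter_n:02d}.{chunk_n:03d}"
                refs.append((ref, child.tag))
    return refs


for ref, tag in generate_refs(SAMPLE):
    print(ref, tag)
```

The point of the design is that the rule for what counts as a chunk lives in one place (here, `CHUNK_TAGS` and the loop body), so a chunking decision such as how to treat a letter or an inscription is a change to the rule, not to the marked-up text.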

So in short: I have no disagreements with how the LRC has been done for the main text. Where there were slight differences from my approach, I was happy to change my mind, and, more importantly, Erik’s work identified mistakes in my markup, which are now fixed.

There are still the appendices to do. And a lot to decide about the other books. But I’m confident in saying that the LRC (with the addition of using 000 for chapter titles) is the referencing scheme The Digital Tolkien Project will be using for the main text of The Lord of the Rings from now on.