While there is considerable text preparation and internal tooling work being done by the Digital Tolkien Project that can’t be openly shared, I’ve been thinking for a while about public tools that do not violate the copyright holder’s rights. Today I’m happy to launch the first such tool.
I’ve taken the text of The Hobbit, The Lord of the Rings, and The Silmarillion, all structured according to the project’s citation systems (which meant finishing the system for The Silmarillion; more on that soon), and indexed sequences of up to seven words, folding case and stripping all punctuation and diacritics. I’ve also added the Letters, structured only down to the individual letter (and no further at the moment).
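To make the indexing step concrete, here is a minimal sketch in Python. It is not the project's actual code: the names `normalize` and `build_index`, the tuple form of a citation, and the choice to delete apostrophes while turning other punctuation into word breaks are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict

MAX_N = 7  # longest word sequence indexed, as described above
APOSTROPHES = {"'", "\u2019"}  # assumption: apostrophes are deleted, not split on


def normalize(text):
    """Fold case and strip punctuation and diacritics, returning a word list."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    chars = []
    for ch in decomposed:
        if unicodedata.combining(ch) or ch in APOSTROPHES:
            continue  # drop diacritics and apostrophes entirely
        # Assumption: any other non-alphanumeric character acts as a word break.
        chars.append(ch.casefold() if ch.isalnum() else " ")
    return "".join(chars).split()


def build_index(passages, max_n=MAX_N):
    """Map every sequence of 1..max_n words to the citations where it occurs.

    `passages` is an iterable of (citation, text) pairs, where a citation is
    a hypothetical tuple such as (work, chapter, paragraph).
    """
    index = defaultdict(list)
    for citation, text in passages:
        words = normalize(text)
        for start in range(len(words)):
            # The upper bound keeps sequences from running past the passage end.
            for n in range(1, min(max_n, len(words) - start) + 1):
                index[tuple(words[start:start + n])].append(citation)
    return index


# Placeholder passages, not text from the books.
index = build_index([
    (("X", "1"), "Alpha beta, gamma!"),
    (("X", "2"), "Beta gamma delta"),
])
hits = index[tuple(normalize("BETA gamma"))]  # citations containing the sequence
```

Because the query is normalized the same way as the indexed text, differences in case, punctuation, or accents do not affect whether a sequence matches.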
This means you can start typing a sequence of words and have that sequence searched interactively across those works, with immediate results. Counts are aggregated at each level of the citation hierarchy so you can see at a glance how many times the sequence occurs by work, chapter, etc.
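The aggregation can be pictured with a short Python sketch. I'm assuming here that each occurrence is recorded as a tuple running from the work down to the finest citation level; the function name `aggregate_counts` and the sample citations are hypothetical, not the project's real identifiers.

```python
from collections import Counter


def aggregate_counts(citations):
    """Count occurrences at every level of the citation hierarchy.

    Each citation is a tuple such as (work, chapter, paragraph); every
    prefix of that tuple is one level of the hierarchy.
    """
    counts = Counter()
    for citation in citations:
        for depth in range(1, len(citation) + 1):
            counts[citation[:depth]] += 1
    return counts


# Hypothetical occurrences of one word sequence.
occurrences = [
    ("LR", "1", "2"),
    ("LR", "1", "5"),
    ("LR", "3", "1"),
    ("H", "4"),
]
counts = aggregate_counts(occurrences)
print(counts[("LR",)])      # 3 occurrences in the work as a whole
print(counts[("LR", "1")])  # 2 of them in the first subdivision
```

Keying the counts by citation prefix means works with hierarchies of different depths, such as a collection cited only to the individual letter, can share the same structure.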
It also provides a way to look up the citation reference for a passage by starting to type the passage.
Because no text is displayed other than what the user types in, there isn’t a problem with copyright.
Over time, I plan to expand the works covered (even if initially just with fairly coarse citation systems to the chapter level) and to also include things like relative frequency and some visualizations.