Digital Tolkien Project

A scholarly project focused on Tolkien from both a corpus linguistic and digital humanities perspective

Our goal is to provide computational and philological support for Tolkien studies
using existing tools, standards, and scholarly best practices
while fostering collaboration and open scholarship
and respecting the rights and wishes of the Tolkien Estate and publisher.

Shortlisted three times for the Tolkien Society Award for Best Online Content

The sorts of things we’re working on:

  • Markup and annotation of the texts of Tolkien’s works themselves
  • Computational text analysis and corpus stylistics
  • Linked Open Data around people, places, events
  • Citation schemes, chronology and bibliography in modern electronic formats
  • Machine-actionable invented language description

We’re also working on a number of things relating to computational Germanic philology.

If you’re a Tolkien scholar with research questions that would benefit from computational or corpus linguistic analysis, please get in touch.

Latest News

Cite Tolkien now includes The Silmarillion, and Search Tolkien will link citations to it.

In Search Tolkien, citations are now linked to Cite Tolkien (where available), and it is now possible to get a flat list of citations as well as the hierarchy.

We’ve finished a first-pass annotation of characters and locations at the paragraph level in Book 2 of The Lord of the Rings.

Latest Little Delving

These are little visualizations based on the text and annotations of the Digital Tolkien Project.

See the Little Delvings site for more.

Latest Monthly Update (June 2024)


  • YouTube Channel and Tolkien Glossary Announcement

    I’ve decided to reinvigorate the YouTube channel with an update and a big announcement.

  • Upcoming Talks

    I’m excited to be giving three more Tolkien-related talks this year: at Mythmoot in June, IMC Leeds in July, and Oxonmoot in September.

  • The Arda Python Library

    Off and on over the last ten months, I’ve been quietly working on a Python library for doing Tolkien-related calculations.

  • A Discord Server and the Speaker Identification Crowdsourcing

    It’s been a long time since I last blogged, but a lot has been happening with the Digital Tolkien Project.

  • Speaking at Mythmoot IX

    I’ll finally be back in person at next week’s Mythmoot IX, and I’m excited to be talking about “Remaking Text: Text Reuse in Tolkien”. Also presenting (virtually) will be my good friend and collaborator Chiara Palladino.

  • Livestream Designing a Citation System for Unfinished Tales

    We recently made some good changes to the Silmarillion citation system and are now (re)turning to Unfinished Tales. I’ve created a YouTube channel, and we’re going to livestream our first meeting.

  • Modeling Names in Lord of the Rings: Part One

    In this post I will briefly describe the work that went into the creation and correction of the Indexes, and their gradual evolution into a Tolkien Authority List of names. This post serves as documentation of the early work on the Indexes and indicates the next steps.

  • Tolkien Reading Day 2022

    For the last two years I’ve participated in the Tolkien Reading Day sessions organized by the Tolkien Collector’s Guide. This year, Jeremy interviewed people on various topics, and he invited me and Elise Trudel Cedeño to talk about Digital Humanities and Education.

  • Search Tolkien Launched

    While there is considerable text preparation and internal tooling work being done by the Digital Tolkien Project that can’t be openly shared, I’ve been thinking for a while about public tools that do not violate the copyright holder’s rights. Today I’m happy to launch the first such tool.

  • Counting Breakfasts

    This weekend I’ll be giving another virtual presentation, this time at New England Moot, talking about some food-themed text analysis of The Lord of the Rings.

  • Digital Tolkien on Instagram

    On a fairly regular basis I post charts and visualizations from this project to Twitter and I thought it might be fun to start an Instagram account dedicated to just this.

  • Speaking at Mythmoot VIII

    I’m very excited to be giving a virtual talk at next week’s Mythmoot on Modeling the Multiple Dimensions of Time in Tolkien’s Legendarium.

  • Education and Computers and the Cottage of Lost Play

    One of the most exciting collaborations I’ve embarked on in connection with the Digital Tolkien Project is the ongoing educational work with Elise Trudel Cedeño, and we had the opportunity to give a talk about it at the recent Prancing Pony Podcast Digital Moot.

  • Tokenizing the Hobbit

    How many words are there in The Hobbit?

  • Minimal Prefixes to Identify Hobbit Paragraphs

    The previous blog post introduced a citation system for The Hobbit and linked to an index that showed the first five tokens in each paragraph. How often is five a sufficient number to uniquely identify the paragraph? How often can we get away with less?

  • The Hobbit Citation System

    The Digital Tolkien Project now has a paragraph-based citation system for The Hobbit derived directly from the marked-up version of the text and checked against previous work by others.

  • Prancing Pony Podcast

    It was a true delight to be the guest on episode 172 of the Prancing Pony Podcast.

  • Tolkien Experience Podcast

    Last year I had the pleasure of being interviewed by Luke Shelton for the Tolkien Experience Podcast and the interview has now been published.

  • Longwinded One Podcast

    After my talk at New England Moot earlier in the year, I was invited to be on the Longwinded One podcast as part of their series on language.

  • Silmarillion Textual Variants in Print: Part Four

    This is the fourth in a series of posts about the textual variants I’ve found in printings of The Silmarillion. In this post, I’ll try to put together a broad textual history up to and including the Second Edition Hardcovers, based on all the variants we’ve looked at.

  • Silmarillion Textual Variants in Print: Part Three

    This is the third in a series of posts about the textual variants I’ve found in printings of The Silmarillion. In this post, I’ll update the previous results with data from a few more versions and then cover eleven variations in punctuation (not including hyphenation).

  • Numbering in the Númenórean King Lists

    One of the changes discussed in my second post on textual variants in The Silmarillion was the numbering of the Númenórean kings. I said there that it might be worth a whole post, so here we go.

  • Silmarillion Textual Variants in Print: Part Two

    This is the second in a series of posts about the textual variants I’ve found in printings of The Silmarillion. In this post, I’ll cover six more changes to words in the text. This will finish up all the non-punctuation changes in the main text of the book.

  • Silmarillion Textual Variants in Print: Part One

    This is the first in a series of posts about the textual variants I’ve found in printings of The Silmarillion. In this post, I’ll cover six spelling errors in the original first edition fixed by the latest HarperCollins hardcovers and the ebooks.

  • Aligning with the LR Citation System

    A key ingredient in the sort of digital classics I do on a daily basis is a citation scheme for unambiguously referring to specific passages in a canonical text. Once you have a text marked up structurally, addressing into that structure becomes fairly easy: you just need to map how the structure and the citation scheme relate.

  • Punctuation and Structure in Marking Up Direct Speech

    As work continues on the markup of The Lord of the Rings, many of the issues discussed previously with regard to The Hobbit apply. A first pass is almost done, but there is an interesting challenge with Gandalf’s reading of the inscription on Balin’s tomb.

  • Accepted for Tolkien 2019

    I am truly delighted to announce that my talk “Tolkien and Digital Philology” on applying a philological and corpus linguistics approach to the works of Tolkien was accepted for the Tolkien Society’s 50th anniversary conference Tolkien 2019.

  • Marking Up The Hobbit in XML

    As a starting point, I’m working on the electronic markup of the text of The Hobbit in the Extensible Markup Language (XML).

  • Welcome to Digital Tolkien

    I’ve worked for many years on Ancient Greek and the computer analysis of Biblical and Ancient Greek texts. When my linguistic interests extended to Germanic languages such as Old Norse, I considered starting a new blog.
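The Minimal Prefixes post asks how many leading tokens are needed to pick out each paragraph uniquely. Below is a minimal sketch of one way to compute that in Python, assuming naive whitespace tokenization and toy placeholder paragraphs rather than the actual text of The Hobbit. The names `tokenize` and `minimal_prefix_lengths` are hypothetical and not part of the project’s own tooling.

```python
from collections import Counter
from typing import List, Optional


def tokenize(paragraph: str) -> List[str]:
    # Hypothetical, deliberately naive tokenizer: split on whitespace only.
    # Real tokenization rules (punctuation, hyphens, quotes) would differ.
    return paragraph.split()


def minimal_prefix_lengths(paragraphs: List[str]) -> List[Optional[int]]:
    """For each paragraph, return the smallest number of leading tokens that
    no other paragraph shares, or None if no prefix is unique (for example a
    duplicated paragraph, or one whose tokens are a prefix of another's)."""
    token_lists = [tokenize(p) for p in paragraphs]

    # Count how many paragraphs begin with each possible token prefix.
    counts: Counter = Counter()
    for tokens in token_lists:
        for k in range(1, len(tokens) + 1):
            counts[tuple(tokens[:k])] += 1

    # The minimal prefix is the shortest one that only this paragraph has.
    result: List[Optional[int]] = []
    for tokens in token_lists:
        length: Optional[int] = None
        for k in range(1, len(tokens) + 1):
            if counts[tuple(tokens[:k])] == 1:
                length = k
                break
        result.append(length)
    return result


if __name__ == "__main__":
    toy = [
        "The road goes on.",
        "The road was long.",
        "Night fell.",
        "Night fell.",
    ]
    print(minimal_prefix_lengths(toy))  # → [3, 3, None, None]
```

Counting how many entries are at most 5 (ignoring `None`) would answer the post’s question of how often five tokens suffice; counting prefixes once up front avoids comparing every pair of paragraphs.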