Sub-Projects
These are the main things we’re working on at the moment:
- text digitization
  - History of Middle-earth (initial digitization of vols 1–9 done)
- general markup
  - The Hobbit, Lord of the Rings, and The Silmarillion (all mostly done)
  - Unfinished Tales, Children of Húrin, Beren and Lúthien, and The Fall of Gondolin (just starting)
  - History of Middle-earth (vol 6 complete; vol 4 almost done; vols 1–3, 5, 7–9 just starting)
- referencing/citation systems
  - The Hobbit, Lord of the Rings, and The Silmarillion (in progress)
  - History of Middle-earth (general principles, plus specifics for the volumes already marked up)
  - Nature of Middle-earth (general principles)
- initial modelling of named entities in Lord of the Rings
- initial modelling of direct speech in Lord of the Rings and The Hobbit
- initial modelling of time indicators in Lord of the Rings and The Hobbit, along with visualizations of narrative time (Mythmoot VIII talk)
- initial sentence tokenization, lemmatization, and dependency analysis of The Hobbit and Lord of the Rings
- term-document matrices and other related analyses
  - The Hobbit, Lord of the Rings, and The Silmarillion (all mostly done)
  - see also TF-IDF Demo
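
As a rough illustration of what the term-document matrix and TF-IDF items above involve, here is a minimal Python sketch. It is not the project's actual code: the function names, the placeholder token lists, and the particular weighting (relative term frequency times the natural log of N/df) are all assumptions made for the example, and real input would come from the tokenized texts instead.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a count matrix from a mapping of document name -> list of tokens.

    Returns (vocab, matrix), where vocab is a sorted list of terms and
    matrix maps each document name to a list of counts aligned with vocab.
    """
    vocab = sorted({tok for toks in docs.values() for tok in toks})
    matrix = {}
    for name, toks in docs.items():
        counts = Counter(toks)
        matrix[name] = [counts[term] for term in vocab]
    return vocab, matrix


def tf_idf(vocab, matrix):
    """Weight counts by relative term frequency times log(N / document frequency).

    This is one common TF-IDF variant among several; it is an assumption here.
    """
    n_docs = len(matrix)
    # Document frequency: in how many documents does each term occur?
    df = [sum(1 for row in matrix.values() if row[i] > 0)
          for i in range(len(vocab))]
    weights = {}
    for name, row in matrix.items():
        total = sum(row) or 1  # guard against an empty document
        weights[name] = [(count / total) * math.log(n_docs / df[i])
                         for i, count in enumerate(row)]
    return weights


# Hypothetical placeholder documents, not real text from any volume.
docs = {
    "doc_a": ["ring", "hobbit", "ring"],
    "doc_b": ["hobbit", "dragon"],
}

vocab, matrix = term_document_matrix(docs)
weights = tf_idf(vocab, matrix)

for name in docs:
    print(name, dict(zip(vocab, weights[name])))
```

A term that occurs in every document (here "hobbit") gets weight zero under this formula, which is the point of the IDF factor: it pushes down vocabulary shared by all texts and highlights what is distinctive to each.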
Secondarily:
- identifying the textual variants in the printed Silmarillion editions
- cataloguing Tolkien’s verse with rhyme and metre information
- digitizing and modelling Tolkien’s A Middle English Vocabulary
- digitizing map data
Also related, see:
- Gothica — linguistic data, text, and code relating to the Gothic language
- digitizing some Old English Texts
- digitizing some Dunsany
- digitizing Sir Gawain and the Green Knight