For my final project, I plan to do topic modeling. My current plan is to use the transcribed documents from the Jane Addams Papers Project (JAPP), found at https://digital.janeaddams.ramapo.edu/, to look at what kinds of topics were being discussed around her. JAPP has transcribed all of her known surviving correspondence through 1923, as well as newspaper articles by and about her. I would use the existing transcriptions of every publicly available document in the corpus, so the text is already in a form the computer can read. I already know that not every letter in the corpus has a real topic: a significant chunk of them are essentially acknowledgments of previous letters with a promise to send a more extensive letter soon, so figuring out how to handle those is a concern for me.
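One possible way to handle the acknowledgment letters is to set aside very short documents before running the topic model. The following is only a minimal sketch, assuming the transcriptions have already been loaded as plain strings keyed by some document ID. The function names, the 50-word cutoff, and the sample texts are all hypothetical and would need checking against the real letters.

```python
import re

MIN_WORDS = 50  # hypothetical cutoff; would need tuning against real letters


def count_words(text: str) -> int:
    """Count word-like tokens (runs of letters, optionally with apostrophes)."""
    return len(re.findall(r"[A-Za-z']+", text))


def split_by_length(docs: dict[str, str], min_words: int = MIN_WORDS):
    """Split documents into (kept, set_aside) based on word count."""
    kept, set_aside = {}, {}
    for doc_id, text in docs.items():
        target = kept if count_words(text) >= min_words else set_aside
        target[doc_id] = text
    return kept, set_aside


# Made-up sample texts for illustration only (not real JAPP transcriptions)
sample = {
    "short_note": "Thank you for your letter. I will write more fully soon.",
    "long_letter": " ".join(["settlement"] * 60),
}
kept, set_aside = split_by_length(sample)
print(sorted(kept), sorted(set_aside))
```

Keeping the short letters in a separate group, rather than deleting them, would leave room to inspect them later and check whether the cutoff is also catching letters that do have a topic.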
In the future, I’d be interested in comparing the topic modeling results to the tags and subjects applied by the people who create the metadata that accompanies the documents. My biggest anticipated struggle is the technical side, but I am confident I can figure it out with patience and a lot of tutorials.