This week’s readings on text mining have helped me understand more clearly why historians might want to consider this aspect of digital history. Text mining and topic modeling can both help reveal patterns and themes about events, people, and documents that might otherwise have been overlooked. When information is presented visually, whether as a chart, a graph, or even a word cloud, it takes on a new perspective that researchers can choose to investigate more fully, although it is important to remember the context that surrounds the original data.
I’ve come to realize that I am very much a visual learner, which makes text mining and topic modeling quite interesting to me. I saw this in the data maps that Cameron Blevins used in his article, which he created with a program that measured how often geographic locations were mentioned in two Houston newspapers during the 1830s/1840s and then later in the 1890s. The idea of “imagined geography” was new to me when I read the article and accompanying website, and I think it is aptly named. At the time when the newspapers’ articles, features, railroad schedules, etc. were being written, I doubt that anyone was thinking about all of the locations being referenced, or about their sociological and historical impacts.
Having read Ted Underwood’s tips for starting your own text mining project before diving into Blevins’s article or Kaufman’s Quantifying Kissinger project, I had a basic understanding of the large amount of text needed for such undertakings. Underwood’s FAQ-style post breaks text mining down into easy-to-understand concepts, which I found most helpful. When I saw the Wordle example, I thought, “Hey, I’ve done that before!” which gave me a personal connection to this week’s readings. Then I tried out Google’s Ngram Viewer, which is a cool tool for visualizing the usage of words or phrases over time, but the context is clearly lacking, especially since we can’t see which texts are being searched.
Moving on from last week’s discussion about “buckets of words,” this week’s readings show how we can take those buckets, do our keyword searching, and find out how those words stack up across time, for whatever it’s worth. While I think text mining definitely has a cool factor because of its use of data maps, which I find really helpful, I need to remember its limits: poor OCR and a lack of context mean that text mining is just another tool in the historian’s grand toolbox.
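To make the “buckets of words” idea a little more concrete, here is a minimal Python sketch of counting keywords per year. Everything in it is an assumption for illustration: the sample sentences and years are made up, the keyword list is arbitrary, and the function name `count_keywords_by_year` is hypothetical. It is not the method used in any of this week’s readings, and it has the same weakness noted above, since a bare count says nothing about context or OCR quality.

```python
import re
from collections import Counter

# Made-up sample data: (year, text) pairs standing in for OCR'd newspaper pages.
documents = [
    (1838, "News from Galveston and New Orleans. Galveston ships arrived."),
    (1838, "Houston merchants trade with New Orleans."),
    (1894, "Trains to Chicago and St. Louis depart daily. Chicago markets rise."),
]

# Arbitrary place names to search for (lowercase, since the text is lowercased below).
keywords = ["galveston", "new orleans", "chicago"]


def count_keywords_by_year(docs, terms):
    """Return {year: Counter({term: number of mentions})} for whole-word matches."""
    counts = {}
    for year, text in docs:
        lowered = text.lower()
        year_counts = counts.setdefault(year, Counter())
        for term in terms:
            # \b keeps "chicago" from matching inside a longer word.
            pattern = r"\b" + re.escape(term) + r"\b"
            year_counts[term] += len(re.findall(pattern, lowered))
    return counts


result = count_keywords_by_year(documents, keywords)
for year in sorted(result):
    print(year, dict(result[year]))
```

With real newspaper text, the counts for each year could then be charted or mapped, which is where the visual side of text mining comes in.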