Category Archives: readings

Thinking about Searching Databases

This week’s readings all explored how wonderful it is that so many historical documents have been scanned, digitized, run through OCR software and made available through countless different online databases, making lengthy trips to libraries and archives less common. Of course, there are drawbacks to relying on database searching, as the authors have pointed out.

Different databases behave differently when users type in a keyword or search phrase. As Patrick Spedding points out in “The New Machine,” some databases run OCR to produce a transcription that is used in the search process, but do not make that OCR text file available to users. In Spedding’s example of the Eighteenth Century Collections Online database, the missing transcription is compensated for by “coded linkage,” which highlights the keyword in the original document. Other issues arise when alternate spellings and synonyms come into play. How can you be sure that you are finding all of the documents related to your search when using these handy databases? You can’t.

This reminds me of Lara Putnam’s example of the Benbow Follies in her working paper from this year. While researching using microfilm, she came across an editorial that referenced “Benbow’s Follies,” and three years later she decided to do some more digging on that serendipitous find. Turning to Google Books, she found more information, but still wanted to know how Benbow appeared in her original research in Costa Rica. By searching digitized newspaper sources, she found advertisements for Caribbean tours by Benbow’s musical troupes, which she had never found in traditional sources such as music reference works.

These examples help illustrate the good and the bad of searching databases. On one hand, you might not be able to find what you want, either because you don’t know what you don’t know or because of wonky searching capabilities. On the other hand, the ability to search such a multitude of documents from the comfort of your home can help tackle a research question that might otherwise have gone unanswered for anyone without the opportunity to travel to libraries and archives across the world. Either way, this “digital turn” is still evolving, and hopefully the future holds more comprehensive and creative searching abilities for us researchers.

Thoughts on Digitization

This week’s readings focused on the digitization of historical documents and the various considerations that surround this modern approach to preserving the past. While large-scale projects may find outsourcing more cost-effective than scanning materials in-house, OCR accuracy rates are a factor that cannot be overlooked. Across the readings, it is evident that not all projects are the same, yet all successful projects must be well-planned, organized, and accurate.

The readings were interesting to me as I had a hand in some small-scale digitization projects as an undergraduate student worker in my college’s archives. I performed some of the scanning in one project that involved glass plate negatives, and in another I helped organize back issues of the school’s student-run newspaper as it was being prepared to be microfilmed and digitized at an outside facility. At that time, I had no real idea about the OCR accuracy rate of the newspapers (or even whether OCR would be used at all), although I assumed that the digitized issues would be easier to search than the collections in the archives were to browse. Clearly, a reading like Cohen and Rosenzweig’s Becoming Digital chapter would have helped me understand the overall lay of the land of digitization at the time, but it was written two years too late for that project.

I find Ian Milligan’s Illusionary Order article interesting in that the author stresses the importance of transparency while researching. His finding that two digitized Canadian newspapers are showing up more frequently in dissertation citations is definitely worth some consideration. On one hand, it’s great that these collections are now searchable online. On the other hand, there appears to be a dependency on the newspapers that have been digitized, leaving behind the print-only materials that might alter the nature or direction of one’s research. Milligan points out that dissertations have been leaning towards more Toronto-based research due to the scope of the newspapers available in Pages of the Past and Canada’s Heritage from 1844. The fact that the dissertations more often cite just the newspapers and not the online database that the authors used to access the articles is something we briefly talked about in class last week, and I’ve been spending a lot of time thinking about it. While citing the original newspaper is technically correct, I agree that for transparency’s sake in research, it’s important to give credit to the databases that house these sources. I know that in my own research I have failed to do this, but I will now be making more of a conscious effort to do so.

What is Digital History?

This week’s readings for Clio Wired were chosen to help illuminate how we can define digital history. After reading, however, I feel that I still know more about what isn’t digital history. Digital history is not merely digitization projects (big or small), or just presenting a paper online on a blog. Digital history is more about the interactivity of technology and history, and making connections about the past that just weren’t possible in the field of “traditional history.” I’m still having trouble coming up with my own definition, but I’m hoping that by the end of the course I will have a much more concrete one.

Some of the readings, such as William G. Thomas III’s “Computing and the Historical Imagination” essay in A Companion to Digital Humanities, revealed that computers were first used for historical research in the 1940s and 50s as tools for conducting quantitative research, followed by a second revolution in the 1960s and 70s. While useful for compiling and analyzing social data, some of this research lacked context, as notably seen in Robert Fogel and Stanley Engerman’s controversial book, Time on the Cross: The Economics of American Negro Slavery. We are now in the middle of a third digital history revolution, and it’s exciting to think about how the next generation will address digital history as well. Also intriguing to me in Thomas’s essay was my introduction to the term “cliometrics,” or quantitative history.

As the readings traversed time from 1999 to this year, I noticed many themes across them. One is open access. Having worked in libraries for a long time, I have always been a proponent of open access and was glad to know that digital historians feel the same way. I think it also relates to the ideal of the “democratization of history” that allows information to be shared and accessible to anyone with access to a computer, whether it’s through a personal machine or one at the library.

The Interchange from the Journal of American History is a helpful introduction to the current state of digital history by means of an online discussion between many of the leaders in the digital history field today. By addressing some of the past and current projects, the panelists offered their insights into what it means to be a digital historian. I came away from this reading, as with others, with a belief that knowing how to be a good historian involves understanding all of the tools available, including digital ones. If we want to continue in the digital history field, learning how to actually create digital history projects is crucial.