Category Archives: practicum

Trying out Omeka

Once I was able to finally get Omeka installed on my website, the program proved quite easy to use, which was great! I decided to use image files from the photo album of Henry W. Schweigert, which is an example of the earlier styles of photo albums that were used to contain cartes-de-visite. The album dates back to the 1870s, and I had scanned all 50 pages of it during the summer but have not been able to put them online yet, as part of my Henry Schweigert Diary project. I thought Omeka might be a good option to try out for this, and so far I think it is proving to be easy to work with.

The trickiest part about using Omeka was creating consistent metadata for my items. I had to go back and forth between the items I had already created and the one I was currently editing to make sure I was using the same vocabulary. I created a collection for the photo album and then placed all of the items into that collection as I added them. I changed the theme from “Thanks, Roy” to “Berlin” and also installed a plugin called “Item Order” so that I could arrange the items in a specific order. In this case, I wanted the order to resemble the page order of the photo album. However, when you are viewing the Schweigert Family Photo Album collection page, you can only see the first five items, even though there are six. I tried adjusting all the possible settings to reveal them all, but I’m thinking this page might just be a preview page, and I will need to do some research on fixing this, if it is possible at all. You can see all six items if you select “Browse Items,” but they are not in the correct order.

I’m excited to keep learning more about how to customize my Omeka installation, as there are lots of plugins that I am eager to explore. Besides the trouble that I originally had installing it on my own server space, I think Omeka is a worthy tool for presenting historical collections online. Had I originally used Reclaim for my hosting, I wouldn’t have had such difficulty. Eventually my tech support people were able to fix the problem, which I am grateful for.

Here’s the direct link to my Omeka project.

Mapping the Civil War

For this week’s practicum, I created a map using Google Maps Engine, which allows users to plot events in specific geographic locations. In this example, the class used data from the 1st Regiment of the Michigan Cavalry from 1864 – 1865.

Google Maps Engine’s option to include different icons to label the locations helps to identify the various actions that the regiment encountered during the Civil War. The battles and campaigns have been labeled with explosion-like symbols, which are the most common icons on this map. The map itself reveals the pattern of movement in clusters of icons, but it doesn’t provide the order of events or other details. By clicking on each marked location on the map, users can read more details, such as the specific dates. The ability to use layers allows users to separate events visually; however, I did not choose to use layers in this example. Looking back, I could have had different layers for battles, expeditions, and other types of events. Or I could have made one layer each for 1864 and 1865. I tried creating layers after the events were created, but it wasn’t obvious how to place events on different layers, which is something I will need to look into.
The map is interesting to look at, though, if you think about how far this cavalry traveled in two years. At the end of their enlistment, they were as far west as Utah.

Testing Out Network Visualization Tools

This week in class, we looked at a few different data visualization tools, such as RAW and Palladio. After class I tried out the tools on my own, which was kind of fun. I like being able to visualize information in a non-textual format, especially when you can use fun colors, as RAW allows.

Using a dataset provided by Dr. Robertson that connects Civil War units with corresponding battles, I created a visualization with the Alluvial Diagram option. RAW allows you to customize the size and color, which I did by enlarging the height (1500px) and width (500px), and changing the colors to ones that I thought looked good but were also distinct, as you can see below.

RAW alluvial diagram linking battles (Aldie through Yorktown) to five units: 136th New York Infantry (14), 1st Michigan Cavalry (28), 29th New York Infantry (6), 44th New York Infantry (22), and 4th New York Cavalry (54)
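For anyone curious about the shape of the data behind a diagram like this, here is a minimal Python sketch. It assumes the dataset is a simple two-column table of unit and battle; the column names, the function name, and the three sample rows are my own illustration, not Dr. Robertson's actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in a unit,battle layout (not the real dataset).
SAMPLE_CSV = """unit,battle
1st Michigan Cavalry,Gettysburg
1st Michigan Cavalry,Yellow Tavern
44th New York Infantry,Gettysburg
"""

def count_links(csv_text):
    """Count how many rows mention each unit and each battle."""
    units = Counter()
    battles = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        units[row["unit"].strip()] += 1
        battles[row["battle"].strip()] += 1
    return units, battles

units, battles = count_links(SAMPLE_CSV)
for name, total in units.most_common():
    print(name, total)
```

The numbers beside each name in the diagram appear to be totals of this kind: one count per unit and one per battle.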

I used the same dataset as above with Palladio, which offers visualization options such as maps, graphs, lists, and galleries. I had some trouble trying to extend the Battles to include their location coordinates, but after reading the FAQ I realized that I needed to identify the new data as “place, coordinates” for it to display properly. I was able to create a moveable graph with the nodes (the units) connecting to the battles, and with the location coordinates I was able to see the battles displayed on a map. Unfortunately there are no embedding options that allow for interactivity, so I’ve taken a screenshot of the map. On the live version, you can click on or hover over each dot to reveal the name of the specific battle.

Palladio screenshot, map view

After playing around with RAW and Palladio, I took a shot at Gephi, which requires a download and installation. Even after re-reading Elena Friot’s tips and personal experience with Gephi, I still didn’t quite grasp its usefulness. I added the data in as described, but all I got was a cluster of dots that I didn’t know what to do with. I much prefer the more user-friendly interfaces of RAW and Palladio, and I’m trying to think of ways to use them in my own research. I could try creating a dataset from the Henry Schweigert diaries that links the diary entries by date to the locations in which they were written, or to the subjects mentioned in them, in order to gain a different perspective on the overall patterns in the diaries.

Playing with Patterns

I tracked the usage of the word “railroad” with Google ngram viewer, Bookworm: Chronicling America, and NYT Chronicle and found some interesting results that you can see for yourself below.

I wanted to see how the word railroad appeared in historic newspapers from the Library of Congress’ Chronicling America collection. I was able to view results from 1840 – 1921 with Bookworm: Chronicling America and saw how the word railroad peaked in 1873 with a frequency rate of 58.9. I expected a larger spike for 1877, the year of the Great Railroad Strike, but this could be a reflection of the newspapers available in Chronicling America. The Panic of 1873 could very well be the reason for the highest rate of the word I searched, since the depression followed a large boom in railroad expansion.

Bookworm: Chronicling America, railroad

With NYT Chronicle, which only tracks the usage of words in the New York Times, I noticed a different pattern. I changed the default setting to reflect the actual number of articles that mentioned railroad and saw the highest amount in 1930 with 6,763 articles, followed closely by 1902 with 6,682. In 1851, the earliest date available, there were only 357 references to railroad. During the Panic of 1873 there were 4,015 articles that mentioned railroad, after a period of steady rising through the Civil War. After the peak in 1930, a steady decline can be seen, eventually getting down to 616 articles in 2013. So far, in 2014, there have only been 465 articles with the word railroad mentioned.

NYT Chronicle: railroad

I used two terms when searching Google Books’ ngram viewer, train and railroad, just to see how they compared. Except for the earliest years searched, 1840 – 1860, the two words show up with almost the same frequency, aside from a slight dip for railroad between 1890 and 1900. With Google Books, we can’t see where the words are coming from, which is truly a case of distant reading.

When compared to Voyant, the ngram viewers mostly obscure the content that is being text mined. With Voyant, you see the entire corpus that you select to upload, and then see the pattern within that text. For my Voyant example, I uploaded the text of the 1869 diary of Henry Schweigert, a farmer/college student who lived from 1843 – 1923 in southeastern Pennsylvania. He kept diaries from 1869 – 1881, and I have been in the process of transcribing them. So far I have only completed 1869, the year he attended the now defunct Palatinate College in Myerstown, PA. I was curious to see what the patterns were with this text, as I was relatively familiar with the content.

As I expected, the words day and nice show up frequently, as well as college, home and Myerstown. Schweigert’s diary entries are brief and focus on the daily weather, his location, and any events or activities he had during the day. I removed the typical stop words such as “the” and “and,” and also the year 1869, which showed up quite often because it was mentioned in every entry. Other noticeable words are church, excelsior (he was a member of the Excelsior Literary Society), brother and thrashing (as in thrashing wheat).
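To make the stop-word step concrete, here is a rough Python sketch of the kind of counting a tool like Voyant does. This is not Voyant's actual code: the stop list, the function name, and the sample sentence are my own inventions for illustration, and the sample is not a real diary transcription.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real stop list would be much longer.
STOP_WORDS = {"the", "and", "a", "at", "in", "to", "was", "1869"}

def top_words(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words that are not stop words."""
    # Lowercase the text and pull out runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)

# Invented text in the style of a diary entry, not a real transcription.
sample = "Nice day. Was at college in the morning and at home in the evening. Nice day."
print(top_words(sample, n=3))
```

Adding the year to the stop list, as I did with 1869, keeps a value that appears in every entry from crowding out the more interesting words.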



Using Databases in Scholarly Articles

While looking through the last three years of the journal Enterprise & Society, it is obvious that more historians have incorporated electronic databases into their research within the last year than in previous years, but I am not sure why. Maybe the editors have relaxed their restrictions on databases (if there ever were any), or the authors have been more transparent about what sources they are using. Or maybe there just are more databases being used in these later articles. The journal’s standard bibliography of works cited is parsed out by the various types of sources used, and while some articles include databases in the “primary sources” category, there is no specific heading for electronic resources.

In Sept 2013’s issue, Matthew David Mitchell searches Voyages: The Trans-Atlantic Slave Trade Database in his article, “‘Legitimate Commerce’ in the Eighteenth Century: The Royal African Company of England Under the Duke of Chandos, 1720–1726.” By searching the database with the parameters of 1698 – 1807, Mitchell was able to quantify the number of slaves brought by independent British slave traders to the Americas, in addition to those brought by the Royal African Company. I actually had not heard of this database before, so its inclusion in the references was informative in itself.

This month, in Sept 2014’s issue, Paula Cruz-Fernández takes her research in another direction in her article, “Marketing the Hearth: Ornamental Embroidery and the Building of the Multinational Singer Sewing Machine Company.” While the author cites many historical periodicals, such as Godey’s Lady’s Book and Harper’s Bazaar, she also is clear about how she accessed these sources. Cruz-Fernández provides documentation in her footnotes, as well as in her works cited, that she used Cornell University’s Home Economics Archive: Research, Tradition and History, also known as HEARTH. By providing readers with this information, the author makes her methodology more transparent. It also allows the reader to learn of an online database that may not have been previously known.

There are only four more articles that explicitly reference online databases as sources in Enterprise & Society and all are from 2014. It seems like historians are becoming more comfortable with citing databases when accessing historical periodicals and data, which is something that we as historians should be doing more often anyway. By being more transparent with our research, we can help each other learn of new sources and online archives that can only benefit the breadth and quality of all of our research.

OCR Practicum

Comparing different applications of OCR on historical documents makes it obvious that accuracy rates vary from source to source. Looking at three pages from the Library of Congress’ Chronicling America and a page from the Pinkerton archives that was run through Google Drive’s OCR software, I discovered many problems with digitization and OCR.

Google Drive OCR

While comparing Google Drive’s OCR of page 3 of the Pinkerton records, provided by Dr. Robertson, with a classmate, we discovered that not only were our outcomes different due to the manner in which we manipulated our original image sources, but that the content delivered by the OCR was different as well. While we both had fairly inaccurate results after cropping, adjusting the contrast, and changing the horizontal image to vertical, the fact that some of our incorrect results didn’t match was interesting. I wondered if this has anything to do with the way that Google Search will customize results based on the user’s computer, but I don’t think that would make much sense. Due to the large number of inaccuracies in just one page, I would consider hand-transcribing the text instead, which would take longer and require more human effort but would produce more accurate results. Here is the result of my Google Drive OCR experiment.

LOC papers from Chronicling America:

The Library of Congress appears to have better OCR accuracy than Google, although it is far from perfect. In the three front pages that I reviewed, I noticed that the content of the articles and headlines was mostly correct. However, the OCR seems to have a problem with the columns. Even so, I still think the OCR is more usable here than with Google, and transcribing each page by hand would be fairly time consuming.

At first glance, I could barely understand the OCR. Once I started reading through the OCR text, though, I noticed a pattern. The text was displayed in a long column that was organized in a top to bottom, left to right fashion. Once I discovered this pattern, the OCR text version was much easier to compare with the original page. However, the three papers I compared had varying results due to the quality of their scans.

With the Omaha Daily Bee, there were some issues with the order of the OCR text at the top of the page, where the headlines spread across the columns. Some lines were just completely out of order. However, the actual content of the articles proved to be mostly accurate. The different sizes of the sections on the front page make the OCR difficult to read, but if you spend enough time and energy trying to make sense of it, you probably can. It’s not perfect, but acceptable, if you focus on just the articles.

The articles in the Richmond Times Dispatch were less easy to read in the OCR text format, as the scanning quality appears to be an issue. Many of the words and letters were light and barely visible, which resulted in some erroneous results. This made reading the text format very difficult.

The Washington Times had accuracy rates similar to the Richmond Times Dispatch, in that the scanning quality was not clear, resulting in poor readability in the OCR. Headlines and larger fonts were more accurate than the faint article text, but the articles themselves were easier to read than the Richmond Times Dispatch.

Finally, I looked at another Chronicling America page from the Day Book, a Chicago advertisement-free newspaper, about the railroad moving into Alaska. This document was the most accurate, although it was also the shortest. It had just a few spelling errors that were most likely due to printing smudges, but the article was relatively error-free.

Overall, this practicum has helped illustrate the various scenarios and factors that come into play when obtaining the OCR for historical documents. It’s important to realize that not all OCR software is equal, and the condition of the original source when scanned plays a large part in the accuracy of the results.
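I judged accuracy by eye, but the comparison could be made more systematic. Below is a minimal Python sketch, assuming you have a hand transcription of the same passage to compare against. The function name and the sample strings are my own and are not taken from the pages discussed above; the ratio comes from Python's standard difflib module and is a rough similarity score, not a formal error rate.

```python
import difflib

def similarity(ocr_text, reference_text):
    """Return a 0-1 similarity ratio between OCR output and a hand transcription."""
    # Collapse runs of whitespace so line breaks do not count as errors.
    a = " ".join(ocr_text.split())
    b = " ".join(reference_text.split())
    # autojunk=False keeps difflib from ignoring very common characters in long texts.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

# Invented strings for illustration only.
reference = "The railroad will reach the coast next year."
ocr = "Tbe railroad wi11 reach the coast next vear."
print(round(similarity(ocr, reference), 2))
```

A score like this is sensitive to reading order, so a newspaper page whose columns were read out of sequence would score poorly even if every word was recognized correctly.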

 

Finding Online Digital History Projects

This week’s practicum required our class to search for digital history projects on our research area of interest. I’m interested in 19th century US history, specifically the social history of rural America. However, when searching good ol’ Google, this topic didn’t receive much attention. So I decided to home in a little more on the impact of railroads and tried search strings such as (railroad digital history project) and (19th century US railroad history digital projects) and found a variety of results. Some were good, some not so good, and some were great. The usual suspect of the Library of Congress’ American Memory Project led me to historical railroad maps, which were published online in 1998. However, a link on that page directed me to a newer presentation of the maps. The newer page obviously looks cleaner and more visually appealing, but I am glad the original is still up on the web for comparison. The map collection is searchable and downloadable, and each map is displayed with its bibliographic cataloging information as well. There are also articles and essays that provide context for the maps.

Another good example is Stanford’s Chinese Railroad Workers in North America project, which is still in development. The website offers a sampling of digitized materials that range from photographs to manuscripts, with oral histories and artwork as well, among other sources. The website promises to make the entire collection digitally accessible when it is finished, so online researchers are only left with a handful of primary sources at the moment. However, the website itself is easy to use and also includes a timeline of events that surround the topic of Chinese railroad workers, and it is one I’ll be bookmarking to come back to within the next year or so.

The most complete digital history project on this topic that I was able to find is William G. Thomas’ Railroads and the Making of Modern America project at the University of Nebraska-Lincoln. The project is broken up into ten different topics, such as Slavery and Southern Railroads, the 1877 Railroad Strike, the Origins of Segregation, and other topics in social history. Each topic contains original documents that relate to the theme, which range from manuscripts and letters to maps and photographs, and even payroll records. The site itself also has a section called “Views,” which are basically mini-digital history projects in themselves. The Views contain essays and related documents, and even maps and charts, thus presenting history in a contextualized manner. There’s also space dedicated just to data, which includes historical GIS, a search function (which did not work with any of my searches), and a place for educators to find teaching materials for their own class use. There is also a page dedicated to student work at UNL that relates to the role of the railroad in American history. This project contains a well-curated selection of materials, but the problems I encountered with searching make me believe that exploring this project could take a long time, since I’ll have to rely mainly on browsing the collection.

Assessing My Online Presence

This week, our class was asked to assess our own online presence. This required us to Google ourselves, which is something that I’m always hesitant to do because I’m afraid of what I might find! Luckily, it appears I have nothing to worry about, as nothing sinister showed up in my results. I had a few friends try Googling me from their computers and based on their screenshots, nothing bad showed up either. However, when one friend searched with only my first and last name (no middle initial) the top results were not even close to being about me. Apparently there is a British actress with my name, as an IMDB.com profile was in the mix of results. When searching with my middle initial, links to my Clio 2 project showed up, which makes sense as the URL is just aprilnkelley.com. I had very similar, if not identical, results when I searched on my laptop.

When I searched from my computer at work, I noticed more work-related links, especially in the images. Photos I had posted to our public blog about a Library of Congress tour that I attended and helped to organize were most prominent. I was relieved that not too many photos showed up in the search, as I like to maintain my privacy. However, I do need to find at least one good professional photo, or get one taken, so that I can be professionally represented online.