Category Archives: Intro to Digital Humanities

Creating an Omeka Exhibit

For my project, I created an exhibit within Omeka to provide an example for a tutorial I gave for Fenwick Library at George Mason University. The overall aim of the exhibit was to show graduate students on campus what Omeka can do. The process of creating the exhibit was intriguing in that the reference librarians I work with had only a basic understanding of Omeka, and they were looking to me to become the in-house expert on how to build an exhibit. Gunston Hall was chosen primarily because the research librarian for history was in the process of collecting sources for an exhibit and suggested I look into using the building as my focus.

I spent the first week learning the basics of Omeka: entering items, uploading files, changing the appearance, etc. Once I had completed this, I began searching for sources that I could use in a public online exhibit. In searching the Library of Congress, I discovered that it had conducted a survey of the building and grounds and posted the imagery on its site. The images are all in the public domain because they were created by the Library of Congress, which is a public entity.

Building the exhibit was a time-consuming experience. I had almost fifty separate items to create, including the sources I used for my narrative, each with its own set of metadata to enter. Organizing these items into an exhibit was a separate conceptual problem. I had a wealth of sources to present, as well as historical information to go along with them. I originally had three main pages: the history of Gunston Hall, a gallery of items, and a history of the people involved with constructing the building. After discussing the project with Dr. Robertson, the professor instructing my course, I realized that this made the website far too dense, and I split these pages into smaller sections. Feedback from a fellow student confirmed that this was a good idea. Once the overall site was built, I continued to fine-tune the exhibit by highlighting specifics within the items.

Another element I decided to include in my exhibit was a series of maps showcasing specific areas of Gunston Hall. The first map utilized the Geolocation plugin created by Omeka, which allowed me to show where Gunston Hall was built in relation to Washington, D.C. The other two maps I created in a separate plugin called Neatline. This plugin allowed me to set a specific plan as the map and to place items onto it, which let me show where a photograph was taken in relation to its surroundings. It took a couple of hours to understand how the Neatline plugin worked, and then a few hours per map to add the plan and related items.

The process of creating this exhibit allowed me to see firsthand how much work is involved in creating even the simplest of exhibits. I enjoyed the work, even when I was overwhelmed with so much information, and it provided me with a tool I can use in the future.

To view the exhibit, click here.

To view the Library of Congress sources, click here.

Social Media Strategy

My final project is an online exhibit for Gunston Hall using a content management system called Omeka. I created this site in conjunction with Fenwick Library at George Mason University to give a tutorial to graduate students. The tutorial itself has been completed, but the site still exists for those who were unable to attend. This site has three main audiences: students who can utilize it in their coursework, archivists with large amounts of material, and artists wanting an online presence for their work. To reach them, I have created the following strategy to engage them through social media.

Due to the visual nature of my site, three social network sites appeal to my project: Facebook, Tumblr, and Instagram. These sites have large user bases, which provide more opportunity for exposure. Posting a minimum of twice a week will give new users a better chance of coming across the project and show that it is still active. Tumblr and Instagram favor short messages with each post, so these posts will likely include the title of a particular aspect of the site (either a specific item or collection) with a brief description. Facebook allows for longer posts, so the message can run one or two sentences. The posts are intended to highlight a particular aspect of using Omeka or to show an example of using the site, and I will include links to my site whenever possible. Anyone viewing the posts can comment on or share them. My aim in using these particular sites is to provide screenshots or images from my site so that anyone seeing a post has a clear understanding of it.

The success of this strategy will be judged by a number of different measures. The first is how many people are following each account. This number will show the reach each post is attaining and whether enough people are viewing the information to make the effort worthwhile. Another key measure is the amount of ‘chatter’ the posts generate: the number of comments, likes, shares, etc. that the posts accumulate. As people talk about the project, it has a better chance of spreading and reaching a wider audience. One measure that shows the overall effect of the social media strategy is how many people are looking at the project itself. Since I am hosting my project on my personal server space, I am able to look at the amount of traffic on my site – how many views, which days have more, etc.
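As a rough illustration of that last measure, the sketch below counts daily page views from a web server access log. It is a minimal example only: the log path, the log format, and the ‘/omeka’ URL prefix are assumptions and would need to match the actual server setup.

```python
from collections import Counter

# Minimal sketch: count daily views of the exhibit from a standard
# combined-format access log. The log path and the '/omeka' URL prefix
# are hypothetical and would need to match the real server.
views_per_day = Counter()
with open("/var/log/apache2/access.log", encoding="utf-8") as log:
    for line in log:
        if "GET /omeka" not in line:   # only count requests to the exhibit
            continue
        # The timestamp sits between square brackets,
        # e.g. [10/Oct/2015:13:55:36 -0400]; keep just the date portion.
        start = line.find("[") + 1
        views_per_day[line[start:start + 11]] += 1

for day, hits in sorted(views_per_day.items()):
    print(day, hits)
```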

Crowdsourcing in Digital Humanities

There are a number of institutions that have embraced the idea of crowdsourcing in order to complete large projects. The best-known example of this model is Wikipedia, where members of the general public become contributors who can add or edit information on the website. Crowdsourcing has become a powerful tool for researchers, as it allows a much larger pool of people to help make documents readable for computational analysis, and it gives small institutions a way to accomplish large amounts of work on limited budgets.

The activities that crowdsourcing entails usually involve large amounts of data entry, where people are asked to transcribe letters or books or to correct transcriptions produced by optical character recognition. When a collection runs to several thousand documents, this becomes such a time-consuming task that the institution cannot complete the work in any meaningful length of time. By having the general population help with the legwork, institutions are able to get these time-consuming tasks completed with minimal drain on resources.

Crowdsourcing can be very enticing in this respect, but there are a number of concerns that need to be taken into consideration. The first is how contributors are going to work on the materials. Since most, if not all, of the contributors will not be local to the institution, the source materials need to be available on a website. The design of the site is critical, because if it is confusing for newcomers, they may be turned away before being able to make any meaningful contributions. I have contributed to one crowdsourcing site focused on transcribing letters – Transcribe Bentham. It took me a while to figure out which letters were completed, which needed further editing, and which had not been transcribed at all. Pointing out which areas still need work is a key way to keep people contributing, as it shows not only what needs to be done but also how much work has already been accomplished.

Another main concern is attracting contributors. No work can be done if no one knows the project exists. Another crowdsourcing site I worked with is Trove, an Australian site run by the National Library of Australia that hosts a large number of newspaper articles. The site allows users to correct the OCR text that the computer generated. What the library found was that people were contributing to articles they themselves had a connection to, whether it was a local story or one with some tie to their family. The key to any crowdsourcing project is finding the people who are willing to put in the effort to complete the work. One way to do this is to let contributors make a personal connection with the material and to give positive feedback on their work.

Reading an Article on Wikipedia

For scholars, Wikipedia creates a major problem. There is a wealth of knowledge and material available to anyone, but the drawback is that anyone can edit that material. How do scholars utilize this tool? There are a number of steps a person can take to address this issue.

The first is to inspect the content itself. Does the writing sound scholarly? How much depth is there (i.e., is it a simple, short paragraph per section, or lengthy detail)? Once satisfied with the content, look at how it is referenced. Any scholarly article needs extensive references in order to legitimize its facts and argument.

Another key way to check the accuracy of a Wikipedia article is to check how much, and how often, it has been edited. Anyone can view past versions of a page, including which user made each change and where, by clicking the ‘View history’ button at the top right of the page. Here people can see how often a page is changed and in which sections, which can show whether the information currently displayed is settled fact or still being debated.
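For anyone who wants to look at edit activity in bulk rather than page by page, the same information is available through the public MediaWiki API. The sketch below is a minimal example: the article title is only a placeholder, and it assumes the requests library is installed.

```python
import requests

# Fetch the 20 most recent revisions of a page -- the same data shown
# under 'View history'. The article title here is just an example.
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Gunston Hall",
    "rvprop": "timestamp|user|comment",
    "rvlimit": 20,
    "format": "json",
    "formatversion": 2,
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
page = resp.json()["query"]["pages"][0]

# Print who changed the page, when, and the edit summary they left.
for rev in page["revisions"]:
    print(rev["timestamp"], rev["user"], rev.get("comment", ""))
```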

Some institutions have large numbers of links throughout Wikipedia, and one of the best ways to see them consolidated is by using linkypedia. This site lists the major institutions whose materials are linked from Wikipedia and shows the various topics and pages where Wikipedia uses their information. Cross-checking individual pages against these institutions is one way to validate the information shown on Wikipedia. Most institutions frown upon citing Wikipedia as a source, and linkypedia is one way to find sources that do not carry that negative connotation.

Comparing Network and Visual Tools

My last three posts each highlight a particular tool for visualizing data. Each site has its own strengths and weaknesses, so choosing which one to use depends on what outcome the user wants to show.

Voyant is a useful companion to text mining. It works on the same dataset used for that practice and shows key trends within the information, such as the most common words and phrases. The visuals Voyant provides are line graphs that track these words and phrases across the dataset.

CartoDB uses the geographic information within a dataset to display the information on a map. This allows users to see the geographic relationships within the topics of their dataset. The various types of maps it can produce will highlight different aspects of the information.

Palladio, while it does have a map feature, relies on its ability to visualize how the pieces of information provided relate to one another. Its main output is in the form of word maps, which can complement the line graphs that Voyant provides.

Each tool allows users to gain a particular insight into the information. By seeing the information displayed visually rather than textually, researchers are able to see relationships that may not come across through reading the material. Seeing which words are used most often can reveal the common language of the time, while seeing a map of where events took place can expose some of the biases within the information.

Choosing which tool to use boils down to a couple of key questions. The first is what the dataset includes. If there is no geographic information, CartoDB has little to no use. But if there is a long time span within the dataset, Voyant can track word usage and vocabulary trends over time. And finally, if there are a wide variety of topics that can be compared with each other, Palladio may be the best option.

There is no rule saying only one tool can be used, and if the dataset has enough information all three can be utilized effectively to highlight different aspects of it.

Network Analysis with Palladio

Palladio is a digital humanities site that allows users to upload a dataset via a .csv file and display the information either through a traditional map or a word map. The spreadsheets themselves are uploaded by simply dragging and dropping the file into a box on the webpage. It is also possible to nest one dataset within another by clicking on the name of the field the new file relates to and adding the table there.

Once uploaded, the data can be manipulated in a number of ways. If there is any geographic information within the data, users can add the set to a map, as seen below. The information comes from the WPA Slave Narrative dataset available from the Library of Congress, which was also used in the previous post on CartoDB.

[Figure: Palladio Map]

One of the more robust aspects of Palladio is the creation of word maps. The site allows users to choose which information is displayed and which field that information is mapped against. The next map splits the narratives by the interviewees’ sex and then maps the topics that were discussed in their interviews.

[Figure: Word Map]

As seen here, Palladio is able to show which topics were discussed by only one sex and place the topics that both discussed in between them. Some of the maps can get convoluted if there are a lot of points that do not connect with each other. In this particular dataset an interviewer may have only one interviewee, so mapping that relationship gets very cluttered.

[Figure: Palladio Interviewer Graph]
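The kind of two-mode graph described above (sexes on one side, topics on the other) can also be sketched outside Palladio. The example below uses pandas and networkx with a few made-up rows standing in for the spreadsheet; the column names and values are assumptions, not the dataset’s actual headers.

```python
import pandas as pd
import networkx as nx

# A few made-up rows standing in for the WPA spreadsheet: each row links
# an interviewee's sex to a topic raised in the interview.
df = pd.DataFrame({
    "sex":   ["Female", "Female", "Male", "Male", "Female"],
    "topic": ["family", "work", "work", "religion", "religion"],
})

# Build the same kind of bipartite network Palladio draws: one node set
# for sexes, one for topics, and an edge for each sex-topic pairing.
G = nx.Graph()
G.add_nodes_from(df["sex"].unique(), part="sex")
G.add_nodes_from(df["topic"].unique(), part="topic")
G.add_edges_from(df.itertuples(index=False, name=None))

# Topics linked to both sexes sit 'in between' the two groups,
# just as in the word map described above.
shared = [t for t in df["topic"].unique() if G.degree(t) > 1]
print("Topics discussed by both sexes:", shared)
```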

Palladio can be a powerful and useful tool. The key is to use datasets that will produce clear visuals and provide insights that text alone cannot.

Mapping with CartoDB

CartoDB is an extensive tool for users who have a large dataset with a geographic component. The site allows multiple large datasets to be uploaded and plotted onto a map, which can then be modified to highlight different aspects of the data. As an example, I will use the WPA Slave Narrative dataset available from the Library of Congress.

This dataset holds information from a study undertaken to record the experiences of former slaves in the South. It contains information about the interviewer, the interviewee, where the interview took place, where the former slave was from, and whether they had worked in the fields or the house. CartoDB is able to take this information (uploaded as a .CSV file) and plot each person on a map. There are a number of different styles the site can apply to change the look of the map (here they are called Wizards). Multiple datasets can be layered on top of each other to add to the complexity of the map, and each layer can be adjusted independently of the others. The following are some examples of maps that can be created within CartoDB.
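Before uploading, it can help to make sure the spreadsheet’s coordinates are clean, since plotting works most reliably from numeric latitude and longitude columns. The sketch below is a minimal example; the file name and column names are assumptions and may not match the dataset’s actual headers.

```python
import pandas as pd

# Hypothetical file and column names standing in for the WPA spreadsheet.
df = pd.read_csv("wpa_narratives.csv")

# Coerce coordinates to numbers and drop rows that cannot be placed on
# the map, so every remaining record can be plotted.
df["latitude"] = pd.to_numeric(df["latitude"], errors="coerce")
df["longitude"] = pd.to_numeric(df["longitude"], errors="coerce")
df = df.dropna(subset=["latitude", "longitude"])

df.to_csv("wpa_narratives_clean.csv", index=False)
print(len(df), "interview records ready to upload")
```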

One useful type of map is a cluster map, which counts how many points fall in a specific location and shows that number inside the marker. This type of map allows the viewer to see where the interviewers tended to be when they were interviewing former slaves.

[Figure: Cluster Map]

A related map type is the heat map. Instead of showing numbers, the dots are replaced with colored regions, with the center of each region representing more hits.

[Figure: Heat Map]

One final type of map that can be useful is a category map. This takes a column header from the CSV file and plots that column onto the map; here is one that plots male and female former slaves. These maps are useful for comparing different types of information within a particular dataset.

[Figure: Category Map]

Working with Voyant

Voyant is a text analysis tool that allows the user to bring in a large number of text documents and manipulate the content in order to gain a better understanding of the material. Voyant creates two main things – a word map and graphs. Word maps are a cluster of words, with the more common words sized larger than the less common ones. Graphs allow the user to plot the usage of specific words throughout the sources entered into the site. Both of these methods allow the reader to gain insight into word usage within the texts. Word maps and graphs can be exported at any time, and users are given the option to save them as a URL, HTML page, or static image.

After uploading text documents, Voyant will immediately populate a word map in the top left. It includes all words in the documents, so one key step is needed in order to gain the most knowledge from the map: eliminating common words such as ‘and’, ‘the’, etc. This is done under the ‘Stop Words’ tool in the summary window; clicking on the gear will bring up the tool. There are pre-built lists that users can apply, and they can input their own additional words if needed. Once this is completed, the word map will update without those words.
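The effect of a stop list is easy to see outside Voyant as well. The short sketch below filters out a tiny, made-up stop list and counts what remains, roughly the tally a word map is drawn from; the file name and stop list are placeholders, not Voyant’s built-in list.

```python
import re
from collections import Counter

# A tiny stand-in for Voyant's built-in English stop list.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "was", "it"}

def top_words(text, n=10):
    """Tokenize, drop stop words, and return the most frequent words --
    roughly the counts a word map is drawn from."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical file name for one of the uploaded documents.
with open("document.txt", encoding="utf-8") as f:
    print(top_words(f.read()))
```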

To create a graph, users can click on words either in the word map or in the summary box directly below it. The window titled ‘Words in Entire Corpus’ allows users to check multiple words, which will then appear in the graph in the top right of the screen. If more than one text document has been uploaded, it is possible to track words that appear in more than one document. Clicking through the various documents in the ‘Keywords in Context’ window in the bottom right allows the user to choose which documents Voyant will search, which is helpful for comparing the use of specific words. The graphs show the usage of specific words across the documents, allowing for analysis of changing trends within the selected texts.
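The trend graphs work on a similar principle: for each document, count how often a chosen word appears relative to the document’s length. The sketch below does this for a hypothetical set of file names and words; none of these names come from an actual corpus.

```python
import re
from collections import Counter

def relative_frequency(path, words):
    """Share of each chosen word in one document -- one point on the
    kind of trend line Voyant draws across a corpus."""
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: counts[w] / total for w in words}

# Hypothetical file names; in Voyant these would be the uploaded documents.
for path in ["letters_1850.txt", "letters_1860.txt", "letters_1870.txt"]:
    print(path, relative_frequency(path, ["liberty", "union"]))
```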

JStor Database Review

Overview

JStor is a large database housing a wide range of primary and secondary sources. No content is created by JStor; it is a means of searching for content produced by researchers. It includes full-length scholarly articles in the following subjects: Anthropology, Asian Studies, Ecology, Economics, Education, Finance, History, Mathematics, Philosophy, Political Science, Population Studies, and Sociology. JStor has what is known as a ‘moving wall’ for the available date range of sources, meaning the database will have most or all of a journal’s articles published up to roughly five years ago. As time progresses, more recent articles become available.

Searching JStor

There are a number of different ways a researcher can search the database. The first is a quick search from the home page. This search pulls from the entire database, and if the search terminology is not specific there can be hundreds of thousands of results. JStor is most effective when using the advanced search option. It is possible to search multiple fields, specifying whether the term should appear in the full text of an article, the author, the title, the abstract, or a caption. Under the advanced search, a researcher can also narrow the search to a few criteria: an item type (article, book, review, etc.), a specific date range, a language, or a publication title. It is also possible to search only publications within a specific discipline. Performing a search will show all results within the criteria set. Each result provides the title, page numbers, author, publication, and date. There are two options when viewing a result: the researcher can view it online or download a PDF to their computer.

Citing Sources

Each source within JStor has all the relevant information needed to cite it. The first page of every entry has a citation at the top that includes the title, author, publication, page numbers, date, and number of pages.

History

Launched in 1995, JStor was conceived as a way to provide academia with access to a wide range of academic works being published. The founders would digitize articles by scanning each page as a TIFF document, compile the pages into a single file, and then add the file to the ever-growing database, which could then be accessed by institutions or individuals.

Reviews

Most reviews of JStor are positive. Reviewers agree that having access to such a rich database saves researchers time in finding sources for their work and gives them a much larger library of materials to work with. One of the main critiques is that, because the database is so large, searching it with more generic terms will produce too many results to look through. Reviewers also note that the addition of filtering results by subject matter has helped pare down the results in a more meaningful manner.

Access

JStor is a paid subscription database. There are two main options for gaining access: institutional access or individual access. Institutions can pay for access to the entire database or to specific journals, while individuals can purchase a one-month or one-year plan for access during that time frame.