Tag Archives: text mining

Network Analysis with Palladio

Palladio is a digital humanities site that lets users upload a dataset as a .csv file and display the information either on a traditional map or as a word map. Spreadsheets are uploaded by simply dragging and dropping the file into a box on the webpage. It is also possible to add one dataset within another by clicking on the name of the field that the new file extends and adding the table there.

Once uploaded, the data can be manipulated in a number of ways. If the data contains any geographic information, users can add the set to a map, as seen below. The data shown here comes from the WPA Slave Narrative dataset available from the Library of Congress, which was also used in the previous post regarding CartoDB.

Palladio Map

One of the more robust aspects of Palladio is the creation of word maps. The site allows users to choose which information is displayed and which topic that information is mapped to. The next map splits the narratives by the interviewees’ sex, and then maps the topics that were discussed within their interviews.

Word Map

As seen here, Palladio is able to show which topics were discussed by only one sex, and it places the topics that both discussed in between them. Some of the maps can get a bit convoluted if there are a lot of points that do not connect with each other. In this particular dataset, an interviewer may have only one interviewee, so mapping interviewers to interviewees gets very cluttered.

Palladio Interviewer Graph
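
The linking that these graphs display can be approximated outside Palladio. The following is a minimal sketch, not Palladio's own method: the rows, the column names ("sex", "topics"), and the semicolon separator are all assumptions made for illustration. It builds the source-to-target links and then lists the topics that connect to more than one source, which are the ones that would sit between the two groups.

```python
from collections import defaultdict

# Hypothetical rows standing in for the spreadsheet; the column names
# and values are assumptions for illustration, not the real dataset.
rows = [
    {"sex": "Female", "topics": "family;food"},
    {"sex": "Male", "topics": "food;work"},
    {"sex": "Female", "topics": "religion"},
]

def build_edges(rows, source, target, sep=";"):
    """Return the set of (source value, target value) links in the rows."""
    edges = set()
    for row in rows:
        for value in row[target].split(sep):
            value = value.strip()
            if value:
                edges.add((row[source], value))
    return edges

def shared_targets(edges):
    """Return targets linked to more than one source, sorted by name."""
    sources_by_target = defaultdict(set)
    for src, tgt in edges:
        sources_by_target[tgt].add(src)
    return sorted(t for t, s in sources_by_target.items() if len(s) > 1)

edges = build_edges(rows, "sex", "topics")
shared = shared_targets(edges)
print(shared)  # ['food']
```

Running the same check with interviewer and interviewee columns would return an empty list whenever each interviewer has only one interviewee, which is why that graph breaks into many disconnected pairs.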

Palladio can be a powerful and useful tool. The key is to use datasets that produce clear visuals and offer insights that text alone cannot.

Mapping with CartoDB

CartoDB is a capable tool for anyone with a large database of material that has a geographic component. The site allows multiple large datasets to be uploaded and plotted onto a map. The map can then be modified to highlight different aspects of the data. As an example, I will use the WPA Slave Narrative dataset available from the Library of Congress.

This dataset holds information from a study undertaken to record the experiences of former slaves in the South. It contains information regarding the interviewer, the interviewee, where the interview took place, where the interviewee was from, and whether they worked in the fields or the house. CartoDB is able to take this information (uploaded as a .CSV file) and plot each person on a map. The site offers a number of styles, called Wizards, that change the look of the map. Multiple datasets can be layered on top of each other to add to the complexity of the map, and each layer can be adjusted independently of the others. The following are some examples of maps that can be created within CartoDB.

One useful type of map is a cluster map, which counts how many points fall in a specific location and shows that number inside the marker. This type of map allows the viewer to see where interviewers tended to be when they were interviewing former slaves.

Cluster Map
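
The counting behind a cluster map can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates are made up, and the fixed-size grid is a simple stand-in, since how CartoDB actually groups points is not described here.

```python
from collections import Counter

# Hypothetical (latitude, longitude) points; not taken from the dataset.
points = [
    (32.38, -86.30),
    (32.38, -86.30),
    (32.41, -86.27),
    (33.75, -84.39),
]

def cluster_counts(points, cell_size=0.5):
    """Count points per grid cell, keyed by the cell's integer indices."""
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts

counts = cluster_counts(points)
for cell, n in counts.most_common():
    print(cell, n)
```

A smaller `cell_size` splits nearby points into separate clusters, much as zooming in on the map does.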

A related map type is the heat map. Instead of showing numbers, the dots are replaced with colors, with the center of each blob marking the highest concentration of points.

Heat Map

One final type of map that can be useful is a category map. This takes a column from the CSV file, selected by its header, and plots that column's values onto the map. Here is one that plots male and female interviewees. These types of maps are useful for comparing different types of information within a particular dataset.

Category Map



Working with Voyant

Voyant is a text analysis tool that allows the user to bring in a large number of text documents and manipulate the content in order to gain a better understanding of the material. Voyant creates two things: word maps and graphs. A word map is a cluster of words, with the more common words sized larger than the less common ones. Graphs allow the user to plot the usage of specific words throughout the sources entered into the site. Both of these methods allow the reader to gain insight into word usage within the texts. Word maps and graphs can be exported at any time, and users are given the option to save them as a URL, HTML page, or static image.

After uploading text documents, Voyant will immediately populate a word map in the top left. It will include all words in the documents, so one key step is needed in order to gain the most knowledge from the map. This step is to eliminate common words such as ‘and’, ‘the’, etc. This is done with the ‘Stop Words’ tool in the summary window. Clicking on the gear will bring up the tool. There are pre-built lists that users can use, and users can add their own words if needed. Once this is completed, the word map should update without those words.
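
The effect of a stop word list on the counts behind a word map can be sketched as follows. This is a minimal illustration rather than Voyant's implementation: the stop list is a tiny made-up one, the sample sentence is invented, and the tokenizing rule (lowercase letters and apostrophes) is an assumption.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real pre-built list is far longer.
STOP_WORDS = {"and", "the", "a", "of", "to", "in", "was"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count words in the text, skipping anything in the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Invented sample text, used only to show the filtering.
sample = "The work in the field was long and the work in the house was long."
freqs = word_frequencies(sample)
print(freqs.most_common())
```

Without the filter, ‘the’ would be the most frequent word in the sample and would dominate the map. With it, only the content words remain to be sized by their counts.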

In order to create a graph, users can click on words either in the word map or in the summary box directly below it. The window titled ‘Words in Entire Corpus’ allows users to check multiple words, which will then appear in the graph in the top right of the screen. If more than one text document has been uploaded, it is possible to compare words that appear in more than one document. Clicking through the various documents in the ‘Keywords in Context’ window in the bottom right will allow the user to choose which documents Voyant will search. This is helpful for comparing the use of these specific words. When the documents are arranged in chronological order, the graphs show the usage of specific words over time, allowing for analysis of changing trends within the selected documents.
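
The idea behind such a graph can be sketched as one number per document for a chosen word. This is a rough illustration under stated assumptions: the documents are invented, and dividing by each document's word count is one reasonable way to make documents of different lengths comparable, not necessarily the calculation Voyant performs.

```python
import re

def relative_frequency(word, documents):
    """Return the share of each document's words that match `word`."""
    trend = []
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        trend.append(words.count(word.lower()) / len(words) if words else 0.0)
    return trend

# Invented documents, assumed to be listed in chronological order.
docs = [
    "work work rest rest",
    "work rest rest rest",
    "rest rest rest rest",
]
trend = relative_frequency("work", docs)
print(trend)  # [0.5, 0.25, 0.0]
```

Plotting these values in document order gives the kind of line that makes a rising or falling pattern of usage visible.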