
Comparing Network and Visual Tools

Each of my last three posts highlights a particular tool for visualizing data. Each site has its own strengths and weaknesses, so choosing which one to use depends on what the user wants to show.

Voyant is a useful companion to text mining. It works with the same dataset used for that practice and shows key trends within the material, such as the most common words and phrases. Its visuals are line graphs that track these words and phrases across the dataset.
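Voyant handles the counting in the browser. As a rough illustration of the idea, and not Voyant's own implementation, the following minimal Python sketch counts the most common words across a set of documents. The sample texts and the short stopword list are hypothetical; Voyant ships with its own, much longer stopword lists.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "was", "we"}

def top_words(texts, n=5):
    """Count word frequencies across all texts, ignoring case and stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical sample documents, not taken from any real dataset.
docs = [
    "We worked in the field and the cotton was high.",
    "The field hands sang in the cotton field.",
]
print(top_words(docs, 2))
```

Running the same count separately on documents from different decades and plotting the results would give the kind of trend line Voyant draws.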

CartoDB uses the geographic information within a dataset to display it on a map. This lets users see the geographic relationships among the topics in their dataset. Each type of map it can produce highlights a different aspect of the information.

Palladio does have a map feature, but its real strength is visualizing how the pieces of information in a dataset relate to one another. Its main output is word maps, which can complement the line graphs that Voyant provides.

Each tool gives users a particular insight into the information. Seeing the information displayed visually rather than textually lets researchers spot relationships that may not come across when reading the material. Seeing which words are used most often can reveal the common language of the time, while seeing a map of where events took place can expose some of the biases within the information.

Choosing which tool to use boils down to a few key questions about what the dataset includes. If it has no geographic information, CartoDB is of little use. If it covers a long span of time, Voyant can track word usage and vocabulary trends over that period. Finally, if it covers a wide variety of topics that can be compared with one another, Palladio may be the best option.

There is no rule that says only one tool can be used. If the dataset is rich enough, all three can be used effectively to highlight different aspects of it.

Network Analysis with Palladio

Palladio is a digital humanities site that lets users upload a dataset as a .csv file and display the information either on a traditional map or as a word map. The spreadsheets are uploaded simply by dragging and dropping the file into a box on the webpage. Datasets can also be linked to one another: clicking the name of the field that the new file relates to lets you add the new table there.

Once uploaded, the data can be manipulated in a number of ways. If the data contains any geographic information, users can add it to a map, as seen below. The data comes from the WPA Slave Narrative dataset available from the Library of Congress, which I also used in the previous post on CartoDB.

Palladio Map

One of Palladio's strongest features is its word maps. The site lets users choose which field is displayed and which other field it is linked to. The next map divides the narratives by the interviewees' sex and then links each group to the topics discussed in their interviews.

Word Map

As seen here, Palladio shows which topics were discussed by only one sex and places the topics both discussed between the two groups. Some maps become convoluted when there are many points that do not connect with each other. In this dataset, an interviewer may have only a single interviewee, so mapping the interviewers produces a very cluttered graph.

Palladio Interviewer Graph
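The sex-and-topic word map described above is built from pairs of values in two columns. A minimal Python sketch of the underlying logic, using hypothetical (sex, topic) rows rather than the actual WPA data, collects each group's topics and splits them into topics unique to one group and topics shared by both. The shared topics are the nodes that end up drawn between the two clusters. This illustrates the idea only and is not Palladio's code.

```python
from collections import defaultdict

# Hypothetical rows: (interviewee sex, topic discussed). Not the real WPA data.
rows = [
    ("Female", "Family"), ("Female", "Religion"), ("Female", "Cooking"),
    ("Male", "Religion"), ("Male", "Field work"), ("Male", "Family"),
]

def topic_overlap(rows):
    """Group topics by sex, then split them into shared and exclusive sets."""
    topics = defaultdict(set)
    for sex, topic in rows:
        topics[sex].add(topic)
    shared = topics["Female"] & topics["Male"]
    only_female = topics["Female"] - shared
    only_male = topics["Male"] - shared
    return shared, only_female, only_male

shared, only_female, only_male = topic_overlap(rows)
print(sorted(shared))       # topics linked to both groups
print(sorted(only_female))  # topics linked only to female interviewees
print(sorted(only_male))    # topics linked only to male interviewees
```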

Palladio can be a powerful and useful tool. The key is to use datasets that produce clear visuals and offer insights that text alone cannot.

Mapping with CartoDB

CartoDB is a powerful tool for anyone with a large body of material that has a geographic component. The site allows multiple large datasets to be uploaded and plotted onto a map, which can then be modified to highlight different aspects of the data. As an example, I will use the WPA Slave Narrative dataset available from the Library of Congress.

This dataset holds information from a study undertaken to record the experiences of former slaves in the South. It contains information about the interviewer, the interviewee, where the interview took place, where the former slave was from, and whether he or she worked in the fields or the house. CartoDB can take this information (uploaded as a .csv file) and plot each person on a map. The site offers a number of styles, which it calls Wizards, for changing the look of the map. Multiple datasets can be layered on top of one another to build a more complex map, and each layer can be adjusted independently of the others. The following are some examples of maps that can be created in CartoDB.

One useful type of map is a cluster map, which counts how many points fall in a given location and displays that number on the marker. This type of map lets the viewer see where the interviewers tended to be when they were interviewing former slaves.

Cluster Map
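The counting behind a cluster map can be sketched in Python by snapping each coordinate to a grid cell and counting the points in each cell. This is a minimal sketch under stated assumptions. The coordinates and the 0.5-degree cell size are hypothetical, and CartoDB's own clustering may group points differently, for example by zoom level.

```python
import math
from collections import Counter

# Hypothetical interview locations as (latitude, longitude) pairs.
points = [
    (33.75, -84.39), (33.76, -84.38), (33.74, -84.40),  # near Atlanta
    (32.08, -81.09), (32.09, -81.10),                   # near Savannah
    (30.33, -81.66),                                    # near Jacksonville
]

def cluster_counts(points, cell_size=0.5):
    """Snap each point to the corner of its grid cell and count points per cell."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size) * cell_size,
                math.floor(lon / cell_size) * cell_size)
        counts[cell] += 1
    return counts

for cell, n in cluster_counts(points).items():
    print(cell, n)  # each count is the number a cluster marker would display
```

A larger `cell_size` merges nearby clusters, much as zooming out on a cluster map does.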

A related type is the heat map. Instead of showing numbers, it replaces the dots with colors, and the center of each blob marks where the points are most concentrated.

Heat Map
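How CartoDB renders its heat maps is not described here, but a common way to produce this effect is kernel density: every point adds a bell-shaped amount of "heat" to its surroundings. The following minimal Python sketch uses hypothetical points in arbitrary map units and an assumed `radius` parameter. It shows why the middle of a dense group glows hotter than an isolated point.

```python
import math

# Hypothetical points (x, y) in arbitrary map units.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (3.0, 3.0)]

def heat(x, y, points, radius=1.0):
    """Sum a Gaussian kernel from every point; higher values map to hotter colors."""
    return sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * radius ** 2))
        for px, py in points
    )

print(round(heat(0.1, 0.0, points), 2))  # inside the dense group of three points
print(round(heat(3.0, 3.0, points), 2))  # on the single isolated point
```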

One final useful type is the category map. It takes one column from the CSV file and colors each point on the map according to that column's values. The one below distinguishes male and female former slaves. This type of map is useful for comparing different kinds of information within a dataset.

Category Map
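The styling step of a category map can be sketched in Python: read the CSV, find the distinct values in the chosen column, and give each value its own color. This is a minimal sketch. The CSV excerpt, its column names (`sex`, `latitude`, `longitude`), and the palette are hypothetical and are not the real headers of the WPA file or CartoDB's colors.

```python
import csv
import io

# Hypothetical CSV excerpt; column names are assumptions, not the real headers.
data = """name,sex,latitude,longitude
A,Female,33.75,-84.39
B,Male,32.08,-81.09
C,Female,30.33,-81.66
"""

# Hypothetical palette: one color per category value.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

def categorize(csv_text, column):
    """Assign each distinct value in `column` a color and return styled points."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = sorted({row[column] for row in rows})
    colors = {value: PALETTE[i % len(PALETTE)] for i, value in enumerate(values)}
    return [(float(r["latitude"]), float(r["longitude"]), colors[r[column]])
            for r in rows]

for point in categorize(data, "sex"):
    print(point)  # (latitude, longitude, color for that person's category)
```

Passing a different column name, such as one recording field or house work, would produce a different category map from the same file.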