My last three posts each highlighted a particular tool for visualizing data. Each tool has its own strengths and weaknesses, so choosing which one to use depends on what the user wants to show.
Voyant is a useful companion to text mining. It works from the same dataset used for that practice and shows key trends within the information, including the most common words and phrases. The visuals Voyant provides are line graphs that track these words and phrases throughout the dataset.
CartoDB uses geographic information within datasets to display the information on a map. This allows users to see the geographic relationships within the topics of their dataset. The various types of maps it can produce will highlight different aspects of the information.
Palladio, while it does have a map feature, relies on its ability to visualize how pieces of information in a dataset relate to one another. The main output is in the form of word maps, which can complement the line graphs that Voyant provides.
Each tool allows users to gain a particular insight into the information. By seeing the information displayed visually, rather than textually, researchers are able to see various relationships that may not come across through reading the material. Seeing which words are used most often can provide the common language of the time, while seeing a map of where events took place can show some of the biases within the information.
Choosing which tool to use boils down to one key question: what does the dataset include? If there is no geographic information, CartoDB has little to no use. If there is a long history within the dataset, Voyant can track word usage and vocabulary trends over time. And finally, if there is a wide variety of topics that can be compared with each other, Palladio may be the best option.
There is no rule saying only one tool can be used, and if the dataset has enough information all three can be utilized effectively to highlight different aspects of it.
Palladio is a digital humanities site that allows users to upload a dataset via a .csv file and display the information either through a traditional map or a word map. The spreadsheets themselves are uploaded by simply dragging and dropping the file into a box on the webpage. It is possible to add datasets within others by clicking on the name of the information the new file is a part of and adding the table there.
Once uploaded, the data can be manipulated in a number of ways. If there is any geographic information within the data, users can add the set to a map as seen below. The information was provided through the WPA Slave Narrative dataset available from the Library of Congress, which was also used in the previous post regarding CartoDB.
One of the more robust aspects of Palladio is the creation of word maps. The site allows users to choose which information is displayed, and a specific topic that information is mapped to. The next map splits the narrative into the interviewees’ sex, and then maps the topics that were discussed within their interviews.
As seen here, Palladio is able to show which topics were discussed by only one sex, and then link the ones that both discussed in between them. Some of the maps can get a bit convoluted if there are a lot of points that do not connect with each other. In this particular dataset one interviewer may have only one interviewee, so mapping this particular set gets very clustered.
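The split described above, topics linked to only one sex versus topics linked to both, can be sketched with plain set operations. This is only an illustration of the idea, not how Palladio works internally. The rows, the column names `sex` and `topics`, and the `;` separator are all assumptions made up for the example; they are not taken from the actual WPA dataset.

```python
from collections import defaultdict

# Hypothetical rows standing in for a CSV export. The column names and
# values are invented for illustration, not taken from the real dataset.
rows = [
    {"sex": "Female", "topics": "family;food;religion"},
    {"sex": "Male", "topics": "work;food;housing"},
    {"sex": "Female", "topics": "work;clothing"},
]


def topics_by_group(rows, group_key="sex", topic_key="topics", sep=";"):
    """Map each group value to the set of topics linked to it."""
    groups = defaultdict(set)
    for row in rows:
        for topic in row[topic_key].split(sep):
            topic = topic.strip()
            if topic:
                groups[row[group_key]].add(topic)
    return dict(groups)


groups = topics_by_group(rows)

# Topics connected to both groups sit "in between" them on the graph.
shared = groups["Female"] & groups["Male"]
# Topics connected to only one group hang off that group alone.
only_female = groups["Female"] - groups["Male"]
only_male = groups["Male"] - groups["Female"]

print("shared:", sorted(shared))
print("only Female:", sorted(only_female))
print("only Male:", sorted(only_male))
```

A dataset where most values connect to only one other value, such as an interviewer with a single interviewee, would leave `shared` nearly empty, which is one way to anticipate a cluttered map before building it.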
Palladio can be a powerful and useful tool. The key is to use datasets that will give clear visuals and provide insights that mere text cannot.
CartoDB is an extensive tool to use if a user has a large database of material with a geographic component to it. The site allows for multiple large datasets to be uploaded and plotted onto a map. The map can then be modified to highlight different aspects of the data. As an example, I will utilize the WPA Slave Narrative dataset available from the Library of Congress.
This dataset holds information from a study undertaken to record the experiences of slaves in the South. The dataset contains information regarding the interviewer, interviewee, where the interview took place, where the slave was from, and whether the slave worked in the fields or the house. CartoDB is able to take this information (uploaded as a .CSV file) and plot each person on a map. There are a number of different styles (here called Wizards) that the site can apply to change the look of the map. Multiple datasets can be layered on top of each other to add to the complexity of the map, and each layer can be adjusted independently of the others. The following are some examples of maps that can be created within CartoDB.
One useful type of map is a cluster map, as it counts how many plots are in a specific location and shows the number within the plot. This type of map allows the viewer to see where the interviewer tended to be when they were interviewing former slaves.
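The counting behind a cluster map can be approximated in a few lines. This is a rough sketch of the general idea only; it makes no claim about how CartoDB actually groups points. The coordinates are placeholders invented for the example, and rounding to one decimal place is an arbitrary assumption about what counts as the "same" location.

```python
from collections import Counter

# Hypothetical interview locations as (latitude, longitude) pairs.
# These coordinates are placeholders, not values from the real dataset.
points = [
    (33.52, -86.80),
    (33.52, -86.80),
    (32.37, -86.30),
    (33.52, -86.80),
    (30.69, -88.04),
]


def cluster_counts(points, precision=1):
    """Count points per location after rounding the coordinates.

    Rounding merges nearby points into one cluster. A lower precision
    gives fewer, larger clusters.
    """
    return Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in points
    )


counts = cluster_counts(points)

# Each entry is one cluster marker and the number shown inside it.
for location, count in counts.most_common():
    print(location, count)
```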
A related map type is called a heat map. Instead of showing numbers, the dots are replaced with colors, with the center of each blob representing more hits.
One final type of map that can be useful is a category map. This takes the column header from the CSV file and plots one column onto the map. Here is one that plots male and female slaves. These types of maps are useful for comparing different types of information within a particular dataset.
Voyant is a text analysis tool that allows the user to bring in a large number of text documents and manipulate the content in order to gain a better understanding of the material. There are two things that Voyant creates – a word map and graphs. Word maps are a collection of words in a cluster, with the more common words sized larger than less common words. Graphs will allow the user to plot the usage of specific words throughout the sources entered into the site. Both of these methods allow the reader to gain insight into word usage within the texts. Word maps and graphs can be exported at any time, and users are given the option to save them as a URL, HTML page, or static image.
After uploading text documents, Voyant will immediately populate a word map in the top left. It will include all words in the document, so one key step is needed in order to gain the most knowledge from the map. This step is to eliminate common words such as ‘and’, ‘the’, etc. This is done under the ‘Stop Words’ tool in the summary window. Clicking on the gear will bring up the tool. There are pre-built lists that users can choose from, and they can input their own additional words if needed. Once this is completed, the word map should update without those words.
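To make the effect of a stop word list concrete, here is a minimal sketch of counting words with and without one. It is not Voyant's code. The stop list is a tiny illustrative sample (the pre-built lists in Voyant are far longer), the sample sentence is invented, and the simple tokenizing rule is an assumption.

```python
import re
from collections import Counter

# A tiny illustrative stop list, assumed for this example only.
STOP_WORDS = {"and", "the", "a", "of", "to", "in", "was", "on"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Return word counts for text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Invented sample text, not drawn from any real source.
sample = "The cotton was picked in the field, and the cotton was sold in town."

# Without a stop list, 'the' would be the largest word in the map.
unfiltered = word_frequencies(sample, stop_words=set())
# With the stop list, the content words come to the front.
freqs = word_frequencies(sample)

print(unfiltered.most_common(1))
print(freqs.most_common(1))
```

The counts in `freqs` are what a word map would translate into font size: the higher the count, the larger the word.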
In order to create a graph, users can click on words either in the word map or in the summary box directly below it. The window titled ‘Words in Entire Corpus’ allows users to check multiple words, which will then appear in the graph in the top right of the screen. If more than one text document has been uploaded, it is possible to use the words that appear within more than one document. Clicking through the various documents in the ‘Keywords in Context’ window in the bottom right will allow the user to choose which documents Voyant will search. This is helpful for comparing the use of these specific words. The graphs allow the user to show the usage of specific words over time, allowing for analysis of changing trends within the selected documents.
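The kind of data behind such a graph can be sketched as one relative frequency per word per document. This is an assumption about the general approach, not a description of Voyant's internals. The document labels and texts are invented, and the sketch assumes the documents are already in chronological order.

```python
import re

# Hypothetical documents keyed by an invented label, in chronological
# order. The texts are made up for illustration.
documents = {
    "1850": "The river carried the goods. The river was wide.",
    "1870": "The railroad arrived. The river trade slowed as the railroad grew.",
    "1890": "Railroad, railroad, railroad everywhere.",
}


def relative_frequency(word, text):
    """Share of the tokens in text that match word, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


def word_trends(words, documents):
    """For each word, list its relative frequency in each document."""
    return {
        word: [relative_frequency(word, text) for text in documents.values()]
        for word in words
    }


trends = word_trends(["river", "railroad"], documents)

# Each list is one line on the graph, with one point per document.
for word, series in trends.items():
    print(word, [round(value, 2) for value in series])
```

Using relative rather than raw counts keeps a long document from dominating the graph simply because it contains more words.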
Artemis Primary Sources is a database that holds a large number of historical primary sources. Each source has the same set of metadata attached to it: a citation giving its title, publication, page numbers, and date. There is also a notes section that usually gives a brief description of the source. The metadata acts only as a means to provide a citation. It does not describe the content beyond the brief note, if one has been entered. The metadata also does not provide any information on whether there are any photographs within the source (unless the source itself is a photograph, which will appear in the citation). If a user has an account, it becomes possible to create and view tags about the source.
The metadata allows the researcher to ask what the source is called, where it comes from, and in what period it was published. It also supports questions about the types of sources available for a subject, e.g. newspaper articles, manuscripts, etc. A key part missing from the metadata is how the object was digitized: was it scanned or photographed, when was it digitized, and what format did the images come in?