Crowdsourcing in Digital Humanities

A number of institutions have embraced crowdsourcing to help them complete large projects. The best-known example is Wikipedia, where members of the general public become contributors who can add or edit information on the website. Crowdsourcing has become a powerful tool for researchers, because it allows a much larger pool of people to help make documents readable for computational analysis. It is also a way for small institutions to accomplish large amounts of work on limited budgets.

The activities that crowdsourcing entails usually involve large amounts of data entry, such as transcribing letters or books or correcting transcriptions produced by optical character recognition. When a collection runs to several thousand documents, this work becomes so time-consuming that an institution cannot complete it on its own in any reasonable length of time. By having the general public help with the legwork, institutions can get these tasks done with minimal drain on their resources.

Crowdsourcing can be very enticing in this respect, but there are several concerns to take into consideration. The first is how contributors will access and work on the materials. Since most, if not all, contributors will not be local to the institution, the source materials need to be available on a website. The design of the site is critical: if it is confusing for newcomers, they may be turned away before making any meaningful contributions. I have contributed to one crowdsourcing site focused on transcribing letters: Transcribe Bentham. It took me a while to figure out which letters were completed, which needed further editing, and which had not been transcribed at all. Clearly marking which areas still need work is a key way to keep people contributing, as it shows not only what remains to be done but also how much has already been accomplished.

Another major concern is attracting contributors. No work can be done if no one knows the project exists. Another crowdsourcing site I worked with is Trove, a site run by the National Library of Australia that hosts a large number of digitized newspaper articles. The site allows users to correct the text generated by optical character recognition (OCR). The library found that people tended to contribute to articles they had a personal connection to, whether a local story or one tied to their family. The key to any crowdsourcing project is finding people who are willing to put in the effort to complete the work. One way to do this is to let contributors form a personal connection with the material and to give them positive feedback on their work.