In the Spotlight: Reflections on the British Library’s Latest Transcription Crowdsourcing Project

By Evangeline Athanasiou. March 1, 2021

The problem: Even as optical character recognition (OCR) technology advances, some material still calls for the human eye's capacity for nuanced visual recognition. When hundreds of thousands of texts need transcription, where can an organization find the time and resources to complete the task? The solution: crowdsourcing.

In 2017, the British Library set out to transcribe a collection of playbills spanning the mid-eighteenth to the twentieth century so that they could be easily searched online. Because the playbills mix different fonts, type sizes, and weights within single paragraphs and even single sentences, they defeated the OCR applications then available. Yet even for an institution with the resources of the British Library, dedicating staff hours to transcribing all 234,000 playbills would cost too much and take far too long. Without transcription, however, the metadata needed to make these valuable resources accessible to researchers would remain incomplete. What now?

Enter In the Spotlight.

Image of the British Library’s In the Spotlight homepage. Screenshot by author.

Key personnel from the British Library's Digital Scholarship and Printed Heritage teams came together to create a user-friendly crowdsourcing platform that invites curious amateurs and inquiring professionals to transcribe these digitized playbills piece by piece. By breaking the transcription process into simple, clearly defined steps illustrated with plenty of examples, In the Spotlight welcomes contributions of any size, from transcribing a single title to working through a century's worth of genres. Contributors who want to share findings, ideas, or problems they encounter while transcribing are encouraged to post in the discussion forum, which is also an excellent way to get feedback directly from the project's founder, Dr. Mia Ridge.

In the Spotlight is part of the Library's larger LibCrowds platform. It is built on PyBossa, an open-source crowdsourcing framework, with a custom interface theme written in Vue.js, a JavaScript framework. The playbill images are served through the International Image Interoperability Framework (IIIF), a set of open standards for delivering and describing digital images consistently across platforms and viewers. Working with these tools assumes some familiarity with their programming languages and standardized terminology, but each is extensively documented, with explanations of its functionality and examples of its practical application.
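To give a sense of what the IIIF standard looks like in practice: the IIIF Image API expresses every image request as a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, so a client can ask a server for, say, only one cropped strip of a scanned playbill at a chosen width. The Python sketch below builds such URLs. It is a minimal illustration, not code from In the Spotlight or LibCrowds; the server base URL and image identifiers used with it are hypothetical placeholders, and the default size value "max" follows Image API 3.0 (servers implementing version 2.x expect "full" instead).

```python
from urllib.parse import quote


def region_from_box(x, y, w, h):
    """Format a pixel bounding box as an IIIF 'x,y,w,h' region parameter."""
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("region needs non-negative x, y and positive w, h")
    return f"{x},{y},{w},{h}"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Assumption: `base` is a hypothetical image service root such as
    'https://iiif.example.org/images'. The identifier is percent-encoded
    completely (including '/' and ':') so it occupies one path segment.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    base = "https://iiif.example.org/images"  # hypothetical server
    # Whole image at the server's maximum size.
    print(iiif_image_url(base, "playbill-0001"))
    # A 800x150 strip starting at (100, 200), scaled to 600 pixels wide.
    print(iiif_image_url(base, "playbill-0001",
                         region=region_from_box(100, 200, 800, 150),
                         size="600,"))
```

Because every parameter lives in the URL itself, any IIIF-compliant viewer or script can request the same crop from any compliant server, which is the kind of cross-platform interoperability the framework is designed for.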

By facilitating scholarly practice through transparent communication and meaningful engagement with the public, In the Spotlight exemplifies an impactful collaborative project in the digital humanities.