The Ship of Theseus: Representing Nuance in Humanities Data

September 1, 2022 · 4 min read

Kate LeBere

LINCS Vocabularies and Documentation Co-op

If each part of a ship were replaced over time, when, if ever, would it become a new ship?

Little did Heraclitus, Plato, and others know that the problems posed by the “Ship of Theseus” paradox would continue to vex digital humanists in the twenty-first century...

During my MLIS co-op at LINCS I contributed to a vocabulary that aligned the place types of REED London Online (REED), Map of Early Modern London (MoEML), and The Digital Ark (Ark). To complete this task, we first needed to determine how each researcher represented place within their data.

Having worked for MoEML during and after my undergraduate degree, I knew this process would not be simple. Digital Humanities (DH) scholars use digital tools and technologies to study human society and culture from a critical perspective. Their research questions are nuanced, which creates a complicated, albeit compelling, tension between the ambiguities of the humanities and the explicit systems and rules that technology requires. This tension results in a dilemma: digital humanists have to navigate the biases and values embedded in software, algorithms, and encoding standards to make sure the nuance and detail in their research are not unfairly simplified.

My favourite example of this tension is the debate between function and structure: in a dataset, should a single location be represented as multiple entities if its function or structure changes? During the early modern period, the site of the Charterhouse assumed many functions: at various times, it was a burial ground for plague victims, a Carthusian monastery, a royal residence, a hospital, a school, and a pensioners’ home. Is Charterhouse, the monastery, the same place as Charterhouse, the pensioners’ home? Structure raises the same question. The timber from the 1576 Theatre was used to construct the 1599 Globe; in 1613, this Globe burnt down and a new Globe was constructed. Other places, such as Somerset House, underwent extensive renovations during the early modern period, and thousands of buildings, including St Paul’s Cathedral, were partially or fully destroyed in the 1666 Great Fire of London.
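To make the two modelling choices concrete, here is a minimal sketch in Python that represents the Charterhouse either as one entity with many functions or as one entity per function. The `PlaceEntity` class, the identifiers, and both helper functions are hypothetical and invented for illustration; they are not how any of the three projects actually encodes its data.

```python
from dataclasses import dataclass

# Functions of the Charterhouse site as listed in this post (no dates assumed).
CHARTERHOUSE_FUNCTIONS = [
    "plague burial ground",
    "Carthusian monastery",
    "royal residence",
    "hospital",
    "school",
    "pensioners' home",
]


@dataclass(frozen=True)
class PlaceEntity:
    identifier: str            # hypothetical local identifier
    label: str
    functions: tuple[str, ...]


def model_as_one_entity(site: str, functions: list[str]) -> list[PlaceEntity]:
    """Treat the site as a single place that had several functions."""
    return [PlaceEntity(site.lower(), site, tuple(functions))]


def model_as_many_entities(site: str, functions: list[str]) -> list[PlaceEntity]:
    """Treat each function as a distinct place that shares the site's name."""
    return [
        PlaceEntity(f"{site.lower()}-{i}", f"{site} ({fn})", (fn,))
        for i, fn in enumerate(functions, start=1)
    ]


one = model_as_one_entity("Charterhouse", CHARTERHOUSE_FUNCTIONS)
many = model_as_many_entities("Charterhouse", CHARTERHOUSE_FUNCTIONS)
```

Neither option is right in the abstract. The first keeps every reference to the site together, while the second lets a record point specifically at the monastery or the pensioners’ home.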

Making a claim that one location is the same as another is not straightforward. It often depends on a project’s underlying assumptions, research questions, and scholarly goals. This task becomes even more complicated when pulling together multiple projects, each with their own objectives, but all referring to the same places.

In the vocabulary we developed with REED, MoEML, and Ark, we decided to divide places into four broad categories: Administrative Unit, Structural Place, Functional Place, and Topographical Feature. In doing so, we were able to distinguish places we believed to be primarily defined via their structure—for example, bridges, gates, and roads—from those primarily defined via their function—for example, prisons, churches, and playhouses. Below is a visualization of the vocabulary we developed.

[Figure: SKOS visualization of the vocabulary]

“Place types in early modern London” vocabulary developed in conjunction with REED London Online (REED), Map of Early Modern London (MoEML), and The Digital Ark (Ark). Declared in SKOS and visualized in SKOS Play.
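As a rough illustration of the four-category structure, the sketch below models the top level of the vocabulary as a small Python mapping. The narrower place types are only the examples named in this post, and the names `PLACE_TYPES` and `broader_category` are hypothetical. The actual vocabulary is declared in SKOS and contains more than is shown here.

```python
from typing import Optional

# Top-level categories from the post. The narrower types are only the
# examples the post mentions; these strings are illustrative labels,
# not the vocabulary's real identifiers.
PLACE_TYPES: dict[str, list[str]] = {
    "Administrative Unit": [],
    "Structural Place": ["bridge", "gate", "road"],
    "Functional Place": ["prison", "church", "playhouse"],
    "Topographical Feature": [],
}


def broader_category(place_type: str) -> Optional[str]:
    """Return the broad category a place type falls under, if listed."""
    for category, narrower in PLACE_TYPES.items():
        if place_type in narrower:
            return category
    return None
```

For example, `broader_category("gate")` returns `"Structural Place"`, while a type that is not listed returns `None`.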

Nevertheless, some edge cases defy simple categorization. For example, Newgate is both a gate (Structural Place) and a prison (Functional Place). London Bridge is a bridge (Structural Place), but could also be considered a place of commerce (Functional Place) because it was home to hundreds of shops during the early modern period.
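One way to accommodate such edge cases is to let a place carry more than one type rather than forcing a single label. The sketch below assumes that approach purely for illustration; the `Place` class, its fields, and the `is_edge_case` helper are hypothetical and do not reflect how REED, MoEML, or Ark store their data.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    # Each entry pairs a broad category with a narrower place type.
    types: set[tuple[str, str]] = field(default_factory=set)

    def categories(self) -> set[str]:
        """Return the broad categories this place belongs to."""
        return {category for category, _ in self.types}


def is_edge_case(place: Place) -> bool:
    """A place is an edge case here if it spans more than one broad category."""
    return len(place.categories()) > 1


newgate = Place(
    "Newgate",
    {("Structural Place", "gate"), ("Functional Place", "prison")},
)
london_bridge = Place(
    "London Bridge",
    {("Structural Place", "bridge"), ("Functional Place", "place of commerce")},
)
```

Under this assumption, both Newgate and London Bridge are flagged by `is_edge_case`, which would give researchers a list of places worth discussing rather than a forced single classification.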

The lack of a one-size-fits-all solution is not necessarily a bad thing. Contemplating edge cases with the researchers led to productive discussions about how to represent nuance while adhering to the structure imposed by the vocabulary. Just as valuably, creating the vocabulary helped the researchers rethink how they were categorizing and labelling places within their datasets.

Although our vocabulary has yet to be applied to the datasets, it will be integral to making the projects’ data interoperable and their place categories more specific. Most importantly, its application will lead to more analysis, and if this analysis is anything like our previous work, it will be complicated, nuanced, and exciting.