3M Basics
Walkthrough Prerequisites
Before beginning the walkthrough, download this XML file. This file is the source data for your mapping project and features the Yellow Nineties data for three entities (people).
3M Basic Principles
After data is prepared, it becomes XML. XML can be thought of as a tree of nested elements. At the highest level, here is what you do in 3M:
- Point 3M towards specific elements in the source XML using the language XPath
- Assign those elements classes and properties from your target schema
3M will then produce RDF triples based on the values in those elements.
As you work through this walkthrough, think about what 3M will grab from your original data and where that piece of information will end up in your converted data.
A Basic Example
3M produces RDF triples as the converted data. These triples follow the pattern:
subject → predicate → object
If we have this starting XML data:
<main_element>
<nested_element>my_value</nested_element>
</main_element>
We can supply the XPath main_element to tell 3M to repeat a certain conversion step each time a new <main_element> appears in our XML data.
Then we can give the XPath nested_element/text(), which makes 3M grab my_value, and that value will show up in the converted data.
In this example, there would be a new subject for each <main_element>, with the predicate defined by <nested_element> and the object my_value.
<main_element> → <nested_element> → "my_value"
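To make the pattern concrete, here is a minimal Python sketch of the same idea using the standard library's xml.etree.ElementTree. It is not how 3M itself works internally; it only mimics the two steps described above. The wrapping <root> element, the second <main_element>, and the subject labels (main_element_1, main_element_2) are assumptions added for illustration. ElementTree's XPath subset has no text() function, so the sketch reads each element's .text instead.

```python
import xml.etree.ElementTree as ET

# The example XML, wrapped in a hypothetical <root> and given a second
# <main_element> (both assumptions) to show that the step repeats.
SOURCE = """
<root>
  <main_element><nested_element>my_value</nested_element></main_element>
  <main_element><nested_element>another_value</nested_element></main_element>
</root>
"""


def triples(xml_text):
    """Return one (subject, predicate, object) tuple per nested element."""
    root = ET.fromstring(xml_text)
    result = []
    # Step 1: the path "main_element" repeats the conversion for each match.
    for i, main in enumerate(root.findall("main_element"), start=1):
        subject = f"main_element_{i}"  # hypothetical label for the subject
        # Step 2: stand-in for the XPath nested_element/text().
        for nested in main.findall("nested_element"):
            result.append((subject, nested.tag, nested.text))
    return result


for s, p, o in triples(SOURCE):
    print(f"{s} -> {p} -> {o}")
```

Each <main_element> yields its own subject, which is the behaviour to keep in mind when choosing the first XPath in a mapping.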
Understanding the Data
Mapping your data requires understanding it first. The source data for this walkthrough is an XML file that features some of the Yellow Nineties data for W. B. Yeats, Pamela Colman Smith, and John Butler Yeats.
Let's review the file together:
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:y90s="https://personography.1890s.ca/ontology/">
<rdf:Description rdf:about="https://personography.1890s.ca/persons/yeats-w-b/">
<y90s_name>Yeats, W. B.</y90s_name>
<y90s_birth_name>Yeats, William Butler</y90s_birth_name>
<y90s_birth_date rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1865</y90s_birth_date>
<y90s_birth_place>1 George's Ville, 5 Sandymount Avenue, Dublin, Ireland</y90s_birth_place>
<y90s_birth_place_uri rdf:resource="http://www.wikidata.org/entity/Q1761"/>
</rdf:Description>
<rdf:Description rdf:about="https://personography.1890s.ca/persons/smith-pamela-colman/">
<y90s_name>Smith, Pamela Colman</y90s_name>
<y90s_aka>P. C. S.</y90s_aka>
<y90s_birth_date rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1878</y90s_birth_date>
<y90s_birth_place>28 Belgrave Road, Pimlico, Middlesex, England</y90s_birth_place>
<y90s_birth_place_uri rdf:resource="http://www.wikidata.org/entity/Q2894393"/>
</rdf:Description>
<rdf:Description rdf:about="https://personography.1890s.ca/persons/yeats-john-butler/">
<y90s_name>Yeats, John Butler</y90s_name>
<y90s_birth_date rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1839</y90s_birth_date>
<y90s_birth_place>Tullylish, Down, Ireland</y90s_birth_place>
<y90s_birth_place_uri rdf:resource="http://www.wikidata.org/entity/Q60557195"/>
</rdf:Description>
</rdf:RDF>
This file has three entities:
- W. B. Yeats, who is represented by the URI <https://personography.1890s.ca/persons/yeats-w-b/>, the value of the rdf:about attribute on his <rdf:Description> element
- Pamela Colman Smith, who is represented by the URI <https://personography.1890s.ca/persons/smith-pamela-colman/>, the value of the rdf:about attribute on her <rdf:Description> element
- John Butler Yeats, who is represented by the URI <https://personography.1890s.ca/persons/yeats-john-butler/>, the value of the rdf:about attribute on his <rdf:Description> element
These three entities are the subjects of our mapping.
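As a quick check of where the subjects come from, the following Python sketch reads the rdf:about attribute of every <rdf:Description>. It uses a trimmed copy of the source data with only two of the three entities, and the function name subject_uris is an assumption made for this example. It illustrates what the mapping will point at, not what 3M does internally.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:rdf declaration in the source file.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the source data (two entities, names only).
SOURCE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://personography.1890s.ca/persons/yeats-w-b/">
    <y90s_name>Yeats, W. B.</y90s_name>
  </rdf:Description>
  <rdf:Description rdf:about="https://personography.1890s.ca/persons/smith-pamela-colman/">
    <y90s_name>Smith, Pamela Colman</y90s_name>
  </rdf:Description>
</rdf:RDF>"""


def subject_uris(xml_text):
    """Return the rdf:about URI of each rdf:Description element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced names as "{namespace-uri}localname".
    descriptions = root.findall(f"{{{RDF_NS}}}Description")
    return [d.get(f"{{{RDF_NS}}}about") for d in descriptions]


for uri in subject_uris(SOURCE):
    print(uri)
```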
The other elements in this file are all predicates and the values within them are objects.
- <y90s_name> is the name of the entity.
- <y90s_birth_name> is the birth name of the entity.
- <y90s_aka> is the additional name of the entity.
- <y90s_birth_date> is the birth date of the entity.
- <y90s_birth_place> is the birth place of the entity (<y90s_birth_place_uri> is the URI for that place).
If we imagine these as subject → predicate → object triples, the source data is saying the following about W. B. Yeats:
W. B. Yeats → has the name → "Yeats, W. B."
W. B. Yeats → has birth name → "Yeats, William Butler"
W. B. Yeats → was born in → "1865"
W. B. Yeats → was born at → "1 George's Ville, 5 Sandymount Avenue, Dublin, Ireland"
Before reading on, see if you can figure out what the source data is saying about the two other entities. Then read on to check if you are right.
Here is what the source data is saying about Pamela Colman Smith and John Butler Yeats:
Pamela Colman Smith → has the name → "Smith, Pamela Colman"
Pamela Colman Smith → has additional name → "P. C. S."
Pamela Colman Smith → was born in → "1878"
Pamela Colman Smith → was born at → "28 Belgrave Road, Pimlico, Middlesex, England"
John Butler Yeats → has the name → "Yeats, John Butler"
John Butler Yeats → was born in → "1839"
John Butler Yeats → was born at → "Tullylish, Down, Ireland"
By using 3M, we can turn these subject → predicate → object triples into actual, machine-readable RDF triples which can then be linked to other datasets and create shared knowledge. In the full Yellow Nineties dataset, there are thousands of entities (people) and tens of thousands of predicates describing them. By setting up mappings in 3M, the same patterns can be applied to thousands of entities as long as the structure of the source data is consistent. This type of automation is what makes 3M so useful for creating RDF triples.
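The idea that one pattern can cover many entities can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it works on a trimmed copy of the source data, the function name description_triples is hypothetical, and the element names are used as predicates unchanged, whereas a real 3M mapping would assign classes and properties from your target schema.

```python
import xml.etree.ElementTree as ET

# "{namespace-uri}" prefix that ElementTree uses for rdf: names.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Trimmed copy of the source data: two entities with different properties.
SOURCE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://personography.1890s.ca/persons/yeats-w-b/">
    <y90s_name>Yeats, W. B.</y90s_name>
    <y90s_birth_date rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1865</y90s_birth_date>
    <y90s_birth_place_uri rdf:resource="http://www.wikidata.org/entity/Q1761"/>
  </rdf:Description>
  <rdf:Description rdf:about="https://personography.1890s.ca/persons/smith-pamela-colman/">
    <y90s_name>Smith, Pamela Colman</y90s_name>
    <y90s_aka>P. C. S.</y90s_aka>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text):
    """Apply one pattern to every entity: one triple per child element."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(RDF + "Description"):
        subject = desc.get(RDF + "about")
        for child in desc:
            # An rdf:resource attribute means the object is a URI;
            # otherwise the element's text is the object.
            obj = child.get(RDF + "resource") or child.text
            out.append((subject, child.tag, obj))
    return out


for s, p, o in description_triples(SOURCE):
    print(f"{s} -> {p} -> {o}")
```

The loop never mentions a specific person, which is why the same pattern keeps working as more entities are added, provided the structure of the source data stays consistent.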
Now that you understand what the data is trying to say, you are ready to map it. First, set up your mapping project.