Develop Conceptual Mapping
Introduction
Every incoming dataset starts with its own structure and terminology. For these datasets to connect as linked open data (LOD), each one needs to be expressed using the same ontology. In this step, LINCS develops a mapping: a set of instructions for how each relationship in the original data should be represented as Resource Description Framework (RDF) triples.
LINCS has adopted CIDOC CRM as its base ontology. CIDOC CRM can be tailored to a specific dataset using existing CIDOC CRM extensions or domain-specific vocabularies. LINCS’s use of CIDOC CRM is documented in the LINCS application profile, with project-specific conversion details available in the project application profiles.
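To make the idea of a mapping concrete, here is a minimal sketch of how one flat record could be rewritten as CIDOC CRM-style triples. The record, the `example.org` URIs, and the `map_birth` helper are all hypothetical, and the pattern is only illustrative; the LINCS application profile remains the authority on how LINCS actually models this information.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
BASE = "http://example.org/"  # hypothetical namespace for minted entities


def crm(term):
    """Wrap a CIDOC CRM class or property name as a full URI reference."""
    return f"<{CRM}{term}>"


def map_birth(record):
    """Turn one flat source record into a list of (subject, predicate, object).

    In the source, the birth year is a simple field on the person. In
    CIDOC CRM the same fact is expressed through a birth event that has
    a time-span, so one field becomes several triples.
    """
    rid = record["id"]
    person = f"<{BASE}person/{rid}>"
    birth = f"<{BASE}birth/{rid}>"
    span = f"<{BASE}timespan/{rid}>"
    name = record["name"]
    year = record["birth_year"]
    return [
        (person, RDF_TYPE, crm("E21_Person")),
        (person, RDFS_LABEL, f'"{name}"'),
        (birth, RDF_TYPE, crm("E67_Birth")),
        (birth, crm("P98_brought_into_life"), person),
        (birth, crm("P4_has_time-span"), span),
        (span, RDF_TYPE, crm("E52_Time-Span")),
        (span, crm("P82_at_some_time_within"), f'"{year}"'),
    ]


# Hypothetical source record.
record = {"id": "p1", "name": "Ada Example", "birth_year": "1880"}
triples = map_birth(record)
for s, p, o in triples:
    print(s, p, o, ".")
```

The point of the sketch is the change in shape: a single column in the source becomes an event, a time-span, and the links between them.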
For more information, see our ontology documentation.
Resources Needed
This step can be challenging because research teams are often new to CIDOC CRM and to many of the ontology concepts it introduces. LINCS is here to help make the step as simple as possible.
For natural language and TEI data, we have tools that let you skip this step and extract RDF directly from your data using templates for common relationships.
For datasets that need custom mappings, we have application profiles that you can follow. These application profiles show the patterns we have used in mappings for previously converted LINCS data. When this is not enough, the Ontology Team can, in consultation with you, develop custom mappings for you. This process typically takes 2-4 weeks from the time that LINCS receives a copy of your source data and enough documentation to understand the relationships it contains.
We encourage your team to take an active role throughout the mapping process, even if LINCS is developing a custom mapping for you. By interrogating how your data is mapped to CIDOC CRM and which vocabulary terms are introduced during this step, research teams often learn new things about their data and reevaluate the best ways to express concepts.
The mapping process has inspired past LINCS projects to go back and enhance their source data once they understood how expressive CIDOC CRM and RDF data can be.
Regardless of the workflow you are following, your team will benefit from reviewing our ontology documentation to understand the goals of this step and the basics of CIDOC CRM. Once your data is converted, understanding some CIDOC CRM will also help you navigate your data and take advantage of its new structure.
| | Research Team | Ontology Team | Conversion Team | Storage Team |
|---|---|---|---|---|
| Develop a Mapping | ✓ | ✓ | | |
| Consult and Approve Mapping | ✓ | ✓ | | |
Develop a Mapping
- Structured Data
- Semi-Structured Data
- TEI Data
- Natural Language Data
Most incoming datasets express some relationships that are common to already converted LINCS datasets. Basic biographic information about people or bibliographic details of written works are common examples. In these cases, your team can, with the support of the Ontology Team, use the LINCS application profile to start mapping those components. As more datasets are added to the LINCS triplestore, more mappings will be available to draw upon.
When there are no existing datasets in LINCS that have similar structure and content, the Ontology Team either drafts a new conceptual mapping or adapts an existing mapping. The Ontology Team iterates on the mapping until the conceptual model accurately captures, and perhaps even enhances, the meaning of the original data. The Research Team then approves the mapping.
Semi-structured data is unique to each project and, compared to structured data, does not necessarily have a clear structure of entities and the important relationships between them.
The basics of this step typically look like this:
- Your team will identify which information from the existing data should be expressed by the new mappings. An important consideration here is how that information can be extracted from the data. Is the data standardized and annotated enough that a script could be written to extract it, or will it need to be extracted manually?
- Your team should review our ontology documentation to gain the relevant background knowledge. Next, review the LINCS application profile to understand how basic information from your data can be mapped to CIDOC CRM. If your data is similar to existing LINCS data, then try to follow the application profile to map your data.
- If you need support or you have data that does not fit within the existing LINCS mappings in the application profile, the Ontology Team can help with creating custom mappings for your project. Note that the amount of support we can provide varies depending on the other projects we are supporting at that time.
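One way to approach the first question in the list above (script or manual extraction?) is to test a candidate pattern against a sample of your data and see how much of it matches. The sketch below assumes a hypothetical free-text note field and a made-up "Born <year> in <place>" convention; your own data will need its own patterns.

```python
import re

# Hypothetical pattern for a note such as "Born 1880 in Toronto."
BIRTH_NOTE = re.compile(r"^Born (?P<year>\d{4}) in (?P<place>[^.;]+)\.?$")


def split_by_extractability(notes):
    """Separate notes a script can parse from those needing manual review."""
    parsed, manual = [], []
    for note in notes:
        match = BIRTH_NOTE.match(note.strip())
        if match:
            parsed.append(
                {"year": match.group("year"), "place": match.group("place").strip()}
            )
        else:
            manual.append(note)
    return parsed, manual


# Hypothetical sample of values from one field.
notes = [
    "Born 1880 in Toronto.",
    "Born 1902 in Halifax",
    "b. around the turn of the century, probably in Ontario",
]
parsed, manual = split_by_extractability(notes)
coverage = len(parsed) / len(notes)
print(f"{len(parsed)} of {len(notes)} notes matched ({coverage:.0%})")
print("Needs manual review:", manual)
```

A high match rate suggests a script is feasible for that field; a low one suggests the field needs cleanup or manual extraction first.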
LINCS has developed tools that include pre-set templates for you to choose from that will extract information from your TEI documents and output CIDOC CRM RDF data. For details on these tools, continue to the Implement Conceptual Mapping step.
If you need additional custom mappings for TEI fields that do not fit in the templates but that fall within our definition of structured data, then you should follow the structured data workflow for those fields or use the conversion XSLTs provided in the XTriples documentation.
If you need additional mappings for TEI fields that do not fit in the templates, refer to the LINCS application profile to map them into CIDOC CRM or consult with LINCS if custom mappings need to be created. Refer to the structured data workflow for details on the process of creating custom mappings.
If you want to handle natural language text fields embedded in the TEI documents, refer to the natural language workflow for that part of the data.
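Before deciding which TEI fields need additional mappings, it can help to list the elements your documents actually use. The following is a rough sketch using only the Python standard library. The sample fragment and the `covered` set are hypothetical; the elements the LINCS templates really handle are described in the Implement Conceptual Mapping step, not here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment; real documents will be much larger.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <listPerson>
      <person xml:id="p1">
        <persName>Ada Example</persName>
        <birth when="1880"/>
        <occupation>Writer</occupation>
      </person>
    </listPerson>
  </body></text>
</TEI>"""


def element_inventory(xml_text):
    """Count how often each element occurs, with the namespace stripped."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag.split("}", 1)[-1] for el in root.iter())


# Hypothetical set of elements assumed to be handled by a template.
covered = {"TEI", "text", "body", "listPerson", "person", "persName", "birth"}

inventory = element_inventory(SAMPLE)
needs_mapping = sorted(tag for tag in inventory if tag not in covered)
print("Elements found:", dict(inventory))
print("Candidates for additional mapping:", needs_mapping)
```

Elements left in `needs_mapping` are the ones to check against the application profile or raise with LINCS.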
There are two ways LINCS extracts facts from natural language texts.
The preferred method is to use the LINCS natural language processing tools, which extract a preset list of relationships. You can then use LINCS tools to convert these extracted facts into CIDOC CRM, so you do not need to create any custom mappings. For details on these tools, continue to the Implement Conceptual Mapping step. Check back soon for the application profile for these tools, which shows what relationships are covered and how they are expressed in CIDOC CRM.
If you would like to extract additional facts from your text, either because our tools missed them or because they involve relationships our tools do not cover, you can extract them yourself. This extraction can be done by hand or with other relation extraction (RE) systems. For these facts, refer to the LINCS application profile to map them into CIDOC CRM or consult with LINCS if custom mappings need to be created. Refer to the Structured Data tab for details on the process of creating custom mappings.
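The split between covered and uncovered relationships can be sketched as a simple lookup. In the example below, the relation names, the facts, and the `RELATION_PATTERNS` table are all hypothetical; the actual relationships covered are defined by the LINCS tools and their application profile. The CIDOC CRM class and property names are real, but the patterns are simplified for illustration.

```python
# Hypothetical relation names mapped to simplified CIDOC CRM patterns.
RELATION_PATTERNS = {
    "born_in": ["E67_Birth", "P98_brought_into_life", "P7_took_place_at"],
    "member_of": ["E85_Joining", "P143_joined", "P144_joined_with"],
}


def route_facts(facts):
    """Split facts into those with a known pattern and those without one."""
    mapped, unmapped = [], []
    for subject, relation, obj in facts:
        pattern = RELATION_PATTERNS.get(relation)
        if pattern is None:
            unmapped.append((subject, relation, obj))
        else:
            mapped.append({"fact": (subject, relation, obj), "pattern": pattern})
    return mapped, unmapped


# Hypothetical facts, whether extracted by hand or by an RE system.
facts = [
    ("Ada Example", "born_in", "Toronto"),
    ("Ada Example", "member_of", "Example Writers' Circle"),
    ("Ada Example", "corresponded_with", "Ben Sample"),
]
mapped, unmapped = route_facts(facts)
custom_needed = sorted({relation for _, relation, _ in unmapped})
print("Relations needing a custom mapping:", custom_needed)
```

Listing the uncovered relations in one place gives your team and the Ontology Team a concrete starting point for any custom mappings.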
Consult and Approve Mapping
The mapping process is iterative. Once the Research Team and the Ontology Team are satisfied with the mapping, you can move to the next step. If you are using LINCS natural language or TEI conversion tools, you will not have output from this step and will move on to the Implement Conceptual Mapping step to generate your converted data.
Otherwise, at the end of this step you should have mappings defined that you will implement in the next step.