
Implement Conceptual Mapping

Introduction

The Implement Conceptual Mapping step is where we finally convert your data from its original structure and ontology into LINCS RDF.

Resources Needed

For TEI data and natural language data, your team will do this step using LINCS tools. Our tools use templates for common relationships, so you can get output in a few minutes, though you may need to spend some time adjusting how your source data is processed to get the output you want.

For structured data and semi-structured data, we still have tools to help, but our approach is customized to each dataset, so the process takes longer. An experienced user could convert a dataset in a few days, but we find that this step takes a few weeks to a few months for the average project once you account for training, implementation, and troubleshooting. These workflows are a combined effort between the LINCS Conversion Team and your research team.


Set Up Your Data

To proceed with this step, you must have a conceptual mapping developed for your specific data. Ideally, this mapping will be final so that you do not need to redo this implementation step later on. That said, it is fine to start with a mapping that covers only certain relationships of interest, and then to extend both the mapping and this implementation step in phases.

It is best if you have already cleaned your data before this step. However, if your implementation uses code or a tool that can be rerun easily, it is fine to start on this step before you have finished data cleaning. You can rerun the implementation step when the final cleaned data is ready.

Transform Your Data

Whenever possible, use tools or scripts that let you easily edit and rerun this step. That way, if you find errors in your source data or receive more data later on, you can rerun this step to convert it quickly.

Every dataset in this category comes with a unique starting structure and by this point should have its own conceptual mapping. To extract each piece of information from the source data and reconnect it as CIDOC CRM triples, LINCS prefers to use the 3M mapping tool.

The 3M mapping tool takes XML documents as input and, through its graphical user interface, allows users to select data from their source files and map it into custom CIDOC CRM triples. We have found that this is the easiest method to get consistently converted data. LINCS has developed 3M documentation to guide you through creating your first mapping file, and the Ontology Team and Conversion Team can provide support as you get started.

You may choose to use 3M if:

  • Your data is already in XML or is in a format that can be easily converted to XML (e.g., a spreadsheet or JSON files)
  • You do not have a team member with programming experience and need a tool with a graphical user interface
  • Your data contains many relationships so the reliability of 3M output and its treatment of intermediate nodes will have a large benefit

Alternatively, you may choose to write custom scripts to convert your data instead of using 3M. You may choose to write custom scripts if:

  • You have a team member who understands the source data, understands the conceptual mapping, and has sufficient programming experience
  • Your data only covers a small number of relationships so learning 3M is not worth the time investment
  • Your data is in a highly normalized relational database and the code needed to transform the relational data into XML would be equivalent to code needed to output triples
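
As a rough illustration of what a custom script involves, the sketch below maps one source record to CIDOC CRM triples and serializes them as N-Triples. It is not LINCS code: the record fields (`id`, `name`, `birthplace_id`), the `http://example.org/` namespace, and the choice of a birth event as the intermediate node are all assumptions made for the example. A real script would follow your own conceptual mapping and URI conventions.

```python
# Illustrative sketch only: field names, BASE namespace, and the mapping
# pattern are assumptions, not LINCS conventions.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # placeholder namespace


def uri(value):
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def person_birth_triples(record):
    """Map one record to triples, including the intermediate birth event node."""
    person = BASE + "person/" + record["id"]
    birth = BASE + "birth/" + record["id"]
    place = BASE + "place/" + record["birthplace_id"]
    return [
        (uri(person), uri(RDF_TYPE), uri(CRM + "E21_Person")),
        (uri(person), uri(RDFS_LABEL), literal(record["name"])),
        (uri(birth), uri(RDF_TYPE), uri(CRM + "E67_Birth")),
        (uri(birth), uri(CRM + "P98_brought_into_life"), uri(person)),
        (uri(birth), uri(CRM + "P7_took_place_at"), uri(place)),
        (uri(place), uri(RDF_TYPE), uri(CRM + "E53_Place")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Even this single relationship needs an intermediate node (the birth event) and six triples, which is part of why the number of relationships in your data matters when choosing between 3M and custom scripts.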

3M requires your data to be input as XML. If your structured data is not in XML, you can convert it following our Preparing Data for 3M documentation. This documentation also gives suggestions for ways to edit your XML data to make working in 3M easier.
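
For a spreadsheet exported as CSV, the conversion to XML can be quite small. The sketch below uses only the Python standard library and is one possible approach, not the procedure from the Preparing Data for 3M documentation. The function name `csv_to_xml`, the `records` and `record` element names, and the rule of replacing spaces in column headers with underscores are assumptions for illustration. Column headers that are still not valid XML names would need further cleanup.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Turn CSV text into XML with one element per row and one child per cell."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Skip cells beyond the header row (column is None) and empty cells.
            if column is None or not value or not value.strip():
                continue
            tag = column.strip().replace(" ", "_")
            ET.SubElement(record, tag).text = value.strip()
    return ET.tostring(root, encoding="unicode")
```

Because the script is deterministic, it can be rerun whenever the spreadsheet is corrected or extended, which keeps the XML input in step with the cleaned source data.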

The Conversion Team and Ontology Team use 3M or custom scripts to write out the conceptual mapping and run the transformation on the data, resulting in LINCS RDF. To make sure that the output from 3M is correct, either the research team or the Conversion Team and Ontology Team transform a small sample of the data and vet the results using the built-in 3M visualization tools and a manual comparison process. The full dataset is then converted.
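
One way to produce a small sample for vetting is to keep a random subset of records from the XML input. The sketch below is a hypothetical helper, not a LINCS tool. It assumes the records are direct children of the root element and share a single tag name, and it uses a fixed seed so the same sample is selected on every run.

```python
import random
import xml.etree.ElementTree as ET


def sample_records(xml_text, row_tag, size, seed=0):
    """Return XML containing only a reproducible random subset of records."""
    root = ET.fromstring(xml_text)
    records = root.findall(row_tag)  # direct children of the root only
    rng = random.Random(seed)
    keep = set(rng.sample(range(len(records)), min(size, len(records))))
    for index, record in enumerate(records):
        if index not in keep:
            root.remove(record)
    return ET.tostring(root, encoding="unicode")
```

Keeping the sample reproducible makes the manual comparison easier to repeat after the mapping is adjusted, since the same records are converted each time.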

You should now have RDF data that follows LINCS’s ontology and vocabulary standards. Your data may not be quite ready for ingestion into the LINCS triplestore yet, but it will be after some final cleanup in the next step.