Validate and Enhance

Introduction

In this step, you check that your converted data meets the ontological and LOD standards needed for its inclusion in the LINCS Knowledge Graph.

Now that you have converted your data into RDF, we can validate and enhance your data in the same way regardless of what conversion workflow you followed.

Resources Needed

This step is a joint effort between LINCS and your research team. Your team should make an initial attempt at validating and enhancing your converted data. When you think it is ready for the LINCS Knowledge Graph, or if you need help before that point, send your converted data to LINCS and we will do an additional review of the data.

Some basic programming experience (e.g., undergraduate level Python) can make this step easier. LINCS has also made some common validation and enhancement steps easier with the tools discussed below.

The time needed for this step depends on how ready your data is when it comes out of the Implement Conceptual Mapping step. Sometimes there are no errors to fix and it is only a matter of a few hours of checking the data and minting entity Uniform Resource Identifiers (URIs). Other times you will find errors that trace back to your original data or to a certain conversion step, and you will need to spend a few weeks consulting with your team, making edits, and re-checking the data.

Handle Data Changes

If you find errors in this step and want to change your data, you have a few options:

  1. Change the RDF directly by editing the TTL file or by using the editing software of your choosing.
    • For small changes, this could be done by hand.
    • For bulk changes, we recommend writing a simple script to make the changes or using our Linked Data Enhancement API.
    • Remember that if you make changes to the RDF by hand, then you should not re-run the conversion workflow on the same data or your manual changes may be overwritten.
  2. Make notes of the changes needed and wait to implement those changes until the data is in ResearchSpace.
  3. Make the changes to the source data or the conversion step that introduced the error and rerun the conversion workflow until the errors are gone.
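
For the bulk changes mentioned in option 1, a short script is often enough. The following is a minimal sketch, not LINCS tooling. It assumes the triples are held in memory as (subject, predicate, object) tuples of Turtle-style strings, and every ex: name is hypothetical. In practice you would read and write the TTL file with the RDF library of your choice.

```python
from typing import List, Tuple

# Assumption: a triple is a (subject, predicate, object) tuple of Turtle-style
# strings. This stands in for the data structures of a real RDF library.
Triple = Tuple[str, str, str]


def replace_uri(triples: List[Triple], old: str, new: str) -> List[Triple]:
    """Return a new list in which every exact occurrence of `old` is `new`."""
    return [
        (
            new if s == old else s,
            new if p == old else p,
            new if o == old else o,
        )
        for s, p, o in triples
    ]


# Hypothetical data: a temporary place URI that should become a final one.
data = [
    ("ex:temp_place_1", "rdfs:label", '"Toronto"@en'),
    ("ex:event_1", "crm:P7_took_place_at", "ex:temp_place_1"),
]
fixed = replace_uri(data, "ex:temp_place_1", "ex:place_toronto")
```

The function returns a new list and leaves the input untouched, so you can compare the two before saving anything.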

Validate Converted Data

Below are validation steps you should perform on your converted data. It is best to do these checks on a combined version of your data where all of the triples are in a single TTL file so that you know if there is missing information or a logical inconsistency across the entire dataset.

Entity Labels

Requirements:

  • Every URI in your data must have at least one rdfs:label value.
    • The exception is if a URI is being used only as the object of an owl:sameAs relationship or is of rdf:type crm:E73_Information_Object. In these cases, the URI is being used as a link to an external source and is not meant to represent a searchable entity in ResearchSpace.
    • When using external vocabulary terms in your data, add rdfs:label and rdf:type values for those terms from their source vocabularies to your data so that you can query your data without needing to pull from additional sources at the time of querying.

Suggestions:

  • You can add additional rdfs:label values for a single entity.
  • You can add additional labels using skos:altLabel or skos:prefLabel to specify that there is a label or preference that is specific to your project.
  • Whenever possible, include at least one English rdfs:label and one French rdfs:label.
  • Whenever possible, add a language tag for each label and literal value (e.g., "label"@en and "étiquette"@fr).
  • Try to use the same label formats as are used in existing LINCS data. See the LINCS LOD Style Guide (Coming Soon), which will cover formats for such labels.
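
The label requirement and its two exceptions can be checked with a short script. The following is a minimal sketch under stated assumptions: triples are (subject, predicate, object) tuples of Turtle-style strings, literals start with a double quote, and class URIs that appear only as the object of rdf:type are skipped (an assumption of this sketch, not a rule stated above). All ex: and ext: names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: Turtle-style strings stand in for a real RDF library's terms.
Triple = Tuple[str, str, str]

# Objects of these predicates are not treated as entities needing a label.
# Skipping rdf:type objects (classes) is an assumption of this sketch.
SKIPPED_OBJECT_PREDICATES = {"owl:sameAs", "rdf:type"}


def is_uri(term: str) -> bool:
    """In this sketch, literals start with a double quote; the rest are URIs."""
    return not term.startswith('"')


def uris_missing_label(triples: Iterable[Triple]) -> Set[str]:
    """Return entity URIs that have no rdfs:label and match no exception."""
    triples = list(triples)
    labelled = {s for s, p, _ in triples if p == "rdfs:label"}
    information_objects = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "crm:E73_Information_Object"
    }
    # A URI used only as the object of owl:sameAs never lands in this set.
    entities = {s for s, _, _ in triples}
    entities |= {
        o for _, p, o in triples
        if p not in SKIPPED_OBJECT_PREDICATES and is_uri(o)
    }
    return entities - labelled - information_objects


data = [
    ("ex:person1", "rdf:type", "crm:E21_Person"),
    ("ex:person1", "rdfs:label", '"Jane Doe"@en'),
    ("ex:person1", "owl:sameAs", "ext:authority_record"),
    ("ex:person1", "crm:P67i_is_referred_to_by", "ex:webpage"),
    ("ex:webpage", "rdf:type", "crm:E73_Information_Object"),
    ("ex:person1", "crm:P74_has_current_or_former_residence", "ex:place1"),
    ("ex:place1", "rdf:type", "crm:E53_Place"),
]
missing = uris_missing_label(data)
```

Here only ex:place1 is reported: ext:authority_record is a link target of owl:sameAs, and ex:webpage is typed as crm:E73_Information_Object.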

Entity Types

Requirements:

  • Every URI in your data must have at least one rdf:type value declared.
    • The exception is if a URI is being used only as the object of an owl:sameAs relationship.
  • The rest of the guidelines here will be specific to the conceptual mapping developed for your data.
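
A similar script can list untyped URIs and summarize how often each class is used, which gives you something concrete to compare against your conceptual mapping. The following is a minimal sketch, assuming triples are (subject, predicate, object) tuples of Turtle-style strings in which literals start with a double quote. The function names and ex: and ext: names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumption: Turtle-style strings stand in for a real RDF library's terms.
Triple = Tuple[str, str, str]


def untyped_uris(triples: List[Triple]) -> Set[str]:
    """Return entity URIs with no rdf:type, ignoring owl:sameAs link targets."""
    typed = {s for s, p, _ in triples if p == "rdf:type"}
    entities = {s for s, _, _ in triples}
    entities |= {
        o for _, p, o in triples
        # Class URIs (rdf:type objects) are skipped: an assumption of this sketch.
        if p not in ("owl:sameAs", "rdf:type") and not o.startswith('"')
    }
    return entities - typed


def type_counts(triples: List[Triple]) -> Counter:
    """Count how many rdf:type statements point at each class."""
    return Counter(o for _, p, o in triples if p == "rdf:type")


data = [
    ("ex:person1", "rdf:type", "crm:E21_Person"),
    ("ex:person2", "rdf:type", "crm:E21_Person"),
    ("ex:person1", "owl:sameAs", "ext:authority_record"),
    ("ex:person1", "crm:P74_has_current_or_former_residence", "ex:place1"),
    ("ex:place1", "rdfs:label", '"Guelph"@en'),
]
untyped = untyped_uris(data)
counts = type_counts(data)
```

A class that appears in the counts but not in your mapping, or a mapped class with a count of zero, is worth tracing back to the conversion step that produced it.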

URI Validation

Requirements:

Ontological Validation

  • Verify that the relationships present in your converted data match the mappings created in your Develop Conceptual Mapping step.
  • Check back soon for details on future LINCS tools to help you validate CIDOC CRM data.

Logical Validation

  • Manually look through your data and follow some relationship paths through the graph. This can help you do a quick sanity check on the data and spot errors.
  • A common mistake is using a single URI to represent several distinct entities by accident. The reverse also happens: creating an overly specific entity when a more general label would have let it connect to multiple entities.
  • Either through manual checks or using SPARQL queries, see if there are relationships that contradict one another. For example, conflicting family relationships or personal relationships to oneself.
  • Check back soon for details on future LINCS tools to help you check for logical inconsistencies in your data.
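
Two of the checks above, relationships to oneself and contradictory pairs, are simple to script. The following is a minimal sketch, assuming triples are (subject, predicate, object) tuples of Turtle-style strings. The predicates ex:parentOf and ex:friendOf are hypothetical simplifications; a CIDOC CRM mapping typically models such relationships through longer paths, so adapt the patterns to your own data.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: Turtle-style strings stand in for a real RDF library's terms.
Triple = Tuple[str, str, str]


def self_relationships(triples: Iterable[Triple]) -> List[Triple]:
    """Return triples whose subject and object are the same URI."""
    return [(s, p, o) for s, p, o in triples if s == o]


def mutual_pairs(triples: Iterable[Triple], predicate: str) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where both a->b and b->a use `predicate`.

    For a one-directional relationship such as parenthood, any such pair
    is a contradiction. Each pair is reported once, in sorted order.
    """
    pairs = {(s, o) for s, p, o in triples if p == predicate}
    return {(s, o) for s, o in pairs if s < o and (o, s) in pairs}


data = [
    ("ex:anna", "ex:parentOf", "ex:ben"),
    ("ex:ben", "ex:parentOf", "ex:anna"),   # contradicts the line above
    ("ex:cara", "ex:friendOf", "ex:cara"),  # relationship to oneself
    ("ex:anna", "ex:parentOf", "ex:cara"),
]
loops = self_relationships(data)
conflicts = mutual_pairs(data, "ex:parentOf")
```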

Enhance Converted Data

Requirements:

  • If you have not already done so, add reconciled values into the data. For details, see our Entity Reconciliation Guide.
    • Remember to reconcile against the LINCS Knowledge Graph as much as you can. This will increase the number of connections between your project’s data and others’ data. It will also allow you to see your and others’ contributions to the view of an entity.
  • Before LINCS publishes your final data, all URIs in your data must be official LOD URIs from external sources, from your project, or be minted by LINCS. The Conversion Team will help you with minting.

Suggestions:

  • To take advantage of ResearchSpace’s map visualizations, add coordinates for geographic locations, such as those identified by GeoNames URIs, following the format from our Data Cleaning Guide.
  • Query the authorities from which you got external URIs to get additional entity labels and any additional information you want included in your dataset.

Use Tools

Use LINCS’s tools to find inconsistencies in your data.

Linked Data Enhancement API

LINCS has bundled common post-processing functionalities so they can be easily applied to any LINCS-compliant RDF data. The functionalities include enhancing the data with reconciliation results, enhancing the data with labels from external LOD sources, minting URIs for entities, and validating the structure of the RDF. For more information, see the Linked Data Enhancement API Documentation.

ResearchSpace

When the Conversion Team and your research team agree that the data is ready for ingestion into LINCS, the converted data is handed over to the Storage Team to be uploaded to the LINCS triplestore as a trial. The review environment for the publishing platform ResearchSpace — called ResearchSpace Review — can then be used as a tool to further validate and improve the data. See the Publish LOD step for details about using ResearchSpace as the final conversion step.

SPARQL Queries

You can load your converted data into a local triplestore or wait for LINCS to load it into the LINCS triplestore. Once in a triplestore, you can use SPARQL queries to find inconsistencies in your data.
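
As a starting point, a few such queries are sketched here as Python strings so they can be kept in a script alongside other checks. These are illustrative assumptions rather than official LINCS queries: they look only at URIs in the subject position and do not apply the owl:sameAs or crm:E73_Information_Object exceptions described earlier. You can paste the query text into your triplestore's query interface or pass it to whatever SPARQL client you use.

```python
# Assumption: illustrative query texts only. Adjust prefixes and patterns to
# your own conceptual mapping before relying on the results.
CHECKS = {
    "subjects_without_label": """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity WHERE {
  ?entity ?p ?o .
  FILTER(isIRI(?entity))
  FILTER NOT EXISTS { ?entity rdfs:label ?label }
}
""",
    "subjects_without_type": """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?entity WHERE {
  ?entity ?p ?o .
  FILTER(isIRI(?entity))
  FILTER NOT EXISTS { ?entity rdf:type ?type }
}
""",
    "self_relationships": """
SELECT ?entity ?p WHERE {
  ?entity ?p ?entity .
}
""",
}

# Print each named query so it can be copied into a SPARQL endpoint.
for name, query in CHECKS.items():
    print(f"# {name}{query}")
```

An empty result for each query is the goal. Any rows returned point to specific URIs to review.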