Workflow Overview

Introduction to the Transformation & Publication Workflow

At LINCS, the process of creating linked open data (LOD) is called the transformation workflow. There are various paths through this workflow; the one you take depends on the structure and size of your dataset, your timeline, and your research goals. Whichever path you take, the workflow ends with the data being published and made available to the public.

LINCS has a team of experts to help your research team through this process.

This is a very brief overview of the steps involved in transforming and publishing data. Although these steps are presented as an ordered list, this is an iterative process. Your research team can expect to have regular meetings with LINCS team members to discuss the transformation process and work collaboratively.

1. Export Your Existing Data

Prepare a version of the data that is easy to share and work with. Learn about exporting data.

2. Clean Your Data

Data cleaning is an important step in data management. It ensures your data is internally consistent, correctly formatted, and complete. Learn about cleaning data.
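As a rough illustration of what checking for consistency, formatting, and completeness can look like, here is a minimal Python sketch. The column names, the sample rows, and the choice of YYYY-MM-DD as the expected date format are all assumptions made for this example. They are not LINCS requirements, and LINCS may recommend other tools for this step.

```python
import csv
import io
import re

# Hypothetical sample export; column names and values are illustrative only.
SAMPLE_CSV = """id,name,birth_date
1,Jane Doe,1901-02-03
2,  John Roe ,1910-11-12
3,,10/07/1931
"""

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def find_issues(rows, required=("id", "name"), date_fields=("birth_date",)):
    """Return a list of (row_number, field, problem) tuples."""
    issues = []
    for number, row in enumerate(rows, start=1):
        # Completeness: required fields must not be empty.
        for field in required:
            if not (row.get(field) or "").strip():
                issues.append((number, field, "missing value"))
        # Formatting: flag stray whitespace around values.
        for field, value in row.items():
            if value and value != value.strip():
                issues.append((number, field, "leading or trailing whitespace"))
        # Consistency: dates should all follow one pattern (assumed ISO here).
        for field in date_fields:
            value = (row.get(field) or "").strip()
            if value and not ISO_DATE.match(value):
                issues.append((number, field, "date is not YYYY-MM-DD"))
    return issues


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for issue in find_issues(rows):
    print(issue)
```

A report like this only flags rows for a person to review. Deciding how to correct each value remains a judgment call for your research team.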

3. Prepare Your Metadata: Name the Dataset & Create Keywords

The dataset title is the name LINCS uses to refer to a project’s dataset in its tools and documentation. Keywords help users explore the data. Learn about naming the dataset and creating keywords.

4. Match Entities

Entity matching, also called "entity linking" and "named entity disambiguation," means adding unique identifiers in the form of URIs to your data to represent each unique entity. The goal is to use the same identifier every time that the same real-world thing is mentioned in your data, other LINCS data, and, ideally, linked data elsewhere on the web. Learn about the entity matching process.
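The core idea of reusing one identifier for every mention of the same entity can be sketched in a few lines of Python. This is a simplified illustration, not the LINCS matching process. The example.org namespace, the function names, and the rule that names match if they are equal after case and accent folding are assumptions made for the example. Real matching normally draws on external authorities and human review.

```python
import unicodedata

# Hypothetical namespace; a real project would use URIs from an authority or LINCS.
BASE_URI = "http://example.org/entity/"


def normalize(name):
    """Fold case, accents, and extra whitespace so variant spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def match_entities(mentions):
    """Map each mention to a URI, reusing the URI when mentions normalize alike."""
    uri_by_key = {}
    matched = {}
    for mention in mentions:
        key = normalize(mention)
        if key not in uri_by_key:
            # First time this entity is seen: mint a new (illustrative) URI.
            uri_by_key[key] = f"{BASE_URI}{len(uri_by_key) + 1}"
        matched[mention] = uri_by_key[key]
    return matched


mentions = ["Émile Zola", "emile  zola", "George Sand"]
for mention, uri in match_entities(mentions).items():
    print(mention, "->", uri)
```

Exact matching on normalized names will both miss true matches (pseudonyms, married names) and merge distinct people who share a name, which is why candidate matches usually need to be reviewed.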

5. Map Existing Data: Develop & Implement Conceptual Mapping

To connect diverse datasets as LOD, each dataset needs to use the same ontology. LINCS maps your data to the CIDOC CRM ontology, which provides instructions on how each relationship in the original data should look as Resource Description Framework (RDF) triples. Learn about developing and implementing conceptual mapping.
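To give a sense of what a relationship looks like as RDF triples, the sketch below writes a person and a birth event as N-Triples using only the Python standard library. The entity URIs are hypothetical. The class and property identifiers (E21_Person, E67_Birth, P98_brought_into_life) follow the CIDOC CRM naming as commonly published, but the exact patterns LINCS applies to your data come from its conceptual mapping and may differ from this example.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def uri(value):
    """Serialize a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def person_triples(person_uri, name, birth_uri):
    """Return triples stating that a birth event brought a named person into life."""
    return [
        (uri(person_uri), uri(RDF_TYPE), uri(CRM + "E21_Person")),
        (uri(person_uri), uri(RDFS_LABEL), literal(name)),
        (uri(birth_uri), uri(RDF_TYPE), uri(CRM + "E67_Birth")),
        (uri(birth_uri), uri(CRM + "P98_brought_into_life"), uri(person_uri)),
    ]


def to_ntriples(triples):
    """Join (subject, predicate, object) terms into N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical URIs, for illustration only.
triples = person_triples(
    "http://example.org/entity/1", "Jane Doe", "http://example.org/event/birth-1"
)
print(to_ntriples(triples))
```

Note that the birth is modelled as its own event that points to the person, rather than as a simple attribute of the person. This event-centred style is characteristic of CIDOC CRM.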

6. Publish Your Data

Congratulations! You have transformed your data into LOD!

After errors have been noted and corrected, the final version of the dataset will be uploaded to the LINCS triplestore. The final dataset will now be publicly accessible via ResearchSpace as published LOD. The data can now be used in publications and shared with others (except in limited, mutually agreed-upon circumstances). Learn about publishing your data in ResearchSpace.

7. Edit Your Data & Transform More Data

After publishing your data, you can make additions and edits in ResearchSpace Review without repeating all the steps of the transformation workflow.

If you want to add new data that does not have the same structure, the transformation process will need to be altered and repeated. Note that the new data can then be merged with the existing project or can be made into a new, separate project. Learn about transforming additional data and editing data.

Timelines

The time needed to complete the full transformation and publication process varies dramatically based on these factors:

  • How clean is your source data?
  • How many entities are in your data? How many entities need to be de-duplicated internally or matched externally?
  • What is the structure of your source data? The TEI and natural language workflows can be faster because they are more automated and less customized than the structured and semi-structured workflows.
  • How many unique types of relationships are represented in your data?
  • How much time does your team have to dedicate to the process?
  • How many projects is LINCS supporting at the same time as yours?

A small dataset being transformed by an experienced team with dedicated time could get through the whole transformation process in a few weeks. However, most projects working with LINCS take between 6 and 12 months, factoring in time to learn tools, busy schedules, and consultation between your research team and the LINCS team.

We have included time estimates for each step in the transformation workflow documentation. Remember that the work does not have to be done all in one go. There is value in completing many of the steps on their own, like cleaning your data or matching entities in it, and slowly working towards all of the benefits of LOD.

Next Step

Different types of data require different processes to be transformed into LOD. Whether you are starting with structured or semi-structured data, TEI data, or natural language data, there is a customized workflow. Next, review the workflow that matches your needs.