OpenRefine
OpenRefine is a data processing application that allows you to clean up and transform structured data. It offers several features suited to creating Linked Open Data (LOD), such as entity matching, format translation, Resource Description Framework (RDF) mapping, and export options.
OpenRefine and LINCS
Within the LINCS project, OpenRefine is used for data cleaning and entity matching. It is primarily used by researchers bringing their own datasets to the project. OpenRefine gives these domain experts full control over the changes made to their data.
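To give a sense of what data cleaning and entity matching involve, the following Python sketch groups spelling variants of the same name under a shared normalized key. It is loosely modeled on the idea of key-based clustering and is not OpenRefine's own implementation. The function names `fingerprint` and `cluster` and the sample names are assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that superficial variants collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.lower())
    tokens = cleaned.split()
    # Sort and de-duplicate tokens so that word order does not matter.
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values that share a key; keep only groups with several variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: three variants of one name and one unrelated name.
names = ["Atwood, Margaret", "margaret atwood", " Margaret  Atwood ", "Alice Munro"]
clusters = cluster(names)
```

In this sketch, the three Atwood variants end up in one group, which a domain expert could then review and merge into a single preferred form. Keeping that review step manual reflects the point above that researchers stay in control of changes to their data.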
OpenRefine is best suited to structured data because it displays data in a format similar to a spreadsheet or table. Tabular file types, such as comma-separated values (CSV), work best, though OpenRefine is also compatible with other file types like XML, JSON, and RDF. If a researcher’s data falls within a specific domain, such as bibliographic records, or is unstructured, a different tool may be more appropriate:
- Use LINCS-API or NERVE for an unstructured dataset.
- Use VERSD to match entities for an entire bibliographic dataset.
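As a rough illustration of what "structured" means here, the sketch below checks whether CSV text is rectangular, meaning that every row has the same number of fields as the header. This is a hypothetical pre-import check written with Python's standard `csv` module. It is not part of OpenRefine, and the function name `ragged_rows` and the sample data are assumptions.

```python
import csv
import io


def ragged_rows(csv_text: str):
    """Return (row_number, field_count) for rows that do not match the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((row_number, len(row)))
    return problems


# Hypothetical sample: the third row is missing its birth_year field.
sample = "name,birth_year\nMargaret Atwood,1939\nAlice Munro\n"
issues = ragged_rows(sample)
```

An empty result suggests that the file is tabular enough to load as a table. Rows reported by the check may need fixing in the source file first.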
The software can be downloaded from OpenRefine’s website. When launched, the application opens in a browser tab but runs locally on your computer.
This tool can be useful for researchers and data specialists outside of LINCS. Those who are getting their data into the LINCS system, however, should begin cleaning and matching entities in OpenRefine early in the data preparation process.
Check out the Authority Service to match entities in your data against the LINCS Knowledge Graph from within OpenRefine.
Prerequisites
- You don't need to create a user account.
- You do need to have your own dataset.
- You do need a basic understanding of entity matching and data cleaning.
OpenRefine supports the following inputs and outputs:
- Input: CSV, TSV, XLS, XLSX, JSON, XML, RDF, plain text, and more
- Output: CSV, TSV, XLS, XLSX, HTML-formatted tables, and more
Resources
To learn more about OpenRefine, see the following resources:
Clean Data:
- OpenRefine User Manual
- Rue & Hernandez (2019) “Using OpenRefine to Clean Your Data”
- Hervieux (2020) “OpenRefine Activity” [PowerPoint]
- van Hooland, Verborgh, & De Wilde (2021) “Cleaning Data with OpenRefine”
Match Entities:
- OpenRefine User Manual—Reconciling
- Getty Digital (2020) “Getty Vocabularies OpenRefine Tutorial and Tips for Advanced Users”
Information about the team that developed OpenRefine is available on the Tool Credits page.