Table 2

Mapping between FAIR principles and DQA process ontology’s data quality dimensions

FAIR principles (Wilkinson et al., 2016) | DQA Ontology’s quality dimensions | Scenarios

Findable

FAIR principles:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

DQA Ontology’s quality dimensions: Accuracy, Actionability, Completeness, Currency, Integrity, Precision

Scenario: Alex, a research scientist, recently deposited his biomedical research dataset.

The RDR curator, John, must ensure that the dataset is findable. The dataset’s findability depends on the quality of its metadata and the quality of the RDR’s indexing algorithm. After consulting the DQAO, John classified the finding activity as Representation-Dependent. Hence, John takes the Intrinsic and Product Level dimensions of the metadata as the starting point for assembling a metadata quality evaluation model for the dataset. John operationalizes each quality dimension, such as Completeness, using the Metric class and by identifying the set of metadata elements that the stakeholder communities use to describe and search for datasets, along with their activity-specific priorities.

Accessible

FAIR principles:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available

DQA Ontology’s quality dimensions: Accessibility, Actionability, Integrity, Security, Ethical Compliance, Legal Compliance, Recoverability

Scenario: Jerome submitted a social science research dataset that includes survey responses and interview transcripts related to the impact of remote work on employee productivity and well-being.

John uses the QualityPolicy class of DQAO to ensure that the dataset’s metadata specifies clear access protocols and policies that define who can access the data, under what conditions, and how.

John also assesses the assembled data product on the Accessibility, Ethical Compliance, and Legal Compliance dimensions using the Metric class and assigns it an appropriate accessibility quality rating.

Interoperable

FAIR principles:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

DQA Ontology’s quality dimensions: Consistency, Interoperability

Scenario: Tina submitted an environmental research dataset that includes data on air quality measurements, meteorological data, and pollution sources in a large metropolitan area. To ensure broad accessibility and reuse, the semantic layer (i.e., the schema and metadata) and the content values of the dataset need to be interoperable with datasets created by other labs and used by environmental scientists, public health researchers, and policymakers.

The Evaluation activity class also helped John devise an Intervention activity aimed at enhancing dataset interoperability. John advised Tina to recode the data and associated metadata in accordance with relevant standards, formats, and vocabularies recommended by environmental science communities.

Reusable

FAIR principles:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards

DQA Ontology’s quality dimensions: Accuracy, Completeness, Consistency, Precision, Simplicity, Accessibility, Ethical Compliance, Legal Compliance, Integrity, Traceability, Understandability

Scenario: John received an agricultural research dataset on crop yields, soil health, irrigation practices, and weather conditions across various farms over the past decade. The dataset needs to be reusable by researchers, agronomists, policymakers, and farmers to optimize agricultural practices and inform policy decisions.

Reusability is a high-level, composite DQA objective. It does not have a direct match in the dimensions’ taxonomy. However, reuse is an action in any data-dependent activity. After consulting the DQAO, John determined that the reuse goal is part of the objectives of all four activity types: Representation-Dependent, Decontextualizing, Stability-Dependent, and Provenance-Dependent.

Hence, the quality of the dataset and associated metadata objects must be ensured across all categories of dimensions (i.e., Intrinsic and Product Level). For example, to ensure a dataset’s Integrity and Traceability, the metadata object must include the dataset’s complete Provenance Metadata detailing how the data was collected, evaluated, processed, and curated, as well as any quality control checks and interventions performed.

Source(s): Authors’ own work
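
The operationalization of Completeness described in the Findable scenario can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a completeness metric is computed as the priority-weighted share of non-empty metadata elements. The function name, the element names, the weights, and the sample record are illustrative assumptions and are not taken from the DQAO or from any repository's schema.

```python
from typing import Mapping, Optional


def weighted_completeness(
    record: Mapping[str, Optional[str]],
    weights: Mapping[str, float],
) -> float:
    """Return the share of total priority weight carried by non-empty elements.

    `weights` maps each metadata element to its (assumed) activity-specific
    priority; elements missing from `record` count as empty.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    filled = sum(
        weight
        for element, weight in weights.items()
        if str(record.get(element) or "").strip()
    )
    return filled / total


# Hypothetical priorities of metadata elements for a finding activity.
FIND_WEIGHTS = {
    "identifier": 3.0,
    "title": 3.0,
    "keywords": 2.0,
    "description": 1.0,
    "license": 1.0,
}

# Hypothetical metadata record; the identifier is a placeholder value.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Biomedical dataset",
    "keywords": "",
    "description": "Assay results",
}

score = weighted_completeness(record, FIND_WEIGHTS)
print(f"Completeness for the finding activity: {score:.2f}")
```

Under these assumed weights, an empty high-priority element such as keywords lowers the score more than a missing low-priority element, which mirrors the idea of activity-specific priorities in the scenario.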
