Mapping between FAIR principles and DQA process ontology’s data quality dimensions
| FAIR principles (Wilkinson et al., 2016) | | DQA Ontology’s quality dimensions | Scenarios |
|---|---|---|---|
| Findable | F1. (meta)data are assigned a globally unique and persistent identifier | Accuracy, Actionability, Completeness, Currency, Integrity, Precision | Alex, a research scientist, recently deposited his biomedical research dataset. The RDR curator, John, must ensure that the dataset is findable. The dataset’s findability depends on the quality of its metadata and the quality of the RDR’s indexing algorithm. After consulting the DQAO, John classified the finding activity as Representation-Dependent. Hence, John takes the Intrinsic and Product Level dimensions of the metadata as the starting point for assembling a metadata quality evaluation model for the dataset. John operationalizes each quality dimension, such as Completeness, using the Metric class and by identifying the set of metadata elements that the stakeholder communities use to describe and search for datasets, along with their activity-specific priorities. |
| | F2. data are described with rich metadata (defined by R1 below) | | |
| | F3. metadata clearly and explicitly include the identifier of the data it describes | | |
| | F4. (meta)data are registered or indexed in a searchable resource | | |
| Accessible | A1. (meta)data are retrievable by their identifier using a standardized communications protocol | Accessibility, Actionability, Integrity, Security, Ethical Compliance, Legal Compliance, Recoverability | Jerome submitted a social science research dataset that includes survey responses and interview transcripts related to the impact of remote work on employee productivity and well-being. John uses the QualityPolicy class of DQAO to ensure that the dataset’s metadata specifies clear access protocols and policies that define who can access the data, under what conditions, and how. John also assesses the assembled data product on the Accessibility, Ethical Compliance, and Legal Compliance dimensions using the Metric class and assigns it an appropriate accessibility quality rating. |
| | A1.1 the protocol is open, free, and universally implementable | | |
| | A1.2 the protocol allows for an authentication and authorization procedure, where necessary | | |
| | A2. metadata are accessible, even when the data are no longer available | | |
| Interoperable | I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | Consistency, Interoperability | Tina submitted an environmental research dataset that includes data on air quality measurements, meteorological data, and pollution sources in a large metropolitan area. To ensure broad accessibility and reuse, the semantic layer (i.e., the schema and metadata) and the content values of the dataset need to be interoperable with other datasets created by other labs and used by environmental scientists, public health researchers, and policymakers. The Evaluation activity class also helped John devise an Intervention activity aimed at enhancing dataset interoperability. John advised Tina to recode the data and associated metadata in accordance with relevant standards, formats, and vocabularies recommended by environmental science communities. |
| | I2. (meta)data use vocabularies that follow FAIR principles | | |
| | I3. (meta)data include qualified references to other (meta)data | | |
| Reusable | R1. (meta)data are richly described with a plurality of accurate and relevant attributes | Accuracy, Completeness, Consistency, Precision, Simplicity, Accessibility, Ethical Compliance, Legal Compliance, Integrity, Traceability, Understandability | John received an agricultural research dataset on crop yields, soil health, irrigation practices, and weather conditions across various farms over the past decade. The dataset needs to be reusable by researchers, agronomists, policymakers, and farmers to optimize agricultural practices and inform policy decisions. Reusability is a high-level, composite DQA objective. It does not have a direct match in the dimensions’ taxonomy. However, reuse is an action for any data-dependent activity. After consulting the DQAO, John determined that the reuse goal is part of the objectives of all four activity types: Representation-Dependent, Decontextualizing, Stability-Dependent, and Provenance-Dependent. Hence, the quality of the dataset and associated metadata objects must be ensured along all categories of dimensions (i.e., Intrinsic and Product Level). For example, to ensure a dataset’s Integrity and Traceability, the metadata object must include the dataset’s complete Provenance Metadata detailing how the data was collected, evaluated, processed, and curated, as well as any quality control checks and interventions performed. |
| | R1.1. (meta)data are released with a clear and accessible data usage license | | |
| | R1.2. (meta)data are associated with detailed provenance | | |
| | R1.3. (meta)data meet domain-relevant community standards | | |
Source(s): Authors’ own work
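
The Findable scenario describes operationalizing Completeness by identifying the metadata elements that stakeholder communities use and their activity-specific priorities. One possible reading of that step, offered only as a minimal sketch, is a priority-weighted completeness score. The function names, element names, weights, and the placeholder identifier below are all hypothetical illustrations; they are not part of the DQAO or its Metric class.

```python
from typing import Mapping


def is_filled(value: object) -> bool:
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def weighted_completeness(
    record: Mapping[str, object],
    weights: Mapping[str, float],
) -> float:
    """Return the share of total priority weight held by filled elements.

    `weights` maps each metadata element to its (assumed) priority for the
    activity being evaluated, e.g. finding a dataset.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    filled = sum(w for name, w in weights.items() if is_filled(record.get(name)))
    return filled / total


# Hypothetical priorities for a finding activity (illustrative values only).
weights = {"identifier": 3.0, "title": 2.0, "license": 2.0, "keywords": 1.0}

# Hypothetical metadata record with a placeholder identifier.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example dataset",
    "license": "",
    "keywords": [],
}

score = weighted_completeness(record, weights)
print(f"Weighted completeness: {score:.3f}")
```

Weighting by priority means a missing high-priority element, such as the identifier, lowers the score more than a missing low-priority one. A curator could compute a separate score per activity type by swapping in a different set of weights.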