Purpose

Data-documentation standards offer diverse means to represent processes, an aspect of datasets that generally lacks adequate documentation for reuse. This study aims to generate new knowledge of the extent to which data-documentation standards cover and conceptualise data-related processes (i.e. paradata).

Design/methodology/approach

A set of nine major general domain-agnostic research data standards (Data Package, DataCite Metadata Schema, Dublin Core, OAI-ORE, PREMIS, CERIF, DDI, PROV, NetCDF) was analysed using close reading of standards documents to identify elements relevant to representing diverse aspects of data-related processes.

Findings

A review of popular standards shows that they vary in what processes are represented: many focus on curatorial history, while others capture selected aspects of processes to enable specific, prioritised forms of data (re)use. The representations also vary, ranging from complex data models to limited attribute data, with parallel variation in their forms and granularity. The variations affect the capacity of standards to accommodate different ontological and epistemic perspectives on processes and their features. The findings underline how the choice of a particular standard (and of standardisation as an approach for generating process documentation) has major consequences for which aspects of a process end up being documented and from what perspective, and thus for the usefulness and affordances of the resulting descriptions.

Originality/value

To the knowledge of the authors, no previous research exists on the extent to which common data documentation standards accommodate paradata.

Data documentation standards are cornerstones of what is considered effective data management. They are portrayed as a key remedy to diverse technical and operational conundrums from interoperability and inconsistency to difficulties and cost of data use. In the “data world” that clearly “does not lack standards”, inadequacies in existing schemes typically lead to proposals to develop new ones (Hayslett, 2015, p. 1). A recently highlighted problem is that data standards typically lack systematic means and guidance on documenting information on data creation, processing and use, termed paradata. At the same time, many standards contain elements useful in representing such aspects of data (Börjesson et al., 2020), and it has been observed that currently the main obstacle to comprehensive data documentation might not be the absence of technical means to represent data but the lack of exhaustive understanding of what needs to be documented (Huggett, 2012; Huvila, 2022). Rather than developing new schemes, as Cameron et al. suggest, examining “[w]here paradata may fit into existing metadata schemas would be another fruitful direction for further research” (Cameron et al., 2023, p. 16).

A parallel question to the technical adequacy of standards relates to the contextuality of both data and process descriptions (Million et al., 2025; Andersson et al., 2024) and to what extent the existing standards are hospitable enough to diverse understandings and conceptualisations of data-related processes to allow comprehensive and context-independent enough data documentation to facilitate reuse over time, across disciplines and beyond individual contexts of data making (Fenlon et al., 2025; Millerand and Bowker, 2009). In contrast to the literature describing and explaining individual standards, there are few comparative studies of their features and coverage (with exceptions focusing on specific domains and types of standards, e.g. Börjesson et al., 2020; Bettivia et al., 2022) to substantiate broader claims of the (in)adequacies of data standards and their suitability for comprehensive documentation of data and data-related processes and practices. To the knowledge of the authors, no previous research exists on the extent to which common data documentation standards accommodate paradata.

The aim of this study is to address this gap and generate new knowledge of data standards with a focus on to what extent they cover and how they conceptualise data-related processes (i.e. paradata). The focus of the study is to advance the general understanding of the opportunities and limitations of using data documentation standards as an approach for process documentation rather than investigating individual standards and their affordances to describe a specific type of process in detail. Three research questions were addressed: (RQ1) What types of processes are mentioned in the standards documents?, (RQ2) Where in the standards are the process related elements? and (RQ3) How are processes represented? A set of major domain-agnostic research data standards were selected for analysis to explore the variety of approaches to understanding and representing paradata.

The history of research data archiving and data documentation standards goes back several decades (Rasmussen, 2014). Currently, a large number of both domain-specific and domain-agnostic data documentation standards exist. They vary from domain-specific and generic sets of metadata elements for describing data to process standards and sets of principles. As with standards in general, they range from formal de jure standards to de facto standardised practices and pseudo-standards, that is, arrangements that look like standards without being standards (Weber, 1989). Datasets are also sometimes documented using non-data-specific documentation standards, although with inferior outcomes relating to discovery, ease of access and specificity of data and descriptions (Niu, 2016). Earlier comparisons of standards underline the divergent ontologies of individual standards and how the choice of standard affects how documented entities are described and represented (e.g. Oliveira et al., 2024; Pacheco et al., 2023).

Earlier research covers standardisation processes at standardisation bodies (Markus et al., 2006; Jiang et al., 2022; Kari, 2024; Lehr, 1992) although less so pertaining to data documentation standards (exceptions, e.g. Williams et al., 2017, 2004). Some earlier work exists describing the development of individual standards (e.g. Bøe Sollund and Holm-Olsen, 2013; Rodrigues and Teixeira Lopes, 2022). Others have studied the use of standards. Also here, the bulk of earlier work has not focused on data documentation standards. Of data standards, according to a European survey, the most popular ones followed by researchers relate to methodology, others being used much less frequently. Popular standards pertain to specific disciplines and research communities (DANS et al., 2022).

Standards support manual description of data but are also used for supporting automatic subject indexing (Yeung and Hall, 2007), enrichment of metadata records (e.g. Binding and Tudhope, 2024) and information retrieval. Benefits of using data documentation standards include the fact that they help making research data Findable, Accessible, Interoperable and Reusable as per the widely cited FAIR data principles by reducing variation and inconsistency in documentation (Zoldoske, 2024; Wilkinson et al., 2016), which may be considered a process related “meta” element in and of itself. Standardised documentation also helps, for example, to avoid mistakes and misunderstandings (Chapman, 2001) to an extent that some see proper standardised metadata as a prerequisite for data reuse especially in long term (D'Andrea, 2023; Borghi and Van Gulick, 2021).

Standardisation of data documentation also has drawbacks and can serve to lessen the range of documented processes available. Standards reduce the leeway scientists have in producing information while they promote “good recordkeeping” according to a particular set of norms (Shankar, 2009). They simplify (Tóth-Czifra, 2020; Edmond and Lehmann, 2021) and can make data too clean (Börjesson et al., 2022; also Ernst and Siegel, 2015, p. 89), hiding messiness and functioning as a “diversity-hiding trick” (Rawson and Muñoz, 2019, p. 290). Standardised expressions risk losing ancillary connotations of contextually selected terms and ways of documenting (Button et al., 2022) and diversity of perspectives of how to understand and describe phenomena (Feinberg, 2022), and can lead to what Gero (2007) terms problematic certitude over what has been ambiguous and uncertain. The “indexer inconsistency” between individual users of standards is not as much a question of variety of expression but, as Bates (1986) stresses, of variety of meaning that should not be reduced through standardisation. The shortcomings caused by inflexibility can be mitigated by implementing modular extensible standards (Behrens, 2022) and by instructing and providing means for additional explanations whenever standard categories are not enough. Both do, however, increase the complexity of the documentation effort and the risk of errors and non-compliance (cf. Sandoval, 2021).

The proliferation of standards is also problematic (Doerr, 2009; Khazraee Afzali, 2014). Exceeding specificity of standards can turn to a form of exclusion of those not part of a particular narrow community (Rathje et al., 2013). It causes significant problems not only for their intended users to learn them but also for repositories and archives to manage the increasing fragmentation of standardisation landscape (Katuu, 2023). Rather than developing new universal umbrella standards (Semeraro, 2016), especially in interdisciplinary work, it has become increasingly popular to conduct crosswalks and focus on mappings between individual standards to improve discoverability of documented assets (Richards, 2023; Iliadis et al., 2025).

Standards can also be both too simple or too complex, and both outcomes can result in processes and other aspects of the standards' domain not becoming sufficiently documented. A too simple standard might not provide enough support for systematic documentation. In contrast, many standards have been criticised for exceeding complexity that makes them too difficult to use (e.g. Hartley and Schjøtt, 2023; Guillem and Bruseker, 2017). Instead of reducing complexity they do the opposite (Hanseth et al., 2006).

Effective use of standards can be complicated by multiple factors. Even if the use of data standards generally improves systematicity, the presence of multiple individuals participating in documentation increases variation. Developing effective means to guide them to produce uniform documentation is difficult (Vetter, 2022). Data curation is also tedious and can feel rewarding (Plantin, 2021). It takes time, with complex data beyond what is needed to create and use it (Tsai et al., 2016), to an extent that standards are sometimes used by those who have the resources to perform compliance and good documentation rather than to demonstrate reliability (Bettivia and Stainforth, 2018). Standards-compliant data documentation cannot always be produced directly because of the immediate needs of data producers, which leads to an increasing risk of missing details as time passes (Zoldoske, 2024). In some fields, practicable standards are still missing (Tenopir et al., 2011; Tsai et al., 2016) and in others, their uptake (e.g. Lien-Talks, 2024; Marsh et al., 2024) remains limited. Similarly to controlled vocabularies (Lovins and Hillmann, 2017), standards also become obsolete and require maintenance and repair to remain relevant, which also needs to be documented (cf. Cheng et al., 2024). Similarly, the use of standards needs to be kept reflective to avoid turning them into empty rituals (Wagemann and Schneider, 2015).

Developing standards is complex, especially in “long-tail disciplines” (Chao et al., 2015) like archaeology where practices and terminology diverge (Dye and Buck, 2015). Comprehensive standardisation would require systematic change of field-specific practice (Lien-Talks, 2024) with an immanent risk of radically impoverishing it. Further, even when standards are implemented in such contexts, compliance often remains low as demonstrated by ample evidence from field archaeology (Vetter, 2022). Standards are also regularly applied late in the data creation process that increases documentation cost (Chao et al., 2015). A parallel complication is that the flexibility of standards sometimes undermines their capability to standardise (e.g. Force and Smith, 2021; Donnelly, 2016).

While the benefit of domain-specific standards is in how they can help to improve the usefulness (Bates et al., 1993) and specificity of descriptions, this specificity has also been suggested to hamper their applicability in interdisciplinary settings (Bak, 2015). Data standards in neighbouring disciplines and subfields are also often inconsistent (Lien-Talks, 2024), unsurprisingly leading to demands from users for easier conversion between standards (Tenopir et al., 2011). A corresponding boundary exists between data creators, managers and users. Standards vary in the extent to which they correspond with data searchers' and users' information needs and perspectives (Papenmeier et al., 2021) rather than merely reflecting data creators' and managers' ontological assumptions. Standards influence interaction between people and standardised knowledge, what aspects of knowledge become documented (Park, 2021) and what the premises are for considering that certain pieces of information are “same” or “different” for the purposes at hand of those who use the standard to establish or retrieve information (Busch, 2011). Such observations echo Bearman's (1993) urging to avoid implementing standards unless they have been shown to work in practice. Further, even if a standard is functional for a particular use, its implementation in systems and the systems themselves also have an impact on how it is applied (Hansson et al., 2022), with a direct consequence for what ends up being documented and how.

The importance of documenting the practices and processes of creating, processing and using data (paradata) is generally acknowledged as a crucial premise of ensuring the usability and usefulness of datasets (Rasmussen, 2014). The coverage and representations of such information vary. Chao (2014) investigates eight metadata schemes for scientific data to analyse how they represent methods descriptions. Half of the analysed schemes contained explicit metadata elements for methods descriptions whereas the others featured elements that implicitly documented aspects of methods use, for example, on sampling, spatial coverage of data or general summary of the data. Disciplinary variation in the standards and how they conceptualised methods was observed similarly to differences in whether methods information was represented in a single or multiple metadata elements. Chao notices further that the variation reflects differences in how the capturing of methods information is supported and prioritised in individual standards, but also that the presence of a metadata element for methods description is hardly enough to elicit comprehensive descriptions of how research methods are applied (Chao, 2014). In some fields the templates of describing data practices, for example, in terms of study descriptions in social sciences are well-established (Rasmussen, 2014) whereas in areas like field archaeology, the variation can be significant (Huvila and Sköld, 2026). Some standardisation efforts have approached the adequacy of process information by defining per audience category basis what information should be included (Bajena and Kuroczyński, 2023).

A review of heritage visualisation standards by Börjesson et al. (2020) shows that while the primary purpose of many of the reviewed schemes was not process documentation, many of them incorporate structures to represent complex processual information to an extent that using dedicated standards might not be necessary. Such standards include the major cultural heritage standard CIDOC-CRM (Doerr et al., 2007), the conceptual model of visual representation CHARM (Gonzalez-Perez et al., 2012) and several others such as the 3DICONS standard that builds on the earlier CARARE standard for the documentation of three-dimensional heritage objects (D'Andrea, 2013). Dedicated process documentation standards exist as well.

The lack of applicable descriptors in existing standards is sometimes overcome by extending standards. CIDOC-CRM is built with extendibility in mind and has attracted developers to propose a considerable number of extensions, several of which relate to process documentation. Current extensions cover archaeological excavations (CRMarchaeo, Doerr et al., 2018), provenance metadata (CRMdig, Theodoridou et al., 2010; Doerr et al., 2016), metadata about scientific observation, measurements and processed data in descriptive and empirical sciences (CRMsci, Doerr et al., 2014) and about argumentation and inference making in descriptive and empirical sciences (CRMinf, Stead and Doerr, 2015). Proofs of concept exist how to use them together for documenting inference chains, for instance, in stratigraphic reasoning (Guillem et al., 2024) and field observations (Marlet et al., 2019). Comparable new descriptors have also been proposed to other standards. For example, the DataCite documentation scheme (discussed later in more detail) has been proposed to be complemented with elements to increase its usefulness for documenting practice-based research and its outputs (Ranger et al., 2024).

There are also dedicated standards for documenting various aspects of processes. FAIR Implementation Profile (FIP) is linked data that describes decisions in a community regarding the implementation of FAIR principles (Wang et al., 2024). The Archimedes Palimpsest Metadata Standard is an example of a local process standard developed for documenting image processing (Bognar, 2023).

While standardisation has been identified as a key issue in improving interoperability and reusability of data, it has also been underlined that especially in fields characterised by diversity of data practices, it is equally important to pay attention to the practices of creating, managing and using data (Mosconi et al., 2022). This applies as much to documentation of processes as it is relevant to the documentation of data.

While the literature review provides a useful exposé of standards and standardisation, for the analysis, it is appropriate to consider the theoretical premises of standardisation in some more detail. In a very fundamental sense, standardisation is about cultivating manageability, similarity and comparability. It might be pursued, for example, through reducing complexity (Hanseth et al., 2006), improving interoperability, visibility and longevity, minimising bias (Reiter et al., 2024) and facilitating coordination (Jensen, 2010). Standards “are designed to be held in common” (Thévenot, 2009, p. 802). In contrast to optimistic expectations, making standards work as expected often takes a lot of effort (Jensen, 2010). They also need a lot of repair and maintenance, being a part of what has been described as an inherently broken rather than flawless world (Lovins and Hillmann, 2017). Their success depends on the contexts where they are developed but also on the motivations and intentions of those who develop and advocate them (Furner, 2021, p. 181). Standards are typically developed, conceptualised and portrayed as firm and comprehensive even if in practice they are at best means of “partially ordering people and things so as to produce outcomes desired by someone” (Busch, 2011, p. 13). They need not be interpreted in the same way. Their critical premise is rather “that they engage in actions that lead to the completion of the joint project” (Busch, 2011, p. 7). This means that standards and the similarities and comparabilities they enact are (instituted) institutions (Douglas, 1986; also, Busch, 2011) with historical trajectories (Thévenot, 2009) rather than transcriptions of natural orders of things. Like all classificatory systems, they formalise and affirm the significance of the elements included in them (cf. Leonelli, 2012).

Further, similarly to how formal procedures are not algorithms for producing work (Gasser, 1997), standards do not produce facsimiles of things but rather “are representations for consumption by outsiders” (Gasser, 1997, p. 122). In parallel to being a process of preservation, standardisation is one of forgetting (Mulvin, 2021), and standards are culprits in what McCullough (2015) discusses in terms of intrinsic information being increasingly at risk in contemporary society. Standards enact conventions, borrowing two terms from Thévenot (2009), by simultaneously quieting them through affixing what is established and agreed upon and inquieting them through exposing their occasional arbitrariness and inauthenticity in relation to what they standardise (Thévenot, 2009). As Langer (2023) points out, standards both reflect and contest existing power structures. As such, they can be approached in an ontological sense not merely as embodiments of practical ontologies (Jensen, 2010) but also of political ones (Blaser, 2009) to explicate the human-material entanglements and politics of their making.

The multiplicity of standards as ideally whole but in practice partial and political, and as instruments of quieting (or making invisible) and inquieting (making visible) the social world points to a pair of corresponding conceptual affordances (Holbraad and Pedersen, 2017) of using them to affix particular understandings of processes to make data documentation practicable for specific purposes, and at the same time to critique them, as Jensen (2021) might phrase it, to keep the cosmopolitics of knowledge alive amidst of the parochiality of standards.

A set of nine major general domain-agnostic research data standards was analysed to generate new knowledge on process information in data standards and on their framing and coverage of paradata. The set of standards is a convenience sample, although it is intended to provide a purposive selection of prominent schemes developed for documenting datasets, research and data processes, covering a reasonable variety of premises and ground. While the sample is not complete, the analysed standards represent major international initiatives, and the sample is thus, we argue, sufficient to provide workable insight into how paradata is represented in contemporary data documentation standards.

The analysis was based on close reading (DuBois, 2003) of standards documents to identify elements relevant to representing diverse aspects of data-related processes. A working spreadsheet with the facets explicated in Table 1 was used to facilitate the analysis and its documentation. A preliminary analysis was conducted by NN** and complemented by the first author. A complete analysis draft was reviewed and discussed by all authors, and revisited after nine months with the explicit aim of identifying faulty interpretations and missing elements. As a terminological note, the reviewed standards differ occasionally in their terminology regarding the definition and operationalisation of such technical terms as (controlled) vocabularies, attributes and elements. Considering the scope of the analysis we conducted, such differences are mostly not of direct consequence; whenever they are, we discuss the standards concerned in appropriate detail.

Table 1

Paradata in domain-agnostic research data related standards

| Standard | Processes covered | Process scope | Elements and representations (sample) | Granularity | Focus | Reference(s) |
|---|---|---|---|---|---|---|
| Data Package | Creation; Revisions | Data management | Licenses; Version (number); Sources (e.g. people, literature, identifiable using URIs and email addresses); Contributors (with role: author, publisher, maintainer, wrangler and contributor; organisation[al affiliation]); created (datetime) | Resource (Dataset) | Object-Attribute | Open Knowledge Foundation (2023) |
| DataCite Metadata Schema | Creation; Revisions | Identification; Retrieval | Creator (“main researchers involved in producing the data, or the authors of the publication, in priority order” with optional given name, family name, name identifier and affiliation sub-properties); Publisher (“name of the entity that holds, archives, publishes prints, distributes, releases, issues or produces the resource”); PublicationYear (date); Contributor (with optional given name, family name, name identifier and affiliation sub-properties); ContributorType (ContactPerson, DataCollector, DataCurator, DataManager, Distributor, Editor, HostingInstitution, Producer, ProjectLeader, ProjectManager, ProjectMember, RegistrationAgency, RegistrationAuthority, RelatedPerson, Researcher, ResearchGroup, RightsHolder, Sponsor, Supervisor, WorkPackageLeader, Other); Date (with type sub-property); RelatedItem (relation to another resource); Version (number); Rights (free text, URI, identifier); GeoLocation (with point, box, place and polygon sub-properties); FundingReference (with name, identifier and award related sub-properties) | Resource (Dataset) | Object-Attribute | DataCite Metadata Working Group (2021) |
| Dublin Core | Accrual; Creation; Replacement; Issuance; Modification | Search; Retrieval | accrualMethod (value from Collection Description Accrual Method Vocabulary); accrualPeriodicity (value from Collection Description Frequency Vocabulary); accrualPolicy (value from the Collection Description Accrual Policy Vocabulary); audience (non-literal values from a vocabulary of audience types); available (date); conformsTo (standard); contributor (property: agent); created (date); creator (property: agent); date (related to the lifecycle of the resource – date); dateAccepted (date); dateCopyrighted (date); dateSubmitted (date); educationLevel (of the intended audience – property: agent); extent (size/duration – property); hasVersion (property); isPartOf (property); isReferencedBy (property: resource); isReplacedBy (property: resource); isRequiredBy (property: resource); issued (date); isVersionOf (property: resource); license (property: document); mediator (access to resource – property: agent); modified (date); provenance (free text); publisher (property: agent); references (property: resource); replaces (property: resource); requires (property: resource); rights (free text); rightsHolder (URI); source (URI); temporal (period of time); valid (date range) | Resource (Resource) | Object-Attribute | DCMI Usage Board (2020) |
| OAI-ORE (Open Archives Initiative Object Reuse and Exchange) | (Data) lineage | Identify and describe the constituents of aggregations of digital resources | Lineage (relationships between two objects); proxyFor (aggregated resource); In addition uses Dublin Core Elements and Terms, Friend of a Friend Terms, RDF Terms and RDF Schema Terms | Aggregations of resources (Compound objects consisting of multiple distributed objects) | Object-Object | Open Archives Initiative Object Reuse and Exchange (2014) |
| PREMIS | Digital provenance (history of an object); Digital object lifecycle | Digital preservation | Comprehensive metadata model documenting objects, events, agents and rights, with an event entity that aggregates information related to one or more objects (description of events, dates, outcomes, agents involved), and metadata relating to objects documenting, e.g. preservationLevelRole (controlled vocabulary); preservationLevelValue (controlled vocabulary); preservationLevelRationale (free text); dates and agents for descriptions; versions; creatingApplication (name, version, date, extensions); inhibitors to access, use, migration; originalName; storage (where object is stored); environment used to render or execute an object; relationship (between objects, e.g. derivation; object sequences; to events; event sequences) | Resource (Digital object) | Entity-Relationship | PREMIS Editorial Committee (2015) |
| CERIF | Scientific research | Constituents of (incl. activities relating to) scientific research | Activity Types (incl. Project, Conference, Fellowship, Networking, Infrastructure, Studentship) and subtypes; Activity Structure (relationships); Activity Statuses; Events (conference, workshop); PersonEventInvolvement; Research Infrastructure Usage; Research data sets and databases (output) | Domain (Scientific research and its constituents) | Entity-Relationship | CERIF and CRIS Architectures Task Group (2012) |
| DDI | Research lifecycle in quantitative and qualitative social, behavioral, economic and health sciences research | Research lifecycle | Methodological objects (incl. sample selection, data capture, weighting, quality control, process management); Processing (incl. data capture, data processing, analysis, data management); Data management (incl. ownership, access, rights management, restrictions, quality standards, organization, agent management, relationship between products, versioning, provenance); Conceptual objects (incl. representation, universe); Quantitative and qualitative data objects (incl. universe, representation, usage, record relationships, storage, access) | Resource (Dataset) | Object-Attribute | DDI Alliance (2020) |
| PROV | Provenance | Exchange of provenance information in “heterogeneous environments such as the Web” (Groth and Moreau, 2013) | Entities; Activities; Agents | Resource (Provenance information relating to a resource) | Entity-Relationship | Groth and Moreau (2013) |
| NetCDF | History; Provenance | Modifications | History; Provenance attributes | Resource (Array-oriented scientific data) | Object-Attribute | Rew et al. (1989), Unidata (2019) |

The study analysed nine largely domain-agnostic standards for research and research data documentation. The focus was on identifying elements (i.e. descriptors, varyingly referred to as, e.g. metadata elements, attributes or descriptors) included for representing process-related information and on understanding how the standards conceptualise processes. We reviewed what data-related processes are covered in the investigated standards and the scope (and purpose) of representing processes, identified the elements of standards where the process descriptions are placed and investigated how the processes are represented in the standards by explicating the granularity (resource, aggregations of resources, domain) and focus (objects, entities and relationships) of representation. By granularity we refer to the principal entity type in the focus of the standard that sets its scope of detail, including that of the processes represented. These can be, for example, individual datasets, digital resources, their aggregations or, as with CERIF, “scientific research” as a whole. Focus refers to how the standard is structured, that is, whether it focuses on documenting objects and their attributes (Object–Attribute; e.g. datasets and their properties), objects and their relations to other objects (Object–Object) or entities and their relationships (Entity-Relationship, e.g. objects, events and agents and their relationships eventually forming chains, lineages or workflows). This helps to understand whether process descriptions are documented, for example, as attributes of objects or as relations between entities. A summary of the results is presented in Table 1.
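The difference between an Object–Attribute and an Entity-Relationship focus can be illustrated with a small sketch. The Python below is an assumption made for illustration only and is not drawn from any of the reviewed standards: the class names, relation names and sample values are all hypothetical. It records the same process (a dataset being cleaned by a curator) first as attributes of an object and then as relationships between entities that can be followed as a lineage.

```python
from dataclasses import dataclass, field


# Object-Attribute: process information is stored as properties of the dataset.
@dataclass
class DatasetRecord:
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


# Entity-Relationship: objects, events and agents are separate entities,
# and process information lives in the links between them.
@dataclass(frozen=True)
class Entity:
    kind: str  # hypothetical kinds: "object", "event" or "agent"
    label: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    relation: str  # hypothetical relation names, not taken from a standard
    target: Entity


def lineage(start: Entity, links: list[Relationship]) -> list[str]:
    """Follow the first outgoing link from each entity and return the labels visited."""
    chain = [start.label]
    seen = {start}
    current = start
    while True:
        step = next((link for link in links if link.source == current), None)
        if step is None or step.target in seen:
            return chain
        chain.append(step.target.label)
        seen.add(step.target)
        current = step.target


# Hypothetical sample data.
flat = DatasetRecord(
    "survey-2020",
    {"modified": "2021-03-01", "contributor": "A. Curator"},
)

raw = Entity("object", "survey-2020 (raw)")
cleaning = Entity("event", "cleaning")
clean = Entity("object", "survey-2020 (clean)")
curator = Entity("agent", "A. Curator")

links = [
    Relationship(raw, "wasInputTo", cleaning),
    Relationship(cleaning, "produced", clean),
]
performed_by = Relationship(cleaning, "performedBy", curator)

chain = lineage(raw, links)
print(chain)
```

In the attribute form, the cleaning event is only implied by a modification date and a contributor name, whereas in the relationship form it is an entity of its own that can be linked to inputs, outputs and agents.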

Data Package: Data Package is a “container format used to describe and package a collection of data” (Open Knowledge Foundation, 2023) with accompanying metadata. The standard is a part of the Frictionless Data project at the Open Knowledge Foundation with an aim of developing a non-proprietary alternative to closed datafile formats. Data Package is a generic JSON-based wrapper format for data exchange with a focus on describing the structure of data rather than a standard for long-term preservation, data citation or “documentation of research processes or protocols” (Fowler et al., 2018, p. 275). The obligatory metadata in Data Package consists of a short preferably human-readable “name” of the package. In addition, the standard recommends additional descriptors, including id, licenses, profile, title, description, home page (URI), version, sources, contributors, keywords, image and created (date).

Aligned with the standard's focus on collections of data, the process-relevant elements in Data Package (Table 1) describe a dataset, that is, the package (incl. version, datetime of creation, licenses) and its constituents (incl. sources, contributors), through object-attribute relations. Process-related information is thus documented in relation to a data package.
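The object-attribute pattern can be illustrated with a minimal sketch of a Data Package descriptor built in Python and serialised as JSON. The top-level property names are the descriptors listed above; all values, the sub-properties of the list entries and the helper `process_relevant` are hypothetical and serve illustration only.

```python
import json

# Hypothetical descriptor: top-level property names follow the Data Package
# descriptors named in the text; every value is invented for illustration.
descriptor = {
    "name": "excavation-finds",  # the only obligatory property
    "title": "Excavation finds, trench A",
    "version": "1.2.0",
    "created": "2023-05-04T10:15:00Z",
    "licenses": [{"name": "CC-BY-4.0"}],
    "sources": [{"title": "Field notebook 7"}],
    "contributors": [{"title": "A. Researcher", "role": "author"}],
}

# Assumption: the descriptors treated here as process-relevant.
PROCESS_KEYS = ("version", "created", "licenses", "sources", "contributors")


def process_relevant(desc):
    """Return only the attributes that carry process information."""
    return {key: desc[key] for key in PROCESS_KEYS if key in desc}


paradata = process_relevant(descriptor)
print(json.dumps(paradata, indent=2))
```

In this sketch all process information hangs off the package itself as attributes; there is no separate record of the activities that produced a given version.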

DataCite Metadata Schema: The aim of the DataCite Schema is to “make data citable, searchable and accessible” by furnishing datasets with uniform, consistent metadata. The schema has been developed and maintained since 2009 by the international DataCite Consortium (Brase et al., 2015). The focus is on identification of resources for citation and retrieval. DataCite Schema is intentionally generic and does not aim to replace community- or discipline-specific standards. Mandatory properties include Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The recent versions of DataCite Schema incorporate a scheme for defining relationships between resources using the relatedItem property and its relationType sub-property to specify the type of the relation (e.g. Cites, IsPartOf, IsDocumentedBy or IsOriginalFormOf) (DataCite Metadata Working Group, 2021).

From the perspective of process documentation, DataCite is a detailed (cf. e.g. Data Package) standard with a large number of attributes for documenting aspects of datasets (e.g. funding, dates, version, rights), related items and actors (incl. creators, different types of contributors, publishers) related to them through object-attribute relations.

Dublin Core: The Dublin Core Metadata and the Dublin Core Metadata Initiative have their origins in the discussions on semantics for web resources started at the second World Wide Web Conference in 1994 and continued the following year in a workshop held in Dublin, Ohio. The concern at the time before Internet search engines and large-scale harvesting and indexing of web data was that many web resources lacked harvestable descriptions to make them discoverable, manageable and useable (Weibel, 2009). The core standards in the Dublin Core stack are the Dublin Core Metadata Element Set (ISO 15836–1:2017 and ISO 15836 Part 2) from 2017/2022 and 2019 (ISO, 2017, 2019) and the DCMI Metadata Terms recommendation from 2020 (DCMI Usage Board, 2020).

The focus of Dublin Core is on developing and maintaining metadata standards for resource description. The process elements in the Dublin Core metadata focus on resource accrual, creation, issuance, modification and replacement of resources with other resources by documenting, for example, policies, audiences, related standards, agents (incl. creators and contributors), dates, rights and provenance information as free-text description (DCMI Usage Board, 2020) using object-attribute relations.

OAI-ORE (Open Archives Initiative Object Reuse and Exchange): The focus of Open Archives Initiative Object Reuse and Exchange standard is aggregations of web resources. Its aim is to make them identifiable and to help to describe the constituents or boundaries of individual aggregations (Open Archives Initiative Object Reuse and Exchange, 2014). OAI-ORE is linked to a broader initiative to redesign the scholarly record, to capture and preserve it and make it accessible (Calhoun, 2014).

The standard incorporates elements to describe the process through the lineage of aggregations of objects in terms of relationships between individual resources through object-object relations, that is relationships between documented objects. For additional descriptions not covered by the standard itself, OAI-ORE refers to Dublin Core Elements and Terms and other data documentation standards (Open Archives Initiative Object Reuse and Exchange, 2014).

PREMIS: The purpose of the PREMIS (Preservation Metadata: Implementation Strategies) standard is to implement “preservation metadata in digital preservation systems” (PREMIS Editorial Committee, 2015, p. 1). The work on the standard started in 2003, building on the work of the earlier Preservation Metadata Framework (PMF) working group, which, like PREMIS, was sponsored by the Online Computer Library Center (OCLC) and the Research Libraries Group (RLG). The standard is built on the Open Archival Information System (OAIS) model. Its principal target group is preservation repositories and its focus is on their information needs relating to long-term preservation of digital materials. PREMIS also aims to support the “viability, renderability, understandability, authenticity, and identity of digital objects” (PREMIS Editorial Committee, 2015, p. 1) in the context where they are preserved, with an emphasis on implementability in practice, even in automated workflows. The PREMIS controlled vocabulary is extensible to allow inclusion of unsupported metadata elements but purposely does not encompass domain-specific descriptive metadata, characteristics of agents, technical metadata, media details or business rules (PREMIS Editorial Committee, 2015).

The metadata model of the PREMIS standard incorporates a comprehensive apparatus of entities, including objects, events, agents and rights, that allow detailed documentation of processes. The event entity aggregates information on events relating to one or more objects, including descriptions, dates, outcomes and actors (agents). In addition, the object metadata incorporates elements to describe object and event sequences and relationships between objects, such as that an object supersedes or is superseded by, derives or is derived from, or is a source of another object. Moreover, PREMIS allows documentation of the ideal, actual and planned preservation level of objects with its associated rationale, applications used to create objects, inhibitors (i.e. barriers) to access, use and migration of objects, and where objects are stored (PREMIS Editorial Committee, 2015).

CERIF: The Common European Research Information Format (CERIF) is a “comprehensive information model for the domain of scientific research” (EuroCRIS, 2020) developed by the CERIF and CRIS Architectures Task Group of the euroCRIS association, originally dating back to the late 1980s (Asserson et al., 2002; EuroCRIS, 2020). The standard aims to facilitate interchange of research information between and within current research information systems (CRIS), that is, databases that store information on research activities for the purposes of research assessment, administration and institutional decision-making at research performing organisations. The focus of CERIF is on research processes at the research-administrative level rather than on the substance of research.

Applicable to process documentation, CERIF incorporates entities for Events (Conference, Workshop), individuals' involvement in events and Activity Types (e.g. Project, Fellowship), the status of activities and relationships between them for documenting scientific research processes. CERIF also allows documentation of, for example, research infrastructure use and production of research outputs that include a specific category for “Research data sets and databases” (EuroCRIS, 2020). The model is entity-relationship based and thus capable of complex representations of processes in detail; however, the focus of the standard is on information relevant to research administration (e.g. rights, agents, funding, preservation) rather than meticulous documentation of analysis or data collection tasks and their details for reproducibility or generation of new research.

Data Documentation Initiative (DDI): The Data Documentation Initiative (DDI) is a set of standards describing metadata for documenting and managing different stages of the research data lifecycle, conceptualised as a model with a set of distinct steps including conceptualisation, collection, processing, distribution, discovery, analysis and archiving of data (Vardigan, 2014). The initiative was started in 1995 with a focus on social science survey data but has expanded ever since to cover qualitative and quantitative research data in a broad sense (Green and Humphrey, 2014). There has also been a proposal to incorporate survey marginalia, that is, survey paradata, in DDI (Barkow et al., 2020).

Relevant to processes, the standard covers metadata for the documentation of research method (incorporating information on sample selection, data capture, quality control and process management), data processing (incl. analysis and data management) and data management (incl. data access, organisation, relationships between data products and provenance). In addition, conceptual objects and data objects include contextual information on, for example, data representation and relationships between data records. DDI incorporates an option to include full Dublin Core citations. There are also preliminary mappings between PREMIS and DDI-L and a planned METS profile for DDI (DDI Alliance, 2020). The focus of the standard is research data and it is based on documenting objects by attribute values. This applies also to methods and (data) processing, which are represented as objects with attributes.

PROV: The Provenance standard PROV is a group of specifications published in 2013 by the Provenance Working Group. It is a part of the Semantic Web technologies developed by the World Wide Web Consortium with the aim of defining a data model and serialisations for representing different aspects of provenance (Missier, 2017). For PROV, provenance refers to “information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness” (Groth and Moreau, 2013).

The focus of PROV is the exchange of provenance information in “heterogeneous systems” with an original emphasis on the Semantic Web, e-science and cyberinfrastructure and systems capable of producing and administering their own provenance data (Missier, 2017). It provides a “vocabulary” (in practice, an ontology) and a model to document the history of an object (Bettivia et al., 2022). ProvONE (Cuevas-Vicenttín et al., 2016) is an extension to the PROV model with the purpose of enabling the documentation of hybrid provenance, that is, both historical retrospective provenance and prospective provenance, the latter being “a plan, a workflow or a recipe” for future enactment of activities (Bettivia et al., 2022). PROV is based on three key concepts: entities are produced, used and manipulated by activities, which can be influenced in various ways by different agents. Entities often represent data but can be anything that can have a provenance. Unlike entities, which refer to fixed states, activities (e.g. drafting, editing, processing, consuming) have a duration. Agents, which may be humans or, for example, technical systems, “bear some form of responsibility” that an activity takes place, an entity exists or another agent acts in a particular manner (Missier, 2017).
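The interplay of entities, activities and agents can be sketched with plain Python tuples. The relation names (wasGeneratedBy, used, wasAssociatedWith) are taken from the PROV vocabulary; the identifiers such as "raw.csv" and the helper `lineage` are hypothetical, and the sketch makes no attempt to cover PROV serialisations or the full data model.

```python
# Each statement is (subject, relation, object). Relation names follow PROV;
# all identifiers are invented for illustration.
statements = [
    ("clean.csv", "wasGeneratedBy", "cleaning"),
    ("cleaning", "used", "raw.csv"),
    ("cleaning", "wasAssociatedWith", "alice"),
    ("raw.csv", "wasGeneratedBy", "survey"),
    ("survey", "wasAssociatedWith", "field-team"),
]


def lineage(entity, stmts):
    """Walk backwards from an entity through the activities that produced it."""
    steps = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in stmts:
            if subj == current and rel == "wasGeneratedBy":
                activity = obj
                # Agents responsible for the activity.
                agents = sorted(o for s, r, o in stmts
                                if s == activity and r == "wasAssociatedWith")
                # Entities the activity consumed; follow them further back.
                inputs = [o for s, r, o in stmts
                          if s == activity and r == "used"]
                steps.append((current, activity, agents))
                frontier.extend(inputs)
    return steps


history = lineage("clean.csv", statements)
for entity, activity, agents in history:
    print(f"{entity} <- {activity} (agents: {', '.join(agents)})")
```

Because activities are first-class records rather than attributes of a dataset, the chain from a derived file back to its origin can be traversed step by step. The sketch assumes an acyclic set of statements.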

Network Common Data Form (NetCDF): Rather than a mere data model and controlled vocabulary, NetCDF is a set of machine-independent data formats and software libraries that “support the creation, access, and sharing of array-oriented scientific data” and a “community standard for sharing scientific data” (Rew et al., 1989). NetCDF focuses on making data “self-describing” (i.e. a dataset includes information defining the data it contains) and “portable” (i.e. data is represented in a form accessible using computers storing variable data in different technical formats) (Unidata, 2019).

NetCDF uses attributes to keep (ancillary) data about data. From the process documentation perspective, the key attributes are history, for storing an audit trail composed of a line for each run of a program that has modified the dataset, preferably with date, time of day, user name, program name and command arguments, and provenance attributes, which can be used to store provenance information using versioned key-value pairs (Unidata, 2019). Technically the approach is object-attribute based; however, the opportunity to provide code as paradata allows capturing a computational process in its entirety. In contrast, the non-computational aspects of paradata can be documented using provenance attributes only.
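The history audit trail described above can be sketched without any NetCDF library by letting a plain dictionary stand in for a dataset's global attributes. The helper `append_history`, the user and program names and the line layout are assumptions made for illustration; in particular, whether new lines are appended or prepended is a convention that varies between tools.

```python
from datetime import datetime, timezone


def append_history(attrs, user, program, args, when=None):
    """Append one audit-trail line to the 'history' attribute."""
    when = when or datetime.now(timezone.utc)
    # One line per program run: date, time of day, user, program, arguments.
    line = f"{when:%Y-%m-%d %H:%M:%S} {user} {program} {' '.join(args)}".rstrip()
    previous = attrs.get("history", "")
    attrs["history"] = f"{previous}\n{line}" if previous else line
    return attrs


# A plain dict standing in for global attributes (assumption, not a real file).
attrs = {"title": "Hypothetical sea surface temperature grid"}
append_history(attrs, "alice", "regrid", ["--res", "0.5", "in.nc", "out.nc"],
               when=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
append_history(attrs, "bob", "mask_land", ["out.nc"],
               when=datetime(2024, 1, 3, 9, 0, 0, tzinfo=timezone.utc))
print(attrs["history"])
```

The resulting attribute is semi-structured text: each line is regular enough to read, but the computational steps it records remain a property of the dataset rather than separate entities.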

The findings show that while documenting and keeping paradata is not in the focus of any of the analysed standards – with PROV as a partial exception with its focus on provenance – they incorporate an extensive range of elements with contents qualifying as paradata (Table 1). This applies also to standards like Data Package, which is explicitly not about “documentation of research processes or protocols” (Fowler et al., 2018, p. 275), that is, paradata. These findings correspond to those of earlier studies, where process information was often observed to be present in standards documentation, although with varying levels of detail and framing (e.g. Chao, 2014; Börjesson et al., 2020).

The reviewed standards vary in what processes are represented (RQ1). The standards can be roughly categorised into repository standards, where the process focus is on the origins of datasets and their curatorial history (Dublin Core, PREMIS, DDI), and exchange and access standards (Data Package, DataCite, OAI-ORE, CERIF, PROV, NetCDF), with an emphasis on documenting processes to enable the (re)use of datasets. All analysed standards are selective in their representation of the totality of data-work. With the exception of CERIF, which focuses on documenting individual research activities with elements to document their relations to each other, the standards conceptualise processes from the outset of datasets or objects rather than processes, practices or actors. Even when a standard technically incorporates research as a whole (e.g. lineage in OAI-ORE, research lifecycle in DDI or scientific research in CERIF), its focus on particular artefacts or activities through set descriptors or examples in practice fixes its scope and perspective. From this perspective, while paradata and diverse data-related processes may end up being documented de facto, the analysed standards are highly selective in what should be documented de jure (cf. Weber, 1989).

Regarding the elements used for process information (RQ2), some of the standards included in the review provide comprehensive data models for representing processes and activities (e.g. PROV, PREMIS, DDI), while others incorporate process information in data(set)-related attribute data (e.g. Data Package, DataCite, Dublin Core). The first approach allows documentation of processes on a step-by-step basis from the perspective of events or activities, whereas the latter focuses on documenting states of data(sets). All analysed standards reduce data documenters' leeway on purpose, encouraging cleanness and hiding the messiness of processes – if their users let the standards quiet them (cf. Thévenot, 2009) – trading more naturalistic data descriptions and categorisations for other data potentialities, e.g. increased opportunities for aggregation and integration.

Processes are represented in the standards using different forms of data and levels of granularity (RQ3). The standards with elements to model processes and activities (e.g. PROV, PREMIS, DDI) come with the most detailed sets of elements to represent them. Simultaneously, they enforce specific framings of what the relevant processes and their representations are. Other standards encode aspects of process states using structured descriptors based on controlled vocabularies or strong typing, including dates, relationships between data objects and, for instance, agents involved in processes (e.g. Dublin Core, Data Package, DataCite, OAI-ORE). Finally, some of the standards incorporate semi-structured (e.g. history in NetCDF) and unstructured paradata elements (e.g. preservationLevelRationale in PREMIS, provenance in Dublin Core). For the representation of process information, the standards mix different approaches, evidently based on the focus of the standards and whether a particular detail was considered possible or relevant to itemise. For example, PROV focuses exclusively on modelling provenance, whereas it is of secondary significance for Dublin Core, which caters for it using an open-ended text field. While it is outside the scope of this study, the impact of structured versus open-ended text fields on the outcome of paradata would be highly relevant to investigate further.

The findings have several theoretical and practical implications to understanding the analysed standards and standards in general, including their opportunities and limitations pertaining to data-related processes and paradata. The reviewed standards have many similarities and many of them are used in tandem with each other to facilitate different aspects of data exchange, management and reuse. Yet it is also apparent that their perspectives to processes and relevant paradata differ.

The major line of division goes between approaches that focus on describing processes through lineages of states of datasets at particular moments in time (e.g. Dublin Core, OAI-ORE, Data Package) and approaches focusing on activities, their constituents and relations (e.g. CERIF, DDI, PROV, PREMIS). The two approaches differ theoretically in whether processes and their stages are represented as properties of objects or as chains of relations between entities (e.g. objects, events, agents). The entity-relationship based approaches allow richer, more complex and more flexible representation of processes, whereas the property-based object-attribute approaches are much simpler to work with and still adequate if the standard incorporates all descriptors deemed necessary for a specified purpose. Despite their greater a priori flexibility, the reviewed entity-relationship oriented standards also incorporate foundational assumptions of the significant relations and steps in processes. For example, DDI is based on a specific research data lifecycle model, CERIF has a list of activity and event types (see Table 1) and PREMIS emphasises preservation-relevant sequences of objects and events.
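The difference between the two approaches can be sketched by recording the same, invented, deduplication step in both forms. The field names in the first record are loosely modelled on attribute-style descriptors, and the relation names (hasSource, hasOutcome, performedBy) and identifiers are hypothetical; neither structure reproduces the actual schema of any of the reviewed standards.

```python
# Object-attribute: process information is stored as properties of the dataset.
dataset_record = {
    "title": "Survey responses, cleaned",
    "creator": "A. Researcher",
    "created": "2024-02-01",
    "modified": "2024-03-15",
    "provenance": "Cleaned from raw export; duplicates removed.",  # free text
}

# Entity-relationship: objects, events and agents are separate entities
# linked by explicit relations (hypothetical names), so steps can be traversed.
entities = {
    "obj:raw": {"type": "object"},
    "obj:clean": {"type": "object"},
    "evt:dedup": {"type": "event", "date": "2024-03-15"},
    "agt:ar": {"type": "agent", "name": "A. Researcher"},
}
relations = [
    ("evt:dedup", "hasSource", "obj:raw"),
    ("evt:dedup", "hasOutcome", "obj:clean"),
    ("evt:dedup", "performedBy", "agt:ar"),
]


def events_producing(obj_id):
    """Return the events whose outcome is the given object."""
    return [s for s, r, o in relations if r == "hasOutcome" and o == obj_id]


def agents_of(event_id):
    """Return the names of agents linked to an event."""
    return [entities[o]["name"] for s, r, o in relations
            if s == event_id and r == "performedBy"]


for event_id in events_producing("obj:clean"):
    print(event_id, entities[event_id]["date"], agents_of(event_id))
```

The first record is easier to fill in and query, but the step itself survives only as free text; the second makes the step addressable at the cost of more entities to create and maintain.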

The reviewed standards also differ in how they operate. Similar to what Bettivia and Stainforth (2018) observe of CIDOC-CRM, the ways all the analysed standards are formulated suggest that they are first and foremost accounting frameworks. They facilitate recording key aspects of a lifecycle of data understood from their specific frames of reference. The frames vary according to the standard and its focus on enabling, for example, data exchange (e.g. PROV, NetCDF), data description (e.g. Data Package, DDI), digital preservation (PREMIS), data citation (DataCite), research administration (CERIF) or resource description for retrieval (e.g. Dublin Core). For Data Package, the aim is explicitly not to describe research processes even if it allows documenting parts of them. Of the reviewed standards, OAI-ORE, DDI and NetCDF are most explicit about their goals of enabling data reuse. The purpose of none of the reviewed standards is to confirm the reliability of the data or describe datasets, resources or processes for general transparency or to make them universally intelligible for unspecified audiences or purposes. Instead, each of them is premised on highly specific ideas of what data is, how it is generated, processed and reused, and consequently what aspects of processes are relevant to document and how. NetCDF follows the logic of computational research, DDI that of quantitative social science and OAI-ORE that of digital web resources. These assumptions have consequences for what ends up being documented.

The different standards provide a broad variety of perspectives on processes and their key constituents. Much like what has been observed of provenance solutions (Longpre et al., 2024), they tend to address specific aspects of the broader issue that have been deemed relevant from the outset and envisioned purposes of the standard. At the same time, by their design they are all focused on creation and management of documentation and what is documented rather than how the documentation would best serve the interests of its users. This observation aligns with the earlier findings that point to the discrepancies in what paradata data creators and users find useful (Huvila et al., 2024). In an ideal world, as with the expert archaeologists studied by Sandoval (2021), users can bend and combine standards, adding information beyond their immediate stipulations and affordances. There is, however, a risk that this does not happen if the use of standards becomes too routinised, or if they are used without in-depth understanding of what is being documented, why and for what purpose.

A parallel challenge is that while standards offer diverse means to represent paradata, their varying focus on particular aspects of processes and their key constituents means that their structure and guidelines are likely to influence heavily what is being documented beyond what documenters find most relevant. Rather than records of processes, they become prescriptions of what a process is and how to record it (cf. Grossman, 2025). As Davet et al. (2023) note of Digital Preservation Information in the OAIS standard, even if they contain elements that can be useful to represent various aspects of processes, they do not necessarily contain all or even the majority of that information. Most standards direct attention to repository issues and curatorial history, and to generic research lifecycle-oriented conceptualisations of research processes that are at best rough approximations of how research is practised. Moreover, the paradigmatic assumptions (cf. “workflow paradigm” in Thomer et al., 2018) underpinning the standards have implications for how the represented processes become conceptualised in the documentation: in terms of computable workflows (e.g. PROV), attributes of objects (e.g. Dublin Core, DataCite) or entity-relationship bundles (e.g. PREMIS, CERIF). In contrast, using the reviewed standards to document research practices as per practice-oriented theorising (e.g. Nicolini, 2013; Gherardi, 2021) can be close to impossible.

Understanding how the studied standards facilitate describing and forgetting (as for Mulvin, 2021) aspects of processes also requires reflection on how the standards operate in the immediate vicinity of their “data world” (Hayslett, 2015, p. 1). While standards regulate and streamline paradata descriptions and categories in the way shown in Table 1, the informational space that they exist in is more extensive than what is covered by the standard per se. As shown by Börjesson et al. (2022), there are relatively ample and flexible opportunities to document a wide array of paradata in spaces that are connected to data description standards but not regulated by them, for example, in datasets and in various forms of documentation associated with the dataset (Huvila et al., 2021). This points to the importance of spatial considerations when documenting paradata: it is not only important what paradata to document and for what (reuse) purposes, but also where it is suitable to document it. The range of the homogenising impact of standards is not infinite, and the relationship between the standards and related but distinct informational spaces is heterogeneous.

Nevertheless, considering the influence of standards on human behaviour, “they should not be taken lightly”, also remembering that ignoring them or some of their elements likewise has an impact on what is documented and left out (Busch, 2011, p. 301): standards formalise, affirm (cf. Leonelli, 2012) and institutionalise what is included in them and silence what is not, following their historical trajectories (Thévenot, 2009; Douglas, 1986). Despite evident shortcomings and incompatibilities identified in the analysis, as Wagemann and Schneider (2015) suggest, standards can still be useful as points of reference. This applies to standards in general and to how they cover and neglect paradata. In the latter sense, as Broudoux (2024) suggests, comparisons between documentation standards and attempts to use them can be useful in revealing tensions in ideas of what is known and relevant to know.

The finding that the reviewed standards provide extensive technical means to represent paradata, even if their granularity and approach to process information vary considerably, confirms the earlier observations that the greatest problem relating to paradata is not how to represent it but identifying and understanding what needs to be documented (Huggett, 2012). While there is undoubtedly room for extending and developing existing data documentation standards and their guidelines to facilitate documentation of paradata, it is doubtful whether focusing on developing and introducing new paradata standards is enough to address process-related documentation needs (cf. Katuu, 2023; Rathje et al., 2013). New standards will incontrovertibly replicate the inherent limitations of the reviewed ones.

Rather than pursuing a perfect standard, a more practicable approach could be to choose a standard based on carefully reflected documentation needs and to be open and conscious about its limitations and epistemic consequences. A viable option is also to combine different standards, of which several are already linked or working towards greater compatibility, although there might be factors at play (epistemic, knowledge-organisational, institutional or even information system-related) that might limit or extend data authors' possibilities to choose. The caveat with using multiple standards and expectations of exhaustiveness is the increasing workload and complexity of documentation, which the literature shows can give rise to unwanted exclusion effects and less effective overlaps between knowledge domains and standards used (Rathje et al., 2013). The critique of the assumption of compositionality of knowledge (Hjørland, 2026) further suggests that adequate paradata is not necessarily the sum of a set of descriptors. The different epistemic underpinnings of the standards mean that individual pieces of information on processes might not be compatible with each other, or with the epistemic ideas that are planned to be represented, to form a comprehensive whole. Therefore, even if crosswalks and mappings between standards are important and in demand among users in how they facilitate searching and retrieval of data (Tenopir et al., 2011), they come with a risk of reducing the variety of meaning (Bates, 1986; Button et al., 2022) and contributing to what Gero (2007) terms problematic certitude on matters that are ambiguous. As Bates (1986) emphasises, the “invariable variety of meaning” manifested in the inconsistency of documentation – often falsely attributed to carelessness and lack of competence (cf. e.g. Zoldoske, 2024; Wilkinson et al., 2016) – means that it is better to help searchers orient to the complexity of information systems than to give a false impression of what Huvila (2016) has conceptualised as regimes of easiness and solvability. Using the standards to document processes to support data reuse rather than data management, or to incorporate other epistemic framings of how data generation is practised than those guiding their design, might be possible, albeit difficult and onerous. Considering the diversity of the reviewed standards and their perspectives on paradata, rather than directly starting with one of them, a better approach to context-sensitive documentation of processes is to start with deciding what needs to be documented and how, and only then to consider how and which available standards accommodate the needs, selecting appropriate schemes and developing documentation guidelines. Following the principle of subsidiarity (Busch, 2011), local documentation needs should be prioritised and generalisations pursued only when, for example, improving findability or technical interoperability of data requires some degree of global uniformity. Even then, however, it is important to avoid unnecessarily discarding nuances by limiting documentation to one narrow set of descriptors only, and instead to see them as a part of a broader totality of documentation that includes mutually complementing formal and informal information (cf. Bowker and Star, 2000).

Also, one must ask who chooses one standard over another, or whose decision it is to combine elements from two or more standards for a specific documentary purpose. The individual data author is often faced with elements from standards as they are represented in a metadata scheme to fill out to complete a work task like data preservation or data publishing. It is the information architect or system designer preparing the metadata scheme upstream who makes at least the macro-level choices of what process documentation elements to include and whether to make these mandatory or optional. Therefore, we must be cautious when imagining the individual data authors' intentions, options and actual freedom of choice in situations of data documentation.

The analysis shows that data documentation standards contain a plethora of elements that enable documenting processes but that they also, in different ways, delimit how and what aspects of processes turn out to be documented when they are followed. The reviewed standards vary in what specific processes they direct to represent. There is a strong tendency to focus on curatorial history and, in part, on selected aspects of processes for the purposes of enabling specific types of prioritised forms of (re)use of data. Also, the representations vary and range from complex data models to limited attribute data, with parallel variation in the forms and granularity of representations and direct consequences for the capacity of standards to accommodate different ontological and epistemic perspectives on processes and their features. This means that the studied standards facilitate and regulate process description in a variety of ways that might not be obvious, especially for non-specialists, for example, researchers or other data producers documenting their data-related processes. Naturally, even routinised specialists might risk assuming that a standard guarantees completeness beyond what it was designed for and is capable of delivering.

We posit that reviewing standards is important to understand their agency in relation to data authors and the potentialities of the resulting documentation to contribute, for instance, to data findability or aggregability. Every standard has a distinct set of (data) affordances that work alongside external impetuses for choosing one standard over another. This means that it is important to critically reflect on the influence of the agency of standards on the agency of data authors, and to render tangible the range of possibilities and strategies available for putting standards to work (cf. Huvila, 2018) when creating and curating datasets. Especially when working with supposedly domain-agnostic data documentation standards, the completeness of documentation in a particular context or for a specific task can be better secured by taking the process rather than the standard as a starting point.

The diversity, limitations and affordances of individual standards mean that standard-driven data work is not a monolithic practice. A watershed in the process of putting standards to work is the consideration of the trade-offs of different standards in what and how they push us to document. Cognisance that “cleaned” and standardised process descriptions might, in particular terms, convey information in a different way is key, as is considering what the benefits of standardising them might be. Grasping the trade-offs of standardising data descriptions – paradata – is, however, difficult. The literature indicates that paradata is commonly used and sought as a means of lessening epistemic distance between data creators and data re-users (Axelsen and Fredriksen, 2024), but the nature of this epistemic distance is contextual, as are the means to mitigate it and the extent to which specific standards are conducive to these goals.

We are grateful to our colleagues Dr. Ying-Hsang Liu, Dr. Michael Olsson, Dr. Jessica Kaiser and Ms. Zanna Friberg for useful discussions relating to data documentation standards and this manuscript.

Andersson, L., Huvila, I. and Sköld, O. (2024), “An introduction to paradata”, in Huvila, I., Andersson, L. and Sköld, O. (Eds), Perspectives on Paradata, Springer, Cham, Vol. 13, pp. 1-14.
Asserson, A., Jeffery, K.G. and Lopatenko, A. (2002), “CERIF: past, present and future: an overview”, in CRIS2002, euroCRIS, Brussels.
Axelsen, I. and Fredriksen, C. (2024), “Organically grown archaeological databases and their ‘messiness’: hobby metal detecting in Norway”, European Journal of Archaeology, Vol. 27 No. 3, pp. 1-21.
Bajena, I. and Kuroczyński, P. (2023), “Metadata for 3D digital heritage models. In the search of a common ground”, in Münster, S., Pattee, A., Kröber, C. and Niebling, F. (Eds), Research and Education in Urban History in the Age of Digital Libraries, Springer Nature Switzerland, Cham, Vol. 1853, pp. 45-64.
Bak, G. (2015), “Digital repository”, in Duranti, L. and Franks, P.C. (Eds), Encyclopedia of Archival Science, Rowman & Littlefield Publishers, Lanham, MD, pp. 170-173.
Barkow, I., Michaud, G., Perry, A., Schiller, D. and Thomas, W. (2020), “Paradata – from by-product to standard documentation”, Presentation at 12th Annual European DDI User Conference.
Bates, M.J. (1986), “Subject access in online catalogs: a design model”, JASIS, Vol. 37 No. 6, pp. 357-376.
Bates, M., Siegfried, S. and Wilde, D.N. (1993), “An analysis of search terminology used by humanities scholars: the Getty Online Searching Project report number 1”, The Library Quarterly, Vol. 63 No. 1, pp. 1-39.
Bearman, D. (1993), “Strategy for development and implementation of archival description standards”, Toward International Descriptive Standards for Archives; Papers Presented at the ICA Invitational Meeting of Experts on Descriptive Standards, National Archives of Canada, 4-7 October 1988, K. G. Saur, München, pp. 161-171.
Behrens, R. (2022), “Standards in a new bibliographic world”, JLIS.it, Vol. 13 No. 1, pp. 19-24.
Bettivia, R. and Stainforth, E. (2018), “Performative metadata: reliability frameworks and accounting frameworks in content aggregation data models”, in Chowdhury, G., McLeod, J., Gillet, V. and Willett, P. (Eds), Transforming Digital Worlds, Springer International Publishing, Cham, pp. 592-597.
Bettivia, R., Cheng, Y.-Y. and Gryk, M.R. (2022), Documenting the Future: Navigating Provenance Metadata Standards. Synthesis Lectures on Information Concepts, Retrieval, and Services, Springer, Cham.
Binding, C. and Tudhope, D. (2024), “KOS-Based enrichment of archaeological fieldwork reports”, Knowledge Organization, Vol. 51 No. 5, pp. 292-299.
Blaser, M. (2009), “Political ontology: cultural studies without ‘cultures’?”, Cultural Studies, Vol. 23 Nos 5-6, pp. 873-896.
Bøe Sollund, M.-L. and Holm-Olsen, I.M. (2013), “Monitoring cultural heritage in a long-term project: the Norwegian sequential monitoring programme”, Conservation and Management of Archaeological Sites, Vol. 15 No. 2, pp. 137-151.
Bognar, M.A. (2023), “Gaps in the standard: a case study of the multi-spectral imaging metadata strategies at KU Leuven libraries”, Journal of Library Metadata, Vol. 0 No. 0, pp. 1-20.
Borghi, J.A. and Van Gulick, A.E. (2021), “Data management and sharing: practices and perceptions of psychology researchers”, PLoS One, Vol. 16 No. 5, e0252047.
Börjesson, L., Sköld, O. and Huvila, I. (2020), “The politics of paradata in documentation standards and recommendations for digital archaeological visualisations”, Digital Culture and Society, Vol. 6 No. 2, pp. 191-220.
Börjesson, L., Sköld, O., Friberg, Z., Löwenborg, D., Pálsson, G. and Huvila, I. (2022), “Re-purposing excavation database content as paradata: an explorative analysis of paradata identification challenges and opportunities”, KULA: Knowledge Creation, Dissemination, and Preservation Studies, Vol. 6 No. 3, pp. 1-18.
Bowker, G.C. and Star, S.L. (2000), Sorting Things Out: Classification and its Consequences, MIT Press, Cambridge, MA.
Brase, J., Lautenschlager, M. and Sens, I. (2015), “The tenth anniversary of assigning DOI names to scientific data and a five year history of DataCite”, D-lib Magazine, Vol. 21 Nos 1/2.
Broudoux, E. (2024), “Systems of classification and categorization as revealing tensions within knowledge”, Knowledge Organization, Vol. 51 No. 4, pp. 277-285.
Busch, L. (2011), Standards: Recipes for Reality, MIT Press, Cambridge, MA.
Button, G., Lynch, M. and Sharrock, W. (2022), Ethnomethodology, Conversation Analysis and Constructive Analysis: On Formal Structures of Practical Action, Routledge, London.
Calhoun, K. (2014), Exploring Digital Libraries: Foundations, Practice, Prospects, Facet Publishing, London.
Cameron, S., Franks, P. and Hamidzadeh, B. (2023), “Positioning paradata: a conceptual frame for AI processual documentation in archives and recordkeeping contexts”, Journal on Computing and Cultural Heritage, Vol. 16 No. 4, pp. 1-19.
CERIF and CRIS Architectures Task Group (2012), Common European Research Information Format (CERIF), EuroCRIS, Nijmegen.
Chao, T.C. (2014), “Enhancing metadata for research methods in data curation”, Proceedings of the Association for Information Science and Technology, Vol. 51 No. 1, pp. 1-4.
Chao, T.C., Cragin, M.H. and Palmer, C.L. (2015), “Data Practices and Curation Vocabulary (DPCVocab): an empirically derived framework of scientific data practices and curatorial processes”, Journal of the Association for Information Science and Technology, Vol. 66 No. 3, pp. 616-633.
Chapman, H. (2001), “Understanding and using archaeological topographic surveys - the ‘error conspiracy’”, in Stančič, Z. and Veljanovski, T. (Eds), Computing Archaeology for Understanding the Past, CAA 2000, Computer Applications and Quantitative Methods in Archaeology, Proceedings of the 28th Conference of BAR International Series, Ljubljana, April 2000, Vol. 931, Archaeopress, Oxford, pp. 19-23.
Cheng, Y.-Y., Choi, I., Bettivia, R., Lee, W.-C. and Watson, B.M. (2024), “Knowledge organization systems and provenance: experiences and challenges”, Proceedings of the Association for Information Science and Technology, Vol. 61 No. 1, pp. 741-744.
Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., Belhajjame, K., Chirigati, F., Wei, Y., Dey, S., Kianmajd, P., Koop, D., Bowers, S., Altintas, I., Jones, C., Jones, M.B., Walker, L., Slaughter, P., Leinfelder, B. and Cao, Y. (2016), ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance, DataONE, Santa Barbara, CA.
DANS, DCC, Directorate-General for Research and Innovation (European Commission), EFIS and Visionary Analytics (2022), European Research Data Landscape: Final Report, Publications Office of the European Union, Luxembourg.
DataCite Metadata Working Group (2021), DataCite Metadata Schema Documentation for the Publication and Citation of Research Data and Other Research Outputs v4.4, DataCite.
Davet, J., Hamidzadeh, B. and Franks, P. (2023), “Archivist in the machine: paradata for AI-based automation in the archives”, Archival Science, Vol. 23 No. 2, pp. 275-295.
DCMI Usage Board (2020), DCMI Metadata Terms, DCMI, Leesburg, VA.
DDI Alliance (2020), DDI Lifecycle (3.3) Technical Guide, DDI Alliance, Ann Arbor, MI.
Doerr, M. (2009), “Ontologies for cultural heritage”, in Staab, S. and Studer, R. (Eds), Handbook on Ontologies, Springer, Berlin, pp. 463-486.
Doerr, M., Ore, C.-E. and Stead, S. (2007), “The CIDOC conceptual reference model - a new standard for knowledge sharing”, in Grundy, J., Hartmann, S., Laender, A.H., Maciaszek, L. and Roddick, J.F. (Eds), 26th International Conference on Conceptual Modeling (ER 2007), Auckland, New Zealand, Australian Computer Society, Sydney, pp. 51-56.
Doerr, M., Kritsotaki, A., Rousakis, Y., Hiebel, G. and Theodoridou, M. (2014), CRMsci: The Scientific Observation Model an Extension of CIDOC-CRM to Support Scientific Observation, FORTH, Heraklion.
Doerr, M., Stead, S. and Theodoridou, M. (2016), Definition of the CRMdig: An Extension of CIDOC-CRM to Support Provenance Metadata (version 3.2.1 ed), FORTH, Heraklion.
Doerr, M., Felicetti, A., Hermon, S., Hiebel, G., Kritsotaki, A., Masur, A., May, K., Ronzino, P., Schmidle, W., Theodoridou, M., Tsiafaki, D. and Christaki, E. (2018), Definition of the CRMarchaeo. An Extension of CIDOC CRM to Support the Archaeological Excavation Process (1.4.7 ed), PIN, University of Florence, Prato.
Donnelly, V. (2016), “A study in grey: grey literature and archaeological investigation in England 1990 to 2010”, Ph.D. thesis, University of Oxford, Oxford.
Douglas, M. (1986), How Institutions Think, Syracuse University Press, Syracuse.
DuBois, A. (2003), “Close reading: an introduction”, in Lentricchia, F. and DuBois, A. (Eds), Close Reading: A Reader, Duke University Press, Durham, NC, pp. 1-40.
Dye, T.S. and Buck, C.E. (2015), “Archaeological sequence diagrams and Bayesian chronological models”, Journal of Archaeological Science, Vol. 63, pp. 84-93.
D'Andrea, A. (2013), “3D-ICONS metadata schema for 3D objects”, Newsletter di Archeologia CISA, Vol. 4, pp. 159-181.
D'Andrea, A. (2023), I Dati Archeologici Nella Società Dell’informazione, UniorPress, Napoli.
Edmond, J. and Lehmann, J. (2021), “Digital humanities, knowledge complexity, and the five ‘aporias’ of digital research”, Digital Scholarship in the Humanities, Vol. 36 No. Supplement_2, pp. ii95-ii108.
Ernst, W. and Siegel, A. (2015), Stirrings in the Archives: Order from Disorder, Rowman & Littlefield, Lanham, MD.
EuroCRIS (2020), Main Features of CERIF, EuroCRIS Association, Nijmegen, available at: https://www.eurocris.org/cerif/main-features-cerif
Feinberg, M. (2022), Everyday Adventures with Unruly Data, MIT Press, Cambridge, MA.
Fenlon, K., Organisciak, P., Thomer, A. and Weber, N.M. (2025), “Conceptual models of the sociotechnical: introduction to special issue”, Journal of the Association for Information Science and Technology, Vol. 76 No. 2, pp. 349-352.
Force, D.C. and Smith, R. (2021), “Context lost: digital surrogates, their physical counterparts, and the metadata that is keeping them apart”, The American Archivist, Vol. 84 No. 1, pp. 91-118.
Fowler, D., Barratt, J. and Walsh, P. (2018), “Frictionless data: making research data quality visible”, IJDC, Vol. 12 No. 2, pp. 274-285.
Furner, J. (2021), Information Studies and Other Provocations: Selected Talks, 2000-2019, Library Juice Press, Sacramento, CA.
Gasser, L. (1997), “Introduction: design theory and CSCW”, in Bowker, G., Star, S.L., Turner, W. and Gasser, L. (Eds), Social Science, Technical Systems, and Cooperative Work: Beyond the Great Divide, Lawrence Erlbaum, Mahwah, NJ, pp. 121-129.
Gero, J.M. (2007), “Honoring ambiguity/problematizing certitude”, Journal of Archaeological Method and Theory, Vol. 14 No. 3, pp. 311-327.
Gherardi, S. (2021), “A posthumanist epistemology of practice”, in Neesham, C. (Ed.), Handbook of Philosophy of Management, Springer, Cham, pp. 1-22.
Gonzalez-Perez, C., Martín-Rodilla, P., Parcero-Oubiña, C., Fábrega-Álvarez, P. and Güimil-Fariña, A. (2012), “Extending an abstract reference model for transdisciplinary work in cultural heritage”, in Dodero, J., Palomo-Duarte, M. and Karampiperis, P. (Eds), Communications in Computer and Information Science, Springer Berlin Heidelberg, Vol. 343, pp. 190-201.
Green, A.E. and Humphrey, C. (2014), “Building the DDI”, IASSIST Quarterly, Vol. 37 Nos 1-4, p. 36.
Grossman, J.H. (2025), “How standards became documents: uniform screw threads and standardization in the age of industrial print”, Technology and Culture, Vol. 66 No. 3, pp. 649-673.
Groth, P. and Moreau, L. (2013), “PROV-overview: an overview of the PROV family of documents”, available at: http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
Guillem, A. and Bruseker, G. (2017), “The CIDOC CRM game: a serious game approach to ontology learning”, ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 62W5, pp. 317-323.
Guillem, A., van Ruymbeke, M., Eide, Ø. and de Luca, L. (2024), “Spatio-temporal reasoning on stratigraphic data in archaeology: formalization of the Harris Laws as inferences using CIDOC CRM”, SWODCH’24: 4th International Workshop on Semantic Web and Ontology Design for Cultural Heritage.
Hanseth, O., Jacucci, E., Grisot, M. and Aanestad, M. (2006), “Reflexive standardization: side effects and complexity in standard making”, MIS Quarterly, Vol. 30 No. Special Issue, pp. 563-581.
Hansson, K., Dahlgren, A.N. and Pargman, T.C. (2022), “Datafication and cultural heritage: critical perspectives on exhibition and collection practices”, Information and Culture, Vol. 57 No. 1, pp. 1-5.
Hartley, J.M. and Schjøtt, A. (2023), “Imagining publics through emerging technologies”, in Hartley, J.M., Sørensen, J.K. and Mathieu, D. (Eds), DataPublics: The Construction of Publics in Datafied Democracies, Bristol University Press, Bristol, pp. 99-120.
Hayslett, M. (2015), “Data world does not lack standards”, Journal of Librarianship and Scholarly Communication, Vol. 3 No. 2, 1245.
Hjørland, B. (2026), “Semantic primitives and compositionality”, Annual Review of Information Science and Technology, Vol. 77 No. 1, pp. 198-223.
Holbraad, M. and Pedersen, M.A. (2017), The Ontological Turn: An Anthropological Exposition, Cambridge University Press, Cambridge.
Huggett, J. (2012), “Promise and paradox: accessing open data in archaeology”, in Mills, C., Pidd, M. and Ward, E. (Eds), Proceedings of the Digital Humanities Congress 2012, Sheffield, 6–8th September 2012, Humanities Research Institute, Sheffield.
Huvila, I. (2016), “Affective capitalism of knowing and the society of search engine”, Aslib Journal of Information Management, Vol. 68 No. 5, pp. 566-588.
Huvila, I. (2018), “Putting to (information) work: a Stengersian perspective on how information technologies and people influence information practices”, The Information Society, Vol. 34 No. 4, pp. 229-243.
Huvila, I. (2022), “Improving the usefulness of research data with better paradata”, Open Information Science, Vol. 6 No. 1, pp. 28-48.
Huvila, I., Andersson, L. and Sköld, O. (2021), “Citing methods literature: citations to field manuals as paradata on archaeological fieldwork”, Information Research, Vol. 27 No. 3.
Huvila, I., Andersson, L. and Sköld, O. (2024), “Patterns in paradata preferences among the makers and reusers of archaeological data”, Data and Information Management, Vol. 8 No. 4, 100077.
Huvila, I. and Sköld, O. (2026), “A fieldwork manual as a regulatory device: instructing, prescribing and describing documentation work”, Journal of Information Science, Vol. 52 No. 2, pp. 615-630.
Iliadis, A., Acker, A., Stevens, W. and Kavakli, S.B. (2025), “One schema to rule them all: how Schema.org models the world of search”, Journal of the Association for Information Science and Technology, Vol. 76 No. 2, pp. 460-523.
ISO (2017), ISO 15836-1:2017.
ISO (2019), ISO 15836-2:2019.
Jensen, C.B. (2010), Ontologies for Developing Things: Making Health Care Futures through Technology, Brill, Leiden.
Jensen, C.B. (2021), “Practical ontologies redux”, Berliner Blätter, Vol. 84, pp. 93-104.
Jiang, H., Motohashi, K., Liu, W. and Zhang, X. (2022), “Knowledge-oriented leadership and technology standard innovation: a temporary-team perspective”, Journal of Knowledge Management, Vol. 26 No. 8, pp. 2061-2083.
Kari, M. (2024), “Toimikuntatyöskentely Tietojohtamisen Käytäntönä: Tapaustutkimus Rakentamisalan Ohjeistuksen Laatimisesta”, Ph.D. thesis, Tampere University, Tampere.
Katuu, S. (2023), “Soup du jour – existing and emerging trends in archives and records management standardization”, Records Management Journal, Vol. 34 No. 1, pp. 15-28.
Khazraee Afzali, S.E.A. (2014), “Archaeology of archaeology: a study of the creation of archaeological knowledge in practice”, Unpublished doctoral dissertation, Drexel University, Philadelphia.
Langer, S. (2023), “The relation of standards and power in management and organization research: core notions and alternative avenues”, International Journal of Management Reviews, Vol. 25 No. 4, pp. 647-665.
Lehr, W. (1992), “Standardization: understanding the process”, Journal of the American Society for Information Science, Vol. 43 No. 8, pp. 550-555.
Leonelli, S. (2012), “Classificatory theory in biology”, Biological Theory, Vol. 7 No. 4, pp. 338-345.
Lien-Talks, A. (2024), “How FAIR is Bioarchaeological data: with a particular emphasis on making archaeological science data reusable”, Journal of Computer Applications in Archaeology, Vol. 7 No. 1, pp. 246-261.
Longpre, S., Mahari, R., Obeng-Marnu, N., Brannon, W., South, T., Kabbara, J. and Pentland, S. (2024), “Data authenticity, consent, and provenance for AI are all broken: what will it take to fix them? An MIT exploration of generative AI”.
Lovins, D. and Hillmann, D. (2017), “Broken-world vocabularies”, D-lib Magazine, Vol. 23 Nos 3/4.
Markus, M.L., Steinfield, C.W., Wigand, R.T. and Minton, G. (2006), “Industry-wide information systems standardization as collective action: the case of the U.S. residential mortgage industry”, MIS Quarterly, Vol. 30 No. Special Issue, pp. 439-465.
Marlet, O., Zadora-Rio, E., Buard, P.-Y., Markhoff, B. and Rodier, X. (2019), “The archaeological excavation report of Rigny: an example of an interoperable logicist publication”, Heritage, Vol. 2 No. 1, pp. 761-773.
Marsh, D.E., Fenlon, K., Sorensen, A.H. and Wise, N.M. (2024), “Primary sources as linked data: exploring motives across the sciences and social sciences”, Proceedings of the Association for Information Science and Technology, Vol. 61 No. 1, pp. 232-245.
McCullough, M. (2015), Ambient Commons: Attention in the Age of Embodied Information, MIT Press, Cambridge, MA.
Millerand, F. and Bowker, G.C. (2009), “Metadata standards: trajectories and enactment in the life of an ontology”, in Lampland, M. and Star, S.L. (Eds), Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life, Cornell University Press, Ithaca, pp. 149-165.
Million, A.J., York, J., Lafia, S. and Hemphill, L. (2025), “Data, not documents: moving beyond theories of information-seeking behavior to advance data discovery”, Journal of the Association for Information Science and Technology, Vol. 76 No. 4, pp. 649-664.
Missier, P. (2017), “Provenance standards”, in Liu, L. and Özsu, M.T. (Eds), Encyclopedia of Database Systems, Springer, New York, pp. 1-8.
Mosconi, G., Karasti, H. and Randall, D. (2022), “Designing a data story: an innovative approach for the selective care of qualitative and ethnographic data”, in Burkhardt, M., van Geenen, D., Gerlitz, C., Hind, S., Kaerlein, T., Lämmerhirt, D. and Volmar, A. (Eds), Interrogating Datafication: Towards a Praxeology of Data, transcript Verlag, Bielefeld, pp. 207-230.
Mulvin, D. (2021), Proxies: The Cultural Work of Standing in, Polity, Cambridge.
Nicolini, D. (2013), Practice Theory, Work, and Organization: An Introduction, Oxford University Press, Oxford.
Niu, J. (2016), “Organisation and description of datasets”, Archives and Manuscripts, Vol. 44 No. 2, pp. 1-13.
Oliveira, C.C.D., Löw, M.M. and Barros, T.H.B. (2024), “Knowledge organization possibilities for archives: comparative semantic analysis between CIDOC-CRM and RiC-CM”, Knowledge Organization, Vol. 51 No. 5, pp. 362-370.
Open Archives Initiative Object Reuse and Exchange (2014), ORE Specifications and User Guides, Open Archives Initiative, Ithaca, NY.
Open Knowledge Foundation (2023), “Data package”, available at: https://specs.frictionlessdata.io/guides/data-package/#suite-of-specifications
Pacheco, A., Da Silva, C.G. and De Freitas, M.C.V. (2023), “A metadata model for authenticity in digital archival descriptions”, Archival Science, Vol. 23 No. 4, pp. 629-673.
Papenmeier, A., Krämer, T., Friedrich, T., Hienert, D. and Kern, D. (2021), “Genuine information needs of social scientists looking for data”, Proceedings of the Association for Information Science and Technology, Vol. 58 No. 1, pp. 292-302.
Park, J. (2021), “An actor-network perspective on collections documentation and data practices at museums”, Museum and Society, Vol. 19 No. 2, pp. 237-251.
Plantin, J.-C. (2021), “The data archive as factory: alienation and resistance of data processors”, Big Data and Society, Vol. 8 No. 1.
PREMIS Editorial Committee (2015), PREMIS Data Dictionary for Preservation Metadata (3.0 ed), Library of Congress, Washington, DC.
Ranger, H., Evans, J. and Moore, A.V. (2024), “A DOI is not enough – can practice research be captured by libraries and archives?”.
Rasmussen, K.B. (2014), “Social science metadata and the foundations of the DDI”, IASSIST Quarterly, Vol. 37 Nos 1-4, pp. 28-35.
Rathje, W.L., Shanks, M. and Witmore, C. (2013), Archaeology in the Making: Conversations through a Discipline, Routledge, London.
Rawson, K. and Muñoz, T. (2019), “Against cleaning”, in Gold, M.K. and Klein, L.F. (Eds), Debates in the Digital Humanities 2019, University of Minnesota Press, Minneapolis, pp. 279-292.
Reiter, S.S., Staniuk, R., Kolář, J., Bulatović, J., Rose, H.A., Ryabogina, N.E., Speciale, C., Schjerven, N., Paulsson, B.S., Lee, V.Y.K., Canteri, E., Revill, A., Dahlberg, F., Sabatini, S., Frei, K.M., Racimo, F., Ivanova-Bieg, M., Traylor, W., Kate, E.J., Derenne, E., Frank, L., Woodbridge, J., Fyfe, R., Shennan, S., Kristiansen, K., Thomas, M.G. and Timpson, A. (2024), “The BIAD standards: recommendations for archaeological data publication and insights from the big interdisciplinary archaeological database”, Open Archaeology, Vol. 10 No. 1, 20240015.
Rew, R., Davis, G., Emmerson, S., Cormack, C., Caron, J., Pincus, R., Hartnett, E., Heimbigner, D., Appel, L. and Fisher, W. (1989), Unidata NetCDF, UCAR/NCAR - Unidata.
Richards, J.D. (2023), “Getting it Together: combining information about archaeological sites and artefacts in ARIADNE”, Internet Archaeology, No. 64.
Rodrigues, J. and Teixeira Lopes, C. (2022), “Describing data in image format: proposal of a metadata model and controlled vocabularies”, Journal of Library Metadata, Vol. 22 Nos 3-4, pp. 1-21.
Sandoval, G. (2021), “Single-context recording, field interpretation and reflexivity: an analysis of primary data in context sheets”, Journal of Field Archaeology, Vol. 46 No. 7, pp. 496-512.
Semeraro, G. (2016), “Form, function and descriptive analysis in archaeology”, Les Nouvelles de l’archéologie, Vol. 144 No. 144, pp. 26-28.
Shankar, K. (2009), “Ambiguity and legitimate peripheral participation in the creation of scientific documents”, Journal of Documentation, Vol. 65 No. 1, pp. 151-165.
Stead, S. and Doerr, M. (2015), CRMinf: The Argumentation Model - an Extension of CIDOC-CRM to Support Argumentation (version 0.7 ed), Paveprime, Purley.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A.U., Wu, L., Read, E., Manoff, M. and Frame, M. (2011), “Data sharing by scientists: practices and perceptions”, PLoS One, Vol. 6 No. 6, e21101.
Theodoridou, M., Tzitzikas, Y., Doerr, M., Marketakis, Y. and Melessanakis, V. (2010), “Modeling and querying provenance by extending CIDOC CRM”, Distributed and Parallel Databases, Vol. 27 No. 2, pp. 169-210.
Thévenot, L. (2009), “Postscript to the special issue: governing life by standards: a view from engagements”, Social Studies of Science, Vol. 39 No. 5, pp. 793-813.
Thomer, A.K., Wickett, K.M., Baker, K.S., Fouke, B.W. and Palmer, C.L. (2018), “Documenting provenance in noncomputational workflows: research process models based on geobiology fieldwork in Yellowstone National Park”, JASIST, Vol. 69 No. 10, pp. 1234-1245.
Tóth-Czifra, E. (2020), The Risk of Losing the Thick Description: Data Management Challenges Faced by the Arts and Humanities in the Evolving FAIR Data Ecosystem, Open Book Publishers, Cambridge, pp. 235-266.
Tsai, A.C., Kohrt, B.A., Matthews, L.T., Betancourt, T.S., Lee, J.K., Papachristos, A.V., Weiser, S.D. and Dworkin, S.L. (2016), “Promises and pitfalls of data sharing in qualitative research”, Social Science and Medicine, Vol. 169, pp. 191-198.
Unidata (2019), NetCDF User's Guide, UniDATA, Boulder, CO.
Vardigan, M. (2014), “The DDI matures: 1997 to the present”, IASSIST Quarterly, Vol. 37 Nos 1-4, p. 45.
Vetter, J. (2022), “The history of fieldwork”, Histories, Vol. 2 No. 4, pp. 457-465.
Wagemann, C. and Schneider, C. (2015), “Transparency standards in qualitative comparative analysis”, Qualitative and Multi-Method Research, Vol. 13 No. 1, pp. 38-42.
Wang, S., Maineri, A., Singh, N.K. and Kuhn, T. (2024), “FAIR implementation profiles for social science”, in Garoufallou, E. and Sartori, F. (Eds), Metadata and Semantic Research, Springer, Cham, pp. 284-290.
Weber, L. (1989), “Archival description standards: establishing a process for their development and implementation: report of the working group on standards for archival description”, The American Archivist, Vol. 52 No. 4, pp. 431-537.
Weibel, S.L. (2009), “Dublin Core Metadata Initiative (DCMI): a personal history”, in Encyclopedia of Library and Information Sciences (3 ed), CRC Press.
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J., Groth, P., Goble, C., Grethe, J.S., Heringa, J., ’t Hoen, P.A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. and Mons, B. (2016), “The FAIR Guiding Principles for scientific data management and stewardship”, Scientific Data, Vol. 3 No. 1, 160018.
Williams, R., Bunduchi, R., Gerst, M., Graham, I., Pollock, N., Procter, R. and Voss, A. (2004), “Understanding the evolution of standards: alignment and reconfiguration in standards development and implementation arenas”, Proceedings of the 4S & EASST Conference, EASST, Paris, pp. 225-239.
Williams, R.D., Shankar, K. and Eschenfelder, K.R. (2017), “Two views of the data documentation initiative: stakeholders, collaboration and metadata standards creation”, Proceedings of the Association for Information Science and Technology, Vol. 54 No. 1, pp. 455-462.
Yeung, A.K.W. and Hall, G.B. (2007), “Spatial data standards and metadata”, in Yeung, A.K.W. and Hall, G.B. (Eds), Spatial Database Systems: Design, Implementation and Project Management, Springer Netherlands, Dordrecht, pp. 129-173.
Zoldoske, T. (2024), “Metadata for discovery. Planning for an information network”, Internet Archaeology, No. 65.
Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence are set out in the CC BY 4.0 licence.
