Many data providers use non-standardized, heterogeneous licenses. These licenses use different terms for the same rights and cover varying levels of detail and different facets, which makes data licensing complex and ambiguous. As a result, sound licensing terms are difficult to create and licenses are hard to understand. These issues may further undermine data owners' intellectual property rights and impede the full use of data. We address the problem by creating a knowledge organization system, specifically a faceted classification, to represent and organize key data licensing information, thereby lowering the barriers to data licensing and sharing.
We used grounded facet analysis, which follows the steps of facet analysis and applies grounded theory, to generate the facets and foci of the classification. We used a stratified, systematic sampling method to select licenses from 73 data repositories. We conducted three rounds of iterative testing to ensure the reliability and validity of the classification. We validated the final product by assessing its theoretical soundness and practical implementation.
We created the Terms of Use Faceted Classification (TUFC) for research data. The study demonstrates that TUFC is an effective tool for representing domain knowledge in data licensing.
TUFC is a foundational knowledge organization system for data licensing. It breaks down complex licenses into facets and foci, providing a common language and structure that resolves the issues caused by license heterogeneity. TUFC helps data providers create sound terms of use for research data, helps users quickly understand data licenses and thereby promotes data reuse, and gives data repositories a formal scheme for checking research data licenses.
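As a rough illustration of how a facet-and-focus scheme could support license checking, the sketch below codes a license as a mapping from facet to focus and checks it against a small controlled vocabulary. The facet names, foci, and function names here are hypothetical placeholders chosen for the example; they are not TUFC's actual facets or foci, and this is not the validation procedure used in the study.

```python
# Hypothetical scheme: each facet maps to its allowed foci.
# These names are illustrative assumptions, NOT the actual TUFC vocabulary.
SCHEME = {
    "commercial_use": {"permitted", "prohibited", "requires_permission"},
    "redistribution": {"permitted", "prohibited"},
    "attribution": {"required", "not_required"},
}


def check_license(coded):
    """Return a list of problems found in a facet -> focus coding."""
    problems = []
    for facet, focus in coded.items():
        if facet not in SCHEME:
            problems.append(f"unknown facet: {facet}")
        elif focus not in SCHEME[facet]:
            problems.append(f"unknown focus for {facet}: {focus}")
    # A license that is silent on a facet is flagged rather than guessed at.
    for facet in SCHEME:
        if facet not in coded:
            problems.append(f"missing facet: {facet}")
    return problems


def differences(a, b):
    """Return the facets on which two coded licenses disagree."""
    return {f: (a.get(f), b.get(f)) for f in SCHEME if a.get(f) != b.get(f)}


# Two made-up licenses coded against the hypothetical scheme.
license_a = {
    "commercial_use": "prohibited",
    "redistribution": "permitted",
    "attribution": "required",
}
license_b = {
    "commercial_use": "permitted",
    "redistribution": "permitted",
}

print(check_license(license_a))
print(check_license(license_b))
print(differences(license_a, license_b))
```

Because both licenses are expressed in the same facets, a repository could flag gaps in one license and compare two licenses facet by facet instead of reading differently worded legal text.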
