Purpose

This paper investigates how innovation teams construct shadow archives, defined as unauthorised, machine-readable knowledge bases that circumvent copyright restrictions, to fuel retrieval-augmented generation (RAG) systems under competitive time pressure. The study examines whether such evasion measurably improves innovation outcomes and how institutional actors, specifically academic librarians, enable or tolerate these practices.

Design/methodology/approach

A computational digital ethnography was conducted during a two-week university bootcamp for 20 student teams (N = 60) preparing for the 2025 Mathematical Contest in Modeling. The research triangulated (1) Sentence-BERT (SBERT) vector-space comparison of each team's shadow archive against official library holdings, (2) RAG query-log analysis to quantify shadow dependency and (3) participant observation and post-competition interviews to trace data-laundering routines and librarian mediation.
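
The two quantitative measures named above can be illustrated with a minimal sketch. The abstract does not state how either was computed, so everything here is an assumption: that the vector-space comparison is a cosine distance between the centroids of two sets of precomputed embeddings, and that shadow dependency is the share of retrieved chunks in a query log that came from the shadow archive. The function names, the `"shadow"` and `"licensed"` labels and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty set of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance, 1 - cos(u, v), between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def shadow_dependency(retrieved_sources: Sequence[str]) -> float:
    """Assumed metric: fraction of retrieved chunks labelled 'shadow'."""
    if not retrieved_sources:
        return 0.0
    shadow_hits = sum(1 for label in retrieved_sources if label == "shadow")
    return shadow_hits / len(retrieved_sources)


# Toy stand-ins for embeddings (real SBERT vectors have hundreds of dimensions).
shadow_vectors = [[1.0, 0.0], [1.0, 0.0]]
library_vectors = [[0.0, 1.0], [0.0, 1.0]]
divergence = cosine_distance(centroid(shadow_vectors), centroid(library_vectors))

# Hypothetical query log: one source label per retrieved chunk.
dependency = shadow_dependency(["shadow", "licensed", "shadow", "shadow"])
```

A centroid comparison is only one plausible reading; averaging pairwise distances between documents would give a different figure, and the study may have used either or something else entirely.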

Findings

Shadow archives diverged semantically from licensed collections (mean cosine d = 0.47, p < 0.001) and incorporated 68% grey literature, preprints and pirated texts. Teams with higher shadow dependency achieved significantly better competition scores (β = 0.62, p < 0.001, R² = 0.68). Librarians facilitated this outcome through studied ambiguity: teaching digital rights management (DRM) removal tools while disclaiming legal responsibility, thereby normalising a five-stage data-laundering pipeline (acquisition, DRM circumvention, OCR, format conversion, vectorisation).

Originality/value

The study reconceptualises copyright not as a binary compliance variable but as a socio-material boundary that spawns parallel knowledge subsystems. It introduces data laundering as an empirical process and quantifies an innovation premium from copyright evasion, demonstrating that rigid licensing paradoxically undermines the knowledge resilience it purports to protect.

Licensed re-use rights only