Copyright and AI – Who Owns the Future of British Media?
The UK government is exploring the creation of a copyright-cleared British media asset training dataset, which could provide AI developers with high-quality content for training models. The outcome of this debate will define not just the future of British creative industries, but also the ethical framework that governs AI development.
But does this risk undermining creators’ rights, or can a balance be struck between AI innovation and fair compensation?
In our view, revising copyright law to facilitate the creation of AI training sets raises much wider ethical questions.
Cultural heritage pervades all aspects of society and our daily lives. It provides a livelihood not only for artisans but also for those who work in the tourism industry; it is the very essence of the nation’s character.
Once freely given, cultural heritage cannot be recovered. Its loss is irreversible, and the UK will have no control over how its cultural assets are used or interpreted by other nations.
This is not a new problem; it was first encountered with the distribution of music. Practical and enduring regulations for the distribution of music and the fair compensation of composers, based upon the Statute of Anne (1710), have been in place for centuries.
There are theoretical reasons to believe that data-driven AI has reached maturity. As AI systems saturate existing datasets, adding more data offers diminishing returns. Future performance improvements will come not from harvesting ever larger quantities of data, but from a deeper mathematical understanding of how these algorithms function and from refined model architectures.
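The shape of this argument can be made concrete with a minimal numerical sketch, assuming the power-law relationship between dataset size and model loss reported in published scaling-law studies (all constants below are illustrative assumptions, not measured values):

```python
# Illustrative scaling law: loss(N) = A * N**(-B) + irreducible error.
# The constants are hypothetical, chosen only to show the curve's shape.
A, B, IRREDUCIBLE = 10.0, 0.095, 1.7

def loss(n_tokens: float) -> float:
    """Modelled loss for a training set of n_tokens tokens."""
    return A * n_tokens ** (-B) + IRREDUCIBLE

for n in [1e9, 1e10, 1e11, 1e12]:
    gain = loss(n) - loss(10 * n)  # improvement bought by 10x more data
    print(f"{n:.0e} tokens: loss {loss(n):.3f}, gain from 10x more data {gain:.3f}")
```

Each tenfold increase in data buys a smaller improvement than the last, while the irreducible term never moves: more scraping cannot overcome it, but better algorithms and architectures might.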
This consideration is never mentioned by those who benefit from high barriers to entry in the AI market, namely OpenAI and Meta. They promote the view that the only way to maximise the economic benefits of AI, benefits that have not been independently and objectively quantified by the UK Government, is by ‘scraping’ data from the output of current and historical contributors.
If the future of AI does not depend upon current machine learning techniques, what logic is there in weakening the regulation that protects the works of the past?
1. Should content creators be compensated?
Arguments for Compensation
Moral rights: Creators deserve fair recognition and reward if their works contribute to profitable AI systems.
Economic incentive: Without clear financial models the creative sector may suffer, with AI firms benefiting while creators are left unpaid. In a UK DCMS consultation, 78% of respondents from the creative sector expressed concerns about the lack of compensation for AI training use.
Growing global trend: Countries like France and Japan are exploring copyright levies and licensing schemes for AI training. Over 20 lawsuits (as of 2024) are ongoing worldwide, including major cases against OpenAI, Meta, and Stability AI for unlicensed use of copyrighted content.
Counterarguments
Enforcement complexity: AI models train on billions of data points; it is often unclear which works contributed meaningfully to the model.
Innovation slowdown: Complex licensing structures could stifle small AI developers or research labs with limited budgets.
Data transformation: Some argue that once data is processed into embeddings or patterns, it's no longer a direct reproduction.
Key Policy Questions
Should the UK implement a collective licensing model, like PRS for Music, to manage AI dataset rights?
Copyright regulations in Commonwealth countries currently afford greater protection to creators than the regulatory frameworks of the EU and USA. Is this sustainable, given that large corporations are lobbying their governments to protect them from litigation?
2. Fair use vs. AI exploitation
AI Innovation Arguments
Training models require scale: AI systems need enormous datasets; some proponents argue that training qualifies as non-expressive, transformative use.
Of course, this is an assertion, not a fact, and the gain in capability achieved by increasing the size of the dataset does not appear to be significant. Recent iterations of LLMs have been described by independent researchers as ‘nothing burgers’, sometimes representing a regression in performance.
Fair dealing (the UK equivalent of fair use) covers research, non-commercial study, parody, and news reporting; some think AI training could fall within these exceptions. The EU AI Act includes provisions requiring transparency about training datasets and opt-out mechanisms for copyright holders.
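One concrete form an opt-out mechanism can take is a machine-readable crawl policy. The sketch below uses Python’s standard robots.txt parser; the crawler name is a hypothetical example, since opt-out conventions under the EU AI Act are still being standardised:

```python
from urllib import robotparser

# A hypothetical AI-training crawler checking a site's opt-out policy
# before collecting any content for a training set.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's crawl policy

USER_AGENT = "ExampleAITrainingBot"  # assumed name, for illustration only
page = "https://example.com/articles/some-story"

if rp.can_fetch(USER_AGENT, page):
    print("No opt-out: the page may be fetched, with provenance recorded.")
else:
    print("Opt-out detected: the page is excluded from training data.")
```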
Do we want to promote a situation where everyone is faced with approval mechanisms like those implemented by website providers for the collection and distribution of data? Is there any merit in obliging creators to display constraints on the use of information in their shops, offices and historical sites?
Lack of viable alternatives: If AI developers can't use web-scale data, only Big Tech with deep licensing deals will remain competitive.
“There is no alternative, you will be destroying your economic growth if you don’t do what we say.”
This strapline, implicitly or explicitly promoted by corporate AI providers, masks the reality that they fund AI research only into methods that consume excessive amounts of energy and data, because these technologies raise the barriers to entering their markets. This is despite strong evidence that classical methods of data analysis are orders of magnitude more efficient than current data-driven AI algorithms.
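As a purely illustrative sketch of that efficiency claim (a toy comparison, not a benchmark of production systems): on a problem with known linear structure, a classical estimator reaches the answer in a single closed-form solve, while the same fit by gradient descent, the workhorse of data-driven AI, needs thousands of iterative passes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1))
y = 3.0 * x + 0.5 + rng.normal(0, 0.05, size=(1000, 1))  # noisy linear signal

# Classical method: ordinary least squares, one closed-form solve.
X = np.hstack([x, np.ones_like(x)])           # design matrix [x, 1]
theta = np.linalg.lstsq(X, y, rcond=None)[0]  # a single factorisation
print("OLS (slope, intercept):", theta.ravel())

# Data-driven method: the same fit by gradient descent.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):                         # thousands of updates vs. one solve
    err = (w * x + b) - y
    w -= lr * (err * x).mean()
    b -= lr * err.mean()
print("Gradient descent (slope, intercept):", (w, b))
```

Both recover a slope near 3.0 and an intercept near 0.5; the difference is the computation spent getting there. The point generalises only so far, but it shows why exploiting known structure is cheaper than learning it from scratch.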
Copyright Protection Arguments
Training ≠ research: Commercial AI firms training on content for profit likely fall outside fair dealing exceptions.
Unclear legal precedent: No UK court has ruled definitively on whether AI training constitutes infringement; the uncertainty may chill both sides. The Generative AI Copyright Report (UKIPO, 2024) found no legal clarity on whether training on copyrighted material without permission is lawful in the UK.
Creator backlash: Visual artists, authors, and journalists have formed coalitions to push back against data scraping without consent. The UK dropped a controversial proposal in 2022 to allow text and data mining for commercial AI training after widespread creative sector opposition.
Key Policy Questions
Should the UK create a mandatory AI training registry, where developers declare which datasets were used?
To what extent can synthetic or synthetic–human hybrid datasets avoid these copyright pitfalls?
An emerging example is the recently announced NVIDIA Isaac GR00T N1 toolset, which enables humanoid robots to replicate human tasks within virtual 3D spaces. This approach synthesises entirely new training datasets without relying on copyrighted material, offering a potential route to innovation without the legal and ethical baggage of scraping real-world content.
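A toy sketch of the underlying idea (this is not the GR00T API; the simulator here is invented for illustration): a procedural simulator emits state–action pairs for a simple reaching task, so every record in the resulting dataset is generated rather than scraped:

```python
import random

def simulate_reach(start, target):
    """Greedy expert in a toy 2D grid, yielding (state, action) pairs."""
    x, y = start
    while (x, y) != target:
        dx = (target[0] > x) - (target[0] < x)   # step toward target on x
        dy = 0 if dx else (target[1] > y) - (target[1] < y)
        yield ((x, y), (dx, dy))                 # state and expert action
        x, y = x + dx, y + dy

random.seed(0)
dataset = []
for _ in range(1000):                            # synthesise 1000 episodes
    start = (random.randint(0, 9), random.randint(0, 9))
    target = (random.randint(0, 9), random.randint(0, 9))
    dataset.extend(simulate_reach(start, target))

print(f"{len(dataset)} state-action pairs, every one generated, none scraped")
```

Scaled up to physics-accurate 3D simulation, the same pattern can yield arbitrarily large task datasets whose provenance is entirely known.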
3. Cultural and ethical implications
Concerns About AI Impact
Cultural dilution: Over-reliance on past content risks creating derivative AI-generated media that lack originality or authenticity.
Erosion of media trust: Without transparency, AI-generated content may be mistaken for genuine journalism or creative work. The DCMS Creative Industries Sector Vision forecasts that AI adoption could contribute £30 billion to the UK creative economy by 2030, but only with trust and IP frameworks in place.
Labelling and provenance: Just as GM foods and synthetic voices require labels, so too should AI-generated cultural content. A YouGov poll (2024) found 74% of British adults support mandatory labelling of AI-generated cultural content.
Without clear labelling standards, trust in AI-generated media may collapse, undermining both consumer confidence and the reputation of British creative industries.
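To make labelling and provenance concrete, here is a minimal sketch of what a provenance record attached to a published asset might contain. The field names are assumptions for illustration; a real deployment would more likely adopt an existing standard such as C2PA:

```python
from dataclasses import dataclass, asdict
import hashlib, json

@dataclass
class ProvenanceRecord:
    """Hypothetical label attached to a piece of published media."""
    content_sha256: str            # fingerprint of the published asset
    ai_generated: bool             # the headline disclosure
    model_name: str                # the generating system, if any
    training_data_declared: bool   # were the training datasets declared?
    human_edits: bool              # was a human involved after generation?

def label(content: bytes, **meta) -> str:
    """Produce a JSON provenance label for the given content."""
    record = ProvenanceRecord(
        content_sha256=hashlib.sha256(content).hexdigest(), **meta)
    return json.dumps(asdict(record), indent=2)

print(label(b"example article text",
            ai_generated=True, model_name="example-model",
            training_data_declared=False, human_edits=True))
```

A record like this makes the headline question, was this AI-generated, machine-checkable, which is the precondition for the trust frameworks discussed above.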
There is virtually no discussion about the introduction of quality metrics for AI-generated content. Identifying the sources used to generate AI content is a step forward, but what is also required is a measure of how good that data may be.
Opportunities from AI
Accessibility and scale: AI tools can help local creators scale distribution, enhance creativity, or reach underserved audiences.
Preserving heritage: AI can be used to restore old films, digitise archives, or create new interpretations of British cultural works.
Final Thoughts
The UK faces a pivotal choice: create a world-leading framework that protects creators while enabling AI, or risk fragmenting trust in its cultural industries, including tourism.
The current discussions around copyright revisions are being driven by individuals who stand to benefit the most from a relaxed scheme.
There is no coordinated response by scientists and engineers to investigate the validity of the claims of economic benefit arising from data-driven AI. It is unlikely that a rigorous analysis can be performed without government support. The risk of inaction is a 21st-century South Sea Bubble: a speculative frenzy driven by unproven promises, destined to collapse.