AI as Infrastructure, Not Innovation Theatre
Moving from hype to operational reality, and why the difference matters more than ever.
There is a pattern in how many organisations have approached AI over the past few years, and it is becoming harder to ignore. A working group is assembled. A pilot is commissioned, often in a corner of the organisation where risk is low and visibility is high. A demo goes well. Leadership is impressed. A press release is issued. And then, quietly, the project stalls. It never makes it into daily operations. It is not maintained, not measured, and not improved. Six months later, the same organisation launches another pilot.
This is innovation theatre. And it is extraordinarily common.
According to a RAND Corporation analysis covering over 2,400 enterprise AI initiatives, around 80% of AI projects fail to deliver their intended business value. That failure rate is roughly twice that of conventional IT projects, and it has barely shifted in three years. Some studies put the figure higher still: one MIT assessment found that 95% of organisations deploying generative AI had seen zero measurable return in terms of documented business impact. Gartner has forecast that, through 2026, 60% of AI projects lacking AI-ready data will be abandoned, not because the technology failed, but because the organisational and data foundations were never in place.
These are not edge-case statistics from peripheral researchers. They represent a systemic failure not of AI technology, but of how organisations are choosing to deploy it. The problem is not that AI does not work. The problem is that most organisations are treating it as a performance rather than a system.
What Innovation Theatre Actually Looks Like
Innovation theatre tends to follow a recognisable script. It begins with a demonstration: a tool that can summarise documents, generate reports, answer queries, or classify images. The demonstration works. Under controlled conditions, with a curated dataset and a sympathetic audience, most AI tools work. The question is what happens next.
In the theatre model, what happens next is a press release, a conference presentation, or an entry in an annual report. The AI "initiative" serves primarily as a signal to funders, stakeholders, and competitors that the organisation is modern and forward-looking. It is not primarily intended to solve a problem.
The tell-tale signs are familiar to anyone who has been on the inside of these deployments:
Success metrics are defined loosely or not at all before the work begins.
The system is run by a small project team rather than integrated into established workflows.
Maintenance is not budgeted for after the initial deployment.
Staff training is superficial or absent.
When the project lead moves on, the initiative quietly dies.
This is not a technology failure. Leadership and organisational factors drive the majority of AI project failures: the absence of clear success criteria, weak ownership of outcomes, and the habit of treating AI as a technology problem rather than a business and operational one.
The cost is not just financial, though that is real enough. Abandoned AI initiatives erode organisational trust in the technology, demoralise the staff who invested in them, and make the next genuine attempt harder to fund and harder to sustain.
From Experiment to Infrastructure
The organisations that extract consistent value from AI are not necessarily those with the largest budgets or the most sophisticated models. They are those that have made a conceptual shift: from treating AI as a one-off experiment, to treating it as infrastructure.
Infrastructure is a useful frame because it sets expectations differently. Nobody builds a server room, runs it for three months to generate interest, and then switches it off. Infrastructure is designed to run. It is maintained, monitored, and upgraded over time. When it breaks, there is a process to fix it. When the organisation's needs change, the infrastructure is updated to reflect that.
The same logic applies to AI systems. A well-deployed AI tool embedded in an operational workflow is maintained, not abandoned. It is monitored for accuracy and drift. It is updated when the data it relies on changes. The people who use it are trained not just in how to operate it, but in what it can and cannot do, and what to do when it gives a wrong answer.
This last point matters enormously in the context of responsible AI practice. Human-in-the-loop design is not just a regulatory requirement for high-risk systems under frameworks such as the EU AI Act; it is what makes infrastructure-grade AI systems trustworthy in practice. The system surfaces information and recommendations; humans apply judgement, catch errors, and remain accountable for decisions. That division of responsibility is not a limitation of AI. It is good engineering.
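One way to make that division of responsibility concrete is to record every decision against a named human reviewer, with the system's suggestion stored alongside it. The following is a minimal sketch only; the class names, fields, and the 0.8 confidence threshold are illustrative assumptions, not taken from any particular framework or regulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    """What the system surfaces (hypothetical structure)."""
    case_id: str
    suggestion: str
    confidence: float  # assumed to be a score between 0 and 1


@dataclass(frozen=True)
class Decision:
    """What is actually recorded: a human always owns the outcome."""
    case_id: str
    outcome: str
    decided_by: str
    model_suggestion: str
    overridden: bool


def needs_detailed_review(rec: Recommendation, threshold: float = 0.8) -> bool:
    """Flag low-confidence suggestions for closer scrutiny (threshold is an assumption)."""
    return rec.confidence < threshold


def record_decision(rec: Recommendation, reviewer: str,
                    outcome: Optional[str] = None) -> Decision:
    """Log the final decision; refuse to record one without a named reviewer."""
    if not reviewer.strip():
        raise ValueError("a decision must be attributed to a named reviewer")
    final = rec.suggestion if outcome is None else outcome
    return Decision(
        case_id=rec.case_id,
        outcome=final,
        decided_by=reviewer,
        model_suggestion=rec.suggestion,
        overridden=(final != rec.suggestion),
    )


# Example: the reviewer disagrees with a low-confidence suggestion.
rec = Recommendation(case_id="case-001", suggestion="approve", confidence=0.62)
flagged = needs_detailed_review(rec)
decision = record_decision(rec, reviewer="j.smith", outcome="refer")
```

Keeping the model's suggestion and the human outcome in the same record also yields an override rate over time, which is one practical signal of whether the system is earning its place in the workflow.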
For SMEs, public sector bodies, heritage organisations, and research institutions operating on constrained resources, the infrastructure framing also has practical implications for how investments are sized and justified. A superficial deployment of a fashionable tool is easy to approve and easy to write off. A considered, well-specified AI system embedded in a genuine workflow (one with defined success criteria, documented maintenance requirements, and a named owner) is a much harder investment to justify at the outset, but a far better one over time.
Measuring What Matters
One of the most reliable predictors of AI project failure is the absence of defined success metrics before development begins. Research suggests that pre-defining quantified success criteria can improve the chances of a successful outcome by a factor of four or more. This is not a complex finding. It is the same principle that applies to any serious piece of work: you need to know what you are trying to achieve in order to know whether you have achieved it.
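As an illustration of what "quantified success criteria" might look like when written down before development begins, here is a minimal sketch. The metric names and target values are hypothetical; a real project would agree its own with operational staff and leadership.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SuccessCriterion:
    """A single measurable target agreed before the work starts."""
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


def evaluate(criteria: List[SuccessCriterion],
             observed: Dict[str, float]) -> Dict[str, bool]:
    """Return {criterion name: met?}. A missing measurement counts as not met."""
    results = {}
    for criterion in criteria:
        value = observed.get(criterion.name)
        results[criterion.name] = value is not None and criterion.is_met(value)
    return results


# Hypothetical criteria, fixed at project start.
CRITERIA = [
    SuccessCriterion("weekly_adoption_rate", target=0.60),
    SuccessCriterion("error_rate", target=0.05, higher_is_better=False),
    SuccessCriterion("minutes_saved_per_case", target=10.0),
]

# Hypothetical measurements three months after deployment.
observed = {"weekly_adoption_rate": 0.72, "error_rate": 0.08}
report = evaluate(CRITERIA, observed)
```

Treating an unmeasured criterion as unmet is a deliberate choice in this sketch: it makes "we never collected that figure" visible rather than silently passing.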
In practice, the AI projects most likely to succeed are those that start with a problem, not a technology. The question is not "what can we do with a large language model?" but "where in our operations are we making decisions that take too long, that carry unnecessary uncertainty, or that rely on information that is hard to retrieve and synthesise?" AI may or may not be the right answer to those questions. But starting from a genuine operational need produces a very different kind of project than starting from a desire to appear innovative.
Measurement also needs to continue after deployment. AI systems are not static. The data they depend on changes; the problems they are solving evolve; the people using them develop new habits and new questions. Without ongoing monitoring, a system that worked at launch may quietly degrade over months or years without anyone noticing. In sensitive applications, where AI is informing decisions about individuals, managing scarce resources, or providing guidance in high-stakes contexts, that kind of unmonitored drift is not merely a waste of money. It is a governance risk.
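Ongoing monitoring does not have to be elaborate. The sketch below assumes that a sample of the system's outputs is periodically checked by a human and marked correct or incorrect; it then compares recent accuracy over a rolling window against the accuracy measured at launch. The class name, window size, and tolerance are illustrative assumptions, and a production system would likely track more than one signal.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Compare rolling accuracy on human-reviewed samples with a launch baseline."""

    def __init__(self, baseline_accuracy: float, tolerance: float = 0.05,
                 window: int = 200, min_samples: int = 50) -> None:
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.min_samples = min_samples
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, was_correct: bool) -> None:
        """Store the result of one human-reviewed output."""
        self.outcomes.append(bool(was_correct))

    def recent_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_drifted(self) -> bool:
        """True once enough samples exist and accuracy falls below baseline minus tolerance."""
        if len(self.outcomes) < self.min_samples:
            return False
        return self.recent_accuracy() < self.baseline - self.tolerance


# Example: launch accuracy was 92%; recent reviews show 4 in 5 outputs correct.
monitor = AccuracyMonitor(baseline_accuracy=0.92, tolerance=0.05,
                          window=100, min_samples=50)
for i in range(100):
    monitor.record(i % 5 != 0)  # every fifth reviewed output is wrong

drifted = monitor.has_drifted()
```

The `min_samples` guard avoids raising an alarm on a handful of early reviews, at the cost of a short delay before degradation can be flagged.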
For organisations in the public sector and heritage sector, where accountability is heightened and data provenance carries particular importance, robust measurement is also part of the accountability architecture. Being able to demonstrate that a system is performing as intended, on known data, with understood limitations, is foundational to maintaining public trust.
Data: The Infrastructure Beneath the Infrastructure
There is a reason that data quality appears consistently at the top of AI failure analyses. Gartner has estimated that only around 12% of organisations have data of sufficient quality to support AI applications at scale, meaning that the vast majority are, in effect, building on unstable foundations.
This is a problem that cannot be solved by selecting a better model or a more powerful platform. It can only be solved by treating data governance as a first-class engineering concern: knowing where data comes from, how it was collected, what its known gaps and biases are, and how it should be updated and maintained over time. Data provenance is not a bureaucratic nicety. It is the precondition for AI systems that produce results you can trust and account for.
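To make this concrete, a provenance record can be as simple as a small structured entry kept alongside each dataset. The sketch below is one possible shape, assuming the fields named in the paragraph above (origin, collection method, known gaps, update schedule); the field names, the example dataset, and the review interval are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class DatasetProvenance:
    """Minimal provenance record for one dataset (illustrative fields only)."""
    name: str
    source: str                 # where the data comes from
    collection_method: str      # how it was collected
    last_updated: date
    review_interval_days: int   # how often it should be refreshed or reviewed
    known_gaps: List[str] = field(default_factory=list)

    def is_overdue(self, today: date) -> bool:
        """True if the dataset has gone longer than its review interval without an update."""
        return today - self.last_updated > timedelta(days=self.review_interval_days)

    def undocumented_fields(self) -> List[str]:
        """Names of the descriptive fields that have been left blank."""
        return [
            name for name in ("source", "collection_method")
            if not getattr(self, name).strip()
        ]


# Hypothetical record for a digitised collection catalogue.
catalogue = DatasetProvenance(
    name="collection_catalogue",
    source="in-house digitisation programme",
    collection_method="",
    last_updated=date(2024, 1, 1),
    review_interval_days=180,
    known_gaps=["pre-1950 accessions only partly transcribed"],
)

overdue = catalogue.is_overdue(today=date(2024, 12, 1))
blanks = catalogue.undocumented_fields()
```

A check like `undocumented_fields` can be run before any dataset is connected to an AI system, so that gaps in documentation surface as a blocking issue rather than a later surprise.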
For organisations working with sensitive or specialised data, such as patient records, heritage collections, environmental monitoring data, or planning documents, this problem is particularly acute. The value of these datasets comes precisely from their richness and their context. Deploying AI systems against them without adequate governance of the underlying data is not innovation. It is a way of compounding existing data quality problems at scale.
The good news is that investment in data foundations (documentation, curation, quality assurance, and governance) is not wasted if an AI deployment is later changed or replaced. Sound data infrastructure is reusable and generalisable. The same cannot be said for a failed pilot.
What Infrastructure-Grade AI Looks Like in Practice
The principles here are not complex, even if they require sustained effort to apply. Organisations moving from innovation theatre to operational AI infrastructure tend to share a few common characteristics.
They define the problem before selecting the tool. The starting point is always an operational challenge with clear stakes, not a technology capability seeking an application.
They set success criteria up front, in terms that operational staff and senior leadership can both understand and verify. These are not technical metrics alone. They include adoption rates, error rates, time savings, and decision quality measures that map to real outcomes.
They invest in integration. The AI system is part of the workflow, not adjacent to it. Staff training covers not just operation but limitations: what the system does not do well, and what to do in those cases.
They budget for maintenance and improvement from the outset. Deployment is not the end of the project. It is the beginning of an operational system that will need to be monitored, updated, and periodically reviewed.
And they design for human oversight at every stage. The system generates outputs; humans make decisions. That accountability structure is made explicit, documented, and enforced in practice, not just described in a policy document.
This is not a particularly glamorous description of AI deployment. It lacks the excitement of a successful demo or the visibility of a press release. But it describes the kind of AI work that produces durable value, for the organisations that commission it, for the people who rely on it, and for the broader case that responsible AI adoption is both possible and worthwhile.
Final Thought
The gap between AI as innovation theatre and AI as infrastructure is, in the end, a question of intent. Theatre is designed to be seen. Infrastructure is designed to work.
The organisations making the most effective use of AI right now are not necessarily those that talk about it the most. They are those that have been quietly embedding it into their operations (workflow automation, document analysis, pattern recognition, and decision support) and treating it with the same rigour they would apply to any other piece of critical operational technology.
AI is not a performance. It is a system. It should be specifiable, maintainable, measurable, and accountable. And when it breaks, as all systems occasionally do, there should be a clear process for identifying what went wrong and putting it right.
That is the standard worth aiming for. Not the best demo in the room, but the most reliable system in the building.