The great data delusion — where to invest for AI winners

The dominant narrative around artificial intelligence (AI) is simple. More AI means more data, more storage and more data centres, in an endless, exponential buildout. New data is expected to grow at a compound annual growth rate of 24% through 2028. Adding to the apparent chaos, much of this data is expected to be “unstructured” (see Chart 1), that is, information without a predefined schema that, absent further processing, is largely unusable.
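
To put the 24% figure in perspective, the compounding arithmetic can be sketched as follows. This is a minimal illustration only: the four-year horizon is a hypothetical assumption, since the projection's base year is not stated here, and the function names are our own.

```python
import math


def growth_multiple(cagr: float, years: float) -> float:
    """Factor by which a quantity grows after `years` at a constant annual rate."""
    return (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + cagr)


CAGR = 0.24        # growth rate quoted in the article
HORIZON_YEARS = 4  # hypothetical horizon; the article gives no base year

multiple = growth_multiple(CAGR, HORIZON_YEARS)
years_to_double = doubling_time(CAGR)

print(f"Growth over {HORIZON_YEARS} years: {multiple:.2f}x")
print(f"Doubling time: {years_to_double:.1f} years")
```

Under these assumptions, annual data creation would grow about 2.36 times over four years and double roughly every 3.2 years.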

From data to knowledge

The relentless growth of data, at first glance, seems both inevitable and logical. But this view is too simple. It assumes that intelligence is a function of accumulation — the more you store, the more you know.

In reality, intelligence, whether human or artificial, does not work that way. It works through compression.

A child learns to cross the street not by recalling every past crossing, but by abstracting a rule: Look left, look right, judge distance and speed, then decide whether it is safe to cross. This decision-making process happens across countless other experiences as well. Millions of experiences are distilled into simple, reusable models.

To an extent, this is how AI models are trained, too. During training, models ingest vast volumes of data, processing and distilling them into model weights — numerical values that capture statistical patterns present in the training data. Just as repeated experiences shape a child’s intuitions, the resulting model weights reflect no single datapoint, but the aggregate trend across numerous instances. These weights allow the model to recognise patterns and generate responses without accessing the original training corpus. Once trained, the model no longer needs the raw data in the same way. The knowledge has been compressed.
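
The idea of compressing data into weights can be illustrated with a toy example. The sketch below is an analogy only, not how large AI models are actually trained: it fits a straight line to 10,000 synthetic data points, after which two numbers (a slope and an intercept) are enough to make predictions and the raw data can be discarded. The data, the "true" relationship and all names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic "experience": 10,000 noisy observations of y = 3x + 1 (hypothetical).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(10_000)]
ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

# "Training": distil 10,000 points into two weights.
slope, intercept = fit_line(xs, ys)

# The raw data is no longer needed to make predictions.
del xs, ys


def predict(x: float) -> float:
    """Use only the learned weights, not the original observations."""
    return slope * x + intercept


print(f"weights: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"prediction at x=4: {predict(4.0):.2f}")
```

The two fitted weights reflect no single observation, only the aggregate trend, which is the sense in which the knowledge has been compressed.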

The data crisis debunked

This is where the current enthusiasm for the exploding demand for data centres begins to overreach.

The assumption that all data must be stored indefinitely ignores a fundamental economic reality: costs. Storage is not free and redundant data has diminishing value.

Judging by present headlines, we appear to be heading full throttle towards a future in which we drown in an ever-expanding deluge of digital noise. But look past the sensationalised projections, and a different picture emerges.

The vast majority of data that is generated — especially low-signal data such as machine-generated logs, Internet of Things (IoT) sensor data and other ephemeral outputs — is, in fact, culled at the source. Only a small fraction — commonly estimated at low single digits — gets retained and stored. Moreover, as mentioned, even retained datasets rapidly lose marginal value once they have been distilled into AI models.

But this does not mean that we will need less infrastructure or fewer data centres in absolute terms. The salient point here is that storage will not be the bottleneck.

Storage is not the problem

At present, industry standard practices — such as data cleaning and filtering during AI training preprocessing — can significantly shrink the size of datasets. However, these practices serve broader goals beyond simply minimising storage. Cleaner, curated datasets improve data quality and lower computational complexity, in turn enabling higher-quality outputs and greater energy efficiency. As we discuss below, energy is arguably the primary performance consideration for AI workloads.

Indeed, guidance from International Data Corp (IDC) on AI-ready data infrastructure does not even list raw storage capacity as a core priority (see table). Data quality, speed, security and access take precedence. Ironically, achieving these priorities can often conflict with the goal of minimising stored data. For example, “five nines” (99.999%) availability is the gold standard for data centres and translates into just over five minutes of downtime per year. Reaching it necessitates the strategic duplication of data by design to ensure resilience against hardware failures and network interruptions.
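
The availability arithmetic, and why it pushes towards duplication, can be sketched as below. This is a simplified illustration under stated assumptions: a 365-day year, replicas that fail independently of one another, and a hypothetical single-copy availability of 99.9%. Real systems have correlated failures, so the figures are indicative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` independent replicas suffices."""
    return 1.0 - (1.0 - single) ** copies


five_nines_downtime = downtime_minutes_per_year(0.99999)
print(f"Five nines: {five_nines_downtime:.2f} minutes of downtime per year")

SINGLE_COPY = 0.999  # hypothetical availability of one copy of the data
for copies in (1, 2, 3):
    combined = replicated_availability(SINGLE_COPY, copies)
    minutes = downtime_minutes_per_year(combined)
    print(f"{copies} copy/copies: {minutes:.4f} minutes of downtime per year")
```

Five nines works out to about 5.26 minutes per year. In this idealised model, a single 99.9% copy falls well short of that target, while two independent copies exceed it, which is why resilience tends to mean storing more data, not less.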

What does this imply?

Although current headlines often emphasise storage growth, that is, how many zettabytes (one zettabyte equals one billion terabytes) of data the world will generate, the more relevant constraint lies elsewhere: in processing power and energy.

The power paradigm

It is telling that data centres are now often measured by the amount of power they consume at peak loads, rather than by traditional metrics such as storage capacity or floor space. McKinsey projects that, by 2030, AI-related demand from data centres worldwide will draw 156GW of power (see Chart 2). Of this, inference workloads — as opposed to training — are expected to dominate the usage, making up more than a third of total data centre power demand (see Charts 3 and 4 for a comparison between status quo and 2030 projections in data centre demand by workload).

This shift from present-day training-heavy to inference-heavy workloads has significant implications for the design and operations of data centres. During inferencing, every query to an AI model involves active computation, rather than simple retrieval. Unlike conventional databases that merely fetch stored records, AI models generate responses in real time. Multiply that by billions of users, continuous applications and machine-to-machine interactions, and the scale of required computation becomes almost unimaginably large.

Data centres will evolve from warehouses of information to factories of computation. This, in turn, will create fundamentally different demands on data storage and management practices.

The new economics of data

Unlike training workloads, which are throughput-optimised to process maximum volumes of data at once, inference workloads are latency-sensitive. Speed matters more than sheer volume. Rather than scanning entire datasets, AI models draw on what has already been distilled: weights fixed during training and repositories of compressed knowledge.

This changes the economics of data itself. As AI workloads skew towards inferencing, value progressively lies in compressed representations rather than raw accumulation. The marginal value of additional data declines faster than commonly assumed. Quality, uniqueness and contextual relevance matter more than sheer volume.

Data will become increasingly ephemeral: processed, abstracted and then discarded. What persists, therefore, is not the raw data itself, but the patterns extracted from it, distilled into model weights or stored within specialised retrieval systems for use during inferencing. We store less but we process more. The locus of intelligence shifts from data to the AI model.

Over time, AI models will become more efficient and more capable, but as deployment scales with more users, more queries and more complex tasks, all of which run on larger models, they will also become more energy-intensive. This means that AI will not scale on storage, but it will scale with power.

The world will not run out of space, but it may run out of watts.

Investment outlook: Applications win, infrastructure loses

What are the implications for long-term investments?

The market is currently still pricing the AI opportunity as a data and infrastructure story. But the reality is that it is an energy plus computation plus decision-making story. Not everything scales equally.

Infrastructure will be commoditised over time, whether it is data centres, fibre networks or cloud capacity. Returns will revert to the mean because of capital intensity, competition and regulated markets. Having said that, the pace of commoditisation is not uniform. Some segments will retain pricing power temporarily by virtue of technological complexity, supply constraints or other structural factors.

Compute sits in this category today. Specifically, advanced chips such as graphics processing units (GPUs) from Nvidia and tensor processing units (TPUs) from Google are the current sweet spot. But it will be cyclical. Scarcity today means pricing power, but supply will catch up, customers will vertically integrate, and margins will compress over time. Google’s TPUs are themselves evidence of this dynamic: they were developed internally to meet the company’s own specific AI needs and reduce its reliance on Nvidia’s GPUs.

Meanwhile, from power generation to grid infrastructure to cooling systems, the demand for energy will be structural. AI is fundamentally dependent on electricity. Demand will be continuous, non-discretionary and hard to substitute. At the same time, supply will be constrained because of regulations, physics and time.

Beyond energy and infrastructure, the next layer where value accrues is models and platforms. This layer is dominated by “winner-takes-most” dynamics: there is huge upside, but also rapid commoditisation driven by competition from open-source models, which are freely accessible and rapidly improving. Because switching costs are low, outcomes will ultimately be unstable.

Finally, it is applications where the long-term economic rent will reside. On the one hand, sticky, industry-specific workflows and integration systems feature stable margins, high switching costs and trust that compounds with the user base. On the other hand, volume-driven inference applications such as cloud-based AI services generate recurring, high-volume revenue as inference scales. While margins per query are low, the sheer volume will be huge.

In short, it is the combination of stickiness and scale that will make the application layer the eventual winner. Why? Because whoever owns the decisions owns the value.

History has a lesson. With the internet, the infrastructure companies (the telcos) did not win. It was the platforms plus applications built on top that did.

The actionable insight

For now, the clearest actionable takeaway for investors is to focus on energy. AI’s potential is fundamentally constrained by power. Beyond traditional sources such as coal and natural gas, nuclear has emerged as the leading candidate for reliable, low-emission power for data centres. However, large-scale deployment of advanced nuclear technologies such as small modular reactors (SMRs) remains years away. As such, medium-term options — encompassing solar, wind, hydro, geothermal and emerging fuels such as hydrogen and ammonia — are well worth monitoring.

In the near term, fresh developments will continue to create pockets of opportunity across the AI stack as markets inevitably overreact. Case in point: Memory stocks, which have surged in recent months, dipped briefly at end-March after Google released research on a memory compression algorithm. They have since recovered and charted new highs. Such dislocations are likely to persist as uncertainty around technological progress and the eventual winners remains high, making it critical to track developments closely. Keep an eye on this column as we monitor how these opportunities unfold over time.

Disclaimer: This is a personal portfolio for information purposes only and does not constitute a recommendation or solicitation or expression of views to influence readers to buy/sell stocks, including the particular stocks mentioned herein. It does not take into account an individual investor’s particular financial situation, investment objectives, investment horizon, risk profile and/or risk preference. Our shareholders, directors and employees may have positions in or may be materially interested in any of the stocks. We may also have or have had dealings with or may provide or have provided content services to the companies mentioned in the reports.
