Across Asia Pacific and Japan (APJ), the AI conversation has been dominated by the glamour of model training: building increasingly complex language models, sourcing vast datasets, and running GPU clusters that rival the cost of major infrastructure projects. Yet this is only half the story.
The real test begins when those models step out of controlled environments and into the unpredictability of real business operations. This is the domain of inference, the moment where AI actually starts earning its keep. It is where trained models are embedded into workflows and make decisions in real time, determining whether an organisation’s AI strategy delivers measurable value or remains an expensive experiment.
Unlike training, which is episodic, capital-intensive, and bounded in time, inference happens constantly and at scale. Every customer query, fraud alert, manufacturing sensor reading, or logistics forecast represents a moment of inference—a decision made by a model in motion. This makes inference the operational heartbeat of enterprise AI.
In a region where 78% of professionals now use AI at work weekly, the economics of inference are becoming impossible to ignore. While training might cost tens of millions upfront, inference defines ongoing cost efficiency, latency, and sustainability. The winners will not simply build the biggest models, but run them more intelligently across hybrid environments, with the right balance of performance, control, compliance, and cost.
From model training to market impact
Training models builds potential. Inference builds performance. Training is like teaching a student; inference is the graduate applying their knowledge under real-world pressure.
Training state-of-the-art models demands large upfront investments and significant energy, yet that expenditure is finite. Inference, by contrast, runs continuously: every second, in every data stream, across every customer interaction. Business impact happens when AI quietly makes thousands of micro-decisions that improve response times, personalise services, and predict demand before it surges. Each of those decisions relies on infrastructure tuned for resilience and responsiveness.
For chief information officers (CIOs), this reframes the AI conversation. The challenge is no longer only about building smarter models, but about running them smarter. This requires infrastructure that keeps latency low, ensures governance, and scales predictably without forcing teams to rebuild systems for each deployment. That is where cloud-native, Kubernetes-based infrastructure comes into play, providing the scalability, portability, and control to operationalise inference at enterprise scale.
In essence, training creates intelligence. Inference operationalises it.
Hybrid infrastructure: Where performance meets practicality
Running inference efficiently has shifted from a technical preference to a strategic differentiator.
Enterprises that can execute inference seamlessly across on-premises, edge, and public-cloud environments gain faster insights, stronger compliance, and sharper cost control. A recent study suggests that 75% of enterprise AI workloads in APJ will be deployed on hybrid infrastructure by 2027, confirming this as the dominant operating model.
Hybrid architectures enable inference near the data source, reducing latency by up to 40% compared with cloud-only setups and cutting compute costs by up to 60% for certain workloads. By designing for locality, placing inference nodes close to customer records, sensor data, or financial transactions, organisations gain the dual benefits of compliance with data-sovereignty rules and improved responsiveness.
Increasingly, technology leaders are asking not “Where can I run inference?” but “How do I design an infrastructure that continuously optimises for performance, sustainability, and cost, no matter where inference happens?”
The answer is a single operating model that spans environments. Platforms that unify virtualisation, storage, networking, and Kubernetes empower teams to deploy and manage inference consistently wherever it runs. This reduces operational complexity while keeping AI projects scalable and sustainable.
Sustainability and localisation
The sustainability conversation in AI is shifting from the cost of training to the lifetime energy consumed by inference.
While a single training cycle may devour thousands of megawatt-hours, inference, though lighter per transaction, runs perpetually and at scale. Analysts estimate it may account for up to 60% of AI’s lifetime energy footprint.
Running inference closer to data sources, in regional data centres or at the edge, reduces energy waste and unnecessary data movement. For APJ, this localised approach aligns with rising expectations for data sovereignty and greener operations, particularly in markets such as Singapore, Japan, and India.
Building sustainable, containerised inference infrastructure is no longer just good practice; it is a competitive advantage. Enterprises that align efficiency with compliance will be the ones best positioned to scale AI responsibly while maintaining brand trust and regulatory confidence.
Turning inference into strategy
The era of AI hype was driven by breakthroughs in model training. The era of real enterprise value will be defined by excellence in inference: where models do not just exist, but deliver measurable results and responsible, repeatable outcomes.
For APJ organisations, success will depend on creating flexible, open, and sustainable systems that keep inference close to data, optimise performance across hybrid environments, and maintain full control over security, sovereignty, and cost.
By investing in modern, cloud-native hybrid infrastructure, business leaders can transform inference from a technical detail into a strategic lever, fuelling better decisions, more personalised customer experiences, and more energy-efficient, resilient operations.
Inference is no longer a supporting act. It is the stage where AI delivers.
Jay Tuseth is the vice president and general manager for Asia Pacific and Japan at Nutanix
