A small Chinese AI firm has shaken financial markets with language and reasoning models developed at a fraction of the cost of their US counterparts. How will this achievement shape the future of AI?
Generative AI (Gen AI) has once again sent shockwaves worldwide, but this time, a Chinese name is hogging the headlines. In just a week, Chinese AI firm DeepSeek has rattled the tech community, investors and government leaders alike, shattering the long-held belief that Gen AI must be costly and energy-intensive to run.
DeepSeek claims its V3 large language model (LLM) and reasoning model, R1, are on par with, and on some key benchmarks surpass, OpenAI's o1, despite operating at a fraction of the cost and on less advanced Nvidia chips. Reasoning models are the latest fad in the Gen AI world. They are essentially LLMs that break a question into smaller parts and explore different ways of answering it.
This decomposition helps the model handle complex problems in a way that mimics human thinking, but it typically requires more computing power and energy than simpler AI tasks like pattern recognition.
Impact on AI infrastructure
DeepSeek's models are estimated to be 20 to 40 times less expensive to run than OpenAI's. Since ChatGPT was launched, OpenAI has itself drastically reduced the cost of its models: GPT-4o now costs about US$4 ($5.44) per million tokens, compared with GPT-4's US$36 per million tokens at its initial release in March 2023.
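The arithmetic behind these figures is straightforward. Here is a minimal sketch using the prices cited above; the 500-million-token workload is purely illustrative:

```python
# Rough cost check using the article's cited prices (US$ per million tokens).
gpt4_launch = 36.0   # GPT-4 at its March 2023 release
gpt4o_now = 4.0      # GPT-4o today

print(f"OpenAI's own price drop: {gpt4_launch / gpt4o_now:.0f}x")  # 9x

# If DeepSeek is 20 to 40 times cheaper, an illustrative workload of
# 500 million tokens priced at GPT-4o rates (US$2,000) would fall to:
workload_m_tokens = 500
openai_cost = workload_m_tokens * gpt4o_now
print(f"US${openai_cost / 40:,.0f} to US${openai_cost / 20:,.0f}")  # US$50 to US$100
```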
This considerable price differential raises questions about the necessity of large investments in AI infrastructure by governments, hyperscalers, data centre providers and telcos.
For instance, US President Donald Trump recently announced the new US$500 billion Stargate AI venture from OpenAI, SoftBank and Oracle, up from the US$100 billion previously floated. Meanwhile, Singapore said it will invest up to $500 million to secure high-performance computing resources to power AI innovation in the private and public sectors.
“We’ve always said that the cost of intelligence will continue to fall. We’ve shown this with 4o-mini and o1-mini so more people can access and also try to prioritise features, like low latency and enterprise-grade reliability, that are important to the user and customer experience,” says OpenAI in a written response to The Edge Singapore on the emergence of a cost-effective challenger.
“Our frontier models continue to set the standard and we’re just getting started. The reasoning paradigm is still in its infancy, yet we’ve seen tremendous progress — from o1 to o3 — in just months. There’s so much more to come. Over time, we believe the cost of intelligence will continue to fall exponentially, while demand and consumption will grow dramatically.”
Singtel's regional data centre arm, Digital InfraCo, has been offering GPU-as-a-Service (GPUaaS) to enterprises since 3Q2024. It says that having more foundation models like DeepSeek's will only encourage more enterprises to adopt AI to transform their businesses and become more efficient.
CEO Bill Chang says its GPUaaS business is mostly utilised by large enterprises, including many publicly listed companies and agencies, which are increasingly embarking on AI use cases. “Models like DeepSeek, which can run on much smaller GPUs, will make AI more affordable to enterprises, thereby driving the volume of enterprises adopting AI and creating more demand for AIaaS offerings from RE:AI,” says Chang.
RE:AI is the AI cloud service of Singapore Telecommunications' (Singtel) Digital InfraCo. Chang notes that most customers currently use GPUaaS to train their enterprise models, run retrieval augmented generation experiments and perform fine-tuning exercises. Meanwhile, inferencing is still in the early stages, and he expects it to grow over the next 18 months as fine-tuning exercises yield more stable and usable models for customers.
Cutting AI's computing costs could also ease environmental worries. The data centres powering these models guzzle electricity and water, mostly to keep servers from overheating, while occupying land and generating electronic waste. Current estimates suggest that data centres account for 3% of global electricity consumption, a figure predicted to rise to as much as 10% by 2030.
Governments around the world have scrambled to reduce the environmental impact of AI. Last May, Singapore launched a green data centre roadmap that aims to add at least 300 megawatts (MW) of capacity in the near term, with more to follow through green energy.
The plan includes boosting the energy efficiency of all data centres in the city-state, deploying energy-efficient IT equipment and offering incentives or grants for resource efficiency. No announcement has been made about how the green data centre roadmap will be executed.
Meanwhile, data centre operators and hyperscalers like Equinix and Google are exploring new cooling methods and clean energy sources. For instance, recycled water is being reused multiple times to reduce a data centre’s water intake, while direct-to-chip liquid cooling helps dissipate heat from AI chips more efficiently, using less energy than traditional methods by directly targeting the heat source at the chip level. They are also considering using nuclear energy to power their energy-intensive data centres.
More efficient AI models like DeepSeek's can complement those efforts, as they use fewer resources and less energy without compromising performance. Recognising this, OpenAI told The Edge Singapore that it is “constantly working to improve efficiency”. “We carefully consider the best use of our computing power and support our partners’ efforts to achieve their sustainability goals. We also believe that AI can play a key role in accelerating scientific progress in the discovery of climate solutions.”
Experts, however, warn of the Jevons paradox, in which greater (model) efficiency drives down costs, fuelling higher demand that offsets the savings.
When bigger isn’t always better
DeepSeek’s cost-cutting achievement has been attributed to the “mixture of experts” (MoE) technique, wherein the AI model comprises smaller models, each with expertise in specific domains. When given a task, the AI model only activates the specific “experts” (or smaller models) needed, significantly reducing computation costs during pre-training and achieving faster performance during inference time.
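To make the idea concrete, here is a minimal numerical sketch of top-k expert routing. It is purely illustrative: the expert count, dimensions and random weights are stand-ins, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 "experts" (small feed-forward maps)
# plus a router that picks the top-2 experts per input token.
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

router_w = rng.normal(size=(DIM, NUM_EXPERTS))            # routing weights
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]                     # chosen expert indices
    weights = np.exp(logits[top] - logits[top].max())
    gates = weights / weights.sum()                       # softmax over chosen experts
    # Only TOP_K expert matmuls run; the other experts stay idle,
    # which is where the compute savings come from.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=DIM)
print(moe_forward(token).shape)  # (16,)
```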
Neither the MoE technique nor the idea of small language models (SLMs) is new. French AI company Mistral and IBM, among others, have been popularising the MoE architecture over the past year and have seen greater model efficiency by combining the technique with open-source releases.
IBM’s Granite 13B is one such example. “Despite being five times smaller than models like LLaMA-2 70B, Granite 13B performs competitively across various tasks, particularly in specialised fields like finance and cybersecurity. Additionally, available in base and instruction-following model variants, Granite is especially suitable for tasks such as complex application modernisation, code generation, fixing bugs, explaining and documenting code, maintaining repositories and more,” claims Tan Siew San, general manager of IBM Singapore.
Instead of focusing on the size of an AI model, Tan emphasises the need for businesses to be able to customise and tailor their foundation models for evolving use cases — in short, to have fit-for-purpose AI models.
“Think of a bus that is carrying just one passenger. Is that the most efficient way to transport that person? In the world of Gen AI, that’s like an enterprise running a complex LLM of more than 70 billion parameters to complete specific tasks that are only accessing and using up to 2% of the data in the model. They do not need to run (or pay for) a model that large. Many enterprise use cases are best served with the enterprise’s own data, and every use case has unique needs. The key for businesses is finding a way to tap into that valuable data by picking the right ‘vehicle’ to eliminate those costly ‘empty seats’,” says Tan.
The usefulness of SLMs is more prominent in enterprises operating in specialised domains like telecommunications and healthcare. Tan says: “With SLMs, the cost of training them with domain-specific enterprise data is lower as they are not retraining an entire model with hundreds of billions of parameters. In addition, these models can be hosted in an enterprise’s data centre instead of the cloud; computation and inferencing take place as close to the data as possible, making it faster and more secure than through a cloud provider.”
Ying Shao Wei, senior partner and chief scientist at NCS, shares a similar view. “Designed for specific tasks, SLMs demonstrate that size is not everything. These models are highly efficient at handling specialised tasks with minimal computational resources, resulting in less energy consumption and a reduced environmental footprint. Given the looming US export controls on AI chips and the restriction of closed model weights, we anticipate a rise in SLMs as businesses seek cost-effective, task-focused solutions.”
Despite the benefits of SLMs, LLMs will continue to have a role in the enterprise, as they are suited for tasks requiring a broad understanding of various topics and the handling of complex queries. However, LLMs are known to “hallucinate” (produce incorrect or misleading results) and to suffer from model drift (wherein predictive accuracy degrades over time relative to performance during training).
Common ways of addressing the accuracy and reliability concerns around LLMs include retrieval augmented generation (RAG) and fine-tuning. RAG plugs an LLM into an organisation’s proprietary database so that the model can return more accurate responses with the added context of the internal data. Meanwhile, fine-tuning means retraining a model based on a focused set of data so that the model generates more accurate, domain-specific results.
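A stripped-down sketch of the RAG loop looks like this. The word-overlap retrieval and the `call_llm` stub are illustrative stand-ins; a production system would use an embedding model, a vector database and a real LLM API.

```python
# Minimal retrieval-augmented generation (RAG) loop with toy components.
docs = [
    "Refunds are processed within five business days.",
    "Premium support is available 24/7 on enterprise plans.",
    "Customer data is stored in Singapore-based data centres.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for an actual model call.
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(call_llm(prompt))
```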
“By leveraging external, up-to-date knowledge sources, RAG minimises the need for expensive and resource-heavy retraining. It allows businesses to access the most current data without overhauling entire models. While fine-tuning LLMs remains valuable for highly specialised applications, it can require significant investment, making RAG a more versatile and economical alternative for many enterprises,” says NCS’s Ying.
He continues: “The combination of SLMs, RAG, and fine-tuning will play a crucial role in shaping the future of AI. Each approach offers distinct advantages depending on the specific needs of the business, and the growing diversity in AI solutions ensures that companies can choose the most appropriate tools to optimise both performance and cost-efficiency.”
AI agents taking action
DeepSeek's R1 also uses reinforcement learning, in which an AI agent learns to perform a task (or make a decision) optimally through trial and error, without explicit instructions from a human user. Notably, AI agents are more action-oriented and autonomous than LLM-based chatbots, which create content in response to human input.
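At its core, reinforcement learning is a trial-and-error feedback loop. The tabular Q-learning toy below shows that loop in its simplest textbook form; DeepSeek applies far larger-scale RL to an LLM, but the principle of learning from reward rather than instruction is the same.

```python
import random

random.seed(0)

# Tabular Q-learning on a tiny corridor world: states 0..4, goal at 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                       # step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2        # learning rate, discount, exploration

for _ in range(500):                     # episodes of trial and error
    s = 0
    while s != GOAL:
        if random.random() < eps:        # explore occasionally
            a = random.choice(ACTIONS)
        else:                            # otherwise exploit current knowledge
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else -0.01           # reward signal, no instructions
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the learned greedy policy steps right in every state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])  # [1, 1, 1, 1]
```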
The tech industry believes AI agents are the next phase of Gen AI and will bring us a step closer to making artificial general intelligence a reality. OpenAI recently released an AI agent called Deep Research, which can conduct time-consuming and complex online research on various topics. This is in addition to its Operator AI agent, which can help users book flights, plan grocery orders and even complete purchases. Both are available to ChatGPT Pro subscribers.
For enterprises, Salesforce and Zendesk offer out-of-the-box AI agents that can autonomously handle basic customer queries or lead qualifications to deliver better customer experience.
Investors are also optimistic that AI agents will be the next frontier of Gen AI. According to CB Insights, AI start-ups attracted 37% of venture capital funding and 17% of deal activity in 2024, with autonomous AI agents seeing the fastest growth in deal funding, up 150% y-o-y.
“Real-world applications of AI agents span across multiple domains. For example, AI agents can autonomously place orders with suppliers or adjust production schedules to maintain optimal inventory levels. In healthcare, AI agents can monitor patient data, adjust treatment recommendations based on new test results and provide real-time feedback to clinicians,” says IBM’s Tan.
AI agents can perform complex tasks with a high degree of autonomy as they leverage multiple AI models that operate across various data types. “In practice, AI agents often involve a combination of traditional AI models and generative AI models, including SLMs, which support distinct steps in each workflow. SLMs are crucial in handling specific tasks like real-time speech-to-text conversion. In an agentic AI system, these models work together to provide comprehensive solutions. With the use of SLMs, AI agents can then be fine-tuned for specific tasks, delivering highly effective outcomes and revolutionising industries reliant on intricate workflows,” explains NCS’s Ying.
In the case of a call centre, SLMs can be trained to understand local accents and unique vocabularies, transcribing audio signals into text. The transcribed text is then processed by LLMs that assist human agents by suggesting responses. After a call, additional LLMs can handle tasks such as summarising the conversation, assessing quality and compliance, and alerting supervisors of discrepancies. This seamless workflow — where different AI models are integrated and specialised — has led to measurable gains in productivity, as exemplified in a project NCS did for Singapore’s Ministry of Manpower (MOM).
NCS partnered with AWS to add Gen AI features to MOM’s contact centre, reducing handling time by 12% and cutting average after-call work by over 50%, Ying shares.
Using AI tools also streamlined the tasks of call-centre agents, allowing them to focus more on each caller's specific queries. Job satisfaction increased and overall productivity improved by 6%.
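Stripped to its skeleton, the workflow described above might be orchestrated along these lines. Every function here is a hypothetical stub, not NCS's or MOM's actual system; a real deployment would wire these stages to a fine-tuned SLM for transcription and hosted LLMs for the text stages.

```python
def slm_transcribe(audio: bytes) -> str:
    """SLM tuned for local accents and vocabulary: audio in, text out (stub)."""
    return "Caller asks about work-pass renewal deadlines."

def llm_suggest_reply(transcript: str) -> str:
    """LLM proposes a response for the human agent during the call (stub)."""
    return f"Suggested reply based on: {transcript!r}"

def llm_post_call(transcript: str) -> dict:
    """Separate LLM pass for after-call work: summary, QA, escalation (stub)."""
    return {"summary": transcript[:40] + "...", "compliant": True, "escalate": False}

def handle_call(audio: bytes) -> dict:
    transcript = slm_transcribe(audio)           # specialised small model
    suggestion = llm_suggest_reply(transcript)   # live agent assistance
    report = llm_post_call(transcript)           # automated after-call work
    return {"suggestion": suggestion, **report}

print(handle_call(b"\x00fake-audio"))
```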
Dr Leslie Teo, senior director of AI Products at AI Singapore, agrees that SLMs offer a complementary approach to AI agents. “AI agents are not a specific model per se but rather a framework and approach for groups of AI models (which may or may not be LLMs or SLMs) working to accomplish a particular task. With their compact size and optimised architectures, SLMs can perform complex reasoning and decision-making tasks required by AI agents while maintaining speed and efficiency. SLMs can also be trained to understand specific contexts and domains, allowing AI agents to operate effectively in their intended environments,” he adds.
Generative AI at the edge
Efficient models like DeepSeek's, together with SLMs and AI agents, are expected to accelerate the advancement of edge devices such as robots, personal computers (PCs) and smartphones.
“The edge AI (or AI-powered devices) and mobile AI space could see long-term implications. While DeepSeek is still a cloud-first model, its efficiency breakthroughs point toward a future where powerful AI can run more effectively on local hardware. This could drive further investment in AI-optimised chips, benefiting firms like Qualcomm, Apple and AMD,” says Leslie Joseph, principal analyst at research and advisory firm Forrester.
NCS’s Ying adds that SLMs and AI agents can make “AI-powered devices more efficient, privacy-conscious, and widely accessible” by minimising the need to transmit sensitive data to cloud services.
SLMs enhance smartphone functionality and privacy. The latest devices from Google and Samsung feature Google's smallest AI model, Gemini Nano, while iOS devices integrate Apple's on-device foundation models. As a result, users can interact with their phones more seamlessly without needing an Internet connection or cloud-based processing, ensuring sensitive data stays on the device.
Research firm International Data Corp (IDC) predicts that Gen AI-enabled phones will be “the next big thing the mobile industry has to offer consumers”. Worldwide shipments of such smartphones are expected to grow at a compound annual growth rate (CAGR) of 78.4% to reach 912 million units by 2028.
As for AI robots, SLMs empower them with efficient, on-the-fly language understanding and decision-making. This allows robots to interpret instructions, navigate environments, and perform tasks autonomously without heavy reliance on cloud resources. “Due to their smaller size, SLMs are ideal for edge devices, where power and computational resources are limited. This makes robots more agile and responsive, enhancing their ability to perform real-time tasks in dynamic, real-world environments,” says NCS’s Ying.
AI robots are expected to become mainstream soon. A Citi GPS report in December 2024 indicated investors’ optimism as venture capital investment in robotics reached US$10 billion in 2023, of which 38% went to Asia. The report also estimates that the global AI robot population will hit 1.3 billion by 2035 and increase to 4 billion by 2050. These robots will transform various sectors, including healthcare, manufacturing and hospitality.
Embedding AI into PCs
First introduced in late 2023, AI PCs are expected to see greater adoption this year due to AI advancements and the promise of improved productivity. Data from research firm Canalys shows that 13.3 million AI PC units were shipped globally in 3Q2024, accounting for one-fifth of all PC shipments that quarter. The top three Windows-based AI PC providers were HP, Lenovo and Dell.
These AI PCs are equipped with neural processing units (NPUs), or specialised AI chips, which handle complex AI computations more efficiently than regular PCs.
“AI PCs represent a significant leap in computing, requiring a sophisticated interplay of hardware and software. Central to this, the NPU handles AI-specific workloads, freeing up the CPU and GPU to focus on their core functions. This specialised processing allows for significantly faster speeds, enabling complex tasks like running generative AI models and AI-assistant applications locally on the PC,” says Jacinta Quah, vice-president for the APJC client solutions group at Dell Technologies.
She continues: “Integration with small language models and AI applications is also essential for providing the intelligent features users anticipate, including improved search capabilities, studio effects, and real-time translations, [even when the PC is not connected to the Internet].”
AI PCs are also energy efficient. Quah says: “Specialised energy efficiency and cooling solutions are important, especially for higher-performance AI PCs, to ensure optimal performance and longevity. Efficient thermal design is another key component for managing the heat generated by high-performance components such as NPUs.”
Enhanced personalisation is another advantage of AI PCs. “AI PCs leverage on-device AI capabilities to learn and adapt [to] individual needs and preferences to help streamline workflows, optimise performance, and enhance user experience,” says Ivan Cheung, vice-president and chief operating officer for Asia Pacific at Lenovo.
Lenovo AI Now, for example, is an on-device AI agent that adapts computing performance to the user. This is in addition to the AI tools running locally on Lenovo's AI PCs for task automation, document summaries and natural-language interactions.
Lenovo's ThinkPad X9 Aura Editions feature advanced AI tools like Lenovo AI Now for task automation and workflow optimisation. Photo: Lenovo
Cheung also highlights that AI PCs process data locally on the device instead of in cloud services or platforms, minimising the risk of data breaches and unauthorised access. “This is paramount for data privacy in the AI era, also helping enterprises to comply with regulatory requirements.”
Since AI PCs come at a premium, price-sensitive consumers and businesses have been hesitant to adopt them. In response, AI PC makers are introducing devices at various price points. “Our AI PC lineup — Dell, Dell Pro, and Dell Pro Max — offers a range of configurations across silicon partners (AMD, Intel, Qualcomm, Nvidia), allowing customers to choose the AI capability that best fits their needs and budget,” says Dell’s Quah.
AI PC makers also expect users to recognise the productivity benefits of AI PCs soon, driving greater demand for these devices.
According to Dell’s Quah, AI PCs are reshaping how we work and interact with technology in two ways. The first is the integration of intelligence into familiar productivity apps, which elevates users’ productivity by automating tasks, providing insights, and streamlining workflows, making every action more impactful.
“Equipped with more compact language models, AI PCs also allow businesses to tailor their tools to meet specialised needs in customer service or retail. With advancements in SLMs, these devices can now efficiently support multiple models running simultaneously. This capability opens the door for highly customised, intelligent workflows directly on the PC,” she says.
Lenovo’s Cheung adds that AI PCs will cater to diverse users. “Beyond helping users be more productive and businesses gain a competitive edge, AI PCs can enhance workflows for creative users and deliver the next generation of immersive and interactive experiences to gamers. Even educators are recognising the potential of AI PCs to revolutionise education by providing personalised learning experiences.”
Realising Gen AI’s promise
By open-sourcing R1, DeepSeek enables researchers and developers to use, modify and commercialise the model freely. This could ultimately accelerate the real-world deployment of Gen AI.
“The rise of models like DeepSeek will shift the focus from model development to deployment [that will drive] more practical applications of AI. Companies that prioritise data readiness and ensuring AI outputs are trustworthy, explainable, and actionable will emerge as leaders, while those clinging to outdated notions of AI exceptionalism risk falling behind,” says Mike Capone, CEO of Qlik, a data integration, quality and analytics solutions provider.
He adds: “Ultimately, this shift presents an opportunity for global businesses, especially in Asia. With the region’s diverse linguistic and cultural landscape, localised AI models like AI Singapore’s Sea-Lion have already demonstrated the potential for tailored innovation. DeepSeek accelerates this trend by lowering barriers to entry, encouraging the development of region-specific AI solutions that can cater to unique local demands.”
The Southeast Asian Languages in One Network (Sea-Lion) is a family of open-source LLMs designed to better understand Southeast Asia's diverse contexts, languages and cultures. Teo shares that AI Singapore will release a set of small models under Sea-Lion in the next few months, which can be used independently or in an agentic system to provide multilingual and multicultural perspectives.
“The open nature of our Sea-Lion models means they should be accessible. However, organisations need more than just a model. So, we are also building services and tools such as application programming interfaces (APIs) to reduce the technical barriers to using Sea-Lion models. We’re also actively working with industry partners to identify and develop real-world applications across various sectors, demonstrating their practical value,” he says.
He adds that driving widespread adoption of Gen AI at the national level requires affordable and scalable compute resources, open datasets that meet standardised quality benchmarks, and a robust AI regulatory framework with clear governance and ethical guidelines. Also, fostering research and innovation and upskilling and retraining the workforce are essential.
Meanwhile, organisations looking to fully harness generative AI — including SLMs and AI agents — must design and optimise their IT infrastructure to support AI effectively. “Typically, only 10% of an AI system’s code is the actual AI model; the rest comprises supporting infrastructure and applications. This makes the resilience and robustness of the underlying digital infrastructure crucial for AI success,” says NCS’s Ying.
A recent global survey by NCS and IDC found a strong correlation between AI adoption and digital resilience. Companies with the highest levels of both achieved 1.3 times more revenue and 1.4 times more cost savings than others, highlighting the need for a strong digital foundation before embarking on AI adoption.
Cybersecurity should be another priority, especially for edge devices using SLMs. While SLMs can enhance data security and privacy by operating on-device and keeping data local, that alone is no guarantee. Ying says those models still require cloud connections for updates and may be part of hybrid systems that share data externally.
He adds that SLMs are not immune to adversarial attacks and can be vulnerable to data poisoning or manipulation if they are not well-secured. As SLM-powered edge devices become more widespread, enterprises must also address the increasingly complex challenge of managing security across numerous distributed edge devices.
According to IBM’s Tan, there is the risk of malfunction when organisations implement multiple AI agents to execute specific complex tasks. “Multi-agent systems built on the same foundation models may experience shared pitfalls. Such weaknesses could cause a system-wide failure of all involved agents or expose vulnerability to adversarial attacks. This highlights the importance of data governance in building foundation models and thorough training and testing processes.”
Data and AI governance are also key to addressing the ethical concerns of SLMs and AI agents. “SLMs still carry risks of bias, misinformation, and ethical dilemmas, yet they are not as good as LLMs in detecting and correcting for these. Their limited capacity may even inadvertently amplify biases present in their training data. With AI agents, their capacity for autonomous decision-making introduces potential risks, from unintended actions to biases in the agent’s learning process. As organisations consider SLMs and agentic AI, robust governance frameworks will be essential to prevent undesirable outcomes,” says Ying.
With DeepSeek open-sourcing R1, we can expect the emergence of new AI models that challenge R1’s efficiency advantages. Forrester’s Joseph advises enterprises to regularly evaluate new AI models against cost, performance, and task suitability to avoid missing out on better alternatives. This requires setting up an LLM evaluation pipeline that tracks inference efficiency, accuracy, and overall ROI across multiple models.
“Net-net, AI adoption is no longer about choosing the best model — it is about strategically integrating multiple models for optimal performance and cost-efficiency. Companies that master this approach will gain a significant competitive advantage as AI continues its rapid evolution,” he says.
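In practice, the evaluation pipeline Joseph describes can start as something very simple. The sketch below scores stub "models" on a shared task set and compares accuracy against price; all names, prices, tasks and the `run_model` callables are illustrative stand-ins, not any vendor's API.

```python
# Toy model-evaluation harness: same tasks, multiple candidate models,
# accuracy compared against cost per million tokens.
TASKS = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]

MODELS = {  # name -> (stub answerer, illustrative US$ per 1M tokens)
    "model_a": (lambda q: "4" if "2+2" in q else "Paris", 4.00),
    "model_b": (lambda q: "4" if "2+2" in q else "Lyon", 0.50),
}

def evaluate(name: str) -> dict:
    run_model, price = MODELS[name]
    correct = sum(run_model(q).strip() == gold for q, gold in TASKS)
    return {
        "model": name,
        "accuracy": correct / len(TASKS),
        "usd_per_1m_tokens": price,
    }

for name in MODELS:
    print(evaluate(name))
# A real pipeline would also track latency, throughput and per-task ROI,
# and re-run whenever a new model (or a new price) appears.
```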