Highlights
The dawn of smaller footprint reasoning models has arrived.
The startup DeepSeek AI, a relatively unknown Chinese company, has garnered attention for its development of a large language model (LLM), DeepSeek R1. Their approach challenges conventional wisdom in LLM training and raises interesting questions about the future of generative AI (GenAI). While the long-term impact remains to be seen, DeepSeek’s streamlined and energy-efficient open source model using a reasoning-centric approach rather than relying on extensive training may provide new opportunities across businesses and industries.
This could significantly lower the entry barrier for AI across sectors and accelerate its adoption by improving accessibility and affordability. More enterprises will have an opportunity to experiment with GenAI use cases to gain a competitive advantage through new products, services, and efficiencies.
DeepSeek claims a substantially lower training cost (under $6 million) compared to other models that can cost $200 million or more. Its training methodology emphasizes reinforcement learning, reducing the reliance on supervised fine-tuning. While this approach can lower costs, it’s important to understand the details: training parameters, datasets used, and how performance is measured all influence any potential impact on the speed to market of experiments and proofs of concept. These differences can greatly influence comparisons to knowledge-first models.
“DeepSeek’s approach demonstrates a new art of the possible in building open source large language models—a more efficient way that potentially consumes a smaller amount of compute resources but still promises to deliver a mighty performance,” says Naresh Mehta, Chief Technology and Innovation Officer, Manufacturing, TCS. “By leveraging what we call a smaller footprint reasoning model, DeepSeek reportedly managed to optimize its compute cost infrastructure.”
On the technology front, DeepSeek’s approach may positively impact the development of agentic AI, where AI systems can autonomously perform tasks and make decisions. It can enable the creation of more sophisticated AI agents that can be deployed in different business contexts, leading toward SaaS 2.0 (service as a software), with AI capabilities integrated directly into services. Businesses may need to reconsider their AI infrastructure investments to accommodate these advancements.
This streamlined approach helps bring edge AI to life through the optimization of smaller, more efficient AI models, including specialized language models—a key enabler of expanded edge AI capabilities. These models can be optimized for the resource constraints of edge devices, allowing for more complex AI tasks to be performed locally.
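One common way such models are shrunk for edge devices is post-training quantization, which stores weights as 8-bit integers instead of 32-bit floats, roughly a 4x reduction in size. The sketch below is a toy illustration of symmetric int8 quantization, not any model’s actual deployment pipeline; real edge deployments rely on frameworks built for this purpose.

```python
# Toy sketch of post-training quantization: compress float weights to
# int8 with a single symmetric scale factor, then restore them.
# Illustrative only -- production tooling handles this per-layer with
# calibration data.

def quantize_int8(weights):
    """Map float weights to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in quantized]

weights = [0.52, -1.27, 0.003, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight lands within one quantization step of the
# original, while the stored values fit in a single byte each.
```

The trade-off is a small loss of precision per weight in exchange for a model that fits in the memory and compute budget of an edge device.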
Open source models also deliver the most value when they enable a broader range of offerings to meet an expanded set of needs, and partnerships will play a key role in their continued innovation and successful implementation. This AI development opens new avenues for mid-sized businesses across semiconductors, service integration, cloud services, and startups.
A reasoning-first, knowledge-next approach turns the AI story on its head.
What makes the smaller footprint reasoning models a giant change? It reverses the traditional approach of knowledge-first, reasoning-next by adopting the opposite: reasoning-first, knowledge-next. To illustrate, think of a chess scenario: You train an AI model on the rules of the game (reasoning) instead of data from 5,000 games (knowledge). As the reasoning model plays more games by the rules (reasoning) and accepts the outcome (knowledge), it is encouraged to explore different moves but also taught to bias moves toward those that win through a reward/penalty mechanism. Over time, this reinforcement learning approach autonomously builds the knowledge required to win more games.
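The chess analogy can be made concrete with a minimal reinforcement learning loop. The sketch below is a toy illustration, not DeepSeek’s actual training code: the agent is given only the legal moves (the rules) and a reward/penalty after each game, and it autonomously builds the knowledge of which move wins.

```python
import random

# Toy reward/penalty loop: a one-move "game" with three legal moves,
# one of which wins. The agent starts with zero knowledge and learns
# purely from outcomes -- reasoning-first, knowledge-next.
random.seed(0)

MOVES = ["a", "b", "c"]       # the rules: which moves are legal
WINNING_MOVE = "c"            # revealed only via the outcome of play

scores = {m: 0.0 for m in MOVES}  # learned value of each move
LEARNING_RATE = 0.1
EPSILON = 0.2                      # exploration rate

def pick_move():
    """Mostly exploit the best-known move; sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(MOVES)
    return max(scores, key=scores.get)

for episode in range(500):
    move = pick_move()
    reward = 1.0 if move == WINNING_MOVE else -1.0  # reward/penalty
    # Nudge this move's score toward the observed outcome.
    scores[move] += LEARNING_RATE * (reward - scores[move])

best = max(scores, key=scores.get)
print(best)  # the agent has learned the winning move without game data
```

The point of the sketch is the shape of the loop: exploration biased toward rewarded moves gradually converts rules plus outcomes into knowledge, with no dataset of past games required.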
This reasoning-first, knowledge-next approach is the biggest factor in the huge training cost advantage. When a model does not require massive datasets and billions of parameters, training costs fall. Instead, the model is trained to reason from a minimal, critical dataset. This streamlined approach cascades down to the value delivered (Fig. 1).
Both approaches offer distinct advantages, and the optimal choice depends on the specific application. A reasoning-centric model demonstrates strengths in logical deduction and structured problem-solving tasks. A knowledge-centric model, given its training on vast datasets, is more adept at contextual understanding, nuanced language processing, and capturing real-world relationships. It’s important to note, however, that effective data privacy practices are crucial for all models, regardless of the training approach. Protecting user data requires robust data governance policies, appropriate privacy-preserving techniques, and careful attention to data management throughout the AI lifecycle.
In our comparative analysis and initial tests, DeepSeek impressed us when prompts involved mathematical, scientific, or coding tasks. Knowledge-centric models did, however, outperform DeepSeek when a prompt called for knowledge that must be learned from data.
While not unique to DeepSeek, several techniques used in LLM development can improve transparency, learning, and accessibility when implemented effectively:
With the developer community contributing to the DeepSeek model’s development, it is on a fast track for continuous improvement and updates.
DeepSeek has adopted an open source approach for DeepSeek R1, making the model publicly available for use and modification. This decision offers several benefits:
However, it’s important to acknowledge that the success of an open source project depends on various factors, including the level of community engagement, the quality of contributions, and effective project management.
From major chipmakers to large cloud hyperscalers, the smaller footprint reasoning model is contributing to an evolving landscape across the industry. This evolution will inspire more innovation.
Impact across chipmakers and evolving dynamics
The trend toward more efficient AI models and edge computing presents both challenges and opportunities for chipmakers. While the demand for AI chips is growing, the market is becoming more diverse, and competition is intensifying.
Impact across cloud hyperscalers and adapting strategies
Cloud providers are adapting their AI strategies in response to evolving hardware trends and the increasing demand for efficient AI deployments.
The industry is trending toward an inference-dominated focus.
In a significant trend, the industry is evolving from a training-dominated focus to an inference-dominated one. The rise of edge computing is a key aspect of this shift. For example, rather than self-driving car models running exclusively in big data centers, inference increasingly happens in each car through edge computing. The car runs the model locally, reducing reliance on constant cloud connectivity. Monetization models for edge AI are still evolving and are not limited to monthly subscriptions.
The DeepSeek innovation has disrupted carefully crafted roadmaps to AI maturity and thrown open a multitude of new possibilities for value creation and growth.
The evolution of the smaller footprint reasoning model has taken the AI story to a higher trajectory with a growing emphasis on efficiency and accessibility. It will inspire innovation and potential ripple effects that include:
DeepSeek’s approach highlights a major shift toward open source models that are smaller in footprint and higher in efficiency. The innovation will reshape the AI landscape, spawning the creation of more powerful models while improving overall accessibility and affordability. The release of new AI models is already contributing to improvements in development costs, performance, and speed to market.
While it’s impossible to predict the next disruptive event, we can share trends that we’re watching closely. As open source models begin to dominate the industry, partnerships will become a critical element of the AI landscape. With lower costs, mid-sized businesses will become attractive to everyone from chipmakers to hyperscalers. An acceleration in agentic AI will, in turn, accelerate SaaS 2.0 (service as a software). And the sheer speed of smaller footprint reasoning models will require the right people and infrastructure to scale.
The DeepSeek disruption has thrown open a multitude of possibilities and forced everyone to rethink their carefully crafted roadmaps to AI maturity. It is a textbook case study of why enterprises need to perpetually adapt to stay ahead of the technology curve. How a company responds to all of this—whether making decisions for the present or future—could make or break its business.