DeepSeek shows that AI’s future turns on ingenuity, not just scale
The shift from brute-force AI spending to smart, systems-level design is here.
This week, a Chinese startup, DeepSeek, made headlines by training a state-of-the-art AI model for a reported $6M in GPU costs. While this figure reflects only the final training run - the company's total hardware investment is thought to exceed $500M - the achievement triggered a tech selloff. NVIDIA shed nearly $600B in market cap on Monday, the largest single-day loss of market value for any company in history.
Meanwhile, OpenAI is in early talks to raise up to $40B at a staggering $300B valuation - nearly double its $157B mark from four months ago. SoftBank is expected to lead the round with a $15–$25B investment, potentially surpassing Microsoft as OpenAI's largest backer. This funding would help finance Stargate, the audacious $500B joint venture with SoftBank and Oracle to build new U.S. data centers - precisely at the moment when DeepSeek’s efficiency breakthrough challenges the logic of AI’s capex arms race.
What DeepSeek makes clear is that scale alone no longer guarantees supremacy. The next wave of AI innovation will be defined by startups that leverage the power of large models, designing sophisticated “systems of agents” that eat into services spend - doing complex work rather than simply automating tasks.
From scale-up to scale-out
AI’s trajectory is following a familiar pattern in computing history. First comes the scale-up phase, where early leaders extend their dominance through massive investments in infrastructure and talent. Then follows the scale-out phase, where efficiency and architectural innovation redistribute power from centralized incumbents to more decentralized, cost-effective alternatives. "Bigger is better" yields to "smarter is better."
This story has played out repeatedly: mainframes gave way to personal computers, AT&T's centralized network lost out to the internet's distributed TCP/IP architecture, and Oracle's database dominance ceded ground to MySQL and Postgres. Each time, centralization drove early progress before architectural breakthroughs enabled broader participation and new forms of value creation.
Now, this same dynamic is reshaping AI. For years, leading AI labs operated on the assumption that scaling laws - where model performance improved predictably with compute, data, and parameter count - would continue indefinitely. This assumption wasn’t wrong; it gave us GPT-4, Claude, and Gemini. But it created perverse incentives: when scale is the dominant strategy and capital is abundant, efficiency becomes an afterthought.
By late last year, cracks in this approach were beginning to show. AI labs started shifting from training ever-larger models toward optimizing for reasoning and allocating more compute at inference time. DeepSeek’s breakthrough represents the clearest departure yet from the brute-force pretraining paradigm.
Rethinking AI as a systems problem
Cut off from NVIDIA’s premium H100 GPUs due to U.S. export restrictions, DeepSeek approached AI as a systems engineering problem, optimizing everything from model architecture to hardware utilization.
They pushed the limits of low-level hardware optimization, dedicating 20 of the 132 streaming multiprocessors on each H800 GPU to cross-chip communication. They leveraged FP8 quantization to reduce memory overhead. They introduced multi-token prediction, letting the model predict several tokens ahead rather than strictly one at a time. They also mastered model distillation - more on this shortly.
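For the technically curious, here is a minimal sketch of the FP8 idea in PyTorch - not DeepSeek's training code, just the core trick: store weights in 8-bit floating point with a per-tensor scale, and cast back up only when you compute. The layer size and scaling scheme here are illustrative assumptions.

```python
import torch  # float8 dtypes require a recent PyTorch (>= 2.1)

# A typical linear-layer weight matrix, stored in bf16 (2 bytes per value).
weights = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Per-tensor scale so the largest magnitude fits FP8 E4M3's range (max ~448).
scale = weights.abs().max() / 448.0
w_fp8 = (weights / scale).to(torch.float8_e4m3fn)  # now 1 byte per value

def dequantize(w, s):
    # Cast back up only at compute time; the savings are in storage and bandwidth.
    return w.to(torch.bfloat16) * s

print(weights.element_size(), w_fp8.element_size())      # 2 vs. 1 bytes per value
print((dequantize(w_fp8, scale) - weights).abs().max())  # worst-case rounding error
```

Halving the bytes per parameter roughly halves memory traffic, which is exactly where constrained hardware hurts most.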
Unlike OpenAI, which relies on reinforcement learning from human feedback (RLHF), DeepSeek’s R1-Zero learned exclusively through RL, with no human feedback in the loop. The model was trained against an automated, rule-based reward system that graded its answers on reasoning tasks drawn from math, coding, and logic. The result was the emergence of spontaneous, self-generated chain-of-thought reasoning - what DeepSeek researchers called “aha moments.” The model learned to extend its own reasoning time, re-evaluate its assumptions, and dynamically adjust its strategies. The raw output was difficult to interpret, often mixing multiple languages. To refine it, DeepSeek seeded the RL process with a small set of high-quality human-annotated responses, creating the final model, DeepSeek R1.
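What makes this possible is that reasoning tasks in math, code, and logic can be graded automatically. A toy reward function in that spirit - our illustration, not DeepSeek's actual grader, and the <think>/boxed-answer output format is an assumed template - might look like this:

```python
import re

def reward(model_output: str, ground_truth: str) -> float:
    """Grade one rollout: a small bonus for following the format, a big one for being right."""
    score = 0.0
    # Format reward: the model is asked to reason inside <think> tags.
    if "<think>" in model_output and "</think>" in model_output:
        score += 0.1
    # Correctness reward: compare the boxed final answer against a verifiable ground truth.
    match = re.search(r"\\boxed\{(.+?)\}", model_output)
    if match and match.group(1).strip() == ground_truth.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 = 4</think> \\boxed{4}", "4"))  # 1.1
```

The RL loop then simply reinforces whichever reasoning traces score higher, with no human grader involved; that is why behaviors like self-checking can emerge on their own.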
Models that can teach themselves - and each other - will accelerate AI’s trajectory in ways we’ve only begun to grasp. The idea that any single model can maintain durable technical superiority now seems implausible. Ingenuity matters as much as capital.
Crucially, DeepSeek’s model is open weight, released under an MIT license with a technical report outlining its innovations. This is a win for the entire AI ecosystem. China is clearly playing catch-up, but the truth is this innovation could have come from anywhere. Dozens of teams in the U.S. are pursuing similar advancements. This isn’t a U.S. vs. China narrative - it’s a story of brilliant engineering.
Distillation erodes model-layer moats
Perhaps the most important factor in DeepSeek’s leap was distillation: the process of training "student" models to emulate more powerful "teacher" models. In essence, feed a large model inputs, record its outputs, and use this data to train a smaller, more efficient replica. Leading AI labs use this technique to make their models cheaper and faster - but it’s also an accelerant for their competition.
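Mechanically, there isn't much to it. Here is a hedged sketch using the OpenAI Python client: query a teacher model, save its answers as training transcripts, then fine-tune a smaller open model on them. The teacher name, prompts, and file path are placeholders, and the fine-tuning step itself would be any standard supervised training script.

```python
import json
from openai import OpenAI  # assumes the `openai` package and an API key are configured

client = OpenAI()

def collect_teacher_data(prompts, teacher_model="gpt-4o", out_path="distill.jsonl"):
    """Record the teacher's answers in a chat-style JSONL format for later fine-tuning."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=teacher_model,
                messages=[{"role": "user", "content": prompt}],
            )
            record = {
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": resp.choices[0].message.content},
                ]
            }
            f.write(json.dumps(record) + "\n")
```

Feed the resulting file to any supervised fine-tuning pipeline and the student inherits much of the teacher's behavior at a fraction of the cost.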
Distillation erodes technical moats. Competitors don’t need access to proprietary models to benefit. Some exploit public APIs, while others take a more aggressive approach, scraping AI-generated responses at scale to create massive training datasets. The outcome is the same: OpenAI bears the astronomical R&D costs of pushing the field forward, only for others to repurpose their breakthroughs into open alternatives.
OpenAI recognized this risk and attempted to counter it. When it released o1, it deliberately obscured the model’s full chain of thought from users. DeepSeek’s success suggests such safeguards are futile.
What OpenAI did to the AI field - making intelligence cheaper and more abundant - DeepSeek has now done to OpenAI. Inevitably, someone will do the same to DeepSeek. This doesn’t spell doom for frontier AI labs, but it does redefine their competitive standing. When every breakthrough model becomes the source of training data for its successors, technical superiority alone cannot sustain market dominance.
The logic behind massive AI spend
Despite DeepSeek’s efficiency breakthrough, the AI industry’s appetite for compute remains insatiable. Project Stargate - the proposed $500B AI infrastructure initiative led by Sam Altman, Masayoshi Son, and Larry Ellison - is the most extreme example. Meta is committing $60B to AI data centers in FY2025 alone. Microsoft plans to deploy $80B, with another $30B funneled through partners like Oracle and CoreWeave.
Unlike the speculative excess of the dot-com era, the AI buildout is funded by the strongest balance sheets in corporate history. Microsoft, Meta, and Google are spending accumulated profits, not debt. But this creates its own risks: when you're sitting on hundreds of billions in cash, it's easy to overlook the brutal economics of AI hardware. Unlike fiber-optic cables that hold value for decades, each new GPU generation eclipses the previous one within 24-36 months.
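A back-of-the-envelope calculation makes the point, taking Microsoft's reported $80B figure and a three-year refresh cycle from above (the straight-line schedule is our simplification):

```python
annual_capex = 80e9       # roughly Microsoft's planned AI infrastructure spend
useful_life_years = 3     # the 24-36 month GPU refresh cycle noted above
print(f"${annual_capex / useful_life_years / 1e9:.0f}B of depreciation per year")  # ~$27B
```

That is on the order of $27B a year of value erosion before a single token of revenue-generating inference is served.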
The bull case for the capex arms race rests on two main assumptions. First, that delivering AI to billions of users will require enormous compute, regardless of efficiency gains. Second, that controlling the compute layer will create durable competitive advantages in AI development and deployment. Mark Zuckerberg made these points explicitly on Meta’s recent earnings call, dismissing DeepSeek’s breakthroughs as immaterial to Meta’s spending plans.
Today’s industry leaders are betting on an AI future whose contours remain unclear. The first generation of models that will fully utilize this expanded infrastructure is only now being trained. While technology often advances through cycles of overcapacity and correction, the sheer scale of investment, coupled with the rapid depreciation of AI hardware, raises the stakes. If GPT-5 and its successors fail to deliver exponential improvements, or if efficiency breakthroughs accelerate faster than expected, U.S. tech giants could face significant capital destruction.
Databricks’ platform play
Amid this flood of infrastructure spending, Databricks' recent $10B Series J at a $62B valuation tells us an important story about where AI's value will accrue long-term.
Enterprises don't need to train their own GPT-4 competitors; they need sophisticated tools to fine-tune, monitor, and integrate AI into their existing operations. Databricks is positioning itself as the operating system for this AI-driven future.
DeepSeek’s breakthrough only reinforces the wisdom of this strategy. As AI model optimization techniques become widely available, the industry will see an explosion of specialized models, each fine-tuned for specific tasks and industries. The performance gap between proprietary and open-source models will continue to narrow. In this environment, enterprises will resist locking themselves into closed APIs, as open-source foundations offer greater flexibility and control.
The AI future will not be defined by a single monolithic model serving all needs but by thousands of specialized models working in concert - sophisticated “systems of agents” collaborating to solve complex challenges. This is precisely the world Databricks is building for - one where the primary advantage lies not in developing cutting-edge AI but in orchestrating it effectively at enterprise scale.
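In practice, a system of agents often starts out unglamorous: a router that sends each task to the cheapest specialized model that can handle it, and a reviewer that checks the work. A hedged sketch, with model names and routing rules as illustrative placeholders:

```python
from openai import OpenAI  # assumes an OpenAI-compatible endpoint and API key

client = OpenAI()

SPECIALISTS = {
    "code": "code-specialist-v1",        # e.g. a distilled coding model
    "math": "reasoning-specialist-v1",   # e.g. an R1-style reasoner
    "general": "small-general-model",
    "reviewer": "reviewer-model",
}

def call(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def route(task: str) -> str:
    # Naive keyword routing; in production this is usually a small classifier model.
    text = task.lower()
    if any(k in text for k in ("code", "function", "sql")):
        return "code"
    if any(k in text for k in ("prove", "calculate", "solve")):
        return "math"
    return "general"

def run(task: str) -> str:
    draft = call(SPECIALISTS[route(task)], task)
    return call(SPECIALISTS["reviewer"], f"Check this answer for errors and fix them:\n\n{draft}")
```

The interesting engineering lives in the routing, evaluation, and orchestration layers rather than in any single model - which is the point.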
What this means for startups
By driving down costs and strengthening the open-source ecosystem, DeepSeek’s achievements will only accelerate the AI adoption curve for enterprises.
Here are three top-line implications for startups:
💰 Compute spend will shift, not slow down
DeepSeek's efficiency breakthroughs won’t end massive compute investments. Major labs will combine these optimization techniques with their existing scale, pushing AI’s frontiers further. The compute race will evolve, not end, with reasoning models increasing compute demands at inference time. In this new paradigm, capital advantage will be multiplicative rather than absolute - good news for startups.
🤖 Systems of agents > standalone models
As the cost of intelligence plummets, AI is evolving from static tools to dynamic systems of agents that reason, plan, and act. This follows a familiar pattern: when core technologies commoditize, value moves up the stack. We saw this with cloud computing - AWS provided the foundation, but the real winners were the SaaS giants that built on top. AI is following the same trajectory. The biggest opportunities aren’t in building foundation models but in creating the infrastructure, middleware, and orchestration layers that make AI truly useful.
⚙️ AI is now an engineering problem
AI is no longer a pure machine learning problem - it’s a systems engineering problem. Open-source models, fine-tuning, and model distillation have democratized AI, but they’ve also introduced new layers of complexity. Success now depends on how well teams can stitch together multiple AI models with agentic capabilities in real time. The startups that master multi-agent workflows and enterprise-grade AI integration will emerge as the biggest winners. Technical founders still have a decisive edge.
AI’s next chapter is for builders
For venture investors, the winning strategy is to back companies that harness AI’s commoditization rather than resist it. Just as the cloud enabled the SaaS boom, AI’s falling costs will fuel a new generation of AI-powered applications. These won’t simply be software companies; they will be services rebuilt around AI. The startups that figure out how to make AI-driven autonomy work at scale will define the next era of enterprise software.
Fittingly, given DeepSeek’s origins, we're entering AI's "hundred flowers" phase, where breakthroughs will emerge from unexpected places - from technical teams that find clever ways to compose existing capabilities, from researchers who optimize rather than maximize, from entrepreneurs who see potential in the spaces between established approaches.