AnalysisAi4 min read
Why AlphaGo's David Silver Thinks AI Is on the Wrong Path
DeepMind's David Silver reportedly raised $1.1 billion for a new AI company focused on learning without human data. Here is why that matters.
Omer YLD
Founder & Editor-in-Chief
4 min · 793 words
Filed from · IstanbulPhoto · Amos K / Unsplash
David Silver, the DeepMind researcher best known for AlphaGo, has reportedly raised $1.1 billion for a new AI company built around a provocative thesis: modern AI is too dependent on human-generated data. TechCrunch and Wired both covered the raise and Silver's argument that today's dominant approach may be the wrong path for the next leap.
That does not mean large language models are useless. It means the next breakthrough may not come from scraping more text, images, code, and video from the internet. It may come from agents that learn by acting, experimenting, failing, and improving in environments where human data is not the ceiling.
The current AI recipe
Most mainstream AI progress since 2020 has followed a familiar pattern: collect huge amounts of human-generated data, train a model to predict and generate patterns from it, then refine behavior with human feedback and tool use. That recipe produced astonishing systems. It also created obvious limits.
Human data is messy, biased, copyrighted, repetitive, and finite. The best text on the internet does not automatically teach a robot to manipulate objects, a scientific agent to run better experiments, or a planning system to discover strategies humans never wrote down.
Silver's critique is that imitation can only take AI so far. At some point, a system has to learn from consequences.
Why AlphaGo is the reference point
AlphaGo mattered because it showed a different kind of learning. The system used human games, but its most important gains came from self-play and reinforcement learning. It improved by playing millions of games against itself, discovering strategies that even elite players found surprising.
That is the dream Silver is reviving at broader scale: systems that can generate their own learning signal. Instead of asking, "What would a human write next?" the system asks, "What action leads to a better outcome?"
TechnerdoThe internet taught AI to talk like us. Reinforcement learning is the bet that AI can learn to discover things we never wrote down.
Where this could matter first
The most plausible early wins are domains with clear feedback loops:
- Games and simulations. Agents can practice endlessly and measure success.
- Robotics. Simulated environments can teach grasping, navigation, and manipulation before real-world transfer.
- Scientific discovery. Models can propose experiments, simulate outcomes, and optimize toward measurable targets.
- Software agents. Code either passes tests or fails them, giving a feedback signal beyond human-written examples.
- Operations and logistics. Scheduling, routing, and resource allocation can be optimized with reward functions.
The challenge is that real life rarely provides clean rewards. A Go game has a winner. A personal assistant booking travel has preferences, risks, exceptions, and human taste.
Why investors care
The AI market is crowded with companies building variations on the same stack: more data, bigger clusters, larger models, enterprise wrappers. A credible team offering a different route attracts attention because it could change the cost curve.
If AI systems can improve through synthetic environments, self-play, and feedback loops, they may need less licensed human data and fewer brute-force scaling gains. That is a big if, but it is exactly the kind of if that gets funded when the current path is expensive.
What could go wrong
Reinforcement learning is powerful but brittle. Reward functions can be gamed. Simulations can fail to transfer to the real world. Agents can discover shortcuts that technically satisfy a metric while violating human intent. Anyone who has watched a game AI exploit physics glitches understands the problem.
There is also a safety question. Systems that learn by acting need boundaries. An AI that improves through experimentation should not be experimenting freely on users, markets, infrastructure, or public networks.
Note
The real benchmark
The question is not whether reinforcement learning can beat humans in controlled games. It already has. The question is whether it can produce reliable general-purpose systems in messy real-world domains.
Bottom line
Silver's new company is a bet against imitation as the final form of AI. It argues that future systems need to learn from the world, not just from our records of the world.
That is a credible bet. It is also a hard one. If it works, the next wave of AI may look less like a chatbot trained on the internet and more like a problem-solver trained through experience.
— ∎ —
Was this piece worth your five minutes?
Join the conversation — sign in to leave a comment and engage with other readers.
Loading comments...



