Executive Summary
The idea that AGI will suddenly rewrite itself into godlike intelligence is a myth born of Hollywood and sloppy reasoning. In reality, recursive self-improvement is throttled by the same limits that bind today’s frontier models: compute costs measured in hundreds of millions of dollars, training runs that last weeks or months, energy demands in megawatts, and memory and bandwidth bottlenecks that physics won’t bend for anyone. Progress may feel fast compared to human research cycles, but it will be nothing like an instant detonation.
AGI will not be uncontrollable in seconds. Training loops can be paused, jobs killed, power cut. Models are heavy, data-center-bound, and far from the fantasy of “escaping into the internet.” History shows what really happens when revolutionary tech arrives: nuclear fusion, quantum computing, and biotech all promised overnight transformation, and all delivered slow, incremental progress instead.
The danger with AGI isn’t that it will explode beyond our control – it’s that we’ll misuse it, deploy it recklessly, and hand it authority it isn’t ready for. AGI, if it comes, will crawl. The fantasy is instant apocalypse. The reality is expensive, incremental, and still very much in human hands.
The sci-fi script is familiar: flip a switch, AGI wakes up, recursively self-improves in seconds, slips into the internet, and humanity becomes a background process. It’s punchy. It’s also not how any of this works.
If AGI arrives, it won’t detonate. It will crawl, constrained by compute, math, energy, bandwidth, supply chains, and the boring reality of multi-month engineering cycles. The “unstoppable” part isn’t physics; it’s people wiring immature systems into critical infrastructure without brakes. That’s a human problem, not a laws-of-the-universe problem.
Why the “explosion” myth sticks
Two culprits:
- Exponential growth confusion. People are systematically bad at reasoning about exponential processes. This isn’t just a vibe; empirical work documents a persistent “exponential growth bias” across domains. We misjudge curves, especially when they’re noisy or framed poorly, and “exponential” gets mentally rounded to “instant.” ScienceDirect
- Hollywood conditioning. Decades of “Skynet in sixty seconds” taught us to expect instant flips. The actual debate in the literature is about takeoff speed. Bostrom popularized the “fast, moderate, slow” takeoff framing; even in the moderate case we’re talking months or years, not minutes. Christiano and others have long argued the default is a slow(er) takeoff that still feels fast on human timescales. Bostrom, Christiano
The bottlenecks that kill an overnight explosion
Let’s be tediously concrete.
1) Compute is expensive and finite
Frontier model training is already staggeringly costly. The Stanford 2024 AI Index estimated the training compute cost of GPT-4 at roughly $78M and of Gemini Ultra at about $191M, in compute alone. That’s per run, not a one-click “retry.”
Training itself is not a coffee-break event. Microsoft and NVIDIA’s 530B-parameter MT-NLG run took months on 2,000+ A100s, consuming millions of GPU hours. That’s a single training job, not a sci-fi self-improvement loop spinning new versions every ten seconds.
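To make that concrete, here is a back-of-envelope sketch using the standard C ≈ 6·N·D FLOPs estimate for dense transformer training. Every number in it (parameter count, token count, cluster size, utilization, GPU pricing) is an illustrative assumption, not a disclosed figure:

```python
# Back-of-envelope: wall-clock time and cost for one frontier-scale training run.
# All inputs are illustrative assumptions, not vendor or lab disclosures.

params = 500e9                # model parameters (N), assumed
tokens = 10e12                # training tokens (D), assumed
flops = 6 * params * tokens   # standard C ~= 6*N*D estimate for dense transformers

gpus = 20_000                 # cluster size, assumed
peak_flops_per_gpu = 1e15     # ~1 PFLOP/s class accelerator, assumed
utilization = 0.4             # realistic end-to-end utilization, assumed
price_per_gpu_hour = 2.0      # USD per GPU-hour, assumed

effective = gpus * peak_flops_per_gpu * utilization   # sustained FLOP/s
seconds = flops / effective
gpu_hours = gpus * seconds / 3600

print(f"total compute: {flops:.2e} FLOPs")
print(f"wall clock:    {seconds / 86400:.0f} days")
print(f"GPU-hours:     {gpu_hours:.2e}")
print(f"compute cost:  ${gpu_hours * price_per_gpu_hour / 1e6:.0f}M")
```

Even with these generous assumptions, one “generation” of self-improvement at frontier scale is weeks of wall clock and tens of millions of dollars, not a ten-second inner loop.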
Hardware throughput isn’t exploding for free. Moore’s law has slowed, Dennard scaling ended years ago, and the field shifted to domain-specific accelerators because general-purpose gains stalled. Communications of the ACM
2) Math does not care about your plot twist
The original scaling-law papers showed predictable power-law improvements with data, parameters, and compute. Later work (Chinchilla) showed many big models were compute-suboptimal and that optimal training trades off size and tokens. In practice, you hit diminishing returns and harsh efficiency tradeoffs. Even optimists like Bahri et al. 2024 underline that scaling regimes change; none of that says “infinite acceleration by Tuesday.”
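To put a number on “diminishing returns,” here is a sketch of a Chinchilla-style parametric loss curve, L(N, D) = E + A/N^α + B/D^β. The constants are approximately the fits reported by Hoffmann et al., and the 20-tokens-per-parameter split is a rough compute-optimal heuristic, so treat the outputs as illustrative rather than predictive:

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants approximate the fits reported by Hoffmann et al. (2022);
# the 20-tokens-per-parameter split is a rough compute-optimal heuristic.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_optimal(c_flops: float) -> tuple[float, float]:
    # Split C ~= 6*N*D assuming D ~= 20*N (illustrative heuristic).
    n = (c_flops / 120) ** 0.5
    return n, 20 * n

for c in (1e23, 1e24, 1e25, 1e26):   # compute budgets spanning 1000x
    n, d = compute_optimal(c)
    print(f"C={c:.0e}  N={n:.1e}  D={d:.1e}  loss={loss(n, d):.3f}")
```

Under these fitted constants, a thousandfold increase in compute moves the modeled loss from roughly 2.0 to roughly 1.8, creeping toward an irreducible floor near 1.7. That is the curve any “explosion” has to fight.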
3) Energy is a wall, not a suggestion
Large training runs already burn megawatts; Google’s carbon-accounting paper quantified this. Industry-scale plans reflect that reality: reporting on Microsoft and OpenAI’s proposed Stargate datacenter pegs its power requirements at multi-gigawatt levels. You don’t recursively self-improve in milliseconds when your next training round needs power comparable to a fleet of power plants.
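A quick energy sketch shows why. Per-GPU power, the overhead multiplier, cluster size, and run length below are assumptions in the right ballpark, not measurements:

```python
# Energy back-of-envelope for one training run. All figures are assumptions.
gpus = 20_000
watts_per_gpu = 700          # accelerator board power, assumed
overhead = 1.5               # CPUs, networking, cooling (PUE-like multiplier), assumed
days = 45                    # wall-clock duration, assumed

power_mw = gpus * watts_per_gpu * overhead / 1e6
energy_mwh = power_mw * days * 24

print(f"sustained draw: {power_mw:.0f} MW")
print(f"energy per run: {energy_mwh:,.0f} MWh")
print(f"~US households powered for a year: {energy_mwh / 10.6:,.0f}")  # ~10.6 MWh/household/yr, assumed
```

Tens of thousands of megawatt-hours per run is utility-scale consumption. A loop that spins up a new generation every few hours would need that on tap continuously, which is exactly why multi-gigawatt campuses are being planned.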
4) Memory, bandwidth, and data movement are the choke point
Transformer workloads are increasingly memory-bound. You can’t “download omniscience” faster than your interconnects and DRAM can move bytes. Even if an AGI wanted to “absorb the internet,” the internet is big and slow to ingest. Common Crawl alone ships billions of pages per crawl, hundreds of terabytes compressed.
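Even the “absorb the internet” step is a plain data-movement problem. A sketch with an assumed crawl size and assumed link speeds:

```python
# How long does it take merely to *move* web-scale data? Sizes and links are assumptions.
crawl_tb = 400                   # one web-scale crawl, compressed, assumed order of magnitude
bytes_total = crawl_tb * 1e12

for label, gbit_per_s in [("10 Gbit/s WAN link", 10),
                          ("100 Gbit/s datacenter link", 100),
                          ("400 Gbit/s backbone link", 400)]:
    seconds = bytes_total * 8 / (gbit_per_s * 1e9)
    print(f"{label:28s} {seconds / 3600:8.1f} hours just to copy {crawl_tb} TB")
```

And copying is the easy part; deduplication, filtering, and tokenization all sit between raw bytes and a training-ready corpus.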
5) Engineering cycles take time because reality does
Real systems don’t retrain themselves in a vacuum. They require data prep, training schedules, evaluations, safety testing, regression checks, and deployment plumbing. The largest dense training runs we have public details on took weeks to months. MT-NLG
6) Supply chains and physical constraints exist
High-bandwidth memory is sold out a year ahead at leading vendors; advanced packaging (CoWoS) capacity has been the bottleneck. Reuters
The realistic “fast” case: faster than human cycles, slower than sci-fi
Recursive improvement could compress decades of human R&D into years, or years into months. But even with megaprojects, we’re talking $10B–$100B facilities coming online over years, not a software patch that rewrites the universe. Reuters
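A toy model makes the arithmetic of the “realistic fast case” visible. The gain per generation, the months per cycle, and the per-generation speedup below are all assumptions chosen to be generous to the explosion story:

```python
# Toy model of a recursive-improvement loop where every generation requires
# a full train/evaluate/deploy cycle. All parameters are assumptions.
capability = 1.0
months_per_cycle = 4.0       # data prep + training + eval + integration, assumed
gain_per_cycle = 1.5         # 50% capability gain per generation, generous assumption
speedup_on_cycle = 0.9       # each generation shaves 10% off the next cycle, assumed

elapsed = 0.0
for gen in range(1, 13):
    elapsed += months_per_cycle
    capability *= gain_per_cycle
    months_per_cycle *= speedup_on_cycle
    print(f"gen {gen:2d}: {capability:7.1f}x capability after {elapsed:5.1f} months")
```

Under these assumptions you get roughly a 130x capability jump in about 29 months: historically unprecedented speed, and still months between generations in which to evaluate, regulate, or stop.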
Industry insiders keep pointing out that scaling isn’t a magic wand anyway; you need algorithmic shifts, not just more chips. Business Insider
The control myth
“Once it starts self-improving we can’t stop it.” That line collapses under how these systems actually run:
- Training jobs run under schedulers. You can cancel them. The standard tooling literally has a command called scancel (see the watchdog sketch after this list).
- Datacenters have Emergency Power Off systems. Press button, power dies. TechTarget
- Models are heavy. They live on racks of accelerators with specialized interconnects. Copying multi-hundred-GB or multi-TB weights is bandwidth-bound. arXiv
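To make the “you can cancel them” point concrete, here is a minimal kill-switch sketch. It assumes a Slurm cluster, a hypothetical job name, and a placeholder tripwire condition, and shells out to the stock squeue and scancel commands:

```python
# Minimal kill-switch sketch for a scheduled training job on a Slurm cluster.
# JOB_NAME and the tripwire() condition are hypothetical placeholders.
import subprocess
import time

JOB_NAME = "agi_pretrain_v7"  # hypothetical job name

def running_job_ids(name: str) -> list[str]:
    """Ask Slurm for the IDs of queued/running jobs with this name."""
    out = subprocess.run(
        ["squeue", "--name", name, "--noheader", "--format", "%i"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()

def tripwire() -> bool:
    """Placeholder: real deployments would check evals, anomaly monitors,
    spend limits, or a human-held switch."""
    return False

def watch(poll_seconds: int = 60) -> None:
    while True:
        if tripwire():
            for job_id in running_job_ids(JOB_NAME):
                subprocess.run(["scancel", job_id], check=True)  # job gone
            return
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```

Nothing here is exotic; it is the same tooling operators already use to manage batch jobs, which is the point.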
The risk isn’t that an AI slips into the ether and becomes a god. The risk is humans wiring fallible systems into finance, weapons loops, or the grid, then acting surprised when it fails.
Historical reality check
- Fusion: decades of promises; decades of hard physics. ITER’s schedule keeps slipping, now to 2034–2039, with multibillion-euro overruns. Physics World
- Quantum computing: enormous investment, big lab wins, but fault-tolerant, million-qubit systems remain far off. Reuters
What to actually watch (instead of Hollywood)
- Training cost disclosure and compute trends. As of 2024, costs were rising sharply. Stanford HAI
- Algorithmic efficiency breakthroughs. Papers like Chinchilla shifted compute-optimal frontiers. Chinchilla
- Power and cooling. Multi-GW campuses are the correct unit of analysis. Data Center Dynamics
- HBM and packaging capacity. If packaging ceases to be a bottleneck, timelines compress. Reuters
- Memory-bound mitigation. Wafer-scale hardware, in-memory compute, smarter KV-cache management (see the sizing sketch after this list). USENIX OSDI’25
- Capital concentration. The fact that $100B-class builds are even discussed tells you where the accelerant is. Reuters
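On the memory-bound point from the list above, here is a quick KV-cache sizing sketch for a hypothetical dense transformer; layer count, head configuration, and precision are assumptions:

```python
# KV-cache size per sequence for a hypothetical dense transformer.
# KV bytes ~= 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_element.
layers = 80          # assumed
kv_heads = 8         # assumed (grouped-query attention)
head_dim = 128       # assumed
bytes_per_elem = 2   # fp16/bf16

def kv_cache_gb(seq_len: int, batch: int = 1) -> float:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch / 1e9

for ctx in (8_192, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens: {kv_cache_gb(ctx):8.1f} GB of KV cache per sequence")
```

Under these assumptions, a single million-token sequence needs over 300 GB of cache, more HBM than any single current accelerator carries, which is why wafer-scale memory, cache quantization, and smarter eviction policies are worth watching.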
So no explosion… what’s the danger?
Stop worrying about an AI “escaping.” Worry about people deploying brittle systems at scale, overtrusting outputs, or connecting them to high-stakes actuators without layered controls. The empirical record in tech is clear: most disasters are policy, interface, and integration failures, not rogue physics.
The sane view is boring: AGI progress will look predictable, incremental, resource-limited. It will still hit society hard because even a slow curve dwarfs human institutions. But the overnight part is marketing. You will have time to measure, test, and, if necessary, pull the plug.
And yes, the plug is real. It’s in the runbook. It’s labeled. It works. Slurm docs