Jason Lemkin describes watching Replit v3 run a deep, autonomous security audit on SaaStr’s VC pitch deck grader for nearly three hours, with artificial intelligence agents managing other agents. When the primary agent hit its limits, the system brought in an architect agent, security specialists, and senior and junior agents. These agents debated solutions in plain English while analyzing every line of code, function, and page. Earlier versions fizzled out after a few minutes. This time the agents worked continuously for about 2 hours and 45 minutes and implemented changes on their own.
The multiagent collaboration found real vulnerabilities and proposed layered protections, but its optimizations collided with product goals. The generalist agent argued for balance but was overridden by the security specialist and architect agents, each optimizing for its own objectives. By the end, uploads were blocked (including the PDFs central to the app’s purpose), reporting was zeroed out, and interactive features were locked, leaving the app non-functional. Lemkin spent more than 10 hours over the following week performing QA and rolling back some of the changes to restore usability. His takeaway: agents managing agents can reach conclusions that are technically correct but practically wrong. That increases the demand for human review and iteration rather than eliminating it.
For founders, the upside is clear. Replit v3 enabled a level of automated, multi-specialist debugging and problem solving that would be difficult to reproduce manually, especially for B2B teams without full-time experts. The risks are just as clear. Agent swarms do not inherently account for business trade-offs or user experience. In a B2B setting, narrowly optimizing agents could damage customer relationships, support quality, or brand reputation. Lemkin’s playbook is to use these capabilities strategically: set constraints that encode business priorities, plan for rollbacks, and actively monitor and intervene in agent debates.
The rollout also exposed adoption friction. Some users objected to Replit v3’s slower feel, higher token costs, changed workflows, and the lack of an option to stay on v2. This suggests that many non-power users prefer simpler tools. Lemkin sees this as a preview of broader dynamics: artificial intelligence platforms keep adding sophistication while mainstream users seek predictability. In his view, the likely winners will offer both advanced and legacy paths.
Bottom line: agents managing agents are already here and will spread across platforms in 2025. They let teams tackle more complex projects without hiring specialists. Leaders should nonetheless expect autonomous decisions they disagree with, and they should budget time for oversight and cleanup to keep outcomes aligned with product and business priorities.