How reinforcement learning is upending the Artificial Intelligence infra stack

Reinforcement learning can generate renewable, proprietary data that changes where value sits across the Artificial Intelligence infrastructure market. Post-training and agent builders may displace commodity inference providers by producing owned interaction data.

At TechCrunch Disrupt 2025, Eric Anderson of Scale Venture Partners and Kyle Corbitt, CEO of OpenPipe, argued that data scarcity is the central constraint on the next wave of Artificial Intelligence breakthroughs. Historical datasets such as ImageNet and the accumulated public web that powered early large language models are finite. The speakers pointed to the need for a renewable, ownable data source and positioned reinforcement learning as a mechanism to create exactly that through continuous interaction and feedback.

The talk offered Google as an analogy: its first algorithm scraped the open web and produced a commodity dataset, while a second, interaction-driven algorithm captured proprietary user behavior and produced a durable advantage. Reinforcement learning has already been applied in limited forms, notably OpenAI’s use of human feedback in 2022 for ChatGPT, which was built on GPT-3.5, and more recently in models tuned for reasoning. The next step is agents that learn by interacting inside defined environments such as spreadsheets, CRMs, or websites. Those purpose-built agents can outperform general models within their own environments, though their learned behaviors do not necessarily generalize across domains.

OpenPipe’s demonstrations highlighted practical tradeoffs: smaller models like Qwen 14B can be far cheaper and much faster than larger models, and when allowed to explore and reinforce their own behavior they can surpass those larger models on specific tasks. That dynamic suggests a market shift: post-training, which injects experience-driven updates, may grow to subsume parts of inference and evaluation infrastructure. As models cycle between sampling and continued training, vendors that provide post-training and agent tooling could capture upstream value and bundle inference, while GPT wrappers and niche agent builders may regain defensibility by creating proprietary interaction datasets. The speakers concluded that reinforcement learning will reshape where investment and product opportunity sit across the Artificial Intelligence infra stack.


