How reinforcement learning is upending the Artificial Intelligence infra stack

Reinforcement learning can generate renewable, proprietary data, which changes where value sits across the Artificial Intelligence infrastructure market. Post-training vendors and agent builders may displace commodity inference providers by producing interaction data they own.

At TechCrunch Disrupt 2025, Eric Anderson of Scale Venture Partners and Kyle Corbitt, CEO of OpenPipe, argued that data scarcity is the central constraint on the next wave of Artificial Intelligence breakthroughs. Historical datasets such as ImageNet and the accumulated public web that powered early large language models are finite. The speakers pointed to the need for a renewable, ownable data source and positioned reinforcement learning as a mechanism to create exactly that through continuous interaction and feedback.

The talk used Google as an analogy: its first algorithm crawled the open web and produced what amounted to a commodity dataset, while a second, interaction-driven algorithm captured proprietary user behavior and produced a durable advantage. Reinforcement learning has already been applied in limited forms, notably OpenAI’s use of human feedback in 2022 for ChatGPT (built on GPT-3.5), and more recently in models tuned for reasoning. The next step is agents that learn by interacting inside defined environments such as spreadsheets, CRMs, or websites. Those purpose-built agents can outperform general models within their own environments, though their learned behaviors do not necessarily generalize across domains.

OpenPipe’s demonstrations highlighted practical tradeoffs: smaller models like Qwen 14B can be far cheaper and much faster, and when allowed to explore and reinforce their own behavior they can surpass larger models on a given task. That dynamic suggests a market shift: post-training, which injects experience-driven updates, may grow to subsume parts of inference and evaluation infrastructure. As models cycle between sampling and continued training, vendors that provide post-training and agent tooling could capture upstream value and bundle inference, while GPT wrappers and niche agent builders may regain defensibility by creating proprietary interaction datasets. The speakers concluded that reinforcement learning will reshape where investment and product opportunity sit across the Artificial Intelligence infra stack.


