Satya Nadella’s recent warning that “A frontier without an ecosystem is not stable” frames enterprise AI strategy around a compounding loop: turning workflows, domain knowledge, and institutional judgment into systems that improve with each use. Private evals, reinforcement learning environments, and queryable knowledge bases can shift value away from rented frontier-model capability and toward business-specific learning that competitors cannot easily copy.
Fin illustrates the approach: it replaced frontier-lab models with a system trained on proprietary data. Its Apex model now handles ~100% of English-language customer conversations over chat and email, and it reportedly raised one large gaming customer’s resolution rate from 68% to 75%, cutting unresolved conversations by about 22%. The same logic helps explain why Salesforce’s acquisition of Fin is framed as strategically meaningful.
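The ~22% figure follows from the two resolution rates, because the unresolved share is one minus the resolution rate. The symbol $u$ (unresolved share) is introduced here only for illustration:

```latex
\begin{aligned}
u_{\text{before}} &= 1 - 0.68 = 0.32 \\
u_{\text{after}}  &= 1 - 0.75 = 0.25 \\
\frac{u_{\text{before}} - u_{\text{after}}}{u_{\text{before}}} &= \frac{0.07}{0.32} \approx 0.219 \approx 22\%
\end{aligned}
```

A 7-point gain in resolution rate therefore removes roughly a fifth of the conversations that previously went unresolved.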
Cursor has moved in a similar direction with CursorBench, a private benchmark built from real user requests and used to train Composer models against codebase-specific practices. The results suggest efficiency can matter as much as raw score: Composer 2.5 reached 63.2% for just 55 cents and ~15,000 tokens, while Opus 4.8 at full effort scored only marginally higher, using ~5x the tokens and costing ~14x more.
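A rough back-of-the-envelope reading of the Cursor figures, taking the approximate ratios at face value and reading "~14x more" as about 14 times the cost (the symbols $C$ for cost and $T$ for tokens are introduced here for illustration):

```latex
\begin{aligned}
C_{\text{Opus}} &\approx 14 \times \$0.55 \approx \$7.70 \\
T_{\text{Opus}} &\approx 5 \times 15{,}000 \approx 75{,}000 \text{ tokens} \\
\frac{C_{\text{Opus}} / T_{\text{Opus}}}{C_{\text{Composer}} / T_{\text{Composer}}} &\approx \frac{14}{5} = 2.8
\end{aligned}
```

On these approximations, the full-effort Opus run was roughly 2.8x more expensive per token, on top of consuming about five times as many tokens, for a marginally higher score.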
The broader risk is incentive misalignment between AI providers and customers: providers earn more when customers spend more on test-time compute, whether or not that spend helps. Without private evals, companies may struggle to tell whether higher test-time compute is improving outcomes or simply increasing vendor revenue, leaving them dependent on external labs rather than building their own compounding advantage.
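A private eval is what makes that question answerable. The sketch below is a minimal illustration, not any vendor's tooling: it assumes you have run two configurations of the same model (for example, default and high reasoning effort) on an identical in-house task set and recorded per-task pass/fail plus total spend. `EvalRun`, `marginal_cost_per_extra_solve`, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Results of one model configuration on a private eval suite (hypothetical structure)."""
    name: str
    passed: list[bool]  # per-task pass/fail, in the same task order for every run
    cost_usd: float     # total spend for the whole run

    @property
    def score(self) -> float:
        # Fraction of tasks solved.
        return sum(self.passed) / len(self.passed)


def marginal_cost_per_extra_solve(base: EvalRun, upgraded: EvalRun) -> float:
    """Extra dollars paid per additional task solved when moving from base to upgraded.

    Returns float('inf') if the upgraded configuration solves no additional tasks,
    i.e. the extra spend bought no extra outcomes.
    """
    if len(base.passed) != len(upgraded.passed):
        raise ValueError("runs must cover the same task set")
    extra_solves = sum(upgraded.passed) - sum(base.passed)
    extra_cost = upgraded.cost_usd - base.cost_usd
    if extra_solves <= 0:
        return float("inf")
    return extra_cost / extra_solves


# Hypothetical toy data for illustration only, not real results.
base = EvalRun("default-effort", [True, False, True, True, False], cost_usd=1.00)
high = EvalRun("high-effort", [True, True, True, True, False], cost_usd=6.00)

print(f"score gain: {high.score - base.score:.2f}")  # score gain: 0.20
print(f"marginal cost per extra solve: ${marginal_cost_per_extra_solve(base, high):.2f}")  # $5.00
```

Tracking marginal cost per extra solved task, rather than headline score alone, shows directly whether additional test-time compute is buying outcomes or just tokens.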
