The article describes how the author spent six months building AI agents in a way that felt wrong, only to see new research from 12 institutions show that the simple modular approach he had been using can be 70x more data-efficient than complex fine-tuned agents. The research, from UIUC, Stanford, Princeton, Harvard, UW, Caltech, UC Berkeley, UCSD, Georgia Tech, Northwestern, TAMU, and Unity, identifies four main ways to optimize an AI agent system and gives names to architectures the author had already built in practice. Its key finding: small specialized tools built around a frozen large language model (the T2 approach) beat fine-tuning massive monolithic agents (the A2 approach) on data efficiency while matching their accuracy.
The framework described in the research divides agent design into four approaches: T1, T2, A1, and A2. T1 uses portable, agent-agnostic tools such as markdown files and git that work with any large language model, while T2 builds agent-supervised tools around a frozen model such as Claude 3.5. On the agent-training side, A1 trains agents from tool feedback when outcomes are verifiable, and A2 fine-tunes entire agents from final answers, as in the case of Search-R1, which needed 170,000 training examples and weeks of compute. In contrast, the s3 system, which follows the T2 approach and performs the same search task, matched Search-R1's accuracy with just 2,400 training examples, roughly 70x less training data (170,000 / 2,400 ≈ 71).
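To make the taxonomy concrete, here is a minimal sketch of the structural difference the key finding turns on. Every name in it is a hypothetical illustration, not code from the paper; the frozen model is a stand-in for any chat-completion API.

```python
# Hypothetical sketch of where each approach puts its optimization
# pressure; the names are illustrative, not from the paper.

def frozen_llm(prompt: str) -> str:
    """Stand-in for a frozen model (e.g. Claude 3.5): weights never change."""
    raise NotImplementedError("call any provider's chat API here")

# T2: only the small tools around the frozen model are trained, which is
# why a few thousand examples can be enough.
def t2_system(question: str, tuned_retriever) -> str:
    docs = tuned_retriever(question)  # <- the only trained component
    return frozen_llm(f"Context:\n{docs}\n\nQuestion: {question}")

# A2: the whole agent is fine-tuned end-to-end from final answers
# (Search-R1 style), so the model weights themselves are the target.
def a2_system(question: str, finetuned_agent) -> str:
    return finetuned_agent(question)  # all behavior lives in the weights
```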
The author explains that his own workflows map directly onto the T1 and T2 patterns the research identifies. For T2, he describes a research-agent architecture built around a frozen agent, Gemini 2.0 Flash via OpenRouter, that is never trained or modified, paired with four specialized tools: query expansion that generates four different search angles, parallel searches via SerpAPI, content extraction that uses Jina AI to convert webpages into markdown, and a synthesis-with-memory step that produces a comprehensive report. The whole pipeline costs $0 per query within the services' free tiers (a rough sketch follows below). For T1, he uses local markdown files kept in git and searched with grep; they work with Claude, GPT, or Gemini and require no vector database, embeddings, or training (see the second sketch below). He ties this to his earlier "Memory Palace" idea of specialized stores instead of a single massive context, arguing that simple, portable tools independent of any specific agent are more robust, since they do not break when switching models, and that the new research formally validates this simple modular philosophy.
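As a rough illustration of that T2 pipeline, here is a condensed sketch. The three endpoints (OpenRouter's chat-completions API, SerpAPI's JSON search, the r.jina.ai reader) are used as publicly documented, but the model ID, prompts, truncation limits, and the serial (rather than parallel) search loop are simplifying assumptions, not the author's actual code.

```python
import os
import requests

def llm(prompt: str) -> str:
    """Frozen agent: Gemini 2.0 Flash via OpenRouter's OpenAI-compatible API."""
    r = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": "google/gemini-2.0-flash-001",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def expand_query(question: str) -> list[str]:
    """Tool 1: have the frozen model propose four distinct search angles."""
    text = llm(f"Give four distinct web-search queries, one per line, "
               f"for researching: {question}")
    return [line.strip() for line in text.splitlines() if line.strip()][:4]

def search(query: str) -> list[dict]:
    """Tool 2: one SerpAPI call per angle (serial here; the author
    runs these in parallel)."""
    r = requests.get("https://serpapi.com/search.json",
                     params={"q": query, "api_key": os.environ["SERPAPI_KEY"]},
                     timeout=30)
    r.raise_for_status()
    return r.json().get("organic_results", [])[:3]

def extract(url: str) -> str:
    """Tool 3: Jina AI Reader converts a webpage to markdown via r.jina.ai."""
    return requests.get(f"https://r.jina.ai/{url}", timeout=30).text

def research(question: str) -> str:
    """Tool 4: synthesize all extracted pages into one report."""
    pages = [extract(hit["link"])
             for q in expand_query(question)
             for hit in search(q)]
    corpus = "\n\n---\n\n".join(p[:4000] for p in pages)  # crude truncation
    return llm(f"Write a comprehensive report on '{question}' "
               f"using these sources:\n{corpus}")
```

The structural point is that `llm` is never touched: all four tools are swappable plumbing around the frozen model.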
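The T1 side needs almost no machinery at all. Here is a minimal sketch, assuming a hypothetical memory/ directory of markdown notes tracked in git; the directory name and function are invented for illustration.

```python
import subprocess

def recall(topic: str, notes_dir: str = "memory/") -> str:
    """T1 memory: plain grep over markdown files tracked in git. Works
    identically whether the consuming agent is Claude, GPT, or Gemini,
    because the 'index' is just text on disk -- no embeddings, no vector DB."""
    result = subprocess.run(
        ["grep", "-ri", "--include=*.md", topic, notes_dir],
        capture_output=True, text=True,
    )
    return result.stdout or "no notes found"

# The agent-facing side is just string concatenation: paste recall() output
# into whatever model's context window you happen to be using today.
```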
