Artificial Intelligence has reached an awkward middle stage. Companies have built the technology and promised far-reaching transformation, but the route from capability to dependable economic value remains uncertain. Critics argue that the missing step is regulation, while industry advocates invoke the promise of an “economically transformative technology” without much clarity on how that transformation will actually happen in practice.
Recent research shows why the debate is so unsettled. One study from Anthropic projected which jobs are most likely to be affected by large language models, suggesting major change for managers, architects, and media workers, while groundskeepers, construction workers, and hospitality staff may see less disruption. But those projections rest on assumptions about which tasks language models appear to handle well, not on direct evidence of workplace performance. Another study, published in February by researchers at Mercor, an Artificial Intelligence hiring startup, tested several Artificial Intelligence agents powered by top-tier models from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks frequently carried out by human bankers, consultants, and lawyers. Every agent tested failed to complete the majority of its assigned tasks.
The disagreement reflects both incentives and methodology. Companies making bold forecasts often have a stake in the outcome, and many predictions lean heavily on rapid improvements in coding tools. That emphasis can obscure the fact that many jobs depend on other abilities, including strategic judgment, where large language models have been found to perform poorly. The result is a growing gap between sweeping claims about the future and the more limited evidence emerging from practical deployment.
Real workplaces add another layer of difficulty. Artificial Intelligence systems do not arrive in clean, controlled environments. They must fit into organizations shaped by existing processes, habits, and human decisions, and in some cases they can make those workflows worse rather than better. Rebuilding operations around the technology may eventually unlock larger gains, but doing so will take time and a willingness to change how work is organized.
The central problem is a lack of shared evidence about what Artificial Intelligence can do in the real world and how it should be deployed. That vacuum is being filled by exaggerated claims that can sway public opinion and even markets. More reliable progress will require greater transparency from model makers, closer coordination between researchers and businesses, and better evaluation methods that measure real-world outcomes instead of relying on speculation.