The GenAI paradox: superhuman models but mixed success with enterprise AI

Frontier models are posting superhuman scores on targeted tasks, yet enterprise adoption lags because the bottleneck is integration and process, not model quality. New studies on GPT-5 and voice agents highlight both raw capability and clear routes to practical impact when workflows are redesigned.

Frontier AI models continue to advance on high-stakes benchmarks while real-world enterprise returns remain uneven. A controlled evaluation of GPT-5 on multimodal medical reasoning found large gains over GPT-4o on the MedXpertQA benchmark (+29.26% in reasoning and +26.18% in understanding) and reported performance above pre-licensed human experts (+24.23% and +29.40%, respectively). The paper also noted a nuance: on the smaller VQA-RAD dataset, GPT-5-mini slightly outperformed the full GPT-5 model, suggesting that right-sizing a model to the task can sometimes beat brute-force scaling.

A separate large field experiment examined voice agents in hiring, randomizing more than 70,000 applicants for customer service roles in the Philippines to a human interviewer, an AI voice agent, or a choice between the two. The AI-led interviews produced materially better hiring outcomes: 12% more job offers, 18% more job starts, and 17% higher 30-day retention. When given a choice, 78% of applicants chose the AI interviewer. Transcript analysis pointed to greater interview consistency as the likely mechanism, and reports of gender discrimination by interviewers nearly halved under the AI condition.

These capability wins sit against an enterprise adoption backdrop described by MIT's NANDA initiative in The GenAI Divide. The report finds that only about 5% of corporate AI pilots drive rapid revenue acceleration; most stall because of a learning and integration gap, not model quality. Purchasing specialized tools and partnering succeed roughly two-thirds of the time, while internal builds succeed only about one-third as often. The newsletter draws practical lessons: treat adoption as a process problem, separate interaction from adjudication so that humans make final decisions, redesign workflows for consistent and auditable signal capture, right-size models to the task, and favor buy-then-integrate when speed and reliability are critical. The piece also flags related industry moves, such as NVIDIA's Granary dataset and model updates from Anthropic and Mistral, underscoring a fast-moving technical landscape alongside persistent organizational challenges.

Impact Score: 70

EA's AI tools reportedly cost game developers time

Electronic Arts announced a partnership with Stability AI even as employees tell Business Insider that in-house AI tools have created more work than relief, producing hallucinated code that requires manual fixes. Artists say their work was used to train models, and roughly 100 quality assurance roles were reportedly eliminated after tester feedback was automated.
