The GenAI paradox: superhuman models but mixed success with enterprise Artificial Intelligence

Frontier models are showing superhuman performance on targeted tasks, yet enterprise adoption lags because the bottleneck is integration and process rather than model quality. New studies of GPT-5 and of voice agents in hiring highlight both raw capability and clear routes to practical impact when workflows are redesigned around Artificial Intelligence.

Frontier Artificial Intelligence models continue to advance on high‑stakes benchmarks while real‑world enterprise returns remain uneven. A controlled evaluation of GPT‑5 on multimodal medical reasoning found large gains over GPT‑4o on the MedXpertQA benchmark (+29.26% in reasoning and +26.18% in understanding) and reported performance above pre‑licensed human experts (+24.23% and +29.40% respectively). The paper also noted a nuance: on the smaller VQA‑RAD dataset, GPT‑5‑mini slightly outperformed the full GPT‑5 model, suggesting that right‑sizing can sometimes beat brute‑force scaling for niche tasks.

A separate large field experiment examined voice agents in hiring, randomizing more than 70,000 applicants for customer service roles in the Philippines to human interviewers, an Artificial Intelligence voice agent, or a choice between the two. The AI‑led interviews produced materially better hiring outcomes: 12% more job offers, 18% more job starts, and 17% higher 30‑day retention. When given a choice, 78% of applicants chose the AI interviewer. Transcript analysis pointed to greater consistency in interviews as the likely mechanism, and reported gender discrimination by interviewers nearly halved under the AI condition.

These capability wins sit against an enterprise adoption backdrop described by MIT’s NANDA initiative in The GenAI Divide. The report finds that only about 5% of corporate Artificial Intelligence pilots drive rapid revenue acceleration, with most stalling because of a learning and integration gap rather than model quality. Purchasing specialized tools and partnering succeed roughly two‑thirds of the time, while internal builds succeed only about one‑third as often. The newsletter draws practical lessons: treat adoption as a process problem, separate interaction from adjudication so humans make the final decisions, redesign workflows for consistent and auditable signal capture, right‑size models to the task, and favor buy‑then‑integrate when speed and reliability are critical. The piece also flags related industry moves, such as NVIDIA’s Granary dataset and model updates from Anthropic and Mistral, underscoring a fast‑moving technical landscape alongside persistent organizational challenges.
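To make the “separate interaction from adjudication” lesson concrete, here is a minimal Python sketch of that split: a model-facing layer captures structured, auditable interview signals, while the decision field can only be set by a named human reviewer. All class names, fields, and scores are hypothetical illustrations, not the NANDA report’s or the hiring study’s actual tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: the model layer only records advisory, auditable
# signals; a human adjudicator owns the final hiring decision.

@dataclass
class InterviewSignal:
    question_id: str
    transcript: str
    score: float  # model-assigned rubric score, advisory only
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class Adjudication:
    candidate_id: str
    signals: list[InterviewSignal]
    decision: str | None = None   # set by a human reviewer, never by the model
    reviewer: str | None = None

def capture_interview(candidate_id: str, model_outputs: list[dict]) -> Adjudication:
    """Interaction layer: normalize model outputs into an auditable record."""
    signals = [
        InterviewSignal(o["question_id"], o["transcript"], o["score"])
        for o in model_outputs
    ]
    return Adjudication(candidate_id=candidate_id, signals=signals)

def adjudicate(record: Adjudication, reviewer: str, decision: str) -> Adjudication:
    """Adjudication layer: only a named human reviewer sets the decision."""
    record.reviewer = reviewer
    record.decision = decision
    return record

# Usage: the AI interviewer produces scored transcripts; a recruiter decides.
record = capture_interview(
    "cand-001",
    [{"question_id": "q1", "transcript": "I handled the refund by...", "score": 0.82}],
)
final = adjudicate(record, reviewer="recruiter@example.com", decision="advance")
```

Keeping the decision field out of the interaction layer is what makes the signal capture auditable: every score and transcript is logged with a timestamp, and the human judgment is recorded separately alongside the reviewer’s identity.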

Impact Score: 70

Treasury secretary indicates federal interest in Intel stake

Treasury Secretary Scott Bessent told CNBC the U.S. is considering converting federal CHIPS Act grants to Intel into an ownership stake and possibly increasing investment to stabilize domestic chip production. He framed that action alongside other measures, including taking revenue shares from Artificial Intelligence chip sales to China, as part of a broader security policy.

NVIDIA prepares B30A Blackwell Artificial Intelligence accelerator for China

NVIDIA has designed a China-specific single-die B30A accelerator derived from the B300 Blackwell Ultra to meet export restrictions while retaining HBM and NVLink. The B30A is expected to deliver roughly half the dual-die B300's peak performance across precisions and to target domestic Artificial Intelligence labs.

Build an Artificial Intelligence agent workflow to create content faster

Christina Blake describes how she built six Artificial Intelligence agents using Claude and Zapier MCP to turn raw ideas into publication-ready posts while preserving her voice. She shares the agent roles, workflow steps and the practical setup that took her under an hour to build.
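As a rough illustration of what a single agent role in such a pipeline can look like, the sketch below calls Claude directly through the Anthropic Python SDK. It is an assumption-laden stand-in for Blake’s actual setup: her workflow chains agents through Claude and Zapier MCP, and the role prompts, function names, and model ID here are illustrative, not taken from her article.

```python
import anthropic

# Illustrative only: one "agent role" is just a system prompt applied to the
# current draft. Blake's real pipeline wires six such roles together via
# Zapier MCP; this sketch chains two of them in plain Python instead.

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(role_prompt: str, draft: str,
              model: str = "claude-sonnet-4-20250514") -> str:
    """Run one agent role: the system prompt defines the role, the draft is the input."""
    message = client.messages.create(
        model=model,           # model ID is an assumption; substitute your own
        max_tokens=2000,
        system=role_prompt,
        messages=[{"role": "user", "content": draft}],
    )
    return message.content[0].text

# A two-stage example: outline the raw idea, then edit toward the author's voice.
outline = run_agent(
    "You are an outlining agent. Turn raw notes into a structured post outline.",
    "raw idea: why right-sizing models can beat brute-force scaling",
)
edited = run_agent(
    "You are a voice editor. Rewrite the outline into a conversational, first-person draft.",
    outline,
)
print(edited)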
