ADeLe Offers Predictive and Explanatory Evaluation for AI Models

A new method called ADeLe breaks down Artificial Intelligence tasks by ability, enabling clearer predictions of model performance and revealing the "why" behind successes or failures.

Researchers supported by Microsoft and its Accelerating Foundation Models Research initiative have introduced a novel approach, ADeLe (annotated-demand-levels), for systematically evaluating Artificial Intelligence model performance. Unlike conventional benchmarks, ADeLe predicts how models will perform on unfamiliar tasks and provides detailed explanations for their successes and failures. It does this by decomposing each task into demands on 18 cognitive and knowledge-based ability scales, such as reasoning, attention, and domain-specific knowledge. A detailed 0–5 rubric, initially designed for human evaluation, quantifies how much of each ability a task requires.
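One way to picture a demand annotation is as a small record that assigns each ability scale a level from 0 to 5. The sketch below is only an illustration under stated assumptions: the class name `DemandProfile`, the field names, and the three scale identifiers are hypothetical (the article names only a few of the 18 scales), and this is not the researchers' own data format.

```python
from dataclasses import dataclass

# Hypothetical identifiers for three of the 18 scales mentioned in the article.
SCALES = ("reasoning", "attention", "domain_knowledge")
MIN_LEVEL, MAX_LEVEL = 0, 5  # the 0-5 rubric range described in the article


@dataclass(frozen=True)
class DemandProfile:
    """Demand levels that one task places on each ability scale (illustrative)."""
    task_id: str
    levels: dict  # maps scale name -> integer demand level

    def __post_init__(self):
        # Reject unknown scales and levels outside the rubric range.
        for scale, level in self.levels.items():
            if scale not in SCALES:
                raise ValueError(f"unknown scale: {scale}")
            if not isinstance(level, int) or not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"level for {scale} must be an integer in 0..5")

    def level(self, scale):
        # Assumption: a scale that is not annotated places no demand (level 0).
        return self.levels.get(scale, MIN_LEVEL)


# A made-up task that needs strong reasoning and moderate domain knowledge.
task = DemandProfile("example-task", {"reasoning": 4, "domain_knowledge": 2})
```

Treating an unannotated scale as level 0 is a convenience of this sketch, not something the article specifies.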

To generate an ability profile for an Artificial Intelligence model, researchers compare the model’s performance on a large, annotated benchmark with the demands of the tasks in it. The result is a profile that highlights which abilities a particular model possesses and clarifies why it may fail or succeed on given tasks. This ability matching not only supports rigorous analysis but also enables accurate performance prediction: it is about 88% accurate in predicting whether leading models like GPT-4o and LLaMA-3.1-405B will correctly solve new and unfamiliar tasks, outperforming traditional single-metric approaches.
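The idea of matching abilities to demands can be sketched in a few lines. This is a deliberately simplified illustration, not ADeLe's actual estimation or prediction procedure: the function names, the 0.5 success threshold, and the rule "predict success when every ability meets or exceeds the corresponding demand" are all assumptions made for the example.

```python
from collections import defaultdict


def estimate_ability(results, scale, threshold=0.5):
    """Estimate one ability as the highest demand level the model still handles.

    results: list of (demands, correct) pairs, where demands maps a scale name
    to a 0-5 level and correct is True if the model solved the task.
    Assumption: levels are scanned from low to high and the scan stops at the
    first level whose success rate falls below the threshold.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for demands, correct in results:
        level = demands.get(scale, 0)
        totals[level] += 1
        wins[level] += int(correct)

    ability = 0
    for level in sorted(totals):
        if wins[level] / totals[level] >= threshold:
            ability = level
        else:
            break
    return ability


def predict_success(ability_profile, demands):
    """Predict success if the model meets every demand of the task (assumed rule)."""
    return all(ability_profile.get(scale, 0) >= level
               for scale, level in demands.items())


# Made-up benchmark results for a single scale.
results = [
    ({"reasoning": 1}, True),
    ({"reasoning": 2}, True),
    ({"reasoning": 3}, False),
    ({"reasoning": 4}, True),
]
profile = {"reasoning": estimate_ability(results, "reasoning")}
```

With this toy data the scan stops at level 3, so the lone success at level 4 does not raise the estimate. That reflects the conservative scanning rule chosen here rather than anything stated in the article.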

Extensive testing across 63 tasks and 20 benchmarks revealed measurement shortcomings in existing Artificial Intelligence evaluation methods, such as tests that do not genuinely assess the abilities they claim to measure or that lack variation in difficulty. The analysis also exposed distinct model strengths and weaknesses: newer and larger models generally perform better but with diminishing returns; reasoning-specific models excel where logical inference or social cognition is needed; and different training approaches strongly affect knowledge-based abilities. ADeLe’s results can also be visualized as radial ability plots, helping developers and policymakers better grasp a model’s readiness for deployment. Researchers suggest that this approach could become a standardized framework for evaluating future Artificial Intelligence systems, extending to multimodal or embodied systems, and facilitating safer, more transparent societal adoption.


