Evaluating transparency in Artificial Intelligence/machine learning model characteristics for FDA-reviewed medical devices

The authors reviewed publicly available FDA summaries for 1,012 artificial intelligence/machine learning (AI/ML)-enabled medical devices to measure how well model development and performance were reported. They found low transparency across key elements and only modest improvement after the FDA’s 2021 guidance.

This study systematically reviewed 1,012 publicly accessible summaries of safety and effectiveness data (SSEDs) for artificial intelligence/machine learning (AI/ML)-enabled medical devices authorized by the U.S. Food and Drug Administration through December 2024. The authors developed a 17-point AI Characteristics Transparency Reporting (ACTR) score to quantify disclosure across dataset, model, performance, and clinical reporting elements. Across all devices, the mean ACTR score was 3.3 out of 17 (standard deviation 3.1); annual means ranged from 1.1 to 4.0. The single-device maximum was 12, and 304 devices (30%) scored zero. After publication of the FDA’s 2021 Good Machine Learning Practice guidance, ACTR scores increased by 0.88 points (95% confidence interval, 0.54 to 1.23) after controlling for model complexity and predicate device use, but the absolute change remained small.
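The ACTR score described above is a checklist sum: one point per disclosed reporting element, out of 17. A minimal sketch of that scheme, assuming hypothetical element names (the summary does not list the paper's exact 17 items):

```python
# Illustrative sketch, NOT the authors' code: a checklist-style
# transparency score like ACTR sums 17 binary reporting indicators.
# The element names below are hypothetical stand-ins.
ACTR_ELEMENTS = [
    "training_data_source", "testing_data_source", "training_set_size",
    "test_set_size", "demographics", "model_architecture", "model_inputs",
    "model_outputs", "sensitivity", "specificity", "auroc", "ppv", "npv",
    "accuracy", "clinical_study_design", "sample_size", "subgroup_performance",
]
assert len(ACTR_ELEMENTS) == 17

def actr_score(reported: set) -> int:
    """Score a device summary: one point per disclosed element, max 17."""
    return sum(element in reported for element in ACTR_ELEMENTS)

# A sparsely reported device summary scores low, as most did in the study:
print(actr_score({"sensitivity", "specificity", "test_set_size"}))  # → 3
```

Under this scheme, the study's 30% of devices scoring zero corresponds to SSEDs disclosing none of the checklist elements.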

The review identified pervasive gaps in public reporting. Of 1,016 devices on the FDA list, 1,012 had accessible SSEDs; 96.4% of devices were cleared via the 510(k) pathway (n = 976). Only 53.1% of devices reported a clinical study; among those, 60.5% used retrospective designs, 14% used prospective designs, and 75% reported a sample size. Reporting on datasets was sparse: 93.3% of devices did not report training data sources and 75.5% did not report testing data sources, while training dataset size was reported by only 9.4% (n = 95), test dataset size by 23.2% (n = 235), and demographics by 23.7% (n = 240). Performance metrics were absent from 51.6% of device summaries; the most commonly reported metrics were sensitivity (23.9%, n = 242) and specificity (21.7%, n = 220), with fewer reporting AUROC (10.9%, n = 110), positive predictive value (6.5%, n = 66), accuracy (6.4%, n = 65), and negative predictive value (5.3%, n = 54). Median reported discrimination metrics were high, but the authors caution that these may reflect optimistic premarket study designs.

The authors highlight consequences for generalizability and postmarket surveillance, noting only 15 devices (1.5%) reported a predetermined change control plan and that 70.9% of 510(k) clearances exceeded the FDA’s 90-day review target. ACTR scores correlated weakly with time to clearance (Pearson r = 0.15). The paper documents modest improvements after guidance but persistent under-reporting of model, data, and subgroup performance. The authors recommend enforceable, standardized public reporting such as a machine-readable model card appended to SSEDs and strengthened postmarket monitoring to ensure trust and equitable performance in deployed medical devices.


Adaptive training method boosts reasoning large language model efficiency

Researchers have developed an adaptive training system that uses idle processors to train a smaller helper model on the fly, doubling training speed for reasoning large language models without sacrificing accuracy. The method aims to cut costs and energy use for advanced applications such as financial forecasting and power grid risk detection.

How to run MiniMax M2.5 locally with Unsloth GGUF

MiniMax-M2.5 is a new open large language model optimized for coding, tool use, search, and office tasks, and Unsloth provides quantized GGUF builds and usage recipes for running it locally. The guide focuses on memory requirements, recommended decoding parameters, and deployment via llama.cpp and llama-server with an OpenAI-compatible interface.
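The local deployment the guide describes can be sketched with llama.cpp's `llama-server`, which exposes an OpenAI-compatible HTTP API. The GGUF filename, context size, and decoding values below are placeholders, not Unsloth's actual recommendations:

```shell
# Sketch only: the GGUF filename and parameter values are assumptions;
# consult the Unsloth guide for real download links and the recommended
# decoding parameters for MiniMax-M2.5.
llama-server \
  --model MiniMax-M2.5-Q4_K_M.gguf \
  --ctx-size 8192 \
  --temp 1.0 \
  --port 8080

# The server then accepts OpenAI-style requests, e.g.:
# curl http://localhost:8080/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

Quantized builds trade some precision for lower memory use, which is what makes running a large model on local hardware feasible.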

Y Combinator backs new wave of computer vision startups in 2026

Y Combinator’s 2026 computer vision cohort spans infrastructure, developer tools, and industry-specific applications from retail security to aquaculture and healthcare. Startups are increasingly pairing computer vision with large vision language models and foundation models to tackle real-time video, automation, and domain-specific analysis.
