Evaluating transparency in artificial intelligence/machine learning model characteristics for FDA-reviewed medical devices

The authors reviewed publicly available FDA summaries for 1,012 artificial intelligence/machine learning (AI/ML)-enabled medical devices to measure how well model development and performance were reported. They found low transparency across key elements and only modest improvement after the FDA's 2021 guidance.

This study systematically reviewed 1,012 publicly accessible summaries of safety and effectiveness data (SSEDs) for AI/ML-enabled medical devices authorized by the U.S. Food and Drug Administration through December 2024. The authors developed a 17-point AI Characteristics Transparency Reporting (ACTR) score to quantify disclosure across dataset, model, performance, and clinical reporting elements. Across all devices, the mean ACTR score was 3.3 out of 17 (standard deviation 3.1), with annual means ranging from 1.1 to 4.0. The single-device maximum ACTR was 12, and 304 devices (30%) scored zero. After publication of the FDA's 2021 Good Machine Learning Practice guidance, ACTR scores increased by 0.88 points (95% confidence interval, 0.54 to 1.23) after controlling for model complexity and predicate device use, but the absolute change was small.
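
As a concrete illustration, an ACTR-style score can be computed as a sum of binary disclosed/not-disclosed flags over the 17 elements. The Python sketch below shows one plausible implementation; the item names are hypothetical stand-ins, since this summary does not enumerate the paper's exact checklist.

```python
# Minimal sketch of an ACTR-style transparency score: each of 17
# reporting elements contributes one point when the SSED discloses it.
# Item names are hypothetical stand-ins; the paper's exact checklist
# is not enumerated in this summary.
ACTR_ITEMS = [
    "training_data_source", "testing_data_source", "training_dataset_size",
    "test_dataset_size", "demographics", "model_type", "model_inputs",
    "model_outputs", "sensitivity", "specificity", "auroc", "ppv", "npv",
    "accuracy", "clinical_study_design", "sample_size", "change_control_plan",
]

def actr_score(ssed: dict) -> int:
    """Count how many of the 17 elements a device summary reports."""
    return sum(bool(ssed.get(item)) for item in ACTR_ITEMS)

# Example: a sparse summary reporting only sensitivity and specificity.
example = {"sensitivity": 0.94, "specificity": 0.89}
print(actr_score(example))  # -> 2
```

The reported 0.88-point post-guidance increase would then come from regressing the score on a post-2021 indicator with complexity and predicate-use covariates, e.g. `smf.ols("actr ~ post_2021 + complexity + predicate", data=df).fit()` in statsmodels, with all column names again hypothetical.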

The review identified pervasive gaps in public reporting. Of 1,016 devices on the FDA list, 1,012 had accessible SSEDs, and 96.4% of devices were cleared via the 510(k) pathway (n = 976). Only 53.1% of devices reported a clinical study; among those, 60.5% used retrospective designs, 14% used prospective designs, and 75% reported a sample size. Dataset reporting was sparse: 93.3% did not report training data sources and 75.5% did not report testing data sources, while training dataset size was reported by 9.4% (n = 95), test dataset size by 23.2% (n = 235), and demographics by 23.7% (n = 240). Performance metrics were absent from 51.6% of device summaries; the most commonly reported metrics were sensitivity (23.9%, n = 242) and specificity (21.7%, n = 220), with fewer devices reporting AUROC (10.9%, n = 110), positive predictive value (6.5%, n = 66), accuracy (6.4%, n = 65), or negative predictive value (5.3%, n = 54). Median reported discrimination metrics were high, but the authors caution that these may reflect optimistic premarket study designs.
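
For readers who want to reproduce tallies like these, here is a small sketch of how per-element reporting rates could be computed over parsed SSED records; the records and field names are hypothetical examples, not the paper's actual data.

```python
from collections import Counter

# Hypothetical parsed SSED records: one dict per device, with boolean
# flags marking whether each element was reported. Field names are
# illustrative only.
devices = [
    {"clinical_study": True, "training_source": False, "demographics": True},
    {"clinical_study": False, "training_source": False, "demographics": False},
    {"clinical_study": True, "training_source": True, "demographics": False},
]

def reporting_rates(records: list[dict], fields: list[str]) -> dict:
    """Percent of devices reporting each element."""
    n = len(records)
    counts = Counter()
    for record in records:
        for field in fields:
            counts[field] += bool(record.get(field))
    return {field: 100 * counts[field] / n for field in fields}

print(reporting_rates(devices, ["clinical_study", "training_source", "demographics"]))
```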

The authors highlight consequences for generalizability and postmarket surveillance, noting that only 15 devices (1.5%) reported a predetermined change control plan and that 70.9% of 510(k) clearances exceeded the FDA's 90-day review target. ACTR scores correlated only weakly with time to clearance (Pearson r = 0.15). The paper documents modest improvement after the guidance but persistent under-reporting of model, data, and subgroup performance details. The authors recommend enforceable, standardized public reporting, such as a machine-readable model card appended to each SSED, along with strengthened postmarket monitoring to ensure trust and equitable performance in deployed medical devices.
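
To make the model-card recommendation concrete, one plausible shape for a machine-readable record appended to an SSED is sketched below in Python and serialized as JSON; the schema and every field name and value are illustrative choices, not an FDA- or author-specified format.

```python
import json

# Illustrative machine-readable model card for an SSED appendix.
# Every field name and value here is hypothetical; the paper recommends
# the concept, not this particular schema.
model_card = {
    "device_id": "K000000",  # placeholder 510(k) number
    "model": {"type": "convolutional neural network", "input": "chest X-ray"},
    "data": {
        "training_source": "multi-site retrospective cohort",
        "training_n": 50000,
        "test_n": 5000,
        "demographics_reported": True,
    },
    "performance": {"sensitivity": 0.94, "specificity": 0.89, "auroc": 0.96},
    "subgroup_performance": [{"group": "age >= 65", "auroc": 0.93}],
    "predetermined_change_control_plan": False,
}

print(json.dumps(model_card, indent=2))
```

A fixed schema along these lines would let reporting rates and subgroup performance be audited automatically across the full device list rather than extracted by hand.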

AMD unveils Ryzen AI Halo developer box at CES 2026

AMD is positioning its new Ryzen AI Halo box as a compact desktop and a full AI development platform aimed at consumer applications, drawing a comparison to NVIDIA's DGX Spark. The system combines Strix Halo silicon with a custom cooling design and unified memory to attract developers targeting Windows and Linux.

Nandan Nilekani’s next push for India’s digital future

Nandan Nilekani, the architect of India’s Aadhaar system and wider digital public infrastructure, is now focused on stabilizing the country’s power grid and building a global “finternet” to tokenize assets and expand financial access. His legacy is increasingly contested at home even as governments worldwide study India’s digital model.
