PadChest-GR, created by the University of Alicante in collaboration with Microsoft Research, University Hospital Sant Joan d’Alacant, and MedBravo, is the first multimodal, bilingual radiology dataset for grounded reporting on chest X-rays. It contains 4,555 chest X-ray studies, each paired with sentence-level descriptions of positive and negative findings in both Spanish and English, with spatially precise bounding boxes grounding each positive finding in the image. This initiative marks a pivotal shift from traditional, unstructured radiology narratives to a systematic, grounded approach, facilitating collaboration between clinicians and artificial intelligence (AI) systems while reducing the risk of fabricated or ambiguous interpretations.
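To make the structure of such a grounded study concrete, here is a minimal Python sketch of how one record could be represented and loaded. The field names, coordinate convention, and example content are illustrative assumptions for this post, not the released PadChest-GR schema.

```python
from dataclasses import dataclass, field
import json

@dataclass
class Finding:
    """One sentence-level finding from a grounded report (hypothetical schema)."""
    sentence_en: str   # English sentence describing the finding
    sentence_es: str   # Spanish sentence describing the same finding
    is_positive: bool  # present (positive) vs. absent (negative) finding
    # Normalized [x_min, y_min, x_max, y_max] boxes; empty for negative findings.
    boxes: list = field(default_factory=list)

@dataclass
class Study:
    study_id: str
    image_path: str
    findings: list

def load_study(json_str: str) -> Study:
    """Parse one JSON record into a Study with its list of Findings."""
    raw = json.loads(json_str)
    findings = [Finding(**f) for f in raw["findings"]]
    return Study(raw["study_id"], raw["image_path"], findings)

# Illustrative record, not actual PadChest-GR content.
example = json.dumps({
    "study_id": "12345",
    "image_path": "images/12345.png",
    "findings": [
        {"sentence_en": "Cardiomegaly is present.",
         "sentence_es": "Se observa cardiomegalia.",
         "is_positive": True,
         "boxes": [[0.32, 0.45, 0.78, 0.82]]},
        {"sentence_en": "No pleural effusion.",
         "sentence_es": "Sin derrame pleural.",
         "is_positive": False,
         "boxes": []},
    ],
})

study = load_study(example)
for f in study.findings:
    print(f.sentence_en, "->", f.boxes)
```

Keeping negative findings as box-free sentences preserves the full report while leaving spatial grounding to the findings that can actually be localized.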
The benchmark is designed to catalyze research in vision-language models for healthcare by offering the first public resource for training and evaluating models that generate fully grounded radiology reports in both English and Spanish. PadChest-GR underpins the latest generation of AI-powered systems such as MAIRA-2, a state-of-the-art model developed by Microsoft Research for grounded, interpretable report generation. The dataset’s creation leveraged GPT-4 via Microsoft Azure OpenAI Service for sentence extraction and translation, with expert radiologist oversight through the Centaur Labs platform ensuring high-quality, clinically relevant annotations.
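The general pattern behind that GPT-4 step looks like the sketch below, which uses the standard Azure OpenAI chat completions API. The prompt wording, deployment name, and environment variables are assumptions for illustration; the actual PadChest-GR pipeline and prompts are not reproduced here.

```python
import os
from openai import AzureOpenAI  # pip install openai

# Client configuration; endpoint and key come from your own Azure resource.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

PROMPT = (
    "Split the following Spanish radiology report into individual finding "
    "sentences and translate each one into English. Return one "
    "Spanish/English pair per line."
)

def extract_and_translate(report_es: str) -> str:
    """Ask the model to segment a report into sentences and translate them."""
    response = client.chat.completions.create(
        model="gpt-4",  # the name of your GPT-4 deployment (assumed here)
        temperature=0,  # deterministic output suits annotation pipelines
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": report_es},
        ],
    )
    return response.choices[0].message.content

print(extract_and_translate("Se observa cardiomegalia. Sin derrame pleural."))
```

In a workflow like PadChest-GR’s, model output of this kind would then be routed to expert radiologists for review rather than used directly.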
Collaboration was central to realizing PadChest-GR, combining the technical strengths of Microsoft Research with the clinical expertise of the University of Alicante and its partner institutions. A strict annotation protocol enforced consistency and accuracy, yielding a robust resource for advancing grounded radiology reporting models. PadChest-GR’s impact is already visible: it supports published research and is inspiring new models and evaluation standards. The collaborators emphasize the value of open scientific exchange, anticipating that broad access to PadChest-GR will accelerate progress in medical imaging AI and deliver meaningful improvements in patient care. The resource is now available to the global research community, poised to drive innovation in grounded vision-language applications for medical imaging.