Minireviews
Copyright © The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
Artif Intell Med Imaging. Jun 8, 2025; 6(1): 107069
Published online Jun 8, 2025. doi: 10.35711/aimi.v6.i1.107069
Artificial intelligence assisted ultrasound report generation
Jia-Hui Zeng, Kai-Kai Zhao, Ning-Bo Zhao
Jia-Hui Zeng, Kai-Kai Zhao, Department of Ultrasound, The Third People’s Hospital of Shenzhen, Shenzhen 518000, Guangdong Province, China
Ning-Bo Zhao, Department of Ultrasound, National Clinical Research Centre for Infectious Disease, The Third People’s Hospital of Shenzhen, The Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen 518116, Guangdong Province, China
Author contributions: Zeng JH and Zhao NB contributed to conceptualization of the study, literature review, drafting and critical revision of the manuscript; Zhao KK contributed to critical revisions of the manuscript and addressing reviewer comments. All authors have read and approved the final manuscript.
Conflict-of-interest statement: All the authors have no potential conflicts of interest to disclose.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Ning-Bo Zhao, Associate Chief Physician, Associate Professor, Department of Ultrasound, National Clinical Research Centre for Infectious Disease, The Third People’s Hospital of Shenzhen, The Second Affiliated Hospital of Southern University of Science and Technology, No. 29 Bulan Road, Longgang District, Shenzhen 518116, Guangdong Province, China. 971599910@qq.com
Received: March 16, 2025
Revised: April 14, 2025
Accepted: May 26, 2025
Published online: June 8, 2025
Processing time: 83 Days and 16.6 Hours
Abstract

Artificial intelligence (AI) assisted ultrasound report generation uses AI to convert the results of ultrasound image analysis into structured diagnostic reports. By integrating image recognition and natural language generation models, AI systems can automatically detect and analyze lesions or abnormalities in ultrasound images and generate text describing imaging findings, diagnostic conclusions (e.g., fatty liver, liver fibrosis, automated BI-RADS grading of breast lesions), and clinical recommendations, which together form a comprehensive report. This technology enhances the efficiency and accuracy of imaging diagnosis, reduces physicians’ workloads, ensures report standardization and consistency, and provides robust support for clinical decision-making. Current state-of-the-art algorithms for automated ultrasound report generation rely primarily on vision-language models, which harness the generalization capabilities of large language models and large vision models through multimodal (language + vision) feature alignment. However, existing approaches do not adequately address several challenges: generating numerical measurements, making effective use of report templates, incorporating historical reports, learning text-image correlations, and avoiding overfitting when data are limited. This paper reviews the current state of research on ultrasound report generation, discusses the existing issues, and offers some thoughts for future research.

Keywords: Artificial intelligence; Ultrasound report generation; Vision-language models; Natural language generation; Large language model

Core Tip: This article investigates artificial intelligence assisted ultrasound report generation using vision-language models, addressing challenges unique to ultrasound imaging, such as numerical measurement accuracy, multi-image correlation, and template integration. Unlike standardized radiological imaging, ultrasound is subject to variability from operator-dependent acquisition and image noise, which complicates automated analysis. The framework integrates Transformer-based optical character recognition for measurement extraction, pseudo-case synthesis for data augmentation, and cross-modal alignment to improve report precision. Innovations include leveraging historical reports, video data, and clinical expertise to enhance diagnostic outputs. Ethical protocols ensure data privacy, while template-driven workflows enhance clinical relevance. Future advancements focus on real-time reporting, personalized diagnostics, and multimodal models like GPT-4 vision. This article bridges artificial intelligence capabilities with clinical demands to standardize reports, reduce workloads, and support ultrasound-based clinical decision-making.