Copyright
©The Author(s) 2025.
Artif Intell Med Imaging. Jun 8, 2025; 6(1): 107069
Published online Jun 8, 2025. doi: 10.35711/aimi.v6.i1.107069
Table 1 Comparison of artificial intelligence models for ultrasound report generation
| Method | Architectural features | Clinical relevance |
|---|---|---|
| CNN-LSTM | Combines a CNN (image feature extraction) with an LSTM (sequential data processing) | Handles both image and sequence information well; applicable to ultrasound image analysis |
| Transformer-based models | Based on self-attention mechanisms, capable of capturing long-range dependencies, suitable for parallel processing | Excels in generating natural language reports, suitable for complex ultrasound report generation |
| VLMs | Integrates visual and linguistic information, capable of understanding image content and generating related text | Outstanding performance in multimodal learning, enhances the accuracy and clinical relevance of ultrasound reports |
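The Transformer row above credits self-attention with capturing long-range dependencies and allowing parallel processing. The minimal Python sketch below shows single-head scaled dot-product self-attention to illustrate why. Every position scores every other position in a single matrix product, and there is no step-by-step recurrence of the kind an LSTM uses. The code is a generic illustration that uses only the standard library. It is not the architecture of any model reviewed here, and the function names and toy matrices are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: n token vectors (n x d_model); w_q, w_k, w_v: d_model x d_k projections.
    Returns the n x d_k outputs and the n x n attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    # Each query is compared with every key at once, so distant tokens interact
    # directly, and each row is independent of the others (parallelizable).
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy usage: identity projections and three 2-dimensional "tokens".
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
```

Token 0 gives equal weight to itself and to token 2, because both share its first component. It gives less weight to token 1, which is orthogonal to it.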
Table 2 Key concepts in ultrasound report generation
| Concept | Description | Significance |
|---|---|---|
| AI-assisted ultrasound report generation | Technology using AI to convert ultrasound imaging into structured diagnostic reports | Enhances efficiency, accuracy, and consistency of diagnosis |
| VLMs | AI models that integrate visual (images) and linguistic (text) information | Enable understanding of image content and generation of descriptive text |
| Image encoder | A component of VLMs that encodes image information | Transforms images into a format that the model can process |
| Text encoder | A component of VLMs that encodes text information | Transforms text into a format that the model can process |
| Attention mechanism | A technique that allows the model to focus on specific parts of the input (image or text) | Improves the model's ability to focus on important image regions and text |
| LLMs | Transformer-based models pre-trained on large text corpora | Enhance the quality and fluency of generated text |
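Table 2 describes the image encoder as transforming images into a model-ready format, and the attention mechanism as letting the model focus on important image regions. One widely used way to do both, popularized by Vision Transformers, is to cut the image into fixed-size patches and let a text-side query attend over them. The sketch below makes several simplifying assumptions. It uses a tiny grayscale image. It uses raw pixel patches as keys, whereas a real encoder would apply learned projections. It also uses a hypothetical query vector. All names are illustrative.

```python
import math

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into non-overlapping
    patch x patch tiles, each flattened row-major into one vector.
    Assumes H and W are divisible by `patch`."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def cross_attention_weights(query, keys):
    """Softmax of scaled dot products: how strongly one text-side query
    attends to each image patch."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4 x 4 "image" with a bright lesion-like region in the lower-right corner.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
patches = patchify(image, 2)       # 4 patches of 4 pixels each
query = [1.0, 1.0, 1.0, 1.0]       # hypothetical "bright region" query
weights = cross_attention_weights(query, patches)
# The lower-right patch (index 3) receives most of the attention weight.
```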
Table 3 Challenges and proposed solutions in visual language model-based ultrasound report generation
| Challenge | Proposed solution |
|---|---|
| Poor accuracy of generated text describing measurement results | Extract numerical values from ultrasound images using tools such as TrOCR[15] and insert them into the report |
| Suboptimal handling of correspondence between text and images | Annotate the correspondence between text and images and design mechanisms to learn these relationships |
| Ineffective utilization of report templates | Use report templates as input, treat template prediction as an intermediate task, or have the model learn to modify templates |
| Insufficient training data volume | Split existing reports into text-image pairs and reassemble them to create pseudo-cases for training |
| Ineffective utilization of historical reports | Use historical reports along with current ultrasound images as input |
| Neglect of the image selection task | Explicitly model the image selection process to choose representative images for the report |
| Lack of utilization of ultrasound-related expertise | Fine-tune LLMs to learn this prior knowledge |
| Lack of exploration of predictive tasks | Conduct in-depth research on ultrasound examination scenarios to define effective predictive tasks |
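Table 3 proposes extracting numerical values with a text-recognition tool such as TrOCR and inserting them into the report, instead of letting the language model generate them. The minimal Python sketch below covers only the insertion step. It assumes the OCR stage has already produced a plain string such as "Length: 1.52 cm  Width: 0.84 cm". The regular expression, the `{Name}` placeholder syntax, and the function names are hypothetical choices made for illustration.

```python
import re

# Hypothetical pattern for "label value unit" text read from a measurement overlay.
MEASUREMENT_PATTERN = re.compile(
    r"(?P<label>[A-Za-z][\w ]*?)\s*[:=]?\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm)\b"
)

def parse_measurements(ocr_text):
    """Return {label: (value, unit)} for each measurement found in OCR text."""
    found = {}
    for m in MEASUREMENT_PATTERN.finditer(ocr_text):
        found[m.group("label").strip()] = (float(m.group("value")), m.group("unit"))
    return found

def fill_template(template, measurements):
    """Replace {Name} placeholders with measured values. A placeholder with no
    matching measurement is flagged for review rather than guessed."""
    def substitute(match):
        name = match.group(1)
        if name in measurements:
            value, unit = measurements[name]
            return f"{value:g} {unit}"
        return f"[MISSING: {name}]"
    return re.sub(r"\{(\w+)\}", substitute, template)

ocr_text = "Length: 1.52 cm  Width: 0.84 cm"
template = "Nodule measures {Length} x {Width}; depth {Depth}."
report = fill_template(template, parse_measurements(ocr_text))
# report == "Nodule measures 1.52 cm x 0.84 cm; depth [MISSING: Depth]."
```

In this sketch, unmatched placeholders are left as explicit markers rather than passed to the language model to complete. That way every number in the report can be traced back to the image.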
- Citation: Zeng JH, Zhao KK, Zhao NB. Artificial intelligence assisted ultrasound report generation. Artif Intell Med Imaging 2025; 6(1): 107069
- URL: https://www.wjgnet.com/2644-3260/full/v6/i1/107069.htm
- DOI: https://dx.doi.org/10.35711/aimi.v6.i1.107069
