Kwolek K, Gądek A, Kwolek K, Lechowska-Liszka A, Malczak M, Liszka H. Artificial intelligence-based diagnosis of hallux valgus interphalangeus using anteroposterior foot radiographs. World J Orthop 2025; 16(6): 103832 [DOI: 10.5312/wjo.v16.i6.103832]
Corresponding Author of This Article
Henryk Liszka, MD, PhD, Professor, Department of Orthopedics and Physiotherapy, Jagiellonian University Collegium Medicum, Macieja Jakubowskiego 2, Kraków 30-688, Małopolska, Poland. liszkah@gmail.com
Research Domain of This Article
Orthopedics
Article-Type of This Article
Observational Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Kwolek K and Kwolek K elaborated analytic tools; Kwolek K, Kwolek K, Malczak M, and Liszka H wrote the paper; Kwolek K, Kwolek K, and Liszka H designed research and performed research; Kwolek K, Gądek A, Kwolek K, Lechowska-Liszka A, Malczak M, and Liszka H analyzed data; all of the authors read and approved the final version of the manuscript to be published.
Institutional review board statement: This study protocol was reviewed and approved by authors’ institution.
Informed consent statement: The informed consent statement has been provided.
Conflict-of-interest statement: The authors have no conflict of interest concerning the materials or methods used in this study or the findings specified in this article.
STROBE statement: The authors have read the STROBE Statement—checklist of items, and the manuscript was prepared and revised according to the STROBE Statement—checklist of items.
Data sharing statement: No additional data are available.
Received: December 2, 2024 Revised: March 13, 2025 Accepted: May 13, 2025 Published online: June 18, 2025 Processing time: 198 Days and 11.7 Hours
Abstract
BACKGROUND
A recently developed method enables automated measurement of the hallux valgus angle (HVA) and the first intermetatarsal angle (IMA) from weight-bearing foot radiographs. This approach employs bone segmentation to identify anatomical landmarks and provides standardized angle measurements based on established guidelines. While effective for HVA and IMA, preoperative radiograph analysis remains complex and requires additional measurements, such as the hallux interphalangeal angle (IPA), which has received limited research attention.
AIM
To expand the previous method, which measured HVA and IMA, by incorporating the automatic measurement of IPA, evaluating its accuracy and clinical relevance.
METHODS
A preexisting database of manually labeled foot radiographs was used to train a U-Net neural network for segmenting bones and identifying landmarks necessary for IPA measurement. Of the 265 radiographs in the dataset, 161 were selected for training and 20 for validation. The U-Net neural network achieved a high mean Sørensen-Dice index (> 0.97). The remaining 84 radiographs were used to assess the reliability of automated IPA measurements against those taken manually by two orthopedic surgeons (OA and OB) using computer-based tools. Each measurement was repeated to assess intraobserver (OA1 and OA2) and interobserver (OA2 and OB) reliability. Agreement between automated and manual methods was evaluated using the Intraclass Correlation Coefficient (ICC), and Bland-Altman analysis identified systematic differences. Standard error of measurement (SEM) and Pearson correlation coefficients quantified precision and linearity, and measurement times were recorded to evaluate efficiency.
RESULTS
The artificial intelligence (AI)-based system demonstrated good-to-excellent reliability, with ICC(3,1) values of 0.92 (AI vs OA2) and 0.88 (AI vs OB), both statistically significant (P < 0.001). For manual measurements, ICC values were 0.95 (OA2 vs OA1) and 0.95 (OA2 vs OB), supporting both intraobserver and interobserver reliability. Bland-Altman analysis revealed minimal biases of: (1) 1.61° (AI vs OA2); and (2) 2.54° (AI vs OB), with clinically acceptable limits of agreement. The AI system also showed high precision, as evidenced by low SEM values: (1) 1.22° (OA2 vs OB); (2) 1.77° (AI vs OA2); and (3) 2.09° (AI vs OB). Furthermore, Pearson correlation coefficients confirmed strong linear relationships between automated and manual measurements, with r = 0.85 (AI vs OA2) and r = 0.90 (AI vs OB). The AI method significantly improved efficiency, completing all 84 measurements 8 times faster than manual methods, reducing the time required from an average of 36 minutes to just 4.5 minutes.
CONCLUSION
The proposed AI-assisted IPA measurement method shows strong clinical potential, effectively corresponding with manual measurements. Integrating IPA with HVA and IMA assessments provides a comprehensive tool for automated forefoot deformity analysis, supporting hallux valgus severity classification and preoperative planning, while offering substantial time savings in high-volume clinical settings.
Core Tip: This study presents an automated method for evaluating the hallux interphalangeal angle using high-resolution, weight-bearing anteroposterior foot radiographs. Reference points are identified based on defined criteria applied to the automatically segmented bones of the hallux. Despite the anatomical complexity of the distal phalanx, the proposed technique reliably calculates the interphalangeal angle. Experimental findings show high consistency between the algorithm's measurements and those performed by clinicians.
Hallux valgus (HV) is a common, progressive, and complex deformity of the forefoot, characterized by lateral deviation of the big toe combined with medial displacement and pronation of the first metatarsal bone[1,2]. It often leads to pain, discomfort, and difficulty wearing shoes, and occurs more frequently in women. Orthopedic surgeons often use radiographic angles to assess the severity of HV in symptomatic patients[3-5]. Traditional manual and computer-assisted measurement methods utilize weight-bearing anteroposterior radiographs[6-9] to assess the severity of HV. This assessment is based on key parameters, including the HV angle (HVA), first intermetatarsal angle (IMA), and the interphalangeal angle (IPA) for evaluating the severity of HV interphalangeus (HVI) (Figure 1)[10]. Surgical decisions are guided by the clinical presentation and the degree of the deformity (Figure 2)[11]. Various radiographic measurements used in HV treatment have been widely discussed[6,12]. Lee et al[13] suggest that HVA, first IMA, IPA, first metatarsal protrusion distance, and sesamoid rotation angle offer the best assessment of the three-dimensional (3D) severity of HV.
Figure 2 Operative treatment algorithm of hallux valgus proposed by the European Federation of National Associations of Orthopaedics and Traumatology.
HVA: Hallux valgus angle; HVI: Hallux valgus interphalangeus; IMA: Intermetatarsal angle; MTP: Metatarsophalangeal; TMT: Tarsometatarsal.
HVI is the lateral deviation of the distal phalanx (DP) in the great toe. Its etiology is multifactorial, including growth development disturbances, external pressure, and biomechanical alteration involving the interphalangeal joint. HVI is a prevalent condition that significantly contributes to the overall valgus deformity of the hallux, emphasizing the need to incorporate HVI into treatment strategies. Effective treatment algorithms should consider the total valgus deformity of the hallux. They should not focus solely on separate assessments of HV and HVI deformities[14].
HVI can be assessed using several key angles (Figure 1), including the IPA, also known as the hallux IPA (HIA or HIPA); the proximal to distal phalangeal articular angle (PDPAA)[15]; and the proximal phalangeal articular angle (PPAA)[2,16], also referred to as the distal articular set angle (DASA)[17]. Additionally, the Delta proximal phalanx (PP), which measures the difference in medial and lateral PP wall length, is utilized in evaluating this deformity[18].
The IPA is particularly useful for identifying inherent deformities in the head of the PP or the base of the DP. It is measured as the angle between the longitudinal axes of the proximal and distal phalanges of the hallux, with a normal value being less than 10°. An IPA[2] greater than 10° indicates an HVI deformity[4,5,13,14,19]. The PPAA (or DASA) is the angle formed between the proximal articular surface of the PP and a line perpendicular to the midline of the PP. A value less than 6° is considered normal, while values greater than 6°, 10°, and 20° represent mild, moderate, and severe deformities, respectively[17,20,21]. Compared with PPAA, both IPA and PDPAA demonstrate significantly higher interobserver reliability during both pre-operative and post-operative assessments[16]. The reliability for measuring IPA has been reported at 81%-86% for interobserver agreement and 86%-88% for intraobserver agreement[13,16,22]. Among these angles, IPA is the most commonly utilized.
HVI deformity can put pressure on the second toe, which may lead to further pain and deformity. Hallux limitus is associated with a higher IPA compared to normal hallux[23]. Coughlin and Shurnas[24] reported an average IPA of 18° in hallux rigidus, hypothesizing that resistance at the metatarsophalangeal (MTP) joint predisposes patients to an increased IPA. Studies have shown a correlation between an IPA of 14.5° or greater and the development of an ingrown hallux nail[25]. Patients with a dislocation of the second MTP joint associated with HV exhibit a greater IPA and an increased inclination of the second metatarsal compared to those with HV alone[26]. It is essential to assess HVI deformity preoperatively to consider surgical correction[27-29] with a medial closing wedge osteotomy, with or without fixation, as originally described by Akin in 1925[30]. This surgery is typically indicated for an HVI deformity characterized by an enlarged IPA (> 10°). Correction of an HVI deformity is typically performed as an additional procedure during HV surgery (Figure 2).
Traditionally, all of the aforementioned angles were measured manually on hard-copy radiographs and later with semi-automated (computer-assisted) methods[3,31,32]. With advancements in technology, however, several methods have been developed for fully automatic determination of measurements assessing bone and joint deformations in the limbs, including the patellar index[33], as well as HVA and IMA[34]. Nevertheless, research on foot deformities remains insufficient, leading to a need for new automated methods to assess the remaining aspects of deformations, including HVI, MTP joint congruency, metatarsus adductus, first tarsometatarsal (TMT) instability, and first ray pronation (shape of the first metatarsal head)[35], as well as the presence of joint osteoarthritis (e.g., MTP, TMT), bunionette deformity, pes planus, and various other factors.
Kwolek et al[36] proposed a clinically applicable approach for the automatic estimation of HVA and IMA, achieving high reliability. In this research, we designed methods to enhance the analysis of HV using foot X-rays[34]. The relevant bones, including the proximal and distal phalanges of the hallux, were segmented and labeled with a U-Net network, enabling automatic calculation of IPA. Orthopedic surgeons performed manual IPA measurements using computer software and then compared their measurements with each other and with those generated by our method to validate its accuracy. Additionally, our method was tested solely on radiographs of patients who later underwent surgery for HV. Our dataset includes a significant number of radiographs with severe forefoot deformities, such as toe overlapping and severe pronation. In cases of overlapping toes, our bone segmentation-based method proved to be reliable and capable of handling these challenging situations. The U-Net neural network was trained to segment bones, followed by image analysis for fully automated angle calculation and surgical recommendations (Akin osteotomy when IPA > 10°). All digital radiographs were stored and assessed in a picture archiving and communication system (IMPAX software).
This study introduces an innovative automated IPA measurement approach that enhances existing radiographic tools, offering a reliable and clinically applicable method for comprehensive assessment and planning in HV correction surgeries.
MATERIALS AND METHODS
Algorithm outline
The IPA measurements were performed automatically on segmented and labeled bones by a U-Net neural network (Figure 3)[37]. The U-Net was first trained using anteroposterior foot radiographs and corresponding images with manually segmented and labeled bones. After segmentation, bone axes and reference points were automatically determined, and the IPA was calculated. The U-Net model was trained exclusively on right foot radiographs to streamline manual segmentation and reduce associated costs. Left foot images were mirrored to create corresponding right foot representations before processing by the U-Net. Afterward, the output images were mirrored back to the original orientation for further measurements.
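The mirroring step described above can be illustrated with a minimal sketch. It assumes radiographs and label masks are held as NumPy arrays; the function names `to_right_foot` and `restore_orientation` are hypothetical and do not come from the study.

```python
import numpy as np


def to_right_foot(image: np.ndarray, is_left: bool) -> np.ndarray:
    """Mirror a left-foot image horizontally; right-foot images pass through."""
    return np.fliplr(image) if is_left else image


def restore_orientation(mask: np.ndarray, is_left: bool) -> np.ndarray:
    """Mirror the segmentation output back so it matches the original image."""
    return np.fliplr(mask) if is_left else mask


# Toy 2 x 3 array standing in for a left-foot radiograph (assumed data).
img = np.array([[1, 2, 3],
                [4, 5, 6]])
mirrored = to_right_foot(img, is_left=True)
restored = restore_orientation(mirrored, is_left=True)
```

Because a horizontal flip is its own inverse, applying it before and after segmentation leaves all subsequent measurements in the original image coordinates.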
Figure 3 Data flow in the proposed approach.
Stage-A: Bones are manually segmented and labeled from anonymized input radiographs to prepare the dataset for multi-class segmentation using a U-Net neural network; Stage-B: Radiographs are randomly divided into three subsets: (1) Training; (2) Validation; and (3) Testing; Stage-C: During each training cycle, the accuracy of bone segmentation by the U-Net is validated on a fixed validation subset of 20 radiographs. The U-Net is initially trained on a subset of 50 radiographs, which is incrementally increased by 10 per cycle until the average Sørensen-Dice index (SDI) exceeds 0.97 on the validation set; Stage-D: Once the U-Net achieves an SDI > 0.97 on the testing subset, the training process is considered complete. If the SDI does not exceed 0.97, the training subset is expanded, and the network is retrained; Stage-E: The final trained U-Net is used to segment and label bones on all testing radiographs; Stage-F: These segmented bones are then utilized to automatically determine reference points and calculate interphalangeal angle measurements.
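The incremental schedule in Stage-C (start with 50 radiographs, add 10 per cycle until the mean SDI exceeds 0.97) can be sketched as a simple loop. This is only an illustration: `grow_training_set` and `fake_train_and_validate` are hypothetical names, and the stand-in scoring function replaces real U-Net training, which is not reproduced here.

```python
from typing import Callable, Tuple


def grow_training_set(
    train_and_validate: Callable[[int], float],
    n_available: int,
    start: int = 50,
    step: int = 10,
    target_sdi: float = 0.97,
) -> Tuple[int, float]:
    """Enlarge the training subset until the mean validation SDI exceeds the target.

    Returns the subset size that was used and the SDI reached with it.
    Stops early if all available radiographs are already in use.
    """
    n = min(start, n_available)
    while True:
        sdi = train_and_validate(n)
        if sdi > target_sdi or n >= n_available:
            return n, sdi
        n = min(n + step, n_available)


def fake_train_and_validate(n_train: int) -> float:
    """Assumed stand-in for 'train the U-Net on n_train images, return mean SDI'."""
    return min(0.99, 0.90 + 0.0006 * n_train)


n_used, final_sdi = grow_training_set(fake_train_and_validate, n_available=161)
```

With the stand-in function the loop stops at 120 images; in the study the actual stopping point was determined by the measured SDI of the trained network.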
Dataset
This study included 133 randomly selected patients with a total of 265 anteroposterior foot radiographs taken prior to HV surgery (either unilateral or bilateral) between 2014 and 2021, sourced from the authors' institution’s electronic database (demographics in Table 1). Patients were excluded if they had: (1) Prior osteotomies; (2) Severe osteoarthritis; (3) Complex rheumatoid forefoot deformities; or (4) Artificial elements distorting the bone image.
The radiographs were obtained using Eidos RF439 and Luminos dRF equipment and were transmitted electronically for further analysis. The data were randomly divided into three subsets: training, validation, and testing (Figure 3, stage-B). Of the 265 radiographs in the dataset, 161 were selected for training, 20 for validation, and the remaining 84 for testing. The U-Net network for bone segmentation was developed (trained) and validated using the training and validation subsets, while the testing set was utilized to assess its performance and automatically calculate the IPA.
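A random 161/20/84 partition of this kind could be produced as follows. This is a minimal sketch: the function name `split_ids`, the integer IDs, and the fixed seed are assumptions, and the text does not state whether the split was made per radiograph or per patient (the sketch splits per radiograph).

```python
import random


def split_ids(ids, n_train=161, n_val=20, seed=0):
    """Shuffle radiograph IDs and cut them into training, validation, and testing lists."""
    shuffled = list(ids)
    if n_train + n_val > len(shuffled):
        raise ValueError("not enough radiographs for the requested split")
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # everything that remains
    return train, val, test


train_ids, val_ids, test_ids = split_ids(range(265))
```

The three lists are disjoint by construction, so no radiograph used for training or validation reappears in the testing subset.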
Anonymization and manual labeling
The input radiographs were anonymized (Figure 3, stage-A) and saved in Portable Network Graphics format to enable lossless compression. Each radiograph was assigned a unique, random ID. To ensure high-precision bone segmentation, bones were manually annotated on the original high-resolution radiographs for training the U-Net network. The first author manually segmented and labeled the radiographs using Adobe Photoshop. To optimize measurement resolution and the number of labeled radiographs, we used a network operating on radiographs with a size of 1024 × 768 pixels, as in the previous study[36]. To ensure that the U-Net focuses only on important information in images (region of interest), unnecessary bones (such as the tibia, fibula, and hindfoot) were removed during image preprocessing. We identified the main bones on each foot radiograph through multi-class segmentation as described previously (Figure 3, stage-A). This enabled us to accurately separate the necessary bones for analysis, specifically the hallucial PP and DP, which are essential for determining IPA. Images were padded with black pixels to maintain the 1024 × 768 aspect ratio without altering resolution. This dataset is available upon request from the first author.
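The black-pixel padding can be sketched as below. The sketch assumes NumPy arrays and treats 1024 as the image height and 768 as the width, which the text does not specify; `pad_to_aspect` is a hypothetical helper, not the authors' preprocessing code.

```python
import numpy as np


def pad_to_aspect(image: np.ndarray, target_h: int = 1024, target_w: int = 768) -> np.ndarray:
    """Pad with zeros (black) so that height:width matches target_h:target_w.

    Pixels are never resampled, so the original resolution is preserved.
    """
    h, w = image.shape[:2]
    if h * target_w > w * target_h:
        # Image is too tall for the target ratio: widen it.
        new_h, new_w = h, -(-h * target_w // target_h)  # ceiling division
    else:
        # Image is too wide (or already matches): heighten it.
        new_h, new_w = -(-w * target_h // target_w), w
    pad_h, pad_w = new_h - h, new_w - w
    top, left = pad_h // 2, pad_w // 2
    pad_spec = [(top, pad_h - top), (left, pad_w - left)]
    pad_spec += [(0, 0)] * (image.ndim - 2)  # leave any channel axis untouched
    return np.pad(image, pad_spec, mode="constant", constant_values=0)


# Assumed example: a 2000 x 1000 image filled with ones.
example = np.ones((2000, 1000), dtype=np.uint8)
padded = pad_to_aspect(example)
```

Padding rather than stretching keeps bone proportions intact, which matters when angles are later measured on the segmented shapes.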
U-Net network training and validation
The radiographs were processed as previously outlined to train a U-Net convolutional neural network for bone segmentation[37]. In contrast to the original architecture introduced by Ronneberger et al[37], our model is symmetric, meaning that the input and output image dimensions are identical. It supports multi-class segmentation and employs the Dice loss during training and the Sørensen-Dice index (SDI) for performance evaluation. The SDI is a standard metric in the field of medical image segmentation[38,39]. We defined a target SDI threshold with a mean value above 0.97 and a minimum above 0.92. Once this criterion was met, training was considered complete. The model’s generalization ability was then confirmed by testing it on a separate subset, assessing whether it maintained segmentation performance on previously unseen data. In our study, we maintained the same number of images as in previous work while accounting for an additional class (the DP), and we achieved the required SDI of 0.97 on the validation set (Figure 3, stage-C). The model was trained for 60 epochs using the Adam optimizer with Dice loss. Data augmentation during training included mirroring, rotation, and contrast enhancement. Neural network training was conducted on a notebook GPU (RTX 2060), while testing was carried out on a Google Colab GPU (V100). In post-processing, small holes in segmented bones were filled using morphological operations, and minor artifacts such as small blobs were removed.
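For reference, the SDI of two masks A and B is 2|A∩B| / (|A| + |B|). The following sketch computes it per bone label on integer label maps and checks the stopping criterion. It assumes label 0 is background and that the mean and minimum are taken over a list of scores; the text does not say whether these are aggregated per image or per class, and the function names are hypothetical.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, truth: np.ndarray, n_classes: int) -> dict:
    """Return {label: SDI} for labels 1..n_classes-1 (label 0 = background)."""
    scores = {}
    for label in range(1, n_classes):
        a = pred == label
        b = truth == label
        denom = a.sum() + b.sum()
        if denom == 0:
            continue  # label absent from both masks: SDI undefined, skip it
        scores[label] = 2.0 * np.logical_and(a, b).sum() / denom
    return scores


def meets_target(sdi_values, mean_min: float = 0.97, worst_min: float = 0.92) -> bool:
    """Check the criterion: mean SDI above 0.97 and minimum SDI above 0.92."""
    return bool(np.mean(sdi_values) > mean_min and np.min(sdi_values) > worst_min)


# Toy 3 x 3 label maps (assumed data): one pixel of label 2 is mislabeled as 1.
truth = np.array([[0, 1, 1], [0, 2, 2], [0, 2, 2]])
pred = np.array([[0, 1, 1], [0, 1, 2], [0, 2, 2]])
scores = dice_per_class(pred, truth, n_classes=3)
```

In the toy example label 1 scores 4/5 and label 2 scores 6/7, showing how a single mislabeled pixel penalizes both the class that gained it and the class that lost it.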
Automatic determination of reference points and IPA measurements
The segmented and labeled bones (PP and DP) were used to automatically determine the bone axes and reference points and to measure the IPA (Figure 3, stage-F, and Figure 4). The expert who trained the network did not participate in the manual IPA measurements and was blinded to the results prior to statistical analysis.
According to the American Orthopaedic Foot and Ankle Society, the reference points on the hallucial PP are metaphyseal/diaphyseal[3,14,40,41]. The distal DP reference point is also metaphyseal/diaphyseal, as suggested by Strydom et al[14]. Due to the complex anatomy of the proximal part of the DP, the proximal reference point used to define the guideline for automated bone axis measurement was located in the diaphyseal region (Figure 4). The final bone partition coefficients that best defined the metaphyseal and diaphyseal portions and yielded appropriate reference points were selected by testing different combinations. For the PP, reference points were positioned at 25% of the bone length distally from the proximal articular surface and at 25% proximally from the distal articular surface. In the case of the hallucial DP, the corresponding points were placed at 40% (diaphyseal region) and 25% of the bone length, respectively (Figure 4).
Figure 4 Automated determination of proximal and distal phalanx bone axes.
A: Input X-ray and post-processed bone segmentation results; B: The bones are approximated using ellipses, from which the ellipse axes are extracted. The axes within the bones are indicated by pink lines, with their proportions determined experimentally: (1) 0.25: 0.25 for the proximal phalanx; and (2) 0.40: 0.25 for the distal phalanx; C: The central points of the ellipse subaxes are represented by light blue lines; D: Lines perpendicular to the ellipse subaxes are drawn at their endpoints and illustrated in purple; E: The midpoints of these purple lines are marked with light blue dots, serving as key points for defining the bone axis, which is depicted as blue and red lines; F: The final axes, shown as yellow lines, for interphalangeal angle estimation are computed as a weighted average of the ellipse axis and the bone axis. IPA: Interphalangeal angle.
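The final step, turning two bone axes into an angle, can be illustrated with a simplified sketch. It is not the authors' ellipse-based, weighted-average procedure: it assumes each bone is already summarized by a proximal and a distal midline end point (hypothetical pixel coordinates), places reference points at the 0.25:0.25 and 0.40:0.25 fractions, and measures the angle between the two resulting axes. The 10° threshold for an Akin osteotomy is taken from the text.

```python
import math


def point_along(p_start, p_end, fraction):
    """Point located `fraction` of the way from p_start to p_end."""
    return (p_start[0] + fraction * (p_end[0] - p_start[0]),
            p_start[1] + fraction * (p_end[1] - p_start[1]))


def axis_angle_deg(a1, a2, b1, b2):
    """Unsigned angle in degrees between direction a1->a2 and direction b1->b2."""
    ax, ay = a2[0] - a1[0], a2[1] - a1[1]
    bx, by = b2[0] - b1[0], b2[1] - b1[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.degrees(math.atan2(cross, dot)))


# Hypothetical midline end points (x, y) in pixels, proximal end first.
pp_prox, pp_dist = (100.0, 400.0), (100.0, 200.0)   # proximal phalanx
dp_prox, dp_dist = (100.0, 190.0), (130.0, 90.0)    # distal phalanx

# Reference points: 25%/25% for the PP, 40%/25% for the DP.
pp_ref = (point_along(pp_prox, pp_dist, 0.25), point_along(pp_dist, pp_prox, 0.25))
dp_ref = (point_along(dp_prox, dp_dist, 0.40), point_along(dp_dist, dp_prox, 0.25))

ipa = axis_angle_deg(pp_ref[0], pp_ref[1], dp_ref[0], dp_ref[1])
akin_indicated = ipa > 10.0  # threshold stated in the text
```

In this toy case the midlines are straight, so the fractions do not change the angle; on real bones the reference points are cross-section midpoints at those levels, which is why the choice of fractions matters.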
Statistical analysis
We evaluated the reliability and agreement of the artificial intelligence (AI)-based IPA measurement system compared to manual measurements using comprehensive statistical methods. The Intraclass Correlation Coefficient (ICC) for absolute agreement was calculated using a two-way mixed-effects model, appropriate for fixed raters with single measurements[42]. This assessed the reliability between AI measurements and those by orthopedic surgeons.
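As an illustration of the two ICC forms used in this study, the sketch below computes single-measure consistency and absolute-agreement ICCs from two-way ANOVA mean squares, following the standard McGraw and Wong definitions. It is a NumPy sketch, not the SPSS procedure used by the authors; the function name `icc_single` and the demo ratings are assumptions.

```python
import numpy as np


def icc_single(ratings):
    """Single-measure ICCs for an (n subjects x k raters) table.

    Returns (consistency, absolute_agreement).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement


# Assumed demo: the second rater reads exactly 2 degrees higher than the first.
demo = [[5, 7], [10, 12], [15, 17], [20, 22]]
icc_c, icc_a = icc_single(demo)
```

The demo shows why the distinction matters: a constant offset between raters leaves the consistency ICC at 1 but lowers the absolute-agreement ICC, which is the stricter form for comparing AI and manual readings.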
To determine intraobserver reliability, we calculated the consistency ICC between repeated measurements by the same observer (OA1 and OA2). Interobserver reliability was assessed by comparing measurements between the two orthopedic surgeons (OA2 and OB). Bland-Altman analysis identified systematic differences, calculating the mean difference (bias) and limits of agreement between methods. Standard error of measurement (SEM) quantified measurement precision, while the SD provided context on measurement variability. We used Pearson correlation coefficients to evaluate the strength and linearity of relationships between AI and manual measurements. The mean absolute error quantified the average discrepancy between methods.
Manual IPA measurements were obtained by two independent orthopedic surgeons using computer-assisted Radiant/Carestream software with standardized reference points based on established guidelines. Measurements were performed by one orthopedic surgeon (OA) with 8 years of experience, who repeated the measurements after a two-month interval in a blinded manner (OA1 and OA2), and by a second orthopedic surgeon (OB) with 15 years of experience. Observers were blinded to patient clinical outcomes and each other's measurements to ensure unbiased assessments.
The AI system, which uses the U-Net neural network for bone segmentation, was run on a Google Colab GPU (V100). It not only measured the IPA but also identified cases indicating an Akin osteotomy (IPA > 10°). All statistical analyses were conducted using Statistical Package for the Social Sciences software, with results verified for accuracy. Statistical significance was set at P < 0.001.
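The Bland-Altman quantities can be sketched as follows. The sketch assumes the conventional 95% limits of agreement, bias ± 1.96 × SD of the paired differences; the multiplier is not stated in the text, and the function names and paired readings are hypothetical.

```python
import numpy as np


def bland_altman(method_a, method_b):
    """Return (bias, sd_of_differences, (lower_limit, upper_limit)) for paired readings."""
    diff = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation
    return bias, sd, limits_from_summary(bias, sd)


def limits_from_summary(bias: float, sd: float):
    """95% limits of agreement from a reported bias and SD of differences."""
    return bias - 1.96 * sd, bias + 1.96 * sd


# Assumed paired angle readings in degrees (not study data).
bias, sd, limits = bland_altman([12, 15, 9, 20], [10, 14, 10, 17])

# Applying the same formula to the reported AI vs OB summary values.
lower, upper = limits_from_summary(2.54, 3.06)
```

Applied to the reported AI vs OB bias (2.54°) and SD (3.06°), the formula gives roughly -3.46° to 8.54°, which agrees with the tabulated limits up to rounding of the inputs.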
RESULTS
We applied a fully automated deep learning-based method to measure IPA in 84 radiographs. The algorithm performed precise segmentation and assignment of reference points on hallucial PP and DP bones. The automated method achieved absolute agreement ICCs of 0.92 (AI vs OA2) and 0.88 (AI vs OB) (Table 2). For manual measurements, both intraobserver reliability (OA2 vs OA1) and interobserver reliability (OA2 vs OB) were high, with ICCs of 0.95 and 0.95, respectively, indicating consistent and reproducible assessments by orthopedic surgeons. Bland-Altman analysis showed minimal mean differences, with biases of 0.93°, 1.61°, and 2.54° for comparisons between OA2 and OB, AI and OA2, and AI and OB, respectively (Figure 5). The limits of agreement for these comparisons were within clinically acceptable ranges. Low SEM values indicated high precision for both AI and manual measurements. Pearson correlation coefficients revealed strong linear relationships between AI and manual measurements (P < 0.001), as illustrated in Figure 6.
Figure 5 Bland-Altman plots.
A: Agreement between the manual measurements of the two orthopedists (OA2 vs OB); B: Comparison between the artificial intelligence (AI)-generated measurements and the manual measurement of the first orthopedist (AI vs OA2); C: Comparison between the AI-generated measurements and the manual measurement of the second orthopedist (AI vs OB). Each plot indicates the mean difference (bias) and limits of agreement, helping to assess systematic differences and consistency between the compared measurements. AI: Artificial intelligence; LoA: Limits of agreement; O: Orthopedic surgeons.
Figure 6 Scatter Plots with regression lines showing correlation.
A: Between the two manual measurements by orthopedic surgeons (OA2 vs OB); B: Between the artificial intelligence (AI)-generated measurements and the manual measurement of the first orthopedist (AI vs OA2); C: Between the AI-generated measurements and the manual measurement of the second orthopedist (AI vs OB). The linear regression line indicates the relationship strength, where points closer to the line reflect stronger alignment between the compared measurements. AI: Artificial intelligence.
Table 2 Statistical results for interphalangeal angle measurements.
| Statistical measure | AI vs OB | AI vs OA2 | OA2 vs OB |
| --- | --- | --- | --- |
| ICC-absolute agreement | 0.88 (95%CI: 0.81-0.93) | 0.92 (95%CI: 0.86-0.95) | 0.95 (95%CI: 0.91-0.96) |
| Interpretation | Good reliability | Excellent reliability | Excellent reliability |
| ICC-consistency | N/A | N/A | 0.95 |
| Interpretation | N/A | N/A | Excellent reliability |
| Bland-Altman mean difference (bias) | 2.54° | 1.61° | 0.93° |
| SD of differences | 3.06° | 3.35° | 2.26° |
| Limits of agreement | -3.46° to 8.53° | -4.97° to 8.18° | -3.51° to 5.37° |
| Interpretation | Minimal systematic bias | Minimal systematic bias | Minimal systematic bias |
| Standard error of measurement | 2.09° | 1.77° | 1.22° |
| Mean absolute error | 3.23° | 2.79° | 1.89° |
| Interpretation | Low average difference, high precision | Low average difference, high precision | Low average difference, high precision |
| Pearson correlation coefficient (r) | 0.90 | 0.85 | 0.90 |
| P value | < 0.001 | < 0.001 | < 0.001 |
| Interpretation | Very strong positive correlation | Very strong positive correlation | Very strong positive correlation |
In terms of time efficiency, the AI system measured all 84 angles in about 4.5 minutes, whereas manual measurements required approximately 36 minutes. Detailed statistical values and comparisons are provided in Table 2.
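The reported timings translate into a speed-up factor, a relative time reduction, and a per-radiograph time as sketched below. The variable names are illustrative; the inputs are the approximate totals given in the text.

```python
manual_minutes = 36.0   # approximate total for 84 manual measurements
ai_minutes = 4.5        # approximate total for 84 automated measurements
n_radiographs = 84

speedup = manual_minutes / ai_minutes                                    # 8.0
reduction_pct = 100.0 * (manual_minutes - ai_minutes) / manual_minutes   # 87.5
manual_sec_each = 60.0 * manual_minutes / n_radiographs                  # about 25.7 s
ai_sec_each = 60.0 * ai_minutes / n_radiographs                          # about 3.2 s
```

Since both totals are approximate, these derived figures should be read as rough estimates rather than precise benchmarks.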
DISCUSSION
Radiographs continue to be the standard for evaluating symptomatic forefoot deformities due to their low cost and wide availability[43,44]. This work aligns with emerging research on the application of deep learning in orthopedics[45-49], and it extends our previous algorithm[34,36]. As shown experimentally, the proposed approach estimates the IPA from high-resolution radiographs and identifies the presence of HVI deformity, suggesting the need for surgical interventions such as an Akin osteotomy. Furthermore, this work may contribute to new advances in forefoot surgery by providing a reliable method to compare preoperative measurements in large databases, eliminating intraobserver errors, and may ultimately lead to improved treatment standards for foot deformities such as HV.
The prevalence of HVI deformity in patients with HV has been reported at 62.1% by Strydom et al[14] and van Deventer et al[50], who also found that IPA contributed 37.9% to the overall HV deformity. Studies[14,50] confirmed the inverse relationship between IPA and HVA, indicating that IPA does not necessarily increase with HVA. Severe HV cases often present with an abnormal mean IPA of 16.6°, suggesting that the extent of HVI deformity may be restricted by joint structure, except in cases with joint hyperlaxity. Stability of the first MTP joint—determined by the metatarsal head shape, ligaments, muscles, joint capsule, and sesamoid bones—may influence the development of HVI. Although the precise mechanism remains unclear, it is believed that the eccentric pull of the flexor hallucis longus tendon may play a role. Given the high prevalence of HVI, particularly in cases without severe HV, identifying and surgically correcting HVI is essential, as untreated HVI may contribute to the progression or recurrence of HV[51,52]. This underscores the need for active identification and surgical correction of HVI, rather than treating it as an optional aspect of treatment.
The main limitation of this study is that it focuses solely on the angular deformities of HVI in the transverse plane, as assessed through plain dorso-plantar weight-bearing radiographs. Given the 3D complexity of both HV and HVI, and the significant impact of pronation on HVI, caution is needed when interpreting these angles. Various angles and indicators are used to assess HVI. IPA is commonly used in clinical practice and research papers; however, a consensus on the optimal radiographic evaluation of this deformity has not yet been reached. In the literature, there is no single, clearly distinguishing angle/index that could be used in the assessment of HVI that would be free from significant disadvantages[16]. The presence of a pronation deformity of the great toe in HV cases may complicate the preoperative diagnosis of HVI pathology. One proposed solution is additional intraoperative verification of HVI pathology after metatarsal correction in moderate to severe HV cases[53].
Collecting a large, validated dataset of HV patient radiographs presents challenges, so we relied on segmented bone measurements in this study. Our findings demonstrate that this approach improves IPA measurement accuracy with limited data for model training, particularly compared to sparse keypoint-based methods. The complexity of the DP anatomy, however, can introduce minor biases in automated axis determination, estimated here to be approximately 1.61° to 2.54° (Figure 5). To improve accuracy, we used a diaphyseal reference point to define the proximal DP axis.
Orthopedic surgeons recognize that surgical correction of HV deformity depends on factors beyond HVA, IMA, and IPA measurements, including the patient's clinical presentation and coexisting deformities. The lateral X-ray view is an additional element essential for a comprehensive assessment of the complexity of foot deformity, and these considerations may limit our method’s applicability in certain cases. Nevertheless, our method has shown potential as a rapid and clinically effective tool for evaluating HVI and HV cases, with further development necessary to achieve fully automated preoperative planning.
The high Pearson correlation values and minimal biases in Bland-Altman analysis confirm the AI system's reliability and precision, demonstrating its consistency with manual measurements. The strong correlation between automated and manual measurements suggests that the automated approach is a viable alternative to manual assessments, offering both efficiency and consistency (Figures 7 and 8). Compared to traditional manual methods, which require approximately 36 minutes to perform IPA measurements on 84 radiographs, the AI model can complete these same measurements in about 4.5 minutes, representing a substantial time reduction of approximately 87.5%. High ICC values (0.88 and 0.92 in comparison with manual measurements) and a low SEM confirm the model’s accuracy and consistency, surpassing the variability often observed in manual measurements. These results indicate that automated IPA measurement could be effectively integrated into clinical practice, enhancing diagnostic accuracy, facilitating early identification of deformity recurrence or residual deformities, and reducing the burden on medical staff. This efficiency has the potential to decrease patient wait times and streamline clinical workflows, enabling orthopedic surgeons to focus more on complex cases that require their specialized expertise. Overall, the model holds promise as a tool to improve patient outcomes and optimize resource allocation in the assessment and treatment of forefoot deformities.
Figure 7 Line graph of interphalangeal angle measurements across subjects.
This graph illustrates the consistency of interphalangeal angle measurements across subjects, comparing values obtained using artificial intelligence (AI) and those measured by orthopedists (OA2 and OB), highlighting the alignment between manual and AI-based assessments. AI: Artificial intelligence; IPA: Interphalangeal angle; O: Orthopedic surgeons.
Figure 8 Box plot comparison of artificial intelligence and orthopedist-performed measurements.
This plot visualizes the distribution of interphalangeal angle measurements across different methods. The blue line represents the median, while the boxes indicate the interquartile range. AI: Artificial intelligence; IPA: Interphalangeal angle; IQR: Interquartile range; O: Orthopedic surgeons.
Despite these very promising results, certain limitations remain. The AI model’s accuracy may be challenged in extreme deformities, such as those seen in rheumatoid feet, infections with bone osteolysis, posttraumatic cases, or joint dislocations. Although such cases are rare, they may still require clinician oversight to ensure diagnostic accuracy. While the U-Net model performs exceptionally well on this challenging dataset—despite noise, artifacts, and variations in contrast or resolution—differences in radiographic quality and equipment across institutions may impact its performance. This underscores the need for multicenter studies to validate the model’s robustness across diverse patient populations and clinical settings. Implementing a secondary validation step, where outlier measurements are flagged for manual review, could further help mitigate potential inaccuracies. Future development should focus on enhancing the model’s adaptability to complex cases and expanding its functionality to assess additional forefoot parameters, thereby increasing its utility in comprehensive clinical assessments.
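The secondary validation step suggested above could take many forms. The sketch below shows one possible approach, assuming that a measurement is sent for manual review when it falls outside a plausible angle range or beyond Tukey's interquartile-range fences for the batch. The function name `flag_for_review`, the plausible range of 0 to 40 degrees, and the sample values are hypothetical placeholders, not thresholds validated in this study.

```python
from statistics import quantiles


def flag_for_review(angles_deg, plausible_range=(0.0, 40.0), k=1.5):
    """Return indices of measurements that should be checked manually.

    A value is flagged if it lies outside `plausible_range` (an assumed,
    illustrative range) or outside the Tukey fences
    [Q1 - k * IQR, Q3 + k * IQR] computed on the batch.
    """
    q1, _, q3 = quantiles(angles_deg, n=4)  # batch quartiles
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    flagged = []
    for index, angle in enumerate(angles_deg):
        out_of_range = not (plausible_range[0] <= angle <= plausible_range[1])
        statistical_outlier = angle < low_fence or angle > high_fence
        if out_of_range or statistical_outlier:
            flagged.append(index)
    return flagged


# Hypothetical automated IPA outputs in degrees (illustrative only).
ipa = [9.5, 11.0, 12.2, 10.4, 13.1, 47.0, 11.8, -3.0]
flagged = flag_for_review(ipa)
print(flagged)  # indices of the values 47.0 and -3.0
```

In practice, the thresholds would need to be chosen by clinicians and checked against local data, since a legitimately severe deformity may also produce an extreme angle that should be reviewed rather than discarded.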
When incorporated into screening programs, automated IPA measurements could accelerate the identification of cases requiring surgical intervention. By significantly reducing diagnostic time, the automated measurements improve healthcare efficiency and enable more precise surgical planning. These systems offer the dual benefits of enhancing diagnostic accuracy and reducing hospital staff time, ultimately contributing to lower diagnostic costs.
CONCLUSION
This study demonstrates the reliability, efficiency, and clinical utility of a fully automated, AI-based system for the measurement of the IPA and assessment of HVI deformities. The AI system achieved a strong correlation with manual measurements and reduced measurement time by 87%, offering substantial benefits for clinical workflows, especially in high-volume settings. By providing consistent and accurate measurements, our AI method can improve diagnostic accuracy, facilitate early detection of deformities, and enhance preoperative planning. Additionally, the rapid, automated assessment enables clinicians to focus on complex cases requiring their expertise and supports more precise planning of interventions, such as Akin osteotomies, by enhancing measurement reliability. Implementing this AI-based system in clinical practice could improve patient outcomes, reduce diagnostic costs, and establish a new standard in orthopedic assessments of forefoot deformities. Further multicenter validation across diverse patient populations will be carried out to assess the potential of the proposed method for clinical adoption. Future research will leverage data from various hospitals and clinics and incorporate a larger and more diverse dataset to train neural networks, with the aim of enhancing their generalizability and performance across different radiographic equipment.
Footnotes
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Orthopedics
Country of origin: Poland
Peer-review report’s classification
Scientific Quality: Grade A, Grade B, Grade B
Novelty: Grade A, Grade B, Grade B
Creativity or Innovation: Grade A, Grade B, Grade B
Scientific Significance: Grade A, Grade B, Grade B
P-Reviewer: Guedes A, Zhang WM; S-Editor: Luo ML; L-Editor: A; P-Editor: Zhao YQ
Erdil M, Kuyucu E, Ceylan HH, Sürücü S, Erdil I, Kara A, Gülenç BG, Bülbül M. The Effect of Incorrect Foot Placement on the Accuracy of Radiographic Measurements of the Hallux Valgus and Inter-Metatarsal Angles for Treating Hallux Valgus. ACHOT. 2017;84:196-201.
Kwolek K, Liszka H, Kwolek B, Gądek A. Measuring the Angle of Hallux Valgus Using Segmentation of Bones on X-Ray Images. In: Tetko IV, Kůrková V, Karpov P, Theis F, editor. Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. Berlin: Springer, 2019.
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editor. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Berlin: Springer, 2015.