Published online Sep 28, 2022. doi: 10.4329/wjr.v14.i9.319
Peer-review started: March 12, 2022
First decision: May 31, 2022
Revised: June 14, 2022
Accepted: September 13, 2022
Article in press: September 13, 2022
Published online: September 28, 2022
Processing time: 193 Days and 12.3 Hours
The 2018 ovarian-adnexal reporting and data system (O-RADS) guidelines are aimed at providing a system for consistent reports and risk stratification for ovarian lesions found on ultrasound. It provides key characteristics and findings for lesions, a lexicon of descriptors to communicate findings, and risk characterization and associated follow-up recommendation guidelines. However, the O-RADS guidelines have not been validated in North American institutions or amongst less experienced readers.
To evaluate the diagnostic accuracy and inter-reader reliability of ultrasound O-RADS risk stratification amongst less experienced readers in a North American institution with and without pre-test training.
A single-center retrospective study was performed using 100 ovarian/adnexal lesions of varying O-RADS scores. Of these cases, 50 were allotted to a training cohort and 50 to a testing cohort via a non-randomized group selection process in order to approximately equal distribution of O-RADS categories both within and between groups. Reference standard O-RADS scores were established through consensus of three fellowship-trained body imaging radiologists. Three PGY-4 residents were independently evaluated for diagnostic accuracy and inter-reader reliability with and without pre-test O-RADS training. Sensitivity, specificity, positive predictive value, negative predictive value (NPV), and area under the curve (AUC) were used to measure accuracy. Fleiss kappa and weighted quadratic (pairwise) kappa values were used to measure inter-reader reliability. Statistical significance was P < 0.05.
Mean patient age was 40 ± 16 years with lesions ranging from 1.2 to 22.5 cm. Readers demonstrated excellent specificities (85%-100% pre-training and 91%-100% post-training) and NPVs (89%-100% pre-training and 91-100% post-training) across the O-RADS categories. Sensitivities were variable (55%-100% pre-training and 64%-100% post-training) with malignant O-RADS 4 and 5 Lesions pre-training and post-training AUC values of 0.87-0.95 and 0.94-098, respectively (P < 0.001). Nineteen of 22 (86%) misclassified cases in pre-training were related to mischaracterization of dermoid features or wall/septation morphology. Fifteen of 17 (88%) of post-training misclassified cases were related to one of these two errors. Fleiss kappa inter-reader reliability was ‘good’ and pairwise inter-reader reliability was ‘very good’ with pre-training and post-training assessment (k = 0.76 and 0.77; and k = 0.77-0.87 and 0.85-0.89, respectively).
Less experienced readers in North America achieved excellent specificities and AUC values with very good pairwise inter-reader reliability. They may be subject to misclassification of potentially malignant lesions, and specific training around dermoid features and smooth vs irregular inner wall/septation morphology may improve sensitivity.
Core Tip: This study supports the applied utilization of the ovarian-adnexal reporting and data system (O-RADS) ultrasound risk stratification tool by less experienced readers in North America. KEY RESULTS: The O-RADS ultrasound risk stratification requires validation in less experienced North American readers; Excellent specificities (85%-100%), area under the curve values (0.87-0.98) and very good pairwise reliability can be achieved by trainees in North America regardless of formal pre-test training; Less experienced readers may be subject to down-grade misclassification of potentially malignant lesions and specific training about typical dermoid features and smooth vs irregular margins of ovarian lesions may help improve sensitivity.