Intra- and interobserver agreement when describing adnexal masses using the International Ovarian Tumor Analysis terms and definitions: a study on three-dimensional ultrasound volumes

Journal

ULTRASOUND IN OBSTETRICS & GYNECOLOGY
Volume 41, Issue 3, Pages 318-327

Publisher

WILEY
DOI: 10.1002/uog.12289

Keywords

Doppler ultrasound; ovarian neoplasms; reproducibility of results; 3D imaging; ultrasonography

Funding

  1. Swedish Medical Research Council [K2001-72X-11605-06A, K2002-72X-11605-07B, K2004-73X-11605-09A, K2006-73X-11605-11-3]
  2. region of Scania

Objectives: To estimate intraobserver repeatability and interobserver agreement in: (1) describing adnexal masses using the International Ovarian Tumor Analysis (IOTA) terms and definitions; (2) the risk of malignancy calculated using IOTA logistic regression model 1 (LR1) and model 2 (LR2); and (3) the diagnosis made on the basis of subjective assessment of ultrasound images.

Methods: One hundred and three adnexal masses were examined by transvaginal gray-scale and power Doppler ultrasound. Three-dimensional ultrasound volumes of the mass were saved. After 12-18 months the volumes were analyzed twice, 1-6 months apart, by each of two independent experienced sonologists who used the IOTA terms and definitions to describe the masses. The risk of malignancy was calculated using LR1 and LR2. The sonologists also classified the masses as benign or malignant using subjective assessment.

Results: Eighty-four masses were benign, eight were borderline and 11 were invasively malignant. There was substantial variability within and between observers in the results of measurements included in LR1 and LR2 and some variability also when assessing categorical variables included in the models (agreement = 51-100% and kappa = 0.42-1.00). This resulted in substantial variability in the calculated risk of malignancy, the limits of agreement indicating that the calculated risk of malignancy could vary by a factor of 5-20 within and between observers. The reliability of the calculated risk of malignancy was moderate (LR1) or poor (LR2) when the calculated risk of malignancy was > 10% (intraclass correlation coefficients varied from 0.21 to 0.73). Interobserver agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10% was fair to good (agreement = 85% and kappa = 0.61 for LR1; agreement = 81% and kappa = 0.52 for LR2). Intra- and interobserver agreements for subjective assessment were 96%, 96% and 96% with kappa values of 0.89, 0.87 and 0.88, respectively.

Conclusions: Intra- and interobserver agreement in classifying tumors as benign or malignant using the risk of malignancy cut-off of 10% for LR1 and LR2 was fair or good, whilst the reproducibility of subjective assessment was excellent. The reliability of calculated risks > 10% was poor, and calculated risk > 10% cannot be used to discriminate between individuals at different risk. These results cannot be extrapolated to real-time ultrasound examinations.

Copyright (C) 2012 ISUOG. Published by John Wiley & Sons, Ltd.
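
The abstract reports observer agreement as a percentage together with a kappa value. As a rough illustration of how these two figures relate, the following is a minimal sketch of percent agreement and Cohen's kappa for two observers assigning benign/malignant labels. The function names and the ten example labels are hypothetical and are not taken from the study; the abstract does not state which kappa variant or software the authors used, so Cohen's unweighted kappa is an assumption here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently at their own marginal rates.
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is conventionally 1.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten masses (illustrative only, not study data).
observer_1 = ["benign"] * 8 + ["malignant"] * 2
observer_2 = ["benign"] * 7 + ["malignant"] * 3

agreement = percent_agreement(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Because most masses in a sample like this are benign, two observers would agree often by chance alone, which is why kappa can be noticeably lower than the raw percentage agreement.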