Interobserver Agreement: Because the two pathologists graded interstitial infiltrations on slightly different scales, Pearson’s coefficients of correlation (r) were reported as a measure of relatedness, rather than kappa as a measure of concordance. We acknowledge that Pearson’s coefficient of correlation is usually higher than other measures of reliability. However, it approximates other intraclass correlation coefficients such as kappa when the source of error is assumed to be random variation rather than systematic deviation in an observer’s grading. The interobserver agreement regarding the overall assessment of the biopsy specimens was calculated using weighted kappa statistics. We qualified, a priori, the magnitude of the interobserver agreement as follows: a kappa of 0 to 0.20 would denote slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; and greater than 0.60, substantial agreement. Two-tailed Fisher’s exact tests were used to compare the proportion of cases falling into each diagnostic category.
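The weighted kappa computation described above can be sketched as follows. This is a minimal illustration using linear weights, not the authors' actual analysis code; the function name and the example ratings are ours.

```python
from collections import Counter

def weighted_kappa(r1, r2, categories):
    """Linearly weighted kappa for two raters on an ordinal scale.

    Weights decline linearly with the distance between categories,
    so near-misses are penalized less than distant disagreements.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = Counter(zip(r1, r2))   # observed joint frequencies
    row = Counter(r1)            # marginal totals, rater 1
    col = Counter(r2)            # marginal totals, rater 2
    po = pe = 0.0
    for a in categories:
        for b in categories:
            # agreement weight: 1 on the diagonal, 0 at maximal distance
            w = 1 - abs(idx[a] - idx[b]) / (k - 1)
            po += w * obs.get((a, b), 0) / n          # weighted observed
            pe += w * (row[a] * col[b]) / (n * n)     # weighted chance
    return (po - pe) / (1 - pe)

# Hypothetical overall-assessment grades from two observers
rater1 = [0, 0, 1, 1, 2, 2]
rater2 = [0, 1, 1, 2, 2, 2]
kappa = weighted_kappa(rater1, rater2, categories=[0, 1, 2])
```

By the a priori scale above, the resulting kappa would be read off as slight (0 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), or substantial (above 0.60).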
Likelihood Ratios: A likelihood ratio (the ratio of the true-positive rate to the false-positive rate) indicates by how much a given test result raises or lowers the pretest probability of the target disorder. Given a pretest probability determined beforehand on the basis of the clinical history and other noninvasive test results, the likelihood ratio associated with the test result can be used to calculate the posttest probability. The nomogram proposed by Fagan (Fig 2) facilitates this conversion. As a rough guide, likelihood ratios >10 or <0.1 generate large and often conclusive changes from pretest to posttest probability; likelihood ratios of 2 to 5 and 0.2 to 0.5 generate small changes in probability; likelihood ratios between 0.5 and 2 barely alter the probability. This approach to reporting diagnostic test accuracy is preferred over the concepts of sensitivity and specificity when the test results are measured as responses on an ordinal categorical scale rather than a dichotomous scale. Likelihood ratios and associated 95% confidence intervals (CIs) were calculated for each pathologic criterion at different cut-off points, for combinations of criteria, and for each of the four diagnostic categories.
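The pretest-to-posttest conversion that the Fagan nomogram performs graphically is straightforward odds algebra, sketched below. The function names and the illustrative numbers are ours, not taken from the study.

```python
def category_likelihood_ratio(n_disease, total_disease, n_no_disease, total_no_disease):
    """Likelihood ratio for one level of an ordinal test result:
    P(result | disease) / P(result | no disease)."""
    return (n_disease / total_disease) / (n_no_disease / total_no_disease)

def posttest_probability(pretest_prob, lr):
    """Apply a likelihood ratio to a pretest probability.

    Convert probability to odds, multiply by the LR, then convert
    the posttest odds back to a probability.
    """
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# Illustrative numbers only: a pretest probability of 25% combined
# with a likelihood ratio of 10 yields a posttest probability of ~77%,
# a large and often conclusive change, consistent with the rough guide.
p_post = posttest_probability(0.25, 10)
```

A likelihood ratio of 1 leaves the probability unchanged, which is why values between 0.5 and 2 barely alter the pretest estimate.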
Figure 2. Nomogram for interpreting diagnostic test results. Adapted from Fagan.