Our initial analysis strategy involved seeking differences in the mean values produced by a number of instruments of a given model across a range of analyte values utilizing ANOVA. However, we found this strategy failed to detect many appreciable differences. Specifically, when two models exhibit a “crossover” pattern of mean model values as a function of the analyte values, ANOVA may fail to detect a significant difference, because consistently positive differences over half of the analyte range cancel out consistently negative differences over the other half. Therefore, we supplemented our statistical analyses with graphic analyses.
When a crossover pattern was apparent graphically, the portion of the data sets where the divergences were seen (eg, the upper or lower portion) was used for a second ANOVA. This often statistically confirmed the visually apparent differences between models. This analysis strategy increased the proportion of statistically significant model differences from 65% (100/153) to 75% (115/153). Although we have presented mean differences from the full range of values in the “Results” section, it is clear that such comparisons may sometimes minimize real model differences in one portion of the analyte range, as noted in Table 3 and seen in Figs 4-6.
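The cancellation effect described above can be illustrated with a minimal sketch. It is not the analysis performed in this study: the analyte levels, the size of the bias, the replicate scatter, and the helper names `f_statistic` and `deviations` are all hypothetical, and a simple one-way ANOVA on deviations from the target value stands in for whatever ANOVA design was actually used. Two simulated models have biases of opposite sign that cross at the midpoint of the analyte range.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F statistic for two or more groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (here, analyzer models)
    n = len(all_values)      # total number of measurements
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values chosen only for illustration (arbitrary units).
LEVELS = [20, 40, 60, 80, 100]       # target analyte levels
MIDPOINT = 60                        # level at which the two models cross
REPLICATE_NOISE = [-0.5, 0.0, 0.5]   # fixed scatter of replicate measurements


def deviations(slope_sign, levels):
    """Deviations from target for a model whose bias crosses zero at MIDPOINT."""
    return [slope_sign * (level - MIDPOINT) / 20 + noise
            for level in levels
            for noise in REPLICATE_NOISE]


upper_levels = [level for level in LEVELS if level > MIDPOINT]

# Full range: positive and negative differences cancel in the model means.
f_full = f_statistic(deviations(+1, LEVELS), deviations(-1, LEVELS))

# Upper portion only: the differences are consistently of one sign.
f_upper = f_statistic(deviations(+1, upper_levels), deviations(-1, upper_levels))

print("F, full range:   ", f_full)
print("F, upper portion:", f_upper)
```

With these assumed values, the F statistic is 0 over the full range, because both models have the same overall mean deviation, and 54 over the upper portion. The latter is well above the critical value of approximately 4.96 for F with 1 and 10 degrees of freedom at the 0.05 level.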
Limitations and Advantages
Do data analyses based on FCE ampules measured in a proficiency testing survey validly reflect model differences for blood that might be reflected in clinical practice? Currently no proficiency testing material available for shipment is equivalent to freshly tonometered fresh human blood (see below). Another possible disadvantage is that the measurements are made by hundreds of technicians with differing experience and training using hundreds of different instruments. As previously noted, laboratories with better quality control, more equipment, more frequent analyses, and dedicated personnel are likely to have less imprecision and less inaccuracy in their analyses. Advantages of using this database are as follows: it is unlikely that the distribution of any one model of analyzer is concentrated at high or low altitudes or in “better” or “poorer” laboratories; the differences between technician practices tend to cancel out when a large number of technicians are used; a large amount of data can be collected and analyzed uniformly; the inherent differences between models tend to be more evident when large numbers of analyses are made; it is unlikely that any single site would have this diversity of operating analyzer models available at any one time; and the infectious problems associated with the handling of large quantities of blood are avoided. Certainly, the finding of consistent deviations between models using any single type of proficiency testing material is strong evidence that the models actually differ in some way from each other.