Confidence scales generally take 1 of 2 formats, discrete ordinal or quasi-continuous (Table 2), and no single scale suits all studies. It has been conventional to collect confidence scores on a discrete scale of 5–7 categories (17, 18), with quasi-continuous point scales starting at 0 suggested as a solution to degenerate data sets and poorly constructed ROC curves (18). The scale must suit the study design, but in some situations a quasi-continuous scale may provide a more precise measure of diagnostic accuracy. When a discrete scale is used, the observer is frequently given a series of statements or percentage points to select from when scoring an image (Table 2).
This style has been used to allow observers to rate their relative confidence that an abnormality is present (5). If observers are confident that no abnormality is present, they do not score the image, and a default score of zero is applied.
Quasi-continuous scales have recently become popular and in some instances have been applied through a slider-bar style of confidence scale. These scales offer a large number of possible confidence ratings, with zero reserved for unscored images. Consequently, they can be used effectively to measure observer confidence in terms of percentage certainty of disease presence.
However, there has been concern that observers cannot use such a scale effectively, as they may be unable to distinguish between a large range of rating points (17, 19) or may have poor precision in using the scale. Furthermore, rebinning continuous data to a discrete scale of 1–5, 1–10, or 1–20 equal bins has not been found to generate significantly different results. Eleven-point (or finer) scales are considered adequate for reliable ROC curve fitting. The overriding value of multicategory rating scales is the ability to acquire information at a variety of thresholds, with a higher rating always indicating a higher level of suspicion.
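To make the rebinning idea concrete, the Python sketch below maps hypothetical quasi-continuous ratings into equal-width bins and compares the empirical ROC area before and after. Several details are assumptions for illustration: the 0–100 range, the convention that a rating of 0 (unscored) stays 0, and the example ratings themselves. The area is estimated with the Wilcoxon-Mann-Whitney statistic.

```python
import math


def rebin(ratings, n_bins):
    """Map ratings on an assumed 0-100 scale to equal-width bins 1..n_bins.

    A rating of 0 (assumed here to mean "unscored") is kept as 0.
    """
    width = 100.0 / n_bins
    binned = []
    for r in ratings:
        if not 0 <= r <= 100:
            raise ValueError(f"rating {r} is outside 0-100")
        # ceil() sends (0, width] to bin 1, ..., (100 - width, 100] to bin n_bins
        binned.append(0 if r == 0 else math.ceil(r / width))
    return binned


def empirical_auc(normal, abnormal):
    """Wilcoxon-Mann-Whitney estimate of the area under the ROC curve.

    Each abnormal-normal pair scores 1 if the abnormal image is rated higher,
    0.5 for a tie, and 0 otherwise.
    """
    total = 0.0
    for a in abnormal:
        for n in normal:
            total += 1.0 if a > n else 0.5 if a == n else 0.0
    return total / (len(normal) * len(abnormal))


if __name__ == "__main__":
    normal = [0, 12, 35, 48, 5]      # hypothetical ratings on normal images
    abnormal = [62, 40, 88, 0, 95]   # hypothetical ratings on abnormal images
    print(empirical_auc(normal, abnormal))                      # 0.78
    print(empirical_auc(rebin(normal, 5), rebin(abnormal, 5)))  # 0.76
```

In this toy example the area changes only slightly. The change comes from rebinning creating ties, which are scored as one half. The sketch shows the mechanism only and is not a reproduction of the published comparison.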
The selection of an appropriate rating scale is an important consideration when an ROC study is planned. During visual search, suggestive areas are granted further interpretation, in which the radiologist must decide whether they represent signal (lesion) or noise (no lesion). Studies of visual search have often focused on detection, but this is complicated in radiology because the number of signals (lesions) is unknown. These are important considerations in image evaluation, and one of the drawbacks of conventional ROC methods is the inability to take into account location information and multiple lesions.
Conventional ROC analysis would allow an observer to misinterpret an image but still arrive at what could be considered the correct answer. For example, a low-conspicuity lesion could be overlooked and a lesion mimic seen and scored, and the observer would still be credited with a true-positive result. This type of assumption, illustrated in Figure 2, is clearly unacceptable, and location-based methods have been developed to overcome this problem.
Figure 2. Potential for error in the visual search task in conventional ROC methods. (A) One observer finds the lesion (true) and scores confidence 4. (B) Another observer finds a lesion mimic (false) and scores confidence 4. (C) A third observer finds the lesion and scores confidence 1.

Incorporating localization information into the analysis is a valuable way of increasing the statistical power of the test in comparison with conventional ROC studies.
Localization ROC (LROC) was the first location method developed that could predict observer performance as a result of both detection and localization. This was a significant step in overcoming the failure of ROC to deal with multiple targets and location. LROC requires observers to localize and score the single region of an image that they deem most suggestive. The LROC curve then describes the ability to detect and correctly localize actual targets in an image. The first realization of this analysis was described by Swensson. In this type of detection task, experimental control is strongly influential, because the method requires that every image contain either one lesion or none (1, 3).
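The Python sketch below illustrates what the LROC curve plots by computing empirical LROC operating points. It assumes that each image carries exactly one mark-rating pair and that each abnormal-image mark has already been judged correctly or incorrectly localized. The function and parameter names are hypothetical.

```python
def lroc_points(normal_ratings, abnormal_marks):
    """Empirical LROC operating points.

    normal_ratings: rating of the single (most suggestive) mark on each normal image.
    abnormal_marks: (rating, correctly_localized) for the single mark on each
        abnormal image.
    Returns (false-positive fraction, fraction correctly localized) pairs,
    one per distinct rating threshold, from strictest to most lenient.
    """
    thresholds = sorted(set(normal_ratings) | {r for r, _ in abnormal_marks}, reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(r >= t for r in normal_ratings) / len(normal_ratings)
        # an abnormal image counts only if rated at or above t AND correctly localized
        pcl = sum(1 for r, ok in abnormal_marks if r >= t and ok) / len(abnormal_marks)
        points.append((fpf, pcl))
    return points


if __name__ == "__main__":
    print(lroc_points([1, 2, 3], [(4, True), (3, False), (2, True)]))
```

Unlike an ROC curve, the empirical LROC curve need not reach (1, 1). In the example, the abnormal image rated 3 was marked at the wrong location, so the curve ends at a correct-localization fraction of 2/3.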
The LROC task does not necessarily reflect a true detection task, as a radiologist would not terminate the search of an image after localizing a solitary lesion. In a clinical setting, visual search continues until the radiologist is satisfied that the image contains no further noteworthy abnormalities. In the free-response (FROC) paradigm, observers can localize any area within the image that they deem abnormal, so both correct and incorrect localizations can be made on a single image (3). This paradigm is currently considered the most representative of the clinical situation.
However, precisely localizing multiple lesions within an image can be a challenging perceptual task. The localization must fall within a clinically acceptable distance of the true lesion (3). Localizations within this distance are known as lesion localizations, and those outside it are known as nonlesion localizations. Each localization is accompanied by a confidence score, producing a mark-rating pair for each observer decision.
The clinically acceptable distance that determines whether a mark-rating pair is a lesion or nonlesion localization is known as the acceptance radius or proximity criterion. The acceptance radius allows a slight error in localization: the radius, measured from the center of the lesion, is set to a predetermined value appropriate to the study. Different acceptance radii change the figure of merit (FOM) calculated at analysis and therefore give different measures of observer performance.
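The Python sketch below shows how the acceptance radius decides between lesion and nonlesion localization. Several conventions in it are assumptions rather than a rule taken from the text:

- A mark within the radius is credited to the nearest lesion.
- When several marks fall on the same lesion, the highest-rated one is the lesion localization and the rest are counted as nonlesion localizations.

The coordinates and the `Mark` type are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Mark:
    x: float
    y: float
    rating: float


def classify_marks(marks, lesions, radius):
    """Split one image's marks into lesion (LL) and nonlesion (NL) localizations.

    marks:   Mark objects made by the observer on the image.
    lesions: (x, y) centres of the true lesions on the image.
    radius:  acceptance radius, in the same units as the coordinates.
    Assumed conventions: a mark within `radius` is credited to the nearest
    lesion; extra marks on an already-credited lesion become NL, keeping the
    highest-rated mark as the LL.
    """
    ll = {}   # lesion index -> mark credited as that lesion's localization
    nl = []   # marks classified as nonlesion localizations
    for m in marks:
        dists = [math.hypot(m.x - lx, m.y - ly) for lx, ly in lesions]
        if dists and min(dists) <= radius:
            k = dists.index(min(dists))
            if k not in ll:
                ll[k] = m
            elif m.rating > ll[k].rating:
                nl.append(ll[k])
                ll[k] = m
            else:
                nl.append(m)
        else:
            nl.append(m)
    return ll, nl


if __name__ == "__main__":
    lesions = [(10, 10), (40, 40)]
    marks = [Mark(12, 11, 4), Mark(30, 30, 3), Mark(41, 38, 2)]
    for r in (5, 15):
        ll, nl = classify_marks(marks, lesions, r)
        print(r, {k: m.rating for k, m in ll.items()}, [m.rating for m in nl])
```

The outcome depends on the radius:

- With a radius of 5, the mark rated 3 is a nonlesion localization.
- With a radius of 15, that mark falls inside the radius of the second lesion and is credited as its lesion localization. It displaces the correctly placed mark rated 2.

This is one way a lenient radius can inflate the FOM.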
Less strict radii can lead to an inflated FOM. The size of the acceptance radius should be carefully considered in study design, as it has a significant effect on the classification of mark-rating pairs as either lesion or nonlesion localizations. The disease process under investigation should strongly influence this decision. A potential error that can occur with an inappropriate acceptance radius is shown in Figure 3.

Figure 3. The size of the acceptance radius can influence the classification of mark-rating pairs. If the acceptance radius of a low-conspicuity malignant lesion overlaps a high-conspicuity benign lesion, a perceptual hit could occur (25) and be classified incorrectly as a lesion localization.

Because all observer decisions (lesion and nonlesion localizations) are accounted for when FROC methods are used, the observer is penalized for localizing lesion mimics in addition to being rewarded for localizing true lesions.
Furthermore, overlooked lesions also contribute to a decrease in diagnostic accuracy. Alternative FROC (AFROC) analysis can be used to analyze the data acquired in FROC studies. It assumes independence between lesion localization and nonlesion localization mark-rating pairs and uses only the highest nonlesion localization score when there are several on the same image (3).
This highest-scoring nonlesion localization is used to create an equivalent ROC rating from which the false-positive fraction is calculated. The resulting AFROC curve is a plot of lesion localization fraction against false-positive fraction, and the area under the AFROC curve is frequently used to define lesion detectability (12). Nonlesion localization marks on abnormal images are ignored. For studies with no normal images, the JAFROC-1 FOM can be used, in which the most highly rated nonlesion localization on every image is included in the analysis. For FROC studies, allowing multiple nonlesion localization decisions is thought to yield greater statistical power than conventional ROC methods (3).
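The AFROC area described above can be sketched as a Wilcoxon-type comparison between each lesion's lesion-localization rating and the highest nonlesion-localization rating on each normal image. The Python function below is a minimal sketch under stated assumptions:

- Unmarked normal images and unlocalized lesions are assigned a rating of minus infinity.
- Ties count one half.
- The function name and input layout are hypothetical, not the interface of any JAFROC software.

It covers the normal-images-only version. The JAFROC-1 variant mentioned above would instead take the highest nonlesion rating on every image.

```python
NEG_INF = float("-inf")


def jafroc_fom(normal_nl, lesion_ll):
    """Trapezoidal area under the empirical AFROC curve (JAFROC-style FOM).

    normal_nl: one list per normal image of the NL ratings on that image
               (an empty list means the image was not marked).
    lesion_ll: one entry per lesion on the abnormal images: its LL rating,
               or None if the lesion was not localized.
    NL marks on abnormal images are ignored.
    """
    # only the highest-rated NL on each normal image is used; unmarked -> -inf
    fp = [max(ratings, default=NEG_INF) for ratings in normal_nl]
    # unlocalized lesions are treated as rated -inf
    tp = [NEG_INF if r is None else r for r in lesion_ll]
    total = 0.0
    for f in fp:
        for t in tp:
            total += 1.0 if t > f else 0.5 if t == f else 0.0
    return total / (len(fp) * len(tp))


if __name__ == "__main__":
    # 3 normal images (one unmarked) and 4 lesions (one missed); hypothetical data
    print(jafroc_fom([[2], [], [1, 3]], [4, None, 2, 3]))  # 0.625
```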
Observer performance studies must have adequate statistical power. An adequate sample size gives confidence in the reliability of the conclusions, whereas an underpowered study casts doubt on the outcomes. Large numbers of observers are typically required for multireader, multicase studies, in which the aim is to reduce the potential impact of interobserver variation and increase statistical power. In some instances, the effect of interobserver variation has been found to be at least as large as the difference in modality performance. Nevertheless, it is often desirable to keep the numbers of observers and cases to a minimum so that the time commitment of the observers and the costs of the study are also minimized. The importance of statistical significance should encourage investigators to perform a sample size calculation during the development of an observer performance study.
Dorfman, Berbaum, and Metz developed a multireader, multicase method for analyzing data using a jackknife approach. This method can account for significant differences in observer performance resulting from a change in imaging modality while also indicating that the observed effect may not be the same for all observers. This form of statistical analysis uses the Wilcoxon statistic (31) and can be applied to readers interpreting the same set of images obtained with 2 or more modalities (3).
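The jackknife step can be illustrated for a single reader and modality. In the Dorfman-Berbaum-Metz approach, the figure of merit is recomputed with each case left out in turn and converted to a pseudovalue. The pseudovalues for all readers and modalities then enter an analysis of variance, which is not shown here. The Python sketch below covers only the pseudovalue step, using the Wilcoxon statistic as the figure of merit; the function names are hypothetical.

```python
def wilcoxon_auc(normal, abnormal):
    """Wilcoxon estimate of the ROC area (ties count one half)."""
    total = 0.0
    for a in abnormal:
        for n in normal:
            total += 1.0 if a > n else 0.5 if a == n else 0.0
    return total / (len(normal) * len(abnormal))


def jackknife_pseudovalues(normal, abnormal):
    """Case-deletion jackknife pseudovalues of the Wilcoxon AUC.

    For each case k: pseudovalue_k = N * theta - (N - 1) * theta_without_k,
    where N is the total number of cases. Normal cases come first in the output.
    Needs at least 2 normal and 2 abnormal cases (passed as lists).
    """
    n_cases = len(normal) + len(abnormal)
    theta = wilcoxon_auc(normal, abnormal)
    pseudo = []
    for i in range(len(normal)):
        theta_i = wilcoxon_auc(normal[:i] + normal[i + 1:], abnormal)
        pseudo.append(n_cases * theta - (n_cases - 1) * theta_i)
    for j in range(len(abnormal)):
        theta_j = wilcoxon_auc(normal, abnormal[:j] + abnormal[j + 1:])
        pseudo.append(n_cases * theta - (n_cases - 1) * theta_j)
    return pseudo


if __name__ == "__main__":
    print(jackknife_pseudovalues([1, 2, 3], [2, 4]))
```

For hypothetical ratings normal = [1, 2, 3] and abnormal = [2, 4], the AUC is 0.75 and the pseudovalues are [1.25, 0.75, 0.25, -0.25, 1.75].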
Suited to data produced from mark-rating pairs, this analysis enables the investigator to perform a sample size calculation based on the data used in the analysis. The desired effect size and P value must be specified. The P value represents the probability of finding a difference between 2 tests in a population of cases that contains no difference. For a statistical power calculation, the sources of variation (observers and cases) must be considered so that a meaningful conclusion can be drawn. Variance figures for observers, treatments, and cases are produced in JAFROC analysis. These can then be entered into the sample size calculator for an accurate estimate of the number of observers and cases required for optimal statistical power.
The assumptions used in this calculator available at www.
In observer performance studies, it has been conventional to aim for high statistical power. As an example, for a multireader, multicase free-response study to be analyzed using the JAFROC FOM, a sample size calculation can be based on the number of observers completing the proposed study and the desired effect size. Although the statistical power of free-response studies increases with more lesions per image, investigators performing phantom simulations or adding simulated lesions to clinical images must be wary of exceeding what would be clinically realistic.
The increased power is thought to be due to the consideration of location information (3). A table of optimal sample sizes has been produced (15) as an alternative method for estimating adequate sample size, which can be useful for study planning. Several other points must also be considered during study development. It is important to have the correct case mix, reflecting the clinical presentation and prevalence of the disease process under investigation.
It is also relevant to inform the observer of the range of lesions (size, shape, density, and average number) before the study begins. Furthermore, the conspicuity of lesions and the difficulty of localization need to be tightly controlled so that acceptable numbers of nonlesion localization marks are made on normal images. For optimal statistical power, it is also desirable to present the observers with equal numbers of normal and abnormal images that have been classified by a gold standard.
The sample of observers enrolled into the study should also be representative of the population as a whole. The following example will describe how FROC methods can be used in image optimization and dose reduction. Consider the optimization of the CT component of a hybrid scanner.
When the aim is to reduce dose, there is always a concern that the image will deteriorate to a level that is not clinically acceptable, because dose reduction can reduce the diagnostic performance of the observer. Before a clinical study, it may be desirable to perform a phantom study to simulate the effect of a reduced dose on image quality and lesion conspicuity under controlled conditions, where lesion size and distribution are known exactly. The CT component of hybrid systems can aid accurate anatomic localization of lesions (suspected or incidental) in many examinations.
The high inherent contrast between lung parenchyma and lung lesions within the thorax allows a low radiation dose (tube current) to be used while still providing images of acceptable quality. Therefore, it would be interesting to compare a high-dose, low-noise acquisition with a low-dose, high-noise acquisition for accurate localization of lesions within the thorax.
Evidence in the literature of this type of optimization in hybrid imaging is sparse. A statistical power calculation determines the approximate number of random cases that would be suitable for a sample of 5 observers. At this stage, the research team must select a suitable confidence scale and decide on an image-viewing regime. It has been suggested that 40 min is a suitable length for each image observation session (33), and viewing conditions should be consistent for all observers.
Images for the high-dose and low-dose CT acquisitions should be acquired so that the same lesions appear in the same positions (case-matched). They should then be displayed in a randomized order to avoid case memory. If the resulting P value exceeds the chosen significance level, no significant difference in lesion detection is found between the 2 acquisitions. In the case of image optimization, this result would be desirable, showing that diagnostic performance (lesion detection) at a low dose is equal to that at a high dose.
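One possible way to build a randomized, case-matched reading order is sketched below in Python. The design is an assumption, not something the text specifies:

- Each case's high-dose and low-dose images go into 2 separate reading sessions.
- Within each session, the order is shuffled.

The function name, the session split, and the "high"/"low" labels are hypothetical.

```python
import random


def reading_sessions(case_ids, seed=None):
    """Split case-matched high- and low-dose images into 2 randomized sessions.

    Each case appears once per session: at a randomly chosen dose in the first
    session and at the other dose in the second, so the 2 images of a case are
    never read in the same session. Order within each session is shuffled.
    """
    rng = random.Random(seed)
    first, second = [], []
    for case in case_ids:
        dose = rng.choice(("high", "low"))
        other = "low" if dose == "high" else "high"
        first.append((case, dose))
        second.append((case, other))
    rng.shuffle(first)
    rng.shuffle(second)
    return first, second


if __name__ == "__main__":
    s1, s2 = reading_sessions(range(1, 11), seed=42)
    print(s1)
    print(s2)
```

A fixed seed makes the reading order reproducible for every observer. Using a different seed per observer would instead give each observer a different order.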
The suitability of FROC methods for dose optimization in hybrid imaging has recently been described. This work also found that lesion detectability was equal at the 4 dose settings. In both of these phantom simulations, there was evidence to suggest potential dose savings for patients. This phantom simulation showed that the CT attenuation correction images (as would be acquired for myocardial perfusion imaging) were of vastly different quality, resulting in significantly different lesion detection rates. A typical example is shown in Figure 4, comparing a 99mTc-methylene diphosphonate bone scan with an 18F attenuation-corrected bone scan.
Many more lesions can be identified in the 18F images than in the 99mTc whole-body scan (WBS) images. However, it is difficult to quantify visually the advantage that 18F holds over WBS, as must be done to evaluate the benefit of the new technique. If no statistical advantage is identified, there may be reluctance to explore the new technique. The free-response method would be well suited to an evaluation of this type. The observer could accurately localize all suggestive areas of the image, defining lesion detection performance over a range of cases for the 2 modalities.
A statistical evaluation would then reveal any advantage held by the 18F technique. A free-response study would have greater power than conventional ROC analysis in defining a difference in diagnostic accuracy because of the requirement to localize lesions accurately.

The advantages of FROC methods for evaluating diagnostic accuracy are clear, but they are not a one-size-fits-all solution to observer performance. Conventional ROC methods still have a role, particularly when diffuse rather than focal disease is the central issue.
For diffuse disease, classifying an image as normal or abnormal using ROC methods is acceptable, with FROC methods best reserved for focal or multiple focal diseases. FROC methods can also suffer from not producing a clinically relevant answer. Although accurate localization is a good test of observer skill, it does not necessarily inform a clinician about the need for further diagnostic work-up (25). And even though the free-response method is the closest to the clinical situation of the methods currently available, it is still not the real thing. Furthermore, it has been suggested that some of the information provided by either the ROC or the FROC paradigm can be irrelevant to the clinical question.
The free-response method has developed significantly over time, with methods of analysis changing and statistical validation ongoing as the paradigm evolves. Research in this area will continue to address these issues as the observer performance community strives to make these methods more reliable. The 2-alternative forced choice procedure presents observers with pairs of images, one containing a signal (lesion) and the other no signal (12), and the observer is forced to decide which image contains the signal.
In this situation, observers can be highly sensitive to changes in image presentation during a side-by-side review. This method, which is related to ROC, does not require the observer to maintain a consistent decision threshold throughout the test. However, the 2-alternative forced choice procedure does not provide information on the trade-off between true-positive and false-positive rates (12), and it requires a greater number of observations to achieve the same accuracy. An excellent example of the value of the 2-alternative forced choice procedure has been described by Good et al.
Observer performance assessment, in particular the FROC paradigm, can be highly valuable in the assessment of system performance and can be applied in the pursuit of image optimization and dose reduction. Combined with previous extensive research and guidance on performing observer studies, these techniques are providing great potential to optimize practice within nuclear medicine.
Thompson (1, 2), David J. LA14 4LF.

Receiver operating characteristic curves: a basic understanding.

Chakraborty D. Statistical power in observer-performance studies: comparison of the receiver operating characteristic and free-response methods in tasks involving localization. Acad Radiol.

Agreement among the 4 readers was measured with the Fleiss kappa statistic. Reader 1 graded 22 DBT cases, reader 2 graded 10, reader 3 graded 26, and reader 4 graded 20. There were a total of 11 histologically proven malignancies and 20 benign findings. Readers 1 and 2 rated DBT as having superior or equivalent image quality for malignancies in 10, and readers 3 and 4 in 8. For benign lesions, the preference for DBT was slightly higher for reader 4.
However, these differences were not significant (Table 5; Figures 1 and 2).

Previous clinical experience with DBT for assessing breast microcalcifications had revealed possible pitfalls, and few studies had compared DBT and FFDM for evaluating microcalcifications. In our data, all readers tended to prefer DBT for microcalcification visibility, supporting an early report by Kopans et al.
They suggested that the lengthy exposure time of the tomosynthesis acquisition may have introduced motion-related blur, obscuring additional microcalcifications and their morphology. On this point, with technological modifications that reduce acquisition time, the image quality of microcalcifications on DBT could become much better than that on FFDM. A detection study by Splanger et al (9) determined that FFDM was slightly more sensitive than tomosynthesis for the detection of microcalcifications.
They attributed the lower sensitivity of DBT to its thin slice thickness.
Although increasing the slice thickness improves the ability to perceive the distribution of microcalcifications in the breast, slabbing compromises the spatial resolution of each individual calcification. Slab thickness can be tailored by the radiologist when reviewing DBT images. Radiologists could therefore optimize it for a specific purpose, such as detection or characterization of microcalcifications.
Our results are discordant with a recent study by Clauser et al (35), which concluded that there were no significant differences between wide scan-angle DBT and FFDM for the detection and characterization of microcalcifications. This could be related to different scan settings and small sample sizes. We found slight inter-reader agreement. This result is comparable with a recent study that showed significant inter-reader differences in the visibility of microcalcifications.
This result is similar to that of a prior study by Smith et al (36), which showed that DBT could improve diagnostic performance and reduce recall rates even for less experienced radiologists. Thus, some degree of training is necessary for detecting and identifying microcalcifications on DBT.
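Inter-reader agreement in this study was summarized with the Fleiss kappa statistic for 4 readers. The Python function below sketches how that statistic is computed from a table of rating counts per case. The 3-category grading in the example (for example, DBT superior, equivalent, or inferior to FFDM) and the counts themselves are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for agreement among a fixed number of raters.

    counts[i][j] is the number of raters who placed subject i in category j;
    every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # observed agreement for each subject, then averaged over subjects
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # chance agreement from the overall proportion of ratings in each category
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # 5 hypothetical cases, 4 readers, 3 categories
    table = [[4, 0, 0], [2, 2, 0], [1, 1, 2], [0, 4, 0], [3, 0, 1]]
    print(round(fleiss_kappa(table), 3))  # 0.339
```

For this made-up table, the observed agreement is 0.6 and the chance agreement is 0.395, giving a kappa of about 0.34. The function fails with a division by zero if every rating falls in one category, because chance agreement is then 1.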
All readers perceived that DBT demonstrated microcalcifications in malignant lesions as well as or better than FFDM, although the results were not statistically significant under the study conditions. For benign lesions, the visibility of microcalcifications was slightly lower on DBT.
This result agrees with the study by Clauser et al (35), which showed lower visibility for benign lesions. They concluded that this could be related to lesion distribution or to the different natures of the microcalcifications and associated findings. Specimen radiography is a long-established procedure for confirming the presence of both calcified and noncalcified targeted lesions after core needle biopsy and surgical excision. Our study had a number of limitations. Moreover, we did not classify the microcalcifications by BI-RADS category, although characteristics such as size, distribution, form, and density could be clues to potential malignancy.
Finally, each specimen was placed in a conventional specimen container for DBT, and these containers produce artifacts in DBT reconstructions.

Peer Review: Six peer reviewers contributed to the peer review report.

Author Contributions: JB and JEL conceived and designed the experiments, analyzed the data, and contributed to the writing of the manuscript. JB wrote the first draft of the manuscript. JEL made critical revisions and approved the final version. All authors reviewed and approved the final manuscript.
Disclosures and Ethics: As a requirement of publication, the author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality, and, where applicable, protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material.
Any disclosures are made in this section.

Breast Cancer: Basic and Clinical Research.
Jee Eun Lee, Eun Suk Cha, Jin Chung, Jeoung Hyun Kim.

Keywords: breast microcalcification, digital breast tomosynthesis, full-field digital mammography.

Table 2. Pathology results.