Pranav Bhagirath

described in Amado et al. (2004), where the user clicked on hyper-enhanced regions within the myocardium and an ensuing multi-pass region growing algorithm segmented the infarct using the FWHM criterion.

Algorithms are often evaluated using different metrics, which makes comparing them challenging. Most of the methods surveyed in Table 1 either use LGE volume, or express it as a percentage, to evaluate detected enhancement (for example in Flett et al. (2011); Harrison et al. (2014)), or compare the amount of overlap with manual segmentation using the Dice metric (for example in Tao et al. (2010); Ravanelli et al. (2014)). The framework evaluated algorithms on both measures: volume and the Dice metric. For the Dice metric, segmentations were evaluated on individual infarcted regions in the image. A Dice metric computed on the entire image has its pitfalls, as it is difficult to ascertain in which local regions algorithms fail or succeed. This was addressed using a localized Dice evaluation strategy. Future algorithms tested on the framework will be subjected to the same metrics, enabling algorithms and their segmentations to be compared in a reliable manner.

The presence of pseudo infarct, which mimics scar in LGE CMR images, poses various challenges for algorithms. Earlier algorithms have not addressed this or incorporated it into their segmentation models. The framework provided delineations of pseudo infarct regions from an experienced observer. Algorithms were assessed on the proportion of false positives due to pseudo infarct regions. This has allowed a more objective evaluation within this framework. The n-SD and FWHM fixed models segmented a large proportion of the pseudo infarct labeled by the observer. The algorithms segmented significantly less pseudo infarct than the fixed models (paired t-test, p < 0.05). Furthermore, images in the database were qualitatively rated for quality by five different observers.
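The two region-based measures discussed above can be illustrated with a minimal sketch. It is not the framework's implementation: the function names, the representation of masks as sets of (row, column) pixels, the padded bounding-box window used to localize the Dice metric, and the `margin` parameter are all assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]  # (row, column); masks are sets of pixels (assumed representation)


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice overlap 2|A n B| / (|A| + |B|); taken as 1.0 when both masks are empty."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def localized_dice(reference_regions: Dict[str, Set[Pixel]],
                   segmentation: Set[Pixel],
                   margin: int = 2) -> Dict[str, float]:
    """Dice score per reference infarct region.

    Assumption: the algorithm's segmentation is restricted to the bounding
    box of each reference region, padded by `margin` pixels, before overlap
    is computed. Regions are assumed to be non-empty.
    """
    scores = {}
    for name, region in reference_regions.items():
        rows = [r for r, _ in region]
        cols = [c for _, c in region]
        r0, r1 = min(rows) - margin, max(rows) + margin
        c0, c1 = min(cols) - margin, max(cols) + margin
        local_seg = {(r, c) for (r, c) in segmentation
                     if r0 <= r <= r1 and c0 <= c <= c1}
        scores[name] = dice(region, local_seg)
    return scores


def pseudo_infarct_fraction(segmentation: Set[Pixel],
                            pseudo_infarct: Set[Pixel]) -> float:
    """Fraction of observer-labeled pseudo infarct pixels that were segmented."""
    if not pseudo_infarct:
        return 0.0
    return len(segmentation & pseudo_infarct) / len(pseudo_infarct)


# Toy example (hypothetical data): two reference regions, one of them missed.
reference_regions = {
    "A": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "B": {(10, 10), (10, 11)},
}
pseudo_infarct = {(20, 20), (20, 21)}
segmentation = {(0, 0), (0, 1), (1, 0), (2, 2), (20, 20)}

whole_image = dice(set().union(*reference_regions.values()), segmentation)
per_region = localized_dice(reference_regions, segmentation, margin=1)
fp_fraction = pseudo_infarct_fraction(segmentation, pseudo_infarct)
print(whole_image, per_region, fp_fraction)
```

In this toy case the whole-image Dice is a single intermediate value, whereas the per-region scores show that region A is mostly recovered and region B is missed entirely, which is the kind of local failure a global score hides.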
Algorithms’ segmentations were also evaluated separately based on each image’s rating.

The proposed framework has several limitations. An important limitation is that the framework cannot be used to directly evaluate the clinical utility or anatomic accuracy of the algorithms. This is because the reference standard does not include any information about outcomes (for the patient data set) or histology (for the pig data set). Another limitation is the size of the image database: 30 images, of which 20 can be used for testing and 10 for training. However, within this small sample, it provides a range of datasets from different scanner vendors, scanner resolutions and cohorts. A further limitation is the dimensionality of the dataset. The human datasets are 2D acquisitions with 8 mm slice thickness. 2D images are commonly employed clinically for treatment stratification. For example based on the infarct volume and ejection fraction
