Chapter 5

viable tumor cells. In parallel, the staining pattern in CD4-stained slides, which also stain CD4+ lymphocytes and macrophages, was evaluated and compared with that in PD-L1-stained slides to avoid false-positive assessments caused by PD-L1-expressing macrophages located between tumor cells. Expression levels were assessed in sections that included at least 100 evaluable tumor cells.

Spectral acquisition and processing

Samples were processed using standardized operating procedures. We used the Deep MALDI® method of mass spectrometry on a matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometer (SimulTof Systems, Marlborough, MA, USA) to generate reproducible mass spectra from small amounts of serum (3 µL) [13]. This approach reveals mass spectral (MS) peaks with a greater dynamic range than previously possible by exposing the samples to 400,000 MALDI laser “shots”, rather than the several thousand used in standard applications. The spectra were processed to render them comparable between patients, and 274 MS features (peaks) were selected for further analysis on the basis of their known reproducibility and stability (listed in the supplement). Sample processing and MS analysis followed previously presented methods [14,15] and are outlined in the supplementary materials. Parameters for these procedures were established using only the 116-sample development set, and this fixed procedure was applied to all other sample sets without modification.

Test Development

Test development was carried out using the Diagnostic Cortex® platform [16], which has been used previously to design tests that stratify patients by outcome in various settings, for example, to identify patients with advanced melanoma likely to be sensitive to checkpoint inhibitors [14,15]. The approach incorporates machine learning concepts and elements of deep learning [17] to facilitate test development in cases where there are more measured attributes than samples.
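As a rough illustration of the split-averaging idea used in this kind of test development, the sketch below repeatedly divides a development set into training and test parts and scores each sample only with models that never saw it during training (out-of-bag estimates). It is not the Diagnostic Cortex platform: the synthetic data, the nearest-centroid classifier, the function name `out_of_bag_scores`, and the parameters `n_splits` and `test_fraction` are all assumptions made for illustration. Only the dimensions (116 samples, 274 features) are taken from the text.

```python
import numpy as np


def out_of_bag_scores(X, y, n_splits=200, test_fraction=0.33, seed=0):
    """Average, per sample, the votes of models that did not train on it."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = max(1, int(round(test_fraction * n)))
    vote_sum = np.zeros(n)
    vote_count = np.zeros(n, dtype=int)

    for _ in range(n_splits):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        y_train = y[train_idx]
        if y_train.min() == y_train.max():
            continue  # skip splits whose training part contains one class only

        # Stand-in classifier (assumption): assign to the nearest class centroid.
        centroid0 = X[train_idx][y_train == 0].mean(axis=0)
        centroid1 = X[train_idx][y_train == 1].mean(axis=0)
        dist0 = np.linalg.norm(X[test_idx] - centroid0, axis=1)
        dist1 = np.linalg.norm(X[test_idx] - centroid1, axis=1)

        # Record votes only for the held-out samples of this split.
        vote_sum[test_idx] += (dist1 < dist0).astype(float)
        vote_count[test_idx] += 1

    oob = np.full(n, np.nan)
    seen = vote_count > 0
    oob[seen] = vote_sum[seen] / vote_count[seen]
    return oob, vote_count


# Synthetic data (assumption): more features than samples, weak class signal.
rng = np.random.default_rng(42)
n_samples, n_features = 116, 274
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :10] += 0.8 * y[:, None]  # only the first 10 features carry signal

oob, counts = out_of_bag_scores(X, y)
oob_labels = (oob >= 0.5).astype(int)
evaluated = counts > 0
oob_accuracy = float((oob_labels[evaluated] == y[evaluated]).mean())
print(f"out-of-bag accuracy: {oob_accuracy:.2f}")
```

Because each sample's averaged vote comes only from splits in which it was held out, the resulting accuracy behaves like a test-set estimate even though it is computed from the development set alone.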
The potential for overfitting is minimized, allowing the creation of tests that can generalize to unseen datasets. Tests are created by averaging over many splits of the development set into training and test sets, and reliable test performance estimates can be obtained from the development set by restricting averages to the test set evaluations (‘out-of-bag estimates’) [18]. Successful supervised learning requires suitable training class labels. We used a semi-supervised approach [19] that does not require accurate pre-specification of patients into better or worse outcome training classes and allows us to be guided by the gold standard time-to-event outcomes of OS and PFS. An approximation is