Deep learning based screening and ancillary testing for thyroid cytopathology
- Authors: Dov, David and Range et al.
- Journal: The American journal of pathology
- Year: 2023
- Link: https://pdf.sciencedirectassets.com...
Introduction
- In this article, the authors propose a deep-learning-based ternary model, rather than a classic binary one, which classifies each fine-needle aspiration biopsy (FNAB) scan into one of three categories: benign, indeterminate, or malignant.
- At the cost of leaving some cases indeterminate, the model is tuned so that the benign and malignant categories have reliably low and high risk of malignancy (ROM), respectively.
- This approach allows the algorithm to be applied in two practical use cases with clinical-grade performance:
  - Screening, to identify determinate cases (i.e., cases with definitive and reliable predictions that do not require further manual review by pathologists).
  - Ancillary testing, to disambiguate and reduce the number of indeterminate cases, helping to avoid unnecessary surgeries.
- The algorithm screens and definitively classifies 45.1% (130/288) of the scans as either benign or malignant, with human expert-level ROMs of 2.7% and 94.7%, respectively. It further reduces the number of indeterminate cases by definitively classifying 21.3% (23/108) of them with a ROM of 1.8%.
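The two headline numbers above (the fraction of scans screened as determinate, and the ROM of each output category) are simple ratios. The sketch below computes them under the assumption that ROM means the fraction of scans in a predicted category whose final surgical pathology is malignant. The function names and the toy labels are hypothetical, not taken from the paper. With this definition, 130 determinate scans out of 288 gives 130/288 ≈ 0.451.

```python
from typing import Dict, List

# Ternary output categories of the screening model.
CATEGORIES = ("benign", "indeterminate", "malignant")


def rom_by_category(preds: List[str], malignant: List[bool]) -> Dict[str, float]:
    """Risk of malignancy (ROM) per predicted category: the fraction of scans
    in that category whose final surgical pathology is malignant."""
    rom = {}
    for cat in CATEGORIES:
        outcomes = [m for p, m in zip(preds, malignant) if p == cat]
        rom[cat] = sum(outcomes) / len(outcomes) if outcomes else float("nan")
    return rom


def determinate_fraction(preds: List[str]) -> float:
    """Fraction of scans given a definitive (benign or malignant) label."""
    return sum(p != "indeterminate" for p in preds) / len(preds)


# Toy example with made-up labels (not data from the paper).
preds = ["benign", "benign", "indeterminate", "malignant"]
truth = [False, False, True, True]
print(rom_by_category(preds, truth))
# {'benign': 0.0, 'indeterminate': 1.0, 'malignant': 1.0}
print(determinate_fraction(preds))  # 0.75
```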
Dataset
- The cohort comprised 2169 FNAB slides. The authors excluded FNABs that the cytopathologist (CP) had diagnosed as non-diagnostic, as recorded in the electronic medical record (EMR). These made up 3.2% of the cases. The final data set was divided into a training set of 964 FNABs and a test set of 601 FNABs.
- All slides were cleaned and scanned with a 40× objective at nine Z-stack levels on a Leica AT-2 scanner.
- The authors used the middle Z-stack level, which was further down-sampled by a factor of four in each dimension to reduce processing time.
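The preprocessing step above can be sketched in NumPy as follows. This assumes the Z-stack is held in memory as a `(Z, H, W, C)` array. The notes do not say which resampling method was used, so block averaging is an assumption here, as are the helper names `middle_z_level` and `downsample`.

```python
import numpy as np


def middle_z_level(stack: np.ndarray) -> np.ndarray:
    """Return the middle focal plane of a (Z, H, W, C) Z-stack.
    With nine levels this is index 4 (0-based)."""
    return stack[stack.shape[0] // 2]


def downsample(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Shrink an (H, W, C) image by `factor` in each spatial dimension by
    averaging non-overlapping factor x factor blocks. Rows/columns that do
    not fill a whole block are cropped."""
    h, w, c = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


# Hypothetical 9-level stack; real whole-slide images are far larger.
stack = np.random.default_rng(0).random((9, 40, 60, 3))
small = downsample(middle_z_level(stack))
print(small.shape)  # (10, 15, 3)
```

In a real pipeline, a tiled reader would stream the slide rather than load it whole. The array version only shows the arithmetic.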
Algorithm
- The proposed algorithm is inspired by the workflow of cytopathologists and comprises two CNNs: the informativeness CNN and the malignancy CNN.
- The first network, the informativeness CNN, identifies regions containing thyroid follicular cells. These diagnostically relevant areas typically make up only a tiny fraction of the slide, which is otherwise mostly occupied by irrelevant material (e.g., blood cells). The informativeness CNN addresses this by selecting only the relevant areas, effectively reducing the dimensionality of the data.
- The second network, the malignancy CNN, classifies FNABs into the three clinically relevant categories (benign, indeterminate, or malignant). The classification is based on ordinal regression: a scalar output of the network is compared with two learnable threshold parameters.
- During the training phase, the threshold parameters are tuned together with the parameters of the neural network via stochastic gradient descent. Because of the ordinal structure, the three categories reflect an increasing probability of malignancy.
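The ordinal regression described above can be illustrated with a minimal NumPy sketch of a cumulative-logit (proportional-odds) model. This is one standard way to turn a scalar score and two ordered thresholds into three ordered classes. The notes do not give the authors' exact loss, so this formulation, the function names, and the example threshold values are assumptions. In a deep learning framework, `tau1` and `tau2` would be trainable parameters updated by SGD together with the network weights. Because the sigmoid is monotonic, comparing the raw score with thresholds is equivalent to comparing the sigmoid output with correspondingly transformed thresholds.

```python
import numpy as np

LABELS = ("benign", "indeterminate", "malignant")


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def category_probs(score, tau1: float, tau2: float) -> np.ndarray:
    """Cumulative-logit ordinal model: P(y > k) = sigmoid(score - tau_k).
    Returns probabilities of the three ordered categories (last axis)."""
    if not tau1 < tau2:
        raise ValueError("thresholds must satisfy tau1 < tau2")
    score = np.asarray(score, dtype=float)
    p_above_benign = sigmoid(score - tau1)  # P(indeterminate or malignant)
    p_malignant = sigmoid(score - tau2)     # P(malignant)
    return np.stack(
        [1.0 - p_above_benign, p_above_benign - p_malignant, p_malignant], axis=-1
    )


def ordinal_nll(score, labels, tau1: float, tau2: float) -> float:
    """Mean negative log-likelihood of integer labels (0, 1, 2); the quantity
    SGD would minimize with respect to the network weights and thresholds."""
    probs = category_probs(score, tau1, tau2)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def classify(score, tau1: float, tau2: float):
    """Hard ternary decision: 0 if score < tau1, 1 if tau1 <= score < tau2, else 2."""
    return np.digitize(score, [tau1, tau2])


scores = np.array([-3.0, 0.0, 3.0])
print([LABELS[i] for i in classify(scores, -1.0, 1.0)])
# ['benign', 'indeterminate', 'malignant']
```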
- Both CNNs are based on the widely used VGG11 architecture, pretrained on ImageNet.
- Each RGB color channel of the scans was normalized to zero mean and unit variance. The scans were then tiled into 128 × 128-pixel patches and fed into the informativeness CNN, which predicts whether each patch is informative (i.e., contains thyroid follicular cells).
- During training, the informative patches were sampled from the regions manually marked by D.E.R. in the training set.
- Direct smears made from FNABs contain far more uninformative regions (blank regions, blood cells, and artifacts) than informative ones. In some scans, which typically contain hundreds of thousands of patches, only a few patches are informative. Because manually annotating regions in the scans is extremely time-consuming, the authors devoted this effort to marking only informative regions. Uninformative patches were sampled uniformly from the whole-slide image (WSI), since a uniformly sampled patch is overwhelmingly likely to be background or otherwise negative.
- After training, a sliding window sweeps over the WSI, and the CNN predicts the informativeness of each patch. For each WSI, the most informative patches are selected and organized into a fixed-size set, which is then fed into the malignancy CNN. The authors used 1000 patches per WSI for training the malignancy CNN, a number that provides enough data to train the network.
- The authors found this scheme efficient at extracting the informative regions while filtering out white space and irrelevant material. Their patch-selection strategy allows overlapping patches. As a result, when the number of informative regions was smaller than the fixed number of selected patches (1000 in training and 100 in testing), the informativeness CNN usually selected overlapping regions.
- An alternative approach is to select only patches whose informativeness CNN prediction exceeds a certain threshold. However, there is no straightforward way to choose the threshold value, and this approach did not provide an improvement in early experiments.
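The patch pipeline described in these bullets (per-channel normalization, sliding-window tiling, and fixed-size top-k selection, contrasted with threshold-based selection) can be sketched in NumPy. This is a minimal sketch under stated assumptions. The stride of 64 pixels (which makes neighboring windows overlap) is hypothetical, since the notes do not give one. Random numbers stand in for the informativeness CNN, and all function names are illustrative.

```python
import numpy as np


def normalize_channels(image: np.ndarray) -> np.ndarray:
    """Standardize each channel of an (H, W, 3) image to zero mean, unit variance."""
    image = image.astype(np.float64)
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + 1e-8)


def tile(image: np.ndarray, size: int = 128, stride: int = 64):
    """Sliding-window tiling; stride < size produces overlapping patches.
    Returns patches (N, size, size, C) and their top-left corners (N, 2)."""
    h, w = image.shape[:2]
    coords = [(y, x) for y in range(0, h - size + 1, stride)
              for x in range(0, w - size + 1, stride)]
    patches = np.stack([image[y:y + size, x:x + size] for y, x in coords])
    return patches, np.array(coords)


def select_top_k(patches: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Fixed-size set: the k patches with the highest informativeness scores."""
    order = np.argsort(scores)[::-1][:k]
    return patches[order]


def select_by_threshold(patches: np.ndarray, scores: np.ndarray, threshold: float):
    """Alternative rule: keep every patch scoring above a threshold.
    The set size then varies per slide and can even be empty."""
    return patches[scores > threshold]


rng = np.random.default_rng(0)
wsi = normalize_channels(rng.random((256, 256, 3)))
patches, coords = tile(wsi)            # 3 x 3 = 9 overlapping patches
scores = rng.random(len(patches))      # stand-in for informativeness CNN outputs
print(select_top_k(patches, scores, k=4).shape)  # (4, 128, 128, 3)
```

Top-k selection always yields the same set size, provided the slide has at least k candidate windows. This makes batching for the downstream network straightforward. The threshold rule's output size depends on a cutoff that must be chosen by hand.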
- The malignancy CNN predicts the final surgical pathology diagnosis by averaging the predictions obtained from the patches in the set. To transform these predictions into the clinically relevant benign, indeterminate, and malignant categories, the authors compare the continuous output of the malignancy CNN with learnable threshold parameters.
- Let $p \in [0, 1]$ be the continuous output of the malignancy CNN (after the sigmoid layer), and let $\tau_1, \tau_2, \tau_3, \tau_4 \in \mathbb{R}$ be the learnable thresholds. The thresholds divide the predictions into ranges associated with the different TBS (The Bethesda System for Reporting Thyroid Cytopathology) categories, each with an increasing risk of malignancy.
- This strategy lets the authors tune the threshold parameters automatically as part of the training process, while the malignancy CNN learns from the final pathology labels, which serve as the gold standard (ground truth). During testing, the …
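The slide-level step above (average the per-patch outputs to get $p$, then locate $p$ among the ordered thresholds $\tau_1, \dots, \tau_4$) can be sketched as below. Averaging the post-sigmoid patch outputs is an assumption, as are the example threshold values and the helper names. Four increasing thresholds split the line into five ordered ranges. The notes do not spell out how these ranges map onto the ternary output, so the sketch stops at the range index.

```python
import numpy as np


def slide_prediction(patch_probs: np.ndarray) -> float:
    """Slide-level output p in [0, 1]: mean of the per-patch sigmoid outputs."""
    return float(np.mean(patch_probs))


def ordinal_range(p: float, thresholds) -> int:
    """Index (0..4) of the range that p falls into, given four strictly
    increasing thresholds tau_1..tau_4: 0 if p < tau_1, ..., 4 if p >= tau_4."""
    taus = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(taus) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # Count how many thresholds are <= p.
    return int(np.searchsorted(taus, p, side="right"))


taus = [0.1, 0.3, 0.6, 0.9]  # hypothetical values; learned during training in practice
p = slide_prediction(np.array([0.25, 0.75]))
print(p, ordinal_range(p, taus))  # 0.5 2
```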
Results
- As a secondary question, the authors examined how the algorithm performed on cases for which the CPs disagreed. These are presumably more difficult cases, which are less likely to fall into the determinate benign or malignant categories.
