Document
Problems and recommendations for using machine learning for medical image analysis : case of tumor detection in histological images.
Source
Master's thesis
Country
Oman
City
Muscat
Publisher
Sultan Qaboos University
Gregorian
2023
Language
English
Thesis Type
Master's thesis
English abstract
Deep learning has had a powerful impact on numerous pattern recognition
applications, driving growth in several areas, such as
visual recognition, self-driving cars, natural language processing, and health care.
The use of machine learning in healthcare has increased over the past few years, both for
storing data and as a data source to assist radiologists in consultation. The
research on deep learning-based medical image analysis for computer-aided diagnosis has increased since the Food and Drug Administration approved
the first commercial computer-aided diagnosis system in 1998 as a second
opinion in screening mammography. Although there is a plethora of models with
high accuracy for medical applications, a limited number of deep learning
applications are used in the real world and approved by medical bodies such as
the Food and Drug Administration. To address the data collection issues in recent
research, we conducted a systematic literature review using
IEEE of papers published from 2020 to May 2023 on detecting breast
tumors using histological images. The initial search returned 273 papers. We
excluded 261 papers that had an irrelevant title, did not use histological images, or did
not use deep learning methods. We reviewed the remaining 12 papers and analyzed them
using the PROBAST method, which assesses bias in participants, predictors,
outcomes, and analysis. Our systematic review shows that all the reviewed
papers used public datasets, which results in an unclear risk of participant bias
because little information is available on the selected participants. Most of the
papers have a low risk of predictor bias, since most use the same predictors for
all the participants. Most papers have an unclear risk of bias in the outcomes,
since the labels were provided with the dataset. Most papers have a high risk of analysis
bias due to the small image sample size, the unclear validation method,
and reliance on internal validation only. We recommend that
researchers use a private dataset and follow a case-control design in selecting
the participants. If finding a private dataset is impossible, we recommend
reviewing the public dataset with a pathologist before using it.
We designed a methodology based on our review and applied it to breast tumor
histological images using the SQUH dataset combined with the public BreakHis dataset.
The SQUH dataset was collected in the SQUH histopathology laboratory from 158
patients between 2017 and 2019. The BreakHis dataset was collected from 82
patients in Pathological Anatomy and Cytopathology, Parana, Brazil. We
reviewed the public dataset with a pathologist and then used it in a convolutional
autoencoder model. We used the autoencoder for feature extraction and tested
two loss functions for quality assessment: the mean squared error and the binary
cross-entropy loss. Then, we used the encoder's last layer (the bottleneck) for
classification using convolutional neural networks. We tested the algorithm using
different magnifications and parameters. The results show that the binary cross-entropy loss was better than the mean squared error loss for histological images,
since the mean squared error produced blurry images and did not extract the
features as well as the binary cross-entropy loss. The binary cross-entropy loss
achieved an F1 score of 90, a recall of 89.5, and a precision of 91. We also evaluated
our model on the SQUH data alone at magnifications of 4x, 10x, 20x, and 40x.
The results show that 10x was the best magnification, with an F1 score of
93.
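To illustrate the two reconstruction losses compared in the abstract and how the reported F1 score relates to precision and recall, here is a minimal NumPy sketch. It is not the thesis's implementation: the function names, the random 8x8 "patch", and the flat "reconstruction" are hypothetical stand-ins, and pixel values are assumed to be scaled to [0, 1].

```python
import numpy as np


def mse_loss(target, recon):
    """Mean squared error averaged over all pixels."""
    return float(np.mean((target - recon) ** 2))


def bce_loss(target, recon, eps=1e-7):
    """Binary cross-entropy averaged over all pixels.

    Assumes both arrays hold values in [0, 1]; the reconstruction is
    clipped away from 0 and 1 so the logarithms stay finite.
    """
    recon = np.clip(recon, eps, 1.0 - eps)
    per_pixel = target * np.log(recon) + (1.0 - target) * np.log(1.0 - recon)
    return float(-np.mean(per_pixel))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical data: a random patch and a flat reconstruction at its mean value.
rng = np.random.default_rng(0)
patch = rng.random((8, 8))
flat_recon = np.full_like(patch, patch.mean())

print("MSE:", mse_loss(patch, flat_recon))
print("BCE:", bce_loss(patch, flat_recon))

# Precision and recall as reported in the abstract.
print("F1:", round(f1_score(91.0, 89.5), 2))
```

With the reported precision of 91 and recall of 89.5, the harmonic mean is about 90.24, which is consistent with the stated F1 score of 90.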
Category
Theses and Dissertations