Document

Problems and recommendations for using machine learning for medical image analysis : case of tumor detection in histological images.

Source
Master's thesis
Country
Oman
City
Muscat
Publisher
Sultan Qaboos University
Gregorian
2023
Language
English
Thesis Type
Master's thesis
English abstract
Deep learning has had a powerful impact and achieved major success in numerous pattern recognition applications, driving growth in areas such as visual recognition, self-driving cars, natural language processing, and health care. The use of machine learning in healthcare has increased over the past few years, both for storing data and as a data source to assist radiologists in consultation. Research on deep learning-based medical image analysis for computer-aided diagnosis has increased since the Food and Drug Administration approved the first commercial computer-aided diagnosis system in 1998 as a second opinion in screening mammography. Although there is a plethora of high-accuracy models for medical applications, only a limited number of deep learning applications are used in the real world and approved by medical bodies such as the Food and Drug Administration.
To address the data collection issues in recent research, we conducted a systematic literature review using IEEE, covering papers published from 2020 to May 2023 on detecting breast tumors using histological images. The initial search returned 273 papers. We excluded 261 papers that had an irrelevant title, did not use histological images, or did not use deep learning methods. We reviewed the remaining 12 papers and analyzed them with the PROBAST method, assessing the risk of bias in participants, predictors, outcomes, and analysis. Our systematic review shows that all the reviewed papers used public datasets, which leaves the risk of participant bias unclear because too little information is available about the selected participants. Most of the papers have a low risk of predictor bias, since most use the same predictor for all participants. Most papers have an unclear risk of bias in the outcomes, since the labels were provided by the dataset. Most papers have a high risk of bias in the analysis due to the small sample size of images, unclear validation methods, and reliance on internal validation only. We recommend that researchers use a private dataset and follow a case-control design in selecting participants. If obtaining a private dataset is impossible, we recommend reviewing the public dataset with a pathologist before using it.
We designed a methodology based on our review and applied it to breast tumor histological images, using the SQUH dataset combined with the public BreakHis dataset. The SQUH dataset was collected in the SQUH histopathology laboratory from 158 patients between 2017 and 2019. The BreakHis dataset was collected from 82 patients in Pathological Anatomy and Cytopathology, Parana, Brazil. We reviewed the public dataset with a pathologist and then used it in a convolutional autoencoder model. We used the autoencoder for feature extraction and tested two loss functions for quality assessment: the mean squared error and the binary cross-entropy loss. Then, we used the encoder's last layer (the bottleneck) for classification using convolutional neural networks. We tested the algorithm using different magnifications and parameters. The results show that the binary cross-entropy loss performed better than the mean squared error loss for histological images, since the mean squared error produced blurry images and could not extract the features as well as the binary cross-entropy loss. With the binary cross-entropy loss, the model achieved an F1 score of 90, a recall of 89.5, and a precision of 91. We also tested our model on the SQUH data alone at magnifications of 4x, 10x, 20x, and 40x. The results show that 10x was the best magnification, with an F1 score of 93.
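As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of the two reconstruction losses and of the F1 score. It is not the thesis implementation: the function names, the toy pixel values, and the clipping constant `eps` are assumptions made for this example, and pixel intensities are assumed to be scaled to the range [0, 1].

```python
import math

def mse_loss(target, recon):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)

def bce_loss(target, recon, eps=1e-7):
    """Mean binary cross-entropy; pixels are assumed to lie in [0, 1]."""
    total = 0.0
    for t, r in zip(target, recon):
        r = min(max(r, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(r) + (1.0 - t) * math.log(1.0 - r)
    return total / len(target)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical toy data: a 4-pixel target and two candidate reconstructions.
target = [0.0, 1.0, 1.0, 0.0]
sharp = [0.1, 0.9, 0.9, 0.1]   # close to the target
blurry = [0.5, 0.5, 0.5, 0.5]  # uniform grey

for name, recon in (("sharp", sharp), ("blurry", blurry)):
    print(name, "MSE:", mse_loss(target, recon), "BCE:", bce_loss(target, recon))

# F1 implied by the precision and recall reported in the abstract.
print("F1:", round(f1_score(91.0, 89.5), 1))
```

With the reported precision of 91 and recall of 89.5, the harmonic mean comes to about 90.2, which agrees with the reported F1 score of 90. The toy reconstructions only show how each loss is computed; they do not reproduce the comparison made in the thesis.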
Category
Theses and Dissertations