Date of Award

Fall 2021

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Cancer subtype classification, Convolutional neural networks, Precision medicine, RNA Seq

Supervisor

J. Chen

Supervisor

A. Biniaz

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

Creative Commons Attribution 4.0 International License

Abstract

The introduction of genetic testing has profoundly enhanced the prospects of early disease detection and the development of precision medicine. Subtyping of critical diseases has proven to be an essential part of developing individualized therapies and has led to deeper insights into disease heterogeneity. Studies suggest that variants in particular genes have significant effects on certain types of immune system cells and are also involved in the risk of critical illnesses such as cancer. By analyzing a patient's genetic sequence, disease types and subtypes can be predicted. Recent research has shown that the prediction quality of convolutional neural networks (CNNs) on gene intensity features in this context can be improved when the input is structured into 2D images.
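
As an illustration of the idea of structuring an expression profile into an image, the following NumPy sketch reshapes an ordered gene-expression vector into a square 2D array. The gene ordering (e.g., by chromosome location), the grid size, and the zero padding are illustrative assumptions, not the exact construction used in this thesis.

import numpy as np

def expression_to_image(expression: np.ndarray) -> np.ndarray:
    """Reshape an ordered gene-expression vector into a square 2D array."""
    n_genes = expression.shape[0]
    side = int(np.ceil(np.sqrt(n_genes)))      # smallest square grid that fits
    padded = np.zeros(side * side, dtype=expression.dtype)
    padded[:n_genes] = expression               # pad the tail with zeros
    return padded.reshape(side, side)

# Example: a random profile standing in for one RNA-Seq sample.
profile = np.random.rand(20000)
image = expression_to_image(profile)
print(image.shape)                              # (142, 142)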

Constructed from chromosome locations or from transformations such as kernel PCA (kPCA) and t-SNE, these two-dimensional images express certain types of relationships among the intensity features. While this approach extends the success of convolutional neural networks to non-image data, obtaining a mapping of features onto images that precisely reflects the relationships among the features is hard, if not impossible. To this end, we propose an enhancement that provides the CNN training procedure with not only the structured image samples but also the corresponding samples of unstructured raw gene expression data in their original form. The former is fed into the convolutional layers of the network, while the latter is input only to the fully connected layers. The proposed method is applied to The Cancer Genome Atlas (TCGA) cancer-subtype dataset, using the median expression levels of all expressed genes from RNA-Seq data. According to the experiments, the proposed approach improves classification accuracy by 2.7% when applied to the state-of-the-art method with a 2D CNN architecture trained on images constructed from the chromosome locations of the genes. When built on top of the method with a 2D CNN architecture trained on images constructed with a t-SNE transformation, classification accuracy is enhanced by 4.7%. When the proposed approach is implemented on a 1D CNN model with data structured by the covariance between features, classification accuracy improves by 1%, and an increase of 3% is observed when it is implemented over a 1D CNN model trained on data ordered by chromosome location.
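
The dual-input design described above can be sketched as follows in PyTorch: a convolutional branch processes the structured 2D image, and the raw expression vector is concatenated with the flattened convolutional features before the fully connected layers. Layer sizes, image dimensions, and the number of subtype classes are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn

class DualInputCNN(nn.Module):
    def __init__(self, image_size=128, n_genes=20000, n_subtypes=33):
        super().__init__()
        # Convolutional branch for the structured 2D image input.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        conv_features = 32 * (image_size // 4) ** 2
        # Fully connected layers receive both the flattened conv features
        # and the raw gene-expression vector.
        self.fc = nn.Sequential(
            nn.Linear(conv_features + n_genes, 256), nn.ReLU(),
            nn.Linear(256, n_subtypes),
        )

    def forward(self, image, raw_expression):
        x = self.conv(image)                        # (batch, 32, H/4, W/4)
        x = x.flatten(start_dim=1)                  # flatten conv features
        x = torch.cat([x, raw_expression], dim=1)   # append raw expression
        return self.fc(x)                           # subtype logits

# Example forward pass with random data standing in for TCGA samples.
model = DualInputCNN()
image = torch.randn(4, 1, 128, 128)                 # structured 2D image input
raw = torch.randn(4, 20000)                         # unstructured expression vector
logits = model(image, raw)
print(logits.shape)                                  # torch.Size([4, 33])

Feeding the raw vector only at the fully connected stage lets the network use information that the 2D mapping may not preserve, while the convolutional branch still exploits whatever local structure the image construction captures.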
