Date of Award

2-29-2024

Publication Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

Keywords

cell type identification; cell-cell interaction prediction; dimensionality reduction methods; machine learning models; single-cell RNA seq data; sparse dataset

Supervisor

Luis Rueda

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

The advent of high-throughput single-cell RNA sequencing (scRNA-seq) technologies has enabled the study of individual cells and their biological mechanisms. Traditional clustering methods, commonly used to identify cell types in scRNA-seq data, struggle with the sparsity and high dimensionality of the data. To overcome these limitations, we propose an integrated approach that combines non-linear dimensionality reduction techniques with clustering algorithms. Our method uses modified locally linear embedding in conjunction with independent component analysis to identify representative clusters of different cell types. We evaluate this approach on thirteen publicly available scRNA-seq datasets covering various tissues, sizes, and technologies. Gene set enrichment analysis further confirms the effectiveness of our method, which outperforms existing unsupervised methods across these diverse datasets. We also investigate neural network-based methods combined with self-organizing maps, feature selection approaches for selecting informative marker genes in sparse datasets, and supervised techniques, all aimed at overcoming the high dimensionality and sparsity of scRNA-seq datasets in cell type identification. Building on cell type identification, we extend our investigation to intercellular signaling networks. Recognizing the limitations of existing link prediction approaches based on graph-structured data, we introduce a novel method named Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO). SEGCECO uses an attributed graph convolutional neural network to predict cell-cell communication from scRNA-seq data. To cope with the high dimensionality and sparsity of scRNA-seq data, we employ SoptSC, a similarity-based optimization method, to construct a cell-cell communication network.
Our experiments on six datasets from human and mouse pancreas tissue show that SEGCECO outperforms latent feature-based approaches and the state-of-the-art link prediction method, WLNM, achieving an area under the ROC curve of 0.99 and 99% prediction accuracy. In summary, this work spans the identification of cell types and the prediction of cell-cell communication from scRNA-seq data. It contributes to the understanding of disease modules and intercellular signaling networks and supports more accurate investigations in single-cell genomics.
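
The cell type identification step described in the abstract (modified locally linear embedding, followed by independent component analysis, followed by clustering) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the use of scikit-learn, the log transform, k-means as the clustering algorithm, the function name `cluster_cells`, every parameter value, and the synthetic count matrix are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA
from sklearn.manifold import LocallyLinearEmbedding


def cluster_cells(expr, n_clusters, n_neighbors=15, n_components=5, seed=0):
    """Hypothetical helper: modified LLE -> ICA -> k-means on a cells-by-genes matrix.

    All defaults are illustrative assumptions, not values from the dissertation.
    """
    # Log-transform raw counts to dampen the effect of a few highly expressed genes.
    logged = np.log1p(expr)

    # Non-linear dimensionality reduction with modified LLE.
    # scikit-learn requires n_neighbors >= n_components for method="modified".
    mlle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        method="modified",
        random_state=seed,
    )
    embedded = mlle.fit_transform(logged)

    # Independent component analysis on the low-dimensional embedding.
    ica = FastICA(n_components=n_components, whiten="unit-variance", random_state=seed)
    sources = ica.fit_transform(embedded)

    # Assumed clustering step: k-means on the independent components.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(sources)


# Synthetic stand-in for an scRNA-seq count matrix: three cell types,
# each with its own block of 20 highly expressed genes and low counts elsewhere.
rng = np.random.default_rng(0)
n_per_type, n_genes = 30, 60
rates = np.full((3, n_genes), 0.2)
for t in range(3):
    rates[t, t * 20:(t + 1) * 20] = 5.0
expr = np.vstack([rng.poisson(rates[t], size=(n_per_type, n_genes)) for t in range(3)])

labels = cluster_cells(expr, n_clusters=3)
print(labels)
```

On real data, the number of clusters, the neighborhood size, and the embedding dimension would need to be chosen per dataset, and the cluster labels would then be compared against known cell type annotations.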
