Date of Award
6-18-2021
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Cell types, Clustering, Dimensionality Reduction, Gene enrichment analysis, Single-cell, Unsupervised Learning
Supervisor
Luis Rueda
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
Identifying specific cell types is a significant step for studying diseases and potentially leading to better diagnosis, drug discovery, and prognosis. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering, which are categorized in the form of unsupervised learning methods, are the most suitable approach in scRNA-seq data analysis when the cell types have not been characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, due to the sparsity and high-dimensional nature of scRNA-seq data, classical clustering methods are not efficient. Therefore, the use of non-linear dimensionality reduction techniques to improve clustering results is crucial. We introduce a pipeline to identify representative clusters of different cell types by combining non-linear dimensionality reduction techniques such as modified locally linear embedding (MLLE) and clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with the clustering of thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We evaluate the intra- and inter-cluster performance based on the Silhouette score before performing a biological assessment. We further performed gene enrichment analysis across biological databases to evaluate the proposed method's performance. As such, our results show that MLLE combined with independent component analysis yields overall the best performance relative to the existing unsupervised methods across different experiments.
Recommended Citation
Danda, Saiteja, "Identification of Cell Types in scRNA-seq Data via Enhanced Local Embedding and Clustering" (2021). Electronic Theses and Dissertations. 8592.
https://scholar.uwindsor.ca/etd/8592