Date of Award

6-18-2021

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Cell types, Clustering, Dimensionality Reduction, Gene enrichment analysis, Single-cell, Unsupervised Learning

Supervisor

Luis Rueda

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

Identifying specific cell types is a significant step in studying diseases and can potentially lead to better diagnosis, drug discovery, and prognosis. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering, a form of unsupervised learning, are the most suitable approach to scRNA-seq data analysis when the cell types have not been characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, because scRNA-seq data are sparse and high-dimensional, classical clustering methods are not efficient. Non-linear dimensionality reduction techniques are therefore crucial for improving clustering results. We introduce a pipeline that identifies representative clusters of different cell types by combining non-linear dimensionality reduction techniques, such as modified locally linear embedding (MLLE), with clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with clustering on thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We evaluate intra- and inter-cluster performance based on the Silhouette score before performing a biological assessment. We further perform gene enrichment analysis across biological databases to evaluate the proposed method's performance. Our results show that MLLE combined with independent component analysis yields the best overall performance relative to existing unsupervised methods across different experiments.
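
As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch chains MLLE, independent component analysis, a clustering step, and the Silhouette score using scikit-learn. It is not the thesis implementation. Several choices here are assumptions, since the abstract does not specify them: the order of MLLE and ICA, the use of k-means as the clustering algorithm, all parameter values, the computation of the Silhouette score in the embedded space, and the synthetic data standing in for an scRNA-seq expression matrix. The function name `embed_and_cluster` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import FastICA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.metrics import silhouette_score


def embed_and_cluster(X, n_clusters, n_components=3, n_neighbors=15, seed=0):
    """Hypothetical helper: MLLE -> ICA -> k-means, scored by Silhouette.

    X is a (cells x genes) matrix. All defaults are illustrative assumptions.
    """
    # Modified locally linear embedding (non-linear dimensionality reduction).
    # The dense eigen-solver is chosen here for numerical robustness on small inputs.
    mlle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        method="modified",
        eigen_solver="dense",
        random_state=seed,
    )
    Z = mlle.fit_transform(X)

    # Independent component analysis applied to the MLLE embedding
    # (assumed ordering; the abstract only says the two are combined).
    ica = FastICA(
        n_components=n_components,
        whiten="unit-variance",
        max_iter=1000,
        random_state=seed,
    )
    Z = ica.fit_transform(Z)

    # k-means is an assumed stand-in for "clustering algorithms".
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)

    # Silhouette score computed on the reduced representation (assumption).
    return labels, silhouette_score(Z, labels)


if __name__ == "__main__":
    # Synthetic blobs as a placeholder for a real expression matrix.
    X, _ = make_blobs(n_samples=150, centers=3, n_features=20, random_state=0)
    labels, score = embed_and_cluster(X, n_clusters=3)
    print("cluster sizes:", np.bincount(labels))
    print("Silhouette score:", round(float(score), 3))
```

On real data, the number of clusters, neighbors, and components would need to be tuned per dataset, and the expression matrix would typically be normalized before embedding.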

Recommended Citation

Danda, Saiteja, "Identification of Cell Types in scRNA-seq Data via Enhanced Local Embedding and Clustering" (2021). Electronic Theses and Dissertations. 8592.
https://scholar.uwindsor.ca/etd/8592
