Date of Award

2022

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Cell-cell communication, Graph convolutional neural network, Latent feature approaches, Link prediction, Single-cell RNA-seq, Subgraph embedding

Supervisor

L. Rueda

Supervisor

N. Zhang

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Graph-structured data has become increasingly common in a variety of fields, from biological networks to social networks, and link prediction is one of the key problems in graph theory. In biology, cell-cell communication regulates individual cell activities and is a crucial part of tissue structure and function, and recent advances in single-cell RNA sequencing technologies have made routine analyses of intercellular signaling networks easier. Previous studies have proposed various link prediction approaches. These approaches make certain assumptions about when nodes are likely to interact, and thus show high performance mainly on specific networks. Subgraph-based methods address this problem and have outperformed other approaches by extracting local subgraphs from a given network.

In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent as well as explicit attributes of undirected, attributed graphs constructed from the gene expression profiles of individual cells. Because single-cell RNA-seq data are high-dimensional and sparse, converting them to a graph representation is a daunting task. We overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell similarity matrix is learned from single-cell gene expression data. The cell-cell communication network is then built using this similarity matrix.
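
The graph-construction and subgraph-extraction steps described above can be illustrated with a minimal sketch. It is not the SEGCECO implementation: plain cosine similarity stands in for the similarity matrix that SoptSC learns, a k-nearest-neighbour rule stands in for the network construction, and the function names (`cosine_similarity_matrix`, `knn_graph`, `enclosing_subgraph`) and the parameters `k` and `hops` are hypothetical, chosen only for illustration.

```python
from collections import deque

import numpy as np


def cosine_similarity_matrix(expr):
    """Cell-cell cosine similarity from a (cells x genes) expression matrix.

    Assumption: a simple stand-in for the similarity matrix learned by SoptSC.
    """
    expr = np.asarray(expr, dtype=float)
    norms = np.linalg.norm(expr, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero cells
    unit = expr / norms
    return unit @ unit.T


def knn_graph(sim, k):
    """Undirected graph linking each cell to its k most similar cells."""
    n = sim.shape[0]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        scores = sim[i].copy()
        scores[i] = -np.inf  # exclude self-similarity
        for j in np.argsort(scores)[-k:]:
            adj[i].add(int(j))
            adj[int(j)].add(i)  # keep the graph undirected
    return adj


def enclosing_subgraph(adj, u, v, hops):
    """Nodes within `hops` steps of either endpoint of a candidate link (u, v)."""
    dist = {u: 0, v: 0}
    queue = deque([u, v])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the hop limit
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)
```

In a subgraph-based link predictor, the node set returned for each candidate pair would then be encoded and passed to a classifier that scores whether the two cells communicate. That encoding and the graph convolutional model are outside the scope of this sketch.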

To evaluate our proposed method, we performed experiments on six scRNA-seq datasets from human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches, as well as the state-of-the-art link prediction method, WLNM, achieving an area under the ROC curve of 0.99 and 99% prediction accuracy.
