"Privacy-preserving techniques in modern machine learning models" by Ali Abbasi Tadi

Date of Award

2-19-2025

Publication Type

Doctoral Thesis

Degree Name

Ph.D.

Department

Computer Science

Keywords

AI security, Machine learning security, Privacy-preserving machine learning

Supervisor

Dima Alhadidi

Rights

info:eu-repo/semantics/embargoedAccess

Abstract

Modern data analysis and machine learning applications face the dual challenge of handling high-dimensional data while ensuring data privacy. These challenges are compounded by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA), which protects health information, and the European General Data Protection Regulation (GDPR), which protects all sensitive personal data. This thesis integrates advanced clustering techniques, secure computing frameworks, and federated learning optimizations into a unified progression of methodologies for scalable and privacy-preserving machine learning. The contributions span diverse domains, offering practical solutions for social network analysis, single-cell genomics, and natural language processing (NLP) applications while laying the groundwork for future innovations in secure and efficient data analysis. The thesis presents three frameworks, NICASN, PPPCT, and Trustformer, that address these challenges in social networks, genomics, and federated learning, respectively.

Nonnegative matrix factorization and independent component analysis for clustering social networks (NICASN) introduces a hybrid clustering approach that combines nonnegative matrix factorization (NMF), independent component analysis (ICA), and $k$-means with advanced centroid initialization. NICASN effectively detects communities in large-scale social networks by reducing dimensionality and improving clustering quality while maintaining computational efficiency.

The success of NICASN in handling high-dimensional social network data inspired its adaptation to biological datasets, leading to the development of privacy-preserving parallel clustering for transcriptomics data (PPPCT). PPPCT addresses the unique challenges of single-cell RNA sequencing (scRNA-seq) data: high dimensionality, sparsity, and stringent privacy requirements. By incorporating NMF for dimensionality reduction, parallel $k$-means clustering, and Intel Software Guard Extensions (SGX) for secure computation, PPPCT ensures privacy while achieving state-of-the-art clustering quality and scalability for sensitive genomic datasets.

PPPCT in turn paved the way for even more complex privacy-preserving machine learning tasks, culminating in Trustformer. Trustformer builds on the foundations of PPPCT, applying its clustering and privacy-preserving strategies to the challenge of training deep learning models on decentralized sensitive data. Trustformer is a novel federated learning (FL) framework designed to train large Transformer models. Unlike traditional FL methods that transmit entire model weights, Trustformer incorporates $k$-means clustering into the FL aggregation mechanism, which significantly reduces communication overhead. Trustformer applies $k$-means clustering to each layer of a deep neural network to find per-layer centroids, and the central server then aggregates these centroids instead of the full model parameters. This reduces communication overhead and enhances privacy.
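To illustrate the kind of hybrid pipeline the abstract describes for NICASN, the following minimal Python sketch chains NMF, ICA, and $k$-means over a nonnegative adjacency matrix. The exact ordering of the components, the similarity construction, and the thesis's advanced centroid initialization are not given in the abstract; scikit-learn's k-means++ seeding stands in for the latter, and all parameter values are illustrative assumptions.

# Hypothetical NICASN-style pipeline: NMF -> ICA -> k-means (not the thesis's
# actual implementation; composition and parameters are assumptions).
import numpy as np
from sklearn.decomposition import NMF, FastICA
from sklearn.cluster import KMeans

def nicasn_like_clustering(adjacency, n_communities=5,
                           nmf_rank=64, ica_components=16, seed=0):
    # NMF factorizes the nonnegative adjacency matrix to reduce dimensionality.
    W = NMF(n_components=nmf_rank, init="nndsvda", random_state=seed,
            max_iter=400).fit_transform(adjacency)

    # ICA extracts statistically independent latent features from the factors.
    S = FastICA(n_components=ica_components, random_state=seed,
                max_iter=1000).fit_transform(W)

    # k-means with k-means++ seeding assigns nodes to communities.
    km = KMeans(n_clusters=n_communities, init="k-means++", n_init=10,
                random_state=seed).fit(S)
    return km.labels_

# Example on a random undirected graph's adjacency matrix.
rng = np.random.default_rng(0)
A = (rng.random((300, 300)) < 0.05).astype(float)
A = np.maximum(A, A.T)          # symmetrize
labels = nicasn_like_clustering(A)

The design intent mirrored here is that NMF and ICA shrink the node representation before clustering, so $k$-means operates on a compact, denoised feature space rather than on the raw high-dimensional graph.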
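For PPPCT, the abstract names NMF-based dimensionality reduction and parallel $k$-means; the sketch below shows one generic way such a pipeline can be parallelized over shards of cells (local clustering followed by re-clustering of the pooled local centroids). PPPCT's actual partitioning strategy, aggregation rule, and SGX enclave integration are not described in the abstract and are omitted here; all names and parameters are illustrative assumptions.

# Hypothetical two-stage parallel k-means over an scRNA-seq-like matrix
# (a sketch, not PPPCT's implementation; SGX-protected execution is out of scope).
import numpy as np
from joblib import Parallel, delayed
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

def local_kmeans(shard, k, seed):
    """Cluster one shard of cells and return its local centroids."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(shard)
    return km.cluster_centers_

def parallel_cluster(expr_matrix, n_cell_types=8, n_factors=30,
                     n_shards=4, seed=0):
    # NMF reduces the sparse, nonnegative cell-by-gene matrix to a
    # low-dimensional factor representation.
    factors = NMF(n_components=n_factors, init="nndsvd",
                  random_state=seed, max_iter=500).fit_transform(expr_matrix)

    # Stage 1: independent k-means runs on disjoint shards of cells, in parallel.
    shards = np.array_split(factors, n_shards)
    local_centroids = Parallel(n_jobs=n_shards)(
        delayed(local_kmeans)(s, n_cell_types, seed + i)
        for i, s in enumerate(shards))

    # Stage 2: cluster the pooled local centroids into global centroids,
    # then assign every cell to its nearest global centroid.
    pooled = np.vstack(local_centroids)
    global_km = KMeans(n_clusters=n_cell_types, n_init=10,
                       random_state=seed).fit(pooled)
    return global_km.predict(factors), global_km.cluster_centers_

# Example with synthetic nonnegative "expression" counts.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 2000)).astype(float)
labels, centroids = parallel_cluster(X)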
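Finally, the Trustformer description (per-layer $k$-means, with the server aggregating centroids rather than full weights) can be sketched as below. The abstract does not specify the clustering granularity, how clusters are aligned across clients, or how weights are reconstructed from centroids; this sketch assumes each layer's weights are clustered as scalar values and that the server averages the clients' sorted centroid vectors, purely for illustration.

# Hypothetical per-layer centroid compression for federated aggregation
# (a sketch in the spirit of Trustformer, not the thesis's actual protocol).
import numpy as np
from sklearn.cluster import KMeans

def compress_layer(weights, k=16, seed=0):
    """Cluster one layer's weights and return the k centroid values."""
    flat = weights.reshape(-1, 1)                 # treat scalars as 1-D points
    km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(flat)
    return np.sort(km.cluster_centers_.ravel())   # k values instead of |W| values

def client_update(model_layers, k=16):
    # Each client ships k centroids per layer instead of the full weights.
    return [compress_layer(w, k) for w in model_layers]

def server_aggregate(client_payloads):
    # The server averages the sorted centroid vectors layer by layer
    # (sorting is an assumed alignment heuristic across clients).
    n_layers = len(client_payloads[0])
    return [np.mean([c[i] for c in client_payloads], axis=0)
            for i in range(n_layers)]

# Example: three clients, two toy "Transformer" layers each.
rng = np.random.default_rng(0)
make_layers = lambda: [rng.normal(size=(64, 64)), rng.normal(size=(256, 64))]
payloads = [client_update(make_layers()) for _ in range(3)]
aggregated = server_aggregate(payloads)   # k centroids per layer, per round

The communication saving follows directly: a client transmits $k$ values per layer rather than the layer's full parameter count, and the server never sees the raw weights themselves.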

Available for download on Wednesday, February 18, 2026
