Utilizing Deep Learning for Enhancing Performance on Encrypted Stock Market Data

Date of Award


Publication Type


Degree Name



Computer Science

First Advisor


Second Advisor


Third Advisor



Encryption, Finance, Machine learning, Neural networks, Unsupervised learning



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Hedge funds seek higher returns on investment in the stock market. Typically, they purchase high-quality data at a high price and employ hundreds of people working in isolated teams, and this secrecy leads to duplicated research effort. The data cannot be shared publicly without risking the fund's competitive edge. Numerai is a hedge fund that has developed a method for encrypting high-quality stock market data without compromising its predictive power, which allows the data to be shared publicly in a weekly data science tournament. Anyone can access the data, generate predictions, and submit them for consideration. Submitted predictions can influence Numerai's allocation of capital in the global stock market. The provided time-series data is cleaned and regularized. It comprises millions of samples and 1191 features, a set that has evolved since its inception. The task in the tournament is to predict the probability that a given sample yields positive returns. The non-stationary nature of the features presents a significant challenge to participants, who also aim to keep their predictions stable over time. Treating the task as a supervised regression problem, we focused on improving the Sharpe ratio of correlation scores over time. Tree-based models have proven effective on such tasks, while neural networks have shown their potential mainly in computer vision and natural language processing. In this thesis, we investigated incorporating deep learning methods, through both unsupervised and supervised learning, to improve results. Our findings indicated an improvement in Sharpe ratio over the provided baseline model. We also examined the potential for generating synthetic data using CTGAN.
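The objective named in the abstract, the Sharpe ratio of correlation scores over time, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: samples are grouped into time periods (called "eras" here), the per-period score is the Spearman rank correlation between predictions and targets, and the Sharpe ratio is the mean of those scores divided by their sample standard deviation. All function names and the toy data are hypothetical, and the thesis may define the metric differently.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def per_era_correlation(rows):
    """rows: iterable of (era, prediction, target) tuples.

    Returns {era: Spearman correlation}, i.e. Pearson correlation of ranks.
    """
    by_era = defaultdict(lambda: ([], []))
    for era, pred, target in rows:
        by_era[era][0].append(pred)
        by_era[era][1].append(target)
    return {
        era: pearson(average_ranks(preds), average_ranks(targets))
        for era, (preds, targets) in sorted(by_era.items())
    }


def sharpe_ratio(scores):
    """Mean score divided by sample standard deviation (needs >= 2 scores)."""
    return mean(scores) / stdev(scores)


if __name__ == "__main__":
    # Made-up toy data: three eras of four samples each.
    targets = [0.0, 0.25, 0.5, 1.0]
    toy_rows = (
        [("era1", p, t) for p, t in zip([0.1, 0.2, 0.3, 0.4], targets)]
        + [("era2", p, t) for p, t in zip([0.4, 0.3, 0.2, 0.1], targets)]
        + [("era3", p, t) for p, t in zip([0.2, 0.5, 0.6, 0.9], targets)]
    )
    era_scores = per_era_correlation(toy_rows)
    print(era_scores)
    print(sharpe_ratio(list(era_scores.values())))
```

Under this definition, a model with a slightly lower average correlation but much steadier per-era scores can achieve a higher Sharpe ratio than a model with a higher but erratic average, which is why the metric rewards stable predictions over time.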