Date of Award

Summer 2021

Publication Type


Degree Name



Computer Science

First Advisor


Second Advisor

D. Wu

Third Advisor

I. Ahmad


Keywords

Datasets, Simulated data, Reality gap, Environment, Setting environmental properties, System structure, Information retrieval, Ground-truth labels



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Rapid advancements in object recognition have created a huge demand for labeled datasets for training, testing, and validating different techniques. Because of the wide range of applications, the object models in these datasets need to cover both variations in geometric features and the diverse conditions under which sensory inputs are obtained. Manually labeling the object models is also cumbersome. As a result, it is difficult for researchers to gain access to adequate datasets for developing new methods or algorithms. Computer simulation, by comparison, has been considered a cost-effective way to generate simulated data for the training, testing, and validation of object recognition techniques. However, its effectiveness has been a major concern because of a problem commonly known as the reality gap, which refers to the differences between real and simulated images. Aiming to bridge the reality gap, this study incorporates the influential factors that cause the problem and proposes adjusting the simulation settings to imitate not only the objects but also the environment, so that the simulation matches the real-world scenario. In addition, it includes a system structure that retrieves information from the real world and incorporates this information into the settings of environmental properties in the simulation. The study covers a total of 14 experiments that use different influential factors to generate simulated data and assess the reality gap against real-world counterpart images. The proposed approach enables the rendering of realistic data with ground-truth labels, making simulated datasets a cost-effective and efficient alternative.