Intelligent prefetching and caching for scientific data mining in the middleware GAMine.

Publication Type

Master Thesis

Degree Name

M.Sc.

Computer Science

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Scientific data mining applications are widespread across scientific fields. They involve huge data sets and complex algorithms, and they are often deployed on high-performance parallel platforms. In particular, ever-larger data sets make data access the most time-consuming stage of overall execution. Caching and prefetching can make data access more efficient and thereby improve application performance. However, the caching and prefetching strategies of traditional operating-system file systems, as well as other enhanced approaches, ignore the application's runtime behavior. As a result, not all data retrieval latency can be hidden, or cache units must be made larger when data access is remote. The first step of our approach is to build a middleware, GAMine, which is independent of data sets and applications and provides a generic data access optimization strategy for scientific data mining applications. It supports both client/server and peer-to-peer architectures and has a flexible, symmetric design. Second, within GAMine, the prefetching strategy exploits knowledge of access patterns and system parameters (latency and throughput) to set the preferred prefetch depth. In addition, GAMine can be configured to select different caching policies according to the access pattern and the architecture. As a result, the middleware can hide more latency and avoid cache pollution. Finally, GAMine monitors the data consumption rate and the data delivery rate and adjusts the prefetch depth dynamically to the value that best balances latency hiding against cache size. Thus, even when conditions change at run time, latency can still be hidden because the middleware adapts.

Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H8.

Source: Masters Abstracts International, Volume: 44-01, page: 0393.

Thesis (M.Sc.)--University of Windsor (Canada), 2005.
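The abstract says GAMine derives the prefetch depth from the access pattern together with latency and throughput, but it does not give a formula. The following is a minimal sketch of one common way to do this for a sequential access pattern. It assumes fixed-size blocks and a known per-block processing time. The function `prefetch_depth` and all of its parameters are hypothetical and are not GAMine's API. The idea is that a block requested now arrives after the network latency plus its transfer time, so enough blocks must already be in flight to cover that much computation.

```python
import math


def prefetch_depth(latency_s, throughput_bps, block_bytes, consume_s, max_depth=64):
    """Hypothetical helper: number of blocks to keep in flight so that each
    block arrives before the application needs it (sequential access assumed)."""
    transfer_s = block_bytes / throughput_bps   # time on the wire for one block
    if transfer_s > consume_s:
        # Bandwidth-bound: the link delivers blocks more slowly than they are
        # consumed, so no depth hides all latency; queue the maximum allowed
        # number of requests to at least keep the link busy.
        return max_depth
    fetch_s = latency_s + transfer_s            # request-to-arrival time per block
    depth = math.ceil(fetch_s / consume_s)      # blocks consumed while one is fetched
    return max(1, min(depth, max_depth))


if __name__ == "__main__":
    # 0.2 s latency, 100 MB/s, 1 MB blocks, 0.05 s of processing per block
    print(prefetch_depth(0.2, 100e6, 1e6, 0.05))  # 5
```

In the example, a fetch takes about 0.21 s while each block is processed in 0.05 s, so five blocks must be outstanding. The cache then has to hold roughly depth × block size bytes, which is the latency-hiding versus cache-size trade-off the abstract refers to.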
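For the dynamic case, one plausible controller is sketched below under stated assumptions. It is not taken from the thesis. The controller smooths the measured consumption and delivery rates and applies Little's law: the number of requests that must be in flight equals the consumption rate times the observed latency. The class `AdaptivePrefetchDepth`, its smoothing factor, and the rule that depth does not grow while delivery lags consumption are all hypothetical choices.

```python
import math


class AdaptivePrefetchDepth:
    """Hypothetical controller that re-derives the prefetch depth from
    smoothed consumption and delivery rates (blocks per second)."""

    def __init__(self, alpha=0.5, min_depth=1, max_depth=64):
        self.alpha = alpha            # EWMA weight given to the newest sample
        self.min_depth = min_depth
        self.max_depth = max_depth    # upper bound on prefetched blocks (bounds cache use)
        self.consume_rate = None
        self.deliver_rate = None
        self.depth = min_depth

    def _smooth(self, old, new):
        # Exponentially weighted moving average; the first sample is taken as-is.
        return new if old is None else self.alpha * new + (1 - self.alpha) * old

    def observe(self, consumed, delivered, interval_s, latency_s):
        """Update with one monitoring interval and return the new depth.

        consumed / delivered: blocks used by the application / received from
        the data source during the interval; latency_s: measured time from
        issuing a request until the block arrives.
        """
        self.consume_rate = self._smooth(self.consume_rate, consumed / interval_s)
        self.deliver_rate = self._smooth(self.deliver_rate, delivered / interval_s)
        # Little's law: requests in flight = consumption rate * latency.
        target = math.ceil(self.consume_rate * latency_s)
        if self.deliver_rate < self.consume_rate and target > self.depth:
            # The source cannot keep up; a deeper queue would only occupy
            # cache space without hiding more latency, so do not grow.
            target = self.depth
        self.depth = max(self.min_depth, min(target, self.max_depth))
        return self.depth
```

The controller can still shrink the depth when the application slows down. This releases cache space, which ties the rule to the cache-pollution concern mentioned in the abstract.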