Title

Reduction of collisions in Bloom filters during distributed query optimization.

Date of Award

1999

Degree Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

First Advisor

Morrissey, Joan

Keywords

Computer Science.

Rights

CC BY-NC-ND 4.0

Abstract

The goal of distributed query optimization is to find the optimal strategy for executing a given query. Approaches to distributed query processing have mainly focused on the use of joins, semijoins, and filters. Semijoins have the advantage over joins that they never increase data sizes. However, a semijoin requires more local processing, such as projection, and higher data transmission. To improve distributed query processing, the filter-based approach is used. One limitation of this approach is collisions. We investigate how collisions affect the performance of the algorithm and how performance can be improved in their presence. Our proposed algorithm uses two sets of filters to reduce collisions, which improves performance when collisions exist. The algorithm is evaluated objectively by comparison with a full reducer, an algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples from them. The results of the evaluation show that: (1) With a perfect hash function, on average, our algorithm eliminates 97.41% of the unneeded data and fully reduces the relations of over 70% of the queries. (2) Using a single set of filters with specific percentages of collisions, on average, fewer than half of the queries are fully reduced by the algorithm; collisions therefore substantially affect performance. (3) Using two sets of filters, on average, our algorithm eliminates 95% of noncontributive tuples and achieves over 60% full reduction. In conclusion, our improved algorithm uses the two sets of filters to substantially reduce the effects of collisions. We therefore improve the performance of our algorithm under the assumption of collisions, which are the major problem in using Bloom filters during distributed query optimization.

Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1999 .L53. Source: Masters Abstracts International, Volume: 39-02, page: 0528. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1999.
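
The idea in the abstract of using two sets of filters to reduce collisions can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the single-hash bit-vector filter, the salted SHA-256 hash functions, the filter size, and every name below (`make_hash`, `build_filter`, `reduce_relation`, the sample relations) are assumptions made for illustration only. One site builds two bit vectors over its join-attribute values with two different hash functions; the other site keeps a tuple only if its value is set in both vectors, so a tuple that collides under one hash function can still be eliminated by the other.

```python
import hashlib

def make_hash(salt, num_bits):
    """Return a hash function mapping a value to a bit position (hypothetical helper)."""
    def h(value):
        digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % num_bits
    return h

def build_filter(values, hash_fn, num_bits):
    """Build a single-hash bit-vector filter over the join-attribute values."""
    bits = [False] * num_bits
    for v in values:
        bits[hash_fn(v)] = True
    return bits

def reduce_relation(rows, key, filters):
    """Keep only rows whose join value is set in every (bits, hash_fn) filter."""
    return [row for row in rows
            if all(bits[h(row[key])] for bits, h in filters)]

# Assumed toy data: join values at one site, tuples at another site.
NUM_BITS = 16                      # deliberately small so collisions are likely
r_values = [1, 5, 9, 12]
s_rows = [{"id": i, "a": i} for i in range(40)]

h1 = make_hash("first", NUM_BITS)
h2 = make_hash("second", NUM_BITS)
f1 = build_filter(r_values, h1, NUM_BITS)
f2 = build_filter(r_values, h2, NUM_BITS)

one_filter = reduce_relation(s_rows, "a", [(f1, h1)])
two_filters = reduce_relation(s_rows, "a", [(f1, h1), (f2, h2)])

# Tuples surviving each reduction that do not actually join (false positives).
extra_one = [r for r in one_filter if r["a"] not in r_values]
extra_two = [r for r in two_filters if r["a"] not in r_values]
print(len(extra_one), len(extra_two))
```

Because a tuple must pass both filters, the second reduction never keeps more non-joining tuples than the first, and neither reduction drops a tuple that genuinely joins. The cost is that twice as many filter bits are transmitted.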