Improving the efficiency of Bayesian Network Based EDAs and their application in Bioinformatics
Date of Award
CC BY-NC-ND 4.0
Estimation of distribution algorithms (EDAs) is a relatively new trend of stochastic optimizers which have received a lot of attention during last decade. In each generation, EDAs build probabilistic models of promising solutions of an optimization problem to guide the search process. New sets of solutions are obtained by sampling the corresponding probability distributions. Using this approach, EDAs are able to provide the user a set of models that reveals the dependencies between variables of the optimization problems while solving them. In order to solve a complex problem, it is necessary to use a probabilistic model which is able to capture the dependencies. Bayesian networks are usually used for modeling multiple dependencies between variables. Learning Bayesian networks, especially for large problems with high degree of dependencies among their variables is highly computationally expensive which makes it the bottleneck of EDAs. Therefore introducing efficient Bayesian learning algorithms in EDAs seems necessary in order to use them for large problems. In this dissertation, after comparing several Bayesian network learning algorithms, we propose an algorithm, called CMSS-BOA, which uses a recently introduced heuristic called max-min parent children (MMPC) in order to constrain the model search space. This algorithm does not consider a fixed and small upper bound on the order of interaction between variables and is able solve problems with large numbers of variables efficiently. We compare the efficiency of CMSS-BOA with the standard Bayesian network based EDA for solving several benchmark problems and finally we use it to build a predictor for predicting the glycation sites in mammalian proteins.
Salehi, Elham, "Improving the efficiency of Bayesian Network Based EDAs and their application in Bioinformatics" (2013). Electronic Theses and Dissertations. 4749.