ABSTRACT

Learning the structure of Bayesian networks (BNs) is known to be NP-complete, and most recent work in the field is based on heuristics. Many recent approaches trade correctness and exactness for faster computation, yet remain computationally infeasible for all but networks with few variables. In this paper we present a software/hardware co-design approach to learning Bayesian networks from experimental data that scales to very large networks. Our implementation improves the performance of algorithms traditionally developed for the von Neumann computing paradigm by more than four orders of magnitude. Through parallel implementation and exploitation of the reconfigurability of Field Programmable Gate Array (FPGA) systems, our design enables scientists to apply BN learning techniques to large problems, such as studies in molecular biology where the number of variables in the system overwhelms state-of-the-art software implementations. We describe how we combine Markov Chain Monte Carlo (MCMC) sampling with Bayesian network learning techniques, as well as supervised learning methods, in a parallel and scalable design. We also present how our design is mapped and customized to run on the Berkeley Emulation Engine 2 (BEE2) multi-FPGA system. Experimental results are presented on synthetic data sets generated from standard Bayesian networks as well as a real-life problem in the context of systems biology.
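The hardware design itself cannot be reproduced in a few lines, but the MCMC structure-search idea the abstract refers to can be sketched in software. The following is a minimal, illustrative sketch only: it assumes binary data, uses a smoothed log-likelihood with a BIC-style penalty as the score, and proposes single-edge toggles accepted via a Metropolis-Hastings rule. All function names and scoring choices here are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def is_acyclic(adj):
    """Kahn's algorithm: True iff the directed graph (adjacency matrix) has no cycle."""
    n = len(adj)
    indeg = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in range(n):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return seen == n


def family_score(data, child, parents):
    """Illustrative score for one node: add-one-smoothed log-likelihood of the
    child given its parents (binary variables), minus a BIC-style penalty."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [1.0, 1.0])  # add-one smoothing
        c[row[child]] += 1.0
    loglik = 0.0
    for row in data:
        c = counts[tuple(row[p] for p in parents)]
        loglik += math.log(c[row[child]] / (c[0] + c[1]))
    penalty = 0.5 * math.log(len(data)) * (2 ** len(parents))
    return loglik - penalty


def parents_of(adj, v):
    return [u for u in range(len(adj)) if adj[u][v]]


def mcmc_structure(data, n_vars, iters, rng):
    """Metropolis-Hastings over DAG structures: propose toggling one edge,
    reject cyclic proposals, accept with probability min(1, exp(score delta))."""
    adj = [[0] * n_vars for _ in range(n_vars)]
    for _ in range(iters):
        i, j = rng.sample(range(n_vars), 2)
        old = family_score(data, j, parents_of(adj, j))
        adj[i][j] ^= 1                       # propose adding/removing edge i -> j
        if not is_acyclic(adj):
            adj[i][j] ^= 1                   # undo: proposal would create a cycle
            continue
        new = family_score(data, j, parents_of(adj, j))
        if not (new >= old or rng.random() < math.exp(new - old)):
            adj[i][j] ^= 1                   # reject: revert the toggle
    return adj


# Usage on toy data from a known chain A -> B, with C independent noise.
rng = random.Random(0)
data = []
for _ in range(500):
    a = rng.random() < 0.5
    b = a if rng.random() < 0.9 else (not a)
    c = rng.random() < 0.5
    data.append([int(a), int(b), int(c)])
adj = mcmc_structure(data, 3, 2000, rng)
```

In this sketch only one node's family changes per proposal, so only one local score must be recomputed; it is this kind of independent, local scoring that a parallel FPGA design can exploit by evaluating many families concurrently.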