Efficient Mining of Discriminative Molecular Fragments

G. Di Fatta (Italy) and M.R. Berthold (Germany)

Keywords

Distributed computing, frequent subgraph mining, dynamic load balancing, biochemical databases.

Abstract

Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing is a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular, tree-structured computation. No estimate of task workloads is available, and the workloads follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
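The abstract does not spell out the load-balancing protocol. The following is only a minimal shared-memory sketch of the general idea, not the distributed, fault-tolerant scheme of the paper. It shows why dynamic balancing suits a tree-structured search whose subtree sizes cannot be estimated in advance: idle worker threads pull whichever node is pending from a shared queue, so no static partition of the tree is needed. The names `parallel_tree_search` and `subset_children` are hypothetical. The toy tree (subset enumeration over six items) is an assumed stand-in for a pattern-search tree, chosen because its subtrees are very unbalanced.

```python
import queue
import threading
from typing import Callable, Hashable, Iterable, List, Optional


def parallel_tree_search(
    root: Hashable,
    expand: Callable[[Hashable], Iterable[Hashable]],
    num_workers: int = 4,
) -> List[Hashable]:
    """Visit every node of a search tree with a shared work queue (hypothetical sketch).

    Work is handed out at run time: whenever a worker is idle it takes the next
    pending node, so no estimate of subtree sizes is required.
    Nodes must not be None, because None is used as the stop sentinel.
    """
    tasks: "queue.Queue[Optional[Hashable]]" = queue.Queue()
    visited: List[Hashable] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            node = tasks.get()
            if node is None:  # sentinel: the search is finished
                tasks.task_done()
                return
            with lock:
                visited.append(node)
            # Children are enqueued before task_done(), so the queue's
            # unfinished-task count never reaches zero while work remains.
            for child in expand(node):
                tasks.put(child)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.put(root)
    tasks.join()  # blocks until every enqueued node has been expanded
    for _ in threads:
        tasks.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return visited


def subset_children(prefix: tuple, n_items: int = 6) -> List[tuple]:
    """Toy irregular tree (assumption): extend a sorted prefix with larger items."""
    start = prefix[-1] + 1 if prefix else 0
    return [prefix + (i,) for i in range(start, n_items)]


if __name__ == "__main__":
    nodes = parallel_tree_search((), subset_children, num_workers=4)
    print(len(nodes))  # every subset of 6 items is visited exactly once
```

In this toy tree the subtree rooted at `(0,)` holds 32 nodes while the one rooted at `(5,)` holds a single node. A static split of the root's children across workers would therefore be badly unbalanced, whereas the shared queue keeps all workers busy.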
