D. Cho and D. Kim (Korea)
parallel data mining, frequent pattern tree, load balancing
Data mining is an effective method for discovering useful information, such as association rules and previously unknown patterns, in large databases. We have developed a distributed frequent-pattern mining algorithm that uses distributed FP (frequent pattern) trees on a networked computing cluster. The algorithm parallelizes the FP-growth algorithm: it generates local FP trees independently and partitions the conditional database across all processors so that each has an equal mining load. Performance is improved by avoiding the construction and broadcast of a global FP tree and by using the available computational power efficiently through even workload distribution. Experiments on a Linux cluster show that the algorithm improves on the count distribution algorithm, one of the best-known parallel algorithms for association mining.
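The two ideas named in the abstract, building a local FP tree per processor and spreading the conditional mining work evenly, can be illustrated with a minimal single-process sketch. It is not the authors' implementation. The names `FPNode`, `build_local_fp_tree`, `estimate_item_load`, and `balance_items` are hypothetical, the load estimate (count-weighted prefix-path length per item) is an assumption, and the greedy largest-first assignment stands in for whatever partitioning rule the paper actually uses.

```python
import heapq
from collections import Counter


class FPNode:
    """One node of an FP tree: an item, its count, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_local_fp_tree(transactions, global_order):
    """Build an FP tree from one processor's share of the transactions.

    global_order maps each frequent item to its rank (0 = most frequent);
    items missing from it are treated as infrequent and dropped.
    """
    root = FPNode(None, None)
    for transaction in transactions:
        items = sorted(
            (i for i in set(transaction) if i in global_order),
            key=global_order.get,
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def estimate_item_load(root):
    """Assumed load metric: count-weighted prefix-path length per item."""
    load = Counter()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.item is not None:
            # A node at depth d has a prefix path of d - 1 items above it.
            load[node.item] += node.count * (depth - 1)
        for child in node.children.values():
            stack.append((child, depth + 1))
    return load


def balance_items(load, n_procs):
    """Greedy largest-first assignment of items to the least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for item, weight in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, proc = heapq.heappop(heap)
        assignment[proc].append(item)
        heapq.heappush(heap, (total + weight, proc))
    return assignment
```

On a real cluster, each processor would compute its own load estimates from its local tree and the estimates would be combined across processors before items are assigned. That communication step is omitted here.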