Hierarchical Document Categorization using Associative Networks

Niels Bloom, Mariët Theune, and Franciska de Jong

Keywords

Associative Networks, Automatic Categorization, Connectionist Language Model, Document Clustering

Abstract

Associative networks are a connectionist language model with the ability to handle dynamic data. We used two associative networks to categorize random sets of related Wikipedia articles with only their raw text as input. We then compared the resulting categorization to a gold standard: the manual categorization by Wikipedia authors and used a neural network as a baseline. We also determined a human rating by having a group of judges rank the four categorization methods by correctness and by usefulness with regards to finding information. Based on these tests, we determined that associative networks produce results that are clearly better than the neural network baseline, coming close to the gold standard in terms of usefulness and correctness. Furthermore, automated testing suggests these results continue to hold for large datasets.

Important Links:



Go Back