Feature Selection and Conversion Methods in KDD Cup 99 Dataset: A Comparison of Performance

V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, and E. Hernández-Pereira (Spain)

Keywords

Classification, Feature selection, Conversion methods, KDD Cup 99

Abstract

In this work, the KDD Cup 99 dataset, a benchmark dataset in the intrusion detection field, is used to perform a comparative study of Feature Selection (FS) methods, symbolic-numeric conversion methods and classifiers. FS may enhance the generalization capabilities of the classifiers by discarding the irrelevant features present in the KDD Cup 99 dataset. Among the different FS approaches, the large number of samples in the KDD Cup 99 dataset makes filters the most suitable choice. The size of the dataset also restricts the choice of classifiers to those able to handle it, in this case: C4.5, naive Bayes, One-Layer Feedforward Neural Network, Proximal Support Vector Machine and Multilayer Feedforward Neural Network. As some of these methods cannot be applied to symbolic features, four different symbolic-numeric conversion techniques are employed to convert them. The results of a broad study that includes two filters, four conversion methods and five classifiers, in addition to other techniques such as clustering and discretization, are then presented. The results obtained outperform those of the KDD Cup 99 contest winner while using only 15% of the original features, with the added advantages of simplicity and reduced time and memory requirements.
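Symbolic-numeric conversion is needed because several of the classifiers above only accept numeric inputs, while KDD Cup 99 contains symbolic features such as `protocol_type` (with values like `tcp`, `udp` and `icmp`). The abstract does not name the four conversion techniques used, so the sketch below only illustrates two common, generic options as an assumption: ordinal coding (one integer per category) and one-hot coding (one binary indicator column per category). The function names `ordinal_encode` and `one_hot_encode` are hypothetical.

```python
from typing import List, Sequence, Tuple


def ordinal_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Map each symbolic value to an integer code (hypothetical helper).

    Categories are sorted so the mapping is deterministic; note that this
    imposes an artificial order on the symbols.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Replace one symbolic feature with one 0/1 column per category
    (hypothetical helper)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # all indicators off
        row[index[v]] = 1            # switch on the column for this symbol
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    protocol_type = ["tcp", "udp", "icmp", "tcp"]  # toy sample, not real data
    print(ordinal_encode(protocol_type))  # (['icmp', 'tcp', 'udp'], [1, 2, 0, 1])
    print(one_hot_encode(protocol_type))
```

Ordinal coding keeps the feature count unchanged but implies an ordering between symbols, whereas one-hot coding avoids that ordering at the cost of adding one column per distinct value, which matters for a dataset the size of KDD Cup 99.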
