T.Z. Nkgau and G. Anderson (Botswana)
Operating System Traces, Data Mining
Recent developments in hardware have left software lagging behind, and scheduling is one of the affected areas. Improving the performance of scheduling algorithms requires a detailed analysis of scheduling traces, but capturing scheduling data produces large trace files. For example, a 20-minute benchmark run produces a 14 MB trace file. In this paper, we present a novel approach that uses the Business Intelligence data mining tools built into a commercial Database Management System to discover patterns in the trace. These patterns could be used to build tools that 1) compress huge trace files, 2) characterize traces for synthetic trace generation, so that huge trace files need not be stored, and 3) reveal more about operating system scheduling algorithms by analyzing relatively short sequences of trace records, again eliminating the need to store huge trace files. We use two well-known data mining algorithms: Sequence Clustering and Association Analysis. We conclude that the approach is very useful for producing compression tools, generating synthetic traces, and studying scheduling algorithms. For compression, Sequence Clustering performs better than Association Analysis.
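To illustrate why recurring patterns in a trace help with compression, the following is a minimal sketch and not the method used in the paper. The paper relies on the Sequence Clustering and Association Analysis algorithms of a commercial DBMS; the sketch swaps in a much simpler technique, counting frequent fixed-length event sequences and substituting the most frequent one with a single token. The event names (`wake`, `switch`, `run`, `sleep`, `idle`), the token `P0`, and both function names are hypothetical.

```python
from collections import Counter


def frequent_ngrams(events, n, min_count):
    """Count every window of n consecutive events and keep those
    that occur at least min_count times."""
    counts = Counter(
        tuple(events[i:i + n]) for i in range(len(events) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}


def substitute(events, pattern, token):
    """Replace non-overlapping occurrences of pattern, scanning left
    to right, with a single token."""
    out = []
    n = len(pattern)
    i = 0
    while i < len(events):
        if tuple(events[i:i + n]) == pattern:
            out.append(token)
            i += n
        else:
            out.append(events[i])
            i += 1
    return out


# Hypothetical trace: a repeating scheduling cycle followed by a short tail.
trace = ["wake", "switch", "run", "sleep"] * 5 + ["wake", "switch", "idle"]

patterns = frequent_ngrams(trace, n=4, min_count=3)
# On ties, max() returns the pattern that was seen first in the trace.
best = max(patterns, key=patterns.get)
compressed = substitute(trace, best, "P0")

print(len(trace), "records reduced to", len(compressed))
```

A real compressor would also have to store the pattern dictionary and handle per-record fields such as timestamps and process IDs, which this sketch ignores.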