S. Ge, J. Song, C. Lai, E. Li, W. Hu (PRC), and X. Tian (USA)
Thread-level parallelism, OpenMP, hyper-threading, machine learning, performance evaluation, optimization.
This paper analyzes two processors, a Pentium 4 with hyper-threading and a Pentium 4 with hyper-threading built on 90nm technology, using a machine learning workload parallelized with OpenMP* and the Intel compiler. The focus is on understanding the performance of the SNP workload and the underlying reasons for that performance. Particular attention is paid to micro-architecture metrics and their comparison, in order to evaluate, where appropriate, how the two types of processors perform relative to expectation on SNP machine learning workloads. Results include parallel speedup and a comparison of micro-architecture metrics.