Use of Intel® MKL in HPCC benchmark

HPCC Application Note

Step 1 – Overview

This guide is intended to help current HPCC users get better benchmark performance by utilizing Intel® Math Kernel Library (Intel® MKL).

HPCC stands for High Performance Computing Challenge benchmark. It is a suite of benchmarks that measure the performance of the CPU, memory subsystem, and interconnect. It consists of 7 benchmark tests: HPL (High Performance LINPACK), DGEMM (Double-precision GEneral Matrix-Matrix multiply), STREAM, PTRANS (Parallel TRANSpose), RandomAccess, FFT (Fast Fourier Transform), and communication bandwidth/latency.

More information on HPCC is available at: http://icl.cs.utk.edu/hpcc/*.

Version Information

This application note was created to help users who benchmark clusters with HPCC make use of the latest versions of Intel MKL on Linux platforms running on Xeon systems. Specifically, it addresses Intel MKL version 9.1.


Step 2 – Downloading HPCC Source Code

The HPCC source code can be downloaded from: http://icl.cs.utk.edu/hpcc/software/index.html*.

Prerequisites

  1. Intel MKL, which contains highly optimized FFT routines as well as wrappers for FFTW, can be obtained in either of the following ways:
    • Download a FREE evaluation version of the Intel MKL product.
    • Download the FREE non-commercial* version of the Intel MKL product.
    Both can be obtained at: http://www.intel.com/software/products/mkl.

  2. Intel MPI can be obtained from http://www.intel.com/software/products/cluster.

    Open source MPI (MPICH2) can be obtained from http://www-unix.mcs.anl.gov/mpi/mpich/*.

  3. Download the modified MPI FFTW wrapper interfaces (fftw2x_cdft.tar.gz), which use 64-bit long parameters for the MKL DFT.

    A few parameters in these wrappers have been changed from 32-bit to 64-bit. The standard MKL wrappers match the FFTW interfaces exactly, including some 32-bit parameters, so without this modification they will not work for the HPCC FFT component.


Step 3 - Configuration

Use the following commands to extract the HPCC sources from the downloaded hpcc-x.x.x.tar.gz file:

$gunzip hpcc-x.x.x.tar.gz
$tar -xvf hpcc-x.x.x.tar

This will create a directory hpcc-x.x.x.

Extract fftw2x_cdft.tar.gz in the same way:

$gunzip fftw2x_cdft.tar.gz
$tar -xvf fftw2x_cdft.tar

Make sure that the MPI, C++, and Fortran compilers are installed and available in PATH. Also add the library directories of your compilers (C++ and Fortran), MPI, and MKL to LD_LIBRARY_PATH.
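
The environment check described above can be sketched as a pair of small shell functions. This is only an illustration: the function names are hypothetical, and the tool names and library path in the usage note are assumptions that must be replaced with the ones from your own installation.

```shell
#!/bin/sh
# missing_tools TOOL...
# Print every listed tool that cannot be found through PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# prepend_ld_path DIR
# Put DIR at the front of LD_LIBRARY_PATH without leaving a stray colon
# when the variable is empty or unset.
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

For example, calling prepend_ld_path with the MKL library directory (such as <your mkl installation>/lib/em64t) and then running missing_tools mpicc mpirun icc ifort should print nothing if all four tools are reachable. The four tool names are assumed here; use the compiler and MPI wrapper names that match your setup.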


Step 4 – Building HPCC

  1. Build MPI MKL FFTW library.

    From the fftw2x_cdft directory, run the following command:

    $make libem64t mpi=intel3 comp=intel PRECISION=DOUBLE

    Here we are building for the EM64T architecture with Intel MPI version 3.0, the Intel compilers, and DOUBLE precision. This will create the MKL FFTW interface library libfftw2x_cdft_DOUBLE.a in the lib/em64t directory.

    Note: Run $make without arguments to see the available options.

  2. Build FFTW C wrapper library

    Change to the <your mkl installation>/interfaces/fftw2xc directory and run the following command:

    $make libem64t PRECISION=MKL_DOUBLE

    This will create the libfftw2xc_intel.a library in the <your mkl installation>/lib/em64t directory.

  3. Build HPCC

    Change directory to hpcc-x.x.x/hpl.

    Create a new make include file, e.g. Make.mkl, from an existing one. You can reuse one from the hpl/setup directory.

    Edit Make.mkl and modify the LAdir and LAlib lines as shown below so that they point to the MKL libraries. This example assumes that the double-precision MPI fftw2x_cdft wrapper library was built in the $HOME/fftw2x_cdft/lib/em64t directory and that MKL version 9.1.021 is installed.

    LAdir = /opt/intel/cmkl/9.1.021/lib/em64t

    LAlib = $(LAdir)/libmkl_em64t.a $(HOME)/fftw2x_cdft/lib/em64t/libfftw2x_cdft_DOUBLE.a $(LAdir)/libfftw2xc_intel.a $(LAdir)/libmkl_blacs.a $(LAdir)/libmkl_cdft.a $(LAdir)/libguide.a -lpthread -lm

    Build HPCC with the following command:

    $make all arch=mkl

    This will create an executable named hpcc in the hpcc-x.x.x directory, along with a file _hpccinf.txt, which is a template input file for hpcc. Rename the file to hpccinf.txt.
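
The Make.mkl edit in item 3 can also be scripted rather than done by hand. The following is a minimal sketch, assuming the make include file contains one "NAME = value" assignment per line; set_make_var is a hypothetical helper written for this note, not part of HPCC or HPL.

```shell
#!/bin/sh
# set_make_var FILE NAME VALUE
# Rewrite the "NAME = ..." assignment in a make include file in place.
# VALUE is spliced into a sed expression, so it must not contain
# the characters '|', '&' or '\'.
set_make_var() {
    file=$1
    name=$2
    value=$3
    sed "s|^${name}[[:space:]]*=.*|${name} = ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

A possible use, after copying a template from hpl/setup to Make.mkl, is set_make_var Make.mkl LAdir followed by the directory that holds your MKL libraries. Values containing make references such as $(LAdir) should be passed in single quotes so the shell does not try to expand them.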

Step 5 - Running HPCC

Modify the configuration parameters in the hpccinf.txt file.

Run hpcc by executing the following command:

$mpirun -np 4 hpcc

hpccinf.txt is the same as the standard HPL input file with a few additional lines. Please refer to our HPL application note for guidance on tuning the parameters in the configuration file.
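
After a run completes, individual results can be pulled out of the output file with a one-line awk filter. The sketch below rests on assumptions not stated in this note: that hpcc writes its results to a file named hpccoutf.txt, and that the summary section of that file consists of KEY=value lines with keys such as HPL_Tflops. Check your own output file before relying on it. hpcc_summary_value is a hypothetical helper name.

```shell
#!/bin/sh
# hpcc_summary_value FILE KEY
# Print the value of the first "KEY=value" line in FILE.
# Prints nothing if KEY does not occur.
hpcc_summary_value() {
    awk -F= -v key="$2" '$1 == key { print $2; exit }' "$1"
}
```

For example, hpcc_summary_value hpccoutf.txt HPL_Tflops would print the HPL result if the file uses the assumed layout.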


Appendix A - Performance Results

Below are the HPCC benchmark results for the Intel Atlantis cluster, which can also be found on the HPCC website*.


HPC Challenge Benchmark Record

System Information
Affiliation: Intel Corporation
URL: http://www.intel.com/
Location: USA, Washington, DuPont
System Use: Vendor
System Manufacturer: Intel
System Name: Intel Atlantis cluster
Interconnect Manufacturer: Mellanox
Interconnect Type: Infiniband
Operating System: RedHat EL4 Update 4
MPI: Intel MPI 3.1 beta
MPI Wtick: 1e-06
BLAS: Intel Cluster MKL 9.1.023
Language: C
Compiler: Intel C/C++ Compiler 10.0.023
Compiler Flags: -O2 -xT -ansi-alias -ip
Processor Type: Intel Xeon 5355
Processor Speed: 2.66 GHz
Total Processors: 512
Processors Entered: 512
Processors Determined: 512
Cores Per Chip: 4
HPL Processes: 512
MPI Processes: 512
Threads Entered: 1
Threads Determined: 1
XXFLOPs Per Cycle:
Theoretical Peak: 5.44768 TFlop/s
Total Memory: 1024 GiB
FFT Library: Intel Cluster MKL 9.1.023

HPL
HPL: 4.25904 Tflop/s
HPL time: 5129.21
HPL eps: 2.22045e-16
HPL Rnorm1: 2.54184e-08
HPL Anorm1: 80376.3
HPL AnormI: 81257.5
HPL Xnorm1: 322111
HPL XnormI: 5.78706
HPL N: 320000
HPL NB: 168
HPL NProw: 16
HPL NPcol: 32
HPL depth: 0
HPL NBdiv: 2
HPL NBmin: 4
HPL CPfact: R
HPL CRfact: C
HPL CPtop: 1
HPL order: R
HPL dMach EPS: 2.220446e-16
HPL sMach EPS: 1.192093e-07
HPL dMach sfMin: 0
HPL sMach sfMin: 1.175494e-38
HPL dMach Base: 2
HPL sMach Base: 2
HPL dMach Prec: 4.440892e-16
HPL sMach Prec: 2.384186e-07
HPL dMach mLen: 53
HPL sMach mLen: 24
HPL dMach Rnd: 0
HPL sMach Rnd: 0
HPL dMach eMin: -1021
HPL sMach eMin: -125
HPL dMach rMin: 0
HPL sMach rMin: 1.175494e-38
HPL dMach eMax: 1025
HPL sMach eMax: 129
HPL dMach rMax: 0
HPL sMach rMax: 0
dweps: 1.110223e-16
sweps: 5.960464e-08

PTRANS
PTRANS: 32.0329 GB/s
PTRANS time: 6.1632 seconds
PTRANS residual: 0
PTRANS N: 160000
PTRANS NB:
PTRANS NProw: 16
PTRANS NPcol: 32

STREAM
S-STREAM Copy: 3.87384 GB/s
S-STREAM Scale: 3.89985 GB/s
S-STREAM Add: 3.82254 GB/s
S-STREAM Triad: 3.82804 GB/s
EP-STREAM Copy: 0.740711 GB/s
EP-STREAM Scale: 0.736493 GB/s
EP-STREAM Add: 0.74627 GB/s
EP-STREAM Triad: 0.747415 GB/s
STREAM Vector Size: 66666666
STREAM Threads: 1

RandomAccess
S-RandomAccess: 0.0149887 Gup/s
EP-RandomAccess: 0.00531302 Gup/s
G-RandomAccess: 0.939308 Gup/s
G-RandomAccess N: 68719476736
G-RandomAccess time: 292.639 seconds
G-RandomAccess Check Time: 291.03 seconds
G-RandomAccess Errors: 58355
G-RandomAccess Errors Fraction: 8.49177e-07
G-RandomAccess TimeBound: -1
G-RandomAccess ExeUpdates: 274877906944
RandomAccess N: 134217728

FFT
S-FFT: 1.28699 GFlop/s
EP-FFT: 0.447491 GFlop/s
MPIFFT: 69.9066 GFlop/s
MPIFFT N: 8589934592
MPIFFT Max Error: 2.64655e-15
MPIFFT time0: 0 seconds
MPIFFT time1: 0 seconds
MPIFFT time2: 0 seconds
MPIFFT time3: 0 seconds
MPIFFT time4: 0 seconds
MPIFFT time5: 0 seconds
MPIFFT time6: 0 seconds
FFTEnblk: 16
FFTEnp: 8
FFTEl2size: 1048576

DGEMM
S-DGEMM: 9.67426 GFlop/s
EP-DGEMM: 9.09138 GFlop/s
DGEMM N: 8164

RandomRing Latency/Bandwidth
RandomRing Latency: 16.7534 usec
RandomRing Bandwidth: 0.0899317 GB/s

NaturalRing Latency/Bandwidth
NaturalRing Latency: 6.10352 usec
NaturalRing Bandwidth: 0.228898 GB/s

PingPong Latency/Bandwidth
Maximum PingPong Latency: 7.06315 usec
Maximum PingPong Bandwidth: 1.52673 GB/s
Minimum PingPong Latency: 2.5034 usec
Minimum PingPong Bandwidth: 1.01116 GB/s
Average PingPong Latency: 6.2425 usec
Average PingPong Bandwidth: 1.5118 GB/s

Size of Data Types
char: 1 byte
short: 2 bytes
int: 4 bytes
long: 8 bytes
void ptr: 8 bytes
float: 4 bytes
double: 8 bytes
size_t: 8 bytes
s64Int: 8 bytes
u64Int: 8 bytes

OpenMP
M OpenMP: -1
OpenMP Num Threads: 0
OpenMP Num Procs: 0
OpenMP Max Threads: 0

Memory
MemProc: -1
MemSpec: -1
MemVal: -1

CPS
CPS_HPCC_FFT_235: 0
CPS_HPCC_FFTW_ESTIMATE: 0
CPS_HPCC_MEMALLCTR: 0
CPS_HPL_USE_GETPROCESSTIMES: 0
CPS_RA_SANDIA_NOPT: 0
CPS_RA_SANDIA_OPT2: 1
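
The HPL figures in the record above can be cross-checked with a little arithmetic. The sketch below recomputes the theoretical peak, the HPL efficiency relative to that peak, and the share of total memory taken by the N x N matrix of 8-byte doubles. The value of 4 floating-point operations per cycle is an assumption: the record leaves that field blank, and 4 is simply the value that reproduces the stated peak of 5.44768 TFlop/s. hpl_crosscheck is a hypothetical helper name.

```shell
#!/bin/sh
# Recompute derived HPL figures from the values listed in the record.
hpl_crosscheck() {
    awk 'BEGIN {
        cores = 512                 # Total Processors
        ghz = 2.66                  # Processor Speed
        flops_per_cycle = 4         # ASSUMPTION: blank in the record
        hpl_tflops = 4.25904        # HPL result
        n = 320000                  # HPL N
        mem_gib = 1024              # Total Memory

        peak_tflops = cores * ghz * flops_per_cycle / 1000
        matrix_bytes = n * n * 8    # one N x N matrix of doubles
        mem_bytes = mem_gib * 1024 * 1024 * 1024

        printf "peak_tflops=%.5f\n", peak_tflops
        printf "hpl_efficiency_pct=%.1f\n", 100 * hpl_tflops / peak_tflops
        printf "matrix_mem_pct=%.1f\n", 100 * matrix_bytes / mem_bytes
    }'
}
hpl_crosscheck
```

Under these assumptions the script reports a peak of 5.44768 TFlop/s, an HPL efficiency of about 78.2%, and a matrix footprint of about 74.5% of total memory.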

Appendix B - Known Issues and Limitations

Appendix C – References

This applies to:
Intel® Math Kernel Library (Intel® MKL)
Intel® Math Kernel Library (Intel® MKL) for Linux*

Solution ID: CS-028404
Date Created: 10-Oct-2007
Last Modified: 20-Feb-2008