
HPL User Note

Step 1 – Overview

This guide is intended to help current HPL users get better benchmark performance by utilizing BLAS from the Intel® Math Kernel Library (Intel® MKL).

HPL (High Performance LINPACK), an industry-standard benchmark for HPC, is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers.

This note explains three ways to get HPL running:
  1. Using the Intel optimized HPL binary directly (mp_linpack)
  2. Building and using HPL from the source provided in the MKL package
  3. Building and using open source HPL by linking it with MKL
Version Information

This application note was created to help users who benchmark clusters with HPL make use of the latest versions of Intel MKL on Linux platforms. Specifically, it addresses Intel MKL version 9.1.


Step 2 – Downloading HPL Source Code

Open source HPL can be downloaded from http://www.netlib.org/benchmark/hpl 
If you have installed MKL, HPL is included in MKL and can be found at
<MKL installation dir>/benchmarks/mp_linpack
Prerequisites
  1. BLAS (Basic Linear Algebra Subprograms)

    BLAS DGEMM is the core high performance routine exercised by HPL. Intel® MKL BLAS is highly optimized for maximum performance on Intel® Xeon® processor-based and Itanium® 2-based systems.

    BLAS from MKL can be obtained in any of the following ways:

    1. Download a FREE evaluation version of the Intel MKL product
    2. Download the FREE non-commercial version of the Intel MKL product
    3. Download one of our FREE Intel Optimized LINPACK benchmark packages

    All of these can be obtained at http://www.intel.com/software/products/mkl

    FREE Intel Optimized LINPACK Benchmark packages

    The Intel MKL team provides FREE Intel Optimized LINPACK Benchmark packages that are binary implementations of the LINPACK benchmarks and include Intel MKL BLAS. Not only are these SMP and Distributed Memory packages free, they are also much easier to use than HPL (no compilation is needed; just run the binaries). We highly recommend that HPL users consider switching from HPL to the FREE Intel Optimized LINPACK Benchmark packages.

  2. Intel® MPI

    Intel MPI can be obtained from http://www.intel.com/cd/software/products/asmo-na/eng/244171.htm
    Open source MPI (MPICH2) can be obtained from http://www-unix.mcs.anl.gov/mpi/mpich/ 
Step 3 – Configuration

If you have an MKL 9.1 installation, skip the extraction step below. If you downloaded hpl.tar.gz from netlib, follow the instructions for extracting HPL.
  1. Extract the tar file

    Use the following commands to extract the downloaded hpl.tar.gz file:

    $gunzip hpl.tar.gz
    $tar -xvf hpl.tar
    This creates an hpl directory, referred to below as the top-level directory.
  2. Makefile Creation

    1. If you are using HPL from Intel MKL, you can use the makefile Make.em64t directly. In this case, the top-level directory is <your mkl installation>/benchmarks/mp_linpack.
    2. If you are using open source HPL:

      Create a file Make.<arch> in the top-level directory. To do this, you can re-use one of the files in the setup directory (hpl/setup/). This example uses Make.Linux_PII_CBLAS. This file essentially specifies the compilers and libraries to use, along with their paths.

      Copy this file:

      $cp hpl/setup/Make.Linux_PII_CBLAS hpl
      Rename this file:

      $mv hpl/Make.Linux_PII_CBLAS hpl/Make.em64t
      Make sure that the Intel® C++ and FORTRAN compilers are installed and in your PATH, and set LD_LIBRARY_PATH to include your compiler (C++ and FORTRAN), MPI, and MKL library directories.
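Before building, you may want to confirm that the tools and library directories are actually visible. The following Python sketch is one way to check, assuming the Intel compiler drivers are named icc and ifort and that your MPI provides mpirun. The library directories listed are only the example paths used in this note; replace them, and add your compiler library directories, to match your installation.

```python
import os
import shutil

# Assumed tool names: Intel C++ (icc), Intel FORTRAN (ifort), and the MPI launcher.
REQUIRED_TOOLS = ["icc", "ifort", "mpirun"]

# Example library directories from this note; adjust them for your system
# and add your compiler library directories.
REQUIRED_LIB_DIRS = [
    "/opt/intel/mpi/lib",
    "/opt/intel/mkl/9.1/em64t/lib",
]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_lib_dirs(dirs, ld_library_path=None):
    """Return the directories that are not listed in LD_LIBRARY_PATH."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Normalize entries so that "/opt/x/" and "/opt/x" compare equal.
    entries = {os.path.normpath(p) for p in ld_library_path.split(os.pathsep) if p}
    return [d for d in dirs if os.path.normpath(d) not in entries]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"not on PATH: {tool}")
    for directory in missing_lib_dirs(REQUIRED_LIB_DIRS):
        print(f"not in LD_LIBRARY_PATH: {directory}")
```

Run it from the same shell you will build in; it only reports problems and does not change your environment.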
Step 4 – Building HPL

The following steps build HPL.

Edit Make.em64t
  1. Change the value of ARCH to em64t (or to whatever value you chose for <arch>).

    # ----------------------------------------------------------------------
    # - Platform identifier ------------------------------------------------
    # ----------------------------------------------------------------------
    #
    ARCH = em64t
  2. Point to your MPI library

    MPdir = /opt/intel/mpi
    MPinc = -I$(MPdir)/include
    MPlib = $(MPdir)/lib/libmpi.a
    If you are using MPICH2, use libmpich.a instead of libmpi.a.
    It is advisable to use Intel MPI for better performance.
  3. Point to the math library, MKL

    LAdir = /opt/intel/mkl/9.1/em64t/lib
    LAinc = -I/opt/intel/mkl/9.1/include
    LAlib = $(LAdir)/libmkl_em64t.a $(LAdir)/libguide.a -lpthread
    To build the executable, run "make arch=<arch>". This creates an executable called xhpl in the bin/<arch> directory.

    In our example, run:

    $make arch=em64t
    This creates the executable file bin/em64t/xhpl. It also creates an HPL configuration file, bin/em64t/HPL.dat.

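If you prepare Make.em64t on several machines, the edits above can also be scripted. The Python sketch below rewrites existing NAME = value assignments in place. The values are the example settings from this note, with LAinc written as a -I flag on the assumption (as in the HPL setup templates) that it is passed straight to the compiler. The function names set_make_vars and update_makefile are hypothetical helpers, not part of HPL.

```python
import re

# Example settings from this note; adjust the paths for your installation.
EM64T_SETTINGS = {
    "ARCH": "em64t",
    "MPdir": "/opt/intel/mpi",
    "MPinc": "-I$(MPdir)/include",
    "MPlib": "$(MPdir)/lib/libmpi.a",
    "LAdir": "/opt/intel/mkl/9.1/em64t/lib",
    "LAinc": "-I/opt/intel/mkl/9.1/include",
    "LAlib": "$(LAdir)/libmkl_em64t.a $(LAdir)/libguide.a -lpthread",
}


def set_make_vars(text, settings):
    """Replace the value of each existing 'NAME = value' line in a Make.<arch> file."""
    for name, value in settings.items():
        # Match a whole line, keeping the name and the original spacing before '='.
        pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*=).*$", re.MULTILINE)
        text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", text)
        if count == 0:
            raise KeyError(f"{name} is not assigned in this makefile")
    return text


def update_makefile(path, settings=EM64T_SETTINGS):
    """Rewrite the makefile at 'path' in place with the given settings."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_make_vars(text, settings))
```

For example, update_makefile("Make.em64t") applies all of the settings at once. A KeyError means the template does not assign that variable, so add the line by hand.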
Step 5 – Running HPL

Case 1: If you downloaded the Intel® Optimized LINPACK Benchmark package

Extract the package and run the script for your platform.
For example, on 64-bit Intel Xeon systems:

$./runme_xeon64
Refer to lpk_notes_lin.htm, provided with this package, for more details.

Cases 2 and 3: If you built HPL from the MKL package or from open source HPL

Go to the directory where the executable was built.

For example, for a test run of HPL, use the following commands:

$cd bin/<arch>
$mpirun -np 4 xhpl
Create a machines file with node names.
For example, a machines file might contain the following names:

front-end-0
compute-0
compute-1
...
compute-128
Run with the machines file:
$mpirun -np 8 -nodes 4 -machinefile machines xhpl
Refer to your MPI documentation for other arguments you can use.
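For a large cluster, typing the machines file by hand is error-prone. A small Python sketch that produces the naming pattern from the example above (front-end-0 followed by compute-0 through compute-128) might look like this. The host names and node count are only the example's; machine_names and write_machines_file are hypothetical helpers.

```python
def machine_names(front_end="front-end-0", prefix="compute-", count=129):
    """Return host names: the front end, then prefix0 .. prefix(count - 1)."""
    return [front_end] + [f"{prefix}{i}" for i in range(count)]


def write_machines_file(path, names):
    """Write one host name per line."""
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")
```

For example, write_machines_file("machines", machine_names()) creates the file used in the mpirun command above. Reordering the list before writing it is an easy way to try different node orders.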

Tuning:

Most performance parameters can be tuned by modifying the input file bin/<arch>/HPL.dat. See the file TUNING in the top-level directory for more information.

Note: If you use the Intel Optimized LINPACK package, you must change the input files provided with that package instead of HPL.dat (for example, lininput_xeon64). Refer to the extended help file xhelp.lpk for more information on modifying the input file.

The main parameters to consider when running HPL are:

Problem size (N): For best performance, your problem size should be the largest that fits in memory. For example, 10 nodes with 1 GB of RAM each have 10 GB of total memory, i.e. about 1342 M double-precision (8-byte) elements. The square root of that number is 36635. You need to leave some memory for the operating system and other processes, so as a rule of thumb, start with a problem that uses about 80% of total memory (in this case, sqrt(0.8 × 1342 M) ≈ 32768, or roughly 33000). If the problem size is too large, it is swapped out and performance degrades.
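The arithmetic above can be written as a small Python function. It is a sketch assuming 1 GB means 2^30 bytes (which matches the 1342 M figure) and 8-byte double-precision elements. The optional rounding of N down to a multiple of NB is a common convenience added here, not something this note requires; problem_size is a hypothetical helper.

```python
import math

BYTES_PER_DOUBLE = 8


def problem_size(nodes, gib_per_node, fraction=0.8, nb=None):
    """Largest N whose N x N double-precision matrix fits in 'fraction' of total memory."""
    total_bytes = nodes * gib_per_node * 2**30
    usable_elements = int(fraction * total_bytes) // BYTES_PER_DOUBLE
    n = math.isqrt(usable_elements)
    if nb:
        n -= n % nb  # optional: round down to a multiple of the block size
    return n


print(problem_size(10, 1, fraction=1.0))  # 36635: all of memory
print(problem_size(10, 1))                # 32768: 80% of memory
```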
 
Block size (NB): HPL uses the block size NB both for the data distribution and for the computational granularity. A very small NB limits computational performance because no data reuse occurs, and it also increases the number of messages. "Good" block sizes are almost always in the [32 .. 256] interval, and the best value depends on cache size. The following block sizes have been found to work well: 80-216 for IA-32; 128-192 for IA-64 with 3 MB cache; 400 for IA-64 with 4 MB cache; and 130 for Woodcrest-based systems.

Process grid ratio (P × Q): The best grid depends on the physical interconnection network. P and Q should be approximately equal, with Q slightly larger than P. For example, for a 480-processor cluster, 20 × 24 is a good choice.
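One way to apply this guideline is to pick the factor pair of the process count that is closest to square, with P no larger than Q. The Python sketch below does that; process_grid is a hypothetical helper, and since it ignores the interconnect topology, its answer is only a starting point for trial and error.

```python
import math


def process_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as close to sqrt(nprocs) as possible."""
    if nprocs < 1:
        raise ValueError("nprocs must be positive")
    p = math.isqrt(nprocs)
    # Walk down from floor(sqrt(nprocs)) to the first divisor.
    while nprocs % p:
        p -= 1
    return p, nprocs // p


print(process_grid(480))  # (20, 24), the example grid above
print(process_grid(8))    # (2, 4)
```

For a prime process count the result degenerates to 1 × nprocs, which is usually a poor grid; in that case it may be better to run on one fewer process.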

Tips: You can also try changing the order of the nodes in the machines file to see whether performance improves. In the end, the best values for all of the above parameters are found by trial and error.
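Trial and error is easier because a single HPL run can try several values of each parameter: HPL.dat gives a count line followed by a values line for the problem sizes, the block sizes, and the process grids, and HPL ignores the text after the numbers on each line. The Python sketch below prints those lines for a sweep. The labels only mimic the sample HPL.dat, sweep_lines is a hypothetical helper, and you should paste the two groups of lines over the corresponding lines of your own HPL.dat rather than replacing the whole file.

```python
def _line(values, label):
    """Format one HPL.dat line: the values first, then a descriptive label."""
    text = " ".join(str(v) for v in values)
    return f"{text:<12} {label}"  # at least one space always separates values and label


def sweep_lines(ns, nbs, grids):
    """Return (problem-size/NB lines, process-grid lines) for an HPL.dat sweep."""
    ps = [p for p, _ in grids]
    qs = [q for _, q in grids]
    size_block = [
        _line([len(ns)], "# of problem sizes (N)"),
        _line(ns, "Ns"),
        _line([len(nbs)], "# of NBs"),
        _line(nbs, "NBs"),
    ]
    grid_block = [
        _line([len(grids)], "# of process grids (P x Q)"),
        _line(ps, "Ps"),
        _line(qs, "Qs"),
    ]
    return size_block, grid_block


sizes, grid = sweep_lines([32000, 33000], [128, 130], [(20, 24), (16, 30)])
print("\n".join(sizes))
print("\n".join(grid))
```

Keep each sweep small: HPL runs every combination of the listed values, so the run time grows multiplicatively.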

Appendix A – Performance comparison

The following chart shows LINPACK performance results for various problem sizes on Intel® Xeon® processor 5300 series (Clovertown) based systems.


Appendix B – Known Issues and Limitations

If you are building HPL rather than using the binary from the Intel Optimized LINPACK package, make sure that your MPI is running properly, that the FORTRAN, C++, MPI, and MKL libraries are in LD_LIBRARY_PATH, and that the FORTRAN, C++, and MPI binaries are in PATH.


Operating System:
Red Hat* Linux, Red Hat* Desktop Linux* 3, Red Hat* Enterprise Linux Desktop 4, Red Hat* Desktop 3 Update 4, Red Hat* Enterprise Linux Desktop 3 Update 3, Red Hat* Enterprise Linux Desktop 3 Update 4, Red Hat* Enterprise Linux Desktop 3 Update 5, Red Hat* Enterprise Linux Desktop 4 Update 1, Red Hat* Enterprise Linux 2.1, SUSE* Linux 9.1, SUSE* Linux Enterprise Server 8.0, SUSE* Linux Enterprise Server 9.0, Red Hat* Enterprise Linux 4.0, Redhat* Desktop 3 Update 5, Redhat* Desktop 3 Update 6, Redhat* Desktop 3 Update 7, Redhat* Desktop 4 Update 2, Redhat* Desktop 4 Update 3, Redhat* Desktop 4 Update 4, SuSE* Linux* Enterprise* Desktop 10, SUSE* Linux Enterprise Server 10, Red Hat* Linux 6.2, Red Hat* Linux 6.2 SBE2, Red Hat* Linux 7.0, Red Hat* Linux 7.1, Red Hat* Linux 7.2, Red Hat* Linux 7.3, SUSE* Linux 7.3, SUSE* Linux 8.0, SUSE* Linux 8.1, Red Hat* Linux 8.0, SUSE* Linux 7.2, SUSE* Linux 7.1, SUSE* Linux 7.0, SUSE* Linux, Red Hat* Linux Advanced Server 2.x, Red Hat* Linux 9.0, Red Hat* Enterprise Linux 3.0, SUSE* Linux* 8.2, Red Hat* Linux Advanced Server 3.x, SUSE* Linux* 9.x

This applies to:
Intel® Math Kernel Library (Intel® MKL) for Linux*
Intel® Math Kernel Library (Intel® MKL) for Windows*

Solution ID: CS-025964
Date Created: 20-Apr-2007
Last Modified: 04-Dec-2007