An advantage of the Intel Xeon Phi accelerator is that applications do not need source changes in order to use it. The Intel compilers build the same code either for the host CPUs or for the accelerator, selected by a single compiler flag: -mmic targets the accelerator, while a build without it targets the host CPUs, where -xhost additionally tunes the code for the host's instruction set.
The Intel C++ Composer XE 2013 suite is in your path once the correct environment is set up. The GNU Compiler Collection (GCC), version 4.4.6, is available for utilities and serial host applications, but cannot be used to build MPI applications, MIC offload applications or native MIC applications. We recommend using the Intel compilers whenever possible. The Intel suite has been installed with 64-bit standard libraries and compiles programs as 64-bit applications (the default compiler mode). Since the E5 processors and Phi coprocessors are new architectures that rely on optimizations in the new 2013 compiler, any program compiled for another Intel system should be recompiled.
The Intel C and C++ compiler commands are "icc" and "icpc", respectively. Use the "-help" option with either command to display a list and explanation of all the compiler options, which is useful during debugging and optimization. Please check out Intel C/C++ compilers for additional information.
Compiling serial programs
Compiler | Language | File Extension | Example |
---|---|---|---|
icc | C | .c | icc [compiler_options] prog.c |
icpc | C++ | .C, .cc, .cpp, .cxx | icpc [compiler_options] prog.cpp |
Appropriate file name extensions are required for each compiler. By
default, the executable name is "a.out", but it may be renamed with the
"-o" option. We use "a.out" throughout this guide to designate a generic
executable file. The compiler command performs two operations: it makes
a compiled object file (having a .o suffix) for each file listed on the
command line, and then combines them with system library files in a
link step to create an executable. To compile without the link step, use
the "-c" option.
The same code can be compiled to run either natively on the host or natively on the MIC. Use the same compiler commands for the host (E5) or the MIC (Phi) compiling, but include the "-mmic" option to create a MIC executable. We suggest you name MIC executables with a ".mic" suffix.
Host, MIC and Host+MIC offload compilations
Mode | Required Options | Notes |
---|---|---|
Native (HOST) | none | Use -xhost to generate AVX (Advanced Vector Extensions) instructions. |
Native Phi (MIC) | -mmic | Suggestion: name executables with a ".mic" suffix to differentiate them from a host executable. |
Host + Offload | none | For automatic offloading of MKL library functions, use environment variables; for direct offloading, use pragmas. |
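
As a rough illustration of the "direct offloading" row, the sketch below marks one loop for offload with the Intel compiler's `#pragma offload target(mic)` directive. The function name `scale_on_mic` is hypothetical, and the `__INTEL_OFFLOAD` guard assumes that the Intel compiler defines this macro when offload is enabled; with any other compiler the pragma is skipped and the loop simply runs on the host.

```c
/* Multiply every element of data by factor.
 * Assumption: when built with the Intel compiler and offload enabled,
 * the block below runs on a coprocessor; otherwise it runs on the host. */
void scale_on_mic(float *data, int n, float factor)
{
#ifdef __INTEL_OFFLOAD
/* inout: copy the n elements to the coprocessor and back again. */
#pragma offload target(mic) inout(data : length(n))
#endif
    {
        int i;
        for (i = 0; i < n; i++) {
            data[i] *= factor;
        }
    }
}
```

A file containing this function is compiled with the host command line (no -mmic), in line with the "none" entry in the table above.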
The following examples illustrate how to rename an executable (-o option), compile for the host (run on the CPUs), and compile for the MIC (run natively on the MIC):
A C program example:
login1$ icc -xhost -O2 -o flamec prog.c
login1$ icc -mmic -O2 -o flamec.mic prog.c
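Because one source file builds for both targets, it can be handy for a program to report which target it was built for. The sketch below relies on the `__MIC__` macro, which the Intel compiler predefines for -mmic builds; the function names `build_target` and `print_build_target` are hypothetical and only serve the illustration.

```c
#include <stdio.h>

/* Return "mic" for a native MIC build (-mmic), "host" otherwise.
 * Assumption: __MIC__ is predefined only when compiling for the coprocessor. */
const char *build_target(void)
{
#ifdef __MIC__
    return "mic";
#else
    return "host";
#endif
}

/* Print the build target to the given stream, one line. */
void print_build_target(FILE *out)
{
    fprintf(out, "built for: %s\n", build_target());
}
```

Calling `print_build_target(stdout)` from `main` in prog.c would then distinguish a run of flamec from a run of flamec.mic.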
For additional information, execute the compiler command with the
"-help" option to display every compiler option, its syntax, and a brief
explanation, or display the corresponding man page, as follows:
login1$ icc -help
login1$ icpc -help
login1$ ifort -help
login1$ man icc
login1$ man icpc
login1$ man ifort
To find out more about developing for Phi accelerators, please check out Programming and Compiling for Intel® Many Integrated Core Architecture.
One of the keys to the performance of Intel Xeon Phi coprocessors is their 512-bit registers and the associated SIMD operations. The compiler's vectorizer detects operations in the program that can be done in parallel and converts the sequential code accordingly; for example, it converts a sequential loop into SIMD instructions that process 2, 4, 8 or up to 16 elements at once, depending on the data type. The -vec option enables vectorization at default optimization levels for both Intel Xeon host CPUs and Intel Xeon Phi accelerators. Code benefits most from auto-vectorization when it uses simple for loops and straight-line code (no branching), avoids loop-carried dependencies, and aligns its data correctly. For more information about these techniques, please read Getting Started Tutorial: Using Auto Vectorization. To check how well the code was auto-vectorized, use the compiler flag -vec-report[0|1|2|3|4|5].
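To make these guidelines concrete, here is a minimal sketch of a loop shaped for auto-vectorization next to one that is not. Both function names are hypothetical. The `restrict` qualifier is standard C99 (so the file may need -std=c99) and promises the compiler that the two arrays do not overlap; whether a given loop is actually vectorized depends on the compiler and its options.

```c
#include <stddef.h>

/* Vectorization-friendly: a simple counted loop, no branches, and each
 * iteration touches only its own elements of x and y. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    size_t i;
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Not vectorization-friendly as written: iteration i reads the value that
 * iteration i-1 just wrote, which is a loop-carried dependency. */
void running_sum(size_t n, float *v)
{
    size_t i;
    for (i = 1; i < n; i++) {
        v[i] = v[i] + v[i - 1];
    }
}
```

Compiling such a file with one of the -vec-report levels should show which of the two loops the compiler chose to vectorize.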
On Phi accelerators, applications use OpenMP to take advantage of multi-core shared-memory parallelism. For applications with OpenMP parallel directives, include the -openmp option on the compiler command line to enable parallel thread generation. Use the -openmp_report option to display diagnostic information.
Important OpenMP compiler options.
Compiler Options (OpenMP) | Description |
---|---|
-openmp | Enables the parallelizer to generate multi-threaded code based on the OpenMP directives. Use whenever OpenMP pragmas are present in code for the E5 processor or the Phi coprocessor. |
-openmp_report[0, 1, or 2] | Controls the OpenMP parallelizer diagnostic level. |
Below are host and MIC compile examples for enabling OpenMP code directives.
login1$ icc -xhost -openmp -O2 -o flamec prog.c
login1$ icc -mmic -openmp -O2 -o flamec.mic prog.c
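For orientation, the kind of OpenMP code these commands build might look like the sketch below. The function names `sum_of_squares` and `max_threads` are hypothetical. The code uses only the standard `parallel for` directive with a `reduction` clause and the standard `omp_get_max_threads()` call, and it still compiles as serial code when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum i*i for i = 0 .. n-1. With OpenMP enabled, the iterations are split
 * across threads and the per-thread partial sums are combined by the
 * reduction clause; without it, the pragma is ignored and the loop is serial. */
double sum_of_squares(int n)
{
    double total = 0.0;
    int i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}

/* Number of threads a parallel region may use (1 in a serial build). */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

The thread count at run time is normally controlled with the standard OMP_NUM_THREADS environment variable.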
For more information about OpenMP programming using Intel compilers, please check out Parallelization Using OpenMP from Intel.
On the Phi cluster, the proprietary Intel® MPI Library 4.1 is currently the only option for compiling and running MPI programs. The Intel MPI environment is set up at login if you use the suggested environment setup described in the previous sections. Before compiling an MPI program, make sure mpiicc and mpiicpc are in your path.
Here are examples of compiling MPI programs for the host and for the accelerators:
login1$ mpiicc -xhost -O2 -o simulate mpi_prog.c
login1$ mpiicc -mmic -O2 -o simulate.mic mpi_prog.c
login1$ mpiicpc -xhost -O2 -o simulate1 mpi_prog1.cpp
login1$ mpiicpc -mmic -O2 -o simulate1.mic mpi_prog1.cpp
Please check out Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems for more information.
Please read Intel documents about Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors.