VTune

VTune is a software performance analysis tool provided by Intel for users developing serial, multithreaded or MPI applications.  VTune is part of the Intel Parallel Studio.  

Documentation

Usage

VTune is loaded as part of the default intel module when you log in.  You can see all the versions of the Intel Parallel Studio that are available by typing

module avail intel

and you can see specifics about all the components by typing 

module help intel/nnnn

where nnnn is the version number for the specific module you are interested in.

To use a different version of the Intel Parallel Studio, type

module unload intel 
module load intel/nnnn

 

To use VTune:

  • Run VTune in a batch job on Bridges that profiles your code
  • Connect to Bridges using an ssh client with X11 forwarding enabled
  • Examine the output from VTune

Run VTune in a batch job that profiles your code

  1. Prepare a job script which contains the commands to profile your code with VTune.
    1. If you want a different version of the Intel Parallel Studio than the default, unload the default and load the version you want.
      module unload intel
      module load intel/nnnn
    2. If necessary, compile your code.

    3. Profile your code by running amplxe-cl and passing your code to it.  The amplxe-cl command is defined by the intel module.

      For a non-MPI code, the command will look like:

      amplxe-cl -result-dir dirname -quiet -collect hotspots ./your-executable arguments-to-your-executable

      If you are profiling an MPI code, use mpirun to launch amplxe-cl, where X is the number of MPI processes:

      mpirun -np X amplxe-cl -result-dir dirname -quiet -collect hotspots ./your-executable arguments-to-your-executable

       

      In either case, dirname is any name you choose. VTune creates one new directory for each node you are using and stores its data there. Each directory is named dirname.$hostname, where $hostname is the fully qualified name of a node allocated to your job.  $hostname will be of the form 'rxxx.pvt.bridges.psc.edu'.

      The profiling data will be stored in a file with the same name as the directory and the extension 'amplxe'.

  2. Submit your batch script to Bridges with the sbatch command, and be sure to include the -C PERF option to enable profiling. Substitute the partition name and the name of your job script for partition and job-script.
    sbatch -p partition -C PERF job-script

    See the Running Jobs section of the Bridges User Guide for more information about partitions, running batch jobs, and options to the sbatch command.
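As a rough sketch of the two steps above, the following writes a small job script and prints the matching sbatch command. The file name vtune_job.sh, the result directory vtune_out, the executable ./my_app, and the node count and time limit are all placeholders chosen for illustration, not values from this guide. Substitute your own, and replace partition with a real partition name.

```shell
#!/bin/bash
# Sketch only: vtune_job.sh, vtune_out, ./my_app and the resource
# requests below are illustrative placeholders.

# Write the job script. The quoted 'EOF' keeps the shell from expanding
# anything inside the here-document.
cat > vtune_job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 00:30:00

# The default intel module defines amplxe-cl. To use another version:
#   module unload intel
#   module load intel/nnnn

# Profile a non-MPI executable with the hotspots analysis.
amplxe-cl -result-dir vtune_out -quiet -collect hotspots ./my_app
EOF

# Show the submission command; run it yourself on a login node.
echo "sbatch -p partition -C PERF vtune_job.sh"
```

For an MPI code, the amplxe-cl line inside the script would instead be prefixed with mpirun -np X, as described in step 1.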

 

Connect to Bridges with X11 forwarding enabled

In order to use the GUI supplied with the VTune client to examine the output, you must be connected to Bridges using ssh with X11 forwarding enabled. This is important - otherwise, you will not be able to use the GUI. If you are logged in to Bridges without X11 forwarding enabled, log out and back in with X11 enabled.
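One quick way to tell whether your current login has X11 forwarding is to check the DISPLAY environment variable, which ssh sets when forwarding is active. The helper below is a minimal sketch: the function name check_x11 is made up for this example, and the login host bridges.psc.edu and the username placeholder in its message are assumptions to replace with your own details.

```shell
# Report whether this shell appears to have X11 forwarding.
# check_x11 is a hypothetical helper name, not a Bridges command.
check_x11() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 forwarding appears active (DISPLAY=$DISPLAY)"
    else
        # Assumed host name and placeholder username; adjust as needed.
        echo "DISPLAY is not set; reconnect with: ssh -X username@bridges.psc.edu"
        return 1
    fi
}
```

Running check_x11 after logging in returns a non-zero status when DISPLAY is empty, which suggests logging out and reconnecting with ssh -X.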

 

Examine the profiling information

  1. Start an interactive session with the interact command.
    interact -n X

    where X is the number of cores that you want to use. This will request resources in the RM-shared partition for 60 minutes. You can override these defaults by using other options to the interact command.

    See more about running interactive sessions on Bridges.

  2. If you want a different version of the Intel Parallel Studio than the default, unload the default and load the version you want.
    module unload intel
    module load intel/nnnn
  3. Move to the new directory created by amplxe-cl
    cd dirname.$hostname
  4. Start the GUI and open the .amplxe file.
    amplxe-gui dirname.$hostname.amplxe

    The GUI has many options.  Information on using it is available at https://software.intel.com/en-us/node/543997
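When a job ran on several nodes there is one result directory per node, so it can help to list them before opening the GUI. The function below is a small sketch that prints the expected .amplxe path for every directory matching dirname.*, based on the naming described earlier. The function name list_vtune_results is invented for this example, and it only builds paths; it does not check that each .amplxe file exists.

```shell
# Print the expected .amplxe file for every result directory of a run.
# list_vtune_results is a hypothetical helper, not part of VTune.
list_vtune_results() {
    prefix="$1"
    for dir in "$prefix".*/; do
        [ -d "$dir" ] || continue        # the glob matched nothing
        dir="${dir%/}"                   # drop the trailing slash
        name="$(basename "$dir")"        # e.g. dirname.rxxx.pvt.bridges.psc.edu
        echo "$dir/$name.amplxe"
    done
}
```

For example, list_vtune_results dirname prints one path per node directory, and any of those paths can then be passed to amplxe-gui.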

More information

The tutorials referenced above will take you step-by-step through analyzing code performance. Sample codes are included.