TAU Usage Examples
- PAPI hardware counters data
- Loop level instrumentation
- Selective instrumentation
- Sample profiling reports
PAPI hardware counters data
First, select a TAU Makefile whose name contains the word 'papi': either set TAU_MAKEFILE to it with setenv, or include it in your own makefile. The TAU Makefiles are in $TAU_ROOT_DIR/ia64/lib or $TAU_ROOT_DIR/xt3/lib. Then instrument and execute the binary as described in the TAU document. Set the desired PAPI counters before executing the job. To see the list of available counters, first load the papi module (module load papi), then type 'papi_avail'.
Set the environment variables COUNTER[1-25] on the command line or in the submission script as follows:
setenv COUNTER1 GET_TIME_OF_DAY
setenv COUNTER2 PAPI_FP_INS
setenv COUNTER3 PAPI_TOT_CYC
...
Note that GET_TIME_OF_DAY is a system parameter. COUNTER1 is always set to GET_TIME_OF_DAY to allow TAU to synchronize time across tasks and provide a globally synchronized real-time clock for tracing.
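The counter setup can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of TAU: the function name counter_setenv_lines is hypothetical, and it only encodes the two rules stated in this section (COUNTER1 is GET_TIME_OF_DAY, and at most 25 counters). It does not check that the event names are valid on your machine; use papi_avail for that.

```python
MAX_COUNTERS = 25  # COUNTER1 .. COUNTER25, as stated in this section


def counter_setenv_lines(papi_events):
    """Return csh 'setenv' lines for COUNTER1..COUNTERn.

    COUNTER1 is always GET_TIME_OF_DAY; the given PAPI events are
    assigned to COUNTER2, COUNTER3, ... in order.
    (Hypothetical helper for illustration; not a TAU utility.)
    """
    names = ["GET_TIME_OF_DAY"] + list(papi_events)
    if len(names) > MAX_COUNTERS:
        raise ValueError(
            f"at most {MAX_COUNTERS} counters are supported, got {len(names)}"
        )
    return [f"setenv COUNTER{i} {name}" for i, name in enumerate(names, start=1)]


if __name__ == "__main__":
    # Prints lines that can be pasted into a csh submission script.
    for line in counter_setenv_lines(["PAPI_FP_INS", "PAPI_TOT_CYC"]):
        print(line)
```

The printed lines can be pasted into a csh session or submission script before the job is launched.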
Loop level instrumentation
To automatically instrument all the outer do loops in the routine 'foo', create a text file (say loop_instru.txt) containing the following lines and pass it as a flag in your makefile, OPTS = -optTauSelectFile=loop_instru.txt.
% cat loop_instru.txt
BEGIN_INSTRUMENT_SECTION
loops routine="FOO"
END_INSTRUMENT_SECTION
The following code segment instruments the outer loops of the routine multiply in loop_test.cpp. The '#' character is a wildcard in routine names.
BEGIN_INSTRUMENT_SECTION
loops file="loop_test.cpp" routine="double multiply#"
END_INSTRUMENT_SECTION
Selective instrumentation
Files can be included in or excluded from instrumentation by listing them between the keywords BEGIN_FILE_INCLUDE_LIST and END_FILE_INCLUDE_LIST, or between BEGIN_FILE_EXCLUDE_LIST and END_FILE_EXCLUDE_LIST.
The following causes only the files foo1.f, foo2.f, and foo3.c to be instrumented. Save this code segment in a text file (say select.txt) and pass it as a flag in your makefile, OPTS = -optTauSelectFile=select.txt.
BEGIN_FILE_INCLUDE_LIST
foo1.f
foo2.f
foo3.c
END_FILE_INCLUDE_LIST
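When the list of files or routines changes often, the selective instrumentation file can be produced by a small script. The sketch below is in Python and is only an illustration: the helper names include_list_section and loops_section are hypothetical, and the output simply reproduces the file layouts shown in this document.

```python
def include_list_section(filenames):
    """Build a file include list, one file name per line.

    (Hypothetical helper; reproduces the layout shown in this document.)
    """
    lines = ["BEGIN_FILE_INCLUDE_LIST", *filenames, "END_FILE_INCLUDE_LIST"]
    return "\n".join(lines) + "\n"


def loops_section(routine, source_file=None):
    """Build an instrument section that requests outer-loop instrumentation.

    'routine' is written as given, so any '#' wildcard must already be
    part of the string. 'source_file' is optional, as in the examples.
    """
    spec = "loops "
    if source_file is not None:
        spec += f'file="{source_file}" '
    spec += f'routine="{routine}"'
    lines = ["BEGIN_INSTRUMENT_SECTION", spec, "END_INSTRUMENT_SECTION"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the two sections; redirect the output to select.txt to use it.
    print(include_list_section(["foo1.f", "foo2.f", "foo3.c"]), end="")
    print(loops_section("double multiply#", source_file="loop_test.cpp"), end="")
```

Redirecting the output to select.txt gives a file that can be passed with -optTauSelectFile=select.txt.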
Sample TAU Profiling Report
Example 1
This example uses /usr/local/packages/TAU/tau-2.17.1/examples/taucompiler/c/ring.c.
% echo $TAU_MAKEFILE
/usr/local/packages/TAU/tau-2.17.1/ia64/lib/Makefile.tau-mpi-pdt
Compilation:
% tau_cc.sh -o ring ring.c
View TAU text report:
% pprof
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time  Exclusive[1]  Inclusive[2]  #Call[3]  #Subrs[4]  Inclusive  Name
               msec    total msec                       usec/call
---------------------------------------------------------------------------------------
100.0         0.172         1,010         1          5    1010725  int main(int, char **) C
 99.1         1,001         1,001         1          0    1001199  MPI_Finalize()
  0.8             7             7         1          0       7765  MPI_Init()
  0.2         0.111             1         1          8       1588  void func(int, int) C
  0.1             1             1         1          1       1224  MPI_Barrier()
  0.0         0.194         0.194         3          0         65  MPI_Recv()
  0.0         0.051         0.051         3          0         17  MPI_Send()
  0.0         0.027         0.027         1          0         27  MPI_Comm_free()
  0.0         0.008         0.008         1          0          8  MPI_Bcast()
  0.0         0.001         0.001         1          0          1  MPI_Comm_size()
  0.0             0             0         1          0          0  MPI_Comm_rank()
Example 2
This example uses /usr/local/packages/TAU/tau-2.17.1/examples/taucompiler/f90/ring.f90. Floating-point operation counts are reported instead of time.
% echo $SHELL
/usr/psc/shells/csh
% setenv TAU_MAKEFILE /usr/local/packages/TAU/tau-2.17.1/ia64/lib/Makefile.tau-multiplecounters-mpi-papi-pdt
% tau_f90.sh -o ring ring.f90
Define the following PAPI counters in the submission script:
setenv COUNTER1 GET_TIME_OF_DAY
setenv COUNTER2 PAPI_FP_OPS
% pprof -f MULTI__PAPI_FP_OPS/profile
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time  Exclusive[1]  Inclusive[2]  #Call[3]  #Subrs[4]  Count/Call  Name
             Counts  total counts
---------------------------------------------------------------------------------------
100.0          1568     2.126E+04         1          5       21262  MAIN
 68.2          1940      1.45E+04         1          4       14498  FUNC
 42.1          8588          8956         1          1        8956  MPI_Barrier()
 16.2          3436          3436         1          0        3436  MPI_Recv()
 14.2          3028          3028         1          0        3028  MPI_Init()
 10.1          2152          2152         1          0        2152  MPI_Finalize()
  1.7           368           368         1          0         368  MPI_Comm_free()
  0.4            92            92         1          0          92  MPI_Send()
  0.3            74            74         1          0          74  MPI_Bcast()
  0.0             8             8         1          0           8  MPI_Comm_rank()
  0.0             8             8         1          0           8  MPI_Comm_size()
[1] Exclusive time: the time spent within the function itself, excluding the time spent in functions called from that function.
[2] Inclusive time: the time spent within the function, including the time spent in functions called from that function.
[3] #Call: the number of calls made to the function.
[4] #Subrs: the number of calls to subroutines made from the function.
In Example 2 the Exclusive and Inclusive columns hold PAPI_FP_OPS counts instead of time; the definitions are otherwise the same.
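The relation between the two measures can be checked against the Example 2 report: a routine's exclusive count should equal its inclusive count minus the inclusive counts of the routines it calls. The Python sketch below does that arithmetic. The numbers are copied from the report, but the call tree (which routine calls which) is an assumption, since the flat profile does not record it; it was chosen to be consistent with the #Subrs column.

```python
# Inclusive and exclusive PAPI_FP_OPS counts copied from the Example 2 report.
inclusive = {
    "MAIN": 21262, "FUNC": 14498, "MPI_Barrier()": 8956, "MPI_Recv()": 3436,
    "MPI_Init()": 3028, "MPI_Finalize()": 2152, "MPI_Comm_free()": 368,
    "MPI_Send()": 92, "MPI_Bcast()": 74, "MPI_Comm_rank()": 8,
    "MPI_Comm_size()": 8,
}
exclusive = {"MAIN": 1568, "FUNC": 1940, "MPI_Barrier()": 8588}

# ASSUMPTION: the call tree is not part of the flat profile. This one is a
# guess that matches the #Subrs column (5, 4, and 1 callees respectively).
callees = {
    "MAIN": ["FUNC", "MPI_Init()", "MPI_Finalize()",
             "MPI_Comm_rank()", "MPI_Comm_size()"],
    "FUNC": ["MPI_Barrier()", "MPI_Recv()", "MPI_Send()", "MPI_Bcast()"],
    "MPI_Barrier()": ["MPI_Comm_free()"],
}


def derived_exclusive(name):
    """Inclusive count of 'name' minus the inclusive counts of its callees."""
    return inclusive[name] - sum(inclusive[c] for c in callees.get(name, []))


if __name__ == "__main__":
    for name in callees:
        print(name, derived_exclusive(name), exclusive[name])
```

Under the assumed call tree, the derived values agree with the Exclusive column for MAIN, FUNC, and MPI_Barrier().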