The TAU team is pleased to announce the release of TAU v2.18.2 and the
POINT Live-DVD. The following changes have been made in TAU release v2.18.2.

I. Re-engineering of the TAU measurement library
------------------------------------------------
To simplify the usage of TAU, we have converted several measurement
configuration options from compile time to runtime. Instead of re-compiling
your application, you can now simply set an environment variable to access
these features. You may specify the following environment variables
(as 1/0, on/off, or true/false):

a. TAU_TRACE - enables generation of TAU trace files.
b. TAU_PROFILE - (default) generates flat profiles. You may disable this
   while tracing.
c. TAU_SYNCHRONIZE_CLOCKS - corrects clock drift and synchronizes the clocks
   while tracing. This is useful in the absence of a globally synchronized
   realtime clock.
d. TAU_CALLPATH - generates callpath profiles (specify depth using
   TAU_CALLPATH_DEPTH).
e. TAU_COMPENSATE - compensates for perturbation caused by TAU
   instrumentation. It subtracts the TAU timer overheads from the
   measurements before writing the profile files.
f. TAU_TRACK_MESSAGES - enables tracking of message communication volume.
g. TAU_COMM_MATRIX - generates detailed information about senders and
   receivers and the calling paths associated with the communication.
   Setting this automatically enables TAU_TRACK_MESSAGES. See the note on
   ParaProf (Windows -> Communication Matrix) below to view the data.

Also, instead of specifying a -multiplecounters stub makefile while using
PAPI, it is now possible to evaluate multiple counters in any TAU
instrumented executable. You may set the environment variable TAU_METRICS
to specify the measurements as follows:

% export TAU_METRICS=TIME:PAPI_FP_INS:PAPI_L1_DCM...

instead of

% export COUNTER1=GET_TIME_OF_DAY
% export COUNTER2=PAPI_FP_INS
% export COUNTER3=PAPI_L1_DCM

TAU retains backward compatibility and will also accept the previous
specification.

II. ParaProf enhancements
-------------------------
ParaProf now features a new communication matrix window (see TAU_COMM_MATRIX
above) that displays the number, max, min, std. dev. and volume of
communication between each pair of communicating processes. TAU uses an
efficient sparse matrix storage for displaying and storing this matrix.
We have dropped support for Java 1.3 and require Java v1.4 or later. The
selective instrumentation file generation module in ParaProf now merges the
lists instead of overwriting them.

III. PerfExplorer enhancements
------------------------------
PerfExplorer now has a new stacked bar chart display: a horizontal Gantt
chart showing the breakdown of performance for each core count (along the
vertical axis). The performance data is not normalized to 100% (as in
runtime breakdown). We have also improved the custom chart interface, which
simplifies the generation of charts by choosing different values for the X
and Y axes. The X axis labels can be angled and we now support categorical
X axis labels (that show equally spaced points such as 1, 64, 512, 1024
cores). There is a new 'Scalability Chart' tool in the Charts menu that
simplifies scalability analysis. We now support JFreeChart v1.0.12 and all
JFreeChart charts have an updated look (for both ParaProf and PerfExplorer).

IV. Improvements in support for IBM BG/P
----------------------------------------
Improved the TAU_PROFILE_FORMAT=merged option, which allows us to merge
profiles from all nodes and generate a single tauprofile.xml file from
rank 0 instead of writing a separate profile file from each rank. This
scales to a large number of cores (we have tested up to 64k) and reduces the
load on the metadata server when a large number of files are opened. Each
rank communicates its performance data to rank 0 at MPI_Finalize, and rank 0
merges and writes the profile data in the XML format. ParaProf can directly
load this snapshot format file.
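
As a minimal sketch of how the merged format might be used in practice,
the fragment below sets the environment variable before launching an MPI
job. The launcher (mpirun), the rank count and the binary name ./my_tau_app
are hypothetical placeholders, not part of this release; only
TAU_PROFILE_FORMAT=merged and tauprofile.xml come from the notes above.

```shell
# Ask TAU to merge the per-rank profiles into a single tauprofile.xml,
# written by rank 0 at MPI_Finalize.
export TAU_PROFILE_FORMAT=merged

# Hypothetical launch line: substitute your own launcher, rank count and
# TAU-instrumented MPI executable.
# mpirun -np 64 ./my_tau_app

# Afterwards, the single merged file can be opened in ParaProf
# (assumes paraprof is on your PATH).
# paraprof tauprofile.xml
```

Because the setting is read at runtime, the same executable can be rerun
with the variable unset to get the default one-file-per-rank output.
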
We have fixed some issues with support for IBM XL compilers and shared
objects on BG/P. The merged profile format is platform independent and may
be used for other architectures at large core counts as well.

V. Eclipse/PTP enhancements
---------------------------
Updated the TAU performance tools plugin to improve support for workflows
in Eclipse.

VI. Other enhancements and bug fixes
------------------------------------
* Fixed issues with the MPI libraries on Cray CNL.
* The TAU compiler scripts now understand two compiler options specified on
  the command line: -Mfree and -Mfixed. These options are specific to the
  PGI compiler, and TAU internally invokes the PDT parsers with the
  appropriate options.
* Enhanced support for GPGPU based profiling using PGI 8.0.6 compilers.
* Added support for demangling C++ names when TAU is configured with
  -bfd=download and compiler-based instrumentation is used (-optCompInst in
  TAU_OPTIONS).
* Fixed bugs in parsing of gprof format files in PerfDMF.
* Added a task based API for profiling asynchronous tasks as seen from the
  host processor. See examples/profilercreate/taskc++ for details on
  tracking asynchronous executions. This may be used for tracking events on
  GPGPUs and other accelerator boards.

VII. POINT LiveDVD
------------------
http://tau.uoregon.edu/point.iso now points to the latest version of the
LiveDVD. It adds support for the latest releases of PAPI, TAU, VampirTrace,
and Scalasca and features workshop examples for VampirTrace and Scalasca.

VIII. Updated Documentation
---------------------------
We now have a quick reference guide to using TAU. We have also improved our
web based documentation and re-organized the user's guide and the movies
section in the TAU documentation.