About this version of the DCPI/DADD engineering model
P.J. Drongowski
28 June 2002

Here are some notes on the changes made to the DCPI/DADD engineering
model, and some things that have remained the same.

DADD API in dadd.a
------------------

The DADD API remains the same. The build process for the sample clients
remains the same:

    cc -o runcli runcli.c dadd.a -lrt
    cc -o timcli timcli.c dadd.a
    cc -o expcli expcli.c dadd.a

Running dcpid
-------------

The command to run dcpid has changed slightly. The following command
should be used to start DCPI in DADD mode:

    dcpid -dyn -DF 1 -slot cycles -slot pm -slot bmiss+retires $DCPIDB

where DCPIDB is an environment variable defined to be the path to the
DCPI database directory.

This change is not due to anything new or different in dcpid. It is
needed in order to avoid a known, multiplexing-related problem in the
DCPI device driver. If "-slot bmiss" is used instead of the recommended
"-slot bmiss+retires", cycle and ProfileMe events may be sampled for too
long, resulting in substantial overcounts and incorrect results.

Running application programs to be monitored
--------------------------------------------

When DCPI is running in DADD mode, the daemon must be able to find and
read the binary image file associated with the application program to be
monitored. DADD needs to read and examine the program's instructions in
order to compute execution statistics by instruction type.

The DCPI daemon has a substantial but limited ability to locate binary
program images. If the daemon cannot find an image, it will not compute
execution statistics for the image and events will be undercounted. For
example, if a floating point intensive application program is monitored
and the daemon is unable to find the image for the program, the floating
point virtual counters (FA, FM, etc.) may be zero or may show just a few
floating point operations, depending on the libraries and other images
used by the program.

The DCPI daemon will find the image for a program if it is launched with
its full path from root:

    /usr/users/sixpack/flops-papi

The DCPI daemon will not find the image for a program if it is launched
from the process working directory:

    cd /usr/users/sixpack
    flops-papi

where flops-papi is an executable program in the directory
/usr/users/sixpack.

Performance improvement
-----------------------

This version of the daemon incorporates data structures and an algorithm
to improve performance. When the daemon is monitoring a process, it must
identify, extract and summarize the samples associated with the process.
This version of the daemon improves the identification and extraction of
samples by separating samples by process ID when certain inefficiencies
are detected. Without these changes, the daemon's performance degrades
over the span of its execution (i.e., the longer the daemon runs, the
worse its performance gets, until it is started afresh).

Known issues
------------

Three known issues exist at this time.

 1. The read/write permissions on shared memory regions must be made
    more restrictive. The current permissions are deliberately loose to
    assist debugging.

 2. There are a few Alpha instruction groups that need to be assigned
    to PAPI instruction types:

      * PALcode instructions
      * Branch to subroutine (BSR) instructions
      * JSR group instructions

 3. PAPI does not require a call to a clean-up function such as
    PAPI_shutdown to release resources at the end of a run. DADD needs
    to receive a stop monitoring request to unregister interest in a
    process and to release the associated shared memory segment. If an
    application does not clean up after it is done with PAPI, one or
    more shared memory regions will be orphaned. These regions can be
    found using the Tru64 UNIX ipcs command and can be manually released
    using the ipcrm command. One must be able to recognize the orphaned
    regions among the active shared memory regions, however.

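Known issues 1 and 3 both concern shared memory regions that are visible
to ipcs and ipcrm, which suggests System V shared memory. The sketch
below illustrates, under that assumption, what owner-only permissions and
an explicit release look like with the standard shmget and shmctl calls.
It is not DADD's code: the function names, the IPC_PRIVATE key, and the
0600 mode are hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a new segment that only its owner may read and write.
 * IPC_PRIVATE and mode 0600 are assumptions made for this sketch.
 * Returns the segment ID, or -1 on failure. */
int create_owner_only_segment(size_t size)
{
    assert(size > 0);
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Return the permission bits of a segment (for example 0600),
 * or -1 if the segment no longer exists or cannot be queried. */
int segment_mode(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (int)(ds.shm_perm.mode & 0777);
}

/* Mark a segment for removal, the programmatic counterpart of ipcrm.
 * The segment is destroyed once no process remains attached to it.
 * Returns 0 on success, -1 on failure. */
int release_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Whichever side owns a segment could call something like release_segment
on its shutdown path, so that an orphaned region does not have to be
hunted down by hand with ipcs and ipcrm.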
Background: Image instruction map
---------------------------------

The DCPI daemon, in DADD mode, uses the program image to compute an
internal data structure known as an "instruction map" or "inst_map."
The instruction map is an array in which each element corresponds to an
instruction in the image's text section and each element contains a set
of instruction properties including the instruction's type.

The daemon is not able to compute the instruction map for an image if it
cannot find and process the executable image. The daemon will note its
inability to find the image and to construct the instruction map by
logging a message in the DCPI log file. (The log file is in the DCPI
database directory.) Messages of the form:

    pcd_get_qualified_name: no qualified paths for image
    summarize_pdb_samples: NULL inst_map for ''

will be written to the log. In order to keep the log file small, only
the first 20 such conditions will be noted.

  > Application programs must be launched using the full path from root    <
  > to make sure that the daemon finds the image, computes the instruction <
  > map, and tallies the counts.                                           <

We noted and fixed an issue that arose when the daemon was unable to
find a program image. This issue caused the daemon to ignore external
messages such as DADD start/stop requests and quit requests. A dcpiquit
command, for example, would hang and not complete. DADD start/stop
requests and quit requests are synchronous; they send a message to the
daemon and then await a reply. If a reply is not received, the requestor
hangs.

Background: flush rate
----------------------

The DCPI daemon periodically performs certain processing, such as
updating the virtual counters. The period is determined by the "flush
interval." In DCPI Classic, the default flush interval is 3600 seconds.
In DADD mode, the default flush interval is 10 milliseconds.

The default flush interval can be overridden using the -DF option.
The option is specified as:

    -DF "flush interval"

where "flush interval" is a positive integer that specifies the maximum
flush period in milliseconds. The flush interval is more of a goal than
a hard real-time constraint.

Most internal testing performed so far has been done with -DF 1
specified. With no other overhead, this request of a 1 millisecond flush
interval should theoretically result in a flush rate of 1000 per second.
Please note, however, that the best flush rate actually achieved so far
has been on the order of 400 flushes/updates per second.

Requesting a high flush rate will cause the daemon to wake and work more
often, thereby raising the load that it puts on the system. Lower
flush/update rates place a smaller load on the system.
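
The arithmetic behind these figures can be written down as a small C
sketch. It only restates the relationship between a -DF value in
milliseconds and flushes per second; the function names are hypothetical
and are not part of dcpid or the DADD API.

```c
#include <assert.h>

/* Theoretical upper bound on flushes per second for a flush interval
 * given in milliseconds, the unit used by -DF. The interval must be
 * positive, as -DF requires. */
double theoretical_flush_rate(unsigned long interval_ms)
{
    assert(interval_ms > 0);
    return 1000.0 / (double)interval_ms;
}

/* Average period, in milliseconds, implied by an observed flush rate
 * expressed in flushes per second. The rate must be positive. */
double effective_period_ms(double flushes_per_second)
{
    assert(flushes_per_second > 0.0);
    return 1000.0 / flushes_per_second;
}
```

For example, theoretical_flush_rate(1) gives 1000 flushes per second and
theoretical_flush_rate(10) gives 100, while effective_period_ms(400.0)
gives 2.5, meaning that the best rate observed so far corresponds to an
average period of about 2.5 milliseconds rather than the requested 1.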