PALcode and Tru64 kernel -- A DCPI "one pager" / PJD / 27 September 2001

DCPI collects data on all system components including the Privileged Architecture Library code (PALcode) and the Tru64 operating system kernel. The following example (dcpiprof -i -pm retired) shows the number of retired instructions in an application (walk), the kernel (/vmunix) and PALcode:

   retired
    :count      %    cum%  image
    203632 62.18%  62.18%  walk
    122254 37.33%  99.50%  /vmunix
      1243  0.38%  99.88%  /dsk0h/dcpidb/PALcode

Both PALcode and the kernel can be broken out by procedure and instruction.

PALcode

DCPI re-creates the PALcode image in the DCPI database directory. Enter:

    dcpiprof -pm retired /dsk0h/dcpidb/PALcode

to display a procedure-by-procedure break out of the number of retired instructions in the PALcode. Here are the first few items produced by the dcpiprof command using the example database:

   retired
    :count      %    cum%  procedure           image
       829 66.69%  66.69%  dtbm_single         /dsk0h/dcpidb/PALcode
        74  5.95%  72.65%  swpipl_cont         /dsk0h/dcpidb/PALcode
        69  5.55%  78.20%  dtbm_double_3_cont  /dsk0h/dcpidb/PALcode
        43  3.46%  81.66%  swpipl              /dsk0h/dcpidb/PALcode

PALcode is the bridge from application code to the kernel. Data Translation Buffer (DTB) and Instruction Translation Buffer (ITB) misses are handled by PALcode, which updates the page map information. Excessive numbers of DTB or ITB misses will appear as high retire and cycle counts in the PALcode routines that handle those misses.

ITB/DTB miss handling is computational overhead. The number of retired instructions due to miss handling should be subtracted from total system level retires to state system level performance ratios realistically.
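
The subtraction described above can be sketched in a few lines of Python. This is an illustration only, not part of DCPI: the function name is hypothetical, the dtbm_* procedures in the sample output are assumed here to be DTB miss handlers, and the "total" is simply the sum of the three images listed in the first table (the full database contains a few more samples).

```python
def retires_excluding_miss_handling(total_retired, miss_handler_retired):
    """Return total retires minus the retires spent in TLB miss handlers.

    total_retired        -- system level retired-instruction samples
    miss_handler_retired -- dict mapping PALcode procedure name to its
                            retired-instruction samples
    """
    overhead = sum(miss_handler_retired.values())
    if overhead > total_retired:
        raise ValueError("miss-handler retires exceed the total")
    return total_retired - overhead


# Sample counts copied from the dcpiprof output above.
# Sum of walk, /vmunix and PALcode only (an approximation of the total).
TOTAL_RETIRED = 203632 + 122254 + 1243

# Assumption: the dtbm_* procedures are the DTB miss handlers.
MISS_HANDLERS = {"dtbm_single": 829, "dtbm_double_3_cont": 69}

ADJUSTED_RETIRED = retires_excluding_miss_handling(TOTAL_RETIRED, MISS_HANDLERS)
print("total retires:   ", TOTAL_RETIRED)
print("adjusted retires:", ADJUSTED_RETIRED)
```

In this example the miss handlers account for well under 1% of the retires, so the correction is small; it matters more for workloads with heavy DTB or ITB miss rates.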
Tru64 kernel

Use the following dcpiprof command to display a procedure-by-procedure break out of cycles and retires in the Tru64 operating system kernel:

    dcpiprof -event cycles -pm retired /vmunix

Here is some sample output:

                           retired
    cycles      %    cum%   :count      %  procedure       image
     44085 81.10%  81.10%   119894 98.07%  idle_thread     /vmunix
      1231  2.26%  83.37%      125  0.10%  ufs_sync_int    /vmunix
       864  1.59%  84.96%       40  0.03%  _XentInt        /vmunix
       863  1.59%  86.55%      309  0.25%  hardclock       /vmunix
       707  1.30%  87.85%      195  0.16%  pmap_zero_page  /vmunix
       687  1.26%  89.11%       76  0.06%  simple_lock     /vmunix
       315  0.58%  89.69%       62  0.05%  clock_tick      /vmunix

The first procedure, "idle_thread," is the idle loop, where the Tru64 kernel waits for interrupts and other events when it has no ready process to run. When the data for the example was captured, the kernel was forced to idle between the shell commands needed to start the DCPI collection service, run the application program, and stop the collection service.

The idle loop does not trap and has an excellent retire/cycle ratio. Its effect must be eliminated from system level estimates of retire/cycle by subtracting the idle loop's retire and cycle samples from the total system level retire and cycle samples. Otherwise, the system level retire/cycle ratio will be too favorable. The idle loop does not affect other image, procedure or instruction level estimates.

Idle time can be reduced by putting the measurement commands in a shell script, on a single command line, etc.
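
The idle-loop correction can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function name is hypothetical, the calculation is restricted to the /vmunix image because system-wide cycle totals are not shown above, and the kernel cycle total is only an estimate derived from the rounded 81.10% figure. The ratio is computed on raw sample counts, so it is useful for comparing "with idle" against "without idle" rather than as an absolute instructions-per-cycle value.

```python
def retire_per_cycle_excluding_idle(total_retired, total_cycles,
                                    idle_retired, idle_cycles):
    """Return the retire/cycle sample ratio with the idle loop removed."""
    retired = total_retired - idle_retired
    cycles = total_cycles - idle_cycles
    if retired < 0 or cycles <= 0:
        raise ValueError("idle samples must be smaller than the totals")
    return retired / cycles


# idle_thread samples copied from the dcpiprof output above.
IDLE_CYCLES = 44085
IDLE_RETIRED = 119894

# Retired samples for /vmunix, from the image-level output.
KERNEL_RETIRED = 122254

# Assumption: /vmunix cycle samples are not listed directly, so they are
# estimated from idle_thread's 81.10% share of kernel cycles.
KERNEL_CYCLES_ESTIMATE = round(IDLE_CYCLES / 0.8110)

WITH_IDLE = KERNEL_RETIRED / KERNEL_CYCLES_ESTIMATE
WITHOUT_IDLE = retire_per_cycle_excluding_idle(
    KERNEL_RETIRED, KERNEL_CYCLES_ESTIMATE, IDLE_RETIRED, IDLE_CYCLES)

print("retire/cycle samples, idle included: %.3f" % WITH_IDLE)
print("retire/cycle samples, idle excluded: %.3f" % WITHOUT_IDLE)
```

With the idle loop included, the ratio is dominated by idle_thread and looks far better than the ratio for the work the kernel actually performed, which is the distortion the text warns about.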