21264A Counter Configurations -- A DCPI "one pager" / PJD / 29 November 2001

In addition to ProfileMe, the Alpha 21264A can use its performance
counters in "aggregate" mode. In aggregate mode the counters simply
count occurrences of user-selected events, and the event counts are
associated with specific instructions by PC-sampling. The 21264A can
sample and measure four events:

    Event        Description
    -----        -----------
    cycles       Processor cycles
    retires      Retired instructions
    replaytrap   Memory box (Mbox) replay traps
    bmiss        L2 B-cache misses (or long-latency probes)

Aggregate mode and the specific events to measure must be specified
when starting the DCPI daemon. The events are specified using the
-slot option, which selects a set of event types to monitor
simultaneously using the set of available hardware performance
counters. You may measure these events individually:

    dcpid -slot cycles $DCPIDB

or in the following pair-wise combinations:

    dcpid -slot cycles+retires $DCPIDB
    dcpid -slot retires+bmiss $DCPIDB
    dcpid -slot retires+replaytrap $DCPIDB

Other combinations are supported by "time-multiplexing" (that is,
periodically reconfiguring) the performance counters.
Time-multiplexing permits a large set of events to be monitored, but
with a different sampling period, since the counter measuring the
events must be shared. The -slot option may be repeated to specify a
sequence of slots which are time-multiplexed onto the hardware
counters. The command:

    dcpid -slot bmiss -slot replaytrap $DCPIDB

collects both L2 B-cache miss and replay trap events.

ProfileMe mode supports alternative counter configurations, too. The
command:

    dcpid -slot pm $DCPIDB

collects ProfileMe samples and measures both inflight cycles and
retire delay. This is the default counter configuration. As shown in
the table below, DCPI and ProfileMe support four different counter
configurations.

    Configuration   Counter 0         Counter 1
    -------------   ---------------   --------------------------
    pm0             Retires           Inflight cycles
    pm              Inflight cycles   Retire delay
    pm2             Retires           L2 B-cache misses
    pm3             Inflight cycles   Memory system replay traps

For example, you can use the command:

    dcpid -slot pm2 $DCPIDB

to measure both the number of retired instructions and L2 B-cache
misses.

The dcpiprof and dcpilist command line syntax uses ProfileMe counter
names to modify event names. For example, use the command:

    dcpiprof -pm ret+ret:bcmisses app_prog

to display the number of retired instruction samples and the number
of L2 cache misses taken by retired instructions. Here are the DCPI
counter names:

    Counter name   Description
    ------------   ---------------------------------------------
    retdelay       Instruction retire delay
    inflight       Number of cycles the instruction was inflight
    bcmisses       Number of B-cache (L2) cache misses
    replays        Number of memory system replay traps

Remember, you must explicitly enable and collect data for certain
events in order to display the data later!
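
As a rough aid for that last point, the sketch below maps each counter
name to the ProfileMe configuration(s) that appear to collect it. The
function name pm_config_for is hypothetical and not part of DCPI, and
the mapping is an assumption: it pairs each counter name with the
configuration table column whose description matches.

```shell
#!/bin/sh
# Hypothetical helper (not part of DCPI): given a ProfileMe counter name,
# print the dcpid -slot configuration(s) that list a matching counter in
# the configuration table. The name-to-column pairing is an assumption
# based on the descriptions in the two tables.
pm_config_for() {
    case "$1" in
        retdelay) echo "pm" ;;          # Retire delay
        inflight) echo "pm0 pm pm3" ;;  # Inflight cycles
        bcmisses) echo "pm2" ;;         # L2 B-cache misses
        replays)  echo "pm3" ;;         # Memory system replay traps
        *)
            echo "unknown counter name: $1" >&2
            return 1
            ;;
    esac
}
```

For example, "pm_config_for bcmisses" prints pm2, which suggests
starting the daemon with "dcpid -slot pm2 $DCPIDB" before asking
dcpiprof for ret:bcmisses.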