AMD Processors
Topic Title: Problem with performance counters on Opteron 6172
Created On: 08/07/2012 01:48 PM
Status: Read Only
 08/07/2012 01:48 PM
Aleksr9
Lurker

Posts: 1
Joined: 08/07/2012

Hi,
I've been trying to analyze certain applications with performance counters on an Opteron 6172, running Red Hat Enterprise Linux Workstation release 6.2 (Santiago).

I'm using PAPI v4.1.3.0, which uses the AMD native event CPU_CLK_UNHALTED for counting total cycles and DATA_CACHE_ACCESSES for counting L1 data cache accesses.

http://support.amd.com/us/Processor_TechDocs/31116.pdf
- CPU_CLK_UNHALTED
The number of clocks that the CPU is not in a halted state (due to STPCLK or a HLT instruction). Note: this
event allows system idle time to be automatically factored out from IPC (or CPI) measurements, providing the
OS halts the CPU when going idle. If the OS goes into an idle loop rather than halting, such calculations are
influenced by the IPC of the idle loop.


- DATA_CACHE_ACCESSES
The number of accesses to the data cache for load and store references. This may include certain microcode
scratchpad accesses, although these are generally rare. Each increment represents an eight-byte access,
although the instruction may only be accessing a portion of that. This event is a speculative event.


The problem I've been experiencing is that, in some cases, the number of L1 data cache accesses is higher than the total number of cycles. To my understanding, a cache access does not halt the CPU, so the accesses should fit within the total cycles. Also, when I divide the total cycles by the clock frequency of the Opteron 6172, I get a pretty accurate estimate of the runtime, which makes me think that the total cycle count is OK and the problem has to be with the counting of the data cache accesses.
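
For illustration, the two checks described above (runtime from cycles, and accesses compared with cycles) could be sketched roughly as follows. The counter values are made-up placeholders rather than measurements, and the 2.1 GHz nominal clock for the Opteron 6172 is an assumption that ignores any frequency scaling.

```python
# Rough sanity checks on two PAPI counter readings.
# All numbers below are hypothetical placeholders, not real measurements.

CLOCK_HZ = 2.1e9  # assumed nominal clock of the Opteron 6172


def estimated_runtime_seconds(cycles, clock_hz=CLOCK_HZ):
    """Estimate wall time from an unhalted-cycle count."""
    return cycles / clock_hz


def accesses_per_cycle(accesses, cycles):
    """Ratio of counted L1 data cache accesses to unhalted cycles."""
    return accesses / cycles


if __name__ == "__main__":
    total_cycles = 4_200_000_000   # CPU_CLK_UNHALTED (placeholder)
    l1d_accesses = 5_000_000_000   # DATA_CACHE_ACCESSES (placeholder)

    print("estimated runtime (s):", estimated_runtime_seconds(total_cycles))
    print("accesses per cycle:", accesses_per_cycle(l1d_accesses, total_cycles))
```

A ratio above 1.0 is the situation described in this post; comparing the estimated runtime against a wall-clock measurement is what suggests the cycle count itself is plausible.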

I understand a core can issue two cache loads/stores per cycle, but the cost of even half the accesses would be too great to fit within the total cycles.
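
As a minimal sketch of that bound, assuming the limit of two loads/stores per cycle mentioned above really is the per-core ceiling, a measured ratio can be sorted into three cases. The function name and the labels are hypothetical, chosen only for this illustration. Because the event is documented as speculative and as counting eight-byte accesses, the ratio should be read as an indication rather than an exact figure.

```python
# Classify a measured accesses-to-cycles ratio against an assumed issue limit.

MAX_ACCESSES_PER_CYCLE = 2  # assumption taken from the post: two loads/stores per cycle


def classify_ratio(accesses, cycles, limit=MAX_ACCESSES_PER_CYCLE):
    """Return a short label describing where accesses/cycles falls."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    ratio = accesses / cycles
    if ratio <= 1:
        return "at most one access per cycle"
    if ratio <= limit:
        return "more than one access per cycle, within the assumed limit"
    return "exceeds the assumed limit"


if __name__ == "__main__":
    # Placeholder readings, not real measurements.
    print(classify_ratio(accesses=5_000_000_000, cycles=4_200_000_000))
```

Under this assumption, only a ratio above 2 would contradict the stated issue width; a ratio between 1 and 2 would not by itself show that the counter is wrong.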

Any help, or an explanation of why this can occur, is greatly appreciated. Thanks in advance!

/Aleks