AMD Processors
Topic Title: Troubleshooting with HyperTransport link hardware counters
Created On: 01/30/2012 04:26 PM
Status: Read Only
 01/30/2012 04:26 PM

Fabien Gaud
Lurker

Posts: 2
Joined: 01/29/2012

Hi,

We are trying to evaluate the usage of HyperTransport links. To that end, we use the following hardware counters: 0x0F6, 0x0F7 and 0x0F8 (our processors have three HT links). To obtain the usage of a link, we divide the amount of data (mask 0x37) by the amount of nop+data (mask 0x3f). We use perf to periodically read these counters on each processor (we also tried an older kernel with perfmon).
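The ratio described above can be sketched as follows. This is only an illustration, not the poster's actual tooling: the helper names `amd_raw_event` and `link_usage` are hypothetical, the sample counts are made up, and the raw-event packing assumes the usual AMD performance-counter layout (event select bits 7:0 in bits 7:0, mask in bits 15:8, event select bits 11:8 in bits 35:32), which should be checked against the BKDG for the processor family in question.

```python
# Hypothetical helpers illustrating the usage computation described in the post.

HT_LINK_EVENTS = (0x0F6, 0x0F7, 0x0F8)  # one event per HT link (from the post)
DATA_MASK = 0x37    # data only (from the post)
TOTAL_MASK = 0x3F   # nop + data (from the post)


def amd_raw_event(event_select: int, mask: int) -> int:
    """Pack an event select and mask into a raw counter config.

    Assumed layout: event[7:0] -> bits 7:0, mask -> bits 15:8,
    event[11:8] -> bits 35:32.
    """
    low = event_select & 0xFF
    high = (event_select >> 8) & 0xF
    return low | ((mask & 0xFF) << 8) | (high << 32)


def link_usage(data_count: int, total_count: int) -> float:
    """Return data / (nop + data); 0.0 if nothing was counted."""
    if total_count == 0:
        return 0.0
    return data_count / total_count


if __name__ == "__main__":
    for link, event in enumerate(HT_LINK_EVENTS):
        data_cfg = amd_raw_event(event, DATA_MASK)
        total_cfg = amd_raw_event(event, TOTAL_MASK)
        print(f"link {link}: data=r{data_cfg:x} total=r{total_cfg:x}")
    # Made-up counts, only to show the arithmetic.
    print(link_usage(data_count=500, total_count=1000))
```

The printed `r...` strings have the form that perf accepts for raw events (for example `perf stat -e r...`); whether a given kernel exposes these northbridge events per processor is a separate question.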

We want to benchmark two architectures: one composed of 4 AMD Opteron 8356 processors (Barcelona, 4 cores per processor) and one composed of 4 AMD Opteron 8435 processors (Istanbul, 6 cores per processor). The processors are interconnected through HT links (version 1.0 for the 16-core machine and 3.0 for the 24-core machine), and the interconnect topology is shown at the end of this message (P = processor).

We ran a CPU burn benchmark (one thread per core, which basically spins on a register) and expected to see very low link usage. On the 16-core architecture, we measured an average usage lower than 1%. On the 24-core architecture, however, we obtained very surprising measurements: the links of all processors show 50% usage (except the I/O links).

In other words, these hardware counters do not seem consistent.
Note that we also tried a microbenchmark that performs memory accesses to locations on different memory nodes. The results are consistent on the 16-core architecture but are also weird on the 24-core architecture.

Has anybody already experienced such weird issues with the Istanbul architecture? Or did I misunderstand something?

Thanks in advance for your help,

Code:
       ------     ------
I/O -- | P0 |-----| P1 |
       ------     ------
          |     /   |
          |    /    |
          |   /     |
       ------     ------
       | P2 |-----| P3 | -- I/O
       ------     ------       
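
As a cross-check of the diagram above, the topology can be written down as an adjacency list. This is a sketch under assumptions: the link list is read off the ASCII diagram (including the diagonal P1 to P2 link), and the names `TOPOLOGY`, `links_per_processor` and `is_symmetric` are hypothetical. It confirms that every processor ends up with three links, matching the three counters mentioned in the post.

```python
# Adjacency list read off the ASCII diagram; "I/O" marks a non-coherent I/O link.
TOPOLOGY = {
    "P0": ["I/O", "P1", "P2"],
    "P1": ["P0", "P2", "P3"],
    "P2": ["P0", "P1", "P3"],
    "P3": ["P1", "P2", "I/O"],
}


def links_per_processor(topology: dict) -> dict:
    """Count the HT links attached to each processor."""
    return {node: len(peers) for node, peers in topology.items()}


def is_symmetric(topology: dict) -> bool:
    """Check that every processor-to-processor link appears on both ends."""
    for node, peers in topology.items():
        for peer in peers:
            if peer == "I/O":
                continue
            if node not in topology[peer]:
                return False
    return True


if __name__ == "__main__":
    print(links_per_processor(TOPOLOGY))
    print(is_symmetric(TOPOLOGY))
```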
 01/31/2012 12:27 PM

MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Istanbul has HT Assist (a probe filter) and HyperTransport version 3, whereas Barcelona has no HT Assist and uses HyperTransport version 2. Could this be the cause of your incorrect performance counter readings?

 01/31/2012 12:35 PM

Fabien Gaud
Lurker

Posts: 2
Joined: 01/29/2012

Thanks for your answer.

I don't think this is caused by HT Assist. The purpose of HT Assist is to prevent unnecessary cache coherency requests on the links, so the measured HT link usage should be even lower on the Istanbul architecture than on the Barcelona one.

Maybe the HyperTransport version is the cause of these weird results, but I don't know why.