We are trying to measure the usage of HyperTransport links. For that purpose, we use the following hardware counters: 0x0F6, 0x0F7 and 0x0F8 (our processors have three HT links). To obtain the usage of a link, we divide the count of data (unit mask 0x37) by the count of nop+data (unit mask 0x3f). We use perf to periodically read these counters on each processor (we also tried with an older kernel and perfmon).
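To make the computation concrete, here is a minimal Python sketch of what we do with the counters. The helper names (amd_raw_event, link_usage) are ours and purely illustrative. The raw event layout assumed here is the usual AMD one (event select bits 0-7 in the low byte, unit mask in bits 8-15, event select bits 8-11 in bits 32-35), so please double-check it against your kernel and the BKDG before relying on it.

```python
# Illustrative sketch only: helper names are hypothetical, not part of perf.
HT_LINK_EVENTS = (0x0F6, 0x0F7, 0x0F8)  # one event select per HT link
DATA_MASK = 0x37                        # unit mask used for "data"
DATA_AND_NOP_MASK = 0x3F                # unit mask used for "nop+data"


def amd_raw_event(event_select: int, unit_mask: int) -> str:
    """Build a raw event string (rNNNN), assuming the usual AMD layout."""
    config = (
        (event_select & 0xFF)              # event select bits 0-7
        | ((unit_mask & 0xFF) << 8)        # unit mask in bits 8-15
        | ((event_select & 0xF00) << 24)   # event select bits 8-11 -> bits 32-35
    )
    return f"r{config:x}"


def link_usage(data_count: int, data_and_nop_count: int) -> float:
    """Usage = data / (nop + data); returns 0.0 if nothing was counted."""
    if data_and_nop_count == 0:
        return 0.0
    return data_count / data_and_nop_count


if __name__ == "__main__":
    # Print the pair of raw events to count for each of the three links.
    for link, event in enumerate(HT_LINK_EVENTS):
        print(link,
              amd_raw_event(event, DATA_MASK),
              amd_raw_event(event, DATA_AND_NOP_MASK))
```

The two counts for a link are deltas over the same sampling interval; a link that only carries NOPs gives a usage of 0, and a link that always carries data gives 1.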
We want to benchmark two machines: one with four AMD Opteron 8356 processors (Barcelona, 4 cores per processor) and one with four AMD Opteron 8435 processors (Istanbul, 6 cores per processor). The processors are interconnected through HT links (version 1.0 on the 16-core machine and 3.0 on the 24-core machine). The interconnect topology is shown at the end of this message (P = Processor).
We ran a CPU burn benchmark (one thread per core, each basically spinning on a register) and we expected to see a very low link usage. On the 16-core machine, we measured an average usage lower than 1%. However, on the 24-core machine, we obtained very surprising measurements: the links of all processors show a usage of 50% (except the I/O links).
In other words, these hardware counters do not seem to behave consistently across the two machines.
Note that we also tried a microbenchmark that performs memory accesses to memory located on different memory nodes. The results are consistent on the 16-core machine but are also weird on the 24-core machine.
Has anybody already experienced such weird issues with the Istanbul architecture? Or did we misunderstand something?
Thanks in advance for your help,
I/O -- | P0 |-----| P1 |
         |      /    |
         |     /     |
         |    /      |
       | P2 |-----| P3 | -- I/O