I'm currently working on a shared-memory parallelism module for a metaheuristics framework. I'm running some benchmarks on a machine with two AMD Opteron 6164 HE processors, i.e. 24 physical cores according to the specs.
I observe something strange: the speedup drops dramatically when I use an even number of threads. I think this is due to how well the OS I'm using (Debian Squeeze) supports this hardware, but I'm not sure.
My guess is that the OS does not schedule the threads correctly across the cores, so the work ends up unbalanced, which results in poor performance.
Unfortunately, I have no single-processor node to test on, so could you tell me whether you have already faced this problem?
Do you have any clues that could explain this behaviour?
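
To make the unbalanced-work hypothesis above checkable, here is a minimal sketch in Python of the quantities I would compare between runs with odd and even thread counts. It is illustrative only: the function names and all the timing values are hypothetical placeholders, not measurements from my benchmarks or code from the framework.

```python
from statistics import mean


def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p), from wall-clock times."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Parallel efficiency E(p) = S(p) / p; 1.0 is ideal."""
    return speedup(t_serial, t_parallel) / n_threads


def load_imbalance(per_thread_busy_times):
    """Ratio of the busiest thread's time to the mean busy time.

    1.0 means perfectly balanced work; larger values mean one
    thread is holding the others back.
    """
    return max(per_thread_busy_times) / mean(per_thread_busy_times)


if __name__ == "__main__":
    # Hypothetical per-thread busy times in seconds (assumed values).
    balanced = [10.0, 10.0, 10.0, 10.0]
    skewed = [16.0, 8.0, 8.0, 8.0]

    print("imbalance (balanced):", load_imbalance(balanced))
    print("imbalance (skewed):  ", load_imbalance(skewed))
    print("efficiency:", efficiency(40.0, 10.0, 4))
```

If the imbalance ratio stays close to 1.0 for even thread counts while the efficiency still collapses, unbalanced work is probably not the cause and something else (for example thread placement or memory locality) would be worth looking at.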