AMD Processors
Topic Title: Temperature difference in a dual-CPU (Magny-Cours) machine
Created On: 10/30/2010 05:51 AM
Status: Read Only
 10/30/2010 05:51 AM
a.green
Lurker

Posts: 9
Joined: 09/26/2010

I have a chassis with two Opteron 6136s (8 cores each). While running 8 jobs, the temperatures reported by lm-sensors are 58 and 40 degrees Celsius. Is this normal? An 18-degree difference seems high to me, and it looks like the system cannot balance the load. Any ideas?
 10/31/2010 08:22 PM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Each CPU has 8 cores, so if your jobs are single-threaded, the OS is probably running them all on just one of the two CPUs. Recent Opterons reduce their clock speed and voltage considerably at idle, so I would absolutely expect a completely idle CPU to run at a much lower temperature than a fully loaded one.

Now if you were saying that both CPUs were fully loaded on all cores and one ran at 58 C and the other at 40 C, then it's probably an airflow issue. My two Opteron 6128s originally ran at 40 C on CPU0 and 60 C on CPU1 at full load. CPU0 sat right behind the front intake fans and got plenty of cool air, but CPU1 was at the back of the board and was bathed in hot exhaust from CPU0, the chipset, and my graphics card. CPU1's fan was also running at a fairly high speed. I changed the fan arrangement quite a bit, and now they both run at 50-55 C at full load, but the heatsink fans all run at minimum speed (the fan speed ramp-up temperature is 60 C, and they never get that hot).

-------------------------
 11/03/2010 06:59 AM
a.green
Lurker

Posts: 9
Joined: 09/26/2010

No. As I said, I run 8 jobs. In other words, I have 16 cores and 8 jobs. I have noticed that all 8 jobs are assigned to one CPU, so one CPU is fully loaded while the other is idle, which makes an 18-degree difference in temperature. I don't know if it is a CPU issue or an OS issue, but I think the most balanced situation is running 4 jobs on one CPU and 4 jobs on the other. Then the temperature will be balanced.
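One way to confirm which cores the jobs are actually on is to ask Linux which CPU each process last ran on. `ps -eo pid,psr,comm` shows this in its `psr` column; the sketch below reads the same value directly from field 39 ("processor") of `/proc/<pid>/stat`. The script name `lastcpu.py` is only an illustrative placeholder, and the sketch assumes Linux and Python 3.

```python
# Sketch: report which CPU a Linux process last ran on, using /proc/<pid>/stat.
import sys

def last_cpu_from_stat(stat_line):
    """Return field 39 ("processor") of a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split only after the last ')'.
    """
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()        # fields[0] is field 3 ("state")
    return int(fields[39 - 3])   # so field 39 is fields[36]

def last_cpu_of_pid(pid):
    """Read /proc/<pid>/stat (Linux only) and return the CPU it last ran on."""
    with open("/proc/%d/stat" % pid) as f:
        return last_cpu_from_stat(f.read())

if __name__ == "__main__":
    # Usage (hypothetical file name): python3 lastcpu.py PID [PID ...]
    for arg in sys.argv[1:]:
        print("PID %s last ran on CPU %d" % (arg, last_cpu_of_pid(int(arg))))
```

If every job reports a CPU number from the same socket's range, the scheduler has packed them onto one package, as described above.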

Currently, with one at 58 degrees and the other at 40 degrees, the first CPU will be more susceptible to failure and will reach end of life sooner.
 11/03/2010 05:46 PM
talkinggoat
Lurker

Posts: 3
Joined: 02/16/2007

It's an OS issue. If you're on Windows 7 or Server 2008, you can change the processor affinity in Task Manager.
 11/04/2010 03:10 AM
a.green
Lurker

Posts: 9
Joined: 09/26/2010

Well, I have Ubuntu 10.10.
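On Linux, the counterpart of Task Manager's affinity setting is the `sched_setaffinity` system call; from a shell, `taskset` from util-linux does the same thing per process (e.g. `taskset -c 0 ./job`, where `./job` is a placeholder). The Python sketch below spreads jobs 4 and 4 across the two sockets, as suggested earlier in the thread. It assumes Python 3.3+ for `os.sched_setaffinity` (Ubuntu 10.10's default Python 2.6 does not have it, so `taskset` is the practical route there), and the CPU-to-socket layout in `SOCKETS` is a hypothetical assumption that should be checked against the real topology.

```python
# Sketch: spread single-threaded jobs across two sockets and pin each one.
# Requires Linux and Python 3.3+ (os.sched_setaffinity).
import os
import subprocess

# HYPOTHETICAL layout: CPUs 0-7 on socket 0, CPUs 8-15 on socket 1.
# Verify against /sys/devices/system/cpu/cpuN/topology/physical_package_id.
SOCKETS = [list(range(0, 8)), list(range(8, 16))]

def plan_affinity(n_jobs, sockets):
    """Alternate jobs between sockets, one CPU per job.

    Job 0 goes to the first CPU of socket 0, job 1 to the first CPU of
    socket 1, job 2 to the second CPU of socket 0, and so on.
    """
    plan = []
    used = [0] * len(sockets)              # CPUs handed out per socket so far
    for job in range(n_jobs):
        s = job % len(sockets)
        cpus = sockets[s]
        plan.append({cpus[used[s] % len(cpus)]})
        used[s] += 1
    return plan

def launch_pinned(commands, sockets=SOCKETS):
    """Start each command (a list of argv strings) pinned to its planned CPU."""
    procs = []
    for cmd, cpus in zip(commands, plan_affinity(len(commands), sockets)):
        # preexec_fn runs in the child before exec, so the job starts pinned;
        # pid 0 means "the calling process". Affinity is inherited across exec.
        procs.append(subprocess.Popen(
            cmd, preexec_fn=lambda c=cpus: os.sched_setaffinity(0, c)))
    return procs
```

For 8 jobs this plans CPUs 0, 8, 1, 9, 2, 10, 3, 11, i.e. four jobs per socket; something like `launch_pinned([["./job", str(i)] for i in range(8)])` would start them (again with `./job` as a placeholder).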
 11/05/2010 11:27 AM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Originally posted by: a.green

No. As I said, I run 8 jobs. In other words, I have 16 cores and 8 jobs. I have noticed that all 8 jobs are assigned to one CPU, so one CPU is fully loaded while the other is idle, which makes an 18-degree difference in temperature. I don't know if it is a CPU issue or an OS issue, but I think the most balanced situation is running 4 jobs on one CPU and 4 jobs on the other. Then the temperature will be balanced.


I was not sure how many threads each of those jobs launched, which is why I asked. Some of my video-encoding runs have a single job (process) using all 16 cores. Apparently your jobs are all single-threaded.

The OS scheduler is why you have all 8 jobs on one CPU and none on the other. To make a long story short, communication between cores on the same CPU has higher bandwidth and lower latency than communication between the two separate CPUs, so the scheduler is trying to maximize performance. The jobs probably stay on that one CPU instead of moving to the other because moving jobs between CPUs can increase memory access times, so keeping everything in place can lead to better performance.
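To see which logical CPUs the scheduler treats as belonging to the same physical CPU, Linux exposes a `physical_package_id` file for each CPU under `/sys/devices/system/cpu/cpuN/topology/`. The sketch below groups CPU ids by that value; the `root` parameter exists only so the function can be pointed at a copy of the tree, and the output format is illustrative.

```python
# Sketch: group logical CPUs by physical package (socket) via Linux sysfs.
from collections import defaultdict
from pathlib import Path

def cpus_by_package(root="/sys/devices/system/cpu"):
    """Return {package_id: [cpu ids...]} from cpuN/topology/physical_package_id."""
    packages = defaultdict(list)
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for topo in Path(root).glob("cpu[0-9]*/topology/physical_package_id"):
        cpu = int(topo.parent.parent.name[3:])   # "cpu12" -> 12
        packages[int(topo.read_text())].append(cpu)
    return {pkg: sorted(cpus) for pkg, cpus in sorted(packages.items())}

if __name__ == "__main__":
    for pkg, cpus in cpus_by_package().items():
        print("socket %d: CPUs %s" % (pkg, cpus))
```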

Currently, with one at 58 degrees and the other at 40 degrees, the first CPU will be more susceptible to failure and will reach end of life sooner.


I would not worry. 58 degrees is well within the safe operating temperature range of the CPU. The operating specifications are determined very carefully to ensure that few or no CPUs will fail during the 3-year warranty period, because such failures would be an enormous expense and a PR disaster for the company. (Look up the "Pentium FDIV bug" to see what a defect in a large number of shipped processors did to a CPU maker.) Thus the CPUs are really over-engineered and will last many years if run within spec. I've seen failure rates quoted at about 1 in 100 CPUs failing over 10 years of in-spec operation. I would not worry at all.
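As a rough illustration of what the quoted "1 in 100 over 10 years" figure means per year: taking it at face value and assuming, as a simplification, a constant failure rate over time, the yearly and 3-year (warranty-period) failure probabilities work out as follows.

```latex
p_{10\,\mathrm{yr}} = 0.01
\;\Rightarrow\;
p_{1\,\mathrm{yr}} = 1 - (1 - 0.01)^{1/10} \approx 0.0010 \;(0.10\%),
\qquad
p_{3\,\mathrm{yr}} = 1 - (1 - 0.01)^{3/10} \approx 0.0030 \;(0.30\%)
```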

You will generally only ruin a CPU by grossly overheating it, running it outside its rated specifications, subjecting it to a high-voltage electrical discharge, or physically damaging it. A server CPU installed in a running system is very unlikely to be physically damaged, and the risk of a high-voltage discharge is also very small, especially if you have a good-quality surge protector. Gross overheating should not be possible, since the system will shut down if the CPU overheats, preventing damage from occurring. Emergency thermal shutdown mechanisms have been on CPUs for the better part of a decade, and they work well. Running the CPU out of spec is also not going to happen with a server CPU, as server motherboards generally provide no way to configure out-of-spec operation. People running desktop CPUs on desktop motherboards can do things like raise the clock speed and voltage, which can shorten a CPU's lifespan (and void the warranty), but no currently shipping Opteron motherboard allows this.

-------------------------
 11/07/2010 08:26 AM
a.green
Lurker

Posts: 9
Joined: 09/26/2010

Thanks for that. One more thing...
Please see this output from the "top" command:
Cpu0 : 81.5%us, 5.0%sy, 0.0%ni, 8.6%id, 4.6%wa, 0.3%hi, 0.0%si, 0.0%st
Cpu1 : 89.4%us, 1.3%sy, 0.0%ni, 8.9%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 60.6%us, 1.3%sy, 0.0%ni, 38.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 73.1%us, 2.3%sy, 0.0%ni, 22.7%id, 1.9%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 96.0%us, 1.0%sy, 0.0%ni, 0.0%id, 3.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 57.8%us, 2.3%sy, 0.0%ni, 38.9%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 77.8%us, 3.0%sy, 0.0%ni, 11.9%id, 7.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 60.7%us, 3.0%sy, 0.0%ni, 36.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 78.5%us, 2.3%sy, 0.0%ni, 17.2%id, 2.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 86.8%us, 3.6%sy, 0.0%ni, 5.6%id, 4.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 92.1%us, 2.0%sy, 0.0%ni, 0.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 91.4%us, 3.6%sy, 0.0%ni, 2.0%id, 3.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 9.6%us, 2.2%sy, 0.0%ni, 80.9%id, 7.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 94.4%us, 3.0%sy, 0.0%ni, 0.0%id, 2.6%wa, 0.0%hi, 0.0%si, 0.0%st
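To compare load per socket rather than per core, the per-CPU lines that top prints (the view toggled with the "1" key) can be parsed and averaged. The sketch below assumes the line format shown above and, hypothetically, that CPUs 0-7 sit on one socket and 8-15 on the other; "busy" is taken as 100 minus the %id column.

```python
# Sketch: summarize top's per-CPU lines (the view toggled with "1") by socket.
import re

CPU_LINE = re.compile(r"Cpu(\d+)\s*:(.*)")
FIELD = re.compile(r"([\d.]+)%(\w+)")

def parse_top_cpus(text):
    """Map CPU id -> {"us": 81.5, "sy": 5.0, "id": 8.6, ...} from top output."""
    stats = {}
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if m:
            stats[int(m.group(1))] = {
                name: float(value) for value, name in FIELD.findall(m.group(2))
            }
    return stats

def busy_by_socket(stats, sockets):
    """Average busy time (100 - %id) over the CPUs of each socket."""
    return [sum(100.0 - stats[c]["id"] for c in cpus) / len(cpus)
            for cpus in sockets]

# HYPOTHETICAL layout: CPUs 0-7 on one socket, 8-15 on the other.
SOCKETS = [range(0, 8), range(8, 16)]
```

With the output posted above, 14 of the 16 CPUs show more than 60% user time, and the averages come out to roughly 79% busy for CPUs 0-7 and 87% busy for CPUs 8-15, so under the assumed layout neither socket is idle.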

Then I used the "sensors" command to check the temperatures:
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +57.4°C (high = +70.0°C, crit = +70.0°C)

k10temp-pci-00cb
Adapter: PCI adapter
temp1: +57.4°C (high = +70.0°C)

k10temp-pci-00d3
Adapter: PCI adapter
temp1: +33.8°C (high = +70.0°C)

k10temp-pci-00db
Adapter: PCI adapter
temp1: +33.8°C (high = +70.0°C, crit = +70.0°C)

Do you think these values are reliable? As you can see, fourteen of the sixteen cores have user utilization above 60%.
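To track the gap between sensors over time, the text output of `sensors` can be parsed into one reading per chip. This is a sketch that assumes the output layout shown above (a chip header line without a colon, then "name: +NN.N°C" lines); it keeps only the first temperature under each chip. Each Opteron 6100 package contains two dies, which is presumably why four k10temp entries appear for two sockets.

```python
# Sketch: pull one temperature per chip out of lm-sensors' text output.
import re

TEMP = re.compile(r"^(\w+):\s*\+?(-?[\d.]+)°C")

def parse_sensors(text):
    """Map chip name -> first temperature (deg C) listed under that chip.

    A chip block starts with a header line without a colon, such as
    "k10temp-pci-00c3"; "Adapter: ..." and non-temperature lines are skipped.
    """
    temps = {}
    chip = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" not in line:
            chip = line              # start of a new chip block
            continue
        m = TEMP.match(line)
        if m and chip is not None and chip not in temps:
            temps[chip] = float(m.group(2))
    return temps

def spread(temps):
    """Hottest minus coolest reading, in deg C."""
    return max(temps.values()) - min(temps.values())
```

For the readings posted above, the spread between the hottest and coolest k10temp entries is 57.4 - 33.8 = 23.6 degrees.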
 11/12/2010 10:10 AM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

I can't tell you whether those temps are actually reliable. The best way to check is to open up the case and feel near the heatsinks. A heatsink in the low 30s Celsius will feel only slightly above room temperature; one at 50-something Celsius will feel very warm. If the heatsinks do feel as warm as the readings suggest, then I'd start looking for airflow issues causing the first CPU to run warmer than the second. I originally had one of my 6128s running at 40 C and the other at 60 C with all 16 cores loaded because of poor airflow through my case. The CPU running at 40 C was right behind four 92 mm intake fans, while the CPU running at 60 C was at the top rear of the case and bathed in hot exhaust air from the first CPU as well as the chipset and GPU. I modified the fan configuration, and now both of them run in the low 50s Celsius (albeit at a much lower fan RPM than before).

-------------------------
 12/15/2010 10:04 AM
PhilB
Newbie

Posts: 32
Joined: 04/26/2010

If you're using Windows Server 2008, there is a power management setting that affects performance. If it's set to conserve power, your OS may direct most jobs to one chip and let the other idle to save the 80-ish watts it would use at full speed. I use mine as a workstation, so I set it to use all the power it can get to improve performance.

www.win2008workstation.com has directions on how to adjust the settings for workstation use if you are interested. Some of the directions will help spread the processes evenly across both processors.
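The power plan can also be switched from a script with Windows' built-in `powercfg` tool, whose `SCHEME_MIN` alias selects the High performance plan (minimum power savings). This is a hedged sketch: the helper name `set_power_plan_command` is illustrative, it only runs the commands on Windows, and whether the scheduler then spreads jobs differently is the behaviour described above, not something the script checks.

```python
# Sketch: switch Windows to the High performance power plan via powercfg.
import subprocess
import sys

# powercfg aliases: SCHEME_MIN = High performance, SCHEME_BALANCED = Balanced,
# SCHEME_MAX = Power saver.
def set_power_plan_command(alias="SCHEME_MIN"):
    """Build the powercfg command line that activates the given plan alias."""
    return ["powercfg", "/setactive", alias]

if __name__ == "__main__" and sys.platform == "win32":
    subprocess.run(["powercfg", "/getactivescheme"], check=True)  # show current plan
    subprocess.run(set_power_plan_command(), check=True)          # switch plans
```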

-------------------------
Phil
----------
System Specs:

2x Opteron 6128
Asus KGPE-16
8x Kingston 4GB 1333MHz DDR3 w/ ECC, Registered, Parity
OCZ Vertex2 50GB (OS)
2x WD Blue 500GB (Data storage)
PowerColor 5770
3x Viewsonic 19" widescreen at 1680x1050
Coolermaster ATCS-840 case
Coolermaster UCP 1100W
Installed OS: Win2008
VirtualBox OSs: Ubuntu/Fedora/Win2k/WinXp/Dos/Vista (set up to use second keyboard and mouse)
 05/24/2011 09:02 AM
ella1985
Lurker

Posts: 3
Joined: 05/24/2011

Yes, I think it is normal.