AMD Processors
Topic Title: MCE log decoding.
Created On: 03/13/2007 07:12 PM
Status: Read Only
 03/13/2007 07:12 PM
drescherjm
Member

Posts: 20
Joined: 05/13/2006

I am getting the following errors very frequently on a Linux server at work, and it is beginning to be a concern. I am using a TYAN 2882 motherboard with two 2GB Patriot PC3200 registered ECC sticks that I believe were on the recommended list when I bought them 8 months ago, but TYAN has since changed its web page and now lists only 4 recommended DIMMs for this board. One thing I notice is that the syndrome is always d5de and the address is always 212080. Does this definitely mean memory and not a PCI card? I want to ask first because this box is headless and sits in a network closet, so it's hard to get at. Before anyone suggests this is heat related: the closet does have AC.


# mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC ff961e93a1f14
ADDR 212080
Northbridge Chipkill ECC error
Chipkill ECC syndrome = d5de
bit32 = err cpu0
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d46f4001d5080813 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC ff9df289de662
ADDR 212080
Northbridge Chipkill ECC error
Chipkill ECC syndrome = d5de
bit32 = err cpu0
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d46f4001d5080813 MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC ffa5c694a080f
ADDR 212080
Northbridge Chipkill ECC error
Chipkill ECC syndrome = d5de
bit32 = err cpu0
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d46f4001d5080813 MCGSTATUS 0
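
For anyone decoding these records by hand, the STATUS value can be split into its fields. Below is a minimal Python sketch for the value d46f4001d5080813 shown above. The architectural bits (63 valid, 62 overflow, 61 uncorrected, 60 enabled, 58 address valid, 57 context corrupt) and the low 16-bit bus-error code follow the generic x86 machine-check layout. The K8-specific positions (bit 32 err cpu0, bits 45/46 uncorrected/corrected ECC, and a Chipkill syndrome split across bits 31:24 and 54:47) are assumptions, chosen because they reproduce exactly what mcelog printed; check them against AMD's K8 BIOS and Kernel Developer's Guide before relying on them. The helper names are illustrative. A decoded request of "generic read" with "memory access" suggests a DRAM read rather than I/O traffic.

```python
# Minimal decoder for an AMD K8 northbridge MCi_STATUS value as printed by mcelog.
# Architectural bits follow the generic x86 MCA layout; the K8-specific positions
# (bits 32, 45, 46 and the Chipkill syndrome split) are ASSUMPTIONS inferred from
# the mcelog output in this thread. Verify against AMD's K8 BKDG.

STATUS = 0xD46F4001D5080813  # value from the MCE records above


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_bus_error(code: int) -> dict:
    """Split a compound MCA bus-error code of the form 0000 1PPT RRRR IILL."""
    participation = ("local node origin", "local node response",
                     "local node observed", "generic participation")
    requests = {0: "generic error", 1: "generic read", 2: "generic write",
                3: "data read", 4: "data write", 5: "instruction fetch",
                6: "prefetch", 7: "evict", 8: "snoop"}
    mem_io = ("memory access", "reserved", "I/O access", "generic")
    levels = ("L0", "L1", "L2", "generic")
    return {
        "participation": participation[(code >> 9) & 0b11],
        "timeout": bit(code, 8),
        "request": requests.get((code >> 4) & 0xF, "reserved"),
        "mem_or_io": mem_io[(code >> 2) & 0b11],
        "level": levels[code & 0b11],
    }


def decode_status(status: int) -> dict:
    """Decode the fields of interest from a 64-bit MCi_STATUS value."""
    code = status & 0xFFFF  # MCA error code in bits 15:0
    is_bus_error = (code & 0xF800) == 0x0800  # bits 15:12 zero, bit 11 set
    return {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "enabled": bit(status, 60),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "uncorrected_ecc": bit(status, 45),  # assumed K8 NB field
        "corrected_ecc": bit(status, 46),    # assumed K8 NB field
        "err_cpu0": bit(status, 32),         # assumed K8 NB field
        # Assumed layout: high byte in bits 31:24, low byte in bits 54:47.
        "chipkill_syndrome": (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF),
        "bus_error": decode_bus_error(code) if is_bus_error else None,
    }


if __name__ == "__main__":
    decoded = decode_status(STATUS)
    print(f"chipkill syndrome = {decoded['chipkill_syndrome']:x}")  # prints d5de
    for key, value in decoded.items():
        print(f"{key}: {value}")
```

The decoded bus-error fields line up with mcelog's "local node origin, request didn't time out, generic read mem transaction, memory access, level generic".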

-------------------------
Main RIG:
TYAN 2885, 2 Opteron 850s running 64-bit amd64 Gentoo Linux 2006.0

Old system:
TYAN 2460 with 2 Athlon MP 2200s, 1GB of Corsair PC2100 reg ECC and 1GB of MT PC2700 reg ECC, and about 700GB of hard disk space (IDE, SATA and 15K SCSI) for my MythTV data...
 03/15/2007 02:18 PM
drescherjm
Member

Posts: 20
Joined: 05/13/2006

I solved the issue temporarily by moving one 2GB DIMM to the second processor so that each processor has only one DIMM connected. The memory speed benchmarks are lower, but it passed a 20-hour memtest86+ run and there are no errors in the mcelog.
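
The lower benchmark numbers are expected with this layout. Here is a rough sketch of the peak-bandwidth arithmetic, assuming (this is not stated in the thread) that the Opteron's on-die memory controller uses a 128-bit interface when DIMMs are installed in pairs and falls back to 64 bits with a single DIMM, at the DDR400 setting mentioned later in the thread. The function name is illustrative.

```python
# Theoretical peak bandwidth of a DDR memory interface.
# Assumption: the K8 memory controller runs 128 bits wide with a matched DIMM
# pair and 64 bits wide with a single DIMM.


def peak_gb_s(mt_per_s: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * (bus_bits // 8) / 1000


both_dimms_on_cpu0 = peak_gb_s(400, 128)  # one node, dual channel
one_dimm_per_cpu = peak_gb_s(400, 64)     # per node, single channel

if __name__ == "__main__":
    print(f"both DIMMs on CPU0: {both_dimms_on_cpu0} GB/s peak on node 0")
    print(f"one DIMM per CPU:   {one_dimm_per_cpu} GB/s peak per node")
```

Under that assumption, a benchmark working mostly out of one node's local memory would see roughly half the peak it had before the DIMM was moved.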

 03/16/2007 11:06 AM
mckennma
Senior Member

Posts: 2989
Joined: 06/01/2005

quote:

Originally posted by: John M. Drescher
I solved the issue temporarily by moving one 2GB DIMM to the second processor so that each processor has only one DIMM connected. The memory speed benchmarks are lower, but it passed a 20-hour memtest86+ run and there are no errors in the mcelog.



I would run memtest on the moved 2GB DIMM from a bootable memtest ISO CD: www.memtest86.com.

-------------------------
Tyan Thunder K8WE
Dual AMD 280 Opterons
8GB NUMA enabled DDR333 2.5-3-3-7 RAM
PCI-X SCSI RAID
http://69.14.190.80
 03/16/2007 12:09 PM
drescherjm
Member

Posts: 20
Joined: 05/13/2006

quote:

Originally posted by: mckennma
I would run memtest on the moved 2GB DIMM from a bootable memtest ISO CD: www.memtest86.com.



I now believe I definitely have a bad DIMM, and it is the one I moved: I got a kernel panic (the first one in two years in my whole department), and the last thing on the console besides the panic was an mcelog error, this time on CPU 1.
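
To check whether the errors really followed the moved DIMM, it helps to tally the mcelog records rather than eyeball them. Below is a minimal Python sketch, assuming the text format mcelog printed earlier in this thread: an `MCE n` header, a `CPU <cpu> <bank> ...` line, an `ADDR` line and a syndrome line. The function `tally` and the file-argument usage are illustrative; `SAMPLE` is just a trimmed copy of the three records posted at the top of the thread. If records start appearing under CPU 1 after the move, that is consistent with the fault travelling with the module.

```python
import re
from collections import Counter

# Trimmed copy of the three MCE records posted above (only the lines parsed here).
SAMPLE = """\
MCE 0
CPU 0 4 northbridge TSC ff961e93a1f14
ADDR 212080
Chipkill ECC syndrome = d5de
MCE 1
CPU 0 4 northbridge TSC ff9df289de662
ADDR 212080
Chipkill ECC syndrome = d5de
MCE 2
CPU 0 4 northbridge TSC ffa5c694a080f
ADDR 212080
Chipkill ECC syndrome = d5de
"""

RECORD_START = re.compile(r"^MCE \d+[ \t]*$", re.MULTILINE)
CPU_BANK = re.compile(r"^CPU (\d+) (\d+)", re.MULTILINE)
ADDR = re.compile(r"^ADDR ([0-9a-fA-F]+)", re.MULTILINE)
SYNDROME = re.compile(r"syndrome = ([0-9a-fA-F]+)")


def tally(log_text: str) -> Counter:
    """Count how often each (cpu, bank, addr, syndrome) combination appears."""
    counts: Counter = Counter()
    for record in RECORD_START.split(log_text):
        cpu = CPU_BANK.search(record)
        if cpu is None:
            continue  # text before the first "MCE n" header, or not a record
        addr = ADDR.search(record)
        syndrome = SYNDROME.search(record)
        key = (int(cpu.group(1)), int(cpu.group(2)),
               addr.group(1) if addr else None,
               syndrome.group(1) if syndrome else None)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python tally_mce.py saved_mcelog_output.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    for (cpu, bank, addr, syn), n in tally(text).most_common():
        print(f"cpu {cpu} bank {bank} addr {addr} syndrome {syn}: {n} record(s)")
```

Keying on the CPU number as well as the address and syndrome keeps records from the two northbridges separate, which is the distinction that matters after moving a module between processors.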

 03/16/2007 12:52 PM
mckennma
Senior Member

Posts: 2989
Joined: 06/01/2005

quote:

Originally posted by: John M. Drescher
I now believe I definitely have a bad DIMM, and it is the one I moved: I got a kernel panic (the first one in two years in my whole department), and the last thing on the console besides the panic was an mcelog error, this time on CPU 1.



2GB modules are flaky. I also suggest 1GB modules unless you need to get beyond 8GB. I use Corsair 1GB Reg. ECC modules without any problem.

 03/16/2007 01:03 PM
drescherjm
Member

Posts: 20
Joined: 05/13/2006

I agree. For the most part I have avoided 2GB modules and gone with Corsair XMS. This time it was about cost: when I purchased these DIMMs last year (from Newegg) there was not a big price difference between the 2GB Patriot DIMM (which was on the TYAN recommended list) and a 1GB Corsair XMS DIMM. I guess I learned my lesson...

 03/16/2007 01:08 PM
mckennma
Senior Member

Posts: 2989
Joined: 06/01/2005

quote:

Originally posted by: John M. Drescher
I agree. For the most part I have avoided 2GB modules and gone with Corsair XMS. This time it was about cost: when I purchased these DIMMs last year (from Newegg) there was not a big price difference between the 2GB Patriot DIMM (which was on the TYAN recommended list) and a 1GB Corsair XMS DIMM. I guess I learned my lesson...



Are they set to DDR400 or DDR333? I found they work better at DDR333.

 03/16/2007 01:14 PM
drescherjm
Member

Posts: 20
Joined: 05/13/2006

They are set to DDR400.

I believe I have the ChipKill ECC option turned on and background scrubbing on. Do you think this makes a difference?

BTW, thanks for your help.

 03/16/2007 01:25 PM
mckennma
Senior Member

Posts: 2989
Joined: 06/01/2005

quote:

Originally posted by: John M. Drescher
They are set to DDR400.

I believe I have the ChipKill ECC option turned on and background scrubbing on. Do you think this makes a difference?

BTW, thanks for your help.



Try DDR333. I have never seen Tyan list a 2GB module for DDR400; they are usually in the DDR333 section. My DDR333 setting is 2.5-3-3-7, and I lose only about 100 MB/s of bandwidth.

My K8W/265s
DDR400 3-3-3-8 was 8.5 GB/s memory bandwidth
DDR333 2.5-3-3-7 was 8.4 GB/s memory bandwidth

Moved them to K8WE/280s
DDR333 on 8GB is 9.1 GB/s.
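
Why the DDR333 penalty is so small can be checked with quick arithmetic: convert each timing from clock cycles to nanoseconds. This is a minimal Python sketch, assuming the usual CL-tRCD-tRP-tRAS order for the timing strings and memory clocks of 200 MHz (DDR400) and about 166.7 MHz (DDR333), i.e. periods of 5 ns and 6 ns. The bandwidth figures are the measured K8W/265 numbers above; the helper name is illustrative.

```python
# Convert DDR timings from memory-clock cycles to nanoseconds.
# Assumptions: timing strings are CL-tRCD-tRP-tRAS; DDR400 uses a 200 MHz clock
# (5 ns period) and DDR333 a ~166.7 MHz clock (6 ns period).

CLOCK_PERIOD_NS = {"DDR400": 5.0, "DDR333": 6.0}
TIMING_NAMES = ("CL", "tRCD", "tRP", "tRAS")


def timings_ns(speed: str, cycles: tuple) -> dict:
    """Return each timing in nanoseconds for the given speed grade."""
    period = CLOCK_PERIOD_NS[speed]
    return {name: c * period for name, c in zip(TIMING_NAMES, cycles)}


ddr400 = timings_ns("DDR400", (3, 3, 3, 8))
ddr333 = timings_ns("DDR333", (2.5, 3, 3, 7))

# Measured memory bandwidth on the K8W/265s, in GB/s (from the post).
bw_ddr400, bw_ddr333 = 8.5, 8.4
loss_mb_s = round((bw_ddr400 - bw_ddr333) * 1000)
loss_pct = round((bw_ddr400 - bw_ddr333) / bw_ddr400 * 100, 1)

if __name__ == "__main__":
    for name in TIMING_NAMES:
        print(f"{name:5} DDR400 {ddr400[name]:5.1f} ns   DDR333 {ddr333[name]:5.1f} ns")
    print(f"bandwidth loss: {loss_mb_s} MB/s ({loss_pct}%)")
```

CAS latency works out to 15 ns at both settings; only tRCD/tRP (18 ns vs 15 ns) and tRAS (42 ns vs 40 ns) get longer at DDR333, which fits a bandwidth drop of about 1%.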

Leave the ECC settings at their defaults. Scrubbing will slow down restarting the server. Chipkill ECC can prevent problems from bad DIMMs.

Put them both in CPU0's first bank, set them to DDR333, then reboot and test with the memtest86 ISO CD.

Could be DDR400 is making them flaky. Try DDR333 first.
