AMD Processors
Topic Title: "TLB parity error in virtual array; TLB error 'instruction"?
Created On: 03/07/2010 11:58 AM
Status: Read Only
 03/07/2010 11:58 AM
ant
Newbie

Posts: 13
Joined: 03/30/2007

Hello.

Lately, I have been getting random and rare kernel panics on my old Debian/Linux box (I tried both kernel versions 2.6.30 and 2.6.32). I couldn't figure out the cause until I discovered mcelog a couple of days ago, and it revealed some interesting and scary data in my dmesg/messages and syslog:

# cat /var/log/messages
...
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43
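For anyone trying to read the STATUS line above, here is a minimal sketch that decodes a 64-bit MCi_STATUS value using the generic x86 machine-check layout (flag bits VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63..57, MCA error code in bits 15..0) and the compound code pattern for TLB errors (0000 0000 0001 TTLL). It is a simplified decoder written for illustration, not mcelog's own logic, and the function name `decode_status` is hypothetical. The model-specific bits 31:16 are returned raw rather than interpreted.

```python
# Minimal sketch: decode an x86 MCi_STATUS value into its architectural fields.
# Assumption: generic machine-check bit layout; model-specific bits (31:16)
# are returned raw and are NOT interpreted here.

FLAG_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds valid information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}

# TLB compound error code: 0000 0000 0001 TTLL
TLB_TT = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
TLB_LL = {0b00: "level 0", 0b01: "level 1", 0b10: "level 2", 0b11: "generic"}


def decode_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into flag bits and error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    mca_code = status & 0xFFFF
    fields["mca_code"] = mca_code
    fields["model_specific"] = (status >> 16) & 0xFFFF
    if mca_code & 0xFFF0 == 0x0010:
        # Matches the TLB pattern; TT is bits 3:2, LL is bits 1:0.
        fields["kind"] = "TLB"
        fields["transaction"] = TLB_TT.get((mca_code >> 2) & 0b11, "reserved")
        fields["level"] = TLB_LL[mca_code & 0b11]
    else:
        fields["kind"] = "other (not decoded in this sketch)"
    return fields


if __name__ == "__main__":
    # STATUS value copied from the mcelog output above.
    for key, value in decode_status(0x9400000000010011).items():
        print(f"{key}: {value}")
```

Applied to STATUS 9400000000010011, this reports VAL, EN and ADDRV set, UC and PCC clear (so the hardware flagged this particular event as corrected), and a TLB error on an instruction transaction at level 1, which agrees with mcelog's text.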

FYI!
dmidecode and dmesg output from my regular Debian installation: http://pastie.org/859730

Idling the KNOPPIX v6.2.1 boot CD in text mode/console for over five hours overnight produced no problems or errors, and I didn't see any machine check errors in its dmesg. I also couldn't get mcelog to work there (it can't find /dev/mcelog).

I am not familiar with hardware, so I assume this is very bad, but which part(s) is/are bad? Is my old Athlon 64 X2 CPU dying or damaged? I have had it and its motherboard since 12/24/2006, so it is not that old yet. I have the full details of my secondary machine at http://alpha.zimage.com/~ant/a...rm/about/computers.txt ...

This might also be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB stick of RAM (I tested all 3 GB together and then each stick individually with memtest86+ v4.00 to narrow it down). http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems. I ran memtest86+ again a couple of weeks ago and again this morning for 5-6 hours, and got no errors after five full passes. I also do not overclock.

Thank you in advance.

-------------------------
Ant @ Ant's Quality Foraged Links: http://aqfl.net and The Ant Farm: http://antfarm.ma.cx

Edited: 03/08/2010 at 11:34 AM by ant
 03/13/2010 02:27 AM
ant
Newbie

Posts: 13
Joined: 03/30/2007

No replies?

Last night, I ran memtest86+ v4.00's test #9. http://www.memtest86.com/tech.html#descri says: "Test 9 [Bit fade test, 90 min, 2 patterns]"

I ran only that one test, for a little over 3.25 hours, and it passed. Shouldn't this test catch that problem? Or is the TLB somewhere else? Maybe I need to run it longer and more often?

Also, I ran cat /var/log/messages | grep mcelog and posted the long output at http://pastie.org/867602 ... Check out all of those mcelog errors.
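With a long grep output like that, it can help to tally how often each CPU/bank and STATUS combination appears. The sketch below assumes the syslog line format shown in the first post (each record starting with "MCE n", followed by a "CPU ..." line and a "STATUS ..." line); the script and the `summarize` function are hypothetical, not part of mcelog.

```python
import re
import sys
from collections import Counter

# Captures the message text after "mcelog: " in a syslog line.
MCELOG_RE = re.compile(r"\bmcelog: (.*)$")


def summarize(lines):
    """Count (cpu/bank line, STATUS value) pairs across mcelog records."""
    counts = Counter()
    cpu = None
    for line in lines:
        m = MCELOG_RE.search(line)
        if not m:
            continue  # not an mcelog line (e.g. the kernel's own message)
        msg = m.group(1).strip()
        if msg.startswith("MCE "):
            cpu = None  # a new record begins
        elif msg.startswith("CPU "):
            cpu = msg
        elif msg.startswith("STATUS "):
            status = msg.split()[1]
            counts[(cpu, status)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: grep mcelog /var/log/messages | python3 mce_summary.py
    for (cpu, status), n in summarize(sys.stdin).most_common():
        print(n, cpu, status)
```

If every record shows the same bank and STATUS, that points at one recurring error type rather than a mix of unrelated faults.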

The author of cpuburn told me to try running seven and then 37 instances of "nice -19 ./burnMMX P &" separately. I ran them for many hours with no problems. I am starting to notice that the errors and kernel panics seem to occur only when my system is idle (again, I am not using AMD's Cool'n'Quiet).

But wait a minute...
# dpkg -l | grep ^ii |grep cpu
ii cpufrequtils 006-2 utilities to deal with the cpufreq Linux kernel feature
ii cpulimit 1.1-13 tool for limiting the CPU usage of a process
ii libcpufreq0 006-2 shared library to deal with the cpufreq Linux kernel fe

I don't think I should have these installed, since I disabled Cool'n'Quiet and don't have the powernow module. I uninstalled the cpufrequtils and cpulimit packages, but not libcpufreq0, because removing it wanted to remove a bunch of other things:
# apt-get remove libcpufreq0
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libmono-addins-gui0.2-cil mono-2.0-gac geoclue-localnet libempathy30 geoclue tomboy telepathy-salut libevent-1.4-2
libgtk-vnc-1.0-0 libgnomepanel2.24-cil libglade2.0-cil libglib2.0-cil python-software-properties cheese evolution-exchange
libgconf2.0-cil python-aptdaemon-gtk gnome-codec-install python-aptdaemon cli-common gnome-screensaver w3c-dtd-xhtml
libnm-util1 system-config-printer libart2.0-cil libjs-jquery epiphany-extensions seahorse empathy python-apt
libempathy-common libempathy-gtk28 gvfs-bin vinagre swfdec-gnome libgnome2.24-cil libndesk-dbus1.0-cil seahorse-plugins
libgeoclue0 libmono-cairo2.0-cil gedit-plugins libgmime2.4-cil software-center libmono-i18n-west2.0-cil libcryptui0
libgdu-gtk0 libmono-addins0.2-cil arj python-webkit libmono-posix2.0-cil libmono-security2.0-cil gnome-disk-utility
libgtk2.0-cil mono-gac python-vte libnm-glib2 unattended-upgrades python-xapian geoclue-hostip aptdaemon
python-gnupginterface telepathy-mission-control-5 python-cupsutils libswfdec-0.8-0 libmono-sharpzip2.84-cil
libmono-corlib2.0-cil libchamplain-0.4-0 libchamplain-gtk-0.4-0 mono-runtime python-cups python-evolution
libndesk-dbus-glib1.0-cil libempathy-gtk-common hamster-applet binfmt-support libgnome-vfs2.0-cil libavahi-ui0
transmission-common gstreamer0.10-tools lsb-release libmono-system2.0-cil transmission-gtk
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
gnome gnome-applets gnome-core gnome-desktop-environment libcpufreq0
0 upgraded, 0 newly installed, 5 to remove and 126 not upgraded.
After this operation, 1,028kB disk space will be freed.
Do you want to continue [Y/n]? n


Let's see if this solves the problem.

# lsmod |grep cpu
cpufreq_powersave 602 0
cpufreq_userspace 1444 0
cpufreq_stats 1940 0
cpufreq_conservative 4018 0
xt_tcpudp 1743 92
x_tables 8335 6 xt_tcpudp,xt_limit,xt_state,ipt_LOG,ipt_REJECT,ip_tables

I am not sure if those are bad or not.
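One way to check whether those loaded governor modules are actually doing anything is to look in sysfs: as far as I know, a CPU only gets a `cpufreq/scaling_governor` file under /sys/devices/system/cpu/cpuN/ when a cpufreq driver (powernow-k8 for this CPU family) is managing it. The sketch below assumes that layout; the function name `cpufreq_governors` is hypothetical, and the `sysfs_cpu` parameter exists only so it can be pointed at a test directory.

```python
from pathlib import Path


def cpufreq_governors(sysfs_cpu="/sys/devices/system/cpu"):
    """Return {cpu_name: governor or None} by reading sysfs.

    None means the CPU has no cpufreq/scaling_governor file, which suggests
    no cpufreq driver is managing it (assumption stated in the lead-in).
    """
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpuidle or cpufreq.
    for cpu_dir in sorted(Path(sysfs_cpu).glob("cpu[0-9]*")):
        gov_file = cpu_dir / "cpufreq" / "scaling_governor"
        if gov_file.is_file():
            result[cpu_dir.name] = gov_file.read_text().strip()
        else:
            result[cpu_dir.name] = None
    return result


if __name__ == "__main__":
    for cpu, gov in cpufreq_governors().items():
        print(cpu, gov if gov else "no cpufreq driver active")
```

If every CPU reports no governor, the cpufreq_* modules in the lsmod output are loaded but have nothing to control.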

 04/30/2010 05:44 PM
ant
Newbie

Posts: 13
Joined: 03/30/2007

I am still having this problem. I cannot seem to reproduce it outside of my 2005-era Debian installation, for example with LiveCDs like KNOPPIX and Ubuntu. My longest uptime there was 15.5 hours, and I don't think that is long enough, but I can't leave it running longer since I need to use the box.
