AMD Processors
Topic Title: Server IBM 325e freezes at irregular times, Opteron 246 and W2K3 R2 x64
Topic Summary: AMD and Windows 2003 R2 x64 and server hardware help
Created On: 04/28/2008 10:24 AM
Status: Read Only
 04/28/2008 10:24 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

I am having an intermittent "lock up" that is driving me nuts. I am pretty much ready to throw the towel in.

We have an IBM eServer 325 (8835) 1U server with dual AMD Opterons, 4 GB RAM, dual onboard 10/100/1000 NICs, onboard Rage XL video and onboard IDE with 2 drives (mirrored & sliced into 3 partitions). We are running Windows Server 2003 R2 x64 (using Windows to mirror drives) and Exchange 2007 (basic roles with FSE 2007). Small office network with a Windows 2003 DC in the same rack.
About every 6 or 7 days the box "freezes". The first indication of the problem is that clients lose their Outlook connection to Exchange. When I log onto the console I can view the logs, but there are never any relevant errors. I can never reboot the box. The more "stuff" I try to open, the quicker the box completely locks up 100%.

I have noticed two constants when the problem happens. First of all, the system clock (in the tray) is always frozen at the instant the problem started (Windows event logging also stops at the same instant). Secondly, the two NIC tray icons are missing from the tray. I can view the device manager and everything appears fine there even though the icons are gone. If I try to "remote" in during the problem, I get half way and then the screen goes black. I always end up having to pull the AC plug and let the box recycle that way. Once done, all is well and darn near perfect for another 6 or 7 days (other than the mirrors need to re-synch). On the DC there are no errors indicating a problem on the other box (other than a warning that the Exchange box has the wrong time, etc). This problem has occurred since we built it with W2K3 several months ago. I have no idea what was on the box previous to that.
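Because both the tray clock and Windows event logging stop at the moment of the freeze, an independent heartbeat can help pin down exactly when the box stalls. The sketch below is only an illustration, not something used in this thread: the file name `heartbeat.log`, the function names, and the 60-second interval are all assumptions. Pointing the path at a share on another machine (for example the DC) would likely be more useful, since local disk writes may stall as well.

```python
import os
import time
from datetime import datetime, timedelta


def write_heartbeat(path, now=None):
    """Append one ISO timestamp line and force it to disk."""
    now = now or datetime.now()
    with open(path, "a", encoding="ascii") as f:
        f.write(now.isoformat(timespec="seconds") + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a hard power-off


def find_gaps(path, max_gap=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where heartbeats paused longer than max_gap."""
    with open(path, encoding="ascii") as f:
        stamps = [datetime.fromisoformat(line.strip()) for line in f if line.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]


def run(path="heartbeat.log", interval_s=60):
    """Hypothetical entry point: write a heartbeat forever, once per interval."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After a forced power cycle, `find_gaps("heartbeat.log")` should show the last timestamp written before the freeze and the first one after the reboot, which can then be compared with the last entry in the event log.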
I have swapped memory, drives, CPUs and finally the entire box with another 325e thinking this was a hardware problem. I keep getting the exact same results. I have upgraded every driver from the IBM site and even installed the latest drivers for the AMD processor and chipset from the AMD site. I have upgraded the BIOS and the firmware on the NICs and BMC. MSI made the IBM 325s, so I looked on their web site too and took their appropriate drivers. I have taken all the Windows suggested updates and patches. I have spent days googling the problem. No changes.

I am reluctant to reformat and start over because I don't think it will solve the problem. Is there some AMD/Opteron/Windows 2003 R2 x64 issue/incompatibility I am missing? Does some hardware (like the BMC) just not play well with Windows 2003? Any and all help is greatly appreciated.

Thanks.

Charlie

-------------------------
Charlie O
 05/05/2008 08:40 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

I am having this exact same problem with the exact same setup (Same server, Windows 2003 x64, Exchange 2007). I have also tried moving the hard drive to another identical server we had. Same type of lockups and errors. I would really love to find the answer to this.
I just downloaded the AMD drivers from this site but it looks like it won't do any good.

Thanks
 05/05/2008 08:49 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

I am convinced that the issue has something to do with the BMC and the software layer that "talks" to the x64 Windows.

Now you know why there are so many 325e servers for sale so cheap!!

-------------------------
Charlie O
 05/05/2008 08:57 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

Sorry, not up on everything with 64 bit processors. What exactly is BMC? Also, I was thinking of trying to take one of the processors out and see if that helps. While perusing this board, I noticed others with similar issues with dual processors and they took one out. Have you tried this, Charlie? Thanks
 05/05/2008 11:47 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

BMC = Baseboard Management Controller. It is hardware that can manage and/or report on the motherboard at a very low level.

I have not tried pulling one of the CPUs, but I will later today or tonight and see what happens. We'll need to wait at least a week to see if that is the problem.

-------------------------
Charlie O
 05/05/2008 02:16 PM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Too bad! Unless I am missing something, one cannot get the e325 to boot with a single CPU. Although the documentation implies they do ship a version with a single CPU, ours won't boot. Perhaps they have a different BIOS. We get a 3-3-3 beep code that references memory for some reason. Let me know if you have better luck. CPU2 is closest to the hard drives.

-------------------------
Charlie O
 05/06/2008 08:02 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

Well, I just pulled the second processor out of my second 325 (the one not running exchange) and it booted up no problem. I didn't have any DIMMs in the memory slots next to the CPU2 socket. Maybe that's it. When I get a chance, I'm going to try with the Exchange server.
 05/06/2008 08:07 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Great. CPU 2 being the one closest to the front of the server I hope.

I will try moving the RAM. What slots do you have RAM in and what sizes are the DIMMS?

Thanks.

-------------------------
Charlie O
 05/06/2008 08:21 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

In the second server I have 2 slots filled with 512MB DIMMs next to CPU1. In the Exchange server, I have 4 DIMMs in the CPU1 slots and 2 DIMMs in the CPU2 slots, also 512MB DIMMs. I'm going to try taking out the CPU in my Exchange server first chance I get.
 05/06/2008 11:27 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Outstanding! I just moved my two 512MB DIMMs to memory slots 1 and 2 (from 5 and 6). Machine boots with CPU2 removed!!! I have over 500 pages of documentation on the 325e and that is never mentioned anywhere! I wonder if I had memory in 1,2 AND 5,6 ... it would have booted and seen all the memory too.

The documentation says that for optimal SMP performance (dual CPU) you should install RAM in slots 1,2 and 5,6. I only had memory in the 5 and 6 slots. 5 and 6 being the two across from CPU2 and 1 and 2 being the furthest away from CPU1. I am now wondering if trying with DIMMs in slots 1,2,3,4 only might be the answer (or at least something to try first).

Are you having freezes on the 325e with memory in only slots 1 and 2 (just the slots next to CPU1)? Or, does this configuration freeze for you too???

It seems very odd to me that part of the memory structure is somehow physically tied to the second CPU...... I am guessing this has something to do with having the 6 slots instead of 4. You just may be on to something here!
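The boot results reported so far fit a simple model in which each DIMM slot is wired to one CPU's memory controller (the Opteron has its memory controller on the CPU itself). A minimal sketch of that model follows. The slot-to-CPU mapping (slots 1-4 on CPU1, slots 5-6 on CPU2) is an assumption inferred from the posts in this thread, not taken from IBM documentation, and the function names are made up for illustration.

```python
# ASSUMPTION: mapping inferred from this thread, not from IBM documentation.
SLOT_OWNER = {1: "CPU1", 2: "CPU1", 3: "CPU1", 4: "CPU1", 5: "CPU2", 6: "CPU2"}


def usable_slots(populated, cpus):
    """Slots whose owning CPU is installed; DIMMs behind a missing CPU are unreachable."""
    return sorted(s for s in populated if SLOT_OWNER[s] in cpus)


def can_boot(populated, cpus):
    """In this model, booting needs CPU1 plus at least two DIMMs attached to an installed CPU."""
    return "CPU1" in cpus and len(usable_slots(populated, cpus)) >= 2


# Configurations described in the thread:
print(can_boot({5, 6}, {"CPU1"}))          # RAM only in 5 and 6, CPU2 pulled
print(can_boot({1, 2}, {"CPU1"}))          # RAM moved to 1 and 2, CPU2 pulled
print(can_boot({5, 6}, {"CPU1", "CPU2"}))  # original setup, both CPUs installed
```

Under these assumptions the first case fails and the other two succeed, which matches the 3-3-3 memory beep code with CPU2 removed and the successful boot once the DIMMs were moved to slots 1 and 2.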

Charlie

-------------------------
Charlie O
 05/07/2008 08:04 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

I have all memory slots filled with 512MB DIMMs. Also, when these lockups occur, after rebooting, the system event log has several machine check errors listed from source WMIxWDM. They normally say fatal bus errors with things like TLB or something like that. Does your event viewer list those errors?
So the documentation says nothing about slots 3 and 4 for optimal performance?
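To see whether the WMIxWDM machine check entries cluster around particular dates, the System log can be exported and tallied. This is a rough sketch only: it assumes the log has been saved as CSV with a header row containing `Date` and `Source` columns, which may not match what Event Viewer actually produces on a given system, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def machine_check_days(csv_text, source="WMIxWDM"):
    """Count events from `source` per value of the Date column.

    ASSUMPTION: csv_text has a header row with 'Date' and 'Source' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Date"] for row in reader if row["Source"] == source)
```

Comparing the dates returned here with the dates of the forced power cycles would show whether the entries only ever appear right after a cold boot.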

Thanks
 05/07/2008 08:15 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

The errors you are seeing during a cold boot are not fatal errors and are corrected by the operating system. If they weren't, you'd never boot. I see them too when I cold boot. The only thing I am doing different than you is that I have 2 IDE drives that are software RAIDed for redundancy. When my box freezes, the drives always need to resynch after re-boot.

I am going to try two things:

RAM in only slots 1, 2, 3, 4 (512,512, 1gig, 1gig). Slots 5 and 6 empty. CPU1 and 2 installed. In other words, remove RAM from 5 and 6 and see what happens. These two last slots seem tied to CPU2 some way. If I still have freezes, then move on to:

Remove CPU2 and boot with RAM in slots 1, 2, 3, and 4 as above. I only have 10 users on Exchange 07 with Forefront, so, I doubt we really need the second CPU for the "load".

If neither of these configs work . . . . I'll burn the machine and start over.

Charlie

-------------------------
Charlie O
 05/07/2008 08:54 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

Yeah, I believe slots 5 and 6 are only used for CPU2. I may also try disabling the Broadcom embedded NICs and installing a different one. I've had problems with Broadcoms in the past, mostly in Dell machines.
Also, the more I think about it, the more I wonder if it is something to do with Exchange 2007 and the way it uses resources. I'm just having a hard time believing 3 different machines that are the same model server have the same exact hardware problems.
 05/07/2008 09:14 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Slots 5 and 6 are used by CPU1 and CPU2. I am really not all that familiar with the bus structure though and how Windows looks at and/or addresses all the space.

I don't think it's the NICS. If it was, I think we'd see a bunch of errors in the logs. I never see any.

If I had to guess, I really think this is a hardware issue related to the M/B, AMD CPU and Windows 2003 x64. Either the BMC is getting confused, or something is causing the memory to get "lost". If it were Exchange, we would be hearing about it all over the web. The 325's are old, AMD based and not really ever designed for use with W2003 R2 x64.

Actually I have had THREE different 325e's running Exchange with the same problem.

-------------------------
Charlie O
 05/07/2008 09:34 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

Wow, 5 servers total with the same problem. I was thinking of calling IBM to see if these servers qualify for any type of support. I'm guessing they are too old though and support would probably cost some money, like MS.
 05/07/2008 09:51 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Call IBM and you might die from sticker shock. Service/support on this unit would be more than 5 times what the machine is worth. I doubt they support it under Windows 2k3 R2 now anyway. They didn't even manufacture the unit, MSI did. You can buy these used servers in quantity now for less than $300!

We have a plan with Microsoft, but I am not going to use up my allowance on this. Between not having meaningful error logs and not being able to recreate the freezes on demand, they are going to blame the hardware anyway.

Later today I'll remove the RAM from 5 and 6 and see how long we go. Hopefully that will be the answer.......

Keep your fingers crossed.

-------------------------
Charlie O
 05/07/2008 10:24 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

I will definitely keep them crossed. Another thing I thought of trying was to "downgrade" to Win2k3 R1 and see if that made any difference. Though as far as I know, R2 is Win2k3 with the Service Packs already in it.
 05/07/2008 05:12 PM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

No, that is not quite right. Windows 2003 R1 SP2 is NOT the same as Windows 2003 R2 (R2 = Release 2, and there are some significant changes and additions). You cannot downgrade to R1, especially if you have a DC on a network.

-------------------------
Charlie O
 05/09/2008 02:03 AM
charlieo
Junior Member

Posts: 16
Joined: 04/28/2008

Well, memory has been out of slots 5 and 6 now for 48 hours. Nothing bad yet.

BTW, there is an MS update out for Exchange 2007 SP1 that is a rollup of the most important updates since SP1. You might want to make sure you install it. I just did. Takes about 15 minutes to install. I'd suggest a reboot when you are done.

I'll keep you posted.

Charlie

-------------------------
Charlie O
 05/09/2008 08:09 AM
rayminette
Junior Member

Posts: 14
Joined: 05/05/2008

Thanks Charlie. I didn't mean trying to install Windows 2k3 R1 over R2 before, I had meant I was thinking of redoing everything with R1. I do have SP1 on, I have to check into those updates. Are you still running on 2 processors?