AMD Processors
Topic Title: Why is Jaguar 2P ?
Created On: 03/25/2011 02:20 AM
Status: Read Only
 03/25/2011 02:20 AM
4P_Bulldozer
Fanboi

Posts: 52
Joined: 03/24/2011

I am interested in purchasing a 4P Motherboard and am studying about MP prior to purchase.


I noticed that the Jaguar ( http://en.wikipedia.org/wiki/Jaguar_(computer) or http://www.cray.com/Products/XT/Systems/XT5.aspx ) was built over a period of many years at ORNL; that explains why parts of it (the whole Computer) are different.

In early years it was 1P (somewhat understandable?) but more recently it is still only 2P (less understandable?). Looking at one of the 'Motherboards' (Page 2 of http://www.cray.com/Assets/PDF...t/CrayXT5Brochure.pdf) it looks as though it would have been better off as 4P or 8P; since Cray builds its own Boards they can do as they wish.


Wouldn't 4P (or as many as you can stuff on one Board) allow small chunks of the Program to intercommunicate with itself and provide the "portion of the result" quicker than further dividing each portion of the Task into "less connected bits" ?

Q1: With nearly a quarter of a million Processors (in the end) can someone tell me why it is 1P and 2P[1] and why that was (presumably) thought to be better than 4P or even 16P ?

[1] http://en.wikipedia.org/wiki/Jaguar_(computer)
"Each XT5 compute node contains dual hex-core AMD Opteron 2435 (Istanbul) processors ..."
"Each XT4 compute node contains a quad-core AMD Opteron 1354 (Budapest) processor ..."


As someone desiring to purchase a 4P Motherboard, 4P with only 2 Processors and half as much memory as I desire (in the end) seems less expensive once I eat the Motherboard and Case cost (which is only 2K (for a GREAT Server) as opposed to one third (for a "Big-Box OTS") as much for a 1P System).

This also seems a cheaper way to upgrade your Computer to a more powerful one (a few years down the road) by adding 2 more Processors (or trying to sell the 2 old ones and adding 4 new faster ones) instead of having an old slow Computer that is only good (bad) for the Landfill.


Q2: From Win7 only supporting 2P, to the above points about Jaguar, to the Motherboard Manufacturers making 'less fancy than 1P or 2P' so-called "Server Boards" (few PCI Slots and poor USB and slow SATA speed with 'Speaker beep' Audio) and apparently a lack of "Workstation Boards" (unless you have 20K for HP, Oracle, etc.) I have to ask why there is seemingly an anti-4P atmosphere (especially with the 4P Tax lifted; as if that mattered to someone with well over 20K of the Boss's Cash) ?

I figure I'll come in at well less than 4K (with Sales Tax) and every Computer I ever bought always cost me more than 1/4 that much.
 03/25/2011 06:24 PM
4P_Bulldozer
Fanboi

Posts: 52
Joined: 03/24/2011

After numerous 'Googling Strategies' I found this tidbit.

This does not completely answer the questions but may be useful for others who wish to choose between 2P and 4P and are themselves Googling for info.


This page:

ICC Inc. - Building Supermicro 1U, 2U, 3U, 4U
http://www.icc-usa.com/amd-cpu.asp

Says:

1P:
    "1P servers are great for dedicated hosting for a client that wants his own server for security reasons. If the server is going to have light use, but the customer wants their own unit anyway, a single-processor server is the way to go."


2P:
    "... if you need the extra processing power of dual processors, we recommend that you buy a 2P system instead of multiple 1P servers because you will save money ..."
    "One dual-processor (2P) server runs faster than two equivalent single-processor (1P) servers."
    "Even compared to 4P systems, using dual-processor servers in a cluster is more cost-effective."


4P:
    "If you are looking for a server to run complex mathematical software, financial analysis computation, scientific modeling, or any other performance-intensive task, then a 4P server may be the right solution for you."
    "The best use for a 4P server is to work with large databases. A quad-processor unit is more efficient at managing the enormous amount of information stored on a hefty database than two equivalent dual-processor servers."
    "4P servers are pricey. But they are also the cutting-edge of server technology, if your applications call for performance at any cost ..."



The tidbit of useful info I extract from that is:
    "Compared to 4P systems, using dual-processor servers in a cluster is more cost-effective." (may not apply to Cray which builds their own Boards, and why all the 1P).
    "A quad-processor unit is more efficient at managing the enormous amount of information stored on a hefty database than two equivalent dual-processor servers." (thus why all the 1P and 2P and no higher).



Please note that this info will not apply to all situations and is only a part of a general guide. Specific Applications may benefit more from 4P over 2x2P and newer Architectures (Interlagos / Sandy Bridge) may favor 4P over 2x2P.


One thing that seems certain:
    4P means fewer Slots for Cards (and more often has more Slots for Memory) with fewer large enough Chassis available.
    2P almost always has more Slots for Cards (sometimes has many Slots for Memory), more space for better Audio and on-Board Graphics and a greater choice of Motherboards with more choices for Chassis.
    1P has greater choices for everything except total Memory and doubling or (nearly) quadrupling your processing power (unless you wait 5 years to upgrade your CPU in which case you would probably desire to purchase a newer Motherboard also; thus that is more than "2P" for cost without any benefit other than being cheap twice).
 04/01/2011 05:32 PM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

The answer to "why is Jaguar 2P" is "money." Both AMD and Intel previously had a significant price premium on the ability to run more CPUs on one board. Look at the list prices of the older Opteron 1300, 2300, and 8300 Barcelona CPUs. They were all the same CPU, except that AMD fiddled with the number of HyperTransport links left enabled (and the number of HT links determined the number of CPUs you could run on one board.) The 1300s ran between about $150 and $300, the 2300s ran between $200 and $1500, and the 8300s ran between $850 and $2600. Jaguar is a cluster of a ton of motherboards and the cluster sees n number of 1P units the same as n/2 2P units or n/4 4P units. Scaling tails off after a certain point, so they had to crunch the numbers of price hike vs. performance loss in deciding how many of what kind of boards to use. Apparently their interconnect scales pretty well since they were using a bunch of 1P and 2P systems instead of 4P and 8P systems.
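To put rough numbers on that price premium, here is a minimal Python sketch that uses only the approximate list-price ranges quoted above. The dictionary `PRICE_RANGE` and the helper `premium_over_1p` are names made up for this illustration, and the figures are the ballpark ranges from this post, not exact AMD list prices.

```python
# Approximate Barcelona-era list-price ranges (USD per CPU), as quoted in this post.
# The labels are illustrative; the silicon was the same across all three series.
PRICE_RANGE = {
    "Opteron 1300 (1P)": (150, 300),
    "Opteron 2300 (2P)": (200, 1500),
    "Opteron 8300 (4P/8P)": (850, 2600),
}


def premium_over_1p(prices=PRICE_RANGE, base="Opteron 1300 (1P)"):
    """Return {series: (low_ratio, high_ratio)} relative to the 1P series."""
    base_low, base_high = prices[base]
    return {
        name: (low / base_low, high / base_high)
        for name, (low, high) in prices.items()
    }


for name, (low_ratio, high_ratio) in premium_over_1p().items():
    print(f"{name}: {low_ratio:.2f}x at the low end, {high_ratio:.2f}x at the high end")
```

Because the CPUs were otherwise identical, these ratios are roughly what a buyer paid per socket just for the extra enabled HyperTransport links.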

Today, Jaguar would very likely use quad-socket boards if it used AMD's CPUs. AMD has significantly redone their price figures and gotten rid of 1P-only server CPUs, priced the 2P-capable units not that much above what desktop CPUs cost, and are selling their 4P-capable CPUs (which now have twice as many cores as the 2P-capable units) at roughly the prices they used to sell 2P-capable CPUs.

And as far as 1P vs 2P vs 4P for a workstation or server:

- AMD 1P: very little point to this unless you make your workstation or server from inexpensive client/consumer parts. 1P server boards are not all that much less expensive than 2P ones and AMD no longer sells 1P-only server CPUs. Even if you use inexpensive consumer parts, the price of a 1P setup isn't going to be that much less than half of an equivalent 2P setup but it will be considerably less powerful and less robust. That's why none of my desktop/workstation/server machines are 1P any more. (My sig is old, the Athlon XP machine is now a dual Opteron 265 unit.)

- Intel 1P: makes sense if you do not need the memory capacity or processing capability of two CPUs. 1P server/workstation boards are again not much less expensive than otherwise similar 2P boards, but 1P CPUs are a lot less expensive than comparable 2P CPUs. A decent Xeon 3500 1P unit costs about $300-400, while the equivalent Xeon 5500 2P costs about a grand.

- AMD 2P: an excellent choice for a workstation or server. The I/O slots are similar to what you'd find on a typical ATX desktop motherboard. The boards are much more robust and reliable and support a lot more RAM than desktop boards. Prices are not much more than a 1P server setup for a 2P setup with two 4/6-core Opteron 4100s, and dual Opteron 4100 setups can be had in typical ATX-size boards. Dual Opteron 6100s are somewhat more expensive and the boards are always extended ATX or bigger, but they provide four times the core count compared to a 1P desktop and twice the core count of a dual Opteron 4100 setup.

- Intel 2P: this is really as far up in their lineup as you want to go. Good 2P Xeons are pricey but absolutely pale in comparison to what the 4P+ ones cost. The 2P Xeons also use ATX and EATX-sized motherboards so they fit in normal-sized and larger desktop cases.

- AMD 4P: still a very good choice for a workstation or server, provided you can use the 32+ cores the system provides. The price is not extreme, as four Opteron 6128s and a quad-socket motherboard cost under $2000. It's pretty much exactly twice what a dual Opteron 6100 setup costs. There are slightly fewer slots on 4P boards than 2P and 1P boards, but you still generally get 5 expansion slots. The biggest downside to AMD 4P boards is the size as they measure 16.3-16.5" by 13" and require a massive case.

- Intel 4P: extremely, extremely expensive ($1200-3800 per CPU), all use highly non-standard proprietary motherboards, and suck down power like frat guys suck down beer. They are also 45 nm Nehalem CPUs and thus a generation behind the current 2P Xeons and two generations behind desktop. The only real advantage to the 4P Xeons is if you need to have more than 48 cores (which is what a 4P AMD system can do, and AMD's CPUs top out at 4P now) in one system as these CPUs come in up to 8 core versions and can scale to 32P setups. The cost of a 32P Xeon 7500 setup? If you have to ask...



Edited: 04/01/2011 at 06:34 PM by MU_Engineer
 04/04/2011 12:46 AM
4P_Bulldozer
Fanboi

Posts: 52
Joined: 03/24/2011

Thank you for taking the time to answer and your Expert opinion.

Originally posted by: MU_Engineer

The answer to "why is Jaguar 2P" is "money." Both AMD and Intel previously had a significant price premium on the ability to run more CPUs on one board. Look at the list prices of the older Opteron 1300, 2300, and 8300 Barcelona CPUs. They were all the same CPU, except that AMD fiddled with the number of HyperTransport links left enabled (and the number of HT links determined the number of CPUs you could run on one board.) The 1300s ran ...

I thought it used 61xx's.


... they had to crunch the numbers of price hike vs. performance loss in deciding how many of what kind of boards to use. Apparently their interconnect scales pretty well since they were using a bunch of 1P and 2P systems instead of 4P and 8P systems.

That is why I said:
In early years it was 1P (somewhat understandable?) but more recently it is still only 2P (less understandable?)

Since they make their own Boards and Interconnects then why not 16P (or 8P if that is all that will fit on one Board) ?


Today, Jaguar would very likely use quad-socket boards if it used AMD's CPUs. AMD has significantly redone their price figures and gotten rid of 1P-only server CPUs, priced the 2P-capable units not that much above what desktop CPUs cost, and are selling their 4P-capable CPUs (which now have twice as many cores as the 2P-capable units) at roughly the prices they used to sell 2P-capable CPUs.

Google for "Cray Interlagos" and you will note that a few Places SAY that in June they will be adding to their "Magny-Cours" some Bulldozers.

Here is one Place: http://www.ncrc.gov/computing-resources/gaea/


And as far as 1P vs 2P vs 4P for a workstation or server:

- AMD 1P: very little point to this ...
- Intel 1P: makes sense if you do not need ...
- AMD 2P: an excellent choice for a workstation or server. ...
- Intel 2P: this is really as far up in their lineup as you want to go. ...
- AMD 4P: still a very good choice for a workstation or server, ...
- Intel 4P: extremely, extremely expensive ... if you need to have more than 48 cores ... can scale to 32P setups. The cost of a 32P Xeon 7500 setup? If you have to ask...

AGREED.


...(which is what a 4P AMD system can do, and AMD's CPUs top out at 4P now)...

It is unfortunate that 8P (and more) is not possible IF the Board Manufacturer does ALL the Glue.


Have you seen this little Beauty, it is new: http://forums.amd.com/forum/me...d=y&STARTPAGE=2#bottom (Hit Page-Up).


Thanks for taking the time to respond to my Post.
 04/08/2011 03:16 PM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Originally posted by: 4P_Bulldozer

Thank you for taking the time to answer and your Expert opinion.

I thought it used 61xx's.


Nope, Jaguar's website says it uses Budapest 1354s in the UP nodes and Istanbul 2435s in the 2P nodes.


That is why I said:

In early years it was 1P (somewhat understandable?) but more recently it is still only 2P (less understandable?)


The 2P nodes use Istanbuls, which were the very last of the pre-price-dropped Opterons. Today you would expect Opteron 6100s to be used, but they were not around when Jaguar was made.

Since they make their own Boards and Interconnects then why not 16P (or 8P if that is all that will fit on one Board) ?


It would physically/electrically be possible to chain together far more than four Opteron 6100s on one motherboard without using any interconnect "glue." Each Opteron 6100 has three off-die coherent HyperTransport links capable of CPU-to-CPU communication. You could chain a bunch of them together in a ring or other multi-hop topology and get a high socket count on one board. Such a setup would have awful scaling because of high interconnect latency and a ton of snoop traffic, but it could be done. However, I don't know whether the CPUs' firmware/HT controllers support that kind of operation; perhaps they are hard-wired to operate only in pre-designated 1P, 2P, 3P, and 4P topologies.
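To show why a ring scales badly, here is a minimal Python sketch that counts hops between sockets with a breadth-first search. It is a toy model only: sockets are treated as single nodes, every link is assumed to cost one hop, and the function names (`ring`, `fully_connected`, `hop_stats`) are made up for this example. It says nothing about what the real HT controllers support.

```python
from collections import deque


def ring(n):
    """Adjacency map for n >= 2 sockets wired in a ring (two links per socket)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fully_connected(n):
    """Adjacency map where every socket has a direct link to every other one."""
    return {i: set(range(n)) - {i} for i in range(n)}


def hop_stats(adj):
    """Return (average hops, worst-case hops) over all ordered socket pairs."""
    total = worst = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # plain breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, hops in dist.items():
            if dst != src:
                total += hops
                worst = max(worst, hops)
                pairs += 1
    return total / pairs, worst


for n in (4, 8, 16):
    avg, worst = hop_stats(ring(n))
    print(f"{n}-socket ring: average {avg:.2f} hops, worst case {worst}")
print("4 sockets fully connected:", hop_stats(fully_connected(4)))
```

In this model the worst-case distance in a ring grows as half the socket count, so every added socket makes remote memory accesses and snoops slower on average, whereas a fully connected layout stays at one hop.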

Have you seen this little Beauty, it is new: http://forums.amd.com/forum/me...d=y&STARTPAGE=2#bottom (Hit Page-Up).


I have seen it, it is a beauty. It will also fit in my case without any modifications. My guess is that it sells for $1000-1100 based on the $900-1000 price of the H8QG6. I really do want a 4P build someday, but it won't be for several years, probably using a motherboard that's two socket generations after G34.

 05/06/2011 10:48 AM
4P_Bulldozer
Fanboi

Posts: 52
Joined: 03/24/2011

Originally posted by: MU_Engineer Nope, Jaguar's website says it uses Budapest 1354s in the UP nodes and Istanbul 2435s in the 2P nodes.


Originally posted by: 4P_Bulldozer That is why I said:

In early years it was 1P (somewhat understandable?) but more recently it is still only 2P (less understandable?)


Originally posted by: MU_Engineer The 2P nodes use Istanbuls, which were the very last of the pre-price-dropped Opterons. Today you would expect Opteron 6100s to be used, but they were not around when Jaguar was made.


Originally posted by: 4P_Bulldozer Since they make their own Boards and Interconnects then why not 16P (or 8P if that is all that will fit on one Board) ?


Originally posted by: MU_Engineer It would physically/electrically be possible to chain together far more than four Opteron 6100s on one motherboard without using any interconnect "glue." Each Opteron 6100 has three off-die coherent HyperTransport links capable of CPU-to-CPU communication. You could chain a bunch of them together in a ring or other multi-hop topology and get a high socket count on one board. Such a setup would have awful scaling because of high interconnect latency and a ton of snoop traffic, but it could be done. However, I don't know whether the CPUs' firmware/HT controllers support that kind of operation; perhaps they are hard-wired to operate only in pre-designated 1P, 2P, 3P, and 4P topologies.


After much study I have come up with two other factors that should influence the Consumer's decision on Motherboard purchases that may or may not have been a factor in the decisions made with respect to the Jaguar.


1. Memory cost: The less space that is available on a Motherboard (or Jaguar Circuit Card) the larger the size of Memory Chips you would need in order to have either:
    * a particular desired amount of Memory (an amount you believe (right or wrong) you would want)
or
    * a particular needed amount of Memory (an amount you need to provide; so a particular (single) Core can do some useful work).


Years ago Memory size was smaller for a particular price point, today it is larger; but there is always a "line".

At this moment (depending upon Brand) you can buy 8G Chips for a tiny bit cheaper than 2 x 4G Chips and for only 25% the cost of Chips twice the size.

Thus, if you desire to maximize the available Memory (which you likely would with so many Cores and the desire to hold values in Memory instead of on Disk) then you would want to buy 8G Chips; 16G Chips are over $1000 so you probably do not desire to purchase those. A choice of 32G Chips is not yet available.

In this situation it is ONLY the price of the Memory Chips that is considered in the purchase of a System (since the number of Slots is limited and the price skyrockets above a particular size) because with large Memory Chips the cost of Memory is greater than the cost of anything else (even the Processor, more so with AMD Opterons at 1k each, less so with Intel Xeons at 4K each - leave Processor speed (work per Core) out of the equation since we are trading Cores for Speed in this Configuration).


2. A 4P System interconnects all its Processor Sockets using all its HT Links, but (depending upon how the Motherboard is designed) a 2P System has spare Links and they can also be used to interconnect the 2 Processors for an even greater interconnect speed versus a 4P System. Since you are using expensive Memory you might as well interconnect it as fast as possible (and the Processors also) to get more usefulness out of them.

Not only can a 2P System provide more interconnection but there is also more room on the Motherboard for more Memory Slots and thus the board can be populated with a greater quantity of affordable Memory.

Important: -> This halves the price of the most expensive part, but it costs you "Node latency". In my (our?) situation we might only be able to get the most use out of a 2P System (our ability to use 4P might be limited) thus we would not lose "Node latency" but actually gain it from the spare HT Links (and halve our cost of Memory).

If you have a 4P System with 16 Slots of 16G Memory for $1000 a Slot that is $16000 of your System's cost. That is $8000 for 2 Processors with 128G of Memory.

If you have a 2P System with 16 Slots of 08G Memory for $0250 a Slot that is $04000 of your System's cost. That is $4000 for 2 Processors with 128G of Memory.

One 2P Motherboard I have seen has 24 Memory Slots (and while it will only support 256G of Memory), the greater number of Memory Slots allows for less expensive Chips to populate the MotherBoard.
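The arithmetic behind those two cases can be sketched in a few lines of Python. This is only an illustration using the slot counts and per-module prices given above; the function name `memory_cost` is made up for the example.

```python
def memory_cost(slots, gb_per_module, usd_per_module):
    """Return (total GB, total USD, USD per GB) for a fully populated board."""
    total_gb = slots * gb_per_module
    total_usd = slots * usd_per_module
    return total_gb, total_usd, total_usd / total_gb


# Figures from the post: 16 slots in both cases,
# $1000 per 16G module versus $250 per 8G module.
cases = {
    "16 x 16G at $1000": memory_cost(16, 16, 1000),
    "16 x 8G at $250": memory_cost(16, 8, 250),
}

for label, (gb, usd, usd_per_gb) in cases.items():
    print(f"{label}: {gb}G total, ${usd} total, ${usd_per_gb:.2f} per G")
```

Note that 16 slots of 16G come to 256G for the whole board, which is where the figure of 128G per pair of Processors comes from, and that the larger modules cost twice as much per G as the smaller ones at these prices.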


In conclusion: With or without the "4P Tax" you still have the "Real Estate (or Property) Tax" - the need to have large Memory Chips versus the cost of larger Memory Chips and the need to have as many Memory Slots as possible versus the physical space on the Motherboard to house them (along with the additional interconnect speed derived from the extra HT Links available on a 2P System).

Edited: 05/06/2011 at 10:59 AM by 4P_Bulldozer
 05/07/2011 10:24 AM
MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Originally posted by: 4P_Bulldozer

After much study I have come up with two other factors that should influence the Consumer's decision on Motherboard purchases that may or may not have been a factor in the decisions made with respect to the Jaguar.

1. Memory cost: The less space that is available on a Motherboard (or Jaguar Circuit Card) the larger the size of Memory Chips you would need in order to have either:

    * a particular desired amount of Memory (an amount you believe (right or wrong) you would want)

or
    * a particular needed amount of Memory (an amount you need to provide; so a particular (single) Core can do some useful work).



Years ago Memory size was smaller for a particular price point, today it is larger; but there is always a "line".



At this moment (depending upon Brand) you can buy 8G Chips for a tiny bit cheaper than 2 x 4G Chips and for only 25% the cost of Chips twice the size.



Thus, if you desire to maximize the available Memory (which you likely would with so many Cores and the desire to hold values in Memory instead of on Disk) then you would want to buy 8G Chips; 16G Chips are over $1000 so you probably do not desire to purchase those. A choice of 32G Chips is not yet available.


In this situation it is ONLY the price of the Memory Chips that is considered in the purchase of a System (since the number of Slots is limited and the price skyrockets above a particular size) because with large Memory Chips the cost of Memory is greater than the cost of anything else (even the Processor, more so with AMD Opterons at 1k each, less so with Intel Xeons at 4K each - leave Processor speed (work per Core) out of the equation since we are trading Cores for Speed in this Configuration).


Also note that Opteron 6100s start at $266, while Xeon 7500s start at about $1100. The cheapest Opteron 6100 (6128) has an absolutely identical memory subsystem (speed, capacity) to the most expensive one (6180 SE). The same is not true for the Xeons. The cheapest ones have the same memory capacity as the most expensive ones, but the memory speed is reduced quite a bit.

2. A 4P System interconnects all its Processor Sockets using all its HT Links, but (depending upon how the Motherboard is designed) a 2P System has spare Links and they can also be used to interconnect the 2 Processors for an even greater interconnect speed versus a 4P System. Since you are using expensive Memory you might as well interconnect it as fast as possible (and the Processors also) to get more usefulness out of them.


Most if not all 2P G34 Opteron setups do connect all of the coherent HT links between the two CPUs. There is a higher interconnect bandwidth and lower latency in aggregate in a 2P vs. 4P G34 system due to the 2P setup having every die with a direct HT connection to every other die in the system. 4P has any given die in the system being directly connected to only half of the other dies; the I/O makes a hop through the neighboring die (over a fat 24-bit-wide HT3 link) to reach the non-directly-connected dies. It's a well-thought-out system that works well, and it's also why AMD limited G34 CPUs to four-socket operation. The old Opteron 800/8000 8-way setups had some three-hop accesses and 4P->8P scaling was much worse compared to 1P->2P and 2P->4P scaling as a result.

Not only can a 2P System provide more interconnection but there is also more room on the Motherboard for more Memory Slots and thus the board can be populated with a greater quantity of affordable Memory.


Nearly all G34 motherboards have eight DIMM slots per socket regardless of the number of sockets. The board size simply gets larger as you add more sockets. The 1P G34 boards are ATX, the 2P ones are EATX/SSI EEB, and the 4P ones are SSI MEB/SWTX. TYAN's latest dual G34 board is the exception: it's a 2P board with 12 slots per socket (the highest number allowed with Opteron 6100s) but is the same size as a 4P board. I'd personally go for the 4P board since the TYAN unit has 24 slots in total while a 4P board has 32.

Important: -> This halves the price of the most expensive part, but it costs you "Node latency". In my (our?) situation we might only be able to get the most use out of a 2P System (our ability to use 4P might be limited) thus we would not lose "Node latency" but actually gain it from the spare HT Links (and halve our cost of Memory).


The single most expensive part would be the motherboard in most cases. A decent 2P G34 motherboard costs $400-600 and a 4P board is $800-1000. Your 8 GB RAM modules are less than $200 each and Opteron 6100s start at $266 each. You do have increased interconnect latency with a 4P system, but you have a big increase in memory bandwidth and at least a 33% increase in total RAM capacity due to having 32 DIMM slots. The higher interconnect latency can be ameliorated by having a program/OS that is NUMA-aware and will thus keep a thread on a specific die so that its working set stays in the same RAM modules, so there isn't as much HT traffic.
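As a rough illustration of that pinning idea, here is a minimal Python sketch, assuming Linux, where `os.sched_setaffinity` is available. The helper name `pin_to_cpus` is made up for this example, and which logical CPU numbers belong to which die is an assumption you would have to check on the actual machine (for example with `lscpu` or `numactl --hardware`).

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given logical CPUs.

    Returns the resulting affinity set, or None where the call is not
    available (it is Linux-specific). CPUs the process is not already
    allowed to use are ignored; if none remain, nothing is changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    target = set(cpus) & allowed
    if target:
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical layout: assume logical CPUs 0-5 sit on the same die.
    print("Now running on CPUs:", pin_to_cpus(range(0, 6)))
```

Affinity only controls where the process runs. Keeping its working set in that die's RAM additionally depends on the OS allocating memory locally, which this sketch does not show.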

If you have a 4P System with 16 Slots of 16G Memory for $1000 a Slot that is $16000 of your System's cost. That is $8000 for 2 Processors with 128G of Memory.

If you have a 2P System with 16 Slots of 08G Memory for $0250 a Slot that is $04000 of your System's cost. That is $4000 for 2 Processors with 128G of Memory.

One 2P Motherboard I have seen has 24 Memory Slots (and while it will only support 256G of Memory), the greater number of Memory Slots allows for less expensive Chips to populate the MotherBoard.


Current G34 motherboards support registered DIMM sizes up to 16 GB, so that 24-slot TYAN board supports 384 GB, not 256 GB. A 4P board supports 512 GB of RDIMMs in its 32 slots. Also, your example is wrong.

- 16x 16 GB modules yields 256 GB of RAM, not 128 GB.
- 4P boards nearly always have 32 DIMM slots, not 16.
- That 24-DIMM TYAN board is as large as a 4P board, so you will still need to get an especially large case to house it.
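The capacity figures above are simple multiplication; a short Python sketch for checking them follows. The constant and function names are made up for the example, and the 16 GB module limit is the one stated in this post.

```python
MAX_RDIMM_GB = 16  # largest registered DIMM current G34 boards support, per this post


def max_capacity_gb(dimm_slots, module_gb=MAX_RDIMM_GB):
    """Maximum RAM in GB for a board with every slot filled with module_gb DIMMs."""
    return dimm_slots * module_gb


for label, slots in [
    ("24-slot 2P TYAN board", 24),
    ("32-slot 4P board", 32),
    ("16 modules (the example above)", 16),
]:
    print(f"{label}: {max_capacity_gb(slots)} GB")
```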

 05/10/2011 12:54 AM
4P_Bulldozer
Fanboi

Posts: 52
Joined: 03/24/2011

Originally posted by: 4P_Bulldozer Have you seen this little Beauty, it is new: http://forums.amd.com/forum/me...d=y&STARTPAGE=2#bottom (Hit Page-Up).
Originally posted by: MU_Engineer I have seen it, it is a beauty. It will also fit in my case without any modifications. My guess is that it sells for $1000-1100 based on the $900-1000 price of the H8QG6.

This is a completely different Computer but it is competitive with the Motherboard I mentioned. This example is only meant to show the (wonderful) direction of Computer pricing; though you might have estimated a touch high.

Supermicro A+ Server 4022G-6F Barebone System - $1,236.33 (Bareboard + Chassis with extra "Supermicro Chassis Goodies"):
http://www.ctistore.com/produc...ebone,AS-4022G-6F.html

A+ Server 4022G-6F
http://www.supermicro.com/Aplu...r/4022/AS-4022G-6F.cfm

For $136.33 more than your top guess you get a loaded ("Mobile Rack", 2 PS Option, etc), "TQ" (maybe < 42dB) Chassis with an "80+ Platinum Level (94%+) High-efficiency Power Supply" (that saves energy and that pays the difference in cost over the life of the Product - so a "FREE" Chassis, based on the high end of your 'quote').



Originally posted by: MU_Engineer I really do want a 4P build someday, but it won't be for several years, probably using a motherboard that's two socket generations after G34.

The "G2012" Socket might be worth waiting for, I intend to wait and see what it offers; I might as well go for it to obtain "Socket longevity" (since G34 ?WAS/MIGHT/IS? retiring in 2012).

What will two generations get you, "Fabric" (you hope) ?
What will you do with all that Fabric (that you would not do with MOE, or IB, or something really expensive for the HTX Socket) ?

AMD ought to give us "GPU Fabric" ("GPU Virtualization", split or add GPU Card's "Cores" amongst Processes) but after "G2012" it is "CPU Fabric", this is my understanding.





Originally posted by: MU_Engineer

Originally posted by: 4P_Bulldozer

After much study I have come up with two other factors that should influence the Consumer's decision on Motherboard purchases that may or may not have been a factor in the decisions made with respect to the Jaguar.

1. Memory cost: The less space that is available on a Motherboard (or Jaguar Circuit Card) the larger the size of Memory Chips you would need ...
...
At this moment (depending upon Brand) you can buy 8G Chips for a tiny bit cheaper than 2 x 4G Chips and for only 25% the cost of Chips twice the size.

Thus, if you desire to maximize the available Memory (which you likely would with so many Cores and the desire to hold values in Memory instead of on Disk) then you would want to buy 8G Chips; 16G Chips are over $1000 so you probably do not desire to purchase those. A choice of 32G Chips is not yet available.

In this situation it is ONLY the price of the Memory Chips that is considered in the purchase of a System (since the number of Slots is limited and the price skyrockets above a particular size) because with large Memory Chips the cost of Memory is greater than the cost of anything else (even the Processor, more so with AMD Opterons at 1k each, less so with Intel Xeons at 4K each - leave Processor speed (work per Core) out of the equation since we are trading Cores for Speed in this Configuration).


Also note that Opteron 6100s start at $266, while Xeon 7500s start at about $1100. The cheapest Opteron 6100 (6128) has an absolutely identical memory subsystem (speed, capacity) to the most expensive one (6180 SE). The same is not true for the Xeons. The cheapest ones have the same memory capacity as the most expensive ones, but the memory speed is reduced quite a bit.


True but along with higher clocks you would want a bigger Cache, we need to pay extra to get "everything", except "lowest possible power consumption" (since we only have 2-4 Processors) unless we have 4P and 4(-8) GPU (and SAS + RAM) with a 900W Power Supply, then we will be buying "something".

If we can buy more efficient Chips for the cost of two (or three redundant) Power Supplies (and we "throw those perfectly good PS's out", unless we can use them) then we will be getting the better Chips. Probably best to plan the Power Supply precisely (just get 1400W (drool) but lower efficiency), or 1000W (not "Platinum" Rated).

We (those who want "Cores" otherwise we would buy an i7 and OC it to 4GHz (on air)) would buy AMD and not Intel due to the higher Coreage (courage) and lower pricing (gouging).



Originally posted by: MU_Engineer

Originally posted by: 4P_Bulldozer
2. A 4P System interconnects all its Processor Sockets using all its HT Links, but (depending upon how the Motherboard is designed) a 2P System has spare Links and they can also be used to interconnect the 2 Processors for an even greater interconnect speed versus a 4P System. Since you are using expensive Memory you might as well interconnect it as fast as possible (and the Processors also) to get more usefulness out of them.


Most if not all 2P G34 Opteron setups do connect all of the coherent HT links between the two CPUs. There is a higher interconnect bandwidth and lower latency in aggregate in a 2P vs. 4P G34 system due to the 2P setup having every die with a direct HT connection to every other die in the system. 4P has any given die in the system being directly connected to only half of the other dies; the I/O makes a hop through the neighboring die (over a fat 24-bit-wide HT3 link) to reach the non-directly-connected dies. It's a well-thought-out system that works well, and it's also why AMD limited G34 CPUs to four-socket operation. The old Opteron 800/8000 8-way setups had some three-hop accesses and 4P->8P scaling was much worse compared to 1P->2P and 2P->4P scaling as a result.


> "Most if not all 2P G34 Opteron setups do connect all..."

The Motherboard Manual (different Link that I don't have handy) for this System does not claim to, though it might offer it:
http://www.supermicro.com/manu...er/4U/MNL-4022G-6F.pdf

On Page 13 Tyan makes this claim specifically - Two x16 and one x8 HT3.0:
http://www.tyan.com/manuals/S8232_UG_v1.0.pdf



Originally posted by: 4P_Bulldozer Not only can a 2P System provide more interconnection but there is also more room on the Motherboard for more Memory Slots and thus the board can be populated with a greater quantity of affordable Memory.
Originally posted by: MU_Engineer Nearly all G34 motherboards have eight DIMM slots per socket regardless of the number of sockets. The board size simply gets larger as you add more sockets. The 1P G34 boards are ATX, the 2P ones are EATX/SSI EEB, and the 4P ones are SSI MEB/SWTX. TYAN's latest dual G34 board is the exception: it's a 2P board with 12 slots per socket (the highest number allowed with Opteron 6100s) but is the same size as a 4P board. I'd personally go for the 4P board since the TYAN unit has 24 slots in total while a 4P board has 32.


> The board size simply gets larger as you add more sockets.
Sometimes; the Tyan (2P) Board with 12 RAM Slots per Socket is SSI MEB, so sometimes the Board gets bigger without extra Sockets (for Processors).

Some Boards have extra PCI-e Slots and thus also jump from "E-ATX" to SWTX.

SM has a few 4P Boards with 4 Slots for each (PCI and 'Memory per CPU') (yuk).


('RAMwise') It is almost 50/50 between 4 and 8 Slots per Socket, plus one Board that has 12 (for G34 Socket). BD could do 16 but there is no Board, yet.



Originally posted by: 4P_Bulldozer Important: -> This halves the price of the most expensive part, but it costs you "Node latency". In my (our?) situation we might only be able to get the most use out of a 2P System (our ability to use 4P might be limited) thus we would not lose "Node latency" but actually gain it from the spare HT Links (and halve our cost of Memory).

Originally posted by: MU_Engineer The single most expensive part would be the motherboard in most cases. A decent 2P G34 motherboard costs $400-600 and a 4P board is $800-1000. Your 8 GB RAM modules are less than $200 each and Opteron 6100s start at $266 each. You do have increased interconnect latency with a 4P system, but you have a big increase in memory bandwidth and at least a 33% increase in total RAM capacity due to having 32 DIMM slots. The higher interconnect latency can be ameliorated by having a program/OS that is NUMA-aware and will thus keep a thread on a specific die so that its working set stays in the same RAM modules, so there isn't as much HT traffic.


> The single most expensive part would be the motherboard in most cases.

The way I am doing my Math is:
- the Chassis could be $400-1000 (esp. with shipping), so that is the same price as the Motherboard.
- 16Gb RAM is a thou a Stick, if you are doing VM Serving you might want that (we probably do not).
- 8Gb RAM at 1866MHz with ECC (and temp Sensors) won't be less than $200 a Stick, if it is $250 and you get 16 then that equals $4k, otherwise 'cheapout' with only $2k worth.
- SAS Drives have come down, unless you're stuffing > 8 1TB Drives in your Chassis they won't be the most expensive part.
- A RAID Card could be expensive but let's just use the one on the Motherboard.
- If we RAIDed eight 512k SSDs that would be the most expensive but we won't be doing that either.

SO, RAM is the most expensive part. You can't just buy ONE Stick for $200 and say that RAM only costs $200 (maybe with 1P Athlon / Phenom / Bobcat, but not with > 2P of Opterons) .

IF the RAM were $200 a Stick (I sure hope it is in 6 months) and you have "2P" then that is 32 Cores, surely you would want 32Gb of RAM.


Do you (many people) now have 4 Cores and 8Gb of RAM ?

IF they do (and we plan our G34 Motherboard to be 'Bulldozer proof', and maybe we are sensible to run "Linux" (etc.) on a high-core-count System instead of Win7/8) then we want 64Gb of RAM to be equal -- BUT we could do well with 48Gb.



SO, just 32Gb @ $200/8Gb Stick is still $800. We will ADD more RAM (someday) if we are only buying so little (and it does make sense to buy more RAM) so the $800 is a minimum price for RAM.

Hard Drives (for most people (not editing Movies)) increase in size and decrease in price quicker than we can fill them, so we are unlikely to spend more for Hard Drives (as opposed to the need for more RAM).


This is why I say that RAM is the MOST expensive part (that is how I am 'doing the math').
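
The arithmetic above can be sketched in a few lines of Python. This is only a rough budget using prices quoted in this thread; the function name `ram_cost`, the mid-range chassis and motherboard figures, and the choice of two $266 CPUs are assumptions for illustration, not a quote for a real build.

```python
# Rough parts budget using figures quoted in this thread.
# All prices are illustrative assumptions, in US dollars.

def ram_cost(sticks, price_per_stick):
    """Total cost of a set of identical RAM sticks."""
    return sticks * price_per_stick

parts = {
    "chassis": 700,            # assumed midpoint of the $400-1000 range
    "motherboard": 500,        # assumed midpoint of the $400-600 2P range
    "cpus": 2 * 266,           # two entry-level Opteron 6100s at $266 each
    "ram": ram_cost(16, 250),  # sixteen 8 GB sticks at $250 each
}

total = sum(parts.values())

# Print the parts from most to least expensive with their share of the total.
for name, cost in sorted(parts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} ${cost:5d}  {100 * cost / total:4.1f}%")
print(f"{'total':12s} ${total:5d}")
```

With only four $200 sticks, `ram_cost(4, 200)` gives the $800 minimum mentioned above, and under these assumed prices the RAM then costs about as much as the other parts rather than dominating them.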



Originally posted by: 4P_Bulldozer If you have a 4P System with 16 Slots of 16G Memory for $1000 a Slot that is $16000 of your System's cost. That is $8000 for 2 Processors with 128G of Memory.

If you have a 2P System with 16 Slots of 08G Memory for $0250 a Slot that is $04000 of your System's cost. That is $4000 for 2 Processors with 128G of Memory.

One 2P Motherboard I have seen has 24 Memory Slots (and while it will only support 256G of Memory), the greater number of Memory Slots allows for less expensive Chips to populate the MotherBoard.
Originally posted by: MU_Engineer Current G34 motherboards support registered DIMM sizes up to 16 GB, so that 24-slot TYAN board supports 384 GB, not 256 GB. A 4P board supports 512 GB of RDIMMs in its 32 slots. Also, your example is wrong.

- 16x 16 GB modules yields 256 GB of RAM, not 128 GB.

- 4P boards nearly always have 32 DIMM slots, not 16.

- That 24-DIMM TYAN board is as large as a 4P board, so you will still need to get an especially large case to house it.


Tyan S8232 - http://www.tyan.com/manuals/S8232_UG_v1.0.pdf
Page:5 "Capacity Up to 256GB RDIMM/ 64GB UDIMM"

> Also, your example is wrong.
I'm Apples <-> Apples, you are Apples <-> Oranges; read that Post again.


> "A 4P board supports 512 GB of RDIMMs in its 32 slots."
Yeah SOME do, with 16Gb Chips that we can not afford (so not considered).




> "4P boards nearly always have 32 DIMM slots, not 16."

The HIGHEST amount of RAM on this Page (8000 Series, 4P) is 128Gb (and many do not have that much) and the RAM Slot count is low - they are "4P":
http://www.supermicro.com/Aplu...puclass=all&sorton=cpu

Looking at the Pictures on THIS "4P" Page it is roughly 50/50 for 4/8:
http://www.supermicro.com/Aplu...therboard/Opteron8000/


> That 24-DIMM TYAN board is as large as a 4P board, so you will still need to get an especially large case to house it.
Yes, due to the high Slot count a 2P Motherboard can be large.

Check out the Link I gave you for Chassis (above), some Chassis are enormous (double Radiator PLUS hold an SWTX) and some are relatively small (just barely squeeze the board in).




The Penultimate:

At the START of this Discussion I gave this URL: http://forums.amd.com/forum/me...d=y&STARTPAGE=2#bottom , that is a 4P G34 Motherboard with FOUR Memory Slots per CPU.

If I line up the 'Standoff Holes' of the "Supermicro H8QGL-6F+" with a SSI MEB Board it leads me to believe it is MEB size (it is so new that I have no info on it) but it may be "SWTX" (similar) size instead.

Due to the large number of Processor Sockets (4P) and the large number of PCI-e Slots it ONLY has 4 Memory Slots per CPU (for a total of 16). The Tyan Board shown directly below it is the same size (in dimension) and is (technically) an "8 PCI Slot" Board (look at the spacing, do not count the Slots), with 2P, and 24 Memory Slots (that hold 256Gb of RAM - according to the Manual, which I read).

If you fill all the Slots you need to use different sizes of RAM (in some cases, depending on the "total capacity" desired). It is a "funky" system for populating the RAM in some configurations (due to the 3x4 layout).



Thus, I mostly disagree with what you said towards the end of your Post (and the opposite with the beginning of your reply).



Note: Cheers and best wishes, (not angry).

PS: I hope I got the 'quotes' and References correct - I've been a long while (> 1 hour) typing this (and I must attend to something else). I did proofread it.

Edited: 05/10/2011 at 01:57 AM by 4P_Bulldozer
 05/10/2011 08:21 AM

MU_Engineer
Dr. Mu

Posts: 1837
Joined: 08/26/2006

Originally posted by: 4P_Bulldozer

The "G2012" Socket might be worth waiting for, I intend to wait and see what it offers; I might as well go for it to obtain "Socket longevity" (since G34 ?WAS/MIGHT/IS? retiring in 2012).

G34 is going to be retired in 2012 after the "Interlagos" Opteron 6200s based on the first-generation Bulldozer architecture are replaced by the next line of Opteron MPs.

What will two generations get you, "Fabric" (you hope) ?

What will you do with all that Fabric (that you would not do with MOE, or IB, or something really expensive for the HTX Socket)?


I generally replace my machines roughly every five years and AMD's current public plans mean roughly a two to three-year lifespan for new sockets. That would mean I'd be looking to replace with parts that use the next socket after G2012/C2012.


Originally posted by: 4P_Bulldozer
> "Most if not all 2P G34 Opteron setups do connect all..."

The Motherboard Manual (different Link that I don't have handy) for this System does not claim to, though it might offer it:

http://www.supermicro.com/manu...er/4U/MNL-4022G-6F.pdf



On Page 13 Tyan makes this claim specifically - Two x16 and one x8 HT3.0:

http://www.tyan.com/manuals/S8232_UG_v1.0.pdf


They might only connect one of the HT links between the CPUs, but then they'd be putting themselves at a competitive disadvantage to the companies that connect all of the links since inter-CPU latency would be increased and bandwidth decreased. We could pretty easily find out how Supermicro connected the CPUs by looking at inter-CPU HT latency with numactl or a similar program. You could also check raw memory bandwidth with a tool like Stream or Ramspeed to see if your numbers are way off what would be expected for a fully-connected 2P setup.
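
As a rough illustration of the numactl check, here is a minimal Python sketch that reads the "node distances" table printed by `numactl --hardware`. The `SAMPLE` text is made-up output for a hypothetical 4-node machine, not a measurement from any board in this thread, and the function names are assumptions. It also assumes nodes are numbered contiguously from 0.

```python
# Parse the "node distances" table printed by `numactl --hardware`.
# SAMPLE is made-up output for a hypothetical 4-node machine, not a measurement.

SAMPLE = """\
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10
"""

def parse_distances(text):
    """Return {node: [distance to node 0, node 1, ...]} from numactl output."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("node distances"))
    table = {}
    for line in lines[start + 2:]:      # skip the title and the column header
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            break                        # end of the table
        table[int(head)] = [int(x) for x in rest.split()]
    return table

def remote_distances(table):
    """Set of distinct distances between different nodes."""
    return {d for node, row in table.items()
            for other, d in enumerate(row) if other != node}

dist = parse_distances(SAMPLE)
print(remote_distances(dist))
```

A single remote distance suggests every node is reported as equally close to every other, while several distinct values would hint at multi-hop paths. These distances are relative values reported by the firmware rather than measured latencies, so a bandwidth test such as Stream remains the more direct check. On a real Linux machine the text could come from `subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout`.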



> The single most expensive part would be the motherboard in most cases.

The way I am doing my Math is:

- the Chassis could be $400-1000 (esp. with shipping), so that is the same price as the Motherboard.

- 16Gb RAM is a thou a Stick, if you are doing VM Serving you might want that (we probably do not).

- 8Gb RAM at 1866MHz with ECC (and temp Sensors) won't be less than $200 a Stick, if it is $250 and you get 16 then that equals $4k, otherwise 'cheapout' with only $2k worth.

- SAS Drives have come down, unless you're stuffing > 8 1TB Drives in your Chassis they won't be the most expensive part.

- A RAID Card could be expensive but let's just use the one on the Motherboard.

- If we RAIDed eight 512k SSDs that would be the most expensive but we won't be doing that either.

SO, RAM is the most expensive part. You can't just buy ONE Stick for $200 and say that RAM only costs $200 (maybe with 1P Athlon / Phenom / Bobcat, but not with > 2P of Opterons) .

IF the RAM were $200 a Stick (I sure hope it is in 6 months) and you have "2P" then that is 32 Cores, surely you would want 32Gb of RAM.


RAM in aggregate is a large cost, but the rest of the system is certainly a non-negligible cost, especially if the CPUs cost as much as Xeon 7500s or even faster Xeon 5600s do.

Do you (many people) now have 4 Cores and 8Gb of RAM ?


Some people do. The amount of RAM you need is determined by core-independent and core-dependent factors. Core-independent factors are the RAM demands of the programs that you run, regardless of the number of cores in the system. Core-dependent factors are increased RAM demands as you either run more programs concurrently or have programs that use more RAM as the number of threads they use increases.

Desktops and workstations typically don't see their RAM demands increase a whole lot when the number of cores/CPUs increases. Some servers do tend to see RAM demands increase with increasing core/CPU count, since they tend to have more running on them if they are more powerful.

I spec the RAM for my systems based on the core-independent RAM use since the heavily-multithreaded tasks I run don't have much for increased RAM demands as core/thread count goes up. Video encoding for example doesn't really use any more RAM as the thread count goes up. Neither does image compression. Code compilation does use more RAM as the core count goes up but the total RAM usage is generally pretty small so I don't worry about it too much. In practice, I just make sure I have enough RAM for the core-independent usage and then buy the RAM module size that is the best value for the money and buy enough of it to populate all of the memory channels in the system.

My systems have the following core/RAM count:

- Workstation: 16 cores/16 GB. I really only needed 6-8 GB of RAM here, but I needed 8 RAM modules to fully populate all of the memory channels and 2 GB DIMMs were about the same price as 1 GB units, so I got the larger ones.

- File/print/miscellaneous server: 4 cores/4 GB. The price of DIMMs again determined what RAM modules I needed. Something around 1.5-2 GB would have been plenty.

- Laptop: 2 cores/3 GB. This is an appropriate amount of RAM for that machine.

- HTPC: 4 cores/1 GB. This machine only needed about 512 MB of RAM. It uses dual-channel DDR2 RAM and the smallest commonly-available sticks are 512 MB in size, so I got two of them.
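
The sizing approach described above can be sketched roughly in Python: estimate the need as a core-independent base plus a per-thread amount, then pick the smallest module size that covers it with one module in every memory channel. The function names, the per-thread figure, and the list of module sizes are assumptions for illustration only.

```python
# Sketch of the sizing rule: base need plus per-thread need, then one
# identical module per memory channel. Names and the list of available
# module sizes are assumptions, not taken from any vendor documentation.

def ram_needed(base_gb, per_thread_gb, threads):
    """Core-independent need plus a core-dependent part."""
    return base_gb + per_thread_gb * threads

def pick_modules(needed_gb, channels, sizes_gb=(0.5, 1, 2, 4, 8, 16)):
    """Smallest module size that covers needed_gb with one module per channel."""
    for size in sorted(sizes_gb):
        if size * channels >= needed_gb:
            return size, size * channels
    raise ValueError("no listed module size is large enough")

# Workstation: about 8 GB needed, 8 memory channels.
print(pick_modules(8, 8))      # (1, 8)
# HTPC: about 0.5 GB needed, 2 channels, 512 MB is the smallest stick.
print(pick_modules(0.5, 2))    # (0.5, 1.0)
# Hypothetical job: 6 GB base plus 0.125 GB per thread on 16 threads.
print(pick_modules(ram_needed(6, 0.125, 16), 8))
```

The function returns the minimum that fills every channel. For the workstation that is 1 GB modules, and price then pushed the actual choice up to 2 GB modules, as noted in the list above.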

Also, your discussion about RAM capacity that follows mixes last-generation Opteron 8000 and the current-generation Opteron 6100 platforms. Opteron 6100 units have four memory channels per CPU versus Opteron 8200/8300 units' two channels, and the general trend is to have at least two DIMM slots per channel. Lots of 4P Opteron 8000 boards had 16 DIMM slots in total, but most G34 4P boards have 32 DIMM slots.
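
The slot counts follow from a simple product, sketched here in Python. The channel counts are the ones given above; the DIMMs-per-channel values are inferred from the slot totals mentioned in this thread, and the function name is an assumption.

```python
# DIMM slots = sockets x memory channels per socket x DIMMs per channel.
# Channel counts come from the post; DIMMs per channel are inferred from
# the slot totals discussed in the thread.

def dimm_slots(sockets, channels_per_socket, dimms_per_channel):
    return sockets * channels_per_socket * dimms_per_channel

boards = {
    "4P Opteron 8000": dimm_slots(4, 2, 2),          # 16 slots
    "4P Opteron 6100": dimm_slots(4, 4, 2),          # 32 slots
    "2P Opteron 6100 (TYAN)": dimm_slots(2, 4, 3),   # 24 slots
}

for name, slots in boards.items():
    print(f"{name}: {slots} DIMM slots")

# Extra slots on a 4P G34 board relative to the 24-slot 2P board.
increase = 100 * (boards["4P Opteron 6100"] - boards["2P Opteron 6100 (TYAN)"]) \
    / boards["2P Opteron 6100 (TYAN)"]
print(f"4P vs 24-slot 2P: {increase:.0f}% more slots")
```

With equal module sizes, the last figure is the 33% capacity increase mentioned earlier in the thread.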
