AMD Processors
Topic Title: Optimization of AMD Opteron
Created On: 10/13/2003 11:00 PM
 10/13/2003 11:00 PM
alan9979
Junior Member

Posts: 3
Joined: 10/13/2003

Dear Technical Support Team,

We are a software house located in the US and Singapore. We specialize in compression algorithms and use Microsoft Visual C++ 6.0 for coding.

We just bought an AMD Opteron 240 from a local distributor and found that it is 10% slower than an Intel Pentium at 2.4 GHz. WHY???? It should be faster and better!! Do you have a compiler for your Opteron processor??

Please let us know the reasons WHY, as we are embarking on a big project that will use a few thousand processors and we are now selecting the processor: INTEL OR AMD??

My system configuration is as follows:

Ordering Part Number (OPN) of Processor: OSA240CC05AH
Brand of motherboard: Rioworks HDAMB motherboard
Motherboard BIOS revision: 0.90
Brand and model of the heat sink: AMD Opteron processor heatsink
Brand and model of the power supply: Antec True550PEPS12V

Kindly provide me with the compiler, if you have one, so that we can optimize our compression software for our newly acquired AMD system.

Your immediate action will be much appreciated.

Thank you.

Regards,
Alan Lam
(alan@i-nets.com)
 10/14/2003 01:33 AM
Bitey
Elite

Posts: 1492
Joined: 10/07/2003

eh, I am not AMD TS but...

240 is the model number, not the speed.

e.g. the 240 is the slowest dual Opteron available and runs at 1.4 GHz.
As a comparison, I think the AMD 64 3200+ runs at 2.0 GHz.

I think you should have done your homework; if you want a fast single Opteron you should have gone for the 146.



 10/22/2003 12:55 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

Alan,

As Bitey already pointed out, the 2.4 GHz Pentium is a clear 1 GHz ahead of the Opteron 240 in terms of raw clock speed. You shouldn't expect the Opteron to "automagically" beat that sort of head start.

You should look at optimizing your code to take advantage of the features that the Opteron gives you (like 64-bit mode, extra registers, 1 MB of cache, etc.).

Remember that recompiling for 64-bit is NOT the same as optimizing for 64-bit. Also, if you're working with 32-bit numbers, then moving to 64-bit is not going to gain you a *huge* amount. However, if you're dealing with 64-bit numbers, then the move to 64-bit can bring *huge* benefits. As an example, my dual 244s (1.8 GHz) recently spanked a dual 2.8 GHz Xeon whilst running a number-crunching program.
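To make the point about 64-bit numbers concrete, here is a minimal sketch in C (not from the original post; the type and function names are made up for illustration). In a 32-bit build, a 64-bit addition has to be carried out as two 32-bit additions plus a carry, roughly as add64_via_halves does by hand. In a 64-bit build the same addition fits in one general-purpose register, as in add64_native.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: a 64-bit value held as two 32-bit halves,
   which is how a 32-bit build has to treat it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Split a 64-bit value into its low and high 32-bit halves. */
u64_halves split64(uint64_t v)
{
    u64_halves h = { (uint32_t)(v & 0xFFFFFFFFu), (uint32_t)(v >> 32) };
    return h;
}

/* Reassemble a 64-bit value from its halves. */
uint64_t join64(u64_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Add two 64-bit values using only 32-bit arithmetic plus a carry. */
u64_halves add64_via_halves(u64_halves a, u64_halves b)
{
    u64_halves r;
    uint32_t carry;
    r.lo = a.lo + b.lo;         /* wraps modulo 2^32 */
    carry = (r.lo < a.lo);      /* 1 if the low half overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* The same addition when a native 64-bit integer is available. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;
    assert(join64(add64_via_halves(split64(a), split64(b))) == sum);
    return sum;
}
```

The assert inside add64_native only cross-checks the two versions against each other. How much the 64-bit form actually gains depends on how much of your inner loop is this kind of arithmetic, which is something to measure rather than assume.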

Also, I notice from your post that you're developing with VC++, which *implies* you're developing for Windows. Since 64-bit Windows is not (as I type) currently available, I assume you're running 32-bit Windows on the Opteron? In which case, how do you expect to take advantage of the benefits that 64-bit brings?

John.

-------------------------
The opinions expressed above do not represent those of Advanced Micro Devices or any of their affiliates.
http://www.shellprompt.net
Unix & Oracle Web Hosting Provider powered by AMD Opterons
 10/24/2003 03:15 AM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Alan

I am not AMD either. I have no affiliation with AMD, I don't sell AMD, or make any money from them.

I read that you "are embarking on a big project to have few thousand processors and doing the selection of processor, INTEL OR AMD"

You would have to specify the problem a little more for me to help fully, but is this a multi-processor system, a cluster, or just a lot of independent (possibly desktop) machines?

First, you need to be clear: are you running 32-bit or 64-bit?

There are lots of good reasons to go for AMD Opteron.
But, if the project is cost sensitive, you may want to consider the AMD Athlon XP or MP, their price/performance is very good.
If you are simply building software for independent machines, you may want to consider the AMD 64, which is cheaper than an Opteron of a similar clock speed.

As far as I am aware, AMD do not provide an Opteron/AMD64 compiler.

If you are not tied to a Windows environment, for example if you are building embedded systems, it is worth noting that there is a 64-bit compiler for Linux, and it can be obtained from SuSE with a copy of SuSE Linux 9.0 Professional AMD64 for $119.95 (http://www.suse.com/us/private..._linux/i386/index.html).
I believe Red Hat will be offering a similar product soon. I must point out that the gcc compiler is not noted for the speed of the code it generates. Its strength is the range of platforms it supports, as well as being open and freely available.

Anyway, assuming you do want the 'fastest code' compiler, let us continue.

Like jes, I assume you are probably developing for windows when you are using VC++ 6.0.

If this is true, you are very likely to be running in "32-bit mode" as M$'s Windows operating systems (not DOS) use the CPU's hardware protection to prevent you from flipping the hardware into, for example, 64-bit mode.

So, if you are going for 32-bit, you are going to have to select the compiler that generates the best 32-bit code for an Opteron for *YOUR* application. This is where you have to do the work.

[You may want to stop and consider 32 bit vs 64 bit. If you pay for appropriate MSDN support you can probably get a beta copy of 64-bit Windows. I assume you aren't on a high-level MSDN right now as I would not expect you to be using VC++ 6.0. As far as I know, there was no AMD64 support on that compiler.]

I recommend you design some benchmarks to represent your problem; there is no simple answer to the "best compiler" question.

I assume you already have a good knowledge of the problem and solution, so building benchmarks won't be too difficult.
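As a rough idea of what a benchmark kernel could look like, here is a small sketch in C. The thread never says which compression algorithm Alan uses, so run-length encoding is purely a stand-in chosen for brevity, and the function name rle_encode is hypothetical. The point is to have a self-contained routine with known-correct output that can be built unchanged with each candidate compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in workload: byte-oriented run-length encoding.
   Output is a sequence of (count, value) byte pairs with count in 1..255.
   Returns the number of bytes written, or 0 if dst is too small
   (an empty input also yields 0). */
size_t rle_encode(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (in < src_len) {
        unsigned char value = src[in];
        size_t run = 1;
        /* Extend the run while the next byte matches, capped at 255. */
        while (in + run < src_len && src[in + run] == value && run < 255)
            run++;
        if (out + 2 > dst_cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = value;
        in += run;
    }
    return out;
}
```

For a real comparison you would replace this with your own critical routine and feed it data that resembles production input, since compression speed is very sensitive to the data.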

You may be able to save yourself some time and effort by getting Dr. Dobb's Journal #353, October 2003.

It contains an article, "Comparing C/C++ compilers", which compares 9 Windows C/C++ compilers and also covers how to compare compilers. That may give you a useful framework for your own comparisons.

It may even be that the author's comparisons and benchmarks are similar enough to your problems that you can make reasonable judgements, though I would be surprised.

The high-points of the article are:
1. Intel's compiler overall generates fast code for the P4 - this may not be true for the Opteron
2. Execution time can vary by 500% between compilers for the same benchmark
3. VC++ 7 is mostly faster than VC++ 6.0

If the thousands of CPUs are for development machines, and you are cost sensitive, you may want to consider that many compilers are available for free on Linux, including Intel's compiler.

I hope this helps. If you want more help, e-mail me.
 10/27/2003 09:49 PM
alan9979
Junior Member

Posts: 3
Joined: 10/13/2003

Dear Deuxway,

Are you saying that if I code and debug my compression application on 64-bit Windows, my code is basically optimized for 64-bit mode? Is that what you mean?

If that's the case, I would still be using the 32-bit Visual C++ compiler when I install Visual C++ 6.0, rather than a 64-bit compiler.

Please advise me on this matter again.

Regards,
Alan Lam
 10/28/2003 12:43 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

Alan,

In order to benefit from 64-bit code, you need to be running a 64-bit OS. If you are just running 32-bit Windows (i.e. 95/98/2000/ME/NT), then the Opterons will be running in 32-bit mode, and any programs you run (or indeed compile) will be working as 32-bit programs. This happens natively, i.e. there is no penalty for running in 32-bit mode since it's supported by the chip.

Once you have a 64-bit Operating system (i.e. 64-bit windows when it's eventually released, or Linux etc), then you can "choose" whether to compile your programs as 32-bit or 64-bit. For example using gcc on Linux it's as *simple* as -

gcc -m32 -o hello hello.c (to compile 32-bit)

or

gcc -m64 -o hello hello.c (to compile 64-bit).
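
One quick way to see which mode a build ended up in is to have the program report its own pointer width. The sketch below is an illustration only (the function names are made up); it relies on nothing beyond sizeof and CHAR_BIT from standard C.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a pointer, in bits, for the current build. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Print the sizes that typically differ between -m32 and -m64 builds. */
void report_build_width(void)
{
    unsigned bits = pointer_bits();
    /* This sketch only expects the two x86 modes discussed in the thread. */
    assert(bits == 32 || bits == 64);
    printf("pointer: %u bits, long: %u bits\n",
           bits, (unsigned)(sizeof(long) * CHAR_BIT));
}
```

Calling report_build_width() from main in a -m32 build should report 32-bit pointers, and in a -m64 build 64-bit pointers. On AMD64 Linux, long also becomes 64 bits wide in a -m64 build.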

However, as I stated in my earlier post, optimising for 64-bit is quite different to just recompiling for 64-bit. Don't just recompile your existing code 64-bit and expect huge increases in performance, because you'll be disappointed. Instead you should hold a code review and try to isolate the places where your code would benefit from 64-bit optimisations. There's *no* magic way to do this, since it entirely depends on your program itself.

However, generally, look for places where your data structures can be optimised for 64-bit logic. If you're doing any large-scale math, either floating point or integer, then these are going to be great areas for optimising. Also, try to take advantage of that 1 MB L2 cache on the Opterons; optimising your routines to keep data in cache could produce *huge* gains over running on chips with a far smaller cache.
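
As a generic illustration of keeping data in cache (not taken from Alan's application, which the thread does not describe), the sketch below transposes a square matrix in small tiles. The function name and the block parameter are assumptions for the example; the idea is that each tile of the source and destination is small enough to stay in cache while it is being worked on, instead of striding across the whole array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transpose an n x n row-major matrix in square tiles of `block` elements
   per side, so that each tile is reused while it is still in cache. */
void transpose_blocked(const uint32_t *src, uint32_t *dst,
                       size_t n, size_t block)
{
    size_t bi, bj, i, j;
    assert(src != NULL && dst != NULL && block > 0);
    for (bi = 0; bi < n; bi += block) {
        for (bj = 0; bj < n; bj += block) {
            /* Clamp the tile at the matrix edge. */
            size_t i_end = (bi + block < n) ? bi + block : n;
            size_t j_end = (bj + block < n) ? bj + block : n;
            for (i = bi; i < i_end; i++)
                for (j = bj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The best tile size depends on the cache sizes and on the element size, so it is worth treating block as a tuning parameter and measuring a few values rather than fixing it in advance.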

Also, as deuxway's (very helpful) post pointed out, you should look at which compiler *suits you*, since not all compilers are created equal! You need to ensure that the compiler you choose supports the new instructions provided on the AMD64 chips, otherwise you'll find that things aren't optimised as well as they should be. If performance is *critical* for you, then check the assembler produced after compiling (e.g. gcc -S) to see which instructions have been used. Use inline assembler if your compiler doesn't support the instructions you want, or if you think you can do a better job. It all comes down to how much control you want.

John.

 10/29/2003 05:03 AM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Alan

When I say performance, I usually mean any of several dimensions, such as speed, size, or availability. I think your questions are mainly focused on speed.

I'm still a little unclear what you are trying to achieve.

If you don't have working code, and you don't know which parts of your system are likely to have significant speed requirements, then I would suggest you take a step back, and get critical requirements and a candidate architecture defined before worrying about compilers and CPU's. Combining those requirements with the experience you've already gained will let you identify how far away from the target you are.

I think we've covered some of the situation where you have correctly working code, and just need it to go faster on AMD64/Opteron.

I also think we have covered some of the case where you are about to build a system to solve a compression problem that you thoroughly understand.

What I don't understand is how fast do you need the application to go? How far short of your goal was your initial experience with VC++ 6.0 on the Opteron 240?

I can give you general advice, but I can be much more concrete if I can understand approximately what your target is. For example was it x2 too slow, x10, x100, more?


I think Jes has covered the points in your e-mail. But, just to emphasise:

1. You can run standard 32bit Operating Systems like Windows and 32bit applications on AMD64/Opteron processors. The AMD64/Opteron runs 32bit code very quickly.

This is transparent to the OS and application. So the code that you compile with VC++ 6.0 will work.

If this is all you want. You can stop here.

If you want the application to go faster, but don't want to modify source code or generate a unique binary for the AMD64/Opteron, get a copy of the Dr. Dobb's article, and try to select the best compiler and libraries for your application. The Microsoft VC++ 6.0 compiler is probably not the correct choice.


If you want to take full advantage of the extra capabilities that AMD64/Opteron provides, such as extra and wider registers and extra instructions, you need two more pieces to get AMD64/Opteron 64-bit mode fully working:

1. An operating system that will run applications in AMD64/Opteron 64-bit mode. Currently SuSE 9.0 Professional for AMD64 or a version of SuSE 8.x Enterprise Server does this. A version of Windows that supports AMD64 64-bit applications is available in beta now, and will be released next year.

Be careful. There is a 64-bit Windows released *NOW*, but this only supports Intel Itanium, and *not* AMD64/Opteron. They are different.

2. A compiler that will generate application executables (or libraries) that use the AMD64 instructions. SuSE provide the GNU C/C++ compiler (as Jes describes) with their Linux. Microsoft will supply a version of Visual Studio that supports 64-bit AMD64/Opteron. I believe this is also available in beta.

A similar warning applies here as for 64-bit Windows: there is a released 64-bit Microsoft C/C++ compiler that generates code for Intel Itanium, and not for AMD64/Opteron. Using that compiler will *NOT* give you the results you need. You will need the new MS compiler with AMD64/Opteron support. I looked at the Microsoft MSDN web site, but it isn't clear what the status of that compiler is, though it must be available for developers to port applications to the beta Windows on AMD64.
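
Since it is easy to pick up the wrong 64-bit toolchain, a build can check its own target at compile time. The sketch below is an illustration, with made-up function names; it uses the predefined macros that GCC and Microsoft's compilers document for these targets (__x86_64__, __ia64__ and __i386__ for GCC; _M_X64 or _M_AMD64, _M_IA64 and _M_IX86 for Microsoft).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name the architecture this file was compiled for, based on the
   compiler's predefined target macros. */
const char *target_arch_name(void)
{
#if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64)
    return "AMD64 (x86-64)";
#elif defined(__ia64__) || defined(_M_IA64)
    return "Itanium (IA-64)";
#elif defined(__i386__) || defined(_M_IX86)
    return "32-bit x86";
#else
    return "other";
#endif
}

/* Returns 1 only when the build targets AMD64 in 64-bit mode. */
int built_for_amd64(void)
{
    const char *name = target_arch_name();
    assert(name != NULL);
    return strcmp(name, "AMD64 (x86-64)") == 0;
}
```

A 32-bit build running on an Opteron will report "32-bit x86" here, which is correct: the macros describe what the compiler targeted, not the processor the program happens to run on.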

As Jes explained, if you compile with an AMD64/Opteron compiler and run on an AMD64/Opteron version of an OS, your 64bit mode application is likely to perform even better on AMD64/Opteron than it did as a 32bit mode application.

*AND* as Jes points out, there is even more room to improve performance, through a range of optimisation techniques.

There is a big difference in speed between code that merely runs and optimal code, typically 10,000%-100,000%, i.e. a factor of 100 to 1,000 (2-3 orders of magnitude).

I got the impression from your original post that you wanted your software to run very fast.

At this point, you should check: is the time and effort you spend on making your software fast going to generate a tangible benefit?

I'm a little unclear from your e-mail, but if you already have a working system, a good path to investigate is to avoid changing your source code. Then, your existing documentation, testing etc are still relevant too.

I can't tell how fast you need to go, but, for example it is relatively common to improve speed by factors of 2-10 by careful choice of compiler, libraries and application layout (in memory), with no change to the code whatever. As another reference point, it is *much* harder to reduce in-memory size by such a significant amount where the data size is the dominant factor. If you want improvements greater than x10, on a specific hardware and OS platform, you are into modifying source code.

As Jes explains, there are several hardware features you could exploit when running in 64-bit mode that are not available in 32-bit mode. The general-purpose registers in an AMD64/Opteron are 64 bits wide rather than 32 bits, and there are more registers in an AMD64/Opteron than in a Pentium 4. These register capabilities are restricted to 64-bit mode.

While it very much depends on your problem, I would expect to be able to exploit those hardware resources to make significant improvements to computationally intensive application code.
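
One example of the kind of routine that can use wider registers, sketched in C under the assumption that the application spends time comparing byte strings (common in compression, though the thread does not confirm it for Alan's code). The function name match_length is hypothetical. It compares eight bytes per step while whole words match, then finishes byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the common prefix of a and b, examining at most max_len bytes. */
size_t match_length(const unsigned char *a, const unsigned char *b,
                    size_t max_len)
{
    size_t n = 0;
    assert(a != NULL && b != NULL);
    while (n + sizeof(uint64_t) <= max_len) {
        uint64_t wa, wb;
        /* memcpy keeps the possibly unaligned 8-byte reads well defined. */
        memcpy(&wa, a + n, sizeof wa);
        memcpy(&wb, b + n, sizeof wb);
        if (wa != wb)
            break;              /* the mismatch is somewhere in this word */
        n += sizeof(uint64_t);
    }
    /* Finish (or locate the mismatch) one byte at a time. */
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}
```

In a 64-bit build each uint64_t comparison can be done in a single general-purpose register, whereas a 32-bit build has to split it in two. Whether that shows up as a real speedup is something only a measurement on your own data will tell.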

I would expect a very smart compiler which understood the AMD64/Opteron to take advantage of the extra registers and some AMD64 instructions without changing your source code, though I would not expect the Intel compiler to do that :-). The libraries which accompany those compilers may also have a significant effect on performance, especially if they have been tuned for the AMD64/Opteron.

Right now, I would only be able to choose between compilers (and libraries) on the basis of real applications and artificial benchmarks. 64-bit compiler candidates are, at least, GNU's gcc, Microsoft VC++, and the Portland Group compilers (http://www.pgroup.com/).

As jes illustrates, there are many more optimisations which need source code changes before the compiler would generate faster code.

It's unclear from your e-mail, Alan, whether you expect to have multiple versions of your source code (one for 32-bit, one for 64-bit, and even more optimised for different flavors of processor) or just one version.

I disagree with Jes on the next point, though I'm sure he'll forgive me. IMHO, it is not easy to identify performance bottlenecks from a static analysis of code.

You may find relatively simple source code transformations like reordering the declaration of variables or functions has a measurable effect, which is rarely obvious to most developers (but here, Jes may be rather more savvy than most).

If you already have your application built and working correctly, your path is straightforward.

If you have no code, but you are confident that you know the important functionality and the critical sections of your system, I would suggest you develop a clean, straightforward implementation of those core pieces with a simple supporting infrastructure first [in modern software engineering terminology, these are your architecturally significant Use Cases]. Avoid "premature optimisation" in this implementation, as it usually generates complexity, which is hard to improve on because the code becomes too complicated to modify with confidence.

In both cases you need a test framework and good test data.

Then use profiling tools to automate performance evaluation of your components. Profiling tools will help you identify the (usually) small areas which would provide the greatest opportunities for improvement. Applications really do follow an 80:20 rule: more than 80% of an application's performance is determined by less than 20% of the code. For compression, I would expect it to be an even smaller amount of code.
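
Before a real profiler is in place, a crude stopwatch around a suspect routine can already give a baseline. The sketch below is only that, a manual timer built on the standard clock() function; the names average_cpu_seconds and sample_workload are made up, and sample_workload is a placeholder for whatever routine you want to measure.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the routine being measured. */
void sample_workload(void)
{
    volatile unsigned long sum = 0;
    unsigned long i;
    for (i = 0; i < 100000UL; i++)
        sum += i;
}

/* Run fn `repeats` times and return the average CPU time per call,
   in seconds, as measured by clock(). */
double average_cpu_seconds(void (*fn)(void), int repeats)
{
    clock_t start, end;
    int i;
    assert(fn != NULL && repeats > 0);
    start = clock();
    for (i = 0; i < repeats; i++)
        fn();
    end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / repeats;
}
```

A profiler tells you where the time goes inside the program, which this cannot; the timer is mainly useful for comparing the same routine across compilers, options, or versions. Use enough repeats that the total run is well above the resolution of clock().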

I like Quantify (http://www-3.ibm.com/software/awdtools/purifyplus/), though there are good tools from other vendors, and Quantify may not tell you everything you want to know. If speed is your goal, good tools are well worth considering. If you are a corporate MSDN subscriber, you can hire a Microsoft lab, with these kinds of tools for a week at a time. Check that they have AMD64/Opteron of course ;-). I think AMD may offer a similar service, but I'm not sure.

By identifying the specific areas which determine speed, you will also be identifying areas where you may aim for both 32bit optimal code and 64bit optimal code, which may be useful to you.

You'll be able to use these critical components and test framework to establish boundaries around your problem and a baseline for comparison as you tune things. This is important as interactions with the caches could give you odd results.

You will be able to test compilers, compilation options, library alternatives and linkage options easily too. You will likely get a lot of useful information out of this.

You will also have something concrete to review. I don't know if AMD offer a consultancy service, but getting someone who really knows optimisation on AMD64/Opteron is well worth considering once you have concrete information about a realistic problem. With a bit of luck, they will have some good performance tools and expertise in how to use them to maximum effect.

I would still recommend reading the Dr Dobb's article. Another point the author makes is that he usually sets compiler optimisations for smallest code space and not fastest code speed (apparently so do Microsoft)! This is because of subtle interactions of the CPU and cache; by going for the smallest code, he gets his code in the (fastest) cache more often, and that goes much faster than the 'fastest instruction sequence' which may be slightly larger and hence fit in the fastest cache less often.

All of this may sound like a lot of work, but look at it a different way.

You will be investigating the areas of highest project risk, which have the largest impact from failure, and greatest unknowns earliest in the project. This means you will have the most time and budget to fix things (or stop) if things don't work the way you want. You will also be developing skills and knowledge which you will need anyway, and which you never have time to acquire late in a project ;-).

I hope this helps.
Please keep asking if you have more questions, or just e-mail me.
(I apologise for my slow reply, but this isn't my day job :-)
 10/29/2003 12:43 PM
jes
Senior Member

Posts: 1134
Joined: 10/22/2003

An excellent post Deuxway!

Actually I agree with you on the optimizing, some bits of code you can easily see are sub-optimal, whilst others could take you a month of sundays to spot! I've been in the situation before where I've been *convinced* that portion X of the code is the bottleneck, only to find that I was barking completely up the wrong tree.

You're quite right about the problems of analysing bottlenecks from a static code reading, especially given a relatively large system. The key, as you say, is to quite often take a step back and look at the wider picture rather than keep focusing in on a subsection of code.

Alan - if you're *really* serious about performance optimizations then you *must* look into code profiling. This will quickly show you the spots in your code that would benefit from being "tweaked". Remember, optimization obeys the law of diminishing returns, i.e. the more you optimize, the less return you're going to get from further optimization.

I've rarely found that the computer isn't fast enough for the task...it's *far* more often the case that the code hasn't been written as well as it could have been (of which I'm more than guilty! We all have deadlines...we all cut corners...). Sometimes it's better to take a step back and investigate the algorithms you're using rather than trying to tweak individual instructions.

As Deuxway points out, the 80:20 rule is *really* key here. Your first step should be to profile and identify your "bottlenecks". Once you have identified those, you can make an informed decision as to whether you need to optimize them or consider rewriting them in another way. Only investigation of the underlying code will tell you which way is best for you.

Above all else...keep detailed records! Profile often! When I go through this process I find it helpful to keep records in a spreadsheet, so that I have a historical record of the performance gains/losses from each version of a routine.
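
If the measurements come out of a test program, writing them as comma-separated lines makes them easy to load into a spreadsheet. A minimal sketch in C follows; the function name and the three columns (version, routine, milliseconds) are assumptions chosen for the example, not a format anyone in the thread prescribes.

```c
#include <assert.h>
#include <stdio.h>

/* Format one measurement as a CSV line: version,routine,milliseconds.
   Returns the number of characters written (excluding the terminator),
   or -1 if the buffer is too small. */
int format_timing_record(char *buf, size_t buf_size, const char *version,
                         const char *routine, double milliseconds)
{
    int n;
    assert(buf != NULL && version != NULL && routine != NULL);
    n = snprintf(buf, buf_size, "%s,%s,%.3f\n",
                 version, routine, milliseconds);
    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return n;
}
```

Appending one such line per routine per build gives you the history John describes. The sketch assumes the version and routine names contain no commas, so it does no quoting.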

Let us know how you get on!

John.


