When I say performance, I usually mean one of several dimensions, such as speed, size or availability. I think your questions are mainly focused on speed.
I'm still a little unclear what you are trying to achieve.
If you don't have working code, and you don't know which parts of your system are likely to have significant speed requirements, then I would suggest you take a step back, and get critical requirements and a candidate architecture defined before worrying about compilers and CPU's. Combining those requirements with the experience you've already gained will let you identify how far away from the target you are.
I think we've covered some of the situation where you have correctly working code, and just need it to go faster on AMD64/Opteron.
I also think we have covered some of the case where you are about to build a system to solve a compression problem that you thoroughly understand.
What I don't understand is how fast you need the application to go. How far short of your goal was your initial experience with VC++ 6.0 on the Opteron 240?
I can give you general advice, but I can be much more concrete if I can understand approximately what your target is. For example, was it x2 too slow, x10, x100, or more?
I think Jes has covered the points in your e-mail. But, just to emphasise:
1. You can run standard 32bit Operating Systems like Windows and 32bit applications on AMD64/Opteron processors. The AMD64/Opteron runs 32bit code very quickly.
This is transparent to the OS and application. So the code that you compile with VC++ 6.0 will work.
If this is all you want, you can stop here.
If you want the application to go faster, but don't want to modify source code, or generate a unique binary for the AMD64/Opteron, get a copy of the Dr Dobb's article, and try to select the best compiler and libraries for your application. The Microsoft VC++ 6.0 compiler is probably not the correct choice.
If you want to take full advantage of the extra capabilities that AMD64/Opteron provide, such as extra, wider, registers, and extra instructions, you need two extra pieces to get AMD64/Opteron 64bit mode fully working:
1. An operating system that will run applications in AMD64/Opteron 64bit mode. Currently SuSE 9.0 Professional for AMD64 or a version of SuSE 8.x Enterprise Server does this. A version of Windows which supports 64bit AMD64 applications is available in Beta now, and will be released next year.
Be careful. There is a 64bit Windows released *NOW*, but this only supports Intel Itanium, and *not* AMD64/Opteron. They are different.
2. A compiler that will generate application executables (or libraries) that use the AMD64 instructions. SuSE provide the GNU C/C++ compiler (as Jes describes) with their Linux. Microsoft will supply a version of Visual Studio which supports 64bit AMD64/Opteron. I believe this is also available in Beta.
There is a similar warning to "64 bit Windows"; there is a 64bit Microsoft C/C++ Compiler released that generates code for Intel Itanium, and not AMD64/Opteron. Using that compiler will *NOT* give you the results you need. You will need the new MS compiler with AMD64/Opteron support. I looked at the Microsoft MSDN web site, but it isn't clear what the status of that compiler is, though it must be available for developers to port applications to the Beta Windows on AMD64.
As Jes explained, if you compile with an AMD64/Opteron compiler and run on an AMD64/Opteron version of an OS, your 64bit mode application is likely to perform even better on AMD64/Opteron than it did as a 32bit mode application.
*AND* as Jes points out, there is even more room to improve performance, through a range of optimisation techniques.
There is a big difference in speed between code that merely runs and optimal code, typically 10,000%-100,000% (a factor of 100 to 1,000), or 2-3 orders of magnitude.
I got the impression from your original post that you wanted your software to run very fast.
At this point, you should check - is the time and effort you spend on making your software fast going to generate a tangible benefit?
I'm a little unclear from your e-mail, but if you already have a working system, a good path to investigate is to avoid changing your source code. Then, your existing documentation, testing etc are still relevant too.
I can't tell how fast you need to go, but, for example it is relatively common to improve speed by factors of 2-10 by careful choice of compiler, libraries and application layout (in memory), with no change to the code whatever. As another reference point, it is *much* harder to reduce in-memory size by such a significant amount where the data size is the dominant factor. If you want improvements greater than x10, on a specific hardware and OS platform, you are into modifying source code.
As Jes explains, there are several hardware features you could exploit when running in 64bit mode that are not available in 32bit mode. The general purpose registers in an AMD64/Opteron are 64 bits long rather than 32 bits, and there are more registers in an AMD64/Opteron than in a Pentium 4. These register capabilities are restricted to 64bit mode.
While it very much depends on your problem, I would expect to be able to exploit those hardware resources to make significant improvements to computationally intensive application code.
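To make the register point concrete, here is a minimal sketch of the kind of source change I have in mind. It is not taken from your code; the function names are hypothetical and it assumes a C99 compiler (so not VC++ 6.0 as it stands). Both routines XOR every byte of a buffer together, but the second works through a 64 bit accumulator, eight bytes per step, which maps onto a single general purpose register in 64bit mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference: XOR all bytes of buf together. */
uint8_t xor_bytes(const unsigned char *buf, size_t n)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= buf[i];
    return acc;
}

/* Word-at-a-time variant: XOR eight bytes per step through a 64 bit
 * accumulator, then fold the accumulator down to a single byte. */
uint8_t xor_words(const unsigned char *buf, size_t n)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);   /* alignment-safe 8 byte load */
        acc ^= w;
    }

    /* Fold 64 -> 32 -> 16 -> 8 bits; the low byte now holds the XOR
     * of all eight byte lanes, whatever the byte order. */
    acc ^= acc >> 32;
    acc ^= acc >> 16;
    acc ^= acc >> 8;
    uint8_t out = (uint8_t)acc;

    for (; i < n; i++)                   /* leftover tail bytes */
        out ^= buf[i];
    return out;
}
```

Whether the second form is actually faster for you depends on the compiler and the data, so treat it as something to measure rather than a guaranteed win. On a 32bit build it still works, but each 64 bit operation is split across two registers.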
I would expect a very smart compiler which understood the AMD64/Opteron to take advantage of the extra registers and some AMD64 instructions without changing your source code, though I would not expect the Intel compiler to do that :-). The libraries which accompany those compilers may also have a significant effect on performance, especially if they have been tuned for the AMD64/Opteron.
Right now, I would only be able to choose between compilers (and libraries) on the basis of real applications and artificial benchmarks. 64bit compiler candidates are, at least, GNU's gcc, Microsoft VC++, and Portland Group (http://www.pgroup.com/).
As Jes illustrates, there are many more optimisations which need source code changes before the compiler would generate faster code.
It's unclear from your e-mail, Alan, whether you expect to have multiple versions of your source code (one for 32bit, one for 64bit, and perhaps more optimised for different flavors of processor), or just one version.
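If you do end up wanting one source tree that builds both ways, a common approach is to confine the difference to a few compile-time switches. The sketch below is only an illustration: `chunk_t`, `BUILD_FLAVOUR` and `whole_chunks` are names I made up, and I am assuming the target macros `__x86_64__` (GCC) and `_M_X64` (Microsoft), which you should check against whichever compiler you settle on.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: __x86_64__ is predefined by GCC and _M_X64 by Microsoft's
 * compiler when targeting AMD64; other toolchains may use other names. */
#if defined(__x86_64__) || defined(_M_X64)
typedef uint64_t chunk_t;            /* 64bit build: 8 bytes per step */
#define BUILD_FLAVOUR "amd64"
#else
typedef uint32_t chunk_t;            /* 32bit (or other) build: 4 bytes */
#define BUILD_FLAVOUR "generic"
#endif

/* Hypothetical helper: how many whole chunks fit in n bytes. */
size_t whole_chunks(size_t n)
{
    return n / sizeof(chunk_t);
}
```

The rest of the code is then written in terms of `chunk_t`, so the 32bit and 64bit builds share everything except these few lines.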
I disagree with Jes on the next point, though I'm sure he'll forgive me. IMHO, it is not easy to identify performance bottlenecks from a static analysis of code.
You may find relatively simple source code transformations like reordering the declaration of variables or functions has a measurable effect, which is rarely obvious to most developers (but here, Jes may be rather more savvy than most).
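One easily checked instance of declaration order mattering is structure layout. This is my own illustration, not something from the Dr Dobb's article, and the struct names are hypothetical. The two structures below hold the same members, but on most ABIs the first needs more padding than the second, so fewer of them fit in a cache line.

```c
#include <stddef.h>

/* Members in "as I thought of them" order: on most ABIs the compiler
 * pads after each char so that the double stays aligned. */
struct record_loose {
    char   tag;
    double value;
    char   flag;
};

/* Same members, largest first: the two chars share one padded slot. */
struct record_tight {
    double value;
    char   tag;
    char   flag;
};

/* Report the sizes so they can be printed or compared per platform. */
size_t loose_size(void) { return sizeof(struct record_loose); }
size_t tight_size(void) { return sizeof(struct record_tight); }
```

The exact sizes depend on the compiler and target, so print them on your own platform rather than trusting any figure I quote. The effect of reordering ordinary variables or functions is harder to see and really needs a profiler.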
If you already have your application built and working correctly, your path is straightforward.
If you have no code, but you are confident that you know the important functionality and the critical sections of your system, I would suggest you develop a clean, straightforward implementation of those core pieces with a simple supporting infrastructure first [in modern software engineering terminology, these are your architecturally significant Use Cases]. Avoid "premature optimisation" in this implementation, as it usually generates complexity, and code that has become too complicated to modify with confidence is hard to improve.
In both cases you need a test framework and good test data.
Then use profiling tools to automate performance evaluation of your components. Profiling tools will help you identify the (usually) small areas which would provide the greatest opportunities for improvement. Applications really do follow an 80:20 rule; more than 80% of an application's performance is determined by less than 20% of the code. For compression, I would expect it to be an even smaller amount of code.
I like Quantify (http://www-3.ibm.com/software/awdtools/purifyplus/), though there are good tools from other vendors, and Quantify may not tell you everything you want to know. If speed is your goal, good tools are well worth considering. If you are a corporate MSDN subscriber, you can hire a Microsoft lab, with these kinds of tools for a week at a time. Check that they have AMD64/Opteron of course ;-). I think AMD may offer a similar service, but I'm not sure.
By identifying the specific areas which determine speed, you will also be identifying areas where you may aim for both 32bit optimal code and 64bit optimal code, which may be useful to you.
You'll be able to use these critical components and test framework to establish boundaries around your problem and a baseline for comparison as you tune things. This is important as interactions with the caches could give you odd results.
You will be able to test compilers, compilation options, library alternatives and linkage options easily too. You will likely get a lot of useful information out of this.
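As a rough idea of what such a baseline could look like, here is a minimal timing sketch in standard C. It is an assumption-laden stand-in for a real profiler: `best_time` and `toy_kernel` are hypothetical names, `clock()` has coarse resolution on many systems, and you would substitute your own critical routine and realistic test data.

```c
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(unsigned char *buf, size_t n);

/* Run the kernel `reps` times and return the best CPU time in seconds.
 * Returns -1.0 if reps is zero or negative (nothing was measured). */
double best_time(kernel_fn fn, unsigned char *buf, size_t n, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t start = clock();
        fn(buf, n);
        clock_t stop = clock();
        double secs = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}

/* Stand-in kernel (hypothetical): replace with the real critical routine. */
void toy_kernel(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(buf[i] * 31u + 7u);
}
```

Taking the best of several runs is one simple way to damp the cache effects I mentioned; the first run often pays for loading code and data. Build the same file with each compiler and option set you want to compare (for example gcc's -O2 against -Os) and record the figures alongside the test data used.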
You will also have something concrete to review. I don't know if AMD offer a consultancy service, but getting someone who really knows optimisation on AMD64/Opteron is well worth considering once you have concrete information about a realistic problem. With a bit of luck, they will have some good performance tools and expertise in how to use them to maximum effect.
I would still recommend reading the Dr Dobb's article. Another point the author makes is that he usually sets compiler optimisations for smallest code space and not fastest code speed (apparently so do Microsoft)! This is because of subtle interactions of the CPU and cache; by going for the smallest code, he gets his code in the (fastest) cache more often, and that goes much faster than the 'fastest instruction sequence' which may be slightly larger and hence fit in the fastest cache less often.
All of this may sound like a lot of work, but look at it a different way.
You will be investigating the areas of highest project risk, which have the largest impact from failure, and greatest unknowns earliest in the project. This means you will have the most time and budget to fix things (or stop) if things don't work the way you want. You will also be developing skills and knowledge which you will need anyway, and which you never have time to acquire late in a project ;-).
I hope this helps.
Please keep asking if you have more questions, or just e-mail me.
(I apologise for my slow reply, but this isn't my day job :-)