AMD Developer Central
AMD Developer Blogs
May 15, 2009
  New Virtualization Article

Check out this new article in the Java Zone: Optimizing Java Performance in a Virtualized Environment.  It's based on a JavaOne 2008 Tech Session of the same name by Shrinivas and Azeem, which provided a good overview of how to navigate the intersecting worlds of Java and Virtualization.

Let us know what you think.



-------------------------

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.



    Posted By: Ben Pollan @ 05/15/2009 11:28 AM     AMD Java Labs     Comments (1)  

May 6, 2009
  Striking a Balance

This week, AMD is making a couple of very important announcements for developers: support of Intel's Advanced Vector Extensions (AVX) instruction set in future AMD processors, and the adaptation to the AVX framework of AMD's previous SSE5 instruction set proposal.  The latter step has resulted in three new extensions: XOP (for eXtended Operations), CVT16 (half-precision floating point converts), and FMA4 (four-operand Fused Multiply/Add). In this posting I'll give an overview of the capabilities that these extensions provide, and also some insight into why we're taking this step.

First, the why. The SSE5 extensions we proposed back in mid-2007 brought some important innovations to the SIMD side of the x86 architecture:

  • a non-destructive three-operand capability, and a four-operand capability to support some very powerful new operations;
  • a set of powerful permute and conditional move instructions for data movement, plus Fused Multiply/Add (FMA) instructions for high-performance floating point;
  • a variety of other new operations to address various holes in the SSE instruction set: shift/rotate, integer compares, integer multiply/accumulate, and half-precision floating point support.

In April of 2008, Intel published its AVX/FMA proposal, which incorporated several of SSE5's innovations - in particular the three- and four-operand capabilities, the Fused Multiply/Add instructions, and some of the permute instructions - albeit in a somewhat different form. This proposal also added some new capabilities with a new instruction format: doubling the width of SIMD FP operations, applying the non-destructive three-operand capability to most legacy SSE instructions, and greatly expanding the potential opcode space for future extensions.

With this duplication of functionality between SSE5 and AVX/FMA, and AVX's additional features, we felt the right thing to do was to support AVX. In our minds, a more unified instruction set is clearly what's best for developers and the x86 software industry. With our acceptance of AVX, a key aspect of this instruction set unification is the stability of the specification. Since we don't control the definition of AVX, all we can say for sure is that we expect our initial products to be compatible with version 5 of the specification (the most recent one, as of this writing, published in January of 2009), except for the FMA instructions, which we expect will be compatible with version 3 (published in August of 2008).

Why the FMA difference?  This was not something we did lightly.  In December of 2008, Intel made significant changes to the FMA definition, which we found we could not accommodate without unacceptable risk to our product schedules.  Yet we did not want to deprive customers of the significant performance benefits of FMA. So we decided to stick with the earlier definition, renaming it FMA4 (for four-operand FMA - Intel's newer definition uses what we believe to be a less capable three-operand, destructive-destination format).  It will have a different CPUID feature flag from Intel's FMA extension.  At some future point, we will likely adopt Intel's newer FMA definition as well, coexisting with FMA4.  But as you might imagine, we may wait until we're sure the specification is stable.
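To make the "fused" part concrete: a fused multiply/add computes a*b + c with a single rounding at the end, whereas a separate multiply followed by an add rounds twice. The sketch below is only a scalar illustration in Java, using the standard Math.fma method (Java 9 and later). It says nothing about how FMA4 or Intel's FMA encode their operands, and the class name FusedDemo and the chosen inputs are made up for the example.

```java
// Scalar illustration of fused vs. unfused multiply/add (not SIMD, not FMA4 itself).
final class FusedDemo {
    // Two roundings: the product is rounded to double, then the sum is rounded again.
    static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // One rounding: Math.fma evaluates a*b + c exactly, then rounds once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + Math.scalb(1.0, -28);     // 1 + 2^-28
        double c = -(1.0 + Math.scalb(1.0, -27));  // -(1 + 2^-27)
        // The exact square of a is 1 + 2^-27 + 2^-56. The 2^-56 term is too small
        // to survive when the product is rounded to double on its own.
        System.out.println(unfused(a, a, c));  // the low-order term is lost: 0.0
        System.out.println(fused(a, a, c));    // the low-order term survives: 2^-56
    }
}
```

The inputs are chosen so that the difference is visible; for most values the two results agree or differ only in the last bit.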

The fact remains that AVX does not incorporate all of SSE5's features.  Since SSE5 was based on months of discussions with ISVs on what sort of capabilities they felt were needed, and had been positively reviewed by the industry when we first put out the specification, we decided to follow through with development of these additional features.  To do so most effectively, we redefined them in the AVX framework, resulting in the XOP extension.

So, what's in XOP?

Well, quite a lot, really.  First of all, the instruction formatting was changed to leverage the capabilities that the AVX VEX prefix brings, using a new VEX-like three-byte prefix sequence called (interestingly enough) the XOP prefix.  This provides three- and four-operand non-destructive destination encoding, an expansive new opcode space, and extension of SIMD floating point operations to 256 bits.  The SSE5 operations that are retained by the XOP extension are:

  • Horizontal integer add/subtract: Signed or unsigned add, or signed subtract, of adjacent byte, word, or dword elements in the source vector to word, dword or qword elements of the destination vector. 128-bit.
  • Integer multiply/accumulate: Multiplies elements of two input vectors, adding the results to a third input vector. 128-bit.
  • Shift/rotate with per-element counts: These use a vector of shift counts, allowing each element of the source vector to be shifted or rotated by a different amount. There is also a rotate instruction with an immediate-byte single count applied to all elements. 128-bit.
  • Integer compare: Signed and unsigned comparison of byte, word, dword and qword elements, with predicate (mask) generation as in the various SSE compare instructions. The particular comparison to perform is specified in an immediate byte. 128-bit.
  • Byte permute: A powerful operation which copies bytes from two 16-byte input vectors to a 16-byte destination vector, optionally performing a selected transformation on each, under the control of a third input vector. 128-bit.
  • Bit-wise conditional move: Selects each bit of the destination vector from either of two input vectors, per a third input vector. 128- and 256-bit.
  • Fraction extract: Extracts the fractional part of floating point operands. Scalar and 128- or 256-bit vector, single and double precision.
  • Half-precision convert: These convert between half-precision and single-precision formats while loading or storing a four- or eight-element vector. They provide dynamic control of rounding and denormalized operand handling.  These particular instructions form a separate extension called CVT16, with a distinct CPUID feature flag.
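
A scalar model can make a few of these descriptions more concrete. The Java sketch below mimics, one element at a time, what the bit-wise conditional move, the rotate with per-element counts, and the integer multiply/accumulate are described as doing above. It illustrates the prose only: the class and method names are invented, the selector polarity (a 1 bit picks the first source) and the 32-bit element width are assumptions, and details of the real instructions such as saturation or negative counts are not modeled. The specification remains the reference for actual behavior.

```java
import java.util.Arrays;

// Scalar models of three XOP-style operations, written from the descriptions above.
// All names here are hypothetical; this is not an intrinsic API.
final class XopModel {
    // Bit-wise conditional move: each result bit comes from a where the selector
    // bit is 1, otherwise from b (the polarity is an assumption of this sketch).
    static int[] bitSelect(int[] a, int[] b, int[] selector) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = (a[i] & selector[i]) | (b[i] & ~selector[i]);
        }
        return r;
    }

    // Rotate with per-element counts: element i is rotated left by counts[i] bits.
    static int[] rotateEach(int[] src, int[] counts) {
        int[] r = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            r[i] = Integer.rotateLeft(src[i], counts[i]);
        }
        return r;
    }

    // Integer multiply/accumulate: r[i] = a[i] * b[i] + acc[i], wrapping on overflow.
    static int[] multiplyAccumulate(int[] a, int[] b, int[] acc) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] * b[i] + acc[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] counts = {0, 1, 2, 3};
        System.out.println(Arrays.toString(rotateEach(a, counts)));
        System.out.println(Arrays.toString(multiplyAccumulate(a, b, a)));
        System.out.println(Arrays.toString(bitSelect(a, b, new int[] {-1, 0, -1, 0})));
    }
}
```

The point of the per-element count is visible in rotateEach: a single call applies a different rotation to every element, which with plain SSE shifts would take one operation per distinct count.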

Along with the FMA4 instructions, these support a wide variety of numeric-intensive, multimedia, and cryptographic applications, and allow some new cases of automatic vectorization by compilers.  Speaking of compilers, plans are afoot to support these in intrinsic form in various compilers, and they may be used automatically in code generation in some cases.

A version of the AMD64 SimNow! simulator with support for these extensions is planned for availability in very short order.

I hope I've given you a good taste of these new features. For all the details on the XOP and FMA4 extensions, you can find the specification here. And, I encourage you to read the blog of our CMO, Nigel Dessau, for an executive perspective on driving innovation into the x86 instruction set. We believe we've struck the right balance between innovation and standardization. Feel free to comment or ask questions - we're always happy to hear from you. As you can see below, we've already heard from ten of our technology partners on the subject.

Dave Christie is a Fellow and senior architect at AMD. His postings are his own opinions and may not represent AMD's positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Partner Support Quotes

Absoft

"The addition of AVX support by AMD is a great move as it enables superior performance potential across AMD's x86 family of processors," said Wood Lotz, Absoft CEO. "AMD's use of AVX can also simplify development of high performance compilers and tools for companies like Absoft, and enable customers across a wide variety of industries to build faster applications."

Acumem

"Acumem fully supports AMD's adoption and enhancement of the AVX instructions and will follow this standard as it becomes available in the market. As an ISV for performance tools we clearly see potential for performance improvements with these new additions" said Mats Nilsson, VP Software Engineering at Acumem.

Axceleon

"Axceleon applauds AMDs efforts to support both specifications, AVX and SSE5,  in their XOP specification proposal. The further enhancements in FMA4 which accelerate floating point algorithms are very important to Axceleon's HPC customers and will be welcomed across the board" said Mike Duffy, CEO of Axceleon.

Bibble Labs

"We at Bibble Labs are constantly looking for performance improvements, and as such we are investigating AVX because of the possible performance advantage it might bring. We also appreciate that AMD is taking an active role to ensure the instruction sets converge and not create separate, conflicting instruction sets," said Jeff Stephens, Vice President of Product Development, Bibble Labs.

Cakewalk

"We commend AMD for taking  an active role in open standards, by unifying the x86 instruction set and merging SSE5 into the AVX specification. This can help improve compatibility and simplify the work for developers implementing this. We look forward to investigating AVX for potential advantages it may bring  to our real-time applications and plug-ins," said Noel Borthwick, Chief Technology Officer, Cakewalk.

Nero

"We are pleased that AMD has decided to adopt the AVX instruction set extension instead of offering a variant," said Simone Hoefer, General Manager, Technology at Nero AG. "This will help reduce implementation complexity and multiple code-paths. We are confident that the SIMD (SSE/SSE2) optimizations already implemented will scale nicely to 256-bit/AVX, allowing us to truly embrace this new development."

Smith Micro Software

"Having to choose acceleration solutions that work well on both AMD and Intel CPU platforms, Smith Micro welcomes convergence of the x86 instruction set. AMD supporting AVX is desirable from Smith Micro's point of view," said Uli Klumpp, director of engineering, Smith Micro Software, Inc. "The AVX instruction set extensions are looking promising for further optimizing our computationally most demanding software, DCC and data compression products such as Poser and StuffIt."

Sonic Solutions

"AMD's adoption of AVX will help Sonic unify some of its engineering efforts and reduce development costs," said Jim Roth, Chief Technical Officer, Sonic Solutions. "We welcome this initiative and the proposed enhancements to the x86 processor architecture, which we will leverage to increase the responsiveness and performance of our digital media applications."

Sony Creative Software

"We are pleased that AMD has decided to adopt the AVX instruction set extension instead of offering a variant," said John Freeborg, Vice President of Engineering for Sony Creative Software. "We also appreciate that AMD is taking an active role to ensure these converge and do not create separate, conflicting instruction sets."



-------------------------

Edited: 05/06/2009 at 10:38 AM by rex8664


    Posted By: Dave Christie @ 05/06/2009 09:07 AM     Inside Dev Central     Comments (4)  

May 4, 2009
  Beta CodeAnalyst v2.9 Released for Windows

Hello all, again --

The next version of the AMD CodeAnalyst Performance Analyzer is available for you now.  I encourage you to download it in another window and read the rest of the blog while it downloads.

We've added some widely requested enhancements, deprecated one feature, and fixed *cough* a few bugs.  While this release is in the Beta period, please send us feedback about anything you would like to suggest for the actual release or any issues you encounter.  You're welcome to send that to us any time, but during the Beta period, we're devoted to working on issues based on your feedback.    I invite you all to visit our forums for feedback, questions, and answers.

Some of the enhancements added are:

  • Multiple simultaneous symbol servers.
  • Process filters: You can limit the reported data to certain processes.
  • An API: You are no longer limited to interacting with AMD CodeAnalyst through our command line applications or our GUI; you can now programmatically control profiling, and you can fold, spindle, and mutilate the data before displaying it.
  • Notes: You can add a customized note to each profile session. This feature should help you remember essential details about a session and reduce the length of session names.
  • Call stack data for a running process: You can now capture call stack information about a process using the command line tool without launching the process from CodeAnalyst.

I am sorry to report that our simulation feature is now deprecated.  It was useful for many reasons, but it was still a simulation of pipeline behavior.  Now we have instruction-based sampling (IBS) information available.  IBS can measure actual instruction execution, so I recommend that to you instead!

If you really must know about bug fixes and open issues, you can check out the release notes shipped with each version of the AMD CodeAnalyst tool. 

Most of the time since the last release has been spent writing and testing the API.  I've been working hard to make the API convenient and well documented (in doxygen format) for y'all.  We added the API so that you can build your own custom tools.  We are including some new sample code showing how to use the API, and I would love to hear (or read) what you end up doing with it.  Please post your projects and requests for further enhancements or clarifications on the forums.

Thanks!

-=Frank



-------------------------




    Posted By: AMD DeveloperCentral @ 05/04/2009 05:28 PM     Hard-Core Software Optimization     Comments (0)  

May 1, 2009
  Putting Enums to work

Most Java developers are probably aware that enums were added in Java 1.5, and we are becoming more familiar with seeing them used like this:

enum LIGHT { RED, AMBER, GREEN }
Here we are defining an enum that we can use to hold the state of a traffic light. The above code allows LIGHT to be used as a new type.
LIGHT light = LIGHT.RED;
and via some magic we can use LIGHT values in switch constructs.
switch(light){
   case RED:
       System.out.println("Stop");
       break;
   case AMBER:
       System.out.println("Get ready");
       break;
   case GREEN:
       System.out.println("Go");
       break;
}

We can iterate over the values of LIGHT using the array returned from the values() accessor..
 
for (LIGHT light:LIGHT.values()){
    System.out.println(light);
}

and can also perform ordinal comparisons..
if (light.ordinal() < LIGHT.GREEN.ordinal()){
   System.out.println("Not yet!");
}

 

Because enums are indeed Classes we can customize them by adding fields, constructors and methods. 

So if we wanted to be able to query each 'value' for the next in the sequence (including wrapping from GREEN to RED) we can use:

LIGHT.values()[(current.ordinal()+1)%LIGHT.values().length]

 

However, rather than having this logic spill out into the code using the enum, we can provide a method in the enum itself 

enum LIGHT {
   RED, AMBER, GREEN;

   LIGHT next(){
       return(values()[(this.ordinal()+1)%values().length]);
   }   
}
 

So now we can query the next value using

 

light = light.next();

 

We can also override methods for each value. So an alternative to the above implementation might be 

enum LIGHT {
   RED,
   AMBER{
      LIGHT next(){
         return(GREEN);
      }
   },
   GREEN{
      LIGHT next(){
         return(RED);
      }
   };
   // We need a method to override, so let's assume RED is the default
   LIGHT next(){
       return(AMBER);
   }   
}

 

Which is a little more verbose, but in some ways more explicit.  Note that we must provide an implementation in the enum itself, and then each 'value' can override it if it chooses.
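
If there is no sensible default, one variation (a sketch, not taken from the code above) is to declare next() abstract in the enum body. The compiler then insists that every value supplies its own implementation, so no value quietly falls back to a shared default. The enum is called SIGNAL here, a made-up name chosen only so that it does not clash with LIGHT.

```java
// Variation on the per-value approach: an abstract method forces every value
// to define next() for itself.
enum SIGNAL {
    RED   { @Override SIGNAL next() { return AMBER; } },
    AMBER { @Override SIGNAL next() { return GREEN; } },
    GREEN { @Override SIGNAL next() { return RED; } };

    abstract SIGNAL next();
}

final class SignalDemo {
    public static void main(String[] args) {
        SIGNAL s = SIGNAL.RED;
        // Walk one full cycle plus one step to show the wrap from GREEN back to RED.
        for (int i = 0; i < 4; i++) {
            System.out.println(s);
            s = s.next();
        }
    }
}
```

Leaving out the body for any one value turns into a compile-time error, which is the main reason to prefer this form when each value really does behave differently.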

 

Although we can't extend enums (and probably for good reason) we can implement interfaces.

 

Let’s say we had an application which deals with a fixed set of file types (XML, TXT and ANY). We could make a FILE_TYPE enum which implements the FileFilter interface.

 

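// assumes: import java.io.File; import java.io.FileFilter; import javax.swing.JFileChooser;
enum FILE_TYPE implements FileFilter  {
   ANY,
   TXT{
      public boolean accept(File _file){
         return(_file.getName().endsWith(".txt") || _file.getName().endsWith(".text"));
      }
      String getDescription(){
         return("TXT files");
      }
   },
   XML{
      public boolean accept(File _file){
         return(_file.getName().endsWith(".xml"));
      }
      String getDescription(){
         return("XML files");
      }
   };
   // Defaults used by ANY; TXT and XML override them
   public boolean accept(File _file){
      return(true);
   }
   String getDescription(){
      return("Any file");
   }
}

JFileChooser expects a javax.swing.filechooser.FileFilter, which is an abstract class rather than an interface, so the enum cannot be passed to it directly. A small adapter gets around that, and this then allows us to write a method for getting a file from a JFileChooser dialog... 

File getFile(final FILE_TYPE _fileType){
 JFileChooser chooser = new JFileChooser();
 // Adapter: delegate to the enum, and always show directories so the user can navigate
 chooser.setFileFilter(new javax.swing.filechooser.FileFilter(){
    public boolean accept(File _file){
       return(_file.isDirectory() || _fileType.accept(_file));
    }
    public String getDescription(){
       return(_fileType.getDescription());
    }
 });
 // parent is assumed to be a Component available in the enclosing class
 int returnVal = chooser.showOpenDialog(parent);
 if(returnVal == JFileChooser.APPROVE_OPTION) {
    return(chooser.getSelectedFile());
 }
 return(null);
}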

 

So we can ask for an XML file using..

File file = getFile(FILE_TYPE.XML);
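
Because java.io.FileFilter is a genuine interface, an enum that implements it can also be handed straight to File.listFiles(FileFilter), with no Swing involved. The following is a small self-contained sketch along those lines. The enum name Kind, the class name ListByKind and the use of the current directory are illustrative choices, not part of the application described above.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical enum: each value is itself a java.io.FileFilter.
enum Kind implements FileFilter {
    ANY,
    TXT {
        @Override public boolean accept(File file) {
            String name = file.getName();
            return name.endsWith(".txt") || name.endsWith(".text");
        }
    },
    XML {
        @Override public boolean accept(File file) {
            return file.getName().endsWith(".xml");
        }
    };

    // Default used by ANY; TXT and XML override it.
    @Override public boolean accept(File file) {
        return true;
    }
}

final class ListByKind {
    public static void main(String[] args) {
        File dir = new File(".");
        // listFiles returns null if dir is not a directory or cannot be read.
        File[] matches = dir.listFiles(Kind.XML);
        if (matches != null) {
            for (File f : matches) {
                System.out.println(f.getName());
            }
        }
    }
}
```

The filter only looks at the name, so it never touches the file system itself; whether a matching entry is a regular file or a directory is left to the caller.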

Obviously we need to be careful and not 'misuse' them, but I believe that enums can offer options beyond the traditional static list of values.



-------------------------




    Posted By: Gary Frost @ 05/01/2009 12:42 PM     AMD Java Labs     Comments (1)  
