AMD Developer Central
AMD Developer Blogs
April 28, 2008
  Live Webcast of AMD's JavaOne Keynote

Well, it’s going to be a pretty busy week here, as AMD puts the finishing touches on our JavaOne activities.  It’s going to be a great conference.  Be sure to stop by our booth, and don’t forget that AMD’s keynote will be on Wednesday, May 7 at 5:30pm, presented by Leendert van Doorn, Senior Fellow in our Software Technology Office.

 

If you aren’t attending the conference, but still want to see the keynote, no worries!  There will be a live webcast of the session, accessible from the JavaOne home page <http://java.sun.com/javaone>.

 

Ben



-------------------------

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.



    Posted By: Ben Pollan @ 04/28/2008 12:17 PM     AMD Java Labs     Comments (0)  

April 14, 2008
  "Barcelona" Processor Feature: SSE Misaligned Access
We all crave high performing code and in the process we try hard to optimize the algorithms, reorder instructions, unroll loops, avoid branches, reduce pointer usage to allow compilers to optimize, replace dynamic allocation with static allocation where the size is known and so on. One such optimization is with respect to data loads and stores from memory which consume a majority of processing cycles in data-intensive applications. Here, I'll take you through one such optimization with respect to data alignment while using SSE (Streaming SIMD Extension) instructions.

Why use SSE instructions?

SSE instructions operate on 16 bytes of data in parallel. We can load 16 bytes of data at a time and process all 16 bytes of packed data with a single SSE instruction.
Example: ADDPS xmm1, xmm2 adds the 4 single-precision floating-point elements packed in the xmm1 register to the corresponding elements packed in xmm2 and stores the result back in xmm1.
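As a rough illustration of the ADDPS example, the same operation can be written in C with compiler intrinsics. This is only a sketch: the helper name add4 is made up for this example, and it assumes an x86 compiler that provides <xmmintrin.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Adds the four floats at a to the four floats at b and writes the sums to out.
   add4 is a hypothetical helper name, not part of the original post. */
void add4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m128 x = _mm_loadu_ps(a);     /* load 4 packed floats, any alignment */
    __m128 y = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(x, y);  /* one packed add, normally emitted as ADDPS */
    _mm_storeu_ps(out, sum);        /* store 4 packed floats, any alignment */
}
```

Calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} fills out with the four element-wise sums in one packed addition.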

SSE instructions are widely used in developing computation-intensive multimedia applications. Typically, these applications process large amounts of sequential data through the following steps:

1. Load data from memory
2. Perform computation on the data
3. Store data back to memory
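The three steps above might look like the following in C. This is a minimal sketch, assuming the work is to multiply an array of floats by a constant; the function name scale_floats and the choice of workload are illustrative and do not come from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Multiplies every element of data by factor, four floats per iteration.
   scale_floats is a hypothetical name used only for this illustration. */
void scale_floats(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    __m128 f = _mm_set1_ps(factor);         /* broadcast factor to all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);  /* 1. load data from memory */
        v = _mm_mul_ps(v, f);               /* 2. perform computation */
        _mm_storeu_ps(data + i, v);         /* 3. store data back to memory */
    }
    for (; i < n; ++i)                      /* scalar tail for leftover elements */
        data[i] *= factor;
}
```

The unaligned load and store forms are used here so the sketch works for any pointer; the choice between aligned and unaligned forms is the subject of the rest of this post.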

First we will discuss the intricacies involved in optimizing memory operations using SSE instructions on the AMD-K8™ family of processors (first and second generation AMD Opteron™ processors) and then we will discuss the architectural enhancements provided by the "Barcelona" or Family 10h processors (including Quad-Core AMD Opteron and AMD Phenom™ X4 Quad-Core and X3 Triple-Core processors).

SSE provides two types of load and store instructions. The first type is aligned loads and stores (e.g. MOVDQA, MOVAPD, MOVAPS), which require 16-byte-aligned memory addresses. The second type is unaligned loads and stores (e.g. MOVDQU, MOVUPD, MOVUPS), which work on both aligned and unaligned memory addresses. On the AMD-K8 family of processors, the aligned load and store operations are faster than the unaligned operations even if the memory is 16-byte aligned. For details on the latencies of the various types of load and store instructions, refer to the AMD Software Optimization Guide for AMD Family 10h Processors.

If we use the aligned memory operations without verifying the alignment of the memory addresses, there are two possible outcomes. If the memory is aligned, the memory operations are fast. If the memory is unaligned, the processor raises a general-protection exception and the application crashes (Bang!!!). One solution is to align the input data, which both gains performance and eliminates the exceptions and crashes. This does not always work, since users of the application may not align their data, and enforcing such a rule may be inappropriate at times. The easy solution is to use the safer unaligned loads and stores everywhere, sacrificing performance regardless of the data alignment.
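One way to align the input data, when the application controls the allocation, is to request 16-byte-aligned memory up front. The sketch below assumes GCC or Clang, where <xmmintrin.h> also declares _mm_malloc and _mm_free; the helper names alloc_aligned_floats and sum_aligned are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* on GCC and Clang this also declares _mm_malloc / _mm_free */

/* Returns a 16-byte-aligned buffer of count floats, or NULL on failure.
   Release it with _mm_free. Hypothetical helper for illustration. */
float *alloc_aligned_floats(size_t count)
{
    float *p = (float *)_mm_malloc(count * sizeof(float), 16);
    assert(p == NULL || ((uintptr_t)p % 16) == 0);
    return p;
}

/* Sums count floats using aligned loads only.
   Assumes data is 16-byte aligned and count is a multiple of 4. */
float sum_aligned(const float *data, size_t count)
{
    assert(((uintptr_t)data % 16) == 0 && count % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < count; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(data + i));  /* aligned load: faults if misaligned */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                         /* spill the 4 partial sums */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

This only helps for buffers the application allocates itself; data handed in by a caller still needs the runtime check described next.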

If you are a programmer looking for the best possible performance, saving every single processing cycle, then the solution is to handle both aligned and unaligned data: check the alignment of the data at runtime and call the function written for that case.

The code to handle aligned and unaligned data is as follows:



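// Dispatch to the routine that matches the alignment of the data.
if( isAligned(data) )
{
    process_aligned(data);
}
else
{
    process_unaligned(data);
}

// The 16 byte alignment check code is as follows (requires <stdint.h>).
bool isAligned(const void* data)
{
    // The % operator cannot be applied to a pointer, so convert it to an integer first.
    return (((uintptr_t)data % 16) == 0);
}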




Typically, the process_aligned and process_unaligned routines have identical code except for the type of load and store instructions.
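To make the "identical except for the loads and stores" point concrete, here is a hedged sketch of the two routines and the dispatcher. The names isAligned, process_aligned and process_unaligned follow the post, but the count parameter n, the wrapper name process, and the workload (adding 1.0f to every element) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Returns nonzero if data is 16-byte aligned. */
int isAligned(const void *data)
{
    return ((uintptr_t)data % 16) == 0;
}

/* Aligned path: uses the aligned load/store forms. */
void process_aligned(float *data, size_t n)
{
    const __m128 one = _mm_set1_ps(1.0f);
    for (size_t i = 0; i < n; i += 4)
        _mm_store_ps(data + i, _mm_add_ps(_mm_load_ps(data + i), one));
}

/* Unaligned path: same body, but with the unaligned load/store forms. */
void process_unaligned(float *data, size_t n)
{
    const __m128 one = _mm_set1_ps(1.0f);
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_ps(data + i, _mm_add_ps(_mm_loadu_ps(data + i), one));
}

/* Runtime dispatch. n must be a multiple of 4 (an assumption kept for brevity). */
void process(float *data, size_t n)
{
    assert(n % 4 == 0);
    if (isAligned(data))
        process_aligned(data, n);
    else
        process_unaligned(data, n);
}
```

The alignment check runs once per call rather than once per element, so its cost is small next to the loop.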

Architectural enhancements in AMD Family 10h processors ("Barcelona" processors)

"Barcelona" comes with load instructions that are twice as fast as the previous generation processors. For example, the aligned loads take 2 processor cycles in "Barcelona," compared to 4 processor cycles in the AMD-K8 architecture. This is only the latency of the instruction execution; there could be additional latency depending on the locality of the actual data being present in cache or main memory.

The unaligned loads in "Barcelona" run at the speed of aligned loads if the data is aligned. Thus, whenever the alignment of the data is not guaranteed, it is safer to use unaligned loads, which eliminates the check for 16-byte alignment at runtime. If the data is unaligned, the instruction is slightly slower than an aligned load but still faster than the unaligned loads on AMD-K8 processors. The FPU in "Barcelona" has been widened from 64 bits to 128 bits, and the load instructions are fast path instructions. (Note: In AMD-K8 processors, SSE loads are vector path instructions, which block the execution units from executing any other instruction in parallel.)

The above optimizations are not applicable for SSE stores. The unaligned stores are slower than aligned stores even when the data is aligned.
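Taken together, these two points suggest one possible structure for "Barcelona": use unaligned loads unconditionally, and check alignment only for the destination of the stores. The sketch below is one reading of that advice, not a measured recommendation; the function name add_arrays and the element-wise add workload are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* dst[i] = a[i] + b[i] for n floats; n must be a multiple of 4.
   Loads always use the unaligned form; only the store form depends on alignment.
   add_arrays is a hypothetical name used for illustration. */
void add_arrays(const float *a, const float *b, float *dst, size_t n)
{
    assert(n % 4 == 0);
    if (((uintptr_t)dst % 16) == 0) {
        /* Destination is 16-byte aligned: use aligned stores. */
        for (size_t i = 0; i < n; i += 4)
            _mm_store_ps(dst + i,
                         _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    } else {
        /* Destination is not aligned: fall back to unaligned stores. */
        for (size_t i = 0; i < n; i += 4)
            _mm_storeu_ps(dst + i,
                          _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
}
```

Whether the extra store path pays off depends on the workload, so it is worth measuring on the target processor before keeping both loops.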

Ravindra Babu
Software Engineer, AMD




Edited: 04/14/2008 at 12:55 PM by devcentral


    Posted By: AMD DeveloperCentral @ 04/14/2008 11:14 AM     AMD “Istanbul” (Family 10h) Processor Software Visible Features     Comments (0)  

April 11, 2008
  The Software Optimization Guide Comes to Life!
I'm pleased to announce that we have just published a series of six videos that brings to life some of the key concepts outlined in the Software Optimization Guide for Family 10h Processors. This video series is a companion to the optimization guide, and provides a quick look at some highly useful tips in addition to some examples to illustrate coding best practices.

We hope you find this series valuable, and welcome your feedback. Let us know what you think by commenting on this post. If you have questions about the information contained in the videos, feel free to post a question in our forums.

Happy viewing!
Software Optimization Video Series




    Posted By: AMD DeveloperCentral @ 04/11/2008 02:57 PM     Inside Dev Central     Comments (2)  

April 7, 2008
  Come join us at JavaOne 2008!

It's that time of year again.  Here in Austin, the bluebonnets are in full bloom, and that can only mean one thing.  It's time for the Java Labs to pack our bags and head over to the Moscone Convention Center in San Francisco, so that we can rub elbows with 15,000 of our closest friends at 2008 JavaOne!  As a platinum sponsor, AMD is playing a much bigger role at the conference this year.  That's because as a platform company, we have a lot to say about Java.

 

There are a few things you won't want to miss.  On Wednesday, May 7 at 5:30pm, Leendert van Doorn, AMD Senior Fellow, will present a keynote on processor companies' roles in the Java world.  On Tuesday, May 6 at 6:00pm, the Java Labs' own Azeem Jiva and Shrinivas Joshi will present a technical session titled "Virtualizing a Virtual Machine," where the duo will discuss best practices when deploying Java applications in virtualized environments.  And of course in the Pavilion there's the booth, the big beautiful booth (so beautiful that I'm thinking of having those designers give my house a makeover), where we will highlight AMD's role in the Java community today and share our vision of the future.

 

Details can be found on our AMD at 2008 JavaOne page:  http://developer.amd.com/EVENTS/JAVAONE/Pages/default.asp. 

 

See you there!

Ben






Edited: 04/07/2008 at 02:44 PM by bpollan


    Posted By: Ben Pollan @ 04/07/2008 02:14 PM     AMD Java Labs     Comments (0)  
