AMD Developer Central
AMD Developer Blogs
March 27, 2009
  Accurately profiling code with Instruction Based Sampling

Most modern processors are superscalar or implement some form of parallelism that lets them schedule and execute multiple instructions at one time.  The latest AMD Opteron™ processors are superscalar, which allows them to run code faster than would otherwise be possible at the same clock rate.  This can present a few problems when you are looking for performance bottlenecks.  Because the latest AMD Opteron™ processors can track up to 72 instructions in flight at one time, it becomes difficult to match source code to the in-flight instructions.
 
What does all of the above have to do with profiling your application?  If you are investigating performance issues with a timer-based profile, you are mostly interested in how many cycles a method took to execute, and a few instructions being mapped to the wrong method is of little consequence.  If you are profiling with performance counters, however, the lack of precision between when an instruction caused the event you are tracking and when that event shows up in the counter can throw off your analysis.  Traditional profiling tools cannot accurately match performance counter samples with the generated X86 assembly code.  Instructions in flight can retire at any time, depending on memory accesses, cache hits, pipeline stalls and many other factors, so performance counter events can be attributed to the wrong instruction.  This is even more difficult in a managed environment like Java, where the generated code is dynamically created and executed.

 

All of this adds up to inaccurate mappings between X86 assembly and performance counters, which can mislead performance engineers into fixing bottlenecks in the wrong sequence of code.

 

One way around this is to use better implementations of hardware performance counters.  The latest AMD Opteron™ processors have something that can help.  Instruction Based Sampling (IBS) provides precise information about the execution of instructions.  IBS provides four advantages over conventional performance counters:

 

1.     Hardware events are attributed precisely to the instruction that caused the event.

2.     A wide range of events is gathered, rather than only the four events (out of many) that must be chosen at the beginning of a conventional profile.

3.      Virtual and physical addresses of load/store operands are collected.  This allows managed environments to associate specific data structures with X86 instructions.

4.     Latency is measured for key performance parameters such as data cache miss.

 

For more information about IBS, see the AMD Software Optimization guide for AMD Family 10h Processors.



-------------------------
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.


    Posted By: Azeem Jiva @ 03/27/2009 09:36 AM     AMD Java Labs     Comments (0)  

March 23, 2009
  Java Labs interview by Java Posse soon
I'm happy to announce that the Java Labs will be interviewed by the Java Posse tomorrow.  Their weekly podcast keeps the development community up-to-speed on the world of Java.  While we certainly plan to discuss the involvement of the team and AMD in that world, the interview gives us the opportunity to discuss the points that are important to you.
 
Do you have a pressing topic that you want us to weave into the discussion?  Let us know by commenting here.
 
Thanks,
Ben
 


-------------------------




    Posted By: Ben Pollan @ 03/23/2009 11:20 AM     AMD Java Labs     Comments (0)  

March 20, 2009
  Sun Tech Days, Hyderabad - 18-20 February, 2009

It was an awesome event for us, at the AMD booth as well as at our technical session. It was a great experience for us to meet developers who are keen to know about AMD's efforts within the software community. We appreciate the support from those of you who attended AMD's technical session, particularly when you had multiple options.

We hope you now know:

  • Why AMD cares about Java
  • What contributions we've made to the Java community
  • Some useful tips for improving the performance of your Java application
  • How AMD works with many software partners to optimize their applications

As promised during my talk, here are the links to the Framewave and SSEPlus open source library projects.  Check them out, and contribute your own enhancements to the libraries or let us know what enhancements you'd like to see.

If you missed our session, here are some useful resources


We hope our recommendations for coding best practices were useful. We also hope that you have upgraded to JDK 1.6 to get the latest enhancements we've contributed.

We sincerely look forward to meeting you all again next year (and to the spicy Hyderabadi biryani!!).

Were you there?  Drop us a comment to let us know!

-JK



-------------------------

Velu, Jayaprakash





Edited: 03/20/2009 at 01:37 PM by AMD Developer Blogs Moderator


    Posted By: Velu Jayaprakash @ 03/20/2009 07:11 AM     Inside Dev Central     Comments (0)  

March 10, 2009
  Java Object Trimming

Things like code refactorings, insufficient code coverage testing, poor coding standards or mere oversight can often lead to redundant code. Generally, commercial JVMs apply some level of escape analysis and dead code elimination while executing Java code. However, there are cases where the JVM is unable to eliminate redundancy; for example, unused instance variables of a class.

 

IDEs such as Eclipse can warn developers about local variables and instance variables that are never read by the program. It might be easier for JVMs to identify and optimize away redundant local variables. However, a JVM cannot safely eliminate unused instance variables, because they can still be accessed through mechanisms such as Java Reflection. These unused instance variables increase object size and, in turn, the memory footprint of the Java application if the program allocates a large number of such objects.

 

You might think that such unused/unread instance variables might not have a big impact on the application performance. Contrary to this belief, such redundant variables can substantially affect application performance. In fact, this could lead to pathological cases where application performance can be hampered due to inefficient processor memory cache usage. For example, a redundant instance variable which gets mapped to the end of object layout in the memory can increase the object size such that the resulting object does not fit into a single cache line. On the other hand, when such a field gets mapped into middle parts of the object layout, it can lead to memory holes and result in redundant garbage collection cost.

 

The following benchmark demonstrates performance effects of unused instance variables:

 

import java.util.LinkedList;

public final class ObjectTrimming {
    // These seven fields are never read; they only inflate each object by 56 bytes.
    private long redundantField1 = 0L;
    private long redundantField2 = 0L;
    private long redundantField3 = 0L;
    private long redundantField4 = 0L;
    private long redundantField5 = 0L;
    private long redundantField6 = 0L;
    private long redundantField7 = 0L;
    private byte[] usedField = null;

    private ObjectTrimming() {
        usedField = new byte[968];
    }

    public static void main(String[] args) throws Exception {
        LinkedList<ObjectTrimming> listStore = new LinkedList<ObjectTrimming>();
        long totalTime = 0L;

        for (int j = 0; j < 10; j++) {
            long then = System.currentTimeMillis();
            for (int i = 0; i < 104857; i++) {
                listStore.addLast(new ObjectTrimming());
            }
            long now = System.currentTimeMillis();
            // Only the allocation loop is timed, not the println below.
            totalTime += now - then;
            System.out.println("After Iteration " + j + " total number of elements in the Linked List: " + listStore.size());
        }
        System.out.println("Execution Time: " + totalTime);
    }
}

 

As you can see, this application creates approximately 1024MB of data. The ObjectTrimming class has 7 fields of the long data type and a byte array. Since each long field occupies 64 bits, objects of the ObjectTrimming class have 56 bytes of space occupied by long fields. We use the byte array to allocate another 968 bytes of data, which brings each ObjectTrimming object to roughly 1KB (ignoring object and array headers). We create 104857 such objects in a loop that iterates 10 times. Thus, in the end, the listStore linked list holds approximately 1024MB of data.
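The arithmetic above can be checked with a short sketch. The class and method names here are illustrative and not part of the original benchmark. The figures count only the field payload described in the text; a real JVM adds object headers, array headers and alignment padding, so actual sizes will be somewhat larger.

```java
public final class FootprintEstimate {
    static final int LONG_FIELDS = 7;           // redundantField1..redundantField7
    static final int BYTES_PER_LONG = 8;        // a long is 64 bits
    static final int ARRAY_BYTES = 968;         // new byte[968]
    static final int OBJECTS_PER_PASS = 104857; // inner loop count
    static final int PASSES = 10;               // outer loop count

    // Payload of one ObjectTrimming instance, ignoring JVM headers and padding.
    public static long payloadBytesPerObject() {
        return (long) LONG_FIELDS * BYTES_PER_LONG + ARRAY_BYTES;
    }

    // Payload held by the list after all passes.
    public static long totalPayloadBytes() {
        return payloadBytesPerObject() * OBJECTS_PER_PASS * PASSES;
    }

    public static void main(String[] args) {
        System.out.println("Per object: " + payloadBytesPerObject() + " bytes");
        System.out.println("Total: " + totalPayloadBytes() + " bytes");
    }
}
```

The total comes out just under 1024MB because 104857 is 1048576 / 10 rounded down.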

 

We executed this benchmark three times on a system with 2 chips and 8 cores in total (4 cores per chip) of AMD Opteron 8384 processors running at 2.7GHz, with 8GB of RAM, and it took an average of 9894 milliseconds. None of the long variables is used by the ObjectTrimming class, nor can they be accessed by other classes, so we removed these fields and reran the modified benchmark. This time the execution time for the three runs averaged 6493 milliseconds. That is roughly a 52% improvement in throughput (9894 / 6493 ≈ 1.52), or about a 34% reduction in execution time. It is unlikely that a class will have as large a ratio of unused to used fields as in our example. However, the example shows that saving 56 bytes of data in a 1KB object can have a big impact on the memory footprint, and in turn the performance, of the application.
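For reference, the modified benchmark might look like the sketch below. This is a reconstruction under assumptions, not the code that was actually measured: the class name TrimmedObject and the run helper with its parameters are hypothetical, introduced so that the loop sizes can be varied.

```java
import java.util.LinkedList;
import java.util.List;

public final class TrimmedObject {
    // Only the field that is actually used remains; the seven longs are gone.
    private final byte[] usedField = new byte[968];

    // Hypothetical helper: appends objectsPerPass new objects to store in each
    // pass and returns the total allocation time in milliseconds.
    public static long run(int passes, int objectsPerPass, List<TrimmedObject> store) {
        long totalTime = 0L;
        for (int j = 0; j < passes; j++) {
            long then = System.currentTimeMillis();
            for (int i = 0; i < objectsPerPass; i++) {
                store.add(new TrimmedObject());
            }
            totalTime += System.currentTimeMillis() - then;
        }
        return totalTime;
    }

    public static void main(String[] args) {
        List<TrimmedObject> listStore = new LinkedList<TrimmedObject>();
        // Same loop sizes as the original benchmark: 10 passes of 104857 objects.
        long totalTime = run(10, 104857, listStore);
        System.out.println("Total number of elements in the Linked List: " + listStore.size());
        System.out.println("Execution Time: " + totalTime);
    }
}
```

With the full loop sizes, main needs a heap of about 1GB, so it would have to be started with a suitably large -Xmx setting.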

 

We used Sun Java SE Runtime Environment (build 1.6.0_06-p-b01) with Java HotSpot Server VM (build 14.0-b09, mixed mode) for the above experiments. The JVM command-line flags used to run the benchmark were: java -server -Xms1175M -Xmx1175M

 

As noted above, JVMs cannot necessarily eliminate such unused fields, so it is the Java developer's responsibility to define classes optimally.

 



-------------------------




    Posted By: Shrinivas Joshi @ 03/10/2009 11:22 AM     AMD Java Labs     Comments (5)  

March 3, 2009
  Is String Immutable?

In many texts String is cited as the 'gold standard' of Java's various immutable classes.  Any Google search for 'Immutable Java' will invariably turn up examples that use String to demonstrate the benefits and characteristics of a good immutable class.

 

From Wikipedia (http://en.wikipedia.org/wiki/Immutable_object) we get the following definition:

 'In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.'

I like this definition as it describes what I had assumed to be the main point: that an instance's state should not be allowed to be modified post-construction.

 

However, I recently found myself looking at the String class and came across its hashCode() method:

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Looking closely, we see that this is a classic implementation of the 'lazy evaluation' pattern. Rather than computing the hash value in the constructor, or computing it each time the hashCode() method is called, we compute it once (on the first call to hashCode()) and save the computed value in the hash field.
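The same pattern can be sketched in a small class of our own. The CachedPoint class below is hypothetical and is not taken from the JDK. It caches its hash code in a non-final field on first use, while every value visible through its public methods stays the same.

```java
public final class CachedPoint {
    private final int x;
    private final int y;
    private int hash; // 0 means "not computed yet", as in String

    public CachedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedPoint)) return false;
        CachedPoint p = (CachedPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        int h = hash;          // read the field once into a local
        if (h == 0) {
            h = 31 * x + y;    // computed on the first call only
            hash = h;          // cached; a racing thread would store the same value
        }
        return h;
    }
}
```

As with String, a value whose hash happens to be 0 (here, the point at the origin) is simply recomputed on every call, which costs time but does not affect correctness.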

 

We can see this if we read the private hash field reflectively.

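import java.lang.reflect.Field;

// Wrapper class (name chosen for illustration) so the snippet compiles on its own.
public class HashFieldDemo {
    public static void main(String[] args) throws Exception {
        String helloWorld = "helloWorld";

        // On recent JDKs this reflective access may require
        // --add-opens java.base/java.lang=ALL-UNNAMED on the command line.
        Field field = String.class.getDeclaredField("hash");
        field.setAccessible(true); // because the hash field is private

        System.out.println("Before first hashcode call " + field.getInt(helloWorld));
        helloWorld.hashCode();
        System.out.println("After first hashcode call " + field.getInt(helloWorld));
    }
}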

If one runs this snippet of code it will output something like


Before first hashcode call 0
After first hashcode call -1554135584

 

So for a String instance which we create, but for which we never call hashCode(), the private hash field will remain 0. It is only changed when we call hashCode().

 

By Wikipedia's definition I believe String fails the immutability test.

 

The counter-argument might be that we can't observe a String instance in a different state without resorting to a reflective read of its hash field, because the very call that retrieves the state also modifies it; if we can't observe it changing, then it didn't change.

 

This is reminiscent of

 

"If a tree falls in a forest and nobody hears it, did it make a noise?"

 

Or my personal version

 

"If I say something in a room and my wife doesn't hear me, am I still wrong?"

 

However it still seems like String is not immutable.

 

In Eric Lippert's blog (http://blogs.msdn.com/ericlippert/archive/2007/11/13/immutability-in-c-part-one-kinds-of-immutability.aspx) this kind of behavior falls under what he calls 'observational immutability', and although it seems safe (ignoring reflective access), it does seem incorrect.

 

So I would like to open up a discussion:  Is String immutable?



-------------------------




    Posted By: Gary Frost @ 03/03/2009 10:44 AM     AMD Java Labs     Comments (4)  
