AMD Developer Central
AMD Developer Blogs
June 15, 2009
  Just released: Advanced Synchronization Facility (ASF) specification

Recently AMD released an experimental specification for a proposed AMD64 architecture feature that may be of interest to all programmers of highly concurrent programs, libraries, runtimes, and operating systems: Advanced Synchronization Facility, or ASF for short. This is the first of three blog articles describing why AMD's Operating System Research Center (OSRC) became involved in the development of ASF, how we are evaluating ASF, and how this and other activities fit into the EU-funded VELOX project aiming at improving the state of the art for software-transactional-memory systems.

In this posting I will give you a quick overview of what ASF is and how it works, along with some example code. I'll also describe how I became involved in developing ASF and why we are releasing this spec proposal.

About ASF
In a nutshell, ASF is intended to make it easier to write efficient, highly concurrent programs.

When AMD introduced multicore CPUs to the x86 world, we acknowledged that individual CPU cores weren't getting much faster with each silicon-technology generation. Instead, we decided to provide multiple CPU cores in one processor. This put the burden of making programs run faster on newer processors on the software community: programs have to be changed to take advantage of the parallelism.

Writing efficient, concurrent programs or parallelizing an existing sequential program is a hard endeavor. The trickiest part is making sure that all program threads have a consistent view of all shared data. ASF is intended to address this very problem, known as synchronization.

How does ASF work?
ASF provides a mechanism to update multiple shared memory locations atomically without having to rely on locks for mutual exclusion. It's quite flexible as the semantics of the update are not fixed, but can be provided using standard x86 instructions.

Here's an example. This code snippet implements a two-word compare-and-swap (DCAS) primitive. The new instructions are SPECULATE, LOCK MOV, and COMMIT:

; DCAS Operation:
; IF ((mem1 == RAX) && (mem2 == RBX))
; {
;   mem1 = RDI
;   mem2 = RSI
;   RCX = 0
; }
; ELSE
; {
;   RAX = mem1
;   RBX = mem2
;   RCX = 1
; }
; (R8, R9 modified)
;
DCAS:
 MOV      R8, RAX
 MOV      R9, RBX
retry:
 SPECULATE                    ; Speculative region begins
 JNZ      retry               ; Page fault, interrupt, or contention
 MOV      RCX, 1              ; Default result, overwritten on success
 LOCK MOV RAX, [mem1]         ; Specify and read protected memory
 LOCK MOV RBX, [mem2]
 CMP      R8, RAX             ; DCAS semantics
 JNZ      out
 CMP      R9, RBX
 JNZ      out
 LOCK MOV [mem1], RDI         ; Update protected memory
 LOCK MOV [mem2], RSI
 XOR      RCX, RCX            ; Success indication
out:
 COMMIT                       ; End of speculative region

The SPECULATE-COMMIT pair wraps a speculative region, which speculatively reads from and writes to protected memory locations using the LOCK MOV instructions. The speculative memory updates will become visible to other CPUs only when the speculative region completes successfully.

Here's what the speculative region does in this example: The initial LOCK MOV instructions signify the memory locations that need to be monitored for external modifications and also read the memory operands into the RAX and RBX registers. The code then compares these operands with the original register operands (saved to R8 and R9 at the outset of the routine). The DCAS operation may fail because of a miscomparison at that point, bypassing the memory update. The RCX register returns a pass-fail indication.
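The register contract described above can also be written down as a small executable reference model. The following Python sketch is only an illustration of the DCAS semantics, not of ASF itself: it uses an ordinary lock where ASF would use a speculative region, and the names `Cell`, `dcas`, and `_region_lock` are hypothetical helpers introduced for this sketch.

```python
import threading


class Cell:
    """One shared memory word (hypothetical helper for this sketch)."""

    def __init__(self, value):
        self.value = value


# Assumption: a single global lock stands in for the atomicity that ASF
# would provide in hardware. Real ASF code would not take a lock here.
_region_lock = threading.Lock()


def dcas(mem1, mem2, expect1, expect2, new1, new2):
    """Two-word compare-and-swap.

    Returns (rcx, seen1, seen2), mirroring the assembly routine:
      rcx == 0: both words matched the expected values and were replaced.
      rcx == 1: at least one word differed; nothing was written.
    seen1 and seen2 are the values read from memory (RAX and RBX).
    """
    with _region_lock:
        seen1, seen2 = mem1.value, mem2.value      # the two LOCK MOV loads
        if seen1 == expect1 and seen2 == expect2:  # the two CMP/JNZ pairs
            mem1.value, mem2.value = new1, new2    # the two LOCK MOV stores
            return 0, seen1, seen2
        return 1, seen1, seen2
```

With `a = Cell(1)` and `b = Cell(2)`, the call `dcas(a, b, 1, 2, 10, 20)` succeeds and leaves the cells at 10 and 20. Repeating the call with the now stale expected values 1 and 2 fails, writes nothing, and reports the current contents instead.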

A speculative region may also be aborted, for example when a contending program thread accesses a protected memory location or when an interrupt occurs. In this case, all speculative memory updates are discarded, and the program flow (instruction and stack pointer) is rolled back to just after SPECULATE, where software can inspect the reason for the abort in the rAX and rFLAGS registers. The code in this example examines rFLAGS immediately after SPECULATE using a JNZ instruction that branches to the abort handler, which in this case just attempts a retry. A real implementation might have a more elaborate recovery strategy, for example, exponential backoff if the abort was due to contention.
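To make the recovery strategy mentioned above more concrete, here is a minimal Python sketch of a retry loop with randomized exponential backoff. It models only the control flow around an abort, not ASF hardware. The `Aborted` exception, the reason strings `"contention"` and `"interrupt"`, and all parameter names and default values are assumptions made for this illustration; they are not taken from the ASF specification.

```python
import random
import time


class Aborted(Exception):
    """Signals that one attempt at the speculative region was aborted."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason  # hypothetical reason code, e.g. "contention"


def run_with_backoff(attempt, max_retries=8, base_delay=1e-6, max_delay=1e-3,
                     sleep=time.sleep, rng=random.random):
    """Call attempt() until it returns a result instead of aborting.

    On a contention abort, wait for a random time below a ceiling that
    doubles with every failed try, capped at max_delay. Other aborts
    (for example an interrupt) are retried immediately. After max_retries
    failed retries the last abort is re-raised to the caller.
    """
    for tries in range(max_retries + 1):
        try:
            return attempt()
        except Aborted as abort:
            if tries == max_retries:
                raise
            if abort.reason == "contention":
                ceiling = min(max_delay, base_delay * (2 ** tries))
                # Randomizing the wait keeps contending threads from
                # retrying in lockstep and colliding again.
                sleep(rng() * ceiling)
```

The `sleep` and `rng` parameters exist so that the loop can be exercised deterministically. In a real abort handler the wait would more likely be a short spin than a call into the operating system.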

How we are developing ASF
ASF really is a team effort, with team members looking at various software applications, hardware implementation, and the specification itself.

When I joined AMD's OSRC at the end of 2006, I quickly discovered ASF as it existed at that time: a mechanism for improving the efficiency of highly parallel, lock-free synchronization code. In previous work I had used lock-free data structures for building a real-time microkernel operating system, and I had often craved a feature for multi-word atomic updates such as ASF. This might explain why I was so enthralled by ASF.

In the meantime, I have become the editor of the ASF specification proposal. I'm working with the ASF team to evaluate the feature in various application scenarios, and to further develop ASF based on our findings. We have expanded its focus to include software transactional memory (STM) as well; more on that in a later blog post.

We are also actively discussing ASF with both academic and industrial partners to learn about interesting application areas and to derive requirements for an eventual implementation in future products.

The ASF specification
ASF is an experimental architecture extension currently in the proposal stage. AMD has not yet committed to including this feature in any future CPU product. Instead, we are soliciting input from developers and researchers that would help us refine the ASF specification to better meet software development requirements.

ASF is not the first feature we have proposed in this way. A year and a half ago, AMD decided to be more open in developing extensions to the AMD64 architecture to help ensure we meet the needs of the software development community and to encourage cross-vendor compatibility. At that time, we proposed the Lightweight Profiling (LWP) and SSE5 features in a similar spirit, and we received extremely valuable input from the programming community that helped us improve our future products - to your benefit. SSE5 has just recently evolved into the AVX-compatible XOP, which we described in a previous blog entry.

Please download the ASF specification proposal and send your comments to ASF_Feedback@amd.com.

---
Michael Hohmuth, MTS
AMD Operating System Research Center, Dresden



-------------------------

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.



    Posted By: AMD DeveloperCentral @ 06/15/2009 01:57 PM     AMD Operating System Research Center (OSRC)  

June 16, 2009

Comments


 

Looks interesting.  Three questions:

1) Is this at the point where you can put it in a simulator and do some actual comparisons?

2) How easy/hard would this be to put into a compiler (for those of us wimps who don't always code in assembly)?

3) Will you be collaborating with gcc to get this implemented?  (I use gcc, although I just found open64 to play with).

I come from a solid state physics background, so I don't know how much of a benefit this would be for me, but I'm still a relative n00b.  How much of a benefit would you expect for, e.g. LAPACK, BLAS, or a Lanczos implementation?


 Posted By: Joseph Pingenot @ 06/16/2009 12:07 AM

June 18, 2009
 

Hi Joseph,

Thanks for your comments!

1) Is this at the point where you can put it in a simulator and do some actual comparisons?

Almost. We are going to release a simulator (based on PTLsim) in the near future.

We have previously released a simulator based on an older version of ASF, and have published some preliminary results generated with it. You can find it on the OSRC website.

2) How easy/hard would this be to put into a compiler (for those of us wimps who don't always code in assembly)?

Based on the feedback we received from compiler folks, we don't expect any major problems with compiler integration.

3) Will you be collaborating with gcc to get this implemented? (I use gcc, although I just found open64 to play with).

We are aware of the GCC STM work and are considering leveraging it to support ASF. Once we know better what ASF should look like, and if AMD decides to productize it, we expect a major effort towards enabling GCC for ASF. As you may know, AMD is a very active contributor to GCC.

Michael





 Posted By: Michael Hohmuth @ 06/18/2009 05:54 AM

July 25, 2009
 

Awesome, thanks!

(Actually, I didn't know that you were a very active contributor to GCC; that's great!)


 Posted By: Joseph Pingenot @ 07/25/2009 07:27 AM
