In Part 1 on the AMD At Home Blog, Simon Solotko gave an overview of open, parallel computing with ATI Stream and OpenCL. Here, in Part 2, Simon Solotko and Ben Sander discuss the power of ATI Stream technology and the elegant, standards-based interface now available with OpenCL for GPU.
Ben, what have we created with OpenCL and what does it do?
Ben: Sure. With OpenCL we created a C-based interface for programming a range of parallel processors. Developers write OpenCL Kernels, the sub-routines they seek to accelerate or offload, and embed these in their applications. OpenCL includes a runtime component which allows these OpenCL Kernels to be compiled at runtime for either a CPU or GPU. AMD has contributed to the development of the OpenCL specification and written the implementation for x86 processors and GPUs: a runtime environment which compiles the code near runtime, then schedules and executes the code at runtime.
What are the benefits of being able to compile an application for a CPU or a GPU?
Ben: Developers can write one piece of code and easily support a variety of compute devices in the platform - CPUs and GPUs, from multiple vendors. Code can be load-balanced between CPU and GPU depending on the capabilities in the final platform. For example, we expect that some applications or parts of applications will run faster on the CPU than the GPU, while others will perform better on the GPU. Finally, the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speed.
When exactly during runtime is the Kernel compiled?
Ben: There are specific commands within the body of your application which you call to compile the Kernel and direct it to be compiled for the CPU or GPU. At that point, the Kernel code is translated into a binary. The binary later executes natively when the Kernel is called. The code is not interpreted in the hot spot of the loop; it's not like Java in that regard.
So the code within a Kernel looks like C but can be compiled to execute on the GPU?
Ben: Exactly. Because a GPU looks and functions differently than a CPU, however, you have to think differently when you write the Kernel for the GPU: at that point, you are executing your code directly on the GPU. There are constraints imposed on Kernel code to accommodate the specialized functionality of the GPU. Kernels are based on C99 with extensions provided by OpenCL C for vectors and address spaces.
Give me some examples of the special ways in which the C code within a Kernel is different from the standard code in the body of the application?
Ben: To understand writing a Kernel, it is important to understand that the code is actually executing on a GPU, despite the fact that the functions you are performing are syntactically the same as other C code. A GPU has a small, fast cache (local memory) and a larger main GPU memory (global memory). You move data in blocks, and complete as much of the task on that block as possible before moving the block out and moving the next block in. With a GPU we have a lot of compute bandwidth relative to memory bandwidth, making it advantageous to do as much as you can to data within the cache. With OpenCL the blocking process does not necessarily get easier, but you can control it from C code.
How do we move data from main memory to the GPU memory for use by a Kernel function?
Ben: A Kernel cannot move data from main memory; that is done in your application code. So there are standard functions to copy memory into GPU memory from the application, and pointers to this memory can then be passed to a Kernel function. The Kernel function can then copy memory into the fast cache or "local" memory.
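As a rough illustration of the blocking pattern Ben describes (stage a block in fast local memory, do the work there, then move on to the next block), here is a minimal sketch in plain C rather than OpenCL. It is an emulation, not a real Kernel: the `in` and `out` arrays stand in for global memory, the small `local` array stands in for local memory, and the names `TILE` and `stencil_blocked` are hypothetical ones chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical block size; real local memory holds far more */

/* Plain-C stand-in for a Kernel: out[i] = in[i-1] + in[i] + in[i+1],
   treating elements outside the array as 0. `in` and `out` play the role
   of global memory; `local` plays the role of the fast local memory. */
void stencil_blocked(const int *in, int *out, size_t n)
{
    int local[TILE + 2];  /* one tile plus a one-element halo on each side */

    assert(in != NULL && out != NULL);

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* Move the block in: one pass over "global" memory per tile,
           plus the two halo elements shared with neighbouring tiles. */
        local[0] = (base > 0) ? in[base - 1] : 0;
        for (size_t j = 0; j < len; j++)
            local[j + 1] = in[base + j];
        local[len + 1] = (base + len < n) ? in[base + len] : 0;

        /* Do all the work for this block out of local memory:
           each staged value is reused by up to three outputs. */
        for (size_t j = 0; j < len; j++)
            out[base + j] = local[j] + local[j + 1] + local[j + 2];
    }
}
```

In a real OpenCL Kernel the staging buffer would live in the local address space and the loop over tiles would be spread across work-groups, but the shape of the code (copy in, compute, copy out) is the same idea.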
This sounds a bit complicated, but I have to remind myself, this is all standard C code, and we are discussing the optimization that makes something run fast on the GPU, and the memory management tools that are available, now within standard C through the OpenCL library, to do that.
Ben: That's right. The magic is that a Kernel is C code which is compiled by the runtime component of OpenCL to run on a GPU or CPU, with some extra tools to ensure it can take full advantage of the extremely high compute-to-memory bandwidth capability of the fast, parallel math engine of the GPU.
So as time goes on, we anticipate that people will write and optimize many useful Kernels which will simplify the development of complex applications?
Ben: Yes. It is relatively straightforward to port applications written for other GPGPU languages like Brook+ and CUDA to OpenCL. This is a huge step forward from proprietary GPU code; you now have a standard way to get at GPU code and memory from C in a platform-independent way.
With ATI Stream technology and the standardization of the programming model with OpenCL for GPU, almost any aspiring GPGPU developer can download the tools necessary to get started and develop platform-independent software fueled by the power of the evolved GPU. I have collected resources below to get you started. Enjoy blazing the trail of a new frontier in computing!
Simon has regular posts on the AMD At Home blog and you can check out The Digital Nexus series here.
-------------------------
Simon Solotko is a Senior Advanced Marketing Manager at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
AMD has always been an advocate of open standards that build on and extend proven technologies (example: x86-64). As such, it is a natural fit for AMD to embrace OpenCL as part of its ATI Stream offering. But, just what is OpenCL?
In this month's episode of the AMD Developer Inside Track I interview Mike Houston, GPG System Architect. He talks about what OpenCL is, what the transition to this new language will be like and he gets into what applications could benefit from OpenCL, as well as what the future has in store for software applications that use it.
One of the advantages of OpenCL is its advanced queuing system which is great for game development. It is also designed to work very well with various graphics APIs such as OpenGL, DirectX 9 and DirectX 10.
Game developers aren't the only ones who can take advantage of OpenCL though. According to Michael, it is going to be very useful for applications such as media encoding, virus scanning, and physics, to name a few. It makes a lot of sense for AMD to move to a ubiquitous computing language that runs on platforms everywhere. The next few years will be an interesting time for GPGPU technology as several hardware and software vendors get on board.
ATI Stream technology is gaining significant momentum. Some cool and unexpected examples of ATI Stream technology in action are:
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
Edited: 09/16/2009 at 03:21 PM by AMD Developer Blogs Moderator
It's been a while since we've had an update on the ATI Stream Developer Blog... Over the past year since the last blog posting, a lot has happened. ATI Stream SDK v1.x saw two releases (v1.3-beta at the end of last year and v1.4-beta at the beginning of this year). With each of those releases of the SDK, and Brook+ in particular, we focused on stability and adding more exciting features.
We've even launched an ATI Stream Developer Showcase site where quite a few of your fellow developers have submitted their ATI Stream applications to show the developer community (you) the exciting things they have done with the ATI Stream SDK. ATI Stream Power Toys came into existence and we are planning to continue to grow it as we come up with fun and useful tools for you that just can't wait for the next ATI Stream SDK release. And, ACML-GPU finally made it out of alpha/beta testing and is now released on AMD Developer Central. All truly exciting stuff!
But, what has been even more anticipated since the middle of last year has been OpenCL(TM). If you don't know much about OpenCL and how it meshes with the rest of GPGPU history, take a look here. It was a tremendous amount of work that kept our engineering team up late for many nights... but, finally, we were able to release a beta version of our ATI Stream SDK v2.0 with OpenCL x86 CPU support today. It's part of our complete OpenCL development platform and is designed to help accelerate your applications with OpenCL today on multi-core CPUs, and to help you take advantage of the added speed of GPUs later on this year. If you are interested in giving it a try, visit our ATI Stream SDK v2.0 Beta Program page to download the beta release.
Benedict Gaster, our OpenCL compiler architect here at AMD, has written an introductory tutorial for OpenCL to help developers get started learning and get comfortable programming in OpenCL. You can find his OpenCL tutorial article here.
Also take a look at Patricia Harrell's blog, OpenCL Changes the Game. Patricia is the Director of Stream Computing here at AMD.
Stay tuned for even more information about ATI Stream SDK developments.
-------------------------
Edited: 08/06/2009 at 08:19 PM by michael.chu@amd.com
AMD is currently hosting the first-ever AMD China University contest in accelerated computing. Ten teams from distinguished universities will be competing to see who can code the fastest application using AMD Stream technology. The teams have already been selected and are in the midst of coding their applications.
Stay tuned to see what exciting applications they come up with and who ultimately wins!
-------------------------
Telanetix, Inc. revealed today that it is working with AMD to jointly develop telepresence-focused Stream Computing technology and announced the first results of this ongoing effort. By utilizing this new technology in its Digital Presence product line, Telanetix claims that it enables higher quality, lower cost High Definition (HD) telepresence, and that this technology is now available and shipping in every Digital Presence system, as part of the recent Telanetix 3.4.3 Digital Presence technology platform release.
During the 45th Design Automation Conference (DAC), which took place in Anaheim, CA, ACCIT-New Systems Research revealed that it was able to enlist several Silicon Valley beta sites for testing its new software-hardware parallelized, full-precision, true analog simulation engine. The company said it is achieving significant performance improvements using AMD FireStream™ stream computing technology. This technology is integrated with multi-core / multiprocessor explicit parallelism.
RapidMind Inc. announced in June that its flagship product, the RapidMind Multi-Core Development Platform, now supports the AMD FireStream™ 9170 and ATI Radeon™ HD 3870 graphics processing units (GPUs) from AMD, enabling RapidMind platform users to automatically gain the benefits of the latest ATI technology without having to update their software.
Rogue Wave Software, Inc. announced in June that it is collaborating with AMD to enhance the processing power of specialized applications in the financial services industry. As part of this relationship, AMD and Rogue Wave exhibited together at the Securities Industry and Financial Markets Association (SIFMA) Technology Management Conference and Exhibit in New York City, June 10-12.
According to its press release, linked below, Neurala claims to have successfully used AMD Stream processors and the AMD Stream SDK to run AI algorithms up to 230 times faster than a single-core AMD Opteron™ CPU. This was done using the Neurala Technology Platform and AMD FireStream™.
A common question the AMD Stream Team receives on the forums and at streamdeveloper@amd.com is: “How can I run my Brook+/CAL program under Linux® without having to sit at the X console or log in first?”
While there are many different ways to try to do this, here is one method that one of our AEs, Marc (you know him as marcr on the forums), has found to help with this problem:
You can edit /etc/gdm/custom.conf so that the last few lines look like this:
# Note that to disable servers defined in the defaults.conf file (such as
# 0=Standard, you must put a line in this file that says 0=inactive, as
# described in the Configuration section of the GDM documentation.
#
[servers]
0=Rendering

# Also note, that if you redefine a [server-foo] section, then GDM will
# use the definition in this file, not the defaults.conf file. It is
# currently not possible to disable a [server-foo] section defined
# in the defaults.conf file.
#
Then run gdm-restart, or reboot the system. This allows running Brook+/CAL programs remotely without manually logging into the system. Since this does disable X Windows security controls, you will want to make sure you are in a secure environment. There are various ways to tweak this to suit specific needs, but that is left as an exercise for the reader…
-------------------------
RapidMind will be showcasing live demonstrations of a 27x performance improvement of a binomial option pricing calculator at SIFMA’08. RapidMind will also be demonstrating the same accelerated option pricing tool running on the AMD FireStream 9170. The demonstration will take place in the AMD booth, #2000.
Edited: 06/04/2008 at 10:51 PM by michael.chu@amd.com
The AMD Stream Computing website will be updated in the next few days to reflect this new release.
With v1.1-beta comes:
- AMD FireStream 9170 support
- Linux support (RHEL 5.1 and SLES 10 SP1)
- Brook+ integer support
- Brook+ #line number support for easier .br file debugging
- Various bug fixes and runtime enhancements
- Preliminary Microsoft Visual Studio 2008 support
If you have any questions, please do not hesitate to post your question to the forum.
Sincerely, AMD Stream Team
-------------------------
Edited: 06/03/2008 at 04:35 AM by michael.chu@amd.com
Welcome to the AMD Stream Computing blog, where the AMD Stream Team will publish posts and mini-articles about all things Stream! My name is Michael Chu and I am the product manager for AMD Stream software. From time to time, other members of the team will post articles and will introduce themselves then. We are planning on bringing you interesting news as we find out about it, along with any relevant releases and products.
We invite you to help guide the direction of this site by leaving comments that let us know if these are the types of content and topics you would like to see published.
If you are interested in developing with AMD Stream, please visit us on the developer forums (go to AMD Stream). We have a growing community of developers who are constantly sharing what they have learned as they developed their applications on AMD Stream.
Stay tuned for more exciting news!
-------------------------