<?xml version="1.0" ?> 
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
  <title>AMD Developer Forums - OpenCL</title> 
  <description></description> 
  <link>http://forums.amd.com/forum/index.cfm?forumid=9</link> 
  <generator>FuseTalk Hosting Executive Plan</generator> 

	<item>
		<title>logical operators for built-in vector types</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122475</link> 
		<pubDate>2009-11-21T14:03:57 -05.00</pubDate> 
		<dc:creator>screw</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I would like to use operator && on two int4 values and expect an int4 as the result. I get an error; please see the attached message.</p>
<p>Have you any hints?</p>
<p>&nbsp;</p>
<p>Thanks</p>
<p>screw</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Fluid simulation on OpenCL</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122445</link> 
		<pubDate>2009-11-21T00:08:28 -05.00</pubDate> 
		<dc:creator>memecs</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>Hello everyone,</p>
<p>I am a student working on a research project, and right now I have to write a model-reduced fluid simulation on the GPU. Exactly what must be done on the GPU follows:</p>
<p>(1) look up the corresponding tet via some kind of spatial data structure<br /> (2) do a matrix multiply with the reduced state to figure out the fluxes<br /> (3) convert the fluxes to a velocity via a 3x3 matrix inversion<br /> (4) optionally: cache the velocity for this timestep<br /> (5) advect the particle<br /> (6) resolve any collisions with the fixed geometry (car)<br /> (7) render all the particles and a car</p>
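<p>A minimal sketch of step (3) in plain C, assuming the 3x3 matrix is stored row-major as nine doubles. The name invert3x3 is hypothetical and not taken from any SDK; a production version would likely compare the determinant against a tolerance instead of exact zero.</p>

```c
/* Hypothetical helper: inverts a row-major 3x3 matrix via the adjugate.
   Returns 1 on success, 0 if the determinant is exactly zero. */
int invert3x3(const double m[9], double inv[9])
{
    double a = m[0], b = m[1], c = m[2];
    double d = m[3], e = m[4], f = m[5];
    double g = m[6], h = m[7], i = m[8];
    double det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
    double r;

    if (det == 0.0)
        return 0;              /* singular: no inverse exists */
    r = 1.0 / det;

    /* Each entry is a transposed cofactor scaled by 1/det. */
    inv[0] = (e * i - f * h) * r;
    inv[1] = (c * h - b * i) * r;
    inv[2] = (b * f - c * e) * r;
    inv[3] = (f * g - d * i) * r;
    inv[4] = (a * i - c * g) * r;
    inv[5] = (c * d - a * f) * r;
    inv[6] = (d * h - e * g) * r;
    inv[7] = (b * g - a * h) * r;
    inv[8] = (a * e - b * d) * r;
    return 1;
}
```

<p>Multiplying the resulting inverse by the flux vector would then give the velocity for one tet.</p>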
<p>Do you think I would get big advantages from using OpenCL rather than DirectX 11?</p>
<p>What about the rendering?</p>
<p>thanks</p>]]></description>
	</item>

	<item>
		<title>Ubuntu 9.10 64-bit CL_DEVICE_TYPE_CPU not found for Intel Core 2 Quad</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122412</link> 
		<pubDate>2009-11-20T13:55:45 -05.00</pubDate> 
		<dc:creator>david_aiken</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Is it possible to use CL_DEVICE_TYPE_CPU on Ubuntu 9.10 64-bit with an Intel Core 2 Quad CPU? clCreateContextFromType() is returning CL_DEVICE_NOT_FOUND on this platform. It works fine on Windows Vista.</p>]]></description>
	</item>

	<item>
		<title>Incorrect Memory Size</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122369</link> 
		<pubDate>2009-11-19T21:25:00 -05.00</pubDate> 
		<dc:creator>whiteshadow</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Sorry if this question has already been asked and answered somewhere; if so, please point me to the correct place.</p>
<p>&nbsp;</p>
<p>I have a machine with 4GB of RAM and an HD5870 with 1GB of memory. When I use clGetDeviceInfo to get CL_DEVICE_GLOBAL_MEM_SIZE, it only returns 3GB of RAM for my machine and only 256MB of video memory for the HD5870, which is much smaller than the actual value.</p>
<p>At first I thought the memory used by the OS was being deducted, but that does not explain the values I am getting.</p>
<p>When I try to allocate 140MB of memory on the HD5870, it fails and says there is not enough memory. In the end, I was able to use less than 110MB of the 1GB of memory provided by the graphics card.</p>
<p>I would be grateful if someone could tell me what is actually happening; it would help me understand more about OpenCL.</p>
<p>&nbsp;</p>
<p>Thank you.</p>]]></description>
	</item>

	<item>
		<title>Changes on 12-11-2009 and Catalyst 9.11</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122354</link> 
		<pubDate>2009-11-19T18:21:40 -05.00</pubDate> 
		<dc:creator>mat69</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Looking at the ATI Stream website I see that there have been changes to some of the files on 12-11-2009.</p>
<p>Should we redownload them or has nothing inside these files been changed?</p>
<p>&nbsp;</p>
<p>Also does the Catalyst 9.11 driver include OpenCL support?</p>]]></description>
	</item>

	<item>
		<title>CPU / GPU in SDK v2.0-beta4 - Seems Backwards to me.</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122334</link> 
		<pubDate>2009-11-19T12:20:55 -05.00</pubDate> 
		<dc:creator>leonbass</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>The documentation for the newest beta seems to indicate that the SDK will run on CPUs but "doesn't use GPU acceleration" yet.</p>
<p>However, I am testing on 2 machines that have NVIDIA cards and Intel CPUs, and the samples run fine with DEVICE_TYPE_GPU, which impresses me, but with DEVICE_TYPE_CPU the call to <span style="font-size: x-small;"><strong>clCreateContextFromType()&nbsp; </strong>fails with <span style="font-size: x-small;">CL_DEVICE_NOT_FOUND.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">On one machine I went ahead and installed the driver package (Catalyst), but that made no difference.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">The machines are Vista64 and the builds are 64 bit.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">What is it that I don't get?&nbsp; <br />Thanks in advance!</span></span><span style="font-size: x-small;"></span></p>]]></description>
	</item>

	<item>
		<title>New to OpenCL</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122296</link> 
		<pubDate>2009-11-18T23:42:35 -05.00</pubDate> 
		<dc:creator>ankurdh</dc:creator>
   	    <slash:comments>7</slash:comments> 
		<description><![CDATA[ <p>Hello everyone. I'm totally new to OpenCL and would like to learn to integrate CPU and GPU, try running complex programs on them concurrently, and analyze the performance improvement. I've downloaded the Stream SDK and some documentation about OpenCL and its programming constructs.</p>
<p>What I would like to know from the community experts is how I can port part of a complex computation to the GPU. For example, if I'm doing a triple integration with all limits from -infinity to +infinity, will I be able to port the two inner integrals to the GPU and leave the last one on the CPU?</p>
<p>I also wanted to learn how to use the Stream SDK. Any help or links would be most welcome. I'll be really thankful.</p>
<p>Regards,</p>
<p>Ankur</p>]]></description>
	</item>

	<item>
		<title>cpu vs. gpu opencl performance</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122264</link> 
		<pubDate>2009-11-18T14:55:51 -05.00</pubDate> 
		<dc:creator>dstokac</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>To check the performance of both cpu & gpu, I wrote a small program with three versions of a more elaborate copy kernel. The first kernel is just straightforward copying from one global buffer to another. In the second kernel we use a vectorized (float-&gt;float4) copying procedure, whereas in the third version we try to make use of the local memory. Each version of the kernel is executed on both cpu & gpu, within one thread. The source code of the program is attached.</p>
<p>The results I get for my system (cpu=Dual Core Pentium E5200, gpu=HD4770):</p>
<pre>
Local memory:
cpu: 32768
gpu: 16384
Memory type:
cpu: 2
gpu: 2
     cpu native - exec.time:   0.24 t2:   0   1   2   3
       cpu copy - exec.time:   0.07 t2:   0   1   2   3
      cpu copy4 - exec.time:   0.03 t2:   0   1   2   3
cpu copy4_local - exec.time:   0.03 t2:   0   1   2   3
       gpu copy - exec.time:   49.3 t2:   0   1   2   3
      gpu copy4 - exec.time:   15.6 t2:   0   1   2   3
gpu copy4_local - exec.time:   15.8 t2:   0   1   2   3
</pre>
<p>Conclusions:<br />1) The OpenCL implementation substantially outperformed the native implementation; it is about 3x faster. Since the cpu has 2 cores, I'm not sure where this speedup comes from.<br />2) The vectorized version performs better, as expected: about 2.5-3x better than the scalar version.<br />3) gpu performance is not comparable to cpu. The cpu is a few hundred times faster.</p>
<p>I'm posting these results to hear the experiences of other users. Furthermore, it would be nice to see results of the same program on other systems, particularly those which have dedicated (fast) local memory, which is not the case with my system.<br />Any comments on how to increase gpu performance with respect to cpu performance are welcome. I would also be grateful to anyone who could give me a good explanation of the posted results. It would be nice to know why cpu __local is associated with CL_GLOBAL (an explanation for the GPU can be found in other threads).</p>
<p>P.S. The structure of the kernel was chosen deliberately. It reflects the more complex structure of the kernels I use. Of course, this simplified structure can be simplified further, but then it wouldn't reflect the demands imposed on the hardware by more complex kernels. Performance gains through async copying don't count either.</p>]]></description>
	</item>

	<item>
		<title>Partial support for OpenCL on RV6xx</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122251</link> 
		<pubDate>2009-11-18T10:29:44 -05.00</pubDate> 
		<dc:creator>hazeman</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>For some time now we have been told that RV6xx is missing the hardware capabilities to support OpenCL. But is it really true?</p>
<p>If you want full OpenCL, unfortunately it's true. But not every program needs full OpenCL. Usually only a small subset of the language is required (e.g. for matrix multiplication, vector addition, etc.).</p>
<p>So let's consider whether RV6xx can support some part of OpenCL.</p>
<p>RV6xx doesn't have compute shaders. This means that we can't specify an arbitrary NDRange (work-item index space). Our NDRange is defined by the output buffer size.</p>
<p>1. So we have the first restriction here: NDRange == dimension & size of the output buffer. This is usually the case when we do classical stream computing (converting one stream into another, as in Brook).</p>
<p>2. The second limitation is that we can't request any local work-group size; the driver makes the decision here.</p>
<p>3. Reads from memory must use the texture unit. We can't write to read buffers (this implies that the kernel must have const pointers to read buffers).</p>
<p>4. Writes to the output buffer must follow the pattern</p>
<p>out[global_id(0)] = value; (for 1D)</p>
<p>out[global_id(1)*out_width + global_id(0)] = value; (for 2D)</p>
<p>5. We can't use local memory (missing on RV6xx).</p>
<p>So on the device side we have kernels which are restricted to the form:</p>
<p>void kernel( const global floatx* input1 [, const global floatx* inputx], global floatx* output1 [, global floatx* outputx], floatx,intx variables ( not pointers ) )</p>
<p>{</p>
<p>&nbsp;&nbsp;&nbsp; // can't use local_id, local_size</p>
<p>&nbsp;&nbsp;&nbsp; // only global_id, global size available</p>
<p>&nbsp;&nbsp; ... computations and memory read buffer access here ...</p>
<p>&nbsp;&nbsp; output1[ global_id(1)*output_width + global_id(0)] = value; // 2D case</p>
<p>&nbsp;&nbsp; // up to 8 outputs</p>
<p>}</p>
<p>Such a kernel can be compiled into a pixel shader. Kernels not matching this pattern should give a compilation error on RV6xx.</p>
<p>On the host side we have the limitation that the NDRange given to the kernel invocation must be the same as the output buffer size. (There is a technical problem in that an OpenCL buffer has no 2D size, but it can easily be solved by a small extension to CAL.)</p>
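<p>To make the restricted model concrete, here is a minimal sketch in plain C (not OpenCL) that emulates it on the host for the 2D case: the index space is exactly the output size, and each work item writes only its own element. The names scale_kernel and run_restricted_2d are hypothetical, and the doubling is just a placeholder computation.</p>

```c
/* Hypothetical "kernel body": may read the input at any index, but only
   returns the single value that this work item is allowed to write. */
static float scale_kernel(const float *input, int gx, int gy, int width)
{
    return 2.0f * input[gy * width + gx];
}

/* Emulates the restricted launch: the index space equals the output
   buffer dimensions, and every write goes to output[gy * width + gx]. */
void run_restricted_2d(const float *input, float *output, int width, int height)
{
    int gx, gy;
    for (gy = 0; gy != height; ++gy)
        for (gx = 0; gx != width; ++gx)
            output[gy * width + gx] = scale_kernel(input, gx, gy, width);
}
```

<p>There is no local id, no local memory and no scattered write in this sketch, which mirrors restrictions 1 to 5.</p>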
<p>&nbsp;</p>
<p>So as we see, some part of OpenCL can be supported on RV6xx. Now you can decide for yourself whether this model is sufficient for your work or not.</p>
<p>Personally I think that AMD/ATI should implement this. It is a logical extension of the Brook framework (connecting old with new). It would also be a nice gesture from AMD/ATI towards people with older cards.</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Performance profiling tools for OpenCL GPU kernels</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122237</link> 
		<pubDate>2009-11-18T07:27:07 -05.00</pubDate> 
		<dc:creator>mjharvey</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>&nbsp;</p>
<p>Can anyone give me some pointers on which, if any, performance profiling tools work with GPU OpenCL kernels? A measure of SP occupancy and access to any profiling counters, like CUDA permits, would do the job nicely.</p>
<p>Ta,</p>
<p>MJH</p>]]></description>
	</item>

	<item>
		<title>Bug in Kernel.GetWorkGroupInfo function (cl.hpp)</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122235</link> 
		<pubDate>2009-11-18T06:28:22 -05.00</pubDate> 
		<dc:creator>dstokac</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>In line 3125</p>
<p>&::clGetKernelWorkGroupInfo, device(), object_, name, param),</p>
<p>should be replaced by</p>
<p>&::clGetKernelWorkGroupInfo, object_, device(), name, param).</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Possible to run OpenCL code on GPU and CPU concurrently?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122214</link> 
		<pubDate>2009-11-17T20:31:57 -05.00</pubDate> 
		<dc:creator>chrisgregg</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>I've written a kernel that I would like to run on both the GPU and the CPU concurrently. &nbsp;First, I launch the GPU kernel, and then immediately launch the CPU kernel. &nbsp;The timers I have indicate that the GPU kernel doesn't block and the CPU kernel is indeed queued up immediately.</p>
<p>Unfortunately, it seems that the kernels don't run concurrently, or else I'm timing something incorrectly. &nbsp;What I'd like to see is the following (for instance):</p>
<p>Time for a single kernel to run on the GPU = 5 sec</p>
<p>Time for a single kernel to run on the CPU = 8 sec</p>
<p>Time for both, running concurrently = around 8 sec (but I'm seeing 13 sec)</p>
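<p>The arithmetic behind these expectations, as a minimal sketch in plain C with hypothetical function names: if the two kernels overlap fully, the total is the longer of the two times; if they are serialized, it is the sum.</p>

```c
/* Wall time if both kernels run fully overlapped: the longer one dominates. */
double concurrent_time(double gpu_sec, double cpu_sec)
{
    return gpu_sec > cpu_sec ? gpu_sec : cpu_sec;
}

/* Wall time if the second kernel only starts after the first has finished. */
double serialized_time(double gpu_sec, double cpu_sec)
{
    return gpu_sec + cpu_sec;
}
```

<p>With 5 sec and 8 sec as inputs, the serialized case gives exactly the 13 sec observed, which suggests the two kernels run one after the other.</p>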
<p>I guess my question is about whether or not&nbsp;clEnqueueNDRangeKernel() will forward a kernel on to a second processor in a two-processor system if the first processor is already running a kernel. &nbsp;Thanks!</p>
<p>Edit: A related question would be: is it possible to have two queues, or is there only one queue because there is only one global&nbsp;<strong>clEnqueueNDRangeKernel() </strong>?</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>#include in kernel and structure alignment</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122209</link> 
		<pubDate>2009-11-17T18:30:41 -05.00</pubDate> 
		<dc:creator>nou</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>I found a great feature which isn't made clear in the specification. Section 5.4.3.1 lists the option -I. I call clBuildProgram(.., "-I.", ); and have #include "header.cl" in the kernel, and it works.</p>
<p>Another thing: the release notes state that every element of a structure must be aligned to float4. But I wrote a struct, and sizeof() of that structure is 324 bytes, which is exactly the sum of the sizes of the data types in the struct. I tested it only on the CPU. Is there some alignment restriction on the GPU?</p>
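<p>A minimal sketch in host-side C11 (not OpenCL C) of what float4 alignment of every member would do to sizeof. Both structs are hypothetical and much smaller than the 324-byte struct above; the exact sizes assume a 4-byte float.</p>

```c
/* Three floats with natural alignment: no padding on common ABIs. */
struct natural3 {
    float a;
    float b;
    float c;
};

/* The same members, each forced to a 16-byte (float4-sized) boundary. */
struct aligned3 {
    _Alignas(16) float a;
    _Alignas(16) float b;
    _Alignas(16) float c;
};

/* Sizes returned as plain values so they are easy to print or compare. */
unsigned long natural3_size(void) { return (unsigned long)sizeof(struct natural3); }
unsigned long aligned3_size(void) { return (unsigned long)sizeof(struct aligned3); }
```

<p>On a typical x86 compiler these report 12 and 48 bytes, so a sizeof that equals the plain sum of the member sizes suggests that no such padding was applied.</p>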
<p>The second thing in the release notes is that arrays in structs are not supported. But for me it worked.</p>]]></description>
	</item>

	<item>
		<title>Let&apos;s hunt a memory leak!</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122161</link> 
		<pubDate>2009-11-17T08:07:17 -05.00</pubDate> 
		<dc:creator>AndreasStahl</dc:creator>
   	    <slash:comments>8</slash:comments> 
		<description><![CDATA[ <p>Hello,</p>
<p>after searching my code for several months for a rather serious memory leak, I tried to reduce the problem to its core. It seems to happen either when a CommandQueue or a Kernel is being created, its arguments set, or during/after execution. I have attached a minimalistic C++ program, demonstrating this behaviour, to this message.</p>
<p>What it does is the following: during setup it creates a context, gets device handles, compiles a very simple increment kernel, creates a buffer of size 8 MByte and fills it with 0.</p>
<p>Then it does the following 100 times: get the command queue, create kernel from program, set buffer as kernel arg, enqueue kernel, wait for queue to finish. Allocation and deallocation is handled by the stack.</p>
<p>Afterwards the buffer, program, devices and the context are manually deallocated.</p>
<p>I made it so it halts</p>
<ol>
<li>before setup,&nbsp;</li>
<li>before allocating, executing, deallocating the queue and kernel 100 times,&nbsp;</li>
<li>after that, and&nbsp;</li>
<li>after I manually deallocate buffer, program, context etc.</li>
</ol> 
<p>If you look at the Task Manager memory usage for the process, it should be roughly equal at the second and third halting points, and at the last halting point it should equal the first. But it's not. Not at all, indeed! Here are my read-outs from Windows Task Manager when run on DEVICE_CPU:</p>
<ol>
<li>2,344 K</li>
<li>22,708 K</li>
<li>39,596 K</li>
<li>31,380 K</li>
</ol>
<p>So for 100 iterations, (3) - (2) = 16,888 K was leaked. When I increase the iteration count to 200, memory usage after kernel execution is 56,596 K, indicating a leak of 33,888 K!</p>
<p>300 iterations: 50,948 K leaked</p>
<p>400 iterations: 67,644 K leaked</p>
<p>This indicates a leak of ~169 K per iteration.</p>
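<p>The per-iteration figure can be reproduced with a minimal sketch in plain C, using the leaked amounts listed above. The function name is hypothetical, and n must be positive.</p>

```c
/* Averages leaked_k[j] / iterations[j] over n measurements (values in K). */
double mean_leak_per_iteration(const double *leaked_k, const int *iterations, int n)
{
    double sum = 0.0;
    int j;
    for (j = 0; j != n; ++j)
        sum += leaked_k[j] / iterations[j];
    return sum / n;
}
```

<p>Called with the leaks 16888, 33888, 50948 and 67644 K for 100, 200, 300 and 400 iterations, it returns roughly 169 K.</p>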
<p>For iteration counts over ~500, it fails during CommandQueue(), citing error code -6 -- Out of host memory.</p>
<p>When I halve the buffer size, the numbers don't change.</p>
<p>On DEVICE_GPU it leaks ~50 K per iteration.</p>
<p>But maybe the problem is BKAC*, so please help me identify whether there is something totally wrong with my memory allocation / deallocation pattern. Should I allocate the queue and kernels only once during setup? I tried this in my production code once, but as soon as I had created the CommandQueue handle the program refused to respond to input via the GUI.</p>
<p>OS: Win7 x64</p>
<p>RAM: 4 GByte</p>
<p>Compiler: VC++ 2008</p>
<p>Devices: Athlon x64 CPU (1 GB reported), Juniper GPU (5770, 128 MB reported)</p>
<p>*) between keyboard and chair, i.e. me</p>]]></description>
	</item>

	<item>
		<title>Installing ATIStream SDK kit</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122152</link> 
		<pubDate>2009-11-17T02:57:54 -05.00</pubDate> 
		<dc:creator>mmathur</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I am new to using ATIStream SDK kit for OpenCL.</p>
<p>I am facing problems in installing the ATIStream SDK kit.</p>
<p>I have downloaded the 'atistream-1.4.0_beta-lnx32.tar.gzip'.</p>
<p>When I run the script 'atistream-brook-1.4.0_beta.i386.run'</p>
<p>I get the following error message:</p>
<p>-------------------------------------------------------------------</p>
<p>Installing package via RPM...</p>
<p>error: Failed dependencies:</p>
<p>libstdc++.so.6(GLIBCXX_3.4.5) is needed by atistream-brook-1.4.0_beta-1.i386</p>
<p>----------------------------------------------------------------------</p>
<p>On checking the system I find that libstdc++.so.6 exists in the system.</p>
<p>The system that I am using is a 32-bit machine running RedHat Linux&nbsp;2.6.9-34.EL.</p>
<p>Could anyone provide any help on this?</p>
<p>Thanks,</p>
<p>Mona</p>]]></description>
	</item>

	<item>
		<title>Conflicting error messages</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122150</link> 
		<pubDate>2009-11-17T00:32:38 -05.00</pubDate> 
		<dc:creator>stelleg</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>I'm getting conflicting messages from getBuildInfo() and build().&nbsp; Build is returning error message CL_BUILD_PROGRAM_ERROR, but getBuildInfo&lt;CL_PROGRAM_BUILD_LOG&gt; only shows a minor warning:</p>
<p>/tmp/OCL5IbHhj.cl(10): warning: null (zero) character in input line ignored</p>
<p>Any help would be much appreciated.</p>]]></description>
	</item>

	<item>
		<title>Why can&apos;t I find a GPU device supporting OpenCL on Vista?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122106</link> 
		<pubDate>2009-11-16T08:59:32 -05.00</pubDate> 
		<dc:creator>codeboycjy</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi,<br />I've changed my operating system recently.<br />I've got an OpenCL demo which works well on Windows XP. When I run the demo on Vista, no GPU device supporting OpenCL is found.</p>
<p>I've got a Radeon HD 5870 graphics card and the latest Catalyst 9.10. Is there something I missed?</p>]]></description>
	</item>

	<item>
		<title>clEnqueueRead/WriteBuffer Performance Issue on HD5750</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122092</link> 
		<pubDate>2009-11-15T21:22:46 -05.00</pubDate> 
		<dc:creator>kazuki</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>I use clEnqueueRead/WriteBuffer in blocking mode on a Radeon HD 5750.<br />But the write throughput is lower than the result of PCIeSpeedTest (ATI Stream Power Toys),<br />and the read throughput is much lower than the write throughput. Why?<br /><br />Test pseudocode:<br />&nbsp; size = 1024*1024*64;<br />&nbsp; NUM_TIMING_LOOPS = 100;<br />&nbsp; buf = clCreateBuffer(context,CL_MEM_READ_WRITE,size,NULL,&errcode);<br />&nbsp; stopwatch.start (); // use PerformanceCounter<br />&nbsp; for (int i = 0; i &lt; NUM_TIMING_LOOPS; i ++)<br />&nbsp;&nbsp;&nbsp; clEnqueueWriteBuffer(queue,buf,CL_TRUE,0,size,ptr,0,NULL,NULL);<br />&nbsp; stopwatch.stop ();<br />&nbsp; printf (...);<br /><br />Result:<br />&nbsp; write: 2.575GB/s<br />&nbsp; read: 1.197GB/s<br /><br />PCIeSpeedTestResult (v0.2):<br />[&nbsp; 67108864 bytes] CPU-&gt;GPU=&nbsp;&nbsp; 4.851 GB/sec, GPU-&gt;CPU= 861.791 MB/sec</p>]]></description>
	</item>

	<item>
		<title>printf within kernel</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122085</link> 
		<pubDate>2009-11-15T17:31:54 -05.00</pubDate> 
		<dc:creator>david_aiken</dc:creator>
   	    <slash:comments>19</slash:comments> 
		<description><![CDATA[ <p>Probably a FAQ, but is it possible to use printf or some other mechanism (source level debugging within Visual Studio!) within a kernel to output data structures when using the CPU?&nbsp;</p>]]></description>
	</item>

	<item>
		<title>cl_khr_fp64 on OpenCL</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122042</link> 
		<pubDate>2009-11-14T12:00:12 -05.00</pubDate> 
		<dc:creator>ibird</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>&nbsp;</p>
<p>I have an ATI HD4870 (RV770) and an Intel Q8200 processor.</p>
<p>I am on Ubuntu Linux 9.04 with the ATI beta driver installed.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Querying CL_DEVICE_EXTENSIONS, I get for the Q8200:</p>
<p>"cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store"</p>
<p>&nbsp;</p>
<p>and the RV770 returns an empty string.</p>
<p>&nbsp;</p>
<p>The Q8200 and the RV770 both have double support, but the extension "cl_khr_fp64" is not defined.</p>
<p>Am I doing something wrong?</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Does AMD OpenCL support remote desktop in Windows for GPU code?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=122001</link> 
		<pubDate>2009-11-13T16:51:27 -05.00</pubDate> 
		<dc:creator>joephis</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I'm starting up a Windows-based OpenCL application, but the machine will need remote access to dev/test/integrate the code.</p>
<p>Can I execute OpenCL applications against an AMD GPU when I am accessing a Windows machine using Remote Desktop?</p>
<p>Thanks,</p>
<p>--Joe</p>]]></description>
	</item>

	<item>
		<title>AESEncryptDecrypt sample won&apos;t decrypt</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121997</link> 
		<pubDate>2009-11-13T14:15:04 -05.00</pubDate> 
		<dc:creator>chrisgregg</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I'm testing out the OpenCL sample programs, and I can't get the AESEncryptDecrypt code to decrypt the sample .bmp file (lena512.bmp). &nbsp;It seems to encrypt it (at least the bmp looks like random 1-bit noise), but when I try to decrypt it, using the command line:</p>
<p>$ ./AESEncryptDecrypt -d -i output.bmp -o unencryptedOutput.bmp</p>
<p>I get another 1-bit random noise .bmp back. &nbsp;What am I missing? &nbsp;Thanks!</p>
<p>&nbsp;</p>
<p>-Chris</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Timeframe for GPU double precision support in OpenCL?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121991</link> 
		<pubDate>2009-11-13T13:27:25 -05.00</pubDate> 
		<dc:creator>santy</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>I'm currently working on a code that requires double precision for many of its computations.&nbsp; I have a CUDA version of the code, but of course NVIDIA's current crop of GPUs has abysmal performance for double precision.&nbsp; Because of the vastly better double precision performance numbers, I'm interested in the ATI 5000 series cards.&nbsp; If the ATI Stream SDK supported double (even if not fully optimized), I'd pick up several 5870s for my application.</p>
<p>Can anyone in the know hint at a timeframe for GPU double precision via OpenCL?</p>
<p>Thanks,<br />Mike</p>]]></description>
	</item>

	<item>
		<title>OpenCL tutorial: NBody simulation</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121956</link> 
		<pubDate>2009-11-12T17:44:30 -05.00</pubDate> 
		<dc:creator>dar</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>BDT has posted a tutorial that might be of interest to anyone learning OpenCL.</p>
<p>http://www.browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody.html</p>
<p>It uses libstdcl, which was just mentioned in the previous post, to simplify the host code.</p>
<p>Comments are welcome and appreciated.&nbsp; Hope it's useful.</p>
<p>-dar</p>]]></description>
	</item>

	<item>
		<title>stdcl: standard compute layer library</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121955</link> 
		<pubDate>2009-11-12T17:41:17 -05.00</pubDate> 
		<dc:creator>dar</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>BDT has released a library (libstdcl) that provides a simplified interface to OpenCL<br />and also provides some tools for linking, tracing and timing compute layer calls.</p>
<p>It's free software, distributed under the LGPLv3 license.&nbsp; It currently targets Linux.</p>
<p>More info can be found here:</p>
<p>http://www.browndeertechnology.com/stdcl.html</p>
<p>-dar</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Extracting pointer to data from the cl::Buffer object</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121954</link> 
		<pubDate>2009-11-12T17:30:14 -05.00</pubDate> 
		<dc:creator>dstokac</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>We have a kernel</p>
<p>__kernel void kern(__global int* a_arg){},</p>
<p>with argument a_arg initialized by</p>
<p>kern_kernel.setArg(0, a_buf),</p>
<p>where</p>
<p>cl::Buffer a_buf(...).</p>
<p>&nbsp;</p>
<p>Is it possible to extract value of a_arg from a_buf?</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Trying to run sample OpenCL program on a Linux machine with ATI Firestream 9250</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121946</link> 
		<pubDate>2009-11-12T16:14:12 -05.00</pubDate> 
		<dc:creator>Zeljko</dc:creator>
   	    <slash:comments>9</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I have built the sample programs and am trying to run them on a Linux box with an&nbsp;ATI Firestream 9250 card.</p>
<p>The fglrx driver is installed on that machine.</p>
<p>The programs run, but they do not detect the GPU.</p>
<p>Sample output:</p>
<p>~/ATI/ati-stream-sdk-v2.0-beta4-lnx64/samples/opencl/bin/x86_64$ (export LD_LIBRARY_PATH=../../../../lib/x86_64/:/opt/acml4.3.0-int64/gfortran64_int64/lib:/opt/acml4.3.0-int64/gfortran64_int64/lib:$LD_LIBRARY_PATH; ./BinarySearch)</p>
<p>&nbsp;</p>
<p>Sorted Input</p>
<p>0 1 1 1 1 1 1 2 3 4 5 6 6 6 7 7 7 8 8 9 9 9 9 10 11 11 12 12 13 13 14 14 14 14 14 14 15 16 17 17 18 18 19 19 20 20 21 22 22 23 24 25 25 25 25 26 26 26 27 28 28 28 28 28&nbsp;</p>
<p>&nbsp;</p>
<p>For test only: Expires on Sun Feb 28 00:00:00 2010</p>
<p>No protocol specified</p>
<p>Unsupported GPU device; falling back to CPU ...</p>
<p>l = 10, u = 11, isfound = 1, fm = 5</p>
<p>&nbsp;</p>
<p>How do I tell the program to use the GPU?</p>
<p>Help would be greatly appreciated.</p>
<p>Zeljko</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Invalid Device - XFX 4870 1 GB</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121859</link> 
		<pubDate>2009-11-11T09:04:26 -05.00</pubDate> 
		<dc:creator>smatovic</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>Hello,</p>
<p>I got some of the OpenCL examples running with my onboard 3200. Now I have tried a 4870 from XFX, but the device is not recognized.</p>
<p>Any hints? Is there a BIOS problem with the XFX branding?</p>
<p>Greetings,</p>
<p>S.Matovic</p>]]></description>
	</item>

	<item>
		<title>Disturbing numbers</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121810</link> 
		<pubDate>2009-11-10T07:22:04 -05.00</pubDate> 
		<dc:creator>Stib</dc:creator>
   	    <slash:comments>7</slash:comments> 
		<description><![CDATA[ <p>I have run the same kernel 10 times with 2 different work-group settings on both my CPU and my GPU. The output I get is wrong. Very wrong!</p>
<p>Is my hardware broken, or what else could be the problem?</p>]]></description>
	</item>

	<item>
		<title>Few questions about opencl &amp; cal</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121787</link> 
		<pubDate>2009-11-09T14:54:11 -05.00</pubDate> 
		<dc:creator>hazeman</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>Hi, I've got a few questions about OpenCL & CAL for the ATI team. ( And if possible, please don't give PR-type answers - examples: "you can't compare opencl to brook cause opencl doesn't use texture units" or "opencl is only for 5xxx family" when the compiler lists only a 4xxx target ).</p>
<p>1. Do you plan to release the new extensions used by OpenCL ( and when ) ? Also, if possible, could you write a short description of what they do ( some are obvious <img src="i/expressions/face-icon-small-smile.gif" border="0"> ) ?</p>
<p>Here is the list:</p>
<p>calExtGetProc: extid=8007 name=calConfig<br />calExtGetProc: extid=8005 name=calCtxCreatePrivateCounter<br />calExtGetProc: extid=8005 name=calCtxConfigPrivateCounter<br />calExtGetProc: extid=8005 name=calCtxGetPrivateCounter<br />calExtGetProc: extid=8008 name=calResAllocView<br />calExtGetProc: extid=8008 name=calResQueryInfo<br />calExtGetProc: extid=8008 name=calResMemCopy<br />calExtGetProc: extid=8009 name=calCtxWaitForEvents ( is it blocking ? )<br />calExtGetProc: extid=800B name=calMemCopyRaw</p>
<p>2.&nbsp; Are the devs going to implement LDS optimization for the 4xxx family? Specifically, I'm thinking about detecting whether kernel writes to LDS match the pattern "LDS[const1*p + const2]=value" ( where const2&lt;const1 ). This would allow the use of native LDS. If the memory access doesn't follow this pattern, global memory would be used ( as is done now ).</p>
<p>Most kernels will probably use this access pattern anyway, and it would give a huge speed advantage ( and some kernels could probably be converted by programmers if they knew about this optimization ).</p>
<p>3. Are the devs planning to implement the use of texture units for memory access ( 4xxx family )? Again, the problem is having the compiler detect whether memory reads follow the pattern "value=some_const_pointer_parameter[width*y + x]" ( where x&lt;width and width is some value which could be computed by the kernel ( or a const or a parameter ) ). As it's const it cannot be written, and memory overlapping with other parameters could be detected at run time ( in which case the current compiler code path is used ).</p>
<p>This optimization is quite important for writing efficient kernels ( like matrix mul ).</p>
<p>4. CAL & 4xxx question. Accessing memory through the g[] variable ( global buffer ) generates code with the UNCACHED flag. Is it possible to change it to CACHED ?</p>
<p>Example from some code:</p>
<p>07 TEX: ADDR(64) CNT(1) <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9&nbsp; RD_SCATTER R0, DWORD_PTR[0+R0.x], ELEM_SIZE(3) UNCACHED</p>
<p>The 7xx ISA documentation suggests the CACHED flag should be available.</p>
<p>And one more thing <img src="i/expressions/face-icon-small-smile.gif" border="0">. If you can't answer some or all of these questions, please say so <img src="i/expressions/face-icon-small-smile.gif" border="0">.</p>
<p>I can also add that for me OpenCL is unusable without points 2 & 3 ( I'm forced to use CAL or switch to another brand, which could be less hassle ).</p>
<p>Hazeman</p>
<p>&nbsp;</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>cl.hpp improvements</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121730</link> 
		<pubDate>2009-11-08T12:09:25 -05.00</pubDate> 
		<dc:creator>bubu</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Would it be possible to add two things to cl.hpp, please?</p>
<p>&nbsp;</p>
<p>1) Detect the OpenCL.dll. Delay-loading the DLL is not enough. We need a method to detect whether OpenCL.dll is present on the system because, if it is not, any cl::Device or cl::Platform will cause an ugly "Sorry, this program cannot find the module OpenCL.dll".</p>
<p>For Windows, detect the presence of Windows\system32\OpenCL.dll...</p>
<p>For linux/macos/solaris, I don't know</p>
<p>&nbsp;</p>
<p>2) We need a way to see if the GPU is actually the primary display... so we can skip that card due to the 5-second watchdog or whatever. Just add a new CL_GPU_IS_PRIMARY_DISPLAY to the cl::Device.</p>
<p>&nbsp;</p>
<p>thx</p>]]></description>
	</item>

	<item>
		<title>Radeon 4850 and &quot;Unsupported GPU device&quot;</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121718</link> 
		<pubDate>2009-11-08T08:31:04 -05.00</pubDate> 
		<dc:creator>Naycon</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>I'm trying to run the Stream beta4 samples, but OpenCL does not seem to find my Radeon 4850 card when looking for a GPU context: all the samples print the "Unsupported GPU device" string and then fall back to using the CPU. Any ideas as to what could be wrong?</p>
<p>I've installed (and reinstalled) the Stream beta4 SDK, got the latest drivers for my card (v9.10), all the path variables are pointing to the correct places and the code seems to compile just fine.</p>
<p>I'm using Windows 7 64-bit. I've searched the forums for anyone having a similar problem, but found nothing.</p>]]></description>
	</item>

	<item>
		<title>openCL on RV610 / HD3200 ?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121716</link> 
		<pubDate>2009-11-08T08:00:19 -05.00</pubDate> 
		<dc:creator>ThijsWithaar</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>Is it possible to do OpenCL computations on an RV610 graphics card ?</p>
<p>It's an on-board ATI Radeon 3200.</p>
<p>With a test program, I can compile just fine for the CPU, but any attempt to build the program for the GPU results in a "link failed". I tried some of the examples, and also a tiny one:</p>
<p>__kernel void test(__global float yOut)<br />{<br />&nbsp;&nbsp;&nbsp; //yOut = get_global_id(0);<br />}</p>
<p>Is the card just too cheap, or am I doing something wrong ?</p>
<p>I've attached the OpenCL info for the graphics card. It does not have any extensions at all, but I'm not sure how important that is.</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>compiler crash</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121680</link> 
		<pubDate>2009-11-07T13:15:47 -05.00</pubDate> 
		<dc:creator>twiig</dc:creator>
   	    <slash:comments>9</slash:comments> 
		<description><![CDATA[ <p>After calling clBuildProgram, my program crashes. &nbsp;I have traced it down to indexing a write-only global region of memory with an index that has been involved in a division. &nbsp;Does that make sense?</p>
<p>For example, in the attached code, the program will compile successfully if I remove the normalize call. &nbsp;Furthermore, if I write my own normalize function it will still fail. &nbsp;I have tracked it down to the index variable(s) being involved in any way with a divide. &nbsp;This only happens when it is compiled for the GPU, by the way.</p>
<p>First, I think this must be a compiler bug. &nbsp;Second, does anyone know how to normalize a vector without using a division? &nbsp;I tried multiplying the vector by the reciprocal length calculated using rsqrt, to no avail.</p>]]></description>
	</item>

	<item>
		<title>beta4: CPU target without GPU hardware</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121652</link> 
		<pubDate>2009-11-06T17:42:55 -05.00</pubDate> 
		<dc:creator>mjharvey</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p><br />My question is: is there a way to install the beta4 on a Windows machine that is lacking a GPU so that the CPU can be used as the ocl target device?</p>
<p>&nbsp;</p>
<p>I'd like to use the beta4 runtime to target the CPU on a machine without a Radeon GPU, but the catalyst drivers won't install in the absence of a graphics card.</p>
<p>I've manually installed aticalcl.dll, etc. from a different machine, along with the demo applications. Now, when I try running one, I get:</p>
<p>&nbsp;</p>
<p>$ ./RadixSort --device cpu<br />Error: clCreateContextFromType failed. Error code : CL_DEVICE_NOT_FOUND<br />For test only: Expires on Sun Feb 28 00:00:00 2010</p>
<p>MJH</p>]]></description>
	</item>

	<item>
		<title>static arrays in kernels</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121638</link> 
		<pubDate>2009-11-06T14:36:00 -05.00</pubDate> 
		<dc:creator>twiig</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I have been having a bit of trouble compiling a kernel with a static array of integers. &nbsp;When compiling the kernel (see code below) on the GPU, I receive the following:</p>
<p>
<p>&nbsp;</p>
<p>For test only: Expires on Sun Feb 28 00:00:00 2010</p>
<p>CL Error: clBuildProgram (-11)</p>
<p>Build Log:</p>
<p>Link failed</p>
<p>Press any key to continue . . .</p>
<div></div>
<div>And, when compiled for the CPU, I receive:</div>
<div></div>
<div></div>
<div>
<div>For test only: Expires on Sun Feb 28 00:00:00 2010</div>
<div>C:\Users\tyler\AppData\Local\Temp\OCLB365.tmp.obj:fake:(.text+0x2c): undefined reference to `__vla_alloc'</div>
<div>C:\Users\tyler\AppData\Local\Temp\OCLB365.tmp.obj:fake:(.text+0xba): undefined reference to `__vla_dealloc'</div>
<div>C:\Users\tyler\AppData\Local\Temp\OCLB365.tmp.obj:fake:(.text+0x18e): undefined reference to `__vla_alloc'</div>
<div>C:\Users\tyler\AppData\Local\Temp\OCLB365.tmp.obj:fake:(.text+0x21d): undefined reference to `__vla_dealloc'</div>
<div>CL Error: clBuildProgram (-11)</div>
<div></div>
<div>Build Log:</div>
<div>Compilation failed</div>
<div>Press any key to continue . . .</div>
</div>
<div></div>
<div>Any ideas? &nbsp;Note that if the only line of code in the kernel is the array declaration, the same result is obtained.</div>
</p>]]></description>
	</item>

	<item>
		<title>Possible compiler bug ( volatile qualifier )</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121610</link> 
		<pubDate>2009-11-05T18:36:04 -05.00</pubDate> 
		<dc:creator>hazeman</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Accesses to memory through a volatile-qualified pointer are optimized out by the compiler ( only the first access is generated in the IL/ISA; subsequent reads from the same address are removed ).&nbsp;</p>
<p>Sample code</p>
<p>global volatile float4* v;</p>
<p>a1 = v[0]; &lt;- this read is generated</p>
<p>a2 = v[0]; &lt;- optimized out</p>
<p>a3 = v[0]; &lt;- optimized out</p>
<p>The standard says "The type qualifiers const, restrict, volatile as defined by the C99 specification are supported". So I think this needs to be corrected.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>cl.hpp != cl.hpp   ?!?!?!</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121586</link> 
		<pubDate>2009-11-05T05:48:40 -05.00</pubDate> 
		<dc:creator>Stib</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I found out that the cl.hpp that ATI Stream installed on my system is not exactly the same as the one in the Khronos Group OpenCL API registry. I searched for differences and found some.</p>]]></description>
	</item>

	<item>
		<title>Trouble with beta4 drivers</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121573</link> 
		<pubDate>2009-11-05T01:10:56 -05.00</pubDate> 
		<dc:creator>abrahm</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I'm running Ubuntu 9.04 amd64. I set up ati-stream-sdk-v2.0-beta4-lnx64.tgz without trouble. I can compile and run the example programs on my cpu (if I copy libaticalrt.so and libaticalcl.so to $ATISTREAMSDKROOT/lib/x86_64/).</p>
<p>I wanted to take the next step, and get code running on my Radeon Mobility 4830. I extracted ati-opencl-beta-driver-v2.0-beta4-lnx.zip. In the fglrx-8.67/ directory, I run "sudo ./ati-driver-installer-8.67-x86.x86_64.run". Everything appears to build and install successfully. The kernel module can load without trouble, but I get a black screen when X starts. It appears as though my computer has completely locked up.</p>
<p>If I remove the fglrx.ko kernel module, then X will start and not hang the system. I get the "For testing" AMD logo in the lower right corner of the screen. 2D performance is absolutely awful, and I get an error when I try to run my openCL applications. After the expiration message printout, I get an "X Error of failed request: ..." and then some information about failed request opcodes.</p>
<p>Am I missing something?</p>]]></description>
	</item>

	<item>
		<title>Using AMD SDK with Nvidia driver and Nvidia samples with AMD driver..</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=390&amp;threadid=121557</link> 
		<pubDate>2009-11-04T19:54:18 -05.00</pubDate> 
		<dc:creator>oscarbarenys1</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Includes some fixes needed to run some of the samples, and some insight into the failing examples.</p>
<p>Nvidia samples AMD drivers</p>
<p><a href="http://oscarbg.blogspot.com/2009/11/nvidia-sdk-with-amd-opencl.html">http://oscarbg.blogspot.com/2009/11/nvidia-sdk-with-amd-opencl.html</a></p>
<p>Nvidia samples Nvidia 195 driver</p>
<p><a href="http://oscarbg.blogspot.com/2009/11/amd-opencl-samples-on-nvidia-195-opencl_05.html">http://oscarbg.blogspot.com/2009/11/amd-opencl-samples-on-nvidia-195-opencl_05.html</a></p>
<p>AMD samples Nvidia 195 driver</p>
<p><a href="http://oscarbg.blogspot.com/2009/11/amd-opencl-samples-on-nvidia-195-opencl.html">http://oscarbg.blogspot.com/2009/11/amd-opencl-samples-on-nvidia-195-opencl.html</a></p>
<p>AMD and Nvidia SDK in VS 2010</p>
<p><a href="http://oscarbg.blogspot.com/2009/11/optix-and-amd-opencl-sdk-visual-studio.html">http://oscarbg.blogspot.com/2009/11/optix-and-amd-opencl-sdk-visual-studio.html</a></p>
<p>&nbsp;</p>]]></description>
	</item>

</channel>
</rss>
