AMD Developer Forums
Topic Title: debugging
Created On: 10/24/2009 04:47 AM
 10/24/2009 04:47 AM
david_aiken

Posts: 32
Joined: 01/07/2009

There is a recent interview with some of the AMD devs (http://forums.amd.com/devblog/blogpost.cfm?catid=335&threadid=120276) which includes the comment "...the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speeds."

I've probably missed it, but is there any debug support for Visual Studio 2008 on Vista planned for kernels running on the CPU, or perhaps within a GPU emulator? It would be great to catch kernel memory and build issues in Visual Studio.

 10/24/2009 07:01 AM
jmundy

Posts: 2
Joined: 10/17/2009

I second this query. Even without Visual Studio integration, is there a way to view kernel compiler error messages? Right now, when clBuildProgram is executed, it only returns a numeric code indicating that the program build failed.

 10/24/2009 07:18 AM
omkaranathan

Posts: 152
Joined: 08/09/2009

Originally posted by: jmundy
"Is there a way to view kernel compiler error messages? Now there is just a numeric code returned that the program build failed when clBuildProgram is executed."

You can get the build log with the clGetProgramBuildInfo() API call, passing CL_PROGRAM_BUILD_LOG as the param_name.



-------------------------
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
 10/24/2009 01:14 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Yes.. it's pretty close, but you get references like

C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set 

If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.
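The regex sweep mentioned above could look roughly like the following Python sketch. It assumes each diagnostic starts its line with the temporary path in the `path(line):` shape shown in the sample, and that the runtime names its temporary sources `OCL<hex>.tmp.cl`. The function name `remap_build_log` and the example source path are hypothetical and not part of any OpenCL SDK.

```python
import re

# Matches everything from the start of a line up to a temp-file name such as
#   C:\Users\me\AppData\Local\Temp\OCL454.tmp.cl
# when it is immediately followed by "(<line>):".
# Assumption: temporary sources are named OCL<hex>.tmp.cl and the path
# begins the diagnostic line, as in the sample log line.
_TEMP_CL = re.compile(r"^.*?OCL[0-9A-Fa-f]+\.tmp\.cl(?=\(\d+\):)", re.MULTILINE)


def remap_build_log(log: str, original_path: str) -> str:
    """Replace temporary .cl paths in a build log with the original source path."""
    # A callable replacement keeps backslashes in Windows paths from being
    # interpreted as regex escape sequences.
    return _TEMP_CL.sub(lambda _match: original_path, log)


if __name__ == "__main__":
    sample = (
        r"C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): "
        'warning: variable "lsb" is used before its value is set'
    )
    print(remap_build_log(sample, r"C:\src\RadixSort.cl"))
```

Lines that do not contain a temporary-file reference pass through unchanged, so the whole build log can be filtered before it is written to the output window.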

This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels, so I'm waiting for permission to post the code here. If there is some way to use the AMD source or an emulator with runtime error checking, I'll do the work myself.

 10/27/2009 02:22 AM
genaganna

Posts: 73
Joined: 12/12/2008

Originally posted by: david_aiken
"Yes.. it's pretty close, but you get references like

C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set

If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier."

Presently, only clCreateProgramWithSource is supported. You will be able to do what you are expecting with clCreateProgramWithBinary, which will be available in upcoming releases.

Originally posted by: david_aiken
"This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so i'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking i'll do the work myself."

Is it crashing on both the CPU and the GPU?



 10/27/2009 02:45 AM
david_aiken

Posts: 32
Joined: 01/07/2009

It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.

 10/27/2009 03:02 AM
genaganna

Posts: 73
Joined: 12/12/2008

Originally posted by: david_aiken
"It crashes when running it against an Intel Core 2 Quad Q6600 and AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably."

What modifications did you make while porting the sample?

Post the code here once you get permission.



 10/28/2009 09:31 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Taking the original RadixSort.cl from the NVidia SDK v.2.3, I did the following to get it working with AMD Stream v2.0-beta4:

1) Copied scan.cl from the NVidia oclScan example next to RadixSort.cl. The code also has to be changed to refer to this file rather than the missing "scan_b.cl".

2) Created separate builds for AMD and NVidia.

3) Modified the code and project settings to work with the AMD environment. Some of the convenience routines and logging were changed and a memory monitor was added. Also added a check for CL_DEVICE_TYPE_CPU.

4) Copied the following AMD DLLs into the AMD output directory:

aticalcl.dll, aticalrt.dll (pulled from a recent driver)

OpenCL.dll (from the AMD SDK)

5) Running produces errors in both scan.cl and RadixSort.cl:

<cl file> internal error: array_element_type: non-array type
   __local uint numtrue;
               ^
1 catastrophic error detected in the compilation of <cl file>
Compilation aborted.

This is resolved by passing "-DAMD_BUILD" to clBuildProgram for the AMD builds and conditionally removing the __local in both files.

6) Once the .cl files build without errors, running with AMD crashes in the call to clFinish():

> OCL46C9.tmp.dll!001e14d7()
  [Frames below may be incorrect and/or missing, no symbols loaded for OCL46C9.tmp.dll]
  OCL46C9.tmp.dll!001e166d()
  OpenCL.dll!1001612c()

With NVidia, the test passes in both debug and release builds.

I don't see a way to attach binaries, so I've put the project/source at http://rapidshare.com/files/299338017/oclRadixSort.zip.html.

 10/29/2009 07:16 AM
genaganna

Posts: 73
Joined: 12/12/2008

It fails to allocate device memory for mBlockOffsets on the GPU (line 57, RadixSort.cpp).

Try the following:

   Select a small value for numElements.

   WORKGROUP_SIZE must be <= 256 for the GPU.

Yes, it is also crashing on the CPU at my end. The algorithm is too complex.



 10/29/2009 01:58 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Are you saying that it works for you on the GPU if you change these settings? If so, it would help if you could tell me which GPU you use and how many elements you can sort.

The algorithm is adapted from "

 10/29/2009 02:30 PM
genaganna

Posts: 73
Joined: 12/12/2008

I tried different values of numElements. It crashes in different places.

It will take a lot of time to understand the code. We hope to reply as soon as possible.



 10/29/2009 02:38 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Is it possible to get access to the AMD OpenCL CPU code under NDA? A call stack with source would really help to track down these mysterious crashes.

 10/29/2009 02:42 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

david_aiken,
The crash is most likely coming from a buffer overflow on the local/private/global memory. I don't have your code, but if you increase the amount of local/global/private memory, does the crash go away?

This is one problem with directly porting GPU code: on the GPU, overflows are stopped by the hardware, but this is not the case on the CPU.
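One way to look for this kind of overflow on the CPU device is to pad a host-side buffer with guard words and verify them after the kernel has run. The sketch below only illustrates the bookkeeping in plain Python and does not call OpenCL. The names `GUARD_WORD`, `make_guarded`, `check_guards`, and `simulate_overrun` are hypothetical, and it assumes the buffer is host-visible (for example, created over a host pointer) so the padding can be inspected afterwards.

```python
from array import array

GUARD_WORD = 0xDEADBEEF  # hypothetical sentinel value
GUARD_LEN = 16           # guard words placed on each side of the payload


def make_guarded(num_elements: int) -> array:
    """Allocate [guard | payload | guard] as unsigned words (at least 32 bits)."""
    buf = array("L", [GUARD_WORD] * GUARD_LEN)
    buf.extend([0] * num_elements)
    buf.extend([GUARD_WORD] * GUARD_LEN)
    return buf


def check_guards(buf: array) -> list:
    """Return payload-relative indices of guard words that were overwritten.

    Negative indices are writes before the payload; indices greater than or
    equal to the payload length are writes past its end.
    """
    payload_len = len(buf) - 2 * GUARD_LEN
    clobbered = []
    for i, word in enumerate(buf):
        in_payload = GUARD_LEN <= i < GUARD_LEN + payload_len
        if not in_payload and word != GUARD_WORD:
            clobbered.append(i - GUARD_LEN)
    return clobbered


def simulate_overrun(buf: array, num_elements: int) -> None:
    """Stand-in for a kernel that writes one element past the end of its buffer."""
    for gid in range(num_elements + 1):
        buf[GUARD_LEN + gid] = gid


if __name__ == "__main__":
    buf = make_guarded(8)
    simulate_overrun(buf, 8)
    print("clobbered guard offsets:", check_guards(buf))
```

A check like this only catches writes that land inside the padding, and only if they change the sentinel value, so it narrows down which buffer is being overrun rather than proving the absence of overflows.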

-------------------------
Micah Villmow
Advanced Micro Devices Inc.
--------------------------------

 10/29/2009 03:50 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Can you tell me where the process for setting the size of these pools is described?

 10/29/2009 04:03 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

The memory size is the size of memory assigned to a specific cl_mem object.

Micah


 10/29/2009 04:14 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Well.. I reduced numElements down to 16Kb and, as also reported by genaganna, still got a crash. I can play with different buffers, but I don't know if I'm addressing an underlying problem or just moving the symptoms around.

 10/29/2009 04:16 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

david_aiken,
Try modifying the size of the local memory inside the kernel.

Micah


 10/29/2009 04:38 PM
david_aiken

Posts: 32
Joined: 01/07/2009

Which variable in particular do you think would be best?

 10/29/2009 04:42 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

I would need to see kernel source to know that.


 10/29/2009 05:18 PM
david_aiken

Posts: 32
Joined: 01/07/2009

You have it at the rapidshare link posted above. The kernel is almost identical to the NVidia kernel, but there was a complaint from the AMD compiler regarding one of the local variables. The issue didn't seem like it would cause a problem. 

It's an implementation of Satish's recent paper and at the time of publication was considered to be the fastest GPU sort. I need to extend it and add other operations, and your CPU-based approach seems good, but source would allow us to take full advantage of the dev environment (and GPUs). It would be nice if OpenCL was Open Source.
