AMD Developer Forums
Topic Title: debugging
Created On: 10/24/2009 04:47 AM
There is a recent interview with some of the AMD devs (http://forums.amd.com/devblog/blogpost.cfm?catid=335&threadid=120276) which includes the comment "...the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speeds." I've probably missed it, but is any debug support planned for Visual Studio 2008 on Vista, for kernels running on the CPU or perhaps within a GPU emulator? It would be great to catch kernel memory and build issues in Visual Studio.
I second this query. Even without Visual Studio integration, is there a way to view kernel compiler error messages? Right now clBuildProgram just returns a numeric error code indicating that the program build failed.
You can get the build log using the clGetProgramBuildInfo() API call.
-------------------------
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
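For readers who have not used that call, here is a minimal sketch of the usual two-step pattern: ask for the log size first, then fetch the text. It assumes the standard OpenCL host header at CL/cl.h (on some platforms the header lives elsewhere) and an already created cl_program and cl_device_id. The helper name print_build_log is made up for this illustration and is not part of any SDK.

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Hypothetical helper: prints the compiler output for `program` as built
 * for `device`. Meant to be called when clBuildProgram reports an error. */
void print_build_log(cl_program program, cl_device_id device)
{
    size_t log_size = 0;
    char *log;

    /* First call: query only the size of the log in bytes. */
    if (clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &log_size) != CL_SUCCESS
        || log_size == 0) {
        fprintf(stderr, "no build log available\n");
        return;
    }

    log = malloc(log_size);
    if (log == NULL)
        return;

    /* Second call: fetch the log text into the buffer. */
    if (clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              log_size, log, NULL) == CL_SUCCESS)
        fprintf(stderr, "%s\n", log);

    free(log);
}
```

A typical call site would check the return value of clBuildProgram(program, 1, &device, NULL, NULL, NULL) and call print_build_log(program, device) whenever it is not CL_SUCCESS.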
Yes.. it's pretty close, but you get references like

C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set

If you double-click on them in the output window they will navigate to the appropriate line in the editor.. or they would if the temporary file still existed. Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.

This isn't a big issue for me currently. Catching subtle memory overwrites is. I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate. They are NVidia kernels so i'm waiting for permission to post it here. If there is some way to use the AMD source or an emulator with runtime error checking i'll do the work myself.
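The path sweep over the build log mentioned in this post could look roughly like the sketch below. It uses plain string matching instead of a regex, and it assumes that temporary kernel files always end in ".tmp.cl", which is only what the single warning quoted above suggests. The function name rewrite_log_line and the replacement path in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites one build-log line so that a leading "<temp path>.tmp.cl("
 * prefix points at orig_path instead. Lines without that marker are
 * copied through unchanged. Output is truncated if out is too small.
 * Returns 1 if a path was replaced, 0 otherwise. */
int rewrite_log_line(const char *line, const char *orig_path,
                     char *out, size_t out_size)
{
    static const char marker[] = ".tmp.cl(";
    const char *hit;
    const char *rest;

    assert(line != NULL && orig_path != NULL);
    assert(out != NULL && out_size > 0);

    hit = strstr(line, marker);
    if (hit == NULL) {
        snprintf(out, out_size, "%s", line);
        return 0;
    }

    /* Keep everything from the '(' onward, e.g. "(54): warning: ..." */
    rest = hit + strlen(marker) - 1;
    snprintf(out, out_size, "%s%s", orig_path, rest);
    return 1;
}
```

Feeding each line of the log through this with the real source path (say a hypothetical C:\src\RadixSort.cl) before printing it to the output window should make double-click navigation land in the original file, since the line number in parentheses is preserved.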
> C:\Users\daiken\AppData\Local\Temp\OCL454.tmp.cl(54): warning: variable "lsb" is used before its value is set
> [...] Really what you want, though, is the path to the original .cl file. It's possible to sweep through the output with a regex, replacing the file paths, but a simple fix to the OpenCL implementation would make it much easier.

Presently, only clCreateProgramWithSource is supported. You can do what you are expecting with clCreateProgramWithBinary, which will be available in upcoming releases.

> [...] I'm working with a radix sort pulled from the NVidia SDK (it uses the recent paper from Satish et al) and it crashes in clFinish(). I suspect it's due to a memory error, but the code is quite low-level so it's difficult to isolate.

Is it crashing on both CPU and GPU?
It crashes when running on both an Intel Core 2 Quad Q6600 and an AMD Turion 64 X2. I don't have an AMD GPU yet, regrettably.
What modifications did you make while porting the sample? Please post the code here once you get permission.
Taking the original RadixSort.cl from the NVidia SDK v.2.3, I did the following to get it working with AMD Stream v2.0-beta4:

1) copied scan.cl from the oclScan NVidia example next to RadixSort.cl. The code also has to be changed to refer to this file rather than the missing "scan_b.cl".
2) created separate builds for AMD and NVidia.
3) modified the code and project settings to work with the AMD environment. Some of the convenience routines and logging were changed and a memory monitor added. Also added a check for CL_DEVICE_TYPE_CPU.
4) copied the following AMD dlls into the AMD output directory: aticalcl.dll and aticalrt.dll (pulled from a recent driver), OpenCL.dll (from the AMD SDK).
5) running results in errors in both scan.cl and radixsort.cl:

<cl file> internal error: array_element_type: non-array type
__local uint numtrue;
^
1 catastrophic error detected in the compilation of <cl file>
Compilation aborted.

This is resolved by passing "-DAMD_BUILD" to clBuildProgram for the AMD builds and conditionally removing the __local in both files.
6) once the .cl files build without errors, running with AMD results in a crash on calling clFinish():

> OCL46C9.tmp.dll!001e14d7()
  [Frames below may be incorrect and/or missing, no symbols loaded for OCL46C9.tmp.dll]
  OCL46C9.tmp.dll!001e166d()
  OpenCL.dll!1001612c()

Running with NVidia in both debug and release builds results in a passed test.
I don't see a way to attach binaries so i've put the project/source at http://rapidshare.com/files/299338017/oclRadixSort.zip.html.
It fails to allocate device memory for mBlockOffsets on the GPU (line 57, RadixSort.cpp). Try the following: select a small value for numElements, and keep WORKGROUP_SIZE <= 256 for the GPU.
Yes, it is crashing on the CPU at my end also. The algorithm is too complex.
Are you saying that it works for you on the GPU if you change these settings? If so, it would help if you could tell me which GPU you use and how many elements you can sort. The algorithm is adapted from "
I tried different values of numElements. It crashes in different places. It will take a lot of time to understand the code. We hope to reply as early as possible.
Is it possible to get access to the AMD OpenCL CPU code under NDA? A call stack with source would really help to track down these mysterious crashes.
david_aiken,
The crash is most likely coming from a buffer overflow in local, private or global memory. I don't have your code, but if you increase the amount of local/global/private memory, does the crash go away? This is one problem with directly porting GPU code: on the GPU, overflows are stopped by the hardware, which is not the case on the CPU.
-------------------------
Micah Villmow
Advanced Micro Devices Inc.
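One way to test the overflow theory, rather than only enlarging buffers until the crash moves, is to pad each buffer with guard words and check them after the kernel has run. The sketch below shows the idea in plain C with no OpenCL calls. The names guarded_alloc, guarded_payload and guards_intact, the guard size and the canary value are all made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GUARD_WORDS 16u          /* guard words on each side (arbitrary) */
#define CANARY      0xDEADBEEFu  /* marker value (arbitrary) */

/* Allocates n payload words with GUARD_WORDS canary words on each side.
 * The whole block, payload included, starts out filled with CANARY.
 * Returns a pointer to the start of the block, or NULL on failure. */
uint32_t *guarded_alloc(size_t n)
{
    size_t total = n + 2u * GUARD_WORDS;
    uint32_t *block = malloc(total * sizeof *block);
    size_t i;

    if (block == NULL)
        return NULL;
    for (i = 0; i < total; ++i)
        block[i] = CANARY;
    return block;
}

/* Pointer to the first payload word inside a guarded block. */
uint32_t *guarded_payload(uint32_t *block)
{
    assert(block != NULL);
    return block + GUARD_WORDS;
}

/* Returns 1 if both guard regions still hold the canary, 0 otherwise. */
int guards_intact(const uint32_t *block, size_t n)
{
    size_t i;

    assert(block != NULL);
    for (i = 0; i < GUARD_WORDS; ++i) {
        if (block[i] != CANARY || block[GUARD_WORDS + n + i] != CANARY)
            return 0;
    }
    return 1;
}
```

With OpenCL buffers the same trick would presumably mean creating each cl_mem object larger than the kernel expects, filling it with the canary, and reading it back after clFinish() to see which buffer had its guard region disturbed. A stray write that happens to store the canary value itself, or that lands beyond the guard region, would go unnoticed.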
Can you tell me where the process for setting the size of these pools is described?
The memory size is the size of memory assigned to a specific cl_mem object.
Micah
Well.. i reduced the numElements down to 16Kb and, as also reported by genaganna, still got a crash. I can play with different buffers, but i don't know if i'm addressing an underlying problem or just moving the symptoms around.
david_aiken,
Try modifying the size of the local memory inside the kernel.
Micah
Which variable in particular do you think would be best?
I would need to see the kernel source to know that.
You have it at the rapidshare link posted above. The kernel is almost identical to the NVidia kernel, but there was a complaint from the AMD compiler regarding one of the local variables. The issue didn't seem like it would cause a problem. It's an implementation of Satish's recent paper and at time of publication was considered to be the fastest GPU sort. I need to extend it and add other operations and your CPU-based approach seems good, but source would allow us to take full advantage of the dev environment (and GPUs). It would be nice if OpenCL was Open Source.