AMD Developer Forums
Topic Title: memory corruption addressing global past 1442560?
Created On: 10/30/2009 06:14 AM
 10/30/2009 06:14 AM
emuller

Posts: 72
Joined: 04/21/2009

I am running the attached kernel with

local_size = 64

global_size = 64*20 (to use all 20 wavefronts on the 4870)

Inside the kernel, a loop writes one line of the output buffer, 64x20 elements wide, per iteration.

The loop repeats "iter" times.

Up to iter==1127, everything looks fine on the output. For iter>=1128, it breaks: the results are complete garbage, perhaps due to memory corruption. On my GTX 260 using the NVIDIA stack, this number can be a factor of 80 higher with no problems, and I'm nowhere near running out of memory. Is this a user error, a bug, or a hardware limitation?

Code:
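__kernel void GenerateOut(uint iter, __global uint4 *seed, __local uint4 *shmem, __global uint4 *rngs) {
  // Per-work-item generator state. REG_COUNT, LoadState, Generate and
  // SaveState are not shown in this post.
  uint4 rngRegs[REG_COUNT];

  LoadState(seed, rngRegs, shmem);
  for (uint i = 0; i < iter; i++) {
    // Iteration i fills output line i: one uint4 per work-item.
    rngs[get_global_id(0) + i*get_global_size(0)] = Generate(rngRegs, shmem);
  }
  SaveState(shmem, seed);
}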
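
For reference, the 1442560 in the topic title matches the number of uint4 elements the kernel writes at the last working setting (global size 1280, iter 1127). The following is a minimal host-side C sketch of that arithmetic, not part of the original post. The function names are hypothetical, and the element size assumes a uint4 is four 32-bit unsigned integers (16 bytes).

```c
#include <stdint.h>

/* Size in bytes of one uint4 element: four 32-bit unsigned integers. */
#define UINT4_BYTES ((uint64_t)(4 * sizeof(uint32_t)))

/* Number of uint4 elements written: one per work-item per loop iteration. */
uint64_t elements_written(uint64_t global_size, uint64_t iter)
{
    return global_size * iter;
}

/* Minimum size in bytes the rngs buffer needs for a given launch. */
uint64_t required_bytes(uint64_t global_size, uint64_t iter)
{
    return elements_written(global_size, iter) * UINT4_BYTES;
}

/* Largest flat index the kernel touches.
   Requires global_size >= 1 and iter >= 1. */
uint64_t last_index(uint64_t global_size, uint64_t iter)
{
    return (global_size - 1) + (iter - 1) * global_size;
}
```

With global_size = 1280 and iter = 1127, elements_written gives 1442560 and required_bytes gives 23080960 (about 22 MiB), which is a quick way to confirm on the host that the output buffer was allocated large enough for the chosen iter.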



 10/30/2009 01:36 PM
jcpalmer

Posts: 22
Joined: 09/20/2009

64 * 20 = 1280. Have you queried CL_KERNEL_WORK_GROUP_SIZE to make sure 1280 does not exceed it? I realize that clEnqueueNDRangeKernel should have returned CL_INVALID_WORK_GROUP_SIZE if this were the case, but this is beta software. Not a great lead, but something to rule out.

The number 1442560 is weird, though. It is close to the capacity of a 3.5" floppy (1,474,560 bytes), although not equal to it. Could be significant.

 10/30/2009 02:02 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

jcpalmer,
CL_KERNEL_WORK_GROUP_SIZE applies to the local work-group size, not the global size.

emuller, does this occur with a simplified kernel? That is, can you strip the kernel down as far as possible, with no unneeded calculations, so that it still reproduces the error and a single-line change makes the error appear?

Thanks,

-------------------------
Micah Villmow
Advanced Micro Devices Inc.
--------------------------------
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
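
To illustrate the local/global distinction above: the limit reported for CL_KERNEL_WORK_GROUP_SIZE (queried through clGetKernelWorkGroupInfo) bounds local_size only, while global_size just has to be a whole multiple of local_size under OpenCL 1.0. The following is a minimal host-side C sketch of those two checks. The enum and function names are hypothetical, and the limit of 256 used in the usage note is a placeholder, not a value reported in this thread.

```c
#include <stddef.h>

/* Result codes for this sketch only; these are not OpenCL error codes. */
enum ndrange_check {
    NDRANGE_OK = 0,
    NDRANGE_ZERO_SIZE,
    NDRANGE_LOCAL_TOO_LARGE,
    NDRANGE_NOT_DIVISIBLE
};

/* Checks a 1-D launch configuration against a kernel's work-group limit.
   max_work_group_size stands in for the queried CL_KERNEL_WORK_GROUP_SIZE. */
enum ndrange_check check_ndrange(size_t global_size, size_t local_size,
                                 size_t max_work_group_size)
{
    if (global_size == 0 || local_size == 0)
        return NDRANGE_ZERO_SIZE;
    if (local_size > max_work_group_size)   /* the limit applies to local size */
        return NDRANGE_LOCAL_TOO_LARGE;
    if (global_size % local_size != 0)      /* global must be a whole multiple */
        return NDRANGE_NOT_DIVISIBLE;
    return NDRANGE_OK;
}

/* Number of work-groups launched; assumes check_ndrange returned NDRANGE_OK. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}
```

For the launch in this thread, check_ndrange(1280, 64, 256) returns NDRANGE_OK and work_group_count(1280, 64) returns 20, so a global size of 1280 is not in conflict with a work-group limit as long as 64 is within it.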

 11/02/2009 07:47 AM
emuller

Posts: 72
Joined: 04/21/2009

It appears the problem was something of a user error. When I change my constant block initialization from

/*

__constant uint2 Q[32]=
{29,  5, 24, 14,  5, 28, 23, 24, 14, 19, 26, 13, 11,  0, 31, 17,  9,
       11,  3, 20,  1,  7, 28, 10,  0,  6,  2, 15, 22,  2, 20,  9, 18,  8,
       15, 23, 27,  4, 13, 30, 10, 12, 16, 25,  8,  3, 17, 21, 25, 26, 12,
       27, 19, 31, 30, 18,  7, 22,  6, 16,  4, 29, 21,  1};

*/

to the following:

__constant uint Q[2][32]={
  {29,24,5,23,14,26,11,31,9,3,1,28,0,2,22,20,18,15,27,13,10,16,8,17,25,12,19,30,7,6,4,21},
  {5,14,28,24,19,13,0,17,11,20,7,10,6,15,2,9,8,23,4,30,12,25,3,21,26,27,31,18,22,16,29,1}
};

Everything is fine (when the associated code is set up to use uint). Sorry for the false alarm. NVIDIA's stack simply wouldn't compile the first one, which was the first hint.
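
Comparing the two listings, the 2x32 table appears to be a regrouping of the same 64 values: row 0 holds the first component of each intended pair and row 1 the second. The following is a minimal host-side C sketch that checks this by machine. The array and function names are hypothetical; the values are copied from the two initializers above.

```c
#include <stddef.h>

/* The 64 scalars from the original flat initializer, in source order. */
const unsigned flat_q[64] = {
    29,  5, 24, 14,  5, 28, 23, 24, 14, 19, 26, 13, 11,  0, 31, 17,
     9, 11,  3, 20,  1,  7, 28, 10,  0,  6,  2, 15, 22,  2, 20,  9,
    18,  8, 15, 23, 27,  4, 13, 30, 10, 12, 16, 25,  8,  3, 17, 21,
    25, 26, 12, 27, 19, 31, 30, 18,  7, 22,  6, 16,  4, 29, 21,  1
};

/* The replacement 2x32 table. */
const unsigned q_rows[2][32] = {
    {29,24,5,23,14,26,11,31,9,3,1,28,0,2,22,20,
     18,15,27,13,10,16,8,17,25,12,19,30,7,6,4,21},
    {5,14,28,24,19,13,0,17,11,20,7,10,6,15,2,9,
     8,23,4,30,12,25,3,21,26,27,31,18,22,16,29,1}
};

/* Returns 1 if rows[0][i] is the first and rows[1][i] the second scalar
   of pair i in the flat list, for all 32 pairs; returns 0 otherwise. */
int rows_match_pairs(const unsigned *flat, const unsigned (*rows)[32])
{
    for (size_t i = 0; i < 32; i++) {
        if (rows[0][i] != flat[2 * i] || rows[1][i] != flat[2 * i + 1])
            return 0;
    }
    return 1;
}
```

Calling rows_match_pairs(flat_q, q_rows) returns 1, so the working version stores the same data as the intended uint2 pairs, only split by component.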

If the first approach is incorrect, how should one correctly initialize an array of uint2's or uint4's then? I could not find anything in the OpenCL spec in this regard.

 11/02/2009 01:33 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

The correct approach should be as follows:
__constant uint2 Q[32]=
{(uint2)(29, 5), (uint2)(24, 14),..., (uint2)(21, 1)};

This is how vector constructors work in OpenCL:
(typeN)(val0,...,valN-1);
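
To convert an existing flat list into that constructor form without retyping it, a small host-side helper can emit the text. The following is a minimal C sketch under that assumption. The function name is hypothetical, and the sample in the usage note uses only the first three pairs of the list from this thread.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "(uint2)(x, y)" entries, separated by ", ", into out for the
   first `pairs` pairs of `flat` (which must hold 2 * pairs values).
   Returns the number of characters written (excluding the terminator),
   or -1 if the buffer is too small or formatting fails. */
int format_uint2_initializer(const unsigned *flat, size_t pairs,
                             char *out, size_t cap)
{
    size_t used = 0;

    if (cap > 0)
        out[0] = '\0';  /* keep the result a valid string when pairs == 0 */

    for (size_t i = 0; i < pairs; i++) {
        int n = snprintf(out + used, cap - used, "%s(uint2)(%u, %u)",
                         i ? ", " : "", flat[2 * i], flat[2 * i + 1]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;  /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Given the values {29, 5, 24, 14, 5, 28} and pairs = 3, the buffer receives `(uint2)(29, 5), (uint2)(24, 14), (uint2)(5, 28)`, which can be pasted between the braces of the __constant uint2 array.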
