AMD Developer Central
AMD Developer Forums
Topic Title: Bad performance on moving data between private memory and local memory
Created On: 11/03/2009 04:25 AM
 11/03/2009 04:25 AM
rexiaoyu

Posts: 27
Joined: 08/04/2009

Moving data from private memory to local memory is very time-consuming, isn't it? When I use local memory in the kernel, my program runs much slower than before.

code:

__private float4 block[4];
__local float4 local_block[16];

// very slow here. Why?
local_block[local_id] = block[0];
local_block[local_id + 1] = block[1];
local_block[local_id + 2] = block[2];
local_block[local_id + 3] = block[3];

barrier(CLK_LOCAL_MEM_FENCE);

 11/03/2009 05:05 AM
n0thing

Posts: 42
Joined: 08/18/2009

The Local Data Share (LDS) supports only owner writes on R7xx-series GPUs. OpenCL local memory is therefore emulated in global memory internally, so you will not get the performance you expect.

See this slide (note the asterisk on LDS): http://img17.imageshack.us/img17/1153/openclarchitecture.jpg
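
A rough illustration of the indexing in the posted snippet, written as plain host-side C rather than kernel code. It assumes that "owner writes" means each work-item may only write its own region of the LDS, and that the work-group has 4 work-items (16 slots / 4 values each); both are assumptions, not something the thread states. The helper names posted_slot, owned_slot and shared_slots are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VALUES_PER_WORK_ITEM 4  /* matches block[4] in the post */

/* Indexing as posted: work-item local_id writes slots local_id .. local_id + 3. */
static size_t posted_slot(size_t local_id, size_t i)
{
    return local_id + i;
}

/* Assumed alternative: each work-item owns a disjoint run of 4 slots. */
static size_t owned_slot(size_t local_id, size_t i)
{
    return local_id * VALUES_PER_WORK_ITEM + i;
}

/* Count the slots that more than one work-item in the group writes to. */
static size_t shared_slots(size_t group_size, size_t (*slot)(size_t, size_t))
{
    enum { MAX_SLOTS = 64 };
    unsigned writes[MAX_SLOTS] = {0};
    size_t shared = 0;

    assert(group_size * VALUES_PER_WORK_ITEM <= MAX_SLOTS);

    for (size_t id = 0; id < group_size; ++id)
        for (size_t i = 0; i < VALUES_PER_WORK_ITEM; ++i)
            writes[slot(id, i)]++;

    for (size_t s = 0; s < MAX_SLOTS; ++s)
        if (writes[s] > 1)
            shared++;

    return shared;
}
```

Calling shared_slots(4, posted_slot) counts the slots that neighbouring work-items both write with the posted indexing, while shared_slots(4, owned_slot) checks the disjoint layout. Whether a disjoint layout actually avoids the global-memory emulation on R7xx hardware is not confirmed anywhere in this thread.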

 

 11/03/2009 11:40 AM
jcpalmer

Posts: 22
Joined: 09/20/2009

Please forgive my temporary inability to check for myself, but these older cards do report CL_GLOBAL for the local memory type, right?

 11/03/2009 01:14 PM
MicahVillmow

Posts: 525
Joined: 02/05/2008

rexiaoyu,
One thing you can try that might help performance is to use async_copy (async_work_group_copy in the OpenCL specification) instead of copying manually. It performs the copy using the whole work-group in parallel.
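
A minimal sketch of that suggestion, assuming the async copy mentioned above refers to the OpenCL built-in async_work_group_copy. That built-in copies between __global and __local memory, so the sketch stages data from a global buffer rather than from __private values as in the original post. The kernel name, the buffer names, and the 1-D layout of 16 float4 elements per work-group are hypothetical.

```c
// OpenCL C kernel sketch (assumed 1-D NDRange, 16 float4 elements per work-group).
__kernel void stage_block(__global const float4 *src,
                          __global float4 *dst)
{
    __local float4 local_block[16];

    // First element of this work-group's tile in the global buffers.
    size_t group_base = get_group_id(0) * 16;

    // Every work-item in the group must reach this call with the same arguments;
    // the copy itself is carried out by the work-group as a whole.
    event_t ev = async_work_group_copy(local_block, src + group_base, 16, 0);

    // Wait until the copy has finished before any work-item reads local_block.
    wait_group_events(1, &ev);

    // Each work-item then processes a strided share of the tile.
    for (size_t i = get_local_id(0); i < 16; i += get_local_size(0))
        dst[group_base + i] = local_block[i] * 2.0f;
}
```

The scaling by 2.0f only stands in for real work. Whether this is faster than a manual copy on R7xx hardware would need to be measured, since local memory is reportedly emulated in global memory there.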

-------------------------
Micah Villmow
Advanced Micro Devices Inc.
--------------------------------
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
