Topic Title: HD 5000 Folding performance
Topic Summary: What do you expect ?
Created On: 09/26/2009 04:01 AM
Status: Read Only
 09/26/2009 04:01 AM
Case Modder

Posts: 1554
Joined: 10/08/2003

Even though the HD 4000 series has theoretically higher FLOPS than the GTX 200 series GPUs in both single and double precision computing, every GeForce series from the 8000 through the 9000 to the 200 has much better performance in CUDA applications, especially Folding...

The HD 4870 has 1200GFlops of single precision power and 240GFlops of double precision power, and this is by far higher than the supposedly more powerful GTX 280, which tops out at 933GFlops in SP and a much lower 78GFlops in DP. Even the flagship GTX 285 doesn't touch the HD's performance in either, not to mention the faster HD 4890 ( the GTX 285 has 1066/89GFlops in SP/DP, while RV790 (HD 4890) has 1360/272GFlops in SP/DP ).

In fact the HD 4890 ( RV790 ) isn't that different from the HD 4870 ( RV770 ), it's basically just 100MHz faster ( 750MHz for RV770 vs. 850MHz for RV790 ), and this is where the extra 160/32GFlops in SP/DP comes from...
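
As a rough check on those figures, the peak numbers can be reproduced from shader count and clock alone. This is only a sketch: it assumes each shader retires one MAD (counted as 2 floating-point operations) per clock and that DP runs at a fifth of the SP rate. The function and constant names are made up for illustration.

```python
# Peak-throughput sketch. Assumptions (not stated in the post): each shader
# retires one MAD (2 floating-point operations) per clock, and double
# precision runs at one fifth of the single-precision rate.
FLOPS_PER_MAD = 2
SP_TO_DP_RATIO = 5


def peak_gflops(shaders, clock_mhz):
    """Return (single, double) precision peak GFlops."""
    sp = shaders * FLOPS_PER_MAD * clock_mhz / 1000
    return sp, sp / SP_TO_DP_RATIO


rv770 = peak_gflops(800, 750)   # HD 4870
rv790 = peak_gflops(800, 850)   # HD 4890
gain = (rv790[0] - rv770[0], rv790[1] - rv770[1])

print("HD 4870 SP/DP:", rv770)
print("HD 4890 SP/DP:", rv790)
print("Gain from +100MHz:", gain)
```

With those assumptions the 100MHz bump alone accounts for the whole gap between the two cards.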

We know that the GPU folding team is working very hard, but there are already three teams:

the main GPU team, which develops the main core of GPU folding
the nV team, which develops the CUDA core for the GPU2 client
the ATi team, which develops the Stream core for the GPU2 client

The nV team is doing much, much better than the ATi team. I don't know if this is because nVIDIA supports them with resources and/or $$, or because they love what they are doing, or because they're very good at what they are doing !!
The ATi team, on the other hand, is known to have fewer resources and maybe less $$$. I know both teams are working for charity and neither ATi nor nVIDIA has to pay them, but paying would let them work harder too...

The ATi team has done very good work, but it's still not enough, and we don't know why. Is it because working with Stream is much harder than working with CUDA ( for example, there has been some info about changes between HD 2000, HD 3000 and HD 4000 )... or because the number of developers in the ATi team is much lower than in the nV team...

or maybe AMD doesn't give them many resources and/or $$$...

For whatever reason, Radeon HDs are much slower than GeForce 8/9/200. For example, a GTX 260 can be twice as fast as an HD 4870 !!

Architecturally, RV790 didn't come with any changes compared to RV770, so the only difference we can expect comes from the 100MHz increase in clock.. and that's it, nothing more.

But RV870 comes with more than a bump in shaders.
At 850MHz it has the same clock as RV790, but with double the number of shaders ( 1600 for RV870 vs. 800 for RV790 ) we can expect at least twice the performance of RV790. So will this make RV870 faster than the GTX 260 ?
We don't know, as I personally haven't seen any Folding numbers for HD 5000, not even from the review sites that used to run such tests for every new GPU...

But is doubling all we get ? That is not enough, as doubling means the HD 5870 will be only a little bit faster than the GTX 260, not to mention the GTX 260 Core 216 version, which is nearly 12.5% faster !
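
To put rough numbers on that, here is a back-of-the-envelope projection with the HD 4870 set to 1.0. It is a sketch under loud assumptions: Folding speed scales linearly with clock and shader count, and a GTX 260 is taken as exactly 2x an HD 4870 (the best case mentioned earlier). The 12.5% for the Core 216 comes from its 216 shaders against 192 on the original GTX 260.

```python
# Relative Folding speed, with HD 4870 = 1.0. Assumptions: performance scales
# linearly with clock and shader count, and a GTX 260 is exactly 2x an
# HD 4870 (the post only says it "can be" twice as fast).
hd4870 = 1.0
gtx260 = 2.0 * hd4870
gtx260_c216 = gtx260 * 216 / 192   # 216 vs. 192 shaders: +12.5%
hd4890 = hd4870 * 850 / 750        # clock bump only
hd5870 = hd4890 * 1600 / 800       # same clock, double the shaders

for name, speed in [("GTX 260", gtx260), ("GTX 260 Core 216", gtx260_c216),
                    ("HD 4890", hd4890), ("HD 5870 (projected)", hd5870)]:
    print(f"{name}: {speed:.2f}x HD 4870")
```

Under these assumptions the projected HD 5870 lands just above the GTX 260 and roughly level with the Core 216, so plain doubling leaves almost no margin.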

According to review sites, RV870 isn't just a doubled RV790 with DX11; the shaders have been improved too...

RV790 has 800 shaders, but they are not all the same...

The 800 shaders are grouped in units called SIMD processors, each with 80 shaders...

These shaders are not all the same either: they're arranged in groups of 5 shaders, 4 basic shaders and one special function shader...

The fifth one is capable of doing SP+DP computing, but the rest are not. The 4 basic shaders are very simple, they can do some MAD computing, but the rest is all in the fifth 'Special Function' shader...

I don't know exactly what the special function shader can do, but this is the reason for the ratio between the SP and DP power of RV790: it's 5:1... and this is why the ratio is the same across all of RV7xx, always 5:1 ( 1200:240 for RV770, for example )...

The story doesn't change in RV870, the ratio is still 5:1. This comes from calculating the GFlops from official AMD slides: 2720/544GFlops in SP/DP...

But also from the slides, the shaders are not the same. They have been improved, and the basic shaders in RV870 can do more than the basic shaders in RV770 can. They are still not able to do DP, as that's still only for the fifth 'Special Function' shader... but they can do more.. not more than basic MAD/MUL or ADD, but it's a little bit different...

Every shader can do 1 32bit FP MAD per clock, so the whole group of five can do 5x 32bit FP MAD per clock... ( 4 from the basic shaders and 1 from the special function shader )
Using 2 shaders together makes it possible to do 1 64bit MUL or ADD per clock, so the 4 basic shaders can do 2 64bit MUL or ADD per clock.

Using all 4 shaders together makes it possible to do 1 64bit MAD per clock.

Or we can use the 4 shaders to do 4 24bit Int MUL or ADD per clock...
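
For what it's worth, those per-clock rates can be turned into chip-wide peak numbers. This is a minimal sketch, assuming RV870's 1600 shaders form 320 groups of five running at 850MHz, and counting a MAD as 2 operations and a MUL or ADD as 1. The names below are hypothetical, not from any AMD tool.

```python
# Per-clock throughput of one 5-shader group in RV870, as listed in the post.
# Assumption: a MAD counts as 2 operations, a MUL or ADD as 1.
UNITS = 1600 // 5   # groups of five shaders in RV870
CLOCK_MHZ = 850

# mode name: (instructions per group per clock, operations per instruction)
MODES = {
    "32bit FP MAD": (5, 2),
    "64bit FP MUL/ADD": (2, 1),
    "64bit FP MAD": (1, 2),
    "24bit Int MUL/ADD": (4, 1),
}


def peak_gops(per_clock, ops):
    """Chip-wide peak in billions of operations per second."""
    return UNITS * per_clock * ops * CLOCK_MHZ / 1000


rates = {name: peak_gops(*mode) for name, mode in MODES.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} Gops/s")
```

The 32bit and 64bit MAD rows reproduce the 2720/544GFlops from the AMD slides, which is where the 5:1 ratio shows up again.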

I don't know the details about RV790, and by the way I'm not a programmer, so excuse my description; this is why posting the slide is better.

I know that the 24bit part wasn't in RV700, it's new here in RV800...

And I don't know which of these calculations, and how much of this power, Folding needs...

The only thing I know is that Folding performance on HD 5000 could easily be the same as on HD 4000 in the beginning, due to the fact that the core may not recognise the extra power... but I suspect it will, so it will be higher and closer to 2x...

But I think we may see some extra juice after the core is optimized for HD 5000.. but when ? I don't know, as AFAIK the ATi team didn't even optimize the core for the HD 4000 series !!
 09/26/2009 10:33 AM
Mad Scientist

Posts: 2149
Joined: 03/31/2004

I think DP FLOPS are of much importance for Folding...

I too have been curious about this. I wish Pande Group would get a version out that supports the new cards. Either that, or that we'd get a Catalyst release supporting Folding on the 5000 series.

