Topic Title: Why are GPU datasheets, user manuals and register models so sparse?
Created On: 12/29/2012 03:22 PM
 12/29/2012 03:22 PM
Brane212
Peon

Posts: 5
Joined: 12/29/2012

While one can get almost any information on most of AMD's x86_64 platforms (and even there, many parts seem to be sorely missing, above all for APUs), the available literature on GPUs reads about as informatively as a Chinese phone book in Bulgaria.

Abstraction packages like the OpenCL SDK are nice, but the thing with abstractions is that they help only when the "app programmer" is on the same wavelength as the abstraction's creator AND when that creator has correctly predicted all user needs and catered for them.

But a GPU is not a CPU, and GPU computing is not nearly as mature as CPU computing. Also, while it is nice to be able to write a program without worrying about every physical constraint and timing delay, it is often important to be able to get "Close to the Metal".

Shader programs are not multi-megabyte monsters. Often they are on the scale of what we used to write routinely in assembly for the 8051, PIC and many other microcontrollers. Such code could be maintained and polished by hand or through some other abstraction.

It would be very nice to know MUCH more about the internals. I don't see why this is such a closely guarded secret. Keeping it undisclosed only hurts honest developers, since the competition already has, and uses, decompilers, signal analyzers and whatnot to get the information it needs. The only ones left without it are those who can't afford to screw around with hacking.

Is there a way for me to get really honest, detailed and well-explained information on AMD GPUs? Do I need to sell my soul to the devil (NDA)? Is my soul even worth that much? I am primarily interested in APU internals (especially the latest ones and those coming soon, Steamroller + GCN), but discrete GPUs would be fine too, and even IGPs (such as the one in the RS780, etc.).

-------------------------
On a journey of life I chose the psycho path...
 12/29/2012 03:40 PM
Brane212
Peon

Posts: 5
Joined: 12/29/2012

Here is one concrete example.

I am digging through Coreboot and Linux kernel sources, trying to adapt them to my needs.

I would like to have a machine with an open-source BIOS that is much more than a BIOS and that is integrated with the kernel booted after it.

I plan to adapt this setup to many roles: workstation, netbook, server.

In order to tweak and optimize things, I need to know the details.

Take the server role, for example. Many of us have converted cheap off-the-shelf gear into a small server, be it for mail, FTP, HTTP, NFS or Samba for Windows networking, usually with a sizeable RAID.

But such gear has one weak link too many: the channel between the northbridge and the southbridge has relatively narrow bandwidth. Also, the single HT link between the CPU and the NB can frequently be a bottleneck, too.

If you use the onboard SATA ports for RAID-6, for example, the data has to travel through that channel once, and the recomputed parity has to be written back through it as well.
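
Rough traffic arithmetic for that case, as a sketch with symbols introduced here for illustration: assume a stripe of $n$ data blocks of $b$ bytes each, parity computed by the CPU, and every drive hanging off the southbridge SATA ports, so all of it crosses the NB-SB link:

```latex
\begin{aligned}
\text{full-stripe write:}\quad & T_{\text{full}} = (n + 2)\,b
  \quad\Rightarrow\quad \frac{T_{\text{full}}}{n\,b} = \frac{n + 2}{n} \\
\text{single-block read-modify-write:}\quad & T_{\text{rmw}}
  = \underbrace{3b}_{\text{read } D,\,P,\,Q}
  + \underbrace{3b}_{\text{write } D',\,P',\,Q'} = 6b
\end{aligned}
```

For example, with $n = 6$ (an 8-drive RAID-6), a full-stripe write pushes $8/6 \approx 1.33$ bytes across the link per byte of user data, while a small read-modify-write update pushes 6 bytes per byte written.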

With the new APUs, however, the NB is already part of the chip, so that bottleneck is gone. Furthermore, the chip has AES and a GPU onboard. AES offers a 10-20x speedup for encrypting and decrypting files on disk or a whole disk, and the GPU could be used very effectively for on-chip RAID parity calculations.
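
For reference, here is a minimal sketch of the RAID-6 parity arithmetic such a GPU offload would have to implement. It assumes the scheme used by the Linux md RAID-6 driver: P is a plain XOR of the data blocks, and Q is a Reed-Solomon syndrome over GF(2^8) with reduction polynomial 0x11D and generator 2. The function names are hypothetical, and this version runs on the CPU in Python purely to show the per-byte math.

```python
# Minimal sketch of RAID-6 P/Q syndrome generation, assuming the field used by
# the Linux md RAID-6 code: GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11D), g = 2.
# Function names are hypothetical; this only illustrates the per-byte math.

GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul2(a: int) -> int:
    """Multiply one byte by the generator g = 2 in GF(2^8)."""
    a <<= 1
    if a & 0x100:          # overflowed past 8 bits: reduce by the polynomial
        a ^= GF_POLY
    return a


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiply by shift-and-add (used for cross-checking)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result


def raid6_gen_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """Return (P, Q) for equally sized data blocks D_0..D_{n-1}.

    P = D_0 ^ D_1 ^ ... ^ D_{n-1}
    Q = g^0*D_0 ^ g^1*D_1 ^ ... ^ g^(n-1)*D_{n-1}, evaluated with Horner's
    rule from the last block down so only multiply-by-2 is needed.
    """
    length = len(data[0])
    assert all(len(block) == length for block in data), "blocks must match"
    p = bytearray(length)
    q = bytearray(length)
    for i in range(length):          # every byte column is independent
        wp = wq = 0
        for block in reversed(data):
            wp ^= block[i]
            wq = gf_mul2(wq) ^ block[i]
        p[i] = wp
        q[i] = wq
    return bytes(p), bytes(q)


def rebuild_from_p(p: bytes, survivors: list[bytes]) -> bytes:
    """Recover a single lost data block: XOR of P with all surviving blocks."""
    out = bytearray(p)
    for block in survivors:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


if __name__ == "__main__":
    stripe = [bytes([0x01, 0x80]), bytes([0x02, 0x40]), bytes([0x03, 0x20])]
    p, q = raid6_gen_pq(stripe)
    print(p.hex(), q.hex())  # prints "00e0 0980"
    print(rebuild_from_p(p, [stripe[0], stripe[2]]) == stripe[1])  # prints "True"
```

Every byte column is computed independently of the others, which is why this workload maps naturally onto many parallel GPU lanes.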

So a cheap existing Trinity FM2 board (and even more so an upcoming Richland one) with a cheap A10 APU, plus an extra HBA card (like the Datacenter DC7280) and perhaps an extra 10GbE card, could kick arse as one small but smokin' fast server.

It wouldn't be too big a job to rework the kernel code to offer more parity than the existing RAID6 does. It would be nice to be able to have a RAID6+2 array of, say, 26 drives: 20 data drives, 4 parity drives and 2 cold spares.
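
The capacity side of that layout, as a worked sketch with symbols introduced here for illustration ($k$ data drives, $m$ parity drives, $s$ spares), assuming the 4-parity code is MDS, i.e. any 20 of the 24 active drives are enough to rebuild the data:

```latex
\begin{aligned}
N &= k + m + s = 20 + 4 + 2 = 26 \\
\text{usable fraction of the active array} &= \frac{k}{k + m} = \frac{20}{24} \approx 83.3\,\% \\
\text{usable fraction of all drives} &= \frac{k}{N} = \frac{20}{26} \approx 76.9\,\% \\
\text{two 12-drive RAID-6 sets} &: 2 \times (10 + 2) = 24 \text{ active drives, also } 20 \text{ data drives}
\end{aligned}
```

So under that assumption the proposed array gives the same usable capacity as two ordinary 12-drive RAID-6 sets, but survives any 4 concurrent drive failures instead of at most 2 per set, with 2 cold spares on hand for rebuilds.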

Even if crypto-protected, such a RAID would still be FAST.

But without extra info, that would be difficult to do.

OpenCL might be fine, but when was the last time anyone demonstrated such an application running from within the Linux kernel?

Edited: 12/29/2012 at 06:10 PM by Brane212