 08/18/2009 12:39 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
1. Remove the L2 cache.
2. Create 4 analysis blocks.
3. Create a common (shared) computation unit.
4. Place a cache between the two blocks to speed up the work.
The analysis units must share data about the processing status.
What do the AMD representatives say??
|
|
|
|
 08/26/2009 05:07 PM
|
HaricotVert B1FF

Posts: 106
Joined: 05/28/2009
|
Originally posted by: OPTERON 2???
What do the AMD representatives say??
I'm pretty sure they'd say that they don't take suggestions on public forums...
Might want to try http://forums.amd.com/devforum/ for a more receptive audience than the end-user troubleshooting/info forums.
Edited: 08/27/2009 at 12:30 AM by HaricotVert
|
|
|
|
 08/29/2009 06:01 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
I had roughly this scheme in mind. This is the future ???. The PHENOM architecture is a dead end.
First step: they increased the number of cores and optimized cache operation.
Second step: Shanghai.
Third step: a redesign of the cores is required, along with optimization of the streams.
I think development should start from a web (like a spider's web). Streams should be able to take the shortest paths.
I hope they are working on it.
|
|
|
|
 10/10/2009 06:31 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
I recently realized that what I drew is already used in Nehalem. Look at NVIDIA: they are ready to change everything within 2 months.
If AMD's engineers have all the cards open on how to do it, why can't you build what is already drawn? Or do everything much faster. The engineers are slow, so the processors are slow!
It is ingenious, and it should be released!
That is what I had in mind! Look at the two schemes above. There is a wheel: in 18 centuries nobody has come up with anything better than the wheel.
The hardest part of this work is creating the shortest possible paths along which the data should move. In mathematics you would get a 2 (a failing grade) for PHENOM. It is irrational and should be scrapped! Your profit will grow from rationality.
What if the processor were built on the basis of a web? Why does a spider build a web for itself??
Your data is like a wheel rolling along a road. It is a circle! Recurrence and repeatability!
Edited: 10/10/2009 at 06:56 AM by OPTERON 2???
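The "web" and "wheel" idea can be made concrete by counting hops between nodes. Below is a minimal Python sketch, assuming two hypothetical interconnects (a plain ring, and the same ring plus a central hub linked to every node, i.e. a wheel); neither is AMD's or Intel's actual on-chip network. Breadth-first search gives the shortest hop count, and the diameter is the worst case between any two nodes.

```python
from collections import deque

def ring(n):
    """Adjacency sets of a hypothetical ring interconnect with n nodes."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def wheel(n):
    """The same ring plus a central hub (node n) linked to every rim node."""
    adj = ring(n)
    hub = n
    adj[hub] = set(range(n))
    for i in range(n):
        adj[i].add(hub)
    return adj

def hops_from(adj, src):
    """Breadth-first search: shortest hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter(adj):
    """Worst-case shortest path, in hops, between any two nodes."""
    return max(max(hops_from(adj, s).values()) for s in adj)

print(diameter(ring(8)), diameter(wheel(8)))  # 4 2
```

With 8 nodes, the hub halves the worst-case path (4 hops on the ring, 2 through the hub), at the cost of one very busy central node.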
|
|
|
|
 05/21/2010 11:45 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
I would suggest modifying the Bulldozer scheme.
I call the system Cyclone.
Data is processed quickly by a central controller.
A processing ring:
There should be two IMUs. Data being processed may be placed in the "DATA CACHE". Processing must pass through certain filtering stages. Handling should begin before the data is actually processed. Streams should be tagged. The central controller must monitor the data in the "DATA CACHE".
There should be a stage in which the data is filtered into 8 streams.
After decoding, the data should be grouped: rather than sending it directly to processing, prepare groups for processing by 8 threads.
The central controller must track whether the groups are filled. It can also disable the cores that are not in use.
The central controller and the "DATA CACHE" should be merged, the way the human brain works.
Edited: 05/21/2010 at 10:26 PM by OPTERON 2???
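The grouping step of Cyclone (decode, collect into groups for 8 threads, dispatch only full groups, switch off unused cores) can be sketched as a small model. This is a hypothetical Python sketch: `CentralController`, `GROUP_SIZE` and the one-group-per-core rule are assumptions for illustration, and the filtering stages and IMUs are not modeled.

```python
from collections import deque

GROUP_SIZE = 8  # assumption: one slot per thread in an 8-thread group

class CentralController:
    """Hypothetical controller: groups decoded ops and reports idle cores."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.pending = deque()  # decoded ops waiting to be grouped

    def decode(self, op):
        self.pending.append(op)

    def dispatch(self):
        """Give each core one full group; partial groups stay pending.
        Cores that receive no group are returned as idle (can be disabled)."""
        active = {}
        for core in range(self.num_cores):
            if len(self.pending) < GROUP_SIZE:
                break
            active[core] = [self.pending.popleft() for _ in range(GROUP_SIZE)]
        idle = [c for c in range(self.num_cores) if c not in active]
        return active, idle

ctrl = CentralController(num_cores=4)
for i in range(20):
    ctrl.decode(f"op{i}")
active, idle = ctrl.dispatch()
print(sorted(active), idle, len(ctrl.pending))  # [0, 1] [2, 3] 4
```

Twenty decoded ops fill two groups; the remaining 4 ops wait for the next round, and cores 2 and 3 are reported idle.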
|
|
|
|
 08/30/2010 02:19 PM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
I propose combining all the pipelines and making them multistage, so that the data does not move back into the cache.
A multifunctional pipeline is needed. We must work on making the cores changeable.
|
|
|
|
 08/30/2010 02:34 PM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
http://www.mkgt.ru/files/mater...atic/138/glava_35.htm
Pipelining and superscalar processing
Instruction-level parallelism: pipeline scheduling and loop unrolling
|
|
|
|
 08/30/2010 02:37 PM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
Basics of pipeline scheduling and loop unrolling
To keep the pipeline fully loaded, instruction-level parallelism must be exploited by finding sequences of unrelated instructions that can be overlapped in the pipeline. To avoid a pipeline stall, a dependent instruction must be separated from its source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction. The compiler's ability to perform such scheduling depends both on the amount of instruction-level parallelism available in the program and on the latencies of the functional units in the pipeline. In this chapter we assume the latencies shown in Figure 5.24 unless other latencies are explicitly stated. We assume that conditional branches have a delay of one clock cycle, so the instruction following a conditional branch cannot be determined until one cycle after the branch. We assume that each functional unit is fully pipelined or replicated (as many times as the pipeline depth), so an operation of any type can be issued in every cycle and there are no structural hazards.
|
|
|
|
 08/30/2010 03:01 PM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
The offset goes from the edge to the center. On a cache miss, the pipeline is immediately filled with other data while the search continues. These are reserved. You do not move the data; you use RISC registers.
The process must not stop; it simply retrieves the data from memory.
Suppose each ring is a processing stage. To increase speed, you can shift the ring, inserting data as it goes. We can also keep filling the pipeline with data without stopping. To increase speed, you can move around the ring. And if an instruction is repeated, it can simply be duplicated.
A shift should be performed.
Edited: 08/30/2010 at 03:14 PM by OPTERON 2???
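"On a miss, the pipeline is filled with other data while the search continues" matches the well-known non-blocking cache ("hit under miss") idea, swapped in here as an illustration rather than the poster's ring design. Below is a minimal Python sketch with an assumed cost model: every op takes 1 cycle except a missing load, and issue stays in order.

```python
def finish_time(program, miss_latency, blocking):
    """Cycle at which the last instruction completes.

    program: list of (dest, srcs, is_miss); each op takes 1 cycle except a
    load that misses, which takes miss_latency cycles (assumed model).
    blocking=True: nothing issues until the pending miss returns.
    blocking=False: independent ops keep issuing; only dependents wait.
    """
    ready = {}  # register -> cycle its value becomes available
    cycle = 0   # earliest cycle the next op may issue
    last = 0
    for dest, srcs, is_miss in program:
        start = max([cycle] + [ready.get(r, 0) for r in srcs])
        done = start + (miss_latency if is_miss else 1)
        ready[dest] = done
        cycle = done if blocking else start + 1
        last = max(last, done)
    return last

prog = [("r1", ("a",), True),    # load that misses
        ("r2", ("b",), False),   # independent work
        ("r3", ("c",), False),   # independent work
        ("r4", ("r1",), False)]  # needs the missing load
print(finish_time(prog, 10, blocking=True), finish_time(prog, 10, blocking=False))  # 13 11
```

The two independent ops are hidden under the 10-cycle miss; only the op that actually needs `r1` waits.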
|
|
|
|
 08/31/2010 09:40 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
PORT? Perhaps Intel has this kind of ring processing, which they call a PORT (ring processing) that takes instructions. That is roughly what I described.
In Intel processors, processing proceeds by shifting between ports. There is a register table to avoid losing processing cycles.
A port, for example, is part of the pipeline, and the processor switches tasks already during scheduling.
Edited: 08/31/2010 at 09:52 AM by OPTERON 2???
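Dispatch to execution ports can be illustrated with a toy model. Below is a minimal Python sketch, assuming a hypothetical port map (real port assignments differ between microarchitectures) and ignoring data dependencies: each port accepts one micro-op per cycle, and each micro-op goes to the least-busy port that can execute it.

```python
from collections import defaultdict

# Hypothetical port map: which ports accept which kind of micro-op.
PORTS = {"alu": [0, 1, 5], "load": [2], "store": [3]}

def dispatch(uops):
    """Greedy dispatch: send each (name, kind) uop to the capable port that
    frees up earliest. Returns {port: [(cycle, name), ...]}."""
    busy_until = defaultdict(int)  # port -> next free cycle
    schedule = defaultdict(list)
    for name, kind in uops:
        port = min(PORTS[kind], key=lambda p: (busy_until[p], p))
        cycle = busy_until[port]
        schedule[port].append((cycle, name))
        busy_until[port] = cycle + 1
    return dict(schedule)

uops = [("add1", "alu"), ("add2", "alu"), ("ld1", "load"),
        ("add3", "alu"), ("add4", "alu"), ("ld2", "load")]
schedule = dispatch(uops)
print(schedule[0])  # [(0, 'add1'), (1, 'add4')]
```

Three ALU ops and a load start in cycle 0 on different ports; the fourth ALU op and the second load have to wait one cycle.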
|
|
|
|
 12/25/2010 01:38 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
4 operations at once
http://rubik-effects.com/view_post.php?id=60
http://www.rubiks.com/
The point of the Rubik's Cube is that you perform 4 operations simultaneously. If you have a scheme of how it works, you simply create a model of how the processor operates.
When we make one move, the 3 other sides must also act. At that moment, they must move in the direction of the additional operations. While we are handling one part, the three other parts should already be marked.
The essence of a Rubik's Cube 4-core processor is that any operation can be shifted to the correct computation sector. Intel uses roughly this model for its ports.
We begin computation without waiting and accumulate completed tasks; at the same time, the processor must analyze which tasks have already been completed. Based on the completed operations, we form a new instruction queue. For example, some instructions can be transferred to other modules: executed instructions are flushed to "L3", regrouped, and sent to a free core. The number of cycles each operation takes must be taken into account, and a task may be spread across multiple cores.
Edited: 12/25/2010 at 02:09 AM by OPTERON 2???
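"Send regrouped work to a free core, taking the cycle count of each operation into account" is essentially list scheduling. Below is a minimal Python sketch using the standard longest-processing-time-first (LPT) heuristic, swapped in as an illustration; the task names and cycle counts are made up.

```python
import heapq

def assign_tasks(tasks, num_cores):
    """Longest-processing-time-first list scheduling.

    tasks: {name: cycles}. Each task, longest first, goes to the core that
    becomes free earliest. Returns (per-core task lists, makespan in cycles).
    """
    heap = [(0, core) for core in range(num_cores)]  # (busy cycles, core id)
    heapq.heapify(heap)
    plan = {core: [] for core in range(num_cores)}
    for name, cycles in sorted(tasks.items(), key=lambda t: -t[1]):
        load, core = heapq.heappop(heap)
        plan[core].append(name)
        heapq.heappush(heap, (load + cycles, core))
    return plan, max(load for load, _ in heap)

plan, makespan = assign_tasks({"a": 7, "b": 5, "c": 4, "d": 3, "e": 2, "f": 1}, 4)
print(plan, makespan)  # {0: ['a'], 1: ['b'], 2: ['c', 'f'], 3: ['d', 'e']} 7
```

The short tasks fill the gaps on the cores that finished early, so the whole batch ends when the longest task does (7 cycles).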
|
|
|
|
 04/24/2011 04:28 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
!!!
Here is what I assumed: ????? uses direct memory access, bypassing the caches ("Register alias table"). Let us assume decoding proceeds without stopping. Data that does not reach processing in time is spilled to RAM, and then the ready data is selectively sent to processing. This way, the processing never stops. RAM operation is independent.
In the "retired register file" the data does not accumulate; it leaves. This data can be used by other cores.
1. Tag the data in the table.
2. Assemble the data.
3. Eject it from a core.
4. Give other cores the task of searching for the tags.
5. Collect the tags in 1 core for assembly.
Fast memory access is required for this.
Edited: 04/24/2011 at 04:46 AM by OPTERON 2???
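The "Register alias table" mentioned above is the structure used for register renaming: each write to an architectural register gets a fresh physical register (a tag), and the old one is freed when the instruction retires. Below is a minimal textbook-style Python sketch; the register names and sizes are hypothetical, and this is not Intel's implementation.

```python
class Renamer:
    """Minimal register alias table (RAT) with a free list of physical registers."""

    def __init__(self, arch_regs, num_phys):
        # Initially each architectural register maps to its own physical register.
        self.rat = {r: i for i, r in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_phys))

    def rename(self, dest, srcs):
        """Read sources through the RAT, then map dest to a fresh physical register.
        Returns (new physical dest, physical sources, previous mapping of dest)."""
        phys_srcs = [self.rat[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical registers: rename must stall")
        new = self.free.pop(0)
        old = self.rat[dest]
        self.rat[dest] = new
        return new, phys_srcs, old

    def retire(self, old):
        """When the renaming instruction retires, the previous mapping is freed."""
        self.free.append(old)

r = Renamer(["r1", "r2"], num_phys=6)
print(r.rename("r1", ["r1", "r2"]))  # (2, [0, 1], 0)
print(r.rename("r1", ["r2"]))        # (3, [1], 2)
```

Both instructions write `r1`, but they get different physical registers (2 and 3), so the second one does not have to wait for the first.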
|
|
|
|
 10/16/2011 04:02 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
Scheme of how 3 cores work together
This interaction scheme can be applied to 6 cores at once.
Edited: 10/16/2011 at 04:14 AM by OPTERON 2???
|
|
|
|
 10/16/2011 04:48 AM
|
unclefester1 Overclocker

Posts: 629
Joined: 11/29/2008
|
The communication interface seems simple enough, 1-2-3, A-B-C (K-D-I). Now all you have to do is change the memory scheduling (Row Address Strobe) to speed up the interaction of the cycles during column/row access.
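One well-known way to change memory scheduling around row and column access is first-ready, first-come-first-served (FR-FCFS): requests that hit the already open row (column access only) are served before older requests that need a new row activation. Below is a minimal Python sketch of that policy, assuming a single bank and made-up requests; real controllers also handle timing constraints and fairness.

```python
def fr_fcfs(requests, open_row):
    """Order DRAM requests first-ready, first-come-first-served (FR-FCFS).

    requests: list of (arrival, bank, row) in arrival order.
    open_row: {bank: currently open row}. The oldest row-buffer hit is served
    first; if there is none, the oldest request is served.
    """
    pending = list(requests)
    rows = dict(open_row)
    order = []
    while pending:
        hits = [r for r in pending if rows.get(r[1]) == r[2]]
        nxt = (hits or pending)[0]
        pending.remove(nxt)
        rows[nxt[1]] = nxt[2]  # serving it leaves that row open
        order.append(nxt)
    return order

def row_misses(order, open_row):
    """Count requests that need a new row activation (RAS) before the column access."""
    rows = dict(open_row)
    misses = 0
    for _, bank, row in order:
        if rows.get(bank) != row:
            misses += 1
            rows[bank] = row
    return misses

reqs = [(0, 0, 7), (1, 0, 5), (2, 0, 5), (3, 0, 7)]
order = fr_fcfs(reqs, {0: 5})
print([r[0] for r in order], row_misses(reqs, {0: 5}), row_misses(order, {0: 5}))  # [1, 2, 0, 3] 3 1
```

For this request mix, serving in arrival order needs 3 row activations, while FR-FCFS needs only 1.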
-------------------------
Antec 1650B PC Power & Cooling Silencer 910w ASUS M3A79-T DeLuxe 1090T x6 @Boiling in the La-Boratory Corsair H70 OCZ Reaper 8500 2x2 *Pending ASUS 5970 (under OC investigation) EVGA 260 55nm (holding pattern) SeaGate 320x2 16GB Sata Creative X-Fi Elite Pro Logitech Z-5500 XP Pro SP2
|
|
|
|
 11/27/2011 07:10 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
|
|
|
|
 11/30/2011 02:58 PM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
The main problem is the weak Bulldozer core. Here is how I see a solution: the cores should work on the principle of transformers. For example, a task arrives at a module; the module, analyzing the load on all modules as well as data from the operating system, must decide which mode to switch into. For single-threaded applications we get one big core; for multithreaded applications we get many small cores. Intel has worked on technology for combining cores since 2003. Bulldozer is, in fact, an Intel Core Quad. Intel kept working, and all of its cores share common RESOURCES, as in the scheme drawn above. The first step in increasing Bulldozer's performance will be fixing the module's operating mode.
When the modules are loaded at 4 × 100%, switch them to the 8 × 100% mode. Alternatively, the application must clearly understand that it can use 8 cores.
First, bear in mind that applications are not aware that we have 8 cores, so the application should first see modules; if the application is able to load the modules to 100%, we can then switch it to 8 cores. We just need to work on switching from 4 cores to 8 and from 8 back to 4: 4 × 200% = 8 × 100%. Ideally, if you do it the way Intel does, you get 1 × 800% in 1st gear, 4 × 200% in 2nd gear, and 8 × 100% in 3rd gear.
Edited: 11/30/2011 at 11:02 PM by OPTERON 2???
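The "gears" above can be written down as a small table with a selection rule. Below is a minimal Python sketch: the gear table uses the poster's own numbers, while choosing the gear purely from the number of runnable threads is an assumption (the post also mentions module load and OS data, which are not modeled).

```python
# Gear -> (number of logical cores, share of one module's capacity per core, in %).
# Numbers from the post; the selection rule below is a hypothetical simplification.
GEARS = {1: (1, 800), 2: (4, 200), 3: (8, 100)}

def pick_gear(runnable_threads):
    """Choose the lowest gear that still gives every runnable thread its own core."""
    for gear in sorted(GEARS):
        cores, _ = GEARS[gear]
        if runnable_threads <= cores:
            return gear
    return max(GEARS)  # more threads than cores: stay in the widest gear

# Total capacity is identical in every gear: 1 * 800 = 4 * 200 = 8 * 100.
assert all(cores * share == 800 for cores, share in GEARS.values())

print([pick_gear(n) for n in (1, 3, 5, 12)])  # [1, 2, 3, 3]
```

The invariant check captures the post's point: switching gears does not add capacity, it only redistributes the same resources between fewer wide cores and more narrow ones.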
|
|
|
|
 12/14/2011 05:28 AM
|
OPTERON 2??? Lurker

Posts: 17
Joined: 07/13/2009
|
When developing the next generation, move away from separate cores and work on common RESOURCES. Above all, an application that sees 1 core must be able to load all RESOURCES completely, an application with two threads likewise, and so on. The module's shared output stage must analyze the results obtained and, where possible, try to use all of the ALUs and FPUs by switching between different operating modes. If AMD continues to work on separate cores, it will achieve nothing. Start with two cores, then three, four, and so on. Intel has long been working on a system with shared resources. Tests show that nothing is gained from the number of cores alone. Some parts can stay as they are right now, so that the frequency remains at this level. But all parts of the processor need to communicate with each other. You may spend more time on development, but you will overtake Intel. You need to start small: a 4-core processor.
|
|
|
|
 12/14/2011 08:37 AM
|
go_for Alpha Geek

Posts: 3217
Joined: 01/21/2006
|
Opteron 2, have you signed up on the AMD dev forums yet?
http://forums.amd.com/devforum/
-------------------------
|
|
|