AMD Processors
Topic Title: Are Athlon MPs' microcode updateable?
Topic Summary:
Created On: 11/27/2003 01:15 AM
Status: Read Only
 11/27/2003 01:15 AM
Bitey
Elite

Posts: 1492
Joined: 10/07/2003

Is an Athlon's microcode updateable?

E.g. will newer BIOSes contain microcode fixes to update the CPU's internal microcode, or is it completely hardwired?
 12/05/2003 02:28 AM
deuxway
Member

Posts: 193
Joined: 10/08/2003

I'm out of date on this, so I will take a SWAG (Smart Wild A$$ Guess).

I would recommend John L. Hennessy and David A. Patterson's books, like Computer Architecture: A Quantitative Approach (http://www.amazon.com/exec/obi...5?v=glance&s=books&st=), as well as AMD's manuals. I suggest you hang out in Borders or B&N if you can't justify the $90 price. I have the first edition. It's brilliant.

I don't believe there is any 'soft' microcode in any Athlon.

I believe that RISC (Reduced Instruction Set Computer) demonstrated the performance benefits of hard-wired logic over microcode. I think of it this way: if we could make a microcode engine whose steps run that much faster than the CPU's instruction cycle, why not use that same technology to build the CPU itself?

Further, modern CAD tools allow manufacturers to design, simulate, test and hence build more complex logic correctly, rather than relying on microcode to fix bugs. That weakens another big reason for microcode.

Finally, ROM (read-only memory) can be made faster than RAM (read/write memory), so a 'soft' microcode Athlon would be significantly slower than a ROM'd Athlon. [I don't mean ROM chips, before anyone starts; I mean non-volatile binary on the CPU.]

So I think *most* modern CPUs use a lot of hardwired logic rather than microcode.

I'd be happy to be shown that the Athlon uses microcode, maybe because it is denser or more flexible (given that horrible x86 instruction set, you'd have to do something :-), but I'd be surprised if it were updateable in *any* significant way.

There, that should attract attention
 12/05/2003 09:34 AM
pcy
Senior Member

Posts: 2029
Joined: 10/18/2003

QUOTE (deuxway @ Dec 4 2003, 11:28 PM)...

I believe that RISC (Reduced Instruction Set Computer) demonstrated the performance benefits of hard-wired logic over microcode. I think of it this way: if we could make a microcode engine whose steps run that much faster than the CPU's instruction cycle, why not use that same technology to build the CPU itself?

...

There, that should attract attention
Success... Attention well and truly attracted.

I don't agree with the RISC bit.

Microcode is simply a means of mapping the "external" instruction set offered by the CPU onto the fundamental atomic processes that the CPU can perform.

Microcode has been both hard (wired in) and soft (in EPROM or...) over the years. I doubt there is a performance tradeoff, but I do think there is a cost/flexibility tradeoff.

You have to have microcode of some sort because different instructions use (different combinations of) the same underlying processes. Pipeline architecture depends on this fact. If you didn't exploit this, and tried to produce explicit hardware for each instruction, the die size would explode, causing increased cost and reduced performance.
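
To make that concrete, here is a toy sketch in C (every name and micro-op here is invented for illustration; no real Athlon is organised this simply). Two different instructions are just two different sequences drawn from the same small pool of underlying processes, and the "microcode ROM" is the table that maps opcodes to sequences:

CODE
/* Toy illustration only - invented micro-ops; no real CPU is this simple. */
typedef enum { READ_REG, ALU_ADD, ALU_SUB, WRITE_REG, DONE } MicroOp;

void do_micro_op(MicroOp op);   /* one hardwired atomic process per micro-op */

/* Two instructions built from (different combinations of) the same processes: */
static const MicroOp ucode_add[] = { READ_REG, READ_REG, ALU_ADD, WRITE_REG, DONE };
static const MicroOp ucode_sub[] = { READ_REG, READ_REG, ALU_SUB, WRITE_REG, DONE };

/* The "microcode ROM": maps each external opcode to its micro-op sequence. */
static const MicroOp *ucode_rom[] = { ucode_add, ucode_sub };

void execute(int opcode) {
    for (const MicroOp *u = ucode_rom[opcode]; *u != DONE; u++)
        do_micro_op(*u);        /* the shared underlying hardware does the work */
}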

Microcode is code resident inside the CPU. It calls the underlying processes to implement each hardware instruction. I use the word "calls" advisedly. The reason this is an efficient structure for both development and execution is exactly the same as the reason we use subroutines in programming, and modular structures in general, for all sorts of systems (and not just IT systems...)

Somewhere, inside the chip, every CPU is a RISC machine.



Peter


 12/05/2003 07:27 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Have you read Hennessy and Patterson? My impression is that RISC showed two things:
1. using chip 'real estate' for cache is better than trying to improve performance by adding more complex instructions
2. modern CAD tools let you produce faster chips by using more random logic, and more complex random logic, rather than microcoding

I am talking about the '96 edition - if the world has moved on, I am truly sorry for misleading you.

I stated in my post that the Athlon may use microcode, maybe because it is denser or more flexible. I understand that.

QUOTE Microcode is simply a means of mapping the "external" instruction set offered by the CPU onto the fundamental atomic processes that the CPU can perform.
Yup, agree. So is random logic.
QUOTE Microcode has been both hard (wired in) and soft (in EPROM or...) over the years.
Yup, agree. I almost bought a machine in the '80s with microcode in RAM; each process could have a unique mapping (instruction set) onto CPU functions. We didn't get it because it was too slow compared to a Sequent (a bunch of i386s) for a similar price. Do you remember what AMD made before the 29000?

Anyway, I don't understand the point you're making.

QUOTE I doubt there is a performance tradeoff
Are you saying:
1. read-only memory is the same cost (I read this as die size of a CPU and similar production yield) as updatable memory (I disagree, without evidence), and
2. read-only memory is the same speed as updatable memory (I disagree, without evidence),
3. or something else?
QUOTE I do think there is a cost/flexibility tradeoff
Absolutely agree. I believe making a CPU with updatable microcode would make the chip slower and bigger at any particular point in technology development, and hence cost more too.

So there would have to be a significant payback on the flexibility side of the equation. I accept there could be a payback, but I'm not seeing it in any of AMD's or Intel's marketing (or Sun's, IBM's, Texas Instruments' or Fujitsu's either), so I am reasoning that they aren't offering it.

I think it would be a niche market, where the possibility of a flawed CPU is an acceptable marketing message and the upside of a soft fix would win. Hard-to-service or hostile environments readily spring to mind, like satellites or submarine cable repeaters.

QUOTE You have to have microcode of some sort because different instructions use (different combinations of) the same underlying processes.
Random logic works too, and can be faster, but I think we are close enough on this one.

QUOTE If you didn't exploit this, and tried to produce explicit hardware for each instruction, the die size would explode, causing increased cost and reduced performance.
I think you are saying, instruction sequencing is very regular, and lends itself to a compact state machine in microcode rather than random logic.

I can agree with that, providing you allow for a hybrid of both approaches. I think using microcode for every level of sequencing is too slow. I think this partly because I read Intel whitepapers from the early Pentium days explaining how they adopted a hybrid approach to reap the benefits of RISC's high-speed random-logic instruction decode and sequencing. If you have a problem with that assertion I will try to find the evidence.

You are, reasonably, identifying areas where microcode would be expected to encode logic more densely than random logic (making it cheaper) without presenting a bottleneck. I apologise for my generalisation. I think I can be forgiven because your question was about a microcode-updateable Athlon, and I thought you were asking about significant change to the CPU's function; my fault.

QUOTE Microcode is code resident inside the CPU. It calls the underlying processes to implement each hardware instruction. I use the word "calls" advisedly.
Yeah, this could be misleading to people who aren't familiar with microcode; be careful not to extrapolate too far.

QUOTE The reason this is an efficient structure for both development and execution is exactly the same as the reason we use subroutines in programming, and modular structures in general, for all sorts of systems (and not just IT systems...)
I think this has been extrapolated too far.

These "modularity" or "cohesion" issues are about managing complexity and change. They are not essential to the execution of the system. These are inherent issues in software, where the result of engineering is embedded directly in manufacturing's output, and so there is every value in retaining them, and no value in removing them. But IMHO, they are more easily separable and are mostly development time issues for most hardware, not 'use' time.

I think once a CPU design is implemented there is little to be gained by retaining these explicit structures; they may show up as test points, but could be significantly transformed.

Think of this like compiling. High-level code, files and classes are wonderful for helping us manage the complexity of our systems, but we don't need all of those structures to be preserved in the running code, especially when performance is our goal. We need our classes, methods, code and variables to accurately reflect the state of the executing system while debugging too, so we can trace behaviour, but we aren't surprised when weird things happen if we try to 'debug' an optimised binary. We can even take special steps to check that a few variables are behaving correctly (leaving asserts in production code), which are, to my mind, the cognates of test points in hardware. BUT we happily take advantage of inline code expansion, loop unrolling, strength reduction, code reordering etc. for speed, or encoding, compression, packing and interpreters for size.
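
A trivial made-up example of what I mean (any optimising compiler will do this sort of thing unasked; the function names are mine):

CODE
/* What we write - clear and tidy: */
int sum_doubled(const int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i] * 2;                 /* a multiply, as written */
    return total;
}

/* Roughly what an optimiser may emit - strength reduction (shift instead
   of multiply) plus 2-way loop unrolling; the tidy structure is gone: */
int sum_doubled_fast(const int *a, int n) {
    int total = 0, i = 0;
    for (; i + 1 < n; i += 2)              /* two elements per iteration */
        total += (a[i] << 1) + (a[i + 1] << 1);
    if (i < n)                             /* odd leftover element */
        total += a[i] << 1;
    return total;
}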

So, just because a high-level structure happens to be useful for managing complexity and change, it doesn't need to show up in the production hardware.

Further, I can't see why a manufacturer would put those extra-cost structures in place on every chip and then not bother to advertise the benefits. Please explain this.

So, is there some microcode in an Athlon? Yes, but I now don't understand how much chip function you want to fix by changing microcode, so that may not be enough anyway.

Is Athlon soft? I don't feel confident to estimate how much or how little function you would want to change, so I won't comment further.

I accept it's a key architectural decision about how to deal with a bug in an installed product, because you need to provide some infrastructure to fix it. I am still unconvinced that "soft microcode" flies.

If you could explain what kind of bugs you think will be fixable, and what level of incidence they have, I may be able to respond intelligently.
 12/05/2003 09:02 PM
pcy
Senior Member

Posts: 2029
Joined: 10/18/2003

Hi deuxway,

I didn't post the original Q, so some of your responses are off the mark... and in other places you subtly misunderstood what I was trying to say. Not surprising, as:
1. I'm not a hardware engineer and may have used some terminology a bit loosely
2. it would have made my post about 10 times longer to go through and identify and resolve all possible ambiguities in such a complex area.

And NO, I have not read Hennessy and Patterson.

I picked up on that one paragraph because it was the only one in your original post that I didn't agree with.

The heart of my point, I think, is that the distinction between hard-wired logic and microcode is broadly an illusion. In which case I find it unlikely that "RISC demonstrated the performance benefits of hard-wired logic over microcode."

But I accept your general point about very complex instructions. However, I don't think RISC established this - to my memory that idea was understood long before the term RISC existed.

You ask some questions/raise some interesting points...


When I said "I doubt there is a performance tradeoff", I was suggesting that even if you used updatable memory you could probably design the CPU to eliminate the performance hit, though at some cost - leading, of course, to a cost/flexibility tradeoff.

I agree that at the sort of costs/volumes we see in today's PC market I would expect cost to be the dominant concern.


I don't know what you mean by "random logic". I can guess from the context... but could you possibly explain this term before I run off at a tangent?


If I understand correctly what you mean when you say I am suggesting that "instruction sequencing is very regular", then yes; and it seems I believe this to be a more powerful factor than you do.

I am also suggesting that because there are "areas where microcode would be expected to encode logic more densely than random logic", the end result can be better performance as well as reduced cost. There is a performance penalty to pay as the chip gets bigger, and this can (in principle) outweigh the performance hit of using microcode rather than expanding the instructions out into random logic.

It was this idea I was referring to when I talked about subroutines and modularity. Clearly they are vital tools for managing complexity and change, but I think there are a significant number of occasions when the more compact representation yields performance benefits.


Peter
 12/08/2003 12:09 AM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Pcy, sorry, you didn't post the original question, mea culpa (oh, the benefits of a classical education )

As I said before, my recollection is mid-'90s, and I'm happy to believe the world has changed (I must get the new Hennessy and Patterson, but $90 is 242-244)

I agree that instruction decode and sequencing is recognising a binary pattern and figuring out what actions to perform.

Clearly you understand that the sequencing of an instruction needs a whole pile of events to happen in a very specific order. How that is implemented is the nub of the 'can you update an Athlon' part of the question.

What I mean by random logic is 'proper' gates, like CMOS transistors: the sort of stuff we would expect an arithmetic logic unit (ALU) to be stuffed full of.

QUOTE the distinction between hard-wired logic and microcode is broadly an illusion
I have to disagree here. Let me start with an analogy: the difference between data and instructions in a CPU's memory is mostly an illusion (it's all binary), but the way the CPU interprets and treats them is different, and I think the same analogy holds for random logic and microcode.

You could write down the boolean expression that maps from the instruction pattern to those states, and build a chunk of logic to represent it (a bit like an expression in code), or you could "pre-compute" it and store it as a row of microcode (like using table lookup).

A trivial example, C's isupper():

Random logic:
bool isupper(char c) { return 'A' <= c && c <= 'Z'; } // 3 logic operations: <=, &&, <=

Pre-computed (microcode):
bool _isupper[256] = { 0, /* ... */ 0, 1, 1, 1, /* ... */ 1, 1, 0 /* ... */ }; // 1s in the positions of ASCII 'A'-'Z'
bool isupper(char c) { return _isupper[(unsigned char)c]; } // one table lookup

In small examples the code is much more compact; in large examples the microcode can become 'sparse', so it may be used to encode many states compactly and then trigger random logic to decode them down to control detailed operation. So programmable microcode may not have a valid state encoded to drive the random-logic decode needed to fix a given problem.

Then you get sequencing by stringing states together and keeping track of your current state. (If you've ever written text parsers, you'll recognise the difference: one approach uses the code itself to represent state, with conditional logic to control what happens next [my random logic]; the other uses a table and a counter [microcode].)

A benefit of random logic is that you can optimise paths. If you have to come to a leaf in a decision tree with 100 outcomes (say, decoding an instruction), then, using the code-and-expression analogy, you could make every code path the same length, or you could optimise the frequent cases, or you could optimise path length to complement instruction execution time.
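
In code terms the idea looks something like this (a made-up decoder with invented opcode ranges, purely to show the shape): the frequent cases pay for one comparison, and only the rare stuff takes the long path.

CODE
/* Invented opcode ranges, purely illustrative - not any real instruction set. */
void decode_mov(unsigned op);    /* hypothetical decoders, assumed elsewhere */
void decode_alu(unsigned op);
void decode_rare(unsigned op);

void decode(unsigned op) {
    if ((op & 0xC0) == 0x00)         /* say, register moves: very frequent */
        decode_mov(op);              /* short, fast path                   */
    else if ((op & 0xC0) == 0x40)    /* ALU ops: frequent                  */
        decode_alu(op);
    else
        decode_rare(op);             /* traps, string ops...: long path    */
}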

I believe simple microcode machines were beaten by RISC in the 486 days. I think (but don't know) that the simple-minded microcode sequencer was hard to optimise for the fast or frequent instructions.

Further, my interpretation of RISC was: use simple pipelines and simple instruction sequencing, so that you don't need large or clever microcode sequencing. MIPS epitomised this; the name stood for Microprocessor without Interlocked Pipeline Stages. It was up to the compiler to sequence the instructions so that the pipeline didn't stall or screw up.

I did misrepresent my view: I don't think microcode went away completely, I just have this mental picture that it isn't the top-to-bottom decode and sequencing architecture anymore. Instead, I think there are subsystems which use logic gates to decode and sequence.

I think we agree: this is absolutely a space (and hence cost) versus performance tradeoff.

My other point is that one can imagine a fairly simple scheme where microcode can be updated (looking at my code example above, updating microcode is just filling in the _isupper array with different values). It is much harder to change random logic: those are physical gates with physical interconnect. So it matters a lot how much of each the Athlon uses.
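
To labour the point with the earlier isupper() sketch (pure hypothesis on my part - this is not a description of any AMD mechanism): "updating" the table version is just a copy, while the random-logic version is frozen at fabrication.

CODE
#include <stdbool.h>
#include <string.h>

extern bool _isupper[256];                   /* the "microcode" table above */

/* Hypothetical update: overwrite the table, and behaviour changes. */
void apply_microcode_patch(const bool patch[256]) {
    memcpy(_isupper, patch, 256 * sizeof(bool));
}

/* No equivalent exists for the random-logic version:
   bool isupper(char c) { return 'A' <= c && c <= 'Z'; }
   - those comparisons are physical gates and wires. */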

So, in answer to the original question:

If an Athlon used all microcode, it could theoretically be usefully updated. It might even be possible to kludge around some nasty errors with such a feature. I think people would prefer to send the chip back.

Anyway, I think it uses a hairy mix of physical gates and microcode-like structures, so I don't think the benefit of updates would be great. Maybe this is where you see me undervaluing microcode. My rationale is that this structure is 'sparse': only a small subset of the possible set of logical expressions needs to exist because it isn't pure microcode, so there is much less opportunity to fix something than in a pure microcode machine. I also have a sense about the likelihood of errors, and the difficulty of testing for them, but that's another post...

Further, I don't think anyone in the cost-sensitive (read: die-size-sensitive) merchant chip market offers updatability as, I guess, it'd be a couple of times bigger and run a couple of times slower (or be many times bigger to get a similar speed). I think chip cost increases faster than proportionally to area (due to defects, for example), so my gut says: time for supper.

I think we are mostly on the same page about this though
 12/08/2003 07:52 AM
pcy
Senior Member

Posts: 2029
Joined: 10/18/2003

Well... we do seem to have completely hijacked this thread, but this is such a fascinating conversation.

@Bitey. Apologies for that, but I completely agree:

There is no way the microcode on an AMD Athlon can be updated



QUOTE
Pcy, sorry, you didn't post the original question, mea culpa (oh, the benefits of a classical education  )

I too had to learn Latin at School. Hated it. I'm a mathematician. I suppose that counts as a classical education (?)

QUOTE
As I said before, my recollection is mid-'90s, and I'm happy to believe the world has changed


My recollection is late '60s to mid-'70s... and what amazes me is how little has really changed. Massive development in speed and scale, a complete revolution (well, several actually) in implementation technology, and a bit of terminology drift, of course. I read the reviews and look at the system structure diagrams and it's still the same old stuff... Prefetch, Decode, Cache, System Bus, ALU, FPU, Memory Latency...


We seem to agree on most things here, and in particular this:

Updatability is not a sensible feature of a high volume low cost CPU.


We seem to differ most in where we stand on the "random logic" (I'm happy with your meaning, I just hadn't run across it before) vs "microcode" debate.

I think we really need to sit down in the same room and try to design a CPU to properly progress this conversation (and I'm in London, England); but let me propose a different analogy:

MicroCode:RandomLogic == Subroutine:Macro

If you have a subroutine and a macro which perform the same task, the subroutine embodies all the logic just once, and is called at runtime; whereas the macro is inserted inline.

The macro will have a shorter execution path because it saves the subroutine calling overhead, and some of the decisions can be pre-compiled. But of course the macro adds size to the program on each occasion it is used. Size vs execution path length. What we often see is a hybrid: macros used as an optimizing interface but calling subroutines for commonly used subtasks.
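
In C terms (my own trivial example), the tradeoff looks like this:

CODE
/* The same task, once as a subroutine and once as a macro. */
int square_fn(int x) { return x * x; }  /* one copy of the logic, called at runtime */
#define SQUARE(x) ((x) * (x))           /* logic pasted inline at every use site    */

int f(int a, int b) {
    return square_fn(a)   /* pays call overhead, adds no size per use */
         + SQUARE(b);     /* no call overhead, grows the code per use */
}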

Given that increased size carries a cost/performance penalty, subroutines actually deliver better overall price/performance when:
1. they are used often
2. they do a lot of work relative to their calling overhead
3. they make few pre-compilable decisions.

They should also perform a well-defined task (i.e. good encapsulation) - but that's true for a macro as well, of course.

Let's call a SubRoutine that meets these criteria a "GoodSubroutine".

Hopefully I have not said anything you disagree with so far. My point is this:

Developing a system so that it can be represented by GoodSubroutines is a Design issue. When you can achieve this you will generally achieve a more robust system with improved performance and/or cost.

Now let's consider the statement of mine that you most obviously disagree with:
QUOTE
the distinction between hard-wired logic and microcode is broadly an illusion

I think I'm really looking at this over (say) a 10+ year timescale and saying "this is how it will pan out, or how it should be", whereas (and please forgive me if I have misunderstood you) you would say: "but that's not how it is just now".

And I would reply: "Maybe... but that just means these CPUs are badly designed".

WOW! Can I defend that assertion?

Yes, because the poor design is a consequence of history, not the fault of the designers. New designs have to be backward compatible, and that now includes several generations of add-on instructions with vector capabilities etc. Vital performance enhancements, but a design nightmare. The CPUs no longer have a coherent external interface.

The bitter truth is that in order to make real progress you have to sacrifice compatibility once in a while. Discard the baggage of the years. Short-term pain, long-term gain.

Should AMD have done this with the AMD64? Yes.
Could they have done this? No.

But this, IMHO, is one key area where genuine co-operation between MS, Intel and AMD would be in their interests and in ours.



Peter







 12/08/2003 10:04 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

Yes, I don't think we are far apart, and yes, sorry Bitey for hijacking the thread.

We agree: modifiable Athlons are not publicly available, sorry.

I disagree with the macro vs subroutine analogy - those are implementation issues, not internal architecture. You're simply talking about how many places the implementation exists in, and how it's invoked.

I think it's pretty straightforward to imagine having both in an application, and controlling which you get by doing a huge pile of simulation runs, analysing the execution paths (e.g. with Rational's coverage tools), and applying the appropriate implementation (fast macro, or slower subroutine call) to get a balance of cost and performance. State-of-the-art compilers (adaptive compilation?) actually do this. It's like Sun's Java HotSpot technology, but remembering performance results over all runs (not just the current run), and trying to get faster and faster.

Edit: I can even imagine offering customers products which run at different speeds, where the faster ones cost more; there is no single 'optimal' tradeoff between speed and cost, so I would let the customers choose!
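
Going back to that adaptive idea, a crude toy sketch in C (invented names; a real JIT like HotSpot recompiles the hot path rather than flipping a function pointer):

CODE
/* Toy sketch only: count calls, swap in a specialised implementation
   once the call site has proven "hot". */
static int impl_generic(int x) { return x * 2; }   /* general path     */
static int impl_fast(int x)    { return x << 1; }  /* specialised path */

static int (*impl)(int) = impl_generic;            /* current implementation */
static long calls = 0;

int doubled(int x) {
    if (++calls == 1000)     /* "hot" threshold reached: adapt */
        impl = impl_fast;
    return impl(x);
}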

Microcode vs random logic is much closer to data-driven (or interpreted) vs hard coded.

At some level of abstraction, the difference between them is "an illusion".
But manipulation of any model (code and data) through a well-specified interface - a CPU instruction set and architecture - is always going to have that property; the underlying implementation shouldn't matter!

Just to make the difference between data-driven and hard-coded logic a bit clearer, let me use a bigger example. I assume you're old enough to remember egrep expressions. Here's one for C-style floating point numbers (I think):

(+|-)?[0-9]+"."[0-9]+(("e"|"E")"-"?[0-9]+)?
Examples of matches are: -1.0, 2.0e7, 3.14159, 3.6E-23, but not -.7 or +3e4

Now to check a string we could code a table-driven finite state machine (an interpreter):
CODE
typedef int State;             // states are just small integers
State machine[MAXSTATE][256];  // transition table; assume 8-bit char

int c;
State state = START_STATE;
while ((c = getc(inp)) != EOF
   && machine[state][c] != FINISH_STATE /* ... other halting states ... */) {

       state = machine[state][c];     // ** the guts of it all - move to the new state; this is all an fsm does

       // .... useful work .... based on the current state
}

That's it!

Doing this with hard coded logic:
CODE
int c;
c=getc(inp);
if (c == '-') {              // ** fsm
     ....
     c = ...
} else if (c == '+') {       // ** fsm
     ....
     c = ...
}

if ((c != EOF) && ('0' <= c) && ( c <= '9')) {    // ** fsm
  ....
}
// etc - this is only (+|-)?[0-9]

I think we can see that the random-logic, or hard-coded, version is harder to understand, and a pig to change. The state is implicit; it is purely wherever the program has gotten to in its execution.

BUT it may be faster: the benign-looking lookup in 'machine[state][c]' may be much more expensive in time than a lot of random logic gates triggering. There is the address decode to find the row of data; that is a lot of logic and time.

EDIT:

I don't believe the internals of CPUs are like software; there isn't a little 8008 in the corner of a P4. The analogy to subroutines is only partial. Over the long term (you suggested 10 years), the economics of chip production have changed by roughly a factor of 2^6-2^7, or 64-128 (Moore's 'law'; call it 100x). Today, I can afford to throw 100,000,000 transistors at a problem that I could only afford to apply 1,000,000 to 10 years ago. If random logic was uneconomic due to size, but faster, it may have become economic in a lot more places!

Assuming the pattern continues for the next 10 years, we will have 10,000,000,000 transistors on a die. As Hennessy and Patterson might ask, how do we get the most useful work done with that much computational power and chip real estate?

We have just watched great designs, like the DEC (then Compaq, then Intel+HP) Alpha - specifically designed to be 64-bit, clocked like crazy, and multi-CPU - eaten by Intel's bigger market share and hence huge R&D budget (DEC were not blame-free in this). For evidence, go look at the SPECbench.org numbers for Alpha. I think it still beats Xeon, with much more modest technology (HP are a tad embarrassed about the whole thing, I suspect).

The end-user experience is so far away from the CPU, and CPUs have more than enough power for most everyday tasks that Microsoft offers us, that I don't think it matters. It's the software that we care about, and that is locked into MS's OS. We get the CPU that they support.

This makes sense. In the 10 years in which we have gone from 1,000,000 transistors to 100,000,000, mass-market software economics may have worsened: productivity gains have outpaced inflation, but the complexity of software has increased much faster.

Anyway, fascinating. Peter, let me grab some dinner, Chinese I think and return.
Edit: Nope, Turkey Pastrami on wheat with a garden salad, Eatzi's!
 12/09/2003 05:43 PM
deuxway
Member

Posts: 193
Joined: 10/08/2003

RISC addendum.

Just for completeness ....

Part of the reason for the very complex sequencing, and hence lots of microcode, was the complex instruction state and processor states that CISC got into.

When you add virtual memory to a system, a processor needs to be capable of aborting or restarting a memory-access instruction: either winding back the instruction as if it had never happened, or storing all of the state so that it can pick up part-way through an instruction. Both approaches can be made to work, eventually.

RISC simplified things by having very simple memory-access instructions which accessed a single memory location for either load or store. So instruction abort could be simple. You'd still need to save the internal pipeline state (for all of the in-flight instructions), but winding back the load or store instruction was pretty safe and easy.

CISCs like the 486 had these "clever" multiple-memory-access instructions, where it was much harder to preserve the semantics of the instruction if a virtual memory trap happened.

This, of course suggests that there was lots of microcode in x86 processors.

I thought I should add this in the nature of full and true disclosure. It doesn't mean it is true today, though.

I believe this is evidence that Intel's huge R&D budget, massive marketing budget, partnership with the most powerful monopolist since Standard Oil, and a gigantic software legacy win the CPU wars; innovation is not a significant factor. AMD's genius is to use what it can of this to its own benefit in AMD64.
