Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture (71 page)


Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors


We’ll talk more about the concept of the instruction window and about the structures that make it up (the ROB and the reservation stations) in the next section on the 604. For now, it suffices to say that the 603’s instruction window is quite small compared to that of its successors: three of its four reservation stations are only single-entry, and one is double-entry (the one attached to the load-store unit). Because the 603’s instruction window is so small, it needs relatively few rename registers to temporarily hold execution results prior to commitment. The 603 has five general-purpose rename registers, four floating-point rename registers, and one rename register each for the condition register (CR), link register (LR), and count register (CTR).

The 603 and 603e follow the 601 in their ability to do speculative execution by means of a simple, static branch predictor. Like the static predictor on the 601, the 603e’s predictor marks forward branches as not taken and backward branches as taken. This static branch predictor is simple and fast, but it is only mildly effective compared to even a weakly designed dynamic branch predictor. If PPC users in the 603e/604 era wanted dynamic branch prediction, they had to upgrade to the 604.
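The static rule described above is small enough to sketch in a few lines of Python. This is only an illustration of the forward/backward heuristic, not the 603e’s actual logic; the function name `predict_static` and the example addresses are made up for this sketch.

```python
def predict_static(branch_addr, target_addr):
    """Return True if the branch is predicted taken.

    Backward branches (target at a lower address, typical of the
    bottom of a loop) are predicted taken; forward branches are
    predicted not taken.
    """
    return target_addr < branch_addr


# Hypothetical addresses, chosen only for illustration.
loop_branch = predict_static(branch_addr=0x1040, target_addr=0x1000)
skip_branch = predict_static(branch_addr=0x1040, target_addr=0x1080)

print("backward branch predicted taken:", loop_branch)
print("forward branch predicted taken:", skip_branch)
```

The prediction depends only on the direction of the branch, never on how the branch behaved in the past, which is why the scheme is cheap but cannot adapt to a branch that breaks the pattern.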

Summary: The 603 and 603e in Historical Context

With its stellar performance-per-watt ratio, the 603 was a great little processor, and it would have made a good low- to midrange desktop processor as well if it weren’t for Apple’s legacy 68K code base. The 603e’s tweaks and larger cache size helped with the legacy problems somewhat, but the updated chip still played second fiddle in Apple’s product line to the larger, much more powerful 604.

You haven’t seen the last of the 603e, though. The 603e’s design formed the basis for what would eventually become Motorola’s PowerPC 7400 (aka the G4), which we’ll cover in “The PowerPC 7400 (aka the G4)” on page 133.


Chapter 6

The PowerPC 604

At the same time the 603 was making its way toward the market, the 604 was in the works as well. The 604 was to be Apple’s high-end PPC desktop processor, so its power and transistor budgets were much higher than those of the 603. Table 6-3 summarizes the 604’s features, and a quick glance at a diagram of the 604 (see Figure 6-3) shows some obvious ways that it differs from its lower-end sibling. For example, in the front end, the length of the instruction queue has been increased by two entries. In the back end, two more integer units have been added, and the CR logical unit has been removed. These changes reflect some important differences in the overall approach of the 604, differences that will be examined in greater detail shortly.

Table 6-3: Features of the PowerPC 604 and 604e

                             PowerPC 604         PowerPC 604e
Introduction Date            May 1, 1995         July 19, 1996
Process                      0.50 micron         0.35 micron
Transistor Count             3.6 million         5.1 million
Die Size                     197 mm²             148 mm²
Clock Speed at Introduction  120 MHz             180–200 MHz
L1 Cache Size                32KB split L1       64KB split L1
First Appeared In            PowerMac 9500/120   Power Computing PowerTower Pro 200
                                                 (PowerMac 9500/180 on August 7, 1996)

The 604’s Pipeline and Back End

The 604’s pipeline is deeper than that of the 601 and the 603, and it consists of the following six stages:

Four Phases of the Standard RISC Pipeline   Six Stages of the 604’s Pipeline
Fetch                                       1. Fetch
Decode/dispatch                             2. Decode
                                            3. Dispatch (ROB and rename)
Execute                                     4. Execute
Write-back                                  5. Complete
                                            6. Write-back

In the 604, the standard RISC decode/dispatch phase is split into two stages, as is the write-back phase. I’ll explain just how these two new pipeline stages work in the section on the instruction window, but for now all you need to understand is that this lengthened pipeline enables the 604 to reach higher clock speeds than its predecessors. Because each pipeline stage is simpler, it takes less time to complete, which means that the CPU’s clock cycle time can be shortened.
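To see why simpler stages permit a faster clock, consider a rough back-of-the-envelope model in Python. It assumes that a fixed amount of logic delay is divided evenly among the stages and that each stage adds a constant latch overhead; the delay figures are hypothetical and are not measurements of the 604 or of any other real chip.

```python
def cycle_time_ns(total_logic_ns, stages, latch_overhead_ns):
    """Clock period when the logic delay is split evenly across the stages."""
    return total_logic_ns / stages + latch_overhead_ns


def clock_mhz(cycle_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / cycle_ns


# Hypothetical figures: 40 ns of total logic, 1 ns of latch overhead per stage.
four_stage = cycle_time_ns(40.0, 4, 1.0)
six_stage = cycle_time_ns(40.0, 6, 1.0)

print(f"4 stages: {four_stage:.2f} ns per cycle, {clock_mhz(four_stage):.1f} MHz")
print(f"6 stages: {six_stage:.2f} ns per cycle, {clock_mhz(six_stage):.1f} MHz")
```

In this model the clock speeds up by less than the ratio of the stage counts, because the latch overhead is paid in every stage no matter how thinly the logic is sliced.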

PowerPC Processors: 600 Series, 700 Series, and 7400


[Figure: block diagram of the 604. The front end (instruction fetch, instruction queue, branch unit with CR logic, and decode/dispatch) feeds a back end in which reservation stations sit above the floating-point unit, the integer ALUs (two SIUs and one CIU), and the load-store unit, followed by the 16-entry reorder buffer and the write/commit unit.]

Figure 6-3: PowerPC 604 microarchitecture

Aside from the longer pipeline, another factor that really sets the 604 apart from the other 600-series PPC designs discussed so far is its wider back end. The 604 can execute up to six instructions per clock cycle in the following six execution units:

• Branch unit (BU)/condition register unit (CRU)
• Load-store unit (LSU)
• Floating-point unit (FPU)
• Three integer units (IU)
    • Two simple integer units (SIUs)
    • One complex integer unit (CIU)

Unlike the other 600-series processors, the 604 has multiple integer units. This division of labor, where multiple fast integer units execute simple integer instructions and one slower integer unit executes complex integer instructions, will be discussed in more detail in Chapter 8. Any integer instruction that takes only a single cycle to execute can pass through one of the two SIUs. On the other hand, integer instructions that take multiple cycles to execute, like integer divides, have to pass through the slower CIU.

Like the 603e, the 604 has register renaming, a technique that is facilitated by the 12-entry register rename file attached to the 32-entry general-purpose register file. These rename buffers give the 604’s execution units more options for avoiding false dependencies and register-related stalls.
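A small Python sketch can show how renaming removes a false dependency. The class below is a simplified illustration, not the 604’s actual rename logic: the name `RenameTable`, the `rnN` tag format, and the three-instruction program are invented for the example, and buffers are never freed because commitment is not modeled. Only the 12-buffer count comes from the text.

```python
class RenameTable:
    """Maps architectural registers to rename buffers (simplified sketch)."""

    def __init__(self, num_rename_buffers=12):
        self.free = list(range(num_rename_buffers))  # unused buffer numbers
        self.mapping = {}  # architectural register -> newest rename tag

    def rename(self, dest, sources):
        """Rename one instruction; return (dest_tag, source_tags) or None on a stall."""
        if not self.free:
            return None  # no rename buffer left, so the instruction must wait
        # Sources read the newest mapping, or the architectural register itself.
        source_tags = [self.mapping.get(reg, reg) for reg in sources]
        dest_tag = f"rn{self.free.pop(0)}"
        self.mapping[dest] = dest_tag
        return dest_tag, source_tags


table = RenameTable()
i1 = table.rename("r3", ["r1", "r2"])  # r3 = r1 + r2
i2 = table.rename("r4", ["r3", "r1"])  # r4 = r3 + r1  (true dependency on i1)
i3 = table.rename("r3", ["r5", "r6"])  # r3 = r5 + r6  (reuses the name r3)

for label, result in (("i1", i1), ("i2", i2), ("i3", i3)):
    print(label, result)
```

After renaming, the third instruction writes its own buffer while the second still reads the buffer produced by the first, so the third no longer has to wait for the second to read the old value of r3. The true dependency between the first two instructions remains.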

The 604’s floating-point unit does most single- and double-precision operations with a three-cycle latency, just like the 603e. Unlike the 603e, though, the 604’s floating-point unit is fully pipelined for double-precision multiplies. Floating-point division and two other instructions take from 18 to 33 cycles on the 604, as on the 603e. Finally, the 604’s 32-entry floating-point register file is attached to an 8-entry floating-point rename register buffer.

The 604’s load-store unit (LSU) is also similar to that of the 603e. Like the 603e’s LSU, it contains an adder for doing address calculations and handles all load-store traffic, but unlike the 603e, it’s connected to deeper load and store queues and allows a little more flexibility for the optimal reordering of memory operations.

The 604’s branch unit also features a dynamic branch prediction scheme that’s a vast improvement over the 603e’s static branch predictor. The 604 has a large, 512-entry branch history table (BHT) with two bits per entry for tracking branches, coupled with a 64-entry branch target address cache (BTAC), which is the equivalent of the Pentium’s BTB.
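The two-bits-per-entry scheme can be sketched as a table of saturating counters. The Python below is a generic two-bit predictor under stated assumptions, not a description of the 604’s circuitry: the indexing by low-order address bits, the initial counter value, and the class name `TwoBitPredictor` are all assumptions made for illustration, and the BTAC is not modeled. Only the 512-entry size and the two bits per entry come from the text.

```python
class TwoBitPredictor:
    """Branch history table of two-bit saturating counters (generic sketch)."""

    def __init__(self, entries=512):
        self.entries = entries
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        # Starting every counter at 1 is an assumption.
        self.counters = [1] * entries

    def _index(self, branch_addr):
        # Assumed indexing: drop the two low bits (4-byte instructions),
        # then wrap around the table size.
        return (branch_addr >> 2) % self.entries

    def predict(self, branch_addr):
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, branch_addr, outcomes):
    """Run a sequence of actual outcomes through the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses


# A hypothetical loop branch: taken nine times, then falls through, run twice.
outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(TwoBitPredictor(), 0x1000, outcomes)
print(f"{misses} mispredictions out of {len(outcomes)} branches")
```

Two bits of history mean that a single loop exit does not flip the prediction, so the branch is still predicted taken when the loop is entered again.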

As a rule, the more transistors you spend on branch prediction, the better the performance, so the 604’s more advanced branch unit helps it quite a bit. Still, the 604’s longer pipeline pays a higher performance price for a misprediction than its shorter-pipelined predecessors do. Of course, the bigger performance loss associated with a misprediction is also the reason the 604 needs to spend those extra resources on branch prediction.

Notice that the list of execution units given earlier is missing a unit that is present on the 603e: the system unit. The 603e’s system unit handled updates to the PPC condition register, a function that was handled by the integer execution unit on the older 601. The 604 moves the responsibility of dealing with the condition register onto the branch unit. So the 604’s branch unit contains a separate execution unit that handles all logical operations that involve the PowerPC condition register. This condition register unit (CRU) shares a dispatch bus and some other resources with the branch execution unit, so it’s not a fully independent execution unit like the 603e’s system unit. What does this BU/CRU combination do for performance? It probably doesn’t have a huge impact, but whatever impact it does have is significant enough that the 604’s immediate successor, the 604e, adds an independent execution unit to the back end for CR logical operations.

The 604’s Front End and Instruction Window

The 604’s front end and instruction window look like a combination of the best features of the 601 and the 603e. Like the 601, the 604’s instruction queue is eight entries deep. Instructions are fetched from the L1 cache into the instruction queue, where they’re decoded before being dispatched to the back end. Branches that can be folded are folded, and the 604’s dispatch logic can dispatch up to four instructions per cycle (up from two on the 603e and three on the 601) from the bottom four entries of the instruction queue to the back end’s execution units.

During the 604’s dispatch stage, rename registers and a reorder buffer entry are assigned to each dispatching instruction. When the instruction is ready to dispatch, it’s sent either directly to an execution unit or to an execution unit’s reservation station, depending on whether or not its operands are available at the time of dispatch. Note that the 604 can dispatch at most one instruction to each execution unit, and there are certain rules that govern when the dispatch logic can dispatch an instruction to the back end. We’ll cover these rules in more detail in a moment, but for now you need to be aware of one of the rules: An instruction cannot dispatch if the execution unit that it needs is not available.
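The dispatch behavior described so far can be sketched as a single-cycle function in Python. This is a simplified model under stated assumptions: it encodes only the rules mentioned in this section (up to four instructions per cycle, at most one per execution unit, and no dispatch to an unavailable unit), it assumes that a blocked instruction also holds back the instructions behind it, and the function, instruction, and unit names are hypothetical.

```python
def dispatch_cycle(queue, unit_available, operands_ready, max_dispatch=4):
    """Model one dispatch cycle.

    queue: list of (instruction, unit) pairs in program order, oldest first.
    unit_available: dict mapping unit name -> True if the unit can accept work.
    operands_ready: dict mapping instruction -> True if its inputs are available.
    Returns a list of (instruction, unit, destination) tuples.
    """
    dispatched = []
    used_units = set()
    for name, unit in queue[:max_dispatch]:
        if not unit_available.get(unit, False) or unit in used_units:
            # Assumed in-order dispatch: a blocked instruction stalls the rest.
            break
        used_units.add(unit)
        if operands_ready[name]:
            destination = "execution unit"
        else:
            destination = "reservation station"
        dispatched.append((name, unit, destination))
    return dispatched


# Hypothetical instruction queue contents for one cycle.
queue = [("add", "SIU1"), ("lwz", "LSU"), ("sub", "SIU1"), ("fmul", "FPU")]
available = {"SIU1": True, "SIU2": True, "CIU": True, "LSU": True, "FPU": True}
ready = {"add": True, "lwz": False, "sub": True, "fmul": True}

for entry in dispatch_cycle(queue, available, ready):
    print(entry)
```

In this run the `add` goes straight to its execution unit, the `lwz` waits in the load-store unit’s reservation station for its operands, and the `sub` has to wait for the next cycle because its unit has already received an instruction.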

The Issue Phase: The 604’s Reservation Stations

In Figure 6-3, you probably noticed that each of the 604’s execution units has a reservation station attached to it; this includes a reservation station each (not depicted) for the branch execution and condition register units that make up the branch unit. The 604’s reservation stations are relatively small, two-entry (the CIU’s reservation station is single-entry), first-in first-out (FIFO) affairs, but they make up the heart of the 604’s instruction window, because they allow the instructions assigned to one execution unit to issue out of program order with respect to the instructions that are assigned to the other execution units.

This works as follows: The dispatch stage sends instructions into the reservation stations (i.e., the issue phase) in program order, and, with one important exception (described in the next paragraph), the instructions pass through their respective reservation stations in order. An instruction enters the top of a reservation station, and as the instructions ahead of it issue, it moves down the queue, until it eventually exits through the bottom (i.e., it issues). Therefore, we can say each instruction issues in order with respect to the other instructions in the same reservation station. However, the various reservation stations can issue instructions at different times, with the result that instructions issue out of order from the perspective of the overall program flow.

The simple integer units function a little differently than described earlier, because they allow instructions to issue from their two-entry reservation stations out of order with respect to the other instructions in their own execution unit. So unlike the other types of instructions described previously, simple integer instructions can move through their respective reservation stations and pipelines out of program order, not just with respect to the overall program flow, but with respect to the other instructions in their own reservation station.
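The issue rules from the last two paragraphs can be sketched in Python as follows. This is an illustrative model, not the 604’s actual issue logic: each reservation station is a short list with the oldest instruction first, each station issues at most one instruction per cycle, and the instruction names and the `issue_cycle` function are invented for the example.

```python
def issue_cycle(stations, ready, out_of_order_units=("SIU1", "SIU2")):
    """Issue at most one instruction from each reservation station.

    stations: dict mapping unit name -> list of instructions, oldest first.
    ready: set of instructions whose operands are available.
    Returns a list of (unit, instruction) pairs issued this cycle.
    """
    issued = []
    for unit, entries in stations.items():
        if unit in out_of_order_units:
            candidates = list(entries)  # any entry may issue
        else:
            candidates = entries[:1]    # FIFO: only the oldest entry may issue
        for instr in candidates:
            if instr in ready:
                entries.remove(instr)
                issued.append((unit, instr))
                break
    return issued


# Hypothetical reservation station contents, oldest instruction first.
stations = {
    "LSU": ["lwz_a", "stw_b"],
    "SIU1": ["add_c", "sub_d"],
    "FPU": ["fadd_e"],
}
ready = {"stw_b", "sub_d", "fadd_e"}

issued = issue_cycle(stations, ready)
print(issued)
```

Here the store stays behind the stalled load in the load-store unit’s FIFO station, while the simple integer unit lets the younger `sub_d` slip past the stalled `add_c`. The floating-point instruction issues regardless of what the other stations are doing, which is what produces out-of-order issue across the program as a whole.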

The reservation stations in the 604 and its architectural successors exist to keep instructions that lack their input operand data but are otherwise ready to dispatch from tying up the instruction queue. If an instruction meets all of the other dispatch requirements (see “The Four Rules of Instruction Dispatch”), and if its assigned execution unit is available but it just doesn’t yet have access to the part of the data stream that it needs, it dispatches to
