Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors

mul instruction, cycles to execute on PowerPC, 117
mulpd instruction, throughput on Intel processors, 261
mulps instruction, throughput on Intel processors, 261
mulsd instruction, throughput on Intel processors, 261
mulss instruction, throughput on Intel processors, 261
multi-core processors, 247
Multimedia Extensions (MMX), 70, 108, 174
multiprocessing, 249, 249–250

N

NetBurst architecture, 139, 140, 235
  code names for variations, 236
non-pipelined microprocessor, 43–45, 44
noops (no operation), 198
Northwood, 236
notebook (portable) computers, 237
numbers, basic formats, 66–67, 67
n-way set associative mapping, 226–230, 227

parallel execution of instructions, 63
Pentium. See Intel Pentium
performance, 51
  branch prediction and, 86, 125
  gains from pipelining, 60
  of Pentium 4, 140
Performance Optimization With Enhanced RISC (POWER), 112
PFADD (packed floating-point addition), 260
physical address space, vs. virtual, 185–186
pipeline, 37, 40–43, 42
  challenges, 74–78
  cost of, 60
  depth, 46
  of DLW-2 hypothetical computer, 64
  flushing, 86
  limits to, 58–60, 139–140
  on Pentium, 82–85, 83
  on Pentium M, 246
  on PowerPC 601, 113–115
    instruction queue, 113–114
    instruction scheduling, 114–115
  on PowerPC 604, 123–126

INDEX

287

pipeline, continued
  speedup from, 48–51, 50
  stages, 45
  and superscalar execution, 65
  trace cache effect on, 154
pipeline stalls, 54–57
  avoiding, 60
  instruction latency and, 57–58
pipelined execution, 35
pipelined microprocessor, 45–48
pointers, 187
polluting the cache, 222
pop instruction, 88, 89
portable computers, 237
ports, for PowerPC instructions, 206
postfix expressions, 89
POWER (Performance Optimization With Enhanced RISC), 112
power density of chip, 237–239
power-efficient computing, 237–239
PowerPC (PPC), 73, 111
  AltiVec extension, 173
  brief history, 112
  instruction set architecture (ISA), 70, 162
PowerPC (PPC) 601, 112–118, 135
  back end, 115–117
    branch execution unit (BEU), 116
    floating-point unit, 115–116
    integer unit, 115
    sequencer unit, 116–117
  features, 112
  in historical context, 118
  latency and throughput, 117–118
  microarchitecture, 114
  pipeline and front end, 113–115
    instruction queue, 113–114
    instruction scheduling, 114–115
PowerPC (PPC) 603 and 603e, 118–122, 135
  back end, 119–121
  features, 119
  front end, instruction window, and branch prediction, 122
  in historical context, 122
  microarchitecture, 120
PowerPC (PPC) 604, 119, 123–129, 136
  features, 123
  front end and instruction window, 126–128
  microarchitecture, 124
  pipeline and back end, 123–126
  reorder buffer, 128
  reservation station (RS), 126–127
PowerPC (PPC) 604e, 129
PowerPC (PPC) 750 (G3), 129–133
  features, 130
  front end, instruction window, and branch instruction, 130–132
  vs. G4, 133
  in historical context, 132–133
  microarchitecture, 131
PowerPC (PPC) 970 (G5), 193
  back end, 200–203
  branch prediction, 195–196
  caches and front end, 194–195
  decode, cracking and group formation, 196–200
    dispatching and issuing instructions, 197–198
  design philosophy, 194
  dispatch rules, 198–199
  floating-point execution units (FPUs), 205–206
  floating-point issue queue (FIQ), 209, 209–211
  group dispatch scheme
    conclusions, 199–200
    performance implications, 211–213
  integer execution units (IUs), 201–202
    performance conclusions, 203
  load-store units (LSUs) and front-end bus, 203–205
  microarchitecture, 195
  predecoding and group dispatch, 199
  vector computing, 206–209, 207
  vector instruction latencies on, 208


PowerPC (PPC) 7400 (G4), 133–135, 138
  AltiVec support, 173
  features, 133
  in historical context, 135
  microarchitecture, 134
  scalability of clock rate, 135
power wall, 237
PPC. See PowerPC
predecoding, on PowerPC 970, 199
Prefetch/Fetch stage in Pentium pipeline, 84–85
Prescott, 236
processors. See microprocessor
processor serial number (PSN), 109
processor status word (PSW) register, 31, 67
  condition register for functions of, 202–203
productivity, pipelining and, 42
program, 11–14
program counter, 26
program execution time
  and completion rate, 51–52
  decreasing, 43, 47–48
  relationship with completion rate, 52–53
programmers, early processes, 26
programming model, 26, 27, 69–70
  32-bit vs. 64-bit, 182
  early variations, 71
pseudo-LRU algorithm, 230
PSN (processor serial number), 109
PSW register. See processor status word (PSW) register
push instruction, 88, 89
pushing data to stack, 88, 89

Q

queue. See also issue queues
  instruction
    on Core, 256
    on PowerPC 601, 113–114
  micro-op, 106, 155
  stage, on Pentium 4, 156
  vector issue (VIQ), for G4e, 146

R

RAM (random access memory), 8–10
RAT (register allocation table), on Pentium Pro, 100
read-modify instruction, 243
read-modify-write sequence, 4
read-only memory (ROM), 34
reboot, 34
reduced instruction set computing (RISC), 73–74, 105
  instructions in PowerPC, 113
  load-store model, 4
refills of pipeline, performance impact of, 60
register allocation table (RAT), on Pentium Pro, 100
register files, 7–8
  stages on Pentium 4, 158
register-relative address, 16–17
  with branch instruction, 33
register renaming
  to overcome data hazards, 75, 76
  P6 pipeline stage for, 101
  on PowerPC 604, 125
registers, 7
  mapping to binary codes, 20
  vs. other data storage, 217
register-to-memory format arithmetic instructions, 103–104
register-type arithmetic instruction, 21–22
rename register availability rule, 128
rename registers, 98
  on Pentium 4, 165
  on PowerPC 604, 126
  on PowerPC 750, 131
  on PowerPC 970, 199, 200
reorder buffer (ROB), 265
  on Pentium 4, 159
  on Pentium Pro, 99–100
  on PowerPC 604, 126, 128
  rules for, 268
reservation station (RS)
  on P6 core, 258
  P6 pipeline stages for writing to and reading from, 101


reservation station (RS), continued
  on Pentium 4, 149
  on Pentium Pro, 98–99, 100
  on PowerPC 604, 126–127
  on PowerPC 750, 131
results, 4
results stream, 2
REX prefix, 191
RISC. See reduced instruction set computing (RISC)
ROB. See reorder buffer (ROB)
ROM (read-only memory), 34
RS. See reservation station (RS)
RS6000 (IBM), 62

S

scalar operations, 62
scalars, vs. vectors, 66, 170
schedule stage, on Pentium 4, 156–157
segmented memory model, 192
sequencer unit, on PowerPC 601, 116–117
sequentially ordered data, and spatial locality, 220
set associative mapping, 226
SIMD (Single Instruction, Multiple Data) computing, 168, 169
  extensions to PowerPC instruction set, 135
simple/fast integer execution units, on G4e, 163
simple/fast integer instructions, 163
simple FP scheduler, on Pentium 4, 157
simple integer instructions, 201
simple integer unit (SIU), 68, 87
  on PowerPC 750, 130
single-cycle processors, 44, 49, 50
SISD (Single Instruction stream, Single Data stream) device, 168, 169
SIU. See simple integer unit (SIU)
slow integer ALU unit, on Pentium 4, 157
slow IU/general FPU scheduler, on Pentium 4, 157
SMP (symmetric multiprocessing), 136
software
  early, custom-fitted to hardware, 71, 71
  moving hardware complexity to, 73–74
software branch hints, 147–148
source field, 12
source registers, 8, 21
spatial locality of code, 221–222
spatial locality of data, 220
speculative execution, 85–86
  path, 152–153, 153
  results stream version of, 264–270
SRAM (static RAM), for L1 cache, 217
SSE. See Streaming SIMD Extensions (SSE)
ST (stack top), 88
stack, 88
  vs. flat register file, 90
  swapping element with stack top, 91
stack execution unit, on Pentium M, 246
stack pointer register, 246
stack top (ST), 88
static branch prediction, 86
static power density, 238–239
static prediction, 147
static RAM (SRAM), for L1 cache, 217
static scheduling, in Pentium Pro, 94–95, 95
storage, 4–5
store address unit, on P6 back end, 102
store data unit, on P6 back end, 102
stored-program computer, 4–6
store instruction, 11
  micro-ops for, 267
  programmer and control of, 104
  register-type binary format for, 24–25
  translating into fused micro-ops, 242–243
  write-through for, 233


store port, on Pentium 4, 157
Streaming SIMD Extensions (SSE), 70, 262–263
  on Core Duo, 252
  floating-point performance with, 177
  implementation of, 176
  Intel’s goal for, 175
  on Pentium III, 108, 109
strings, ISA-level support for, 104–105
structural hazards, 76–77
sum vector (VT), 171
superscalar computers, 62
  challenges, 74–78
  expanding with execution units, 65–69
  and instructions per clock, 64–65
  latency and throughput, 117–118
SUV-building process, pipelining in, 40–43, 42
swapping stack element with stack top, 91
symmetric multiprocessing (SMP), 136
system unit, on PowerPC 603 and 603e, 121

T

tag RAM, 224
tags for cache
  direct mapping, 225–226, 226
  fully associative mapping with, 224, 225
  n-way set associative mapping, 226–230, 227
temporal locality of code and data, 222
throughput
  for floating-point instructions on Intel processors, 261
  for PowerPC 970 integer unit, 202
  of superscalar processors, 117–118
trace cache
  in Pentium 4, 149–154
    and instruction execution time, 150–151
    operation, 151–154
traces, 150
trace segment build mode, 151
transistors, 1
  density, and dynamic power density, 237
  number on chip, 62
translating program into machine language, 25
Turing machine, 4
two-way set associative mapping, 228, 228–229
  vs. direct mapping, 229
  vs. four-way, 229

U

U integer pipe, 87
unconditional branch, 30
underflow, 184
uops. See micro-operations

V

variable-length instructions, 105
vector ALU (VALU), 135
vector complex integer unit, on G4e, 173
vector computing
  on 32-bit vs. 64-bit processors, 183
  and AltiVec instruction set, 169–170
  G3 and, 132–133
  inter-element operations, 172–173
  intra-element operations, 171–172
  MMX (Multimedia Extensions), 174
  overview of, 168–169
  on PowerPC 970, 206–209, 207
vector execution units, 69, 168–177
vector floating-point multiplication,
