Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors

of front-side bus, 205

to software, 73–74

importance of, 137–138

complex/slow integer execution

market focus on, 140

units, on G4e, 163

for Pentium 4, 167

complex/slow integer

for Pentium 4, vs. G4e, 176

instructions, 163

for Pentium 4 integer units, 164

compulsory cache miss, 219

cmp instruction, cycles to execute on

computer

PowerPC, 117

costs of systems, 62

code, 2

definition, 3, 4

spatial locality of, 221–222

general-purpose, 2

temporal locality of, 222

memory hierarchy, 82

code names for Intel processors,

power efficiency, 237–239

236–237

with register file, 9

code segment descriptor, 191

stored-program, 4–6

code stream, 2, 11–14

computing

flow of, 5

calculator model of, 2, 2–3

coding, 21

file-clerk model of, 3–7

collision, 226

conceptual layer, 72

commands, reusing prerecorded

condition codes, 203

sequences, 6

condition register unit (CRU), 129

commit cycle, in Pentium Pro

issue queue for, 210–211

pipeline, 101

on PowerPC, 121

commit phase, 128

on PowerPC 604, 125–126

commit unit, on PowerPC 603, 122

on PowerPC 970, 198, 202

common interleaved queues, 210

conditional branch, 30–34, 78

compatibility mode, in x86-64,

conflict misses, 226

190, 191

Conroe, 255

compilers, for high-level languages

control hazards, 78

(HLLs), 104

control operand, in AltiVec vector

complete stage, for G4e, 147

operation, 171

completion buffer availability

control unit, in Pentium, 85

rule, 128

control vector, in AltiVec vector

completion phase

operation, 171

in instruction lifecycle, 98

Coppermine, 109

on PowerPC 604, 128

core logic chipset, 204, 204–205

INDEX

core microarchitecture of

direct mapping, 225–226, 226

processor, 248

vs. two-way set associative

CPU (central processing unit), 1.

mapping, 229

See also microprocessor

dirty blocks in cache, 230

CPU clock cycle, 44

dispatch group, 197

instruction completion per,

dispatch queue, on PowerPC 970,

53–54

199

vs. memory and bus clock

dispatch rules, on PowerPC 970,

cycles, 216

198–199

in pipelined processor, 47

divw instruction, cycles to execute

cracked instruction, 197, 198, 206

on PowerPC, 117

CRU. See condition register

DLW-1 hypothetical computer

unit (CRU)

arithmetic instruction format, 12

cryptography, 185

example program, 13–14

machine language on, 20–21

D

memory instruction format, 13

DLW-2 hypothetical computer

data

decode/dispatch logic, 70

comparison of storage

pipeline, 64

options, 217

two-way superscalar version,

spatial locality of, 220

62–64, 63

temporal locality of, 222

Dothan, 236, 251

databases, back-end servers for, 186

fetch buffer, 239

data bus, 5

double data rate (DDR) front-side

data cache (D-cache), 81, 223

bus, 205

data hazards, 74–76

on PowerPC 970, 195

data parallelism, 168

double-speed execution ports, on

data segment, 16–17

Pentium 4, 157

data stream, 2

drive stages, on Pentium 4, 155

flow of, 5

dual-core processor, 249

daughtercard, 109

dynamic branch prediction, 86–87,

D-cache (data cache), 81, 223

147, 244

DDR. See double data rate (DDR)

dynamic execution, 96

front-side bus

dynamic power density,

decode/dispatch stage, 63

237–238, 238

for G4e, 145–146

dynamic range, 183–184

decode phase of instruction, 37

benefits of increased, 184–185

for Core 2 Duo, 256, 257–258

dynamic scheduling

for Pentium M, 240–244, 241, 242

with buffers, 97

Decode stages in Pentium pipeline,

instruction’s lifecycle phases, 97

84–85

decoding x86 instructions, in P6

E

pipeline, 101

destination field, 12

Eckert, J. Presper, 6n

destination register, 8

EDVAC (Electronic Discrete

binary encoding, 21

Variable Automatic

digital image, data parallelism for

Computer), 6n

inverting, 169

embedded processors, 133

emulation, 72

fetch groups, on PowerPC 970, 196

encryption schemes, 185

fetch phase of instruction, 37

EPIC (Explicitly Parallel Instruction Computing), 180

for Core 2 Duo, 256, 256–257

for Pentium M, 239–240

evicted data from cache, 219

Feynman, Richard, 3

eviction policy for cached data,

fields in instruction, 12

230–232

FIFO (first in, first out) data

execute mode for trace cache, 151

structure, 88

execute stage of instruction, 37

file-clerk model of computing, 3–7

for G4e, 146

expanded, 9–10

in Pentium 4 pipeline, 158

refining, 6–7

in Pentium pipeline, 84–85

FILO (first in, last out) data

in Pentium Pro pipeline, 101

structure, 88

execution. See also program execution time

filter/mod operand, 171

finish pipeline stage, on PPC CR,

phases, 39

163, 164

time requirements, and completion rate, 51–52

FIQ. See floating-point issue

queue (FIQ)

execution ports, on Pentium 4, 157

first in, first out (FIFO) data

execution units, 17

structure, 88

empty slots, 198

first in, last out (FILO) data

structure, 88

expanding superscalar processing with, 65–69

fixed-point ALU, on PowerPC 601,

micro-op passed to, 156

115

on Pentium, 83

fixed-point numbers, 66

Explicitly Parallel Instruction

flags stage, in Pentium 4

Computing (EPIC), 180

pipeline, 158

flat floating-point register file, 88

F

flat register file, vs. stack, 90

floating-point ALUs, on Pentium,

fabs instruction, cycles to

88–91

execute, 118

floating-point applications,

fadd instruction

Pentium 4 design for, 165

cycles to execute, 118

floating-point control word (FPCW),

on PowerPC 970, 212

in Intel Core Duo, 252

throughput on Intel

floating-point data type, on 32-bit

processors, 261

vs. 64-bit processors, 183

fall-through, 114

floating-point execution unit

false aliasing, 268

(FPU), 68, 165–168

fast integer ALU1 and ALU2 units,

on Core 2 Duo, 260–262

on Pentium 4, 157

on G4, 134

fast IU scheduler, on Pentium 4, 156

on G4e, 166–167

fdiv instruction, cycles to

on Pentium, 69

execute, 118

on Pentium 4, 167–168

fetch buffer, on Intel Core Duo, 239

on PowerPC 601, 115–116

fetch-execute loop, 28–29

on PowerPC 750, 130

and branch instructions, 32

on PowerPC 970, 205–206

floating-point instructions

front-end bus, on PowerPC 970,

latencies for G4, 118

203–205

throughput on Intel

fsub instruction, cycles to

processors, 261

execute, 118

floating-point issue queue (FIQ)

fully associative mapping, 224, 225

for G4e, 146

fused multiply-add (fmadd) instruction. See fmadd instruction

on PowerPC 970, 209, 209–211

floating-point numbers, 66

fxch instruction, 91, 167

floating-point/SSE/MMX ALU, on

Pentium 4, 158

G

floating-point/SSE move unit, on

Pentium 4, 157

G3 (Apple). See PowerPC (PPC)

floating-point vector processing, in

750 (G3)

Pentium III, 108

G4. See PowerPC (PPC) 7400 (G4)

flushing pipeline, 86

G4e. See Motorola G4e

performance impact of, 60

games, 187

fmadd instruction, 116

gaps in pipeline, 54–55, 55. See also

cycles to execute, 118

bubbles in pipeline

on G4e, 166

gates, 1

on PowerPC 603, 121

GCT. See group completion

fmul instruction

table (GCT)

cycles to execute, 118

general issue queue (GIQ), for

throughput on Intel

G4e, 146

processors, 261

general-purpose registers (GPRs), 17

forward branch, 30

and bit count, 181

forwarding by pipelined

on PPC ISA, 164

processors, 75

gigahertz race, 110

four-way set associative mapping,

GIQ (general issue queue),

226–227, 228

for G4e, 146

vs. two-way, 229

global predictor table, on

FPCW (floating-point control

PowerPC 970, 196

word), in Intel Core

GPRs. See general-purpose registers

Duo, 252

(GPRs)

FPU. See floating-point execution

group completion table (GCT), 210

unit (FPU)

internal fragmentation, 199

fractional values,

on PowerPC 970, 198–199

approximations of, 66

group dispatch on PowerPC 970,

front end, 38, 38

197, 199

for Pentium Pro, 94–100

conclusions, 199–200

for PowerPC 601, 113–115

performance implications,

instruction queue, 113–114

211–213

instruction scheduling,

114–115

for PowerPC 603, 122

H

for PowerPC 604, 126–128

Hammer processor architecture, 180

for PowerPC 750, 130–132

hard drives

for PowerPC 970, 194–195

vs. other data storage, 217

front-end branch target buffer,

page file on, 218

87, 147

hardware

execution time, trace cache and,

ISA implementation by, 70

150–151

moving complexity to software,

fetch, 28

73–74

fetch logic, on PowerPC 970, 196

hardware loop buffer, 240

fetch stages, for G4e, 145

Harvard architecture level 1 cache,

field, 12

6, 81

latency, pipeline stalls and, 57–58

hazards, 74

pool, for Pentium 4, 149, 159

control, 78

queue

data, 74–76

on Core, 256

structural, 76–77

on PowerPC 601, 113–114

high-level languages (HLLs), compilers for, 104

register, 26

loading, 28

scheduling, on PowerPC 601,

I

114–115

IA-64, 180

window

IBM. See also PowerPC (PPC)

for Core 2 Duo, 254

AltiVec development, 207

for G4, 134

POWER4 microarchitecture, 194

for Pentium 4, 141, 149, 159

RS6000, 62

for Pentium Pro, 93

System/360, 71

for PowerPC 603, 122

VMX, 70, 135, 253

for PowerPC 604, 126–128

I-cache (instruction cache), 78,

for PowerPC 750, 130–132

81, 223

instruction-level parallelism (ILP),

idiv instruction, 253

141, 196

ILP (instruction-level parallelism),

instructions, 11

141, 196

basic flow, 38–40, 39

immediate-type instruction

first, microprocessor hard-wired

format, 22

to fetch, 34

immediate values, in arithmetic

general types, 11–12

instructions, 14–16

lifecycle of, 36–37

indirect branch predictor, on

phases, 45–46

Pentium M, 245

load latency, 78

infix expressions, 89

parallel execution of, 63

in-order instruction dispatch

per clock, and superscalar

rule, 127

computers, 64–65

input, 2

preventing execution out of

input operands, 8

order, 96

input-output (I/O) unit, 26

rules of dispatch on PowerPC 604,

instruction

127–128

bus, 5

throughput, 53–54

cache (I-cache), 78, 81, 223

writing results back to register, 98

completion rate

instruction set, 22, 69–70

of microprocessor, 45, 51

instruction set architecture (ISA)
