Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors


Intel x86 processors” (working paper). Swox AB, September 2005. http://www.swox.com/doc/x86-timing.pdf.

Gwennap, Linley. “Intel’s MMX Speeds Multimedia: Instruction-Set Extensions to Aid Audio, Video, and Speech.” Microprocessor Report 10, no. 3 (March 1996).

Mittal, Millind, Alex Peleg, and Uri Weiser. “MMX Technology Architecture Overview.” Intel Technology Journal 1, no. 1 (August 1997).

Thakkar, Shreekant and Tom Huff. “The Internet Streaming SIMD Extensions.” Intel Technology Journal 3, no. 2 (May 1999).

Pentium and P6 Family

Case, Brian. “Intel Reveals Pentium Implementation Details: Architectural Enhancements Remain Shrouded by NDA.” Microprocessor Report 7, no. 4 (March 1993).

Fog, Agner. “How to optimize for the Pentium family of microprocessors,” 2004.

Fog, Agner. Software optimization resources. “The microarchitecture of Intel and AMD CPU’s: An optimization guide for assembly programmers and compiler makers.” 2006. http://www.agner.org/optimize.

Gwennap, Linley. “Intel’s P6 Uses Decoupled Superscalar Design: Next Generation of x86 Integrates L2 Cache in Package with CPU.” Microprocessor Report 9, no. 2 (February 16, 1995).

Keshava, Jagannath and Vladimir Pentkovski. “Pentium III Processor Implementation Tradeoffs.” Intel Technology Journal 3, no. 2 (Q2 1999).

Intel Architecture Optimization Manual. Intel, 2001.

Intel Architecture Software Developer’s Manual, vols. 1–3. Intel, 2006.

P6 Family of Processors Hardware Developer’s Manual. Intel, 1998.

Pentium II Processor Developer’s Manual. Intel, 1997.

Pentium Pro Family Developer’s Manual, vols. 1–3. Intel, 1995.

Bibliography and Suggested Reading


Pentium 4

“A Detailed Look Inside the Intel NetBurst Micro-Architecture of the Intel Pentium 4 Processor” (white paper). Intel, November 2000.

Boggs, Darrell, Aravindh Baktha, Jason Hawkins, Deborah T. Marr, J. Alan Miller, Patrice Roussel, Ronak Singhal, Bret Toll, and K.S. Venkatraman. “The Microarchitecture of the Intel Pentium 4 Processor on 90nm Technology.” Intel Technology Journal 8, no. 1 (February 2004).

DeMone, Paul. “What’s Up With Willamette? (Part 1).” Real World Technologies (March 2000). http://www.realworldtech.com/page.cfm?ArticleID=RWT030300000001.

Hinton, Glenn, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, Desktop Platforms Group, and Patrice Roussel. “The Microarchitecture of the Pentium 4 Processor.” Intel Technology Journal 5, no. 1 (February 2001).

Intel Pentium 4 Processor Optimization Manual. Intel, 2001.

Pentium M, Core, and Core 2

Gochman, Simcha, Avi Mendelson, Alon Naveh, and Efraim Rotem. “Introduction to Intel Core Duo Processor Architecture.” Intel Technology Journal 10, no. 2 (May 2006).

Gochman, Simcha, Ronny Ronen, Ittai Anati, Ariel Berkovits, Tsvika Kurts, Alon Naveh, Ali Saeed, Zeev Sperber, and Robert C. Valentine. “The Intel Pentium M Processor: Microarchitecture and Performance.” Intel Technology Journal 7, no. 2 (May 2003).

Kanter, David. “Intel’s Next Generation Microarchitecture Unveiled.” Real World Technologies (March 2006). http://realworldtech.com/page.cfm?ArticleID=RWT030906143144.

Mendelson, Avi, Julius Mandelblat, Simcha Gochman, Anat Shemer, Rajshree Chabukswar, Erik Niemeyer, and Arun Kumar. “CMP Implementation in Systems Based on the Intel Core Duo Processor.” Intel Technology Journal 10, no. 2 (May 2006).

Wechsler, Ofri. “Inside Intel Core Microarchitecture: Setting New Standards for Energy-Efficient Performance.” Technology@Intel Magazine (March 2006).

Online Resources

Ace’s Hardware

http://aceshardware.com.

AnandTech

http://anandtech.com.

ArsTechnica

http://arstechnica.com.

Real World Technologies

http://realworldtech.com.

sandpile.org

http://sandpile.org.

X-bit labs

http://xbitlabs.com.


INDEX

Note: Page numbers in italics refer to figures. A page number followed by an italic n refers to a term in the footnote of that page.

Symbols and Numbers

# (hash mark), for memory address, 15
2-way set associative mapping, 228, 228–229
  vs. 4-way, 229
  vs. direct mapping, 229
4-way set associative mapping, 226–227, 228
  vs. 2-way, 229
32-bit computing, vs. 64-bit, 182
32-bit integers, x86 ISA support for, 187
64-bit address space, 186–187
64-bit computing, 181–183
  vs. 32-bit, 182
  current applications, 183–187
64-bit mode, in x86-64, 190, 191

A

absolute addressing, vs. register-relative addressing, 17
add instruction, 6–7, 11
  executing, 8
  on PowerPC, 117
  steps to execute, 265, 266
  three-operand format for PowerPC, 162
addition of numbers, steps for performing, 10
addpd instruction, throughput on Intel processors, 261
addps instruction, throughput on Intel processors, 261
addresses
  on 32-bit vs. 64-bit processors, 183
  64-bit, benefits of, 186–187
  calculation of, 17
  calculations in load-store units, 203
  as integer data, 183
  register-relative, 16–17
  virtual vs. physical space, 185–186
address generation, by load-store unit, 69
address space, 186
addsd instruction, throughput on Intel processors, 261
addss instruction, throughput on Intel processors, 261
Advanced Micro Devices (AMD)
  64-bit workstation market opening for, 180
  Athlon processor, 72, 110
  x86-64, 187–192
    added registers, 188
    extended registers, 187
    programming model, 189
    switching modes, 189–192
AIM (Apple, IBM, Motorola) alliance, 112
allocate and rename stages, on Pentium 4, 155
AltiVec, 135. See also Motorola AltiVec
ALU. See arithmetic logic unit (ALU)

AMD. See Advanced Micro Devices (AMD)
and instruction, cycles to execute on PowerPC, 117
Apple. See also PowerPC (PPC)
  G3. See PowerPC 750 (G3)
  G4. See PowerPC 7400 (G4)
  G4e. See Motorola G4e
  G5, 193. See also PowerPC 970 (G5)
  Performas, 119
  PowerBook, 119
approximations of fractional values, 66
arithmetic
  coprocessor, 67
  instructions, 11, 12, 36
    actions to execute, 36–37
    binary code for, 21–22
    format, 12
    immediate values in, 14–16
  micro-op queue, 156
  operations, 67
arithmetic logic unit (ALU), 2, 5, 6, 67–69
  multiple on chip, 62
  on Pentium, 88–91
  on Pentium 4, schedulers for, 156
  storage close to, 7
Arm, 73
assembler, 26
assembler code, 21
assembly language, beginnings, 26
associative mapping
  fully, 224, 225
  n-way set, 226–230, 227
Athlon processor (AMD), 72, 110
average completion rate, 52–53
average instruction throughput, 54
  pipeline stalls and, 56–57

B

back end, 38, 38
  on Core 2 Duo, 258–270, 260
  on Pentium, 87–91
    floating-point ALUs, 88–91
    integer ALUs, 87–88
  on Pentium II, 108
  on Pentium III, 110
  on Pentium M, 246
  on Pentium Pro, 94–100, 102–103, 103
  on PowerPC 603 and 603e, 119–121
  on PowerPC 604, 123–126
  on PowerPC 970, 200–203
backward branch, 30
bandwidth, and cache block size, 232
Banias, 236
base-10 numbering system, 183–184
base address, 16, 17
BEU (branch execution unit), 69, 85
  on PowerPC 601, 116
BHT. See branch history table (BHT)
binary code, 21
  for arithmetic instructions, 21–22
binary notation, 20
BIOS, 34
blocks and block frames, for caches, 223, 223–224
  sizes of, 231–232
Boolean operations, 67
bootloader program, 34
bootstrap, 34
boot up, 34
BPU (branch prediction unit), 85
branch check stage, in Pentium 4 pipeline, 158
branch execution unit (BEU), 69, 85
  on PowerPC 601, 116
branch folding, 113
  by Pentium 4 trace cache, 153
branch hazards, 78
branch history table (BHT), 86
  on PowerPC 604, 125
  on PowerPC 750, 132
  on PowerPC 970, 196
branch instructions, 11, 30–34
  and fetch-execute loop, 32
  and labels, 33–34
  on PowerPC 750, 130–132
  on PowerPC 970, 198
  register-relative address with, 33
  as special type of load, 32–33
  in superscalar systems, 64

branch prediction, 78
  on Pentium, 84, 85–87
  on Pentium M, 244–245
  on Pentium Pro, 102
  on PowerPC 603, 122
  on PowerPC 970, 195–196
  for trace cache, 151–152
branch prediction unit (BPU), 85
branch target, 85, 87
branch target address cache (BTAC), on PowerPC 604, 125
branch target buffer (BTB), 86
  P6 pipeline stage to access, 101
  on PowerPC 750, 132
branch target instruction cache (BTIC), on PowerPC 750, 132
branch unit
  issue queue for, 210–211
  on Pentium, 85–87
  on PowerPC 604, 125
brand names for Intel processors, 236–237
BTAC (branch target address cache), on PowerPC 604, 125
BTB. See branch target buffer (BTB)
BTIC (branch target instruction cache), on PowerPC 750, 132
bubbles in pipeline, 54–55, 55, 78
  avoiding on PowerPC 970, 196
  insertion by PowerPC 970 front end, 198
  for Pentium 4, 164
  and pipeline depth, 143
  squeezing out, 97–98
buffers. See also branch target buffer (BTB); reorder buffer (ROB)
  for completion phase, 98
  dynamic scheduling with, 97
  fetch, on Intel Core Duo, 239
  front-end branch target, 87, 147
  hardware loop, 240
  issue, 96
  memory reorder, 265, 268
  between Pentium Pro front end and execution units, 96
bus, 5, 204
Busicom, 62
business applications, spatial locality of code for, 221–222

C

cache, 81–82
  basics, 215–219
  blocks and block frames for, 219, 223, 223–224
  hierarchy, 218, 219
    byte’s journey through, 218–219
  hit, 81, 219
  level 1, 81, 217–218
  level 2, 81, 218
  level 3, 81
  line, 219
  locality of reference and, 220–223
  memory, 81
  miss, 81, 217
    capacity, 231
    compulsory, 219
    conflict, 226
  placement formula, 229
  placement policy, 224
  on PowerPC 970, 194–195
  replacement policy, 230–232
  tag RAM for, 224
  write policies for, 232–233
caching, instruction, 78
calculator model of computing, 2, 2–3
capacity miss, 231
C code, 21
Cedar Mill, 236
central processing unit (CPU), 1. See also microprocessor
channels, 1
chip multiprocessing, 249, 249–250, 250
chipset, 204, 204–205
CISC (complex instruction set computing), 73, 105
CIU (complex integer unit), 68, 87
  on PowerPC 750, 130

clock, 29–30
  cycle
    CPU vs. memory and bus, 216
    instruction completion per, 53–54
    in pipelined processor, 47
  generator module, 29–30
  period, and completion rate, 58–60
  speed
    and dynamic power density, 237
completion queue, on PowerPC 603, 122
completion rate
  and clock period, 58–60
  and program execution time, 51–52
complex instruction set computing (CISC), 73, 105
complex integer instructions, 201
complex integer unit (CIU), 68, 87
  on PowerPC 750, 130
complexity, moving from hardware
