Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Author: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors

Stages 3 and 4: Trace Cache Fetch ........................................................... 155

Stage 5: Drive ........................................................................................ 155

Stages 6 Through 8: Allocate and Rename (ROB) ....................................... 155

Stage 9: Queue ..................................................................................... 156

Stages 10 Through 12: Schedule .............................................................. 156

Stages 13 and 14: Issue ......................................................................... 157

Contents in Detail

Stages 15 and 16: Register Files .............................................................. 158

Stage 17: Execute ................................................................................... 158

Stage 18: Flags ..................................................................................... 158

Stage 19: Branch Check ......................................................................... 158

Stage 20: Drive ..................................................................................... 158

Stages 21 and Onward: Complete and Commit ......................................... 158

The Pentium 4’s Instruction Window ....................................................................... 159

8

INTEL’S PENTIUM 4 VS. MOTOROLA’S G4E: THE BACK END

161

Some Remarks About Operand Formats .................................................................. 161

The Integer Execution Units .................................................................................... 163

The G4e’s IUs: Making the Common Case Fast ........................................... 163

The Pentium 4’s IUs: Make the Common Case Twice as Fast ......................... 164

The Floating-Point Units (FPUs) ................................................................................ 165

The G4e’s FPU ........................................................................................ 166

The Pentium 4’s FPU ............................................................................... 167

Concluding Remarks on the G4e’s and Pentium 4’s FPUs ............................. 168

The Vector Execution Units .................................................................................... 168

A Brief Overview of Vector Computing ...................................................... 168

Vectors Revisited: The AltiVec Instruction Set ............................................... 169

AltiVec Vector Operations ........................................................................ 170

The G4e’s VU: SIMD Done Right .............................................................. 173

Intel’s MMX ............................................................................................ 174

SSE and SSE2 ........................................................................................ 175

The Pentium 4’s Vector Unit: Alphabet Soup Done Quickly .......................... 176

Increasing Floating-Point Performance with SSE2 ........................................ 177

Conclusions ......................................................................................................... 177

9

64-BIT COMPUTING AND X86-64

179

Intel’s IA-64 and AMD’s x86-64 ............................................................................. 180

Why 64 Bits? ...................................................................................................... 181

What Is 64-Bit Computing? .................................................................................... 181

Current 64-Bit Applications .................................................................................... 183

Dynamic Range ...................................................................................... 183

The Benefits of Increased Dynamic Range, or, How the Existing 64-Bit Computing Market Uses 64-Bit Integers .............. 184

Virtual Address Space vs. Physical Address Space ...................................... 185

The Benefits of a 64-Bit Address ................................................................ 186

The 64-Bit Alternative: x86-64 ............................................................................... 187

Extended Registers .................................................................................. 187

More Registers ........................................................................................ 188

Switching Modes .................................................................................... 189

Out with the Old ..................................................................................... 192

Conclusion .......................................................................................................... 192

10

THE G5: IBM’S POWERPC 970

193

Overview: Design Philosophy ................................................................................ 194

Caches and Front End .......................................................................................... 194

Branch Prediction ................................................................................................. 195

The Trade-Off: Decode, Cracking, and Group Formation .......................................... 196

The 970’s Dispatch Rules ......................................................................... 198

Predecoding and Group Dispatch ............................................................. 199

Some Preliminary Conclusions on the 970’s Group Dispatch Scheme ............ 199

The PowerPC 970’s Back End ................................................................................ 200

Integer Unit, Condition Register Unit, and Branch Unit ................................. 201

The Integer Units Are Not Fully Symmetric ................................................. 201

Integer Unit Latencies and Throughput ....................................................... 202

The CRU ................................................................................................ 202

Preliminary Conclusions About the 970’s Integer Performance ...................... 203

Load-Store Units .................................................................................................... 203

Front-Side Bus ..................................................................................................... 204

The Floating-Point Units ......................................................................................... 205

Vector Computing on the PowerPC 970 .................................................................. 206

Floating-Point Issue Queues ................................................................................... 209

Integer and Load-Store Issue Queues ......................................................... 210

BU and CRU Issue Queues ....................................................................... 210

Vector Issue Queues ................................................................................ 211

The Performance Implications of the 970’s Group Dispatch Scheme ........................... 211

Conclusions ......................................................................................................... 213

11

UNDERSTANDING CACHING AND PERFORMANCE

215

Caching Basics .................................................................................................... 215

The Level 1 Cache ................................................................................... 217

The Level 2 Cache ................................................................................... 218

Example: A Byte’s Brief Journey Through the Memory Hierarchy ................... 218

Cache Misses ......................................................................................... 219

Locality of Reference ............................................................................................. 220

Spatial Locality of Data ............................................................................ 220

Spatial Locality of Code ........................................................................... 221

Temporal Locality of Code and Data ......................................................... 222

Locality: Conclusions ............................................................................... 222

Cache Organization: Blocks and Block Frames ........................................................ 223

Tag RAM ............................................................................................................ 224

Fully Associative Mapping ..................................................................................... 224

Direct Mapping .................................................................................................... 225

N-Way Set Associative Mapping ........................................................................... 226

Four-Way Set Associative Mapping ........................................................... 226

Two-Way Set Associative Mapping ........................................................... 228

Two-Way vs. Direct-Mapped .................................................................... 229

Two-Way vs. Four-Way ........................................................................... 229

Associativity: Conclusions ........................................................................ 229

Temporal and Spatial Locality Revisited: Replacement/Eviction Policies and Block Sizes ................................................... 230

Types of Replacement/Eviction Policies ...................................................... 230

Block Sizes ............................................................................................. 231

Write Policies: Write-Through vs. Write-Back ........................................................... 232

Conclusions ......................................................................................................... 233

12

INTEL’S PENTIUM M, CORE DUO, AND CORE 2 DUO

235

Code Names and Brand Names ............................................................................ 236

The Rise of Power-Efficient Computing ..................................................................... 237

Power Density ...................................................................................................... 237

Dynamic Power Density ............................................................................ 237

Static Power Density ................................................................................. 238

The Pentium M ..................................................................................................... 239

The Fetch Phase ...................................................................................... 239

The Decode Phase: Micro-ops Fusion ......................................................... 240

Branch Prediction .................................................................................... 244

The Stack Execution Unit .......................................................................... 246

Pipeline and Back End ............................................................................. 246

Summary: The Pentium M in Historical Context ........................................... 246

Core Duo/Solo .................................................................................................... 247

Intel’s Line Goes Multi-Core ...................................................................... 247

Core Duo’s Improvements ......................................................................... 251

Summary: Core Duo in Historical Context ................................................... 254

Core 2 Duo ......................................................................................................... 254

The Fetch Phase ...................................................................................... 256

The Decode Phase ................................................................................... 257

Core’s Pipeline ....................................................................................... 258

Core’s Back End .................................................................................................. 258
