Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
Stages 3 and 4: Trace Cache Fetch
Stages 6 Through 8: Allocate and Rename (ROB)
Stages 10 Through 12: Schedule
Contents in Detail
Stages 15 and 16: Register Files
Stage 19: Branch Check
Stages 21 and Onward: Complete and Commit
INTEL’S PENTIUM 4 VS. MOTOROLA’S G4E:
The G4e’s IUs: Making the Common Case Fast
The Pentium 4’s IUs: Make the Common Case Twice as Fast
Concluding Remarks on the G4e’s and Pentium 4’s FPUs
A Brief Overview of Vector Computing
Vectors Revisited: The AltiVec Instruction Set
The G4e’s VU: SIMD Done Right
The Pentium 4’s Vector Unit: Alphabet Soup Done Quickly
Increasing Floating-Point Performance with SSE2
Intel’s IA-64 and AMD’s x86-64
The Benefits of Increased Dynamic Range, or, How the Existing 64-Bit Computing Market Uses 64-Bit Integers
Virtual Address Space vs. Physical Address Space
The 64-Bit Alternative: x86-64
The Trade-Off: Decode, Cracking, and Group Formation
Predecoding and Group Dispatch
Some Preliminary Conclusions on the 970’s Group Dispatch Scheme
Integer Unit, Condition Register Unit, and Branch Unit
The Integer Units Are Not Fully Symmetric
Integer Unit Latencies and Throughput
Preliminary Conclusions About the 970’s Integer Performance
Integer and Load-Store Issue Queues
BU and CRU Issue Queues
The Performance Implications of the 970’s Group Dispatch Scheme
UNDERSTANDING CACHING AND PERFORMANCE
Example: A Byte’s Brief Journey Through the Memory Hierarchy
Temporal Locality of Code and Data
N-Way Set Associative Mapping
Four-Way Set Associative Mapping
Two-Way Set Associative Mapping
Two-Way vs. Direct-Mapped
Two-Way vs. Four-Way
Temporal and Spatial Locality Revisited: Replacement/Eviction Policies and
Types of Replacement/Eviction Policies
INTEL’S PENTIUM M, CORE DUO, AND CORE 2 DUO
Dynamic Power Density
Static Power Density
The Decode Phase: Micro-ops Fusion
Summary: The Pentium M in Historical Context
Summary: Core Duo in Historical Context