Chapter 9: 64-Bit Computing and x86-64 ........................................................ 179
Chapter 10: The G5: IBM’s PowerPC 970 .......................................................... 193
Chapter 11: Understanding Caching and Performance ........................................... 215
Chapter 12: Intel’s Pentium M, Core Duo, and Core 2 Duo .................................... 235
Bibliography and Suggested Reading ........................................................................... 271
Index ........................................................................................................................ 275
CONTENTS IN DETAIL
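PREFACE .......................................................................................................... xv
ACKNOWLEDGMENTS ....................................................................................... xvii
INTRODUCTION ................................................................................................. xix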
1 BASIC COMPUTING CONCEPTS ....................................................................... 1
The Calculator Model of Computing ........................................................................... 2
The File-Clerk Model of Computing ............................................................................. 3
The Stored-Program Computer ...................................................................... 4
Refining the File-Clerk Model ........................................................................ 6
The Register File ....................................................................................................... 7
RAM: When Registers Alone Won’t Cut It ................................................................... 8
The File-Clerk Model Revisited and Expanded ................................................. 9
An Example: Adding Two Numbers ............................................................. 10
A Closer Look at the Code Stream: The Program ........................................................ 11
General Instruction Types ........................................................................... 11
The DLW-1’s Basic Architecture and Arithmetic Instruction Format .................... 12
A Closer Look at Memory Accesses: Register vs. Immediate ......................................... 14
Immediate Values ...................................................................................... 14
Register-Relative Addressing ....................................................................... 16
2 THE MECHANICS OF PROGRAM EXECUTION .................................................. 19
Opcodes and Machine Language ............................................................................ 19
Machine Language on the DLW-1 ............................................................... 20
Binary Encoding of Arithmetic Instructions .................................................... 21
Binary Encoding of Memory Access Instructions ............................................ 23
Translating an Example Program into Machine Language ............................... 25
The Programming Model and the ISA ....................................................................... 26
The Programming Model ............................................................................ 26
The Instruction Register and Program Counter ............................................... 26
The Instruction Fetch: Loading the Instruction Register ..................................... 28
Running a Simple Program: The Fetch-Execute Loop ....................................... 28
The Clock .............................................................................................................. 29
Branch Instructions .................................................................................................. 30
Unconditional Branch ................................................................................ 30
Conditional Branch ................................................................................... 30
Excursus: Booting Up .............................................................................................. 34
3 PIPELINED EXECUTION .................................................................................. 35
The Lifecycle of an Instruction ................................................................................... 36
Basic Instruction Flow .............................................................................................. 38
Pipelining Explained ............................................................................................... 40
Applying the Analogy ............................................................................................. 43
A Non-Pipelined Processor ......................................................................... 43
A Pipelined Processor ................................................................................ 45
The Speedup from Pipelining ...................................................................... 48
Program Execution Time and Completion Rate .............................................. 51
The Relationship Between Completion Rate and Program Execution Time ......... 52
Instruction Throughput and Pipeline Stalls ..................................................... 53
Instruction Latency and Pipeline Stalls .......................................................... 57
Limits to Pipelining ..................................................................................... 58
4 SUPERSCALAR EXECUTION ............................................................................. 61
Superscalar Computing and IPC .............................................................................. 64
Expanding Superscalar Processing with Execution Units .............................................. 65
Basic Number Formats and Computer Arithmetic ........................................... 66
Arithmetic Logic Units ................................................................................ 67
Memory-Access Units ................................................................................. 69
Microarchitecture and the ISA .................................................................................. 69
A Brief History of the ISA ........................................................................... 71
Moving Complexity from Hardware to Software ............................................ 73
Challenges to Pipelining and Superscalar Design ....................................................... 74
Data Hazards ........................................................................................... 74
Structural Hazards ..................................................................................... 76
The Register File ....................................................................................... 77
Control Hazards ....................................................................................... 78
5 THE INTEL PENTIUM AND PENTIUM PRO ........................................................ 79
The Original Pentium .............................................................................................. 80
Caches .................................................................................................... 81
The Pentium’s Pipeline ................................................................................ 82
The Branch Unit and Branch Prediction ........................................................ 85
The Pentium’s Back End .............................................................................. 87
x86 Overhead on the Pentium ........................................................................ 91
Summary: The Pentium in Historical Context ................................................. 92
The Intel P6 Microarchitecture: The Pentium Pro .......................................................... 93
Decoupling the Front End from the Back End ................................................. 94
The P6 Pipeline ....................................................................................... 100
Branch Prediction on the P6 ...................................................................... 102
The P6 Back End ..................................................................................... 102
CISC, RISC, and Instruction Set Translation ................................................. 103
The P6 Microarchitecture’s Instruction Decoding Unit ................................... 106
The Cost of x86 Legacy Support on the P6 ...................................................... 107
Summary: The P6 Microarchitecture in Historical Context ............................. 107
Conclusion .......................................................................................................... 110
6 POWERPC PROCESSORS: 600 SERIES, 700 SERIES, AND 7400 ............................ 111
A Brief History of PowerPC .................................................................................... 112
The PowerPC 601 ................................................................................................ 112
The 601’s Pipeline and Front End .............................................................. 113
The 601’s Back End ................................................................................ 115
Latency and Throughput Revisited .............................................................. 117
Summary: The 601 in Historical Context .................................................... 118
The PowerPC 603 and 603e ................................................................................. 118
The 603e’s Back End ............................................................................... 119
The 603e’s Front End, Instruction Window, and Branch Prediction ................ 122
Summary: The 603 and 603e in Historical Context ..................................... 122
The PowerPC 604 ................................................................................................ 123
The 604’s Pipeline and Back End .............................................................. 123
The 604’s Front End and Instruction Window ............................................. 126
Summary: The 604 in Historical Context .................................................... 129
The PowerPC 604e .............................................................................................. 129
The PowerPC 750 (aka the G3) ............................................................................. 129
The 750’s Front End, Instruction Window, and Branch Instruction .................. 130
Summary: The PowerPC 750 in Historical Context ....................................... 132
The PowerPC 7400 (aka the G4) ........................................................................... 133
The G4’s Vector Unit ............................................................................... 135
Summary: The PowerPC G4 in Historical Context ........................................ 135
Conclusion .......................................................................................................... 135
7 INTEL’S PENTIUM 4 VS. MOTOROLA’S G4E: APPROACHES AND DESIGN PHILOSOPHIES ..... 137
The Pentium 4’s Speed Addiction ........................................................................... 138
The General Approaches and Design Philosophies of the Pentium 4 and G4e ............. 141
An Overview of the G4e’s Architecture and Pipeline ............................................... 144
Stages 1 and 2: Instruction Fetch ............................................................... 145
Stage 3: Decode/Dispatch ....................................................................... 145
Stage 4: Issue ......................................................................................... 146
Stage 5: Execute ..................................................................................... 146
Stages 6 and 7: Complete and Write-Back ................................................. 147
Branch Prediction on the G4e and Pentium 4 ........................................................... 147
An Overview of the Pentium 4’s Architecture ........................................................... 148
Expanding the Instruction Window ............................................................ 149
The Trace Cache ..................................................................................... 149
An Overview of the Pentium 4’s Pipeline ................................................................. 155
Stages 1 and 2: Trace Cache Next Instruction Pointer .................................. 155