Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture


Author: Jon Stokes


application domain where large amounts of memory come in handy is in simulation and modeling. Under this heading you could put various CAD tools and 3D rendering programs, as well as things like weather and scientific simulations, and even, as I’ve already half-jokingly referred to, real-time 3D games. Though the current crop of 3D games (as of 2006) probably wouldn’t benefit from greater than 4GB of address space, it’s certain that you’ll see a game that benefits from greater than 4GB of address space within the next few years.

There is one drawback to the increase in memory space that 64-bit addressing affords. Because memory address values (or pointers, in programmer lingo) are now twice as large, they take up twice as much cache space. Pointers normally make up only a fraction of all the data in the cache, but when that fraction doubles, it can squeeze other useful data out of the cache and degrade performance.
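To put a number on that, here is a minimal C sketch (mine, not the book’s; the list_node struct and what it prints are purely illustrative) showing how a pointer-heavy structure grows when pointers go from 32 to 64 bits.

    #include <stdio.h>

    /* A doubly linked list node: two pointers plus a small payload. */
    struct list_node {
        struct list_node *next;   /* 4 bytes on a 32-bit target, 8 on a 64-bit one */
        struct list_node *prev;
        int               value;
    };

    int main(void)
    {
        /* Built with "gcc -m32" this typically prints 4 and 12; built with
         * "gcc -m64" it typically prints 8 and 24 (the extra bytes beyond the
         * two pointers come from alignment padding). Same data, roughly twice
         * the cache footprint. */
        printf("sizeof(void *)           = %zu\n", sizeof(void *));
        printf("sizeof(struct list_node) = %zu\n", sizeof(struct list_node));
        return 0;
    }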

NOTE
Some of you who read the preceding discussion would no doubt point out that 32-bit Xeon systems are available with more than 4GB of RAM. Furthermore, Intel allegedly has a fairly simple hack that it could implement to allow its 32-bit systems to address up to 512GB of memory. Still, the cleanest and most future-proof way to address the 4GB ceiling is a 64-bit pointer.

The 64-Bit Alternative: x86-64

When AMD set out to alter the x86 ISA in order to bring it into the world of 64-bit computing, they took the opportunity to do more than just widen the GPRs. x86-64 makes a number of improvements to x86, and this section looks at some of them.

Extended Registers

I don’t want to get into a historical discussion of the evolution of what eventually became the modern x86 ISA, as Intel’s hardware went from 4-bit to 8-bit to 16-bit to 32-bit. You can find such discussions elsewhere, if you’re interested. I’ll only point out that what we now consider to be the “x86 ISA” was first introduced in 1978 with the release of the 8086. The 8086 had four 16-bit integer registers and four 16-bit registers that were intended to hold memory addresses but could also be used as integer registers. (The four integer registers, though, could not be used to store memory addresses in 16-bit addressing mode.) This gave the 8086 a total of eight integer registers, four of which could also be used to store addresses.

With the release of the 386, Intel extended the x86 ISA to support 32-bit integers by doubling the size of the original eight 16-bit registers. In order to access the extended portion of these registers, assembly language programmers used a different set of register mnemonics.

With x86-64, AMD has done pretty much the same thing that Intel did to enable the 16-bit to 32-bit transition—it has doubled the sizes of the eight GPRs and assigned new mnemonics to the extended registers. However, extending the existing eight GPRs isn’t the only change AMD made to the x86 register model.


More Registers

One of the oldest and longest-running gripes about x86 is that the programming model has only eight GPRs, eight FPRs, and eight SIMD registers. All newer RISC ISAs support many more architectural registers; the PowerPC ISA, for instance, specifies 32 of each type of register. Increasing the number of registers allows the processor to keep more data where the execution units can access it immediately; this translates into a reduced number of loads and stores, which means less memory subsystem traffic and less waiting for data to load. More registers also give the compiler or programmer more flexibility to schedule instructions so that dependencies are reduced and pipeline bubbles are kept to a minimum.
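As a rough illustration (my sketch, not the book’s; sum_of_products is a made-up example), the C loop below keeps eight accumulators plus two pointers and two counters live at once. Compiled for 32-bit x86 (for example, gcc -m32 -O2), the handful of architectural GPRs typically forces some of those values to spill to the stack; compiled for x86-64 (gcc -m64 -O2), the 16 GPRs can usually hold them all, eliminating those extra loads and stores.

    /* Eight accumulators plus two pointers and two counters are live at once:
     * more values than the eight (really six or seven usable) GPRs of 32-bit
     * x86 can hold, but comfortably within x86-64's sixteen.
     * (Tail elements when n is not a multiple of 8 are ignored for brevity.) */
    int sum_of_products(const int *a, const int *b, int n)
    {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0;
        for (int i = 0; i + 8 <= n; i += 8) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
            s4 += a[i + 4] * b[i + 4];
            s5 += a[i + 5] * b[i + 5];
            s6 += a[i + 6] * b[i + 6];
            s7 += a[i + 7] * b[i + 7];
        }
        return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
    }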

Modern x86 CPUs get around some of these limitations by means of a trick called register renaming, described in Chapter 4. Register renaming involves putting extra, “hidden,” internal registers onto the die and then dynamically mapping the programmer-visible registers to these internal, machine-visible registers. The Pentium 4, for instance, has 128 of these microarchitectural rename registers, which allow it to store more data closer to the ALUs and reduce false dependencies.

In spite of the benefits of register renaming, it would still be nicer to have more registers directly accessible to the programmer via the x86 ISA. This would allow a compiler or an assembly language programmer more flexibility and control to statically optimize the code. It would also allow a decrease in the number of memory access instructions (loads and stores).

In extending x86 to 64 bits, AMD has also taken the opportunity to double the number of programmer-visible GPRs and SIMD registers. When running in 64-bit mode, x86-64 programmers have access to eight additional GPRs, for a total of 16 GPRs. Furthermore, there are eight new SIMD registers, added for use in SSE/SSE2 code. So the number of GPRs and SIMD registers available to x86-64 programmers has gone from eight each to 16 each. Take a look at Figure 9-3, which contains a diagram from AMD that shows the new programming model.

Notice that they’ve left the x87 floating-point stack alone. This is because both Intel and AMD are encouraging programmers to use SSE/SSE2 for floating-point code, instead of x87. I’ve discussed the reason for this before, so I won’t recap it here.

Also notice that the PC is extended. This was done because the PC holds the address of the next instruction, and since addresses are now 64-bit, the PC must be widened to accommodate them.


Figure 9-3: The x86-64 programming model (32-bit x86 versus x86-64: the PC, the GPRs, the 80-bit x87/MMX registers, and the 128-bit SSE/SSE2 registers)

Switching Modes

Full binary compatibility with existing x86 code, both 32-bit and older 16-bit flavors, is one of x86-64’s greatest strengths. x86-64 accomplishes this using a nested series of modes. The first and least interesting mode is legacy mode.

When in legacy mode, the processor functions exactly like a standard x86 CPU—it runs a 32-bit operating system and 32-bit code exclusively, and none of x86-64’s added capabilities are turned on. Figure 9-4 illustrates how legacy mode works.


Figure 9-4: x86-64 legacy mode (a 32-bit OS running x86 applications)

In short, the Hammer in legacy mode looks like just another x86 processor.

It’s in the 64-bit long mode that things start to get interesting. To run application software in long mode, you need a 64-bit operating system. Long mode provides two submodes—64-bit mode and compatibility mode—in which the OS can run either x86-64 or vanilla x86 code. Figure 9-5 should help you visualize how long mode works. (In this figure, x86 Apps includes both 32-bit and 16-bit x86 applications.)

Figure 9-5: x86-64 long mode (a 64-bit OS running x86-64 applications in 64-bit mode and x86 applications in compatibility mode)

So, legacy x86 code (both 32-bit and 16-bit) runs under a 64-bit OS in compatibility mode, and x86-64 code runs under a 64-bit OS in 64-bit mode.

Only code running in long mode’s 64-bit submode can take advantage of all the new features of x86-64. Legacy x86 code running in long mode’s compatibility submode, for example, cannot see the extended parts of the registers, cannot use the eight extra registers, and is limited to the first 4GB of memory.
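As an aside, software can ask the processor whether long mode is available at all by executing the CPUID instruction; the long mode (LM) flag is bit 29 of EDX for extended leaf 0x80000001. Here is a minimal C sketch (mine, not the book’s), assuming GCC or Clang for the <cpuid.h> helper.

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
        if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (edx & (1u << 29)))
            printf("This CPU implements x86-64 long mode.\n");
        else
            printf("This CPU is a 32-bit x86 part (legacy mode only).\n");

        return 0;
    }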

These modes are set on a per-segment basis by means of two bits in each segment’s code segment descriptor. The chip examines these two bits so that it knows whether to treat a particular chunk of code as 32-bit or 64-bit. Table 9-1 (from AMD) shows the relevant features of each mode.
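For the curious, here is a hedged sketch (mine, not the book’s; decode_code_segment is a hypothetical helper) of how those two bits decode: in the AMD64 descriptor format, the L (long) bit is bit 53 of the eight-byte code segment descriptor and the D (default size) bit is bit 54. Table 9-1 then summarizes what each mode looks like in practice.

    #include <stdint.h>

    typedef enum { CODE_16BIT, CODE_32BIT, CODE_64BIT, CODE_RESERVED } code_kind;

    /* Decode the two mode-selecting bits of a code segment descriptor.
     * L=1, D=0 marks a 64-bit code segment; L=0 leaves D to pick a 16- or
     * 32-bit default, just as on a plain x86. L=1, D=1 is reserved. */
    code_kind decode_code_segment(uint64_t descriptor)
    {
        int l = (int)((descriptor >> 53) & 1);
        int d = (int)((descriptor >> 54) & 1);

        if (l)
            return d ? CODE_RESERVED : CODE_64BIT;
        return d ? CODE_32BIT : CODE_16BIT;
    }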

Table 9-1: x86-64 Modes

  Mode                    Operating System   Application   Default       Default       Register       GPR
                          Required           Recompile     Address Size  Operand Size  Extensions(2)  Width
                                             Required      (Bits)(1)     (Bits)(1)                    (Bits)
  ----------------------  -----------------  ------------  ------------  ------------  -------------  --------
  Long mode(3):
    64-bit mode           New 64-bit OS      Yes           64            32            Yes            64
    Compatibility mode    New 64-bit OS      No            32 or 16      32 or 16      No             32
  Legacy mode(4)          Legacy 32-bit      No            32 or 16      32 or 16      No             32 or 16
                          or 16-bit OS

  (1) Defaults can be overridden in most modes using an instruction prefix or system control bit.
  (2) Register extensions include eight new GPRs and eight new XMM registers (also called SSE registers).
  (3) Long mode supports only x86 protected mode. It does not support x86 real mode or virtual-8086 mode. Also, it does not support task switching.
  (4) Legacy mode supports x86 real mode, virtual-8086 mode, and protected mode.

Notice that Table 9-1 specifies 64-bit mode’s default integer size as 32 bits. Let me explain.

We’ve already discussed how only the integer and address operations are really affected by the shift to 64 bits, so it makes sense that only those instructions would be affected by the change. If all the addresses are now 64-bit, there’s no need to change anything about the address instructions apart from their default pointer size. If a load in 32-bit legacy mode takes a 32-bit address pointer, then a load in 64-bit mode takes a 64-bit address pointer.

Integer instructions, on the other hand, are a different matter. You don’t always need to use 64-bit integers, and there’s no need to take up cache space and memory bandwidth with 64-bit integers if your application needs only smaller 32- or 16-bit ones. So it’s not in the programmer’s best interest to have the default integer size be 64 bits. Hence, the default data size for integer instructions is 32 bits, and if you want to use a larger or smaller integer, you must add an optional prefix to the instruction that overrides the default. This prefix, which AMD calls the REX prefix (presumably for register extension), is one byte in length. This means that 64-bit instructions are one byte longer, a fact that makes for slightly increased code sizes.
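To see where that extra byte comes from, here is a small C sketch (mine, not the book’s; add32 and add64 are made-up names). The byte sequences in the comment are the standard encodings of a register-to-register ADD with and without the REX.W prefix.

    #include <stdint.h>

    /*
     *   add eax, ebx   ->  01 D8      (2 bytes: 32-bit operand size, the default)
     *   add rax, rbx   ->  48 01 D8   (3 bytes: REX.W = 0x48 selects 64-bit size)
     *
     * A REX prefix is also what grants access to the new registers r8 through
     * r15, so even some 32-bit operations pay the extra byte when they use them.
     */
    uint32_t add32(uint32_t a, uint32_t b) { return a + b; } /* no REX.W needed */
    uint64_t add64(uint64_t a, uint64_t b) { return a + b; } /* 64-bit adds carry REX.W */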

Increased code size is bad, because bigger code takes up more cache and more bandwidth. However, the effect of this prefix scheme on real-world code size depends on the number of 64-bit integer instructions in a program’s instruction mix. AMD estimates that the average increase in code size from x86 code to equivalent x86-64 code is less than 10 percent, mostly due to the prefixes.

It’s essential to AMD’s plans for x86-64 that there be no performance penalty for running in legacy or compatibility mode versus long mode. The two backward-compatibility modes don’t give you the performance-enhancing benefits of x86-64 (specifically, more registers), but they don’t incur any added overhead, either. A legacy 32-bit program simply ignores x86-64’s added features, so they don’t affect it one way or the other.

Out with the Old

In addition to beefing up the x86 ISA by increasing the number and sizes of its registers, x86-64 also slims it down by kicking out some of the older and less frequently used features that have been kept thus far in the name of backward compatibility.

When AMD’s engineers started looking for legacy x86 features to jettison, the first thing to go was the segmented memory model. Programs written to the x86-64 ISA use a flat, 64-bit virtual address space. Furthermore, legacy x86 applications running in long mode’s compatibility submode must run in protected mode. Support for real mode and virtual-8086 mode is absent in long mode and available only in legacy mode. This isn’t too much of a hassle,
