Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
application domain where large amounts of memory come in handy is in
simulation and modeling. Under this heading you could put various CAD
tools and 3D rendering programs, as well as things like weather and scientific
simulations, and even, as I’ve already half-jokingly referred to, real-time 3D
games. Though the current crop of 3D games (as of 2006) probably wouldn’t
benefit from greater than 4GB of address space, it’s certain that you’ll see a
game that benefits from greater than 4GB of address space within the next
few years.
There is one drawback to the increase in memory space that 64-bit addressing affords. Because memory address values (or pointers, in programmer lingo) are now twice as large, they take up twice as much cache space. Pointers normally make up only a fraction of all the data in the cache, but when that fraction doubles, it can squeeze other useful data out of the cache and degrade performance.
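To put rough numbers on that cache cost, here is a minimal C sketch. It assumes a 64-byte cache line (common, but an assumption here) and a hypothetical binary-tree node that holds two child pointers and a 32-bit key. It counts how many such nodes fit in one line when pointers are 4 bytes and when they are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; real processors vary. */
#define CACHE_LINE_BYTES 64u

/* Size of a hypothetical tree node (two child pointers plus a 32-bit key),
   rounded up to a multiple of the pointer size to mimic typical alignment. */
static size_t node_bytes(size_t ptr_bytes)
{
    assert(ptr_bytes == 4 || ptr_bytes == 8);
    size_t raw = 2 * ptr_bytes + sizeof(uint32_t);
    return (raw + ptr_bytes - 1) / ptr_bytes * ptr_bytes;
}

/* Number of whole nodes that fit in one cache line. */
static size_t nodes_per_line(size_t ptr_bytes)
{
    return CACHE_LINE_BYTES / node_bytes(ptr_bytes);
}
```

With 4-byte pointers a node takes 12 bytes and five fit in a line. With 8-byte pointers it takes 24 bytes and only two fit. On a typical x86-64 compiler, `sizeof(void *)` is 8, so a real `struct { struct node *left, *right; uint32_t key; }` also comes out at 24 bytes.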
NOTE
Some of you who read the preceding discussion would no doubt point out that 32-bit
Xeon systems are available with more than 4GB of RAM. Furthermore, Intel allegedly has
a fairly simple hack that it could implement to allow its 32-bit systems to address up to
512GB of memory. Still, the cleanest and most future-proof way to address the 4GB
ceiling is a 64-bit pointer.
The 64-Bit Alternative: x86-64
When AMD set out to alter the x86 ISA in order to bring it into the world of 64-bit computing, it took the opportunity to do more than just widen the GPRs. x86-64 makes a number of improvements to x86, and this section looks at some of them.
Extended Registers
I don’t want to get into a historical discussion of the evolution of what eventually became the modern x86 ISA, as Intel’s hardware went from 4-bit to 8-bit to 16-bit to 32-bit. You can find such discussions elsewhere, if you’re interested. I’ll only point out that what we now consider to be the “x86 ISA”
was first introduced in 1978 with the release of the 8086. The 8086 had four 16-bit integer registers and four 16-bit registers that were intended to hold memory addresses but also could be used as integer registers. (Of the four integer registers, only BX could also serve as an address register in 16-bit addressing mode.) This gave the 8086 a total of eight integer registers, four of which were designed to hold addresses.
With the release of the 386, Intel extended the x86 ISA to support 32-bit integers by doubling the size of the original eight 16-bit registers. In order to access the extended portion of these registers, assembly language programmers used a different set of register mnemonics (EAX instead of AX, for instance).
With x86-64, AMD has done pretty much the same thing that Intel did to enable the 16-bit to 32-bit transition: it has doubled the sizes of the eight GPRs and assigned new mnemonics to the extended registers. However, extending the existing eight GPRs isn’t the only change AMD made to the x86 register model.
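The naming scheme can be sketched by treating one 64-bit register as a C integer and masking out the narrower views that older code sees. The sketch uses the standard accumulator names (RAX, EAX, AX, AL). The helper functions are hypothetical. They model only which bits each name reads. They deliberately ignore what happens to the upper bits when a narrower name is written, which differs between the 16-bit and 32-bit forms in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Return the low `width` bits of a 64-bit register value, i.e. the part
   that the narrower mnemonic for that width names. */
static uint64_t low_bits(uint64_t reg, unsigned width)
{
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    if (width == 64)
        return reg;                             /* whole register */
    return reg & ((UINT64_C(1) << width) - 1);  /* mask off the high bits */
}

/* The accumulator's historical names, each a view of the same register. */
static uint64_t view_rax(uint64_t r) { return low_bits(r, 64); } /* x86-64 */
static uint64_t view_eax(uint64_t r) { return low_bits(r, 32); } /* 386 */
static uint64_t view_ax(uint64_t r)  { return low_bits(r, 16); } /* 8086 */
static uint64_t view_al(uint64_t r)  { return low_bits(r, 8); }  /* low byte */
```

For a register holding 0x1122334455667788, EAX reads 0x55667788, AX reads 0x7788, and AL reads 0x88.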
More Registers
One of the oldest and longest-running gripes about x86 is that the programming model has only eight GPRs, eight FPRs, and eight SIMD registers. All newer RISC ISAs support many more architectural registers; the PowerPC ISA, for instance, specifies 32 of each type of register. Increasing the number of registers allows the processor to keep more data where the execution units can access it immediately; this translates into a reduced number of loads and stores, which means less memory subsystem traffic and less waiting for data to load. More registers also give the compiler or programmer more flexibility to schedule instructions so that dependencies are reduced and pipeline bubbles are kept to a minimum.
Modern x86 CPUs get around some of these limitations by means of a trick called register renaming, described in Chapter 4. Register renaming involves putting extra, “hidden,” internal registers onto the die and then dynamically mapping the programmer-visible registers to these internal, machine-visible registers. The Pentium 4, for instance, has 128 of these microarchitectural rename registers, which allow it to store more data closer to the ALUs and reduce false dependencies.
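A toy C model makes the renaming idea concrete. Every instruction that writes an architectural register gets a fresh physical register from a free list, and a rename table records the latest mapping. Two back-to-back writes to the same programmer-visible register therefore land in different physical registers. That is how the false (write-after-write and write-after-read) dependencies disappear.

The sizes are borrowed from the text: 8 architectural registers and 128 physical registers in total. The names and the stack-based free list are simplifying assumptions. A real design also reclaims physical registers when instructions retire, and this sketch omits that.

```c
#include <assert.h>

#define ARCH_REGS 8    /* programmer-visible x86 GPRs */
#define PHYS_REGS 128  /* total physical registers (Pentium 4 figure) */

typedef struct {
    int map[ARCH_REGS];       /* architectural reg -> current physical reg */
    int free_list[PHYS_REGS]; /* stack of unallocated physical regs */
    int free_count;
} rename_state;

static void rename_init(rename_state *s)
{
    /* Initially, architectural reg i lives in physical reg i. */
    for (int i = 0; i < ARCH_REGS; i++)
        s->map[i] = i;
    /* Push the rest so that the lowest free number is popped first. */
    s->free_count = 0;
    for (int p = PHYS_REGS - 1; p >= ARCH_REGS; p--)
        s->free_list[s->free_count++] = p;
}

/* A source operand reads whichever physical reg currently holds `arch`. */
static int rename_source(const rename_state *s, int arch)
{
    assert(arch >= 0 && arch < ARCH_REGS);
    return s->map[arch];
}

/* A destination operand gets a brand-new physical reg. Returns -1 when none
   is free (a real core would stall until retirement frees one). */
static int rename_dest(rename_state *s, int arch)
{
    assert(arch >= 0 && arch < ARCH_REGS);
    if (s->free_count == 0)
        return -1;
    int p = s->free_list[--s->free_count];
    s->map[arch] = p;
    return p;
}
```

After `rename_init`, two consecutive `rename_dest(&s, 0)` calls return physical registers 8 and 9. Later readers of register 0 see register 9, so the two writes no longer have to happen in program order.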
In spite of the benefits of register renaming, it would still be nicer to have more registers directly accessible to the programmer via the x86 ISA. This would allow a compiler or an assembly language programmer more flexibility and control to statically optimize the code. It would also allow a decrease in the number of memory access instructions (loads and stores).
In extending x86 to 64 bits, AMD has also taken the opportunity to double the number of programmer-visible GPRs and SIMD registers. When running in 64-bit mode, x86-64 programmers have access to eight additional GPRs, for a total of 16 GPRs. Furthermore, there are eight new SIMD registers, added for use in SSE/SSE2 code. So the number of GPRs and SIMD registers available to x86-64 programmers has gone from eight each to 16 each. Take a look at Figure 9-3, which contains a diagram from AMD that shows the new programming model.
Notice that they’ve left the x87 floating-point stack alone. This is because both Intel and AMD are encouraging programmers to use SSE/SSE2 for floating-point code, instead of x87. I’ve discussed the reason for this before, so I won’t recap it here.
Also notice that the PC is extended. This was done because the PC holds
the address of the next instruction, and since addresses are now 64-bit, the
PC must be widened to accommodate them.
Figure 9-3: The x86-64 programming model (diagram compares the 32-bit x86 and x86-64 register sets: PC, GPRs, x87/MMX, and SSE/SSE2 registers, with widths labeled 64-bit, 80-bit, and 128-bit)
Switching Modes
Full binary compatibility with existing x86 code, both 32-bit and older 16-bit flavors, is one of x86-64’s greatest strengths. x86-64 accomplishes this using a nested series of modes. The first and least interesting mode is legacy mode.
When in legacy mode, the processor functions exactly like a standard x86 CPU: it runs a legacy 32-bit (or 16-bit) operating system and legacy x86 code exclusively, and none of x86-64’s added capabilities are turned on. Figure 9-4 illustrates how legacy mode works.
Figure 9-4: x86-64 legacy mode (diagram: x86 apps running on a 32-bit OS in legacy mode)
In short, the Hammer in legacy mode looks like just another x86 processor.
It’s in the 64-bit long mode that things start to get interesting. To run application software in long mode, you need a 64-bit operating system. Long mode provides two submodes, 64-bit mode and compatibility mode, which let the OS run x86-64 code and vanilla x86 code, respectively. Figure 9-5 should help you visualize how long mode works. (In this figure, x86 Apps includes both 32-bit and 16-bit x86 applications.)
Figure 9-5: x86-64 long mode (diagram: under a 64-bit OS, x86-64 apps run in 64-bit mode and x86 apps run in compatibility mode)
So, legacy x86 code (both 32-bit and 16-bit) runs under a 64-bit OS in compatibility mode, and x86-64 code runs under a 64-bit OS in 64-bit mode.
Only code running in long mode’s 64-bit submode can take advantage of all the new features of x86-64. Legacy x86 code running in long mode’s
compatibility submode, for example, cannot see the extended parts of the
registers, cannot use the eight extra registers, and is limited to the first 4GB
of memory.
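The 4GB ceiling follows directly from pointer width: an n-bit address can name 2^n distinct bytes. The small C sketch below reports that capacity in GB, taken here as 2^30 bytes. Working in GB keeps the 64-bit result inside a 64-bit integer. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with an n-bit pointer, expressed in GB (2^30 bytes):
   2^n / 2^30 = 2^(n - 30). Valid for 30 <= n <= 64, so the result fits. */
static uint64_t addressable_gib(unsigned address_bits)
{
    assert(address_bits >= 30 && address_bits <= 64);
    return UINT64_C(1) << (address_bits - 30);
}
```

`addressable_gib(32)` gives 4, the compatibility-mode limit. `addressable_gib(64)` gives 17,179,869,184 GB (16 exabytes), which is only the theoretical ceiling. Actual x86-64 implementations implement fewer address bits; early ones, for example, used 48 bits of virtual address.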
These modes are set on a per-segment basis by means of two bits in each segment’s code segment descriptor. The chip examines these two bits so that it knows whether to treat a particular chunk of code as 32-bit or 64-bit. Table 9-1 (from AMD) shows the relevant features of each mode.
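The C sketch below shows how the chip might classify a code segment. It reads the two bits AMD defines for this purpose in a code-segment descriptor, treating the descriptor as one 64-bit value: the L (long) bit at bit 53 and the D (default size) bit at bit 54.

The `long_mode_active` flag stands in for the processor's long-mode-active status (the EFER.LMA bit in AMD's documentation). Outside long mode the processor is in legacy mode, and only D matters. The enum and function names are hypothetical, and real mode and virtual-8086 mode, which do not select size this way, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CS_LEGACY_16,  /* legacy mode, 16-bit code segment */
    CS_LEGACY_32,  /* legacy mode, 32-bit code segment */
    CS_COMPAT_16,  /* long mode, compatibility submode, 16-bit code */
    CS_COMPAT_32,  /* long mode, compatibility submode, 32-bit code */
    CS_64BIT,      /* long mode, 64-bit submode */
    CS_RESERVED    /* L = 1 and D = 1: reserved combination */
} cs_mode;

#define DESC_L_BIT 53u /* "long" attribute */
#define DESC_D_BIT 54u /* default operand/address size */

static bool desc_bit(uint64_t descriptor, unsigned bit)
{
    assert(bit < 64);
    return (descriptor >> bit) & 1u;
}

static cs_mode classify_code_segment(uint64_t descriptor, bool long_mode_active)
{
    bool l = desc_bit(descriptor, DESC_L_BIT);
    bool d = desc_bit(descriptor, DESC_D_BIT);

    if (!long_mode_active)            /* legacy mode: only D matters */
        return d ? CS_LEGACY_32 : CS_LEGACY_16;
    if (l && d)
        return CS_RESERVED;
    if (l)
        return CS_64BIT;              /* x86-64 code */
    return d ? CS_COMPAT_32 : CS_COMPAT_16;
}
```

For example, a flat 64-bit code descriptor such as 0x00AF9A000000FFFF has L = 1 and D = 0, so it classifies as 64-bit mode. A flat 32-bit descriptor such as 0x00CF9A000000FFFF has L = 0 and D = 1, so it runs in compatibility mode under a 64-bit OS.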
Table 9-1: x86-64 Modes
Mode | Submode | Operating System Required | Application Recompile Required | Default Address Size (Bits)¹ | Default Operand Size (Bits)¹ | Register Extensions² | GPR Width (Bits)
---|---|---|---|---|---|---|---
Long mode³ | 64-bit mode | New 64-bit OS | Yes | 64 | 32 | Yes | 64
Long mode³ | Compatibility mode | New 64-bit OS | No | 32 or 16 | 32 or 16 | No | 32 or 16
Legacy mode⁴ | n/a | Legacy 32-bit or 16-bit OS | No | 32 or 16 | 32 or 16 | No | 32 or 16
¹ Defaults can be overridden in most modes using an instruction prefix or system control bit.
² Register extensions include eight new GPRs and eight new XMM registers (also called SSE registers).
³ Long mode supports only x86 protected mode. It does not support x86 real mode or virtual-8086 mode. Also, it does not support task switching.
⁴ Legacy mode supports x86 real mode, virtual-8086 mode, and protected mode.
Notice that Table 9-1 specifies 64-bit mode’s default operand (integer) size as 32 bits. Let me explain.
We’ve already discussed how only the integer and address operations are really affected by the shift to 64 bits, so it makes sense that only those instructions would be affected by the change. If all the addresses are now 64-bit, there’s no need to change anything about the address instructions apart from their default pointer size. If a load in 32-bit legacy mode takes a 32-bit address pointer, then a load in 64-bit mode takes a 64-bit address pointer.
Integer instructions, on the other hand, are a different matter. You don’t always need to use 64-bit integers, and there’s no need to take up cache space and memory bandwidth with 64-bit integers if your application needs only smaller 32- or 16-bit ones. So it’s not in the programmer’s best interest to have the default integer size be 64 bits. Hence, the default data size for integer instructions is 32 bits, and if you want to use a different integer size, you must add an optional prefix to the instruction that overrides the default. For 64-bit integers, this is a prefix that AMD calls the REX prefix (presumably for register extension, since the same prefix is also what lets an instruction name the eight new registers); 16-bit operands use x86’s existing operand-size override prefix. The REX prefix is one byte in length. This means that 64-bit integer instructions are one byte longer, a fact that makes for slightly increased code sizes.
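To make the prefix concrete, here is a C sketch that encodes a register-to-register MOV (opcode 89 /r) with and without a REX prefix. A REX byte has the form 0100WRXB. W selects a 64-bit operand, and R and B supply a fourth bit for the ModRM reg and r/m fields, which is how the eight new registers (R8 through R15) are named. The function name and the 0 to 15 register numbering (0 = RAX, 3 = RBX, 8 = R8) are the sketch's own conventions; the byte layout follows the AMD64 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov dst, src" between general-purpose registers numbered 0-15
   (0 = RAX/EAX, 3 = RBX/EBX, 8 = R8/R8D, ...). Writes up to 3 bytes to
   `out` and returns the instruction length. */
static size_t encode_mov_rr(uint8_t *out, unsigned dst, unsigned src, bool is64)
{
    assert(dst < 16 && src < 16);
    size_t n = 0;

    /* REX = 0100WRXB: R extends ModRM.reg (src), B extends ModRM.rm (dst). */
    uint8_t rex = (uint8_t)(0x40 | (is64 ? 0x08 : 0) |
                            ((src >> 3) << 2) | (dst >> 3));
    if (rex != 0x40)      /* prefix needed only for 64-bit or R8-R15 */
        out[n++] = rex;

    out[n++] = 0x89;      /* MOV r/m, reg */
    /* ModRM with mod = 11 (register operand), reg = src, rm = dst. */
    out[n++] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
    return n;
}
```

Under this encoding `mov eax, ebx` is two bytes (89 D8) and `mov rax, rbx` is three (48 89 D8), which is the one-byte cost described above. Naming a new register such as R8 also needs the prefix, so even the 32-bit `mov r8d, ebx` becomes 41 89 D8.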
Increased code size is bad, because bigger code takes up more cache and
more bandwidth. However, the effect of this prefix scheme on real-world code
size depends on the number of 64-bit integer instructions in a program’s
instruction mix. AMD estimates that the average increase in code size from x86 code to equivalent x86-64 code is less than 10 percent, mostly due to the prefixes.
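The relationship is simple arithmetic. If a fraction f of instructions carry the one-byte prefix and instructions average L bytes, code grows by roughly f / L, all else being equal. The C sketch below computes that estimate. The function name is hypothetical, and the inputs in the usage note are made-up illustrative values, not AMD's data.

```c
#include <assert.h>

/* Approximate relative code-size growth when a fraction `prefixed` of
   instructions gain one extra prefix byte and instructions average
   `avg_len` bytes. Returns a fraction (0.05 means 5 percent). */
static double code_growth(double prefixed, double avg_len)
{
    assert(prefixed >= 0.0 && prefixed <= 1.0 && avg_len > 0.0);
    return prefixed * 1.0 / avg_len;  /* one extra byte per prefixed instruction */
}
```

With hypothetical values of 30 percent prefixed instructions and a 3.5-byte average length, `code_growth(0.3, 3.5)` is about 0.086, or roughly 8.6 percent.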
It’s essential to AMD’s plans for x86-64 that there be no performance penalty for running in legacy or compatibility mode versus 64-bit mode. The two backward-compatibility modes don’t give you the performance-enhancing benefits of x86-64 (specifically, more registers), but they don’t incur any added overhead, either. A legacy 32-bit program simply ignores x86-64’s added features, so they don’t affect it one way or the other.
Out with the Old
In addition to beefing up the x86 ISA by increasing the number and sizes of its registers, x86-64 also slims it down by kicking out some of the older and less frequently used features that have been kept thus far in the name of backward compatibility.
When AMD’s engineers started looking for legacy x86 features to jettison, the first thing to go was the segmented memory model. Programs written to the x86-64 ISA use a flat, 64-bit virtual address space. Furthermore, legacy x86 applications running in long mode’s compatibility submode must run in protected mode. Support for real mode and virtual-8086 mode is absent in long mode and available only in legacy mode. This isn’t too much of a hassle,