Authors: Jon Stokes
Tags: #Computers, #Systems Architecture, #General, #Microprocessors
of the program’s immediate address values have to be changed to reflect the
data segment’s actual location in memory.
Because both memory addresses and regular integer numbers are stored in the same registers, these registers are called general-purpose registers (GPRs).
On the DLW-1, A, B, C, and D are all GPRs.
THE MECHANICS OF PROGRAM EXECUTION
Now that we understand the basics of computer organization, it’s time to take a closer look at the nuts and bolts of how stored programs are actually executed by the computer. To that end, this chapter will cover core programming concepts like machine language, the programming model, the instruction set architecture, branch instructions, and the fetch-execute loop.
Opcodes and Machine Language
If you’ve been following the discussion so far, it shouldn’t surprise you to
learn that both memory addresses and instructions are ordinary numbers
that can be stored in memory. All of the instructions in a program like
Program 1-1 are represented inside the computer as strings of numbers.
Indeed, a program is one long string of numbers stored in a series of
memory locations.
How is a program like Program 1-1 rendered in numerical notation so
that it can be stored in memory and executed by the computer? The answer
is simpler than you might think.
As you may already know, a computer actually only understands 1s and 0s (or “high” and “low” electric voltages), not English words like add, load, and store, or letters and base-10 numbers like A, B, 12, and 13. In order for the computer to run a program, therefore, all of its instructions must be rendered in binary notation. Think of translating English words into Morse code’s dots and dashes and you’ll have some idea of what I’m talking about.
Machine Language on the DLW-1
The translation of programs of any complexity into this binary-based machine language is a massive undertaking that’s meant to be done by a computer, but I’ll show you the basics of how it works so you can understand what’s going on. The following example is simplified, but useful nonetheless.
The English words in a program, like add, load, and store, are mnemonics (meaning they’re easy for people to remember), and they’re all mapped to strings of binary numbers, called opcodes, that the computer can understand.
Each opcode designates a different operation that the processor can perform.
Table 2-1 maps each of the mnemonics used in Chapter 1 to a 3-bit opcode
for the hypothetical DLW-1 microprocessor. We can also map the four
register names to 2-bit binary codes, as shown in Table 2-2.
Table 2-1: Mapping of Mnemonics to Opcodes for the DLW-1
Mnemonic    Opcode
add         000
sub         001
load        010
store       011
Table 2-2: Mapping of Registers to Binary Codes for the DLW-1
Register    Binary Code
A           00
B           01
C           10
D           11
The binary values representing both the opcodes and the register codes are arranged in one of a number of 16-bit (or 2-byte) formats to get a complete machine language instruction, which is a binary number that can be stored in RAM and used by the processor.
NOTE
Because programmer-written instructions must be translated into binary codes before a computer can read them, programs in any format (binary, assembly, or a high-level language like BASIC or C) are commonly referred to generically as “code” or “codes.” So programmers sometimes speak of “assembler code,” “binary code,” or “C code” when referring to programs written in assembly, binary, or C language. Programmers also often describe the act of programming as “writing code” or “coding.” I have adopted this terminology in this book, and will henceforth use the term “code” regularly to refer generically to instruction sequences and programs.
Binary Encoding of Arithmetic Instructions
Arithmetic instructions have the simplest machine language instruction formats, so we’ll start with them. Figure 2-1 shows the format for the machine language encoding of a register-type arithmetic instruction.
[Byte 1 (bits 0–7): mode | opcode | source1 | source2    Byte 2 (bits 8–15): destination | 000000]
Figure 2-1: Machine language format for a register-type instruction
In a register-type arithmetic instruction (that is, an arithmetic instruction that uses only registers and no immediate values), the first bit of the instruction is the mode bit. If the mode bit is set to 0, then the instruction is a register-type instruction; if it’s set to 1, then the instruction is of the immediate type.
Bits 1–3 of the instruction specify the opcode, which tells the computer what type of operation the instruction represents. Bits 4–5 specify the instruction’s first source register, 6–7 specify the second source register, and 8–9 specify the destination register. The last six bits are not needed by register-to-register arithmetic instructions, so they’re padded with 0s (they’re zeroed out in computer jargon) and ignored.
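The field layout just described is mechanical enough to express as a few shifts and ORs. The sketch below is a minimal illustration, assuming that bit 0 in Figure 2-1 is the leftmost (most significant) bit and that operands are written in the order source1, source2, destination, as in the text’s examples; encode_register_type and as_bytes are hypothetical helper names.

```python
# Tables 2-1 and 2-2 as lookup dictionaries
OPCODES = {"add": 0b000, "sub": 0b001, "load": 0b010, "store": 0b011}
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}


def encode_register_type(mnemonic: str, src1: str, src2: str, dest: str) -> int:
    """Pack a register-type instruction into a 16-bit integer.

    Bit 0 in Figure 2-1 is taken to be the leftmost (most significant) bit,
    so a field ending at figure bit j is shifted left by 15 - j.
    """
    word = 0 << 15                     # bit 0: mode = 0 (register-type)
    word |= OPCODES[mnemonic] << 12    # bits 1-3: opcode
    word |= REGISTERS[src1] << 10      # bits 4-5: source1
    word |= REGISTERS[src2] << 8       # bits 6-7: source2
    word |= REGISTERS[dest] << 6       # bits 8-9: destination
    return word                        # bits 10-15 stay zero (padding)


def as_bytes(word: int) -> str:
    """Format a 16-bit word as two space-separated bytes, as the text does."""
    bits = format(word, "016b")
    return bits[:8] + " " + bits[8:]


# add A, B, C: A is source1, B is source2, C is the destination
print(as_bytes(encode_register_type("add", "A", "B", "C")))  # 00000001 10000000
```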
Now, let’s use the binary values in Tables 2-1 and 2-2 to translate the add
instruction in line 3 of Program 1-1 into a 2-byte (or 16-bit) machine language
instruction:
Assembly Language Instruction    Machine Language Instruction
add A, B, C                      00000001 10000000
Here are a few more examples of arithmetic instructions, just so you can
get the hang of it:
Assembly Language Instruction    Machine Language Instruction
add C, D, A                      00001011 00000000
add D, B, C                      00001101 10000000
sub A, D, C                      00010011 10000000
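Going the other way is just as mechanical: slice the 16 bits at the field boundaries of Figure 2-1 and look each slice up in the reverse of Tables 2-1 and 2-2. The hypothetical decode_register_type helper below is a sketch that handles only register-type instructions (mode bit 0) and, like the format itself, ignores bits 10–15; it rejects opcodes that Table 2-1 does not define with a KeyError.

```python
OPCODES = {"add": 0b000, "sub": 0b001, "load": 0b010, "store": 0b011}
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}

# Reverse lookups: binary code -> name
MNEMONIC_FOR = {code: name for name, code in OPCODES.items()}
REGISTER_FOR = {code: name for name, code in REGISTERS.items()}


def decode_register_type(machine_code: str) -> str:
    """Turn a register-type instruction like '00001011 00000000' back into assembly."""
    bits = machine_code.replace(" ", "")
    if len(bits) != 16 or bits[0] != "0":
        raise ValueError("not a 16-bit register-type instruction")
    opcode = MNEMONIC_FOR[int(bits[1:4], 2)]   # bits 1-3
    src1 = REGISTER_FOR[int(bits[4:6], 2)]     # bits 4-5
    src2 = REGISTER_FOR[int(bits[6:8], 2)]     # bits 6-7
    dest = REGISTER_FOR[int(bits[8:10], 2)]    # bits 8-9; bits 10-15 are ignored
    return f"{opcode} {src1}, {src2}, {dest}"


for code in ["00001011 00000000", "00001101 10000000", "00010011 10000000"]:
    print(code, "->", decode_register_type(code))
```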
Increasing the number of binary digits in the opcode and register fields increases the total number of instructions the machine can use and the number of registers it can have. For example, if you know something about binary notation, then you probably know that a 3-bit opcode allows the processor to map up to $2^3$ mnemonics, which means that it can have up to $2^3$, or 8, instructions in its instruction set; increasing the opcode size to 8 bits would allow the processor’s instruction set to contain up to $2^8$, or 256, instructions.
Similarly, increasing the number of bits in the register field increases the possible number of registers that the machine can have: the DLW-1’s 2-bit register fields can name at most $2^2$, or 4, registers.
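The arithmetic behind these limits is just counting bit patterns: an n-bit field has $2^n$ distinct values. This short sketch (max_encodable is a hypothetical name) enumerates every 3-bit pattern explicitly and then applies the same rule to the other field widths mentioned above.

```python
from itertools import product


def max_encodable(field_width: int) -> int:
    """An n-bit field can hold 2**n distinct bit patterns."""
    return 2 ** field_width


# Every distinct 3-bit opcode pattern, from 000 to 111
patterns = ["".join(bits) for bits in product("01", repeat=3)]
print(patterns)

print(max_encodable(3))  # 3-bit opcode field: 8 instructions
print(max_encodable(8))  # 8-bit opcode field: 256 instructions
print(max_encodable(2))  # 2-bit register field: 4 registers
```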
Arithmetic instructions containing an immediate value use an immediate-type instruction format, which is slightly different from the register-type format we just saw. In an immediate-type instruction, the first byte contains the mode bit, the opcode, the source register, and the destination register, while the second byte contains the immediate value, as shown in Figure 2-2.
[Byte 1 (bits 0–7): mode | opcode | source | destination    Byte 2 (bits 8–15): 8-bit immediate value]
Figure 2-2: Machine language format for an immediate-type instruction
Here are a few immediate-type arithmetic instructions translated from
assembly language to machine language:
Assembly Language Instruction    Machine Language Instruction
add C, 8, A                      10001000 00001000
add 5, A, C                      10000010 00000101
sub 25, D, C                     10011110 00011001
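The immediate-type layout of Figure 2-2 can be packed the same way as the register-type one, with the mode bit set to 1 and the low byte holding the value. Figure 2-2 has one register source field and one immediate field, so this minimal sketch takes them as separate parameters no matter where the immediate appears in the assembly syntax. The helper names are hypothetical, and bit 0 is again assumed to be the most significant bit.

```python
OPCODES = {"add": 0b000, "sub": 0b001, "load": 0b010, "store": 0b011}
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}


def encode_immediate_type(mnemonic: str, src: str, dest: str, immediate: int) -> int:
    """Pack an immediate-type arithmetic instruction (Figure 2-2) into 16 bits."""
    if not 0 <= immediate <= 0xFF:
        raise ValueError("immediate value must fit in 8 bits")
    word = 1 << 15                     # bit 0: mode = 1 (immediate-type)
    word |= OPCODES[mnemonic] << 12    # bits 1-3: opcode
    word |= REGISTERS[src] << 10       # bits 4-5: register source
    word |= REGISTERS[dest] << 8       # bits 6-7: destination
    word |= immediate                  # bits 8-15: immediate value
    return word


def as_bytes(word: int) -> str:
    """Format a 16-bit word as two space-separated bytes."""
    bits = format(word, "016b")
    return bits[:8] + " " + bits[8:]


examples = [
    ("add C, 8, A", encode_immediate_type("add", "C", "A", 8)),
    ("add 5, A, C", encode_immediate_type("add", "A", "C", 5)),
    ("sub 25, D, C", encode_immediate_type("sub", "D", "C", 25)),
]
for assembly, word in examples:
    print(f"{assembly:<14}{as_bytes(word)}")
```

Printing the three words reproduces the machine language column of the table above.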
Binary Encoding of Memory Access Instructions
Memory-access instructions use both register- and immediate-type instruction
formats exactly like those shown for arithmetic instructions. The only
difference lies in how they use them. Let’s take the case of a load first.
The load Instruction
We’ve previously seen two types of load, the first of which was the immediate
type. An immediate-type load (see Figure 2-3) uses the immediate-type
instruction format, but because the load’s source is an immediate value (a
memory address) and not a register, the source field is unneeded and must
be zeroed out. (The source field is not ignored, though, and in a moment
we’ll see what happens if it isn’t zeroed out.)
[Byte 1 (bits 0–7): mode | opcode | 00 | destination    Byte 2 (bits 8–15): 8-bit immediate source address]
Figure 2-3: Machine language format for an immediate-type load
Now let’s translate the immediate-type load in line 1 of Program 1-1 (12 is
1100 in binary notation):
Assembly Language Instruction    Machine Language Instruction
load #12, A                      10100000 00001100
The 2-byte machine language instruction on the right is a binary representation of the assembly language instruction on the left. The first byte corresponds to an immediate-type load instruction that takes register A as its destination. The second byte is the binary representation of the number 12, which is the source address in memory that the data is to be loaded from.
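Since the instruction is stored in RAM as two separate bytes, it can help to build those bytes directly. This sketch (encode_immediate_load is a hypothetical name) assembles the first byte from the fields of Figure 2-3, with the source field forced to 00, and uses the 8-bit address as the second byte. It assumes bit 0 is the most significant bit of the first byte.

```python
LOAD_OPCODE = 0b010                                        # Table 2-1
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}   # Table 2-2


def encode_immediate_load(dest: str, address: int) -> bytes:
    """Encode 'load #address, dest' as the two bytes of Figure 2-3."""
    if not 0 <= address <= 0xFF:
        raise ValueError("the immediate address must fit in 8 bits")
    first = (1 << 7)               # bit 0: mode = 1 (immediate-type)
    first |= LOAD_OPCODE << 4      # bits 1-3: opcode
    first |= 0b00 << 2             # bits 4-5: source field, zeroed out
    first |= REGISTERS[dest]       # bits 6-7: destination register
    return bytes([first, address])  # second byte: the source address


instr = encode_immediate_load("A", 12)
print(" ".join(format(b, "08b") for b in instr))  # 10100000 00001100
```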
The second type of load we’ve seen is the register type. A register-type
load uses the register-type instruction format, but with the source2 field
zeroed out and ignored, as shown in Figure 2-4.
In Figure 2-4, the source1 field specifies the register containing the
memory address that the processor is to load data from, and the destination
field specifies the register that the loaded data is to be placed in.
[Byte 1 (bits 0–7): mode | opcode | source1 | 00    Byte 2 (bits 8–15): destination | 000000]
Figure 2-4: Machine language format for a register-type load
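A register-type load follows Figure 2-4: mode 0, the load opcode, the address register in source1, a zeroed source2 field, and the destination in the top two bits of the second byte. The register choice in this sketch (address in B, result into C) is an arbitrary illustration, and encode_register_load is a hypothetical helper name; it deliberately avoids assembly syntax because the text does not show one for this form.

```python
LOAD_OPCODE = 0b010                                        # Table 2-1
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}   # Table 2-2


def encode_register_load(address_reg: str, dest: str) -> bytes:
    """Register-type load (Figure 2-4): address register in source1, source2 = 00."""
    first = (0 << 7)                       # bit 0: mode = 0 (register-type)
    first |= LOAD_OPCODE << 4              # bits 1-3: opcode
    first |= REGISTERS[address_reg] << 2   # bits 4-5: source1 holds the address
    first |= 0b00                          # bits 6-7: source2, zeroed and ignored
    second = REGISTERS[dest] << 6          # bits 8-9: destination; bits 10-15 = 0
    return bytes([first, second])


instr = encode_register_load("B", "C")
print(" ".join(format(b, "08b") for b in instr))  # 00100100 10000000
```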
For a register-relative addressed load, we use a version of the immediate-type instruction format, shown in Figure 2-5, with the base field specifying the register that contains the base address and the offset stored in the second byte of the instruction.
[Byte 1 (bits 0–7): mode | opcode | base | destination    Byte 2 (bits 8–15): 8-bit immediate offset]
Figure 2-5: Machine language format for a register-relative load
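Recall from Table 2-2 that 00 is the binary number that designates register A. A register-relative load that used A as its base would therefore have 00 in its base field, which would make it bit-for-bit identical to an immediate-type load (Figure 2-3), whose source field must be zeroed out. Therefore, as a result of the DLW-1’s particular machine language encoding scheme, any register but A could theoretically be used to store the base address for a register-relative load.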
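The collision is easy to demonstrate. This sketch encodes both load forms side by side, assuming (as the text implies by calling Figure 2-5 a version of the immediate-type format) that both set the mode bit to 1; the helper names are hypothetical. A nonzero base field yields a distinct instruction, but base A produces exactly the bytes of an immediate-type load.

```python
LOAD_OPCODE = 0b010                                        # Table 2-1
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}   # Table 2-2


def encode_immediate_load(dest: str, address: int) -> bytes:
    """Figure 2-3: mode=1, opcode, source field 00, destination, 8-bit address."""
    first = (1 << 7) | (LOAD_OPCODE << 4) | (0b00 << 2) | REGISTERS[dest]
    return bytes([first, address])


def encode_register_relative_load(base: str, dest: str, offset: int) -> bytes:
    """Figure 2-5: mode=1, opcode, base register, destination, 8-bit offset."""
    first = (1 << 7) | (LOAD_OPCODE << 4) | (REGISTERS[base] << 2) | REGISTERS[dest]
    return bytes([first, offset])


def show(instr: bytes) -> str:
    """Render instruction bytes as space-separated 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in instr)


# Base register D: the nonzero base field sets this apart from an immediate load
print(show(encode_register_relative_load("D", "A", 4)))
# Base register A: its code is 00, so the bytes match an immediate load exactly
print(encode_register_relative_load("A", "C", 12) == encode_immediate_load("C", 12))
```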
The store Instruction
The register-type binary format for a store instruction is the same as it is for a load, except that the destination field specifies a register containing a destination memory address, and the source1 field specifies the register contain-
ing the data to be stored to memory.
The immediate-type machine language format for a store, pictured in Figure 2-6, is also similar to the immediate-type format for a load, except that since the destination register is not needed (the destination is the immediate memory address), the destination field is zeroed out, while the source field specifies which register holds the data to be stored.
[Byte 1 (bits 0–7): mode | opcode | source | 00    Byte 2 (bits 8–15): 8-bit immediate destination address]
Figure 2-6: Machine language format for an immediate-type store
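Both store formats can be sketched with the same byte-building approach used for loads. This is an illustration under stated assumptions: the register-type store is taken to zero its source2 field just as the register-type load does (the text says the formats are the same apart from how the fields are used), the helper names are hypothetical, and the registers and address 13 are arbitrary example values.

```python
STORE_OPCODE = 0b011                                       # Table 2-1
REGISTERS = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}   # Table 2-2


def encode_register_store(data_reg: str, address_reg: str) -> bytes:
    """Register-type store: source1 = data register, destination = address register."""
    first = (0 << 7) | (STORE_OPCODE << 4) | (REGISTERS[data_reg] << 2)  # source2 = 00
    second = REGISTERS[address_reg] << 6    # destination field, then 000000
    return bytes([first, second])


def encode_immediate_store(data_reg: str, address: int) -> bytes:
    """Immediate-type store (Figure 2-6): source = data register, destination = 00."""
    first = (1 << 7) | (STORE_OPCODE << 4) | (REGISTERS[data_reg] << 2)  # dest field = 00
    return bytes([first, address])         # second byte: destination address


def show(instr: bytes) -> str:
    """Render instruction bytes as space-separated 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in instr)


print(show(encode_register_store("C", "D")))   # data in C, address held in D
print(show(encode_immediate_store("C", 13)))   # data in C, address 13
```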
The register-relative store, on the other hand, uses the same immediate-type instruction format used for the register-relative load (Figure 2-5), but