The power of a
machine can sometimes be increased by expanding its
external storage. For example, a pushdown automaton becomes more
powerful when we give it access to a second stack, because two stacks
can be used to simulate an infinite tape: each stack stores the
characters from one half of the simulated tape, and the PDA can pop and
push characters between the stacks to simulate the motion of the tape
head, just like our Tape implementation in Simulation. Any
finite state machine with access to an infinite tape is effectively a
Turing machine, so just adding an extra stack makes a pushdown automaton
significantly more powerful.
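To make the idea concrete, here is a minimal Ruby sketch of a tape built from two stacks, loosely modeled on the Tape class from Simulation. TwoStackTape is our own illustrative name, not a class from this book's code, and for simplicity it mutates its stacks rather than returning new objects:

class TwoStackTape
  def initialize(left, right, blank = '_')
    @left, @right, @blank = left, right, blank
  end

  # the square under the head is the top of the left stack
  def read
    @left.last || @blank
  end

  def write(character)
    @left.push(@blank) if @left.empty?
    @left[-1] = character
    self
  end

  # moving left: the current square goes onto the right stack
  def move_head_left
    @right.push(@left.pop || @blank)
    self
  end

  # moving right: the next square comes off the top of the right stack
  def move_head_right
    @left.push(@right.pop || @blank)
    self
  end
end

tape = TwoStackTape.new(['1', '0', '1'], [])
tape.read                # => '1'
tape.move_head_left.read # => '0'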
It’s therefore reasonable to expect that a Turing machine might be
made more powerful by adding one or more extra tapes, each with its own
independent tape head, but again that’s not the case. A single Turing
machine tape has enough space to store the contents of any number of
tapes by interleaving them: three tapes containing abc, def, and ghi
can be stored together as adgbehcfi. If we leave a blank square
alongside each interleaved character, the machine has space to write
markers indicating where all of the simulated tape heads are located:
by using X characters to mark the current position of each head, we can
represent both the contents and the head positions of the tapes ab(c),
(d)ef, and g(h)i with the single tape a_dXg_b_e_hXcXf_i_.
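As a quick check, a throwaway Ruby helper (our own, not part of this book's code) can build this interleaved representation from a list of tapes and head positions:

# each simulated tape contributes one character plus one marker square
# per position; 'X' marks the position of that tape's simulated head
def interleave(tapes, heads)
  length = tapes.map(&:length).max
  (0...length).flat_map { |position|
    tapes.each_with_index.flat_map { |tape, index|
      [tape[position] || '_', heads[index] == position ? 'X' : '_']
    }
  }.join
end

interleave(%w(abc def ghi).map(&:chars), [2, 0, 1])
# => "a_dXg_b_e_hXcXf_i_"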
Programming a Turing machine to use multiple simulated tapes is complicated, but the
fiddly details of reading, writing, and moving the heads of the tapes can be wrapped up in
dedicated states and rules (“subroutines”) so that the main logic of the machine doesn’t
become too convoluted. In any case, however inconvenient the programming turns out to be, a
single-tape Turing machine is ultimately capable of performing any task that a multitape
machine can, so adding extra tapes to a Turing machine doesn’t give it any new
abilities.
Finally, it’s
tempting to try giving a Turing machine a more spacious
storage device. Instead of using a linear tape, we could provide an
infinite two-dimensional grid of squares and allow the tape head to move
up and down as well as left and right. That would be useful for any
situation where we want the machine to quickly access a particular part
of its external storage without having to move the head past everything
else on it, and it would allow us to leave an unlimited amount of blank
space around multiple strings so that each of them can easily grow
longer, rather than having to manually shuffle information along the
whole tape to make space whenever we want to insert a character.
Inevitably, though, a grid can be simulated with a one-dimensional
tape. The easiest way is to use two one-dimensional
tapes: a primary tape for actually storing data, and a secondary tape to
use as scratch space. Each row of the simulated grid [40] is stored on
the primary tape, top row first, with a
special character marking the end of each row.
The primary tape head is positioned over the current character as usual, so to move left
and right on the simulated grid, the machine simply moves the head left and right. If the
head hits an end-of-row marker, a subroutine is used to shuffle everything along the tape to
make the grid one space wider.
To move up or down on the simulated grid, the tape head must move
a complete row to the left or right respectively. The machine can do
this by first moving the tape head to the beginning or end of the
current row, using the secondary tape to record the distance travelled,
and then moving the head to the same offset in the previous or next row.
If the head moves off the top or bottom of the simulated grid, a
subroutine can be used to allocate a new empty row for the head to move
into.
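The following rough Ruby sketch models the arithmetic behind this encoding and the "move down" operation. It's our own illustration rather than actual Turing machine rules, and it assumes every row has already been padded to the same width:

# rows are stored left to right on one tape, each terminated by a '$'
# marker; moving down means jumping to the same offset in the next row
class GridTape
  ROW_END = '$'

  def initialize(squares, head)
    @squares, @head = squares, head
  end

  def row_start
    position = @head
    position -= 1 while position > 0 && @squares[position - 1] != ROW_END
    position
  end

  def row_width
    position = row_start
    position += 1 while @squares[position] != ROW_END
    position - row_start
  end

  def move_down
    offset = @head - row_start
    # skip the rest of this row and its marker, then reapply the offset
    GridTape.new(@squares, row_start + row_width + 1 + offset)
  end

  def read
    @squares[@head]
  end
end

tape = GridTape.new('abc$def$ghi$'.chars, 1) # head over 'b'
tape.move_down.read # => 'e'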
This simulation does require a machine with two tapes, but we know
how to simulate that too, so we end up with a simulated grid stored on
two simulated tapes that are themselves stored on a single native tape.
The two layers of simulation introduce a lot of extra rules and states,
and require the underlying machine to take many steps to perform a
single step of the simulated one, but the added size and slowness don’t
prevent it from (eventually) doing what it’s supposed to
do.
All the machines
we’ve seen so far have a serious shortcoming: their rules are hardcoded, leaving
them unable to adapt to different tasks. A DFA that accepts all the strings that match a
particular regular expression can’t learn to accept a different set of strings; an NPDA that
recognizes palindromes will only ever recognize palindromes; a Turing machine that increments
a binary number will never be useful for anything else.
This isn’t how most real-world computers work. Rather than being specialized for a
particular job, modern digital computers are designed to be general
purpose and can be programmed to perform many different tasks. Although
the instruction set and CPU design of a programmable computer are
fixed, it’s able to use software to control its hardware and adapt its
behavior to whatever
job its user wants it to do.
Can any of our simple machines do that? Instead of having to design
a new machine every time we want to do a different job, can we design
a single machine that can read a program from its input and then do
whatever job the program specifies?
Perhaps unsurprisingly, a Turing machine is powerful enough to read the description of a
simple machine from its tape—a deterministic finite automaton, say—and then run a simulation
of that machine to find out what it does. In Simulation, we wrote some
Ruby code to simulate a DFA from its description, and with a bit of
work, the ideas from that code can be turned into the rulebook of a
Turing machine that runs the same simulation.
There’s an important difference between a Turing machine that simulates
a particular DFA and one that can simulate any DFA.
Designing a Turing machine to reproduce the behavior of a specific
DFA is very easy—after all, a Turing machine is just a deterministic
finite automaton with a tape attached. Every rule from the DFA’s
rulebook can be converted directly into an equivalent Turing machine
rule; instead of reading from the DFA’s external input stream, each
converted rule reads a character from the tape and moves the head to the
next square. But this isn’t especially interesting, since the resulting
Turing machine is no more useful than the original DFA.
More interesting is a Turing machine that performs a general DFA
simulation. This machine can read a DFA
design from the tape—rules, start state, and accept states—and walk
through each step of that DFA’s execution, using another part of the
tape to keep track of the simulated machine’s current state and
remaining input. The general simulation is much harder to implement, but
it gives us a single Turing machine that can adapt itself to any job
that a DFA can do, just by being fed a description of that DFA as
input.
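The essence of the general simulation is easier to see in Ruby than in Turing machine rules: once the DFA's description is just data, one piece of code can run any DFA. This hypothetical sketch stores the rulebook as a hash rather than using the DFARulebook class from Simulation:

# follow the rules for each input character, then check whether the
# final state is an accept state
def simulate_dfa(rules, start_state, accept_states, string)
  final_state = string.chars.inject(start_state) { |state, character|
    rules.fetch([state, character])
  }
  accept_states.include?(final_state)
end

# a DFA that accepts strings containing an odd number of '1's,
# described entirely as data
rules = { [1, '0'] => 1, [1, '1'] => 2, [2, '0'] => 2, [2, '1'] => 1 }
simulate_dfa(rules, 1, [2], '1011') # => true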
The same applies to our deterministic Ruby simulations of NFAs, DPDAs,
and NPDAs, each of which can be turned into a Turing machine capable of
simulating any automaton of that type. But crucially, it also works for
our simulation of Turing machines themselves: by reimplementing Tape,
TMRule, DTMRulebook, and DTM as Turing machine rules, we are able to
design a machine that can simulate any other DTM by reading its rules,
accept states, and initial configuration from the tape and stepping
through its execution, essentially acting as a Turing machine rulebook
interpreter. A machine that does this is called a universal Turing
machine (UTM).
This is exciting, because it makes the maximum computational power of Turing machines
available in a single programmable device. We can write software—an encoded description of a
Turing machine—onto a tape, feed that tape to the UTM, and have our software executed to
produce the behavior we want. Finite automata and pushdown automata can’t simulate their own
kind in this way, so Turing machines not only mark the transition from limited to powerful
computing machines, but also from single-purpose devices to fully programmable ones.
Let’s look briefly at how a universal Turing machine works. There
are a lot of fiddly and uninteresting technical details involved in
actually building a UTM, so our exploration will be relatively
superficial, but we should at least be able to convince ourselves that
such a thing is possible.
Before we
can design a UTM’s rulebook, we have to decide how to represent an entire Turing
machine as a sequence of characters on a tape. A UTM has to read in the rules, accept
states, and starting configuration of an arbitrary Turing machine, then repeatedly update
the simulated machine’s current configuration as the simulation progresses, so we need a
practical way of storing all this information in a way that the UTM can work with.
One challenge is that every Turing machine has a finite number of
states and a finite number of different characters it can store on its
tape; both numbers are fixed in advance by its rulebook, and a UTM is
no exception. If we design a UTM that can
handle 10 different tape characters, how can it simulate a machine whose rules use 11
characters? If we’re more generous and design it to handle a hundred different characters,
what happens when we want to simulate a machine that uses a thousand? However many
characters we decide to use for the UTM’s own tape, it’ll never be enough to directly
represent every possible Turing machine.
There’s also the risk of unintentional character collisions between the simulated
machine and the UTM. To store Turing machine rules and configurations on a tape, we need to
be able to mark their boundaries with characters that will have special meaning to the UTM,
so that it can tell where one rule ends and another begins. But if we
choose, say, X as the special marker between rules, we’ll run into
problems if any of the simulated rules contain the character X.
Even if we set aside a super-special set of reserved characters that only a universal Turing
machine is allowed to use, they’d still cause problems if we ever tried to simulate the UTM
with itself, so the machine wouldn’t be truly universal. This suggests we need to do some
kind of escaping to prevent ordinary characters from the simulated
machine being incorrectly interpreted as special characters by the UTM.
We can solve both of these problems by coming up with a scheme that uses a fixed
repertoire of characters to encode the tape contents of a simulated machine. If the encoding
scheme only uses certain characters, then we can be sure it’s safe for the UTM to use other
characters for special purposes, and if the scheme can accommodate any number of simulated
states and characters, then we don’t need to worry about the size and complexity of the
machine being simulated.
The precise details of the encoding scheme aren’t important as long as
it meets these goals. To give an example, one possible scheme uses a
unary [41] representation to encode different values as different-sized
strings of a single repeated character (e.g., 1): if the simulated
machine uses the characters a, b, and c, these could be encoded in
unary as 1, 11, and 111. Another character, say 0, can be used as a
marker to delimit unary values: the string acbc might be represented as
101110110111. This isn’t a very space-efficient scheme, but it can
scale up to accommodate any number of encoded characters simply by
storing longer and longer strings of 1s on the tape.
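A few lines of Ruby make the scheme concrete; the ENCODINGS table and the method name here are our own:

# each character becomes a run of '1's; '0' delimits the runs
ENCODINGS = { 'a' => '1', 'b' => '11', 'c' => '111' }

def encode_string(string)
  string.chars.map { |character| ENCODINGS.fetch(character) }.join('0')
end

encode_string('acbc') # => "101110110111"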
Once we’ve decided how to encode individual characters, we need a way to represent the
rules of the simulated Turing machine. We can do that by encoding the separate parts of the
rule (state, character, next state, character to write, direction to move) and concatenating
them together on the tape, using special separator characters where necessary. In our
example encoding scheme, we could represent states in unary too—state 1
is 1, state 2 is 11, and so on—although we’re free to use dedicated
characters to represent left and right (say, L and R), since we know
there will only ever be two directions.
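For instance, a single rule could be encoded like this under our example scheme. The exact layout of the separators is our own choice, and any unambiguous arrangement would work just as well:

# map each tape character to its unary size
CHARACTERS = { 'a' => 1, 'b' => 2, 'c' => 3 }

# state and character parts in unary, joined by '0' separators, with a
# literal 'L' or 'R' for the direction
def encode_rule(state, character, next_state, character_to_write, direction)
  [
    '1' * state,
    '1' * CHARACTERS.fetch(character),
    '1' * next_state,
    '1' * CHARACTERS.fetch(character_to_write)
  ].join('0') + direction
end

# “in state 1 reading a, go into state 2, write b, and move right”
encode_rule(1, 'a', 2, 'b', 'R') # => "101011011R"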
We can concatenate individual rules together to represent an
entire rulebook; similarly, we can encode the current configuration of
the simulated machine by concatenating the representation of its current
state with the representation of its current tape contents. [42]
And that gives us what we want: a complete Turing machine
written as a sequence of characters on another Turing machine’s tape,
ready to be brought to life by a simulation.
Fundamentally, a universal Turing machine works in the same way as the
Ruby simulation we built in Simulation, just much more laboriously.
The description of the simulated machine—its rulebook, accept
states, and starting configuration—is stored in encoded form on the
UTM’s tape. To perform a single step of the simulation, the UTM moves
its head back and forth between the rules, current state, and tape of
the simulated machine in search of a rule that applies to the current
configuration. When it finds one, it updates the simulated tape
according to the character and direction specified by that rule, and
puts the simulated machine into its new state.
That process is repeated until the simulated machine enters an
accept state, or gets stuck by reaching a configuration to which no rule
applies.
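In Ruby terms, the UTM's cycle looks something like the sketch below, with the crucial difference that a real UTM does all of this with head movements over encoded characters rather than Ruby data structures. The rules here are one way of writing the binary-increment machine mentioned earlier as plain data:

Rule = Struct.new(:state, :character, :next_state, :character_to_write, :direction)

# repeatedly find a rule matching the current state and character,
# apply it, and stop on an accept state or when no rule applies
def run_simulation(rules, accept_states, state, squares, head)
  loop do
    return true if accept_states.include?(state)
    rule = rules.find { |r| r.state == state && r.character == squares[head] }
    return false if rule.nil? # stuck: no applicable rule
    squares[head] = rule.character_to_write
    head += rule.direction == 'R' ? 1 : -1
    state = rule.next_state
  end
end

# state 1 scans left, turning 1s into 0s until it can turn a 0 (or a
# blank) into a 1; state 2 scans back to the right; state 3 accepts
rules = [
  Rule.new(1, '0', 2, '1', 'R'),
  Rule.new(1, '1', 1, '0', 'L'),
  Rule.new(1, '_', 2, '1', 'R'),
  Rule.new(2, '0', 2, '0', 'R'),
  Rule.new(2, '1', 2, '1', 'R'),
  Rule.new(2, '_', 3, '_', 'L')
]

squares = '_1011_'.chars
run_simulation(rules, [3], 1, squares, 4) # => true (head starts over the last 1)
squares.join # => "_1100_"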