So far we’ve seen that
Turing machines have a lot of power and flexibility: they
can execute arbitrary programs encoded as data, perform any algorithm we
can think of, run for an unlimited amount of time, and calculate their own
descriptions. And in spite of their simplicity, these little imaginary
machines have turned out to be representative of universal systems in
general.
If they’re so powerful and flexible, is there anything that Turing
machines—and therefore real-world computers and programming
languages—can’t do?
Before we can answer that question we need to make it a bit more
precise. What kind of thing can we ask a Turing machine to do, and how can
we tell whether it’s done it? Do we need to investigate every possible
kind of problem, or is it enough to consider just some of them? Are we
looking for problems whose solutions are merely beyond our current
understanding, or problems that we already know we’ll
never
be able to solve?
We can narrow the scope of the question by focusing on
decision
problems
. A decision problem is any question with a yes or no answer, like “is 2
less than 3?” or “does the regular expression(a(|b))*
match the string'abaab'
?” Decision problems are easier to
handle than
function problems
whose answer is a number or some other
non-Boolean value, like “what is the greatest common divisor of 18 and 12?,” but they’re still
interesting enough to be worth investigating.
A decision problem is
decidable
(or
computable
) if there’s an algorithm that’s guaranteed to solve it in
a finite amount of time for any possible input. The Church–Turing thesis
claims that every algorithm can be performed by a Turing machine, so for a
problem to be decidable, we have to be able to design a Turing machine that always produces
the correct answer and always halts if we let it run for long enough. It’s easy enough to
interpret a Turing machine’s final configuration as a “yes” or “no” answer: we could look to
see whether it’s written aY
orN
character at the current tape location, for example, or we could ignore the
tape contents altogether and just check whether its final state is an accept (“yes”) or a
nonaccept (“no”) state.
All the decision problems we’ve seen in earlier chapters are decidable. Some of them, like
“does this finite automaton accept this string?” and “does this regular expression match this
string?,” are self-evidently decidable because we’ve written Ruby programs to solve them by
simulating finite automata directly. Those programs could be laboriously translated into
Turing machines given enough time and energy, and because their execution involves a finite
number of steps—each step of a DFA simulation consumes a character of input, and the input has
a finite number of characters—they are guaranteed to always halt with a yes-or-no answer, so
the original problems qualify as decidable.
Other problems are a bit more subtle. “Does this pushdown automaton accept this string?”
might appear to be undecidable, because we’ve seen that our direct simulation of a pushdown
automaton in Ruby has the potential to loop forever and never produce an answer. However,
there happens to be a way to calculate exactly
how many simulation steps a particular pushdown automaton will need in order to
accept or reject an input string of a given length,
[
74
]
so the problem is decidable after all: we just calculate the number of steps
needed, run the simulation for that many steps, and then check whether or not the input has
been accepted.
So, can we do this every time? Is there always a clever way to sneak
around a problem and find a way to implement a machine, or a program, that
is guaranteed to solve it in a finite amount of time?
Well, no, unfortunately not. There are many decision
problems—
infinitely
many—and it turns out that a lot
of them are
undecidable: there is no guaranteed-to-halt algorithm for
solving them. Each of these problems is undecidable not because we just
haven’t found the right algorithm for it yet, but because the problem
itself is fundamentally impossible to solve for some inputs, and we can
even prove that no suitable algorithm will
ever be found.
A lot of
undecidable problems are concerned with the behavior of machines and programs
during their execution. The most famous of these, the
halting problem
,
is the task of deciding whether the execution of a particular Turing machine with a particular
initial tape will ever halt. Thanks to universality, we can restate the same problem in more
practical terms: given a string containing the source code of a Ruby program, and another
string of data for that program to read from standard input, will running that program
ultimately result in an answer or just an infinite loop?
It’s not obvious why the halting problem should be considered
undecidable. After all, it’s easy to come up with individual programs
for which the question is answerable. Here’s a program that will
definitely halt, regardless of its input string:
input
=
$stdin
.
read
puts
input
.
upcase
We’ll assume that$stdin.read
always immediately
returns a value—namely, that every program’s standard input is finite and
nonblocking—because we’re interested in the internal behavior of the program, not its
interactions with the operating system.
Conversely, a small addition to the source code produces a program
that will clearly never halt:
input
=
$stdin
.
read
while
true
# do nothing
end
puts
input
.
upcase
We can certainly write a halting checker to distinguish between only these two cases.
Just testing whether the program’s source code contains the stringwhile true
is enough:
def
halts?
(
program
,
input
)
if
program
.
include?
(
'while true'
)
false
else
true
end
end
This implementation of#halts?
gives the right answers for the two example programs:
>>
always
=
"input = $stdin.read
\n
puts input.upcase"
=> "input = $stdin.read\nputs input.upcase"
>>
halts?
(
always
,
'hello world'
)
=> true
>>
never
=
"input = $stdin.read
\n
while true
\n
# do nothing
\n
end
\n
puts input.upcase"
=> "input = $stdin.read\nwhile true\n# do nothing\nend\nputs input.upcase"
>>
halts?
(
never
,
'hello world'
)
=> false
But#halts?
is likely to be
wrong for other programs. For example, there are programs whose halting
behavior depends on the value of their input:
input
=
$stdin
.
read
if
input
.
include?
(
'goodbye'
)
while
true
# do nothing
end
else
puts
input
.
upcase
end
We can always extend our halting checker to cope with specific
cases like this, since we know what to look for:
def
halts?
(
program
,
input
)
if
program
.
include?
(
'while true'
)
if
program
.
include?
(
'input.include?(\'goodbye\')'
)
if
input
.
include?
(
'goodbye'
)
false
else
true
end
else
false
end
else
true
end
end
Now we have a checker that gives the correct answers for all three
programs and any possible input strings:
>>
halts?
(
always
,
'hello world'
)
=> true
>>
halts?
(
never
,
'hello world'
)
=> false
>>
sometimes
=
"input = $stdin.read
\n
if input.include?('goodbye')
\n
while true
\n
# do nothing
\n
end
\n
else
\n
puts input.upcase
\n
end"
=> "input = $stdin.read\nif input.include?('goodbye')\nwhile true\n# do nothing\n
end\nelse\nputs input.upcase\nend"
>>
halts?
(
sometimes
,
'hello world'
)
=> true
>>
halts?
(
sometimes
,
'goodbye world'
)
=> false
We could go on like this indefinitely, adding more checks and more
special cases to support an expanding repertoire of example programs,
but we’d never get a solution to the full problem of deciding whether
any
program will halt. A brute-force implementation
can be made more and more accurate, but it will always have blind spots;
the simple-minded approach of looking for specific patterns of syntax
can’t possibly scale to all programs.
To make#halts?
work in general, for any possible
program and input, seems like it would be difficult. If a program contains any loops at
all—whether explicit, likewhile
loops, or implicit, like
recursive method calls—then it has the potential to run forever, and sophisticated analysis
of the program’s
meaning
is required if we want to predict anything
about how it behaves for a given input. As humans, we can see immediately that this program
always halts:
input
=
$stdin
.
read
output
=
''
n
=
input
.
length
until
n
.
zero?
output
=
output
+
'*'
n
=
n
-
1
end
puts
output
But
why
does it always halt? Certainly not for any straightforward
syntactic reason. The explanation is thatIO#read
always
returns aString
, andString#length
always returns a nonnegativeInteger
, and repeatedly calling-(1)
on a
nonnegativeInteger
always eventually produces an object
whose#zero?
method returnstrue
. This chain of reasoning is subtle and highly sensitive to small
modifications; if then = n - 1
statement inside the loop
is changed ton = n - 2
, the program will only halt for
even-length inputs. A halting checker that knew all these facts about Ruby and numbers, as
well as how to connect facts together to make accurate decisions about this kind of program,
would need to be large and complex.
The fundamental difficulty is that it’s hard to predict what a program will do without
actually running it. It’s tempting to run the program with#evaluate
to see whether it halts, but that’s no good: if the program doesn’t halt,#evaluate
will run forever and we won’t get an answer from#halts?
no matter how long we wait. Any reliable
halting-detection algorithm must find a way to produce a definitive answer in a finite
amount of time just by inspecting and analyzing the text of the program, not by simply
running it and waiting.
Okay, so our intuition tells us that#halts?
would be
hard to implement correctly, but that doesn’t necessarily mean that the halting problem is
undecidable. There are plenty of difficult problems (e.g., writing#evaluate
) that turn out to be solvable given enough effort and ingenuity; if
the halting problem is undecidable, that means#halts?
is
impossible
to write, not just extremely difficult.
How do we know that a proper implementation of#halts?
can’t possibly exist? If it’s just an
engineering problem, why can’t we throw lots of programmers at it and
eventually come up with a solution?
Let’s pretend temporarily that the halting problem is decidable. In this imaginary
world, it’s possible to write a full implementation of#halts?
, so a call tohalts?(program,
always comes back with a
input)true
orfalse
answer for anyprogram
andinput
, and that answer always
correctly predicts whetherprogram
would halt if it was
run withinput
on standard input. The rough structure
of the#halts?
method might be something like
this:
def
halts?
(
program
,
input
)
# parse program
# analyze program
# return true if program halts on input, false if not
end
If we can write#halts?
, then we can construct
does_it_halt.rb
, a program that reads another
program as input and printsyes
orno
depending on whether that program halts when it reads the
empty string:
[
75
]
def
halts?
(
program
,
input
)
# parse program
# analyze program
# return true if program halts on input, false if not
end
def
halts_on_empty?
(
program
)
halts?
(
program
,
''
)
end
program
=
$stdin
.
read
if
halts_on_empty?
(
program
)
'yes'
else
'no'
end
Once we have
does_it_halt.rb
, we can use it to solve very
difficult problems. Consider this famous claim made
by Christian Goldbach in 1742:
Every even integer greater than 2 can be written as the sum of
two primes.
This is
the
Goldbach conjecture
, and it’s
famous because nobody has been able to prove whether it’s true or
false. The evidence suggests that it’s true, because an even number
picked at random can invariably be broken apart into two primes—12 = 5
+ 7, 34 = 3 + 31, 567890 = 7 + 567883 and so forth—and computers have
been used to check that this can be done for all even numbers between
four and four quintillion (4,000,000,000,000,000,000). But there are
infinitely many even numbers, so no computer can check them all, and
there’s no known proof that every even number must break apart in this
way. There is a possibility, however small, that some very large even
number is
not
the sum of two primes.
Proving the Goldbach conjecture is one of the holy grails of number theory; in the
year 2000, the publisher
Faber and Faber offered a million-dollar prize to anyone who could produce a
proof. But wait: we already have the tools to discover whether the conjecture is true! We
just need to write a program that searches for a counterexample:
require
'prime'
def
primes_less_than
(
n
)
Prime
.
each
(
n
-
1
)
.
entries
end
def
sum_of_two_primes?
(
n
)
primes
=
primes_less_than
(
n
)
primes
.
any?
{
|
a
|
primes
.
any?
{
|
b
|
a
+
b
==
n
}
}
end
n
=
4
while
sum_of_two_primes?
(
n
)
n
=
n
+
2
end
n
This establishes a connection between the truth of the Goldbach
conjecture and the halting behavior of a program. If the conjecture is
true, this program will never find a counterexample no matter how high
it counts, so it will loop forever; if the conjecture’s false,n
will eventually be assigned the
value of an even number that isn’t the sum of two primes, and the
program will halt. So we just have to save it as
goldbach.rb
and run
ruby does_it_halt.rb < goldbach.rb
to
find out whether it’s a halting program, and that will tell us whether
the Goldbach conjecture is true. A million dollars is ours!
[
76
]
Well, obviously this is too good to be true. To write a program
that can accurately predict the behavior of
goldbach.rb
would require a proficiency in
number theory beyond our current understanding. Mathematicians have
been working for hundreds of years to try to prove or disprove the
Goldbach conjecture; it’s unlikely that a bunch of avaricious software
engineers could construct a Ruby program that can miraculously solve
not only this problem but
any
unsolved
mathematical conjecture that can be expressed as a looping
program.
So far we’ve seen strong evidence that the halting problem is
undecidable, but not conclusive proof. Our intuition may say it’s
unlikely that we can prove or disprove the Goldbach conjecture just by
turning it into a program, but computation can be pretty
counterintuitive at times, so we shouldn’t allow ourselves to be
convinced solely by arguments about how improbable something is. If
the halting problem really is undecidable, as opposed to simply very
difficult, we should be able to prove it.
Here’s why#halts?
can never
work. If it
did
work, we’d be able to construct a
new method#halts_on_itself?
that
calls#halts?
to determine what a
program does when run with its own source code as input:
[
77
]
def
halts_on_itself?
(
program
)
halts?
(
program
,
program
)
end
Like#halts?
, the#halts_on_itself?
method will always finish and return a Boolean value:true
ifprogram
halts when run with itself as input,false
if it loops
forever.
Given working implementations of#halts?
and#halts_on_itself?
, we can write a program
called
do_the_opposite.rb
:
def
halts?
(
program
,
input
)
# parse program
# analyze program
# return true if program halts on input, false if not
end
def
halts_on_itself?
(
program
)
halts?
(
program
,
program
)
end
program
=
$stdin
.
read
if
halts_on_itself?
(
program
)
while
true
# do nothing
end
end
This code readsprogram
from
standard input, finds out whether it would halt if run on itself, and
promptly does the exact opposite: ifprogram
would halt,
do_the_opposite.rb
loops forever; ifprogram
would loop forever,
do_the_opposite.rb
halts.
Now, what does
ruby do_the_opposite.rb
< do_the_opposite.rb
do?
[
78
]
Just as we saw earlier with
does_it_say_no.rb
, this question creates an
inescapable contradiction.
#halts_on_itself?
must return
eithertrue
orfalse
when given the source of
do_the_opposite.rb
as an argument. If it
returnstrue
to indicate a halting
program, then
ruby do_the_opposite.rb <
do_the_opposite.rb
will loop forever, which means#halts_on_itself?
was wrong about what would
happen. On the other hand, if#halts_on_itself?
returnsfalse
, that’ll make
do_the_opposite.rb
immediately halt, again
contradicting#halts_on_itself?
’s
prediction.
It’s wrong to pick on#halts_on_itself?
here—it’s just an innocent
one-liner that delegates to#halts?
and relays its answer. What we’ve really shown is that it’s not
possible for#halts?
to return a
satisfactory answer when called with
do_the_opposite.rb
as bothprogram
andinput
arguments; no matter how hard it
works, any result it produces will automatically be wrong. That means
there are only two possible fates for any real implementation of#halts?
:
It sometimes gives the wrong answer, e.g., predicting that
do_the_opposite.rb
will loop forever even though it halts
(or vice versa).
It sometimes loops forever and never returns any answer, just like#evaluate
does in
ruby
does_it_say_no.rb < does_it_say_no.rb
.
So a fully correct implementation of#halts?
can never exist: there must be
inputs for which it makes either an incorrect prediction or no
prediction at all.
Remember the definition of decidability:
A decision problem is decidable if there’s an algorithm that’s
guaranteed to solve it in a finite amount of time for any possible
input.
We’ve proved it’s impossible to write a Ruby program that
completely solves the halting problem, and since Ruby programs are
equivalent in power to Turing machines, it must be impossible with a
Turing machine too. The Church–Turing thesis says that
every
algorithm can be performed by a Turing
machine, so if there’s no Turing machine for solving the halting
problem, there’s no algorithm either; in other words, the halting
problem is undecidable.