Now we can check for safety by seeing whether each concrete
computation’s result falls within the abstract computation’s
prediction:
>> (6 * -9).sign <= (6.sign * -9.sign)
=> true
>> (-5 + 0).sign <= (-5.sign + 0.sign)
=> true
>> (6 + -9).sign <= (6.sign + -9.sign)
=> true
This safety property holds for any computation involving addition
and multiplication, because we’ve designed an abstraction that falls
back to a safe approximation when it can’t give a precise answer.
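If we wanted more reassurance than a few spot checks, we could automate the comparison. Here's a minimal sketch, assuming the Sign class and the Numeric#sign monkey patch from earlier in the chapter are already loaded; the method name check_sign_safety is made up for this example:

def check_sign_safety(samples = 1000)
  samples.times do
    a, b = rand(-10..10), rand(-10..10)
    # Each concrete result's sign must fall within the abstract prediction.
    raise "unsafe multiplication: #{a} * #{b}" unless (a * b).sign <= (a.sign * b.sign)
    raise "unsafe addition: #{a} + #{b}" unless (a + b).sign <= (a.sign + b.sign)
  end
  true
end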
Incidentally, having access to this abstraction lets us do simple
analysis of Ruby code that adds and multiplies numbers. As an example,
here’s a method that sums the squares of its arguments:
def sum_of_squares(x, y)
  (x * x) + (y * y)
end
If we want to automatically analyze this method to learn something about how it behaves,
we have a choice: we can treat it as a black box and run it for all possible arguments,
which would take forever, or we can inspect its source code and try to use mathematical
reasoning to deduce some of its properties, which is complicated. (And, in the general case,
doomed to failure because of Rice's theorem.) Abstract interpretation gives us the third option of calling the method with
abstract values to see what outputs are produced by an abstract version of the computation,
and it’s practical to do this for all possible inputs, because there are only a small number
of potential combinations of abstract values.
Each of the arguments x and y can be a negative, zero, or positive number, so let's see what the possible outputs can be:
>> inputs = Sign::NEGATIVE, Sign::ZERO, Sign::POSITIVE
=> [#<Sign negative>, #<Sign zero>, #<Sign positive>]
>> outputs = inputs.product(inputs).map { |x, y| sum_of_squares(x, y) }
=> [#<Sign positive>, #<Sign positive>, #<Sign positive>, #<Sign positive>, #<Sign zero>,
    #<Sign positive>, #<Sign positive>, #<Sign positive>, #<Sign positive>]
>> outputs.uniq
=> [#<Sign positive>, #<Sign zero>]
Without having done any clever analysis, this tells us that #sum_of_squares can only produce zero or positive numbers, never negative
numbers—a fairly boring property that’s obvious to an educated human reading the source
code, but not something that would be immediately evident to a machine. Of course, this kind
of trick only works for very simple code, but despite being a toy, it shows how abstraction
can make a difficult problem more
tractable.
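For contrast, here's a hypothetical method (not from the original text) where the same trick is less conclusive: adding a positive product to a negative product forces the abstraction back to Sign::UNKNOWN, so analyzing all 81 combinations of abstract inputs produces every Sign value and rules nothing out.

def product_sum(w, x, y, z)
  (w * x) + (y * z)
end

# inputs.product(inputs, inputs, inputs).
#   map { |w, x, y, z| product_sum(w, x, y, z) }.uniq
# would include Sign::UNKNOWN alongside the other three signs.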
So far we’ve seen toy
examples of how to discover approximate information about computations without
actually performing them. We could learn more by doing those computations for real, but
approximate information is better than nothing, and for some applications (like route
planning), it might be all we really need.
In the multiplication and addition examples, we were able to turn a small program into a
simpler, more abstract version just by feeding it abstract values as input instead of concrete
numbers, but we can only get so far with this technique if we want to investigate larger and
more elaborate programs. It’s easy to create values that supply their own implementations of
multiplication and addition, but Ruby doesn’t allow values to control their own behavior more
generally—when they're used in an if statement, for example—because it has hardcoded rules[88]
about how particular pieces of syntax should work. Besides, we still have the
problem that it’s not feasible in general to learn about programs by running them and waiting
for their output, because some programs loop forever without returning a result.
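To make that first limitation concrete, here's a small hypothetical illustration (not from the original text): Ruby decides truthiness on its own, so an abstract value like a Sign can't supply its own meaning for if.

# Every Sign instance is an ordinary object, and Ruby treats any object other
# than nil or false as true, so this branch always runs regardless of which
# Sign we test.
if Sign::NEGATIVE
  puts 'abstract values cannot intercept the if statement'
end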
Another downside of the multiplication and addition examples is that they’re not very
interesting: nobody cares about whether their program returns positive or negative numbers. In
practice, the interesting questions are ones like “will my program crash when I run it?” and
“can my program be transformed to make it more efficient?”
We can answer more interesting questions about programs by considering their
static semantics
. In
Chapter 2
, we
looked at the
dynamic semantics
of programming
languages, a way of specifying the meaning of code when it’s executed; a
language’s static semantics tells us about properties of programs that we can investigate
without executing them. The classic example of static semantics is a
type
system
: a collection of rules that can be used to analyze a program and check
that it doesn’t contain certain kinds of bug. In
Correctness
, we considered
Simple
programs like «x = true; x = x + 1», which are syntactically valid but cause a problem for the dynamic semantics when they're executed. A type system can anticipate these mistakes ahead of time,
allowing some bad programs to be automatically rejected before anyone tries to run
them.
Abstract interpretation gives us a way of thinking about the static
semantics of a program. Programs are meant to be run, so our standard
interpretation of a program’s meaning is the one given by its dynamic
semantics: «x = 1 + 2; y = x * 3» is a
program that manipulates numbers by doing arithmetic on them and storing
them somewhere in memory. But if we have an alternative, more abstract
semantics for the language, we can “execute” the same program according to
different rules, and get more abstract results that give us partial
information about what will happen when the program is interpreted
normally.
Let’s make these ideas
concrete by building a type system for the
Simple
language from
Chapter 2
. Superficially,
this will look like a big-step operational semantics from
Big-Step Semantics
: we’ll implement a method on each of the classes representing the syntax of
Simple
programs (Number, Add, and so on), and calling the method will return a final result. In the dynamic semantics, that method is called #evaluate, and its result is either a fully evaluated
Simple
value or an environment associating names with
Simple
values, depending on whether we evaluate an expression or a
statement:
>> expression = Add.new(Variable.new(:x), Number.new(1))
=> «x + 1»
>> expression.evaluate({ x: Number.new(2) })
=> «3»
>> statement = Assign.new(:y, Number.new(3))
=> «y = 3»
>> statement.evaluate({ x: Number.new(1) })
=> {:x=>«1», :y=>«3»}
For our static semantics, we’ll implement a different method that does less work and
returns a more abstract result. Instead of concrete values and environments, our abstract
values will be types. A type represents many possible values: a
Simple
expression can evaluate to a number or a Boolean,
so for expressions, our types will be “any number” and “any Boolean.” These types are
similar to the Sign values we saw earlier, especially Sign::UNKNOWN, which really does mean "any number." As with Sign, we can introduce types by defining a class called Type and creating some instances:
class Type < Struct.new(:name)
  NUMBER, BOOLEAN = [:number, :boolean].map { |name| new(name) }

  def inspect
    "#<Type #{name}>"
  end
end
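Given the #inspect definition above, the two constants should display on the console something like this:

>> Type::NUMBER
=> #<Type number>
>> Type::BOOLEAN
=> #<Type boolean>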
Our new method will return a type, so let's call it #type. It's supposed to answer a question: when this Simple syntax is evaluated, what type of value will it return? This is very easy to implement for Simple's Number and Boolean syntax classes, because
numbers and Booleans evaluate to themselves, so we know exactly what type of value we’ll
get:
class Number
  def type
    Type::NUMBER
  end
end

class Boolean
  def type
    Type::BOOLEAN
  end
end
For operations like Add, Multiply, and LessThan, it's slightly more complicated. We know that evaluating Add returns a number, for example, but we also know that evaluation will only succeed if both arguments to Add also evaluate to a number; otherwise, the Simple interpreter will fail with an error:
>> Add.new(Number.new(1), Number.new(2)).evaluate({})
=> «3»
>> Add.new(Number.new(1), Boolean.new(true)).evaluate({})
TypeError: true can't be coerced into Fixnum
How can we find out whether an argument will evaluate to a number? That’s what its type
tells us. So for Add, the rule is something like: if the type of both arguments is Type::NUMBER, the type of the overall result is Type::NUMBER; otherwise, the result is no type at all, because the evaluation of any expression that tries to add nonnumeric values will fail before it can return anything. For simplicity, we'll let the #type method return nil to indicate this failure, although in other circumstances, we might have chosen to raise an exception or return some special error value instead (Type::ERROR, for instance) if that made the overall implementation more convenient.
The code for Add looks like this:
class Add
  def type
    if left.type == Type::NUMBER && right.type == Type::NUMBER
      Type::NUMBER
    end
  end
end
The implementation of Multiply#type is identical, and LessThan#type is very similar, except that it returns Type::BOOLEAN instead of Type::NUMBER:
class LessThan
  def type
    if left.type == Type::NUMBER && right.type == Type::NUMBER
      Type::BOOLEAN
    end
  end
end
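Since Multiply#type is identical to Add#type, here is what it looks like spelled out; only the class being reopened differs:

class Multiply
  def type
    if left.type == Type::NUMBER && right.type == Type::NUMBER
      Type::NUMBER
    end
  end
end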
On the console, we can see that this is enough to distinguish between expressions that
will evaluate successfully and those that won’t, even though the syntax of
Simple
allows both:
>> Add.new(Number.new(1), Number.new(2)).type
=> #<Type number>
>> Add.new(Number.new(1), Boolean.new(true)).type
=> nil
>> LessThan.new(Number.new(1), Number.new(2)).type
=> #<Type boolean>
>> LessThan.new(Number.new(1), Boolean.new(true)).type
=> nil
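The nil results are how this implementation signals a type error. If we preferred the explicit error value mentioned earlier, a hypothetical variant of Add#type (not used in the rest of this section) might look like this:

class Type
  # Hypothetical extra constant for reporting failures explicitly.
  ERROR = new(:error)
end

class Add
  def type
    if left.type == Type::NUMBER && right.type == Type::NUMBER
      Type::NUMBER
    else
      Type::ERROR
    end
  end
end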
We’re assuming that the abstract syntax tree is at least
syntactically
valid. The actual values stored at
the leaves of the tree are ignored by the static semantics, so #type might incorrectly predict the
evaluation behavior of a badly formed expression:
>> bad_expression = Add.new(Number.new(true), Number.new(1))
=> «true + 1»
>> bad_expression.type
=> #<Type number>
>> bad_expression.evaluate({})
NoMethodError: undefined method `+' for true:TrueClass
The high-level structure of this AST looks correct (an Add containing two Numbers), but the first Number object is malformed, because its value attribute is true instead of a Fixnum. The static semantics assumes that adding two Numbers together will always produce another Number, so #type says that evaluation will succeed, but if we actually evaluate the expression, we get an exception when Ruby tries to add 1 to true.
Badly formed expressions should never be produced by a
Simple
parser, so this is unlikely to be a
problem in practice.
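If we did want to defend against malformed leaves anyway, one hypothetical tweak (not part of the implementation above) is to have Number#type check the stored value before claiming to be a number:

class Number
  def type
    # Only report a number type when the value attribute really is numeric.
    Type::NUMBER if value.is_a?(Numeric)
  end
end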