Structure and Interpretation of Computer Programs (32 page)

Read Structure and Interpretation of Computer Programs Online

Authors: Harold Abelson and Gerald Jay Sussman with Julie Sussman

BOOK: Structure and Interpretation of Computer Programs
11.81Mb size Format: txt, pdf, ePub

Exercise 2.60.
  We specified that a set would be represented as a list with no
duplicates. Now suppose we allow duplicates. For instance,
the set {1,2,3} could be represented as the list
(2 3 2 1 3 2
2)
. Design procedures
element-of-set?
,
adjoin-set
,
union-set
, and
intersection-set
that operate on this
representation. How does the efficiency of each compare with the
corresponding procedure for the non-duplicate representation? Are
there applications for which you would use this representation in
preference to the non-duplicate one?

Sets as ordered lists

One way to speed up our set operations is to change the representation
so that the set elements are listed in increasing order. To do this,
we need some way to compare two objects so that we can say which is
bigger. For example, we could compare symbols lexicographically, or
we could agree on some method for assigning a unique number to an
object and then compare the elements by comparing the corresponding
numbers. To keep our discussion simple, we will consider only the
case where the set elements are numbers, so that we can compare
elements using
>
and
<
. We will represent a set of
numbers by listing its elements in increasing order. Whereas our
first representation above allowed us to represent the set
{1,3,6,10} by listing the elements in any order, our new
representation allows only the list
(1 3 6 10)
.

One advantage of ordering shows up in
element-of-set?
: In
checking for the presence of an item, we no longer have to scan the
entire set. If we reach a set element that is larger than the item we
are looking for, then we know that the item is not in the set:

(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (car set)) true)
        ((< x (car set)) false)
        (else (element-of-set? x (cdr set)))))

How many steps does this save? In the worst case, the item we are
looking for may be the largest one in the set, so the number of steps
is the same as for the unordered representation. On the other hand,
if we search for items of many different sizes we can expect that
sometimes we will be able to stop searching at a point near the
beginning of the list and that other times we will still need to
examine most of the list. On the average we should expect to have to
examine about half of the items in the set. Thus, the average
number of steps required will be about
n
/2.
This is still θ(
n
) growth, but
it does save us, on the average, a factor of 2 in number of steps over the
previous implementation.

We obtain a more impressive speedup with
intersection-set
. In
the unordered representation this operation required
θ(
n
2
) steps, because we performed a complete scan of
set2
for
each element of
set1
. But with the ordered representation, we
can use a more clever method. Begin by comparing the initial
elements,
x1
and
x2
, of the two sets. If
x1
equals
x2
, then that gives an element of the intersection, and
the rest of the intersection is the intersection of the
cdr
s of
the two sets. Suppose, however, that
x1
is less than
x2
.
Since
x2
is the smallest element in
set2
, we can
immediately conclude that
x1
cannot appear anywhere in
set2
and hence is not in the intersection. Hence, the intersection
is equal to the intersection of
set2
with the
cdr
of
set1
. Similarly, if
x2
is less than
x1
, then the
intersection is given by the intersection of
set1
with the
cdr
of
set2
. Here is the procedure:

(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2))
      '()    
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1
                     (intersection-set (cdr set1)
                                       (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))

To estimate the number of steps required by this process, observe that at each
step we reduce the intersection problem to computing intersections of
smaller sets – removing the first element from
set1
or
set2
or both. Thus, the number of steps required is at most the sum
of the sizes of
set1
and
set2
, rather than the product of
the sizes as with the unordered representation. This is θ(
n
) growth
rather than θ(
n
2
) – a considerable speedup, even for sets of
moderate size.

Exercise 2.61.
  Give an implementation of
adjoin-set
using the ordered
representation. By analogy with
element-of-set?
show how to
take advantage of the ordering to produce a procedure that requires on
the average about half as many steps as with the unordered
representation.

Exercise 2.62.
  Give a θ(
n
) implementation of
union-set
for sets
represented as ordered lists.

Sets as binary trees

We can do better than the ordered-list representation by arranging the
set elements in the form of a tree. Each node of the tree holds one
element of the set, called the “entry” at that node, and a link to
each of two other (possibly empty) nodes. The “left” link points to
elements smaller than the one at the node, and the “right” link to
elements greater than the one at the node.
Figure 
2.16
shows some trees that represent the set
{1,3,5,7,9,11}. The same set may be represented by a tree in a
number of different ways. The only thing we require for a valid
representation is that all elements in the left subtree be smaller
than the node entry and that all elements in the right subtree be
larger.

Figure 2.16:
  Various binary trees that represent the set { 1,3,5,7,9,11 }.

The advantage of the tree representation is this: Suppose we want to
check whether a number
x
is contained in a set. We begin by
comparing
x
with the entry in the top node. If
x
is less than
this, we know that we need only search the left subtree; if
x
is
greater, we need only search the right subtree. Now, if the tree is
“balanced,” each of these subtrees will be about half the size of
the original. Thus, in one step we have reduced the problem of
searching a tree of size
n
to searching a tree of size
n
/2. Since
the size of the tree is halved at each step, we should expect that the
number of steps needed to search a tree of size
n
grows as θ(
log
n
).
38
For large sets, this will
be a significant speedup over the previous representations.

We can represent trees by using lists. Each node will be a list of
three items: the entry at the node, the left subtree, and the right
subtree. A left or a right subtree of the empty list will indicate
that there is no subtree connected there. We can describe this
representation by the following procedures:
39

(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
  (list entry left right))

Now we can write the
element-of-set?
procedure using the strategy
described above:

(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (entry set)) true)
        ((< x (entry set))
         (element-of-set? x (left-branch set)))
        ((> x (entry set))
         (element-of-set? x (right-branch set)))))

Adjoining an item to a set is implemented similarly and also requires
θ(
log
n
) steps. To adjoin an item
x
, we compare
x
with
the node entry to determine whether
x
should be added to the
right or to the left branch, and having adjoined
x
to the
appropriate branch we piece this newly constructed branch together
with the original entry and the other branch. If
x
is equal to
the entry, we just return the node. If we are asked to adjoin
x
to an empty tree, we generate a tree that has
x
as the
entry and empty right and left branches. Here is the procedure:

(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree (entry set) 
                    (adjoin-set x (left-branch set))
                    (right-branch set)))
        ((> x (entry set))
         (make-tree (entry set)
                    (left-branch set)
                    (adjoin-set x (right-branch set))))))

The above claim that searching the tree can be performed in a logarithmic
number of steps
rests on the assumption that the tree is
“balanced,” i.e., that the
left and the right subtree of every tree have approximately the same
number of elements, so that each subtree contains about half the
elements of its parent. But how can we be certain that the trees we
construct will be balanced? Even if we start with a balanced tree,
adding elements with
adjoin-set
may produce an unbalanced
result. Since the position of a newly adjoined element depends on how
the element compares with the items already in the set, we can expect
that if we add elements “randomly” the tree will tend to be balanced
on the average. But this is not a guarantee. For example, if we
start with an empty set and adjoin the numbers 1 through 7 in sequence
we end up with the highly unbalanced tree shown in
figure 
2.17
. In this tree all the left subtrees
are empty, so it has no advantage over a simple ordered list. One
way to solve this problem is to define an operation that transforms an
arbitrary tree into a balanced tree with the same elements. Then we
can perform this transformation after every few
adjoin-set
operations to keep our set in balance. There are also other ways to
solve this problem, most of which involve designing new data
structures for which searching and insertion both can be done in
θ(
log
n
) steps.
40

Figure 2.17:
  Unbalanced tree produced by adjoining 1 through 7 in sequence.

Exercise 2.63.
  Each of the following two procedures converts a
binary tree to a list.

(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree)
                                          result-list)))))
  (copy-to-list tree '()))

a. Do the two procedures produce the same result for every tree? If
not, how do the results differ? What lists do the two procedures
produce for the trees in figure 
2.16
?

Other books

Cinderella Sidelined by Syms, Carly
Daisy and the Duke by Janice Maynard
The Machine Awakes by Adam Christopher
The Janus Man by Colin Forbes
The Secret of Rover by Rachel Wildavsky
I Saw Your Profile by Swan, Rhonda
Raising Innocence by Shannon Mayer