Read Thinking Functionally with Haskell Online
Authors: Richard Bird
The first argument of
best
is the remaining space available on the current line. This function is made local to the definition of
pretty
to avoid having to carry around the maximum line width
w
as an additional argument.
That leaves us with the problem of computing
better r lx ly
. Here we can make use of the fact that the first line of
lx
is guaranteed to be at least as long as the first line of
ly
. Thus it suffices to compare the length of the first line of
lx
with
r
. If the former fits within the latter, we choose
lx
; otherwise we choose
ly
. We therefore define
better r lx ly = if fits r lx then lx else ly
But we don’t want to compute the length of the whole of the first line of
lx
since that looks ahead too far. Instead, we take a more miserly approach:
fits r _ | r<0
= False
fits r []
= True
fits r (c:cs)
= if c == '\n' then True
else fits (r-1) cs
For exactly the same reason it is essential that the second and third arguments to
better
are computed lazily, that is, the two layouts are evaluated just enough to determine which is the better one, and no further.
Let’s revisit our troublesome paragraph:
ghci> putStrLn $ pretty 30 $ para pg
This is a fairly short
paragraph with just twenty-two
words. The problem is that
pretty-printing it takes time,
in fact 31.32 seconds.
(0.00 secs, 1602992 bytes)
Much better. Exercise L discusses what we can say about the running time of
pretty
.
The final task is to put our small library together as a module. Here is the main declaration:
module Pretty
(Doc, Layout,
nil, line, text,
nest, (<>), group,
layouts, pretty, layout) where
The module name is
Pretty
and the file containing the above declaration and the definitions of the library functions has to be saved in a file called
Pretty.lhs
.
The module exports 11 entities. Firstly, there is the name
Doc
of the abstract type of documents. The constructors of this type are not exported. (By the way, if we did want to export all the constructors we can write
Doc ()
in the export list, and if we wanted just, say,
Nil
and
Text
, we can write
Doc (Nil, Text)
.) Secondly, there is the name
Layout
which is just a synonym for
String
. The next eight constants and functions are the ones we have defined above. The final function
layout
is used for printing a layout:
layout :: Layout -> IO ()
layout = putStrLn
And that’s it. Of course, in a really useful library a number of additional combinators could be provided. For example, we could provide
(<+>),(<|>) :: Doc -> Doc -> Doc
x <+> y = x <> text " " <> y
x <|> y = x <> line <> y
spread,stack :: [Doc] -> Doc
spread = foldr (<+>) nil
stack = foldr (<|>) nil
No doubt the reader can think of many others.
Exercise A
A picky user of the library wants just three layouts for a certain document:
A B C
A B
A
C B
C
Can the user do it with the given functions?
Exercise B
The layouts of a document are given as a list. But are they all different? Either prove that they are or give a counterexample.
By the way, is it obvious from the laws that each document has a nonempty set of layouts?
Exercise C
The next four exercises refer to the shallow embedding of
Section 8.3
. Prove, by equational reasoning, that
nest i . nest j = nest (i + j)
You will need a subsidiary result about
nestl
, which you don’t have to prove.
Exercise D
Continuing on from the previous question, prove that
nest i (group x) = group (nest i x)
by equational reasoning (at the point level). Again, you will need a subsidiary result.
Exercise E
Continuing on, prove that
flatten . group = flatten
. You will need a subsidiary result.
Exercise F
The final law is
flatten . nest i = flatten
. And, yes, you will need yet another subsidiary result.
Exercise G
We said in the text that the prelude function
lines
breaks up a string on newline characters. In fact,
lines
treats a newline as a terminator character, so both
lines "hello"
and
lines "hello\n"
return the same result. It is arguable that a better definition treats newlines as
separator
characters, so there is always one more line than there are newlines. Define a function
lines
that has this behaviour. We will need the new definition below.
Now, the proof that
map shape
applied to the layouts of a document returns a lexicographically decreasing sequence of list of integers can be structured into the following steps. First, define
msl
= map shape . layouts
shape = map length . lines
where
lines
refers to the revised version above. We have to prove that
msl
returns a decreasing sequence on every document. To this end, we can define functions
nesty
and
groupy
so that
nesty i . msl = msl . nest i
groupy . msl = msl . group
and an operation
<+>
so that
msl x <+> msl y = msl (x <> y)
(It is this equation that requires the revised definition of
lines
.) The proof is then completed by showing that if
xs
and
ys
are decreasing, then so are
nesty i xs
and
groupy xs
and
xs <+> ys
. All this exercise asks though is that you construct definitions of
nesty
,
groupy
and
<+>
.
Exercise H
Write a function
doc :: Doc -> Doc
that describes how to lay out elements of
Doc
where
Doc
is the abstract syntax tree representation in
Section 8.6
.
Exercise I
Consider a function
prettybad
that chooses a best layout from the list
layouts
by taking the first layout all of whose lines fit within the given width, and the last layout if this is not possible. Does
prettybad
always compute the same layout as
pretty
? (Hint: think about paragraphs.)
Exercise J
Using the algebraic properties of the constructors of
Doc
, calculate the efficient version of
layouts
.
Exercise K
We have designed
pretty w
to be
optimal
, meaning that it chooses line breaks to avoid overflowing lines if at all possible. We also have that
pretty w
is
bounded
, meaning that it can make the choice about the next line break without looking at more than the next
w
characters of the input. Given that, what do you expect GHCi’s response would be to the commands
layout $ pretty 5 $ para pg
layout $ pretty 10 $ cexpr ce
where
pg = "Hello World!" ++ undefined
ce = If "happy" (Expr "great") undefined
Exercise L
We cannot relate the cost of
pretty w x
to the size of
x
without saying what the size of a document is. Here is a reasonable measure:
size :: Doc -> Int | |
size Nil | = 1 |
size Line | = 1 |
size (Text s) | = 1 |
size (Nest i x) | = 1 + size x |
size (x :<>: y) | = 1 + size x + size y |
size (Group x) | = 1 + size x |
Under this definition both the documents
nest 20 (line <> text "!")
nest 40 (line <> text "!")
have size two. But it takes twice as long to produce the second layout, so the cost of
pretty
cannot be linear in the document size.
Instead of having
pretty
produce the final layout, a string, we can interpose an additional data type of layouts:
data Layout = Empty
| String String Layout
| Break Int Layout
and define
layout :: Layout -> String
by
layout Empty
= ""
layout (String s x) = s ++ layout x
layout (Break i x)
= '\n':replicate i ' ' ++ layout x
We have
pretty w = layout . prettyl w
where the new function
prettyl
produces a
Layout
rather than a string. Define
prettyl
.
A fairer question to ask is whether
prettyl w x
takes linear time in the size of
x
. Does it?
Answer to Exercise A
No. There is no way of allowing both
A<0>B<1>C
and
A<1>B<0>C
without also having both of
A<0>B<0>C
and
A<1>B<1>C
. These four are given by the expression
group (A <> line <> B) <> group (line <> C)
Answer to Exercise B
The layouts of a document are not necessarily all different. For example
layouts (group (text "hello")) = ["hello","hello"]
Yes, it is obvious that each document has a nonempty set of layouts. Look at the laws of
layouts
. The basic documents have a nonempty list of layouts and this property is preserved by the other operations.
Answer to Exercise C
The calculation is:
nest i . nest j
=
{definition of
nest
}
map (nestl i) . map (nestl j)
=
{functor law of
map
}
map (nestl i . nestl j)
=
{claim}
map (nestl (i+j))
=
{definition of
nest
}
nest (i+j)
The claim is that
nestl i . nestl j = nestl (i+j)
, which follows – after a short calculation – from
indent (i+j) = concat . map (indent i) . indent j
We omit the proof.
Answer to Exercise D
We reason:
nest i (group x)
=
{definition of
group
}
nest i (flatten x ++ x)
=
{since
nest i = map (nestl i)
}
nest i (flatten x) ++ nest i x
=
{claim}
flatten (nest i x) ++ nest i x
=
{definition of
group
}
group (nest i x)
The claim follows from
nest i . flatten
=
{since there are no newlines in
flatten x
}
flatten
=
{since
flatten . nest i = flatten
(Exercise F)}
flatten . nest i
Answer to Exercise E
We reason:
flatten . group
=
{definition of
flatten
and
group
}
one . flattenl . flattenl . head
=
{claim}
one . flattenl . head
=
{definition of
flatten
}
flatten
The claim is that
flattenl
is
idempotent
:
flattenl . flattenl = flattenl
This follows because
flattenl
returns a layout with no newlines.
By the way, it is the idempotence of
flattenl
that ensures all layouts of a document flatten to the same string. The only function that introduces multiple layouts is
group
, whose definition is
group x = flatten x ++ x
We have therefore to show that flattening the first element of this list gives the same string as flattening the second element. Thus we need to show
flattenl . head . flatten = flattenl . head
This follows at once from the definition of
flatten
and the idempotence of the function
flattenl
.
Answer to Exercise F
We reason:
flatten . nest i
=
{definitions}
one . flattenl . head . map (nestl i)
=
{since
head . map f = f . head
}
one . flattenl . nestl i . head
=
{claim}
one . flattenl . head
=
{definition of
flatten
}
flatten