Read Thinking Functionally with Haskell Online
Authors: Richard Bird
expr = token (constant <|> paren binary)
constant = do {n <- nat; return (Con n)}
binary = do {e1 <- expr;
p <- op;
e2 <- expr;
return (Bin p e1 e2)}
op = (symbol "+" >> return Plus) <|>
(symbol "-" >> return Minus)
For readability we have made use of a subsidiary parser
binary
; the parser
paren
is left as an exercise.
Now suppose we want a parser that also works for expressions that are not fully parenthesised, things like
6-2-3
and
6-(2-3)
and
(6-2)-3
. In such a case,
(+)
and
(-)
should associate to the left in expressions, as is normal with arithmetic. One way to express such a grammar in BNF is to write
expr ::= expr op term | term
term ::= nat | '(' expr ')'
This grammar says that an expression is a sequence of one or more terms separated by operators. A term is either a number or a parenthesised expression. In particular,
6-2-3
will be parsed as the expression
6-2
followed by a minus operator, followed by the term
3
. In other words, the same as
(6-2)-3
, as required. This grammar also translates directly into a parser:
expr = token (binary <|> term)
binary = do {e1 <- expr;
p <- op;
e2 <- term;
return (Bin p e1 e2)}
term = token (constant <|> paren expr)
However, there is a fatal flaw with this parser: it falls into an infinite loop. After ignoring initial white space the first action of
expr
is to invoke the parser
binary
, whose first action is to invoke the parser
expr
again. Whoops!
Furthermore, it will not do to rewrite
expr
as
expr = token (term <|> binary)
because, for example,
Main*> apply expr "3+4"
[(Con 3,"+4")]
Only the first term is parsed. The problem is called the
left recursion
problem and is a difficulty with all recursive parsers, functional or otherwise.
One solution is to rewrite the grammar in the following equivalent form:
expr ::= term {op term}*
The meta-symbol
{-}*
indicates a syntactic category that can be repeated zero or more times. The new parser then takes the form
expr = token (term >>= rest)
rest e1 = do {p <- op;
e2 <- term;
rest (Bin p e1 e2)} <|> return e1
The parser
rest
corresponds to the category
{op term}*
and takes an argument (an accumulating parameter) whose value is the expression parsed so far.
Finally, let us design a parser for arithmetic expressions that may contain multiplication and division, changing the definition of
Op
to
data Op = Plus | Minus | Mul | Div
The usual rules apply in that multiplication and division take precedence over addition and subtraction, and operations of the same precedence associate to the left. Here is a grammar:
expr ::= term {addop term}*
term ::= factor {mulop factor}*
factor ::= nat | '(' expr ')'
addop ::= '+' | '-'
mulop ::= '*' | '/'
And here is the parser:
expr = token (term >>= rest)
rest e1 = do {p <- addop;
e2 <- term;
rest (Bin p e1 e2)}
<|> return e1
term = token (factor >>= more)
more e1 = do {p <- mulop;
e2 <- factor;
more (Bin p e1 e2)}
<|> return e1
factor = token (constant <|> paren expr)
The definitions of
addop
and
mulop
are left as exercises.
Our final question is: how can we install
Expr
as a member of the type class
Show
so that the function
show
is the inverse of parsing? More precisely, we want to define
show
so that
parse expr (show e) = e
Recall that
parse p
extracts the first parse returned by
apply p
.
As a warm-up, here is the instance of
Show
when
expr
is the parser for fully parenthesised expressions involving addition and subtraction only:
instance Show Expr where
show (Con n) = show n
show (Bin op e1 e2) =
= "(" ++ show e1 ++
" " ++ showop op ++
" " ++ show e2 ++ ")"
showop Plus = "+"
showop Minus = "-"
Clear enough, but there is a problem with efficiency. Because
(++)
has time complexity linear in the length of its left argument, the cost of evaluating
show
is, in the worst case, quadratic in the size of the expression.
The solution, yet again, is to use an accumulating parameter. Haskell provides a type synonym
ShowS
:
type ShowS = String -> String
and also the following subsidiary functions
showChar
:: Char -> ShowS
showString
:: String -> ShowS
showParen :: Bool -> ShowS -> ShowS
These functions are defined by
showChar
= (:)
showString
= (++)
showParen p x = if b then
showChar '(' . p . showChar ')'
else p
Now we can define
show
for expressions by
show e = shows e ""
where
shows (Con n) = showString (show n)
shows (Bin op e1 e2)
= showParen True (shows e1 . showSpace .
showsop op . showSpace . shows e2)
showsop Plus = showChar '+'
showsop Minus = showChar '-'
showSpace = showChar ' '
This version, which contains no explicit concatenation operations, takes linear time in the size of the expression.
Now suppose we want to display expressions that are not fully parenthesised. There is no need for parentheses around left-hand expressions, but we do need parentheses around right-hand expressions. That leads to
show = shows False e ""
where
shows b (Con n) = showString (show n)
shows b (Bin op e1 e2)
= showParen p (shows False e1 . showSpace .
showsop op . showSpace . shows True e2)
This definition takes no account of associativity; for example,
1+(2+3)
is not shown as
1+2+3
.
Finally, let’s tackle expressions involving all four arithmetic operations. The difference here is that: 1. With expressions
e1 + e2
or
e1 - e2
we will never need parentheses around
e1
(just as above), nor will we need parentheses around
e2
if
e2
is a compound expression with a multiplication or division at the root.
2. On the other hand, with expressions
e1 * e2
or
e1 / e2
we will need parentheses around
e1
if
e1
is a compound expression with a plus or minus at the root, and we will always need parentheses around
e2
.
One way to codify these rules is to introduce precedence levels (for another way, see Exercise L). Define
prec :: Op -> Int
prec Mul
= 2
prec Div
= 2
prec Plus
= 1
prec Minus
= 1
Consider now how to define a function
showsPrec
with type
showsPrec :: Int -> Expr -> ShowS
such that
showsPrec p e
shows the expression
e
assuming that the parent of
e
is a compound expression with an operator of precedence
p
. We will define
show
by
show e = showsPrec 0 e ""
so the enclosing
context
of
e
is an operator with fictitious precedence
0
. We can at once define
showsPrec p (Con n) = showString (show n)
because constants are never enclosed in parentheses. The interesting case is when we have a compound expression. We give the definition first and explain it afterwards:
showsPrec p (Bin op e1 e2)
= showParen (p>q) (showsPrec q e1 . showSpace .
showsop op . showSpace . showsPrec (q+1) e2)
where q = prec op
We put parentheses around an expression if the parent operator has greater precedence than the current one. To display the expression
e1
it is therefore sufficient to pass the current precedence as the new parent precedence. But we need parentheses around
e2
if the root operator of
e2
has precedence less than
or equal to
q
; sowe have to increment
q
in the second call.
Admittedly, the above definition of
showsPrec
requires a little thought, but there is a payoff. The type class
Show
has a
second
method in it, namely
showsPrec
. Moreover, the default definition of
show
is just the one above. So to install expressions as a member of
Show
we merely have to give the definition of
showsPrec
.
Exercise A
Consider the synonym
type Angle = Float
Suppose we want to define equality on angles to be equality modulo a multiple of 2
π
. Why can’t we use
(==)
for this test? Now consider
newtype Angle = Angle Float
Install
Angle
as a member of
Eq
, thereby allowing
(==)
as an equality test between angles.
Exercise B
We could have defined
newtype Parser a = Parser (String -> Maybe (a,String))
Give the monad instance of this kind of parser.
Exercise C
Prove that
fail >> p = fail
.
Exercise D
Could we have defined
<|>
in the following way?
p <|> q = Parser (\s -> parse p s ++ parse q s)
When is the result a deterministic parser? Define a function
limit :: Parser a -> Parser a
such that
limit (p <|> q)
is a deterministic parser, even if
p
and
q
are not.
Exercise E
Parsers are not only instances of monads, they can also be made instances of a more restricted class, called
MonadPlus
, a class we could have introduced in the previous chapter. Basically, these are monads that support choice and failure. The Haskell definition is
class Monad m => MonadPlus m where
mzero :: m a
mplus :: m a -> m a -> m a
As examples, both
[]
and
Maybe
can be made members of
MonadPlus
:
instance MonadPlus [] where
mzero = []
mplus = (++)
instance MonadPlus Maybe where
mzero = Nothing
Nothing `mplus` y = y
Just x `mplus` y = Just x
Install
Parser
as an instance of
MonadPlus
.
Exercise F
Continuing from the previous exercise, the new methods
mzero
and
mplus
are expected to satisfy some equational laws, as is usually the case with the methods of a type class. But currently the precise set of rules that these methods should obey is not agreed on by the Haskell designers! Uncontroversial are the laws that
mplus
should be associative with identity element
mzero
. That’s three equations. Another reasonable law is the
left-zero
law
mzero >>= f = mzero
The corresponding
right-zero
law, namely
p >> mzero = mzero
can also be imposed. Does the
MonadPlus
instance of the list monad satisfy these five laws? How about the
Maybe
monad?
Finally, the really contentious law is the following one:
(p `mplus` q) >>= f = (p >>= f) `mplus` (q >>= f)
This law is call the
left-distribution
law. Why can’t
Maybe
be installed as a member of
MonadPlus
if the left-distribution is imposed?
Exercise G
Design a parser for recognising Haskell floating-point numbers. Bear in mind that
.314
is not a legitimate number (no digits before the decimal point) and that
3 . 14
is not legitimate either (because no spaces are allowed before or after the decimal point).
Exercise H