Writing an expression parser evaluator using Scala and Cats

Introduction

To write a program that could take input such as "(1 + 3) * (4 * 5 - 6)" return the answer, we need a program that parses the data and evaluates it. Parsing is reading input and extracting data.

The structure of one possible example of a parser could be represented as this signature.

type Parser[A] = String => Option[(String, A)]

So Parser[A] is a function that consumes some text input and depending on that input may return the Some case of Option containing the parsed value and the rest of the input to be consumed by subsequent parsers, or else fail with None. A parser for parsing integers could be a Parser[Int] and may consume some characters from the string, returning the parsed number and the rest of the string.

Parsing step

”56+42”⤏Some((“+42”, 56))

The next useful thing to understand is the State Monad. and for our purposes the StateT monad transformer. You can think of it like this.

case class StateT[F[_], S, A](f: S => F[(S, A)])

StateT is a monad so it has flatMap for chaining StateT computations together. It therefore is also an Applicative (so we can use mapN and we can use it in a traverse) and a Functor (so we can use map).

It is useful for our purposes because it essentially is a wrapper for our Parser form above.

Parser as StateT

Parser[A]≅StateT[String, Option, A]

And we can use the flatMap functions on StateT to chain together parse steps.

val parseAddition: Parser[Int] = for {
  x <- parseInt
  _ <- parseChar('+')
  y <- parseInt
} yield x + y

Parsing illustration

parseAddition.run(“56+42”)⤏Some((""), 100)

Writing our first parsers

def anyChar: Parser[Char] = StateT { s =>
  if s.isEmpty then None
  else Some((s.substring(1), s.charAt(0)))
}

The above parser is the most simple, it accepts the first character of the string and returns it along with the rest of the string, or fails if empty.

a non empty string

anyChar.run(“abc”)⤏Some((“bc”), ‘a’)

an empty string

anyChar.run("")⤏None

def ifChar(p: Char => Boolean): Parser[Char] = anyChar.filter(p)

The above parser takes a predicate and returns the character if the predicate returns true. While the state monad doesn’t have the filter function. StateT[F, *] can use the filter from F if it has one, (in this case the filter on Option). So it may turn a Some into a None.

def digit: Parser[Char] = ifChar(_.isDigit)

a valid digit

digit.run(“42”)⤏Some((“2”, ‘4’))

an invalid digit

digit.run(“abc”)⤏None

In order to use a parser as many times as possible we need to build two more combinators.

def some[A](fa: Parser[A]): Parser[List[A]] = for {
  a <- fa
  as <- many(fa)
} yield a :: as
 
def many[A](fa: Parser[A]): Parser[List[A]] =
  some(fa) <+> List.empty[A].pure[Parser]

They are defined in terms of each other. Both take a Parser[A] and consume as many As from the stream until it fails and return them in a list. However some will return at least one A or fail. And many may return an empty list if it is unable to parse any As. Here we use <+> which isn’t a function of StateT but comes from Option’s Alternative instance. You can think of it as or else.

behaviour of some

some(digit).run(“42+58”)⤏Some((“+58”, List(‘4’, ‘2’)))

some(digit).run(“+58”)⤏None

behaviour of many

many(digit).run(“42+58”)⤏Some((“+58”, List(‘4’, ‘2’)))

some(digit).run(“+58”)⤏Some((“+58”, List()))

Now we have some we can write a parser for natural numbers (positive or zero). Also negative numbers. And then integers in general.

val nat: Parser[Int] = some(digit).map(_.mkString.toInt)
def char(c: Char): Parser[Char] = ifChar(_ == c)
val neg: Parser[Int] = char('-') *> nat.map(-_)
val int: Parser[Int] = neg <+> nat

int behaviour

int.run(“42+58”)⤏Some((“+58”, 42))

We have a lot of the techniques we need do do a whole expression parser now. If we refer to Professor Graham Hutton’s Functional Parsing Video on youtube. He defines an expression parser like this

-- expr ::= term + expr | term
-- term ::= factor * term | factor
-- factor ::= (expr) | int

This conforms to BIDMAS because while that tells us to do brackets … then multiplication … then addition. This is a bottom up way of thinking. To think of it in the order of a recursive descent parser you need to reverse those. That is, you could think of it in a top down way as being additions of multiplications of expressions in brackets (or numbers).

We can translate this directly to parsers like this.

def expr: Parser[Int] = {
  for {
    t <- term
    _ <- char('+')
    e <- expr
  } yield t + e
} <+> term
 
def term: Parser[Int] = {
  for {
    f <- factor
    _ <- char('*')
    t <- term
  } yield f * t
} <+> factor
 
def factor: Parser[Int] = {
  for {
    _ <- char('(')
    e <- expr
    _ <- char(')')
  } yield e
} <+> int

Let’s extract some common logic and rewrite expr and term:

def add: Parser[(Int, Int) => Int] = char('+').as(_ + _)
 
def multiply: Parser[(Int, Int) => Int] = char('*').as(_ * _)
 
def chainR[A](fa: Parser[A], op: Parser[(A, A) => A]): Parser[A] = {
  for {
    a <- fa
    f <- op
    r <- chainR(fa, op)
  } yield f(a, r)
} <+> fa
 
def expr: Parser[Int] = chainR(term, add)
 
def term: Parser[Int] = chainR(factor, multiply)

expr behaviour

expr.run(“1+2*(3+4)+3*5”)⤏Some(("", 30))

Nice!

If we expand this to include subtraction and integer division however:

def subtract: Parser[(Int, Int) => Int] = char('-').as(_ - _)
 
def divide: Parser[(Int, Int) => Int] = char('/').as(_ / _)
 
def expr: Parser[Int] = chainR(term, add <+> subtract)
 
def term: Parser[Int] = chainR(factor, multiply <+> divide)

We find that it doesn’t behave how we would wish it to:

expr behaviour using chainR

expr.run(“6-3+2-1”)⤏Some(("", 2))

This isn’t what we would expect our expression should return (4). The reason for this is that chainR is right associative. So it evaluates 6-3+2-1 as 6-(3+(2-1)). But addition and subtraction together needs to associate to the left. As ((6-3)+2)-1. As does multiplication and division.

Our chainR function is quite analogous to foldRight which is named as such as it associates to the right.

def foldRight[A, B](l: List[A], z: B, f: (A, B) => B): B = l match {
  case Nil     => z
  case a :: as => f(a, foldRight(as, z, f))
}

foldRight behaviour

foldRight(List(a, b, c), z, f)⤏f(a, f(b, f(c, z)))

So to remedy our function we want something more akin to foldLeft.

def foldLeft[A, B](l: List[A], z: B, f: (B, A) => B): B = l match {
  case Nil     => z
  case a :: as => foldLeft(as, f(z, a), f)
}

foldLeft behaviour

foldRight(List(a, b, c), z, f)⤏f(f(f(z, a), b), c)

The implementation gives us a clue: For our new function chainL, instead of passing our recursion step to our binary operator, we need to apply arguments and then recurse. Here is one possible solution.

def chainL[A](fa: Parser[A], op: Parser[(A, A) => A]): Parser[A] = {
  def rest(x: A): Parser[A] = {
    for {
      f <- op
      y <- fa
      r <- rest(f(x, y))
    } yield r
  } <+> x.pure[Parser]
  fa.flatMap(rest)
}

With expr and term using chainL they now work correctly

def expr: Parser[Int] = chainL(term, add <+> subtract)
 
def term: Parser[Int] = chainL(factor, multiply <+> divide)

expr behaviour using chainL

expr.run(“6-3+2-1”)⤏Some(("", 4))