January 25, 2013

Lazy sequences and streams

Functional languages have the notion of lazy sequences, which are an abstraction of infinite sequences that are stored using a small, finite amount of memory. It would be wasteful to realize an entire infinite sequence before even using it. The basic idea is to only call the function that generates the sequence when needed, and cache the results. With lazy sequences, you don't blow the stack and the elements in the sequence are not recalculated everytime.

Let's look at how the two most popular and functional JVM languages handle lazy sequences.

Clojure has the lazy-seq macro to create lazy sequences. Since most functions are lazy in Clojure, sequences generated by these functions are also lazy. Here's how the Fibonacci sequence is implemented in Clojure.

(def fibo-seq
  (map first
    (iterate (fn [[a b]] [b (+ a b)]) [0N 1N])))

(defn fibo-list [n]
  (take n fibo-seq))

(defn fibo-last [n]
  (last (fibo-list n)))

This is a really elegant implementation that uses the lazy iterate function.

Scala has the parameterized Stream[T] class to represent a lazy list. Here's what a Scala Fibonacci stream looks like.

object Fibo {
  lazy val fibo: Stream[BigInt] =
    BigInt(0) #::
    BigInt(1) #::
    fibo.zip(fibo.tail).map { n => n._1 + n._2 }

  def fiboList = fibo.take(_: Int).toList

  def fiboLast = fiboList(_: Int).last
}

This implementation uses the lazy right-associative #:: function, which is actually the Stream.cons method. There's also an implicit conversion from a sequence to a stream, from the Stream companion object. I'm sure Haskell programmers will eagerly point out the need for a zipWith function.

Let's look at how these two implementations match up against eachother in terms of performance. In Clojure, the time macro can be used to measure the time taken to evaluate a form. There's really no equivalent in Scala, so let's implement our own. We should be able to simply say time(fiboLast(n)), for example.

def time[R](block: => R): R = {
  val t0 = System.currentTimeMillis
  val result = block
  val t1 = System.currentTimeMillis

  println("Elapsed time: " + (t1 - t0) + "ms")
  result
}

Note that the time taken to print the result of a statement in the REPL shouldn't be measured. We can ensure this by binding the result to a variable. This is done by using val fibo = time(fiboLast(5000) in Scala, and (time (def fibo (fibo-last 5000))) in Clojure, for example. Memory usage can be ignored, as the GC is invoked unpredictably.

And here are the results!

A weird observation in the Scala implementation is that fiboList(1) takes 8-10 milliseconds to evaluate. But Scala actually performs better on average, which is mostly due to the use of static types.

Interestingly, the Clojure fibo-list function evaluates in constant time, which is less than a millisecond! Clojure also fetches cached elements in the lazy sequence much faster than Scala. However, fibo-last performs linearly since the last function has linear time-complexity.

Tags: Scala Clojure