Tuesday, July 26, 2011

Functional Programming as a Particular Use of Modules

Bob Harper sometimes gets grumbly when people say that ML is an impure language, even though he knows exactly what they mean (and, indeed, agrees with it), because this way of phrasing things does not pay data abstraction its full due.

Data abstraction lets us write verifiably pure programs in ML. By "verifiably pure", I mean that we can use the type system to guarantee that our functions are pure and total, even though ML's native function space contains all sorts of crazy effects involving higher-order state, control, IO, and concurrency. (Indeed, ML functions can even spend your money to rent servers: now that's a side effect!) Given that the ML function space contains these astonishing prodigies and freaks of nature, how can we control them? The answer is data abstraction: we can define new types of functions which only contains well-behaved functions, and ensure through type abstraction that the only ways to form elements of this new type preserve well-behavedness.

Indeed, we will not just define a type of pure functions, but give an interface containing all the type constructors of the guarded recursion calculus I have described in the last few posts. The basic idea is to give an ML module signature containing:

  • One ML type constructor for each type constructor of the guarded recursion calculus.
  • A type constructor for the hom-sets of the categorical interpretation of this calculus
  • One function in the interface for each categorical combinator, such as identity, composition, and each of the natural transformations corresponding to the universal properties of the functors interpreting the type constructors.

That's a mouthful, but it is much easier to understand by looking at the following (slightly pretty-printed) Ocaml module signature:

module type GUARDED =
type one
type α × β
type α ⇒ β
type α stream
type num
type •α

type (α, β) hom

val id : (α, α) hom
val compose : (α, β) hom -> (β, γ) hom -> (α, γ) hom

val one : (α, one) hom

val fst : (α × β, α) hom
val snd : (α × β, β) hom
val pair : (α, β) hom -> (α, γ) hom -> (α, β × γ) hom

val curry : (α × β,γ) hom -> (α,(β,γ) exp) hom
val eval : ((α ⇒ β, α) times, β) hom

val head : (α stream, α) hom
val tail : (α stream, •(α stream)) hom
val cons : (α × •(α stream), α stream) hom

val zero : (one,num) hom
val succ : (num, num) hom
val plus : (num × num, num) hom
val prod : (num × num, num) hom

val delay : (α, •α) hom
val next : (α,β) hom -> (•α, •β) hom
val zip : (•α × •β, •(α × β)) hom
val unzip : (•(α × β), •α × •β) hom

val fix : (•α ⇒ α, α) hom

val run : (one, num stream) hom -> (unit -> int)

As you can see, we introduce abstract types corresponding to each of our calculus's type constructors. So α × β and α ⇒ β are not ML pairs and functions, but rather is the type of products and functions of our calculus. This is really the key idea -- since ML functions have too much stuff in them, we'll define a new type of pure functions. I replaced the "natural" numbers of the previous posts with a stream type, corresponding to our LICS 2011 paper, since they are really a kind of lazy conatural, and not the true inductive type of natural numbers. The calculus guarantees definitions are productive, but it's kind of weird in ML to see something called nat which isn't. So I replaced it with streams, which are supposed to yield an unbounded number of elements. (For true naturals, you'll have to wait for my post on Mendler iteration, which is a delightful application of parametricity.)

The run function takes a num stream, and gives you back an imperative function that successively enumerates the elements of the stream. This is the basic trick for making streams fit nicely into an event loop a la FRP.

However, we can implement these functions and types using traditional ML types:

module Guarded : GUARDED =
type one = unit
type α × β = α * β
type α ⇒ β = α -> β
type •α = unit -> α
type α stream = Stream of α * α stream delay
type num = int

type (α, β) hom = α -> β

let id x = x
let compose f g a = g (f a)

let one a = ()

let fst (a,b) = a
let snd (a,b) = b
let pair f g a = (f a, g a)

let curry f a = fun b -> f(a,b)
let eval (f,a) = f a

let head (Stream(a, as')) = a
let tail (Stream(a, as')) = as'
let cons (a,as') = Stream(a,as')

let zero () = 0
let succ n = n + 1
let plus (n,m) = n + m
let prod (n,m) = n * m

let delay v = fun () -> v
let next f a' = fun () -> f(a'())
let zip (a',b') = fun () -> (a'(), b'())
let unzip ab' = (fun () -> fst(ab'())), (fun () -> snd(ab'()))

let rec fix f = f (fun () -> fix f)

let run h =
let r = ref (h()) in
fun () ->
let Stream(x, xs') = !r in
let () = r := xs'() in

Here, we're basically just implementing the operations of the guarded recursion calculus using the facilities offered by ML. So our guarded functions are just plain old ML functions, which happen to live at a type in which they cannot be used to misbehave!

This is the sense in which data abstraction lets us have our cake (effects!) and eat it too (purity!).

Note that when we want to turn a guarded lambda-term into an ML term, we can basically follow the categorical semantics to tell us what to write. Even though typechecking will catch all misuses of this DSL, actually using it is honestly not that much fun (unless you're a Forth programmer), since even even small terms turn into horrendous combinator expressions -- but in another post I'll show how we can write a CamlP4 macro/mini-compiler to embed this language into OCaml. This macro will turn out to involve some nice proof theory there, just as this ML implementation shows off how to use the denotational semantics in our ML programming.


  1. did you mean "val run : (I, nat) hom -> int" ?

  2. Yes, I did -- thanks for the correction!

  3. assuming also that times and exp are × and ⇒ it seems that the following should typecheck:

    val inf : (I,nat) hom
    let inf = compose (curry (compose snd succ)) fix

    so then "run inf" shouldn't terminate.

    Does this mean that the DSL is too liberal or that run should have a lazy natural type as codomain, or that (more likely) i'm missing something?

  4. You didn't miss anything -- I messed up! The guarded "naturals" I described are really co-natural numbers, as your example demonstrates. I'll change the example to a stream type, so that things make sense.

    On the bright side, you gave me ideas for two more posts: one about Cauchy completions, which explains why this type *must* be lazy, and another about proving the correctness of Mendler iteration, which will show how to define total recursive functions on the naturals.

  5. This, and your bounded FRP paper is a lot of fun, and a good motivator to learn logic. A simple question: what if we trust the programmer to provide pure terminating functions and expose hom implementation? As ML programmers are usually pretty good at reasoning about side-effects and termination, and then there is no need for macros. Intuitively, provided this discipline, the streams still seem guarded (but how would I prove that I wonder). Here is a sketch: https://gist.github.com/1138753

  6. Hi Anton,

    What you're suggesting should work, as long as your hom functions never call the sample function. (This is essentially the invariant I am enforcing by hiding the hom implementation.)

    Unfortunately, it turns out to be rather inconvenient to program this way, though. Basically, you ending doing a lot of variable management "by hand" with the zip and unzip combinators.