Monday, August 27, 2012

An astonishing isomorphism

I just learned today that in classical linear logic, the following type isomorphism holds: $$(!A \multimap R) \multimap R \qquad\simeq\qquad R^\bot \multimap (!A \otimes R^\bot)$$
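One quick way to see it: classical linear logic validates contraposition, $X \multimap Y \simeq Y^\bot \multimap X^\bot$, and De Morgan duality gives $(X \multimap Y)^\bot \simeq X \otimes Y^\bot$. So the isomorphism is just two applications of duality: $$\begin{array}{lcl} (!A \multimap R) \multimap R & \simeq & R^\bot \multimap (!A \multimap R)^\bot \\ & \simeq & R^\bot \multimap (!A \otimes R^\bot) \end{array}$$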

So linear CPS and linear state are the same thing, up to a dual notion of observation, and this is the fundamental reason why SSA and CPS are equivalent program representations.

Apparently, this is the observation that prompted Egger, Møgelberg and Simpson to invent the enriched effect calculus, which is a reformulation of Paul Levy's call-by-push-value to make the linear nature of control more apparent. I guess this is also why Møgelberg and Staton were able to prove that linear state is a universal effect.

Now I'm curious how all this fits with the story of the algebraic theory of effects, such as Pretnar and Plotkin's effect handlers.

Thursday, August 23, 2012

A Computational Lambda Calculus for Applicative Functors

These days, most people are familiar with using monads to structure functional programs, and are also familiar with Haskell's do-notation for writing monadic code. Slightly less well-known is the fact that the do-notation actually originates in Moggi's original 1991 paper on monads, in which he gave the following type-theoretic introduction and elimination rules for monadic types:
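As best I recall Moggi's presentation, the rules are the introduction form $[e]$ (returning a value as a trivial computation) and the elimination form $\mathsf{let}$ (sequencing a computation): $$\frac{\Gamma \vdash e : A}{\Gamma \vdash [e] : T\,A} \qquad\qquad \frac{\Gamma \vdash e : T\,A \qquad \Gamma, x : A \vdash e' : T\,B}{\Gamma \vdash \mathsf{let}\; x \Leftarrow e \;\mathsf{in}\; e' : T\,B}$$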

Wednesday, August 8, 2012

An ML implementation of match compilation

In this post, I'll give an OCaml implementation of the pattern match compiler from my previous post. Just for fun, it will also compile pattern matches into a nested series of tests.

We begin by giving some basic types. The type var of variables is just strings, and we have a type tp of types, which are just units, sums, products, void types, and a default Other to handle all non-matchable types (e.g., functions). The type pat is the type of patterns. We have patterns for variables, wildcards, units, pairs, and left and right injections.

Finally, we have the type exp of expressions. We've thrown in the minimum number of constructs necessary to elaborate pattern matching. Namely, we have variables, let-binding, unit and pair elimination (both given in the "open-scope" style), an abort (for the empty type), and a case expression for sums.

type var = string

type tp = One | Times of tp * tp | Zero | Sum of tp * tp | Other

type pat =
  | PVar of var
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

type exp =
  | EVar of var
  | ELet of var * exp * exp
  | ELetUnit of exp * exp
  | ELetPair of var * var * exp * exp
  | EAbort of exp
  | ECase of exp * var * exp * var * exp

First, we give some utility operations. gensym does fresh name generation. This is an effect, but a simple one, and should be easy enough to transcribe into (say) monadic Haskell.
let gensym =
  let r = ref 0 in
  fun () -> incr r; Printf.sprintf "u%d" !r


The filtermap operation maps an optional function over a list, returning the non-None values. I really should work out the general story for this someday -- there's obviously a nice story involving distributive laws at work here.

(* filtermap : ('a -> 'b option) -> 'a list -> 'b list *)

let rec filtermap f = function
  | [] -> []
  | x :: xs ->
    (match f x with
     | None -> filtermap f xs
     | Some y -> y :: filtermap f xs)

Now, we have the simplification routines -- these correspond to the transformations of $\overrightarrow{ps \Rightarrow e}$ to $\overrightarrow{ps' \Rightarrow e'}$ in the inference rules I gave.
let simplify_unit u =
  List.map (function
    | (PVar x :: ps, e) -> (ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (PUnit :: ps, e)  -> (ps, e)
    | (_, _) -> assert false)

let simplify_pair u =
  List.map (function
    | (PVar x :: ps, e) -> (PWild :: PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (PWild :: PWild :: ps, e)
    | (PPair (p1, p2) :: ps, e) -> (p1 :: p2 :: ps, e)
    | (_, _) -> assert false)

let simplify_other u =
  List.map (function
    | (PVar x :: ps, e) -> (ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (_, _) -> assert false)


Unlike the simplify_unit and simplify_pair routines, the simplification routines for sums return an option, which we filtermap over. This is because when we are collecting the inl patterns, the inr patterns are well-typed but irrelevant. So we need to filter them out, but it's not actually an error to see them in the branch list.

let simplify_inl u =
  filtermap (function
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInl p :: ps, e) -> Some (p :: ps, e)
    | (PInr _ :: _, _)  -> None
    | (_, _) -> assert false)

let simplify_inr u =
  filtermap (function
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInr p :: ps, e) -> Some (p :: ps, e)
    | (PInl _ :: _, _)  -> None
    | (_, _) -> assert false)


Now we reach coverage checking proper. Note that this is basically just a transcription of the inference rules in my previous post.

let rec cover arms tps =
  match arms, tps with
  | arms, [] -> snd (List.hd arms)
  | arms, (u, One) :: tps ->
    let e = cover (simplify_unit u arms) tps in
    ELetUnit (EVar u, e)
  | arms, (u, Times (tp1, tp2)) :: tps ->
    let u1 = gensym () in
    let u2 = gensym () in
    let arms = simplify_pair u arms in
    let e = cover arms ((u1, tp1) :: (u2, tp2) :: tps) in
    ELetPair (u1, u2, EVar u, e)
  | _, (u, Zero) :: _ ->
    EAbort (EVar u)
  | arms, (u, Sum (tp1, tp2)) :: tps ->
    let u1 = gensym () in
    let u2 = gensym () in
    let e1 = cover (simplify_inl u arms) ((u1, tp1) :: tps) in
    let e2 = cover (simplify_inr u arms) ((u2, tp2) :: tps) in
    ECase (EVar u, u1, e1, u2, e2)
  | arms, (u, Other) :: tps ->
    cover (simplify_other u arms) tps
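To check that everything hangs together, here's the whole program again as a single self-contained script, ending with a small hypothetical example: compiling a two-branch match on a value of type $1 + 1$. The names u1 and u2 in the expected result come from the gensym counter on a fresh run.

```ocaml
(* Self-contained copy of the definitions above, plus a small demo at the end. *)
type var = string
type tp = One | Times of tp * tp | Zero | Sum of tp * tp | Other

type pat = PVar of var | PWild | PUnit
         | PPair of pat * pat | PInl of pat | PInr of pat

type exp = EVar of var
         | ELet of var * exp * exp
         | ELetUnit of exp * exp
         | ELetPair of var * var * exp * exp
         | EAbort of exp
         | ECase of exp * var * exp * var * exp

let gensym =
  let r = ref 0 in
  fun () -> incr r; Printf.sprintf "u%d" !r

let rec filtermap f = function
  | [] -> []
  | x :: xs -> (match f x with
                | None -> filtermap f xs
                | Some y -> y :: filtermap f xs)

let simplify_unit u =
  List.map (function
    | (PVar x :: ps, e) -> (ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (PUnit :: ps, e)  -> (ps, e)
    | (_, _) -> assert false)

let simplify_pair u =
  List.map (function
    | (PVar x :: ps, e) -> (PWild :: PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (PWild :: PWild :: ps, e)
    | (PPair (p1, p2) :: ps, e) -> (p1 :: p2 :: ps, e)
    | (_, _) -> assert false)

let simplify_other u =
  List.map (function
    | (PVar x :: ps, e) -> (ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (_, _) -> assert false)

let simplify_inl u =
  filtermap (function
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInl p :: ps, e) -> Some (p :: ps, e)
    | (PInr _ :: _, _)  -> None
    | (_, _) -> assert false)

let simplify_inr u =
  filtermap (function
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet (x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInr p :: ps, e) -> Some (p :: ps, e)
    | (PInl _ :: _, _)  -> None
    | (_, _) -> assert false)

let rec cover arms tps =
  match arms, tps with
  | arms, [] -> snd (List.hd arms)
  | arms, (u, One) :: tps ->
    ELetUnit (EVar u, cover (simplify_unit u arms) tps)
  | arms, (u, Times (tp1, tp2)) :: tps ->
    let u1 = gensym () in
    let u2 = gensym () in
    ELetPair (u1, u2, EVar u,
              cover (simplify_pair u arms) ((u1, tp1) :: (u2, tp2) :: tps))
  | _, (u, Zero) :: _ -> EAbort (EVar u)
  | arms, (u, Sum (tp1, tp2)) :: tps ->
    let u1 = gensym () in
    let u2 = gensym () in
    let e1 = cover (simplify_inl u arms) ((u1, tp1) :: tps) in
    let e2 = cover (simplify_inr u arms) ((u2, tp2) :: tps) in
    ECase (EVar u, u1, e1, u2, e2)
  | arms, (u, Other) :: tps ->
    cover (simplify_other u arms) tps

(* Compile [match u : 1 + 1 with Inl () -> a | Inr () -> b]. *)
let compiled =
  cover [ ([PInl PUnit], EVar "a"); ([PInr PUnit], EVar "b") ]
        [ ("u", Sum (One, One)) ]
(* compiled = ECase (EVar "u", "u1", ELetUnit (EVar "u1", EVar "a"),
                                "u2", ELetUnit (EVar "u2", EVar "b")) *)
```

Note how the generated code case-analyzes u exactly once, with the surviving branches of each arm pushed under the corresponding case.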


Monday, August 6, 2012

One of the most practically useful features of functional programming languages is pattern matching, and one of the most important quality-of-implementation features is exhaustiveness checking for pattern matches. In this post, I will give the simplest algorithm for coverage checking that I know.

I previously gave one in my paper Focusing on Pattern Matching. I was never completely satisfied with that algorithm, since it was harder to understand than I liked. It worked by giving a judgment to prove that no match would fail, rather than directly proving that all matches covered. The negation made thinking about extensions (such as pattern matching for dependent types) harder than it should be, and it also made it harder to get a match compilation algorithm.

So in this blog post, I'll give a better algorithm. It's super-simple, and is about as naive as it should be -- there are only six rules in total. It's not totally industrial strength, but (a) it's certainly good enough to use in a prototype compiler, and (b) it's easy enough to beef up.


First, I'll give the syntax. $$\begin{array}{llcl} \mbox{Types} & A & ::= & 1 \bnfalt A \times B \bnfalt A + B \bnfalt 0 \bnfalt o \\ \mbox{Patterns} & p & ::= & () \bnfalt (p, p) \bnfalt \inl(p) \bnfalt \inr(p) \bnfalt \_ \bnfalt x \\ \mbox{Pattern Lists} & ps & ::= & \cdot \bnfalt p, ps \\ \mbox{Branch Lists} & \Pi & ::= & \cdot \bnfalt ps \To e; \Pi \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt u:A, \Gamma \end{array}$$

The types are units, products, sums, and the empty type, plus a type $o$ standing in for functions and other non-matchable types. The patterns are just what you'd expect -- $()$ for the unit pattern, $(p, p')$ for pairs, and $\inl(p)$ and $\inr(p)$ for the left and right branches of sums. (Of course, there is no pattern for the empty type!) We also have wildcard $\_$ and variable $x$ patterns.

Since our algorithm works inductively, we will destructure patterns into their components, and we will need to keep track of lists of patterns. A pattern list (with metavariable $ps$) is just the obvious thing --- a comma-separated list of patterns.

We also need branch lists -- a pattern-match clause consists of a list of patterns and an expression. So we use branch lists to say what body is associated with each pattern list (a pattern list, since we are going to destructure the pattern given for each arm).

We will also write $\overrightarrow{ps \To e}$ to indicate branch lists. If I write a list like $\overrightarrow{(p_1, p_2), ps \To e}$, then I mean that every element of the branch list starts with a pair pattern, and similarly for the other constructors.

This algorithm keeps track of the type of the value being destructured, so we introduce a typed context $\Gamma$ to keep track of the types of the expressions being destructured.

Without further delay, here's the coverage judgment $\cover{\Pi}{\Gamma}$. You can read this algorithm bottom-up, in the usual logic-programming style. $$\array{ \rule{ \cover{\overrightarrow{ps' \To e'}}{\Gamma} } { \cover{\overrightarrow{\_, ps' \To e'}}{u:A, \Gamma} } \\[1em] \rule{ \cover{ \overrightarrow{ps' \To e'},\;\; (\_, ps \To e), \overrightarrow{ps'' \To e''} }{u:A, \Gamma} } { \cover{ \overrightarrow{ps' \To e'},\;\; (x, ps \To e), \overrightarrow{ps'' \To e''} }{u:A, \Gamma} } \\[1em] \rule{ \cover{ \overrightarrow{ps \To e},\;\; \overrightarrow{ps' \To e'} }{\Gamma}} { \cover{ \overrightarrow{(), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:1, \Gamma}} \\[1em] \rule{ \cover{ \overrightarrow{p_1, p_2, ps \To e},\;\; \overrightarrow{\_, \_, ps' \To e'} }{a:A, b:B, \Gamma}} { \cover{ \overrightarrow{(p_1,p_2), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:A \times B, \Gamma}} \\[1em] \rule{ } { \cover{ \overrightarrow{\_, ps'' \To e''} }{u:0, \Gamma}} \\[1em] \rule{ \cover{ \overrightarrow{p, ps \To e},\;\; \overrightarrow{\_, ps'' \To e''}}{a : A, \Gamma} \\ \cover{ \overrightarrow{p', ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''}}{b : B, \Gamma}} { \cover{ \overrightarrow{\inl(p), ps \To e},\;\; \overrightarrow{\inr(p'), ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''} } {u:A + B, \Gamma}} }$$

The first two rules handle wildcards and variables. The first rule says that if the lead pattern of every pattern list in the branch set is a wildcard, we can just drop it and keep going. The second rule says that you can turn any lead variable pattern into a wildcard pattern.

The third rule handles the unit pattern. It is nice and easy --- it says that if every lead pattern is either a unit or wildcard pattern, we can drop it and keep going.

The fourth rule handles tuples, and is almost as easy. It says that if we have a lead pattern $(p_1, p_2), ps \To e$, we can rewrite it to $p_1, p_2, ps \To e$. We also have to remember to double any lead wildcard pattern $\_, ps' \To e'$ into $\_, \_, ps' \To e'$, since we are destructuring a product into two subpatterns.

The fifth rule handles the void type. It just succeeds, since there are no patterns of empty type!

The sixth rule handles sums. It says that if we are scrutinizing a value of type $A + B$, then we can separate the branch list into those branches handling the left case $\inl(p), ps \To e$ and those branches handling the right case $\inr(p'), ps' \To e'$, and then check them at $A$ and $B$ respectively. Here, we have to remember to send wildcards $\_, ps'' \To e''$ to both sides, since a wildcard can match both the left- and right-cases.

And that's it!
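To see the rules in action, here is a small worked example of my own: a complete match on the type $1 + 1$. $$\cover{\;\inl(()) \To e_1;\;\; \inr(\_) \To e_2\;}{u : 1 + 1}$$ The sum rule splits the branches, yielding the subgoals $\cover{() \To e_1}{a : 1}$ and $\cover{\_ \To e_2}{b : 1}$; the unit rule and the wildcard rule then discharge the remaining lead patterns, and checking succeeds. If the $\inr$ branch were missing, the right-hand subgoal would have an empty branch list, no rule would apply, and coverage checking would correctly fail.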

There is an open problem in proof theory hiding in a bit of sleight of hand in my notation. You understand exactly what I mean when I take a meta-pattern like $\overrightarrow{(p_1, p_2), ps \To e}$, and turn it into $\overrightarrow{p_1, p_2, ps \To e}$. However, formalizing this is trickier than it looks, and it's not known how to give a proof-theoretic characterization of this kind of pattern (according to Rob Simmons, whom I asked about it early this year).

As a bonus, I'll also show how coverage checking can give you a compilation algorithm for pattern matching. The judgment $\elaborate{\Pi}{\Gamma}{t}$ tells you how to turn a list of branches $\Pi$ into a nested sequence of tuple destructurings and case-statements. $$\array{ \rule{ \elaborate{\overrightarrow{ps' \To e'}}{\Gamma}{t} } { \elaborate{\overrightarrow{\_, ps' \To e'}}{u:A, \Gamma}{t} } \\[1em] \rule{ \elaborate{ \overrightarrow{ps' \To e'},\;\; (\_, ps \To \letv{x}{u}{e}), \overrightarrow{ps'' \To e''} }{u:A, \Gamma}{t} } { \elaborate{ \overrightarrow{ps' \To e'},\;\; (x, ps \To e), \overrightarrow{ps'' \To e''} }{u:A, \Gamma}{t} } \\[1em] \rule{ \elaborate{ \overrightarrow{ps \To e},\;\; \overrightarrow{ps' \To e'} }{\Gamma}{t} } { \elaborate{ \overrightarrow{(), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:1, \Gamma}{\letv{()}{u}{t}} } \\[1em] \rule{ \elaborate{ \overrightarrow{p_1, p_2, ps \To e},\;\; \overrightarrow{\_, \_, ps' \To e'} }{a:A, b:B, \Gamma}{t} } { \elaborate{ \overrightarrow{(p_1,p_2), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:A \times B, \Gamma}{\letv{(a,b)}{u}{t}} } \\[1em] \rule{ } { \elaborate{ \overrightarrow{\_, ps'' \To e''} }{u:0, \Gamma}{\abort(u)} } \\[1em] \rule{ \elaborate{ \overrightarrow{p, ps \To e},\;\; \overrightarrow{\_, ps'' \To e''}}{a : A, \Gamma}{t_1} \\ \elaborate{ \overrightarrow{p', ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''}}{b : B, \Gamma}{t_2} } { \elaborate{ \overrightarrow{\inl(p), ps \To e},\;\; \overrightarrow{\inr(p'), ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''} } {u:A + B, \Gamma} {\case{u}{a}{t_1}{b}{t_2}} } }$$

Note that it is exactly the same judgment as before --- coverage checking computes exactly the information we need, and so code generation comes along for the ride!
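For instance (a worked example of my own), the single branch $(x, ()) \To e$ at $u : A \times 1$ elaborates, by the pair, variable, wildcard, and unit rules in turn, to a nest of destructurings: $$\elaborate{\;(x, ()) \To e\;}{u : A \times 1}{\;\letv{(a,b)}{u}{\letv{()}{b}{\letv{x}{a}{e}}}\;}$$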