Algebraic data types

Previously, we saw that to be Turing-complete, the simply-typed lambda calculus must be augmented with both numbers/Booleans and fix. Algebraic data types make it possible to define both!

Product types

Read TAPL 11.6-8.

Pierce shows how to add pairs, tuples (which are an easy generalization of pairs), and records (which are basically tuples with named members).

Scheme cons cells, Python tuple, and C struct are real-world examples of product types.

The i ∈ 1..n notation that the book uses is a bit hard to read; here’s a less rigorous notation that also uses evaluation contexts.

t ::= ...
      {t, t, ...}
      t.i             (i≥1)

v ::= ...
      {v, v, ...}

T ::= ...
      {T, T, ...}

E ::= ...
      E.i
      {v, v, ..., E, t, t, ...}

{v₁, v₂, ...}.j ⟶ v_j             E-ProjTuple

  Γ ⊢ t₁ : T₁  Γ ⊢ t₂ : T₂  ...
―――――――――――――――――――――――――――――――――  T-Tuple
Γ ⊢ {t₁, t₂, ...} : {T₁, T₂, ...}

Γ ⊢ t : {T₁, T₂, ...}
―――――――――――――――――――――              T-Proj
     Γ ⊢ t.j : T_j

Sum types

Read TAPL 11.9-10.

Pierce presents two kinds of sum types: sums and variants, which are an easy generalization of sums.

C union is not quite the same as a sum type, because a C union variable doesn’t “know” which of the two unioned types it belongs to. Sum types are often called tagged unions to distinguish from C-style untagged unions.

Question: Why don’t untyped languages (Scheme, Python) typically have sum types?

In order to preserve uniqueness of types, TAPL adds as clauses in many places. For variants, these are actually not necessary if we require that every label uniquely determines a variant type. This is, in fact, the case in many languages, like ML and Haskell (p. 141). We will adopt this convention.

Again, the notation gets a bit hard to read; here’s a less rigorous version

t ::= ...
      <l=t>
      case t of <l=x> ⇒ t, <l=x> ⇒ t, ...

T ::= ...
      <l:T, l:T, ...>

E ::= ...
      case E of <l=x> ⇒ t, <l=x> ⇒ t, ...
      <l=E>

case <l_j=v_j> of <l₁=x₁> ⇒ t₁, <l₂=x₂> ⇒ t₂, ... ⟶ [x_j↦v_j]t_j      E-CaseVariant

           Γ ⊢ t_j:T_j
―――――――――――――――――――――――――――――――                                       T-Variant
Γ ⊢ <l_j=t_j>:<l₁:T₁, l₂:T₂, ...>

Γ ⊢ t:<l₁:T₁, l₂:T₂, ...>   Γ, x₁:T₁ ⊢ t₁:T   Γ, x₂:T₂ ⊢ t₂:T   ...
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――   T-Case
        Γ ⊢ case t of <l₁=x₁> ⇒ t₁, <l₂=x₂> ⇒ t₂ : T

Assuming that we have a Unit type, we can define variants like

Color ≡ <red:Unit, green:Unit, blue:Unit>

which are analogous to C/C++/Java enum. In particular, we no longer need Bool to be built in; we can define it ourselves:

Bool ≡ <false:Unit, true:Unit>

Exercise. Show how to simulate if if Bool is defined as above.

Curry-Howard isomorphism

Now would be a good time to bring up the Curry-Howard isomorphism (pages 108-109). Forget about type systems for a moment and think about propositional logic.

We define a syntax of formulas:

φ ::= x
      true
      false
      φ → φ
      φ ∧ φ
      φ ∨ φ

And inference rules for Γ ⊢ φ (“under the assumptions in Γ, φ is provable”):

――――――――
Γ ⊢ true

――――――――
Γ, φ ⊢ φ

 Γ, φ₁ ⊢ φ₂
――――――――――――                              (→ introduction)
Γ ⊢ φ₁ → φ₂                              
                                          
Γ ⊢ φ₁ → φ₂  Γ ⊢ φ₁                      
――――――――――――――――――――                      (→ elimination, or modus ponens)
       Γ ⊢ φ₂                             
                                          
Γ ⊢ φ₁  Γ ⊢ φ₂                            
――――――――――――――                            (∧ introduction)
 Γ ⊢ φ₁ ∧ φ₂                              
                                          
Γ ⊢ φ₁ ∧ φ₂                               
―――――――――――   i ∈ {1,2}                   (∧ elimination)
  Γ ⊢ φ_i                                  
                                          
   Γ ⊢ φ_i                                 
―――――――――――   i ∈ {1,2}                   (∨ introduction)
Γ ⊢ φ₁ ∨ φ₂

Γ ⊢ φ₁ ∨ φ₂   Γ ⊢ φ₁ → φ   Γ ⊢ φ₁ → φ
―――――――――――――――――――――――――――――――――――――――   (∨ elimination)
                Γ ⊢ φ

This is a restriction of propositional logic called intuitionistic propositional logic that doesn’t have negation.

The Curry-Howard isomorphism is a correspondence between proof systems in logic and type systems in programming language theory. Notice the similarity between these rules and the typing rules we’ve built up so far.

proof systems	type systems
formulas `φ`	types `T`
assumptions `Γ`	contexts `Γ`
`Γ ⊢ φ`	`Γ ⊢ t:T` for some `t`
`→`	`→`
`∧`	`×`
`∨`	`+`
`true`	`unit`

This naturally leads to a number of questions, like:

What does false correspond to? The empty type, usually written ⊥.
What do ∀ and ∃ correspond to?
What does negation correspond to?

Recursive types

Read TAPL 20.1-2.

Pierce presents recursive types and then describes two ways of formalizing them, equi-recursive types and iso-recursive types. Equi-recursive types are easier to program with, so he starts off with them; iso-recursive types are easier to implement.

You’ve used recursive types in OCaml. For example, if you wanted to define your own linked list type, you would do:

type int_list = Nil | Cons of int * int_list

In our extended λ-calculus, we can’t just write

NatList ≡ <nil:Unit, cons:Nat×NatList>

because you can’t define something in terms of itself. So we introduce some new notation,

NatList ≡ μX. <nil:Unit, cons:Nat×X>

where μ is kind of like fix for types. It binds a type variable X that stands for the type μX. ... itself.

See the book for definitions of nil, cons, isnil, head, tail, etc.

Fold and unfold

With isorecursive types, the types NatList (≡ μX. <nil:Unit, cons:Nat×X>) and <nil:Unit, cons:Nat×NatList> are not the same. To go back and forth between them, we need two new operators, fold and unfold:

fold[NatList] : <nil:Unit, cons:Nat×NatList> → NatList
unfold[NatList] : NatList → <nil:Unit, cons:Nat×NatList>

These operators don’t really do anything; they’re like casts.

Note: Don’t confuse fold with the list operation by the same name.

Isorecursive types are easier to implement, and OCaml uses them. So why don’t we have to write fold and unfold everywhere? The answer is that in OCaml,

Every constructor (e.g., Cons) has an implicit fold in it.
Every pattern-match on a variant type (e.g., match l with...) has an implicit unfold in it.

So in our extended λ-calculus, any time you want to construct something of recursive type, you must use fold. For example, define cons like this:

cons ≡ λx:Nat. λy:NatList. fold[NatList] <cons={x, y}>

And every time you want to use something of recursive type, you must use unfold. For example, define car like this:

car ≡ λl:NatList. case unfold[NatList] l of
                    <nil=u> ⇒ 0,
                    <cons=p> ⇒ p.1

With arrows

So far we have built recursive types out of just sums and products, but it’s also allowed to use →. Sometimes a distinction is made between whether the recursion occurs in positive or negative position; a position is positive if it occurs to the left of an even number of arrows (including 0), negative otherwise (p. 428-429). For example, in

μA. Nat→A
μX. <nil:Unit, cons:{Nat,X}>
μX. (X→Nat)→Nat

the recursion is in positive position; in

μX. X→X

the first X occurs in negative position.

The reason for the distinction is that recursion in positive positions doesn’t break strong normalization, but recursion in negative positions does.