# Algebraic data types

Previously, we saw that to be Turing-complete, the simply-typed lambda calculus must be augmented with both numbers/Booleans and `fix`. Algebraic data types make it possible to define both: sums and products let us define Booleans and numbers, and recursive types let us define `fix`.

## Product types

Pierce shows how to add pairs, tuples (which are an easy generalization of pairs), and records (which are basically tuples with named members).

Scheme `cons` cells, Python `tuple`, and C `struct` are real-world examples of product types.
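As a concrete illustration, here is a small OCaml sketch of product types (the names `pair` and `point` are our own):

```ocaml
(* Tuples and records are both product types; names here are illustrative. *)
let pair = (3, true)                  (* pair : int * bool *)
let () = assert (fst pair = 3)        (* projection, like t.1 *)

type point = { x : float; y : float } (* a record: a product with labels *)
let p = { x = 1.0; y = 2.0 }
let () = assert (p.y = 2.0)
```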

The `i ∈ 1..n` notation that the book uses is a bit hard to read; here’s a less rigorous notation that also uses evaluation contexts.

```
t ::= ...
{t, t, ...}
t.i             (i≥1)

v ::= ...
{v, v, ...}

T ::= ...
{T, T, ...}

E ::= ...
E.i
{v, v, ..., E, t, t, ...}

{v1, v2, ...}.j ⟶ vj             E-ProjTuple

Γ ⊢ t1 : T1  Γ ⊢ t2 : T2  ...
―――――――――――――――――――――――――――――――――  T-Tuple
Γ ⊢ {t1, t2, ...} : {T1, T2, ...}

Γ ⊢ t : {T1, T2, ...}
―――――――――――――――――――――              T-Proj
Γ ⊢ t.j : Tj
```

## Sum types

Pierce presents two kinds of sum types: sums and variants, which are an easy generalization of sums.

C `union` is not quite the same as a sum type, because a C `union` variable doesn’t “know” which of the two unioned types it belongs to. Sum types are often called tagged unions to distinguish from C-style untagged unions.
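For contrast, an OCaml variant is a tagged union: every value carries a tag saying which arm it belongs to. The type and function names below are just for illustration:

```ocaml
(* A sum type: each value is tagged (I or S) with which arm it is. *)
type int_or_string = I of int | S of string

let describe = function
  | I n -> "int: " ^ string_of_int n
  | S s -> "string: " ^ s

let () = assert (describe (I 3) = "int: 3")
let () = assert (describe (S "hi") = "string: hi")
```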

Question: Why don’t untyped languages (Scheme, Python) typically have sum types?

In order to preserve uniqueness of types, TAPL adds `as` clauses in many places. For variants, these are actually not necessary if we require that every label uniquely determines a variant type. This is, in fact, the case in many languages, like ML and Haskell (p. 141). We will adopt this convention.

Again, the notation gets a bit hard to read; here’s a less rigorous version:

```
t ::= ...
<l=t>
case t of <l=x> ⇒ t, <l=x> ⇒ t, ...

T ::= ...
<l:T, l:T, ...>

E ::= ...
case E of <l=x> ⇒ t, <l=x> ⇒ t, ...
<l=E>

case <lj=vj> of <l1=x1> ⇒ t1, <l2=x2> ⇒ t2, ... ⟶ [xj↦vj]tj      E-CaseVariant

Γ ⊢ tj:Tj
―――――――――――――――――――――――――――――――                                       T-Variant
Γ ⊢ <lj=tj>:<l1:T1, l2:T2, ...>

Γ ⊢ t:<l1:T1, l2:T2, ...>   Γ, x1:T1 ⊢ t1:T   Γ, x2:T2 ⊢ t2:T   ...
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――   T-Case
Γ ⊢ case t of <l1=x1> ⇒ t1, <l2=x2> ⇒ t2, ... : T
```

Assuming that we have a `Unit` type, we can define variants like

``Color ≡ <red:Unit, green:Unit, blue:Unit>``

which is analogous to a C/C++/Java `enum`. In particular, we no longer need `Bool` to be built in; we can define it ourselves:

``Bool ≡ <false:Unit, true:Unit>``

Exercise. Show how to simulate `if` if `Bool` is defined as above.
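For comparison, the `Color` variant above could be written in OCaml, where nullary constructors play the role of `l=unit` (names are ours):

```ocaml
(* OCaml analogue of Color ≡ <red:Unit, green:Unit, blue:Unit> *)
type color = Red | Green | Blue

let to_string = function
  | Red -> "red"
  | Green -> "green"
  | Blue -> "blue"

let () = assert (to_string Green = "green")
```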

## Curry-Howard isomorphism

Now would be a good time to bring up the Curry-Howard isomorphism (pages 108-109). Forget about type systems for a moment and think about propositional logic.

We define a syntax of formulas:

```
φ ::= x
true
false
φ → φ
φ ∧ φ
φ ∨ φ
```

And inference rules for `Γ ⊢ φ` (“under the assumptions in `Γ`, `φ` is provable”):

```
――――――――
Γ ⊢ true

――――――――
Γ, φ ⊢ φ

Γ, φ1 ⊢ φ2
――――――――――――                              (→ introduction)
Γ ⊢ φ1 → φ2

Γ ⊢ φ1 → φ2  Γ ⊢ φ1
――――――――――――――――――――                      (→ elimination, or modus ponens)
Γ ⊢ φ2

Γ ⊢ φ1  Γ ⊢ φ2
――――――――――――――                            (∧ introduction)
Γ ⊢ φ1 ∧ φ2

Γ ⊢ φ1 ∧ φ2
―――――――――――   i ∈ {1,2}                   (∧ elimination)
Γ ⊢ φi

Γ ⊢ φi
―――――――――――   i ∈ {1,2}                   (∨ introduction)
Γ ⊢ φ1 ∨ φ2

Γ ⊢ φ1 ∨ φ2   Γ ⊢ φ1 → φ   Γ ⊢ φ2 → φ
―――――――――――――――――――――――――――――――――――――――   (∨ elimination)
Γ ⊢ φ
```

This is a restriction of propositional logic called intuitionistic propositional logic (it lacks the law of the excluded middle); the fragment shown here also omits negation, though `¬φ` can be encoded as `φ → false`.

The Curry-Howard isomorphism is a correspondence between proof systems in logic and type systems in programming language theory. Notice the similarity between these rules and the typing rules we’ve built up so far.

| proof systems | type systems |
| --- | --- |
| formulas `φ` | types `T` |
| assumptions `Γ` | contexts `Γ` |
| `Γ ⊢ φ` | `Γ ⊢ t:T` for some `t` |
| `→` | `→` |
| `∧` | `×` |
| `∨` | `+` |
| `true` | `Unit` |

This naturally leads to a number of questions, like:

• What does `false` correspond to? The empty type, usually written `⊥`.
• What do `∀` and `∃` correspond to?
• What does negation correspond to?
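The correspondence can be made concrete in OCaml: writing a term of a type amounts to proving the corresponding formula (a sketch; the `either` type and all names below are our own definitions):

```ocaml
(* ∧ elimination: a proof of φ1 ∧ φ2 → φ1 is projection from a pair *)
let and_elim1 : 'a * 'b -> 'a = fst

(* ∨ introduction: a proof of φ1 → φ1 ∨ φ2 injects into a sum *)
type ('a, 'b) either = Left of 'a | Right of 'b
let or_intro1 : 'a -> ('a, 'b) either = fun a -> Left a

(* modus ponens: a proof of (φ1 → φ2) ∧ φ1 → φ2 is function application *)
let mp : ('a -> 'b) * 'a -> 'b = fun (f, a) -> f a

let () = assert (mp ((fun n -> n + 1), 2) = 3)
```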

## Recursive types

Pierce presents recursive types and then describes two ways of formalizing them, equi-recursive types and iso-recursive types. Equi-recursive types are easier to program with, so he starts off with them; iso-recursive types are easier to implement.

You’ve used recursive types in OCaml. For example, if you wanted to define your own linked list type, you would do:

``type int_list = Nil | Cons of int * int_list``
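Values of this type are built with the constructors and consumed with `match`; for example (the `length` helper is our own):

```ocaml
type int_list = Nil | Cons of int * int_list

let rec length = function
  | Nil -> 0
  | Cons (_, rest) -> 1 + length rest

let () = assert (length (Cons (1, Cons (2, Nil))) = 2)
```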

In our extended λ-calculus, we can’t just write

``NatList ≡ <nil:Unit, cons:Nat×NatList>``

because you can’t define something in terms of itself. So we introduce some new notation,

``NatList ≡ μX. <nil:Unit, cons:Nat×X>``

where `μ` is kind of like `fix` for types. It binds a type variable `X` that stands for the type `μX. ...` itself.

See the book for definitions of `nil`, `cons`, `isnil`, `head`, `tail`, etc.

### Fold and unfold

With iso-recursive types, the types `NatList` (`≡ μX. <nil:Unit, cons:Nat×X>`) and `<nil:Unit, cons:Nat×NatList>` are not the same. To go back and forth between them, we need two new operators, `fold` and `unfold`:

```
fold[NatList] : <nil:Unit, cons:Nat×NatList> → NatList
unfold[NatList] : NatList → <nil:Unit, cons:Nat×NatList>
```

These operators don’t really do anything; they’re like casts.

Note: Don’t confuse `fold` with the list operation by the same name.

Iso-recursive types are easier to implement, and OCaml uses them. So why don’t we have to write `fold` and `unfold` everywhere? The answer is that in OCaml,

• Every constructor (e.g., `Cons`) has an implicit `fold` in it.
• Every pattern-match on a variant type (e.g., `match l with...`) has an implicit `unfold` in it.

So in our extended λ-calculus, any time you want to construct something of recursive type, you must use `fold`. For example, define `cons` like this:

``cons ≡ λx:Nat. λy:NatList. fold[NatList] <cons={x, y}>``

And every time you want to use something of recursive type, you must use `unfold`. For example, define `car` like this:

```
car ≡ λl:NatList. case unfold[NatList] l of
<nil=u> ⇒ 0,
<cons=p> ⇒ p.1
```

### With arrows

So far we have built recursive types out of just sums and products, but `→` is allowed as well. A distinction is sometimes made between recursion in positive and negative position: an occurrence is positive if it appears to the left of an even number of arrows (including zero), and negative otherwise (pp. 428–429). For example, in

```
μA. Nat→A
μX. <nil:Unit, cons:{Nat,X}>
μX. (X→Nat)→Nat
```

the recursion is in positive position; in

``μX. X→X``

the first `X` occurs in negative position.

The reason for the distinction is that recursion in positive positions doesn’t break strong normalization, but recursion in negative positions does.
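OCaml lets us see this directly: with a type that recurses in negative position, we can build a fixed-point combinator without `let rec` (a sketch; all names are ours, and OCaml only accepts the recursive type because it is wrapped in the constructor `Fold`):

```ocaml
(* Recursion in negative position: 'a fix appears to the left of an arrow. *)
type 'a fix = Fold of ('a fix -> 'a)

let unfold (Fold f) = f

(* A fixed-point combinator with no 'let rec' in sight. *)
let fix f =
  let g x = f (fun v -> unfold x x v) in
  g (Fold g)

let fact = fix (fun self n -> if n = 0 then 1 else n * self (n - 1))
let () = assert (fact 5 = 120)
```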