Algebraic data types
Previously, we saw that to be Turing-complete, the simply-typed lambda calculus must be augmented with both numbers/Booleans and `fix`. Algebraic data types make it possible to define both!
Product types
Read TAPL 11.6-8.
Pierce shows how to add pairs, tuples (which are an easy generalization of pairs), and records (which are basically tuples with named members).
Scheme `cons` cells, Python `tuple`s, and C `struct`s are real-world examples of product types.
The `i ∈ 1..n` notation that the book uses is a bit hard to read; here’s a less rigorous notation that also uses evaluation contexts.
```
t ::= ...
      {t, t, ...}
      t.i                          (i ≥ 1)

v ::= ...
      {v, v, ...}

T ::= ...
      {T, T, ...}

E ::= ...
      E.i
      {v, v, ..., E, t, t, ...}

{v1, v2, ...}.j ⟶ vj               E-ProjTuple

Γ ⊢ t1 : T1    Γ ⊢ t2 : T2    ...
―――――――――――――――――――――――――――――――――  T-Tuple
Γ ⊢ {t1, t2, ...} : {T1, T2, ...}

Γ ⊢ t : {T1, T2, ...}
―――――――――――――――――――――              T-Proj
Γ ⊢ t.j : Tj
```
Sum types
Read TAPL 11.9-10.
Pierce presents two kinds of sum types: sums and variants, which are an easy generalization of sums.
C `union` is not quite the same as a sum type, because a C `union` variable doesn’t “know” which of the two unioned types it belongs to. Sum types are often called tagged unions to distinguish them from C-style untagged unions.
Question: Why don’t untyped languages (Scheme, Python) typically have sum types?
In order to preserve uniqueness of types, TAPL adds `as` clauses in many places. For variants, these are actually not necessary if we require that every label uniquely determines a variant type. This is, in fact, the case in many languages, like ML and Haskell (p. 141). We will adopt this convention.
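OCaml follows this convention: each constructor name belongs to (at most) one variant type in scope, so the type of a constructor application is determined without any `as` annotation. A small sketch (`shape` and its constructors are illustrative names, not from the book):

```ocaml
(* The constructor names Circle and Square uniquely determine the
   type shape, so no annotation like <...> as shape is needed *)
type shape = Circle of float | Square of float

let area s =
  match s with
  | Circle r -> 3.14159 *. r *. r
  | Square w -> w *. w
```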
Again, the notation gets a bit hard to read; here’s a less rigorous version:
```
t ::= ...
      <l=t>
      case t of <l=x> ⇒ t, <l=x> ⇒ t, ...

T ::= ...
      <l:T, l:T, ...>

E ::= ...
      case E of <l=x> ⇒ t, <l=x> ⇒ t, ...
      <l=E>

case <lj=vj> of <l1=x1> ⇒ t1, <l2=x2> ⇒ t2, ... ⟶ [xj↦vj]tj    E-CaseVariant

Γ ⊢ tj : Tj
―――――――――――――――――――――――――――――――――  T-Variant
Γ ⊢ <lj=tj> : <l1:T1, l2:T2, ...>

Γ ⊢ t : <l1:T1, l2:T2, ...>    Γ, x1:T1 ⊢ t1 : T    Γ, x2:T2 ⊢ t2 : T    ...
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――  T-Case
Γ ⊢ case t of <l1=x1> ⇒ t1, <l2=x2> ⇒ t2, ... : T
```
Assuming that we have a `Unit` type, we can define variants like

```
Color ≡ <red:Unit, green:Unit, blue:Unit>
```

which are analogous to C/C++/Java `enum`s. In particular, we no longer need `Bool` to be built in; we can define it ourselves:

```
Bool ≡ <false:Unit, true:Unit>
```
Exercise. Show how to simulate `if` if `Bool` is defined as above.
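For intuition (not the calculus answer itself), here is what the analogous construction looks like in OCaml, with `mybool` and `if_` as illustrative names:

```ocaml
(* Bool as a two-constructor variant whose payloads are unit *)
type mybool = False of unit | True of unit

(* if-then-else becomes a case/match. The branches are thunks so
   that only the taken branch is evaluated, matching built-in if;
   in the calculus, case already evaluates only the chosen branch. *)
let if_ (b : mybool) (thn : unit -> 'a) (els : unit -> 'a) : 'a =
  match b with
  | True () -> thn ()
  | False () -> els ()
```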
Curry-Howard isomorphism
Now would be a good time to bring up the Curry-Howard isomorphism (pages 108-109). Forget about type systems for a moment and think about propositional logic.
We define a syntax of formulas:
```
φ ::= x
      true
      false
      φ → φ
      φ ∧ φ
      φ ∨ φ
```
And inference rules for `Γ ⊢ φ` (“under the assumptions in `Γ`, `φ` is provable”):
```
――――――――
Γ ⊢ true

――――――――
Γ, φ ⊢ φ

Γ, φ1 ⊢ φ2
―――――――――――              (→ introduction)
Γ ⊢ φ1 → φ2

Γ ⊢ φ1 → φ2    Γ ⊢ φ1
――――――――――――――――――――――   (→ elimination, or modus ponens)
Γ ⊢ φ2

Γ ⊢ φ1    Γ ⊢ φ2
――――――――――――――――         (∧ introduction)
Γ ⊢ φ1 ∧ φ2

Γ ⊢ φ1 ∧ φ2
―――――――――――  i ∈ {1,2}   (∧ elimination)
Γ ⊢ φi

Γ ⊢ φi
―――――――――――  i ∈ {1,2}   (∨ introduction)
Γ ⊢ φ1 ∨ φ2

Γ ⊢ φ1 ∨ φ2    Γ ⊢ φ1 → φ    Γ ⊢ φ2 → φ
――――――――――――――――――――――――――――――――――――――――  (∨ elimination)
Γ ⊢ φ
```
This is a restriction of propositional logic: intuitionistic propositional logic, presented here without negation. (Unlike classical logic, intuitionistic logic does not admit the law of the excluded middle, `φ ∨ ¬φ`.)
The Curry-Howard isomorphism is a correspondence between proof systems in logic and type systems in programming language theory. Notice the similarity between these rules and the typing rules we’ve built up so far.
| proof systems | type systems |
|---|---|
| formulas `φ` | types `T` |
| assumptions `Γ` | contexts `Γ` |
| `Γ ⊢ φ` | `Γ ⊢ t:T` for some `t` |
| `→` | `→` |
| `∧` | `×` |
| `∨` | `+` |
| `true` | `Unit` |
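Under this correspondence, writing a well-typed program is constructing a proof: the type is the theorem, and the term inhabiting it is the proof. For example, a term of type `(A → B) × (B → C) → (A → C)` is a proof that implication composes; in OCaml (a sketch, using ordinary pairs and functions):

```ocaml
(* Proof of (φ1 → φ2) ∧ (φ2 → φ3) → (φ1 → φ3):
   the program is just function composition *)
let trans : ('a -> 'b) * ('b -> 'c) -> ('a -> 'c) =
  fun (f, g) -> fun a -> g (f a)
```

Conversely, an unprovable formula such as `A → B` for unrelated `A` and `B` has no closed term of the corresponding type.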
This naturally leads to a number of questions, like:
- What does `false` correspond to? The empty type, usually written `⊥`.
- What do `∀` and `∃` correspond to?
- What does negation correspond to?
Recursive types
Read TAPL 20.1-2.
Pierce presents recursive types and then describes two ways of formalizing them, equi-recursive types and iso-recursive types. Equi-recursive types are easier to program with, so he starts off with them; iso-recursive types are easier to implement.
You’ve used recursive types in OCaml. For example, if you wanted to define your own linked list type, you would do:
```ocaml
type int_list = Nil | Cons of int * int_list
```
In our extended λ-calculus, we can’t just write

```
NatList ≡ <nil:Unit, cons:Nat×NatList>
```

because you can’t define something in terms of itself. So we introduce some new notation,

```
NatList ≡ μX. <nil:Unit, cons:Nat×X>
```

where `μ` is kind of like `fix` for types. It binds a type variable `X` that stands for the type `μX. ...` itself.
See the book for definitions of `nil`, `cons`, `isnil`, `head`, `tail`, etc.
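For intuition, here is what those operations look like on the OCaml `int_list` above (a sketch; the book’s versions are in the λ-calculus with `fold`/`unfold`, and the defaults for the empty list are arbitrary choices since the calculus has no exceptions):

```ocaml
type int_list = Nil | Cons of int * int_list

let nil = Nil
let cons x l = Cons (x, l)
let isnil l = match l with Nil -> true | Cons _ -> false
(* head and tail must return something on Nil; 0 and Nil are
   arbitrary defaults *)
let head l = match l with Nil -> 0 | Cons (x, _) -> x
let tail l = match l with Nil -> Nil | Cons (_, rest) -> rest
```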
Fold and unfold
With isorecursive types, the types `NatList` (≡ `μX. <nil:Unit, cons:Nat×X>`) and `<nil:Unit, cons:Nat×NatList>` are not the same. To go back and forth between them, we need two new operators, `fold` and `unfold`:

```
fold[NatList]   : <nil:Unit, cons:Nat×NatList> → NatList
unfold[NatList] : NatList → <nil:Unit, cons:Nat×NatList>
```
These operators don’t really do anything; they’re like casts.
Note: Don’t confuse `fold` with the list operation by the same name.
Isorecursive types are easier to implement, and OCaml uses them. So why don’t we have to write `fold` and `unfold` everywhere? The answer is that in OCaml,

- Every constructor (e.g., `Cons`) has an implicit `fold` in it.
- Every pattern-match on a variant type (e.g., `match l with ...`) has an implicit `unfold` in it.
So in our extended λ-calculus, any time you want to construct something of recursive type, you must use `fold`. For example, define `cons` like this:

```
cons ≡ λx:Nat. λy:NatList. fold[NatList] <cons={x, y}>
```

And every time you want to use something of recursive type, you must use `unfold`. For example, define `car` like this:

```
car ≡ λl:NatList. case unfold[NatList] l of
        <nil=u> ⇒ 0,
        <cons=p> ⇒ p.1
```
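In OCaml the coercions are invisible: applying a constructor performs the implicit `fold`, and matching performs the implicit `unfold`. The same two definitions, sketched on an OCaml list type (`nat_list` is an illustrative name):

```ocaml
type nat_list = Nil | Cons of int * nat_list

(* cons: the constructor application is the implicit fold *)
let cons x y = Cons (x, y)

(* car: the match is the implicit unfold; 0 on the empty list,
   mirroring the calculus definition above *)
let car l = match l with
  | Nil -> 0
  | Cons (x, _) -> x
```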
With arrows
So far we have built recursive types out of just sums and products, but it’s also allowed to use `→`. Sometimes a distinction is made between whether the recursion occurs in positive or negative position; a position is positive if it occurs to the left of an even number of arrows (including 0), negative otherwise (pp. 428–429). For example, in

```
μA. Nat→A
μX. <nil:Unit, cons:{Nat,X}>
μX. (X→Nat)→Nat
```

the recursion is in positive position; in

```
μX. X→X
```

the first `X` occurs in negative position.
The reason for the distinction is that recursion in positive positions doesn’t break strong normalization, but recursion in negative positions does.
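We can see the failure of normalization concretely in OCaml, which accepts recursion in negative position: it makes self-application typable, so a fixed-point combinator can be defined without `let rec` (a sketch; `wrap`, `unwrap`, and `fix` are illustrative names):

```ocaml
(* A recursive type whose recursion is to the left of an arrow
   (negative position) *)
type ('a, 'b) wrap = Wrap of (('a, 'b) wrap -> 'a -> 'b)

let unwrap (Wrap f) = f

(* A fixed-point combinator with no `rec`: the negative recursion
   is exactly what makes the self-application `unwrap w w` typable *)
let fix f =
  let omega w a = f (fun x -> unwrap w w x) a in
  omega (Wrap omega)

(* Usage: factorial, with the recursion supplied by fix *)
let fact = fix (fun self n -> if n = 0 then 1 else n * self (n - 1))
```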