# Type reconstruction

Read TAPL 22.1-6. We’re jumping way ahead in the book, but there’s nothing in these sections that depends on any intervening material (as far as I know).

The goal of this chapter is to develop an algorithm for type reconstruction (or type inference): Given an *untyped* lambda-term, find the most general (or *principal*) type.

We could start simply by removing type annotations from the typing rules for the simply-typed lambda calculus:

```
x:T ∈ Γ
——————— (T-Var)
Γ ⊢ x:T
Γ, x:T
```_{1} ⊢ t_{2} : T_{2}
—————————————————— (T-Abs')
Γ ⊢ λx.t_{2} : T_{1}→T_{2}
Γ ⊢ t_{1} : T_{1}_{1}→T_{1}_{2} Γ ⊢ t_{2} : T_{1}_{1}
————————————————————————————————— (T-App)
Γ ⊢ t_{1} t_{2} : T_{1}_{2}
Γ ⊢ t : T→T
————————————— (T-Fix)
Γ ⊢ fix t : T

The problem with these rules is that they are not syntax-directed: Given an untyped term, we used to be able to traverse the syntax tree once and either end up with a type or find that there is no type. Now, the lack of type annotations means that in T-Abs’, we have to guess the type `T`

. That’s bad news because the set of possible types is infinite._{1}

The pseudocode for type reconstruction might look like this (using Python-style `yield`

):

```
function typecheck(t, Γ)
# Don't implement this!
if t = x
yield Γ(x)
else if t = λx.t
```_{2}
for all types T_{1}
for T_{2} in typecheck(t_{2}, Γ ∪ {x:T_{1}})
yield T_{1}→T_{2}
else if t = t_{1} t_{2}
for T_{1}_{1}→T_{1}_{2} in typecheck(t_{1}, Γ)
for T_{2} in typecheck(t_{2}, Γ)
if T_{1}_{1} = T_{2}
yield T_{1}_{2}
else if t = fix t_{1}
for T_{1}_{1}→T_{1}_{2} in typecheck(t_{1}, Γ)
if T_{1}_{1} = T_{1}_@
yield T_{1}_{1}

But the loop over all types `T`

is infinite._{1}

## Constraint-based typing

The solution pursued in the book is to develop a syntax-directed algorithm that assigns every term a type possibly involving *type variables*, while generating a set of constraints (a system of equations involving the type variables). Then, in a second phase, it tries to solve the system of equations.

The notation in Figure 22-1 is pretty hard to read. I second the suggestion (p. 321) to ignore the subscripts. I’m also going to go ahead and incorporate the implicit type annotation rule from section 22.6. That makes the typing rules look like this:

```
—————————————— x:T ∈ Γ (CT-Var)
Γ ⊢ x : T | {}
Γ, x:X ⊢ t
```_{2} : T_{2} | C
————————————————————— X fresh (CT-AbsInf)
Γ ⊢ λx.t_{2} : X→T_{2} | C
Γ ⊢ t_{1} : T_{1} | C_{1} Γ ⊢ t_{2} : T_{2} | C_{2}
—————————————————————————————————————— X fresh (CT-App)
Γ ⊢ t_{1} t_{2} : X | C_{1} ∪ C_{2} ∪ {T_{1} = T_{2}→X}
Γ ⊢ t_{1} : T_{1} | C
——————————————————————————————— X fresh (CT-Fix)
Γ ⊢ fix t : X | C ∪ {T_{1} = X→X}

The side condition `X fresh`

means that `X`

is different from the type variables introduced in all other uses of CT-App. It’s a bit of informality that is actually perfectly safe when you implement the algorithm (Exercise 22.3.9):

```
function typecheck(t, Γ)
if t = x
return Γ(x), {}
else if t = λx:T
```_{1}.t_{2}
T_{2}, C = typecheck(t_{2}, Γ ∪ {x:T_{1}})
return T_{1}→T_{2}, C
else if t = t1 t2
T_{1}, C_{1} = typecheck(t_{1}, Γ)
T_{2}, C_{2} = typecheck(t_{2}, Γ)
X = fresh()
return X, C_{1} ∪ C_{2} ∪ {T_{1} = T_{2}→X}
else if t = fix t_{1}
T_{1}, C_{1} = typecheck(t_{1}, Γ)
X = fresh()
return X, C_{1} ∪ {T_{1} = X→X}

where `fresh()`

returns a new type variable that has never been used before. (In fact, it might be easier to think about a single constraint set that the recursive calls to `typecheck`

update imperatively.)

Example derivation of `λf. f 0`

:

```
―――――――――――――――――― (CT-Var) ――――――――――――――――――― (CT-Zero)
f:X
```_{1} ⊢ f : X_{1} | {} f:X_{1} ⊢ 0 : Nat | {}
―――――――――――――――――――――――――――――――――――――――――――――――― (CT-App)
f:X_{1} ⊢ f 0 : X_{2} | {X_{1} = Nat→X_{2}}
――――――――――――――――――――――――――――――――――― (CT-AbsInf)
⊢ λf. f 0 : X_{1}→X_{2} | {X_{1} = Nat→X_{2}}

## Solving the constraints

The unification algorithm in Figure 22-2 might also be easier to read as imperative pseudocode:

`function unify(T`_{0}, C)
while C is not empty
pop a constraint S=T from C
if S = T
do nothing
else if S = X and X ∉ FV(T)
T_{0} = [X↦T]T_{0}
C = [X↦T]C
else if T = X and X ∉ FV(S)
T_{0} = [X↦S]T_{0}
C = [X↦S]C
else if S = S_{1}→S_{2} and T = T_{1}→T_{2}
C = C ∪ {S_{1}=T_{1}, S_{2}=T_{2}}
else
fail
return T_{0}

Then the top-level reconstruction function is:

```
function typecheck(t)
T, C = typecheck(t, Γ)
return unify(T, C)
```

The operations `T`

and the like are expensive and can be made more efficient if we don’t actually perform these substitutions, but maintain a table that says what each type variable stands for, e.g.,_{0} = [X↦T]T_{0}

Type variable | Stands for |
---|---|

`X` |
`Nat→X` |

`X` |
`X` |

`X` |
`X` |

This can be thought of as a union-find data structure, where each set has one or more type variables and at most one type that isn’t a type variable. This data structure can be implemented so that querying and updating are almost linear time.

## Implicit type annotations

The very short section 22.6 modifies the typing rules, finally, to work on untyped terms. Please disregard, for the moment, the comment on page 331 about making copies of an unannotated abstraction. During type reconstruction, we don’t (yet) copy anything.