Fragments of System F

Read TAPL 22.7. Pierce covers this topic before System F, because it turns out to be a fairly mild extension of constraint-based typing. But it can also be thought of as a restriction of System F.

Prenex polymorphism

We can restrict System F to only use types where a ∀X can only occur at the outermost level (known as prenex polymorphism). Thus, ∀α. ∀β. α → β → α is allowed but (∀α. α → α) → (∀α. α → α) is not. So the syntax for types can be written as:

T ::= Nat
      Bool
      X
      T → T

S ::= T
      ∀X. S

where S stands for type scheme. We call types S polymorphic and we call types T monomorphic.

The syntax for terms is:

t ::= x
      λx: T. t
      t t
      λX. t
      t [T]

The typing rules would be the same as for System F.

The restriction to prenex types has some further consequences on syntax:

In abstractions, the variable can’t have polymorphic type (it’s a T, not an S) because then the abstraction would have a non-prenex type.
In type applications t [T] as well, we restrict the type to be monomorphic. This restriction is called predicativity. Given that we’ve already restricted to prenex types, predicativity does not shrink the set of typable terms.

Question. If we did allow t [S] where S is polymorphic, what would t have to look like in order for the result to have a prenex type?)

But these restrictions are too tight; we can’t even type something like

let f = λX. λx: X. x in {f [Nat] 0, f [Bool] true}
      ≡ (λf: ∀X. X → X. {f [Nat] 0, f [Bool] true}) (λX. λx: X. x)

because the λf term has type (∀X. X → X) → {Nat, Bool}, which is not in prenex form. In fact, we can’t type any more terms than the simply-typed lambda calculus can, in the sense that for any well-typed term t of System F under the restriction to prenex types, there’s a well-typed simply-typed term t' such that t and t' are the same after erasing all type information [2].

Question. What are the places where a type abstraction can occur?

Exercise. Show, by induction on typing derivations: If t is a well-typed term of System F under the restriction to prenex types, then for any substitution [X₁↦T₁,...] of the free type variables of t, there is a well-typed term t' of λ→ that has the same type-erasure as [X₁↦T₁,...] t.

Let polymorphism

The extension made by the Hindley-Milner type system, used by ML, is called let-polymorphism. Previously, we defined (let x = t₁ in t₂) ≡ (λx. t₂) t₁, but now we make let a primitive with its own typing rule.

t ::= ...
      let x = t in t

Γ ⊢ t₁ : S₁   Γ, x:S₁ ⊢ t₂ : S₂
———————————————————————————————   (T-Let)
   Γ ⊢ let x = t₁ in t₂ : S₂

The non-prenex type that would have appeared fleetingly as the type of (λx. t₂) no longer appears, and so we do allow x to have polymorphic type.

Exercise. Check that the above example is typable.

(There’s one last problem. In the expression let x = t₁ in t₂, if t₁ has side-effects, bad things can happen (pp 335-6). So, to be on the safe side, t₁ has to be a value in order for x to get a polymorphic type. This is called the value restriction.)

Type reconstruction

In the simply-typed lambda calculus, we made the typing rules syntax-directed, even without type annotations, by switching to a constraint-based formulation. If we did that now, the typing rules would still not be syntax-directed, because of T-Gen and T-Inst – in the absence of explicit types, we don’t know where to use them, and in T-Inst, we don’t know what T₂ should be.

Let’s forget about constraint-based typing for a moment and concentrate on the problem of T-Gen and T-Inst. The fix [1] is to realize that T-Gen can only be used just above T-Let. For example:

Γ, X ⊢ t₁ : T₁
——————————————  (T-Gen)
Γ ⊢ t₁ : ∀X.T₁           Γ, x:∀X.T₁ ⊢ t₂ : T₂
—————————————————————————————————————————————  (T-Let)
          Γ ⊢ let x = t₁ in t₂ : T₂

And T-Inst can only be used just below T-Var. For example:

  —————————————— x:∀X.T₁₂ ∈ Γ   (T-Var)
  Γ ⊢ x : ∀X.T₁₂
——————————————————              (T-Inst)
Γ ⊢ x : [X↦T₂]T₁₂

So T-Gen can be baked into T-Let, and T-Inst can be baked into T-Var. It’s possible for T-Let to introduce multiple type variables, and for T-Var to instantiate multiple type variables, so the new rules should look like:

Γ, X₁, ... ⊢ t₁ : T₁    Γ, x:∀X₁....T₁ ⊢ t₂ : T₂
————————————————————————————————————————————————  (T-GenLet)
          Γ ⊢ let x = t₁ in t₂ : T₂


—————————————————————— x:∀X₁....T ∈ Γ             (T-VarInst)
Γ ⊢ x : [X₁→T₁, ...]T

Example: let f = λx. x in {f 0, f false}.

First we derive f:∀X.X→X ⊢ f 0 : Nat:

(T-VarInst)  ――――――――――――――――――――――――   ――――――――――――――――――― (T-Zero)
             f:∀X.X→X ⊢ f : Nat→Nat   f:∀X.X→X ⊢ 0 : Nat
             ――――――――――――――――――――――――――――――――――――――――――――――  (T-App)
                        f:∀X.X→X ⊢ f 0 : Nat

And similarly for f:∀X.X→X ⊢ f false : Bool.

Then the rest of the derivation is:

                           ⋮                       ⋮
 ――――――――――――    ―――――――――――――――――――――   ――――――――――――――――――――――――――
 X, x:X ⊢ x:X    f:∀X.X→X ⊢ f 0 : Nat   f:∀X.X→X ⊢ f false : Bool
――――――――――――――   ――――――――――――――――――――――――――――――――――――――――――――――――――  (T-Pair)
X ⊢ λx.x : X→X        f:∀X.X→X ⊢ {f 0, f false} : {Nat, Bool}
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――  (T-GenLet)
     ⊢ let f = λx.x in {f 0, f false} : {Nat, Bool}

Note that above, when we used T-VarInst, we had to guess what type to substitute for X: in one place it was Nat and in another place it was Bool. To remove the remaining guesswork, we need constraint-based typing.

Recall our constraint-based typing rules for simple types:

—————————————— x:T ∈ Γ                          (CT-Var)
Γ ⊢ x : T | {}

 Γ, x:X ⊢ t₂ : T₂ | C
————————————————————— X fresh                   (CT-AbsInf)
Γ ⊢ λx.t₂ : X→T₂ | C

 Γ ⊢ t₁ : T₁ | C₁    Γ ⊢ t₂ : T₂ | C₂
—————————————————————————————————————— X fresh  (CT-App)
Γ ⊢ t₁ t₂ : X | C₁ ∪ C₂ ∪ {T₁ = T₂→X}

A non-polymorphic rule for let (but we’re about to fuse it with T-Gen, so this is just for illustration):

Γ ⊢ t₁ : T₁ | C₁   Γ, x:T₁ ⊢ t₂ : T₂ | C₂
―――――――――――――――――――――――――――――――――――――――――       (CT-Let)
  Γ ⊢ let x = t₁ in t₂ : T₂ | C₁ ∪ C₂

To add polymorphism, we replace CT-Var with CT-VarInst. For each universally-quantified type variable (if any), we have to instantiate it to some type, but instead of guessing what to instantiate it to, we simply guess a fresh, free type variable:

———————————————————— x:T ∈ Γ                    (CT-VarInst)
Γ ⊢ x : Inst(T) | {}

where Inst removes all quantifiers and replaces each quantified type variable with a fresh type variable.

T-GenLet has to guess the type variables X₁, ... that will be used for typing t₁. We have no way of knowing what those are, but in practice it does not matter. Instead (p. 333-4), we just go ahead and type t₁, and look in its type T₁ for any new type variables – that is, type variables that are free in T₁ but not in Γ. However, in order to know what T₁ is, we need to solve the constraints immediately instead of saving it all for the end. So CT-GenLet looks like this:

Γ ⊢ t₁ : T₁ | C₁    Γ, x:Gen(T₁', Γ) ⊢ t₂ : T₂ | C₂
——————————————————————————————————————————————————— T₁' = unify(T₁, C₁)   (CT-GenLet)
        Γ ⊢ let x = t₁ in t₂ : T₂ | C₂

where Gen(T₁', Γ) looks for all type variables that are free in T₁' but not Γ and universally quantifies them.

At this point, I think it’s much easier to switch to pseudocode instead:

function typecheck(t, Γ)
    if t = x
        T = Γ(x)
        while T is of the form ∀X.T₂
            T = [X↦fresh()]T₂
        return T, {}
    else if t = λx.t₂
        T₂, C = typecheck(t₂, Γ ∪ {x:fresh()})
        return T₁→T₂, C
    else if t = t₁ t₂
        T₁, C₁ = typecheck(t₁, Γ)
        T₂, C₂ = typecheck(t₂, Γ)
        X = fresh()
        return X, C₁ ∪ C₂ ∪ {T₁ = T₂→X}
    else if t = fix t₁
        T₁, C₁ = typecheck(t₁, Γ)
        X = fresh()
        return X, C₁ ∪ {T₁ = X→X}
    else if t = (let x = t₁ in t₂)
        T₁, C₁ = typecheck(t₁, Γ)
        T₁ = unify(T₁, C₁)
        for type variables X free in T₁ but not in Γ
            T₁ = ∀X. T₁
        return typecheck(t₂, Γ ∪ {x:T₁})

function typecheck(t)
    T, C = typecheck(t, {})
    return unify(T, C)

(This algorithm delays calling unify until it’s absolutely necessary. It’s also common to instead call unify as soon as each new constraint is added, that is, in the handling of the t = t₁ t₂ case.)

References

[1] Dominique Clement, Joelle Despeyroux, Thierry Despeyroux, and Gilles Kahn. 1986. A simple applicative language: Mini-ML. Retrieved from https://hal.inria.fr/inria-00076025/document

[2] Daniel Leivant. 1983. Polymorphic type inference. In Proc. POPL. DOI:https://doi.org/10.1145/567067.567077