State
TAPL (Chapter 13) adds references, or pointers to mutable memory cells, to the simply-typed lambda calculus. For simplicity, we’ll start with just a single mutable memory cell.
Single cell
Let’s assume a single memory location of type Nat
which we access using get
and set
.
Syntax:
t ::= ...
get unit
set t
E ::= ...
set E
Let’s say the initial state is 0
. Then we want get
and set
to behave like this:
get unit ⟶* 0
set (succ 0); get unit ⟶* succ 0
Typing and evaluation
The typing rules are straightforward:
Γ ⊢ get unit : Nat
Γ ⊢ t : Nat
————————————————
Γ ⊢ set t : Unit
The evaluation rules have to keep track of the state. Previously, our judgements were of the form t ⟶ t'
, but now they are of the form t | v ⟶ t' | v'
, which means “term t
with state v
reduces to term t'
with state v'
.”
t | v ⟶ t' | v'
———————————————————————
E[t] | v ⟶ E[t'] | v'
(λx. t12) v2 | v ⟶ [x↦v2]t12 | v
get unit | v ⟶ v | v
set v' | v ⟶ unit | v'
In pure lambda calculus
Since the lambda calculus is Turing-complete, any program in the lambda calculus with state can be converted into a program in the lambda calculus without state.
But it’s not possible to just write a definition for get
or set
in the same way that we can write a definition for let
as (let x = t1 in t2) ≡ (λx. t2) t1
. To be more precise, state is not macro-expressible [1] in the pure lambda calculus. The reason is that in the pure lambda calculus, t; t
is always equivalent to t
. But set (succ (get unit)); set (succ (get unit))
is not equivalent to set (succ (get unit))
.
Instead, the conversion is a transformation of the whole program. Every term t
becomes a function from a state s
, which is the state before running t
, to a pair {v, s'}
, where v
is the value of t
and s'
is the new state after running t
. If a term has no side effects, then it doesn’t change the state:
𝒯⟦x⟧ = λs. {x, s}
𝒯⟦λx. t⟧ = λs. {λx. 𝒯⟦t⟧, s}
The get
operation doesn’t change the state either, but it looks at the state:
𝒯⟦get unit⟧ = λs. {s, s}
And the set
operation changes the state. For clarity, we define it first just for set x
(where x
is a variable):
𝒯⟦set x⟧ = λs. {unit, x}
The tricky case is application, which has to “thread” the state through the subexpressions in the right order:
𝒯⟦t1 t2⟧ = λs0. let {f, s1} = 𝒯⟦t1⟧ s0 in
let {v, s2} = 𝒯⟦t2⟧ s1 in
f v s2
We can derive the transformation for let
:
𝒯⟦let x = t1 in t2⟧ = 𝒯⟦(λx. t2) t1⟧
= λs0. let {x, s1} = 𝒯⟦t1⟧ s0 in 𝒯⟦t2⟧ s1
And we can derive 𝒯⟦set t⟧
by expanding 𝒯⟦let x = t in set x⟧
, which ends up as:
𝒯⟦set t⟧ = λs0. let {s2, s1} = 𝒯⟦t⟧ s0 in {unit, s2}
I’m not sure why you’d want to do it, but you could write get t
where t
is a term like set 0; unit
. So we should derive a more general transformation for get
as well:
𝒯⟦get t⟧ = λs0. let {z, s1} = 𝒯⟦t⟧ s0 in {s1, s1}
To evaluate a whole program, we need to apply it to an initial state s0
:
let {v, s} = 𝒯⟦t⟧ s0 in v
Exercise. Check that 𝒯⟦set (succ (get unit)); set (succ (get unit))⟧ 0
evaluates to succ (succ 0)
.
More general typing rules
If we want the state to have some other type than Nat
, but still a fixed type (call it Tst
), the typing rules are straightforward:
Γ ⊢ t : Unit
――――――――――――――― T-Get
Γ ⊢ get t : Tst
Γ ⊢ t : Tst
―――――――――――――――― T-Set
Γ ⊢ set t : Unit
Digression (updated after class; it unfortunately became more complicated, and you’re not responsible for this): But if want the type to change over time, we must extend the type system. The translation into λ-calculus is instructive: a term of base type T
becomes a term of type S0→{T,S1}
, where S0
is the old type of the state and S1
is the new type of the state. (We’re just using S
to range over types when they happen to be state types; this has nothing to do with type schemas.) That suggests that our typing judgements must now have “slots” for both the old state and new state. Let’s write them as
Γ ⊢ t : T | S0↝S1
Terms of function type T1→T2
have a more complicated translation, because the term itself might change the state, and the body of the function might also change the state. The upshot is that we also need to extend the type system; function types now look like T1→(T2|S1↝S2)
, which means “a function from type T1
to type T2
, that, when applied, changes the state from type S1
to S2
”.
Then the typing rules are:
――――――――――――――――― Γ(x)=T T-Var
Γ ⊢ x : T | S0↝S0
Γ, x:T1 ⊢ t2 : T2 | S1↝S2
――――――――――――――――――――――――――――――――――――――― T-Abs
Γ ⊢ λx:T1. t : T1→(T2|S1↝S2) | S0↝S0
Γ ⊢ t1 : T11→(T12|S2↝S3) | S0↝S1 Γ ⊢ t2 : T11 | S1↝S2
―――――――――――――――――――――――――――――――――――――――――――――――――――――――― T-App
Γ ⊢ t1 t2 : T12 | S0↝S3
Γ ⊢ t : Unit | S0↝S1
―――――――――――――――――――――― T-Get
Γ ⊢ get t : S1 | S0↝S1
Γ ⊢ t : S2 | S0↝S1
―――――――――――――――――――――――― T-Set
Γ ⊢ set t : Unit | S0↝S2
References
Read TAPL 13.
Syntax and evaluation rules modified to use evaluation contexts:
E ::= ...
ref E
! E
E := t
v := E
t | μ ⟶ t' | μ'
———————————————————————
E[t] | μ ⟶ E[t'] | μ'
(λx. t12) v2 | μ ⟶ [x↦v2]t12 | μ
ref v | μ ⟶ l | (μ, l→v) where l ∉ dom(μ)
!l | μ ⟶ v | μ where μ(l) = v
l := v | μ ⟶ unit | [l↦v]μ
These rules extend the syntax to allow locations (l
), but note that actual programs do not contain explicit locations (p. 165); they only appear in the course of evaluating a program.
Section 13.4 has a long discussion of store typings, which are needed for typing locations. For the purpose of type-checking a program, we don’t need store typings, because programs don’t contain explicit locations. But store typings are needed in order to prove type safety.
Simulating a store in the pure untyped lambda calculus would require implementing a data structure mapping from locations to values; since that would be somewhat complicated, we’ll skip this. In a typed lambda calculus, however, we’d need a separate such data structure for each type; I’m not sure how this would be done.
Mutable variables
ML has references that look pretty much like the above. Thus there is a sharp distinction between variables, whose values are immutable, and locations, whose values are mutable. Clojure does something similar.
Why don’t other languages (Scheme, C) make this distinction? Because every variable in these languages is actually a reference. Every time a function is called, the argument is copied and the variable is bound to a reference to the copy. Every time the variable is accessed, it is automatically dereferenced. We can write a transformation from Scheme-style mutation to ML-style as follows:
𝒯⟦x⟧ = !x
𝒯⟦λx.t⟧ = λx. let x = ref x in 𝒯⟦t⟧
𝒯⟦t1 t2⟧ = 𝒯⟦t1⟧ 𝒯⟦t2⟧
𝒯⟦x := t⟧ = x := 𝒯⟦t⟧
You may be familiar with the terminology from C/C++, where a reference is called an lvalue and a value is called an rvalue.
(Bibliographic) References
[1] Matthias Felleisen. 1991. On the expressive power of programming languages. Science of Computer Programming 17, 1–3 (1991), 35–75. DOI:https://doi.org/10.1016/0167-6423(91)90036-W