State
TAPL (Chapter 13) adds references, or pointers to mutable memory cells, to the simply-typed lambda calculus. For simplicity, we’ll start with just a single mutable memory cell.
Single cell
Let’s assume a single memory location of type Nat
which
we access using get
and set
.
Syntax:
t ::= ...
get
set t
E ::= ...
set E
The initial state is 0
. Then we want get
and set
to behave like this:
get ⟶* 0
set 0; set (succ 0); get ⟶* succ 0
set (succ 0); set 0; get ⟶* 0
where
t1; t2 ≡ let z = t1 in t2
where z
is a variable not occurring free in
t2
.
Typing and evaluation
The typing rules are straightforward:
Γ ⊢ get : Nat
Γ ⊢ t : Nat
————————————————
Γ ⊢ set t : Unit
The evaluation rules have to keep track of the state. Previously, our
judgements were of the form t ⟶ t'
, but now they are
of the form t | v ⟶ t' | v'
, which means “term
t
with state v
reduces to term t'
with state v'
.”
t | v ⟶ t' | v'
———————————————————————
E[t] | v ⟶ E[t'] | v'
(λx. t12) v2 | v ⟶ [x↦v2]t12 | v
get | v ⟶ v | v
set v' | v ⟶ unit | v'
Example:
set (succ 0); set 0; get | 0 ⟶ unit; set 0; get | succ 0
⟶* set 0; get | succ 0
⟶* unit; get | 0
⟶* get | 0
⟶ 0 | 0
In pure lambda calculus
Since the lambda calculus is Turing-complete, any program in the lambda calculus with state can be converted into a program in the lambda calculus without state.
But it’s not possible to just write a definition for get
or set
in the same way that we can write a definition for
zro
as zro ≡ λs. λz. z
. Instead, the
conversion is a transformation of the whole program. Every term
t
becomes a function from a state s
, which is
the state before running t
, to a pair {v, s'}
,
where v
is the value of t
and s'
is the new state after running t
.
If a term has no side effects, then it doesn’t change the state:
𝒯⟦x⟧ = λs. {x, s}
𝒯⟦λx. t⟧ = λs. {λx. 𝒯⟦t⟧, s}
The get
operation doesn’t change the state either, but
it does look at the state:
𝒯⟦get⟧ = λs. {s, s}
And the set
operation changes the state. For clarity, we
first define the translation just for set x
where
x
is a variable.
𝒯⟦set x⟧ = λs. {unit, x}
Next, we define the translation of application, again just for applying one variable to another:
𝒯⟦x1 x2⟧ = x1 x2
The tricky case is let
, which has to “thread” the state
through the subexpressions in the right order:
𝒯⟦let x = t1 in t2⟧ = λs0. letpair {x, s1} = 𝒯⟦t1⟧ s0 in 𝒯⟦t2⟧ s1
We can now derive the general rule for set t
:
𝒯⟦set t⟧ = 𝒯⟦let x = t in set x⟧
= λs0. letpair {x, s1} = 𝒯⟦t⟧ s0 in 𝒯⟦set t⟧ s1
= λs0. letpair {x, s1} = 𝒯⟦t⟧ s0 in (λs. {unit, x}) s1
= λs0. letpair {x, s1} = 𝒯⟦t⟧ s0 in {unit, x}
And for t1 t2
:
𝒯⟦t1 t2⟧ = 𝒯⟦let x1 = t1 in let x2 = t2 in x1 x2⟧
= λs0. letpair {x1, s1} = 𝒯⟦t1⟧ s0 in
(λs1. letpair {x2, s2} = 𝒯⟦t2⟧ s1 in 𝒯⟦x1 x2⟧ s2) s1
= λs0. letpair {x1, s1} = 𝒯⟦t1⟧ s0 in
letpair {x2, s2} = 𝒯⟦t2⟧ s1 in x1 x2 s2
Finally, when we transform a whole program t
, we need to
wrap something around it that applies it to the initial state
0
:
𝒯p⟦t⟧ = (letpair {v, s} = 𝒯⟦t⟧ 0 in v)
Example:
𝒯⟦set (succ 0); get⟧ ≡ 𝒯⟦let z = set (succ 0) in get⟧
= λs0. letpair {x, s1} = 𝒯⟦set (succ 0)⟧ s0 in 𝒯⟦get⟧ s1
= λs0. letpair {x, s1} = (λs. {unit, succ 0}) s0 in 𝒯⟦get⟧ s1
= λs0. letpair {x, s1} = (λs. {unit, succ 0}) s0 in (λs. {s, s}) s1
References
Read TAPL 13. This chapter introduces ML-style references. Unlike the
state we developed above, where there is permanently one and only one
memory cell, references let you allocate a new memory cell using
ref
, read it using !
, and write to it using
:=
.
I think it’s simpler to think about the semantics using evaluation contexts:
E ::= ...
ref E
! E
E := t
v := E
t | μ ⟶ t' | μ'
———————————————————————
E[t] | μ ⟶ E[t'] | μ'
(λx. t12) v2 | μ ⟶ [x↦v2]t12 | μ
ref v | μ ⟶ l | (μ, l→v) where l ∉ dom(μ)
!l | μ ⟶ v | μ where μ(l) = v
l := v | μ ⟶ unit | [l↦v]μ
These rules extend the syntax to allow locations (l
),
but note that actual programs do not contain explicit locations
(p. 165); they only appear in the course of evaluating a program.
Section 13.4 has a long discussion of store typings, which are needed for typing locations. For the purpose of type-checking a program, we don’t need store typings, because programs don’t contain explicit locations. But store typings are needed in order to prove type safety.
Mutable variables
ML has references that look pretty much like the above. Thus there is a sharp distinction between variables, whose values are immutable, and locations, whose values are mutable. Clojure does something similar.
But some other languages (Scheme, C) don’t make this distinction. They simply have variables whose values are mutable.
(define (f y) (set! y 1))
(define x 0)
(f x)
(display x) ; 0
void f(int y) { y++; }
int main(void) { int x = 0; f(x); printf("%d\n", x); } /* 0 */
One way to think about these languages is that every variable is
actually a reference. When function f
is called, the
argument x
is copied and y
is bound to a
reference to the copy. Every time y
is accessed, it is
automatically dereferenced. We can write a transformation from
Scheme-style mutation to ML-style as follows:
𝒯⟦x⟧ = !x
𝒯⟦λx.t⟧ = λx. let x = ref x in 𝒯⟦t⟧
𝒯⟦t1 t2⟧ = 𝒯⟦t1⟧ 𝒯⟦t2⟧
𝒯⟦x := t⟧ = x := 𝒯⟦t⟧
The difference between a reference and a value is still visible in C,
though. In C, an lvalue is an expression that can appear on the
left-hand side of an assignment (=
), and an rvalue
is an expression that can only appear on the right-hand side of an
assignment.
x = 0;
0 = x; /* error: 0 is an rvalue */
What C calls lvalues and rvalues are exactly what ML calls references and values.
In the homework, you’ll explore a third possibility, C++ reference arguments:
void f(int &y) { y++; }
int main(void) { int x = 0; f(x); printf("%d\n", x); } /* 1 */
Python and Java objects behave similarly.