# State

TAPL (Chapter 13) adds references, or pointers to mutable memory cells, to the simply-typed lambda calculus. For simplicity, we’ll start with just a single mutable memory cell.

## Single cell

Let’s assume a single memory location of type `Nat` which we access using `get` and `set`.

Syntax:

``````t ::= ...
get unit
set t

E ::= ...
set E``````

Let’s say the initial state is `0`. Then we want `get` and `set` to behave like this:

``````              get unit ⟶* 0
set (succ 0); get unit ⟶* succ 0``````

### Typing and evaluation

The typing rules are straightforward:

``````Γ ⊢ get unit : Nat

Γ ⊢ t : Nat
————————————————
Γ ⊢ set t : Unit``````

The evaluation rules have to keep track of the state. Previously, our judgements were of the form `t ⟶ t'`, but now they are of the form `t | v ⟶ t' | v'`, which means “term `t` with state `v` reduces to term `t'` with state `v'`.”

``````           t | v ⟶ t' | v'
———————————————————————
E[t] | v ⟶ E[t'] | v'

(λx. t12) v2 | v ⟶ [x↦v2]t12 | v

get unit | v ⟶ v | v
set v' | v ⟶ unit | v'``````

### In pure lambda calculus

Since the lambda calculus is Turing-complete, any program in the lambda calculus with state can be converted into a program in the lambda calculus without state.

But it’s not possible to just write a definition for `get` or `set` in the same way that we can write a definition for `let` as `(let x = t1 in t2) ≡ (λx. t2) t1`. To be more precise, state is not macro-expressible [1] in the pure lambda calculus. The reason is that in the pure lambda calculus, `t; t` is always equivalent to `t`. But `set (succ (get unit)); set (succ (get unit))` is not equivalent to `set (succ (get unit))`.

Instead, the conversion is a transformation of the whole program. Every term `t` becomes a function from a state `s`, which is the state before running `t`, to a pair `{v, s'}`, where `v` is the value of `t` and `s'` is the new state after running `t`. If a term has no side effects, then it doesn’t change the state:

``````       𝒯⟦x⟧ = λs. {x, s}
𝒯⟦λx. t⟧ = λs. {λx. 𝒯⟦t⟧, s}``````

The `get` operation doesn’t change the state either, but it looks at the state:

``𝒯⟦get unit⟧ = λs. {s, s}``

And the `set` operation changes the state. For clarity, we define it first just for `set x` (where `x` is a variable):

``   𝒯⟦set x⟧ = λs. {unit, x}``

The tricky case is application, which has to “thread” the state through the subexpressions in the right order:

``````   𝒯⟦t1 t2⟧ = λs0. let {f, s1} = 𝒯⟦t1⟧ s0 in
let {v, s2} = 𝒯⟦t2⟧ s1 in
f v s2``````

We can derive the transformation for `let`:

``````𝒯⟦let x = t1 in t2⟧ = 𝒯⟦(λx. t2) t1⟧
= λs0. let {x, s1} = 𝒯⟦t1⟧ s0 in 𝒯⟦t2⟧ s1``````

And we can derive `𝒯⟦set t⟧` by expanding `𝒯⟦let x = t in set x⟧`, which ends up as:

``   𝒯⟦set t⟧ = λs0. let {s2, s1} = 𝒯⟦t⟧ s0 in {unit, s2}``

I’m not sure why you’d want to do it, but you could write `get t` where `t` is a term like `set 0; unit`. So we should derive a more general transformation for `get` as well:

``   𝒯⟦get t⟧ = λs0. let {z, s1} = 𝒯⟦t⟧ s0 in {s1, s1}``

To evaluate a whole program, we need to apply it to an initial state `s0`:

``let {v, s} = 𝒯⟦t⟧ s0 in v``

Exercise. Check that `𝒯⟦set (succ (get unit)); set (succ (get unit))⟧ 0` evaluates to `succ (succ 0)`.

### More general typing rules

If we want the state to have some other type than `Nat`, but still a fixed type (call it `Tst`), the typing rules are straightforward:

`````` Γ ⊢ t : Unit
―――――――――――――――  T-Get
Γ ⊢ get t : Tst

Γ ⊢ t : Tst
―――――――――――――――― T-Set
Γ ⊢ set t : Unit``````

Digression (updated after class; it unfortunately became more complicated, and you’re not responsible for this): But if want the type to change over time, we must extend the type system. The translation into λ-calculus is instructive: a term of base type `T` becomes a term of type `S0→{T,S1}`, where `S0` is the old type of the state and `S1` is the new type of the state. (We’re just using `S` to range over types when they happen to be state types; this has nothing to do with type schemas.) That suggests that our typing judgements must now have “slots” for both the old state and new state. Let’s write them as

``Γ ⊢ t : T | S0↝S1``

Terms of function type `T1→T2` have a more complicated translation, because the term itself might change the state, and the body of the function might also change the state. The upshot is that we also need to extend the type system; function types now look like `T1→(T2|S1↝S2)`, which means “a function from type `T1` to type `T2`, that, when applied, changes the state from type `S1` to `S2`”.

Then the typing rules are:

``````――――――――――――――――― Γ(x)=T                                   T-Var
Γ ⊢ x : T | S0↝S0

Γ, x:T1 ⊢ t2 : T2 | S1↝S2
―――――――――――――――――――――――――――――――――――――――                    T-Abs
Γ ⊢ λx:T1. t : T1→(T2|S1↝S2) | S0↝S0

Γ ⊢ t1 : T11→(T12|S2↝S3) | S0↝S1   Γ ⊢ t2 : T11 | S1↝S2
――――――――――――――――――――――――――――――――――――――――――――――――――――――――   T-App
Γ ⊢ t1 t2 : T12 | S0↝S3

Γ ⊢ t : Unit | S0↝S1
――――――――――――――――――――――                                     T-Get
Γ ⊢ get t : S1 | S0↝S1

Γ ⊢ t : S2 | S0↝S1
――――――――――――――――――――――――                                   T-Set
Γ ⊢ set t : Unit | S0↝S2``````

## References

Syntax and evaluation rules modified to use evaluation contexts:

``````E ::= ...
ref E
! E
E := t
v := E

t | μ ⟶ t' | μ'
———————————————————————
E[t] | μ ⟶ E[t'] | μ'

(λx. t12) v2 | μ ⟶ [x↦v2]t12 | μ

ref v | μ ⟶ l | (μ, l→v)   where l ∉ dom(μ)
!l | μ ⟶ v | μ           where μ(l) = v
l := v | μ ⟶ unit | [l↦v]μ``````

These rules extend the syntax to allow locations (`l`), but note that actual programs do not contain explicit locations (p. 165); they only appear in the course of evaluating a program.

Section 13.4 has a long discussion of store typings, which are needed for typing locations. For the purpose of type-checking a program, we don’t need store typings, because programs don’t contain explicit locations. But store typings are needed in order to prove type safety.

Simulating a store in the pure untyped lambda calculus would require implementing a data structure mapping from locations to values; since that would be somewhat complicated, we’ll skip this. In a typed lambda calculus, however, we’d need a separate such data structure for each type; I’m not sure how this would be done.

## Mutable variables

ML has references that look pretty much like the above. Thus there is a sharp distinction between variables, whose values are immutable, and locations, whose values are mutable. Clojure does something similar.

Why don’t other languages (Scheme, C) make this distinction? Because every variable in these languages is actually a reference. Every time a function is called, the argument is copied and the variable is bound to a reference to the copy. Every time the variable is accessed, it is automatically dereferenced. We can write a transformation from Scheme-style mutation to ML-style as follows:

``````     𝒯⟦x⟧ = !x
𝒯⟦λx.t⟧ = λx. let x = ref x in 𝒯⟦t⟧
𝒯⟦t1 t2⟧ = 𝒯⟦t1⟧ 𝒯⟦t2⟧
𝒯⟦x := t⟧ = x := 𝒯⟦t⟧``````

You may be familiar with the terminology from C/C++, where a reference is called an lvalue and a value is called an rvalue.

## (Bibliographic) References

[1] Matthias Felleisen. 1991. On the expressive power of programming languages. Science of Computer Programming 17, 1–3 (1991), 35–75. DOI:https://doi.org/10.1016/0167-6423(91)90036-W