Syntax of NB
Reading: TAPL 3.1-3
The book uses a very simple programming language, NB, to introduce
some key concepts we will use again and again. This language just has
natural numbers and Booleans and doesn’t let you do all that much with
them. One thing that may be unfamiliar in this language is the way it
represents natural numbers (called Peano numbers): zero is
0
, one is succ 0
, two is
succ (succ 0)
, and so on.
The syntax of all the languages used in TAPL, as well as many programming language research papers, is (I believe) based on ISWIM, an imaginary programming language presented in a classic paper called “The Next 700 Programming Languages” [1].
Context-free grammars
The syntax of a language is defined by a context-free grammar. TAPL’s notation for context-free grammars is a little different from what you may be used to.
If the syntax of NB were written in typical CFG notation (as in Theory of Computing), it would look like this:
Term → true
| false
| if Term then Term else Term
| 0
| succ Term
| pred Term
| iszero Term
TAPL writes it like this:
t ::= true
false
if t then t else t
0
succ t
pred t
iszero t
The use of ::=
instead of →
is standard
(and then it’s usually called Backus-Naur
Form instead of a CFG). Some authors use |
but TAPL
omits it to reduce clutter.
Although some authors do, TAPL doesn’t typographically distinguish nonterminals (metavariables) from terminals; generally, single letters are nonterminal symbols and everything else is a terminal symbol.
In a syntax rule, like t ::= if t then t else t
, there’s
no need to distinguish between the three terms t
in the
right-hand side. But these are, in general, three different terms, so
pretty much everywhere outside of syntax rules, we distinguish them by
writing t1
, t2
, t'
, etc.
Abstract and concrete syntax
Concrete syntax defines the set of strings that you actually
type into your editor. When you write the concrete syntax of a language,
you have to be concerned about possible ambiguities, for example, does
1 + 2 * 3
mean the same thing as (1 + 2) * 3
or 1 + (2 * 3)
? You have to be careful to write the grammar
or add extra information to it to resolve all such ambiguities.
Abstract syntax defines the set of trees that the semantics will be defined on. Because trees can’t be ambiguous, we don’t have to worry about resolving ambiguities and can focus on semantics instead.
In this course, we’re only concerned with abstract syntax. Nevertheless, because we have to write down terms somehow (and it would be too cumbersome to draw trees), we do make use of concrete syntax constantly. So, you do need to know:
succ
,pred
, andiszero
have higher precedence and are (perhaps surprisingly) left-associative, so you should writesucc (succ zero)
, notsucc succ 0
.if...then...else
has lower precedence
But we make no attempt to formalize these conventions.