Syntax of NB

Reading: TAPL 3.1-3

The book uses a very simple programming language, NB, to introduce some key concepts we will use again and again. This language just has natural numbers and Booleans and doesn’t let you do all that much with them. One thing that may be unfamiliar in this language is the way it represents natural numbers (called Peano numbers): zero is 0, one is succ 0, two is succ (succ 0), and so on.

The syntax of all the languages used in TAPL, as well as many programming language research papers, is (I believe) based on ISWIM, an imaginary programming language presented in a classic paper called “The Next 700 Programming Languages” [1].

Context-free grammars

The syntax of a language is defined by a context-free grammar. TAPL’s notation for context-free grammars is a little different from what you may be used to.

If the syntax of NB were written in typical CFG notation (as in Theory of Computing), it would look like this:

Term  true
      | false
      | if Term then Term else Term
      | 0
      | succ Term
      | pred Term
      | iszero Term

TAPL writes it like this:

t ::= true
      false
      if t then t else t
      0
      succ t
      pred t
      iszero t

The use of ::= instead of is standard (and then it’s usually called Backus-Naur Form instead of a CFG). Some authors use | but TAPL omits it to reduce clutter.

Although some authors do, TAPL doesn’t typographically distinguish nonterminals (metavariables) from terminals; generally, single letters are nonterminal symbols and everything else is a terminal symbol.

In a syntax rule, like t ::= if t then t else t, there’s no need to distinguish between the three terms t in the right-hand side. But these are, in general, three different terms, so pretty much everywhere outside of syntax rules, we distinguish them by writing t1, t2, t', etc.

Abstract and concrete syntax

Concrete syntax defines the set of strings that you actually type into your editor. When you write the concrete syntax of a language, you have to be concerned about possible ambiguities, for example, does 1 + 2 * 3 mean the same thing as (1 + 2) * 3 or 1 + (2 * 3)? You have to be careful to write the grammar or add extra information to it to resolve all such ambiguities.

Abstract syntax defines the set of trees that the semantics will be defined on. Because trees can’t be ambiguous, we don’t have to worry about resolving ambiguities and can focus on semantics instead.

In this course, we’re only concerned with abstract syntax. Nevertheless, because we have to write down terms somehow (and it would be too cumbersome to draw trees), we do make use of concrete syntax constantly. So, you do need to know:

  • succ, pred, and iszero have higher precedence and are (perhaps surprisingly) left-associative, so you should write succ (succ zero), not succ succ 0.
  • if...then...else has lower precedence

But we make no attempt to formalize these conventions.

References

[1]
P. J. Landin. 1966. The next 700 programming languages. Communications of the ACM 9, 3 (1966), 157–166. DOI:https://doi.org/10.1145/365230.365257