Theory of Neural Networks

CSE 60963 | Fall 2024

Programming Projects

For each project, you can choose one of our suggested topics, or you can propose your own topic.

A disclaimer about our suggested topics: since you are probably the first person ever to implement the constructions below, there's a real chance that you will find bugs in them! Humanity thanks you.

To propose your own topic: Please email your idea(s) to Prof. Chiang three weeks before the due date; you should converge on a topic by two weeks before the due date. It should involve some programming and it should relate to the course material somehow. While our suggested topics are implementations of theoretical constructions, you're welcome to propose experiments instead.

PP1

Implement one of the following, or propose a different topic. When the instructions say to construct a ReLU network, you should construct it as a PyTorch Sequential object containing only Linear and ReLU as submodules.

  1. Given a Boolean function \(f: \{0,1\}^d \rightarrow \{0,1\}\), construct a two-layer ReLU network that computes \(f\). Assume that \(f\) is represented as a Boolean formula with variables \(x_1, \ldots, x_d\).
  2. Given a \(\rho\)-Lipschitz continuous function \(f : \mathbb{R}^d \rightarrow \mathbb{R}\) and \(\epsilon>0\), construct a three-layer ReLU network that approximates \(f\) with error \(\epsilon\). Assume that \(f\) is represented as a Python function from Tensor to float.
  3. Given a continuous piecewise linear function \(f : \mathbb{R}^d \rightarrow \mathbb{R}\) with \(k\) pieces, construct a ReLU network with \(O(\log k)\) layers that computes \(f\).
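
None of this is required, but as a sanity check on the expected output format, here is a minimal sketch of a two-layer ReLU network built as a Sequential containing only Linear and ReLU submodules. It is hard-wired to compute the \(d\)-way AND on inputs in \(\{0,1\}^d\); the helper name and_network is ours, not part of the assignment.

```python
import torch

def and_network(d):
    """Two-layer ReLU network computing x_1 AND ... AND x_d on {0,1}^d inputs."""
    lin1 = torch.nn.Linear(d, 1)   # hidden pre-activation: sum(x) - (d - 1)
    lin2 = torch.nn.Linear(1, 1)   # output layer: copy the hidden unit
    with torch.no_grad():
        lin1.weight.copy_(torch.ones(1, d))
        lin1.bias.copy_(torch.tensor([-(d - 1.0)]))
        lin2.weight.copy_(torch.ones(1, 1))
        lin2.bias.zero_()
    return torch.nn.Sequential(lin1, torch.nn.ReLU(), lin2)

net = and_network(3)
print(net(torch.tensor([[1., 1., 1.], [1., 0., 1.]])))  # outputs 1 and 0
```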

PP2

Implement one of the following, or propose a different topic. When the instructions say to construct a simple ReLU RNN, you should construct it as a PyTorch RNN(nonlinearity='relu').

  1. Given a finite automaton \(M\), construct a simple ReLU RNN that recognizes the same language as \(M\).
  2. Given a Turing machine \(M\), construct a simple ReLU RNN (a non-simple ReLU RNN is fine too) that simulates \(M\) using intermediate steps.
  3. Given a probabilistic finite automaton \(M\) and \(\epsilon > 0\), construct a simple ReLU RNN that approximates \(M\) with total variation distance at most \(\epsilon\), as shown by Svete et al. (2024).
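
Again, this is not one of the constructions above, just a plumbing sketch showing how to set the weights of a simple ReLU RNN by hand: a one-unit RNN(nonlinearity='relu') whose hidden state counts the \(a\)'s in a one-hot encoded string over \(\{a, b\}\).

```python
import torch

# h_t = relu(W_ih x_t + W_hh h_{t-1}) with all biases zero,
# so the single hidden unit accumulates +1 for each symbol a.
rnn = torch.nn.RNN(input_size=2, hidden_size=1, nonlinearity='relu',
                   batch_first=True)
with torch.no_grad():
    rnn.weight_ih_l0.copy_(torch.tensor([[1., 0.]]))  # +1 for each a
    rnn.weight_hh_l0.copy_(torch.tensor([[1.]]))      # carry the count forward
    rnn.bias_ih_l0.zero_()
    rnn.bias_hh_l0.zero_()

# "abaa" with a = [1, 0], b = [0, 1]
x = torch.tensor([[[1., 0.], [0., 1.], [1., 0.], [1., 0.]]])
outputs, h_n = rnn(x)
print(h_n)  # final hidden state 3.0: three a's
```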

PP3

Implement one of the following, or propose a different topic. When the instructions say to construct a transformer, you should construct it as a PyTorch TransformerEncoder. To construct a transformer layer that matches the definition from the notes, use layer = TransformerEncoderLayer(nhead=1, d_model=d, dim_feedforward=d, dropout=0, layer_norm_eps=0, bias=False).

  • The query, key, and value weight matrices (\(\mathbf{W}^{\mathrm{Q}}, \mathbf{W}^{\mathrm{K}}, \mathbf{W}^{\mathrm{V}}\)) are packed into a single matrix, layer.self_attn.in_proj_weight, with shape \(3d \times d\).
  • There is also an output linear layer, layer.self_attn.out_proj.weight, which you can set to the identity matrix.
  • To get unique-hard or average-hard attention, you can replace layer.self_attn with your own Module (a sketch of the two attention variants appears at the end of this section).
  • To disable layer normalization, you can replace layer.norm1 and layer.norm2 with the identity function.
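
Here is a minimal plumbing sketch of the setup described in the bullets above, assuming PyTorch 2.1 or later (when TransformerEncoderLayer gained the bias argument). The dimension d and the matrices W_Q, W_K, W_V are placeholders to be filled in by your construction.

```python
import torch
from torch.nn import TransformerEncoder, TransformerEncoderLayer, Identity

d = 4
layer = TransformerEncoderLayer(nhead=1, d_model=d, dim_feedforward=d,
                                dropout=0, layer_norm_eps=0, bias=False)
# TransformerEncoder deep-copies the layer, so modify the copy inside the
# encoder (model.layers[0]), not the original `layer`.
model = TransformerEncoder(layer, num_layers=1, enable_nested_tensor=False)

W_Q = torch.zeros(d, d)   # placeholder weights for your construction
W_K = torch.zeros(d, d)
W_V = torch.eye(d)
lyr = model.layers[0]
with torch.no_grad():
    lyr.self_attn.in_proj_weight.copy_(torch.cat([W_Q, W_K, W_V], dim=0))  # 3d x d
    lyr.self_attn.out_proj.weight.copy_(torch.eye(d))                      # identity output projection
    # the feed-forward weights are lyr.linear1.weight and lyr.linear2.weight
lyr.norm1 = Identity()    # disable layer normalization
lyr.norm2 = Identity()

x = torch.zeros(5, 1, d)  # (sequence, batch, d); batch_first defaults to False
print(model(x).shape)     # torch.Size([5, 1, 4])
```
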
  1. Choose a language and construct a (unique-hard, average-hard, or soft attention) transformer that recognizes it. You can follow one of the constructions in the book, or you can devise your own transformer.
    • \((ab)^*\)
    • \(\{w \mid \text{$w$ contains the substring $aba$}\}\)
    • The majority language: \(\{w \mid \text{$w$ contains more $a$'s than $b$'s}\}\)
    • The Dyck language (p. 86)
  2. Given an LTL formula \(\phi\), construct a unique-hard or average-hard attention transformer encoder that recognizes the language defined by \(\phi\).
  3. Given a \(K_t[\#]\) formula \(\phi\), construct a softmax attention transformer encoder that recognizes the language defined by \(\phi\).
  4. Given a Turing machine \(M\), construct an average-hard attention transformer decoder that simulates \(M\) using intermediate steps.
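
If you take the route of replacing layer.self_attn with your own Module, the core of the two hard-attention variants can be sketched as below. These are hedged standalone helpers operating on a matrix of attention scores and a matrix of values (the function names are ours); wrapping them in a Module that matches nn.MultiheadAttention's calling convention, and handling batching, masking, and the query/key/value projections, is left to you.

```python
import torch

def unique_hard_attention(scores, values):
    # scores: (n, n) attention scores, values: (n, d) value vectors.
    # Row i attends to a single maximizing position; adjust the tie-breaking
    # rule (leftmost vs. rightmost) to match the convention in the notes.
    idx = scores.argmax(dim=-1)
    return values[idx]

def average_hard_attention(scores, values):
    # Row i averages uniformly over all positions achieving the row maximum.
    mask = (scores == scores.max(dim=-1, keepdim=True).values).float()
    return (mask / mask.sum(dim=-1, keepdim=True)) @ values
```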