Theory of Neural Networks

CSE 60963 | Fall 2024

MWF 11:30am–12:20pm
David Chiang

Introduction to the theory of neural networks: expressivity (what functions a neural network can and cannot compute) and trainability (what functions a neural network can and cannot learn). Neural network architectures covered will include feed-forward, recurrent, convolutional, and attention-based (transformer) neural networks.


Prerequisites

  • Theory: Students should be familiar with finite automata, Turing machines, and first-order logic, and should be comfortable reading and writing proofs. This requirement is satisfied by Theory of Computing (CSE 30151) or equivalent.
  • Neural networks: Students should minimally understand feed-forward neural networks and how they are trained by gradient descent (backpropagation). They should ideally be familiar with recurrent neural networks and transformers. This requirement is satisfied by one of the following or permission of the instructor:
    • Neural Networks (CSE 40868/60868)
    • Natural Language Processing (CSE 40657/60657)


Schedule

This offering of the course focuses on the expressivity of neural networks for sequence data (language models). The schedule below is still subject to change.

Week  Topics
  1   Introduction
  2   Perceptrons and the XOR problem
  3   Feed-forward NNs and universal approximation
  4   (Tentative) Neural tangent kernel
  5   (Tentative) Overparameterization and double descent
  6   Recurrent NNs (RNNs) and finite automata
  7   RNNs with intermediate steps and Turing machines (and beyond)
  8   LSTMs and counter machines
  9   Circuit complexity and descriptive complexity
 10   (Tentative) State-space models
 11   Transformers and the PARITY problem
 12   Transformers with intermediate steps and Turing machines
 13   Unique-hard attention transformers and counter-free automata
 14   Average-hard and soft attention transformers
 15   Conclusion


Requirements

Tentatively, the requirements will be as follows; there will be no final exam.

  • Problem sets (50%)
  • Programming assignments (30%)
  • Final project (20%)