- Time
- MWF 11:30am–12:20pm
- Location
- TBD
- Instructor
- David Chiang

Introduction to the theory of neural networks: expressivity (what functions a neural network can and cannot compute) and trainability (what functions a neural network can and cannot learn). Neural network architectures covered will include feed-forward, recurrent, convolutional and attention (transformer) neural networks.

#### Prerequisites

- Theory: Students should be familiar with finite automata, Turing machines, and first-order logic, and comfortable reading and writing proofs. This requirement is satisfied by
*Theory of Computing*(CSE 30151) or equivalent. - Neural networks: Students should minimally understand feed-forward neural networks and how they are trained by gradient descent (backpropagation). They should ideally be familiar with recurrent neural networks and transformers. This requirement is satisfied by one of the following or permission of the instructor:
*Neural Networks*(CSE 40868/60868)*Natural Language Processing*(CSE 40657/60657)

#### Topics

This offering of the course will be focused on expressivity of neural networks for sequence data (language models). This schedule is still subject to change.

Week | Topics |
---|---|

1 | Introduction |

2 | Perceptrons and the XOR problem |

3 | Feedforward NNs and universal approximation |

4 | (Tentative) Neural tangent kernel |

5 | (Tentative) Overparameterization and double descent |

6 | Recurrent NNs (RNNs) and finite automata |

7 | RNNs with intermediate steps and Turing machines (and beyond) |

8 | LSTMs and counter machines |

9 | Circuit complexity and descriptive complexity |

10 | (Tentative) State-space models |

11 | Transformers and the PARITY problem |

12 | Transformers with intermediate steps and Turing machines |

13 | Unique-hard attention transformers and counter-free automata |

14 | Average-hard and soft attention transformers |

15 | Conclusion |

#### Requirements

Tentatively, the requirements will be as follows. There will be no final exam.

- Problem sets (50%)
- Programming assignments (30%)
- Final project (20%)