In this lecture, we will discuss gradient-based optimization methods for neural networks (and deep models in general).

**Please study the following material in preparation for the class:**

- Geoff Hinton’s Coursera lecture 6.
- Chapter 4 and Chapter 8 of the Deep Learning textbook.
- Lecture slides (covering SGD, SGD with momentum, Nesterov momentum, Adagrad, RMSprop, and Adadelta).
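
As a warm-up for the first two methods on the slides, here is a minimal sketch of the plain SGD update and the classical momentum update. It is not taken from the lecture material: the toy objective `f(w) = 0.5 * ||w||^2`, the function names `grad`, `sgd_step`, and `momentum_step`, and the hyperparameter values are all assumptions chosen for illustration.

```python
import numpy as np

def grad(w):
    # Gradient of the assumed toy objective f(w) = 0.5 * sum(w**2).
    return w

def sgd_step(w, g, lr=0.1):
    # Plain gradient step: move against the gradient.
    return w - lr * g

def momentum_step(w, v, g, lr=0.1, mu=0.9):
    # Classical momentum: accumulate a velocity, then move along it.
    v = mu * v - lr * g
    return w + v, v

w_sgd = np.array([1.0, -2.0])
w_mom = w_sgd.copy()
v = np.zeros_like(w_mom)

for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_mom, v = momentum_step(w_mom, v, grad(w_mom))

print("SGD distance to minimum:     ", np.linalg.norm(w_sgd))
print("Momentum distance to minimum:", np.linalg.norm(w_mom))
```

Both iterates approach the minimum at the origin. On a well-conditioned toy problem like this one, momentum oscillates and offers no advantage; its benefit tends to show up on poorly conditioned objectives, which the reading material discusses.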