For Data Science students: 20% classwork (a 10-minute test at every lesson; only the 4 best marks will be considered), 25% individual/pair project to be delivered by March 14th, 40% theoretical exam, 15% lab evaluation.
Lessons will be from 9.00 to 12.15.
First lesson (Giovanni Neglia, January 19, online): introduction to the course,
introduction to ML optimization (empirical risk vs expected risk, training/validation/test sets). Sections 1-3.1 of Bottou et al.
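The distinction between empirical and expected risk, and the role of held-out data, can be illustrated with a small least-squares sketch (the data, sizes, and split are illustrative assumptions, not course material):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy regression data; the expected risk is an average over this distribution.
X = rng.normal(size=(300, 2))
y = X @ np.array([1.0, -2.0]) + 0.3 * rng.normal(size=300)

# Training/validation/test split (sizes are illustrative).
X_tr, y_tr = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]
X_te, y_te = X[250:], y[250:]

def empirical_risk(Xs, ys, w):
    # Average squared loss over a finite sample.
    return float(np.mean((Xs @ w - ys) ** 2))

# Empirical risk minimizer for the squared loss (ordinary least squares).
w_hat = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]

train_risk = empirical_risk(X_tr, y_tr, w_hat)  # optimistically biased
test_risk = empirical_risk(X_te, y_te, w_hat)   # estimates the expected risk
```

The training risk is what the optimizer minimizes; the test risk on held-out data is the honest estimate of the expected risk.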
Second lesson (Giovanni Neglia, January 26, online):
math refresher (gradient, Hessian, convex sets and functions),
presentation of full batch gradient and stochastic gradient methods,
why stochastic gradient descent (SGD) may outperform batch gradient (qualitative explanation, time to minimize the empirical error), overview of noise reduction and second order methods, mini-batch methods. Section 4.3 of Goodfellow et al, Sections 3.2, 3.3, introduction to Section 4 of Bottou et al.
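A minimal sketch of the per-iteration cost difference between full-batch gradient descent and SGD, on an illustrative least-squares problem (all names, sizes, and constants are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def full_gradient(w):
    # One iteration touches all n samples: cost O(n * d).
    return X.T @ (X @ w - y) / n

def stochastic_gradient(w):
    # One iteration touches a single random sample: cost O(d).
    i = rng.integers(n)
    return X[i] * (X[i] @ w - y[i])

# For the same budget measured in sample accesses, SGD performs
# n times more parameter updates than full-batch descent.
w = np.zeros(d)
for _ in range(500):
    w -= 0.01 * stochastic_gradient(w)
error = float(np.linalg.norm(w - w_true))
```

Here 500 single-sample updates cost fewer sample accesses than a single full-batch iteration, yet already bring the iterate close to the minimizer.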
Third lesson (Tareq Si Salem, February 2, online): convergence results (expected decrease after one iteration), definition of strong convexity, convergence results for stochastic gradient methods on strongly convex functions (with both constant and decreasing learning rates). Sections 4.1 and 4.2 of Bottou et al.
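The qualitative difference between constant and decreasing learning rates on a strongly convex objective can be seen in a one-dimensional toy experiment (all constants and the noise model are illustrative):

```python
import numpy as np

def sgd_1d(lr_schedule, steps=20000, noise=1.0, seed=1):
    """SGD on the strongly convex f(w) = w^2/2 with noisy
    gradient estimates g_k = w_k + noise * z_k."""
    rng = np.random.default_rng(seed)
    w, sq_errors = 5.0, []
    for k in range(steps):
        g = w + noise * rng.normal()   # unbiased gradient estimate
        w -= lr_schedule(k) * g
        sq_errors.append(w * w)
    # Average squared distance to the optimum over the last quarter of the run.
    return float(np.mean(sq_errors[-steps // 4:]))

err_const = sgd_1d(lambda k: 0.1)             # constant step: stalls at a noise floor
err_decay = sgd_1d(lambda k: 1.0 / (k + 10))  # O(1/k) step: error keeps decreasing
```

With a constant step the iterates plateau at a noise-dependent floor, while the O(1/k) schedule keeps driving the expected squared error down, matching the theory covered in this lesson.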
Fourth lesson (Giovanni Neglia, February 9, online): the role of the condition number, trade-offs of mini-batch, convergence results of stochastic gradient methods for non-convex functions, noise reduction methods (dynamic sample size, gradient aggregation, iterate averaging), overview of second-order methods. Sections 4.2, 4.3, 4.5, 6 of Bottou et al.
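As one well-known example of a gradient aggregation method, here is a sketch of SVRG-style variance reduction, where a full gradient is periodically recomputed at a snapshot point (this is one possible variant; all parameters are illustrative):

```python
import numpy as np

def svrg(grads, w0, lr=0.02, epochs=30, inner=None, seed=0):
    """SVRG sketch: use a full gradient at a periodic snapshot to
    reduce the variance of single-sample stochastic gradients."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    inner = inner or n
    w = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        w_snap = w.copy()
        full = sum(g(w_snap) for g in grads) / n   # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased, with variance that
            # vanishes as w and w_snap approach the optimum.
            v = grads[i](w) - grads[i](w_snap) + full
            w = w - lr * v
    return w
```

Unlike plain SGD with a constant step, this estimator's variance shrinks near the optimum, which is what enables linear convergence on strongly convex problems.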
Fifth lesson (Giovanni Neglia, February 16, online): trade-offs of mini-batch, noise reduction methods (dynamic sample size, gradient aggregation, iterate averaging), other optimization methods (momentum, Nesterov acceleration, coordinate descent). Sections 5 and 7 of Bottou et al.
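The momentum and Nesterov updates mentioned above can be sketched as follows (parameter values are illustrative defaults, not prescriptions):

```python
import numpy as np

def heavy_ball(grad, w0, lr=0.1, beta=0.9, steps=100):
    # Polyak (heavy-ball) momentum: v <- beta*v - lr*grad(w); w <- w + v.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w = w + v
    return w

def nesterov(grad, w0, lr=0.1, beta=0.9, steps=100):
    # Nesterov: same recursion, but the gradient is evaluated
    # at the look-ahead point w + beta*v.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w + beta * v)
        w = w + v
    return w
```

The only difference is where the gradient is evaluated; the look-ahead evaluation is what gives Nesterov's method its better worst-case rate on convex problems.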
Sixth lesson (Tareq Si Salem, February 23, online): optimization for neural networks (back-propagation, critical points, learning rate tuning, AdaGrad, RMSProp, Adam, weight initialization, batch normalization), minibatch size and communication delay, data vs model parallelism, synchronous parameter server, ring all-reduce. Sections 5 and 7 of Bottou et al, Sections 6.5.1-6.5.8 and 8.1-8.5 of Goodfellow et al, and the paper by Li et al.
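The Adam update covered in this lesson combines momentum-style first-moment estimates with RMSProp-style second-moment scaling; a sketch with the commonly published default hyperparameters (the test objective below is illustrative):

```python
import numpy as np

def adam(grad, w0, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    # Adam: bias-corrected exponential moving averages of the
    # gradient (m) and of its elementwise square (v).
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g         # first moment estimate
        v = b2 * v + (1 - b2) * g * g     # second moment estimate
        m_hat = m / (1 - b1 ** t)         # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w
```

The elementwise scaling by sqrt(v_hat) gives each coordinate its own effective learning rate, which is the common thread running through AdaGrad, RMSProp, and Adam.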
During these practical sessions students will have the opportunity to train ML models in a distributed way on the Inria scientific cluster.
Sessions are organized by Chuan Xu and Othmane Marfoq.
Students need to carry out some administrative/configuration steps before the start of the labs. Labs website.
Participation in the labs will be graded by the teachers.
The project is an opportunity for the student to actively use the material taught in the course.
Students are free to choose the goal of their project, but are invited to discuss it with the teacher.
Possible goals are
reproduce an experimental result in a paper,
design or perform an experiment to support/refute a statement in a paper,
apply some of the optimization algorithms described in the course to a specific problem the student is interested in (e.g. for another course, his/her final project, etc.),
compare different algorithms,
implement an algorithm in a distributed system (PyTorch, Spark, TensorFlow, …).
It is possible for a pair of students to carry out the project together.
A list of possible projects is provided below.
The student(s) will provide
1) a 4-page report formatted according to the ICLR template, with unlimited additional pages for the bibliography and optional appendices (of unlimited length) containing proofs, descriptions of code, or additional experiments,
2) code developed,
3) a readme file containing instructions to run the code and reproduce the experiments in the report.
The report must clearly describe and motivate the goal of the project, provide any relevant background and explain the original contribution of the student.
What is explained in the course can be considered general knowledge and should not be repeated in the report.
The code must be well commented.
The student will make the material above available online in a zipped folder named after the student, and will send the link to the teacher.
Each student carrying out the project together with another one must send an email to the teacher indicating a percentage estimate of his/her contribution to the project (e.g. "I estimate my contribution to have been 60%").
The mark will take into account: originality of the project, presentation quality, technical correctness, and task difficulty. More will be expected from projects carried out by a pair of students.
For projects carried out by a pair of students up to 20% of the final mark will depend on the self-evaluated contribution of the student.
Any form of plagiarism will lead to a reduction of the final mark.
Failure to respect the submission rules indicated above (e.g. report exceeding the maximum length, report sent as an attachment) will lead to a reduction of the final mark.
Ideas for possible projects
The list is in random order and will be extended during the course.
Compare stochastic gradient descent and gradient aggregation methods in different regimes of condition-number versus dataset-size. See Bottou et al, section 5.3.3.
Survey and compare iterate averaging methods. See Bottou et al, section 5.4.
Survey the dynamic sampling techniques described in references [28,73] of Bottou et al (section 5.2.1). Implement and test at least one of them (including the basic one described in section 5.2.1).
Implement the (inexact) Newton method without inverting the Hessian, but using the conjugate gradient method. Evaluate the effect of stopping the conjugate method after i ≤ d steps, where d is the dimension of the parameter vector. See Bottou et al section 6.1 and Bubeck section 2.4.
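A possible starting point for this project: the Newton system H p = -g can be solved with the conjugate gradient method using only Hessian-vector products, and truncated after a chosen number of steps (names and tolerances below are illustrative):

```python
import numpy as np

def conjugate_gradient(hess_vec, b, max_iter, tol=1e-10):
    """Approximately solve H x = b using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - H x (x starts at zero)
    p = r.copy()          # search direction
    for _ in range(max_iter):
        Hp = hess_vec(p)
        alpha = (r @ r) / (p @ Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

def newton_cg_step(grad, hess_vec, w, cg_iters):
    # Inexact Newton direction: H p = -g, with at most cg_iters CG steps.
    p = conjugate_gradient(hess_vec, -grad(w), cg_iters)
    return w + p
```

With i = d CG steps the direction is (up to round-off) the exact Newton step; stopping earlier trades accuracy of the direction for computation, which is exactly the effect the project asks you to evaluate.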
Second-order techniques for neural network training. See Bottou et al section 6.1.2 and references [12,100] as a starting point. It is better to avoid this project if you are not familiar with the backpropagation method for neural networks.
Back to the classics: understand backpropagation from the original sources. Start from [134,135] in Bottou et al.
Survey, implement and test L-BFGS. See Bottou et al section 6.2 and references [97,113].
Asynchronicity as momentum: "Asynchrony begets Momentum, with an Application to Deep Learning" by Mitliagkas et al, arXiv.
"Deep learning with Elastic Averaging SGD" by Zhang et al, arXiv.
Simulate a parameter server (PS) with backup workers, introducing some communication constraints between the workers and the PS (e.g. in the form of a queue).
Consider workers that are nodes in a graph with communication delays. Compare the convergence speed of consensus methods with that of a parameter server (PS) approach where the PS is the root of a tree.
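As a starting point for the parameter-server ideas above, here is a toy simulation of a synchronous PS with backup workers; the exponential delay model and every parameter are illustrative assumptions, not part of the project specification:

```python
import numpy as np

def simulate_ps(n_workers=10, backups=2, rounds=100, seed=0):
    """Synchronous PS that, each round, waits only for the fastest
    n_workers - backups gradients and drops the stragglers."""
    rng = np.random.default_rng(seed)
    wait = n_workers - backups
    total_time, used_grads = 0.0, 0
    for _ in range(rounds):
        # Per-worker compute + communication delay (exponential, illustrative).
        delays = rng.exponential(1.0, size=n_workers)
        # The round ends when the wait-th fastest gradient arrives.
        total_time += float(np.sort(delays)[wait - 1])
        used_grads += wait
    return total_time, used_grads
```

Comparing runs with and without backups exposes the core trade-off: dropping stragglers shortens each round but discards gradient work, and the project could quantify this under different delay or queueing models.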