
If your loss function is positive, you need less tuning to train your machine learning model


Date and time: 17 October 2023, 15:00 – 16:00 CEST
Speaker: Robert M. Gower, Flatiron Institute, NYC.
Title: If your loss function is positive, you need less tuning to train your machine learning model

Where: DCS seminar room, Malvinas väg 10, Floor 6, room A641 and
Zoom: https://kth-se.zoom.us/j/61881721807

This seminar is co-sponsored by Digital Futures.

Abstract: Training a modern production-grade large neural network is computationally expensive. This cost is multiplied when you consider that multiple runs are needed to tune the hyperparameters, with arguably the most important parameter being the learning rate. I will talk about some new adaptive learning rates that significantly reduce the need for manual tuning. The key idea is to use the fact that most loss functions are positive, that is, bounded below by zero. By leveraging this positivity, we can design an adaptive learning rate schedule for a given optimizer.
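
To make the positivity idea concrete, here is a minimal sketch of a Polyak-style adaptive step size in PyTorch that uses zero as a lower bound on the loss to set the step length. The function name positivity_step, the cap lr_max, and the specific rule are illustrative assumptions, not necessarily the exact schedule presented in the talk.

```python
import torch

def positivity_step(params, loss, lr_max=1.0, eps=1e-8):
    """One SGD step with a Polyak-style adaptive learning rate.

    Because the loss is non-negative, zero is a valid lower bound, so
    gamma = loss / ||grad||^2 is the step that would drive a
    linearization of the loss down to zero; it is capped by lr_max
    for stability. (Illustrative sketch, not the talk's exact method.)
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum((g * g).sum() for g in grads).item()
    gamma = min(lr_max, loss.item() / (grad_sq + eps))
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma * g
    return gamma
```

Here params is any list of tensors with requires_grad=True through which loss was computed: the rule takes larger steps when the loss is far from zero and automatically shrinks them as the loss approaches its lower bound.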

The trick to incorporating this positivity into a given optimization method is to use the model-based viewpoint of optimization. In this viewpoint, every optimization method builds a simple model of the loss function it is trying to minimize. We then need only tweak this model by enforcing that it is positive. We'll demonstrate how popular methods like stochastic gradient descent with momentum and Adam can be adjusted using this trick. We then show, on a range of benchmark problems from standard vision tasks to translation with transformers, how these adaptive learning rate schedulers make tuning much easier.
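
As a rough sketch of how such a truncation might be grafted onto SGD with momentum (again an assumption for illustration, not the speaker's exact algorithm): keep running averages that define a linear model of the loss around the current iterate, clip that model at zero because the true loss cannot go below zero, and use the clipped value to set the step size along the momentum direction.

```python
import torch

def momentum_step_truncated(params, loss, state, beta=0.9, lr_max=1.0, eps=1e-8):
    """SGD-with-momentum step whose learning rate comes from a
    truncated (non-negative) model of the loss.

    The running averages in `state` define a linear model of the loss
    at the current iterate; clipping that model at zero (since the true
    loss is non-negative) gives the adaptive step size.
    (Illustrative sketch under assumed averaging choices.)
    """
    grads = torch.autograd.grad(loss, params)
    flat_w = torch.cat([p.detach().reshape(-1) for p in params])
    flat_g = torch.cat([g.reshape(-1) for g in grads])

    # Exponential averages of: gradients (momentum direction), loss
    # values, and gradient-iterate inner products.
    if not state:
        state["d"] = torch.zeros_like(flat_g)
        state["f_bar"] = 0.0
        state["c_bar"] = 0.0
    state["d"] = beta * state["d"] + (1 - beta) * flat_g
    state["f_bar"] = beta * state["f_bar"] + (1 - beta) * loss.item()
    state["c_bar"] = beta * state["c_bar"] + (1 - beta) * flat_g.dot(flat_w).item()

    d = state["d"]
    # Value of the averaged linear model at the current iterate,
    # truncated at zero to respect the positivity of the loss.
    model_val = max(0.0, state["f_bar"] + d.dot(flat_w).item() - state["c_bar"])
    lr = min(lr_max, model_val / (d.dot(d).item() + eps))

    # Move along the momentum direction with the adaptive step size.
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * d[offset:offset + n].reshape(p.shape)
            offset += n
    return lr
```

The abstract suggests Adam can be adjusted in the same spirit, for instance by measuring the momentum direction in the norm induced by Adam's second-moment estimates rather than the plain Euclidean norm used above.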

Bio: Robert M. Gower is a Research Scientist at the Flatiron Institute in New York City. He is a British/Brazilian mathematical optimizer, working on the design and analysis of new algorithms for solving optimization problems in Machine Learning, Statistics and, more generally, Data Science. Before joining the Flatiron Institute, he was an Assistant Professor at the Institut Polytechnique de Paris and has been a visiting scientist at Google Research (2021) and Facebook Research (2020). Robert received his PhD in Applied Mathematics from the University of Edinburgh (2016) and a Bachelor's and MSc in Applied Mathematics from the State University of Campinas (2011).