Evaluating ML: When Will We Stop Fooling Ourselves?

Date and time: Tuesday 5 May 2026, 10:00-11:00 CEST
Speaker: Ricardo Baeza-Yates, KTH, UPF & UChile
Title: Evaluating ML: When Will We Stop Fooling Ourselves?

Where: Digital Futures hub, Osquars Backe 5, floor 2, KTH main campus, or Zoom
Directions: https://www.digitalfutures.kth.se/contact/how-to-get-here/
Zoom: https://kth-se.zoom.us/j/69560887455

Host: Aristides Gionis, argioni@kth.se

Recorded presentation

Bio: Ricardo Baeza-Yates is a part-time WASP Professor at KTH Royal Institute of Technology in Stockholm, as well as a part-time professor at the Department of Engineering of Universitat Pompeu Fabra in Barcelona and the Department of Computing Science of the University of Chile in Santiago. Previously, he was Director of Research at the Institute for Experiential AI of Northeastern University at its Silicon Valley campus (2021-25) and VP of Research at Yahoo Labs, based first in Barcelona, Spain, and later in Sunnyvale, California (2006-16). He is a world expert in responsible AI and a member of AI technology committees at GPAI/OECD, ACM and IEEE.

He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999 and 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow. He has won national scientific awards in Chile (2024) and Spain (2018), among other accolades and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, and his areas of expertise are responsible AI, web search and data mining, as well as data science and algorithms in general.

Abstract: ML evaluation is usually based on an average measure of success, such as accuracy. This kind of evaluation has several drawbacks: (1) a model may work well for easy instances but badly for difficult ones, while the real distribution of instances is usually not known; (2) it assumes that all errors have the same impact, which is almost never true; and (3) optimizing success does not minimize critical errors. In this presentation we discuss these problems and give some solutions that address them.
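
Drawbacks (2) and (3) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the toy labels, the two hypothetical models, the 10:1 cost ratio, and the helper names accuracy, missed_critical and total_cost.

```python
# Toy illustration only: the data, costs, and helper names below are
# hypothetical and are not taken from the talk.

# Ground truth: 1 = critical case that must be caught, 0 = normal case.
Y_TRUE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Model A always predicts "normal"; model B raises some false alarms
# but catches both critical cases.
MODEL_A = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
MODEL_B = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Assumed error costs: a missed critical case is ten times worse
# than a false alarm.
COST_FALSE_NEGATIVE = 10.0
COST_FALSE_POSITIVE = 1.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (an average measure)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def missed_critical(y_true, y_pred):
    """Number of critical cases (label 1) predicted as normal (0)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


def total_cost(y_true, y_pred):
    """Sum of error costs, weighting each error type by its assumed impact."""
    false_negatives = missed_critical(y_true, y_pred)
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (false_negatives * COST_FALSE_NEGATIVE
            + false_positives * COST_FALSE_POSITIVE)


if __name__ == "__main__":
    for name, preds in [("A", MODEL_A), ("B", MODEL_B)]:
        print(
            f"model {name}: accuracy={accuracy(Y_TRUE, preds)}, "
            f"missed critical={missed_critical(Y_TRUE, preds)}, "
            f"cost={total_cost(Y_TRUE, preds)}"
        )
```

With these made-up numbers, model A reaches 0.8 accuracy but misses both critical cases (cost 20), while model B reaches only 0.7 accuracy yet misses none (cost 3). Ranking the two models by accuracy alone would therefore pick the one with more critical errors.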
