Audio-Visual Scene Understanding Towards Unified, Explainable, and Robust Multisensory Perception

Date and time: 2 December 2021, 12:00 – 13:00 CET (UTC +1)
Speaker: Yapeng Tian, Department of Computer Science at the University of Rochester
Title: Audio-Visual Scene Understanding Towards Unified, Explainable, and Robust Multisensory Perception

Zoom: https://kth-se.zoom.us/j/69560887455
Meeting ID: 695 6088 7455
Password: 755440

Abstract: Understanding surrounding scenes, i.e., recognizing objects, sounds, and human activities, is a fundamental capability of human intelligence. Similarly, developing computational models that can understand scenes is a central problem in AI. Humans understand a scene by integrating multiple cooperating senses. For example, hearing helps capture the spatial location of a racing car behind us; seeing a person’s talking face can strengthen our perception of their speech. However, existing scene understanding algorithms are designed to rely solely on either the visual or the auditory modality, and they have yet to explore whether joint audio-visual learning can facilitate understanding scenes in videos. In this talk, I will introduce a series of our works in audio-visual scene understanding for building machines with unified, explainable, and robust multisensory perception capability. At the end of the talk, I will discuss some challenges and future directions.

Bio: Yapeng Tian is a fifth-year PhD student in the Department of Computer Science at the University of Rochester. He received his master’s degree from Tsinghua University in 2017 and his bachelor’s degree from Xidian University in 2013. His research interests centre around solving core computer vision and audition problems and applying the developed learning approaches to broad AI applications in multisensory perception, computational photography, AR/VR, and HCI. He has published more than 20 papers in peer-reviewed conferences and journals, including CVPR, ICCV, ECCV, and IEEE TPAMI. He has organized tutorials on “audio-visual scene understanding” at WACV 2021 and CVPR 2021. He has actively served as a PC member for major AI conferences.

Webpage: https://yapengtian.org/

Twitter account: https://twitter.com/YapengTian

LinkedIn profile: https://www.linkedin.com/in/yapeng-tian-780795141/
