September 29, 2024
Contacts: Andrea Fusiello, Shaifali Parashar
Chalmers University of Technology
School of Information Science and Technology at ShanghaiTech
NAVER Labs Europe
Google DeepMind
Title: Geometry for Deep Learning and Deep Learning for Geometry
Multiple view geometry has long been a cornerstone of computer vision, providing robust solutions to 3D reconstruction, camera calibration, and motion estimation. Traditionally, these geometric methods have stood resilient against the rise of deep learning, offering interpretability, precision, and a solid mathematical foundation. However, recent advancements in machine learning are beginning to challenge this landscape, leading to a dynamic interaction between the two fields. In this keynote, I will delve into the evolving relationship between traditional geometric approaches and modern deep learning techniques. We will explore the history of multiple view geometry, highlighting its pivotal contributions to computer vision and its continued relevance in the deep learning era. I will illustrate how geometric methods are increasingly leveraged to generate high-quality training data for deep learning models, and how, conversely, deep learning is beginning to address complex 3D geometric problems once dominated by traditional methods. Rather than converging into a single unified approach, I argue that geometry and deep learning will continue to coexist, each complementing the other while maintaining their distinct advantages. There will always be a need for fully interpretable, theoretically grounded methods, which deep learning, with its black-box nature, cannot provide. This talk will examine how this interaction is driving innovation and what it means for the future of computer vision.
Biography
Fredrik Kahl has a highly diverse career that spans both academia and entertainment. He is primarily known as a distinguished Swedish professor specializing in mathematics, with a focus on computer vision and medical image analysis. Kahl has held significant academic positions, including at Lund University and Chalmers University of Technology, where he leads research groups in computer vision and machine learning. He has authored numerous scientific papers and has received prestigious awards, including the Marr Prize, for his contributions to multi-view geometry and global optimization.
Title: Event-based motion estimation, a case for geometric computer vision?
Neuromorphic cameras have recently gained popularity owing to interesting properties such as high temporal resolution, high dynamic range, and the absence of common blur effects. Owing to their event-driven, asynchronous working principle, they enable low latencies while keeping bandwidth requirements within limits. However, in order to preserve the benefits of event cameras, a new breed of algorithms acting at a low level and in an efficient, potentially asynchronous manner is required. With the potential to fill this gap, the present talk summarizes recent contributions on a somewhat overlooked problem in the realm of event-based motion estimation: spatio-temporal geometric solvers. Starting from dense correspondence-free methods operating in the image plane, the talk gradually works its way towards compact geometric solvers able to generate full motion hypotheses from small samples of events. Perhaps surprisingly, these newly proposed incidence relations are even applicable beyond event cameras, and may influence how we approach motion estimation for high-speed or rolling shutter cameras.
Biography
Dr Laurent Kneip is a globally recognized expert in computer vision and robotics with many publications in top international conferences and journals such as the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), the International Conference on Computer Vision (ICCV), the European Conference on Computer Vision (ECCV), the International Conference on Robotics and Automation (ICRA), the Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and the Transactions on Robotics (TRO). He graduated as a Diplom-Ingenieur Univ. in mechatronical engineering from the Friedrich-Alexander University Erlangen/Nürnberg in 2008. In 2013, he received his PhD degree from the Autonomous Systems Lab (ASL) of the Eidgenössische Technische Hochschule (ETH) in Zurich. He then served as a lecturer and senior researcher at the Research School of Engineering at the Australian National University. In 2015, he was awarded the prestigious Discovery Early Career Researcher Award (DECRA) from the Australian Research Council (ARC), and he furthermore served as an Associate Investigator of the ARC Centre of Excellence for Robotic Vision. His contribution at ICCV 2017 received the Marr Prize Honorable Mention award, one of the most prestigious best paper awards in the computer vision community. Dr Laurent Kneip joined the School of Information Science and Technology at ShanghaiTech University in 2017, and was promoted to tenured Associate Professor in 2020. He founded and directs the Mobile Perception Lab, and also co-directs the ShanghaiTech Automation and Robotics Center.
Title: DUSt3R, the doom of Traditional Geometric Vision?
Geometric computer vision, or the task of estimating camera parameters and/or scene geometry from images, is one of the last bastions of classical computer vision. It is still largely dominated by engineered geometric approaches that leverage pixel correspondences for epipolar geometry, PnP solvers, rotation averaging, bundle adjustment, etc. COLMAP, for instance, is widely used to this day to generate ground-truth camera poses for large image collections, and Deep Learning (DL) methods struggle to even get close to its performance. This is a great example of a handcrafted pipeline that crystallizes many traditional geometric vision tools. Interestingly, it can still benefit strongly from DL work, for example by leveraging robust pixel matching approaches. On the other side of the spectrum, recent DL advances like DUSt3R/MASt3R showed great capabilities in the few-view regime, or in cases where traditional geometric tools are powerless, like the monocular case, when the motion between views is a pure rotation, or even in "impossible matching" scenarios. These methods revolutionized traditional multi-view stereo reconstruction by regressing pointmaps that encode scene geometry without requiring calibrated or posed cameras. DUSt3R simplifies the complex pipeline of traditional 3D reconstruction methods, significantly reducing computational overhead and enhancing performance across various benchmarks. Interestingly enough, our latest efforts to improve accuracy and scalability mostly lead back to classical geometric vision solutions, showing the importance and relevance of classical theory in this framework. Instead of seeing an opposition between classical CV and DL, this talk will develop a vision where classical CV and DL coexist, evolve, and enrich each other to lead to better, more accurate, and more computationally efficient solutions for everyday use.
Our end-goal is to unify and streamline the processing of 3D visual data, offering new perspectives and capabilities in visual perception, robotic navigation, cultural heritage preservation, and beyond.
Biography
Vincent is a research scientist in Geometric Deep Learning at Naver Labs Europe. He joined 5 years ago, in 2019, after completing his PhD on Multi-View Stereo Reconstruction for dynamic shapes at INRIA Grenoble-Alpes under the supervision of E. Boyer and J-S. Franco. Other than that, he likes hiking in the mountains and finding simple solutions to complex problems. Interestingly, the latter usually comes with the former.
Title: Reinventing the Wheel in Computer Vision
Most computer vision algorithms today rely on machine learning models trained on massive amounts of data, as exemplified by the latest multimodal large language models. However, relying solely on general learning architectures can fail to capture the underlying regularities or physical structure of the systems being modeled. Examples of such failures include early network-based pose estimation algorithms as well as 3D reconstruction algorithms that fail to generalize outside their training domain. Modern techniques also often reinvent classical vision techniques using new terminology without fully making the connection to previously developed approaches, often because the time elapsed between the new and old techniques spans several decades. In this talk, I will point out some of these rediscoveries and how making stronger connections to classic approaches can improve the performance and understanding of deep learning approaches.
Biography
Richard Szeliski is a Distinguished Scientist at Google DeepMind and an Affiliate Professor at the University of Washington. He is a Member of the National Academy of Engineering and a Fellow of the ACM and IEEE. Prof. Szeliski has done pioneering research in the fields of Bayesian methods for computer vision, image-based modeling, image-based rendering, and computational photography, which lie at the intersection of computer vision and computer graphics. His work on Photo Tourism, Photosynth, and Hyperlapse provides exciting examples of the promise of large-scale image and video-based rendering. Prof. Szeliski received his Ph.D. degree in Computer Science from Carnegie Mellon University in 1988. He joined Google Research (now Google DeepMind) in 2022 after retiring in 2020 from Facebook, where he was the founding Director of the Computational Photography group. Prior to Facebook, he worked at Microsoft Research for twenty years as well as at several other industrial research labs. He has published over 180 research papers in computer vision, computer graphics, neural networks, and numerical analysis, as well as the books Computer Vision: Algorithms and Applications and Bayesian Modeling of Uncertainty in Low-Level Vision. He was a Program Chair for CVPR'2013 and ICCV'2003, served as an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and on the Editorial Board of the International Journal of Computer Vision, and was a Founding Editor of Foundations and Trends in Computer Graphics and Vision.