Cognitive Science Speaker Series | Continual Learning for Vision and Multi-Modal Large Language Models
Speaker: Christopher Kanan, Ph.D.
Title: Continual Learning for Vision and Multi-Modal Large Language Models
Short Bio: Christopher Kanan is an Associate Professor of Computer Science at the University of Rochester, with secondary appointments in the Goergen Institute for Data Science, the Center for Visual Science, and Brain and Cognitive Sciences. His research focuses on deep learning, especially continual machine learning, where he works to make neural networks capable of learning over time on large-scale vision and multi-modal perception tasks. Other recent projects cover self-supervised learning, open-world learning, and bias-robust neural network architectures. He also works on applications of machine learning, especially the use of AI in medicine: for three years he led AI R&D at the start-up Paige, work that produced the first FDA-cleared computer vision system for helping pathologists diagnose cancer in whole-slide histology images. Kanan received a Ph.D. in computer science from the University of California, San Diego, and was then a postdoctoral scholar at the California Institute of Technology (Caltech). Before joining the University of Rochester, he worked at NASA JPL and then served on the faculty of the Rochester Institute of Technology. He is an NSF CAREER award recipient and a Senior Member of both AAAI and IEEE.
Abstract: Deep learning has been tremendously successful, yet it still lacks many capabilities. Inspired by cognitive science, my lab strives to transcend current AI limitations. After training on large amounts of data, conventional deep networks are frozen in time: unlike humans, they cannot easily continue to learn new information. When this is attempted, models suffer from catastrophic forgetting, losing previously acquired skills. Taking inspiration from theories of memory consolidation in the brain, my lab has developed state-of-the-art continual learning methods for large-scale computer vision and multi-modal perception tasks that overcome catastrophic forgetting. Specifically, our work incorporates hippocampal indexing theory and the replay mechanisms that occur during non-rapid eye movement (NREM) sleep into deep networks, yielding methods that learn far more efficiently than conventional deep learning algorithms without sacrificing predictive performance. I will also discuss recent work from my lab on overcoming linguistic forgetting when creating multi-modal large language models with capabilities similar to GPT-4V.
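For attendees unfamiliar with replay-based continual learning, the sketch below illustrates the general idea in PyTorch: a fixed-capacity buffer of past examples (filled by reservoir sampling) is mixed into each training step so the network rehearses old data while learning new data. This is a generic illustrative baseline under assumed names (ReplayBuffer, continual_step), not the speaker's published method, which additionally draws on hippocampal indexing theory.

    # Minimal sketch of rehearsal-based continual learning (illustrative only).
    # Assumes a PyTorch classifier `model`, an `optimizer`, and a stream of
    # (x, y) mini-batches arriving over time.
    import random
    import torch
    import torch.nn.functional as F

    class ReplayBuffer:
        """Fixed-capacity memory filled by reservoir sampling over a stream."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.data = []   # stored (x, y) example pairs
            self.seen = 0    # total examples observed so far

        def add(self, x, y):
            # Reservoir sampling: each example ever seen has an equal chance
            # of residing in the buffer.
            for xi, yi in zip(x, y):
                self.seen += 1
                if len(self.data) < self.capacity:
                    self.data.append((xi, yi))
                else:
                    j = random.randrange(self.seen)
                    if j < self.capacity:
                        self.data[j] = (xi, yi)

        def sample(self, k):
            batch = random.sample(self.data, min(k, len(self.data)))
            xs, ys = zip(*batch)
            return torch.stack(xs), torch.stack(ys)

    def continual_step(model, optimizer, buffer, x_new, y_new, replay_k=32):
        """One update that mixes new data with replayed memories,
        reducing catastrophic forgetting of earlier tasks."""
        x, y = x_new, y_new
        if buffer.data:
            x_old, y_old = buffer.sample(replay_k)
            x = torch.cat([x_new, x_old])
            y = torch.cat([y_new, y_old])
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        buffer.add(x_new, y_new)  # store new examples after the update
        return loss.item()

Without the replayed half of each batch, gradient updates on the new task alone would overwrite the weights that encode earlier tasks; mixing in rehearsal is the simplest remedy, and the methods discussed in the talk build on more memory-efficient variants of this idea.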
ASL-English interpreters have been requested. Light refreshments will be provided.
Event Snapshot
When and Where:
Who: Open to the Public
Interpreter Requested? Yes