Cognitive Science Speaker Series | Continual Learning for Vision and Multi-Modal Large Language Models


Speaker: Christopher Kanan, Ph.D.

Title: Continual Learning for Vision and Multi-Modal Large Language Models

Short Bio: Christopher Kanan is an Associate Professor of Computer Science at the University of Rochester, with secondary appointments in the Goergen Institute for Data Science, the Center for Visual Science, and Brain and Cognitive Sciences. His research focuses on deep learning, especially continual machine learning, where he works to make neural networks capable of learning over time on large-scale vision and multi-modal perception tasks. Other recent projects cover self-supervised learning, open-world learning, and bias-robust neural network architectures. He also works on applications of machine learning, especially the use of AI in medicine. For three years, he led AI R&D at the start-up Paige, work that led to the first FDA-cleared computer vision system for helping pathologists diagnose cancer in whole-slide histology images. Kanan received a Ph.D. in computer science from the University of California, San Diego, and was then a Postdoctoral Scholar at the California Institute of Technology (Caltech). Before joining the University of Rochester, he worked at NASA JPL and then served on the faculty of the Rochester Institute of Technology. He is an NSF CAREER award recipient and a Senior Member of both AAAI and IEEE.

Abstract: Deep learning has been tremendously successful, yet it still lacks many capabilities. Inspired by cognitive science, my lab strives to transcend the limitations of current AI. After training on large amounts of data, conventional deep networks are frozen in time: unlike humans, they cannot easily continue to learn new information. When this is attempted, models suffer from catastrophic forgetting, losing previously acquired skills. Taking inspiration from theories of memory consolidation in the brain, my lab has developed state-of-the-art continual learning methods for large-scale computer vision and multi-modal perception tasks that overcome catastrophic forgetting. Specifically, our work incorporates hippocampal indexing theory and the replay mechanisms that occur during non-rapid eye movement (NREM) sleep into deep networks, yielding methods that learn much more efficiently than conventional deep learning algorithms without sacrificing predictive performance. I will also discuss recent work from my lab on overcoming linguistic forgetting when creating multi-modal large language models with capabilities similar to GPT-4V.
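
For attendees less familiar with continual learning, the following is a minimal, generic sketch of rehearsal/replay training in PyTorch. It is illustrative only and is not Dr. Kanan's method; the ReplayBuffer class, the toy model, and all hyperparameters are hypothetical.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ReplayBuffer:
        """Reservoir-sampled memory of past (input, label) pairs (hypothetical helper)."""
        def __init__(self, capacity=500):
            self.capacity, self.data, self.seen = capacity, [], 0

        def add(self, x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x, y))
            else:
                i = random.randrange(self.seen)
                if i < self.capacity:
                    self.data[i] = (x, y)

        def sample(self, k):
            xs, ys = zip(*random.sample(self.data, min(k, len(self.data))))
            return torch.stack(xs), torch.stack(ys)

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    buffer = ReplayBuffer()

    def train_step(x, y):
        # Loss on the new batch plus a rehearsal loss on replayed old examples,
        # so gradient updates balance new learning against retaining old skills.
        loss = F.cross_entropy(model(x), y)
        if buffer.data:
            rx, ry = buffer.sample(len(x))
            loss = loss + F.cross_entropy(model(rx), ry)
        opt.zero_grad()
        loss.backward()
        opt.step()
        for xi, yi in zip(x, y):  # store new examples for future replay
            buffer.add(xi.detach(), yi.detach())

Replay methods vary in what they store (raw inputs, compressed features, or generated samples) and how they weight old versus new losses; the brain-inspired methods described in the abstract build on this general family of techniques.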

ASL-English interpreters have been requested. Light refreshments will be provided.


Contact
Matthew Dye
Event Snapshot
When and Where
January 17, 2025
12:00 pm - 1:00 pm
Room/Location: GAN-2070
Who

Open to the Public

Interpreter Requested?

Yes

Topics
imaging science
research