Thursday, January 24 • 2:40pm - 3:00pm
Continuous Object Detection for Conversational Vision


Object detection is a core computer vision task in which a machine learning (ML) model is trained to identify objects from a pre-specified set of object categories. In real-life scenarios, e.g., when an object detector processes a picture taken with a mobile phone camera, not all object categories are known to the ML model in advance, since new objects of interest constantly appear in a user's environment. Object detection models therefore need to learn continually: they must learn to recognize new objects without suffering from catastrophic forgetting, the phenomenon in which an ML model forgets old objects while learning new ones. In this work, we discuss a new technology we have developed that performs incremental learning for object detection effectively and in near-real time. We discuss the mathematical framework underlying a novel loss function that enabled us to achieve state-of-the-art performance on benchmark datasets. We also outline our efficient training and inference framework, which enabled our prototype system to recognize objects successfully in a real-world live demo.

We also discuss extensions of our incremental object detection work, in which we use auxiliary unlabeled data to obtain better models, or use AutoML methods to automatically learn the best neural network architecture in continuous learning mode. We then give a brief overview of a novel recurrent neural network model with attention that we have developed for the task of Visual Dialogue, in which the user initiates a dialogue with the system about a picture. We conclude by discussing how incremental object detection, improved visual dialogue, and other novel research contributions form the cornerstones of a new framework of Conversational Vision, an active computer vision technology at the intersection of Natural Language Processing, Dialogue Understanding, and Computer Vision.
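The abstract does not specify the talk's loss function. Purely to illustrate the general idea of learning new classes while limiting forgetting of old ones, here is a minimal sketch of one widely used, generic approach: knowledge distillation. A frozen copy of the old detector (the "teacher") scores a region over the old classes only; the updated model (the "student"), whose classification head now also covers the new classes, is trained with ordinary cross-entropy on the labels while being penalized for drifting away from the teacher's old-class scores. This is an assumption-labeled sketch, not the method presented in the talk; the function names, the mean-subtracted L2 form of the penalty, and the weight `lam` are all hypothetical choices.

```python
import math
from typing import Sequence

# Hypothetical sketch of a generic distillation-based incremental loss.
# It is NOT the novel loss function described in the talk.


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits` for the true class `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def distillation_l2(student_old: Sequence[float], teacher_old: Sequence[float]) -> float:
    """Mean squared difference between mean-subtracted old-class logits.

    Mean subtraction makes the penalty ignore a constant shift of all logits.
    """
    n = len(teacher_old)
    ms = sum(student_old) / n
    mt = sum(teacher_old) / n
    return sum(((s - ms) - (t - mt)) ** 2 for s, t in zip(student_old, teacher_old)) / n


def incremental_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    lam: float = 1.0,
) -> float:
    """Cross-entropy over all (old + new) classes plus a forgetting penalty.

    Assumes the first len(teacher_logits) student outputs correspond to the
    old classes, in the same order as the teacher's outputs.
    """
    num_old = len(teacher_logits)
    ce = softmax_cross_entropy(student_logits, target)
    kd = distillation_l2(student_logits[:num_old], teacher_logits)
    return ce + lam * kd


if __name__ == "__main__":
    # Teacher knows 2 old classes; student adds 1 new class (index 2).
    teacher = [2.0, -1.0]
    student = [1.5, -0.5, 3.0]
    print(incremental_loss(student, teacher, target=2, lam=0.5))
```

The weight `lam` trades plasticity (fitting the new labels) against stability (keeping old-class behavior); in a real detector this penalty would typically be applied per region proposal and combined with box-regression terms, which this sketch omits.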


Shalini Ghosh, Samsung Research America

Director of AI Research, Samsung Research America
Dr. Shalini Ghosh is the Director of AI Research at the Artificial Intelligence Center of Samsung Research America, where she leads a group working on Situated AI and Multi-modal Learning (i.e., learning from computer vision, language, and speech). She has extensive experience and...

Thursday, January 24, 2019, 2:40pm - 3:00pm
Grand Ballroom, Hyatt Regency San Francisco, 5 Embarcadero Center, San Francisco, CA 94111, USA