Deep Learning Stage
Thursday, January 24


Continuous Object Detection for Conversational Vision
Object detection is a core computer vision task in which a machine learning (ML) model is trained to identify objects from a pre-specified set of object categories. In real-life scenarios, e.g., when an object detector processes a picture taken by a mobile phone camera, not all object categories are known to the ML model in advance, since new objects of interest constantly appear in a user's environment. As a result, it is important for object detection models to learn continually: they need to learn to recognize new objects without suffering from catastrophic forgetting, where the ML model forgets old objects while learning new ones. In this work, we discuss a new technology we have developed that can effectively perform incremental learning for object detection in near real time. We discuss the underlying mathematical framework of a novel loss function that enabled us to achieve state-of-the-art performance on benchmark datasets. We also outline our efficient training and inference framework, which enabled our prototype system to successfully recognize objects in a real-world live demo scenario. We then discuss extensions of our incremental object detection work, in which we use auxiliary unlabeled data to obtain better models or use AutoML methods to automatically learn the best neural network architecture in the continuous learning mode. Next, we give a brief overview of a novel recurrent neural network model with attention that we have developed for the task of Visual Dialogue, where the user initiates a dialogue with the system about a picture. We conclude by discussing how incremental object detection, improved visual dialogue, and other novel research contributions form the cornerstones of a new framework of Conversational Vision, an active computer vision technology at the intersection of Natural Language Processing, Dialogue Understanding, and Computer Vision.

Shalini Ghosh, Samsung Research America

Director of AI Research, Samsung Research America
Dr. Shalini Ghosh is the Director of AI Research at the Artificial Intelligence Center of Samsung Research America, where she leads a group working on Situated AI and Multi-modal Learning (i.e., learning from computer vision, language, and speech). She has extensive experience and...

Thursday January 24, 2019 2:40pm - 3:00pm
Grand Ballroom Hyatt Regency San Francisco, 5 Embarcadero Center, San Francisco, CA 94111, USA


Advancing State-of-the-art Image Recognition with Deep Learning on Hashtags
At Facebook, hundreds of millions of users interact with billions of pieces of visual content every day. By understanding what's in an image, our systems can help connect users with the things that matter most to them. I will talk about two main research challenges in improving our recognition system: how we train models at the scale of billions, and how we improve the reliability of model predictions. Since current models are typically trained on data that are individually labeled by human annotators, scaling up to billions is non-trivial. We address this challenge by training image recognition networks on large sets of public images with user-supplied hashtags as labels. By leveraging weakly supervised pretraining, our best model achieved a record-high 85.4% accuracy on the ImageNet dataset.

Yixuan Li, Facebook AI (Computer Vision Group)

Research Scientist, Facebook AI (Computer Vision Group)
Yixuan Li is a Research Scientist at Facebook AI, Computer Vision Group. She leads the research effort on large-scale visual learning with high dimensional label space. Before joining Facebook, she obtained her PhD from Cornell University in 2017. Yixuan's research interests are in...

Thursday January 24, 2019 3:00pm - 3:20pm
Grand Ballroom Hyatt Regency San Francisco, 5 Embarcadero Center, San Francisco, CA 94111, USA
Friday, January 25


Brand is Beyond Logos – Understanding Visual Brand
Logos come to mind when we think about iconic brands. However, a spectrum of visual cues is used to establish the signature of a brand, including color, pattern, and shape. We train a deep neural network to predict a variety of fashion brands and analyze its visual representations using the strength and extent of neuron activations. The logo is shown to be at one end of the spectrum. A study of the versatility of neurons shows that they are diverse in nature and include both specialists and generalists. Potential applications of making neural networks explainable include personalization, elimination of bias in prediction, and model improvement.

Robinson Piramuthu, eBay

Chief Scientist for Computer Vision, eBay
Robinson Piramuthu joined eBay in 2011 and is currently the Chief Scientist for Computer Vision. He has over 20 years of experience in computer vision, which includes large-scale visual search, coarse and fine-grained visual recognition, object detection, computer vision for fashion...

Friday January 25, 2019 12:10pm - 12:30pm
Grand Ballroom Hyatt Regency San Francisco, 5 Embarcadero Center, San Francisco, CA 94111, USA