Learning Representations: A Challenge for Learning Theory

submitted by George Yang on 09/28/15

View the talk in context: videolectures.net/colt2013_lecun_theory/
View the complete 26th Annual Conference on Learning Theory (COLT), Princeton 2013: videolectures.net/colt2013_princeton/
Speaker: Yann LeCun, Computer Science Department, New York University
License: Creative Commons CC BY-NC-ND 3.0
More information at videolectures.net/site/about/
More talks at videolectures.net/

Perceptual tasks such as vision and audition require the construction of good features, or good internal representations of the input. Deep learning designates a set of supervised and unsupervised methods that construct feature hierarchies automatically by training systems composed of multiple stages of trainable modules. The recent history of OCR, speech recognition, and image analysis indicates that deep learning systems yield higher accuracy than systems that rely on hand-crafted features or "shallow" architectures whenever more training data and more computational resources become available. Deep learning systems, particularly convolutional nets, hold the performance record on a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, handwriting recognition, pedestrian detection, road sign recognition, and more. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC and others all use deep learning, and many use convolutional nets.

While the practical successes of deep learning are numerous, so are the theoretical questions that surround it. What can circuit complexity theory tell us about deep architectures, with their multiple sequential steps of computation, compared to, say, kernel machines with simple kernels that have only two steps? What can learning theory tell us about unsupervised feature learning? What can theory tell us about the properties of deep architectures composed of layers that expand the dimension of their input (e.g. sparse coding), followed by layers that reduce it (e.g. pooling)? What can theory tell us about the properties of the non-convex objective functions that arise in deep learning? Why is it that the best-performing deep learning systems happen to be ridiculously over-parameterized, with regularization so aggressive that it borders on genocide?

Talk outline (timestamps):
0:00 Learning Representations: A Challenge For Learning Theory
1:36 Learning Representations: a challenge for AI, ML, Neuroscience, Cognitive Science
2:44 Architecture of "Mainstream" Machine Learning Systems
3:47 This Basic Model has not evolved much since the 50's
4:06 The Mammalian Visual Cortex is Hierarchical
5:21 Let's be inspired by nature, but not too much
6:57 Trainable Feature Hierarchies: End-to-end learning
7:31 Do we really need deep architectures?
8:47 Why would deep architectures be more efficient?
9:41 Deep Learning: A Theoretician's Nightmare? - 1
11:34 Deep Learning: A Theoretician's Nightmare? - 2
12:54 Deep Learning: A Theoretician's Paradise?
13:56 Deep Learning and Feature Learning Today
15:04 In Many Fields, Feature Learning Has Caused a Revolution
16:25 Convolutional Networks
16:56 Early Hierarchical Feature Models for Vision
18:17 The Convolutional Net Model
18:20 Feature Transform - 1
19:03 Feature Transform - 2
19:12 Convolutional Network (ConvNet)
19:15 Convolutional Network Architecture
19:17 Convolutional Network (vintage 1990)

For more slides, see videolectures.net/colt2013_lecun_theory/
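The abstract above describes deep architectures as stacks of trainable modules in which some layers expand the dimension of their input (e.g. sparse coding or a bank of convolutional filters) while others reduce it (e.g. pooling). Below is a minimal numpy sketch of one such stage, not code from the talk; the helper names (conv2d, max_pool), filter sizes, and channel counts are illustrative assumptions.

```python
# A minimal sketch (not LeCun's implementation) of the "expand then reduce"
# layer pattern mentioned in the abstract: a convolution that increases the
# number of feature channels, followed by pooling that shrinks the spatial
# resolution. In a real system the filters would be learned, not random.
import numpy as np

def conv2d(x, filters):
    """Valid 2D convolution: x is (H, W, C_in), filters is (K, K, C_in, C_out)."""
    H, W, _ = x.shape
    K, _, _, C_out = filters.shape
    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + K, j:j + K, :]  # local receptive field
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)  # ReLU non-linearity

def max_pool(x, size=2):
    """Non-overlapping max pooling that halves the spatial dimensions."""
    H, W, C = x.shape
    H2, W2 = H // size, W // size
    x = x[:H2 * size, :W2 * size, :].reshape(H2, size, W2, size, C)
    return x.max(axis=(1, 3))

# One stage of a feature hierarchy: channels expand (3 -> 16), space contracts.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))             # toy 32x32 RGB input
filters = rng.standard_normal((5, 5, 3, 16)) * 0.1   # 5x5 filters, 3 -> 16 channels
features = max_pool(conv2d(image, filters))          # 28x28x16 -> 14x14x16
print(features.shape)                                # (14, 14, 16)
```

Stacking several such stages, each trained end to end, is what the talk refers to as a trainable feature hierarchy.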
