
Deep Learning in Computer Vision

2017-12-19 17:05


Topics in Computer Vision (CSC2523):


Deep Learning in Computer Vision


Winter 2016





In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will read up on various Computer Vision problems and the state-of-the-art techniques involving different neural architectures, and brainstorm about promising new directions.

Please sign up here at the beginning of class.


Course overview

This class is a graduate seminar course in computer vision. It will cover a diverse set of topics in Computer Vision and various Neural Network architectures. It will be an interactive course in which we discuss interesting topics on demand and the latest research buzz. The goal of the class is to learn about different domains of vision; to understand, identify and analyze the main challenges and what works and what doesn't; and to identify interesting new directions for future research.

Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need to do some additional reading), and basic programming skills are required for projects.



Course Information


Time and Location


Winter 2016

Day: Tuesday
Time: 9am-11am
Room: ES B149 (Earth Science Building at 5 Bancroft Avenue)


Instructor


Sanja Fidler

Email: fidler@cs dot toronto dot edu
Homepage: http://www.cs.toronto.edu/~fidler
Office hours: by appointment (send email)

When emailing me, please put CSC2523 in the subject line.


Forum

This class uses piazza, where we will post announcements and assignments.
Students will also be able to post questions and discussions in a forum-style manner, either to their instructors or to their peers.



Invited Speakers

We will have an invited speaker for this course:

Raquel Urtasun

Assistant Professor, University of Toronto

Talk title: Deep Structured Models

as well as several invited lectures / tutorials:
Yuri Burda, Postdoctoral Fellow, University of Toronto: Lecture on Variational Autoencoders
Ryan Kiros, PhD student, University of Toronto: Lecture on Recurrent Neural Networks and Neural Language Models
Jimmy Ba, PhD student, University of Toronto: Lecture on Neural Programming
Yukun Zhu, MSc student, University of Toronto: Lecture on Convolutional Neural Networks
Elman Mansimov, Research Assistant, University of Toronto: Lecture on Image Generation with Neural Networks
Emilio Parisotto, MSc student, University of Toronto: Lecture on Deep Reinforcement Learning
Renjie Liao, PhD student, University of Toronto: Lecture on Highway and Residual Networks
Urban Jezernik, PhD student, University of Ljubljana: Lecture on Music Generation


Requirements

Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).


Grading

The final grade will consist of the following:
Participation (attendance, participation in discussions, reviews): 15%
Presentation (presentation of papers in class): 25%
Project (proposal, final report): 60%




Syllabus

The first class will present a short overview of neural network architectures; the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision.
The course will be interactive: we will add interesting topics on demand and the latest research buzz.




Schedule (tentative)



Jan 12
  Topic: Admin & Introduction(s)
  Speaker: Sanja Fidler
  Slides: admin

Convolutional Neural Networks

Jan 19
  Topic: Convolutional Neural Nets (tutorial)
  Reading / Material:
    Resources: Stanford's cs231n class, VGG's Practical CNN Tutorial
    Code: CNN Tutorial for TensorFlow, Tutorial for caffe, CNN Tutorial for Theano
  Speaker: Yukun Zhu (invited)
  Slides: [pdf] [code]

  Topic: Image Segmentation
  Reading / Material:
    Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [PDF] [code]
      L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille
  Speaker: Shenlong Wang
  Slides: [pdf] [code]

Jan 26
  Topic: Very Deep Networks
  Reading / Material:
    Highway Networks [PDF] [code]
      Rupesh Kumar Srivastava, Klaus Greff, Jurgen Schmidhuber
    Deep Residual Learning for Image Recognition [PDF]
      Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  Speaker: Renjie Liao (invited)
  Slides: [pdf]

  Topic: Object Detection
  Reading / Material:
    Rich feature hierarchies for accurate object detection and semantic segmentation [PDF] [code]
      Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [PDF] [code (Matlab)] [code (Python)]
      Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
  Speaker: Kaustav Kundu
  Slides: [pdf]

Feb 2
  Topic: Stereo, Siamese Networks
  Reading / Material:
    Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches [PDF] [code]
      Jure Žbontar, Yann LeCun
    Learning to Compare Image Patches via Convolutional Neural Networks [PDF] [code]
      Sergey Zagoruyko, Nikos Komodakis
  Speaker: Wenjie Luo
  Slides: [pdf]

  Topic: Depth from Single Image
  Reading / Material:
    Designing Deep Networks for Surface Normal Estimation [PDF]
      Xiaolong Wang, David Fouhey, Abhinav Gupta
  Speaker: Mian Wei
  Slides: [pptx] [pdf]

Feb 9
  Topic: Image Generation
  Reading / Material:
    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [PDF]
      Alec Radford, Luke Metz, Soumith Chintala
    Generating Images from Captions with Attention [PDF]
      Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov
  Speaker: Elman Mansimov (invited)
  Slides: [pdf]

  Topic: Domain Adaptation, Zero-shot Learning
  Reading / Material:
    Simultaneous Deep Transfer Across Domains and Tasks [PDF]
      Eric Tzeng, Judy Hoffman, Trevor Darrell
    Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions [PDF]
      Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov
  Speaker: Lluis Castrejon
  Slides: [pdf]

Recurrent Neural Networks

Feb 23
  Topic: RNNs and Neural Language Models
  Reading / Material:
    Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [PDF] [code]
      Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
    Skip-Thought Vectors [PDF] [code]
      Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
  Speaker: Jamie Kiros (invited)

Mar 1
  Topic: Modeling Words
  Reading / Material:
    Efficient Estimation of Word Representations in Vector Space [PDF] [code]
      Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
  Speaker: Eleni Triantafillou
  Slides: [pdf]

  Topic: Describing Videos
  Reading / Material:
    Sequence to Sequence -- Video to Text [PDF]
      Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko
  Speaker: Erin Grant
  Slides: [pdf]

  Topic: Image-based QA
  Reading / Material:
    Ask Your Neurons: A Neural-based Approach to Answering Questions about Images [PDF]
      Mateusz Malinowski, Marcus Rohrbach, Mario Fritz
  Speaker: Yunpeng Li
  Slides: [pdf]

Mar 8
  Topic: Variational Autoencoders
  Reading / Material:
    Auto-Encoding Variational Bayes [PDF]
      Diederik P Kingma, Max Welling
    Tutorial: Bayesian Reasoning and Deep Learning [PDF]
      Shakir Mohamed
  Speaker: Yura Burda (invited)
  Slides: [pdf]

  Topic: Text-based QA
  Reading / Material:
    End-To-End Memory Networks [PDF]
      Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
  Speaker: Marina Samuel
  Slides: [pdf]

  Topic: Neural Reasoning
  Reading / Material:
    Recursive Neural Networks Can Learn Logical Semantics [PDF]
      Samuel R. Bowman, Christopher Potts, Christopher D. Manning
  Speaker: Rodrigo Toro Icarte
  Slides: [pdf]

Mar 15
  Topic: Neural Programming
  Reading / Material:
    Neural GPUs Learn Algorithms [PDF]
      Lukasz Kaiser, Ilya Sutskever
    Neural Programmer-Interpreters [PDF]
      Scott Reed, Nando de Freitas
    Neural Programmer: Inducing Latent Programs with Gradient Descent [PDF]
      Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
  Speaker: Jimmy Ba (invited)

  Topic: Conversation Models
  Reading / Material:
    A Neural Conversational Model [PDF]
      Oriol Vinyals, Quoc Le
  Speaker: Caner Berkay Antmen
  Slides: [pdf]

  Topic: Sentiment Analysis
  Reading / Material:
    Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank [PDF]
      Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
  Speaker: Zhicong Lu
  Slides: [pdf]

Mar 22
  Topic: Video Representations
  Reading / Material:
    Unsupervised Learning of Video Representations using LSTMs [PDF]
      Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov
  Speaker: Kamyar Ghasemipour
  Slides: [pdf]

  Topic: CNN Visualization
  Reading / Material:
    Explaining and Harnessing Adversarial Examples [PDF]
      Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
  Speaker: Neill Patterson
  Slides: [pdf]

Mar 29
  Topic: Direction Following (Robotics)
  Reading / Material:
    Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences [PDF]
      Hongyuan Mei, Mohit Bansal, Matthew R. Walter
  Speaker: Alan Yusheng Wu
  Slides: [pdf]

  Topic: Visual Attention
  Reading / Material:
    Recurrent Models of Visual Attention [PDF]
      Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
  Speaker: Matthew Shepherd
  Slides: [pdf]

  Topic: Music
  Reading / Material:
    A First Look at Music Composition using LSTM Recurrent Neural Networks [PDF]
      Douglas Eck, Jurgen Schmidhuber
    Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network [PDF]
      Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley
  Speaker: Charu Jaiswal
  Slides: [pdf]

  Topic: Music generation
  Reading / Material:
    Overview of music generation
  Speaker: Urban Jezernik (invited)

  Topic: Pose and Attributes
  Reading / Material:
    PANDA: Pose Aligned Networks for Deep Attribute Modeling [PDF]
      Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev
  Speaker: Sidharth Sahdev
  Slides: [pptx]

  Topic: Image Style
  Reading / Material:
    A Neural Algorithm of Artistic Style [PDF] [code]
      Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
  Speaker: Nancy Iskander
  Slides: [pdf]

Apr 5
  Topic: Human gaze
  Reading / Material:
    Where Are They Looking? [PDF]
      Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
  Speaker: Abraham Escalante
  Slides: [pdf]

  Topic: Instance Segmentation
  Reading / Material:
    Monocular Object Instance Segmentation and Depth Ordering with CNNs [PDF]
      Ziyu Zhang, Alex Schwing, Sanja Fidler, Raquel Urtasun
    Instance-Level Segmentation with Deep Densely Connected MRFs [PDF]
      Ziyu Zhang, Sanja Fidler, Raquel Urtasun
  Speaker: Min Bai
  Slides: [pdf]

  Topic: Scene Understanding
  Reading / Material:
    Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [PDF]
      S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey E. Hinton
  Speaker: Namdar Homayounfar
  Slides: [pdf]

  Topic: Reinforcement Learning
  Reading / Material:
    Playing Atari with Deep Reinforcement Learning [PDF]
      Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  Speaker: Jonathan Chung
  Slides: [pdf]

  Topic: Medical Imaging
  Reading / Material:
    Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning [PDF]
      Oren Z. Kraus, Lei Jimmy Ba, Brendan Frey
  Speaker: Alex Lu
  Slides: [pptx]

  Topic: Humor
  Reading / Material:
    We Are Humor Beings: Understanding and Predicting Visual Humor [PDF]
      Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
  Speaker: Shuai Wang
  Slides: [pdf]


Resources


Tutorials, related courses:

  Introduction to Neural Networks, CSC321 course at University of Toronto
  Course on Convolutional Neural Networks, CS231n course at Stanford University
  Course on Probabilistic Graphical Models, CSC412 course at University of Toronto, advanced machine learning course


Software:

  Caffe: Deep learning for image classification
  TensorFlow: Open Source Software Library for Machine Intelligence (good software for deep learning)
  Theano: Deep learning library
  mxnet: Deep Learning library
  Torch: Scientific computing framework with wide support for machine learning algorithms
  LIBSVM: A Library for Support Vector Machines (Matlab, Python)
  scikit: Machine learning in Python


Popular datasets:

  ImageNet: Large-scale object dataset
  Microsoft COCO: Large-scale image recognition, segmentation, and captioning dataset
  MNIST: handwritten digits
  PASCAL VOC: Object recognition dataset
  KITTI: Autonomous driving dataset
  NYUv2: Indoor RGB-D dataset
  LSUN: Large-scale Scene Understanding challenge
  VQA: Visual question answering dataset
  Madlibs: Visual Madlibs (question answering)
  Flickr30K: Image captioning dataset
  Flickr30K Entities: Flickr30K with phrase-to-region correspondences
  MovieDescription: a dataset for automatic description of movie clips
  Action datasets: a list of action recognition datasets
  MPI Sintel Dataset: optical flow dataset
  BookCorpus: a corpus of 11,000 books


Online demos:

  Lots of cool Toronto Deep Learning Demos: image classification and captioning demos
  Lots of cool demos for ConvNets by Andrej Karpathy
  Reinforcement Learning with Neural Nets (read paper for more info)
  Places: scene classification with neural nets
  CRF as RNN: Semantic Image Segmentation
  drawNet: visualization of ConvNet activations
  Visualization of ConvNets for digit classification
  AI-painter: modify your photo in a certain style (e.g., Van Gogh); uses neural nets as explained in this paper


Main conferences:

  NIPS (Neural Information Processing Systems)
  ICML (International Conference on Machine Learning)
  ICLR (International Conference on Learning Representations)
  AISTATS (International Conference on Artificial Intelligence and Statistics)
  CVPR (IEEE Conference on Computer Vision and Pattern Recognition)
  ICCV (International Conference on Computer Vision)
  ECCV (European Conference on Computer Vision)
  ACL (Association for Computational Linguistics)
  EMNLP (Conference on Empirical Methods in Natural Language Processing)