Introduction to Deep Learning
Zaikun Xu
USI, Master of Informatics
HPC Advisory Council Switzerland Conference 2016
Overview
•  Introduction to deep learning
•  Big data + Big computational power (GPUs)
•  Latest updates of deep learning
•  Examples
•  Feature extraction
Machine Learning!
source : http://www.cs.utexas.edu/~eladlieb/RLRG.html!
Reinforcement learning
Introduction to deep learning!
4!
Inspiration from the Human Brain
•  Over 100 billion neurons
•  Around 1,000 connections per neuron
•  Electrical signals transmitted through the axon can be inhibitory or excitatory
•  Activation of a neuron depends on its inputs
Artificial Neuron
•  Circle: neuron
•  Arrow: synapse
•  Weight: strength of the electrical signal
source: http://cs231n.stanford.edu/
Nonlinearity
•  A neural network learns complicated structure in data through non-linear activation functions (a minimal sketch follows)
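A minimal sketch (my illustration, not from the slides) of a single artificial neuron in NumPy: the weights play the role of synapse strengths and a sigmoid provides the nonlinearity.

import numpy as np

def sigmoid(z):
    # nonlinearity: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # weighted sum of the inputs (one weight per synapse), then activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # incoming signals
w = np.array([0.7, 0.1, -0.4])   # synaptic weights
print(neuron(x, w, b=0.2))       # the neuron's activation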
Perceptron
•  A perceptron with linear activation
•  In 1969 it was shown (Minsky and Papert) that it cannot learn the XOR function and takes very long to train; a toy demonstration follows
Winter of AI
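A toy demonstration, assuming the classic perceptron learning rule: trained on the XOR truth table, the predictions never match all four targets, because no linear decision boundary separates the classes.

import numpy as np

# XOR truth table: no single line w.x + b = 0 separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(100):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += lr * (yi - pred) * xi   # perceptron learning rule
        b += lr * (yi - pred)

preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds, "vs targets", list(y))   # never all correct: XOR is not linearly separable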
Back-Propagation
•  1970: Hinton earns a degree in psychology and goes on studying neural networks at the University of Edinburgh
•  1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams publish the Nature paper "Learning Representations by Back-propagating Errors"
•  Hidden layers added
•  Computation scales linearly with the number of neurons
•  And of course, computers by then were much faster
Convolutional NN
•  Yann LeCun, born in Paris, post-doc with Hinton at the University of Toronto
•  LeNet for handwritten-digit recognition at Bell Labs: 5% error rate
•  Attack skill: convolution, pooling
Local connectivity
Weight sharing!
Convolution
•  The resulting feature maps are still large, which motivates pooling
Pooling
source: http://vaaaaaanquish.hatenablog.com/entry/2015/01/26/060622
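A rough NumPy sketch (mine, not from the deck) of the two "attack skills" behind the slides above: one small kernel shared across the whole image (local connectivity + weight sharing), followed by max pooling to shrink the feature map. Loop-based for clarity, not speed.

import numpy as np

def conv2d(image, kernel):
    # valid convolution: each output unit sees only a local patch,
    # and the SAME kernel weights are reused at every position
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # keep only the max in each size x size block: the feature map shrinks
    H, W = fmap.shape
    return (fmap[:H - H % size, :W - W % size]
            .reshape(H // size, size, W // size, size)
            .max(axis=(1, 3)))

image = np.random.rand(8, 8)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # a simple vertical-edge detector
fmap = conv2d(image, kernel)                # 6x6 feature map
print(max_pool(fmap).shape)                 # (3, 3) after 2x2 pooling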
Support Vector Machine
•  Vladimir Vapnik, born in the Soviet Union, a colleague of Yann LeCun, invented the SVM in 1963
•  Attack skill: max margin + kernel mapping
Another winter looming
•  SVM: good at "capacity control", easy to use and to reproduce
•  NN: good at modelling complicated structure
•  The SVM achieved a 0.8% error rate in 1998 and 0.56% in 2002 on the same handwritten-digit recognition task
•  NN: stops at local optima, is hard and slow to train, and tends to overfit
Recurrent Neural Nets
source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
•  Models sequential data
•  Suffers from vanishing gradients (see the sketch below)
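A toy sketch of the recurrence and of why gradients vanish: the gradient of the last hidden state with respect to the first is a product of per-step Jacobians, and its norm shrinks as the sequence gets longer. The 0.1 weight scale is an arbitrary choice for illustration.

import numpy as np

def rnn_step(h_prev, x, W, U):
    # the same weights are reused at every time step (recursion over the sequence)
    return np.tanh(W @ h_prev + U @ x)

rng = np.random.default_rng(0)
H, D, T = 16, 8, 50
W = 0.1 * rng.standard_normal((H, H))
U = 0.1 * rng.standard_normal((H, D))

h = np.zeros(H)
jac = np.eye(H)                           # d h_T / d h_0, accumulated step by step
for t in range(T):
    h = rnn_step(h, rng.standard_normal(D), W, U)
    jac = np.diag(1 - h ** 2) @ W @ jac   # one chain-rule step through tanh and W
print(np.linalg.norm(jac))                # tiny: the gradient has vanished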
LSTM
•  Handwriting-generation demo: http://www.cs.toronto.edu/~graves/handwriting.html
10 years of funding
•  Hinton was happy and rebranded his neural networks as "deep learning". The story goes on…
Other Pioneers
Data preparation
•  Input: can be images, video, text, sound, …
•  Output: can be a translation into another language, the location of an object, a caption describing an image, …
[Diagram: training data / validation data / test data → model]
Training
•  Forward pass: propagate information from the input to the output
Training
•  Backward pass: propagate errors back and update the weights accordingly (a minimal sketch follows)
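A minimal sketch of both passes on a tiny two-layer network, assuming a squared-error loss: the forward pass turns the input into an output, the backward pass sends the error back through the layers and updates the weights.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(3)             # one training example
y = np.array([1.0])                    # its target
W1 = 0.5 * rng.standard_normal((4, 3))
W2 = 0.5 * rng.standard_normal((1, 4))
lr = 0.1

for step in range(100):
    # forward pass: input -> hidden -> output
    h = np.tanh(W1 @ x)
    y_hat = W2 @ h
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # backward pass: propagate the error with the chain rule
    d_out = y_hat - y                  # dLoss / dy_hat
    dW2 = np.outer(d_out, h)
    d_h = W2.T @ d_out
    dW1 = np.outer(d_h * (1 - h ** 2), x)

    # update the weights against their gradients
    W2 -= lr * dW2
    W1 -= lr * dW1

print(loss)                            # has shrunk toward 0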
Parameter update
•  Gradient descent (SGD, mini-batch)
•  x = x - lr * dx
[Animation by Alec Radford; source: http://cs231n.stanford.edu/]
Parameter update
•  Gradient descent: x = x - lr * dx
[Animation by Alec Radford]
•  Second-order methods? Quasi-Newton methods
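The update rule from the slide on a toy one-dimensional objective, just to make it concrete:

# minimize f(x) = (x - 3)^2, whose gradient is dx = 2 * (x - 3)
x = 0.0
lr = 0.1
for _ in range(100):
    dx = 2 * (x - 3)   # gradient at the current x
    x = x - lr * dx    # x = x - lr * dx, exactly as on the slide
print(x)               # converges to the minimum at x = 3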
Problems
•  While neural networks can approximate any function from input X to output y, they are very hard to train with multiple layers
•  Initialization
•  Pre-training
•  More data / dropout to avoid overfitting
•  Accelerate with GPUs
•  Support training in parallel
Big data + Big computational power
ImageNet
•  >1M well-labeled images in 1000 classes
source: https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/
GPUs
source: http://techgage.com/article/gtc-2015-in-depth-recap-deep-learning-quadro-m6000-autonomous-driving-more/
Parallelization
•  Data parallelism: same weights but different data; gradients need to be synchronized (see the sketch below)
•  Does not scale well with the size of the cluster
•  Works well on smaller clusters and is easy to implement
source: NVIDIA
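A single-process simulation of the idea, assuming a linear least-squares model: every worker holds the same weights, computes a gradient on its own shard of the data, and the averaged gradient (the synchronization / allreduce step) gives all workers the identical update.

import numpy as np

def worker_gradient(w, X_shard, y_shard):
    # same weights w on every worker, but a different data shard
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)   # gradient of mean squared error

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 5)), rng.standard_normal(64)
w, lr, n_workers = np.zeros(5), 0.1, 4

for step in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]
    g = np.mean(grads, axis=0)   # synchronization: average all workers' gradients
    w -= lr * g                  # every worker applies the identical update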
Parallelization
•  Model parallelism: same data, but different parts of the weights
•  The parameters of a big neural net cannot fit into the memory of a single GPU
[Figure from Yi Wang]
Model+Data Parallelization
•  Krizhevsky et al. 2014: data parallelism for the computation-heavy convolutional layers, model parallelism for the weight-heavy fully connected layers
•  Trained on 30M moves + computation + a clear reward scheme
Feature extraction
•  Task: a classifier to recognize handwritten digits
source: https://indico.io/blog/visualizing-with-t-sne/
Hierarchical feature extraction
source: http://www.rsipvision.com/exploring-deep-learning/
Transfer learning
•  Use features learned on ImageNet to train your own classifier on other tasks
•  9 categories + 200,000 images
•  Extract features from the last 2 layers + a linear SVM (sketched below)
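A hedged sketch of this recipe: extract_features below is a stand-in for a real ImageNet-pretrained network (here just a fixed random projection, purely so the snippet runs), and a linear SVM from scikit-learn is trained on top of the frozen features. All data is synthetic.

import numpy as np
from sklearn.svm import LinearSVC

def extract_features(images):
    # placeholder for running a pretrained CNN and reading out its last layers
    rng = np.random.default_rng(0)          # fixed projection = "frozen" features
    W = rng.standard_normal((images.shape[1], 256))
    return np.maximum(images @ W, 0.0)      # ReLU-like feature vector

rng = np.random.default_rng(1)
X_train = rng.standard_normal((100, 784))   # synthetic stand-ins for images
y_train = rng.integers(0, 9, 100)           # 9 categories, as on the slide
X_test = rng.standard_normal((10, 784))

clf = LinearSVC()                           # the linear SVM on top
clf.fit(extract_features(X_train), y_train)
print(clf.predict(extract_features(X_test)))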
Word embedding
•  Embed words into a vector space such that similar words are close to each other
•  Calculate cosine similarity: Paris - France + Italy = Rome (see the sketch below)
source: http://anthonygarvan.github.io/wordgalaxy/
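A toy illustration with made-up 3-d vectors (real embeddings come from models such as word2vec): the analogy is answered by taking the word whose vector has the highest cosine similarity to Paris - France + Italy.

import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

emb = {   # invented vectors, arranged so the analogy works
    "paris":  np.array([0.9, 0.1, 0.8]),
    "france": np.array([0.9, 0.1, 0.1]),
    "italy":  np.array([0.1, 0.9, 0.1]),
    "rome":   np.array([0.1, 0.9, 0.8]),
}

query = emb["paris"] - emb["france"] + emb["italy"]
best = max(emb, key=lambda word: cosine(emb[word], query))
print(best)   # "rome": Paris - France + Italy lands next to Rome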
Examples
Go
•  3^361 possible board configurations, roughly 10^170 of them legal; brute force does not work! (a quick sanity check follows)
•  Monte Carlo Tree Search: a heuristic search method
•  Still slow, since the search tree is so big
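A quick sanity check on the count: each of the 361 points is empty, black, or white, so 3^361 upper-bounds the number of positions.

import math

# 3**361 board configurations; the number of LEGAL positions is a bit below this
print(361 * math.log10(3))   # ~172.2, i.e. 3**361 ~ 10**172; legal positions ~ 10**170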
AlphaGo
•  Offline learning: from human Go players, learn a deep-learning-based policy network and a rollout policy
•  Use self-play results to train a value network, which evaluates the current board position
Online Playing
•  When playing against Lee Sedol, the policy network proposes possible moves (about 10 to 20) and the value network evaluates each resulting position
•  MCTS updates the weight of each possible move and selects one
•  The engineering effort on CPU+GPU systems with data and model parallelism matters
Image recognition
•  Trained in a supervised fashion
Caption generation
Neural MT
•  The main idea behind RNNs is to compress a sequence of input symbols into a fixed-dimensional vector by using recursion (see the sketch below)
source: https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
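A minimal sketch of that idea: applying the same recurrence at every position compresses a variable-length token sequence into one fixed-dimensional vector. All weights and embeddings below are random stand-ins.

import numpy as np

def encode(token_ids, emb, W, U):
    # recursion: fold the whole sequence into a single hidden state
    h = np.zeros(W.shape[0])
    for t in token_ids:
        h = np.tanh(W @ h + U @ emb[t])
    return h   # fixed-dimensional summary, whatever the sequence length

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
emb = rng.standard_normal((len(vocab), 8))     # toy word embeddings
W = 0.1 * rng.standard_normal((16, 16))
U = 0.1 * rng.standard_normal((16, 8))

v = encode([vocab[w] for w in ["the", "cat", "sat"]], emb, W, U)
print(v.shape)   # (16,) regardless of how long the sentence is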
Neural arts
Updates of deep learning
Go deeper
•  Deep Residual Learning for Image Recognition, MSRA, 2015
Go multi-modal
Go end-to-end
source: https://devblogs.nvidia.com/parallelforall/deep-speech-accurate-speech-recognition-gpu-accelerated-deep-learning/
Go with attention
Go with Specialized Chips
•  Prototypes
Go where?
•  Video understanding
•  Deep reinforcement learning
•  Natural language understanding
•  Unsupervised learning!
Action classification
Summary
•  "The takeaway is that deep learning excels in tasks where the basic unit, a single pixel, a single frequency, or a single word/character, has little meaning in and of itself, but a combination of such units has a useful meaning." (Dallin Akagi)
•  Structured data + a well-defined cost function / rewards
•  Learning in an end-to-end fashion is a nice trend
Thoughts
•  Deep learning will change our lives in a positive way
•  We learn fast either by competing with friends like AlphaGo or by having them teach us directly in an effective way
•  What might happen in the future?
•  Still a long way to go towards real AI
List of materials
•  https://github.com/ChristosChristofidis/awesome-deep-learning
Acknowledgement
•  Tim Dettmers
•  Lyu Xiyu