Introduction to Deep Learning
Zaikun Xu
USI, Master of Informatics
HPC Advisory Council Switzerland Conference 2016
Overview
•  Introduction to deep learning
•  Big data + Big computational power (GPUs)
•  Latest updates of deep learning
•  Examples
•  Feature extraction
Machine Learning!
source : http://www.cs.utexas.edu/~eladlieb/RLRG.html!
Reinforcement learning
Introduction to deep learning!
4!
Inspiration from the Human Brain
•  Over 100 billion neurons
•  Around 1,000 connections per neuron
•  Electrical signals transmitted through the axon can be inhibitory or excitatory
•  Activation of a neuron depends on its inputs
Artificial Neuron
•  Circle: neuron
•  Arrow: synapse
•  Weight: strength of the electrical signal
source: http://cs231n.stanford.edu/
Nonlinearity
•  A neural network learns complicated structure in data through non-linear activation functions (a minimal sketch follows)
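A minimal sketch (my illustration, not from the slides) of a single artificial neuron in NumPy: the weights play the role of synapse strengths and a sigmoid provides the nonlinearity.

import numpy as np

def sigmoid(z):
    # nonlinearity: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # weighted sum of the inputs (one weight per synapse), then activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # incoming signals
w = np.array([0.7, 0.1, -0.4])   # synaptic weights
print(neuron(x, w, b=0.2))       # the neuron's activation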
Perceptron
•  A perceptron with linear activation
•  In 1969 it was shown (Minsky and Papert) that it cannot learn the XOR function and takes very long to train; a toy demonstration follows
Winter of AI
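A toy demonstration, assuming the classic perceptron learning rule: trained on the XOR truth table, the predictions never match all four targets, because no linear decision boundary separates the classes.

import numpy as np

# XOR truth table: no single line w.x + b = 0 separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(100):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += lr * (yi - pred) * xi   # perceptron learning rule
        b += lr * (yi - pred)

preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(preds, "vs targets", list(y))   # never all correct: XOR is not linearly separable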
Back-Propagation
•  1970: Hinton earns a degree in psychology and goes on studying neural networks at the University of Edinburgh
•  1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams publish the Nature paper "Learning Representations by Back-propagating Errors"
•  Hidden layers added
•  Computation scales linearly with the number of neurons
•  And of course, computers by then were much faster
Convolutional NN
•  Yann LeCun, born in Paris, post-doc with Hinton at the University of Toronto
•  LeNet for handwritten-digit recognition at Bell Labs: 5% error rate
•  Attack skill: convolution, pooling
Local connectivity
Weight sharing!
Convolution
•  The resulting feature maps are still large, which motivates pooling
Pooling
source: http://vaaaaaanquish.hatenablog.com/entry/2015/01/26/060622
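A rough NumPy sketch (mine, not from the deck) of the two "attack skills" behind the slides above: one small kernel shared across the whole image (local connectivity + weight sharing), followed by max pooling to shrink the feature map. Loop-based for clarity, not speed.

import numpy as np

def conv2d(image, kernel):
    # valid convolution: each output unit sees only a local patch,
    # and the SAME kernel weights are reused at every position
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # keep only the max in each size x size block: the feature map shrinks
    H, W = fmap.shape
    return (fmap[:H - H % size, :W - W % size]
            .reshape(H // size, size, W // size, size)
            .max(axis=(1, 3)))

image = np.random.rand(8, 8)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # a simple vertical-edge detector
fmap = conv2d(image, kernel)                # 6x6 feature map
print(max_pool(fmap).shape)                 # (3, 3) after 2x2 pooling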
Support Vector Machine
•  Vladimir Vapnik, born in the Soviet Union, a colleague of Yann LeCun, invented the SVM in 1963
•  Attack skill: max margin + kernel mapping
Another winter looming
•  SVM: good at "capacity control", easy to use and to reproduce
•  NN: good at modelling complicated structure
•  The SVM achieved a 0.8% error rate in 1998 and 0.56% in 2002 on the same handwritten-digit recognition task
•  NN: stops at local optima, is hard and slow to train, and tends to overfit
Recurrent Neural Nets
source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
•  Models sequential data
•  Suffers from vanishing gradients (see the sketch below)
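A toy sketch of the recurrence and of why gradients vanish: the gradient of the last hidden state with respect to the first is a product of per-step Jacobians, and its norm shrinks as the sequence gets longer. The 0.1 weight scale is an arbitrary choice for illustration.

import numpy as np

def rnn_step(h_prev, x, W, U):
    # the same weights are reused at every time step (recursion over the sequence)
    return np.tanh(W @ h_prev + U @ x)

rng = np.random.default_rng(0)
H, D, T = 16, 8, 50
W = 0.1 * rng.standard_normal((H, H))
U = 0.1 * rng.standard_normal((H, D))

h = np.zeros(H)
jac = np.eye(H)                           # d h_T / d h_0, accumulated step by step
for t in range(T):
    h = rnn_step(h, rng.standard_normal(D), W, U)
    jac = np.diag(1 - h ** 2) @ W @ jac   # one chain-rule step through tanh and W
print(np.linalg.norm(jac))                # tiny: the gradient has vanished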
LSTM
•  Handwriting-generation demo: http://www.cs.toronto.edu/~graves/handwriting.html
10 years of funding
•  Hinton was happy and rebranded his neural networks as "deep learning". The story goes on…
Other Pioneers
Data preparation
•  Input: can be images, video, text, sound, …
•  Output: can be a translation into another language, the location of an object, a caption describing an image, …
[Diagram: training data / validation data / test data → model]
Training
•  Forward pass: propagate information from the input to the output
Training
•  Backward pass: propagate errors back and update the weights accordingly (a minimal sketch follows)
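A minimal sketch of both passes on a tiny two-layer network, assuming a squared-error loss: the forward pass turns the input into an output, the backward pass sends the error back through the layers and updates the weights.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(3)             # one training example
y = np.array([1.0])                    # its target
W1 = 0.5 * rng.standard_normal((4, 3))
W2 = 0.5 * rng.standard_normal((1, 4))
lr = 0.1

for step in range(100):
    # forward pass: input -> hidden -> output
    h = np.tanh(W1 @ x)
    y_hat = W2 @ h
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # backward pass: propagate the error with the chain rule
    d_out = y_hat - y                  # dLoss / dy_hat
    dW2 = np.outer(d_out, h)
    d_h = W2.T @ d_out
    dW1 = np.outer(d_h * (1 - h ** 2), x)

    # update the weights against their gradients
    W2 -= lr * dW2
    W1 -= lr * dW1

print(loss)                            # has shrunk toward 0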
Parameter update
•  Gradient descent (SGD, mini-batch)
•  x = x - lr * dx
[Animation by Alec Radford; source: http://cs231n.stanford.edu/]
Parameter update
•  Gradient descent: x = x - lr * dx
[Animation by Alec Radford]
•  Second-order methods? Quasi-Newton methods
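The update rule from the slide on a toy one-dimensional objective, just to make it concrete:

# minimize f(x) = (x - 3)^2, whose gradient is dx = 2 * (x - 3)
x = 0.0
lr = 0.1
for _ in range(100):
    dx = 2 * (x - 3)   # gradient at the current x
    x = x - lr * dx    # x = x - lr * dx, exactly as on the slide
print(x)               # converges to the minimum at x = 3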
Problems
•  While neural networks can approximate any function from input X to output y, they are very hard to train with multiple layers
•  Initialization
•  Pre-training
•  More data / dropout to avoid overfitting
•  Accelerate with GPUs
•  Support training in parallel
Big data + Big computational power
ImageNet
•  >1M well-labeled images in 1000 classes
source: https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/
GPUs
source: http://techgage.com/article/gtc-2015-in-depth-recap-deep-learning-quadro-m6000-autonomous-driving-more/
Parallelization
•  Data parallelism: same weights but different data; gradients need to be synchronized (see the sketch below)
•  Does not scale well with the size of the cluster
•  Works well on smaller clusters and is easy to implement
source: NVIDIA
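A single-process simulation of the idea, assuming a linear least-squares model: every worker holds the same weights, computes a gradient on its own shard of the data, and the averaged gradient (the synchronization / allreduce step) gives all workers the identical update.

import numpy as np

def worker_gradient(w, X_shard, y_shard):
    # same weights w on every worker, but a different data shard
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)   # gradient of mean squared error

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 5)), rng.standard_normal(64)
w, lr, n_workers = np.zeros(5), 0.1, 4

for step in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]
    g = np.mean(grads, axis=0)   # synchronization: average all workers' gradients
    w -= lr * g                  # every worker applies the identical update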
Parallelization
•  Model parallelism: same data, but different parts of the weights
•  The parameters of a big neural net cannot fit into the memory of a single GPU
[Figure from Yi Wang]
Model+Data Parallelization
•  Krizhevsky et al. 2014: data parallelism for the computation-heavy convolutional layers, model parallelism for the weight-heavy fully connected layers
•  Trained on 30M moves + computation + a clear reward scheme
Feature extraction
•  Task: a classifier to recognize handwritten digits
source: https://indico.io/blog/visualizing-with-t-sne/
Hierarchical feature extraction
source: http://www.rsipvision.com/exploring-deep-learning/
Transfer learning
•  Use features learned on ImageNet to train your own classifier on other tasks
•  9 categories + 200,000 images
•  Extract features from the last 2 layers + a linear SVM (sketched below)
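A hedged sketch of this recipe: extract_features below is a stand-in for a real ImageNet-pretrained network (here just a fixed random projection, purely so the snippet runs), and a linear SVM from scikit-learn is trained on top of the frozen features. All data is synthetic.

import numpy as np
from sklearn.svm import LinearSVC

def extract_features(images):
    # placeholder for running a pretrained CNN and reading out its last layers
    rng = np.random.default_rng(0)          # fixed projection = "frozen" features
    W = rng.standard_normal((images.shape[1], 256))
    return np.maximum(images @ W, 0.0)      # ReLU-like feature vector

rng = np.random.default_rng(1)
X_train = rng.standard_normal((100, 784))   # synthetic stand-ins for images
y_train = rng.integers(0, 9, 100)           # 9 categories, as on the slide
X_test = rng.standard_normal((10, 784))

clf = LinearSVC()                           # the linear SVM on top
clf.fit(extract_features(X_train), y_train)
print(clf.predict(extract_features(X_test)))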
Word embedding
•  Embed words into a vector space such that similar words are close to each other
•  Calculate cosine similarity: Paris - France + Italy = Rome (see the sketch below)
source: http://anthonygarvan.github.io/wordgalaxy/
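A toy illustration with made-up 3-d vectors (real embeddings come from models such as word2vec): the analogy is answered by taking the word whose vector has the highest cosine similarity to Paris - France + Italy.

import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

emb = {   # invented vectors, arranged so the analogy works
    "paris":  np.array([0.9, 0.1, 0.8]),
    "france": np.array([0.9, 0.1, 0.1]),
    "italy":  np.array([0.1, 0.9, 0.1]),
    "rome":   np.array([0.1, 0.9, 0.8]),
}

query = emb["paris"] - emb["france"] + emb["italy"]
best = max(emb, key=lambda word: cosine(emb[word], query))
print(best)   # "rome": Paris - France + Italy lands next to Rome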
Examples
Go
•  3^361 possible board configurations, roughly 10^170 of them legal; brute force does not work! (a quick sanity check follows)
•  Monte Carlo Tree Search: a heuristic search method
•  Still slow, since the search tree is so big
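A quick sanity check on the count: each of the 361 points is empty, black, or white, so 3^361 upper-bounds the number of positions.

import math

# 3**361 board configurations; the number of LEGAL positions is a bit below this
print(361 * math.log10(3))   # ~172.2, i.e. 3**361 ~ 10**172; legal positions ~ 10**170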
AlphaGo
•  Offline learning: from human Go players, learn a deep-learning-based policy network and a rollout policy
•  Use self-play results to train a value network, which evaluates the current board position
Online Playing
•  When playing against Lee Sedol, the policy network proposes possible moves (about 10 to 20) and the value network evaluates each resulting position
•  MCTS updates the weight of each possible move and selects one
•  The engineering effort on CPU+GPU systems with data and model parallelism matters
Image recognition
•  Trained in a supervised fashion
Caption generation
Neural MT
•  The main idea behind RNNs is to compress a sequence of input symbols into a fixed-dimensional vector by using recursion (see the sketch below)
source: https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
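A minimal sketch of that idea: applying the same recurrence at every position compresses a variable-length token sequence into one fixed-dimensional vector. All weights and embeddings below are random stand-ins.

import numpy as np

def encode(token_ids, emb, W, U):
    # recursion: fold the whole sequence into a single hidden state
    h = np.zeros(W.shape[0])
    for t in token_ids:
        h = np.tanh(W @ h + U @ emb[t])
    return h   # fixed-dimensional summary, whatever the sequence length

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
emb = rng.standard_normal((len(vocab), 8))     # toy word embeddings
W = 0.1 * rng.standard_normal((16, 16))
U = 0.1 * rng.standard_normal((16, 8))

v = encode([vocab[w] for w in ["the", "cat", "sat"]], emb, W, U)
print(v.shape)   # (16,) regardless of how long the sentence is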
Neural arts
Updates of deep learning
Go deeper
•  Deep Residual Learning for Image Recognition, MSRA, 2015
Go multi-modal
Go end-to-end
source: https://devblogs.nvidia.com/parallelforall/deep-speech-accurate-speech-recognition-gpu-accelerated-deep-learning/
Go with attention
Go with Specialized Chips
•  Prototypes
Go where?
•  Video understanding
•  Deep reinforcement learning
•  Natural language understanding
•  Unsupervised learning!
Action classification
Summary
•  "The takeaway is that deep learning excels in tasks where the basic unit, a single pixel, a single frequency, or a single word/character, has little meaning in and of itself, but a combination of such units has a useful meaning." (Dallin Akagi)
•  Structured data + a well-defined cost function / rewards
•  Learning in an end-to-end fashion is a nice trend
Thoughts
•  Deep learning will change our lives in a positive way
•  We learn fast either by competing with friends like AlphaGo or by having them teach us directly in an effective way
•  What might happen in the future?
•  Still a long way to go towards real AI
List of materials
•  https://github.com/ChristosChristofidis/awesome-deep-learning
Acknowledgement
•  Tim Dettmers
•  Lyu Xiyu