Deep Neural Network
RTSS JUN YOUNG PARK
Reference
◦ Machine Learning with R – Brett Lantz
◦ Course textbook for 'Modern Society and Big Data' (Spring 2017 semester)
◦ Referenced for the data preprocessing / sample analysis procedures
Number of Parameters
From the last presentation …
How many parameters are in this linear model?

[Figure: a 1024×768 test image x is fed through the linear model WX + b and softmax S(Y), producing the one-hot output [0, 1, 0, 0, 0] over 5 classes: "Dog!"]

$$\mathrm{Size}(W) + \mathrm{Size}(b) = \mathrm{Image\_size} \times \mathrm{Classes} + \mathrm{Classes} = 1024 \times 768 \times 5 + 5 = 3{,}932{,}165$$
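As a quick check, here is a short Python sketch of the same count, using the image size and class count from the slide:

```python
# Parameter count for a single-layer linear classifier: y = softmax(Wx + b).
image_size = 1024 * 768   # input dimension after flattening the image
classes = 5               # output dimension

size_w = image_size * classes   # one weight per (input, class) pair
size_b = classes                # one bias per class

print(size_w + size_b)          # 3932165
```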
Go Deep & Wide!

[Figure: X → W1 [784, 256] → W2 [256, 256] → W3 [256, 10] → Y; two 256-unit hidden layers sit between input and output.]

Hidden Layer
◦ Invisible from the input/output. (See the sketch below.)
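A minimal NumPy sketch of this deep-and-wide stack, assuming MNIST-style 784-dimensional inputs and 10 classes as the shapes above suggest (the weight values here are just random placeholders):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0)

# Layer shapes from the slide: [784, 256], [256, 256], [256, 10].
W1 = np.random.randn(784, 256); b1 = np.zeros(256)
W2 = np.random.randn(256, 256); b2 = np.zeros(256)
W3 = np.random.randn(256, 10);  b3 = np.zeros(10)

x = np.random.randn(1, 784)        # one flattened input image
h1 = relu(x @ W1 + b1)             # first hidden layer,  shape (1, 256)
h2 = relu(h1 @ W2 + b2)            # second hidden layer, shape (1, 256)
y = h2 @ W3 + b3                   # output logits,       shape (1, 10)
print(y.shape)                     # (1, 10)
```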
Rectified Linear Units
◦ Why not sigmoid?
◦ The sigmoid's gradient gets very close to 0 during back propagation, so the signal dies out in deep networks (the vanishing gradient problem).

$$R(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad \frac{\partial}{\partial x}\,R(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
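A short NumPy version of ReLU and its derivative, mirroring the definition above:

```python
import numpy as np

def relu(x):
    # R(x) = x for x >= 0, else 0
    return np.maximum(x, 0)

def relu_grad(x):
    # dR/dx = 1 for x >= 0, else 0
    return (x >= 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 1. 1. 1.]
```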
Also …
Weight Initialization
◦ DBN (Deep Belief Networks)
◦ Train an RBM between every two adjacent layers.
◦ After this initialization, we only need fine-tuning (ordinary training).
◦ Gaussian random numbers
◦ Xavier (2010)
◦ Divide Gaussian random numbers by the square root of the number of inputs.
◦ He (2015)
◦ Same as Xavier, but with the number of inputs divided by 2 inside the square root. (See the sketch below.)
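A minimal NumPy sketch of the three random initializers above, following the common simplified formulas (the RBM-based DBN procedure is omitted here):

```python
import numpy as np

fan_in, fan_out = 256, 256

# Plain Gaussian initialization.
w_gauss = np.random.randn(fan_in, fan_out)

# Xavier (2010): scale by the square root of the fan-in.
w_xavier = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

# He (2015): same idea, with the fan-in divided by 2 inside the sqrt.
w_he = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
```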
L2 Regularization
◦ Large weights can bend the model sharply around the training data (overfitting).
◦ To avoid large weights, we add the term below to the loss:

$$\mathcal{L} = \frac{1}{N} \sum_i D\big(S(W x_i + b),\, L_i\big) + \lambda \sum W^2$$

$0 \le \lambda \le 1$ : regularization strength
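In TensorFlow 1.x style (matching the session-based code later in the deck), the penalty is one extra term on the loss. This is a sketch; the feature/class sizes and `l2_lambda` value are assumptions:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 30])   # 30 input features (assumed)
y = tf.placeholder(tf.float32, [None, 2])    # one-hot labels

W = tf.Variable(tf.random_normal([30, 2]))
b = tf.Variable(tf.zeros([2]))
logits = tf.matmul(x, W) + b

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

l2_lambda = 0.001  # regularization strength lambda (hypothetical value)
# tf.nn.l2_loss(W) computes sum(W**2) / 2
loss = cross_entropy + l2_lambda * tf.nn.l2_loss(W)
```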
Dropout
◦ Forces the network to learn a redundant representation.
◦ While training: apply dropout. While testing: no dropout. (See the sketch below.)
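In TF 1.x style this is usually handled with a `keep_prob` placeholder that is fed differently at train and test time (a sketch; the layer sizes and 0.7 keep rate are assumptions):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 30])
keep_prob = tf.placeholder(tf.float32)        # probability of keeping a unit

W = tf.Variable(tf.random_normal([30, 16]))
b = tf.Variable(tf.zeros([16]))
h = tf.nn.relu(tf.matmul(x, W) + b)
h_drop = tf.nn.dropout(h, keep_prob)          # randomly zeroes units

# While training: feed keep_prob=0.7 (apply dropout).
# While testing:  feed keep_prob=1.0 (no dropout).
```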
Chain Rule
$$y = g(f(x)) \;\Rightarrow\; y' = g'(f(x)) \cdot f'(x)$$

[Figure: forward pass x → F → G → y; the local derivatives F′(x) and G′(f(x)) are multiplied to produce y′.]

◦ To make back propagation easier, we use an operation graph like the one on the next slide.
Back Propagation
◦ Get derivatives using back propagation through the operation graph. For the loss signal $L$, each gate receives the upstream gradient $\frac{\partial L}{\partial z}$ and passes it on using its local derivatives:

Add gate: $z = x + y$, so $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} = 1$; the upstream gradient $\frac{\partial L}{\partial z}$ is passed unchanged to both $x$ and $y$.

Multiply gate: $z = xy$, so $\frac{\partial z}{\partial x} = y$ and $\frac{\partial z}{\partial y} = x$; the gradient sent to $x$ is $y \cdot \frac{\partial L}{\partial z}$ and the gradient sent to $y$ is $x \cdot \frac{\partial L}{\partial z}$.
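A tiny plain-Python sketch of those two gate rules, with a worked example:

```python
def add_forward(x, y):
    return x + y

def add_backward(dL_dz):
    # dz/dx = dz/dy = 1: pass the upstream gradient through unchanged.
    return dL_dz, dL_dz

def mul_forward(x, y):
    return x * y

def mul_backward(dL_dz, x, y):
    # dz/dx = y, dz/dy = x: scale the upstream gradient by the other input.
    return y * dL_dz, x * dL_dz

# Example: L = (a + b) * c, evaluated at a=1, b=2, c=4.
a, b, c = 1.0, 2.0, 4.0
s = add_forward(a, b)                    # s = 3
L = mul_forward(s, c)                    # L = 12
dL_ds, dL_dc = mul_backward(1.0, s, c)   # 4.0, 3.0
dL_da, dL_db = add_backward(dL_ds)       # 4.0, 4.0
print(dL_da, dL_db, dL_dc)               # 4.0 4.0 3.0
```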
Ensemble Learning
◦ Train several independent models and combine their predictions (e.g., by voting or averaging) to reduce the error of any single model.
Practical Use
◦ Breast cancer diagnosis using a deep neural network
◦ The example is from the book 'Machine Learning with R'.
◦ Uses the dataset from the University of Wisconsin.
◦ The dataset includes 32 attributes:
◦ Diagnosis, radius, perimeter, area, and so on.
Import/Define Methods
◦ Import packages for NumPy and TensorFlow.
◦ Define a method for min-max normalization:

$$z_n = \frac{x_n - \min(\boldsymbol{x})}{\max(\boldsymbol{x}) - \min(\boldsymbol{x})}$$
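A sketch of those imports and the normalization helper (the function name is my own; the slide's actual code is only shown as a screenshot):

```python
import numpy as np
import tensorflow as tf

def min_max_normalize(x):
    # z_n = (x_n - min(x)) / (max(x) - min(x)), applied per feature column
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
```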
Import Dataset
◦ The dataset comes from the University of Wisconsin.
◦ Exclude the unused feature (ID).
◦ Split the dataset into x (features) and y (labels). (See the sketch below.)
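A sketch of the loading step. The filename is hypothetical; in the book's copy of the Wisconsin data, the ID is in column 0 and the diagnosis ('M'/'B') in column 1:

```python
import numpy as np

# Hypothetical filename for the Wisconsin breast cancer CSV.
raw = np.genfromtxt('wisc_bc_data.csv', delimiter=',',
                    skip_header=1, dtype=str)

raw = raw[:, 1:]                          # drop the unused ID column
y_raw = raw[:, 0]                         # diagnosis labels: 'M' or 'B'
x_data = raw[:, 1:].astype(np.float32)    # 30 numeric features
```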
One-Hot Encoding
'M' (Malignant) → [1, 0]
'B' (Benign) → [0, 1]
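Continuing the sketch above, one way to map the labels to those one-hot rows:

```python
import numpy as np

y_raw = np.array(['M', 'B', 'B', 'M'])   # example labels
# 'M' (Malignant) -> [1, 0], 'B' (Benign) -> [0, 1]
y_data = np.array([[1, 0] if d == 'M' else [0, 1] for d in y_raw],
                  dtype=np.float32)
print(y_data)
```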
Divide Dataset
◦ No cheating! Keep a held-out test set and never evaluate on the training data. (See the sketch below.)
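A small splitting helper as a sketch; the 70/30 ratio is an assumption, not the slide's:

```python
import numpy as np

def train_test_split(x, y, train_ratio=0.7, seed=0):
    # Shuffle indices once, then cut at the train/test boundary.
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x))
    cut = int(len(x) * train_ratio)
    tr, te = idx[:cut], idx[cut:]
    return x[tr], y[tr], x[te], y[te]
```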
Design Neural Network
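The slide shows only a code screenshot, so here is a minimal TF 1.x-style reconstruction under stated assumptions: 30 input features, two ReLU hidden layers with dropout via `keep_prob`, and 2 output classes. The layer widths and learning rate are my guesses, not the slide's:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 30], name='x')
y = tf.placeholder(tf.float32, [None, 2], name='y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

W1 = tf.Variable(tf.random_normal([30, 32])); b1 = tf.Variable(tf.zeros([32]))
h1 = tf.nn.dropout(tf.nn.relu(tf.matmul(x, W1) + b1), keep_prob)

W2 = tf.Variable(tf.random_normal([32, 32])); b2 = tf.Variable(tf.zeros([32]))
h2 = tf.nn.dropout(tf.nn.relu(tf.matmul(h1, W2) + b2), keep_prob)

W3 = tf.Variable(tf.random_normal([32, 2])); b3 = tf.Variable(tf.zeros([2]))
logits = tf.matmul(h2, W3) + b3

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
```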
Build Session
◦ Can be forced to start fresh, or left to resume from a checkpoint (forced/unforced).
◦ Restore previously trained weights.
◦ Write a log for TensorBoard. (See the sketch below.)
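A session-setup sketch continuing the graph above; the paths and the `force_restart` flag are hypothetical:

```python
import tensorflow as tf

# Assumes the network graph above has already been built.
force_restart = False
saver = tf.train.Saver()

sess = tf.Session()
sess.run(tf.global_variables_initializer())

ckpt = tf.train.latest_checkpoint('./checkpoints')
if ckpt and not force_restart:
    saver.restore(sess, ckpt)        # resume from previously trained weights

writer = tf.summary.FileWriter('./logs', sess.graph)  # TensorBoard log
```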
Training Neurons
◦ 10,001 steps per run.
◦ Add summaries for TensorBoard. (See the sketch below.)
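A training-loop sketch continuing the earlier pieces (`x`, `y`, `keep_prob`, `loss`, `train_op`, `sess`, `writer`); `x_train`/`y_train` come from the split above, and the keep rate and logging interval are assumptions:

```python
import tensorflow as tf

tf.summary.scalar('loss', loss)      # scalar summary for TensorBoard
merged = tf.summary.merge_all()

for step in range(10001):            # 10,001 steps per run
    _, summary = sess.run([train_op, merged],
                          feed_dict={x: x_train, y: y_train, keep_prob: 0.7})
    if step % 1000 == 0:
        writer.add_summary(summary, step)
```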
Save Results and Get Accuracy
◦ Save the trained variables so the current weights and biases are kept for the next run.
◦ Each run trains for 10,001 steps. (See the sketch below.)
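Continuing the same sketch: save a checkpoint, then evaluate on the held-out test set with dropout disabled (`keep_prob=1.0`, per the dropout slide). The checkpoint path is hypothetical:

```python
import tensorflow as tf

saver.save(sess, './checkpoints/model.ckpt')   # keep current weights/biases

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy,
               feed_dict={x: x_test, y: y_test, keep_prob: 1.0}))
```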
Result #1
[Screenshots: accuracy results of the 1st and 2nd attempts.]
Attempt more …
◦ Switch the weights to the Xavier initializer. (See the sketch below.)
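In TF 1.x this was typically done with `tf.get_variable` plus the contrib initializer (a sketch; the variable name and shape are assumptions):

```python
import tensorflow as tf

# Replaces tf.Variable(tf.random_normal([30, 32])) from the earlier sketch.
W1 = tf.get_variable('W1', shape=[30, 32],
                     initializer=tf.contrib.layers.xavier_initializer())
```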
Result #2
Accuracy improved: 1st attempt 96.27% → 97.01%; 2nd attempt 97.01% → 97.76%.
Self Test
◦ Explain how the number of parameters in a model is determined.
◦ Describe the shape of the ReLU function and its derivative, comparing them with the sigmoid function.
◦ Explain the purpose of weight initialization and the methods for it.
◦ Explain the purpose and principle of L2 regularization.
◦ Why is dropout needed? How should it be configured during training and testing?
◦ Why is back propagation advantageous for neural networks?
◦ Explain ensemble learning.