Convolutional Neural Networks
RTSS JUN YOUNG PARK
What is ConvNet (CNN)?
[Figure: pipeline diagram — Conv → ReLU → Conv → ReLU → Pool → … → FC. The convolution/pooling stages reduce dimension, and the fully connected (FC) layer maps the extracted features to class scores such as DOG, POLAR BEAR, WOLF, ELEPHANT.]
Discrete Convolution

Input data (4×4):      Kernel (3×3):      Output data (2×2):
1 2 3 0                2 0 1              15 16
0 1 2 3           *    0 1 2         =     6 15
3 0 1 2                1 0 2
2 3 0 1

Element-wise products for the top-left 3×3 window:
2 0 3
0 1 4      → Σ = 15
3 0 2

- Multiply each element of the kernel with the corresponding element of the input area, then sum — an FMA (Fused Multiply-Add) per area yields each output value.
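The sliding-window FMA described above can be sketched in NumPy (a minimal illustration of the operation, not the slides' own code):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation (what CNN conv layers compute), stride 1."""
    h = x.shape[0] - k.shape[0] + 1
    w = x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # One fused multiply-add: element-wise product over the window, then sum.
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]])
k = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]])
print(conv2d_valid(x, k))  # [[15. 16.]
                           #  [ 6. 15.]]
```

Running it on the slide's input and kernel reproduces the 2×2 output shown above.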
Padding

SAME (padding = 1):        VALID (no padding):
0  0  0  0  0  0           12 27  7  3
0 12 27  7  3  0           31  9  6  8
0 31  9  6  8  0            9 15  3 12
0  9 15  3 12  0            6  3 30 13
0  6  3 30 13  0
0  0  0  0  0  0
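SAME padding can be produced with `np.pad` (a small sketch; for a 3×3 kernel at stride 1, one ring of zeros keeps the output the same size as the input, while VALID uses the input unchanged):

```python
import numpy as np

x = np.array([[12, 27,  7,  3],
              [31,  9,  6,  8],
              [ 9, 15,  3, 12],
              [ 6,  3, 30, 13]])

# Pad one ring of zeros on every side: 4x4 -> 6x6.
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (6, 6)
```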
Stride - 1
KERNEL = 3X3, STRIDE = 1, PADDING = SAME

Padded input (7×7):        Kernel (3×3):      Output (5×5):
0  0  0  0  0  0  0        2 0 1              a b c d e
0  7 10  2  9  3  0   *    0 1 2         =    f g h i j
0 12 27  7  3  2  0        1 0 2              k l m n o
0 31  9  6  8  6  0                           p .. .. .. ..
0  9 15  3 12  4  0                           .. .. .. .. ..
0  6  3 30 13  5  0
0  0  0  0  0  0  0

The kernel moves one cell per step (stride = 1).
Stride - 2
KERNEL = 3X3, STRIDE = 2, PADDING = SAME

Padded input (7×7):        Kernel (3×3):      Output (3×3):
0  0  0  0  0  0  0        2 0 1              A B C
0  7 10  2  9  3  0   *    0 1 2         =    D E F
0 12 27  7  3  2  0        1 0 2              G H I
0 31  9  6  8  6  0
0  9 15  3 12  4  0
0  6  3 30 13  5  0
0  0  0  0  0  0  0

The kernel moves two cells per step (stride = 2).
Output Size

SZ_out = (N − F) / Stride + 1

(N : input size including padding, F : kernel size)
Valid only when SZ_out ∈ ℕ.

Padded input (7×7) from the previous slides:
0  0  0  0  0  0  0
0  7 10  2  9  3  0
0 12 27  7  3  2  0
0 31  9  6  8  6  0
0  9 15  3 12  4  0
0  6  3 30 13  5  0
0  0  0  0  0  0  0
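The output-size formula is easy to check in code (a minimal helper; the function name is ours, not from the slides):

```python
def conv_output_size(n, f, stride, padding=0):
    """SZ_out = (N - F) / stride + 1, where N includes padding on both sides."""
    n_padded = n + 2 * padding
    assert (n_padded - f) % stride == 0, "stride must divide (N - F) exactly"
    return (n_padded - f) // stride + 1

# 5x5 input, 3x3 kernel, SAME padding (pad = 1), as on the stride slides:
print(conv_output_size(5, 3, stride=1, padding=1))  # 5 -> 5x5 output
print(conv_output_size(5, 3, stride=2, padding=1))  # 3 -> 3x3 output
```

These match the 5×5 and 3×3 outputs shown on the Stride - 1 and Stride - 2 slides.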
Fully Connected Layer

[Figure: the same pipeline — Conv → ReLU → Conv → ReLU → Pool → … → FC → class scores (DOG, POLAR BEAR, WOLF, ELEPHANT).]

By the previous procedures, the 32 x 32 x 3 input has its dimension reduced; the FC layer then builds the classifier.
<Feature extraction> <Classification>
Convolution Layers

(32, 32, 3) → Convolution, ReLU with 6 (5,5,3) filters → (28, 28, 6)
            → Convolution, ReLU with 10 (5,5,6) filters → (24, 24, 10) → ……

N_params = 5 * 5 * 3 * 6 + 5 * 5 * 6 * 10
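The parameter count above follows from F*F*C_in*C_out weights per layer (biases excluded, as in the slide's formula); a quick check:

```python
# Weight count of one conv layer: kernel height * width * input channels * filters.
def conv_params(f, c_in, c_out):
    return f * f * c_in * c_out

# The two layers on this slide: 6 (5,5,3) filters, then 10 (5,5,6) filters.
n_params = conv_params(5, 3, 6) + conv_params(5, 6, 10)
print(n_params)  # 1950
```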
Pooling Layer

[Figure: a pooling layer slices the output of the convolution layer and reduces its size.]
Pooling (Subsampling)
◦ Objective
◦ Reduce the rows/columns of the matrix
◦ Advantages
◦ No parameters to train
◦ Number of channels is unchanged
◦ Robust to small variations of the input
◦ Methods
◦ Max Pooling: use the maximum value of the target area
◦ Mean Pooling: use the average value of the target area

Input (4×4), Kernel : 2x2, Stride : 2
12 27  7  3
31  9  6  8
 9 15  3 12
 6  3 30 13

<Max Pooling>      <Mean Pooling>
31  8              19.75  6.00
15 30               8.25 14.50
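Both pooling methods on this slide can be sketched with one NumPy helper (an illustration, not the slides' own code):

```python
import numpy as np

def pool2d(x, k=2, stride=2, mode="max"):
    """Non-overlapping pooling over k x k windows."""
    h, w = x.shape[0] // stride, x.shape[1] // stride
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = x[i * stride:i * stride + k, j * stride:j * stride + k]
            # Max pooling keeps the largest value; mean pooling averages the window.
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

x = np.array([[12, 27,  7,  3],
              [31,  9,  6,  8],
              [ 9, 15,  3, 12],
              [ 6,  3, 30, 13]], dtype=float)
print(pool2d(x, mode="max"))   # [[31.  8.] [15. 30.]]
print(pool2d(x, mode="mean"))  # [[19.75  6.  ] [ 8.25 14.5 ]]
```

Note the top-left mean is (12 + 27 + 31 + 9) / 4 = 19.75.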
Applications
◦ LeNet-5 (1998)
◦ AlexNet (2012)
◦ GoogLeNet (2014)
◦ Inception module : parallel composition of layers.
◦ 1x1 convolution : mathematically equivalent to a multi-layer perceptron applied per position.
◦ ResNet (2015)
◦ Skip connections : shortcuts that step over some layers (Residual Net)
Practical Use
◦ Build a NN with TensorFlow's simplified APIs and a model class
◦ Train with the MNIST dataset
◦ Test with the MNIST dataset and our own 'Hand-Written' data
◦ Apply some techniques discussed before
◦ Ensemble, Dropout, Batching
Define class for a model
Model
+ keep_prop
+ X, Y
+ sess
+ name
+ bool training
+ layers(conv, pool, dropout, dense …)
+ __init__(self, sess, name)
+ _build_net(self)
+ predict(self, x_test, training)
+ get_accuracy(self, x_test, y_test, training=False)
+ train(self, x_data, y_data, training=True)
Shape of Layers

IMG [32, 32, 1]
→ FILTER1 (32 filters) + ReLU → [32, 32, 32]
→ POOL1 (stride = 2) → [16, 16, 32]
→ FILTER2 (64 filters) + ReLU → [16, 16, 64]
→ POOL2 (stride = 2) → [8, 8, 64]
→ FILTER3 (128 filters) + ReLU → [8, 8, 128]
→ POOL3 (stride = 2) → [4, 4, 128]
→ Flatten → Dense [4*4*128, 625] + ReLU + Dropout
→ Dense [625, 10] + Dropout
→ Logits : 0 ~ 9
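The shapes above follow mechanically: SAME-padded convolutions keep height and width, each stride-2 pool halves them, and the channel count follows the number of filters. A small check (helper names are ours):

```python
# SAME-padded conv keeps H x W; channels become the filter count.
def same_conv(shape, n_filters):
    h, w, _ = shape
    return (h, w, n_filters)

# A stride-2 pool halves H and W; channels are unchanged.
def pool(shape, stride=2):
    h, w, c = shape
    return (h // stride, w // stride, c)

s = (32, 32, 1)
for n in (32, 64, 128):
    s = pool(same_conv(s, n))
print(s)                   # (4, 4, 128)
print(s[0] * s[1] * s[2])  # 2048 units feed the [4*4*128, 625] dense layer
```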
Implement Layers
Implement Ensemble

[Figure: Model 1 … Model 10 each produce a Prediction; the predictions are summed and averaged (mean) into the ensemble prediction.]
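The averaging step above can be sketched with NumPy (hypothetical prediction arrays stand in for the TensorFlow models' outputs):

```python
import numpy as np

# Each model outputs per-class scores of shape (n_images, n_classes).
preds = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]),  # stand-in for model 1
    np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]),  # stand-in for model 2
]

# Average the scores across models, then pick the best class per image.
ensemble = np.mean(preds, axis=0)
labels = ensemble.argmax(axis=1)
print(labels)  # [0 2]
```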
Test Result
◦ Models : 5
◦ Epochs : 15
◦ Ensemble Accuracy : 99.05%
Test 'Hand-Written' Data
◦ Scan & resample each number
◦ Image to test : [28, 28, 1] bitmap
◦ Load image
Self Assignment
◦ Complete the unfinished hand-written digit recognizer.
◦ Problems
◦ The pyplot.imshow() method reads the image automatically as 4-channel (RGBA). (Solved)
◦ Type Mismatch error — possibly the type required by TensorFlow was not set correctly.
Trial and Error
◦ Convert the RGB image to a mono (gray) bitmap
Troubleshooting
◦ Content of the MNIST dataset

[Figure: sample MNIST digits — black background, white content. The test set is an array of (# of test images) x (size of each image), where 784 = 28*28.]
Troubleshooting
◦ Causes of failure
◦ Unlike the MNIST dataset, the hand-written data has black content on a white background.
◦ After conversion to grayscale, the pixel values are much larger than those in the MNIST dataset.
◦ Solutions
◦ Invert the hand-written data so it has white content on a black background.
◦ Normalize the grayscale-converted image.
Troubleshooting
◦ Read each image from the folder
◦ Convert to grayscale (28x28x3 -> 28x28)
◦ Reshape (28x28 -> 784)
◦ Append the image to the list
◦ Convert the list to an ndarray
◦ Apply normalization
◦ Sum the predictions from all models
◦ Print the index of the max prediction value for each image
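The preprocessing steps above can be sketched as follows (synthetic arrays stand in for the scanned images, since the folder path and loader are not shown on the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for three scanned 28x28 RGB images.
images = [rng.integers(0, 256, size=(28, 28, 3)) for _ in range(3)]

batch = []
for img in images:
    gray = img.mean(axis=2)              # 28x28x3 -> 28x28 grayscale
    inverted = 255.0 - gray              # white background -> black, like MNIST
    batch.append(inverted.reshape(784))  # 28x28 -> 784
x = np.asarray(batch) / 255.0            # normalize pixel values to [0, 1]
print(x.shape)  # (3, 784)
```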
Test Result

[Figure: hand-written test images recognized as 8, 5, 4.]
Self Test
◦ Explain discrete convolution.
◦ Explain the two representative pooling methods.
◦ Explain stride and padding, and how they change the output size.