Convolutional Neural Networks
RTSS JUN YOUNG PARK
What is ConvNet (CNN)?
[Figure: pipeline diagram — Conv → ReLU → Conv → ReLU → Pool → … → FC. The convolution/pooling stages reduce dimension, and the fully connected (FC) layer maps the extracted features to class scores such as DOG, POLAR BEAR, WOLF, ELEPHANT.]
Discrete Convolution

Input data (4×4):      Kernel (3×3):      Output data (2×2):
1 2 3 0                2 0 1              15 16
0 1 2 3           *    0 1 2         =     6 15
3 0 1 2                1 0 2
2 3 0 1

Element-wise products for the top-left 3×3 window:
2 0 3
0 1 4      → Σ = 15
3 0 2

- Multiply each element of the kernel with the corresponding element of the input area, then sum — an FMA (Fused Multiply-Add) per area yields each output value.
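The sliding-window FMA described above can be sketched in NumPy (a minimal illustration of the operation, not the slides' own code):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation (what CNN conv layers compute), stride 1."""
    h = x.shape[0] - k.shape[0] + 1
    w = x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # One fused multiply-add: element-wise product over the window, then sum.
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]])
k = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]])
print(conv2d_valid(x, k))  # [[15. 16.]
                           #  [ 6. 15.]]
```

Running it on the slide's input and kernel reproduces the 2×2 output shown above.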
Padding

SAME (padding = 1):        VALID (no padding):
0  0  0  0  0  0           12 27  7  3
0 12 27  7  3  0           31  9  6  8
0 31  9  6  8  0            9 15  3 12
0  9 15  3 12  0            6  3 30 13
0  6  3 30 13  0
0  0  0  0  0  0
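SAME padding can be produced with `np.pad` (a small sketch; for a 3×3 kernel at stride 1, one ring of zeros keeps the output the same size as the input, while VALID uses the input unchanged):

```python
import numpy as np

x = np.array([[12, 27,  7,  3],
              [31,  9,  6,  8],
              [ 9, 15,  3, 12],
              [ 6,  3, 30, 13]])

# Pad one ring of zeros on every side: 4x4 -> 6x6.
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (6, 6)
```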
Stride - 1
KERNEL = 3X3, STRIDE = 1, PADDING = SAME

Padded input (7×7):        Kernel (3×3):      Output (5×5):
0  0  0  0  0  0  0        2 0 1              a b c d e
0  7 10  2  9  3  0   *    0 1 2         =    f g h i j
0 12 27  7  3  2  0        1 0 2              k l m n o
0 31  9  6  8  6  0                           p .. .. .. ..
0  9 15  3 12  4  0                           .. .. .. .. ..
0  6  3 30 13  5  0
0  0  0  0  0  0  0

The kernel moves one cell per step (stride = 1).
Stride - 2
KERNEL = 3X3, STRIDE = 2, PADDING = SAME

Padded input (7×7):        Kernel (3×3):      Output (3×3):
0  0  0  0  0  0  0        2 0 1              A B C
0  7 10  2  9  3  0   *    0 1 2         =    D E F
0 12 27  7  3  2  0        1 0 2              G H I
0 31  9  6  8  6  0
0  9 15  3 12  4  0
0  6  3 30 13  5  0
0  0  0  0  0  0  0

The kernel moves two cells per step (stride = 2).
Output Size

SZ_out = (N − F) / Stride + 1

(N : input size including padding, F : kernel size)
Valid only when SZ_out ∈ ℕ.

Padded input (7×7) from the previous slides:
0  0  0  0  0  0  0
0  7 10  2  9  3  0
0 12 27  7  3  2  0
0 31  9  6  8  6  0
0  9 15  3 12  4  0
0  6  3 30 13  5  0
0  0  0  0  0  0  0
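The output-size formula is easy to check in code (a minimal helper; the function name is ours, not from the slides):

```python
def conv_output_size(n, f, stride, padding=0):
    """SZ_out = (N - F) / stride + 1, where N includes padding on both sides."""
    n_padded = n + 2 * padding
    assert (n_padded - f) % stride == 0, "stride must divide (N - F) exactly"
    return (n_padded - f) // stride + 1

# 5x5 input, 3x3 kernel, SAME padding (pad = 1), as on the stride slides:
print(conv_output_size(5, 3, stride=1, padding=1))  # 5 -> 5x5 output
print(conv_output_size(5, 3, stride=2, padding=1))  # 3 -> 3x3 output
```

These match the 5×5 and 3×3 outputs shown on the Stride - 1 and Stride - 2 slides.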
Fully Connected Layer

[Figure: the same pipeline — Conv → ReLU → Conv → ReLU → Pool → … → FC → class scores (DOG, POLAR BEAR, WOLF, ELEPHANT).]

By the previous procedures, the 32 x 32 x 3 input has its dimension reduced; the FC layer then builds the classifier.
<Feature extraction> <Classification>
Convolution Layers

(32, 32, 3) → Convolution, ReLU with 6 (5,5,3) filters → (28, 28, 6)
            → Convolution, ReLU with 10 (5,5,6) filters → (24, 24, 10) → ……

N_params = 5 * 5 * 3 * 6 + 5 * 5 * 6 * 10
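The parameter count above follows from F*F*C_in*C_out weights per layer (biases excluded, as in the slide's formula); a quick check:

```python
# Weight count of one conv layer: kernel height * width * input channels * filters.
def conv_params(f, c_in, c_out):
    return f * f * c_in * c_out

# The two layers on this slide: 6 (5,5,3) filters, then 10 (5,5,6) filters.
n_params = conv_params(5, 3, 6) + conv_params(5, 6, 10)
print(n_params)  # 1950
```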
Pooling Layer

[Figure: a pooling layer slices the output of the convolution layer and reduces its size.]
Pooling (Subsampling)
◦ Objective
◦ Reduce the rows/columns of the matrix
◦ Advantages
◦ No parameters to train
◦ Number of channels is unchanged
◦ Robust to small variations of the input
◦ Methods
◦ Max Pooling: use the maximum value of the target area
◦ Mean Pooling: use the average value of the target area

Input (4×4), Kernel : 2x2, Stride : 2
12 27  7  3
31  9  6  8
 9 15  3 12
 6  3 30 13

<Max Pooling>      <Mean Pooling>
31  8              19.75  6.00
15 30               8.25 14.50
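Both pooling methods on this slide can be sketched with one NumPy helper (an illustration, not the slides' own code):

```python
import numpy as np

def pool2d(x, k=2, stride=2, mode="max"):
    """Non-overlapping pooling over k x k windows."""
    h, w = x.shape[0] // stride, x.shape[1] // stride
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = x[i * stride:i * stride + k, j * stride:j * stride + k]
            # Max pooling keeps the largest value; mean pooling averages the window.
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

x = np.array([[12, 27,  7,  3],
              [31,  9,  6,  8],
              [ 9, 15,  3, 12],
              [ 6,  3, 30, 13]], dtype=float)
print(pool2d(x, mode="max"))   # [[31.  8.] [15. 30.]]
print(pool2d(x, mode="mean"))  # [[19.75  6.  ] [ 8.25 14.5 ]]
```

Note the top-left mean is (12 + 27 + 31 + 9) / 4 = 19.75.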
Applications
◦ LeNet-5 (1998)
◦ AlexNet (2012)
◦ GoogLeNet (2014)
◦ Inception module : parallel composition of layers.
◦ 1x1 convolution : mathematically equivalent to a multi-layer perceptron applied per position.
◦ ResNet (2015)
◦ Skip connections : shortcuts that step over some layers (Residual Net)
Practical Use
◦ Build a NN with TensorFlow's simplified APIs and a model class
◦ Train with the MNIST dataset
◦ Test with the MNIST dataset and our own 'Hand-Written' data
◦ Apply some techniques discussed before
◦ Ensemble, Dropout, Batching
Define class for a model
Model
+ keep_prop
+ X, Y
+ sess
+ name
+ bool training
+ layers(conv, pool, dropout, dense …)
+ __init__(self, sess, name)
+ _build_net(self)
+ predict(self, x_test, training)
+ get_accuracy(self, x_test, y_test, training=False)
+ train(self, x_data, y_data, training=True)
Shape of Layers

IMG [32, 32, 1]
→ FILTER1 (32 filters) + ReLU → [32, 32, 32]
→ POOL1 (stride = 2) → [16, 16, 32]
→ FILTER2 (64 filters) + ReLU → [16, 16, 64]
→ POOL2 (stride = 2) → [8, 8, 64]
→ FILTER3 (128 filters) + ReLU → [8, 8, 128]
→ POOL3 (stride = 2) → [4, 4, 128]
→ Flatten → Dense [4*4*128, 625] + ReLU + Dropout
→ Dense [625, 10] + Dropout
→ Logits : 0 ~ 9
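The shapes above follow mechanically: SAME-padded convolutions keep height and width, each stride-2 pool halves them, and the channel count follows the number of filters. A small check (helper names are ours):

```python
# SAME-padded conv keeps H x W; channels become the filter count.
def same_conv(shape, n_filters):
    h, w, _ = shape
    return (h, w, n_filters)

# A stride-2 pool halves H and W; channels are unchanged.
def pool(shape, stride=2):
    h, w, c = shape
    return (h // stride, w // stride, c)

s = (32, 32, 1)
for n in (32, 64, 128):
    s = pool(same_conv(s, n))
print(s)                   # (4, 4, 128)
print(s[0] * s[1] * s[2])  # 2048 units feed the [4*4*128, 625] dense layer
```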
Implement Layers
Implement Ensemble

[Figure: Model 1 … Model 10 each produce a Prediction; the predictions are summed and averaged (mean) into the ensemble prediction.]
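The averaging step above can be sketched with NumPy (hypothetical prediction arrays stand in for the TensorFlow models' outputs):

```python
import numpy as np

# Each model outputs per-class scores of shape (n_images, n_classes).
preds = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]),  # stand-in for model 1
    np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]),  # stand-in for model 2
]

# Average the scores across models, then pick the best class per image.
ensemble = np.mean(preds, axis=0)
labels = ensemble.argmax(axis=1)
print(labels)  # [0 2]
```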
Test Result
◦ Models : 5
◦ Epochs : 15
◦ Ensemble Accuracy : 99.05%
Test 'Hand-Written' Data
◦ Scan & resample each number
◦ Image to test : [28, 28, 1] bitmap
◦ Load image
Self Assignment
◦ Complete the unfinished hand-written digit recognizer.
◦ Problems
◦ The pyplot.imshow() method reads the image automatically as 4-channel (RGBA). (Solved)
◦ Type Mismatch error — possibly the type required by TensorFlow was not set correctly.
Trial and Error
◦ Convert the RGB image to a mono (gray) bitmap
Troubleshooting
◦ Content of the MNIST dataset

[Figure: sample MNIST digits — black background, white content. The test set is an array of (# of test images) x (size of each image), where 784 = 28*28.]
Troubleshooting
◦ Causes of failure
◦ Unlike the MNIST dataset, the hand-written data has black content on a white background.
◦ After conversion to grayscale, the pixel values are much larger than those in the MNIST dataset.
◦ Solutions
◦ Invert the hand-written data so it has white content on a black background.
◦ Normalize the grayscale-converted image.
Troubleshooting
◦ Read each image from the folder
◦ Convert to grayscale (28x28x3 -> 28x28)
◦ Reshape (28x28 -> 784)
◦ Append the image to the list
◦ Convert the list to an ndarray
◦ Apply normalization
◦ Sum the predictions from all models
◦ Print the index of the max prediction value for each image
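The preprocessing steps above can be sketched as follows (synthetic arrays stand in for the scanned images, since the folder path and loader are not shown on the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for three scanned 28x28 RGB images.
images = [rng.integers(0, 256, size=(28, 28, 3)) for _ in range(3)]

batch = []
for img in images:
    gray = img.mean(axis=2)              # 28x28x3 -> 28x28 grayscale
    inverted = 255.0 - gray              # white background -> black, like MNIST
    batch.append(inverted.reshape(784))  # 28x28 -> 784
x = np.asarray(batch) / 255.0            # normalize pixel values to [0, 1]
print(x.shape)  # (3, 784)
```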
Test Result

[Figure: hand-written test images recognized as 8, 5, 4.]
Self Test
◦ Explain discrete convolution.
◦ Explain the two representative pooling methods.
◦ Explain stride and padding, and how they change the output size.