Introduction to Neural Networks
Logistics
• We can’t hear you…
• Recording will be available…
• Slides will be available…
• Code samples and notebooks will be available…
• Queue up Questions…
VISION: Accelerate innovation by unifying data science, engineering and business
WHO WE ARE:
• Founded by the original creators of Apache Spark
• Contributes 75% of the open source code, 10x more than any other company
• Trained 100k+ Spark users on the Databricks platform
PRODUCT: Unified Analytics Platform powered by Apache Spark™
About our speaker
Denny Lee
Technical Product Marketing Manager
Former:
• Senior Director of Data Sciences Engineering at SAP Concur
• Principal Program Manager at Microsoft
• Azure Cosmos DB Engineering Spark and Graph Initiatives
• Isotope Incubation Team (currently known as HDInsight)
• Bing’s Audience Insights Team
• Yahoo!’s 24TB Analysis Services cube
Deep Learning Fundamentals Series
This is a three-part series:
• Introduction to Neural Networks
• Training Neural Networks
• Applying your Neural Networks
This series will make use of Keras (with a TensorFlow backend), but as it is a
fundamentals series, we are focusing primarily on the concepts.
Introduction to Neural Networks
• What is Deep Learning?
• What can Deep Learning do for you?
• What are artificial neural networks?
• Let’s start with a perceptron…
• Understanding the effect of activation functions
Upcoming Sessions
Training your Neural Network
• Tuning training
• Training Algorithms
• Optimization (including Adam)
• Convolutional Neural Networks
Applying your Neural Networks
• Diving further into Convolutional
Neural Networks (CNNs)
• CNN Architectures
• Convolutions at Work!
Convolutional Neural Networks
[Diagram: CNN pipeline — feature extraction (convolution with 32 filters at 28 x 28, convolution with 64 filters at 28 x 28, subsampling with stride (2,2) down to 14 x 14), followed by classification (fully connected layers with dropout, outputting digits 0–9)]
Deep Learning, ML, AI, … oh my!
Deep Learning, ML, AI, … oh my!
Source: http://bit.ly/2suufGJ
Deep Learning, ML, AI, … oh my!
Artificial Intelligence
Artificial Intelligence is human
intelligence exhibited by machines
The world’s best Dota 2 players just
got destroyed by a killer AI from Elon
Musk’s startup
Elon Musk-funded Dota 2 bots spank
top-tier humans, and they know how
to trash talk
Deep Learning, ML, AI, … oh my!
Machine Learning
Field of study that gives computers
the ability to learn without being
explicitly programmed.
Source: https://bit.ly/2NnyrOz
Deep Learning, ML, AI, … oh my!
Deep Learning
Using deep neural networks to
implement machine learning
Ask not what AI can do for you….
Applications of neural networks
Object Classification
Object Recognition: Speech
Object Recognition: Facial
Game Playing
Language Translation
Healthcare
Source: http://bit.ly/2BXF5sR
Faking videos
Source: https://youtu.be/dkoi7sZvWiU Source: https://youtu.be/JzgOfISLNjk
Turning a horse video into a zebra video
in real time using GANs
AI-generated "real fake" video of
Barack Obama
Self-driving cars
Source: https://youtu.be/URmxzxYlmtg?t=6m30s
Turning the day into night
Source: https://youtu.be/N7KbfWodXJE
What are artificial neural networks?
Source: http://bit.ly/2jc9mYw
Conceptually, what are artificial neural networks?
• A neuron receives a signal, processes it, and propagates the signal (or not)
• The brain is composed of around 100 billion neurons, each connected to ~10k other neurons: ~10¹⁵ synaptic connections
• ANNs are a simplistic imitation of a brain, composed of a dense net of simple structures
Source: http://bit.ly/2jjGlNd
Conceptual mathematical model
• Receives input from sources
• Computes weighted sum
• Passes through an activation function
• Sends the signal to succeeding
neurons
[Diagram: n inputs x1 … xn with weights w1 … wn feed into a weighted sum ∑, which passes through an activation function f(∑) to produce m outputs o1 … om]
h1 = x1w1 + x2w2 + … + xnwn
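As a sketch of this conceptual model (not code from the deck), a single artificial neuron fits in a few lines of Python; the step activation here is an assumption, standing in for whichever f is chosen:

```python
# A minimal sketch of the conceptual neuron above (illustrative, not from the slides).
def neuron(inputs, weights, activation):
    # Weighted sum: h = x1*w1 + x2*w2 + ... + xn*wn
    h = sum(x * w for x, w in zip(inputs, weights))
    # Pass the sum through the activation function f
    return activation(h)

# Example with a simple step activation (an assumption for illustration)
step = lambda h: 1 if h > 0 else 0
print(neuron([1, 1, 0], [0.3, 0.6, 0.1], step))  # -> 1
```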
“Parallels”
A single neuron in the brain is an incredibly complex machine that even today we
don’t understand. A single “neuron” in a neural network is an incredibly simple
mathematical function that captures a minuscule fraction of the complexity of a
biological neuron. So to say neural networks mimic the brain, that is true at the
level of loose inspiration, but really artificial neural networks are nothing like
what the biological brain does.
Andrew Ng
Medium Article: Google Brain’s Co-inventor Tells
Why He’s Building Chinese Neural Networks
Artificial Neural Network
• Organized into layers of neurons, as a black-box model
• Typically 3 or more layers: input, hidden, and output
• For image detectors,
• the first layers are organized similarly to the visual cortex,
• but the activation functions of ANNs are much simpler than those of biological neural networks
How perceptive of you…
or
The Immense Power of
Simple Structures
Perceptron
Simplified (binary) artificial neuron
[Diagram: inputs x1, x2, x3 → output?]
x1 → Is the weather good?
x2 → Is the powder good?
x3 → Am I in the mood to drive?
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron with weights
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output?]

output = 0 if ∑j wjxj ≤ threshold
output = 1 if ∑j wjxj > threshold
Perceptron
Simplified (binary) artificial neuron; no weights
[Diagram: three inputs feeding a single output]
threshold = 1
∑j xj = 1 + 1 + 0 = 2 > threshold → output = 1
x1 = 1 (good weather)
x2 = 1 (a lot of powder)
x3 = 0 (driving sucks)
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron; add weights
Persona: Après-Ski’er
x1 = 1 (good weather), w1 = 3
x2 = 1 (a lot of powder), w2 = 1
x3 = 0 (driving sucks), w3 = 5
threshold = 5
∑j wjxj = 3x1 + 1x1 + 5x0 = 4 ≤ threshold → output = 0
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron; add weights
Persona: Shredder
x1 = 1 (good weather), w1 = 2
x2 = 1 (a lot of powder), w2 = 6
x3 = 0 (driving sucks), w3 = 1
threshold = 5
∑j wjxj = 2x1 + 6x1 + 1x0 = 8 > threshold → output = 1
Do I snowboard this weekend?
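To make the two personas concrete, here is a small Python sketch of the threshold perceptron above (illustrative; the weights and threshold come straight from the slides):

```python
# Threshold perceptron from the slides (a sketch, not production code).
def perceptron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

inputs = [1, 1, 0]  # good weather, a lot of powder, driving sucks

print(perceptron(inputs, [3, 1, 5], threshold=5))  # Après-Ski’er -> 0 (stays home)
print(perceptron(inputs, [2, 6, 1], threshold=5))  # Shredder -> 1 (goes snowboarding)
```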
Introducing Bias
The perceptron also needs to take the bias into account:

output = 0 if w·x + b ≤ 0
output = 1 if w·x + b > 0

where b (equivalent to −threshold) expresses how easy it is to get the perceptron to fire,
e.g. Shredder has a strong positive bias to go to Whistler,
while the Après-Ski’er’s bias is not as strong.
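A sketch of the same decision in bias form (the b values are simply the negated threshold from the examples above):

```python
# Bias form of the perceptron: fire when w·x + b > 0 (sketch).
def perceptron_with_bias(inputs, weights, b):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) + b > 0 else 0

inputs = [1, 1, 0]
print(perceptron_with_bias(inputs, [3, 1, 5], b=-5))  # Après-Ski’er -> 0
print(perceptron_with_bias(inputs, [2, 6, 1], b=-5))  # Shredder -> 1
```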
Sigmoid Neuron
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output = σ(wx + b)]
• The more common artificial neuron
• Instead of the binary output {0, 1}, the output is now a continuous value in (0…1)
• The output is defined by σ(wx + b)
Sigmoid Neuron
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output = σ(wx + b)]
Persona: Shredder
x1 = 1 (good weather), w1 = 0.3
x2 = 1 (a lot of powder), w2 = 0.6
x3 = 0 (driving sucks), w3 = 0.1
output = σ(1x0.3 + 1x0.6 + 0x0.1)
       = σ(0.9)
       = 0.7109
Do I snowboard this weekend?
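A quick check of that number in Python (a sketch; σ is the logistic function 1/(1 + e^(−z)), and the bias is taken as 0 since the slide omits it):

```python
import math

# Logistic sigmoid: squashes any real number into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Shredder's sigmoid neuron, with bias b = 0 (as implied by the slide)
z = 1 * 0.3 + 1 * 0.6 + 0 * 0.1
print(sigmoid(z))  # ≈ 0.7109 — "probably go snowboarding"
```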
Simplified Two-Layer ANN
Input Hidden Output
x1 → Après-Ski’er
x2 → Shredder
h1 → weather
h2 → powder
h3 → driving
Do I snowboard this weekend?
Simplified Two-Layer ANN
[Diagram: inputs x1 = 1, x2 = 1; weights into the hidden layer: 0.8, 0.2, 0.7 (from x1) and 0.6, 0.9, 0.1 (from x2); hidden outputs 0.8, 0.75, 0.69]
h1 = σ(1x0.8 + 1x0.6) = 0.80
h2 = σ(1x0.2 + 1x0.9) = 0.75
h3 = σ(1x0.7 + 1x0.1) = 0.69
Simplified Two-Layer ANN
[Diagram: the same network, with hidden outputs h1 = 0.8, h2 = 0.75, h3 = 0.69 and output-layer weights 0.2, 0.8, 0.5]
out = σ(0.2x0.8 + 0.8x0.75 + 0.5x0.69)
    = σ(1.105)
    = 0.75
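The whole forward pass fits in a few lines of NumPy (a sketch reproducing the slide's numbers; biases are taken as zero since the slides omit them):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 1.0])                  # inputs: Après-Ski’er, Shredder
W_hidden = np.array([[0.8, 0.2, 0.7],     # weights from x1 to h1, h2, h3
                     [0.6, 0.9, 0.1]])    # weights from x2 to h1, h2, h3
w_out = np.array([0.2, 0.8, 0.5])         # weights from h1, h2, h3 to the output

h = sigmoid(x @ W_hidden)                 # hidden layer: ≈ [0.80, 0.75, 0.69]
out = sigmoid(h @ w_out)                  # output: ≈ 0.75
print(h, out)
```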
The Immense Power of Simple Structures
Great Resources
• Andrej Karpathy’s CS231N Neural Networks
• Steve Miller’s How to build a neural network
• Michael Nielsen’s Neural Networks and Deep Learning, Chapter 1
Optimization Primer
Cost function
Source: https://bit.ly/2IoAGzL
For this linear regression example, y = x · p, to determine the best p (the slope of the line) we can calculate a cost function, such as mean squared error, mean absolute error, mean bias error, or SVM loss.
For this example, we'll use the sum of squared absolute differences:

cost = ∑ |t − y|²
Visualize this cost function
Source: https://bit.ly/2IoAGzL
Calculate its derivative
Source: https://bit.ly/2IoAGzL
Gradient Descent
Source: https://bit.ly/2IoAGzL
Gradient Descent
The goal is to find the lowest point of the cost function (i.e. the minimum, where the cost is smallest).
Gradient descent iteratively (i.e. step by step) descends the cost function curve to find that minimum.
Source: https://bit.ly/2IoAGzL
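A sketch of gradient descent for this example (assumed details: the data t, the learning rate, and the iteration count are illustrative; with cost = ∑(t − xp)², the gradient with respect to p is ∑ −2x(t − xp)):

```python
import numpy as np

# Toy data generated from t = 3x, so the best slope is p = 3
# (an assumption for illustration, not data from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0])
t = 3.0 * x

p = 0.0       # initial guess for the slope
lr = 0.01     # learning rate (step size)
for _ in range(100):
    y = x * p                         # predictions
    grad = np.sum(-2 * x * (t - y))   # d(cost)/dp for cost = sum((t - y)^2)
    p -= lr * grad                    # step downhill along the gradient
print(p)  # converges toward 3.0
```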
Gradient Descent Optimization
Source: https://bit.ly/2IoAGzL
Backpropagation
Input Hidden Output
[Diagram: the two-layer network from the earlier example, producing output 0.75]
Backpropagation
Input Hidden Output
[Diagram: the same network, now with a target output of 0.85; the error 0.85 − 0.75 = 0.10 is propagated back through the layers]
• Backpropagation: calculates the gradient of the cost function in a neural network
• Used by gradient descent optimization algorithms to adjust the weights of the neurons
• Also known as backward propagation of errors, as the error is calculated and distributed back through the layers of the network
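As a sketch of one backpropagation step on the output neuron above (assumptions not in the slides: a squared-error cost ½(t − out)², the sigmoid derivative σ′(z) = σ(z)(1 − σ(z)), and an illustrative learning rate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.array([0.80, 0.75, 0.69])   # hidden activations from the earlier slide
w = np.array([0.2, 0.8, 0.5])      # output-layer weights
t = 0.85                           # target output

z = h @ w                          # pre-activation, ≈ 1.105
out = sigmoid(z)                   # ≈ 0.75

# Chain rule: dC/dw = dC/dout * dout/dz * dz/dw
dC_dout = -(t - out)               # from C = 0.5 * (t - out)^2
dout_dz = out * (1 - out)          # sigmoid derivative
grad_w = dC_dout * dout_dz * h     # gradient for each output weight

w -= 0.5 * grad_w                  # one gradient-descent step (lr = 0.5, illustrative)
print(grad_w, w)
```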
Neurons … Activate!
Activation functions
• The bias (threshold) activation function was proposed first
• Sigmoid and tanh introduce non-linearity with different codomains
• ReLU is one of the more popular ones because it's simple to compute and very robust to noisy inputs
Source: http://bit.ly/2tKhbMS
Source: https://bit.ly/2wv3kcd
Sigmoid function
• The sigmoid non-linearity squashes real numbers into the range [0, 1]
• Historically it had a nice interpretation as a neuron's firing rate (i.e. from not firing at all to fully-saturated firing)
• Currently it is not used as much, because inputs with large magnitude saturate the output very close to 0 or 1, where the gradient is nearly 0, stopping backpropagation
Source: http://bit.ly/2GgMbGW
Sigmoid function (continued)
Output is not zero-centered: during gradient descent, if the values flowing into a neuron are always positive, then during backpropagation the gradients on its weights will be either all positive or all negative, creating zig-zagging dynamics in the weight updates.
Source: https://bit.ly/2IoAGzL
Tanh function
• The tanh function squashes real numbers into the range [-1, 1]
• It has the same problem as sigmoid: its activations saturate, killing gradients
• But it is zero-centered, minimizing the zig-zagging dynamics during gradient descent
• Currently preferred over the sigmoid nonlinearity
Source: http://bit.ly/2C4y89z
ReLU: Rectified Linear Unit
Source: http://bit.ly/2EwRnds
ReLU's activation is simply thresholded at zero: f(x) = max(0, x)
• Quite popular over the last few years
• Speeds up Stochastic Gradient Descent (SGD) convergence
• Easier to implement due to its simpler mathematical form
• Sensitive to high learning rates during training, which can result in “dead” neurons (i.e. neurons that will never activate again across the entire dataset)
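For reference, the three activations side by side in Python (a sketch using NumPy versions of the standard definitions):

```python
import numpy as np

def sigmoid(z):          # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):             # squashes to (-1, 1), zero-centered
    return np.tanh(z)

def relu(z):             # thresholds at zero: max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # ≈ [0.119 0.378 0.5   0.622 0.881]
print(tanh(z))     # ≈ [-0.964 -0.462 0.    0.462 0.964]
print(relu(z))     #   [0.  0.  0.  0.5 2. ]
```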
Neurons … Activate!
DEMO
What is TensorFlow
• Open source software library for
numerical computation using data
flow graphs
• Originally developed for machine learning and deep neural network research
• Built in C++ with a Python interface
• Quickly gained high popularity
• Supports GPU computations
What is Keras
• Python Deep Learning Library
• High level neural networks API
• Runs on top of TensorFlow, CNTK, or
Theano
• Simple and easy to use, making it well suited for rapid prototyping
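Since the series uses Keras, here is a minimal sketch of the earlier two-layer network expressed in Keras (illustrative only; the layer sizes match the snowboarding example, and the training data is left out):

```python
# A minimal Keras sketch of the earlier two-layer network (illustrative only).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, activation='sigmoid', input_dim=2))  # hidden layer: h1, h2, h3
model.add(Dense(1, activation='sigmoid'))               # output: "do I snowboard?"

model.compile(optimizer='sgd', loss='mean_squared_error')
model.summary()
# model.fit(X, y, epochs=...) would train it, given inputs X and labels y
```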
I’d like to thank…
Great References
• Machine Learning 101
• Andrej Karpathy’s ConvNetJS MNIST Demo
• What is back propagation in neural networks?
• CS231n: Convolutional Neural Networks for Visual Recognition
• Syllabus and Slides | Course Notes | YouTube
• With particular focus on CS231n Lecture 7: Convolutional Neural Networks
• Neural Networks and Deep Learning
• TensorFlow
Great References
• Deep Visualization Toolbox
• What's the difference between gradient descent and stochastic gradient
descent?
• Back Propagation with TensorFlow
• TensorFrames: Google TensorFlow with Apache Spark
• Integrating deep learning libraries with Apache Spark
• Build, Scale, and Deploy Deep Learning Pipelines with Ease
Attribution
Tomek Drabas
Brooke Wenig
Timothee Hunter
Cyrielle Simeone
Q&A
What’s next?
Training your Neural Networks
October 9, 2018 | 09:00 PDT
https://dbricks.co/2pugWSC
