Introduction to Neural Networks
Logistics
• We can’t hear you…
• Recording will be available…
• Slides will be available…
• Code samples and notebooks will be available…
• Queue up Questions…
VISION: Accelerate innovation by unifying data science, engineering and business
WHO WE ARE:
• Founded by the original creators of Apache Spark
• Contributes 75% of the open source code, 10x more than any other company
• Trained 100k+ Spark users on the Databricks platform
PRODUCT: Unified Analytics Platform powered by Apache Spark™
About our speaker
Denny Lee
Technical Product Marketing Manager
Former:
• Senior Director of Data Sciences Engineering at SAP Concur
• Principal Program Manager at Microsoft
• Azure Cosmos DB Engineering Spark and Graph Initiatives
• Isotope Incubation Team (currently known as HDInsight)
• Bing’s Audience Insights Team
• Yahoo!’s 24TB Analysis Services cube
Deep Learning Fundamentals Series
This is a three-part series:
• Introduction to Neural Networks
• Training Neural Networks
• Applying your Neural Networks
This series will make use of Keras (with a TensorFlow backend), but as it is a
fundamentals series, we are focusing primarily on the concepts.
Introduction to Neural Networks
• What is Deep Learning?
• What can Deep Learning do for you?
• What are artificial neural networks?
• Let’s start with a perceptron…
• Understanding the effect of activation functions
Upcoming Sessions
Training your Neural Network
• Tuning training
• Training Algorithms
• Optimization (including Adam)
• Convolutional Neural Networks
Applying your Neural Networks
• Diving further into Convolutional
Neural Networks (CNNs)
• CNN Architectures
• Convolutions at Work!
Convolutional Neural Networks
[Diagram: CNN pipeline — feature extraction (convolution with 32 filters at 28 x 28, convolution with 64 filters at 28 x 28, subsampling with stride (2,2) down to 14 x 14), followed by classification (fully connected layers with dropout, outputting digits 0–9)]
Deep Learning, ML, AI, … oh my!
Deep Learning, ML, AI, … oh my!
Source: http://bit.ly/2suufGJ
Deep Learning, ML, AI, … oh my!
Artificial Intelligence
Artificial Intelligence is human
intelligence exhibited by machines
The world’s best Dota 2 players just
got destroyed by a killer AI from Elon
Musk’s startup
Elon Musk-funded Dota 2 bots spank
top-tier humans, and they know how
to trash talk
Deep Learning, ML, AI, … oh my!
Machine Learning
Field of study that gives computers
the ability to learn without being
explicitly programmed.
Source: https://bit.ly/2NnyrOz
Deep Learning, ML, AI, … oh my!
Deep Learning
Using deep neural networks to
implement machine learning
Ask not what AI can do for you….
Applications of neural networks
Object Classification
Object Recognition: Speech
Object Recognition: Facial
Game Playing
Language Translation
Healthcare
Source: http://bit.ly/2BXF5sR
Faking videos
Source: https://youtu.be/dkoi7sZvWiU Source: https://youtu.be/JzgOfISLNjk
Turning a horse video into a zebra video
in real time using GANs
AI-generated "real fake" video of
Barack Obama
Self-driving cars
Source: https://youtu.be/URmxzxYlmtg?t=6m30s
Turning the day into night
Source: https://youtu.be/N7KbfWodXJE
What are artificial neural networks?
Source: http://bit.ly/2jc9mYw
Conceptually, what are artificial neural networks?
• A neuron receives a signal, processes it, and propagates the signal (or not)
• The brain is composed of around 100 billion neurons, each connected to ~10k other neurons: ~10¹⁵ synaptic connections
• ANNs are a simplistic imitation of a brain, composed of a dense net of simple structures
Source: http://bit.ly/2jjGlNd
Conceptual mathematical model
• Receives input from sources
• Computes weighted sum
• Passes through an activation function
• Sends the signal to succeeding
neurons
[Diagram: n inputs x1 … xn with weights w1 … wn feed into a weighted sum ∑, which passes through an activation function f(∑) to produce m outputs o1 … om]
h1 = x1w1 + x2w2 + … + xnwn
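As a sketch of this conceptual model (not code from the deck), a single artificial neuron fits in a few lines of Python; the step activation here is an assumption, standing in for whichever f is chosen:

```python
# A minimal sketch of the conceptual neuron above (illustrative, not from the slides).
def neuron(inputs, weights, activation):
    # Weighted sum: h = x1*w1 + x2*w2 + ... + xn*wn
    h = sum(x * w for x, w in zip(inputs, weights))
    # Pass the sum through the activation function f
    return activation(h)

# Example with a simple step activation (an assumption for illustration)
step = lambda h: 1 if h > 0 else 0
print(neuron([1, 1, 0], [0.3, 0.6, 0.1], step))  # -> 1
```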
“Parallels”
A single neuron in the brain is an incredibly complex machine that even today we
don’t understand. A single “neuron” in a neural network is an incredibly simple
mathematical function that captures a minuscule fraction of the complexity of a
biological neuron. So to say neural networks mimic the brain, that is true at the
level of loose inspiration, but really artificial neural networks are nothing like
what the biological brain does.
Andrew Ng
Medium Article: Google Brain’s Co-inventor Tells
Why He’s Building Chinese Neural Networks
Artificial Neural Network
• Organized into layers of neurons, as a black-box model
• Typically 3 or more layers: input, hidden, and output
• For image detectors,
• the first layers are organized similarly to the visual cortex,
• but the activation functions of ANNs are much simpler than those of biological neural networks
How perceptive of you…
or
The Immense Power of
Simple Structures
Perceptron
Simplified (binary) artificial neuron
[Diagram: inputs x1, x2, x3 → output?]
x1 → Is the weather good?
x2 → Is the powder good?
x3 → Am I in the mood to drive?
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron with weights
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output?]

output = 0 if ∑j wjxj ≤ threshold
output = 1 if ∑j wjxj > threshold
Perceptron
Simplified (binary) artificial neuron; no weights
[Diagram: three inputs feeding a single output]
threshold = 1
∑j xj = 1 + 1 + 0 = 2 > threshold → output = 1
x1 = 1 (good weather)
x2 = 1 (a lot of powder)
x3 = 0 (driving sucks)
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron; add weights
Persona: Après-Ski’er
x1 = 1 (good weather), w1 = 3
x2 = 1 (a lot of powder), w2 = 1
x3 = 0 (driving sucks), w3 = 5
threshold = 5
∑j wjxj = 3x1 + 1x1 + 5x0 = 4 ≤ threshold → output = 0
Do I snowboard this weekend?
Perceptron
Simplified (binary) artificial neuron; add weights
Persona: Shredder
x1 = 1 (good weather), w1 = 2
x2 = 1 (a lot of powder), w2 = 6
x3 = 0 (driving sucks), w3 = 1
threshold = 5
∑j wjxj = 2x1 + 6x1 + 1x0 = 8 > threshold → output = 1
Do I snowboard this weekend?
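To make the two personas concrete, here is a small Python sketch of the threshold perceptron above (illustrative; the weights and threshold come straight from the slides):

```python
# Threshold perceptron from the slides (a sketch, not production code).
def perceptron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

inputs = [1, 1, 0]  # good weather, a lot of powder, driving sucks

print(perceptron(inputs, [3, 1, 5], threshold=5))  # Après-Ski’er -> 0 (stays home)
print(perceptron(inputs, [2, 6, 1], threshold=5))  # Shredder -> 1 (goes snowboarding)
```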
Introducing Bias
The perceptron also needs to take the bias into account:

output = 0 if w·x + b ≤ 0
output = 1 if w·x + b > 0

where b (equivalent to −threshold) expresses how easy it is to get the perceptron to fire,
e.g. Shredder has a strong positive bias to go to Whistler,
while the Après-Ski’er’s bias is not as strong.
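A sketch of the same decision in bias form (the b values are simply the negated threshold from the examples above):

```python
# Bias form of the perceptron: fire when w·x + b > 0 (sketch).
def perceptron_with_bias(inputs, weights, b):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) + b > 0 else 0

inputs = [1, 1, 0]
print(perceptron_with_bias(inputs, [3, 1, 5], b=-5))  # Après-Ski’er -> 0
print(perceptron_with_bias(inputs, [2, 6, 1], b=-5))  # Shredder -> 1
```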
Sigmoid Neuron
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output = σ(wx + b)]
• The more common artificial neuron
• Instead of the binary output {0, 1}, the output is now a continuous value in (0…1)
• The output is defined by σ(wx + b)
Sigmoid Neuron
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 → output = σ(wx + b)]
Persona: Shredder
x1 = 1 (good weather), w1 = 0.3
x2 = 1 (a lot of powder), w2 = 0.6
x3 = 0 (driving sucks), w3 = 0.1
output = σ(1x0.3 + 1x0.6 + 0x0.1)
       = σ(0.9)
       = 0.7109
Do I snowboard this weekend?
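A quick check of that number in Python (a sketch; σ is the logistic function 1/(1 + e^(−z)), and the bias is taken as 0 since the slide omits it):

```python
import math

# Logistic sigmoid: squashes any real number into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Shredder's sigmoid neuron, with bias b = 0 (as implied by the slide)
z = 1 * 0.3 + 1 * 0.6 + 0 * 0.1
print(sigmoid(z))  # ≈ 0.7109 — "probably go snowboarding"
```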
Simplified Two-Layer ANN
Input Hidden Output
x1 → Après-Ski’er
x2 → Shredder
h1 → weather
h2 → powder
h3 → driving
Do I snowboard this weekend?
Simplified Two-Layer ANN
[Diagram: inputs x1 = 1, x2 = 1; weights into the hidden layer: 0.8, 0.2, 0.7 (from x1) and 0.6, 0.9, 0.1 (from x2); hidden outputs 0.8, 0.75, 0.69]
h1 = σ(1x0.8 + 1x0.6) = 0.80
h2 = σ(1x0.2 + 1x0.9) = 0.75
h3 = σ(1x0.7 + 1x0.1) = 0.69
Simplified Two-Layer ANN
[Diagram: the same network, with hidden outputs h1 = 0.8, h2 = 0.75, h3 = 0.69 and output-layer weights 0.2, 0.8, 0.5]
out = σ(0.2x0.8 + 0.8x0.75 + 0.5x0.69)
    = σ(1.105)
    = 0.75
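The whole forward pass fits in a few lines of NumPy (a sketch reproducing the slide's numbers; biases are taken as zero since the slides omit them):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 1.0])                  # inputs: Après-Ski’er, Shredder
W_hidden = np.array([[0.8, 0.2, 0.7],     # weights from x1 to h1, h2, h3
                     [0.6, 0.9, 0.1]])    # weights from x2 to h1, h2, h3
w_out = np.array([0.2, 0.8, 0.5])         # weights from h1, h2, h3 to the output

h = sigmoid(x @ W_hidden)                 # hidden layer: ≈ [0.80, 0.75, 0.69]
out = sigmoid(h @ w_out)                  # output: ≈ 0.75
print(h, out)
```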
The Immense Power of Simple Structures
Great Resources
• Andrej Karpathy’s CS231N Neural Networks
• Steve Miller’s How to build a neural network
• Michael Nielsen’s Neural Networks and Deep Learning, Chapter 1
Optimization Primer
Cost function
Source: https://bit.ly/2IoAGzL
For this linear regression example, y = x · p, to determine the best p (the slope of the line) we can calculate a cost function, such as mean squared error, mean absolute error, mean bias error, or SVM loss.
For this example, we'll use the sum of squared absolute differences:

cost = ∑ |t − y|²
Visualize this cost function
Source: https://bit.ly/2IoAGzL
Calculate its derivative
Source: https://bit.ly/2IoAGzL
Gradient Descent
Source: https://bit.ly/2IoAGzL
Gradient Descent
The goal is to find the lowest point of the cost function (i.e. the minimum, where the cost is smallest).
Gradient descent iteratively (i.e. step by step) descends the cost function curve to find that minimum.
Source: https://bit.ly/2IoAGzL
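A sketch of gradient descent for this example (assumed details: the data t, the learning rate, and the iteration count are illustrative; with cost = ∑(t − xp)², the gradient with respect to p is ∑ −2x(t − xp)):

```python
import numpy as np

# Toy data generated from t = 3x, so the best slope is p = 3
# (an assumption for illustration, not data from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0])
t = 3.0 * x

p = 0.0       # initial guess for the slope
lr = 0.01     # learning rate (step size)
for _ in range(100):
    y = x * p                         # predictions
    grad = np.sum(-2 * x * (t - y))   # d(cost)/dp for cost = sum((t - y)^2)
    p -= lr * grad                    # step downhill along the gradient
print(p)  # converges toward 3.0
```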
Gradient Descent Optimization
Source: https://bit.ly/2IoAGzL
Backpropagation
Input Hidden Output
[Diagram: the two-layer network from the earlier example, producing output 0.75]
Backpropagation
Input Hidden Output
[Diagram: the same network, now with a target output of 0.85; the error 0.85 − 0.75 = 0.10 is propagated back through the layers]
• Backpropagation: calculates the gradient of the cost function in a neural network
• Used by gradient descent optimization algorithms to adjust the weights of the neurons
• Also known as backward propagation of errors, as the error is calculated and distributed back through the layers of the network
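As a sketch of one backpropagation step on the output neuron above (assumptions not in the slides: a squared-error cost ½(t − out)², the sigmoid derivative σ′(z) = σ(z)(1 − σ(z)), and an illustrative learning rate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.array([0.80, 0.75, 0.69])   # hidden activations from the earlier slide
w = np.array([0.2, 0.8, 0.5])      # output-layer weights
t = 0.85                           # target output

z = h @ w                          # pre-activation, ≈ 1.105
out = sigmoid(z)                   # ≈ 0.75

# Chain rule: dC/dw = dC/dout * dout/dz * dz/dw
dC_dout = -(t - out)               # from C = 0.5 * (t - out)^2
dout_dz = out * (1 - out)          # sigmoid derivative
grad_w = dC_dout * dout_dz * h     # gradient for each output weight

w -= 0.5 * grad_w                  # one gradient-descent step (lr = 0.5, illustrative)
print(grad_w, w)
```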
Neurons … Activate!
Activation functions
• The bias (threshold) activation function was proposed first
• Sigmoid and tanh introduce non-linearity with different codomains
• ReLU is one of the more popular ones because it's simple to compute and very robust to noisy inputs
Source: http://bit.ly/2tKhbMS
Source: https://bit.ly/2wv3kcd
Sigmoid function
• The sigmoid non-linearity squashes real numbers into the range [0, 1]
• Historically it had a nice interpretation as a neuron's firing rate (i.e. from not firing at all to fully-saturated firing)
• Currently it is not used as much, because inputs with large magnitude saturate the output very close to 0 or 1, where the gradient is nearly 0, stopping backpropagation
Source: http://bit.ly/2GgMbGW
Sigmoid function (continued)
Output is not zero-centered: during gradient descent, if the values flowing into a neuron are always positive, then during backpropagation the gradients on its weights will be either all positive or all negative, creating zig-zagging dynamics in the weight updates.
Source: https://bit.ly/2IoAGzL
Tanh function
• The tanh function squashes real numbers into the range [-1, 1]
• It has the same problem as sigmoid: its activations saturate, killing gradients
• But it is zero-centered, minimizing the zig-zagging dynamics during gradient descent
• Currently preferred over the sigmoid nonlinearity
Source: http://bit.ly/2C4y89z
ReLU: Rectified Linear Unit
Source: http://bit.ly/2EwRnds
ReLU's activation is simply thresholded at zero: f(x) = max(0, x)
• Quite popular over the last few years
• Speeds up Stochastic Gradient Descent (SGD) convergence
• Easier to implement due to its simpler mathematical form
• Sensitive to high learning rates during training, which can result in “dead” neurons (i.e. neurons that will never activate again across the entire dataset)
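For reference, the three activations side by side in Python (a sketch using NumPy versions of the standard definitions):

```python
import numpy as np

def sigmoid(z):          # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):             # squashes to (-1, 1), zero-centered
    return np.tanh(z)

def relu(z):             # thresholds at zero: max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # ≈ [0.119 0.378 0.5   0.622 0.881]
print(tanh(z))     # ≈ [-0.964 -0.462 0.    0.462 0.964]
print(relu(z))     #   [0.  0.  0.  0.5 2. ]
```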
Neurons … Activate!
DEMO
What is TensorFlow
• Open source software library for
numerical computation using data
flow graphs
• Originally developed for machine learning and deep neural network research
• Built in C++ with a Python interface
• Quickly gained high popularity
• Supports GPU computations
What is Keras
• Python Deep Learning Library
• High level neural networks API
• Runs on top of TensorFlow, CNTK, or
Theano
• Simple and easy to use, making it well suited for rapid prototyping
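Since the series uses Keras, here is a minimal sketch of the earlier two-layer network expressed in Keras (illustrative only; the layer sizes match the snowboarding example, and the training data is left out):

```python
# A minimal Keras sketch of the earlier two-layer network (illustrative only).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, activation='sigmoid', input_dim=2))  # hidden layer: h1, h2, h3
model.add(Dense(1, activation='sigmoid'))               # output: "do I snowboard?"

model.compile(optimizer='sgd', loss='mean_squared_error')
model.summary()
# model.fit(X, y, epochs=...) would train it, given inputs X and labels y
```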
I’d like to thank…
Great References
• Machine Learning 101
• Andrej Karpathy’s ConvNetJS MNIST Demo
• What is back propagation in neural networks?
• CS231n: Convolutional Neural Networks for Visual Recognition
• Syllabus and Slides | Course Notes | YouTube
• With particular focus on CS231n Lecture 7: Convolutional Neural Networks
• Neural Networks and Deep Learning
• TensorFlow
Great References
• Deep Visualization Toolbox
• What's the difference between gradient descent and stochastic gradient
descent?
• Back Propagation with TensorFlow
• TensorFrames: Google TensorFlow with Apache Spark
• Integrating deep learning libraries with Apache Spark
• Build, Scale, and Deploy Deep Learning Pipelines with Ease
Attribution
Tomek Drabas
Brooke Wenig
Timothee Hunter
Cyrielle Simeone
Q&A
What’s next?
Training your Neural Networks
October 9, 2018 | 09:00 PDT
https://dbricks.co/2pugWSC
