Deep Learning Crash Course
By: Vishwas Narayan
Deep Learning is Everywhere
What you should learn is to
● Build groundbreaking, human-like intelligence through deep learning
● Build neural networks with your own approach, and have them make sustainable decisions
● Understand, and contribute your own approach to, training a deep learning model effectively
What is the difference between AI and the rest?
Compare, for example:
1. Deep learning with ML and AI
2. Machine learning with AI
Whichever label applies, never forget: somehow you train on data to get the model.
What will you learn?
● Loss Function and Optimizers
● Gradient Descent Algorithm
● Neural Network Architecture
What is Deep Learning?
Microsoft Word - Turing Test.doc (umbc.edu)
Machine learning is basically
teaching a machine to learn patterns in data.
What is Deep Learning?
A machine learning technique that learns features and tasks directly from data.
Inputs are run through “Neural Networks”.
Neural networks have hidden layers.
Why Deep Learning?
● Machines never get fatigued.
● They need to be trained with human intelligence.
● They just extract patterns.
In Deep Learning
Features can be learnt from raw data.
What do they really mean to us?
[Diagram: a neuron as a black box, taking inputs X and Y and producing some functional output that is inspired by the brain.]
[Diagram: traditionally, DATA + Algorithm -> Output; in machine learning, DATA + Output -> Model, and the trained Model then produces Predictions.]
[Diagram: DATA -> Model -> Output -> Insight -> Intent.]
Why do we need it now?
● Data is prevalent
● Improved hardware architectures
● New software architectures
Neural Networks
Inspired by the neurons in the brain.
The building block of a neural network is the neuron.
Neural networks
● Take data as input
● Train themselves to understand the patterns in the data
A simple Neural Network
The learning process of a neural network consists of
1. Forward propagation
2. Back propagation
Forward Propagation
Weights and Biases
● Weights - how much importance the network attaches to each piece of incoming information
● Biases - shift the activation so that the right decision can be taken into consideration
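To make the roles of weights and biases concrete, here is a minimal sketch of a single neuron's forward pass in Python with NumPy. It is not from the slides; the sigmoid activation and all the numbers are illustrative assumptions.

```python
import numpy as np

def neuron_forward(x, w, b):
    """Forward pass of one neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b             # weights scale each input's importance; bias shifts the sum
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation squashes the result into (0, 1)

# Example with two inputs and made-up weights/bias
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.1])
b = 0.3
print(neuron_forward(x, w, b))
```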
Back Propagation
A feedback loop.
In backpropagation,
the loss function helps the neural network quantify the deviation from the expected output.
Randomly initialize the parameters.
Back Propagation
● Uses the loss function
● Goes backwards and self-tunes the initial weights and biases
● Values are adjusted to better fit the predictions of the model being trained on the data
Learning Algorithm for the Neural Network
● Initialize the parameters with starting values (random, or tuned and calculated).
● Feed input data to the network.
● Compare the predicted value with the expected value and calculate the loss.
● Perform backpropagation to propagate this loss back through the network.
● Update the parameters based on the loss.
● Iterate the previous steps until the loss is minimized.
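Putting those steps together, here is a minimal sketch of the full learning loop for the simplest possible "network" (a single linear neuron) on a toy problem. The data, learning rate, and step count are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target relationship is y = 2x + 1 (hypothetical example)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X + 1

# 1. Initialize the parameters (random weight, zero bias)
w = rng.normal(size=(1, 1))
b = np.zeros((1,))

lr = 0.1
for step in range(200):
    # 2. Feed input data through the network (forward pass)
    y_pred = X @ w + b
    # 3. Compare prediction with the expected output: mean squared error loss
    loss = np.mean((y_pred - y) ** 2)
    # 4. Backpropagation: gradients of the loss w.r.t. w and b
    grad_y = 2 * (y_pred - y) / len(X)
    grad_w = X.T @ grad_y
    grad_b = grad_y.sum(axis=0)
    # 5. Update the parameters based on the loss
    w -= lr * grad_w
    b -= lr * grad_b
    # 6. Iterate until the loss is minimized

print(f"loss={loss:.5f}, w={w.ravel()}, b={b}")  # w approaches 2, b approaches 1
```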
Terms used in Neural Networks
Activation Function
● Helps decide whether a neuron drops out or contributes to the next layer, based on the dataset the network is trained on.
● Introduces non-linearity into the neural network.
Which Activation Function to use?
● For binary classification, sigmoid or ReLU usually gives the best results.
● In the case of classifiers, sigmoid functions and their combinations often perform better.
● Because of the vanishing gradient issue, sigmoid and tanh functions are sometimes avoided.
● The ReLU function is a generic activation function that is employed in the majority of applications these days.
Activation Function Conditions
● If we have dead neurons in our network, the leaky ReLU function is the best option.
● Remember that the ReLU function should only be used in the hidden layers.
● As a general guideline, start with the ReLU function and move on to other activation functions if ReLU does not produce the best results. (See the sketch below.)
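For reference, a minimal NumPy sketch of the activation functions discussed above; the alpha value for leaky ReLU is a common default, not a rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # A small slope alpha for negative inputs keeps "dead" neurons trainable
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, fn(z))
```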
Data is king here (and queen too)
No matter what, you are training a model on whatever dataset is available to you; the neural network becomes only as good as that data.
Loss Function
We know that the neural network starts from random weights and biases when making decisions; its output is compared against the expected output.
Loss functions thus quantify the deviation of the neural network's predicted output from the expected output.
The loss functions used in regression are
● Absolute error loss
● Huber loss
● Squared error loss
The loss functions used in binary classification are
● Binary cross-entropy
● Hinge loss
Multi-class classification loss functions:
● Multi-class cross-entropy loss
● KL (Kullback–Leibler) divergence
(A sketch of two of these follows.)
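As a sketch of how two of these losses quantify deviation, here are squared error loss (regression) and binary cross-entropy (binary classification) in NumPy; the example labels and predictions are made up for illustration.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    """Mean squared error, a standard regression loss."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; clipping with eps avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.6])
print(squared_error_loss(y_true, y_pred))
print(binary_cross_entropy(y_true, y_pred))
```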
Optimizers
During the training process we adjust the parameters to minimize the loss function and make our model as well optimized as possible for its use.
Optimizers are basically
functions that tie the loss function to the model parameters, updating the neural network based on the output of the loss function.
Gradient Descent
An iterative algorithm that starts at a random point on the loss function and travels down its slope in steps (sized by a user-supplied learning rate) until it reaches the lowest point of the function.
Again, how well it works depends on the data, but gradient descent is
● The most popular optimizer
● Fast, robust, and flexible
The algorithm in layman's terms
1. Calculate what a small change in each individual weight would do to the loss function
2. Adjust each parameter based on its gradient (derivative)
3. Repeat steps one and two until the neural network reaches a lower loss
To avoid getting stuck in a local minimum, we use the Learning Rate
● Usually a small number by which the gradients are scaled, so that any changes made to the weights are quite small.
● If the learning rate makes the steps too large, the algorithm will tend to overshoot the global minimum.
● At the same time, we don't want the algorithm to take forever to train and converge to the global minimum. (A sketch follows.)
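A minimal sketch of gradient descent with a learning rate on a toy one-dimensional loss; the function and all constants are illustrative assumptions.

```python
import numpy as np

def loss(w):
    return (w - 3.0) ** 2      # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # derivative of the toy loss

w = np.random.default_rng(0).normal()  # start at a random point
lr = 0.1                               # learning rate, set by the user
for step in range(50):
    w -= lr * grad(w)                  # step down the slope, scaled by lr
print(f"w ≈ {w:.4f}, loss ≈ {loss(w):.6f}")
```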
Stochastic Gradient Descent (SGD): why is it more robust?
● Like gradient descent, except it uses a subset of the training examples rather than the entire lot.
● SGD is gradient descent that runs on a batch at each training step.
● Momentum can be used to accumulate gradients.
● Computation is less intensive because the data are batched. (See the sketch below.)
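A minimal sketch of mini-batch SGD with momentum on the same kind of toy problem as before; the batch size, learning rate, and momentum values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = 2 * X + 1                          # same hypothetical target as earlier

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32
velocity_w, velocity_b, momentum = 0.0, 0.0, 0.9

for epoch in range(20):
    idx = rng.permutation(len(X))      # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        err = (xb * w + b) - yb        # gradients computed on the mini-batch only
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        # Momentum accumulates past gradients into a velocity term
        velocity_w = momentum * velocity_w - lr * grad_w
        velocity_b = momentum * velocity_b - lr * grad_b
        w += velocity_w
        b += velocity_b

print(w, b)  # approaches w = 2, b = 1
```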
Backpropagation
In essence, the application of gradient descent to the neural network.
AdaGrad
● Adapts the learning rate to individual features.
● Different weights can have different learning rates.
● Ideal for sparse datasets with many missing input examples.
● The learning rate tends to decay accordingly over time. (See the sketch below.)
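A sketch of the AdaGrad update rule: each parameter accumulates its own squared gradients, so frequently updated parameters see their effective learning rate shrink. The values here are illustrative.

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.01, eps=1e-8):
    """One AdaGrad step: per-parameter learning rates that shrink
    as squared gradients accumulate in `cache`."""
    cache += grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([0.5, -0.3])
cache = np.zeros_like(w)        # running sum of squared gradients
grad = np.array([0.2, -0.1])    # illustrative gradient values
w, cache = adagrad_update(w, grad, cache)
print(w, cache)
```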
Parameters and Hyperparameters
What are model parameters?
● Variables of the neural network whose values can be estimated from the data.
● Required by the model to make predictions.
● Their values define what has been learnt from the data.
● Not set manually; saved as the neural network is trained.
Example - weights and biases.
What are model hyperparameters?
● They are configured externally to the neural network: their values cannot be estimated by training on the dataset.
● There is no clear way to find the best value.
● When a DL algorithm is tuned, you are really tuning the hyperparameters.
● They are tuned manually.
Example - learning rate; C and alpha in SVMs; epochs; k in k-Nearest Neighbours.
Summary
Model parameters -> estimated from the data
Model hyperparameters -> cannot be estimated from the data
Hyperparameters are often simply called parameters, as they are the part of machine learning that must be set manually and tuned.
Epochs, Batches, Batch Size and Iterations
You need these when the dataset is too big to feed to your neural network in one go.
Break the dataset into smaller chunks and feed those chunks to the neural network one by one.
Epochs
One epoch is when the entire dataset is passed through the neural network exactly once.
We use more than one epoch to help the model generalize better and become more accurate.
There is no absolute right number; it differs from dataset to dataset.
Batch and Batch Size
We divide a large dataset into smaller batches and feed those batches to the neural network.
Batch size - the number of training examples in a single batch.
Iterations
The number of batches needed to complete one epoch.
Number of batches = number of iterations in one epoch.
Let's have some more insight
Suppose we have 1 million training examples and you divide the dataset into batches of 500; completing 1 epoch would then take 2,000 iterations.
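A quick check of that arithmetic:

```python
dataset_size = 1_000_000                          # training examples
batch_size = 500
iterations_per_epoch = dataset_size // batch_size # batches needed for one epoch
print(iterations_per_epoch)                       # 2000
```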
Conclusion for the terms used in NNs
How do you design an architecture? Which activation function do you use?
The only way to settle these is to experiment with your data.
Types of Learning
There are three main types:
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
Supervised Learning
● Algorithms designed to learn from examples.
● Models are trained on well-labelled data.
● Each example has:
● An input object - typically a vector
● A desired output value - the supervisory signal
During training
The model searches for patterns that correlate with the desired output.
After training
It takes unseen inputs and determines which label to classify them as.
The objective of a supervised learning model
is to predict the correct label for unseen data.
Supervised learning is of two types
● Classification
● Regression
Classification
● Takes input data and assigns it to a class/category.
● The model finds features in the data that correlate with the class and creates a mapping function.
● This mapping function is then used to classify unseen data from the testing set and the validation set during cross-validation.
Binary and Multiclass classification
Definition and Example
Popular Classification Algorithms
● Logistic Regression
● Naïve Bayes
● Stochastic Gradient Descent
● K-Nearest Neighbours
● Decision Tree
● Random Forest
● Support Vector Machine
Regression
The model tries to find a relationship between dependent and independent variables.
The goal is always to predict continuous values, such as a test score.
The fitted equation is always continuous.
Simple Linear Regression
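As a quick illustration of simple linear regression, here is a least-squares fit in NumPy; the hours-studied-versus-score data is hypothetical.

```python
import numpy as np

# Hypothetical data: hours studied vs. test score
hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([52, 58, 63, 71, 74], dtype=float)

# Least-squares fit of score = slope * hours + intercept
slope, intercept = np.polyfit(hours, score, deg=1)
print(f"score ≈ {slope:.2f} * hours + {intercept:.2f}")
print("predicted score for 6 hours:", slope * 6 + intercept)
```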
Different Regression Algorithms
● Linear Regression
● Logistic Regression
● Ridge Regression
● Lasso Regression
● Polynomial Regression
● Bayesian Linear Regression
Applications of Supervised Learning
● Text categorization
● Face Detection
● Signature recognition
● Customer discovery
● Spam detection
● Weather forecasting
● Predicting housing prices based on the prevailing market price
● Stock price predictions, among others
Unsupervised Learning
● Used to surface underlying patterns in data
● Used in exploratory data analysis
● Needs no labelled data; it works from the features of the data alone
Unsupervised learning comprises
● Clustering
● Association
Clustering - Partitional Clustering
● Each data point can belong to only a single cluster
Clustering - Hierarchical Clustering
● Clusters within clusters
● A data point may belong to several clusters
Association
Attempts to find relationships between different entities.
Example - market basket analysis
Some Clustering Algorithms
1. Affinity Propagation
2. Agglomerative Clustering
3. BIRCH
4. DBSCAN
5. K-Means
6. Mini-Batch K-Means
7. Mean Shift
8. OPTICS
9. Spectral Clustering
10. Gaussian Mixture Model
(A K-Means sketch follows.)
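As one example from this list, a minimal K-Means sketch, assuming scikit-learn is installed; the two blobs of points are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two hypothetical blobs of 2-D points, centred near (0, 0) and (3, 3)
points = np.vstack([rng.normal(0, 0.5, (50, 2)),
                    rng.normal(3, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)  # one centre near each blob
print(kmeans.labels_[:5])       # cluster assignment per point
```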
Applications of Unsupervised Learning
● Fraud detection
● Malware detection
● Identification of human errors during data entry
● Conducting accurate basket analysis, etc.
Reinforcement Learning
Enables an intelligent agent to learn in an interactive environment by trial and error (via a policy and reward network), based on its own actions and experience.
This is a comparatively new way of getting things learnt.
If your neural network does not learn well with the other approaches, reinforcement learning may be the one to use.
Reward and punishment are the key here
Positive and negative signals are used as behavioural feedback to judge what has been learnt.
The goal of reinforcement learning is to
● Find a suitable model that maximizes the total cumulative reward, producing results that support better decision making.
● Maximize the points won over many training examples.
● Penalize the agent when it makes wrong decisions.
● Reward the agent when it makes right decisions.
Usually modelled as a “Markov Decision Process”
[Diagram: the agent takes an Action in the environment, and receives a Penalty/Reward together with the Next State.]
Applications of Reinforcement Learning
● Robotics
● Business strategy
● Traffic light control
● Web system configuration
● NLP
○ personalizing suggestions
○ delivering more meaningful notifications to users
○ optimizing video streaming quality
● Gaming
● Bidding
Some Core and Canonical Problems in Deep Learning
Basically, the situation we find is this:
The model should perform well on both the training data and new test data.
The most common problem faced will always be overfitting.
On the data points we get as examples:
● Data is skewed
● Data is random
● The data does not care about anything or anybody; it is simply generated
● We collect it to make sense of it and build a model
● We collect it so the model can make the right decisions
[Diagram: example fits illustrating underfitting versus overfitting.]
Ways of tackling overfitting:
1. Hold-out
2. Cross-validation
3. Data augmentation
4. Feature selection
5. L1/L2 regularization
6. Removing layers / reducing the number of units per layer
7. Dropout
8. Early stopping
Data Augmentation
Create as much additional synthetic data as possible from the data itself.
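A minimal, framework-agnostic sketch of the idea in NumPy: derive extra training examples from an existing one with flips, noise, and shifts. The 28x28 "image" is a stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0, 1, size=(28, 28))   # stand-in for a real training image

flipped = np.fliplr(image)                 # horizontal flip
noisy = np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1)  # mild noise
shifted = np.roll(image, shift=2, axis=1)  # small horizontal shift

augmented_batch = np.stack([image, flipped, noisy, shifted])
print(augmented_batch.shape)               # (4, 28, 28): four examples from one
```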
Early Stopping
Use Early Stopping to Halt the Training of Neural Networks At the Right Time (machinelearningmastery.com)
When do you need to do this?
When the training error decreases steadily but the validation error starts to increase after a certain point.
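The linked article uses Keras; as a sketch along those lines (assuming TensorFlow is installed, with toy data standing in for a real dataset), the EarlyStopping callback watches the validation loss and halts training when it stops improving.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X.sum(axis=1) > 0).astype("float32")       # toy binary labels

model = Sequential([Dense(16, activation="relu", input_shape=(10,)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = EarlyStopping(monitor="val_loss",   # watch the validation error
                           patience=5,           # tolerate 5 epochs without improvement
                           restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```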
Neural networks come in plenty of varieties
Refer to the sources I mention in this stream.
We have talked a lot about models
Now let's get to know how we can build one.
Gathering Data
Picking the right data is very important. A good way to start is to make assumptions about the data that you need.
The size of the dataset also matters
No one size fits all.
A rough rule: amount of data needed = 10 times the number of model parameters.
The quality of the data also matters
Data has to be accurate and reliable, with no adversarial examples.
Features should be as noise-free as possible.
Some Dataset Repositories
I will list them out in the stream.
Pre-processing the Dataset
Split the dataset into subsets:
● Training dataset
● Testing dataset
● Validation dataset
We can split the dataset randomly.
How you split depends on
● The number of samples in the data
● The model being trained
A simple rule of thumb
● Few hyperparameters means a small validation set will do
● Many hyperparameters means you want a large validation set
The ratio in which you split the dataset is specific to your use case.
[Diagram: the dataset is split into Train and Test portions; the training portion is further divided into folds, with one fold held out as the Validation set in each round.]
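A minimal splitting sketch, assuming scikit-learn is available; the 60/20/20 ratio is only an example, since the right ratio is use-case specific.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# First carve out 20% for testing, then 25% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 / 20 / 20
```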
Look for missing data
● NaN or null values
● Either eliminate the affected features or rows, or
● Impute the missing data (see the sketch below)
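A minimal imputation sketch with pandas; the column names and fill strategies (median, mean) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 33],
                   "income": [50_000, 62_000, np.nan, 58_000]})

print(df.isna().sum())                                    # count NaN/null per column
df["age"] = df["age"].fillna(df["age"].median())          # impute with the median
df["income"] = df["income"].fillna(df["income"].mean())   # impute with the mean
print(df)
```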
Sampling
Use a sample of the dataset.
Why we need this:
● Faster convergence
● Less disk space
Preprocessing is also required for feature scaling
A crucial step before model training (see the sketch below):
● Normalization
● Standardization
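A minimal sketch of both scaling schemes in NumPy, on a made-up feature.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # one raw feature

# Normalization (min-max scaling): rescale to the range [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit standard deviation
x_std = (x - x.mean()) / x.std()

print(x_norm)
print(x_std)
```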
Then, obviously, train and evaluate.
Optimization
To be continued ...

Editor's Notes
● What is Deep Learning?: Microsoft Word - Turing Test.doc (umbc.edu)
● Which Activation Function to use?: Activation Functions In Neural Network | by Gaurav Rajpal | Analytics Vidhya | Medium
● Gradient Descent: Calculating Gradient Descent Manually | by Chi-Feng Wang | Towards Data Science