MACHINE LEARNING
TOWARDS DATA SCIENCE
#BASICS OF MACHINE LEARNING
#DATA / MODEL /ALGORITHMS
#TYPES OF MACHINE LEARNING
SUPERVISED ML
UNSUPERVISED ML
REINFORCEMENT ML
#MACHINE LEARNING ALGORITHMS
#APPLYING ALGORITHMS
MACHINE LEARNING
WHY NOW?
INTEROPERABILITY
CONVERGENCE OF TECHNOLOGIES
EXPLOSION OF BIG DATA
ADVANCES IN MACHINE LEARNING
ALGORITHMS
MACHINE LEARNING as a
concept is based on computers
using STATISTICAL LEARNING
and OPTIMIZATION METHODS
to analyse datasets and
identify patterns.
“Statistical Learning is math-intensive and
inferential.”
Unlike traditional programming, in Machine
Learning explicit instructions are not
given. Instead, the computer is provided
with the DATA and TOOLS needed for
studying & solving the problem.
The computer is also given the ability
to REMEMBER what it did so it can
ADAPT, EVOLVE, AND LEARN.
MACHINE LEARNING gives computers
the ability to learn without being
explicitly programmed.
MACHINE LEARNING is the training of
algorithms to accomplish a task.
“An algorithm is a set of rules to be followed for
solving a particular problem.”
MACHINE LEARNING uses the
OUTCOME of an algorithm to
improve FUTURE OUTCOMES
and DECISIONS.
MACHINE LEARNING is learning
from EXAMPLES and EXPERIENCES.
DATA, MODEL & ALGORITHM
MACHINE LEARNING is helpful when
explicit instructions cannot be given
to the computer. Instead, a
programming model is used that
allows the computer to learn.
The PROGRAMMING MODEL is a
Machine Learning algorithm trained
using a smaller chunk of data, a.k.a.
the TRAINING DATA.
The trained Machine Learning
Programming Model is then checked
using a larger chunk of data, a.k.a. the
TEST DATA, to fine-tune the model,
i.e. ADAPT, EVOLVE, AND LEARN.
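The split into TRAINING DATA and TEST DATA can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `train_test_split`, the `test_fraction` parameter and the ten placeholder messages are all assumptions made for the example, and the proportions can be set to whatever the project needs.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-ins for real email messages.
emails = [f"message {i}" for i in range(10)]
train, test = train_test_split(emails, test_fraction=0.3)
print(len(train), len(test))
```

The key point is that no message appears in both chunks, so the TEST DATA really is unseen by the model.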
ILLUSTRATION:
Let's take 10,000 email
messages as our TRAINING DATA
for building & refining our
Programming Model (Algorithm)
before testing it on 100,000
(1 lakh) messages (TEST DATA).
The Machine Learning Programming
Model is exposed to different
examples of spam using the
TRAINING DATA.
The Machine Learning Programming
Model uses a BINARY
CLASSIFICATION algorithm to
split the email into two
groups, spam and regular
mail, by finding groups
of words that are likely to be
found in spam messages.
How does Machine Learning work in a spam program?
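A toy version of this word-based BINARY CLASSIFICATION could look like the sketch below. It is only an illustration under simple assumptions: the four training messages are invented, and the scoring rule (spam count minus regular count per word) is a deliberately naive stand-in for a real algorithm such as Naive Bayes.

```python
from collections import Counter

def train_word_counts(labelled_messages):
    """Count how often each word appears in spam and in regular mail."""
    spam_words, regular_words = Counter(), Counter()
    for text, label_is_spam in labelled_messages:
        target = spam_words if label_is_spam else regular_words
        target.update(text.lower().split())
    return spam_words, regular_words

def is_spam(text, spam_words, regular_words):
    """Label a message as spam when its words lean towards the spam counts."""
    score = 0
    for word in text.lower().split():
        score += spam_words[word] - regular_words[word]
    return score > 0

# Hypothetical labelled TRAINING DATA: (message, is it spam?)
training_data = [
    ("win a free prize now", True),
    ("free money claim your prize", True),
    ("meeting agenda for monday", False),
    ("lunch on monday with the team", False),
]
spam_words, regular_words = train_word_counts(training_data)

print(is_spam("claim your free prize", spam_words, regular_words))        # True
print(is_spam("agenda for the team meeting", spam_words, regular_words))  # False
```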
ILLUSTRATION:
An ML algorithm helps the computer
make accurate predictions or
see patterns between different
parts of the data.
Hyper-parameters of the
algorithm are tweaked until the
machine predicts well
whether or not an
email message is spam.
The tweaked algorithm, once its
predictions are accurate enough,
becomes a DATA MODEL.
Tweaking the hyper-parameters
of a Machine Learning algorithm
requires expertise and
EXTENSIVE TRIAL and ERROR.
How does Machine Learning work in a spam program?
Hyper-parameters are the variables that determine how the algorithm will be trained.
Hyper-parameters are set before training, i.e. before optimizing the weights and biases.
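The trial-and-error tweaking of a hyper-parameter can be sketched as a simple search. Everything here is assumed for illustration: the word list `SPAM_WORDS`, the `threshold` hyper-parameter (how many spam words a message needs before it is flagged) and the four labelled messages are hypothetical.

```python
SPAM_WORDS = {"free", "prize", "win", "money"}  # hypothetical word list

def predict(text, threshold):
    """Predict spam when the message has at least `threshold` spam words."""
    hits = sum(word in SPAM_WORDS for word in text.lower().split())
    return hits >= threshold

def accuracy(threshold, labelled_messages):
    """Fraction of messages whose prediction matches the label."""
    correct = sum(predict(text, threshold) == label
                  for text, label in labelled_messages)
    return correct / len(labelled_messages)

# Hypothetical labelled messages used only for choosing the threshold.
validation_data = [
    ("win a free prize", True),
    ("free money for you", True),
    ("feel free to call me", False),
    ("project update attached", False),
]

# Trial and error: try each candidate value and keep the most accurate one.
candidates = [1, 2, 3]
best_threshold = max(candidates, key=lambda t: accuracy(t, validation_data))
print(best_threshold, accuracy(best_threshold, validation_data))  # 2 1.0
```

With a threshold of 1 the harmless "feel free to call me" is flagged, and with 3 a real spam message slips through, so 2 wins on this tiny data set.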
MACHINE LEARNING helps in
finding patterns, making decisions,
and gaining greater insights.
Effective Machine Learning
requires a lot of data to understand
the problem better and improve the
DATA MODEL.
Companies are using Machine
Learning to better understand
their users.
TYPES OF
MACHINE LEARNING
1. SUPERVISED LEARNING
2. UNSUPERVISED LEARNING
3. REINFORCEMENT LEARNING
SUPERVISED LEARNING:
Supervised learning uses labelled
datasets to train algorithms to
classify data or predict outcomes
accurately.
Supervised learning uses a training
set to teach models to produce the
desired output.
The training dataset includes inputs
and correct outputs, which allow the
model to learn over time.
Supervised learning depends on
labelled data, i.e. examples tagged
with the right answer.
Learning from Tutor (Closer View)
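Learning from labelled examples can be illustrated with k-nearest neighbour, one of the supervised algorithms listed later in this deck. The sketch below is a bare-bones version; the two-number feature vectors and the "spam"/"regular" labels are invented for the example.

```python
from collections import Counter

def knn_predict(training_set, point, k=3):
    """Return the majority label among the k training points nearest to `point`."""
    by_distance = sorted(
        training_set,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], point)),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labelled TRAINING DATA: (features, correct answer)
training_set = [
    ((1.0, 1.0), "regular"),
    ((1.5, 2.0), "regular"),
    ((2.0, 1.0), "regular"),
    ((6.0, 6.0), "spam"),
    ((7.0, 6.5), "spam"),
    ((6.5, 7.0), "spam"),
]

print(knn_predict(training_set, (1.2, 1.4)))  # regular
print(knn_predict(training_set, (6.4, 6.2)))  # spam
```

The labels play the role of the tutor: the model never invents a category, it only copies the answers attached to the closest known examples.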
SUPERVISED LEARNING:
Applications:
1. Image- and Object-Recognition
2. Predictive Analytics
3. Customer Sentiment Analysis
4. Spam detection
UNSUPERVISED LEARNING:
Unsupervised Learning uses machine
learning algorithms to analyse and
CLUSTER UNLABELLED DATASETS for
discovering hidden patterns or data
groupings without the need for
human intervention.
Unsupervised Learning does not
work with labelled data, and the
computer is not shown the correct
answer.
Learning by Observing (Distant View)
UNSUPERVISED LEARNING:
Unsupervised Learning uses
algorithms that let the computer
create connections by studying and
observing the data and come up
with its own observations.
Unsupervised Learning uses
unlabelled data to discover
patterns and solve clustering or
association problems.
Unsupervised Learning is useful
when common properties within a
data set cannot be clearly defined.
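Clustering unlabelled data can be sketched with a one-dimensional k-means. This is a simplified illustration: the function name `k_means_1d`, the starting centroids and the customer spend figures are assumptions, and real data would normally have many features per record.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer, with no labels attached.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[0, 50])
print(centroids)  # [11.0, 100.0]
print(clusters)   # [[10, 12, 11], [95, 100, 105]]
```

Nobody told the algorithm which customers are "low spend" or "high spend"; the two groups emerge from the data alone.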
UNSUPERVISED LEARNING:
Applications:
1. Exploratory Data Analysis
2. Cross-selling Strategies
3. Customer Segmentation, and
4. Image recognition.
REINFORCEMENT LEARNING:
Reinforcement Learning is the
iterative, continuous process of
training machine learning models
to improve outcomes &
decisions. The more rounds of
feedback, the better the algorithm's
performance.
Reinforcement Learning uses a training
method based on REWARDING
desired behaviours and/or
PUNISHING undesired ones.
Learning by Iteration
REINFORCEMENT LEARNING:
In Reinforcement Learning, very clear
goals are defined for the machine to
follow, and it behaves accordingly.
VIDEO GAMES are full of
reinforcement cues. Complete a
level and earn a badge. Defeat the
bad guy in a certain number of
moves and earn a bonus. These cues help
the machine learn how to improve
its performance in the next game.
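The reward-and-punishment loop can be sketched as a tiny trial-and-error learner. This is a simplified illustration (an epsilon-greedy strategy over fixed rewards), not a full reinforcement learning algorithm; the action names, reward values and parameters are all hypothetical.

```python
import random

# Hypothetical game rewards: a penalty, a badge and a bonus.
REWARDS = {"rush": -1.0, "clear_level": 1.0, "beat_boss_fast": 5.0}

def learn_by_iteration(rewards, rounds=200, epsilon=0.1, seed=0):
    """Estimate the value of each action from repeated feedback."""
    rng = random.Random(seed)
    actions = list(rewards)
    value = {action: 0.0 for action in actions}
    count = {action: 0 for action in actions}
    for step in range(rounds):
        if step < len(actions):
            action = actions[step]                # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore occasionally
        else:
            action = max(actions, key=value.get)  # exploit the best so far
        reward = rewards[action]                  # feedback from the game
        count[action] += 1
        # Running average of the rewards seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = learn_by_iteration(REWARDS)
best_action = max(value, key=value.get)
print(best_action)  # beat_boss_fast
```

Each round of feedback updates the estimates, so the learner gradually spends most of its moves on the action that earns the bonus.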
REINFORCEMENT LEARNING:
Reinforcement Learning aims to find the
optimal way to accomplish a
particular goal, or to improve
performance on a specific task, by
learning from past feedback and
exploration.
Applications:
1. Self Driving Cars
2. Industry Automation
3. Healthcare
4. Gaming
https://neptune.ai/blog/reinforcement-learning-applications
REINFORCEMENT LEARNING:
Reinforcement Learning allows
machines to improve quickly without
needing hours of
observing and studying massive
amounts of data.
TYPES OF MACHINE LEARNING (SUMMARIZED)
SUPERVISED
Learning from Tutor
(Closer View)
A knowledgeable tutor is
needed
UNSUPERVISED
Learning by Observing
(Distant View)
Lots of correct Data is
needed
REINFORCEMENT
Learning by Iteration
(Holistic View)
Trial & Error
MACHINE LEARNING ALGORITHMS
SUPERVISED LEARNING
Types of Supervised learning problems:
1. BINARY CLASSIFICATION and
2. REGRESSION
BINARY CLASSIFICATION ALGORITHMS:
1. Decision trees
2. K-nearest neighbour
3. Random forest
4. Naive Bayes
5. Logistic regression
REGRESSION ALGORITHMS:
1. Linear regression
2. Random forest
DEPENDENT on labelled data.
UNSUPERVISED LEARNING
Types of Unsupervised learning problems:
1. CLUSTERING and
2. ASSOCIATION
CLUSTERING ALGORITHMS:
1. K-Means Clustering
2. Hierarchical Clustering
3. Probabilistic Clustering
DIMENSIONALITY REDUCTION ALGORITHMS (a related unsupervised task):
1. Principal Component Analysis
2. Singular Value Decomposition
INDEPENDENT of labelled data.
https://www.sas.com/en_gb/insights/articles/analytics/machine-learning-algorithms.html
https://towardsdatascience.com/types-of-machine-learning-algorithms-you-should-know-953a08248861
APPLYING MACHINE LEARNING ALGORITHMS:
BIAS and VARIANCE describe
how PREDICTIONS differ from
the actual OUTCOME.
BIAS is the gap between the
average predicted value and the
actual outcome.
VARIANCE is the scattering of
predicted values around their
own average.
BIAS and VARIANCE are not
right or wrong answers but
controls that need to be
tweaked to improve
predictions.
BIAS and VARIANCE in DATA MODEL
HIGH BIAS and LOW VARIANCE:
Predictions are consistently wrong.
Predictions are very close to
each other but in the wrong
direction.
HIGH BIAS and HIGH VARIANCE:
Predictions are consistently
wrong in a very inconsistent way.
Predictions are scattered and in
the wrong direction.
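The two cases above can be made concrete with a short calculation. In this sketch, bias is taken as the gap between the average prediction and the actual outcome, and variance as the scatter of the predictions around their own average; the prediction lists and the actual value are invented numbers.

```python
from statistics import mean, pvariance

def bias_and_variance(predictions, actual):
    """Return (bias, variance) for repeated predictions of one actual outcome."""
    bias = mean(predictions) - actual   # gap between average prediction and truth
    variance = pvariance(predictions)   # scatter around the average prediction
    return bias, variance

actual = 50                             # hypothetical true outcome
tight_but_off = [70, 71, 69, 70]        # close together, in the wrong place
scattered_and_off = [40, 100, 60, 80]   # spread out, in the wrong place

print(bias_and_variance(tight_but_off, actual))      # bias 20, variance 0.5
print(bias_and_variance(scattered_and_off, actual))  # bias 20, variance 500
```

Both sets of predictions miss the target by the same amount on average, but only the second is also inconsistent from one prediction to the next.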
APPLYING MACHINE LEARNING ALGORITHMS:
Machine learning starts by
working with a training
dataset.
Training data is a smaller
set of data used for tuning
algorithms.
Tuned algorithms help in
creating a model that works
well for the larger test
dataset.
OVERFITTING and UNDERFITTING of DATA in a MODEL
UNDERFITTING
HIGH BIAS and LOW VARIANCE:
Predictions are consistently
wrong.
OVERFITTING
HIGH VARIANCE (usually with low
bias on the training data):
Predictions on new data are
wrong in a very inconsistent
way.
APPLYING MACHINE LEARNING ALGORITHMS:
OVERFITTING and UNDERFITTING
OVERFITTING
The ML Programming Model works
well with the TRAINING DATASET but is inflexible
when applied to TEST DATA. Such a model is often
complex and difficult to understand.
DATA SCATTERED
UNDERFITTING
The ML Programming Model is too simple to capture
the patterns, so it predicts poorly even on the
TRAINING DATASET.
DATA CLUSTERED IN THE WRONG PLACE
APPLYING MACHINE LEARNING ALGORITHMS:
OVER and UNDERFITTING of DATA in a MODEL
OVERFITTING and UNDERFITTING both reflect a
model that fails to capture the right information to make
accurate predictions on new data.
SIGNAL and NOISE.
SIGNAL is the set of indicators that help in
making accurate predictions, while
NOISE is the variation in the data
that might not offer any insights.
When working with machine learning
algorithms, the trick is to capture as
much of the SIGNAL as possible while not getting
too distracted by the NOISE in the data.
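Signal, noise and the two kinds of misfit can be seen side by side in a small experiment. This is a sketch under stated assumptions: the data is synthetic (signal y = 2x, with noise of +2 or -2 added to the training outcomes only), and the three "models" are deliberately simple stand-ins, namely a constant (too simple), a memoriser that copies the nearest training example (fits the noise) and a least-squares line.

```python
from statistics import mean

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def fit_constant(xs, ys):
    average = mean(ys)
    return lambda x: average                      # ignores x: underfits

def fit_memoriser(xs, ys):
    pairs = list(zip(xs, ys))                     # copies the nearest example
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return lambda x: slope * x + (y_bar - slope * x_bar)

# Synthetic data: signal is y = 2x; training outcomes carry +/-2 noise.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [4, 2, 8, 6, 12, 10]
test_x = [1.4, 2.4, 3.4, 4.4, 5.4]
test_y = [2 * x for x in test_x]

results = {}
for name, fit in [("constant", fit_constant),
                  ("memoriser", fit_memoriser),
                  ("line", fit_line)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, results[name])
```

The memoriser scores a perfect zero error on the training data yet does worse than the line on the test data, while the constant is poor on both. The line captures the signal without chasing the noise.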
SELECTING MACHINE LEARNING ALGORITHMS:
MOST SUITABLE ALGORITHMS – SELECTION CRITERIA
FOR LABELLED DATA USE SUPERVISED LEARNING.
Labelled data helps to understand both the
input and the output. Here the machine doesn't
have to find its own patterns.
FOR UNLABELLED DATA USE UNSUPERVISED
LEARNING.
The machine creates its own clusters and decides
which clusters make the most sense.
FOR MASSIVE AMOUNTS OF UNLABELLED DATA
Use k-means clustering.
FOR A BUNCH OF LABELLED DATA
Use regression, k-nearest neighbour or decision trees.
THANK YOU!
VINOD.KR.SHARMA@GMAIL.COM
HTTPS://WWW.LINKEDIN.COM/IN/CAVINODKRSHARMA/
