Machine Learning (Towards Data Science)

MACHINE LEARNING
#BASICS OF MACHINE LEARNING
#DATA / MODEL / ALGORITHMS
#TYPES OF MACHINE LEARNING
SUPERVISED ML
UNSUPERVISED ML
REINFORCEMENT ML
#MACHINE LEARNING ALGORITHMS
#APPLYING ALGORITHMS

MACHINE LEARNING
WHY NOW?
INTEROPERABILITY
CONVERGENCE OF TECHNOLOGIES
EXPLOSION OF BIG DATA
ADVANCES IN MACHINE LEARNING ALGORITHMS

MACHINE LEARNING as a concept is based on computers using STATISTICAL LEARNING and OPTIMIZATION METHODS to analyse datasets and identify patterns.
“Statistical Learning is math intensive and inferential”
Unlike conventional programming, in Machine Learning explicit instructions are not given. Instead, the computer is provided with the DATA and TOOLS needed to study and solve the problem.
The computer is also given the ability to REMEMBER what it did so it can ADAPT, EVOLVE, AND LEARN.

MACHINE LEARNING gives computers the ability to learn without being explicitly programmed.
MACHINE LEARNING is the training of algorithms to accomplish a task.
“An algorithm is a set of rules to be followed for solving a particular problem.”
MACHINE LEARNING uses the OUTCOME of an algorithm to improve FUTURE OUTCOMES and DECISIONS.
MACHINE LEARNING is learning from EXAMPLES and EXPERIENCES.

DATA, MODEL & ALGORITHM
MACHINE LEARNING is helpful when explicit instructions cannot be given to the computer. Instead, a programming model is used that allows the computer to learn.
The PROGRAMMING MODEL is a Machine Learning algorithm trained using a smaller chunk of data, a.k.a. the TRAINING DATA.
The trained Machine Learning Programming Model is then checked against a larger chunk of data it has not seen before, a.k.a. the TEST DATA, and the results guide further fine-tuning of the model, i.e. ADAPT, EVOLVE, AND LEARN.
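The split into TRAINING DATA and TEST DATA described above can be sketched in a few lines. This is only an illustration: the helper name `train_test_split`, the `seed` parameter and the toy message list are assumptions, and the sizes simply mirror the "smaller chunk / larger chunk" wording of the slide at a reduced scale.

```python
import random

def train_test_split(records, train_size, seed=0):
    """Shuffle a copy of `records` and split it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    return shuffled[:train_size], shuffled[train_size:]

# Hypothetical data: 110 placeholder messages.
messages = [f"message {i}" for i in range(110)]
training_data, test_data = train_test_split(messages, train_size=10)
print(len(training_data), len(test_data))
```

The two lists never share a message, so the test data stays unseen while the model is being trained.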

How does Machine Learning work with a spam program?
ILLUSTRATION:
Let's take 10,000 email messages as our TRAINING DATA for building & refining our Programming Model (Algorithm) before testing it on 100,000 (1 lakh) messages (TEST DATA).
The Machine Learning Programming Model is exposed to different examples of spam using the TRAINING DATA.
The Machine Learning Programming Model uses a BINARY CLASSIFICATION algorithm to split the emails into two groups, Spam and Regular mail, by finding groups of words that are likely to be found in spam messages.
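One way to picture "finding groups of words that are likely to be found in spam" is a tiny word-counting classifier. The sketch below uses Naive Bayes with add-one smoothing, which is one possible binary classification algorithm and not necessarily the one the slide has in mind. The four training messages and the function names `train` and `classify` are made up for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class. `messages` is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "regular": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the label with the higher log-probability score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["regular"])
    total_messages = sum(class_counts.values())
    scores = {}
    for label in ("spam", "regular"):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_messages)
        for word in text.lower().split():
            count = word_counts[label][word] + 1   # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled TRAINING DATA.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "regular"),
    ("please review the project report", "regular"),
]
model = train(training_data)
print(classify("claim your free prize", *model))
print(classify("project meeting on monday", *model))
```

With real mail the same idea needs far more examples, which is why the slide talks about thousands of training messages.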

How does Machine Learning work with a spam program?
ILLUSTRATION:
The ML algorithm helps the computer make accurate predictions or see patterns between different parts of the data.
Hyper-parameters of the algorithm are tweaked until the machine reliably predicts whether or not an email message is spam.
The tweaked algorithm with sufficiently accurate predictions now becomes a DATA MODEL.
Tweaking the hyper-parameters of a Machine Learning algorithm requires expertise and EXTENSIVE TRIAL and ERROR.
Hyper-parameters are the variables that determine how the algorithm will be trained.
Hyper-parameters are set before training, i.e. before optimizing the weights and bias.
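The difference between hyper-parameters (chosen before training) and the weights and bias (learned during training) can be shown with a small gradient-descent example. This is a sketch under assumptions: the model is a single-feature straight line, the toy data follows y = 2x + 1, and the candidate learning rates are arbitrary. For brevity the error is measured on the training data itself, whereas in practice a separate held-out set would normally be used for tuning.

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0            # parameters: learned during training
    n = len(xs)
    for _ in range(epochs):
        errors = [weight * x + bias - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

def mean_squared_error(xs, ys, weight, bias):
    return sum((weight * x + bias - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]         # generated from y = 2x + 1

# Trial and error over one hyper-parameter: the learning rate.
results = {}
for learning_rate in (0.001, 0.01, 0.1):
    weight, bias = train_linear_model(xs, ys, learning_rate, epochs=200)
    results[learning_rate] = mean_squared_error(xs, ys, weight, bias)

best_learning_rate = min(results, key=results.get)
print(best_learning_rate, results[best_learning_rate])
```

Here `learning_rate` and `epochs` are fixed before each training run, while `weight` and `bias` are what the training loop itself adjusts.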

MACHINE LEARNING helps in finding patterns, making decisions, and gaining greater insights.
Effective Machine Learning requires a lot of data to better understand and improve the DATA MODEL.
Companies are using Machine Learning to better understand their users.

TYPES OF MACHINE LEARNING
1. SUPERVISED LEARNING
2. UNSUPERVISED LEARNING
3. REINFORCEMENT LEARNING

SUPERVISED LEARNING:
Learning from Tutor (Closer View)
Supervised learning uses labelled datasets to train algorithms to classify data or predict outcomes accurately.
Supervised learning uses a training set to teach models to produce the desired output.
The training dataset includes inputs and correct outputs, which allow the model to learn over time.
Supervised learning depends on labelled data, i.e. data where the correct answer for each example is already known.
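A training set of "inputs and correct outputs" can be as simple as a list of labelled points. The sketch below classifies a new point by majority vote among its nearest labelled neighbours (k-nearest neighbour). The two features, the six examples and the choice of k = 3 are all hypothetical.

```python
from collections import Counter

def k_nearest_neighbours(labelled_points, query, k=3):
    """Predict the label of `query` by majority vote of its k closest points."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    nearest = sorted(labelled_points, key=lambda item: squared_distance(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Each example is (inputs, correct output): the "tutor" supplies the labels.
# Hypothetical features: (number of links, number of exclamation marks).
training_set = [
    ((0, 0), "regular"), ((1, 0), "regular"), ((0, 1), "regular"),
    ((5, 4), "spam"), ((6, 5), "spam"), ((4, 6), "spam"),
]
print(k_nearest_neighbours(training_set, (5, 5)))
print(k_nearest_neighbours(training_set, (1, 1)))
```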

SUPERVISED LEARNING:
Applications:
1. Image and Object Recognition
2. Predictive Analytics
3. Customer Sentiment Analysis
4. Spam Detection

UNSUPERVISED LEARNING:
Learning by Observing (Distant View)
Unsupervised Learning uses machine learning algorithms to analyse and CLUSTER UNLABELLED DATASETS, discovering hidden patterns or data groupings without the need for human intervention.
Unsupervised Learning does not work with labelled data, and the computer is never shown the correct answer.

Learning by Observing (Distant View)
Unsupervised Learning uses algorithms that allow the computer to create connections by studying and observing the data, and to come up with its own observations.
Unsupervised Learning uses unlabelled data to discover patterns for solving clustering or association problems.
Unsupervised Learning is useful when common properties within a data set cannot be clearly defined.
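Clustering unlabelled data can be illustrated with a minimal k-means loop. This is a sketch only: the six 2-D points and the two starting centroids are invented, and the starting centroids are passed in explicitly so that the run is repeatable.

```python
def k_means(points, centroids, max_iterations=100):
    """Cluster 2-D points around the given starting centroids.

    Returns (final centroids, cluster index for each point)."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(old)   # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break                           # nothing moved: the clusters are stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabelled data: no "correct answer" is supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
```

The algorithm is never told which group a point belongs to. The grouping emerges from the distances alone.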

Applications:
1. Exploratory Data Analysis
2. Cross-selling Strategies
3. Customer Segmentation
4. Image Recognition

REINFORCEMENT LEARNING:
Learning by Iteration
Reinforcement Learning is the iterative, continuous process of training machine learning models to improve outcomes & decisions. The more rounds of feedback, the better the algorithm's performance.
Reinforcement Learning uses a training method based on REWARDING desired behaviours and/or PUNISHING undesired ones.

In Reinforcement Learning, very clear goals are defined for the machine to follow and behave accordingly.
VIDEO GAMES are full of reinforcement cues. Complete a level and earn a badge. Defeat the bad guy in a certain number of moves and earn a bonus. This helps the machine learn how to improve its performance in the next game.
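The reward-driven loop described above can be sketched with tabular Q-learning on a toy "game". Everything here is an assumption made for illustration: a corridor of five positions, a reward of 1.0 for reaching the last one, and the values chosen for `alpha`, `gamma` and `epsilon`. Only rewards are used, with the discount factor `gamma` making quicker wins worth more.

```python
import random

N_STATES = 5                  # positions 0..4; reaching position 4 ends the game
ACTIONS = ("left", "right")

def step(state, action):
    """Move one position; the reward of 1.0 is earned only on reaching the goal."""
    next_state = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            best_value = max(q[state].values())
            best_actions = [a for a in ACTIONS if q[state][a] == best_value]
            # Explore occasionally; otherwise exploit (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = rng.choice(best_actions)
            next_state, reward = step(state, action)
            # Feedback: nudge the estimate towards reward + discounted future value.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After enough rounds of feedback the learned policy is to move right from every position, which is the shortest route to the reward.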

The aim of Reinforcement Learning is to find the optimal way to accomplish a particular goal, or to improve performance on a specific task, by learning from past feedback and exploration.
Applications:
1. Self-Driving Cars
2. Industry Automation
3. Healthcare
4. Gaming
https://neptune.ai/blog/reinforcement-learning-applications

Reinforcement Learning allows machines to improve quickly without needing hours of observing and studying massive amounts of data.

TYPES OF MACHINE LEARNING (SUMMARIZED)
SUPERVISED: Learning from Tutor (Closer View). A knowledgeable tutor is needed.
UNSUPERVISED: Learning by Observing (Distant View). Lots of correct data is needed.
REINFORCEMENT: Learning by Iteration (Holistic View). Trial & Error.

MACHINE LEARNING ALGORITHMS
SUPERVISED LEARNING
DEPENDENT on labelled data.
Types of Supervised learning problems:
1. BINARY CLASSIFICATION and
2. REGRESSION
BINARY CLASSIFICATION ALGORITHMS:
1. Decision trees
2. K-nearest neighbour
3. Random forest
4. Naive Bayes
5. Logistic regression
REGRESSION ALGORITHMS:
1. Linear regression
2. Random forest
UNSUPERVISED LEARNING
INDEPENDENT of labelled data.
Types of Unsupervised learning problems:
1. CLUSTERING and
2. ASSOCIATION
CLUSTERING ALGORITHMS:
1. K-Means Clustering
2. Hierarchical Clustering
3. Probabilistic Clustering
ASSOCIATION ALGORITHMS (dimensionality reduction):
1. Principal Component Analysis
2. Singular Value Decomposition
https://www.sas.com/en_gb/insights/articles/analytics/machine-learning-algorithms.html
https://towardsdatascience.com/types-of-machine-learning-algorithms-you-should-know-953a08248861
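To make one of the listed classification algorithms concrete, here is a decision tree in its simplest form: a single split on one feature (often called a decision stump). The feature, the eight labelled examples and the function name `fit_decision_stump` are hypothetical. A real decision tree repeats this kind of split recursively over several features.

```python
def fit_decision_stump(values, labels):
    """Find the threshold on one feature that misclassifies the fewest examples.

    The stump predicts True when value > threshold. Returns (threshold, errors)."""
    candidates = sorted(set(values))
    best_threshold, best_errors = None, len(values) + 1
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2        # midpoint between neighbouring values
        errors = sum((v > threshold) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

# Hypothetical labelled data: exclamation marks per email, and whether it is spam.
exclamation_marks = [0, 1, 1, 2, 5, 6, 7, 9]
is_spam = [False, False, False, False, True, True, False, True]
threshold, errors = fit_decision_stump(exclamation_marks, is_spam)
print(threshold, errors)
```

On this toy data the best rule is "spam if more than 3.5 exclamation marks", which gets one of the eight examples wrong.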

APPLYING MACHINE LEARNING ALGORITHMS:
BIAS and VARIANCE in DATA MODEL
BIAS and VARIANCE measure the difference between the PREDICTION and the OUTCOME.
BIAS is the gap between the predicted value and the actual outcome.
VARIANCE is the scattering of predicted values.
BIAS & VARIANCE are not right or wrong answers but controls that need to be tweaked to improve predictions.
HIGH BIAS and LOW VARIANCE: predictions are consistently wrong. They are very close to each other but in the wrong direction.
HIGH BIAS and HIGH VARIANCE: predictions are wrong in a very inconsistent way. They are scattered and in the wrong direction.
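The two cases above can be checked with a little arithmetic. In this sketch bias is taken as the gap between the average prediction and the actual outcome, and variance as the population variance of the predictions. The numbers are made up so that both sets share the same bias but differ in scatter.

```python
from statistics import mean, pvariance

def bias_and_variance(predictions, actual):
    """Bias: gap between the average prediction and the actual outcome.
    Variance: how widely the predictions scatter around their own average."""
    return mean(predictions) - actual, pvariance(predictions)

actual_outcome = 10.0
clustered = [13.5, 14.0, 14.0, 14.5]    # close together, but off target
scattered = [6.0, 12.0, 18.0, 20.0]     # spread out, and off target on average

print(bias_and_variance(clustered, actual_outcome))   # (4.0, 0.125)
print(bias_and_variance(scattered, actual_outcome))   # (4.0, 30.0)
```

Both sets miss the outcome by 4.0 on average, but the second set is far less consistent about it.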

OVER and UNDERFITTING of DATA in a MODEL
Machine learning works with a training dataset.
Training data is a smaller set of data used for tuning algorithms.
Tuned algorithms help create a model that works well for the larger test dataset.
[Figure: prediction patterns for HIGH BIAS with LOW VARIANCE (consistently wrong) and HIGH BIAS with HIGH VARIANCE (wrong in a very inconsistent way), with panels labelled OVERFITTING and UNDERFITTING]

OVERFITTING
The ML Programming Model is overly complex and difficult to understand. It works well with the TRAINING DATASET but is inflexible when worked with TEST DATA.
UNDERFITTING
The ML Programming Model is too simple to capture the pattern, so it performs poorly on both the TRAINING DATASET and the TEST DATA.
UNDERFITTING: DATA CLUSTERED IN THE WRONG PLACE
OVERFITTING and UNDERFITTING: DATA SCATTERED
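The contrast can be made concrete by comparing training error with test error for three toy models. This is a sketch under assumptions: the data follows y = 2x, noise of +1 or -1 is added to the training outputs only, and the three "models" (always predict the average, fit a straight line, memorise every training point) are deliberately extreme.

```python
def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average training output."""
    average = sum(ys) / len(ys)
    return lambda x: average

def fit_line(xs, ys):
    """Reasonable fit: least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_memoriser(xs, ys):
    """Overfit: memorise every training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

# Assumed data: y = 2x, with noise of +1 / -1 on the training outputs only.
train_x, train_y = [0, 1, 2, 3, 4, 5], [1, 1, 5, 5, 9, 9]
test_x, test_y = [0.6, 1.6, 2.6, 3.6, 4.6], [1.2, 3.2, 5.2, 7.2, 9.2]

errors = {}
for name, fit in (("underfit", fit_mean), ("line", fit_line), ("overfit", fit_memoriser)):
    model = fit(train_x, train_y)
    errors[name] = (mean_squared_error(model, train_x, train_y),
                    mean_squared_error(model, test_x, test_y))
    print(name, errors[name])
```

The memorising model scores a perfect zero error on the training data yet does worse than the straight line on the test data, while the averaging model does badly on both.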

OVER and UNDERFITTING of DATA in a MODEL
OVERFITTING & UNDERFITTING both mean the model is not capturing the right information to make accurate predictions.
SIGNAL and NOISE.
SIGNAL is the set of indicators that help in making accurate predictions, while NOISE is the variation in the data that might not offer any insights.
When working with machine learning algorithms, the trick is to capture as much of the SIGNAL as possible while not getting too distracted by the NOISE in the data.

SELECTING MACHINE LEARNING ALGORITHMS:
MOST SUITABLE ALGORITHMS – SELECTION CRITERIA
FOR LABELLED DATA USE SUPERVISED LEARNING.
Labelled data helps the machine understand both the input and the output. Here the machine doesn't have to find its own patterns.
FOR UNLABELLED DATA USE UNSUPERVISED LEARNING.
The machine creates its own clusters and decides which clusters make the most sense.
FOR MASSIVE AMOUNTS OF UNLABELLED DATA
Use k-means clustering.
FOR A BUNCH OF LABELLED DATA
Use regression, k-nearest neighbour or decision trees.

THANK YOU!
VINOD.KR.SHARMA@GMAIL.COM
HTTPS://WWW.LINKEDIN.COM/IN/CAVINODKRSHARMA/
