Dynamically updated parallel k-NN
search algorithm using MPI
Keywords:
 MPI (Message Passing Interface)
 Machine Learning
 Classification
 k-NN
 Clustering
 K-means
# MPI (Message Passing Interface)
 MPI is an industry standard that specifies the library routines needed for writing message-passing programs.
 MPI uses a library approach, rather than a new language, to support parallel programming (see the sketch below).
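As a concrete illustration of the library approach, here is a minimal MPI program in C in which the root process sends one integer to every other process. It is a sketch, not part of the proposed algorithm; it can be compiled with mpicc and launched with mpirun.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Root sends a value to every other process. */
        for (int dest = 1; dest < size; dest++) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process %d of %d received %d\n", rank, size, msg);
    }

    MPI_Finalize();
    return 0;
}
```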
# Machine Learning
 The ability of a system to learn without being explicitly programmed.
 The system becomes more proficient at performing a task as it gains experience over time.
# Classification
 A supervised machine learning approach.
 Supervised in the sense that we know the class to which each training instance belongs.
 The goal is to predict the class of a test instance on the basis of some similarity measure.
k-NN Classifier
 A well-known classification approach.
 Based on the assumption that an instance shares the class of the instances closest to it in the feature space.
 To classify a test instance, find its k nearest neighbors according to some similarity measure, then classify by majority vote (a sketch follows).
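A minimal, self-contained k-NN sketch in C over a toy two-class training set. The data, the choice k = 3, and the squared Euclidean distance are illustrative assumptions, not part of the slides.

```c
#include <stdio.h>

#define N 6   /* training instances (toy data) */
#define D 2   /* feature dimension             */
#define K 3   /* number of neighbors           */

/* Toy training set: two features per instance, binary labels. */
static const double X[N][D] = {{1,1},{1,2},{2,1},{8,8},{8,9},{9,8}};
static const int    y[N]    = {0,0,0,1,1,1};

static double sq_dist(const double *a, const double *b) {
    double s = 0.0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

int knn_classify(const double *q) {
    int idx[N];
    for (int i = 0; i < N; i++) idx[i] = i;
    /* Partial selection sort: bring the K nearest to the front. */
    for (int i = 0; i < K; i++) {
        int best = i;
        for (int j = i + 1; j < N; j++)
            if (sq_dist(q, X[idx[j]]) < sq_dist(q, X[idx[best]])) best = j;
        int t = idx[i]; idx[i] = idx[best]; idx[best] = t;
    }
    /* Majority vote among the K nearest neighbors. */
    int votes = 0;
    for (int i = 0; i < K; i++) votes += y[idx[i]];
    return 2 * votes > K ? 1 : 0;
}

int main(void) {
    double q[D] = {7.5, 8.5};
    printf("predicted class: %d\n", knn_classify(q));
    return 0;
}
```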
# Clustering
 An unsupervised machine learning approach.
 Unsupervised in the sense that we do not know the class of the training instances to which they belong.
 Partition the data so that instances within one group are more similar to each other than to instances in other groups.
K-means Clustering
 A clustering approach that partitions the training instances into k clusters, where the value of k is provided by the user.
 Clustering is done by minimizing an SSE (sum of squared errors) objective function, shown below.
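For reference, the standard SSE objective minimized by k-means, with clusters C_i and centroids mu_i, is:

```latex
\mathrm{SSE} \;=\; \sum_{i=1}^{k} \; \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2}
```

k-means alternates between assigning each instance to its nearest centroid and recomputing each centroid as the mean of its cluster; each iteration can only decrease (or keep) the SSE.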
Problems with k-NN
 High time complexity: a naive search compares every test instance against the entire training set.
 Sensitive to the local structure of the data.
 Curse of dimensionality.
# Our Proposed Approach: Solving k-NN in Parallel Using MPI
 Pre-processing step:
 Perform clustering on the training set to divide it into p mutually exclusive partitions {P1, P2, …, Pp}, where p is the number of MPI processes.
 Create a Representative Instance (R.I.) for each partition (a sketch of this step follows).
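A minimal sketch of how the pre-processing step could map onto MPI: the root scatters one partition to each process, and every process computes its representative instance as its partition centroid. The data layout, the sizes, and the use of MPI_Scatter are illustrative assumptions, not the slides' stated implementation.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define D 2        /* feature dimension (illustrative)       */
#define NPER 100   /* instances per partition (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double *training = NULL;
    if (rank == 0) {
        /* Root holds the full training set, assumed already grouped
           into p mutually exclusive partitions {P1, ..., Pp}. */
        training = malloc((size_t)p * NPER * D * sizeof *training);
        for (int i = 0; i < p * NPER * D; i++)
            training[i] = (double)rand() / RAND_MAX;
    }

    /* Each process receives its own partition Pi. */
    double part[NPER * D];
    MPI_Scatter(training, NPER * D, MPI_DOUBLE,
                part,     NPER * D, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes the representative instance (centroid)
       of its partition locally, in parallel. */
    double ri[D] = {0};
    for (int i = 0; i < NPER; i++)
        for (int j = 0; j < D; j++) ri[j] += part[i * D + j];
    for (int j = 0; j < D; j++) ri[j] /= NPER;

    printf("rank %d R.I. = (%f, %f)\n", rank, ri[0], ri[1]);

    free(training);
    MPI_Finalize();
    return 0;
}
```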
# Step II
For i = 1 to p:
 Apply the k-means approach.
 Evaluate the nearest-neighbor similarity of the training instances to the representative instance (centroid) of each partition.
 Perform:
 Competence Enhancement – the Repeated Wilson Editing rule (noise removal); a sketch follows this list.
 Competence Preservation (removal of superfluous instances).
 Store the outliers of each cluster separately.
 Update the centroid of the cluster.
 Repeat Steps I and II as long as the selected partition still contains at least k instances.
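A minimal sketch of the Repeated Wilson Editing rule named above, assuming binary labels and squared Euclidean distance (both illustrative): every instance misclassified by its k nearest remaining neighbors is marked as noise, and passes repeat until a full pass removes nothing.

```c
#include <stdio.h>

#define D 2
#define K 3

static double sq_dist(const double *a, const double *b) {
    double s = 0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

/* Majority label among the K nearest kept neighbors of instance q,
   excluding q itself. Binary labels 0/1 are assumed for brevity. */
static int knn_label(int q, int n, const double x[][D],
                     const int y[], const int keep[]) {
    int nn[K]; double nd[K];
    int m = 0;                       /* neighbors found so far */
    for (int i = 0; i < n; i++) {
        if (i == q || !keep[i]) continue;
        double d = sq_dist(x[q], x[i]);
        int pos = (m < K) ? m : K;   /* insert into sorted best-K list */
        while (pos > 0 && nd[pos - 1] > d) {
            if (pos < K) { nd[pos] = nd[pos - 1]; nn[pos] = nn[pos - 1]; }
            pos--;
        }
        if (pos < K) { nd[pos] = d; nn[pos] = i; if (m < K) m++; }
    }
    int votes = 0;
    for (int i = 0; i < m; i++) votes += y[nn[i]];
    return 2 * votes > m;            /* majority vote */
}

/* Repeated Wilson editing: sets keep[i] = 0 for noisy instances. */
void wilson_edit(int n, const double x[][D], const int y[], int keep[]) {
    for (int i = 0; i < n; i++) keep[i] = 1;
    int changed = 1;
    while (changed) {                /* the "repeated" part of the rule */
        changed = 0;
        for (int i = 0; i < n; i++)
            if (keep[i] && knn_label(i, n, x, y, keep) != y[i]) {
                keep[i] = 0;         /* misclassified by its neighbors */
                changed = 1;
            }
    }
}

int main(void) {
    /* Toy data: the last instance is mislabeled noise inside class 0. */
    static const double x[7][D] =
        {{1,1},{1,2},{2,1},{8,8},{8,9},{9,8},{1.5,1.5}};
    static const int y[7] = {0,0,0,1,1,1,1};
    int keep[7];
    wilson_edit(7, x, y, keep);
    for (int i = 0; i < 7; i++)
        printf("instance %d kept: %d\n", i, keep[i]);
    return 0;
}
```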
# Step III
 Take a test instance.
 Select the partition whose R.I. is closest to the test instance.
 Repeat until the last sub-partition is reached (a sketch of the descent follows).
 Apply the majority rule.
 Assign the test instance the class label that holds the majority among its neighbors.
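A minimal sketch of the descent in Step III, assuming the pre-processing built a tree of sub-partitions whose nodes carry their R.I.; the structure and field names are illustrative. Once the leaf partition is reached, the majority rule is applied exactly as in the earlier k-NN sketch.

```c
#include <stddef.h>
#include <stdio.h>

#define D 2

typedef struct Node {
    double ri[D];            /* representative instance (centroid)  */
    struct Node **child;     /* sub-partitions; NULL at a leaf      */
    int nchild;
    const double (*inst)[D]; /* at a leaf: the partition instances  */
    const int *label;
    int ninst;
} Node;

static double sq_dist(const double *a, const double *b) {
    double s = 0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

/* Descend: at every level pick the child whose R.I. is closest to the
   test instance, until the last sub-partition (a leaf) is reached. */
const Node *select_partition(const Node *root, const double *q) {
    while (root->child != NULL && root->nchild > 0) {
        int best = 0;
        for (int c = 1; c < root->nchild; c++)
            if (sq_dist(q, root->child[c]->ri) <
                sq_dist(q, root->child[best]->ri))
                best = c;
        root = root->child[best];
    }
    return root;
}

int main(void) {
    /* Two leaf partitions with R.I.s at (1,1) and (8,8). */
    Node leafA = {{1, 1}, NULL, 0, NULL, NULL, 0};
    Node leafB = {{8, 8}, NULL, 0, NULL, NULL, 0};
    Node *kids[2] = {&leafA, &leafB};
    Node root = {{4.5, 4.5}, kids, 2, NULL, NULL, 0};
    double q[D] = {7.0, 9.0};
    const Node *leaf = select_partition(&root, q);
    printf("selected leaf R.I. = (%f, %f)\n", leaf->ri[0], leaf->ri[1]);
    return 0;
}
```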
# Updating the Training Set
 Triggered when the distance of a new test instance from the R.I. of a partition exceeds the maximum radius value stored during the pre-processing step.
 Update the R.I. of only that partition which is closest to the new test instance (a sketch follows).
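A minimal sketch of this update, assuming the R.I. is the partition centroid so it can be refreshed incrementally without rescanning the partition; the struct layout and names are illustrative.

```c
#include <stdio.h>

#define D 2

typedef struct {
    double ri[D];   /* representative instance (centroid)      */
    double radius;  /* max radius stored during pre-processing */
    int    n;       /* number of instances in the partition    */
} Partition;

/* Incremental centroid update: mu <- mu + (x - mu) / (n + 1). */
void update_ri(Partition *p, const double *x) {
    p->n += 1;
    for (int j = 0; j < D; j++)
        p->ri[j] += (x[j] - p->ri[j]) / p->n;
}

int main(void) {
    Partition p = {{1.0, 1.0}, 2.5, 4};
    double x[D] = {3.0, 1.0};
    update_ri(&p, x);   /* absorb the new instance into this partition */
    printf("new R.I. = (%f, %f), n = %d\n", p.ri[0], p.ri[1], p.n);
    return 0;
}
```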
# Research Papers Considered in Designing the Dynamically Updated Parallel k-NN Using MPI
# For Pre-processing (Clustering):
 "Efficient and Fast Initialization Algorithm for K-means Clustering" by Mohammed El Agha and Wesam M. Ashour, Islamic University of Gaza, Gaza, Palestine.
 "A new algorithm for initial cluster centers in k-means algorithm" by Murat Erisoglu, Nazif Calis, and Sadullah Sakallioglu, Department of Statistics, Faculty of Science and Letters, Cukurova University, 01300 Adana, Turkey.
 "An empirical comparison of four initialization methods for the K-Means algorithm" by J.M. Peña, J.A. Lozano, and P. Larrañaga, Department of Computer Science and Artificial Intelligence, Intelligent Systems Group, University of the Basque Country, P.O. Box 649, E-20080 San Sebastian, Spain.
# For Finding k-NN and Removal of Noise and Superfluous Instances:
 "Fast Condensed Nearest Neighbor Rule" by Fabrizio Angiulli, ICAR-CNR, Via Pietro Bucci 41C, 87036 Rende (CS), Italy.
 "Advances in Instance Selection for Instance-Based Learning Algorithms" by Henry Brighton, Language Evolution and Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of Edinburgh, Edinburgh EH8 9LL, UK, and Chris Mellish, Department of Artificial Intelligence, The University of Edinburgh, Edinburgh EH1 1HN, UK.
 "Superlinear Parallelization of k-Nearest Neighbor Retrieval" by Antal van den Bosch and Ko van der Sloot, ILK Research Group, Dept. of Communication and Information Sciences, Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands.
 "Parallel Algorithms on Nearest Neighbor Search" by Berkay Aydin, Georgia State University.
 "K-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local Information into Global Optimization" by Chris Ding and Xiaofeng He.
 "Instance-based classifiers applied to medical databases: Diagnosis and knowledge extraction" by Francesco Gagliardi, Department of Philosophy, University of Rome.
Thank You!...
