Dynamically updated parallel k-NN
search algorithm using MPI
Keywords:
 MPI (Message Passing Interface)
 Machine Learning
 Classification
 k-NN
 Clustering
 K-means
# MPI (Message Passing Interface)
 MPI is an industry standard that specifies the library routines needed for writing message-passing programs.
 MPI uses a library approach, rather than a new language, to support parallel programming (see the sketch below).
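As a concrete illustration of the library approach, here is a minimal MPI program in C in which the root process sends one integer to every other process. It is a sketch, not part of the proposed algorithm; it can be compiled with mpicc and launched with mpirun.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Root sends a value to every other process. */
        for (int dest = 1; dest < size; dest++) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process %d of %d received %d\n", rank, size, msg);
    }

    MPI_Finalize();
    return 0;
}
```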
# Machine Learning
 The ability of a system to learn without being explicitly programmed.
 The system becomes more proficient at performing a task as it gains experience over time.
# Classification
 A supervised machine learning approach.
 Supervised in the sense that we know the class to which each training instance belongs.
 The goal is to predict the class of a test instance on the basis of some similarity measure.
k-NN Classifier
 A well-known classification approach.
 Based on the assumption that an instance shares the class of the instances closest to it in the feature space.
 To classify a test instance, find its k nearest neighbors according to some similarity measure, then classify by majority vote (a sketch follows).
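A minimal, self-contained k-NN sketch in C over a toy two-class training set. The data, the choice k = 3, and the squared Euclidean distance are illustrative assumptions, not part of the slides.

```c
#include <stdio.h>

#define N 6   /* training instances (toy data) */
#define D 2   /* feature dimension             */
#define K 3   /* number of neighbors           */

/* Toy training set: two features per instance, binary labels. */
static const double X[N][D] = {{1,1},{1,2},{2,1},{8,8},{8,9},{9,8}};
static const int    y[N]    = {0,0,0,1,1,1};

static double sq_dist(const double *a, const double *b) {
    double s = 0.0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

int knn_classify(const double *q) {
    int idx[N];
    for (int i = 0; i < N; i++) idx[i] = i;
    /* Partial selection sort: bring the K nearest to the front. */
    for (int i = 0; i < K; i++) {
        int best = i;
        for (int j = i + 1; j < N; j++)
            if (sq_dist(q, X[idx[j]]) < sq_dist(q, X[idx[best]])) best = j;
        int t = idx[i]; idx[i] = idx[best]; idx[best] = t;
    }
    /* Majority vote among the K nearest neighbors. */
    int votes = 0;
    for (int i = 0; i < K; i++) votes += y[idx[i]];
    return 2 * votes > K ? 1 : 0;
}

int main(void) {
    double q[D] = {7.5, 8.5};
    printf("predicted class: %d\n", knn_classify(q));
    return 0;
}
```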
# Clustering
 An unsupervised machine learning approach.
 Unsupervised in the sense that we do not know the class of the training instances to which they belong.
 Partition the data so that instances within one group are more similar to each other than to instances in other groups.
K-means Clustering
 A clustering approach that partitions the training instances into k clusters, where the value of k is provided by the user.
 Clustering is done by minimizing an SSE (sum of squared errors) objective function, shown below.
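For reference, the standard SSE objective minimized by k-means, with clusters C_i and centroids mu_i, is:

```latex
\mathrm{SSE} \;=\; \sum_{i=1}^{k} \; \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2}
```

k-means alternates between assigning each instance to its nearest centroid and recomputing each centroid as the mean of its cluster; each iteration can only decrease (or keep) the SSE.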
Problems with k-NN
 High time complexity: a naive search compares every test instance against the entire training set.
 Sensitive to the local structure of the data.
 Curse of dimensionality.
# Our Proposed Approach: Solving k-NN in Parallel Using MPI
 Pre-processing step:
 Perform clustering on the training set to divide it into p mutually exclusive partitions {P1, P2, …, Pp}, where p is the number of MPI processes.
 Create a Representative Instance (R.I.) for each partition (a sketch of this step follows).
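A minimal sketch of how the pre-processing step could map onto MPI: the root scatters one partition to each process, and every process computes its representative instance as its partition centroid. The data layout, the sizes, and the use of MPI_Scatter are illustrative assumptions, not the slides' stated implementation.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define D 2        /* feature dimension (illustrative)       */
#define NPER 100   /* instances per partition (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double *training = NULL;
    if (rank == 0) {
        /* Root holds the full training set, assumed already grouped
           into p mutually exclusive partitions {P1, ..., Pp}. */
        training = malloc((size_t)p * NPER * D * sizeof *training);
        for (int i = 0; i < p * NPER * D; i++)
            training[i] = (double)rand() / RAND_MAX;
    }

    /* Each process receives its own partition Pi. */
    double part[NPER * D];
    MPI_Scatter(training, NPER * D, MPI_DOUBLE,
                part,     NPER * D, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes the representative instance (centroid)
       of its partition locally, in parallel. */
    double ri[D] = {0};
    for (int i = 0; i < NPER; i++)
        for (int j = 0; j < D; j++) ri[j] += part[i * D + j];
    for (int j = 0; j < D; j++) ri[j] /= NPER;

    printf("rank %d R.I. = (%f, %f)\n", rank, ri[0], ri[1]);

    free(training);
    MPI_Finalize();
    return 0;
}
```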
# Step II
For i = 1 to p:
 Apply the k-means approach.
 Evaluate the nearest-neighbor similarity of the training instances to the representative instance (centroid) of each partition.
 Perform:
 Competence Enhancement – the Repeated Wilson Editing rule (noise removal); a sketch follows this list.
 Competence Preservation (removal of superfluous instances).
 Store the outliers of each cluster separately.
 Update the centroid of the cluster.
 Repeat Steps I and II as long as the selected partition still contains at least k instances.
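A minimal sketch of the Repeated Wilson Editing rule named above, assuming binary labels and squared Euclidean distance (both illustrative): every instance misclassified by its k nearest remaining neighbors is marked as noise, and passes repeat until a full pass removes nothing.

```c
#include <stdio.h>

#define D 2
#define K 3

static double sq_dist(const double *a, const double *b) {
    double s = 0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

/* Majority label among the K nearest kept neighbors of instance q,
   excluding q itself. Binary labels 0/1 are assumed for brevity. */
static int knn_label(int q, int n, const double x[][D],
                     const int y[], const int keep[]) {
    int nn[K]; double nd[K];
    int m = 0;                       /* neighbors found so far */
    for (int i = 0; i < n; i++) {
        if (i == q || !keep[i]) continue;
        double d = sq_dist(x[q], x[i]);
        int pos = (m < K) ? m : K;   /* insert into sorted best-K list */
        while (pos > 0 && nd[pos - 1] > d) {
            if (pos < K) { nd[pos] = nd[pos - 1]; nn[pos] = nn[pos - 1]; }
            pos--;
        }
        if (pos < K) { nd[pos] = d; nn[pos] = i; if (m < K) m++; }
    }
    int votes = 0;
    for (int i = 0; i < m; i++) votes += y[nn[i]];
    return 2 * votes > m;            /* majority vote */
}

/* Repeated Wilson editing: sets keep[i] = 0 for noisy instances. */
void wilson_edit(int n, const double x[][D], const int y[], int keep[]) {
    for (int i = 0; i < n; i++) keep[i] = 1;
    int changed = 1;
    while (changed) {                /* the "repeated" part of the rule */
        changed = 0;
        for (int i = 0; i < n; i++)
            if (keep[i] && knn_label(i, n, x, y, keep) != y[i]) {
                keep[i] = 0;         /* misclassified by its neighbors */
                changed = 1;
            }
    }
}

int main(void) {
    /* Toy data: the last instance is mislabeled noise inside class 0. */
    static const double x[7][D] =
        {{1,1},{1,2},{2,1},{8,8},{8,9},{9,8},{1.5,1.5}};
    static const int y[7] = {0,0,0,1,1,1,1};
    int keep[7];
    wilson_edit(7, x, y, keep);
    for (int i = 0; i < 7; i++)
        printf("instance %d kept: %d\n", i, keep[i]);
    return 0;
}
```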
# Step III
 Take a test instance.
 Select the partition whose R.I. is closest to the test instance.
 Repeat until the last sub-partition is reached (a sketch of the descent follows).
 Apply the majority rule.
 Assign the test instance the class label that holds the majority among its neighbors.
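A minimal sketch of the descent in Step III, assuming the pre-processing built a tree of sub-partitions whose nodes carry their R.I.; the structure and field names are illustrative. Once the leaf partition is reached, the majority rule is applied exactly as in the earlier k-NN sketch.

```c
#include <stddef.h>
#include <stdio.h>

#define D 2

typedef struct Node {
    double ri[D];            /* representative instance (centroid)  */
    struct Node **child;     /* sub-partitions; NULL at a leaf      */
    int nchild;
    const double (*inst)[D]; /* at a leaf: the partition instances  */
    const int *label;
    int ninst;
} Node;

static double sq_dist(const double *a, const double *b) {
    double s = 0;
    for (int j = 0; j < D; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

/* Descend: at every level pick the child whose R.I. is closest to the
   test instance, until the last sub-partition (a leaf) is reached. */
const Node *select_partition(const Node *root, const double *q) {
    while (root->child != NULL && root->nchild > 0) {
        int best = 0;
        for (int c = 1; c < root->nchild; c++)
            if (sq_dist(q, root->child[c]->ri) <
                sq_dist(q, root->child[best]->ri))
                best = c;
        root = root->child[best];
    }
    return root;
}

int main(void) {
    /* Two leaf partitions with R.I.s at (1,1) and (8,8). */
    Node leafA = {{1, 1}, NULL, 0, NULL, NULL, 0};
    Node leafB = {{8, 8}, NULL, 0, NULL, NULL, 0};
    Node *kids[2] = {&leafA, &leafB};
    Node root = {{4.5, 4.5}, kids, 2, NULL, NULL, 0};
    double q[D] = {7.0, 9.0};
    const Node *leaf = select_partition(&root, q);
    printf("selected leaf R.I. = (%f, %f)\n", leaf->ri[0], leaf->ri[1]);
    return 0;
}
```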
# Updating the Training Set
 Triggered when the distance of a new test instance from the R.I. of a partition exceeds the maximum radius value stored during the pre-processing step.
 Update the R.I. of only that partition which is closest to the new test instance (a sketch follows).
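A minimal sketch of this update, assuming the R.I. is the partition centroid so it can be refreshed incrementally without rescanning the partition; the struct layout and names are illustrative.

```c
#include <stdio.h>

#define D 2

typedef struct {
    double ri[D];   /* representative instance (centroid)      */
    double radius;  /* max radius stored during pre-processing */
    int    n;       /* number of instances in the partition    */
} Partition;

/* Incremental centroid update: mu <- mu + (x - mu) / (n + 1). */
void update_ri(Partition *p, const double *x) {
    p->n += 1;
    for (int j = 0; j < D; j++)
        p->ri[j] += (x[j] - p->ri[j]) / p->n;
}

int main(void) {
    Partition p = {{1.0, 1.0}, 2.5, 4};
    double x[D] = {3.0, 1.0};
    update_ri(&p, x);   /* absorb the new instance into this partition */
    printf("new R.I. = (%f, %f), n = %d\n", p.ri[0], p.ri[1], p.n);
    return 0;
}
```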
# Research Papers Considered in Designing the Dynamically Updated Parallel k-NN Using MPI
# For Pre-processing (Clustering):
 "Efficient and Fast Initialization Algorithm for K-means Clustering" by Mohammed El Agha and Wesam M. Ashour, Islamic University of Gaza, Gaza, Palestine.
 "A new algorithm for initial cluster centers in k-means algorithm" by Murat Erisoglu, Nazif Calis, and Sadullah Sakallioglu, Department of Statistics, Faculty of Science and Letters, Cukurova University, 01300 Adana, Turkey.
 "An empirical comparison of four initialization methods for the K-Means algorithm" by J.M. Peña, J.A. Lozano, and P. Larrañaga, Department of Computer Science and Artificial Intelligence, Intelligent Systems Group, University of the Basque Country, P.O. Box 649, E-20080 San Sebastian, Spain.
# For Finding k-NN and Removal of Noise and Superfluous Instances:
 "Fast Condensed Nearest Neighbor Rule" by Fabrizio Angiulli, ICAR-CNR, Via Pietro Bucci 41C, 87036 Rende (CS), Italy.
 "Advances in Instance Selection for Instance-Based Learning Algorithms" by Henry Brighton, Language Evolution and Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of Edinburgh, Edinburgh EH8 9LL, UK, and Chris Mellish, Department of Artificial Intelligence, The University of Edinburgh, Edinburgh EH1 1HN, UK.
 "Superlinear Parallelization of k-Nearest Neighbor Retrieval" by Antal van den Bosch and Ko van der Sloot, ILK Research Group, Dept. of Communication and Information Sciences, Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands.
 "Parallel Algorithms on Nearest Neighbor Search" by Berkay Aydin, Georgia State University.
 "K-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local Information into Global Optimization" by Chris Ding and Xiaofeng He.
 "Instance-based classifiers applied to medical databases: Diagnosis and knowledge extraction" by Francesco Gagliardi, Department of Philosophy, University of Rome.
Thank You!...
