```diff
  // Note: instead of repeatedly computing centroid distances for each data point, we only need to compute the distances for the most recent centroid and maintain a hash of closest-distance results...
  dapply(d2,dist,npts,ndims,buffer,centroids[j-1]);
  csum=0.0;// total cumulative distance
+ ind=0;
  for(i=0;i<npts;i++){
-     ind=2*i;
      if(d2[i]<dhash[ind]){
          dhash[ind]=d2[i];
          dhash[ind+1]=j-1;
          csum+=d2[i];
      }else{
          csum+=dhash[ind];
      }
+     ind+=2;// +stride
  }
  // Compute the cumulative probabilities...
  probs[0]=dhash[0]/csum;
+ ind=2;
  for(i=1;i<npts;i++){
-     probs[i]=probs[i-1]+(dhash[2*i]/csum);
+     probs[i]=probs[i-1]+(dhash[ind]/csum);
+     ind+=2;// +stride
  }
  // Based on Arthur and Vassilvitskii's paper "kmeans++: The Advantages of Careful Seeding" (see conclusion), randomly select candidate centroids and pick the candidate which minimizes the total squared distance...
```