Newest 'cluster-analysis' Questions

0 votes

0 answers

63 views

Clustering without all pairwise distances

I have a set of binarized images containing forms; each image follows one of N layouts. There are a few outliers which do not follow a layout and contain random text and images. The distance between ...

sebastian

1,818

asked Nov 19 at 22:02

0 votes

0 answers

73 views

FlowSOM randomly stops because of missing consensus.pdf

I am using FlowSOM() Clustering from the FlowSOM and am getting an error while a vectorized function is running: Error in map2(): ℹ In index: 8. ℹ With name: FileID8. Caused by error in map() at ...

Mikey

9

asked Nov 17 at 17:43

0 votes

0 answers

34 views

How can I reconcile multiple related documents (invoices, returns, and credit notes) with inconsistent data?

I need some help with a fairly complex task I’ve been assigned: document reconciliation between different types of records. In short, I have to match documents with different “causal codes”: 2: Goods ...

H3doX

1

asked Nov 13 at 17:23

0 votes

0 answers

52 views

Selective Inference on Ordinal Clustering

I've been using an ordered stereotype model (OSM) approach to ordinal clustering with the R library 'clustord'. clustord is very well documented, with a step-by-step vignette. Therefore, to execute row ...

EB3112

339

asked Oct 1 at 10:17

0 votes

0 answers

47 views

How to avoid overmerging with mclust, and failure to reproduce clustering?

I have been working with mclust, and have encountered issues that I can't find an obvious reason for. My main concern is that the threshold for multiple components to be found seems really high, and I ...

atelopus

1

asked Sep 17 at 10:42

0 votes

0 answers

31 views

How to programmatically handle container partition redistribution in GridDB cluster after node failure?

GridDB Container Partition Recovery After Node Failure: I'm working with a 3-node GridDB cluster and need to implement automatic recovery logic when one node fails. My application creates ...

Muhammad Saleem

11

asked Aug 28 at 9:50

0 votes

0 answers

57 views

Changing post and line colour in deg patterns cluster figures

I have had cluster plots produced for some RNA Seq time course data using the LRT analysis. I believe the plots are produced using the command: clusters <- degPatterns(cluster_rlog, metadata = meta,...

Rob Staruch

43

asked Aug 28 at 9:31

5 votes

3 answers

246 views

Efficiently group rows within tolerance for multiple numeric columns

I'm trying to group rows that have values within specific error/tolerance. Input looks like this: input <- data.frame(Row_number = 1:22, Name = c(rep("A",6), rep("...

Jennifer

317

asked Aug 19 at 0:12

0 votes

0 answers

42 views

Conditional logistic regression with robust standard errors for data matched with replacement

I am working with matched case-control data that used risk-set sampling with replacement (a control can be matched to more than one case). I am trying to figure out the correct syntax for conditional ...

user28632583

1

asked Jul 28 at 23:47

2 votes

1 answer

159 views

How to dynamically partition a 2D array into boxes based on inverse area density?

Context: I have a 2D array (size N x M), let's call it U, where each cell contains a non-negative value K ≥ 0 representing a "density" at that point. I want to algorithmically divide the ...

JC Denton

21

asked May 12 at 19:06

0 votes

1 answer

46 views

Spatial clustering with two separate datasets

I'm hoping to get some advice on approaching a clustering problem. I have two separate spatial datasets, being real data and modelled data. The real data contains a binary output (0,1), which is ...

jonboy

392

asked Mar 28 at 3:48

1 vote

1 answer

126 views

How to set minimum-maximum load constraint in Google Route Optimization API

I'm using Google RO API to create clusters. There is a capacity constraint on the clusters and the clusters should not overlap with each other. To do this, I've set the load demand of each shipment to ...

Darsh Patel

43

asked Mar 24 at 5:36

0 votes

1 answer

177 views

DragonFly benchmark: slow on Cluster

I need help regarding dragonfly db, particularly benchmarking. So here is the story, I tried benchmarking dragonfly as a cache to replace redis. I got the expected result when testing single node; it ...

amzshow

58

asked Mar 13 at 6:24

3 votes

5 answers

154 views

Combine connected list elements to form distinct list elements

I need to combine interconnected list elements to form distinct elements in base R with no additional packages required (while removing NA and zero-length elements). Edit: I look for flexibility of ...

Peter

2,473

asked Feb 27 at 10:01

1 vote

1 answer

163 views

Capacitated Clustering using Google Route Optimization API

Fixed-size clusters: I need help with a capacitated clustering task. I have 400 locations (the number can vary each time), and I need to create fixed-size clusters (e.g., 40 locations per cluster). ...

Darsh Patel

11

asked Feb 24 at 14:50

0 votes

0 answers

31 views

Cluster lat/lon values based on values

I'm trying to cluster values from a map in Python (these values could be income, kindness towards dogs, or the number of penguins in supermarkets; for me the values are floats) from different data sources. ...

Auke Van Der Woude

11

asked Feb 16 at 16:07

0 votes

0 answers

58 views

Finding subclusters of a specific cluster

I performed HDBSCAN Clustering hdbscan_clusterer = hdbscan.HDBSCAN(min_cluster_size=200) df['Cluster'] = hdbscan_clusterer.fit_predict(data_matrix_for_clustering) Now, I’m interested in getting the ...

name0

1

asked Feb 12 at 4:28

0 votes

0 answers

48 views

Evaluating Fuzzy clustering quality

Initially, I performed kmeans clustering and obtained some meaningful clusters. To refine these clusters, I ran Fuzzy C Means on the Kmeans center using "e1071" package. Are there any ...

Mary

221

asked Feb 6 at 15:17

2 votes

1 answer

186 views

Clustering lines in bands

Little intro: I have data (link at the bottom), with the score on the y-axis and the position on the x-axis, for different labels. Now I want to know if there is one label that is "significantly" ...

CodeNoob

1,840

asked Dec 29, 2024 at 15:55

1 vote

2 answers

95 views

Using Python to group similar values from pair combinations

I have a list of paired values. Values in each pair are declared as similar, meaning two values are considered similar if they appear together in a pair from the list. My goal is to create a list of ...

Ömer Faruk Güllüoğlu

11

asked Dec 26, 2024 at 13:03
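
For the pair-grouping question above, one common approach is to treat the pairs as graph edges and collect connected components with a union-find structure. The sketch below is only an illustration under the assumption that similarity is meant to be transitive (if a~b and b~c, then a, b, and c share a group); the excerpt is truncated, so the asker's exact goal may differ. The function name `group_similar` is hypothetical.

```python
from collections import defaultdict


def group_similar(pairs):
    """Group values linked through any chain of pairs (union-find).

    Assumption: similarity is transitive, so groups are the connected
    components of the graph whose edges are the given pairs.
    """
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for value in parent:
        groups[find(value)].append(value)
    return [sorted(members) for members in groups.values()]
```

Calling `group_similar([("a", "b"), ("b", "c"), ("x", "y")])` would yield one group containing a, b, and c and another containing x and y. The order of the groups follows dictionary insertion order, so sort the result if a stable order matters.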

0 votes

0 answers

39 views

Mapbox Maps iOS - Show unclustered image as an icon in the clustered layer

I'm using GeoJSONSource to show images on the map (like images on the map in Apple Photos). Those images are loaded from the FeatureCollection object, and the first thing I do is add them to the map style. ...

Alzemic

11

asked Dec 23, 2024 at 14:30

0 votes

0 answers

84 views

clustering in Lavaan for structural equation model

I am trying to fit SEM in lavaan that includes both a measurement and structural model. The measurement model consists of six latent variables, which serve as outcomes in the structural model. The ...

Quy Pham

1

asked Dec 17, 2024 at 14:40

2 votes

0 answers

78 views

Fuzzy C-means: all cluster centers converge to the same point after the first centroid update

I am implementing Fuzzy C-means for image segmentation, following the given algorithm. However, when updating the centroids (this is the first thing that I do), all cluster centers converge ...

Albert4224

103

asked Dec 9, 2024 at 8:48

1 vote

0 answers

27 views

Cluster detection error (Invalid date in population file) using the rsatscan library, purely spatial and discrete Poisson

I'm trying to run a purely spatial analysis using SaTScan in R, but I'm getting date-related errors even though I'm not using any temporal data. Here's the error: Error: Invalid date '775' in ...

Sarahk

11

asked Dec 5, 2024 at 22:22

1 vote

1 answer

75 views

How to use WeightedCluster to aggregate sequences and apply on Multichannel sequence analysis

I have 54399 cases, and 2 channels (HOM and HOS), and I want to use multichannel sequence analysis, the data example is as follows: ID HOM1 HOM2 HOM3 HOM4 HOS1 HOS2 HOS3 HOS4 1 A A B C NO YES NO NO 2 ...

Fanny0000

11

asked Dec 2, 2024 at 17:50

1 vote

0 answers

49 views

Use a priority queue to do hierarchical clustering without importing heapq

I am using a priority queue to do hierarchical clustering (I cannot import heapq) and want to use the complete-link method, but I don't know what the problem with my code is; the reason is far from ...

吳思覦

11

asked Nov 30, 2024 at 15:12
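
For the priority-queue question above, a hand-written binary min-heap can stand in for `heapq`. The sketch below is not the asker's code (the excerpt is truncated); it only shows the queue itself, and the class name `MinHeap` and the `(distance, cluster_a, cluster_b)` tuple layout are assumptions made for illustration.

```python
class MinHeap:
    """Minimal binary min-heap; a stand-in for heapq (illustrative only)."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def push(self, item):
        # Append at the end, then sift up until the parent is not larger.
        items = self._items
        items.append(item)
        i = len(items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if items[i] < items[parent]:
                items[i], items[parent] = items[parent], items[i]
                i = parent
            else:
                break

    def pop(self):
        # Remove the smallest item, move the last item to the root,
        # then sift it down to restore the heap property.
        items = self._items
        if not items:
            raise IndexError("pop from empty heap")
        top = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i, n = 0, len(items)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and items[left] < items[smallest]:
                    smallest = left
                if right < n and items[right] < items[smallest]:
                    smallest = right
                if smallest == i:
                    break
                items[i], items[smallest] = items[smallest], items[i]
                i = smallest
        return top
```

In an agglomerative loop, one would typically push tuples such as `(distance, cluster_a, cluster_b)` and, after each merge, skip popped entries that refer to clusters that no longer exist. For complete-link, the distance pushed for a merged cluster would be the maximum of the pairwise distances between its members and the other cluster.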

-1 votes

1 answer

184 views

Getting this error when I run the k-means code: AttributeError: 'NoneType' object has no attribute 'split'

from sklearn.cluster import KMeans cs = [] for i in range(1, 11): kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0) kmeans.fit(X) cs.append(...

Niubie

1

asked Nov 29, 2024 at 6:29

0 votes

1 answer

104 views

Community Detection with both Node and Edge Weights

I have a directed graph where there are importance or weight attributes for both the nodes and edges. I am looking for a community or module detection implementation in python that will consider both ...

abbas786

401

asked Nov 26, 2024 at 18:57

1 vote

1 answer

211 views

K-Means taking a long time

I'm using k-means for my project for the first time. My dataset has more than 400,000 rows and 11 columns. I ran k-means for k = 3, 5, 7, 9, and 10. It took more than 65 minutes and there is still no output. ...

Joud

7

asked Nov 20, 2024 at 20:46

4 votes

3 answers

157 views

Filter rows based on combined set of values in a string

In R, I have the following dataframe with the column "overlap" listing rows that have overlapping values on some other column. df <- data.frame(overlap = c("1,2,3", "1,2,3&...

bcrew

105

asked Nov 18, 2024 at 18:35

0 votes

0 answers

181 views

Efficient parallelization of silhouette score calculation

I have a large dataset (2 million rows, 100 columns), and I need to perform clustering. I used the elbow method to determine the optimal number of clusters. However, in order to get a more refined ...

AbliusKarfax

1

asked Nov 12, 2024 at 12:26

0 votes

1 answer

87 views

How to solve "Duplicated samples have been found in X" error for DBCV metric

I'm trying to compute the DBCV metric (provided by "git+https://github.com/FelSiq/DBCV") on density-based clusters from a dataset similar to the one shown here: The calculation is performed ...

giuseppe sabino

1

asked Nov 5, 2024 at 22:49

1 vote

0 answers

55 views

Clustering for grouping sentences and then caption the cluster with a short name

I have a series of text utterances in summary form (form of sentences). I am trying to perform clustering and group them with similarity in context (not in literal meaning) and report the clusters ...

eashwar natarajan

71

asked Oct 30, 2024 at 4:58

0 votes

1 answer

60 views

Problems with creating a mathematical clustering model with an additive criterion in CPLEX OPL Studio

I'm trying to create a model in CPLEX OPL Studio for clustering with an additive criterion, but I have a number of errors that I don't know how to fix correctly, because I'm very bad at OPL Studio ...

Zraf Ker

3

asked Oct 27, 2024 at 19:03

-2 votes

1 answer

40 views

Clustering a Grid into segments of equal length

I have a grid with many interconnections. The grid consists of edges of different length. I would like to cluster this grid into segments of similar length. The edges which are summarized in a cluster ...

Matthias

1

asked Oct 24, 2024 at 12:26

1 vote

0 answers

45 views

How to visualize the collective behaviour of self-propelled rods in two dimensions?

Multiple self-propelled rods (modelled using an odd number of connected hard spheres) with a fixed self-propulsion velocity are moving in a 2D medium with three different diffusion constants for ...

Anonymous One

11

asked Oct 17, 2024 at 10:37

0 votes

1 answer

138 views

How to create a dendrogram colored by clusters with hclust and cutreeDynamic

I'm working on a clustering problem and I would like to use the hclust functions to create the dendrogram and cutreeDynamic to create clusters from the mentioned dendrogram. In fact, I have already ...

José Adrián Pardo Pérez

25

asked Oct 12, 2024 at 18:03

0 votes

1 answer

177 views

How to use the dissimilarity matrix output from vegdist() function for hclust()?

I have computed the dissimilarity matrix using the vegdist() function, with the method specified as "morisita". However, even though the hclust() function is built to read both distance and dissimilarity ...

Sukhraj Kaur 1910115

1

asked Oct 12, 2024 at 6:28

3 votes

1 answer

59 views

How can I find contour or edges in my picture with opencv in Python3?

I want to detect the three rectangles (white, gray, black) in this picture, as in the image below. I tried to use the find_contour function in opencv for Python, but the light gray stripes disturbed find ...

lksj

129

asked Oct 2, 2024 at 9:16

1 vote

2 answers

152 views

Clustering longitudinal data with labels?

I have longitudinal data as follows: import pandas as pd # Define the updated data with samples only in 'sample_A' or 'sample_B' data = { 'gene_id': ['gene_1', 'gene_1', 'gene_1', 'gene_1', '...

donkey

1,458

asked Sep 26, 2024 at 4:54

1 vote

1 answer

107 views

Clustering geometries recursively exceeds cluster size limit

I want each cluster to have a maximum of 20 items. Here is my code in PostgreSQL with PostGIS extension: WITH RECURSIVE clustered_data AS (-- Step 1: Perform initial clustering SELECT pma.* ...

Ray92

463

asked Sep 25, 2024 at 8:16

4 votes

1 answer

510 views

Topic modelling many documents with low memory overhead

I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...

Bbrk24

1,053

asked Sep 24, 2024 at 21:57

0 votes

1 answer

545 views

What is the interpretation of this wavy T-SNE plot?

I am trying the T-SNE method to explore high-dimensional datasets and reduce their dimensionality, and I have ended up with the following plot. I have used the TSNE parameters n_components=2 and init='...

skan

7,790

asked Sep 20, 2024 at 15:55

0 votes

1 answer

56 views

Spring Boot 3 Session Clustering Error Deployed on External Tomcat10

I use Spring Boot 3.x and an external Tomcat 10. I set up session clustering on the external Tomcat. If I check on the JSP page, the session is shared, but if I check the same logic with Spring Boot ...

watercolor

1

asked Sep 18, 2024 at 12:09

-1 votes

1 answer

134 views

What features to extract to cluster text?

I want to make a classifier for text, which is then used to suggest the most similar text for a given one. The flow of the app is the following: extract the main 10 topics from the text, using a ...

will

161

asked Sep 11, 2024 at 14:56

0 votes

1 answer

153 views

pheatmap clustering order

I have this dataset: > dput(mdata2) structure(list(EE = c(3.3221428469822, 3.62699732299098, 1.75430154205983, 0.809228977410138, 1.24117055233438, 2.93403148663873, 4.01630566539058, 1....

Fabrizio

947

asked Sep 10, 2024 at 15:12

0 votes

1 answer

257 views

Clustering for SBERT embedding

I have a set of sentences which I have transformed into vectors using SBERT embedding. I would like to cluster these vectors. When looking for information online, I keep seeing posts saying to do ...

Alex Jax

1

asked Aug 29, 2024 at 15:11

2 votes

1 answer

96 views

How to delete edges based on cluster_edge_betweenness output

I want to do the same as asked here, using the first approach from the question. Sadly, the mods variable from the following line is not defined, and I'm asking myself how to adjust: g2 <- delete....

Sulz

523

asked Aug 27, 2024 at 10:04

0 votes

0 answers

47 views

Centroids result in NONE while trying to re-implement the kmeans algorithm

I have defined a specific function in my project re-implementing the kmeans algorithm, but at the point where the centroids are meant to be re-assigned and obtain newer values, they come out as NONE. ...

KIZ-MAN

33

asked Aug 25, 2024 at 21:11

0 votes

1 answer

52 views

Which model should I use to run K means clustering in h2o.ai?

I am using h2o.ai and a sample credit card dataset to run kmeans clustering. Which model should I use to run K means clustering in h2o.ai? I chose Unsupervised learning. There are 2 options with ...

user26844683

1

asked Aug 16, 2024 at 5:57
