1,177 questions
Align a dendrogram with faceted time series (4 votes, 2 answers, 158 views)
I am doing some time series clustering, and would like to align the dendrogram with the time series shapes. This is almost there:
library(ggplot2)
library(reshape2)
library(stats)
library(patchwork)
...
How do I cluster/group duplicate customer objects in my list based on email OR mobile in C# (3 votes, 3 answers, 189 views)
For my current project in C#, I am tasked with fetching customer details from a data source, 'cleansing' said customers (making sure the name is capitalised correctly, mobile formatted correctly, etc.)...
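The tricky part of "email OR mobile" matching is that it is transitive: if record A shares an email with B, and B shares a mobile number with C, all three belong in one group even though A and C share nothing directly. A union-find (disjoint-set) structure handles this in close to linear time. The sketch below is written in Python rather than C#, and assumes each customer is a dict with hypothetical `email` and `mobile` keys whose values have already been cleansed (lower-cased emails, normalized mobile numbers); the sample records are made up.

```python
from collections import defaultdict


def group_customers(customers):
    """Group records that share an email OR a mobile number (transitively).

    customers: list of dicts with hypothetical keys "email" and "mobile".
    Returns a sorted list of groups, each a list of record indices.
    """
    parent = list(range(len(customers)))

    def find(i):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # (field, value) -> index of the first record with that value
    for idx, customer in enumerate(customers):
        for field in ("email", "mobile"):
            value = customer.get(field)
            if not value:
                continue  # blanks must not link unrelated customers
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(customers)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


customers = [
    {"email": "ann@example.com", "mobile": "0400111222"},
    {"email": "ann.smith@example.com", "mobile": "0400111222"},
    {"email": "ann.smith@example.com", "mobile": "0400999888"},
    {"email": "bob@example.com", "mobile": "0400333444"},
]
print(group_customers(customers))  # [[0, 1, 2], [3]]
```

Skipping empty values matters: without that check, every record with a blank mobile would be merged into one group. The same structure carries over to C# with an `int[]` parent array and a `Dictionary<string, int>` of first-seen values.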
How to correctly calculate distance and similarity for each step in hierarchical clustering (Ward.D2)? (2 votes, 0 answers, 96 views)
I am grouping my data using the ward.D2 hierarchical clustering method in R.
I need to calculate the distance and similarity for each step, from 2 to 20 clusters.
Similarity is calculated using the ...
WGCNA analysis negative kME (0 votes, 0 answers, 90 views)
I am doing a WGCNA analysis and am struggling with several parts of it.
I used a signed network during my bwnet analysis (TOMType = "signed").
I thought this meant that only positive correlations ...
How can I perform image clustering effectively? (-1 votes, 1 answer, 140 views)
I have images of graph lines with trends, and I want to cluster similar trends together. However, after trying several clustering algorithms, they are not working as well as I expected. I believe that ...
Finding subclusters of a specific cluster (0 votes, 0 answers, 58 views)
I performed HDBSCAN clustering:
hdbscan_clusterer = hdbscan.HDBSCAN(min_cluster_size=200)
df['Cluster'] = hdbscan_clusterer.fit_predict(data_matrix_for_clustering)
Now, I’m interested in getting the ...
Low silhouette scores in mixed data clustering: Impact of categorical variables and possible solutions? (0 votes, 0 answers, 124 views)
I am clustering a dataset with both numerical and categorical variables. To handle the high dimensionality, I performed dimensionality reduction separately for both types of inputs, retaining 21 ...
De-AggregateObject and change the aggregated object of a class in C++ implemented in NS3.38 (0 votes, 1 answer, 55 views)
I implemented a vehicular ad hoc network (VANET) in NS3.38 and used the default Node class in NS3. I also created a Cluster object that has some attributes. Under some conditions I need to remove a node from a cluster and ...
How to create a dendrogram colored by clusters with hclust and cutreeDynamic (0 votes, 1 answer, 138 views)
I'm working on a clustering problem and I would like to use the hclust functions to create the dendrogram and cutreeDynamic to create clusters from the mentioned dendrogram. In fact, I have already ...
How to use the dissimilarity matrix output from vegdist() function for hclust()? (0 votes, 1 answer, 177 views)
I have computed the dissimilarity matrix using the vegdist() function, with the method specified as "morisita". However, even though the hclust() function is built to read either distance or dissimilarity ...
How to offset, align and increase padding of dendrogram labels (1 vote, 1 answer, 99 views)
I am trying to create a dendrogram in R. So far, I have used the factoextra package, specifically the fviz_dend function. The code is as follows:
num_data_scaled <- scale(num_data)
res.hc &...
Rotate x-axis values in a dendrogram using fviz_dend (1 vote, 1 answer, 53 views)
I was trying to rotate the x-axis values (45 degrees) in a dendrogram using the fviz_dend function from the factoextra package, but nothing works.
I also tried to follow the answer in this post rotating dendogram x ...
How to speed up R dist matrix for hierarchical clustering for large matrix input data? (1 vote, 1 answer, 171 views)
I have a large matrix (approximately 35,000 x 35,000) and I'm preparing a distance object in R for hierarchical clustering. The base R function dist() is too slow, so I'm using the distances function ...
Is Riskfolio's HCPortfolio working right? (0 votes, 0 answers, 42 views)
This is a portion of my project, and it yields the following graphic.
I noticed how the correlation between EWH and MCHI (which usually move in parallel with each other) ...
How to use the Agglomerative Clustering algorithm from scikit-learn python library with a declared number of objects in the cluster? (1 vote, 1 answer, 602 views)
I use the scikit-learn Agglomerative Clustering python library in my code to automatically cluster points and place a new, larger point in the center of the cluster. I have a set of several thousand ...
Keyword argument "connectivity" in sklearn AgglomerativeClustering does not work as expected (1 vote, 0 answers, 102 views)
In my Python code, I have a set of objects that I want to cluster based on a given distance matrix. However, there are some objects that should never end up in the same cluster. The number of clusters ...
Calculating mean of coordinates for each unique value in large dataset for hierarchical analysis (0 votes, 1 answer, 41 views)
I am a beginner with R, but I have been analysing a large data set of GPS data, made up of unique individuals (name) (approx 100 unique names) with 1,000,000+ lines of data. Each unique name has ...
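Computing per-individual means in one pass only needs a running sum and count per name, so memory depends on the number of names (about 100) rather than the 1,000,000+ rows. A Python sketch, assuming the data can be streamed as hypothetical `(name, lat, lon)` tuples; in R the same aggregation can be written as `aggregate(cbind(lat, lon) ~ name, data = df, FUN = mean)`, again with the column names `lat` and `lon` assumed. The sample rows are made up.

```python
from collections import defaultdict


def mean_coordinates(rows):
    """Mean latitude/longitude per individual in a single pass.

    rows: iterable of (name, lat, lon) tuples (hypothetical layout).
    Only a running sum and count per name is kept in memory.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # name -> [sum_lat, sum_lon, count]
    for name, lat, lon in rows:
        entry = acc[name]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {name: (s_lat / n, s_lon / n) for name, (s_lat, s_lon, n) in acc.items()}


rows = [("ind1", 51.0, -1.0), ("ind1", 53.0, -3.0), ("ind2", 50.0, 0.5)]
print(mean_coordinates(rows))  # {'ind1': (52.0, -2.0), 'ind2': (50.0, 0.5)}
```

A plain arithmetic mean of latitude and longitude is reasonable when each individual's points cover a small area; it becomes misleading for points spread over large distances or across the ±180° meridian.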
scipy: How to plot the hierarchical clustering tree (0 votes, 1 answer, 182 views)
I am interested in plotting the tree represented by the output of hierarchy.to_tree().
To clarify my question, I give the following MWE:
import numpy as np
from scipy.cluster import hierarchy
from ...
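One way to inspect the tree returned by `hierarchy.to_tree()` without a plotting library is to walk it recursively: `to_tree` returns a `ClusterNode` whose `left`/`right` children, `id`, `dist` and `count` can be read directly. This is a text-only sketch on made-up 2-D points; for a graphical plot, `hierarchy.dendrogram(Z)` draws the same tree from the linkage matrix.

```python
import numpy as np
from scipy.cluster import hierarchy


def tree_lines(node, prefix="", is_last=True, is_root=True):
    """Render a scipy ClusterNode (from hierarchy.to_tree) as ASCII lines."""
    if node.is_leaf():
        label = f"leaf {node.id}"
    else:
        label = f"node {node.id} (dist={node.dist:.3f}, count={node.count})"
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        lines = [prefix + ("`-- " if is_last else "|-- ") + label]
        child_prefix = prefix + ("    " if is_last else "|   ")
    if not node.is_leaf():
        # Internal nodes of a linkage tree always have exactly two children.
        lines += tree_lines(node.left, child_prefix, False, False)
        lines += tree_lines(node.right, child_prefix, True, False)
    return lines


# Made-up 2-D observations: two tight pairs far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Z = hierarchy.linkage(X, method="single")
root = hierarchy.to_tree(Z)
print("\n".join(tree_lines(root)))
```

The root id is `2n - 2` (here 6) and leaves keep their original row indices, so the printed ids can be matched back to the rows of `X`.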
How to configure the built-in keepalived of OpenSIPS? (0 votes, 1 answer, 102 views)
I am trying to configure OpenSIPS keepalived for node handling in a cluster. I followed the instructions in this guide: https://controlpanel.opensips.org/htmldoc_9_X_X/keepalived.html. After ...
Set sample points for each cluster in kmeans using Python (1 vote, 0 answers, 33 views)
So I'm working on a project where I am using embeddings generated from Universal Sentence Encoder and giving them as input to the kmeans clustering in sklearn.cluster.
The problem is that I ran ...
In scikit-learn's agglomerative clustering algorithm how would you get all the intermediate clusters? (1 vote, 1 answer, 477 views)
I am running this relatively straightforward algorithm.
If I understand the algorithm correctly, if you cluster to, say, 8 clusters, you should have the results for all cluster counts above 8, right?
Would ...
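Agglomerative clustering builds the tree bottom-up, so fitting to 8 clusters performs the merges from n singletons down to 8, and every intermediate level with more than 8 clusters can be recovered from the recorded merges. scikit-learn exposes them as `children_`, where row `i` merges two nodes into the new node `n_samples + i` (ids below `n_samples` are original samples). If the tree was stopped early, `children_` only covers the merges down to `n_clusters`; fitting with `compute_full_tree=True` records the rest. A pure-Python sketch that replays such a merge list (the function name is hypothetical):

```python
def clusterings_from_merges(children, n_samples):
    """Replay a merge list and return flat labels at every level.

    children: rows (a, b) laid out like AgglomerativeClustering.children_;
    row i merges nodes a and b into node n_samples + i.
    Returns {k: labels} for every number of clusters k reached.
    """
    members = {i: [i] for i in range(n_samples)}  # active node id -> sample indices

    def snapshot():
        labels = [0] * n_samples
        # Number clusters by their smallest member so labels are deterministic.
        order = sorted(members, key=lambda node: min(members[node]))
        for label, node in enumerate(order):
            for s in members[node]:
                labels[s] = label
        return labels

    levels = {n_samples: snapshot()}
    for i, (a, b) in enumerate(children):
        members[n_samples + i] = members.pop(a) + members.pop(b)
        levels[len(members)] = snapshot()
    return levels


levels = clusterings_from_merges([[0, 1], [2, 3], [4, 5]], 4)
print(levels[2])  # [0, 0, 1, 1]
```

With a fitted model this would be called as `clusterings_from_merges(model.children_.tolist(), len(X))`. Labels are renumbered by each cluster's smallest sample index, so they will not necessarily match the numbering in `labels_`.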
Computing p-values when using pvclust with Bray-Curtis similarity (1 vote, 1 answer, 278 views)
Using species abundance data recorded for multiple samples, I want to create a dendrogram where branches represent the similarity of samples. The distance measure should be Bray-Curtis-similarity. For ...
Clustering protein sequences (with/without MSA) (0 votes, 1 answer, 319 views)
I have NGS data (unique clones only) and I want to group the sequences by similarity (clustering is preferable) using Python. Please have a look at the sample sequences below. Also to ...
Sequence alignment for hierarchical cluster analysis on categorical sequence data (1 vote, 1 answer, 100 views)
I have a dataset of short-term behaviors displayed by 30 individuals.
#Load packages
library(TraMineR)
# Function to generate a random non-numerical sequence
generate_random_sequence <- function(...
Clustering data using scipy and a distance matrix in Python (0 votes, 2 answers, 720 views)
I am working in Python. I am using a binary dataframe in which I have a set of values of 0 and 1 for different users at different times.
I can perform hierarchical clustering directly from the dataframe ...
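SciPy's `linkage` can work from a distance matrix, but it expects the condensed (upper-triangle) vector that `pdist` returns, not a square n x n matrix; a square matrix passed directly is treated as raw observations (SciPy warns about this). For 0/1 data, a set-based metric such as Jaccard, which ignores positions where both users are 0, is often more meaningful than Euclidean distance. A small sketch on a made-up user-by-time matrix, with average linkage chosen arbitrarily:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Made-up binary data: one row per user, one column per time step.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=bool)

condensed = pdist(X, metric="jaccard")  # length n*(n-1)/2, what linkage() expects
square = squareform(condensed)          # n x n view, handy for inspection only
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(square.round(3))
print(labels)
```

If you already have a square distance matrix from elsewhere, `squareform(square_matrix)` converts it to the condensed form that `linkage` accepts.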
How to change the dendrogram labels in Python (0 votes, 0 answers, 89 views)
I have this:
And I want to have 25 clusters. So I used this:
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 25, affinity = 'euclidean', linkage = 'ward')...
Is there a way to keep hierarchical clustering order constant across groups for correlation matrices? (1 vote, 0 answers, 76 views)
I have created multiple correlational matrices for age ranges (e.g. 1-5, 5-10, 10-15 years old). However, when I do hierarchical clustering using ggcorrplot for example, running ggcorrplot(...
Can the total Entropy of all clusters be greater than 1, after classification? (1 vote, 1 answer, 608 views)
After doing k-means classification on a dataset (k = 3), I tried to find the total entropy of all the clusters. (The total number of data points, i.e. the total length of the dataset, was 500.)
...
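Whether a value above 1 is possible depends on the definition used. Assuming the common external-validation definition (entropy of the true class labels inside each cluster, weighted by cluster size), with base-2 logarithms the entropy of a distribution over m classes is at most log2(m). With 3 classes that bound is about 1.585, so values above 1 are legitimate; only with 2 classes is the maximum exactly 1. Summing per-cluster entropies without weighting can push the total higher still. A small Python sketch of the weighted version, with made-up labels:

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (log base 2) of the true class labels inside one cluster."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def total_entropy(clusters):
    """Size-weighted average of the per-cluster entropies.

    clusters: one list of true class labels per cluster.
    """
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


# A cluster that mixes three classes evenly reaches the maximum, log2(3).
print(cluster_entropy(["a", "b", "c"]))
print(total_entropy([["a", "a"], ["a", "b"]]))  # 0.5
```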
color_branches from dendextend package retrieving overlapping subdendrograms (1 vote, 0 answers, 69 views)
I am trying to use the color_branches function from the dendextend package to color the dendrogram branches of my heatmap created with ComplexHeatmap.
I bumped into a strange behaviour, as you can see in ...
Can't replicate clustermap plot using just sns.heatmap (0 votes, 0 answers, 83 views)
Given a linkage matrix Z the resulting heatmap I get from g = sns.clustermap(corr) is different from the heatmap I get using sns.heatmap(corr[np.ix_(g.dendrogram_row.reordered_ind, g.dendrogram_row....
Clustering of variable length trajectories in R (1 vote, 0 answers, 88 views)
I have a dataset that comprises several spatial path trajectories of variable lengths (i.e. a time series of X & Y coordinates). I am looking to group these based on the similarities of their ...
Dendrogram to a directed graph/tree (0 votes, 1 answer, 196 views)
I am trying to convert a dendrogram into a graph/tree to perform calculations with its nodes, leaves and subtrees and find maximal subgraphs, but I have not found a function/package that helps me in ...
Error calculating Dunn's index in C++ using the Armadillo library (1 vote, 0 answers, 68 views)
I have been trying to compute Dunn's index using the Armadillo library for a larger algorithm I'm working on. Whenever I run the code, I get the output Dunns index:-nan(ind) and an error saying I'm out of ...
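Dunn's index is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster (the largest cluster diameter). A `-nan(ind)` result (MSVC's spelling of an indeterminate NaN) usually points to a 0/0 somewhere, and in this ratio a likely candidate is a zero denominator, for example when every cluster holds a single point, or an empty cluster that contributes no distances at all. The following is a Python sketch rather than Armadillo code, with those edge cases made explicit; the sample points are made up.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter.

    points: sequence of coordinate tuples; labels: cluster label per point.
    """
    min_between = math.inf  # closest pair of points in different clusters
    max_within = 0.0        # farthest pair of points in the same cluster
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        dpq = math.dist(p, q)
        if lp == lq:
            max_within = max(max_within, dpq)
        else:
            min_between = min(min_between, dpq)
    if min_between == math.inf:
        raise ValueError("the Dunn index needs at least two non-empty clusters")
    if max_within == 0.0:
        # Every cluster has zero diameter (e.g. all singletons): the ratio is
        # x/0 or 0/0, which is where a naive division produces inf or NaN.
        return math.inf if min_between > 0 else math.nan
    return min_between / max_within


print(dunn_index([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]))  # 10.0
```

In the C++ version, the equivalent fix is to check the denominator (and that every cluster is non-empty) before dividing.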
Complication with clustree image generation (1 vote, 0 answers, 95 views)
Running the code below (initially without the par(lwd = 1) line) returned the following error, and the same error was returned again after setting par(lwd = 1), or to 0.5 or 10. Truthfully don't totally ...
Pair-cluster across many variables, respecting pre-existing grouping variable (2 votes, 1 answer, 134 views)
I have a tibble with an id column, a G grouping variable, and 300 numeric variables.
I want a method that clusters the rows so that each row is matched/paired in a cluster with another ...
How to get the maximum value of n_clusters (a parameter of hdbscan.flat.HDBSCAN_flat()) that can be specified (0 votes, 1 answer, 282 views)
Question 1
I got the warning UserWarning: HDBSCAN can only compute 3 clusters. Setting n_clusters to 3... when I specified the parameter n_clusters=4 in HDBSCAN_flat(). Can I get the max_eom_clusters ...
Adding variables to fviz_cluster (0 votes, 0 answers, 90 views)
I have built a cluster plot with fviz_cluster, shown in the attached image.
fviz_cluster(hcpc, palette = c("blue", "red", "black"),
ellipse.type = "convex&...
Is it possible to pre-specify a cluster structure and then merge it according to common clustering criteria in R? (0 votes, 1 answer, 164 views)
Suppose that I had a dataset where I've done a cluster analysis with, say, k=9. Perhaps this has been from a k-means or I've just done a complete linkage hierarchical agglomeration or I eyeballed it ...
KMeans / ValueError: Found input variables with inconsistent numbers of samples: (1 vote, 0 answers, 33 views)
My initial data is:
data_init = pd.read_csv('data_merged.csv')
# Total period to cover: 25 months
initial_period_data = data_init[(data_init['order_purchase_timestamp'] >= earliest_timestamp) & (...
Distance measurement methods used in Hierarchical Clustering (1 vote, 0 answers, 23 views)
In Hierarchical Clustering, what are the distance measurement methods used? Are different measurement methods used depending on the purpose?
If performing Hierarchical Clustering for Region Proposal, ...
Problem with number of groups in ROCK algorithm in Python (1 vote, 0 answers, 115 views)
I am doing data grouping in Python. I was able to group the dataset without problems using the K-modes algorithm and got 4 groups.
Now I am trying to do the same with the ROCK algorithm to compare the results. ...
Clustering a high-dimensional data set (0 votes, 0 answers, 56 views)
I have a dataset of 90,000 rows and 200 columns and am trying to form clusters. I have reduced the dimensionality using PCA. When I try it in Python I get a MemoryError. I understand the problem is because of the number of ...
How to create a tanglegram in R that connects two dendrograms cut through a given number of clusters? (1 vote, 0 answers, 169 views)
I have two dendrograms resulting from hierarchical clustering and I want to visually compare them using a tanglegram. However, I only want to display a certain number of clusters that are cut at a ...
Hierarchical bootstrap clustering using ClusterBootstrap (2 votes, 0 answers, 301 views)
I have a dataset in which I am measuring area of between 10-40 microglia for some 25ish subjects, each measured in 3 different slices of tissue. I want to do hierarchical clustering to ask if the area ...
Changing color of dendrogram branches in Python (0 votes, 0 answers, 87 views)
I have succeeded in changing the color of leaf labels in my dendrogram according to its classification, but I want to follow this coloring upwards.
from scipy.cluster import hierarchy
import ...
Convert 2d array with 3 columns into a hierarchical structure with no associative keys (0 votes, 2 answers, 61 views)
I have data like this:
Which can also be seen as this PHP array:
$items = [
['item1' => 'a', 'item2' => 'c', 'item3' => 'h'],
['item1' => 'a', 'item2' => 'c', 'item3' => 'i']...
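Each row can be read as a path: item1, then item2, then item3. Since the excerpt is cut off before the desired output, the target shape here is an assumption: a list of `[value, children]` pairs with no string keys, where the last column becomes a plain list of leaves. It is sketched in Python rather than PHP, with each row as a tuple such as `("a", "c", "h")` and a made-up extra pair of rows; a temporary dict is used only for lookup while building.

```python
def build_tree(rows):
    """Nest equal-length rows into [value, children] pairs without string keys.

    Each row is a sequence such as ("a", "c", "h") with at least two columns;
    the last column becomes a plain list of leaf values. Order follows first
    appearance in `rows`.
    """
    root = {}  # temporary lookup: value -> subtree (dict) or leaf list
    for row in rows:
        node = root
        for value in row[:-2]:
            node = node.setdefault(value, {})
        node.setdefault(row[-2], []).append(row[-1])

    def to_pairs(node):
        if isinstance(node, list):
            return node  # leaf level: already a plain list
        return [[value, to_pairs(child)] for value, child in node.items()]

    return to_pairs(root)


rows = [("a", "c", "h"), ("a", "c", "i"), ("a", "d", "j"), ("b", "e", "k")]
print(build_tree(rows))
# [['a', [['c', ['h', 'i']], ['d', ['j']]]], ['b', [['e', ['k']]]]]
```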
For a dendrogram plot in R, how to add custom text while coloring the branches according to the clusters? (1 vote, 0 answers, 189 views)
I have this dendrogram:
hc <- hclust(dist_s, method = 'average')
At first, I was showing it with dendextend:
dend = as.dendrogram(hc)
par(mar = c(3, 2, 2, 8))
dend %>%
set("labels_cex&...
How to extract a hierarchical structure from a flat table in SQL Server database (-1 votes, 3 answers, 150 views)
I want to extract a hierarchical structure from a table in a SQL Server database. The table looks similar to this, with levels going up to lvl 10:
lvl1 | lvl2 | lvl3
I want to extract a hierarchical structure ...
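A flat table with one column per level already encodes every parent-child link: each adjacent pair of columns (lvl1, lvl2), (lvl2, lvl3), and so on is an edge. In SQL Server one way to express that is a `UNION` of `SELECT DISTINCT` queries over adjacent columns. The same idea as a small Python sketch, assuming rows arrive as tuples and missing deeper levels are NULL (`None`) or empty strings; the sample rows are made up.

```python
def level_rows_to_edges(rows):
    """Turn (lvl1, lvl2, ..., lvlN) rows into distinct (parent, child) edges.

    A level that is None or "" ends the path for that row. Top-level nodes
    get parent None. Assumes each value identifies a single node.
    """
    edges = []
    seen = set()
    for row in rows:
        parent = None
        for value in row:
            if value is None or value == "":
                break
            edge = (parent, value)
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
            parent = value
    return edges


rows = [("A", "B", "C"), ("A", "B", "D"), ("A", "E", None)]
print(level_rows_to_edges(rows))
# [(None, 'A'), ('A', 'B'), ('B', 'C'), ('B', 'D'), ('A', 'E')]
```

This treats each value as a unique node id; if the same name can appear under different parents, use the full path (for example the tuple of levels up to that point) as the node id instead.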
Is there a way to better optimize a hierarchical agglomerative clustering algorithm without using most sklearn libraries? (1 vote, 0 answers, 623 views)
I am trying to write a hierarchical agglomerative clustering algorithm from scratch without using most of the related libraries, and my program does successfully put the test set into the closest ...
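A common first optimization for a from-scratch implementation is to stop recomputing cluster-to-cluster distances from the original points after every merge. For average linkage (UPGMA), the Lance-Williams update gives the distance to a merged cluster from the two old distances and the cluster sizes. The sketch below, in Python without scikit-learn, assumes average linkage because the question does not say which linkage is used. It still does an O(k^2) closest-pair scan per step (O(n^3) overall), so a priority queue or the nearest-neighbor-chain algorithm would be the next step for large inputs.

```python
import math


def average_linkage(points):
    """Agglomerative clustering (average linkage) from scratch.

    After each merge, distances to the new cluster come from the
    Lance-Williams update for average linkage,
        d(k, a+b) = (n_a * d(k, a) + n_b * d(k, b)) / (n_a + n_b),
    instead of being recomputed from the raw points.

    Returns merges as (cluster_a, cluster_b, distance, new_size). Ids 0..n-1
    are the input points; the cluster created at step i gets id n + i.
    """
    n = len(points)
    size = {i: 1 for i in range(n)}  # active cluster id -> number of points
    dist = {}  # (smaller id, larger id) -> distance between active clusters
    for i in range(n):
        for j in range(i + 1, n):
            dist[(i, j)] = math.dist(points[i], points[j])

    def d(x, y):
        return dist[(x, y) if x < y else (y, x)]

    merges = []
    next_id = n
    while len(size) > 1:
        # Closest active pair: a plain scan over the remaining distances.
        a, b = min(dist, key=dist.get)
        d_ab = dist[(a, b)]
        n_a, n_b = size.pop(a), size.pop(b)
        for k in size:
            dist[(k, next_id)] = (n_a * d(k, a) + n_b * d(k, b)) / (n_a + n_b)
        # Forget every distance that involves the two merged clusters.
        for key in [key for key in dist if a in key or b in key]:
            del dist[key]
        size[next_id] = n_a + n_b
        merges.append((a, b, d_ab, n_a + n_b))
        next_id += 1
    return merges


print(average_linkage([(0.0,), (1.0,), (10.0,), (11.0,)]))
# [(0, 1, 1.0, 2), (2, 3, 1.0, 2), (4, 5, 10.0, 4)]
```

The merge tuples use the same row layout and id numbering as SciPy's linkage matrix, which makes it easy to compare the result against `scipy.cluster.hierarchy.linkage(..., method="average")` on small inputs.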
Is Neighbor-Joining Clustering Available in SciPy? (0 votes, 0 answers, 305 views)
I would like to use scipy.cluster.hierarchy to perform neighbor joining on a distance matrix. However, I have been unable to locate in the documentation that this is an available option. The reason ...
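`scipy.cluster.hierarchy.linkage` offers the methods `single`, `complete`, `average`, `weighted`, `centroid`, `median` and `ward`; neighbor joining is not among them, and unlike those methods it produces an unrooted tree with unequal branch lengths rather than a linkage matrix. The algorithm is short enough to write directly. The sketch below follows the standard neighbor-joining steps (Q-matrix pair selection, branch lengths, distance update); the tie-breaking (first pair in input order) and the internal node names `u0`, `u1`, ... are arbitrary choices of this sketch. It is run on a standard five-taxon textbook example.

```python
def neighbor_joining(matrix, names):
    """Neighbor joining on a symmetric distance matrix.

    matrix: list of lists with zeros on the diagonal; names: one label per row.
    Returns the unrooted tree as a list of edges (node, node, branch_length).
    """
    D = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - total(i) - total(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * D[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent u.
        len_a = D[a][b] / 2 + (total[a] - total[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        u = f"u{counter}"
        counter += 1
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                # Distance from the new node to every remaining node.
                D[u][k] = D[k][u] = (D[a][k] + D[b][k] - D[a][b]) / 2
        edges += [(u, a, len_a), (u, b, len_b)]
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # The last two nodes are joined directly.
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(matrix, names):
    print(edge)
```

For this matrix the first join is `a` with `b`, at branch lengths 2 and 3 from the new node `u0`.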