Newest 'similarity' Questions

1 vote

1 answer

56 views

Grouping landmark vectors by similarity (Python or R) - which is the simplest solution?

Pardon my verbosity, but I think that I have much to explain. My English leaves something to be desired. I'm 66 years old, from Italy, with some experience in programming, but I never ventured into the realm ...

Cesare Brizio

13

asked Nov 24 at 21:16

2 votes

0 answers

96 views

How to correctly calculate distance and similarity for each step in hierarchical clustering (Ward.D2)?

I am grouping my data using the ward.D2 hierarchical clustering method in R. I need to calculate the distance and similarity for each step, from 2 to 20 clusters. Similarity is calculated using the ...

pnlp

35

asked Aug 9 at 12:20

1 vote

1 answer

157 views

Neo4j vector similarity function

I'm trying to understand the difference between the vector.similarity.cosine Cypher function and the gds.similarity.cosine function in Neo4j. According to the Neo4j documentation, both are used to ...

Gal Shubeli

13

asked Jun 4 at 14:29

1 vote

1 answer

207 views

Rapidfuzz giving no matches but Fuzzywuzzy does

I have been developing a matching system which matches the rows of the client and our central database depending on similarity. I have used a hybrid approach where I needed to somehow map the Company, ...

Prabhjit Singh

21

asked Jun 4 at 6:21

1 vote

0 answers

174 views

How to compute text–image similarity under local inference with generative vision-language models (e.g. Qwen2.5-VL, Gemma 3)?

I’ve been working with Qwen2.5-VL and Gemma3 locally, and I need to measure the similarity between text and image embeddings—similar to CLIP/SigLIP—but I’m resource-limited and can’t spin up ...

H.H

11

asked May 28 at 22:51

1 vote

3 answers

94 views

In sql, group by using similar group_name

How can I perform a GROUP BY in SQL when the group_name values are similar but not exactly the same? In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "...

Ahamad

1

asked May 15 at 7:23

0 votes

0 answers

91 views

Use terra to calculate relative similarity of raster values between areas inside and outside of a group of polygons in R

This question builds on a helpful solution provided for calculating uniqueness across a SpatRaster -- Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-...

Sean Basquill

1

asked Feb 25 at 16:01

0 votes

1 answer

197 views

Using terra in R to calculate & map (dis)similarity and uniqueness across cells of a large GDM-based raster

I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of ...

Sean Basquill

1

asked Feb 17 at 0:29

0 votes

0 answers

41 views

Near Similarity and duplication detection

I have a ticketing system where people create tickets for their issues. When someone tries to create a new ticket, I have to search Elasticsearch to identify whether a similar ticket is already ...

Ramji

75

asked Jan 23 at 8:28

0 votes

1 answer

74 views

how to compare lists with their frequency

I'm interested in finding similarities between two lists. I have the count of duplicates in the first column, and the pattern is in the second column. What would be the most logical way to compare ...

rollTHERoad

1

asked Jan 8 at 13:41

0 votes

0 answers

99 views

wrong similarity search result on chroma db

I am appending each row in csv file into chromadb with format, such as below; #Acme1 prod #Acme1 line, AC1 #Acme2 prod #Acme2 line, AC2 embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-...

Orkun Gedik

1

asked Dec 23, 2024 at 5:00

0 votes

1 answer

74 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows, df['text']. I also have 5 words, and for each one of them I want to know how much it represents the text (between 0 and 1). Every score will be in df["word1&...

rafine

469

asked Dec 19, 2024 at 10:16

1 vote

1 answer

64 views

Visualize species occurrence similarities of 5x5 grid-cells and soil samples in R

I have two data frames: ref_df, containing for each row information about species and the latitude and longitude at which they were recorded, and sample_df, with sample names as rows and species names as columns,...

Elie Tièche

11

asked Dec 10, 2024 at 12:51

0 votes

1 answer

84 views

similarity from word to sentence after doing words Embedding

I have a dataframe with 1000 text rows. I ran word2vec. Now I want to create a new field which gives me the distance from each sentence to a word that I want, let's say the word "king". I ...

rafine

469

asked Dec 9, 2024 at 8:14

3 votes

1 answer

66 views

How can one obtain the "correct" embedding layer in BERT?

I want to utilize BERT to assess the similarity between two pieces of text: from transformers import AutoTokenizer, AutoModel import torch import torch.nn.functional as F import numpy as np tokenizer ...

Beitian Ma

33

asked Dec 4, 2024 at 3:15

1 vote

0 answers

32 views

Ideal method to do image similarity comparison between two binary images - Tactile Maps

I have a task for image similarity comparisons between two binary images of tactile maps (maps for the visually impaired to map out an area). The goal is to have an output score of how similar the two ...

Michael Liang

35

asked Nov 14, 2024 at 20:51

0 votes

0 answers

28 views

Compare two blob column values to find percentage of similarity

How to compare two blob columns in the same table? For example: PIC1 contains a picture of a bridge taken from 5 meter distance, whereas PIC2 contains a picture of the same bridge taken from 10 meter ...

padjee

265

asked Nov 12, 2024 at 8:07

1 vote

1 answer

49 views

Fetch rows from PostgreSQL with rearranged words similar to a given string

I want to retrieve all rows from a PostgreSQL database that contain sentences similar to a provided string. The sentences in the database can have their words in any order (rearranged). How can I ...

Shivam Baldha

56

asked Oct 9, 2024 at 10:35

1 vote

0 answers

46 views

Directed Graph Edit Distance Computation Issue Using AStar Algorithm in Graph-Matching-Toolkit

I am comparing pairs of identical directed graphs represented in GXL format using the Graph-Matching-Toolkit(https://github.com/dzambon/graph-matching-toolkit). Since the graphs are identical, I ...

NoName Su

11

asked Oct 3, 2024 at 13:30

0 votes

1 answer

42 views

Error in reconstruction function of SA attack algorithm for CB systems

I try to execute the attached code in the following link: https://github.com/biometricsecurity/Preimage-attack-on-BTP-template The code aims to measure the similarity attacks at CB systems. I have ...

GeGe

1

asked Aug 25, 2024 at 7:25

1 vote

1 answer

44 views

How to create a list of comparisons with an agentset?

The agents (called farms) in my model have peers (an agentset of farms they have links to). Each farm has an attribute called rotation, which is a list of values between 0 and 1 (same length for all ...

Bartosz Bartkowski

23

asked Aug 14, 2024 at 14:35

0 votes

1 answer

131 views

Implementation of Angular Metric for Shape Similarity (AMSS) with Python

Is there any existing optimized Python implementation of Angular Metric for Shape Similarity (AMSS)? Otherwise, could I approximate it by considering the derivative DTW and using cosine similarity ...

hfaila

1

asked Aug 12, 2024 at 12:37

1 vote

0 answers

53 views

Mismatch between Milvus Id and Filename

I am trying to build a small image search program and using Milvus as a database to store my embeddings, on trying to retrieve the result by matching the vector embeddings with the embeddings obtained ...

lazy panda

11

asked Aug 8, 2024 at 10:04

0 votes

1 answer

505 views

Why are my dimensions different when using OpenAi embeddings in Python?

I have a single Python function that I am using to embed JSON objects of different lengths. The issue I am having is that, somehow, the dimensions are different when comparing the vectors and I ...

Ken Tola

29

asked Jul 3, 2024 at 18:26

1 vote

0 answers

61 views

How can I store boolean array of size 3000 efficiently in milvus?

I have a boolean array which represents store-availability of retail products across 3000 different stores. so, my schema looks like below: product_id = FieldSchema( name="product_id", ...

Mohamed Niyaz

218

asked May 30, 2024 at 9:02

0 votes

0 answers

241 views

How can I perform accurate vector search for complex objects?

I have objects that have many attributes, example: item = { 'id': 123, 'name': 'Keyboard', 'price': 12, 'url': 'example.com', 'description': 'a keyboard that etc...', 'details': { 'color': '...

AbdulmohsenA

1

asked May 17, 2024 at 12:57

1 vote

0 answers

87 views

How can one output n vectors with unique metadata in a query with ChromaDB?

Following on the example here, one way to create a query of the collection from ChromaDB with filtering by a given type of metadata (i.e. "source_type") is results = collection.query( ...

user18959

11

asked May 12, 2024 at 5:23

0 votes

1 answer

203 views

Plot upper triangle correlation matrix with similarity scores using ggplot

I have a dataframe as given below: The table only has values from the upper triangle of a matrix. I want to plot a correlation plot (correlogram) where the colours show the correlation and size ...

Kamalika Ray

79

asked Apr 29, 2024 at 10:47

0 votes

0 answers

119 views

Multi-attribute similarity search across millions of records based on criteria

Problem description: I am trying to perform an efficient multi-attribute similarity search across millions of records in a database. However, my process requires a hierarchical order of criteria for ...

AK2001

1

asked Apr 24, 2024 at 9:42

1 vote

0 answers

79 views

Is it possible to compare multiple line graphs to give a sort of 'similarity rating'?

So I am trying to measure data from a smartphone ambient light sensor (ALS). My goal is to be able to look at the data and infer the location of the device. To do this my plan is ...

nosilak0

11

asked Apr 3, 2024 at 15:36

1 vote

2 answers

150 views

similarity between two numpy arrays based on shape but not distance

import matplotlib.pyplot as plt import numpy as np from numpy.linalg import norm def cosine_similarity(arr1:np.ndarray, arr2:np.ndarray)->float: dot_product = np.dot(arr1, arr2) magnitude =...

Prashant

947

asked Mar 29, 2024 at 7:24

1 vote

2 answers

136 views

How to detect if two sentences are similar, not in meaning, but in syllables/words?

Here are some examples of the types of sentences that need to be considered "similar" there was a most extraordinary noise going on shrinking rapidly she soon made out there was a most ...

BLOCKCRAFT 2.0

13

asked Mar 29, 2024 at 2:08

1 vote

0 answers

650 views

Langchain FAISS | Any solutions or alternatives for similarity search on vector DBs for slightly repetitive short words with numerics?

So basically I am trying to search a cell line vector data base that has entries that look like this using langchain: ID: 253F1 AC: CVCL_B513 SY: NA OX: NCBI_TaxID=9606; ! Homo sapiens (Human) CA: ...

Nicholas Piccaro

11

asked Mar 19, 2024 at 18:55

-2 votes

1 answer

46 views

I have plots of points that I extract from an image. How can I determine a similarity measure between two different plots? [closed]

Each point has an x, y, and size. For example these should result in similar: Plot 1-A: Plot 1-B: And these should not result in similar: Plot 2-A: Plot 3-A: Are there any algorithms or ways to ...

vnnsnnd

3

asked Mar 12, 2024 at 7:14

0 votes

2 answers

191 views

Shared triples between two knowledge graphs

I want to compare two semantic Knowledge Graphs, to see if they have any triple in common, using cypher. MATCH (n1)-[r1]-(c1) MATCH (n2)-[r2]-(c2) WHERE r1.filePath = "../data/graph1.json" ...

biowhat

17

asked Mar 6, 2024 at 14:20

0 votes

1 answer

75 views

record matching/similarity calculation for numbers and characters

I have a dataframe structured the following way with much more rows and columns: Report_ID Block_ID Number Character 1 1 5 A 2 1 3 A 3 1 2 B 4 2 10 A 5 2 11 B 6 2 100 C 7 3 2 D 8 3 #NA A 9 3 8 D 10 3 ...

Marius

3

asked Mar 6, 2024 at 13:58

1 vote

2 answers

99 views

VBA collect consecutive similar cells in the row

I have a list of nonconformities that appeared at different times with different products. I need to find similar problems. I have already sorted the data. Now I need to get a new sheet with similar rows with ...

Andrei Samoilov

13

asked Mar 5, 2024 at 5:54

2 votes

1 answer

358 views

Textual similarity between two tags in Nodejs

I want to rate the similarity between two tags. For example the words technology, computer and chip should have high similarity, a word like food should be low similarity. Given the recent ...

Sir hennihau

1,874

asked Feb 28, 2024 at 15:04

2 votes

0 answers

41 views

Get similarity within a column based on another column

I have a table with three columns: Source, Target, Similarity. The first two are strings, the last one is a float. This table came about by comparing source elements and target elements and finding ...

Mitsarien

55

asked Feb 28, 2024 at 11:06

0 votes

1 answer

2k views

SSIM (Structural similarity index measure) performance

I have a reference image A and 2 target images B and C , I tried to measure the SSIM as follows : (from a human vision perception A & B are from the same class) and A & C from different ...

Ziri

746

asked Feb 14, 2024 at 9:17

1 vote

0 answers

132 views

Similarity search in a python database using rdkit

How can I run a similarity search in a database so that the output is a table of molecules that passed a specific threshold? I tried this query = sql.SQL(""" SELECT *, ...

Vincent_chem

11

asked Feb 6, 2024 at 15:18

0 votes

0 answers

75 views

How to quickly query similar text through postgresql?

I need to query the top few texts that are most similar based on the input content. The table structure is as follows: create table documents ( id bigserial primary ...

accbear

23

asked Jan 30, 2024 at 11:21

0 votes

1 answer

79 views

Trying to skip columns in loop if requirement isn't satisfied

I have this python code trying to find similarities between brands based on the types of products they sell and at what price point. One issue I'm running into is price, sales, and number of products ...

krizik

1

asked Jan 17, 2024 at 20:49

0 votes

2 answers

107 views

Creating a similarity matrix with jagged arrays

I have a dataframe as such. id action enc Cell 1 run,swim,walk 1,2,3 Cell 2 swim,climb,surf,gym 2,4,5,6 Cell 3 jog,run] 7,1 This table goes on for roughly 30k rows. After gathering all these actions, ...

Jacob

3

asked Jan 16, 2024 at 8:28

1 vote

0 answers

49 views

Find proportion of two vectors/arrays which overlap

I have two vectors, for example: a = 25,26,37,36,27,33,104,44,40,49,45,48,50,55,56,59,54,57,105,64,73,76,72,67,68,71,78,82,77,79,86,84,83,85,91,92,96,97,102,101,93,98,99,100,94,95,88,87,65,66,90,89,80,...

purecobalt

13

asked Jan 12, 2024 at 16:10

-3 votes

2 answers

3k views

Meaning behind 'thefuzz' / 'rapidfuzz' similarity metric when comparing strings

When using thefuzz in Python to calculate a simple ratio between two strings, a result of 0 means they are totally different while a result of 100 represents a 100% match. What do intermediate results ...

David Shaw

85

asked Jan 9, 2024 at 14:12

1 vote

0 answers

45 views

Algorithm to generate top N% of permutations with most dissimilarities

Let us consider a set of data with, say, 10 values. We need to estimate its characteristics by the Monte Carlo method, that is, with a large number of randomly generated permuted sets. If we'd consider ...

Stan

8,778

asked Jan 4, 2024 at 17:44

1 vote

1 answer

101 views

Maximum jaccard similarity in igraph

From the igraph documentation: "The Jaccard similarity coefficient of two vertices is the number of common neighbors divided by the number of vertices that are neighbors of at least one of the ...

Zachary

381

asked Dec 18, 2023 at 11:54
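
The Jaccard definition quoted in the excerpt above (common neighbors divided by the size of the combined neighbor set) can be sketched in plain Python. This is only an illustration, not igraph's implementation: the function name `jaccard_similarity` and the convention of returning 0.0 for two empty neighbor sets are assumptions made here.

```python
def jaccard_similarity(neighbors_a, neighbors_b):
    """Jaccard coefficient of two neighbor collections (hypothetical helper).

    Returns |A intersect B| / |A union B|. The 0.0 result for two empty
    inputs is an assumed convention, not something taken from igraph.
    """
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        # Both vertices have no neighbors: the ratio is undefined, so
        # this sketch falls back to 0.0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Vertices sharing 2 of 4 distinct neighbors.
    print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))
```

Because the intersection can never be larger than the union, the value always lies between 0 and 1, and it reaches 1 only when both neighbor sets are identical and non-empty.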

0 votes

2 answers

1k views

Langchain: ChromaDB: Not able to retrieve large numbers of PDF files vector database from Chroma persistence directory

My programme is chatting with PDF files in a directory. Surprisingly the code works if there are 5 PDF files in the directory, of 1 page each. But it doesn't work when there are 1000 files of 1 page each. It ...

Rajeshwar Singh Jenwar

1

asked Dec 11, 2023 at 10:02

2 votes

2 answers

105 views

Cosine similarity between each two rows in a dataframe

I have a data frame called text with two columns, year and text. Find the dput output below for an example: text <- structure(list(year = 2000:2007, text = c("I went to McDonald's and they ...

fsure

335

asked Dec 7, 2023 at 11:52
