Newest 'topic-modeling' Questions

-1 votes

1 answer

94 views

Unsupervised Topic Modeling for Short Event Descriptions

I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...

Arthur GONAY

9

asked Apr 16 at 11:17

0 votes

1 answer

105 views

MiniBatchKMeans BERTopic not returning topics for half of data

I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128GB) due to the embeddings. Therefore, I have been working on ...

Matthieu B

17

asked Feb 18 at 17:42

0 votes

0 answers

43 views

Calculating Topic Correlations or Co-occurrences for keyATM

I have been playing around with the keyATM package extensively; unfortunately, however, there is no approach for calculating topic correlations and co-occurrences once the model is calculated. I already ...

dpaltra22

1

asked Feb 4 at 20:17

0 votes

1 answer

111 views

Correct topics from LDA Sequence Model in Gensim

Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...

hyco

221

asked Jan 21 at 16:24

1 vote

1 answer

161 views

Inspect all probabilities of BERTopic model

Say I build a BERTopic model using from bertopic import BERTopic topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20) topics, probs = topic_model.fit_transform(docs) Inspecting probs gives me ...

coolhand

2,109

asked Dec 20, 2024 at 20:49

0 votes

0 answers

41 views

importing util library failed

I am trying to use the pip install bertopic command to install and use the BERTopic model. Here is my code: from bertopic import BERTopic topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia...

user4356954

asked Dec 11, 2024 at 20:12

0 votes

0 answers

97 views

Unhashable type when calling HuggingFace topic model `topic_labels_` function

If I try to follow the topic modeling tutorial at: https://huggingface.co/docs/hub/en/bertopic The first few lines give me an error: from bertopic import BERTopic topic_model = BERTopic.load("...

coolhand

2,109

asked Dec 9, 2024 at 16:59

0 votes

0 answers

58 views

Topic modelling outputs are gender biased?

Has anyone had this issue? My topic modelling seems to be presenting responses that are heavily dominated by male respondents. The volume of responses across three different questions is over 800 in each ...

GrBrn

3

asked Oct 29, 2024 at 11:32

0 votes

1 answer

67 views

Stopwords problem in text data preprocessing in Python

I want to do topic modeling in Python. For this reason, I used my own stop word list, a stop word list I found on GitHub, and nltk's stop word list to clean the stopwords. However, when I examined the ...

deniz

11

asked Oct 20, 2024 at 13:10

0 votes

0 answers

45 views

Cannot find AIC/BIC of my topic modelling after using "lda.collapsed.gibbs.sampler" in LDA package

I have used "lda.collapsed.gibbs.sampler" to do my topic modelling and LDA visualisation, and now I want to determine which number of models (K) best fits my model. Then I tried to use AIC/...

Pang kalok

19

asked Oct 7, 2024 at 19:55

4 votes

1 answer

510 views

Topic modelling many documents with low memory overhead

I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...

Bbrk24

1,053

asked Sep 24, 2024 at 21:57

0 votes

1 answer

47 views

How to extract terms and probabilities from tmResult$terms in topic modeling?

I would like to create separate word clouds for each of my 8 topics in an LDA model. I extracted the top 40 words across 8 topics - an object of length 320 containing top words and occurrence probabilities. I ...

NoaMi

41

asked Aug 8, 2024 at 12:16

0 votes

1 answer

107 views

How is coherence score calculated in Mallet?

I do understand how the diagnostics output shows the coherence values for each topic but my values range between -150 and -600 and other posts that I have seen where Mallet was used show coherence ...

Glorifier

31

asked Jul 6, 2024 at 14:33

0 votes

1 answer

69 views

Inconsistent Results When Running Python Mallet/Gibbs Sampling as a Soft-Clustering Method to Identify Optimal Number of Topics

Sorry, but I am inexperienced with Mallet and could use some help. I am currently trying to use Mallet as a soft-clustering technique to assign group membership for a given set of terms contained ...

A Bolton

1

asked Jun 24, 2024 at 0:15

0 votes

1 answer

84 views

R + quanteda + automatic detection of topics: error when running model

I have a set of many (around 20 thousand) short job descriptions in English. My purpose for now is to be able to detect their optimal number of topics. I use an R script which worked decently on a ...

larry77

1,543

asked Jun 18, 2024 at 13:14

0 votes

0 answers

56 views

Errors attaching metadata to corpus

I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...

Nicolette

1

asked Jun 14, 2024 at 20:00

0 votes

0 answers

154 views

LDA Error in x$terms %||% attr(x, "terms")

Hi everyone. I can't understand why this is giving me an error. Later on, the code was working with no errors. Packages are: quanteda, quanteda.textmodels, quanteda.textstats, quanteda.textplots, newsmap, ...

Diego Gimenez

1

asked Jun 3, 2024 at 10:59

2 votes

3 answers

94 views

Find matching rows in dataframes based on number of matching items

I have two topic models, topics1 and topics2. They were created from very similar but different datasets. As a result, the words representing each topic/cluster as well as the topic numbers will be ...

Adam_G

7,979

asked May 23, 2024 at 21:06

1 vote

0 answers

50 views

RStudio stm package Error in makeTopMatrix(prevalence, data)

I am receiving the following error message: Error in makeTopMatrix(prevalence, data) : Error creating model matrix. This could be caused by many things including explicit calls to a namespace within ...

Violet Massie-Vereker

11

asked May 16, 2024 at 16:15

2 votes

0 answers

138 views

R stm package plot custom labels font size

On page 19 of the stm tutorial (Figure 6: Graphical display of topical prevalence contrast, https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf), how can I change the font size of the ...

James

45

asked May 9, 2024 at 0:34

2 votes

0 answers

54 views

Top2Vec model gets stuck on Colab

I'm trying to implement Top2Vec on Colab. The following code is working fine with the dataset "https://raw.githubusercontent.com/wjbmattingly/bap_sent_embedding/main/data/vol7.json" ...

PS Nayak

423

asked Apr 30, 2024 at 6:28

0 votes

1 answer

158 views

How to assign topics to individual documents/ tweets in Bi-term Topic Modeling?

I am a newbie at this, so I apologize if I am asking the obvious here. I ran a bi-term topic modeling algorithm to model short text data and discover topics among them. I am using the LDAvis package to ...

vaibhavchutani

1

asked Mar 15, 2024 at 12:04

0 votes

1 answer

217 views

topic modeling from quotes

Based on the following link: quotes, with the help of the following code (this site is based on JavaScript, so I first disabled it) import selenium from selenium import webdriver from selenium....

user4356954

asked Mar 8, 2024 at 21:51

0 votes

1 answer

124 views

stm Structural Topic Model - estimateEffect returns only 10 years

I ran an stm topic model and used estimateEffect: prep <- estimateEffect(1:20 ~ Party + s(Year), model, meta = out$meta, uncertainty = "Global") What is shown in ...

Astrid

1

asked Mar 7, 2024 at 18:31

0 votes

1 answer

134 views

ImportError: cannot import name 'remove_stopwords' from partially initialized module 'gensim.parsing.preprocessing'

I have Python 3.12.2 and gensim 4.3.2, but when I tried to use import gensim in my Python code I got the error below: ImportError Traceback (most recent call last) Cell In[...

Saifu

118

asked Mar 6, 2024 at 12:44

2 votes

1 answer

1k views

BERTopic: "Make sure that the iterable only contains strings"

I'm still fairly new to Python so this might be easier than it appears to me, but I'm stuck. I'm trying to use BERTopic and visualize the results with PyLDAVis. I want to compare the results with the ...

Dominik

23

asked Jan 19, 2024 at 13:36

0 votes

1 answer

1k views

Trying to transcribe audio files in R

I'm new to R and trying to use a script in order to transcribe audio files. I found this terrific person, who proposes a solution for audio transcription. https://www.bnosac.be/index.php/blog/105-...

mehmetkay-sudo

1

asked Dec 24, 2023 at 12:55

1 vote

1 answer

1k views

Summarization and Topic Extraction with LLMs (private) and LangChain or LlamaIndex using flan-t5-small

Has anyone used LangChain or LlamaIndex imports to deal with single documents that amount to >512 tokens? Yes, I know there are other approaches to dealing with it, but it is difficult to find ...

Ja4H3ad

61

asked Dec 19, 2023 at 2:26

2 votes

1 answer

128 views

BERTopic: add legend to term score decline

I plot the term score decline for a topic model I created on Google Colab with BERTopic. Great function. Works neat! But I need to add a legend. This parameter is not specified in the topic_model....

Simone

625

asked Dec 15, 2023 at 10:07

0 votes

1 answer

49 views

Tracing terms in topic models to their full-text version in R

How does one retrieve full-text examples of the terms making up a topic model? The goal is to get to know more context of what the ngram is about, to help assign labels better. To achieve this, the ...

Connor

1

asked Dec 7, 2023 at 14:02

-1 votes

1 answer

1k views

BERTopic classifying over a quarter of documents in outlier topic -1

I am running BERTopic with default options: import pandas as pd from sentence_transformers import SentenceTransformer import time import pickle from bertopic import BERTopic llm_mod = "all-...

RM-

1,028

asked Nov 28, 2023 at 12:32

1 vote

0 answers

50 views

keyATM covariate model gives me the same predicted mean of the document-topic distribution for four different country categories

I have four different groups of countries, and I would assume that the predicted mean of the document-topic distribution differs across countries. Yet I get the same results. I run this code (pretty much in ...

topicmodler

11

asked Nov 28, 2023 at 9:09

0 votes

1 answer

484 views

R: stm + searchK fails to determine the optimal number of topics

Please have a look at the self-contained example at the end of the post. I simplified the reprex and you can download the dfm (document-feature matrix) from https://e.pcloud.link/publink/show?code=...

larry77

1,543

asked Nov 14, 2023 at 12:45

0 votes

1 answer

401 views

R: Quanteda+LDA, how to Visualise the Results?

Please have a look at the snippet at the end of this post. I run a simplified tutorial example of topic modeling with quanteda, but once the model has finished running, I find it difficult to extract ...

larry77

1,543

asked Oct 30, 2023 at 14:55

1 vote

0 answers

223 views

BERTopic Visualization in dark

I want to change the default visualizations within BERTopic to display a dark theme rather than a white or bright theme. Basically I'm trying to do: import plotly.io as pio pio.templates.default ...

RobjSky

31

asked Oct 30, 2023 at 9:50

0 votes

1 answer

146 views

Referring to "short texts" in topic modelling and natural language processing, what is the definition of the length of a short text?

When it comes to "short texts" in topic modelling and natural language processing, what exactly is the definition of a short text? I have not been able to find a definitive answer. Could ...

Junhao

143

asked Oct 25, 2023 at 20:39

0 votes

1 answer

426 views

Long text topic modelling differences

I have some very long documents. They have overall topics that are fairly standard, but each document will emphasise the topics differently AND within those topics they will have different subtopics I ...

LeanneB

69

asked Oct 20, 2023 at 9:27

0 votes

1 answer

111 views

Problem with visualizing topics with pyLDAvis

I have a problem running this code and can't get a visualisation of the topics and words from the LDA model. Does anyone know how to solve this problem? I get the following warning "...

Brandon Haak

1

asked Oct 9, 2023 at 12:23

-1 votes

2 answers

45 views

How to assign column names?

I am writing code for topic modeling. I received this error. install.packages("tm") install.packages("topicmodels") library(tm) library(topicmodels) docs <- Corpus(...

Junaid

1

asked Sep 12, 2023 at 8:57

2 votes

0 answers

127 views

How to implement TorchDrift's Drift Detection for Monitoring Separate Text Embedding Distributions Across Multiple Topics?

I'm working on a project involving text data with multiple topics, and I want to use the Kernel Maximum Mean Discrepancy (Kernel MMD) for drift detection on text embeddings for each topic separately. ...

Matteo Citterio

74

asked Sep 1, 2023 at 9:15

1 vote

0 answers

294 views

Plotting a structural topic model - how to allow for discontinuity over time

I am running a structural topic model using the stm package in R. My model includes an interaction effect between faction_id and numeric_date (a measure of time). I am using the following code to ...

Elisa Benni

11

asked Aug 23, 2023 at 15:10

1 vote

0 answers

115 views

Python BERTopic 'numpy.float64' object cannot be interpreted as an integer

I am trying to replicate the Topic Modeling exercise from this article titled NLP Tutorial: Topic Modeling in Python with BerTopic. The article comes from the website HackerNoon if you'd prefer to ...

user432299

23

asked Aug 14, 2023 at 22:53

0 votes

1 answer

667 views

Clustering topics and naming the cluster in Python

I have millions of topics in my data. These topics are one to 12 words. For instance 'Cancer Biology and Genetics' could be one topic and 'Regenerative medicines' could be another. I want to create ...

abhi

55

asked Aug 11, 2023 at 17:34

4 votes

3 answers

831 views

Jupyter keeps crashing when using BERTopic's fit_transform()

topics, probs = topic_model.fit_transform(docs) Whenever I run fit_transform like in the line above, my Jupyter notebook keeps dying, and I don't know why. I am using Python 3.9.15 on a macOS 13.4.1 ...

Jethro R. Lee

41

asked Aug 8, 2023 at 5:04

2 votes

1 answer

85 views

How to import an Excel file into Mallet

I have an Excel file that contains the titles of Stack Overflow posts. My Excel sheet has more than 10,000 lines, so it is not practical to make a separate txt file for each row. If I copy my Excel data ...

I192058 Misbah Minhas

35

asked Aug 2, 2023 at 4:32

1 vote

1 answer

782 views

AttributeError: 'TfidfVectorizer' object has no attribute 'get_feature_names' [duplicate]

I'm trying to visualize the LDA topics using the pyLDAvis library. I'll be using sklearn.decomposition's LatentDirichletAllocation. My sklearn version: 1.2.2. The error: AttributeError: '...

userrr

279

asked Jul 10, 2023 at 17:01

-1 votes

1 answer

500 views

Integrate GridSearchCV with LDA Gensim

Data Source: Glassdoor reviews split into two dataframe columns, "Pros" & "Cons" - Pros refer to what the employees liked about the company - Cons refer to what the ...

userrr

279

asked Jun 30, 2023 at 17:40

1 vote

1 answer

936 views

Getting an error from hdbscan while importing bertopic

I'm trying to import bertopic but it gives the following error. I tried different versions and recreated the environment, but it's still the same. I'm using an Apple M2 Pro processor. Library version: BERTopic 0....

Salihcan

131

asked Jun 30, 2023 at 12:56

3 votes

1 answer

143 views

Error in posterior function when running LDA

I am trying to conduct topic modelling on a dataset. I follow standard procedure, clean the data, tokenize, create a dtm and apply the LDA function (topics <- tidy(my_topic_model, matrix = "...

Abdul Aziz Rajper

53

asked May 31, 2023 at 20:28

1 vote

0 answers

497 views

What if I have too many documents labelled in -1 cluster in bertopic?

I'm generating topics using BERTopic on a multilingual dataset (mainly Russian and English). I'm reducing the number of topics to 140. After generating topics, I'm analyzing their quality using the ...

ApaarBawa

85

asked May 25, 2023 at 16:03
