-1 votes
1 answer
94 views

I have a dataset of approximately 750 lines containing quite short texts (less than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...
Arthur GONAY
0 votes
1 answer
105 views

I am trying to topic-model a dataset of around 50 million tweets. Unfortunately, because of the embeddings, such a large dataset will not fit in RAM (even 128 GB). Therefore, I have been working on ...
Matthieu B
0 votes
0 answers
43 views

I have been playing around with the keyATM package extensively. Unfortunately, however, there is no way to calculate topic correlations and co-occurrences once the model is calculated. I already ...
dpaltra22
0 votes
1 answer
111 views

Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
hyco
  • 221
1 vote
1 answer
161 views

Say I build a BERTopic model using from bertopic import BERTopic topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20) topics, probs = topic_model.fit_transform(docs) Inspecting probs gives me ...
coolhand
  • 2,109
0 votes
0 answers
41 views

I am trying to pip install bertopic so that I can install and use the BERTopic model. Here is my code: from bertopic import BERTopic topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia...
0 votes
0 answers
97 views

If I try to follow the topic modeling tutorial at: https://huggingface.co/docs/hub/en/bertopic The first few lines give me an error: from bertopic import BERTopic topic_model = BERTopic.load("...
coolhand
  • 2,109
0 votes
0 answers
58 views

Has anyone had this issue? My topic model seems to be surfacing responses that are heavily dominated by male respondents. The volume of responses across three different questions is over 800 in each ...
GrBrn
  • 3
0 votes
1 answer
67 views

I want to do topic modeling in Python. To remove stop words, I used my own stop word list, a stop word list I found on GitHub, and NLTK's stop word list. However, when I examined the ...
deniz
  • 11
0 votes
0 answers
45 views

I have used "lda.collapsed.gibbs.sampler" for my topic modelling and LDA visualisation, and now I want to determine which number of topics (K) best fits my data. Then I tried to use AIC/...
Pang kalok
4 votes
1 answer
510 views

I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
Bbrk24
  • 1,053
0 votes
1 answer
47 views

I would like to create separate word clouds for each of my 8 topics in an LDA model. I extracted the top 40 words for each of the 8 topics: an object of length 320 containing the top words and their occurrence probabilities. I ...
NoaMi
  • 41
0 votes
1 answer
107 views

I understand how the diagnostics output shows the coherence values for each topic, but my values range between -150 and -600, whereas other posts I have seen where Mallet was used show coherence ...
Glorifier
0 votes
1 answer
69 views

Sorry, but I am inexperienced with Mallet and could use some help. I am currently trying to use Mallet as a soft-clustering technique to assign group membership for a given set of terms contained ...
A Bolton
0 votes
1 answer
84 views

I have a set of many (around 20 thousand) short job descriptions in English. My purpose for now is to be able to detect their optimal number of topics. I use an R script which worked decently on a ...
larry77
  • 1,543
0 votes
0 answers
56 views

I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...
Nicolette
0 votes
0 answers
154 views

Hi everyone. I can't understand why this is giving me an error. Previously, the code was working with no errors. Packages are: quanteda, quanteda.textmodels, quanteda.textstats, quanteda.textplots, newsmap, ...
Diego Gimenez
2 votes
3 answers
94 views

I have two topic models, topics1 and topics2. They were created from very similar but different datasets. As a result, the words representing each topic/cluster as well as the topic numbers will be ...
Adam_G
  • 7,979
1 vote
0 answers
50 views

I am receiving the following error message: Error in makeTopMatrix(prevalence, data) : Error creating model matrix. This could be caused by many things including explicit calls to a namespace within ...
Violet Massie-Vereker
2 votes
0 answers
138 views

On page 19 of the stm tutorial (Figure 6: Graphical display of topical prevalence contrast, https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf), how can I change the font size of the ...
James
  • 45
2 votes
0 answers
54 views

I'm trying to implement Top2Vec on Colab. The following code is working fine with the dataset "https://raw.githubusercontent.com/wjbmattingly/bap_sent_embedding/main/data/vol7.json" ...
PS Nayak
  • 423
0 votes
1 answer
158 views

I am a newbie at this, so I apologize if I am asking the obvious here. I ran a biterm topic modeling algorithm to model short text data and discover topics among them. I am using the LDAvis package to ...
vaibhavchutani
0 votes
1 answer
217 views

Based on the following link (quotes), with the help of the following code (this site was based on JavaScript, so I first disabled it): import selenium from selenium import webdriver from selenium....
0 votes
1 answer
124 views

I ran an stm topic model and used estimateEffect: prep <- estimateEffect(1:20 ~ Party + s(Year), model, meta = out$meta, uncertainty = "Global") What is shown in ...
Astrid
  • 1
0 votes
1 answer
134 views

I have Python 3.12.2 and gensim 4.3.2, but when I tried to use import gensim in my Python code I got the error below: ImportError Traceback (most recent call last) Cell In[...
Saifu
  • 118
2 votes
1 answer
1k views

I'm still fairly new to Python so this might be easier than it appears to me, but I'm stuck. I'm trying to use BERTopic and visualize the results with PyLDAVis. I want to compare the results with the ...
Dominik
  • 23
0 votes
1 answer
1k views

I'm new to R and am trying to use a script to transcribe audio files. I found this terrific person, who proposes a solution for audio transcription: https://www.bnosac.be/index.php/blog/105-...
mehmetkay-sudo
1 vote
1 answer
1k views

Has anyone used LangChain or LlamaIndex imports to deal with single documents longer than 512 tokens? Yes, I know there are other approaches to dealing with it, but it is difficult to find ...
Ja4H3ad
  • 61
2 votes
1 answer
128 views

I plot the term score decline for a topic model I created on Google Colab with BERTopic. Great function. Works neat! But I need to add a legend. This parameter is not specified in the topic_model....
Simone
  • 625
0 votes
1 answer
49 views

How does one retrieve full-text examples of the terms making up a topic model? The goal is to get to know more context of what the ngram is about, to help assign labels better. To achieve this, the ...
Connor
  • 1
-1 votes
1 answer
1k views

I am running BERTopic with default options: import pandas as pd from sentence_transformers import SentenceTransformer import time import pickle from bertopic import BERTopic llm_mod = "all-...
RM-
  • 1,028
1 vote
0 answers
50 views

I have four different groups of countries, and I would assume that the predicted mean of the document-topic distribution differs across countries. Yet I get the same results. I run this code (pretty much in ...
topicmodler
0 votes
1 answer
484 views

Please have a look at the self-contained example at the end of the post. I simplified the reprex and you can download the dfm (document-feature matrix) from https://e.pcloud.link/publink/show?code=...
larry77
  • 1,543
0 votes
1 answer
401 views

Please have a look at the snippet at the end of this post. I run a simplified tutorial example of topic modeling with quanteda, but once the model has finished running, I find it difficult to extract ...
larry77
  • 1,543
1 vote
0 answers
223 views

I want to change the default visualizations within BERTopic to display a dark theme rather than a white or bright theme. Basically I'm trying to do: import plotly.io as pio pio.templates.default ...
RobjSky
  • 31
0 votes
1 answer
146 views

When it comes to "short texts" in topic modelling and natural language processing, what exactly is the definition of a short text? I have not been able to find a definitive answer. Could ...
Junhao
  • 143
0 votes
1 answer
426 views

I have some very long documents. They have overall topics that are fairly standard, but each document will emphasise the topics differently AND within those topics they will have different subtopics I ...
LeanneB
  • 69
0 votes
1 answer
111 views

I have a problem running this code and can't get a visualisation of the topics and words from the LDA model. Does anyone know how to solve this problem? I get the following warning "...
Brandon Haak
-1 votes
2 answers
45 views

I am writing a code for Topic modeling. I received this error. install.packages("tm") install.packages("topicmodels") library(tm) library(topicmodels) docs <- Corpus(...
Junaid
  • 1
2 votes
0 answers
127 views

I'm working on a project involving text data with multiple topics, and I want to use the Kernel Maximum Mean Discrepancy (Kernel MMD) for drift detection on text embeddings for each topic separately. ...
Matteo Citterio
1 vote
0 answers
294 views

I am running a structural topic model using the stm package in R. My model includes an interaction effect between faction_id and numeric_date (a measure of time). I am using the following code to ...
Elisa Benni
1 vote
0 answers
115 views

I am trying to replicate the Topic Modeling exercise from this article titled NLP Tutorial: Topic Modeling in Python with BerTopic. The article comes from the website HackerNoon if you'd prefer to ...
user432299
0 votes
1 answer
667 views

I have millions of topics in my data. These topics are one to 12 words long. For instance, 'Cancer Biology and Genetics' could be one topic and 'Regenerative medicines' could be another. I want to create ...
abhi
  • 55
4 votes
3 answers
831 views

topics, probs = topic_model.fit_transform(docs) Whenever I run fit_transform like in the line above, my Jupyter notebook keeps dying, and I don't know why. I am using Python 3.9.15 on a macOS 13.4.1 ...
Jethro R. Lee
2 votes
1 answer
85 views

I have an Excel file that contains the titles of Stack Overflow posts. My Excel sheet has more than 10,000 lines, so it is not possible to make a separate .txt file for each row. If I copy my Excel data ...
I192058 Misbah Minhas
1 vote
1 answer
782 views

I'm trying to visualize the LDA topics using the pyLDAvis library. I'll be using sklearn.decomposition's LatentDirichletAllocation. My sklearn version: 1.2.2. The error: AttributeError: '...
userrr
  • 279
-1 votes
1 answer
500 views

Data Source: Glassdoor reviews split into two dataframe columns, "Pros" & "Cons" - Pros refer to what the employees liked about the company - Cons refer to what the ...
userrr
  • 279
1 vote
1 answer
936 views

I'm trying to import bertopic, but it gives the following error. I tried different versions and recreated a new environment, but it's still the same. I'm using an Apple M2 Pro processor. Lib version: BERTopic 0....
Salihcan
  • 131
3 votes
1 answer
143 views

I am trying to conduct topic modelling on a dataset. I follow standard procedure, clean the data, tokenize, create a dtm and apply the LDA function (topics <- tidy(my_topic_model, matrix = "...
Abdul Aziz Rajper
1 vote
0 answers
497 views

I'm generating topics using BERTopic on a multilingual dataset (mainly Russian and English). I'm reducing the number of topics to 140. After generating topics, I'm analyzing their quality using the ...
ApaarBawa
