Clustering topics and naming the cluster in Python

Question

I have millions of topics in my data. Each topic is 1 to 12 words long. For instance, 'Cancer Biology and Genetics' could be one topic and 'Regenerative medicines' could be another. I want to create clusters of similar topics and name them. I tried BERT + K-Means to cluster these topics, and it works fine. I don't know much about NLP and want to use the best approach for this. I also don't have a way to name these clusters so that each name makes sense and represents its cluster. Please advise.

NLP from scratch · Accepted Answer · 2023-08-14 13:30:27Z

I'm not aware of a standard automated method for naming clusters. Part of the work in topic modeling, and in unsupervised learning generally, is post-analysis: deciding whether the topics/clusters make sense and what they are "about". There is research on the problem, however, e.g. Automatic Labelling of Topics with Neural Embeddings.
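To make that post-analysis a little less manual, here is a minimal sketch that proposes candidate label words per cluster. It is not the method from the paper mentioned above. It assumes you already have one cluster id per topic (for example from your BERT + K-Means run) and scores words with a simple TF-IDF-style weight computed across clusters. The function names, the tiny stopword list, and the sample data are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; extend it for real data).
STOPWORDS = {"and", "of", "the", "in", "for", "to", "a", "an", "on", "with"}


def tokenize(topic):
    """Lowercase a topic and split it into alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", topic.lower()) if w not in STOPWORDS]


def suggest_labels(topics, cluster_ids, top_n=3):
    """Return {cluster_id: [top_n candidate label words]}.

    A word scores high when it is frequent inside a cluster and
    appears in few other clusters.
    """
    # Word counts per cluster.
    counts = defaultdict(Counter)
    for topic, cid in zip(topics, cluster_ids):
        counts[cid].update(tokenize(topic))

    # In how many clusters does each word occur?
    cluster_freq = Counter()
    for word_counts in counts.values():
        cluster_freq.update(word_counts.keys())

    n_clusters = len(counts)
    labels = {}
    for cid, word_counts in counts.items():
        scores = {
            w: tf * (math.log((1 + n_clusters) / (1 + cluster_freq[w])) + 1)
            for w, tf in word_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[cid] = ranked[:top_n]
    return labels


# Hypothetical sample: cluster ids would normally come from your clustering step.
topics = [
    "Cancer Biology and Genetics",
    "Cancer Genetics",
    "Tumor Biology and Cancer",
    "Regenerative medicines",
    "Regenerative Stem Cell Therapy",
    "Regenerative Tissue Engineering",
]
cluster_ids = [0, 0, 0, 1, 1, 1]

labels = suggest_labels(topics, cluster_ids, top_n=2)
for cid, words in labels.items():
    print(cid, words)
```

The output is only a list of candidate words, so you still need to read them and decide whether they describe the cluster well.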

Depending on what you are trying to do, you might be able to cheat and give the list of words from each cluster to a generative AI model (e.g. just ask ChatGPT) for suggested names, which gets you some initial results quickly.
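If you go that route, something like the following could build the text to paste into (or send to) the model. This is only a sketch: the prompt wording, the function name, and the sampling cap are my own assumptions, and the code deliberately stops at building the string rather than calling any particular API.

```python
import random


def build_naming_prompt(cluster_topics, max_examples=20, seed=0):
    """Build a prompt asking a generative model to name one cluster.

    Large clusters are down-sampled so the prompt stays short.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    unique = sorted(set(cluster_topics))
    if len(unique) > max_examples:
        sample = rng.sample(unique, max_examples)
    else:
        sample = unique
    bullet_list = "\n".join(f"- {t}" for t in sample)
    return (
        "The following short topics were grouped into one cluster.\n"
        "Suggest a concise name (at most five words) that describes them all.\n\n"
        f"{bullet_list}\n\n"
        "Cluster name:"
    )


# Hypothetical cluster members.
prompt = build_naming_prompt(
    ["Cancer Biology and Genetics", "Cancer Genetics", "Tumor Biology"]
)
print(prompt)
```

With millions of topics you would run this once per cluster, not once per topic, and still spot-check the names that come back.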

edited Aug 14, 2023 at 13:30

answered Aug 14, 2023 at 13:29
