I have millions of topics in my data. Each topic is 1 to 12 words long. For instance, 'Cancer Biology and Genetics' could be one topic and 'Regenerative medicines' another. I want to group similar topics into clusters and name each cluster. I tried BERT + K-Means to cluster these topics, and it works fine. I don't know much about NLP, though, and would like to know the best approach for this. I also don't have a way to name the clusters so that each name makes sense and represents its cluster. Please advise.
1 Answer
I'm not aware of a standard automated method for naming clusters. Part of the work in topic modeling, and in unsupervised learning in general, is post-analysis: deciding whether the topics/clusters make sense and what they "are about". There are research approaches, however, e.g. the paper "Automatic Labelling of Topics with Neural Embeddings".
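To make the post-analysis a bit easier, one simple heuristic is to list the words that are frequent inside a cluster but rare across the other clusters, and use them as candidate labels to review by hand. The sketch below is only an illustration of that idea, not the method from the paper above. The function names, the stopword list, and the scoring formula are my own assumptions, and it expects you to already have the cluster assignments (e.g. from K-Means).

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; extend it for real data).
STOPWORDS = {"and", "of", "the", "in", "for", "a", "an", "to", "on", "with"}


def tokenize(topic):
    """Lowercase a topic and keep its ASCII alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", topic.lower()) if w not in STOPWORDS]


def candidate_labels(clusters, top_n=3):
    """Return the top_n highest-scoring words for each cluster.

    clusters: dict mapping a cluster id to the list of topic strings in it.
    score(word) = count in this cluster * log(1 + n_clusters / clusters containing word)
    """
    # Word counts per cluster.
    counts = {
        cid: Counter(w for topic in topics for w in tokenize(topic))
        for cid, topics in clusters.items()
    }
    # Number of clusters in which each word appears at least once.
    cluster_freq = Counter(w for c in counts.values() for w in c)
    n_clusters = len(clusters)

    labels = {}
    for cid, c in counts.items():
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(
            c,
            key=lambda w: (-c[w] * math.log(1 + n_clusters / cluster_freq[w]), w),
        )
        labels[cid] = ranked[:top_n]
    return labels


clusters = {
    0: ["Cancer Biology and Genetics", "Cancer genomics", "Tumor biology"],
    1: ["Regenerative medicines", "Stem cell therapy", "Regenerative tissue engineering"],
}
labels = candidate_labels(clusters, top_n=2)
print(labels)
```

The output is a short list of words per cluster rather than a finished name, so a human still has to decide whether the cluster is coherent and what to call it.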
Depending on what you are trying to do, you might be able to cheat and give the list of words in each cluster to a generative AI model (e.g. just ask ChatGPT) for suggestions, to get some initial results quickly.
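If you go that route, the prompt for each cluster could be assembled roughly like this. This is a minimal sketch: `build_naming_prompt`, its parameters, and the prompt wording are hypothetical, and it deliberately stops at building the text. How you send it to a model depends on the service you use.

```python
import random


def build_naming_prompt(topics, max_examples=20, seed=0):
    """Build a text prompt asking a generative model to name one cluster.

    topics: list of topic strings belonging to a single cluster.
    Large clusters are randomly subsampled so the prompt stays short.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    if len(topics) > max_examples:
        sample = rng.sample(topics, max_examples)
    else:
        sample = list(topics)
    bullet_list = "\n".join(f"- {t}" for t in sample)
    return (
        "The following topics were grouped into one cluster.\n"
        "Suggest a short name (at most 5 words) that describes them all.\n\n"
        f"{bullet_list}\n\nName:"
    )


prompt = build_naming_prompt(["Cancer Biology and Genetics", "Tumor biology"])
print(prompt)
```

With millions of topics you would send only a sample per cluster, as above, and still spot-check the suggested names by hand.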