
In the Research.fi portal, unsupervised machine learning methods (topic modeling) were used to algorithmically cluster nearly 7,000 projects into topics based on their titles, short project descriptions, keywords (where available), and fields of science. Several unsupervised topic modeling methods (contextualized topic models, top2vec, hierarchical stochastic block models, and BERTopic) were compared to identify the topics.
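
Before clustering, the text fields of each project have to be combined into a single document for the topic model. The following is a minimal sketch, assuming each project is a dictionary with hypothetical field names `title`, `description`, `keywords`, and `fields_of_science`; the actual Research.fi field names and preprocessing are not described on this page.

```python
def build_document(project: dict) -> str:
    """Join the text fields used for topic modeling into one string.

    The field names are hypothetical. Keywords and fields of science are
    treated as optional lists, because not every project has keywords.
    """
    parts: list[str] = [project.get("title", ""), project.get("description", "")]
    # Missing or empty lists contribute nothing.
    parts.extend(project.get("keywords") or [])
    parts.extend(project.get("fields_of_science") or [])
    # Skip empty strings so missing fields do not leave stray separators.
    return ". ".join(p.strip() for p in parts if p and p.strip())


# Example with hypothetical data:
# build_document({"title": "Arctic ice", "description": "Sea ice modeling",
#                 "keywords": ["climate"], "fields_of_science": ["Earth sciences"]})
# -> "Arctic ice. Sea ice modeling. climate. Earth sciences"
```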

Based on the comparison, 92 topics were selected. Each project is assigned to the single most likely topic. Each topic is labeled algorithmically with its most significant keywords. Because the keywords describe the common characteristics of a topic, they may not fully describe every project in it. Furthermore, not all projects may have been assigned to their most appropriate topic. Therefore, the topic modeling results are not suitable for accurately determining the sizes of individual topics.
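
The assignment and labeling steps can be sketched in plain Python. This is a minimal illustration, not the Research.fi implementation: it assumes a topic model has already produced a probability for every project-topic pair, and it labels topics with a simplified class-based TF-IDF score (the weighting used by BERTopic), whereas the labeling procedure actually used by the portal is not specified here. The function names are hypothetical.

```python
import math
from collections import Counter


def assign_topics(probabilities: list[list[float]]) -> list[int]:
    """Assign each project to its single most likely topic (argmax per row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def topic_keywords(docs: list[str], topics: list[int], top_n: int = 3) -> dict[int, list[str]]:
    """Label each topic with words that are frequent in it but rare elsewhere.

    Simplified class-based TF-IDF: a word's relative frequency within the
    topic, weighted by log(1 + average words per topic / total frequency of
    the word across all topics).
    """
    per_topic: dict[int, Counter] = {}
    for doc, topic in zip(docs, topics):
        per_topic.setdefault(topic, Counter()).update(doc.lower().split())
    total = Counter()
    for counts in per_topic.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_topic)
    labels: dict[int, list[str]] = {}
    for topic, counts in per_topic.items():
        size = sum(counts.values())
        scores = {w: (c / size) * math.log(1 + avg_words / total[w]) for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for reproducible labels.
        labels[topic] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return labels


# Example with made-up probabilities and documents:
# assign_topics([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]) -> [1, 0]
# topic_keywords(["sea ice climate", "climate ocean ice",
#                 "cancer cells therapy", "cancer immune therapy"],
#                [0, 0, 1, 1], top_n=2)
# -> {0: ["climate", "ice"], 1: ["cancer", "therapy"]}
```

Taking the argmax discards the probability mass a project has on other topics, which is one reason topic sizes counted from these single assignments are only approximate.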

New projects are assigned to the existing topics daily. When a considerable amount of new data becomes available, topic modeling will be rerun to determine whether entirely new topics should be added. In that case, the number of topics and the topic keywords may change, and individual projects may move between topics.
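
A daily classification step could look like the following minimal sketch. It assumes, hypothetically, that new projects are embedded into the same vector space as the existing topics and that each topic is represented by a centroid vector, as in centroid-based models such as top2vec. The embedding step is omitted, and the function names are illustrative rather than part of any Research.fi or library API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_new(embeddings: dict[str, list[float]],
                 centroids: dict[int, list[float]]) -> dict[str, int]:
    """Assign each new project to the existing topic with the most similar centroid.

    The set of topics is fixed: this step never creates new topics. Adding
    topics requires rerunning the full topic model.
    """
    return {
        project_id: max(centroids, key=lambda t: cosine(vec, centroids[t]))
        for project_id, vec in embeddings.items()
    }


# Example with made-up two-dimensional vectors:
# classify_new({"p1": [0.9, 0.1], "p2": [0.2, 0.8]}, {0: [1.0, 0.0], 1: [0.0, 1.0]})
# -> {"p1": 0, "p2": 1}
```

Keeping the topics fixed between full reruns keeps the filter and visualizations stable from day to day, at the cost of forcing genuinely new themes into the closest existing topic until the model is retrained.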

More information on topic modeling methods

Contextualized topic models

  • Bianchi, F., Terragni, S., Hovy, D., Nozza, D., & Fersini, E. (2021). Cross-lingual Contextualized Topic Models with Zero-shot Learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 1676–1683). Association for Computational Linguistics.
  • https://github.com/MilaNLProc/contextualized-topic-models

Hierarchical stochastic block models

top2vec

BERTopic

The results of topic modeling are shown in Research.fi in two places:

  • A new filter, "Identified topics", in the Projects section (see image below).
  • New visualizations in the Projects section's 'Show as image' feature, under the new theme "Identified topics" (see images below).

