A Flask app that computes topics from news articles. It uses BERTopic for topic modeling.
Features:
- Choose a time period and which parts of an article to include (headline, teaser, etc.), then compute the most dominant topics for that period
- Visualize the top ten topics and their representations, as well as how many articles on each topic were published in each medium
- Generate summaries for each topic using an LLM
- If a topic isn't specific enough, use the drill-down function to compute child topics from it
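
As a rough illustration of the first two features, the sketch below filters articles by time period, joins the chosen article parts into one document per article, and counts articles per topic and medium. It is not the app's actual code: the field names (`published`, `headline`, `teaser`, `medium`) and both helper functions are assumptions made for this example, and the topic ids are hard-coded where the app would take them from the topic model.

```python
from collections import Counter, defaultdict
from datetime import date


def build_documents(articles, start, end, fields):
    """Keep articles published in [start, end] and join the chosen fields.

    Hypothetical helper: returns the documents plus the matching articles,
    in the same order, so topic ids can be mapped back to articles later.
    """
    selected = [a for a in articles if start <= a["published"] <= end]
    docs = [" ".join(a[f] for f in fields if a.get(f)) for a in selected]
    return docs, selected


def count_by_medium(selected, topics):
    """Count articles per topic and medium (hypothetical helper)."""
    counts = defaultdict(Counter)
    for article, topic in zip(selected, topics):
        counts[topic][article["medium"]] += 1
    return counts


# Made-up sample data, for illustration only.
articles = [
    {"published": date(2024, 1, 5), "headline": "Election results",
     "teaser": "Votes counted", "medium": "Paper A"},
    {"published": date(2024, 1, 9), "headline": "Storm warning",
     "teaser": "", "medium": "Paper B"},
    {"published": date(2024, 3, 1), "headline": "Transfer news",
     "teaser": "Striker signs", "medium": "Paper A"},
]

docs, selected = build_documents(
    articles, date(2024, 1, 1), date(2024, 1, 31), ["headline", "teaser"]
)

# In the app, one topic id per document would come from the topic model
# (BERTopic's fit_transform returns such a list); hard-coded here.
topics = [0, 1]
per_medium = count_by_medium(selected, topics)
```

Returning the selected articles alongside the documents keeps the two lists aligned, which is what makes the per-medium count a simple `zip`. A drill-down could reuse the same helpers by passing only the articles assigned to one topic back into the model.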
Demonstration (video is sped up):
topic_modeling.mp4
Future features:
- Improve LLM summaries and labels
- Pre-train topic clusters (possibly on a schedule, so they run automatically)
- Expose routes as API endpoints
- Build a proper JS frontend
- Deploy a demo
- Add more news sources