Adding analysis feature: Topic modelling #40

LaChapeliere · 2020-10-01T20:06:44Z

The first step is research. LDA seems like a promising method but needs to be adapted for tweets (see issue #7). word2vec is interesting too, but requires a manual step to code the themes of the most common related words (ask @LaChapeliere for more details about that). Other techniques can be explored too.
For each method, the implementation's accuracy should be evaluated in some way. The doc should suggest the best preprocessing parameters. The implementation should allow users to split the data according to time periods and compare results over time (the data-splitting part should be made part of the preprocessing module, since it will be common to several analysis pipelines).
See the old implementation of LDA and word2vec in the resiliency_challenge-legacy branch, and related issues #6 and #5.
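
To make the time-period requirement a bit more concrete, here is a minimal sketch of what the data-splitting helper in the preprocessing module could look like. Nothing here exists in the repository yet: the function names `period_key` and `split_by_period`, the `created_at` field, and the assumption that timestamps are ISO 8601 strings are all hypothetical and would need to match the actual tweet format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from a period name to a sortable label format.
PERIOD_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def period_key(timestamp, period="month"):
    """Return the label of the period that a datetime falls into."""
    if period not in PERIOD_FORMATS:
        raise ValueError(f"unsupported period: {period!r}")
    return timestamp.strftime(PERIOD_FORMATS[period])


def split_by_period(tweets, period="month", time_field="created_at"):
    """Group tweet dicts by time period, with periods in chronological order.

    Assumes each tweet is a dict whose `time_field` holds an ISO 8601 string.
    """
    buckets = defaultdict(list)
    for tweet in tweets:
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        raw = tweet[time_field].replace("Z", "+00:00")
        buckets[period_key(datetime.fromisoformat(raw), period)].append(tweet)
    # The label formats above sort chronologically as plain strings.
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    sample = [
        {"created_at": "2020-10-01T20:06:44Z", "text": "first"},
        {"created_at": "2020-09-15T08:00:00Z", "text": "second"},
    ]
    for label, group in split_by_period(sample, period="month").items():
        print(label, [t["text"] for t in group])
```

Each analysis pipeline (LDA, word2vec, or anything else) could then run once per returned group, and the per-period results could be compared afterwards. Keeping the helper free of any model-specific logic is what would let several pipelines share it.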

LaChapeliere added the enhancement (New feature or request), research-needed (Requires literature review), and major (Requires a significative amount of work) labels on Oct 1, 2020
