Adding analysis feature: Topic modelling #40

Open
LaChapeliere opened this issue Oct 1, 2020 · 0 comments
Labels
enhancement New feature or request major Requires a significative amount of work research-needed Requires literature review

Comments

@LaChapeliere
Contributor

The first step is research. LDA seems like a promising method but needs to be adapted for tweets (see issue #7). word2vec is interesting too, but it requires a manual step to code the themes of the most common related words (ask @LaChapeliere for more details). Other techniques can be explored too.
For each method, the accuracy of the implementation should be evaluated in some way. The documentation should suggest the best preprocessing parameters. The implementation should let users split the data by time period and compare results over time. The data-splitting step should live in the preprocessing module, since several analysis pipelines will share it.
See the old implementations of LDA and word2vec in the resiliency_challenge-legacy branch, and the related issues #6 and #5.
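
To make the time-period split more concrete, here is a minimal sketch of what such a preprocessing helper might look like. It is not taken from the project: the function name `split_by_period`, the `(timestamp, text)` input shape, and the sample tweets are all assumptions for illustration, and the real preprocessing module may store tweets differently.

```python
from collections import defaultdict
from datetime import datetime


def split_by_period(tweets, period="month"):
    """Group tweet texts into buckets keyed by a time-period label.

    Hypothetical helper: `tweets` is assumed to be an iterable of
    (datetime, text) pairs. Returns a dict mapping a period label
    (e.g. "2020-03") to the list of texts posted in that period,
    ordered chronologically by label.
    """
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    if period not in formats:
        raise ValueError(f"period must be one of {sorted(formats)}")

    buckets = defaultdict(list)
    for created_at, text in tweets:
        buckets[created_at.strftime(formats[period])].append(text)

    # Zero-padded labels sort lexicographically in chronological order.
    return dict(sorted(buckets.items()))


# Made-up sample data, only to show the call shape.
sample_tweets = [
    (datetime(2020, 4, 2, 18, 5), "second month, first tweet"),
    (datetime(2020, 3, 14, 9, 30), "first month, first tweet"),
    (datetime(2020, 3, 20, 12, 0), "first month, second tweet"),
]

by_month = split_by_period(sample_tweets, period="month")
for label, texts in by_month.items():
    print(label, len(texts))
```

Each bucket could then be passed to the same topic-modelling pipeline (LDA, word2vec, or another method) so that results can be compared from one period to the next.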

@LaChapeliere added the enhancement, research-needed, and major labels on Oct 1, 2020