Feature Request: Topic Modeling - Implement Latent Dirichlet Allocation (LDA) and NMF #317

Open
HsiangNianian opened this issue Nov 17, 2024 · 0 comments

@HsiangNianian
Member

Topic modeling is an unsupervised learning task used to discover abstract topics within a collection of documents. We'll implement LDA and NMF algorithms to extract topics from large text corpora.

Algorithm Choice: Should we implement both LDA and NMF so they can be compared, or focus on one?
Data Handling: How should the text be preprocessed (e.g., stop-word removal, TF-IDF weighting)?
Evaluation: How should we evaluate topic coherence and interpretability?
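
To make the data-handling and evaluation questions concrete, here is a rough, dependency-free sketch of one possible answer: stop-word removal, TF-IDF weighting, and the UMass coherence score. Everything in it is an assumption for discussion, not a proposed API: the names `tokenize`, `tfidf`, and `umass_coherence` are hypothetical, the stop-word list is a tiny placeholder, and the smoothed IDF formula is just one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs_tokens):
    """Return (vocab, matrix); matrix[d][j] is the TF-IDF weight of vocab[j] in document d."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    matrix = []
    for doc in docs_tokens:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


def umass_coherence(top_words, docs_tokens):
    """UMass coherence: sum over word pairs of log((D(w_i, w_j) + 1) / D(w_j)).

    D(.) counts documents containing the word(s); values closer to 0 mean
    the topic's top words co-occur more often.
    """
    doc_sets = [set(doc) for doc in docs_tokens]
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = sum(1 for s in doc_sets if top_words[j] in s)
            d_ij = sum(1 for s in doc_sets if top_words[i] in s and top_words[j] in s)
            if d_j:  # skip words that never occur in the corpus
                score += math.log((d_ij + 1) / d_j)
    return score
```

Coherence of this kind only measures co-occurrence, so it would likely need to be paired with manual inspection of the top words to judge interpretability.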

Expected Outcome

  • Working implementations of LDA and NMF that can extract topics from a collection of documents.
  • Examples and usage guidelines for analyzing text corpora.
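
As a starting point for the NMF half of this, below is a minimal sketch using the classic multiplicative update rules for the Frobenius-norm objective, written in plain Python so it has no dependencies. The function names (`nmf`, `top_terms`, `matmul`, `transpose`) and the defaults (`n_iter`, `seed`, `eps`) are hypothetical choices for illustration; a real implementation would presumably use an array library and a convergence check. LDA is not sketched here, since it needs an inference procedure (for example Gibbs sampling or variational inference) that deserves its own discussion.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Random strictly positive initialisation.
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_terms(h, vocab, n=3):
    """For each topic (row of H), return the n highest-weighted vocabulary terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]
```

Here `v` would be a document-term matrix such as a TF-IDF matrix, each row of `h` is a topic expressed as term weights, and each row of `w` gives a document's mixture over topics.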
Projects
Status: Todo