cmishra/Text-Mining-Final-Project

Text-Mining-Final-Project

Code that created the Presidential Documents LDA (to access it, click "Presidential Documents LDA" under the Deliverable dropdown).

Technical Details

A version of the technical details with properly rendered equations is available by following the directions above. The same details, with equations written in LaTeX, appear below.

PowerPoint presentation available here.

Data set: Compiled Presidential Documents

Purpose: How can we quantitatively measure what the administration was focused on and how the focus changed over time?

Methodology: Use Latent Dirichlet Allocation (LDA) to find a set of topics in the data. Choose the number of topics with a greedy heuristic that maximizes the log-likelihood.
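
The README does not spell out the greedy heuristic, so the following is only a minimal sketch of one plausible reading: walk through candidate topic counts in increasing order and stop as soon as the log-likelihood no longer improves. The function `greedy_topic_count` and the stand-in scorer `toy_log_likelihood` are hypothetical names introduced for illustration. In practice the scorer would fit an LDA model with `k` topics and return its log-likelihood (for example, scikit-learn's `LatentDirichletAllocation.score` returns an approximate log-likelihood).

```python
from typing import Callable, Iterable, Tuple


def greedy_topic_count(
    log_likelihood: Callable[[int], float],
    candidates: Iterable[int],
) -> Tuple[int, float]:
    """Return the last candidate topic count that improved the log-likelihood.

    Assumption: candidates are visited in the order given, and the search
    stops at the first candidate that does not beat the best score so far.
    """
    best_k = None
    best_ll = float("-inf")
    for k in candidates:
        ll = log_likelihood(k)
        if ll <= best_ll:
            break  # greedy stop: no improvement over the previous best
        best_k, best_ll = k, ll
    if best_k is None:
        raise ValueError("no candidate produced a finite log-likelihood")
    return best_k, best_ll


def toy_log_likelihood(k: int) -> float:
    """Hypothetical stand-in for 'fit LDA with k topics and score it'.

    It peaks at k = 20 purely so the example has a known answer.
    """
    return -float((k - 20) ** 2) - 1000.0


best_k, best_ll = greedy_topic_count(toy_log_likelihood, range(5, 55, 5))
print(best_k, best_ll)
```

A greedy search like this fits far fewer models than an exhaustive sweep, at the cost of possibly stopping at a local maximum if the log-likelihood is not unimodal in the number of topics.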

Metric: Recall that within LDA, $\theta_d$ is the distribution of topics in document $d$. Let $i$ denote the set of public interactions, statements, and documents released during a unit of time. For a topic $t$, the relevance score is $S_{t,i}=\sum_{d\in i}\theta_{d,t}$, where $\theta_{d,t}$ is the weight of topic $t$ in document $d$.
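
As a rough illustration of the metric, the sketch below sums each document's topic weights within the time period it belongs to. The function name `relevance_scores`, the toy topic proportions, and the period labels are all hypothetical; the only thing taken from the text is that the score for a topic in a period is the sum of that topic's weight over the documents in the period.

```python
from typing import Dict, List, Sequence


def relevance_scores(
    theta: Sequence[Sequence[float]],
    periods: Sequence[str],
) -> Dict[str, List[float]]:
    """Sum per-document topic weights within each time period.

    theta[d][t] is the weight of topic t in document d (the theta_d of the
    text); periods[d] is the label of the time unit document d falls in.
    """
    if len(theta) != len(periods):
        raise ValueError("theta and periods must have one entry per document")
    scores: Dict[str, List[float]] = {}
    for theta_d, period in zip(theta, periods):
        if period not in scores:
            scores[period] = [0.0] * len(theta_d)
        for t, weight in enumerate(theta_d):
            scores[period][t] += weight
    return scores


# Toy data: three documents, two topics, two time periods (all made up).
theta = [
    [0.5, 0.5],
    [0.25, 0.75],
    [1.0, 0.0],
]
periods = ["2009-01", "2009-01", "2009-02"]

scores = relevance_scores(theta, periods)
print(scores)
```

Because each $\theta_d$ sums to one, a period's scores sum to the number of documents in it, so periods with more documents get larger raw scores. Dividing by the document count would give a per-period average if comparable scales are needed.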
