Code that created the Presidential Documents LDA (to access it, click "Presidential Documents LDA" under the Deliverable dropdown).
Technical details with equations rendered properly are available if you follow the directions above. The technical details (with equations written in LaTeX form) are available below.
PowerPoint presentation available here.
Data set: Compiled Presidential Documents
Purpose: How can we quantitatively measure what the administration was focused on and how the focus changed over time?
Methodology: Use Latent Dirichlet Allocation (LDA) to find a set of topics in the data. Choose the number of topics with a greedy heuristic that maximizes the log-likelihood.
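The greedy search over the number of topics could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's actual code: the function name greedy_num_topics, its parameters (start, step, max_topics), and the stopping rule (stop at the first candidate that does not improve the log-likelihood) are all hypothetical, and a toy scoring function stands in for fitting a real LDA model.

```python
from typing import Callable, Tuple


def greedy_num_topics(
    log_likelihood: Callable[[int], float],
    start: int = 2,
    step: int = 1,
    max_topics: int = 100,
) -> Tuple[int, float]:
    """Greedily grow the topic count while the log-likelihood keeps improving.

    `log_likelihood(k)` is assumed to fit a model with k topics and return
    its log-likelihood. The search stops at the first k that fails to
    improve on the best score so far, so it finds a local maximum only.
    """
    best_k = start
    best_ll = log_likelihood(start)
    k = start + step
    while k <= max_topics:
        ll = log_likelihood(k)
        if ll <= best_ll:
            # No improvement: accept the previous best and stop searching.
            break
        best_k, best_ll = k, ll
        k += step
    return best_k, best_ll


def toy_log_likelihood(k: int) -> float:
    # Hypothetical stand-in for a fitted model's score, peaked at k = 12.
    return -float((k - 12) ** 2)


best_k, best_ll = greedy_num_topics(toy_log_likelihood, start=2, step=2, max_topics=40)
print(best_k)
```

In practice the callable would fit an LDA model with k topics and return its log-likelihood. For example, scikit-learn's LatentDirichletAllocation has a score method that returns an approximate log-likelihood. Because the search is greedy, it can stop early on a noisy curve, so a larger step or an average over several random seeds may be worth considering.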
Metric: Recall that within LDA,