Tuesday, May 03, 2016

Topic Modeling with LDA

Rob McDaniel gave a nice presentation on the flaming-hot topic of topic analysis yesterday evening, hosted by Seattle metastartup Pitchbook. Grab the slides and code from the GitHub repo.
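Rob's repo has the real code. For readers new to LDA, here is a rough idea of what fitting a topic model involves: a minimal collapsed Gibbs sampler in plain Python. This is not Rob's implementation. The function name `lda_gibbs`, the toy documents, and the default `alpha`, `beta`, and `iterations` values are all illustrative assumptions. In practice you would reach for a library such as gensim or scikit-learn.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical minimal LDA via collapsed Gibbs sampling.
    docs: list of token lists. Returns (vocab, theta, phi) where
    theta[d][k] = P(topic k | doc d) and phi[k][v] = P(word v | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: doc-topic, topic-word, and total words per topic.
    doc_topic = [[0] * K for _ in docs]
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K

    # Start with a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return vocab, theta, phi

# Toy usage: print the three most probable words in each topic.
toy_docs = [
    ["tax", "jobs", "economy", "tax"],
    ["border", "wall", "immigration", "border"],
    ["economy", "jobs", "trade"],
    ["immigration", "visa", "border"],
]
vocab, theta, phi = lda_gibbs(toy_docs, num_topics=2, iterations=100)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(k, [vocab[v] for v in top])
```

Each document ends up as a mixture over topics (`theta`), and each topic is a distribution over words (`phi`); those two outputs are what measures like topic cohesion and KL divergence operate on.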

Rob is interested in using NLP to discern the level of objectivity or bias in text. As an example, he took the transcripts of this year's presidential campaign debates and analyzed them.

For more, have a look at the post on Semantic analysis of GOP debates.

Interesting tidbits:

  • Wikipedia is a source of documents labeled as not objective.
  • Movie reviews are a source of documents labeled by rating (number of stars).
  • Topic cohesion measures how well a given document stays "on-topic" or even "on-message".
  • KL divergence is an entropy-based measure of how much two probability distributions, such as two topics' word distributions, differ; lower values mean more closely related topics.
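To make the last two tidbits concrete, here is a small sketch in plain Python. `kl_divergence` is the standard discrete KL divergence (in bits). The talk did not give a formula for topic cohesion, so `topic_cohesion` below is a hypothetical stand-in of my own: one minus the normalized entropy of a document's topic mixture, which is 1.0 when a document sticks to a single topic and 0.0 when it spreads evenly across all of them.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions over the same outcomes.
    Assumes q[i] > 0 wherever p[i] > 0 (smoothed LDA distributions satisfy this)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def topic_cohesion(theta_doc):
    """Hypothetical cohesion score: 1 - entropy / max possible entropy.
    theta_doc is one document's topic mixture (needs at least 2 topics)."""
    k = len(theta_doc)
    if k < 2:
        raise ValueError("need at least two topics")
    return 1.0 - entropy(theta_doc) / math.log2(k)

# Two made-up topic-word distributions over the same three words.
topic_a = [0.5, 0.25, 0.25]
topic_b = [0.25, 0.5, 0.25]
print(kl_divergence(topic_a, topic_b))  # 0.25

# KL divergence is not symmetric, so it is a divergence rather than a distance.
p, q = [0.5, 0.5], [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))

print(topic_cohesion([1.0, 0.0, 0.0]))  # 1.0 (entirely on one topic)
```

Because KL(p || q) and KL(q || p) generally differ, some people use a symmetrized variant (or the Jensen-Shannon divergence) when comparing topics.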

There was an interesting side discussion of the orthogonality of topic modeling and word embedding (word2vec).

Some of the sources Rob mentioned were Tethne and one of its tutorials, as well as a pair of papers by David Blei: Introduction to Probabilistic Topic Models and Probabilistic Topic Models.