This repo contains code to apply topic modeling to the Super User forum using the gensim package.
The ipython notebook I used for my analysis can be found here
An interactive view of the project can be found here
Slides used when I presented to my class
- Download dataset
The dump was downloaded from archive.org.
Save the downloaded folder to the 'data' folder. The data folder should contain a file Posts.xml.
- Build sqlitedb
The XML file is huge and not easily accessible. In its current format, it is hard to get an entire page (a question and its answers).
The script analysis/makedb.py builds a SQLite database from it:
python analysis/makedb.py data/Posts.xml data/data.sqlite
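To illustrate why a database helps, here is a minimal sketch of the idea. It is not the contents of analysis/makedb.py. It assumes each post is a `row` element with the attributes `Id`, `PostTypeId`, `ParentId`, `Title` and `Body`, as in Stack Exchange dumps. The table name `posts`, its columns, and the functions `build_db` and `fetch_page` are hypothetical names chosen for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db(xml_path, db_path):
    """Stream rows from a Posts.xml-style file into a SQLite table.

    Hypothetical sketch: the schema below is an assumption, not the
    one used by analysis/makedb.py.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, post_type INTEGER, "
        "parent_id INTEGER, title TEXT, body TEXT)"
    )
    # iterparse reads the file incrementally, so the whole dump does
    # not have to be parsed into memory at once.
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            parent = elem.get("ParentId")
            conn.execute(
                "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)",
                (
                    int(elem.get("Id")),
                    int(elem.get("PostTypeId")),
                    int(parent) if parent else None,  # only answers have a parent
                    elem.get("Title"),
                    elem.get("Body"),
                ),
            )
            elem.clear()  # drop the attributes we no longer need
    conn.commit()
    return conn


def fetch_page(conn, question_id):
    """Return the question row followed by its answer rows."""
    return conn.execute(
        "SELECT id, post_type, title, body FROM posts "
        "WHERE id = ? OR parent_id = ? ORDER BY post_type, id",
        (question_id, question_id),
    ).fetchall()
```

With such a table, an entire page becomes a single query, for example `fetch_page(build_db("data/Posts.xml", "data/data.sqlite"), 1)`, where the question id `1` is only a placeholder.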
- Analyze the results
ipython notebook
navigate to analysis/superuser.ipynb
- bower and npm must be installed
- Install bower dependencies
bower install
- Install node dependencies
npm install
- Run webapp
python runserver.py