Come back after SxSW for a more detailed portfolio and navigation.
Deliverables
It is not uncommon to be asked to extract the main themes from a participant data set. Because these requests are hard to predict and time-consuming to fulfill, I built tools that automatically identify topics in a timely manner. This machine-learning algorithm is one of them.
I selected topic modeling for this tool. The initial tests of the model failed to generate meaningful categories. To increase the likelihood of obtaining context-specific, relevant words, I created a custom list of stopwords to replace the default list from R. This was an iterative process.
You can find a more detailed account of these tests here.
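The custom-stopword step described above can be illustrated with a minimal sketch. The tool itself was written in R; the example below uses Python purely for illustration, and the word lists, function names, and sample responses are all hypothetical, not the ones used in the project. It shows how adding context-specific stopwords to a default list changes which terms surface before any topic model is fitted.

```python
import re
from collections import Counter

# Hypothetical stopword lists: the real default and custom lists are not
# given in the text, so these are illustrative stand-ins.
DEFAULT_STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "in", "it", "i"}
CUSTOM_STOPWORDS = {"session", "event", "really"}  # frequent but uninformative in this context


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def top_terms(responses, stopwords, n=5):
    """Count the remaining terms across all responses and return the n most frequent."""
    counts = Counter()
    for response in responses:
        counts.update(remove_stopwords(tokenize(response), stopwords))
    return counts.most_common(n)


# Made-up participant responses, for illustration only.
responses = [
    "The session was really about pricing and the pricing tiers",
    "I liked the event, but onboarding was confusing",
    "Onboarding took too long in the session",
]

stopwords = DEFAULT_STOPWORDS | CUSTOM_STOPWORDS
print(top_terms(responses, stopwords, n=2))
```

Refining the custom list is what makes the process iterative: each run exposes new high-frequency words that carry no thematic meaning, and those are added to the list before the next run.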
In creating this alternate code, my goal was to evaluate this ML algorithm and see if it could outperform the rule-based, automated classification system I had previously designed. The rule-based system remained more reliable.
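For readers unfamiliar with rule-based classification, here is a minimal sketch of the general idea, assuming a simple keyword-to-theme mapping. It is written in Python for illustration and is not the system described above: the rules, theme names, and `label_themes` function are hypothetical.

```python
import re

# Hypothetical rules: each theme is triggered by any of its keywords.
THEME_RULES = {
    "pricing": {"price", "pricing", "cost", "expensive"},
    "onboarding": {"onboarding", "setup", "signup"},
}


def label_themes(text, rules=THEME_RULES):
    """Return the sorted list of themes whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(theme for theme, keywords in rules.items() if tokens & keywords)


print(label_themes("The setup was easy but the cost is too high"))
print(label_themes("Great speakers"))
```

A system of this kind is predictable because every label can be traced back to an explicit rule, which is one plausible reason a rule-based approach can stay more reliable than a topic model on a small, domain-specific data set.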
Wanting to showcase the results to clients or to our executives during presentations, I created a client-facing application that displays the results from both codes. I used data visualization to summarize the findings.
We now have three ways of labeling themes in our datasets. Manual labeling remains the norm for weekly operations. Annual reporting benefits from the rule-based and machine-learning labeling, to help with the large data volume.
Researcher
Analyst
Programmer (R)
Project Manager