Automated analysis of participant feedback

Deliverables


Machine-Learning as a Discovery Tool

Requests to extract the main themes from a participant data set are common. Because these requests are hard to predict and time-consuming to fulfill, I built tools that identify topics automatically and quickly. This machine-learning algorithm is one of them.

Machine learning as a discovery tool: could I use ML to generate alternative groupings, as a means of enriching our theme identification?

Topic Modeling

I selected topic modeling for this tool. The model's initial tests failed to generate meaningful categories. To increase the likelihood of obtaining context-specific, relevant words, I created a custom list of stopwords to replace the default list from R. This was an iterative process.

You can find a more detailed account of these tests here.
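
The custom stopword step can be illustrated with a minimal sketch. It is written in Python for illustration only; the original tool was built in R, and the stopword list, function names, and sample comments below are all hypothetical. The idea is to inspect the most frequent terms, move the uninformative ones into the stopword list, and repeat until the remaining terms are specific to the context.

```python
import re
from collections import Counter

# Hypothetical custom stopword list: generic words plus domain words that
# appear in almost every comment and therefore carry no topical signal.
CUSTOM_STOPWORDS = {
    "the", "a", "an", "and", "was", "is", "to", "of", "i", "it", "very",
    "session", "workshop", "program",
}


def tokenize(text, stopwords=CUSTOM_STOPWORDS):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stopwords]


def top_terms(comments, n=5, stopwords=CUSTOM_STOPWORDS):
    """Count the remaining terms across all comments and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment, stopwords))
    return counts.most_common(n)


# Made-up feedback comments, used only to exercise the functions.
comments = [
    "The workshop was very helpful and the facilitator was friendly",
    "Parking was difficult and the room was cold",
    "I found the facilitator helpful",
]

if __name__ == "__main__":
    # Review this output, add any uninformative terms to CUSTOM_STOPWORDS, rerun.
    print(top_terms(comments, n=5))
```

Each pass of this loop corresponds to one iteration of refining the list before the cleaned text is handed to the topic model.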


Evaluating the Model

My goal in writing this alternative code was to evaluate the ML algorithm and see whether it could outperform the rule-based, automated classification system I had previously designed. The rule-based system remained more reliable.
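
One simple way to compare two labeling systems is to measure how often each agrees with manual labels on the same comments. The sketch below assumes that setup; this page does not state how reliability was actually measured, so the metric, the function name, and the toy labels are all hypothetical and are not the project's data or results. It is written in Python for illustration, while the original code was in R.

```python
def agreement_rate(predicted, reference):
    """Return the share of items where the predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one labeled item is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy labels, invented for this example only.
manual = ["staff", "venue", "content", "staff", "venue"]
rule_based = ["staff", "venue", "content", "staff", "content"]
topic_model = ["staff", "content", "content", "venue", "content"]

if __name__ == "__main__":
    print("rule-based agreement:", agreement_rate(rule_based, manual))
    print("topic-model agreement:", agreement_rate(topic_model, manual))
```

Plain agreement is easy to explain to stakeholders, but it ignores agreement that occurs by chance; a chance-corrected measure would be a reasonable alternative.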


Client-Facing App

To showcase the results to clients or to our executives during presentations, I created a client-facing application that displays the results from both the rule-based and the machine-learning code. I used data visualization to summarize the findings.


New Workflow Definition

We now have three ways of labeling themes in our datasets: manual, rule-based, and machine-learning. Manual labeling remains the norm for weekly operations. Annual reporting relies on the rule-based and machine-learning labeling to handle the large data volume.


GitHub Project (Public)


Repository (Private)


My Role

  • Researcher

  • Analyst

  • Programmer (R)

  • Project Manager