Automated analysis of participant feedback

Deliverables


Understanding Participant Needs

Every year, my team supplies data for an annual report sent to clients. The report also includes highlights from any research I have conducted with participants sponsored by the client's organization, showing the impact of the program and how we might further support those participants.


 
 

Research Design Choices

Unlike in interview-based research, I was working with data that had already been collected through a survey. The large quantity of data meant I needed a way to classify and analyze themes at an aggregate level. While weighing an inductive approach, in which themes emerge from the responses rather than being defined in advance, I also had to be mindful of its potential limitations.


Creating a Custom Dictionary

Another decision point was whether to purchase a commercial dictionary. In our case, the responses often contained spelling mistakes, wording borrowed from other languages, and highly specialized healthcare vocabulary. In addition, several terms were used for their connotation rather than their denotation, that is, their literal dictionary meaning. For these reasons, I opted to create a custom dictionary.


Clustering Vocabulary

The most labor-intensive part of dictionary design was clustering vocabulary terms. I transferred the list of tokenized, stemmed terms into Mural and classified them manually, looking up each word in context. The result is a user-friendly board that lists all terms classified under each theme, which other team members can now consult when requesting further research on the data set.
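The tokenize, stem, and classify steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's R code: the suffix-stripping stemmer is a crude stand-in for a real stemmer, and the themes, terms, and function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a response and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(token):
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical custom dictionary: theme -> stemmed terms assigned to it by hand.
THEME_DICTIONARY = {
    "scheduling": {"schedul", "shift"},
    "training": {"train", "mentor", "cours"},
}

def theme_counts(responses, dictionary=THEME_DICTIONARY):
    """Count how many responses mention at least one term from each theme."""
    counts = Counter()
    for response in responses:
        stems = {naive_stem(token) for token in tokenize(response)}
        for theme, terms in dictionary.items():
            if stems & terms:
                counts[theme] += 1
    return dict(counts)

responses = [
    "Scheduling shifts was hard",
    "The training courses helped",
    "More mentoring and better schedules",
]
print(theme_counts(responses))
```

Counting each response at most once per theme keeps a single long answer from dominating the aggregate picture; counting every term occurrence instead would be an equally reasonable choice, depending on the question being asked.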


Trade-Offs and Reproducibility

I have reused this code and method for other custom dictionaries I have created for our organization. Reuse reduces the time spent building each dictionary, but the initial manual classification still takes time. One way to shorten it is to add rules, such as examining only words that occur more than once. For each project, there is a trade-off between how quickly I can produce the new dictionary and how accurate it is.
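As an illustration of the "more than once" rule, the sketch below keeps only the stemmed terms that recur before they are passed on for manual classification. It is a Python sketch under assumptions: the function name, the `min_count` parameter, and the sample terms are hypothetical.

```python
from collections import Counter

def terms_to_review(stemmed_terms, min_count=2):
    """Return the distinct terms that occur at least `min_count` times, sorted."""
    counts = Counter(stemmed_terms)
    return sorted(term for term, n in counts.items() if n >= min_count)

# Hypothetical stemmed terms pulled from survey responses.
sample_terms = ["shift", "shift", "train", "nurs", "nurs", "nurs"]
print(terms_to_review(sample_terms))
```

Raising `min_count` shrinks the list to classify by hand, at the cost of dropping rare terms that may still carry meaning, which is the speed versus accuracy trade-off in miniature.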


GitHub Project (Public)


Repository (Private)


My Role

  • Researcher

  • Analyst

  • Programmer (R)

  • Project Manager