Health Conditions: Reconciling Structured and Unstructured Data Categorization

Deliverables


Imposing Deductive Logic

I am often asked for reports containing demographic data on a client's participants. Medical Conditions was a particularly time-consuming field to analyze, since it contains both structured and unstructured data. I used a dictionary-based approach, similar to the one in this project, but this time imposing a deductive logic.

In this case, the classification relied on existing categories, represented by checkboxes on the form. Some participants entered the medical condition manually instead of checking the box. Others entered a synonym for the condition, such as a popular term or a scientific name, while others described symptoms. While analyzing the data, I mapped these instances to the established categories.
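
A minimal sketch of this kind of dictionary-based mapping in R might look like the following. The category names, the patterns, the sample entries, and the function name classify_entry are all hypothetical illustrations, not the actual dictionary or code from the project, and the condition-to-symptom pairings are placeholders rather than clinical guidance. The sketch assumes simple case-insensitive substring matching; entries that match nothing are returned as NA so they can be reviewed by hand.

```r
# Hypothetical dictionary: each established category (a checkbox label on the
# form) maps to lowercase patterns that might appear in free-text entries.
condition_dictionary <- list(
  "Hypertension" = c("hypertension", "high blood pressure"),
  "Diabetes"     = c("diabetes", "high blood sugar"),
  "Asthma"       = c("asthma", "wheezing")
)

# Return the first category whose patterns appear in the entry, or NA when
# nothing matches so the entry can be set aside for manual review.
classify_entry <- function(entry, dictionary) {
  text <- tolower(trimws(entry))
  for (category in names(dictionary)) {
    matched <- vapply(
      dictionary[[category]],
      function(pattern) grepl(pattern, text, fixed = TRUE),
      logical(1)
    )
    if (any(matched)) {
      return(category)
    }
  }
  NA_character_
}

# Hypothetical free-text entries: a popular term, a more specific name,
# a described symptom, and an entry the dictionary does not cover.
entries <- c("High Blood Pressure", "type 2 diabetes",
             "wheezing at night", "sore knee")

categories <- vapply(entries, classify_entry, character(1),
                     dictionary = condition_dictionary, USE.NAMES = FALSE)

results <- data.frame(entry = entries, category = categories)
print(results)
```

Because the categories are fixed in advance, the dictionary only grows by adding patterns under existing categories, which is what makes the logic deductive rather than inductive.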


Challenges from this Deductive Approach

Navigating unstructured data with a deductive logic meant running into issues I had not previously encountered with an inductive, grounded-theory-based approach.

Working with an SME

To help group conditions accurately, I worked with a colleague who was knowledgeable about medical conditions. Acting as a subject matter expert, she helped resolve some of the uncommon and harder-to-classify entries.


Low Maintenance Cost

Once built, this dictionary requires little to no further iteration or maintenance. Unlike the other custom dictionaries, its content is unlikely to change even as we collect more surveys.


GitHub Project (Public)


Repository (Private)


My Role

  • Researcher

  • Analyst

  • Programmer (R)

  • Project Manager