Cultivating a networked AI community @Penn.

AI Hacks 2020 Recap


About the Hackathon

AI Hacks 2020, the University of Pennsylvania’s premier student-run datathon, was held from November 30 to December 7, 2020, and hosted over 160 participants from over 30 universities across the world. Teams analyzed datasets of hundreds of thousands of real customer interactions to develop new product features for NeuroFlow, an award-winning behavioral health platform.




About NeuroFlow

NeuroFlow is a healthcare technology and analytics company enabling behavioral health access and engagement across the continuum of care.

Combining validated techniques, data science, and behavioral economics, NeuroFlow helps leading insurance, healthcare, and government organizations deliver personalized, evidence-based behavioral health solutions.


The Challenge

NeuroFlow’s behavioral health platform provides a wide range of videos to promote mindfulness and self-care to its patients. Teams worked to develop a patient video recommender system to optimize the user experience on the platform.

At the start of the challenge period, teams received access to approximately 140,000 data points on patient interactions with videos on NeuroFlow’s platform. The dataset included demographic information about the users and key metadata about the videos.

After the conclusion of the challenge period, finalist teams were invited to present their solutions to NeuroFlow representatives and researchers at the Wharton Customer Analytics Initiative.


The Winning Solution

The winning team built an end-to-end pipeline covering data processing, feature engineering, dimensionality reduction, and neural network prediction.

To categorize videos that were missing topic labels, the team analyzed notes about the videos using Multinomial Naive Bayes and transcripts of the videos using Latent Dirichlet Allocation.
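As a rough illustration of the notes half of this step, the sketch below fits a small Multinomial Naive Bayes text classifier using only the Python standard library. It is not the team's code: the example notes, the topic names, the whitespace tokenizer, and the smoothing value are all assumptions made for demonstration, and the Latent Dirichlet Allocation half is not shown.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized text."""
    vocab = set()
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter(labels)        # topic -> number of training notes
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"vocab": vocab, "word_counts": word_counts,
            "doc_counts": doc_counts, "alpha": alpha}


def topic_log_scores(model, text):
    """Return the unnormalized log posterior of each topic for one note."""
    alpha = model["alpha"]
    n_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for topic, n_topic_docs in model["doc_counts"].items():
        counts = model["word_counts"][topic]
        total = sum(counts.values())
        score = math.log(n_topic_docs / n_docs)  # log prior
        for token in text.lower().split():
            if token in model["vocab"]:  # ignore words never seen in training
                # Laplace-smoothed log likelihood of the word given the topic
                score += math.log((counts[token] + alpha)
                                  / (total + alpha * vocab_size))
        scores[topic] = score
    return scores


def predict_topic(model, text):
    """Pick the topic with the highest log posterior."""
    scores = topic_log_scores(model, text)
    return max(scores, key=scores.get)


# Hypothetical labeled notes (invented for this sketch, not NeuroFlow data).
notes = [
    "guided breathing to calm anxiety",
    "breathing exercise for stress relief",
    "sleep hygiene tips for better rest",
    "bedtime routine to improve sleep",
]
topics = ["mindfulness", "mindfulness", "sleep", "sleep"]

model = train_multinomial_nb(notes, topics)
print(predict_topic(model, "breathing for stress"))
print(predict_topic(model, "bedtime sleep routine"))
```

Because `topic_log_scores` returns a score per topic rather than a single label, a threshold on those scores is one possible way to assign a video to more than one category.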

This dual categorization method proved highly accurate and allowed videos to be assigned to multiple categories when applicable. The model then applied PCA dimensionality reduction to both the video space and the user space, keeping only the components that capture the most variance and reducing the risk of overfitting to spurious variables.
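The PCA step can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the user feature matrix is synthetic, and the feature count and number of retained components are arbitrary choices.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means, top principal directions, and variance ratios."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components]


def pca_transform(X, mean, components):
    """Project rows of X onto the retained principal directions."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)

# Hypothetical user features: 200 users, 6 observed features that are
# driven by 2 latent factors plus a small amount of noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
users = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

mean, components, explained = pca_fit(users, n_components=2)
users_reduced = pca_transform(users, mean, components)

print(users_reduced.shape)
print(explained.sum())  # share of total variance kept by the 2 components
```

The same two functions could be applied to a video feature matrix to obtain the reduced video space.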

To make a video recommendation, the algorithm takes a user's representation in the PCA-reduced user space, trains a neural network on that user and their earlier watch history, and uses a softmax classifier to predict the most recent video watched.
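A minimal sketch of this kind of predictor is shown below, assuming a single hidden layer trained with plain gradient descent in NumPy. The architecture, layer sizes, learning rate, and synthetic data are all assumptions chosen for illustration; the source does not specify the team's network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_user_dims, n_videos, n_hidden = 300, 2, 4, 8

# Synthetic stand-ins: a reduced user vector plus a multi-hot vector of
# earlier watches; the label is the index of the most recent video watched.
user_vecs = rng.normal(size=(n_users, n_user_dims))
history = (rng.random(size=(n_users, n_videos)) < 0.5).astype(float)
X = np.hstack([user_vecs, history])
true_w = rng.normal(size=(X.shape[1], n_videos))
y = np.argmax(X @ true_w, axis=1)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(X, params):
    """One tanh hidden layer followed by a softmax over videos."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return h, softmax(h @ W2 + b2)


def cross_entropy(probs, y):
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


params = [0.1 * rng.normal(size=(X.shape[1], n_hidden)), np.zeros(n_hidden),
          0.1 * rng.normal(size=(n_hidden, n_videos)), np.zeros(n_videos)]

lr, losses = 0.3, []
for _ in range(300):
    W1, b1, W2, b2 = params
    h, probs = forward(X, params)
    losses.append(cross_entropy(probs, y))
    # Gradient of mean cross-entropy w.r.t. the logits: probs - one_hot(y).
    d_logits = probs.copy()
    d_logits[np.arange(n_users), y] -= 1.0
    d_logits /= n_users
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    d_h = (d_logits @ W2.T) * (1.0 - h**2)  # backprop through tanh
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    params = [W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2]

_, probs = forward(X, params)
recommended = np.argmax(probs, axis=1)  # highest-probability video per user
print(losses[0], losses[-1])
```

Each row of `probs` is a probability distribution over videos, so a ranked list of recommendations can be read off by sorting a row rather than taking only the top entry.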

See the full presentation here to learn more about the winning model.


