Alexander
Levin-Koopman


As a lecturer in Applied Data Science at the University of Michigan School of Information, I'm passionate about solving problems through insights extracted from complex data sets. With a Master's degree in Applied Data Science (4.0 GPA) from the University of Michigan and a Bachelor's degree in Mathematics from the University of Colorado at Boulder, I've developed a strong foundation in statistical analysis, machine learning, and data visualization.
My academic background has equipped me with a unique ability to approach problems from multiple angles, think creatively, and communicate complex ideas effectively. While my experience is rooted in academia, I'm eager to apply my skills in a real-world setting and drive business outcomes through data-driven decision making.
I thrive on solving intricate problems and uncovering hidden patterns in data. My goal is to leverage my expertise to inform strategic decisions, optimize processes, and create value for organizations. If you're looking for a driven and analytical problem-solver who is passionate about data science, let's connect!

View My LinkedIn Profile
View My GitHub Profile


Predicting the Popularity of YouTube Cooking Videos

Project description: We restricted the project to cooking videos to avoid highly polarizing topics, making it more likely that a video's popularity reflects its content rather than its subject. We gathered the data with the YouTube API and youtube-dl, used a CNN to transform each thumbnail into a numeric feature, and used PCA to reduce the vectorized text to informative numeric features. Finally, we developed a popularity metric: a PCA transform of a video's likes, favorites, and views. One of the key difficulties was splitting the datasets so that no information leaked between them. To prevent leakage, the CNN was trained on a separate dataset that was not used to train the final model.
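The popularity metric described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the three counts are standardized before PCA and that the first principal component is kept as the score. The function name `popularity_score` and the sample numbers are hypothetical.

```python
import numpy as np


def popularity_score(likes, favorites, views):
    """Collapse three engagement counts into one score per video.

    Assumption: the score is the projection onto the first principal
    component of the standardized counts.
    """
    X = np.column_stack([likes, favorites, views]).astype(float)

    # Standardize each column so that views (the largest counts) do not
    # dominate the component. This step is an assumption of the sketch.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std

    # The rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = vt[0]

    # The sign of a principal axis is arbitrary; orient it so that
    # larger counts give a larger score.
    if pc1.sum() < 0:
        pc1 = -pc1

    return Z @ pc1


# Hypothetical counts for four videos, ordered from least to most popular.
likes = [10, 100, 1_000, 10_000]
favorites = [1, 5, 20, 80]
views = [1_000, 10_000, 100_000, 1_000_000]
scores = popularity_score(likes, favorites, views)
```

Because all three counts rise together in this toy data, the resulting scores increase from the first video to the last. A single score like this gives the regression models one target instead of three correlated ones.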


Figure 1: Training Results of the CNN

Training and validation mean squared error of the CNN during training on the video thumbnails.

CNN Model Definition | CNN Model Training | CNN Model Testing


Figure 2: Feature Importance

Many of the features in this plot are engineered from the existing data, and the full description of each feature can be seen in Table B1 of Appendix B in our report.


Figure 3: Linear Regression Base Model

Predictions of the baseline linear regression model on the test dataset. Two outliers severely skew the results.


Figure 4: Ensemble Stacking Regressor

On the test dataset, the stacking regressor performed significantly better than the base linear regression model.
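A comparison between a baseline linear regression and a stacking regressor can be sketched with scikit-learn's `StackingRegressor`. The base learners, the meta-learner, and the synthetic data below are assumptions chosen for illustration; the estimators actually used in the project are not listed on this page.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered video features (hypothetical data).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
# Add a nonlinear term so a purely linear model cannot capture everything.
y = y + 20.0 * np.sin(X[:, 0]) * X[:, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: ordinary least squares.
baseline = LinearRegression().fit(X_train, y_train)

# Stack: out-of-fold predictions of the base learners (cv=5) become the
# inputs of the final estimator.
stack = StackingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
).fit(X_train, y_train)

mse_baseline = mean_squared_error(y_test, baseline.predict(X_test))
mse_stack = mean_squared_error(y_test, stack.predict(X_test))
print(f"baseline MSE: {mse_baseline:.1f}  stacking MSE: {mse_stack:.1f}")
```

Both models are scored on the same held-out split, so the two error values are directly comparable. Whether the stack wins depends on the data, and the synthetic example makes no claim about the project's actual results.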


Final Modeling Training and Analysis | Final Dataset Construction | Full Report