Portfolio

Machine Learning and Data Science

Pure Python ML

This is a collection of notebooks, designed for educational purposes, that implement machine learning algorithms in pure Python. They are not meant for production use; the goal is a deeper understanding of the algorithms behind the models we use every day. This is an ongoing project and will be updated regularly with more model types and families.
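As a flavor of what "pure Python" means here, the following is a minimal sketch of simple linear regression fit by batch gradient descent, using only built-in types. It is an illustration written for this page, not code taken from the notebooks; the function name `fit_linear_regression`, the learning rate, and the toy data are all assumptions.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 4), round(b, 4))
```

Writing the loop out by hand makes the gradient computation visible, which is the point of the exercise; a library such as NumPy or scikit-learn would be the practical choice otherwise.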

YouTube Video Popularity Prediction

Predict the popularity of a YouTube cooking video based on its thumbnail, subtitles, bit rate, duration, and other metadata associated with the video. We used a CNN to transform the thumbnails into a usable predictive feature for a stacking regressor model. This project was done in collaboration with teammates Corbin Callahan and Jeffrey Olson.
Full Report | GitHub Repository

Twitter Bot Detection

Classify whether a Twitter user is a bot account. We accomplished this by combining a neural net with an ensemble stacking classifier. This project was done in collaboration with teammates Rania El Shenety and Jeffrey Olson.
Full Report | GitHub Repository

Topic Analysis of Yelp Reviews

An examination of the Yelp dataset through topic analysis and Word2Vec vector embeddings. Using Python, spaCy, and gensim, I built a preprocessing pipeline and trained a Latent Dirichlet Allocation (LDA) model and a Word2Vec model. Every step uses generators to stream the documents from disk, so the full corpus never has to be held in memory.
GitHub Repository | Dataset
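The generator-based streaming described above can be sketched with the standard library alone. This is a simplified illustration, not the project's pipeline: it assumes a JSON-lines file with a `text` field per review, and the helper names `stream_reviews` and `tokenize` are hypothetical. The real project used spaCy for preprocessing rather than a whitespace split.

```python
import json
import os
import tempfile
from collections import Counter


def stream_reviews(path):
    """Yield one review text at a time from a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)["text"]


def tokenize(texts):
    """Lazily turn each text into a list of lowercase alphabetic tokens."""
    for text in texts:
        yield [tok for tok in text.lower().split() if tok.isalpha()]


# Build a tiny stand-in file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reviews.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps({"text": "Great tacos and great service"}) + "\n")
        handle.write(json.dumps({"text": "Slow service"}) + "\n")

    # Only one document is in memory at any point in this chain.
    counts = Counter(
        tok for tokens in tokenize(stream_reviews(path)) for tok in tokens
    )

print(counts.most_common(2))
```

Because each stage is a generator, chaining them adds no intermediate lists; a model that accepts an iterable of token lists can consume the stream directly.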

KMeans with Automatic Optimal Clustering

A custom KMeans estimator for scikit-learn that finds the optimal number of clusters based on the Calinski-Harabasz score and the Davies-Bouldin score.
GitHub Repository
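The idea behind the estimator can be sketched as a plain function that fits KMeans for several candidate cluster counts and scores each fit. This is not the custom estimator itself: the function name `score_cluster_counts`, the candidate range, and the synthetic data are assumptions, and how the two scores are combined in the actual project is not shown here. A higher Calinski-Harabasz score is better, while a lower Davies-Bouldin score is better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score


def score_cluster_counts(X, k_values):
    """Return {k: (calinski_harabasz, davies_bouldin)} for each candidate k."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        results[k] = (
            calinski_harabasz_score(X, labels),
            davies_bouldin_score(X, labels),
        )
    return results


# Three well-separated synthetic blobs (hypothetical data for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

results = score_cluster_counts(X, range(2, 7))
best_by_ch = max(results, key=lambda k: results[k][0])  # maximize CH
best_by_db = min(results, key=lambda k: results[k][1])  # minimize DB
print(best_by_ch, best_by_db)
```

On cleanly separated data the two criteria tend to agree; on messier data they can disagree, which is one reason to report both.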

Testing the AdaBoost Algorithm with Different Base Estimators

Explores the AdaBoost algorithm using different base estimators.
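A comparison of this kind might look like the sketch below, which uses scikit-learn's `AdaBoostClassifier` with three base estimators on synthetic data. The choice of estimators, the dataset, and the cross-validation setup are assumptions for illustration, not the ones used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (hypothetical, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base_estimators = {
    "decision stump": DecisionTreeClassifier(max_depth=1),
    "depth-3 tree": DecisionTreeClassifier(max_depth=3),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, base in base_estimators.items():
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn releases.
    model = AdaBoostClassifier(base, n_estimators=50, random_state=0)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Any base estimator used this way needs to accept `sample_weight` in `fit`, since AdaBoost reweights the training samples at each boosting round.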

Data Analysis

Vermont Lake Health Analysis

An examination of lake health in Vermont in relation to human activity, as measured through decades of chemical tests on more than 400 lakes. This project was done in collaboration with teammates Anze Zorn and Jeffrey Olson.
Full Report | GitHub Repository

Database Design

Relational Database of the Yelp Dataset

Creation of a SQLite database containing the Yelp dataset using Python and SQLAlchemy. I used this database for a quick inspection and a network analysis of the user relationships it contains.
GitHub Repository | Dataset
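User relationships of this kind can be modeled in SQLAlchemy as a self-referential many-to-many relationship. The sketch below is an assumption about how such a schema might look, not the project's actual schema: the table names, column names, and sample users are hypothetical, and it uses an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Association table linking a user to each of their friends.
friendship = Table(
    "friendships",
    Base.metadata,
    Column("user_id", String, ForeignKey("users.id"), primary_key=True),
    Column("friend_id", String, ForeignKey("users.id"), primary_key=True),
)


class User(Base):
    __tablename__ = "users"

    id = Column(String, primary_key=True)
    name = Column(String)
    # Self-referential many-to-many: users related to other users.
    friends = relationship(
        "User",
        secondary=friendship,
        primaryjoin=id == friendship.c.user_id,
        secondaryjoin=id == friendship.c.friend_id,
    )


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    ann = User(id="u1", name="Ann")
    bo = User(id="u2", name="Bo")
    ann.friends.append(bo)
    session.add_all([ann, bo])
    session.commit()

    # Read the friend list back through the relationship.
    friend_names = [friend.name for friend in session.get(User, "u1").friends]

print(friend_names)
```

The rows of the association table form an edge list, which is a convenient starting point for network analysis.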

Tutorials

Seaborn Jointplot Tutorial

A tutorial on creating and customizing plots with the jointplot function and the JointGrid class in the seaborn plotting library.

Other Projects

Home Lab Build From Recycled Hardware

A step-by-step walkthrough of a home lab build from recycled hardware. I took five computers past their end of life and turned them into a development desktop, a NAS-like storage server with an NFS share, and a Proxmox Virtual Environment cluster.