As a lecturer in Applied Data Science at the University of Michigan School of Information, I'm passionate about solving problems through insights extracted from complex data sets. With a Master's degree in Applied Data Science (4.0 GPA) from the University of Michigan and a Bachelor's degree in Mathematics from the University of Colorado at Boulder, I've developed a strong foundation in statistical analysis, machine learning, and data visualization.
My academic background has taught me to approach problems from multiple angles, think creatively, and communicate complex ideas effectively. While my experience is rooted in academia, I'm eager to apply my skills in a real-world setting and drive business outcomes through data-driven decision-making.
I thrive on solving intricate problems and uncovering hidden patterns in data. My goal is to leverage my expertise to inform strategic decisions, optimize processes, and create value for organizations. If you're looking for a driven and analytical problem-solver who is passionate about data science, let's connect!
This is a collection of notebooks designed for educational purposes, implementing machine learning algorithms in pure Python. They are not meant for production; the aim is to provide a deeper understanding of the algorithms behind the models we use every day. This is an ongoing project and will be updated continually with more model types and families.
Predict the popularity of a YouTube cooking video based on the thumbnail, subtitles, bit rate, duration, and other metadata associated with the video. We used a CNN to transform the thumbnails into a usable predictive feature for a stacking regressor model. This project was done in collaboration with teammates Corbin Callahan and Jeffrey Olson.
Full Report | GitHub Repository
Classify whether a Twitter user is a bot account. We accomplished this by combining a neural net with an ensemble stacking classifier. This project was done in collaboration with teammates Rania El Shenety and Jeffrey Olson.
Full Report | GitHub Repository
Examination of the Yelp dataset through topic analysis and Word2Vec embeddings. Using Python, spaCy, and gensim, I built a preprocessing pipeline and trained both a Latent Dirichlet Allocation (LDA) model and a Word2Vec model. Every step used generators to stream the documents from disk, which keeps memory use low because the full corpus is never loaded at once.
GitHub Repository | Dataset
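The streaming idea in the Yelp project above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes the reviews are stored as one JSON object per line with a `text` field, and it swaps spaCy for a simple regex tokenizer to stay dependency-free. The names `stream_texts`, `tokenize`, and `ReviewCorpus` are hypothetical.

```python
import json
import re
from typing import Iterable, Iterator, List

# Assumption: a crude tokenizer standing in for the spaCy preprocessing step.
TOKEN_RE = re.compile(r"[a-z']+")


def stream_texts(path: str, field: str = "text") -> Iterator[str]:
    """Yield one document at a time from a JSON-lines file (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)[field]


def tokenize(texts: Iterable[str]) -> Iterator[List[str]]:
    """Lazily turn each document into a list of lowercase tokens."""
    for text in texts:
        yield TOKEN_RE.findall(text.lower())


class ReviewCorpus:
    """Restartable iterable: every pass re-opens the file and streams it again.

    A bare generator is exhausted after one pass, so models that iterate
    over the corpus several times need a wrapper along these lines.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        return tokenize(stream_texts(self.path))
```

Because only one line is held in memory at a time, the same object could in principle be handed to a trainer that makes multiple passes over the data, at the cost of re-reading the file on each pass.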
Custom KMeans estimator in scikit-learn that finds the optimal number of clusters based on the Calinski-Harabasz score and the Davies-Bouldin score.
GitHub Repository
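As a rough illustration of the selection idea behind the custom KMeans estimator, the sketch below fits scikit-learn's `KMeans` for several cluster counts and reports which count each score prefers. It is not the custom estimator itself: the helper names `score_cluster_counts` and `best_k` are hypothetical, the synthetic data is an assumption, and how the estimator combines the two scores is not stated in the description, so they are reported separately here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score


def score_cluster_counts(X, k_values, random_state=0):
    """Fit KMeans for each k and record (Calinski-Harabasz, Davies-Bouldin)."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        results[k] = (
            calinski_harabasz_score(X, labels),  # higher is better
            davies_bouldin_score(X, labels),     # lower is better
        )
    return results


def best_k(results):
    """Return the k preferred by each score as (k_by_ch, k_by_db)."""
    k_by_ch = max(results, key=lambda k: results[k][0])
    k_by_db = min(results, key=lambda k: results[k][1])
    return k_by_ch, k_by_db


# Assumed toy data: three tight, well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=0.5,
    random_state=0,
)
results = score_cluster_counts(X, range(2, 7))
print(best_k(results))
```

Both scores need at least two clusters, so the search starts at k = 2. On real data the two criteria can disagree, which is one reason to inspect both rather than rely on either alone.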
Explores the AdaBoost algorithm using different base estimators.
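A comparison of this kind might look something like the sketch below, which boosts three different base estimators with scikit-learn's `AdaBoostClassifier` on synthetic data. The dataset, the choice of base estimators, and the helper name `compare_base_estimators` are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_base_estimators(random_state=0):
    """Return test accuracy of AdaBoost for a few assumed base estimators."""
    X, y = make_classification(
        n_samples=600, n_features=10, n_informative=5, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    base_estimators = {
        "stump": DecisionTreeClassifier(max_depth=1),
        "depth-3 tree": DecisionTreeClassifier(max_depth=3),
        "logistic regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, base in base_estimators.items():
        # The base estimator is passed positionally because its keyword name
        # differs between scikit-learn releases.
        model = AdaBoostClassifier(base, n_estimators=50, random_state=random_state)
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_base_estimators().items():
        print(f"{name}: {accuracy:.3f}")
```

Any base estimator used this way has to accept sample weights during fitting, since AdaBoost reweights the training examples after each round.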
Examination of lake health in Vermont in relation to human activity, as measured through decades of chemical tests on more than 400 lakes. This project was done in collaboration with teammates Anze Zorn and Jeffrey Olson.
Full Report | GitHub Repository
Creation of an SQLite database containing the Yelp dataset using Python and SQLAlchemy. I used this database for a quick inspection and network analysis of the user relationships it contains.
GitHub Repository | Dataset
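To give a feel for the database project above, here is a small sketch that stores users and their friend links in SQLite and counts each user's connections. It uses the standard-library `sqlite3` module rather than SQLAlchemy to stay dependency-free, and the table names, column names, and input layout (`user_id`, `name`, `friends`) are hypothetical rather than the project's actual schema.

```python
import sqlite3


def build_friend_db(users):
    """Load user records into an in-memory SQLite database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (user_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE friendship ("
        " user_id TEXT REFERENCES user(user_id),"
        " friend_id TEXT,"
        " PRIMARY KEY (user_id, friend_id))"
    )
    for user in users:
        conn.execute(
            "INSERT INTO user VALUES (?, ?)", (user["user_id"], user["name"])
        )
        # One row per directed edge; duplicate edges are ignored.
        conn.executemany(
            "INSERT OR IGNORE INTO friendship VALUES (?, ?)",
            [(user["user_id"], friend) for friend in user["friends"]],
        )
    conn.commit()
    return conn


def friend_counts(conn):
    """Return (name, number_of_friends) rows, most connected users first."""
    return conn.execute(
        "SELECT u.name, COUNT(f.friend_id) AS n"
        " FROM user u LEFT JOIN friendship f ON f.user_id = u.user_id"
        " GROUP BY u.user_id"
        " ORDER BY n DESC, u.name"
    ).fetchall()


sample_users = [
    {"user_id": "a", "name": "Ana", "friends": ["b", "c"]},
    {"user_id": "b", "name": "Ben", "friends": ["a"]},
    {"user_id": "c", "name": "Cy", "friends": []},
]
conn = build_friend_db(sample_users)
print(friend_counts(conn))
```

Keeping friendships in their own edge table means degree counts and similar network questions reduce to ordinary joins and aggregates.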
A tutorial on how to use and customize the jointplot function and the JointGrid class in the seaborn plotting library.
A step-by-step process for building a home lab from recycled hardware. I took five computers past their end of life and turned them into a development desktop, a NAS-like storage server with an NFS share, and a Proxmox Virtual Environment cluster.