A machine learning pipeline that predicts restaurant success by analyzing Yelp business data, customer reviews, and operational features.
Quick Facts
- Context: Term project for CS559 - Machine Learning: Fundamentals and Applications (Fall 2025)
- Tech Stack: Python, Logistic Regression, Random Forest, SVM, Neural Networks
- Links: GitHub Repo
Overview and Problem
This project develops a machine learning pipeline to predict restaurant success using the Yelp Academic Dataset. Working from real-world business data, it aims to identify the key factors that separate thriving establishments from struggling ones.
What I Built
- Engineered an end-to-end pipeline to process millions of reviews, tips, and business records from Yelp’s JSON data.
- Extracted features from nested attributes such as parking availability, ambience, operating hours, and customer engagement metrics.
- Defined restaurant success programmatically as having both a high rating (4 or more stars) and an above-median review count.
- Built a modular architecture covering metadata analysis, preprocessing, cleaning, visualization, and model training.
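The feature extraction and success label described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes each business record is one JSON object per line with `stars`, `review_count`, `attributes`, and `hours` fields, and that nested attributes such as `BusinessParking` and `Ambience` arrive as strings holding Python dict literals. The helper names (`read_json_lines`, `parse_nested`, `extract_features`, `label_success`) and the derived feature names are hypothetical, and the sample records are made up for the demonstration.

```python
import ast
import json
from statistics import median


def read_json_lines(lines):
    """Yield one parsed record per non-empty line (JSON Lines layout assumed)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def parse_nested(value):
    """Parse a nested attribute stored as a dict-literal string; return {} on failure."""
    if isinstance(value, dict):
        return value
    if not isinstance(value, str):
        return {}
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def extract_features(business):
    """Flatten a few nested fields into numeric features (hypothetical feature names)."""
    attrs = business.get("attributes") or {}
    parking = parse_nested(attrs.get("BusinessParking"))
    ambience = parse_nested(attrs.get("Ambience"))
    hours = business.get("hours") or {}
    return {
        "has_parking": int(any(v is True for v in parking.values())),
        "ambience_count": sum(1 for v in ambience.values() if v is True),
        "days_open": len(hours),
    }


def label_success(businesses, min_stars=4.0):
    """Label 1 when stars >= min_stars and review_count is strictly above the median."""
    threshold = median(b["review_count"] for b in businesses)
    return [
        int(b["stars"] >= min_stars and b["review_count"] > threshold)
        for b in businesses
    ]


# Made-up sample records, serialized to mimic a JSON Lines file.
sample = [
    {"business_id": "a", "stars": 4.5, "review_count": 120,
     "attributes": {"BusinessParking": "{'garage': False, 'street': True}",
                    "Ambience": "{'casual': True, 'romantic': False}"},
     "hours": {"Monday": "9:0-17:0", "Tuesday": "9:0-17:0"}},
    {"business_id": "b", "stars": 4.0, "review_count": 10,
     "attributes": None, "hours": None},
    {"business_id": "c", "stars": 3.0, "review_count": 300,
     "attributes": {"BusinessParking": "None"}, "hours": {}},
    {"business_id": "d", "stars": 4.5, "review_count": 40,
     "attributes": {}, "hours": {"Friday": "17:0-23:0"}},
]
lines = [json.dumps(record) for record in sample]

businesses = list(read_json_lines(lines))
features = [extract_features(b) for b in businesses]
labels = label_success(businesses)
```

Because `read_json_lines` accepts any iterable of strings, an open file handle can be passed in place of `lines`, so large files are read one record at a time rather than loaded whole. Whether "above-median" should be strict, as assumed here, is a design choice the label definition leaves open.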
Key Results and Impact
- Trained and compared seven machine learning models, tuning the hyperparameters of each with grid search cross-validation.
- Created a practical, configurable pipeline script that allows each stage to be executed independently.
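As an illustration of the model comparison, here is a minimal sketch of grid search cross-validation over two of the model families listed in the tech stack. It assumes scikit-learn, which the write-up does not name, and uses synthetic data from `make_classification` in place of the Yelp features. The parameter grids, the F1 scoring choice, and the fold count are illustrative assumptions, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered restaurant features and success labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each candidate pairs an estimator pipeline with its own (illustrative) grid.
candidates = {
    "logistic_regression": (
        Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        Pipeline([("clf", RandomForestClassifier(random_state=0))]),
        {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search; the best model is refit on all training data.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1": search.best_score_,
        # score() reuses the F1 scorer on the held-out split.
        "test_f1": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing. Further model families would be added as extra entries in `candidates`.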
Related: Projects MOC