A machine learning pipeline that predicts restaurant success by analyzing Yelp business data, customer reviews, and operational features.

Quick Facts

  • Context: Term project for CS559 - Machine Learning: Fundamentals and Applications (Fall 2025)
  • Tech Stack: Python, Logistic Regression, Random Forest, SVM, Neural Networks
  • Links: GitHub Repo

Overview and Problem

This project develops a machine learning pipeline to predict restaurant success using the Yelp Academic Dataset. Using real-world business data, it aims to identify the key factors that separate thriving establishments from struggling ones.

What I Built

  • Engineered an end-to-end pipeline to process millions of reviews, tips, and business records from Yelp’s JSON data.
  • Extracted features from nested attributes such as parking availability, ambience, operating hours, and customer engagement metrics.
  • Defined restaurant success programmatically as achieving both high ratings (4+ stars) and above-median review counts.
  • Organized the pipeline into modular stages for metadata analysis, preprocessing, cleaning, visualization, and model training.
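
The feature extraction and success definition described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `flatten_attributes` and `label_success` are hypothetical, the sample records are made up, and it assumes nested attributes arrive as stringified Python dicts under an `attributes` field alongside `stars` and `review_count`. It also reads "4+ stars" as `stars >= 4.0` and "above-median" as strictly greater than the median review count.

```python
import ast
from statistics import median


def flatten_attributes(attributes):
    """Flatten one level of nested attributes into prefixed feature names.

    Hypothetical helper: assumes nested values may be stored as strings
    such as "{'garage': False, 'lot': True}".
    """
    flat = {}
    for key, value in (attributes or {}).items():
        parsed = value
        if isinstance(value, str):
            try:
                parsed = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                parsed = value  # plain strings such as "free" are kept as-is
        if isinstance(parsed, dict):
            for sub_key, sub_value in parsed.items():
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = parsed
    return flat


def label_success(businesses, min_stars=4.0):
    """Return 1 for businesses with stars >= min_stars and
    review_count strictly above the median, otherwise 0."""
    median_reviews = median(b["review_count"] for b in businesses)
    return [
        int(b["stars"] >= min_stars and b["review_count"] > median_reviews)
        for b in businesses
    ]


# Made-up sample records for illustration only.
sample = [
    {"stars": 4.5, "review_count": 320,
     "attributes": {"BusinessParking": "{'garage': False, 'lot': True}"}},
    {"stars": 4.0, "review_count": 12, "attributes": {"WiFi": "free"}},
    {"stars": 3.0, "review_count": 150, "attributes": None},
]

features = [flatten_attributes(b["attributes"]) for b in sample]
labels = label_success(sample)
```

Computing the median over the same set of businesses being labeled keeps the threshold relative to the data at hand; whether the project computes it globally or per category is not stated here.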

Key Results and Impact

  • Trained and compared seven machine learning models, tuning each with grid search and cross-validation.
  • Created a practical, configurable pipeline script that allows each stage to be executed independently.
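
One way a script could let each stage run independently is a small stage registry driven by command-line flags. The sketch below is an assumption about the general shape, not the project's actual script: the stage names, the `--stages` flag, and the placeholder stage functions are all hypothetical.

```python
import argparse


def make_stage(description):
    """Build a placeholder stage; a real stage would do actual work."""
    def stage():
        return f"{description} complete"
    return stage


# Hypothetical stage registry; insertion order defines pipeline order.
STAGES = {
    "metadata": make_stage("metadata analysis"),
    "preprocess": make_stage("preprocessing"),
    "clean": make_stage("cleaning"),
    "visualize": make_stage("visualization"),
    "train": make_stage("model training"),
}


def run_pipeline(selected=None):
    """Run the selected stages in pipeline order; run all if none are given."""
    names = [name for name in STAGES if selected is None or name in selected]
    return [(name, STAGES[name]()) for name in names]


def parse_args(argv=None):
    """Parse a hypothetical --stages flag that accepts one or more stage names."""
    parser = argparse.ArgumentParser(description="Run pipeline stages independently.")
    parser.add_argument(
        "--stages",
        nargs="+",
        choices=list(STAGES),
        default=None,
        help="Stages to run; omit to run the full pipeline.",
    )
    return parser.parse_args(argv)


# Example: run only the cleaning and training stages.
args = parse_args(["--stages", "train", "clean"])
results = run_pipeline(args.stages)
for name, message in results:
    print(f"[{name}] {message}")
```

Stages always execute in registry order regardless of the order given on the command line, which avoids accidentally training before cleaning.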

Related: Projects MOC