
Yopics - Yelp Review Topic Modeling

Eduardo Moreno

Updated Oct. 7, 2015

Transcript

The Problem

A single score is meant to capture a diverse array of experiences.

The App

Challenges to be overcome

What themes contributed to that score?

Particular foods, decor, or service?

Combines map functionality with dynamic topic modeling, visualizing topics of reviews for nearby or queried locations

On each user search, scrape data for locations not yet stored and add it to the DB

Improving the model to eliminate noise in the data (e.g., stemming/lemmatizing, associations, stripping punctuation)
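
The noise-reduction steps named above (stripping punctuation, stemming) could look roughly like the sketch below. It is an illustration only, not the app's actual code: the names `STOPWORDS`, `crude_stem`, and `clean_review` are hypothetical, the stopword list is a tiny placeholder, and the suffix-stripping rule is a crude stand-in for a real stemmer or lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "was", "were", "this", "that", "with", "for"}


def crude_stem(token):
    """Strip one common English suffix. A stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain afterwards.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_review(text):
    """Lowercase, drop punctuation and digits, remove stopwords, then stem."""
    # Keeping only runs of letters discards punctuation in the same pass.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        crude_stem(t)
        for t in tokens
        if len(t) >= 3 and t not in STOPWORDS
    ]
```

For example, `clean_review("The waiters rocked!!")` yields `["waiter", "rock"]`, so "waiter" and "waiters" count as the same term when topics are learned.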

Python server for NLP

DB with Sequelize

(>2mil reviews and 62k locations)

Search by topic, other UI beautifiers

Data analysis: trends in data within a city, state or country; correlation between score and topic; sub-scores for location by topic

JS server for routing, connecting with client and Flask API for topic model

Client

Yopics

=

App in action:

yopics.herokuapp.com

The Data

Repo:

github.com/emoren619/yelpData

+

1.1mil reviews (scraped in one day)

1.6mil reviews (from Yelp academic data set)

trained LDA model and stored in DB

from 21 cities and 62k locations

My Approach:

Topic Modeling

Non-tech speak:

Computer reads text to predict significant topics or themes

Tech speak:

Unsupervised machine learning with LDA (Latent Dirichlet allocation)

and tf-idf preprocessing
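
To make the tech-speak version concrete, here is a minimal sketch of one way tf-idf and LDA can fit together: tf-idf scores are used to prune low-information terms, and a small collapsed Gibbs sampler then fits LDA on the remaining word counts. This is an assumption about how the two pieces combine, not the project's actual pipeline; the function names, the `keep` fraction, and the hyperparameters `alpha`, `beta`, and `n_iter` are all illustrative, and a production model would normally come from a library rather than hand-written sampling.

```python
import math
import random
from collections import Counter


def tfidf_vocab(docs, keep=0.8):
    """Rank terms by their best tf-idf score and keep the top fraction."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            # Terms that occur in every document get idf = 0 and rank last.
            score = (count / len(doc)) * math.log(n_docs / doc_freq[term])
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[: max(1, int(len(ranked) * keep))]


def lda_gibbs(docs, vocab, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, top_words)."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    corpus = [[word_id[w] for w in doc if w in word_id] for doc in docs]
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in corpus]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(corpus):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    # Three most frequent words per topic.
    top_words = [
        [vocab[i] for i in sorted(range(n_words), key=lambda i: -topic_word[k][i])[:3]]
        for k in range(n_topics)
    ]
    return theta, top_words
```

Given tokenized reviews such as `[["pizza", "crust", "cheese"], ["service", "waiter", "slow"]]`, calling `lda_gibbs(docs, tfidf_vocab(docs))` returns a topic mix for each review and the top words for each topic. Which words land in which topic depends on the data and the random seed.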

Results: