
The Problem

A single score is meant to capture a diverse array of experiences.

The App

Challenges to be overcome

What themes contributed to that score?

Particular foods, decor, or service?

Combines map functionality with dynamic topic modeling, visualizing topics of reviews for nearby or queried locations
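One way the "nearby locations" lookup could work is a great-circle distance filter. The sketch below is only an illustration: the `haversine_km` and `nearby` helpers, the `lat`/`lon` field names, and the sample coordinates are all hypothetical and not taken from the app's code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(locations, lat, lon, radius_km):
    """Keep only the locations within radius_km of the given point."""
    return [
        loc for loc in locations
        if haversine_km(lat, lon, loc["lat"], loc["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    # Hypothetical sample records; real rows would come from the DB.
    sample = [
        {"name": "A", "lat": 40.00, "lon": -75.00},
        {"name": "B", "lat": 41.00, "lon": -75.00},
    ]
    print(nearby(sample, 40.01, -75.00, radius_km=5))
```

With a large table, a database-side bounding-box query would probably come first, with a filter like this applied to the candidates it returns.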

  • On each user search, scrape data for locations not yet stored and add it to the DB
  • Improve the model to reduce noise in the data (e.g., stemming/lemmatizing, word associations, stripping punctuation)
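The noise-reduction steps named above could look roughly like the sketch below. It is a minimal illustration, not the project's pipeline: the stopword list is a tiny placeholder, and `naive_stem` is a crude suffix stripper standing in for a proper stemmer or lemmatizer.

```python
import string

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "for", "of", "to", "it"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def naive_stem(word):
    """Strip a few common suffixes, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Lowercase, strip punctuation, drop stopwords, then stem each token."""
    tokens = text.lower().translate(_STRIP_PUNCT).split()
    return [naive_stem(tok) for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_review("Great burgers, cheap prices!"))
```

The crude stemmer produces stems such as "pric" for "prices"; that is acceptable for grouping word variants, but a lemmatizer would give more readable topic labels.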

Python server for NLP

DB with Sequelize

(>2mil reviews and 62k locations)

  • Search by topic, other UI beautifiers

+

  • Data analysis: trends in data within a city, state or country; correlation between score and topic; sub-scores for location by topic
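For the score-versus-topic idea, one simple starting point is a Pearson correlation between each location's star rating and the weight of a given topic in its reviews. The sketch below assumes that setup; the `pearson` helper and the sample numbers are illustrative only and are not results from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up placeholder values: star ratings and the share of each
    # location's reviews assigned to a hypothetical "service" topic.
    stars = [2.0, 3.5, 4.0, 4.5, 5.0]
    service_topic_weight = [0.40, 0.25, 0.20, 0.10, 0.05]
    print(pearson(stars, service_topic_weight))
```

A value near -1 or +1 would suggest that the topic tracks the score closely; values near 0 would suggest little linear relationship.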

JS server for routing and for communicating with the client and the Flask API that serves the topic model

Client

Yopics

Links

=

App in action:

yopics.herokuapp.com

The Data

Repo:

github.com/emoren619/yelpData

+

1.1mil reviews (scraped in one day)

+

1.6mil reviews (from Yelp academic data set)

trained LDA model and stored in DB

from 21 cities and 62k locations

My Approach:

Topic Modeling

Non-tech speak:

Computer reads text to predict significant topics or themes

Tech speak:

Unsupervised machine learning with LDA (latent Dirichlet allocation) and tf-idf preprocessing
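To make the tech-speak version concrete, here is a small standard-library sketch of the two pieces. It rests on assumptions: tf-idf is used here to pick a vocabulary (one plausible reading of "tf-idf preprocessing"), and LDA is fitted with a basic collapsed Gibbs sampler. The function names, hyperparameters, and toy reviews are all hypothetical; the project itself may well have used an NLP library instead.

```python
import math
import random
from collections import Counter


def tfidf_vocab(docs, keep):
    """Return the `keep` terms with the highest summed tf-idf weight."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            idf = math.log(n_docs / doc_freq[term])  # 0 if term is in every doc
            totals[term] += (count / len(doc)) * idf
    return [term for term, _ in totals.most_common(keep)]


def lda_gibbs(docs, vocab, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    corpus = [[word_id[w] for w in doc if w in word_id] for doc in docs]
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in corpus]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(corpus):  # random initial topic per token
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token's current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z  # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, vocab, n=3):
    """List the n most frequent vocabulary words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Toy tokenized reviews, for illustration only.
    reviews = [["wings", "beer", "bar"], ["sushi", "roll", "fish"], ["wings", "bar", "fries"]]
    vocab = tfidf_vocab(reviews, keep=5)
    _, topic_word = lda_gibbs(reviews, vocab, n_topics=2, iters=50)
    print(top_words(topic_word, vocab))
```

The per-topic top words are what a map view could display for a location, while the per-document topic counts indicate which themes dominate its reviews.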

Results:

Yopics - Yelp Review Topic Modeling

- basics of the location (e.g., 'burgers', 'bar', 'seafood')

- particular foods (e.g., 'famous for the wings')

- details of the location (e.g., 'bad service', 'cheap prices', 'beautiful decor', 'too crowded', 'great for lunch')

By: Eduardo Moreno