Netflix Project

by Lance ... on 4 July 2013


Transcript of Netflix Project

Collaborative Filtering and Data Mining: Netflix
Goals:
Our Data
Data contains: date of rating, rating, user ID, movie ID.
Ratings from more than 500,000 users
More than 17,000 titles available to rate
Data consists of more than 100 million rows
Mining Netflix
Profitable and unprofitable customers:
Metrics: Revenue per view, average days between activity, average user rating.
Active customers are less profitable than passive customers....
E.Lance
M.Tu
L.Sills

Create a recommendation engine using a subset of Netflix’s database


Try to find a correlation between customer satisfaction and profitability
Recommendation Engine:
k-nearest/Pearson correlation:
Obstacles
Cosine Similarity:
Large data set: R cannot hold a 9 GB file in memory.
Original data is spread over 17k+ text files, which makes data migration a hassle.
Cannot re-cluster users every time a new user is added. A reference frame is needed.
Solutions
R and Large Data:
Locally hosted MySQL server.
Use the RMySQL package to communicate with the database. Surprisingly easy to use.
Training and Results
Fine-tune the algorithm by randomly sampling from users and titles.
Experiment with the parameters to find the best result. The best results were obtained with k=5 and titles in common = 40, giving MAE=16%.
Error measured by:
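The formula from this slide did not survive in the transcript. As a minimal sketch, the standard mean absolute error (MAE) can be computed as below. The authors worked in R; Python is used here only for illustration. How the reported 16% was derived is not stated, so the second function, which divides MAE by the width of a 1 to 5 star scale, is one possible reading and an assumption of this sketch.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mae_as_fraction_of_scale(predicted, actual, lo=1, hi=5):
    """MAE divided by the scale width (assumption: ratings run from 1 to 5)."""
    return mean_absolute_error(predicted, actual) / (hi - lo)
```

Calling `mean_absolute_error` on predicted and held-out ratings gives the error in stars; the normalized variant expresses it as a fraction of the scale.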
Happy or Unhappy Customers?
Unprofitable
Profitable
First clue...
Unprofitable customers seem to be slightly happier.
Are Happy Customers Less Profitable?
Segment customers by average rating...
Customers: Happy vs Unhappy
Data Migration:
File format: the first line holds the movie ID, followed by one UserID,Rating,Date line per rating.
Developed an R script that reads each file in sequence and prepares an SQL statement to write the resulting data frame into the database.
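The migration step can be sketched roughly as follows. This is not the authors' R script: it is a Python illustration that assumes the first line of each file looks like `<movie id>:`, uses an in-memory SQLite database as a stand-in for the MySQL server, and invents the table name `ratings` and its column names. The sample rows are made up.

```python
import sqlite3


def parse_movie_file(text):
    """Turn one per-movie file into (movie_id, user_id, rating, date) rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    movie_id = int(lines[0].rstrip(":"))  # assumed header form: "<movie id>:"
    rows = []
    for line in lines[1:]:
        user_id, rating, date = line.split(",")
        rows.append((movie_id, int(user_id), int(rating), date))
    return rows


def load_rows(conn, rows):
    """Insert parsed rows with a parameterized statement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings ("
        "movie_id INTEGER, user_id INTEGER, rating INTEGER, rated_on TEXT)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Made-up sample in the assumed file layout.
sample = "7:\n101,3,2005-09-06\n102,5,2005-05-13\n"
conn = sqlite3.connect(":memory:")  # stand-in for the locally hosted MySQL server
load_rows(conn, parse_movie_file(sample))
print(conn.execute("SELECT COUNT(*), AVG(rating) FROM ratings").fetchone())
```

Looping `parse_movie_file` and `load_rows` over the 17k+ files one at a time keeps only a single file in memory, which is the point of moving the data into a database.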
Reference Frame:
Create an imaginary user whose rating for each movie is that movie's most probable rating. New and existing users can then be compared to this user and clustered accordingly.
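A minimal Python sketch of the reference-user idea, under two assumptions that the slides do not confirm: "most probable rating" is read as the most frequent rating (the mode), and a user is compared to the reference by mean absolute difference over shared movies. Function names and the tie-breaking rule are this sketch's own.

```python
from collections import Counter, defaultdict


def build_reference_user(ratings):
    """Map each movie to its most frequent rating.

    `ratings` is an iterable of (user_id, movie_id, rating) tuples.
    Ties go to the lower rating, an arbitrary choice made for this sketch.
    """
    counts = defaultdict(Counter)
    for _user, movie, rating in ratings:
        counts[movie][rating] += 1
    return {
        movie: min(c, key=lambda r: (-c[r], r))
        for movie, c in counts.items()
    }


def distance_to_reference(user_ratings, reference):
    """Mean absolute difference over shared movies, or None if none are shared."""
    shared = [m for m in user_ratings if m in reference]
    if not shared:
        return None
    return sum(abs(user_ratings[m] - reference[m]) for m in shared) / len(shared)
```

Because the reference user is fixed, a new user's distance can be computed without touching anyone else's cluster assignment, which avoids re-clustering on every new arrival.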
Engine Test
To check the recommendations, compare each predicted rating with the real rating. First we vary ClusterSize from 10 to 150 and the number of movies from 10 to 90.
Segment Customers by Profitability Profile
Select customers who have at least a 1 year subscription
Profitable customers: Rpv > $4.79
Unprofitable customers: Rpv < $1
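The segmentation rule above can be written as a small Python function. The thresholds come from the slide; treating "1 year" as 365 days, the function name, and the label for customers who fall between the two thresholds are assumptions of this sketch.

```python
def profitability_segment(revenue_per_view, subscription_days):
    """Label a customer using the thresholds quoted on the slide."""
    if subscription_days < 365:
        return None  # excluded: less than a 1 year subscription
    if revenue_per_view > 4.79:
        return "profitable"
    if revenue_per_view < 1.00:
        return "unprofitable"
    return "middle"  # between the thresholds; this label is the sketch's own
```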
Recommendation function ( Customer ID, Movie ID, Cluster size, number of movies)
The results show that with large ranges, not only does the error rate start to increase, but there are also not enough customers to compare with.
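A recommendation function with this signature might look like the Python sketch below. It is not the authors' R implementation. It assumes that "cluster size" means the number of nearest neighbours kept, that "number of movies" means the minimum number of titles two users must have in common, and that the prediction is a similarity-weighted average over neighbours with positive Pearson correlation. The `ratings` argument (user id mapped to a dict of movie ratings) is also an assumption.

```python
from math import sqrt


def pearson(a, b, common):
    """Pearson correlation of two users' ratings over the movies in `common`."""
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / n
    mean_b = sum(b[m] for m in common) / n
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant ratings have no defined correlation
    return cov / sqrt(var_a * var_b)


def recommend(ratings, customer_id, movie_id, cluster_size, min_common):
    """Predict a rating from the most similar users who rated `movie_id`.

    Returns None when no usable neighbour exists.
    """
    target = ratings[customer_id]
    scored = []
    for user, theirs in ratings.items():
        if user == customer_id or movie_id not in theirs:
            continue
        common = [m for m in target if m in theirs and m != movie_id]
        if len(common) < min_common:
            continue
        sim = pearson(target, theirs, common)
        if sim > 0:
            scored.append((sim, theirs[movie_id]))
    neighbours = sorted(scored, reverse=True)[:cluster_size]
    if not neighbours:
        return None
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * r for sim, r in neighbours) / total
```

The `None` return matches the observation above: when the overlap requirement is large, few or no customers qualify as neighbours, so no prediction can be made.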
First Customer
Second Customer
Actual rating: 5
Actual rating: 2
Thank you