Movie Rating Prediction

Juan Urresta

on 21 October 2013

Transcript of Movie Rating Prediction

Movie Rating Prediction
Project Overview
Our project is composed of two parts:
Compare the performance of classification algorithms.
Create an application that uses the best algorithm.
Data mining and machine learning tool.
Java API.
Algorithms Evaluation.
Application - Web interface.
Data Description
MovieLens data set.
100,000 records.
943 users with at least 20 ratings each.
1682 movies.
Classification Algorithms.
Experimental Setup and Results
Data Preprocessing
Data stored in different files.
Use a shell script to join the data.
Formatted as numerical and nominal values (depending on the algorithm needs).
Analysis of the best attributes.
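The join step above can be sketched as follows. This is only an illustration in Python, not the shell script the team used. It assumes the MovieLens 100K layout (tab-separated ratings, pipe-separated user records); the sample rows and the function names load_users and join_ratings are made up for the example.

```python
import csv
import io

# Made-up sample rows in the assumed MovieLens 100K layout.
# Ratings are tab-separated: user id, item id, rating, timestamp.
RATINGS = "1\t10\t4\t881250949\n2\t10\t2\t891717742\n1\t20\t5\t878887116\n"
# Users are pipe-separated: user id, age, gender, occupation, zip code.
USERS = "1|24|M|technician|85711\n2|53|F|other|94043\n"


def load_users(text):
    """Index user attributes by user id."""
    users = {}
    reader = csv.reader(io.StringIO(text), delimiter="|")
    for user_id, age, gender, occupation, zip_code in reader:
        users[user_id] = {
            "age": int(age),
            "gender": gender,
            "occupation": occupation,
            "zip": zip_code,
        }
    return users


def join_ratings(ratings_text, users, nominal_rating=False):
    """Attach the user attributes to every rating record."""
    rows = []
    reader = csv.reader(io.StringIO(ratings_text), delimiter="\t")
    for user_id, item_id, rating, _timestamp in reader:
        row = dict(users[user_id])
        row["item"] = item_id
        # Keep the rating as a label (nominal) or convert it to a number.
        row["rating"] = rating if nominal_rating else float(rating)
        rows.append(row)
    return rows


joined = join_ratings(RATINGS, load_users(USERS))
```

Passing nominal_rating=True keeps the rating as a string label, which matches the nominal format some of the algorithms need.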
K Nearest Neighbor (Numerical).
K Nearest Neighbor (Nominal).
Decision Tree.
Naive Bayes
K Nearest Neighbor (Numerical)
80/20 split between training and test sets.
Accuracy is not reported because the predictions are real values in the range [1, 5].
80/20% split for training/test set.
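A minimal sketch of an 80/20 split with a numerical k nearest neighbor predictor, written in plain Python for illustration. It shows why an error measure such as mean absolute error fits better than accuracy when predictions are real values. The toy records, the squared Euclidean distance, and all function names are assumptions, not the team's setup.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle the records and split them into 80% training, 20% test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Average the ratings of the k training records closest to `features`."""
    def distance(record):
        # Squared Euclidean distance between feature vectors.
        return sum((a - b) ** 2 for a, b in zip(record[0], features))

    nearest = sorted(train, key=distance)[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


def evaluate_mae(train, test, k=3):
    """Mean absolute error of the kNN predictions on the test set."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in test]
    return sum(errors) / len(errors)


# Hypothetical toy records: ((age, genre flag), rating in [1, 5]).
records = [((20 + i, i % 2), 1.0 + (i % 5)) for i in range(50)]
train, test = split_80_20(records)
mae = evaluate_mae(train, test)
```

Because each prediction is an average of neighbor ratings, it rarely equals an integer rating exactly, so counting exact matches would say little.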
Decision Tree
80/20% split for training/test set.
Confidence factor: threshold used for pruning (smaller values give more pruning).
Pruned: whether pruning is performed.
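To illustrate how a confidence factor can drive pruning, here is a sketch of the textbook C4.5-style pessimistic error estimate. It assumes the decision tree follows that scheme, which the slides do not state, and a given tool may compute the bound differently. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors, total, confidence=0.25):
    """Upper confidence bound on the true error rate of a node."""
    # z grows as the confidence factor shrinks, which raises the bound.
    z = NormalDist().inv_cdf(1.0 - confidence)
    f = errors / total
    spread = sqrt(f / total - f * f / total + z * z / (4 * total * total))
    return (f + z * z / (2 * total) + z * spread) / (1 + z * z / total)


def should_prune(parent, children, confidence=0.25):
    """Prune when the node as a leaf looks no worse than its subtree.

    `parent` and each child are (errors, total) pairs.
    """
    total = sum(n for _, n in children)
    subtree = sum(
        n * pessimistic_error(e, n, confidence) for e, n in children
    ) / total
    return pessimistic_error(*parent, confidence) <= subtree


# A node with 5 errors in 14 records, split into three children.
prune = should_prune((5, 14), [(2, 6), (1, 2), (2, 6)])
```

A smaller confidence factor makes every estimate more pessimistic, and small leaves are penalized the most, which is why it tends to produce more pruning.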
K Nearest Neighbor (Nominal)
Naive Bayes
We used an 80/20 split between training and test sets.
Development of an application in a web interface.
Each team member explores the best configuration for the algorithm.
Choose the best algorithm based on accuracy and mean absolute error.
Integration of the application with the algorithm.
Zip code utility development.
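The selection step above, choosing by accuracy and mean absolute error, could look like the sketch below. The ranking rule (highest accuracy first, lowest mean absolute error as tie-breaker) is an assumption, and the ratings and candidate names are made-up values, not the project's results.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual rating exactly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average distance between predicted and actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def best_algorithm(actual, predictions_by_name):
    """Pick the candidate with the highest accuracy, then the lowest MAE."""
    def score(name):
        predicted = predictions_by_name[name]
        return (
            -accuracy(actual, predicted),
            mean_absolute_error(actual, predicted),
        )

    return min(predictions_by_name, key=score)


# Made-up test ratings and predictions for three hypothetical candidates.
actual = [4, 3, 5, 2, 4]
candidates = {
    "knn": [4, 3, 4, 1, 4],
    "tree": [4, 3, 5, 5, 1],
    "bayes": [3, 3, 4, 2, 5],
}
winner = best_algorithm(actual, candidates)
```

In this toy data two candidates tie on accuracy, and mean absolute error separates them because it also measures how far off the wrong predictions are.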
A data set with more attributes is required to get better results.
The poor classification results stem from the data, not from the choice of classifier or its parameters.
Limitations and Future Work
Lack of attributes that could have more impact on the rating of a movie.
Complete the data set with more attributes.
Diego Moncayo
Quentin Rosee
Vikash Sabnani
Juan Urresta