Copy of

No description
by Jiaqian Chong on 22 November 2013


Transcript of Copy of

Introduction
Overview
Project Background

Thank You
Tan Wei Hian (113251H)
Front End - Project Scope & Schedule
MoodSense Project 2
Tools Required
Implementation: Creating SVM model using Rapidminer

Calculates the confidence of each class on a tweet.
Implementation: Class Confidence
Why use SVM?
Background of SVM (Support Vector Machine)
Classifies the tweet based on the highest confidence class.
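As a minimal sketch of the class-confidence step, assuming the confidences for one tweet are available as a dictionary (the values below are made up for illustration and are not project results), the predicted mood is simply the class with the highest confidence:

```python
def classify(confidences):
    """Return the class label with the highest confidence."""
    return max(confidences, key=confidences.get)

# Hypothetical confidences for a single tweet (illustrative values only).
tweet_confidences = {
    "joy": 0.62,
    "sadness": 0.10,
    "disgust": 0.05,
    "surprise": 0.15,
    "anger": 0.08,
}
predicted = classify(tweet_confidences)
print(predicted)
```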
Database
Web Application
Web service
Social Media
Supervised learning model
Project Objectives
Utilize Rapidminer environment.
Apply the model to the data (tweets).
X-validation (cross-validation)
Standard way to estimate the accuracy of a model on independent data.
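The idea behind X-validation can be sketched as a k-fold split: the data is cut into k parts, and each part in turn is held out for testing while the rest is used for training. This is a plain-Python illustration, not the Rapidminer operator itself; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

The accuracy estimate is then the average of the accuracies measured on the held-out folds.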
High Accuracy:
Integration with Java Web Service
Initialize Rapidminer process.
Tried and Tested:
Applied in bioinformatics to classify protein structures with up to 90% accuracy.

Used extensively in text/document categorization (spam filtering etc.).
Multiclass: Creating SVM model with Rapidminer
Identify the mood classes (joy, sadness, disgust, surprise, anger).
Build binary classifiers using one-vs-rest.
Use the Rapidminer Process Documents operator.
Split the text of the document into a series of words.
Reduce words to their base or stem.
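The two text-processing steps above can be sketched in plain Python. This is only an illustration of the idea: the suffix list is a deliberately tiny, hypothetical stand-in for a real stemmer, and it is not what the Rapidminer operators do internally.

```python
import re

# Hypothetical, deliberately small suffix list (a real stemmer has many more rules).
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

tokens = [naive_stem(t) for t in tokenize("Feeling excited, loving these days!")]
print(tokens)
```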
Rapidminer - text processing operators
Reflects how important a word is to a document

The more documents a word appears in, the less important it is, and vice versa.
TF-IDF
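A minimal sketch of one common TF-IDF variant, assuming term frequency is the word count divided by document length and inverse document frequency is log(N / document frequency). Rapidminer's exact weighting and normalization may differ; the toy documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents each word appears in.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = [["happy", "day"], ["sad", "day"], ["happy", "happy", "joy"]]
weights = tf_idf(docs)
print(weights[1])
```

In the second toy document, "sad" appears in only one document and so gets a higher weight than "day", which appears in two.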
Train SVM model with the training data
Training Data
Word Vectors
Implement a new machine learning algorithm (Support Vector Machine).
The SVM model assigns each example to one of two classes.
Binary Classifier
Distinguish that class from all other classes.
Labeled training data
Learn-by-examples.
Requires labeled data, e.g. images of cars.
Multiclass problem
Classify data into more than two classes.
Solution - build multiple one vs rest binary classifiers.
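The one-vs-rest idea can be sketched by relabeling the training data once per mood: the chosen mood becomes the positive class and every other mood becomes the negative class. The mood names follow the slides (normalized to nouns), the sample labels are invented, and `one_vs_rest_labels` is a hypothetical helper name.

```python
MOODS = ["joy", "sadness", "disgust", "surprise", "anger"]

def one_vs_rest_labels(labels, positive_class):
    """Map multiclass labels to +1 (the chosen class) or -1 (all other classes)."""
    return [1 if label == positive_class else -1 for label in labels]

# Hypothetical labels for four training tweets.
labels = ["joy", "anger", "joy", "sadness"]
binary_tasks = {mood: one_vs_rest_labels(labels, mood) for mood in MOODS}
print(binary_tasks["joy"])
```

One binary classifier would then be trained per entry in `binary_tasks`, giving five classifiers for the five moods.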
SVM model: a representation of the training examples as points in space, separated by class.
Unknown data are mapped into the same space.
They are assigned to one of the classes based on which side of the boundary they fall on.
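For a linear SVM, "which side they fall on" comes down to the sign of w·x + b. The sketch below uses hand-picked weights purely for illustration (nothing here is trained), with the car / non-car labels borrowed from the slide's example.

```python
def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign tells which side of the boundary the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def predict_side(weights, bias, point):
    """Label a point by the side of the boundary it falls on."""
    return "car" if decision_value(weights, bias, point) >= 0 else "non-car"

# Hypothetical boundary: the line x1 = x2 in a 2-D feature space.
w, b = [1.0, -1.0], 0.0
print(predict_side(w, b, [2.0, 1.0]))
print(predict_side(w, b, [1.0, 3.0]))
```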
SVM is binary by nature.
Red - Non-car
Enhance mood dictionary
Implement fuzzy matching algorithm
Improve current UI
Organize data dictionary
1. Check whether any word is found in the text file but not in the Excel file
2. Check whether any word is found in the Excel file but not in the text file
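Both checks amount to a set difference in each direction. A minimal sketch, assuming the two word lists have already been read into Python lists (file loading is left out, and the sample words are invented):

```python
def compare_word_lists(text_words, excel_words):
    """Return (words only in the text file, words only in the Excel file)."""
    text_set, excel_set = set(text_words), set(excel_words)
    return sorted(text_set - excel_set), sorted(excel_set - text_set)

only_in_text, only_in_excel = compare_word_lists(
    ["happy", "glad", "joyful"],
    ["happy", "joyful", "cheerful"],
)
print(only_in_text, only_in_excel)
```

Both lists being empty would mean the two copies of the dictionary agree.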
Validation
Enhance mood dictionary
FuzzyMatch
Correct spelling errors
Check for negations
Store data in array

Delete the last char
Swap the adjacent letters
Replace the last char with each possible letter
Add letters to the end of the word
edits()
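The four operations listed for edits() can be sketched as a function that returns every candidate spelling one operation away from the input. This is an illustrative reconstruction from the slide text, assuming lowercase English letters; the project's actual edits() may differ.

```python
import string

def edits(word):
    """Return candidate spellings one edit away, using the four slide operations."""
    letters = string.ascii_lowercase
    candidates = set()
    if word:
        # Delete the last character.
        candidates.add(word[:-1])
        # Replace the last character with each possible letter.
        candidates.update(word[:-1] + c for c in letters)
    # Swap each pair of adjacent letters.
    candidates.update(
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    )
    # Add each possible letter to the end of the word.
    candidates.update(word + c for c in letters)
    candidates.discard(word)  # the unchanged word is not an edit
    return candidates

print("happy" in edits("happi"))
```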
Check dictionary & populate the possible matches.
Validation & Compile
Remove words that start with "@" & remove punctuation
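A minimal sketch of this cleaning step, assuming tweets are split on whitespace and "punctuation" means the ASCII punctuation characters; `clean_tweet` is a hypothetical name and the sample tweet is invented.

```python
import string

def clean_tweet(text):
    """Drop words starting with '@', then strip punctuation from the remaining words."""
    words = [w for w in text.split() if not w.startswith("@")]
    table = str.maketrans("", "", string.punctuation)
    stripped = (w.translate(table) for w in words)
    # Words made only of punctuation (e.g. emoticons) become empty and are dropped.
    return [w for w in stripped if w]

print(clean_tweet("@starhub I'm so happy today!!! :)"))
```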
Possible matches - return most popular (most weight) word
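Choosing among the possible matches can be sketched as a lookup followed by a maximum over weights. The weights dictionary, the candidate set, and the decision to return a correctly spelled word unchanged are all assumptions made for illustration; returning None when nothing matches is likewise a placeholder choice.

```python
def correct(word, candidates, weights):
    """Return the known candidate with the highest weight, or None if none is known."""
    if word in weights:
        return word  # assumption: a word already in the dictionary needs no correction
    matches = [c for c in candidates if c in weights]
    if not matches:
        return None
    return max(matches, key=lambda c: weights[c])

# Hypothetical dictionary weights and candidate spellings.
weights = {"happy": 120, "harpy": 3}
print(correct("happi", {"happy", "harpy", "happ"}, weights))
```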
Check whether any word matches a word in the data dictionary
Matches: check for "not" & "no" in front of the word
Not matches: returns "undefined"
Returns the mood
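The matching flow can be sketched as a dictionary lookup with a negation check on the preceding word. The slides do not say what happens to a negated match, so skipping it is an assumption here, as are the function name and the sample dictionary.

```python
NEGATIONS = {"not", "no"}

def detect_mood(words, mood_dictionary):
    """Return the mood of the first non-negated dictionary word, else "undefined"."""
    for i, word in enumerate(words):
        mood = mood_dictionary.get(word)
        if mood is None:
            continue
        if i > 0 and words[i - 1] in NEGATIONS:
            continue  # assumption: a match preceded by "not"/"no" is skipped
        return mood
    return "undefined"

# Hypothetical two-entry mood dictionary.
mood_dictionary = {"happy": "joy", "furious": "anger"}
print(detect_mood(["so", "happy"], mood_dictionary))
print(detect_mood(["not", "happy"], mood_dictionary))
```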
Get process(.rmp) and set operator parameters.
correct()
Enhance UI

Gather Twitter Data.



Classifies data into 1 of the 5 moods.




Assist Starhub in making business decisions.
As you can see here, on the left is the data dictionary that MoodSense Project 1 has been using, whereas on the right is a list of words that our seniors have added. So there is a total of 5 lists, one for each mood. For the first few weeks I have been working on enlarging the overall data dictionary to increase the accuracy of the mood classification: a bigger dictionary should mean more accurate classification.
Future Implementations
MOODSENSE
PROJECT 2

Q&A?
.rmp file
Rapidminer
Process
Future Implementation
Green - Car
SVM
Naive Bayes
1189 entries
240 entries
More accurate classification
Results!
Total number of tweets tested: 500
Correct classifications from Rapidminer = 388 (78%)
Correct classifications from previous batch's method = 199 (40%)
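The percentages follow directly from the counts on the slide. As a quick check, rounding to the nearest whole percent:

```python
def accuracy_percent(correct, total):
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)

rapidminer = accuracy_percent(388, 500)  # 77.6 before rounding
previous = accuracy_percent(199, 500)    # 39.8 before rounding
print(rapidminer, previous)
```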
Optimize the SVM to improve mood classification.
Backend - Support Vector Machine