Arabic Sentiment Analysis in Social Media

Graduation Project 2
by

Narmeen Shaban

on 15 June 2013

Transcript of Arabic Sentiment Analysis in Social Media

A root word can take various forms depending on context, tense, and surrounding words.
Sentiment Analysis
in Social Media

Technologies we've used
Strategy
Results
Supervised By:
Dr. Rihab Duwairi
Our Project's Strengths
a Project by
AraSyntilyzer Team
What is Sentiment Analysis?
Challenges of Using the Arabic Language


Sentiment analysis in the Arabic language.

Handling the Jordanian dialect by creating the first Jordanian dialect to MSA (Modern Standard Arabic) parallel dictionary.

Handling Arabizi words (words written using the English alphabet along with numbers and symbols) by converting them to Arabic words.

Handling the emoticons (smileys) people use and treating them as an additional signal to improve accuracy.
High accuracy.
The Process
Raed Marji
Narmeen Sha'ban
Sally Rushaidat
Awesomely
Present To You
Sentiment = feelings, attitudes, emotions, or opinions

Using NLP or machine learning methods to extract and identify the sentiment content of a text unit.
Nothing can measure our customers' feedback on our service!
Challenge accepted!
Let us do the magic!
How are you going to do it?
In Arabic
We are going to use Twitter because Twitter is an ocean of sentiments.
We chose Twitter's public timeline as our source of Arabic data.
Many people tend to use the dialect of their country instead of MSA.

e.g. = both have the same meaning.
Repeating the same letter more than once to intensify the meaning.

e.g.
Arabic has various diacritics; depending on their presence or absence, the meaning of a word can be totally different.
Negation words are used to negate past or present tense verbs, changing the meaning of the verb to its exact opposite.
e.g.
Collect the training set.

Label the training set.

Analyze the dataset to determine the best methodology for normalizing the data.

Normalize the data in the preprocessing phase.

Create the feature vector for each entry in the dataset.

Input the feature vectors into the classifier to build the model.

Verify the model's results using cross-validation on the same training set.
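The feature-vector and cross-validation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project itself used RapidMiner, the function names are hypothetical, and the toy entries are English stand-ins for already normalized tweets.

```python
from collections import Counter

def build_vocabulary(token_lists):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}

def to_feature_vector(tokens, vocab):
    """Term-count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec

def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test

# Hypothetical, already normalized entries: (tokens, label)
dataset = [
    (["good", "service"], "positive"),
    (["bad", "service"], "negative"),
    (["good", "good", "phone"], "positive"),
    (["bad", "phone"], "negative"),
]
vocab = build_vocabulary([tokens for tokens, _ in dataset])
vectors = [to_feature_vector(tokens, vocab) for tokens, _ in dataset]
folds = list(k_fold_indices(len(dataset), k=2))
```

Each fold trains on one part of the labeled set and tests on the rest, so every entry is used for testing exactly once.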
DEMO
Android App

Two Android applications were developed during this project.
The Rating Application
The Twitter Meter Application
Collecting the Dataset
A PHP script was written to interact with Twitter’s Search API to fetch the tweets based on certain search queries.
These tweets needed to be annotated, so crowdsourcing was used through an API written in PHP and SQL.
The Rating Process
Tweets are:
1. Saved in the main dataset.
2. Filtered according to certain criteria and then sent to the “ToBeRated” dataset, from which they are output to the rating API to be annotated.

3. After a tweet is rated, it moves back to the “ToBeRated” dataset and waits its turn to enter the final stage.

Entering the final stage is governed by two rules:
1. A tweet is considered correctly rated if a rating by one of the supervisors matches a rating by a non-supervisor.
2. If those two ratings differ, a third rater is brought in and majority voting decides the final outcome.

If these conditions are met, the tweet is moved to the final stage and is then used in the following phases.
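The two rules above amount to a small decision function. The sketch below is one possible reading, not the project's PHP/SQL code: the function name is hypothetical, and returning None when all three raters disagree is an assumption, since the slides do not say what happens in that case.

```python
from collections import Counter

def final_label(supervisor_rating, rater_rating, third_rating=None):
    """Return the agreed label, or None if the tweet cannot leave "ToBeRated" yet."""
    # Rule 1: a supervisor and a non-supervisor agree.
    if supervisor_rating == rater_rating:
        return supervisor_rating
    # Rule 2: the two ratings differ, so a third rating is required.
    if third_rating is None:
        return None
    ratings = [supervisor_rating, rater_rating, third_rating]
    label, votes = Counter(ratings).most_common(1)[0]
    # Assumption: with three different ratings there is no majority.
    return label if votes >= 2 else None
```

For example, `final_label("positive", "negative", "negative")` resolves to "negative", while `final_label("positive", "negative")` keeps the tweet waiting for a third rater.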
Pre-Processing
Pre-processing operations used:

Text:Tokenize
Text:Replace Tokens
Filter Stopwords (Arabic)
Text:Filter Stopwords (Dictionary)
Stem (Arabic, Light)
Stem (Arabic)
Wordnet:Find Synonyms (WordNet)
The RapidMiner (RM) extensions we developed:

Emoticons Converter
Repetitions Remover
Negation Detection
Dialect to MSA Converter
Arabizi Converter
Links Remover
@mentions Remover
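Four of the extensions listed above (links, @mentions, emoticons, repetitions) can be approximated with regular expressions. The following is a rough Python sketch, not the RapidMiner extension code: the emoticon table, the tag names, and the choice to collapse a letter repeated three or more times down to a single letter are all assumptions made for illustration.

```python
import re

# Hypothetical emoticon table; the project's actual mapping is not given.
EMOTICONS = {":)": "positive_emoticon", ":(": "negative_emoticon"}

def remove_links(text):
    """Drop http/https URLs."""
    return re.sub(r"https?://\S+", " ", text)

def remove_mentions(text):
    """Drop @username mentions."""
    return re.sub(r"@\w+", " ", text)

def convert_emoticons(text):
    """Replace each known emoticon with a word-like tag."""
    for emoticon, tag in EMOTICONS.items():
        text = text.replace(emoticon, f" {tag} ")
    return text

def remove_repetitions(text):
    """Collapse any letter repeated 3+ times in a row to one letter.
    [^\\W\\d_] matches Unicode letters, so Arabic letters are covered."""
    return re.sub(r"([^\W\d_])\1{2,}", r"\1", text)

def normalize(text):
    """Apply the four steps in order and squeeze whitespace."""
    text = remove_links(text)
    text = remove_mentions(text)
    text = convert_emoticons(text)
    text = remove_repetitions(text)
    return " ".join(text.split())

cleaned = normalize("niiiice phone @shop http://t.co/abc :)")
```

Negation detection and the dialect and Arabizi converters depend on the project's dictionaries and are not covered by this sketch.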
FUTURE WORK
Dictionaries

Jordanian Dialect to MSA parallel dictionary.
Negation Dictionary
Arabizi Dictionary
Steps taken to create these dictionaries:

1. Choosing 100 long chat histories between Jordanian users, collected from friends.
2. Manually extracting each word that belongs to the Jordanian dialect.
3. Adding a synonym for each Jordanian dialect word, taking into consideration that many words in our dialect can have the same meaning.
These words were gathered in the same manner as the dialect dictionary, and they include words used formally in Arabic (e.g: , ).
The Arabizi dictionary contains the most used Arabizi words, collected from around the internet and from chat logs.

A PHP script was written to interact with an open API that converts Arabizi words into Arabic.
The results were stored in a parallel dictionary, and each new word that goes through the PHP script is saved into our dictionary if it is not already there.
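The look-up-then-store behaviour described above is a simple cache in front of the conversion service. Here is a minimal Python sketch of that idea; the original was a PHP script, the API it called is not named in the slides, and `fake_transliterate` is a hypothetical stand-in that only knows one word.

```python
def to_arabic(word, dictionary, transliterate):
    """Return the Arabic form of an Arabizi word.
    The converter is called only when the word is not yet in the dictionary,
    and its answer is stored so the next lookup is local."""
    if word not in dictionary:
        dictionary[word] = transliterate(word)
    return dictionary[word]

# Hypothetical stand-in for the external conversion API.
calls = []

def fake_transliterate(word):
    calls.append(word)  # record each (simulated) API request
    return {"mar7aba": "مرحبا"}.get(word, word)

arabizi_dictionary = {}
first = to_arabic("mar7aba", arabizi_dictionary, fake_transliterate)
second = to_arabic("mar7aba", arabizi_dictionary, fake_transliterate)
```

After these two calls the converter has been invoked once, and the second lookup is served from the dictionary.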
Classification

To classify tweets into their appropriate classes (Positive, Negative and Neutral) based on the previously annotated training data, RapidMiner's built-in classifiers were used.

These classifiers include:

Naive Bayes (NB)
Support Vector Machine (SVM)
K-Nearest Neighbor (KNN)
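To give a feel for what the first of these classifiers does, here is a tiny multinomial Naive Bayes with add-one smoothing in plain Python. It is a teaching sketch only: the project relied on RapidMiner's built-in operators, and the function names and the English toy tweets below are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns the counts needed to classify."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab

def classify(tokens, model):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated training data.
training = [
    (["great", "service"], "positive"),
    (["love", "it"], "positive"),
    (["terrible", "service"], "negative"),
    (["hate", "it"], "negative"),
    (["flight", "tomorrow"], "neutral"),
]
model = train_naive_bayes(training)
prediction = classify(["hate", "service"], model)
```

SVM and KNN take the same kind of labeled feature vectors as input but draw the class boundaries differently.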


AND A SPECIAL THANKS GOES TO OUR AMAZING FAMILIES

THANK YOU ALL
QUESTIONS?
Implement more effective algorithms in our extension to RapidMiner.
Scrape Google Play reviews to compare them with tweets.
Implement POS tagging to remove named entities.
Work on the API so that it runs on the best model.
Handle dialect using the Levenshtein distance (edit distance).
Use WordNet synonyms.
Build an Android app that collects dialect words from users.
Build a website for this tool.
Release the results as a web service for developers to build apps on.
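The Levenshtein idea from the list above could look roughly like this in Python: compute the edit distance between an unknown word and each dictionary entry, and accept the closest entry only if it is near enough. The helper `closest_entry` and its `max_distance` threshold are hypothetical choices for illustration, not something the project has implemented.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def closest_entry(word, dictionary_words, max_distance=1):
    """Return the nearest dictionary word within max_distance, else None."""
    best = min(dictionary_words, key=lambda w: levenshtein(word, w), default=None)
    if best is not None and levenshtein(word, best) <= max_distance:
        return best
    return None
```

A spelling variant such as "helo" would then map to a dictionary entry "hello", while a word with no close entry is left alone.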