Arabic Sentiment Analysis in Social Media
Technologies we've used
Dr. Rihab Duwairi
Our Project's Strength Points
a Project by
What is Sentiment Analysis?
Challenges of using the Arabic language
Sentiment analysis in the Arabic language.
Handling the Jordanian dialect by creating the first Jordanian dialect to MSA parallel dictionary.
Handling Arabizi words (words written using the English alphabet along with numbers and symbols) by converting them to Arabic words.
We also handle the emoticons (smileys) people use, and treat them as an additional signal to improve our accuracy.
Present To You
Sentiment = feelings, attitudes, emotions, or opinions
Using NLP or machine learning methods to extract and identify the sentiment content of a
Nothing can measure the feedback of customers on our service!
Let us do the magic!
How are you going to do it?
We are going to use Twitter, because Twitter is an ocean of sentiments.
We have chosen Twitter's public timeline as our source of Arabic data.
Many people tend to use the dialect of their country instead of MSA.
e.g. = both have the same meaning .
Repeating the same letter more than once to intensify the meaning.
Arabic has various diacritics; depending on the presence or absence of such diacritics, the meaning of a word can be totally different.
Negation words are used to negate past or present tense verbs, which changes the meaning of the verb to its exact opposite.
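Two of the challenges above, repeated letters and diacritics, lend themselves to simple rule-based normalization. The following is a minimal sketch of that idea in Python, not the project's actual preprocessing: the function name `normalize` and the choice to collapse only runs of three or more identical characters are assumptions made for illustration.

```python
import re

# Arabic diacritics (tashkeel), from fathatan U+064B through sukun U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida) U+0640, a decorative stretching character.
TATWEEL = "\u0640"
# Any character repeated three or more times in a row.
ELONGATION = re.compile(r"(.)\1{2,}")


def normalize(text: str) -> str:
    """Strip diacritics and tatweel, then collapse elongated letters.

    Assumption: a run of three or more identical characters is emphasis,
    so it is reduced to a single character. Runs of exactly two are left
    alone, because doubled letters can be legitimate spelling.
    """
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return ELONGATION.sub(r"\1", text)
```

Note that stripping diacritics trades away exactly the distinction the diacritics challenge describes; it is a common simplification for social media text, where diacritics are rarely written consistently.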
Collect the training set.
Label the training set.
Analyze the dataset to determine the best methodology for normalizing the data.
Normalize the data in the preprocessing phase.
Create the feature vector for each entry in the dataset.
Input the feature vectors into the classifier to build the model.
Verify the model's results using cross-validation on the same training set.
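The last three steps above (feature vectors, model building, cross-validation) can be sketched in plain Python. This is only an illustration of the general technique: the project itself built its models in RapidMiner, and the names `featurize`, `NaiveBayes`, and `cross_validate`, the bag-of-words features, and the interleaved fold split are all assumptions.

```python
import math
from collections import Counter, defaultdict


def featurize(text):
    """Bag-of-words feature vector: token -> occurrence count."""
    return Counter(text.split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, vectors, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for vec, label in zip(vectors, labels):
            self.token_counts[label].update(vec)
            self.vocab.update(vec)
        return self

    def predict(self, vec):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token, n in vec.items():
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += n * math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k=3):
    """k-fold cross-validation accuracy; entry i belongs to fold i % k."""
    vectors = [featurize(t) for t in texts]
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        test = [i for i in range(len(texts)) if i % k == fold]
        model = NaiveBayes().fit([vectors[i] for i in train],
                                 [labels[i] for i in train])
        correct += sum(model.predict(vectors[i]) == labels[i] for i in test)
    return correct / len(texts)
```

Calling `cross_validate(texts, labels, k=3)` on a list of normalized tweets and their annotated labels returns the fraction of held-out entries that were classified correctly.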
Two Android applications were developed during this project.
The Rating Application
The Twitter Meter Application
Collecting the Dataset
A PHP script was written to interact with Twitter’s Search API to fetch the tweets based on certain search queries.
These tweets needed to be annotated so
was used through an API, written in both PHP and SQL scripts.
The Rating Process
1. Saved in the main dataset.
2. Filtered by certain criteria and then sent to the “ToBeRated” dataset, from which they are output to the rating API to be annotated.
3. After the tweet is rated, it moves back to the “ToBeRated” dataset and waits its turn to enter the final stage.
Entering the final stage is controlled by two rules:
1. A tweet is considered correctly rated if a rating by one of the supervisors matches a rating by a non-supervisor.
2. If the two ratings differ, a third rater is brought in, and majority voting decides the final outcome.
If these conditions are met, the tweet is moved to the final stage and is then used in the following phases.
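The two rules above amount to a small decision function. Here is a minimal sketch in Python; the function name `final_label` and the string labels are hypothetical. The slides do not say what happens when all three raters disagree, so returning `None` in that case is an assumption.

```python
from collections import Counter
from typing import Optional


def final_label(supervisor: str, rater: str,
                third: Optional[str] = None) -> Optional[str]:
    """Return the agreed label, or None if the tweet is not yet resolved."""
    # Rule 1: a supervisor's rating matches a non-supervisor's rating.
    if supervisor == rater:
        return supervisor
    # The two ratings differ and no third rating exists yet.
    if third is None:
        return None
    # Rule 2: majority vote among the three ratings.
    label, votes = Counter([supervisor, rater, third]).most_common(1)[0]
    # Assumption: a three-way disagreement leaves the tweet unresolved.
    return label if votes >= 2 else None
```

A tweet for which this returns `None` would stay in the “ToBeRated” dataset until more ratings arrive.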
Used Pre-processing Operations:
Filter Stopwords (Arabic)
Text:Filter Stopwords (Dictionary)
Stem (Arabic, Light)
Wordnet:Find Synonyms (WordNet)
The developed RM Extensions:
Dialect to MSA convertor.
Jordanian Dialect to MSA parallel dictionary.
Steps done in order to create these dictionaries:
1. Choosing 100 long chat histories between Jordanian users, collected from friends.
2. Manually extracting each word that is related to the
3. Adding a synonym for each Jordanian dialect word, taking into consideration that many words in our
can have the same meaning.
These words were gathered in the same manner as the dialect dictionary, and they include words used formally in Arabic (e.g: , ).
Contains most used words in
collected from around the internet and from chat logs.
A PHP script was written to interact with an open API that converts the
words into Arabic.
The results were stored in a parallel dictionary, and each new word that goes through the PHP script is saved to our dictionary if it is not already there.
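The look-up-then-store behaviour described above is essentially a cache in front of the conversion service. The sketch below shows that pattern in Python rather than the project's PHP. The name `to_arabic` is hypothetical, and `convert` stands in for the call to the open API, whose actual interface the slides do not describe.

```python
from typing import Callable, Dict


def to_arabic(word: str, dictionary: Dict[str, str],
              convert: Callable[[str], str]) -> str:
    """Return the Arabic form of `word`, consulting the dictionary first.

    `convert` is a placeholder for the remote conversion call. It is only
    invoked for words that are not yet in the parallel dictionary, and its
    result is stored so the same word is never converted twice.
    """
    if word not in dictionary:
        dictionary[word] = convert(word)
    return dictionary[word]


if __name__ == "__main__":
    # Stand-in converter with a single hand-written entry, for illustration.
    def fake_convert(word: str) -> str:
        return {"7abibi": "حبيبي"}.get(word, word)

    parallel = {}
    print(to_arabic("7abibi", parallel, fake_convert))
    print(len(parallel))
```

Keeping the converter as a parameter means the dictionary logic can be exercised without network access.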
In order to classify tweets into their appropriate classes (Positive, Negative and Neutral) based on the previously annotated training data, RapidMiner's built-in classifiers were used.
These classifiers include:
Naive Bayes (NB)
Support Vector Machine (SVM)
K-Nearest Neighbor (KNN)
AND A SPECIAL THANKS GOES TO OUR AMAZING FAMILIES
THANK YOU ALL
Implement more effective algorithms in our extension to RapidMiner
Scrape Google Play reviews to compare to tweets.
Implement POS tagging to remove named entities
Work on the API to implement it on the best model
Handle dialect using the Levenshtein distance (edit distance)
WordNet synonyms
Build an Android app that collects dialect words from users
Build a website for this tool
Release the results as a web service for developers to build apps on.
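As an illustration of the edit-distance item in the list above, the sketch below computes the Levenshtein distance and uses it to match an unseen spelling to the nearest dictionary entry. Since this is listed as future work, none of it reflects an existing implementation: `closest_entry` and the `max_distance` threshold of 2 are assumptions.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def closest_entry(word: str, dictionary: Dict[str, str],
                  max_distance: int = 2) -> Optional[str]:
    """Return the translation of the dictionary key nearest to `word`.

    Returns None when the dictionary is empty or when the nearest key is
    more than `max_distance` edits away (a hypothetical cut-off).
    """
    best = min(dictionary, key=lambda key: levenshtein(word, key), default=None)
    if best is None or levenshtein(word, best) > max_distance:
        return None
    return dictionary[best]
```

With a dialect to MSA dictionary as `dictionary`, this would let a misspelled or elongated dialect word still resolve to an existing entry, at the cost of occasional false matches between short words.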