Transcript of Sentiment Analysis
The queries were based on country hashtags (e.g. #Jordan, #Syria); this helped us limit the scope of the tweets to a certain geographical region.
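The hashtag scoping described above can be sketched as follows. This is a minimal illustration in Python, not the project's code: the transcript does not say how the tweets were fetched, so the sketch only filters text that has already been collected, and the names `HASHTAG_RE`, `COUNTRY_TAGS`, and `matches_country` are hypothetical.

```python
import re

# Assumption: a hashtag is '#' followed by word characters. In Python 3,
# \w also matches Arabic letters, so Arabic hashtags are covered as well.
HASHTAG_RE = re.compile(r"#(\w+)")

# Hypothetical tag list; the transcript only names #Jordan and #Syria.
COUNTRY_TAGS = {"jordan", "syria"}

def matches_country(tweet_text, country_tags=COUNTRY_TAGS):
    """Return True if the tweet carries at least one of the country hashtags."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet_text)}
    return not tags.isdisjoint(country_tags)

def filter_by_country(tweets, country_tags=COUNTRY_TAGS):
    """Keep only the tweets scoped to the chosen region by hashtag."""
    return [t for t in tweets if matches_country(t, country_tags)]
```

Matching is case-insensitive here, so `#jordan` and `#Jordan` are treated as the same tag; whether the original queries did the same is not stated.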
So far we have collected more than 105k tweets and annotated more than 5,000 of them for the training set.
Text Pre-processing:
Natural language processing had to be employed to reduce the dimensionality and size of the text data.
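As one illustration of how normalization reduces dimensionality, here is a minimal sketch in Python of the idea behind a repetitions remover, one of the modules named in this transcript. The project's modules were written in PHP and their rules are not given, so the rule used here (collapse any run of three or more identical characters to one) is an assumption, and the function names are hypothetical.

```python
import re

# Assumption: a run of three or more identical characters is emphasis
# (elongation) and can be collapsed to a single character.
_REPEAT_RE = re.compile(r"(.)\1{2,}")

def collapse_repetitions(text):
    """Collapse elongated character runs, e.g. 'sooooo' becomes 'so'."""
    return _REPEAT_RE.sub(r"\1", text)

def vocabulary_size(tweets):
    """Count distinct whitespace-separated tokens across all tweets."""
    return len({token for tweet in tweets for token in tweet.split()})

# Three spellings of the same Arabic word, two of them elongated.
tweets = [
    "جم" + "ي" * 5 + "ل",
    "جم" + "ي" * 3 + "ل",
    "جم" + "ي" + "ل",
]
normalized = [collapse_repetitions(t) for t in tweets]

# Vocabulary size before and after normalization.
print(vocabulary_size(tweets), vocabulary_size(normalized))
```

The three spellings count as three distinct features before normalization and as one afterwards, which is the kind of reduction the pre-processing step aims for. Collapsing to a single character can also merge words that legitimately contain a tripled letter, so the threshold is a design choice rather than a fixed rule.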
So RapidMiner was the answer!
Sentiment Analysis in Social Media
We Know How You Feel
Raed Marji
Sally Rushaidat
Awesomely Present To You
Sentiment Analysis:
A set of methods, implemented in computer software, that detect, measure, report, and exploit attitudes, opinions, and emotions in online, social information sources.
How Are We Going To Do It?!
1. The biggest problem one faces when dealing with Arabic text is the various forms a root word can take depending on context, tense, and surrounding words.
Ex:
Sentiment Analysis Fields:
Finance.
Supervised By:
Dr. Rihab Dwairi
Because Twitter is an ocean of sentiments. We have chosen the Twitter public timeline as our source of Arabic data.
The Training Set
The training set consists of tweets that we manually annotated into three main classes (Positive, Neutral, Negative).
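To make the role of the annotated set concrete, here is a minimal sketch in Python of how three-class labelled tweets could feed a Naive Bayes classifier, one of the two classifiers named later in this transcript. It is not RapidMiner's implementation: it is a from-scratch multinomial Naive Bayes with add-one smoothing, the function names are hypothetical, and the English example tweets are invented toy data, not taken from the project's dataset.

```python
import math
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        class_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior from the share of annotated tweets in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy examples standing in for manually annotated tweets.
toy_training_set = [
    ("I love this city", "positive"),
    ("what a great day", "positive"),
    ("the meeting is at noon", "neutral"),
    ("traffic report for today", "neutral"),
    ("I hate this weather", "negative"),
    ("what a terrible day", "negative"),
]
model = train_nb(toy_training_set)
print(predict_nb(model, "terrible weather"))
```

With only a handful of examples per class the word counts are very sparse, which is one way to see why a small annotated set limits accuracy.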
Many modules had to be developed (using PHP):
Arabizi to Arabic
Emoticons Converter: (To be developed)
Abbreviations Expander: (To be developed)
Repetitions Remover: (In Development)
Negation Detection: (In Development)
Classification
After fetching and processing the tweets, we need to classify them into their appropriate classes (Positive, Negative, Neutral). This was done with RapidMiner: we used RapidMiner's built-in implementations of two common classifiers:
Naive Bayes (NB)
Support Vector Machine (SVM)
Data Gathering. Text Processing. Classification.
Results:
After executing some test runs on the test dataset, we came up with these results:
Future Work:
Processing more dialects in tweets to monitor the effect on accuracy.
Android app for submitting alternatives for dialect words.
A basic Dialect to MSA dictionary.
Building a bigger training set by implementing a simple game to encourage rating of tweets.
Convert some of the PHP modules to Java modules for RapidMiner.
The Biggest Challenges:
3. Arabizi. In sentiment analysis every character matters; the chosen variation of a word has significance.
The accuracy is still not as desired. This is due to not having enough annotated tweets to use in the training set.
Adding other features such as time of day, length, and gender might noticeably improve the results.
Stemming words, in our case, had no significant effect on the quality of the results.
Used RapidMiner Modules:
Filter Stopwords (Arabic)
Stem (Arabic, Light)
Stem (Arabic)
Find Synonyms (WordNet)
Filter Stopwords (Dictionary)
THANK YOU