Our Proposed Method
Lexicon-Based Method
Given a review r, use a sentiment analysis method to classify its sentiment polarity as either positive or negative. The result is 1 for positive and -1 for negative.
Given a review r, use a sentiment analysis method to compute the extent of its sentiment polarity on a five-level scale from 1 to 5.
Bayesian Theory
Naive Bayes Classifier
word likelihood in a sentiment class
Lexicon-Based Methods
score sentiment keywords
Machine Learning Algorithms
Support Vector Machines, Maximum Entropy, Neural Networks
concept-level techniques
ontologies, entity recognition, semantic vector space
599B Thesis Presentation
The objective is to capture the sentiment-rich cluster and ignore noisy information.
Extract Feature Cluster
Use a lexical resource to help identify the most representative sentiment cluster
Learn features from the training set and use them to predict the sentiment of a test review
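The slide does not say how the lexical resource ranks the clusters, so the following is only a minimal sketch under an assumed criterion: pick the cluster whose words have the highest mean absolute lexicon score. The function name `pick_feature_cluster` and the scores in `lexicon` are hypothetical and are not taken from SentiWordNet.

```python
def pick_feature_cluster(clusters, lexicon):
    """Return the cluster with the highest mean absolute lexicon score."""
    def strength(words):
        # Words missing from the lexicon contribute a score of 0.0.
        return sum(abs(lexicon.get(w, 0.0)) for w in words) / len(words)
    return max(clusters, key=strength)

# Hypothetical sentiment scores, invented for illustration only.
lexicon = {"worth": 0.5, "reading": 0.1, "growing": 0.1}

clusters = [
    ["reminds", "father", "growing", "hood"],
    ["it's", "worth", "reading", "book", "stories", "used", "tell"],
]

feature_cluster = pick_feature_cluster(clusters, lexicon)
```

With these made-up scores the second cluster is selected because it contains the strongest sentiment keyword; a different lexicon or ranking rule could select differently.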
Clustering Reviews
similar words lie close together in the vector space
Build Feature Vector
Classification
review text
list of words
Skip-gram model
word vector representation
K-means algorithm
word vector clusters
lexicon score
feature cluster
average vectors
Subsampling is used to counter the imbalance between rare and frequent words.
Each word $w_i$ in the training set is discarded with probability $P(w_i) = 1 - \sqrt{t / f(w_i)}$, where $f(w_i)$ is the frequency of word $w_i$ and $t$ is a chosen threshold.
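A small sketch of the discard rule, assuming the formula is the standard word2vec subsampling probability, 1 - sqrt(t / f(w)). The threshold value and the word frequencies below are hypothetical, chosen only to show how frequent words are dropped far more often than rare ones.

```python
import math

def discard_probability(freq, t=1e-5):
    """Probability of discarding a word whose relative frequency is `freq`."""
    # Assumed formula: 1 - sqrt(t / f(w)); clipped at 0 so that words
    # rarer than the threshold are never discarded.
    return max(0.0, 1.0 - math.sqrt(t / freq))

# Hypothetical relative frequencies, not measured on any real corpus.
freqs = {"the": 0.05, "good": 0.001, "camry": 0.000005}

probs = {word: discard_probability(f) for word, f in freqs.items()}
for word, p in probs.items():
    print(f"{word}: {p:.3f}")
```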
feature vector
Support Vector Machines
classification
prediction result
>>> model['good']
array([-0.08681812, 0.00687875, 0.04907861, 0.02860245, 0.04553717,
0.03739114, -0.1756956 , 0.03151671, 0.01958673, -0.07936199,
0.13719471, 0.0059606 , -0.0209059 , 0.05191493, -0.061775 ,
0.06466751, -0.08121822, 0.07896858, 0.11988235, -0.13294919,
......
0.07510926, -0.02882694, -0.10323888, -0.03204431, -0.07488418,
-0.04350474, -0.00963599, -0.0847501 , -0.08905368, 0.04573591,
0.08111843, -0.08563627, -0.01498571, -0.071936 , 0.0613132 ], dtype=float32)
>>> model['good'].shape
(300,)
Student: Yuanlin Xu
Faculty Adviser: Chengyu Sun
An SVM model represents the samples as points in space, mapped so that the samples of the separate classes are divided by a clear gap that is as wide as possible.
New samples are then mapped into that same space and predicted to belong to a class based on which side of the gap they fall on.
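For a linear SVM, "which side of the gap" comes down to the sign of w·x + b. The sketch below shows only that prediction step; the weight vector `w` and bias `b` are hypothetical stand-ins for values that training would produce, and the labels 1 and -1 follow the positive/negative convention used in this presentation.

```python
def predict(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical parameters; a trained SVM would supply these.
w = [1.0, -1.0]
b = 0.0

label_a = predict(w, b, [2.0, 0.5])   # score = 1.5
label_b = predict(w, b, [0.5, 2.0])   # score = -1.5
```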
Classification via SVM
a Skip-gram model
The objective is to find word representations that are useful for predicting the surrounding words
SentiWordNet is a lexical resource for Sentiment Analysis. It assigns to each synset three sentiment scores: positivity, negativity and objectivity.
synset: a set of synonyms
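To maximize the average log probability
$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$
where $w_1, \dots, w_T$ is the sequence of training words and $c$ is the size of the training context
Defines $p(w_{t+j} \mid w_t)$ using the softmax function
$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_w}^{\top} v_{w_I}\right)}$$
where $v_w$ and $v'_w$ are the "input" and "output" vector representations of $w$, and $W$ is the number of words in the vocabulary.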
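To make the softmax concrete, here is a minimal sketch that scores every vocabulary word by the dot product of its "output" vector with the "input" vector of the center word, then normalizes with a softmax. The two-dimensional vectors and the three-word vocabulary are hypothetical; real models use hundreds of dimensions and usually replace the full softmax with a cheaper approximation.

```python
import math

def softmax_probs(v_in, out_vectors):
    """p(w | center) for every word w, given the center word's input vector."""
    scores = [sum(a * b for a, b in zip(v_out, v_in)) for v_out in out_vectors]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vectors for illustration only.
v_center = [0.5, -0.2]
out_vectors = [[0.4, -0.1], [-0.3, 0.6], [0.0, 0.0]]

probs = softmax_probs(v_center, out_vectors)
```

The denominator sums over the whole vocabulary, so the cost of one probability grows with W.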
Assign a score to each word synset by using the weighted average for all words in a synset
Detect Negations
Clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the within-cluster sum-of-squares.
Divides a set of N samples X into K disjoint clusters C, each described by the mean $\mu_j$ of the samples in the cluster
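The procedure can be sketched as Lloyd's algorithm in plain Python. This is an illustration, not the implementation used in the thesis: the initialization (the first k points become the starting centroids), the fixed iteration count, and the sample points are all assumptions made to keep the example deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared distance to point."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20):
    # Assumption: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2-D points standing in for word vectors.
points = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (5.0, 6.0)]
centroids, labels = kmeans(points, k=2)
```

Each assignment and update step can only lower the within-cluster sum-of-squares, which is why the loop settles on a stable partition.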
["it's", 'worth', 'reading', 'book', 'reminds', 'stories', 'father', 'used', 'tell', 'growing', 'hood']
['reminds', 'father', 'growing', 'hood']
["it's", 'worth', 'reading', 'book', 'stories', 'used', 'tell']
Determine the attitude of a speaker or a writer
Determine the overall contextual polarity of a document
negative
positive
neutral
['recently', 'upgraded', 'car', 'camry', 'cassette', 'deck', 'bad', 'transmitter', 'barely', 'listening', 'podcasts', 'good', 'music', 'trebles', 'cause', 'static', 'level', 'fade', 'making', 'annoying', 'sound', 'hard', 'make', 'work', 'recommended']
['car', 'cassette', 'deck', 'transmitter', 'podcasts', 'cause', 'hard']
['bad', 'barely', 'listening', 'good', 'music', 'static', 'level', 'fade', 'making', 'annoying', 'sound', 'make']
['recently', 'upgraded', 'work', 'recommended']