
Conclusion

  • Combines word representation with a clustering algorithm
  • Computes a feature vector for classification
  • Implements a domain independent algorithm for Sentiment Analysis
  • Tests our proposed method with Naive Bayes Classifier and Lexicon-based method

Thank you

Two Classes Prediction Results

Two Classes Prediction Comparisons

Our Proposed Method

Lexicon-Based Method

Five Classes Prediction Results

Problem Definition

Our Proposed Method

Given a review r, use a sentiment analysis method to classify its sentiment polarity as either positive or negative. We use 1 to represent positive and -1 to represent negative.

Evaluation

Practical Implementations

Given a review r, use a sentiment analysis method to compute the extent of its sentiment polarity in five levels, on a scale of 1 to 5.

Five Classes Prediction Comparisons

Sentiment Analysis Interest over Time (From Google Trends):

Naive Bayes Approaches

Common Approaches

Bayesian Theory

Naive Bayes Classifier

word likelihood in a sentiment class

Lexicon-Based Methods

score sentiment keywords

Machine Learning Algorithms

Support Vector Machines, Maximum Entropy, Neural Networks

concept-level techniques

ontologies, entity recognition, semantic vector space

Lexicon-Based Approaches

599B Thesis Presentation

The objective is to capture the sentiment-rich cluster and to ignore noisy information.

Extract Feature Cluster

Use a lexical resource to help identify the most representative sentiment cluster

Learn features from the training set and use them to predict the sentiment of a test review

Clustering Reviews

similar words have vectors that are close to each other

Sentiment Analysis

Build Feature Vector

Classification

Word Vector Representation

-- from word2vec package

Using A Vector Clustering Approach

  • number of features for each word : 300
  • minimum word count : 5
  • context window : 15
  • down sampling : 0.001
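The four hyperparameters above can be written out as keyword arguments. The names below follow gensim's Word2Vec API; since the slides reference the original word2vec package, these exact names are an assumption, but the values are the ones listed:

```python
# The slide's word2vec hyperparameters, using gensim-style keyword names
# (an assumption -- the thesis used the word2vec package directly).
word2vec_params = {
    "vector_size": 300,  # number of features for each word
    "min_count": 5,      # minimum word count
    "window": 15,        # context window
    "sample": 0.001,     # down-sampling threshold
    "sg": 1,             # train a Skip-gram model (vs. CBOW)
}
print(word2vec_params)
```

In gensim these would be passed as `Word2Vec(sentences, **word2vec_params)`.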

An Overview of Our Proposed Method

review text

list of words

Skip-gram model

word vector representation

K-means algorithm

word vector clusters

lexicon score

feature cluster

Review Preprocessing

average vectors

To counter the imbalance between rare and frequent words, each word wi in the training set is discarded with probability

P(wi) = 1 - sqrt(t / f(wi))

where f(wi) is the frequency of word wi and t is a chosen threshold.
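This discard rule can be sketched in a few lines; the formula is the standard word2vec subsampling rule, and `t = 0.001` matches the down-sampling setting listed earlier:

```python
import math

def discard_probability(freq, t=0.001):
    """Probability of discarding a word whose corpus frequency is `freq`
    (fraction of total tokens): P(w) = 1 - sqrt(t / f(w)), clamped to [0, 1]."""
    p = 1.0 - math.sqrt(t / freq)
    return max(0.0, min(1.0, p))

# A very frequent word is discarded often; a word rarer than the threshold
# is never discarded.
print(discard_probability(0.05))    # ~0.859
print(discard_probability(0.0005))  # 0.0
```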

feature vector

Support Vector Machines

classification

prediction result

Stopwords Defined in NLTK Corpus

  • Positive: ratings 5 and 4; negative: ratings 1 and 2
  • Remove non-letters
  • Convert words to lower case
  • Remove stop words
  • Generate a Bag of Words model
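The preprocessing steps above can be sketched as follows; the small stop-word set here is just a stand-in for the full NLTK corpus list:

```python
import re
from collections import Counter

# A tiny stop-word set standing in for the NLTK stop-word corpus.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "i", "this"}

def preprocess(review):
    """Remove non-letters, convert to lower case, drop stop words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", review)
    words = letters_only.lower().split()
    return [w for w in words if w not in STOP_WORDS]

def bag_of_words(words):
    """Word -> count mapping (a Bag of Words model)."""
    return Counter(words)

tokens = preprocess("This is a GOOD book, and it is worth reading!")
print(tokens)  # ['good', 'book', 'worth', 'reading']
print(bag_of_words(tokens))
```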

>>> model['good']
array([-0.08681812,  0.00687875,  0.04907861,  0.02860245,  0.04553717,
        0.03739114, -0.1756956 ,  0.03151671,  0.01958673, -0.07936199,
        0.13719471,  0.0059606 , -0.0209059 ,  0.05191493, -0.061775  ,
        0.06466751, -0.08121822,  0.07896858,  0.11988235, -0.13294919,
        ......
        0.07510926, -0.02882694, -0.10323888, -0.03204431, -0.07488418,
       -0.04350474, -0.00963599, -0.0847501 , -0.08905368,  0.04573591,
        0.08111843, -0.08563627, -0.01498571, -0.071936  ,  0.0613132 ], dtype=float32)
>>> model['good'].shape
(300,)

“Author Joshua Bloch vividly discusses the concept of concurrency and stirs us through issues around concurrency and synchronization in java. Being implemented on Collections framework he gives us wonderful insights of various collections in context of concurrency. If you are using java for few years now and have brushed and bruised by java threads/synchronization and concurrency issues and are now seething to get that one weapon that will defeat it once and for all, this is what you are looking for.”

Student: Yuanlin Xu

Faculty Adviser: Chengyu Sun

Dataset

Support Vector Machines

An SVM model represents the samples as points in space, mapped so that the samples of the separate classes are divided by a clear gap that is as wide as possible.

New samples are then mapped into the same space and predicted to belong to a class based on which side of the gap they fall on.
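The prediction step described above amounts to checking which side of the learned hyperplane w . x + b = 0 a new sample falls on; a minimal sketch with hypothetical 2-D weights:

```python
def svm_predict(w, b, x):
    """Predict the class of sample x from which side of the hyperplane
    w . x + b = 0 (the boundary of the learned gap) it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical weights a trained linear SVM might produce (2-D for illustration).
w, b = [0.8, -0.5], 0.1
print(svm_predict(w, b, [1.0, 0.2]))   # 1  (positive side of the gap)
print(svm_predict(w, b, [-1.0, 0.5]))  # -1 (negative side of the gap)
```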

Feature Vector

  • Average word vectors in the feature cluster
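Averaging the word vectors in the feature cluster can be sketched as an element-wise mean; the 3-D vectors below are hypothetical (the thesis uses 300 dimensions):

```python
def average_vectors(vectors):
    """Element-wise mean of equal-length word vectors; the averaged vector
    serves as the review's feature vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical 3-D word vectors from a review's sentiment feature cluster.
cluster = [[0.25, -0.5, 0.75],
           [0.75,  0.5, 0.25]]
print(average_vectors(cluster))  # [0.5, 0.0, 0.5]
```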

Data Extraction

Classification via SVM

a Skip-gram model

word2vec

SentiWordnet Lexicon

The objective is to find word representations that are useful for predicting the surrounding words

SentiWordNet is a lexical resource for Sentiment Analysis. It assigns to each synset three sentiment scores: positivity, negativity and objectivity.

synset: a set of synonyms

To maximize the average log probability

(1/T) * sum_{t=1..T} sum_{-c <= j <= c, j != 0} log p(w_{t+j} | w_t)

where c is the size of the training context

Defines p(w_{t+j} | w_t) using the softmax function

p(w_O | w_I) = exp(v'_{w_O} . v_{w_I}) / sum_{w=1..W} exp(v'_w . v_{w_I})

where v_w and v'_w are the "input" and "output" vector representations of w, and W is the number of words in the vocabulary.

Sentiment Score

Assign a score to each word synset by using the weighted average for all words in a synset
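One common way to take such a weighted average is to weight a word's senses by the inverse of their sense rank (most common sense first). The sketch below uses that scheme and invented per-synset scores; both are assumptions, not necessarily what the thesis used:

```python
def word_sentiment_score(synset_scores):
    """Combine per-synset (positivity - negativity) scores into one score,
    weighting sense k (1-based, most common first) by 1/k."""
    weighted = sum(s / rank for rank, s in enumerate(synset_scores, start=1))
    total_weight = sum(1 / rank for rank in range(1, len(synset_scores) + 1))
    return weighted / total_weight

# Hypothetical SentiWordNet-style scores for three senses of a word.
print(word_sentiment_score([0.625, 0.0, 0.25]))  # ~0.386
```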

K-means Clustering Algorithm

Detect Negations

Clusters data by trying to separate the samples into n groups of equal variance, minimizing a criterion known as the within-cluster sum-of-squares.

Divides a set of N samples X into K disjoint clusters C, each described by the mean u_j of the samples in the cluster
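The loop described above (assign each point to its nearest centroid, then move each centroid to the mean of its cluster) can be sketched in pure Python; the toy 2-D points are invented for illustration:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: alternate nearest-centroid assignment and
    centroid recomputation, reducing the within-cluster sum-of-squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the centroid with the smallest squared distance.
            nearest = min(range(k), key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        for j, cluster in enumerate(clusters):
            if cluster:  # recompute centroid as the cluster mean
                centroids[j] = [sum(col) / len(cluster) for col in zip(*cluster)]
    return centroids, clusters

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```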

["it's", 'worth', 'reading', 'book', 'reminds', 'stories', 'father', 'used', 'tell', 'growing', 'hood']

['reminds', 'father', 'growing', 'hood']

["it's", 'worth', 'reading', 'book', 'stories', 'used', 'tell']

Introduction

Determine the attitude of a speaker or a writer

Determine the overall contextual polarity of a document

negative

positive

neutral

['recently', 'upgraded', 'car', 'camry', 'cassette', 'deck', 'bad', 'transmitter', 'barely', 'listening', 'podcasts', 'good', 'music', 'trebles', 'cause', 'static', 'level', 'fade', 'making', 'annoying', 'sound', 'hard', 'make', 'work', 'recommended']

['car', 'cassette', 'deck', 'transmitter', 'podcasts', 'cause', 'hard']

['bad', 'barely', 'listening', 'good', 'music', 'static', 'level', 'fade', 'making', 'annoying', 'sound', 'make']

['recently', 'upgraded', 'work', 'recommended']
