POM

Public Opinion Mining: For our graduation project, my teammates and I created a public opinion mining system and web service that provides users with sentiment analysis and summarization for a given topic.
by

amr mohsen

on 17 March 2014

Transcript of POM

Fields such as:
Decision Support System
Text Mining
Summarization
Opinion Mining
reference: britannica.com
Outline:
Introduction
Project Status
Survey Phase
System Analysis
System Design
System Implementation
System Testing
Conclusion & Future Development
Community (Social Network):
Friends and relatives.
Acquaintances.
Consumer reports.
Websites such as:
Blogs. (google blogs)
E-commerce sites. (amazon, ebay)
Review Sites. (CNET)
etc...
Web Data Mining
Web data mining, also called knowledge discovery, is the process of discovering interesting and useful patterns and relationships in large volumes of data from the public web.
Applications of web data mining include:
Business
Social Network analysis
A web crawler, also known as a spider or robot, is a program that automatically downloads Web pages.
Collecting Data
We will use a Focused crawler but it will be limited to specific domains of our choice.
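A minimal sketch of the idea of a crawler restricted to chosen domains. The domain names, the `fetch_links` callback and the toy link graph are assumptions made for illustration; the slides do not list the actual domains, and the project itself used Java crawlers.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical whitelist; the slides do not name the actual domains.
ALLOWED_DOMAINS = {"reviews.example.com", "blog.example.org"}

def in_scope(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is one of the allowed domains."""
    host = urlparse(url).hostname or ""
    return host in allowed

def crawl(seed, fetch_links, allowed=ALLOWED_DOMAINS, max_pages=100):
    """Breadth-first crawl from seed, visiting only in-scope pages.

    fetch_links(url) must return the list of URLs found on that page;
    it is injected so the sketch can run without network access.
    """
    seen, queue, visited = {seed}, deque([seed]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not in_scope(url, allowed):
            continue  # out-of-scope pages are never fetched
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Toy link graph standing in for real pages.
PAGES = {
    "http://reviews.example.com/": [
        "http://reviews.example.com/p1",
        "http://ads.example.net/x",
    ],
    "http://reviews.example.com/p1": ["http://blog.example.org/post"],
    "http://blog.example.org/post": [],
}
print(crawl("http://reviews.example.com/", lambda u: PAGES.get(u, [])))
```

Here the page on `ads.example.net` is skipped because its host is not in the whitelist.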
What type will we use?
Indexing Data
What is a web crawler?
Similar Projects
Introduction
Created by three Computer Science students at Stanford University
as their graduation project.
Functional Requirements (use case):
Non-Functional Requirements:
Use Case:
Entity Relationship
Advantage
Provide APIs for classifying tweets.
Unlimited Crawler Depth
Resuming Crawling
Easy to integrate within an application
Configuration is not complicated
Public Opinion Mining
4106/4206 Graduation Project
Final Presentation
HICIT Department
of Computer Science
Supervisor Professor
Ahmed Al-Abassy
Amal Ibrahim
Teaching Assistant
The user enters the keyword of what he wants to get opinions about.
Searching in the trusted websites we have embedded in our system.
Searching in Social Network (Facebook, Google Plus, and Twitter).
Display a report and a timeline chart of the final result to the user in a web page.
User is able to save, print or send the results.
Availability and Portability
Robustness
Capacity/Scalability
Usability
Security
Delivery Requirement
Monitors positive and negative feelings in Twitter conversations about topics such as movies, musicians, TV shows and popular brands.
Advantage
Live Tweet analysis
Disadvantage
Skips some output results
25 Trends
Advantage
Support result with chart and detailed information
Disadvantage
The algorithm is designed to support Arabic, which represents only 3% of the total web content.
The Service is paid.
Team members
Ahmed Mahmoud
Amr Mohsen
Ashraf Hesham
Mohammed Al-Adley
Mahmoud Habib
What makes our project different?
Works on social network, blogs, news sites, product review sites, etc.
The ability to track changes in opinions.
A web service "REST API" that enables developers to easily use our services.
System is implemented as a web service.
Automated optimization of the algorithm.
Internet
User
connect
web server
request
response
Cloud Service
Web Crawler
web page 1
web page 2
web page 3
web page 4
host
Save
Retrieve
Apply
crawl
crawl
crawl
crawl
Final Result
Reports
Statistics-Time Line
provide
Types of web crawler:
Universal Crawler
Focused Crawler
Topical Crawler
After the web pages are collected, they need to be indexed using document indexers (search engine) so that they can be searched by tags or keywords.
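What a document indexer does can be sketched with a tiny inverted index. This is only an illustration of keyword indexing under simple assumptions (lowercased alphanumeric tokens, AND search); it is not how Solr is implemented, and the page ids and texts are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each token to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index

def search(index, query):
    """Return ids of pages containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up pages standing in for crawled content.
pages = {
    "p1": "Great camera, battery life is great",
    "p2": "Battery died after a week",
    "p3": "The camera is average",
}
index = build_index(pages)
print(sorted(search(index, "camera battery")))
```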
Pre-Processing Steps:
Extraction Phase:
Collecting Data
Indexing Data (Solr)
Extracting Reviews
Project Background
Objective
Project Background
What is Web Data Mining?
Pre Processing
Before the algorithm is applied, data needs to be collected and cleaned so that the algorithm can process it. This is called pre-processing, and it is the first step of data mining.
System Architecture
Opinion Mining (Sentiment Analysis)
It aims to analyze and track the mood of the public about a particular product. Opinion mining, which is also called sentiment analysis, involves building a system to collect and examine opinions about the product made in:
blog posts
Comments
Reviews or tweets
TweetFeel
Comparison between similar projects
Disadvantage
Provide only Overview of the result
(2010)
(2013)
(2011)
Developers
Public Opinion Mining
which car should I buy?
which school should I apply to?
which professor to work for?
whom should I vote for?
Opinion Mining Algorithm
The algorithm is divided into 3 phases:
Mining Opinions from the Web: Beyond Relevance Retrieval
Extraction Phase
It aims to extract opinion evidence from words, sentences and documents then identify their polarities.
User
Word Level
Sentence Level
Document Level
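The three levels can be illustrated with a small lexicon-based sketch. The lexicon, the negation rule and the scoring are assumptions chosen to keep the example short; the slides do not describe the project's actual scoring method.

```python
import re

# Tiny illustrative lexicon; a real system would use a much larger resource.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATIONS = {"not", "never", "no"}

def word_polarity(word):
    """Word level: look the word up in the lexicon (0 if unknown)."""
    return LEXICON.get(word.lower(), 0)

def sentence_polarity(sentence):
    """Sentence level: sum word scores, flipping the word after a negation."""
    score, negate = 0, False
    for word in re.findall(r"[A-Za-z']+", sentence):
        if word.lower() in NEGATIONS:
            negate = True
            continue
        value = word_polarity(word)
        score += -value if negate else value
        negate = False
    return score

def document_polarity(document):
    """Document level: sum the sentence scores and map the total to a label."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    total = sum(sentence_polarity(s) for s in sentences)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

print(document_polarity("The screen is great. The battery is not good. I love it!"))
```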
Summarization Phase
It aims to produce a cross-document opinion summary.
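One possible shape for a cross-document summary, as a sketch only: it assumes the extraction phase has already produced `(doc_id, sentence, score)` tuples, and it reports polarity shares plus the strongest sentence per polarity. The tuple layout and the choice of "strongest sentence" as representative are assumptions, not the project's documented method.

```python
from collections import Counter

def summarize(opinions):
    """Build a cross-document summary from (doc_id, sentence, score) tuples.

    Returns the share of positive/negative/neutral sentences and, for each
    non-neutral polarity, the sentence with the strongest score.
    """
    def label(score):
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    counts = Counter(label(score) for _, _, score in opinions)
    total = sum(counts.values()) or 1
    shares = {k: round(counts[k] / total, 2)
              for k in ("positive", "negative", "neutral")}

    strongest = {}
    for doc_id, sentence, score in opinions:
        key = label(score)
        if key == "neutral":
            continue
        if key not in strongest or abs(score) > abs(strongest[key][2]):
            strongest[key] = (doc_id, sentence, score)
    return shares, strongest

# Made-up opinions from three different sources.
opinions = [
    ("blog-1", "The camera is great.", 1),
    ("blog-1", "It ships in a box.", 0),
    ("shop-7", "I love the screen, it is great.", 2),
    ("tweet-3", "The battery is bad.", -1),
]
shares, strongest = summarize(opinions)
print(shares)
print(strongest)
```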
The developer can use the API we provided in his applications.
The developer has to register or login to use the web service.
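The register-then-use flow for developers can be sketched as an API key check. Everything here is hypothetical: the class name, the key format, the status codes and the response fields are illustrative and are not taken from the project's REST API.

```python
import secrets

class OpinionService:
    """Toy stand-in for the web service: developers register, then query."""

    def __init__(self):
        self._keys = {}  # api_key -> developer name

    def register(self, developer):
        """Issue a random API key for a developer."""
        key = secrets.token_hex(16)
        self._keys[key] = developer
        return key

    def query(self, api_key, keyword):
        """Return an HTTP-like (status, body) pair for a keyword request."""
        if api_key not in self._keys:
            return 401, {"error": "unknown API key"}
        if not keyword.strip():
            return 400, {"error": "keyword is required"}
        # Placeholder body; a real service would run the mining pipeline here.
        return 200, {"keyword": keyword, "developer": self._keys[api_key]}

service = OpinionService()
key = service.register("demo-developer")
print(service.query(key, "iphone"))
print(service.query("not-a-key", "iphone"))
```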
Extraction.
Summarization.
Tracking.
Tracking Phase
The opinion tracking phase aims to tell how people change their opinions as time goes by.
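Tracking can be illustrated by averaging polarity scores per time bucket. The monthly bucket and the `(date, score)` input format are assumptions for this sketch; the averages it returns are the kind of series a timeline chart would plot.

```python
from collections import defaultdict
from datetime import date

def track(opinions):
    """Average the polarity scores per calendar month.

    opinions: iterable of (date, score) pairs.
    Returns a list of (year, month, average score) sorted by time.
    """
    buckets = defaultdict(list)
    for day, score in opinions:
        buckets[(day.year, day.month)].append(score)
    return [(y, m, sum(v) / len(v)) for (y, m), v in sorted(buckets.items())]

# Made-up scored opinions spread over three months.
opinions = [
    (date(2014, 1, 5), 1), (date(2014, 1, 20), -1),
    (date(2014, 2, 3), 1), (date(2014, 2, 14), 1),
    (date(2014, 3, 1), -1),
]
for year, month, avg in track(opinions):
    print(f"{year}-{month:02d}: {avg:+.2f}")
```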
System Components
Survey Phase
Similar Projects
Comparison to extract the Motivation
System Analysis
System Architecture
Functional Requirements
Use Case
Non-Functional Requirements
System Design
System Components
Pre Processing
Data Store
Algorithm

Website:
HTML5
CSS
JavaScript
jQuery
Crawler4j
Crawler4j is an open source Java crawler which provides a simple interface for crawling the Web.
Linux OS
App. Server GlassFish 3
Java EE
NetBeans IDE
APIs:
Apache OpenNLP
Jsoup HTML Parser
Twitter4J
SolrJ-lib
JAWS API
EclipseLink
Jersey
JFreeChart
The Project is divided into two phases:
Phase ONE (Define Basics)
Objective: analyze the project to understand it more and to implement the initial prototype.
Phase TWO (Update Basics)
Objective: Modify the system built in the previous phase and enhance its accuracy and performance.
Project Status
Phases
Objectives
Thank You :)
Questions?
Tables:
Objective:
Analyzing public reviews to determine their polarity regarding a specific subject matter.
Split Document..
into Sentences..
into Words...
Database
Demos
Crawling Data
Web Data Mining Algorithm
Website
It's a technique used to crawl through various web resources to collect required information, which enables an individual or a company to promote its business, understand market dynamics, etc.
Opinion mining can be useful in several ways.
Why Public Opinion Mining?
Every decision we make is based on information. The more information there is to collect, the more complex the analysis needed to reach the correct decision. Critical decisions require time and mental effort to collect information and analyze it.

Life = Risks

Risk = Make Decision

Make Decision = Collect & Analyze (data & information)
Sentiment 140
It provides the first fully-automated Arabic social media analytics services for brands, entities, companies and individuals in the Arab region
Account
Topic
Data
SocialMedia
APIKeys
NLPModels
ExtractedOpinion
Opinion
Favorite
Feedback
History
Dictionary
Entity Relationship cont.
Search Engine - Solr
Component
Pre-Processing
Component
Extracting Reviews
Extracting reviews from web pages is done using HTML parsers which reconstruct the DOM tree of a page and allow retrieving its contents.
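A rough sketch of review extraction using only Python's standard `html.parser`. It is an event-based stand-in rather than a full DOM tree (the project lists the Jsoup parser for this step), and the `review` class name and the sample markup are assumptions; real sites use their own markup.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class ReviewExtractor(HTMLParser):
    """Collect the text of every element whose class list contains 'review'."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # nesting depth inside the current review element
        self._chunks = []  # text pieces of the current review

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> do not change the depth

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "review" in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the pieces and collapse runs of whitespace.
            self.reviews.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

html = """
<html><body>
  <div class="review">Great phone, <b>amazing</b> screen.</div>
  <div class="ad">Buy now!</div>
  <p class="review user">Battery is poor.</p>
</body></html>
"""
parser = ReviewExtractor()
parser.feed(html)
print(parser.reviews)
```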
Software: Apache Nutch2
Software: Solr
Data Store
Component
Data Store
This is the part that is responsible for the whole data manipulation process.
We have two categories of database: one of them is created by us,
while the second one is included in the Search Engine.
This database is called "HBase".
What is HBase?
Apache HBase is an open source, distributed, versioned, column-oriented
store that provides Bigtable-like capabilities on top of Hadoop and HDFS.
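To make "versioned, column-oriented" concrete, here is a toy in-memory model of that data shape. It is not the HBase client API; the class, the row key and the column name are hypothetical and only illustrate addressing a value by row, column and timestamp.

```python
import time

class ToyColumnStore:
    """In-memory illustration of a versioned, column-oriented layout.

    Data is addressed as row key -> "family:qualifier" -> timestamp -> value.
    """

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value, timestamp=None):
        """Store a value; each timestamp keeps its own version."""
        ts = timestamp if timestamp is not None else time.time_ns()
        self._rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self._rows.get(row, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]

store = ToyColumnStore()
store.put("page-1", "f:title", "Old title", timestamp=1)
store.put("page-1", "f:title", "New title", timestamp=2)
print(store.get("page-1", "f:title"))
print(store.get("page-1", "f:title", versions=2))
```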
Why do we use HBase?
We used HBase because it is the most stable DBMS to use with Nutch 2.
MySQL: there is a bug in the Nutch 2 implementation that prevents us from using MySQL.
Web Crawlers:
Crawler4j
Apache Nutch 2
Search Engine:
Solr
Database Engine:
MySQL
HBase
TOOLS

Where do we get opinions & reviews from?
Software: Jsoup Parser
Database Engine: MySQL
Database Engine: HBase
System Implementation
Software & Tools
Demos
Example
Apache Nutch 2
Scalability: it does not save data into the database!
Advantages:
Disadvantages:
Nutch is a project of the Apache Software Foundation and is part of the larger Apache Community of developers and users.
Search Engine (Solr)
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty.

Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
In our system, we use Solr version 4.2. It took us time and hard work to deploy it on GlassFish 3, which is an application server.
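Because Solr is queried over plain HTTP, a client in any language only needs to build a URL. The sketch below builds a request for Solr's `/select` handler with the standard `q`, `rows` and `wt` parameters. The host, port, core name `reviews` and field name `content` are assumptions, and the keyword is not escaped for query syntax.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, keyword, rows=10):
    """Build a query URL for Solr's /select handler returning JSON."""
    params = {"q": f"content:{keyword}", "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

# Hypothetical deployment address and core name.
print(solr_select_url("http://localhost:8080", "reviews", "iphone"))
```

Sending this URL with any HTTP client would return the matching documents as JSON, provided such a core and field exist.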
Solr Features
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
Conclusion & Future Development
Achieved Work
Possible Extensions
Achieved Work
The project objectives were achieved successfully; we implemented all the basic main functions.
We can summarize the work done in the following steps:
The system can crawl the data from both (Social Network & Trusted Websites).
The system is able to track the change of opinions.
A web service "REST API" that enables developers to easily use our services.
System is implemented as a web service.
Algorithm automated optimization.
Possible Extensions
There is room for improvement. We believe that opinion mining is a very promising field: as mentioned in the ACM research "Sociability in the Web", more than 60% of Internet users now use social networks DAILY.
Considered Extensions:
Improve the performance of the Web Opinion Mining Algorithm to be faster and more accurate.
Apply a set of rules and restrictions to detect spam and fake reviews.
Enhance the search process by allowing the user to customize it.
The ability to create a history of previously mined opinions.
A mobile application will be created to crawl opinions and reviews on the go.
Create a Chrome extension so that the service can be used easily from Google Chrome.
System Testing
Testing Approaches
Testing Result
Testing Approaches
We applied a set of test approaches and cases to the web crawler and the other system components to compare the actual results with the expected ones.
The Applied Approaches are:
Unit Testing
Integration Testing
System Testing
Testing Results
Unit Testing:
We conducted test cases for each implemented module using sample valid and invalid test data.

Integration Testing:
During integration testing, we faced many problems related to interfaces between modules and logical errors. The problems were fixed before integrating additional modules.

System Testing (Validation Test):
Finally we conducted a set of test cases to check that the system meets its requirements as stated in the system specification. Validation testing was conducted using sample real data with the following specifications:

.
.
.
.
Tables:
Webpage
Summarization Example
Critical decisions are always required in the daily life of individuals and organizations. Making such decisions may lead to a crisis or to a great improvement for these organizations or individuals; what determines whether such decisions turn out right or wrong is ONLY one thing, and this thing is called INFORMATION.
Information has turned out to be one of the most important weapons in the world. For example, getting information from someone's account on Facebook reveals more and more about him, such as his name, age, interests, hobbies, the location he lives in, photos & videos of him, even his current mood.
Using this information, other individuals can violate his privacy and coerce him into doing things for them.
Summarize into
Critical decisions are always required in the daily life of individuals and organizations.
Information has turned out to be one of the most important weapons in the world.
Using this information, other individuals can violate his privacy and coerce him into doing things for them.
Tracking Example
.