
Public Opinion Mining: As our graduation project, my teammates and I created a public opinion mining system and web service that provides users with sentiment analysis and summarization of opinions about a given topic.

Amr Mohsen

on 17 March 2014


Transcript of POM

Fields such as:
Decision Support System
Text Mining
Opinion Mining
reference: britannica.com
Project Status
Survey Phase
System Analysis
System Design
System Implementation
System Testing
Conclusion & Future Development
Community (Social Network):
Friends and relatives.
Consumer reports.
Websites such as:
Blogs (Google Blogs).
E-commerce sites (Amazon, eBay).
Review sites (CNET).
Web Data Mining
Also called knowledge discovery, it is the process of discovering interesting and useful patterns and relationships in large volumes of data from the public web.
Applications of web data mining such as:
Social Network analysis
A web crawler, also known as a spider or robot, is a program that automatically downloads web pages.
Collecting Data
We will use a focused crawler, limited to specific domains of our choice.
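A focused crawler can be sketched in Java, the language the project is built in. This is a minimal, hypothetical sketch rather than Crawler4j's or Nutch's API: the class name `FocusedCrawler`, the domain allow-list and the `fetchLinks` function (standing in for downloading a page and extracting its links) are all assumptions. It shows the three parts that make a crawler focused: a breadth-first frontier, a set of already-seen URLs, and a filter that only follows links inside the chosen domains.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical breadth-first focused crawler restricted to an allow-list of domains. */
final class FocusedCrawler {
    private final Set<String> allowedHosts;                  // domains of our choice (assumption)
    private final Function<String, List<String>> fetchLinks; // downloads a page, returns its outgoing links
    private final int maxPages;                              // stop after this many pages

    FocusedCrawler(Set<String> allowedHosts, Function<String, List<String>> fetchLinks, int maxPages) {
        this.allowedHosts = allowedHosts;
        this.fetchLinks = fetchLinks;
        this.maxPages = maxPages;
    }

    /** True if the URL's host is an allowed domain or one of its subdomains. */
    boolean inScope(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (host == null) {
            return false;
        }
        for (String allowed : allowedHosts) {
            if (host.equals(allowed) || host.endsWith("." + allowed)) {
                return true;
            }
        }
        return false;
    }

    /** Crawls breadth-first from the seed URLs and returns the pages visited, in visit order. */
    List<String> crawl(List<String> seeds) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        for (String seed : seeds) {
            if (inScope(seed) && seen.add(seed)) {
                frontier.add(seed);
            }
        }
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Focused: enqueue only unseen links that stay inside the chosen domains.
                if (inScope(link) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

For example, seeding it with a CNET page and the allow-list `cnet.com` follows links to `www.cnet.com` and `reviews.cnet.com` but never enqueues a Facebook link. In practice `fetchLinks` would wrap an HTTP client and an HTML parser.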
What type will we use?
Indexing Data
What is a web crawler?
Similar Projects
Created by three Computer Science students at Stanford University as their graduation project.
Functional Requirements (use case):
Non-Functional Requirements:
Use Case:
Entity Relationship
Provide APIs for classifying tweets.
Unlimited Crawler Depth
Resuming Crawling
Easy to integrate within an application
Configuration is not complicated
Public Opinion Mining
4106/4206 Graduation Project
Final Presentation
HICIT Department of Computer Science
Supervisor: Professor Ahmed Al-Abassy
Amal Ibrahim, Teaching Assistant
The user enters a keyword for the topic he wants opinions about.
The system searches the trusted websites embedded in it.
The system searches social networks (Facebook, Google Plus, and Twitter).
The system displays a report and a timeline chart of the final result to the user in a web page.
The user is able to save, print, or send the results.
Availability and Portability
Delivery Requirement
Monitors positive and negative sentiment in Twitter conversations about topics such as movies, musicians, TV shows, and popular brands
Live Tweet analysis
Skips some output results
25 Trends
Supports results with charts and detailed information
The algorithm is designed to support Arabic, which represents only 3% of total web content.
The service is paid.
Team members
Ahmed Mahmoud
Amr Mohsen
Ashraf Hesham
Mohammed Al-Adley
Mahmoud Habib
What makes our project different?
Works on social networks, blogs, news sites, product review sites, etc.
The ability to track changes in opinion.
A web service (REST API) that enables developers to easily use our services.
The system is implemented as a web service.
Automated optimization of the algorithm.
[System Architecture diagram: web pages 1 to 4, Web Crawler, Cloud Service, web server, Final Result, Statistics timeline]
Types of web crawler:
Universal Crawler
Focused Crawler
Topical Crawler
After the web pages are collected, they need to be indexed by a document indexer (search engine) so that they can be searched by tags or keywords.
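Solr builds an inverted index (through Lucene), which maps every term to the documents that contain it. The toy in-memory version below is a hypothetical sketch, not Solr's API; the class `InvertedIndex` and its tokenization rule (split on anything that is not a letter or digit, so non-Latin scripts are kept) are assumptions chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical inverted index: each term maps to the sorted IDs of the documents containing it. */
final class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes a document: lowercases it, splits it into terms and records docId under each term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** AND query: returns the IDs of documents that contain every keyword. */
    Set<Integer> search(String... keywords) {
        Set<Integer> result = null;
        for (String keyword : keywords) {
            Set<Integer> docs = postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index itself is never modified
            } else {
                result.retainAll(docs);       // intersect posting sets
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

A keyword query then becomes an intersection of posting sets: `search("battery", "great")` returns the documents that mention both words without scanning every page.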
Pre-Processing Steps:
Extraction Phase:
Collecting Data
Indexing Data (Solr)
Extracting Reviews
Project Background
What is Web Data Mining?
Pre-Processing
Before the algorithm is applied, the data needs to be collected and cleaned so that the algorithm can process it. This is called pre-processing, and it is the first step of data mining.
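A minimal cleaning step might look like the following. The specific rules (lowercasing, removing links and `@user` mentions, replacing other symbols with spaces while keeping `. ! ?` for later sentence splitting) and the class name `TextCleaner` are assumptions for illustration, not the project's documented pre-processing.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical cleaning step for raw review or tweet text (the rules below are assumptions). */
final class TextCleaner {
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    // Anything that is not a letter (any script), a digit, sentence punctuation, an apostrophe or whitespace.
    private static final Pattern NON_TEXT = Pattern.compile("[^\\p{L}\\p{Nd}.!?'\\s]");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    private TextCleaner() {}

    static String clean(String raw) {
        String s = raw.toLowerCase(Locale.ROOT);
        s = URL.matcher(s).replaceAll(" ");      // drop links
        s = MENTION.matcher(s).replaceAll(" ");  // drop @user mentions
        s = NON_TEXT.matcher(s).replaceAll(" "); // drop symbols such as # and *, keep . ! ? for splitting
        return SPACES.matcher(s).replaceAll(" ").trim(); // collapse runs of whitespace
    }
}
```

Using `\p{L}` rather than `[a-z]` keeps letters from non-Latin scripts such as Arabic.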
System Architecture
Opinion Mining (Sentiment Analysis)
It aims to analyze and track the mood of the public about a particular product. Opinion mining, which is also called sentiment analysis, involves building a system to collect and examine opinions about the product made in:
blog posts
Reviews or tweets
Comparison between similar projects
Provides only an overview of the result
Public Opinion Mining
Which car should I buy?
Which school should I apply to?
Which professor should I work for?
Whom should I vote for?
Opinion Mining Algorithm
The algorithm is divided into three phases:
Mining Opinions from the Web: Beyond Relevance Retrieval
Extraction Phase
It aims to extract opinion evidence from words, sentences, and documents, and then identify their polarities.
Word Level
Sentence Level
Document Level
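The three extraction levels can be illustrated with a simple lexicon-based scorer. This is a generic technique swapped in for illustration, not the algorithm from "Mining Opinions from the Web: Beyond Relevance Retrieval"; the tiny `LEXICON`, the negation words and the majority-vote rule for documents are all assumptions. Word polarity comes from the lexicon, a sentence sums its word polarities (a negation word flips the next word), and a document takes the sign of its sentences' votes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical lexicon-based polarity at word, sentence and document level. */
final class PolarityScorer {
    // Tiny illustrative lexicon (assumption); a real system needs a far larger one.
    private static final Map<String, Integer> LEXICON = Map.of(
            "good", 1, "great", 1, "love", 1, "excellent", 1,
            "bad", -1, "terrible", -1, "hate", -1, "poor", -1);
    private static final Set<String> NEGATIONS = Set.of("not", "no", "never");

    private PolarityScorer() {}

    /** Word level: +1, -1 or 0 according to the lexicon. */
    static int wordPolarity(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }

    /** Sentence level: sum of word polarities; a negation word flips the polarity of the next word. */
    static int sentenceScore(String sentence) {
        int score = 0;
        boolean negate = false;
        for (String word : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (NEGATIONS.contains(word)) {
                negate = true;
                continue;
            }
            int p = wordPolarity(word);
            score += negate ? -p : p;
            negate = false; // negation only affects the word right after it
        }
        return score;
    }

    /** Document level: each sentence votes with the sign of its score; returns the sign of the total. */
    static int documentPolarity(String document) {
        int votes = 0;
        for (String sentence : document.split("(?<=[.!?])\\s+")) {
            votes += Integer.signum(sentenceScore(sentence));
        }
        return Integer.signum(votes);
    }
}
```

For instance, "The camera is not good." scores -1 at sentence level because "not" flips "good".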
Summarization Phase
It aims to produce a cross-document opinion summary.
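One simple form of cross-document summary counts how many documents fall on each side. The `OpinionSummary` class below is a hypothetical, count-based sketch under that assumption; it takes per-document polarities (+1, -1 or 0), such as those produced by an extraction step, and reports the share of positive opinions among the non-neutral ones.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical count-based cross-document opinion summary. */
final class OpinionSummary {
    final int positive;
    final int negative;
    final int neutral;

    private OpinionSummary(int positive, int negative, int neutral) {
        this.positive = positive;
        this.negative = negative;
        this.neutral = neutral;
    }

    /** Builds a summary from per-document polarities (+1, -1 or 0). */
    static OpinionSummary of(List<Integer> documentPolarities) {
        int pos = 0, neg = 0, neu = 0;
        for (int p : documentPolarities) {
            if (p > 0) {
                pos++;
            } else if (p < 0) {
                neg++;
            } else {
                neu++;
            }
        }
        return new OpinionSummary(pos, neg, neu);
    }

    /** Percentage of positive documents among the opinionated (non-neutral) ones; 0 if there are none. */
    double positivePercent() {
        int opinionated = positive + negative;
        return opinionated == 0 ? 0.0 : 100.0 * positive / opinionated;
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "positive=%d, negative=%d, neutral=%d (%.1f%% positive)",
                positive, negative, neutral, positivePercent());
    }
}
```

For five documents with polarities `1, 1, -1, 0, 1`, it reports three positive, one negative and one neutral document, i.e. 75% positive among the opinionated ones.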
Developers can use the API we provide in their own applications.
Developers have to register or log in to use the web service.
Tracking Phase
The opinion tracking phase aims to tell how people change their opinions as time goes by.
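Tracking can be sketched as grouping opinions by publication date and averaging their polarity per day, which is the kind of series a timeline chart (for example one drawn with JFreeChart) would plot. The `OpinionTracker` class, the per-day granularity and the first-to-last-day `change` measure are assumptions for illustration.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical opinion tracking: mean polarity per day, in date order. */
final class OpinionTracker {
    /** A polarity (+1, -1 or 0) of one opinion published on a given date. */
    static final class Observation {
        final LocalDate date;
        final int polarity;

        Observation(LocalDate date, int polarity) {
            this.date = date;
            this.polarity = polarity;
        }
    }

    private OpinionTracker() {}

    /** Groups observations by day and returns each day's mean polarity, sorted by date. */
    static SortedMap<LocalDate, Double> dailyAverage(List<Observation> observations) {
        SortedMap<LocalDate, int[]> acc = new TreeMap<>(); // day -> {sum, count}
        for (Observation o : observations) {
            int[] sumCount = acc.computeIfAbsent(o.date, d -> new int[2]);
            sumCount[0] += o.polarity;
            sumCount[1]++;
        }
        SortedMap<LocalDate, Double> result = new TreeMap<>();
        for (Map.Entry<LocalDate, int[]> e : acc.entrySet()) {
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    /** How much the mean polarity moved from the first to the last tracked day (0 if fewer than two days). */
    static double change(SortedMap<LocalDate, Double> daily) {
        if (daily.size() < 2) {
            return 0.0;
        }
        return daily.get(daily.lastKey()) - daily.get(daily.firstKey());
    }
}
```

A negative `change` means opinion turned more negative over the tracked period.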
System Components
Survey Phase
Similar Projects
Comparison to extract the Motivation
System Analysis
System Architecture
Functional Requirements
Use Case
Non-Functional Requirements
System Design
System Components
Pre-Processing
Data Store

Crawler4j
Crawler4j is an open source Java crawler which provides a simple interface for crawling the Web.
Linux OS
Application server: GlassFish 3
Java EE
NetBeans IDE
Apache OpenNLP
Jsoup HTML Parser
EclipseLink
JFreeChart
The Project is divided into two phases:
Phase ONE (Define Basics)
Objective: analyze the project to understand it better and implement the initial prototype.
Phase TWO (Update Basics)
Objective: modify the system built in the previous phase and enhance its accuracy and performance.
Project Status
Thank You :)
Analyzing public reviews to determine their polarity regarding a specific subject matter.
Split the document into sentences, and each sentence into words.
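Splitting a document into sentences and then into words can be done with the JDK's `java.text.BreakIterator`, used here as a stand-in; the project lists Apache OpenNLP, whose sentence detector and tokenizer would play the same role but need trained models. The `DocumentSplitter` class is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical splitter: document into sentences, sentence into words, via BreakIterator. */
final class DocumentSplitter {
    private DocumentSplitter() {}

    /** Splits a document into trimmed, non-empty sentences. */
    static List<String> sentences(String document, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), document);
    }

    /** Splits a sentence into words, dropping whitespace and punctuation segments. */
    static List<String> words(String sentence, Locale locale) {
        List<String> words = new ArrayList<>();
        for (String token : segments(BreakIterator.getWordInstance(locale), sentence)) {
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                words.add(token);
            }
        }
        return words;
    }

    /** Returns the trimmed text between consecutive boundaries, skipping blank segments. */
    private static List<String> segments(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                out.add(segment);
            }
        }
        return out;
    }
}
```

For example, "Great phone. Bad battery!" yields the sentences `Great phone.` and `Bad battery!`, and the second sentence yields the words `Bad` and `battery`.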
Crawling Data
Web Data Mining Algorithm
It is a technique used to crawl through various web resources and collect the required information, which enables an individual or a company to promote a business, understand market dynamics, and so on.
Opinion mining can be useful in several ways.
Why Public Opinion Mining?
All of our decisions are based on information: the more information is collected, the more complex it becomes to analyze it and reach the correct decision. Critical decisions require time and mental effort to collect and analyze information.

Life = Risks

Risk = Make Decision

Make Decision = Collect & Analyze (data & information)
Sentiment140
It provides the first fully-automated Arabic social media analytics services for brands, entities, companies and individuals in the Arab region
Entity Relationship cont.
Search Engine - Solr
Extracting Reviews
Extracting reviews from web pages is done using HTML parsers, which reconstruct the DOM tree of a page and allow its contents to be retrieved.
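The DOM-based approach can be sketched with the JDK's XML parser and XPath. This is only an illustration under a strong assumption: the page must be well-formed XHTML, whereas real-world HTML usually needs a lenient parser such as Jsoup, which the project uses. The `ReviewExtractor` class and the convention that reviews sit in elements with `class="review"` are hypothetical; each real site needs its own selector.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical review extraction: build a DOM tree, then select review elements by XPath. */
final class ReviewExtractor {
    private ReviewExtractor() {}

    /** Returns the trimmed text of every element whose class attribute is exactly "review". */
    static List<String> extract(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Crawled pages are untrusted input: refuse DOCTYPE declarations (blocks XXE attacks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//*[@class='review']", dom, XPathConstants.NODESET);
            List<String> reviews = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                reviews.add(nodes.item(i).getTextContent().trim());
            }
            return reviews;
        } catch (Exception e) { // parser configuration, SAX, I/O or XPath errors
            throw new IllegalArgumentException("Could not parse page as XHTML", e);
        }
    }
}
```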
Software: Apache Nutch 2
Software: Solr
Data Store
This is the part responsible for the whole data manipulation process.
We have two categories of database: one is created by us, and the other is included in the search engine.
This second database is called "HBase".
What is HBase?
Apache HBase is an open-source, distributed, versioned, column-oriented store that provides Bigtable-like capabilities on top of Hadoop and HDFS.
Why did we use HBase?
We used HBase because it is the most stable DBMS to use with Nutch 2.
MySQL: a bug in the Nutch 2 implementation prevented us from using MySQL.
Web Crawlers:
Crawler4j
Apache Nutch 2
Search Engine:
Database Engine:

Where Do We Get Opinions & Reviews?
Software: Jsoup Parser
Database Engine: MySQL
Database Engine: HBase
System Implementation
Software & Tools
Apache Nutch 2
Scalability: it does not save data into the database!
Nutch is a project of the Apache Software Foundation and is part of the larger Apache Community of developers and users.
Search Engine (Solr)
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty.

Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
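Because Solr exposes search over HTTP, any Java client can query it. The sketch below builds a request to Solr's `/select` handler with the standard `q`, `wt` and `rows` parameters and sends it with `java.net.http.HttpClient` (Java 11 or later). The core URL, including host, port and the core name `collection1`, is an assumption about the deployment, and `SolrSearch` is a hypothetical helper.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper for Solr's HTTP select handler. */
final class SolrSearch {
    private SolrSearch() {}

    /**
     * Builds a select URL asking Solr for up to `rows` results as JSON.
     * coreUrl is an assumed deployment address, e.g. "http://localhost:8080/solr/collection1".
     */
    static String selectUrl(String coreUrl, String keyword, int rows) {
        return coreUrl + "/select?q=" + URLEncoder.encode(keyword, StandardCharsets.UTF_8)
                + "&wt=json&rows=" + rows;
    }

    /** Sends the request and returns the raw JSON body (needs a running Solr instance). */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Only `fetch` touches the network, so URL construction can be checked without a running server.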
In our system, we use Solr version 4.2; deploying it on the GlassFish 3 application server took considerable time and hard work.
Solr Features
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
Conclusion & Future Development
Achieved Work
Possible Extensions
Achieved Work
The project objectives were achieved successfully; we implemented all the main basic functions.
We can summarize the work done in the following steps:
The system can crawl data from both social networks and trusted websites.
The system is able to track changes in opinion.
A web service (REST API) that enables developers to easily use our services.
System is implemented as a web service.
Algorithm automated optimization.
Possible Extensions
There is room for improvement, and we believe that opinion mining is a very promising field: as mentioned in the ACM research "Sociability in the Web", more than 60% of Internet users now use social networks daily.
Considered Extensions:
Improve the performance of the Web Opinion Mining Algorithm to be faster and more accurate.
Apply a set of rules and restrictions to detect spam and fake reviews.
Enhance the search process by allowing the user to customize it.
Create a history of previously mined opinions.
Create a mobile application to crawl opinions and reviews on the go.
Create a Google Chrome extension so the system is easy to use from the browser.
System Testing
Testing Approaches
Testing Result
Testing Approaches
We applied a set of testing approaches and test cases to the web crawler and the system components to compare the actual results with the expected ones.
The applied approaches are:
Unit Testing
Integration Testing
System Testing
Testing Results
Unit Testing:
We conducted test cases for each implemented module using sample valid and invalid test data.

Integration Testing:
During integration testing, we faced many problems related to interfaces between modules and logical errors. The problems were fixed before integrating additional modules.

System Testing (Validation Test):
Finally, we conducted a set of test cases to check that the system meets its requirements as stated in the system specification. Validation testing was conducted using real sample data with the following specifications:

Summarization Example
Critical decisions are always required in the daily life of individuals and organizations. Making such decisions may lead to a crisis or to a great positive impact on these organizations or individuals; what determines whether such decisions are right or wrong is ONLY one thing, and this thing is called INFORMATION.
Information has turned out to be one of the most important weapons in the world. For example, getting information from someone's Facebook account reveals more and more about him, such as his name, age, interests, hobbies, the location he lives in, photos and videos of him, and even his current mood.
Other individuals can use this information to violate his privacy and force him to do things for them.
Summarize into
Critical decisions are always required in the daily life of individuals and organizations.
Information has turned out to be one of the most important weapons in the world.
Other individuals can use this information to violate his privacy and force him to do things for them.
Tracking Example