Twitter data mining project

by Rasha Alamani

Naif Ashy

on 4 December 2012


Transcript of Twitter data mining project

Prepared by :
Amal Alsaffar
Rasha Alyamani
Naif Ashy

November 30, 2012

Twitter Account Coding
Data Mining Project

References

Asur, S., & Huberman, B. A. (n.d.). Predicting the Future With Social Media. Retrieved from http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf

Feldman, G. F. (2010, July 12). Social Networks Suggested Readings & Twitter Datasets. Retrieved October 25, 2012, from Management & Social Psychology: http://mgto.org/social-networks-suggested-readings-twitter-dataset/

Kornwitz, J. (2012, September 18). Data mining in the social-media ecosystem. PHYS.org. Retrieved from http://phys.org/news/2012-09-social-media-ecosystem.html

Kutools for Excel. (n.d.). (Detong Technology Ltd) Retrieved November 17, 2012, from Extend office: http://www.extendoffice.com/product/kutools-for-excel/excel-convert-text-to-date.html

Nejat, M. H., Aghazarian, V., & Hedayati, A. R. (2012, August 11-12). Comparative study of the performance of ensemble and base classifiers in text data categorization. 5. Retrieved from http://psrcentre.org/images/extraimages/812085.pdf

Pentaho Data Integration. (n.d.). (Pentaho Corporation) Retrieved November 17, 2012, from Pentaho: http://www.pentaho.com/explore/pentaho-data-integration/

Pick, T. (2012). 72 Fascinating Social Media Marketing Facts and Statistics for 2012. JeffBullas.com. Retrieved from http://www.jeffbullas.com/2012/07/24/72-fascinating-social-media-marketing-facts-and-statistics-for-2012/#3ALcy6IJ3S163ZhM.99

Quinlan, R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.

Volkova, S. (n.d.). Link prediction in social networks. (1), 26. Retrieved from http://svolkova.weebly.com/uploads/1/6/7/1/1671882/cis_830_final_project_-_link_prediction_in_social_networks_-_part_2.pdf

Data pre-processing in Excel
We had trouble opening the Twitter data file in WEKA, so we tried several solutions:
Using the Date Format Converter (Kutools for Excel) to organize the dates in the inTwitterSince attribute.
Using the Pentaho application to convert the Excel file to CSV (Pentaho Data Integration).

Unfortunately, all of these solutions failed, which left us confused.

The problem was solved by the following steps:
Using Excel's find and replace function, replace \ ,= signs, @ and % with a space.
Enter this formula :
Attributes such as Longitude, Latitude, profilePicURL, and location were removed from the data set, which left it with 9 attributes.
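The Excel clean-up described above can also be scripted. The following is a minimal sketch in Python, assuming the data has been exported to CSV. The function names, the file paths passed to `clean_csv`, the exact set of replaced characters (backslash, `=`, `@`, `%`), and the spelling of the column headers are assumptions made for illustration; they are not taken from the project files.

```python
import csv
import re

# Characters replaced with a space.
# Assumption: backslash, '=', '@' and '%', following the slide text.
PROBLEM_CHARS = re.compile(r"[\\=@%]")

# Attributes listed as removed (header spelling is an assumption).
DROP_COLUMNS = {"Longitude", "Latitude", "profilePicURL", "location"}


def clean_value(text):
    """Replace each problem character with a single space."""
    if text is None:  # short rows give None for missing cells
        return ""
    return PROBLEM_CHARS.sub(" ", text)


def clean_rows(rows, drop=DROP_COLUMNS):
    """Yield copies of dict rows with dropped columns removed and values cleaned."""
    for row in rows:
        yield {
            key: clean_value(value)
            for key, value in row.items()
            if key is not None and key not in drop
        }


def clean_csv(src_path, dst_path):
    """Read src_path, clean every row, and write the result to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name not in DROP_COLUMNS]
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=kept)
            writer.writeheader()
            writer.writerows(clean_rows(reader))
```

Calling `clean_csv("twitter.csv", "twitter_clean.csv")` (both names hypothetical) would produce a file without the four dropped columns and without the characters that were replaced by hand in Excel.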
Outline
Introduction and Research
Data pre-processing
In Excel
In Weka
User Role Analysis
Account Type Analysis
List of References
Questions
Introduction
Twitter is one of the fully open social networks: a researcher can see whom a user follows, how many followers each user has, user-to-user interactions, and so on.

Dataset description
The dataset was provided by the course instructor, Dr. Anatoliy Gruzd.
It has 760 instances and 11 attributes.
Our focus is on categorizing Twitter users by user role and account type (Single/Group).

Project Question
The question underlying this project:
Which classifiers should be used to predict a Twitter user's role and account type?

Data pre-processing in WEKA
1- All missing values in the kloutscore and bio_desc attributes were eliminated using the ReplaceMissingValues filter.

2- Missing values in the Single/Group attribute were removed.

3- Missing values in the role attribute were removed.

4- Outliers were removed.

5- Data reduction was not needed because the dataset is small: it had 760 records, which dropped to 696 after the outliers were removed.

Methods
To reach higher percentages, we tried three different stages, using the following classifiers in WEKA:
Naïve Bayes Multinomial Text, a probability-based classification method.
Decision tree, using J48 to create the appropriate rules and decisions.
Meta classifiers, which combine multiple classifiers. Several methods have been proposed for combining classifiers; we used two:
Boosting, using AdaBoostM1, which builds a strong classifier from multiple weak classifiers.
Bagging, which averages the predictions of multiple classifiers.
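Of the classifiers listed above, multinomial Naïve Bayes is the easiest to show in a few lines. The following is a from-scratch sketch in Python, not WEKA's implementation: it counts words per class and scores a new text with add-one smoothing. The class name, the whitespace tokenizer, and the example bios are all made up for illustration and do not come from the project dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical bios, for illustration only.
train_texts = [
    "phd student in information science",
    "graduate student and coffee lover",
    "official account of the university library",
    "news and events from our department",
]
train_labels = ["Single", "Single", "Group", "Group"]

model = MultinomialNB().fit(train_texts, train_labels)
```

With this toy training set, `model.predict("student of information")` picks the class whose word counts make the text most probable. Bagging and boosting wrap a base learner like this one (or a decision tree) and combine the predictions of several trained copies.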
User Role Analysis
In each stage, four classifiers (J48, Naïve Bayes, Bagging, and AdaBoostM1) were applied.

First Stage: Procedures
Removing the other class attribute, which is the Single/Group attribute.

First Stage: Results
A bad stage with very low results.
The AdaBoostM1 classifier gave the highest percentage, which is 41.954%.
Changing parameters didn't affect the results.

Second Stage: Procedures
Applying the 4 classifiers on the class attribute (student, professor, librarian, manager, Assocciation, media, university, other, and researcher).
This stage was also very bad.

Third Stage: Procedures
In Weka:
Removing the group and single attribute.
Removing missing values in the role attribute, which are (N,2 Other attributes and business), by RemoveWithValues.
Changing IMH to IM by selecting IMH with the Edit button and choosing IM.

In Excel:

In Weka:

Each time, one of (student, professor, Assocciation, media, university, it, im, JOAT, and researcher) is used as the class attribute.

Discussion
•The 3rd stage is the best one because it answers the question of the user role analysis.
•Its results are not as low as the first stage's and not as high as the second stage's.
•They are between 73.5849% and 97.8437%.
•The 2nd stage showed the same result for the different classifiers.
•The 3rd stage showed the same results for JOAT and university, and those results are high.
•Changing the classifiers' parameters didn't affect the results.
•Changing the re-sample parameters gave better results.

Account Type Analysis
Procedure
Deleted the role attribute.
Set Single/Group as the class attribute.
Changed “bio_desc” to a string attribute.
Created subsets.
1820 attributes and 371 instances.
Post attribute selection.
Applied the selected algorithms.

Experiments
J48 Tree

The best classifiers
Bagging (REPTree)
AdaBoostM1 (J48)

The weakest classifier

Discussion
Meta algorithms were the most effective at predicting the account type of Twitter users from the information provided in their bio_desc.

Recommendations
Creating customer profiles for marketing using mining techniques, based on role class.
Making content and datasets available for further research on text mining.
Using demographics and behavioral pattern detection to explore user influence (KloutScore) and social behavior.
More demographic, political, and geographic information increased engagement with multimedia content.
By detecting user role, researchers and data miners can predict community formations and assign tags and advertisements.

Additional attributes such as friends and followers, retweets, and mentions allow for predicting interactions and correlations that could be useful for personalization and for recommending hashtags and ads.

Results
Best one: results between 97.8437% and 73.5849%.