Twitter data mining project
November 30, 2012
References
Asur, S., & Huberman, B. A. (n.d.). Predicting the Future With Social Media. Retrieved from http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf
Feldman, G. F. (2010, July 12). Social Networks Suggested Readings & Twitter Datasets. Management & Social Psychology. Retrieved October 25, 2012, from http://mgto.org/social-networks-suggested-readings-twitter-dataset/
Kornwitz, J. (2012, September 18). Data mining in the social-media ecosystem. PHYS.org. Retrieved from http://phys.org/news/2012-09-social-media-ecosystem.html
Kutools for Excel. (n.d.). Detong Technology Ltd. Retrieved November 17, 2012, from ExtendOffice: http://www.extendoffice.com/product/kutools-for-excel/excel-convert-text-to-date.html
Nejat, M. H., Aghazarian, V., & Hedayati, A. R. (2012, August 11-12). Comparative study of the performance of ensemble and base classifiers in text data categorization. 5. Retrieved from http://psrcentre.org/images/extraimages/812085.pdf
Pentaho Data Integration. (n.d.). Pentaho Corporation. Retrieved November 17, 2012, from http://www.pentaho.com/explore/pentaho-data-integration/
Pick, T. (2012). 72 Fascinating Social Media Marketing Facts and Statistics for 2012. JeffBullas.com. Retrieved from http://www.jeffbullas.com/2012/07/24/72-fascinating-social-media-marketing-facts-and-statistics-for-2012/#3ALcy6IJ3S163ZhM.99
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.
Volkova, S. (n.d.). Link prediction in social networks. (1), 26. Retrieved from http://svolkova.weebly.com/uploads/1/6/7/1/1671882/cis_830_final_project_-_link_prediction_in_social_networks_-_part_2.pdf
Weka Software
We had trouble opening the Twitter data file in WEKA, so we tried several solutions, such as:
- Using the Date Format Converter in Kutools for Excel to reformat the dates in the inTwitterSince attribute.
- Using Pentaho Data Integration to convert the Excel file to CSV.
Unfortunately, none of these solutions worked, which left us confused.
The problem was solved with the following steps in Excel:
1- Using Excel's Find and Replace function, replace the backslash (\), comma (,), equals sign (=), @ and % characters with a space.
2- Enter this formula, which replaces the line feeds (CHAR(10)) in cell F2 with spaces (CHAR(32)) and then strips the remaining non-printable characters:
=CLEAN(SUBSTITUTE(F2,CHAR(10),CHAR(32)))
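For readers who would rather script this cleanup than repeat it by hand in Excel, the two steps can be approximated in Python. This is a minimal sketch, not the procedure the team ran: the function name `clean_cell` is hypothetical, the character list assumes the find-and-replace step targeted \ , = @ and %, and the final filter mimics Excel's CLEAN, which removes the ASCII control characters 0 to 31.

```python
# Characters the Find and Replace step swaps for a space (assumed set).
REPLACED_CHARS = "\\,=@%"


def clean_cell(text: str) -> str:
    """Approximate Find and Replace plus =CLEAN(SUBSTITUTE(F2,CHAR(10),CHAR(32)))."""
    for ch in REPLACED_CHARS:
        text = text.replace(ch, " ")
    # SUBSTITUTE(..., CHAR(10), CHAR(32)): line feeds become spaces.
    text = text.replace(chr(10), chr(32))
    # CLEAN(...): drop the remaining control characters (codes 0-31).
    return "".join(c for c in text if ord(c) >= 32)


if __name__ == "__main__":
    print(clean_cell("Librarian @UofT\nlikes 100% data=fun"))
```

Applied to every cell of a text column before saving to CSV, this would keep stray delimiters and embedded line breaks out of the file WEKA has to parse.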
Unused attributes such as Longitude, Latitude, profilePicURL and location were removed from the dataset, which left 9 attributes.

Outline
- Introduction and Research
- User Role Analysis
- Account Type Analysis
- List of References
- Questions
Twitter is one of the fully open social networks: a researcher can see whom a user follows, how many followers each user has, and the interactions between users.
The dataset was provided by the course instructor, Dr. Anatoliy Gruzd.
This dataset has 760 instances and 11 attributes.
Our focus is on categorizing Twitter users by user role and account type (single/group).
The question underlying this project:
Which classifiers should be used to predict a Twitter user's role and account type?

Data pre-processing
1- Missing values in the kloutscore and bio_desc attributes were handled with the ReplaceMissingValues filter, which replaces missing values rather than deleting them.
2- Removed missing values in the single/group attribute.
3- Removed missing values in the role attribute.
4- Removed outliers.
5- Data reduction was not needed because the dataset is small (760 records). After the outliers were removed, 696 records remained.

Classifiers used in WEKA
Naïve Bayes Multinomial Text is a probability-based classification method that treats text as a bag of word counts.
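Naïve Bayes Multinomial Text picks the class with the highest posterior probability given the words in a text attribute. The following is a from-scratch sketch of multinomial Naïve Bayes with add-one (Laplace) smoothing, written only to illustrate the idea; it is not Weka's NaiveBayesMultinomialText code, and the function names and toy bios are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nb_train(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def nb_predict(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy bios, not taken from the course dataset.
    bios = ["official account of the library", "news and media updates",
            "phd student in information science", "student researcher data mining"]
    kinds = ["group", "group", "single", "single"]
    print(nb_predict(nb_train(bios, kinds), "student studying data"))
```

Working in log space avoids numeric underflow when a bio contains many words, and smoothing keeps a single unseen word from forcing a class probability to zero.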
A J48 decision tree was used to create the appropriate classification rules and decisions.
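J48 is Weka's implementation of C4.5 (Quinlan, 1993), which grows the tree by splitting on the attribute with the highest gain ratio: the information gain of a split divided by its split information. As an illustration only, the sketch below computes that criterion for one nominal attribute; pruning, numeric thresholds and missing-value handling, which C4.5 also performs, are left out, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio for splitting `labels` on a nominal attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected class entropy after the split, weighted by partition size.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    labels = ["single", "single", "group", "group"]
    print(gain_ratio(["y", "y", "n", "n"], labels))
    print(gain_ratio(["a", "b", "c", "d"], labels))
```

Both toy attributes separate the classes perfectly (information gain of 1 bit), but the four-valued one scores 0.5 against 1.0 for the two-valued one, which is how the gain ratio discourages splits that fragment the data.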
Meta classifiers combine multiple classifiers, and two combination methods were used. Boosting with AdaBoostM1 builds a strong classifier from multiple weak classifiers.
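AdaBoostM1 implements the AdaBoost.M1 boosting algorithm: each round fits a weak learner to reweighted training data, then shrinks the weights of the instances it classified correctly by β = ε / (1 - ε), where ε is the weighted error, and the final vote gives each learner a weight of log(1/β). The sketch below illustrates those steps with one-feature decision stumps as the weak learner; the stump learner, the function names and the toy data are assumptions for illustration, not Weka's code.

```python
import math
from collections import defaultdict


def fit_weighted_stump(xs, ys, weights):
    """Weighted stump on one numeric feature: split at a threshold and let
    each side predict its weighted-majority class."""
    best = None
    for threshold in sorted(set(xs)):
        left, right = defaultdict(float), defaultdict(float)
        for x, y, w in zip(xs, ys, weights):
            (left if x <= threshold else right)[y] += w
        left_label = max(left, key=left.get)
        right_label = max(right, key=right.get) if right else left_label
        error = sum(w for x, y, w in zip(xs, ys, weights)
                    if y != (left_label if x <= threshold else right_label))
        if best is None or error < best[0]:
            best = (error, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def adaboost_m1(xs, ys, rounds=10):
    """Boost stumps; each round focuses on instances earlier stumps got wrong."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (stump, vote weight)
    for _ in range(rounds):
        stump = fit_weighted_stump(xs, ys, weights)
        error = sum(w for x, y, w in zip(xs, ys, weights) if stump(x) != y)
        if error == 0 or error >= 0.5:
            # A perfect stump, or one no better than chance, ends boosting;
            # keep it only if the ensemble would otherwise be empty.
            if not ensemble:
                ensemble.append((stump, 1.0))
            break
        beta = error / (1 - error)
        ensemble.append((stump, math.log(1 / beta)))
        # Shrink the weights of correctly classified instances, then renormalize.
        weights = [w * beta if stump(x) == y else w for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boost_predict(ensemble, x):
    """Weighted vote: each stump adds its log(1/beta) to the class it predicts."""
    votes = defaultdict(float)
    for stump, vote_weight in ensemble:
        votes[stump(x)] += vote_weight
    return max(votes, key=votes.get)
```

On the toy data xs = [1, 2, 3, 4, 5, 6] with labels a, a, b, b, a, a, no single stump can isolate the middle b's, yet `adaboost_m1(xs, ys, rounds=3)` yields a three-stump vote that classifies all six points correctly.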
Bagging trains classifiers on bootstrap samples of the data and averages their predictions.

To obtain higher accuracy (percentage of correctly classified instances), we tried three different stages.
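Bagging trains each member on a bootstrap sample (n draws with replacement from the n training instances) and then combines the members' outputs. A minimal sketch with one-feature decision stumps follows; the stump learner, the function names, the seed and the toy data are hypothetical, and members are combined by majority vote, the usual choice when each member outputs only a class label, rather than by the averaging the slide mentions.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Stump on one numeric feature: try the midpoint between each pair of
    neighbouring distinct values and keep the split with the fewest errors."""
    majority = Counter(ys).most_common(1)[0][0]
    best_error = sum(y != majority for y in ys)
    best_rule = (float("inf"), majority, majority)  # fallback: always predict majority
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = Counter(y for x, y in zip(xs, ys) if x <= threshold).most_common(1)[0][0]
        right = Counter(y for x, y in zip(xs, ys) if x > threshold).most_common(1)[0][0]
        error = sum(y != (left if x <= threshold else right) for x, y in zip(xs, ys))
        if error < best_error:
            best_error, best_rule = error, (threshold, left, right)
    threshold, left, right = best_rule
    return lambda x: left if x <= threshold else right


def bagging(xs, ys, n_estimators=11, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bag_predict(models, x):
    """Combine the members by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]
```

Because each bootstrap sample leaves out roughly a third of the instances, the members differ slightly, and voting over them reduces the variance of an unstable base learner.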
In each stage, four classifiers were applied: J48, Naïve Bayes, Bagging and AdaBoostM1.

First Stage
- Procedure: removed the other class attribute (single/group).
- Results: this stage performed badly, with very low results. AdaBoostM1 gave the highest percentage, 41.954%, and changing the parameters did not affect the results.

Second Stage
- Procedure: applied the four classifiers to the class attribute (student, professor, librarian, manager, association, media, university, other and researcher).
- Results: this stage was also very bad.

Third Stage: Procedures (in Weka)
- Removed the group/single attribute.
- Removed the missing values in the role attribute (N, 2 other attributes and business) using the RemoveWithValues filter.
- Changed IMH to IM by selecting IMH through the Edit button and choosing IM.
- Used each role value (student, professor, association, media, university, it, im, JOAT and researcher) in turn as the class attribute, one value per run.

Discussion
•The 3rd stage is the best one because it answers the user role analysis question.
•Its results are not as low as the first stage's and not as high as the second stage's.
•They range from 73.5849% to 97.8437%.
•The 2nd stage showed the same result for every classifier applied.
•The 3rd stage showed the same results for JOAT and university, and those results are high.
•Changing the classifiers' parameters did not affect the results.
•Changing the resampling parameters gave better results.

Account Type Analysis: Procedures
- Deleted the role attribute.
- Set single/group as the class attribute.
- Changed bio_desc to a string attribute.
- Result: 1820 attributes and 371 instances.
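Changing bio_desc to a string attribute is what lets each distinct word in the bios become its own attribute, presumably through a filter such as Weka's StringToWordVector (the slides do not name the filter), which would explain how the attribute count reached 1820. The sketch below shows the idea with 0/1 word-presence features; the tokenizer regex, the function name `bag_of_words` and the sample bios are assumptions, and Weka's actual tokenization and defaults may differ.

```python
import re


def bag_of_words(docs):
    """Return (vocabulary, rows): one 0/1 presence feature per distinct lowercase word."""
    tokenized = [set(re.findall(r"[a-z0-9']+", doc.lower())) for doc in docs]
    vocabulary = sorted(set().union(*tokenized))
    rows = [[1 if word in words else 0 for word in vocabulary] for words in tokenized]
    return vocabulary, rows


if __name__ == "__main__":
    vocab, rows = bag_of_words(["PhD student", "Student union news!"])
    print(vocab)
    print(rows)
```

Every new word in any bio adds a column, so a few hundred short bios can easily produce well over a thousand attributes, which is why attribute selection becomes worthwhile.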
[Figure: experiments with the selected algorithms after attribute selection, including a J48 tree; labels mark the best and the weakest classifier]
Meta algorithms were the most effective at predicting Twitter users' account type from the information provided in their bio_desc.

Recommendations
- Create customer profiles for marketing with mining techniques based on the role class.
- Make content and datasets available for further research on text mining.
- Use demographic and behavioral pattern detection to explore user influence (KloutScore) and social behavior.
- Gathering more demographic, political and geographic information could increase engagement with multimedia content.
- By detecting user roles, researchers and data miners can predict community formation and assign tags and advertisements.
- Additional attributes, such as friends, followers, retweets and mentions, would allow predicting interactions and correlations useful for personalization and for recommending hashtags and ads.