An Analysis of Professional Exchange and Community Dynamics on Twitter Around the #OR2012 Conference Hashtag
Nicola Osborne, Social Media Officer, EDINA, University of Edinburgh. @suchprettyeyes
Professor Ewan Klein, School of Informatics, University of Edinburgh. @ewanhklein
Miranda Taylor, Research Associate, School of Informatics, University of Edinburgh. @MirandaBTay

Tweet Collection
Current version of TAGS Explorer: http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/

Background to OR2012
International conference for the repository community
450+ delegates from over 30 countries
4000+ tweets to #OR2012
Extensive and supported use of social media across the conference (live blog; Twitter; Flickr; YouTube; Crowdvine; Delicious; OR Google Group).
Run as an "amplified" event (Dempsey 2007), reaching out beyond those attending in person.
Built on experience of amplifying Repository Fringe events through social media (Osborne 2011).

Analysis via Twitter Workbench

Introduction
Gain deeper understanding of conversation topics and activity around OR2012.
Comparison of techniques: a relatively large Twitter data set for manual analysis, a relatively small set for computational methods.
Opportunity to see if attempts to promote social media to participants, and to scaffold its adoption and use, had been successful.
Opportunity to compare Twitter discussion with other feedback (surveys) on the event.
Investigate themes, talks and events emerging as important.
Interest in overlap between strands: the concept of using unique tags for sessions (Kelly 2009) has not been widely adopted by the Twitter community.
Opportunity to build on manual analysis of tweets from similar but smaller Repository Fringe events (Osborne 2011).

Journalists Workbench for the Guardian

JISC Twitter Workbench
Changes and "tuning" of the JISC Twitter Workbench for the #OR2012 data

Demo
Different clustering types appropriate to different quantities and speeds of tweets.
Different approaches required for analysis depending on whether during (streaming data) versus after (static data) an event.
Total data set size raises specific challenges - e.g. cumulative approach needed here.
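The cumulative approach is not described in detail in this presentation. As an illustration only, here is a minimal single-pass clustering sketch in Python, assuming a simple Jaccard word-overlap similarity and an arbitrary threshold; both are illustrative assumptions, not the Workbench's actual method.

```python
# Minimal sketch of cumulative (incremental) clustering of tweets:
# each new tweet joins the most similar existing cluster, or starts a
# new one if nothing is similar enough. The similarity measure
# (Jaccard over word sets) and the threshold are illustrative
# assumptions, not the Workbench's actual algorithm.

def tokens(text):
    """Lowercased word set, ignoring @mentions and URLs."""
    return {w for w in text.lower().split()
            if not w.startswith(("@", "http"))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_stream(tweets, threshold=0.3):
    """Assign each incoming tweet to a cluster in one pass, in order."""
    clusters = []  # each cluster: {"words": set, "tweets": [...]}
    for t in tweets:
        tw = tokens(t)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(tw, c["words"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(t)
            best["words"] |= tw  # cumulative: cluster vocabulary grows
        else:
            clusters.append({"words": set(tw), "tweets": [t]})
    return clusters

clusters = cluster_stream([
    "Great ePrints session at #OR2012",
    "Enjoying the ePrints session #OR2012",
    "Lunch queue is long #OR2012",
])
print(len(clusters))  # → 2
```

Because the cluster vocabulary grows cumulatively, early tweets shape how later ones are grouped, which is one reason the same data can cluster differently in streaming versus static analysis.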
Clusters confirm some hunches and experiences; they also raise new questions and areas for reflection.

Inevitable bias of findings - limited to comments on Twitter AND those to one official hashtag (#OR2012).
Clustering focuses on identifying the most central tweets - not necessarily the most important or most influential. Minority views are also lost here compared with manual curation and analysis. RTs are excluded.
Does provide relevant analysis to compare with data sources such as feedback survey responses, viewing statistics of videos, etc.
Any surprises? Not really... which is surprising in itself....!
Twitter feedback does seem to reflect other forms of feedback. Perhaps Workbench analysis could act as indicative real-time sample, representing and reinforcing views coming in from other feedback routes.
Several diverse use cases here: small events vs. those at larger scale with very rapid tweeting. You need to draw meaning from both situations, but very different "tuning" is required for each.

Is this analysis and visualisation a potentially useful resource for partially or fully remote attendees or others engaging at a distance?
Is this form of analysis more or less accessible, or useful, than "traditional" Twitter archives (e.g. Storify) or a manual overview?
To what extent can Twitter be used as a proxy for gauging opinion?
Does "at the time" public commentary vary from reported memory of the event? (e.g. life of tweet versus feedback surveys).
Is capturing vicarious feedback during an event more effective in gaining feedback from some audiences? Does it provide different overall perceptions of success/failures because the audience varies?
What are the ethical issues of computational analysis and sharing of analysis and/or visualisations of tweets like this?
Are individuals still respected appropriately (as per Markham & Buchanan 2012)?
What issues arise from converting public sharing to public analysis or publication, possibly out of context (boyd 2007)?
What of post-event deletion of tweets not reflected by/removable from the analysed corpus? (Almuhimedi et al. 2013).

Questions?

New software, designed for larger datasets - we've been able to influence development... but that won't always be possible...
Multiple related clusters - based on language used - make for a tricky trade-off around granularity. Some duplication of clusters may be unavoidable, but could be addressed by follow-on manual analysis. The Workbench does allow easy exploration from cluster to tweet level.
Analytical approach varies significantly depending on the size of tweet data - this could prove challenging in terms of processing power if expected use of Twitter varies greatly from previous/predicted usage.
Analysis is restricted to what you access or collect - this might mean a single hashtag (as in our example) or a combination of tags or search terms. You analyse a stream of tweets, but the stream must still be selected. This has different implications for accuracy and processing demands depending on context and audience (e.g. a sample of Olympics tweets may be adequate; for a conference you may wish to access most or all tweets).

Gathering tweets and using them in this way provides evidence of activity, opinion, and interactions around an event (or topic).
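Selecting the stream can be as simple as filtering collected tweets against the chosen hashtags and search terms. A minimal sketch, assuming plain tweet text and illustrative tags/terms; this is not the Workbench's own collection pipeline or the Twitter API's server-side filtering.

```python
# Minimal sketch of stream selection: keep only tweets matching any of
# a set of hashtags or search terms. Field names and terms are
# illustrative assumptions for this example.

def select_stream(tweets, hashtags=(), terms=()):
    """Filter collected tweets to those matching the chosen tags/terms."""
    hashtags = {h.lower() for h in hashtags}
    terms = [t.lower() for t in terms]
    selected = []
    for tw in tweets:
        text = tw.lower()
        words = set(text.split())
        if words & hashtags or any(t in text for t in terms):
            selected.append(tw)
    return selected

sample = [
    "Opening keynote starting now #OR2012",
    "Repository metadata workshop was excellent",
    "Nothing to do with the conference",
]
kept = select_stream(sample, hashtags={"#or2012"}, terms=["repository"])
print(len(kept))  # → 2
```

Broadening the term list widens coverage but raises both noise and processing demands, which is the trade-off described above.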
Collected Tweets and their analysis are actionable data to measure against particular goals, objectives, questions.
Tweets represent a particular cohort of individuals in a specific context of place/time/interest: can be mined for other areas of interest, research, different types of data analysis, etc.
Potentially enables richer understanding of groups within a cohort and/or identification of potential members/participants.
Potentially offers high level view and a real-time starting point to understanding views, opinions, trends.
Real-time analysis offers opportunities to be flexible and responsive to feedback (e.g. redesign of survey forms).
http://homepages.inf.ed.ac.uk/s1053147/clusters/betterOR/stream_plot.html

References
This work came out of our shared interests in social media and analysis of tweets in particular.
Almuhimedi, H., Wilson, S., Liu, B., Sadeh, N. and Acquisti, A. 2013. Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets. Forthcoming in Proceedings of the 2013 ACM Conference on Computer Supported Cooperative Work, San Antonio, TX, February 23-27, 2013. Preprint available from: http://www.cs.cmu.edu/~shomir/cscw2013_tweets_are_forever.pdf. Accessed 4th April 2013.
Ball, J. and Lewis, P. 2011. Twitter and the riots: how the news spread. In The Guardian: Reading the Riots, 7th December 2011. Available from: http://www.guardian.co.uk/uk/2011/dec/07/twitter-riots-how-news-spread. Accessed 4th April 2013.
boyd, d 2007. “Social Network Sites: Public, Private, or What?” In Knowledge Tree, 13 (May.) Available from: http://kt.flexiblelearning.net.au/tkt2007/?page_id=28. Accessed 4th April 2013.
Dempsey, Lorcan. 2007. The amplified conference. In Lorcan Dempsey's Weblog, 25th July 2007. Available from: http://orweblog.oclc.org/archives/001404.html. Accessed 7th December 2012.
Guardian Interactive team, Proctor, R. Vis, F. and Voss, A. 2011. How riot rumours spread on Twitter. In The Guardian: Reading the Riots, 7th December 2011. Available from: http://www.guardian.co.uk/uk/interactive/2011/dec/07/london-riots-twitter. Accessed 4th April 2013.
JISC. 2012. JISC Twitter Analysis Workbench. In jisc.ac.uk [website]. http://www.jisc.ac.uk/whatwedo/programmes/di_research/researchtools/workbenchdevo.aspx. Accessed 4th April 2013.
Kelly, Brian. 2009. Hashtags for the ALT-C 2009 Conference. In UK Web Focus [blog], 28th August 2009. Available from: http://ukwebfocus.wordpress.com/2009/08/28/hashtags-for-the-alt-c-2009-conference/. Accessed 7th December 2012.
Lewis, P. and Newburn, T. 2011. The Reading the Riots project: our methodology explained. In The Guardian: Reading the Riots, 5th December 2011. Available from: http://www.guardian.co.uk/uk/2011/dec/05/reading-the-riots-methodology-explained. Accessed 4th April 2013.
Luo, G., Tang, C., and Yu, P. S. 2007. Resource-adaptive real-time new event detection. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD '07, pp. 497-508. New York, USA: ACM.
Markham, A. and Buchanan, E. 2012. Ethical Decision-Making and Internet Research Recommendations from the AoIR Ethics Working Committee (Version 2.0). http://aoir.org/reports/ethics2.pdf. Accessed 1st April 2013.
Osborne, N. 2011. Amplification and analysis of academic events through social media: A case study of the 2009 beyond the repository fringe event. In L. A. Wankel & C. Wankel (Eds.), Higher education administration with social media: Including applications in student affairs, enrollment management, alumni relations, and career centers, pp. 167-190. United Kingdom: Emerald Group Publishing Limited.
Petrovic, S., Osborne, M., and Lavrenko, V. 2010. Streaming 1st story detection with application to twitter. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, pp. 181-189. Stroudsburg, PA, USA: Association for Computational Linguistics.
Twitter. 2013. GET statuses/firehose. In Twitter Developers [website]: REST API v1.1 Documentation, 27th August 2012. Available from: https://dev.twitter.com/docs/api/1.1/get/statuses/firehose. Accessed 4th April 2013.

Motivation for analysis
Relevance to research: London Riots coverage (see Lewis & Newburn (2011) for background; Ball & Lewis (2011) for analysis; Guardian Interactive team et al. (2011) for visualisations; also discussed in Lee Salter's opening plenary).
Workbench designed for large scale real-time contexts such as Olympics tweets. Applied clustering analysis designed for the Twitter "Firehose" (Twitter 2013) to a discrete data set (n = 3026 after preprocessing).
Created specific Use Cases for academic Twitter contexts (including events and small scale research) for the Twitter Workbench team to illustrate challenges of smaller data sets.
Worked with Miranda Taylor to find the best ways to adapt the Twitter Workbench to smaller, discrete data sets, and then "tuned" the Workbench to our data.

What have we found out: about the software (JISC 2012)

Number of participants: from ~30 to ~500. Some will be identifiable from attendee lists; some will be unknown to organisers/attendees.
Time period: 1-14 days is likely the maximum for most; 2 months at the very most for major conferences.
Tweet volume: likely to be relatively light: less than 2/minute at most times.
Twitterer connectedness: likely to be dense existing relationships between many participants. This may be useful to organisers and helpful for marketing events in the future. Ability to cluster individuals would be useful for analysing relationship between attendees and those outside the room.
Noise: variable depending on hashtag. Likely to be multiple opinions or rooms represented. Limited spam, limited noise, especially at most dense tweeting periods.
Location: likely to be similar if not the same.
Threading: likely to be multiple parallel strands present; ability to decipher and cluster these would be useful. Manual ability to cluster or de-cluster conversations would be useful.

Events
Track attendees/those using the hashtag for a discrete period of time.
Number of participants: could be as low as ~5 to ~1000. Much higher groupings required if a statistically significant sample is selected, but for qualitative and small project approaches that is unlikely.
Time period: Variable. May be long term (fewer participants) or shorter term (more participants). Unlikely to be a long term and large cohort.
Tweet volume: Variable but unlikely to be more than 1/minute, often less.
Twitterer connectedness: likely to be dense existing relationships between many participants and this is likely to be of significant interest to the researcher - particularly follows/followed by relationships and retweeting patterns.
Noise: Likely to be limited if based on specific cohort. Small amounts of noise possible if research is being conducted around a hashtag or conversation topic.
Threading: likely to be multiple parallel strands present; ability to decipher and cluster these is important. Connections between participants important. Ability to cluster manually is particularly important, and tagging or otherwise marking up tweets for textual analysis would be useful.

Academic Research - Small Qualitative Projects
Number of participants: probably ~100 to ~10,000. Much higher groupings, as more likely being used to track a major trend or a statistically significant sample of people.
Time period: Variable. May be single day, may be trends over time. Unlikely to be super long term for this scale project (Max 2 weeks?).
Tweet volume: Variable but likely to need to accommodate 5/minute+ down to 1 or 2 per hour (this variety is particularly likely for events/hashtags/topics specific to one geographic region).
Twitterer connectedness: Variable from tight connection to very loose connectedness. For events based hashtags connectedness less likely to be important than common characteristics - some ability to cluster by bio information, tweet and/or bio location, and other aspects that help explain content of tweets and conversations. For longer term or more interest based groupings indications of how wide a variety of topics is tweeted to, shared alternative hashtags/terms might be useful.
Noise: possible and likely at this scale, as is spam.
Threading: likely to be multiple parallel strands present; ability to decipher and cluster these is important. Connections and/or shared characteristics may enhance the usefulness of threading. Ability to cluster manually is particularly useful, but the ability to teach the system is more likely to be useful for larger scale pieces of work. These are more likely to reflect the types of data use already anticipated: automatic analysis of data sets too large to do any other way, and where detailed connections may not be as important.

Academic Research - Small to Medium sized Quantitative Projects
For (opt-in) tracking of tweets associated with research around particular hashtags, events, or subjects.
There are really multiple use cases here, but this gives an idea of what they would look like and what would be useful.

Use Cases for Twitter Workbench

LDA
Approach

An alternative view...
Extracting a talk receiving lots of attention...
Tweets from the Workshop day...
Final Day Tweets: reflection; focus on software and projects; commentary

What have we found out: some reflections

What this analytical approach achieves
Overview of popular talks and events at the conference (with Twitter users) and of the key discourses around these.
Multi-stream event - fuller overview of event, discussion, overarching “hot topics”.
Can identify themes or topics of interest - e.g. ePrints or Orcid - but not programme strands per se.
Real-time analysis useful for attendees; potential to enhance networking.
Detects some themes quickly, unclear if these always emerge through other means (e.g. manual monitoring, surveys).
Can observe lifespan of interest in themes and topics.
May be possible to use analysis of Twitter discussions to identify strands to include/exclude for future conferences.
This conference suggests people tweet a lot before the conference, a lot during, and not so much afterwards. Useful for analysis - as are understandings of the volume of tweets/day.

Where this approach can be challenging
We cannot deduce what was unpopular: critical comments are often divergent in word use and structure; absent comments indicate a lack of tweeting or a varying style of tweets, not necessarily unpopularity.
The Twitter Workbench captures unique tweets only (including the first RT and some MTs): it does not account for the volume of RTs/MTs, which may miss key statements/tweets by prominent figures and/or areas where opinion is not divided and thus little original commentary is added.
If capturing a hashtag before and/or after the event you may see some noise which can warp analysis (e.g. Ronaldo tweets around #OR2012).
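These two preprocessing concerns - excluding retweets and screening obvious off-topic noise such as the Ronaldo tweets - can be sketched in a few lines. The noise keywords and the "RT @" convention below are illustrative assumptions, not the Workbench's actual preprocessing.

```python
# Minimal sketch of corpus preprocessing: drop retweets (so only
# unique tweets remain), screen known off-topic noise captured by the
# hashtag, and remove exact duplicates. The noise term list is an
# illustrative assumption for this example.

def preprocess(tweets, noise_terms=("ronaldo",)):
    seen = set()
    kept = []
    for t in tweets:
        text = t.lower()
        if text.startswith("rt @"):              # exclude retweets
            continue
        if any(n in text for n in noise_terms):  # screen known noise
            continue
        if text in seen:                         # drop exact duplicates
            continue
        seen.add(text)
        kept.append(t)
    return kept

corpus = preprocess([
    "Fascinating ORCID discussion #OR2012",
    "RT @someone: Fascinating ORCID discussion #OR2012",
    "Ronaldo scores again! #OR2012",
])
print(len(corpus))  # → 1
```

A keyword screen like this is crude: it only removes noise you already know about, so manual inspection of clusters remains a useful complement.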
This analysis ignores connections between Twitter users, networks of contacts, etc. Does that impact on interpretation of the data or the centrality of specific tweets?

The Role of customisation and tuning

(Some of the) Questions raised

Initial version of the JISC Twitter Workbench
Full collection via Martin Hawksey's TAGS Explorer
Additional public archive curated with Storify

OR2012 presents an excellent sample set for experimentation with the JISC Twitter Workbench
Relatively large but contained data set which had already been collected
Builds on previous work on analysing Twitter using manual and computational methods.
This work is experimental and raises far more questions than it answers!
http://scargill.inf.ed.ac.uk/clare/dashboard/incremental/dashboard/visualisation.html

Understanding this data and drawing conclusions from it are particularly challenging!
How we make meaning from and use this analysis is unclear - but we believe it is possible.
We trialled two types of clustering: the incremental approach suited this event, but whether analysis runs in real time, and what suits other conference sizes/activity levels, may vary...
But further investigation is planned, this time running a real-time Twitter Workbench analysis of Repository Fringe 2013 (#rfringe13).

Future Directions