See also
Twitter Data Processing
Simple Twitter Analysis
Illocution Inc, a source for sampled Twitter data
Basics
- http://twitter.com
- Twitter Help Center (start here)
- See also: http://onemansblog.com/2011/04/27/how-to-search-twitter-for-old-tweets-and-how-to-archive-them/
Project
- Build an RSS feed that gathers all tweets on some topic
- See How To Find Your RSS Feed or RSS
- "Feed" can go in two directions. We might want to know how we can send something we update (like our blog) to Twitter. Or we might want to capture something going on out there.
Strategy 1
Go to hashtags.org
Enter a term and take note of the current tweeting on this term.
Strategy 2
Download The Archivist
Simple Example
Strategy 3
See also http://search.twitter.com
Subtopics
Strategy 4
Obtain data from http://simplymeasured.com/blog/2010/06/lakers-vs-celtics-social-media-breakdown-nba/. Uncompress it, clean it up a bit (details on that another time), and take a subset of about 10k tweets.
Format is
Term | Username | Name | Tweet | Time(PDT) |
Use an Excel pivot table on names to create a lookup table, and then VLOOKUP to code each case with a tweeter ID.
Use Excel "substitute" function to turn all spaces in Tweet into a findable string including the record number.
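One possible reading of the substitute step, sketched in Python rather than Excel: every space in a tweet is replaced by a marker string that carries the record number, so each word can later be traced back to its tweet. The function name tag_spaces and the |n| marker format are assumptions made for illustration; the page does not say which string was used.

```python
def tag_spaces(tweet, record_number, template="|{n}|"):
    """Replace each space with a marker that embeds the record number.

    The "|n|" marker format is hypothetical; any string that cannot
    occur in a tweet would serve the same purpose.
    """
    marker = template.format(n=record_number)
    return tweet.replace(" ", marker)


tagged = tag_spaces("go lakers go", 17)
print(tagged)
```

Searching for the marker then finds every word boundary in record 17 without matching ordinary text.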
Data Cleaning
See also DATA CLEANING on my Twitter project.
We start with data found on http://simplymeasured.com/blog/2010/06/lakers-vs-celtics-social-media-breakdown-nba/ (RowFeeder for Celtics and Lakers compressed.zip).
Uncompressed data contains about 45,000 tweets for each game and looks like this:
Service | Term | Username | Name | Update | Location | URL | Friends | Followers | Time(PDT) | City | State/Region | Country | Metro | Latitude | Longitude |
For our first trial run we will throw away some of the columns and just work with a subset of 10,000 tweets. Our data will include:
Term | Username | Name | Tweet | Time PDT |
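The column selection above could also be done outside Excel. The following is a minimal sketch, assuming the data has been exported to CSV with the header names listed above and that the "Update" column is what we call "Tweet" in the reduced format. The function name subset_rows and the sample rows are made up for illustration.

```python
import csv
import io

# Columns kept for the trial run. Assumption: "Update" in the raw
# export is the tweet text, renamed to "Tweet" in the reduced data.
KEEP = ["Term", "Username", "Name", "Update", "Time(PDT)"]
RENAME = {"Update": "Tweet"}


def subset_rows(reader, limit=10000):
    """Yield at most `limit` rows, reduced to the KEEP columns."""
    for i, row in enumerate(reader):
        if i >= limit:
            break
        yield {RENAME.get(col, col): row.get(col, "") for col in KEEP}


# Tiny in-memory sample standing in for the real export (invented rows).
sample = io.StringIO(
    "Service,Term,Username,Name,Update,Location,Time(PDT)\n"
    "twitter,lakers,fan_one,Fan One,Go #Lakers!,LA,6/17/2010 18:02\n"
    "twitter,celtics,fan_two,Fan Two,celtics in 7,Boston,6/17/2010 18:03\n"
)
rows = list(subset_rows(csv.DictReader(sample), limit=10000))
print(rows[0])
```

With the real file, the io.StringIO sample would be replaced by an open file handle. Note that this takes the first 10,000 rows, which may or may not be how the Excel subset was chosen.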
Initial examination of data shows lots of junk characters that may make for analysis headaches. Trying to clean some of this up pre-emptively:
Remove blanks from names
We have both username and name. Not sure which we will eventually want to use. For now, assume it is USERNAME. We take this column and use pivot table to create a list of unique names. From 10,000 tweets we have 8760 unique usernames.
Put a serial number next to each unique username. Then, back in the data, we create a new column and use the formula =VLOOKUP(B2,'name-id list'!A$1:C$8760,3), which we autofill down and then copy and paste special as values. Adding FALSE as a fourth argument forces an exact match, which is safer if the lookup list is not sorted. Now we can delete the username and name columns.
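For comparison, the same coding step can be sketched in Python: a dictionary plays the role of the pivot-table lookup list, and a dictionary lookup replaces VLOOKUP. This is an illustration, not the procedure used on this page; the function name assign_ids and the sample rows are hypothetical, and IDs are numbered in order of first appearance rather than alphabetically as a pivot table would list them.

```python
def assign_ids(usernames):
    """Map each distinct username to a serial number (1, 2, 3, ...)
    in order of first appearance."""
    ids = {}
    for name in usernames:
        key = name.strip()  # drop stray blanks around the name
        if key not in ids:
            ids[key] = len(ids) + 1
    return ids


# Invented rows standing in for the 10,000-tweet subset.
rows = [
    {"Username": "fan_one", "Tweet": "Go #Lakers!"},
    {"Username": "fan_two", "Tweet": "celtics in 7"},
    {"Username": "fan_one ", "Tweet": "what a game"},
]

lookup = assign_ids(r["Username"] for r in rows)

# Replace the username with its ID, as the VLOOKUP column does.
coded = [
    {"TweeterID": lookup[r["Username"].strip()], "Tweet": r["Tweet"]}
    for r in rows
]
print(coded)
```

Unlike VLOOKUP, the dictionary lookup is case-sensitive, so "Fan_One" and "fan_one" would get different IDs unless the names are lowercased first.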
Next we note that the tweets themselves are full of "irregular" text. Some of it is "tweetese," so we don't want to sweep it all aside, but we do want to establish some cleaning rules. One thing to note is that some tweeters use the hashtag and some do not: lots of tweets with #lakers and lots without. We also note a mix of upper and lower case, as well as various punctuation marks and URLs.
SUGGESTION: For first cut, let's zap most of this stuff. We'll turn periods, commas, parentheses and brackets, exclamation and question marks, semicolons, colons, etc. into blank spaces. But note that some of these are parts of emoticons.
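A first-cut cleaner along these lines might look like the sketch below. It is one interpretation of the suggestion, with assumptions that are not from the page: URLs are dropped entirely, the "#" is stripped so that #lakers and lakers count as the same term, everything is lowercased, and only a small hand-picked set of emoticons is protected, and only when they stand alone between spaces. The name clean_tweet is made up for illustration.

```python
import re

# One pass over the text. Alternatives are tried in order at each position:
# URLs first, then standalone emoticons, then punctuation to blank out.
# The emoticon set is a small hypothetical sample; extend as needed.
TOKEN_RE = re.compile(
    r"(?P<url>https?://\S+)"
    r"|(?P<emo>(?<!\S)[:;=]-?[()DPp](?!\S))"
    r"|(?P<punct>[.,()\[\]{}!?;:])"
)
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]")


def _replace(match):
    # Keep emoticons (padded with spaces); URLs and punctuation become a blank.
    if match.lastgroup == "emo":
        return " " + match.group() + " "
    return " "


def clean_tweet(text):
    """Lowercase a tweet, drop URLs, '#', and punctuation; keep emoticons."""
    text = TOKEN_RE.sub(_replace, text)
    tokens = []
    for tok in text.split():
        if EMOTICON_RE.fullmatch(tok):
            tokens.append(tok)  # leave emoticons such as :D untouched
        else:
            tokens.append(tok.lower().replace("#", ""))
    return " ".join(t for t in tokens if t)


print(clean_tweet("Go #Lakers!!! :) http://bit.ly/x"))
```

An emoticon glued to a word, as in "win!:)", is not protected by this sketch and gets zapped with the rest of the punctuation; whether that matters depends on how much the analysis relies on emoticons.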
References
The Twitter Fan Wiki
Twitter Social Network Analysis
http://blog.magicbeanlab.com/sna/some-twitter-social-network-analysis/
Geographic Data Analysis and Visualization at U of Oregon
http://www.crimsonhexagon.com/