Methods How To Collect Twitter Data Archivist Example
Grabbing Some Data
- Enter a search term. In this case we are trying "Mills College" (with the quotation marks to see if this will eliminate accidental juxtapositions of the words "mills" and "college" — but note that it also means we may miss references just to "Mills"). This yields the following.
- Then we export the data to Excel (as a txt — tab delimited text — file) and start Excel to import that data.
- And it looks like this (after a little bit of column resizing).
- We scan down the data to see it all looks in order and notice one entry at the bottom that doesn't look right.
- On closer inspection we see this is the result of a clever twitter user who added line breaks to her/his tweet. Hopefully, this doesn't happen too often. Later we might want to go into the raw TXT file and see whether these are ordinary <CR><LF> or another character.
- After fixing that one row, the data is in pretty good shape. Suppose we want to do a simple word count. Here's how we can proceed.
- Select column containing tweets and copy.
- Open MS Word and paste.
- Select all (ctrl-A) and replace all spaces with carriage returns (in the replace box we just type "caret p" or ^p. This puts every word on separate line.
- Next we want to do a little clean up. Replace all commas, colons, periods, exclamation marks, and hyphens with nothing (not blank, nothing).
- Select all, copy.
Simple Frequency Analysis
- Go back to Excel and paste starting in cell A2 in a new worksheet. Label the column (in A1) "words."
- Select the column, INSERT>PIVOT TABLE. Then drag "words" to both the row labels and the data values boxes as shown by the pink arrows.
- Then we select the pivot table, copy, paste-special values.
- Then sort the two columns of data on the count column from largest to smallest.
- Then draw histogram of top 25 or so (here we have omitted "Mills College")
page revision: 9, last edited: 31 Dec 2011 19:57