What is the Twitter Search API?
The Twitter Search API, one of three such APIs (search, streaming, “firehose”), allows access to a subset of popular or recent tweets (from roughly the past week). That is, it allows querying past tweets, though only a small fraction of all tweets. To me, this is a great way to get one’s feet wet collecting and cleaning tweet datasets. However, it doesn’t provide much utility for research, as the fraction of tweets received may not be representative of the entire tweet stream.
Who can access the Twitter Search API?
Anyone! That’s right… if you have an account, you can create an authorization token and get started with Big Brother-esque collection of people’s thoughts and locations (that’s right… locations).
How do I get started with Twitter Search API and R?
To be able to query the Twitter Search API and import the data into R, we’ll need to accomplish the following tasks:
Sign up for Twitter & create an application
Install R and required R packages
Understand the Twitter Search API query structure
Run our first query, and save to database
Sign up for Twitter & Create an Application
If you don’t yet have a Twitter account, head on over to twitter.com and grab yourself an account. The process is pretty self-explanatory. If you already have an account with Twitter, we’ll need to set up an application (this will allow us to connect R to the Twitter stream).
The first step is to head on over to dev.twitter.com. After logging in (yes, you might be prompted for another login), click on your Twitter thumbnail (upper right-hand corner of the screen) and click on “My Applications.” On the following screen, click on “Create New App.” You’ll need a name, description, and website. I’ve used my blog address as the website, though I’d imagine that anything works.
Once created, click on “modify app permissions” and allow the application to read, write and access direct messages (this might come in handy later on). Lastly, click on the API Keys tab and scroll to the bottom of the page. Under token actions, click on “Create my access token.” We’ll need these access tokens when we fire up R. That’s it! We’re done with the Twitter part of this setup.
Install R and Required R Packages
If you have not yet installed R, head on over to r-project.org and install the version appropriate for your platform. I also highly suggest installing RStudio, an integrated development environment and GUI for R. I am running RStudio throughout the tutorial.
To use the Twitter Search API, we need the following packages installed: twitteR, RCurl, RJSONIO, and stringr.
Some of these packages depend on other packages, so make sure everything required is installed before moving on. To install all of these packages in one run, just copy the following code and run it in R (or RStudio):
# Install and Activate Packages
# install.packages() takes the package names as a single character vector
install.packages(c("twitteR", "RCurl", "RJSONIO", "stringr"))
library(twitteR)
library(RCurl)
library(RJSONIO)
library(stringr)
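If you rerun your script often, you may not want to reinstall everything each time. Here is a minimal sketch of one way to install only what is missing. The helper names `missing_packages` and `install_if_missing` are my own and not part of any package.

```r
# Return the subset of `pkgs` that is not yet installed on this machine.
# (`missing_packages` is a hypothetical helper name.)
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install whatever is missing and return those names invisibly.
# (`install_if_missing` is a hypothetical helper name.)
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `install_if_missing(c("twitteR", "RCurl", "RJSONIO", "stringr"))` should then only download the packages that are not already in your library.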
Once these packages are installed, it’s time to set up our connection to the Twitter Search API. To do so, we’ll need to copy & paste our API credentials into either a text file, or, preferably, an R script. Just copy the following code into your script, noting the requirements in quotation marks:
# Declare Twitter API Credentials
api_key      <- "API KEY"              # From dev.twitter.com
api_secret   <- "API Secret"           # From dev.twitter.com
token        <- "Access token"         # From dev.twitter.com
token_secret <- "Access token secret"  # From dev.twitter.com

# Create Twitter Connection
setup_twitter_oauth(api_key, api_secret, token, token_secret)
Keep the quotation marks, but remember to replace the placeholder text inside them with your actual API keys and access tokens. The setup_twitter_oauth function will create a connection to Twitter’s Search API. If you are successful, the following message should show up in your console:
 "Using direct authentication"
Understanding the Twitter Search API Structure
Our R instance is now ready to receive tweets from Twitter. However, before we can receive any information, we’ll need to understand the format of a Twitter Search query. Per Twitter, the best way to build a query and test whether it’s valid and returns matching tweets is to first try it at twitter.com/search, which in essence uses the same API that we are calling. Once the results look right, we can load that search string into R.
The query has multiple operators and will behave in the following way:
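To get a feel for how operators combine into a single search string, here is a rough sketch in R. It only uses a handful of the standard operators (plain terms that must all match, a quoted exact phrase, a leading minus to exclude a term, from: for a single account, and OR). The helper `build_query` and the example terms are made up for illustration and are not part of twitteR.

```r
# Combine search pieces into one query string.
# (`build_query` is a hypothetical helper, not a twitteR function.)
build_query <- function(terms = character(), phrase = NULL,
                        exclude = character(), from = NULL) {
  parts <- c(
    terms,                                           # all terms must match
    if (!is.null(phrase)) paste0('"', phrase, '"'),  # exact phrase
    if (length(exclude) > 0) paste0("-", exclude),   # leading minus excludes
    if (!is.null(from)) paste0("from:", from)        # tweets sent by one account
  )
  paste(parts, collapse = " ")
}

# A hashtag plus an exact phrase, excluding one word
query <- build_query(terms = "#rstats", phrase = "data science", exclude = "jobs")

# Either of two words
either_or <- paste(c("love", "hate"), collapse = " OR ")
```

It is still worth pasting the resulting string into twitter.com/search to check that it returns what you expect before handing it to R.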
A few other important parameters that Search Twitter accepts:
# Run Twitter Search. Format is searchTwitter("Search Terms", n=100, lang="en", geocode="lat,lng,radius"); it also accepts since and until.
tweets
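Putting the parameters together, a search call might look roughly like the sketch below. It assumes twitteR is installed and that setup_twitter_oauth() has already succeeded. The helpers `make_geocode` and `run_search` are names I made up, and the coordinates and dates in the usage note are placeholders. Note that twitteR expects the geocode as "latitude,longitude,radius" and the since/until dates as "YYYY-MM-DD" strings.

```r
# Build the geocode string in "latitude,longitude,radius" form, radius in miles.
# (`make_geocode` is a hypothetical helper.)
make_geocode <- function(lat, lng, radius_miles) {
  sprintf("%s,%s,%smi", lat, lng, radius_miles)
}

# Run one search and flatten the returned list of tweets into a data frame.
# (`run_search` is a hypothetical wrapper around twitteR's searchTwitter.)
run_search <- function(query, n = 100, lang = "en", geocode = NULL,
                       since = NULL, until = NULL) {
  tweets <- twitteR::searchTwitter(query, n = n, lang = lang, geocode = geocode,
                                   since = since, until = until)
  twitteR::twListToDF(tweets)  # one row per tweet
}
```

For example, `run_search("#rstats", geocode = make_geocode(42.375, -71.1061, 10))` would ask for up to 100 English tweets within ten miles of that point. Keep in mind that the Search API only reaches back a few days, so older since dates will simply come back empty.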