Authorize a Twitter Data request in R

In order to get data from Twitter you have to show Twitter that you are authorized to do so. The first step is to create an application with Twitter; the section “Where do I create an application?” of the FAQ explains how to do this. To authenticate a data request, we send the appropriate credentials to Twitter along with the request for data. The site dev.twitter.com provides useful step-by-step guides on how to authenticate data requests (see specifically the documents creating a signature, percent encoding parameters and authorizing a request), and we follow those guides here using R.

Install and load the R packages RCurl, bitops, digest, ROAuth and RJSONIO.

## Install R packages
install.packages('RCurl')
install.packages('bitops')
install.packages('digest')
install.packages('ROAuth')
install.packages('RJSONIO')

## Load R packages
library('bitops')
library('digest')
library('RCurl')
library('ROAuth')
library('RJSONIO')
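If you rerun this script often, reinstalling every package each time is unnecessary. A minimal sketch, assuming you only want to install packages that are not already available; the helper name missing_packages() is hypothetical, while requireNamespace() and install.packages() are base R functions.

```r
## Find which of the required packages are not installed yet
# missing_packages() is a hypothetical helper, not part of any package
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without an error) when a package cannot be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}
```

Usage: store the result of missing_packages(c('RCurl','bitops','digest','ROAuth','RJSONIO')) and pass it to install.packages() only when its length is greater than zero.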

Access your Twitter application’s OAuth settings (available under the OAuth tools tab on your application page) and save them to a data frame object.

oauth <- data.frame(consumerKey='YoUrCoNsUmErKeY',consumerSecret='YoUrCoNsUmErSeCrEt',accessToken='YoUrAcCeSsToKeN',accessTokenSecret='YoUrAcCeSsToKeNsEcReT')
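Hard-coding credentials in a script makes them easy to leak when the script is shared. A minimal sketch, assuming you store the four values in environment variables (the names TWITTER_CONSUMER_KEY and so on are hypothetical; use whatever names you export in your shell or ~/.Renviron). It returns a list with the same field names as the oauth data frame, so references such as oauth$consumerKey keep working.

```r
## Read OAuth credentials from environment variables instead of hard-coding them
# The environment variable names below are hypothetical examples
read_oauth <- function() {
  vars <- c(consumerKey       = 'TWITTER_CONSUMER_KEY',
            consumerSecret    = 'TWITTER_CONSUMER_SECRET',
            accessToken       = 'TWITTER_ACCESS_TOKEN',
            accessTokenSecret = 'TWITTER_ACCESS_TOKEN_SECRET')
  values <- Sys.getenv(vars, unset = NA)
  if (anyNA(values)) {
    stop('Missing environment variable(s): ', paste(vars[is.na(values)], collapse = ', '))
  }
  # Same field names as the oauth data frame, so oauth$consumerKey etc. still work
  as.list(setNames(values, names(vars)))
}
```

Usage: oauth <- read_oauth()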

Each time you request data from Twitter you also need to send a unique, randomly generated string of alphanumeric characters (the oauth_nonce), which Twitter uses to detect whether a request has been submitted more than once.

string <- paste(sample(c(letters,0:9),size=32,replace=TRUE),collapse='') # Generate a random string of alphanumeric characters
string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters
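The nonce above relies on RCurl's base64(). If all you need is an alphanumeric string, a base-R sketch like the following should serve as well; make_nonce() is a hypothetical helper. Note that sample() is not a cryptographically secure generator; the nonce only needs to be unique per request, not secret.

```r
## Generate an alphanumeric nonce with base R only (no RCurl needed)
# make_nonce() is a hypothetical helper; n is the number of characters
make_nonce <- function(n = 32) {
  paste(sample(c(letters, LETTERS, 0:9), size = n, replace = TRUE), collapse = '')
}
```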

Next, record the current time as the number of seconds since 1970-01-01 00:00:00 GMT. Each Twitter data request must include this timestamp, i.e., the (approximate) time at which the request was made.

timestamp <- sprintf('%.0f',floor(as.numeric(Sys.time()))) # Seconds since 1970-01-01 GMT; sprintf() avoids scientific notation for round values

Next, order the key/value pairs alphabetically by key (and by value when two keys are equal). In the current example the keys, the labels to the left of the equal signs, are already in ascending alphabetical order; the request parameter resources comes last because it sorts after the oauth_ keys. Once ordered, we percent encode the string to create a parameter string.

# Percent encode parameters 1
par1 <- '&resources=statuses'
par2 <- gsub(',','%2C',par1,perl=TRUE) # Percent encode commas in the request parameters

# Percent encode parameters 2
# Order the key/value pairs alphabetically by key
ps <- paste('oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2,sep='')
ps2 <- gsub('%','%25',ps,perl=TRUE) 
ps3 <- gsub('&','%26',ps2,perl=TRUE)
ps4 <- gsub('=','%3D',ps3,perl=TRUE)
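The gsub() calls above encode only the characters that occur in this particular request. As a more general sketch, assuming the parameters are given as a named list of strings, the functions below percent-encode every byte outside the RFC 3986 unreserved set and sort by encoded key (then value) in byte order, as Twitter's signature guide describes. pct_encode() and param_string() are hypothetical helper names.

```r
## Build an OAuth parameter string from a named list
# Percent-encode every byte except the unreserved characters A-Z a-z 0-9 - . _ ~
pct_encode <- function(x) {
  vapply(as.character(x), function(s) {
    b <- as.integer(charToRaw(enc2utf8(s)))
    keep <- (b >= 48L & b <= 57L) | (b >= 65L & b <= 90L) |
      (b >= 97L & b <= 122L) | b %in% c(45L, 46L, 95L, 126L)
    out <- sprintf('%%%02X', b)
    out[keep] <- intToUtf8(b[keep], multiple = TRUE)
    paste(out, collapse = '')
  }, character(1), USE.NAMES = FALSE)
}

# Encode keys and values, sort by key then value in byte (C-locale) order, join with = and &
param_string <- function(params) {
  keys <- pct_encode(names(params))
  values <- pct_encode(vapply(params, as.character, character(1)))
  o <- order(keys, values, method = 'radix')
  paste(keys[o], values[o], sep = '=', collapse = '&')
}
```

The result corresponds to par2 joined with the oauth_ pairs (ps above); the whole string is percent-encoded a second time when it goes into the signature base string, which is what ps2 to ps4 do.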

The parameter string is then combined with the HTTP method and the percent-encoded base URL, joined by ampersands, to create a signature base string.

# Percent encode parameters 3
url1 <- 'https://api.twitter.com/1.1/application/rate_limit_status.json'
url2 <- gsub(':','%3A',url1,perl=TRUE) 
url3 <- gsub('/','%2F',url2,perl=TRUE) 

# Create signature base string
signBaseString <- paste('GET','&',url3,'&',ps4,sep='') 

We then create a signing key by joining the consumer secret and the access token secret with an ampersand.

signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')
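The base string and signing key steps can be sketched with a general percent encoder instead of the character-by-character gsub() calls, which only handle ':' and '/' in the URL. This assumes Twitter's signature rules (upper-case method, URL without query string, both secrets percent-encoded, and a trailing '&' kept even when there is no token secret); signature_base_string() and signing_key() are hypothetical helpers.

```r
## Signature base string and signing key
# Percent-encode every byte except the unreserved characters A-Z a-z 0-9 - . _ ~
pct_encode <- function(x) {
  vapply(as.character(x), function(s) {
    b <- as.integer(charToRaw(enc2utf8(s)))
    keep <- (b >= 48L & b <= 57L) | (b >= 65L & b <= 90L) |
      (b >= 97L & b <= 122L) | b %in% c(45L, 46L, 95L, 126L)
    out <- sprintf('%%%02X', b)
    out[keep] <- intToUtf8(b[keep], multiple = TRUE)
    paste(out, collapse = '')
  }, character(1), USE.NAMES = FALSE)
}

# METHOD&encoded(base URL)&encoded(parameter string)
signature_base_string <- function(method, url, param_string) {
  paste(toupper(method), pct_encode(url), pct_encode(param_string), sep = '&')
}

# encoded(consumer secret)&encoded(token secret); the '&' stays even if the token secret is empty
signing_key <- function(consumer_secret, token_secret = '') {
  paste(pct_encode(consumer_secret), pct_encode(token_secret), sep = '&')
}
```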

The signature base string and the signing key are used to compute an HMAC-SHA1 hash, which is Base64-encoded and then percent encoded to form the OAuth signature.

osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
osign641 <- base64(osign,encode=TRUE,mode='character')
osign642 <- gsub('/','%2F',osign641,perl=TRUE)
osign643 <- gsub('=','%3D',osign642,perl=TRUE)
osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

kv <-data.frame(nonce=nonce[1],timestamp=timestamp,osign=osign644[1])
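The signature step above combines digest::hmac() with RCurl's base64() and three gsub() calls. The sketch below performs the same steps, assuming the digest package is installed; b64_encode() and oauth_signature() are hypothetical helpers, and the Base64 encoder is written in base R only to make each step visible.

```r
## OAuth signature: HMAC-SHA1, then Base64, then percent-encoding
# Base64-encode a raw vector (base R only)
b64_encode <- function(bytes) {
  if (length(bytes) == 0) return('')
  alphabet <- c(LETTERS, letters, 0:9, '+', '/')
  b <- as.integer(bytes)
  pad <- (3 - length(b) %% 3) %% 3
  m <- matrix(c(b, rep(0L, pad)), nrow = 3)       # one column per 3-byte group
  n <- m[1, ] * 65536L + m[2, ] * 256L + m[3, ]   # 24-bit value of each group
  idx <- rbind(n %/% 262144L, (n %/% 4096L) %% 64L, (n %/% 64L) %% 64L, n %% 64L)
  out <- alphabet[as.vector(idx) + 1L]            # four 6-bit digits per group
  if (pad > 0) out[length(out) - seq_len(pad) + 1L] <- '='
  paste(out, collapse = '')
}

# Requires the digest package for hmac()
oauth_signature <- function(base_string, key, encode = TRUE) {
  mac <- digest::hmac(key = key, object = base_string, algo = 'sha1', raw = TRUE)
  sig <- b64_encode(mac)
  if (encode) {
    # Base64 output may contain '+', '/' and '=', which must be percent encoded
    sig <- gsub('+', '%2B', sig, fixed = TRUE)
    sig <- gsub('/', '%2F', sig, fixed = TRUE)
    sig <- gsub('=', '%3D', sig, fixed = TRUE)
  }
  sig
}
```

For example, with the standard HMAC-SHA1 example key 'key' and message 'The quick brown fox jumps over the lazy dog', oauth_signature(message, 'key', encode = FALSE) should return '3nybhbi3iqa8ino29wqQcBydtNk='.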

These results can then be passed to the function getURL() to download the desired Twitter data, such as the status rate limits; fromJSON() converts the JSON response into an R list.

fromJSON(getURL(paste('https://api.twitter.com/1.1/application/rate_limit_status.json?resources=statuses&oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',sep='')))
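Instead of putting the OAuth values in the query string, Twitter's "authorizing a request" guide sends them in an Authorization header of the form OAuth key="value", key="value", .... A minimal sketch of building that header, assuming a named list of the oauth_ values that includes the unencoded oauth_signature; oauth_header() is a hypothetical helper.

```r
## Build an OAuth Authorization header
# Percent-encode every byte except the unreserved characters A-Z a-z 0-9 - . _ ~
pct_encode <- function(x) {
  vapply(as.character(x), function(s) {
    b <- as.integer(charToRaw(enc2utf8(s)))
    keep <- (b >= 48L & b <= 57L) | (b >= 65L & b <= 90L) |
      (b >= 97L & b <= 122L) | b %in% c(45L, 46L, 95L, 126L)
    out <- sprintf('%%%02X', b)
    out[keep] <- intToUtf8(b[keep], multiple = TRUE)
    paste(out, collapse = '')
  }, character(1), USE.NAMES = FALSE)
}

# oauth_params: named list of oauth_* values, e.g. list(oauth_consumer_key = ..., oauth_nonce = ...)
oauth_header <- function(oauth_params) {
  keys <- names(oauth_params)
  keys <- keys[order(keys, method = 'radix')]
  values <- vapply(oauth_params[keys], as.character, character(1))
  pairs <- sprintf('%s="%s"', pct_encode(keys), pct_encode(values))
  paste0('OAuth ', paste(pairs, collapse = ', '))
}
```

With RCurl the header could then be passed along the lines of getURL(url, httpheader = c(Authorization = header)), where url carries only the non-OAuth query parameters (e.g. ?resources=statuses). Those parameters must still be included when computing the signature.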

Results: [screenshot of the output of the request above]

To reduce repetition, I’ve wrapped the above code into an R function called keyValues.

keyValues <- function(httpmethod,baseurl,par1){
  # Uses the global oauth object defined above for the credentials

  # Generate a random string of letters and numbers (the nonce)
  string <- paste(sample(c(letters,0:9),size=32,replace=TRUE),collapse='') # Generate random string of alphanumeric characters
  string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
  nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters

  # Get the current time in seconds since 1970-01-01 GMT (sprintf() avoids scientific notation)
  timestamp <- sprintf('%.0f',floor(as.numeric(Sys.time())))

  # Percent encode parameters 1
  par2 <- gsub(',','%2C',par1,perl=TRUE) # Percent encode commas in the request parameters

  # Percent encode parameters 2
  # Order the key/value pairs alphabetically by key
  ps <- paste('oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2,sep='')
  ps2 <- gsub('%','%25',ps,perl=TRUE)
  ps3 <- gsub('&','%26',ps2,perl=TRUE)
  ps4 <- gsub('=','%3D',ps3,perl=TRUE)

  # Percent encode parameters 3 (base URL)
  url1 <- baseurl
  url2 <- gsub(':','%3A',url1,perl=TRUE)
  url3 <- gsub('/','%2F',url2,perl=TRUE)

  # Create signature base string
  signBaseString <- paste(httpmethod,'&',url3,'&',ps4,sep='')

  # Create signing key
  signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')

  # oauth_signature
  osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
  osign641 <- base64(osign,encode=TRUE,mode='character')
  osign642 <- gsub('/','%2F',osign641,perl=TRUE)
  osign643 <- gsub('=','%3D',osign642,perl=TRUE)
  osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

  return(data.frame(hm=httpmethod,bu=baseurl,p=par1,nonce=nonce[1],timestamp=timestamp,osign=osign644[1]))
}

With this function, each Twitter data request needs much less code on the user's end. Make sure to run keyValues() and fromJSON() within a few seconds of each other, or else Twitter won’t accept your data request; since each call to keyValues() also generates a fresh nonce, call it again for every new request.

## Check rate limits
# Parameter options: help, users, search, statuses
kv <- keyValues(httpmethod='GET',baseurl='https://api.twitter.com/1.1/application/rate_limit_status.json',par1='&resources=statuses')

fromJSON(getURL(paste(kv$bu,'?','oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',kv$p,sep='')))

Download Twitter Data using JSON in R

Here we consider the task of downloading Twitter data using the R software package RJSONIO.


Before we can download Twitter data, we’ll need to prove to Twitter that we are authorized to do so. I refer the interested reader to the post Twitter OAuth FAQ for instructions on how to set up an application with dev.twitter.com. Once we’ve set up an application with Twitter, we can write some R code to communicate with Twitter about our application and get the data we want. Code from the post Authorize a Twitter Data request in R, specifically the keyValues() function, will be used in this post to handle our authentication needs when requesting data from Twitter.

## Install R packages
install.packages('bitops')
install.packages('digest')
install.packages('RCurl')
install.packages('ROAuth')
install.packages('RJSONIO')
install.packages('plyr')


## Load R packages
library('bitops')
library('digest')
library('RCurl')
library('ROAuth')
library('RJSONIO')
library('plyr')


## Set decimal precision
options(digits=22)


## OAuth application values
oauth <- data.frame(consumerKey='YoUrCoNsUmErKeY',consumerSecret='YoUrCoNsUmErSeCrEt',accessToken='YoUrAcCeSsToKeN',accessTokenSecret='YoUrAcCeSsToKeNsEcReT')

keyValues <- function(httpmethod,baseurl,par1a,par1b){
  # Uses the global oauth object defined above for the credentials
  # par1a: request parameters whose keys sort before the oauth_ keys (must end with '&')
  # par1b: request parameters whose keys sort after the oauth_ keys (must start with '&')

  # Generate a random string of letters and numbers (the nonce)
  string <- paste(sample(c(letters,0:9),size=32,replace=TRUE),collapse='') # Generate random string of alphanumeric characters
  string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
  nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters

  # Get the current time in seconds since 1970-01-01 GMT (sprintf() avoids scientific notation)
  timestamp <- sprintf('%.0f',floor(as.numeric(Sys.time())))

  # Percent encode parameters 1
  par2a <- gsub(',','%2C',par1a,perl=TRUE) # Percent encode commas
  par2b <- gsub(',','%2C',par1b,perl=TRUE) # Percent encode commas

  # Percent encode parameters 2
  # Order the key/value pairs alphabetically by key
  ps <- paste(par2a,'oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2b,sep='')
  ps2 <- gsub('%','%25',ps,perl=TRUE)
  ps3 <- gsub('&','%26',ps2,perl=TRUE)
  ps4 <- gsub('=','%3D',ps3,perl=TRUE)

  # Percent encode parameters 3 (base URL)
  url1 <- baseurl
  url2 <- gsub(':','%3A',url1,perl=TRUE)
  url3 <- gsub('/','%2F',url2,perl=TRUE)

  # Create signature base string
  signBaseString <- paste(httpmethod,'&',url3,'&',ps4,sep='')

  # Create signing key
  signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')

  # oauth_signature
  osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
  osign641 <- base64(osign,encode=TRUE,mode='character')
  osign642 <- gsub('/','%2F',osign641,perl=TRUE)
  osign643 <- gsub('=','%3D',osign642,perl=TRUE)
  osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

  return(data.frame(hm=httpmethod,bu=baseurl,p=paste(par1a,par1b,sep=''),nonce=nonce[1],timestamp=timestamp,osign=osign644[1]))
}

Next, we need to decide what kind of Twitter data we want to download. The Twitter REST API v1.1 Resources site outlines the data we can get from Twitter; the Description sections explain what each resource returns. As an example, let’s download some user tweets. To do this, we consult the resource on the REST API v1.1 page that corresponds to the action we want, here GET statuses/user_timeline. That resource page lists and describes the options available for getting tweets from a specific user, so it is worth reading.

Here we download the 100 most recent tweets (and re-tweets) made by the user ‘Reuters’.

## Download user tweets
# count accepts at most 200 tweets per request
# Specify user name
user <- 'Reuters'
 
kv <- keyValues(httpmethod='GET',baseurl='https://api.twitter.com/1.1/statuses/user_timeline.json',par1a='count=100&include_rts=1&',par1b=paste('&screen_name=',user,sep=''))
 
theData1 <- fromJSON(getURL(paste(kv$bu,'?','oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0','&',kv$p,sep='')))
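The request parameters are split between par1a and par1b because the signature needs all parameters sorted by key in byte order. Sorting the keys used in this user_timeline request shows that count and include_rts fall before the oauth_ keys while screen_name falls after them. A small sketch using base R's radix ordering, which compares strings in C-locale byte order:

```r
## Which request parameters go before (par1a) and after (par1b) the oauth_ keys?
keys <- c('screen_name', 'oauth_version', 'count', 'oauth_token', 'include_rts',
          'oauth_consumer_key', 'oauth_timestamp', 'oauth_nonce', 'oauth_signature_method')
sorted_keys <- keys[order(keys, method = 'radix')] # byte-order sort, independent of locale
print(sorted_keys)
```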

At this point you should have the 100 most recent tweets made by the user ‘Reuters’, along with values on several variables that Twitter records for each tweet. These are stored in a list data structure, which you can explore with R's usual list tools.

For instance, let’s see the tweets.

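theData2 <- unlist(theData1) # Flatten the nested list into a named vector
names(theData2) # Inspect the available field names
tweets <- theData2[names(theData2)=='text'] # Keep only each tweet's top-level text field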
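Flattening with unlist() works, but it mixes all fields into one long vector. A sketch of an alternative, assuming theData1 holds one list element per tweet: pull a single field from every tweet, keeping one entry per tweet in order and NA where a tweet lacks the field. tweet_field() is a hypothetical helper.

```r
## Extract one field from every tweet without flattening the whole list
tweet_field <- function(tweets, field) {
  vapply(tweets, function(tw) {
    # Nested fields such as retweeted_status$text are not picked up, only top-level ones
    if (field %in% names(tw)) as.character(tw[[field]])[1] else NA_character_
  }, character(1))
}
```

Usage: texts <- tweet_field(theData1, 'text'); several calls can be combined into a data frame, e.g. data.frame(created_at = tweet_field(theData1, 'created_at'), text = texts).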

Accessing All the Curl Options under R

The R package RCurl, an interface to the libcurl library, provides an integrated set of tools for interacting with remote servers. While it provides a number of useful functions, it still lacks a few sorely missed options of the curl command-line tool (e.g., retry). Of course, it’s possible to write some of these missing functions in R to extend RCurl, but it might be easier to call the curl program that comes with your computer directly. Under Mac OS X (or any system where the curl executable is on the PATH), the curl program can be called from R with the command system().

For instance, we can serially download and save webpages (and retry the process if it fails) by using the following R syntax.

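n <- 5 # Number of pages to download (choose your own)
for(i in 1:n){
    # --retry 999: retry transient failures up to 999 times; wait=TRUE downloads one page at a time
    system(paste("curl http://www.google.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=TRUE)
}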

Similarly, we can download a number of webpages asynchronously by setting wait=FALSE, which starts each curl process in the background without waiting for it to finish. For instance,

for(i in 1:n){
    system(paste("curl http://www.yahoo.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=FALSE)
}
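Building the command with paste() works, but URLs containing '&' or '?' will be split by the shell. A sketch using system2(), assuming a curl executable is on the PATH; curl_args() and curl_download() are hypothetical helpers, and shQuote() protects the URL and file name from the shell.

```r
## Call the curl command-line tool through system2()
curl_args <- function(url, outfile, retries = 999) {
  c(shQuote(url), '--retry', as.character(retries), '--output', shQuote(outfile))
}

curl_download <- function(url, outfile, retries = 999, wait = TRUE) {
  # wait = TRUE: blocks and returns curl's exit status (0 means success)
  # wait = FALSE: curl runs in the background and R continues immediately
  system2('curl', curl_args(url, outfile, retries), wait = wait)
}
```

Usage: for(i in 1:n) curl_download('http://www.google.com', paste0('sitePage', i, '.html')). With wait = TRUE, checking the returned status lets you detect pages that still failed after all retries.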

If you’re just downloading webpages, it’s easy enough to use the curl program that comes with your computer: just call it with the R command system(). In this way, you can download pages with curl and parse the information in them at a later time.