Download Twitter Data using JSON in R

Here we consider the task of downloading Twitter data using the R software package RJSONIO.

Screen Shot 2013-05-25 at 6.02.01 PM

Before we can download Twitter data, we’ll need to prove to Twitter that we are in fact authorized to do so. I refer the interested reader to the post Twitter OAuth FAQ for instructions on how to setup an application with Once we’ve setup an application with Twitter we can write some R code to communicate with Twitter about our application and get the data we want. Code from the post Authorize a Twitter Data request in R, specifically the keyValues() function, will be used in this post to handle our authentication needs when requesting data from Twitter.

## Install R packages

## Load R packages

## Set decimal precision

## OAuth application values
oauth <- data.frame(consumerKey='YoUrCoNsUmErKeY',consumerSecret='YoUrCoNsUmErSeCrEt',accessToken='YoUrAcCeSsToKeN',accessTokenSecret='YoUrAcCeSsToKeNsEcReT')

keyValues <- function(httpmethod,baseurl,par1a,par1b){  	
# Generate a random string of letters and numbers
string <- paste(sample(c(letters[1:26],0:9),size=32,replace=T),collapse='') # Generate random string of alphanumeric characters
string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters
# Get the current GMT system time in seconds 
timestamp <- as.character(floor(as.numeric(as.POSIXct(Sys.time(),tz='GMT'))))
# Percent encode parameters 1
#par1 <- '&resources=statuses'
par2a <- gsub(',','%2C',par1a,perl=TRUE) # Percent encode par
par2b <- gsub(',','%2C',par1b,perl=TRUE) # Percent encode par
# Percent ecode parameters 2
# Order the key/value pairs by the first letter of each key
ps <- paste(par2a,'oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2b,sep='')
ps2 <- gsub('%','%25',ps,perl=TRUE) 
ps3 <- gsub('&','%26',ps2,perl=TRUE)
ps4 <- gsub('=','%3D',ps3,perl=TRUE)
# Percent encode parameters 3
url1 <- baseurl
url2 <- gsub(':','%3A',url1,perl=TRUE) 
url3 <- gsub('/','%2F',url2,perl=TRUE) 
# Create signature base string
signBaseString <- paste(httpmethod,'&',url3,'&',ps4,sep='') 
# Create signing key
signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')
# oauth_signature
osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
osign641 <- base64(osign,encode=TRUE,mode='character')
osign642 <- gsub('/','%2F',osign641,perl=TRUE)
osign643 <- gsub('=','%3D',osign642,perl=TRUE)
osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

Next, we need to figure out what kind of Twitter data we want to download. The Twitter REST API v1.1 Resources site provides a useful outline of what kind of data we can get from Twitter. Just read what is written under the Description sections. As an example, let’s download some user tweets. To do this, we find and consult the specific Resource on the REST API v1.1 page that corresponds with the action we want, here GET statuses/user_timeline. The resource page lists and describes the download options available to the task of getting tweets from a specific user, the thing we want to do, so it’s worth it to the reader to check it out.

Here we download the 100 most recent tweets (and re-tweets) made by the user ‘Reuters’.

## Download user tweets
# Limited to latest 200 tweets
# Specify user name
user <- 'Reuters'
kv <- keyValues(httpmethod='GET',baseurl='',par1a='count=100&include_rts=1&',par1b=paste('&screen_name=',user,sep=''))
theData1 <- fromJSON(getURL(paste(kv$bu,'?','oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0','&',kv$p,sep='')))

At this point in the post you should have the 100 most recent tweets made by the user ‘Reuters’ as well as values on several variables recorded by Twitter on each tweet. These are stored in a list data structure. You are now free to do list things to these data to explore what it is you have.

For instance, let’s see the tweets.

theData2 <- unlist(theData1)
tweets <- theData2[names(theData2)=='text']

Periodically Run an R Script as a Background Process using launchd under OSX


Computers are great at doing repetitive things a lot, so why deprive them of doing what they do best by manually re-running the same code every night? Here we create a simple bash script to execute an R script and define a *.plist so that launchd, under OSX, can run it periodically. The *.plist code given here is configured to run a shell script every day at 8:00 PM.

R script

xBar <- mean(c(1,2,1,21,2,3,2))

Save code as rCode.R to the ~/ directory.

Bash shell script

/usr/bin/Rscript ~/rCode.R

Save code as file to the ~/ directory.

Execute chmod +x in the terminal to make this file runnable.

*.plist file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">

Save code as com.rTask.plist file under path ~/Library/LaunchAgents.

Run the command launchctl load ~/Library/LaunchAgents/com.rTask.plist.

In this way, we should now be able to periodically run an R script automatically. Fun, huh? For a more lengthy description of how to do this and how it all works see the Creating Launch Daemons and Agents section of the Daemons and Services Programming Guide under the Mac Developer Library at the website.

How to plot a network subgraph on a network graph using R

Here is an example of how to highlight the members of a subgraph on a plot of a network graph.

## Load R libraries

# Set adjacency matrix
g <- matrix(c(0,1,1,1, 1,0,1,0, 1,1,0,1, 0,0,1,0),nrow=4,ncol=4,byrow=TRUE)

# Set adjacency matrix to graph object
g <- graph.adjacency(g,mode="directed")

# Add node attribute label and name values
V(g)$name <- c("n1","n2","n3","n4")

# Set subgraph members
c <- c("n1","n2","n3")

# Add edge attribute id values
E(g)$id <- seq(ecount(g))

# Extract supgraph
ccsg <- induced.subgraph(graph=g,vids=c)

# Extract edge attribute id values of subgraph
ccsgId <- E(ccsg)$id

# Set graph and subgraph edge and node colors and sizes
E(g)[ccsgId]$color <- "#DC143C" # Crimson
E(g)[ccsgId]$width <- 2
V(g)$size <- 4
V(g)$color="#00FFFF" # Cyan
V(g)$label.color="#00FFFF" # Cyan
V(g)$label.cex <-1.5
V(g)[c]$label.color <- "#DC143C" # Crimson
V(g)[c]$color <- "#DC143C" # Crimson

# Set seed value

# Set layout options
l <- layout.fruchterman.reingold(g)

# Plot graph and subgraph

Simple, no?

Return all Column Names that End with a Specified Character using regular expressions in R

With the R functions grep() and names(), you can identify the columns of a matrix that meet some specified criteria.

Say we have the following matrix,


Screen Shot 2012-12-23 at 5.19.37 PM

To return only those columns that end with a character (e.g., the number 1) submit the R command grep(pattern=".[1]",x=names(x),value=TRUE) into the console. 

Screen Shot 2012-12-23 at 5.19.28 PM