Plotting Twitter Networks in R


My friends and my friends’ friends on Twitter. Can you guess which cluster is associated with the National Science Foundation (@NSF)?


How to plot a network subgraph on a network graph using R

Here is an example of how to highlight the members of a subgraph on a plot of a network graph.

# Load R libraries
library(igraph)

# Set adjacency matrix
g <- matrix(c(0,1,1,1, 1,0,1,0, 1,1,0,1, 0,0,1,0),nrow=4,ncol=4,byrow=TRUE)

# Convert adjacency matrix to graph object
g <- graph_from_adjacency_matrix(g,mode="directed")

# Add node name values (igraph also uses these as the plot labels)
V(g)$name <- c("n1","n2","n3","n4")

# Set subgraph members
c <- c("n1","n2","n3")

# Add edge attribute id values
E(g)$id <- seq(ecount(g))

# Extract subgraph
ccsg <- induced_subgraph(graph=g,vids=c)

# Extract edge attribute id values of subgraph
ccsgId <- E(ccsg)$id

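# Give every edge an explicit default style first
# (gray and a width of 1 are assumed values, not from the original post)
E(g)$color <- "#808080" # Gray
E(g)$width <- 1

# Set graph and subgraph edge and node colors and sizes
E(g)[ccsgId]$color <- "#DC143C" # Crimson
E(g)[ccsgId]$width <- 2
V(g)$size <- 4
V(g)$color <- "#00FFFF" # Cyan
V(g)$label.color <- "#00FFFF" # Cyan
V(g)$label.cex <- 1.5
V(g)[c]$label.color <- "#DC143C" # Crimson
V(g)[c]$color <- "#DC143C" # Crimson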

# Set seed value
set.seed(123) # arbitrary seed (assumed); any fixed integer makes the layout reproducible

# Set layout options
l <- layout_with_fr(g)

# Plot graph and subgraph
plot(g,layout=l)

Simple, no?
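The trick in the code above is the edge attribute id: extracting a subgraph renumbers its edges, so the stored id is what maps each subgraph edge back to the full graph. A minimal sketch of a sanity check follows, assuming a recent igraph (1.0 or later) where graph_from_adjacency_matrix(), induced_subgraph() and ends() are available. The names adj, members, sub_ids and endpoints are introduced here for illustration only.

```r
library(igraph)

# Rebuild the example graph from the post
adj <- matrix(c(0,1,1,1, 1,0,1,0, 1,1,0,1, 0,0,1,0), nrow = 4, byrow = TRUE)
g <- graph_from_adjacency_matrix(adj, mode = "directed")
V(g)$name <- c("n1", "n2", "n3", "n4")

# Store each edge's position in the full graph before subsetting
E(g)$id <- seq_len(ecount(g))

# Subgraph members and the full-graph ids of the edges between them
members <- c("n1", "n2", "n3")
sub_ids <- E(induced_subgraph(g, vids = members))$id

# End points (as vertex names) of the edges that would be highlighted
endpoints <- ends(g, E(g)[sub_ids])
print(endpoints)

# Every highlighted edge should start and end inside the subgraph
all_inside <- all(endpoints %in% members)
print(all_inside)
```

If all_inside is TRUE, the ids taken from the subgraph point at exactly the edges of the full graph that connect subgraph members, which is what the coloring step relies on.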

Return all Column Names that End with a Specified Character using regular expressions in R

With the R functions grep() and colnames() (or names() for a data frame), you can identify the columns of a matrix whose names meet some specified criteria.

Say we have the following matrix,


[Screenshot: the example matrix x]

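To return only those column names that end with a given character (e.g., the number 1), submit the R command grep(pattern="1$",x=colnames(x),value=TRUE) into the console. The $ anchors the match to the end of the name; without it, a pattern such as ".[1]" matches a 1 anywhere after the first character.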

[Screenshot: console output of the grep() command]
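Here is a minimal sketch of the idea, assuming a small made-up matrix (the values and column names below are invented for illustration and are not the ones from the screenshots). It compares the end-anchored pattern "1$" with the unanchored ".[1]", which also picks up a name like b12.

```r
# Hypothetical example matrix; values and column names are invented
x <- matrix(1:12, nrow = 3,
            dimnames = list(NULL, c("a1", "a2", "b1", "b12")))

# Column names whose last character is "1"
ends_in_1 <- grep(pattern = "1$", x = colnames(x), value = TRUE)
print(ends_in_1)

# Unanchored pattern: any character followed by "1", anywhere in the name
loose <- grep(pattern = ".[1]", x = colnames(x), value = TRUE)
print(loose)

# Use the matched names to pull out the columns themselves
x_sub <- x[, ends_in_1, drop = FALSE]
print(x_sub)
```

Note that colnames() is used because names() returns NULL for a plain matrix; for a data frame either function works.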

Accessing All the Curl Options under R

The native curl package in R, RCurl, provides an integrated set of tools for interacting with remote servers. While it offers a number of useful functions, it still lacks a few sorely missed options (e.g., retry). It is possible to write some of these missing functions in R and use them to extend RCurl, but it may be easier to use the better maintained and fully functional curl program that comes with your computer. Under Mac OS X, the native curl program can be called from R with the command system().

For instance, we can serially download and save webpages (and retry the process if it fails) by using the following R syntax.

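# urls is assumed to be a character vector holding the page addresses (not given in the original)
for(i in seq_along(urls)){
    system(paste("curl --retry 999 --output sitePage",as.character(i),".html ",shQuote(urls[i]),sep=""),wait=TRUE)
}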

Similarly, we can use some simple R syntax to asynchronously download a number of webpages. For instance,

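# urls is assumed to be a character vector holding the page addresses (not given in the original)
for(i in seq_along(urls)){
    system(paste("curl --retry 999 --output sitePage",as.character(i),".html ",shQuote(urls[i]),sep=""),wait=FALSE)
}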

If you’re just downloading webpages, it’s easy enough to use the native curl program that comes with your computer: just call it through the R command system(). In this way, you can download some pages with curl and then parse the information from them at a later time.
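One way to keep the loops above tidy is to build the command strings first and run them afterwards. The sketch below assumes a hypothetical helper, build_curl_command(), and placeholder example.com addresses. Only the curl flags --retry and --output come from the post. The addresses and file names are passed through shQuote() so that characters such as & or ? in a URL are not interpreted by the shell.

```r
# Hypothetical helper: build the curl command for the i-th page
build_curl_command <- function(url, i, retries = 999) {
  paste("curl --retry", retries,
        "--output", shQuote(paste0("sitePage", i, ".html")),
        shQuote(url))
}

# Placeholder addresses, for illustration only
urls <- c("https://example.com/a", "https://example.com/b")

cmds <- vapply(seq_along(urls),
               function(i) build_curl_command(urls[i], i),
               character(1))
print(cmds)

# To actually download, hand each command to system() (not run here).
# With wait = TRUE, system() returns curl's exit status; 0 means success.
# for (cmd in cmds) status <- system(cmd, wait = TRUE)
```

Printing the commands before running them makes it easy to spot a malformed address or file name without touching the network.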

Workflow with ESS, Knitr and R

With literate programming we can now embed R code into our working LaTeX documents. Literate programming, a cornerstone of reproducible research, is usually attributed to Donald Knuth; more information can be found on the literate programming website. This means that minor updates in code no longer require hours of copying and pasting output into our working document.

Of course, Yihui Xie’s knitr site gives you basically everything you need to get started. To set up knitr in R, install and load it as you would any other R package, with install.packages("knitr") and library("knitr").
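Once knitr is installed, a small document can be compiled straight from the R console. The following is a minimal sketch, assuming a throwaway example.Rnw file written to a temporary directory; the file name and its contents are invented for illustration. knit() turns the .Rnw file into a .tex file, which still needs a LaTeX run to become a PDF.

```r
library(knitr)

# A tiny, made-up Rnw document: one code chunk and one inline result
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<example-chunk, echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "The mean is \\Sexpr{mean(x)}.",
  "\\end{document}"
)
rnw_file <- file.path(tempdir(), "example.Rnw")
writeLines(rnw, rnw_file)

# Run the R code and weave the results into a LaTeX file
tex_file <- knit(rnw_file,
                 output = file.path(tempdir(), "example.tex"),
                 quiet = TRUE)
tex <- readLines(tex_file)
print(tex_file)
```

Changing the numbers in the chunk and knitting again updates both the printed output and the inline value, which is the time saving described above.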

To set up Emacs so that knitr can be used with ESS, follow the fine instructions provided by Simon Potter and the blog constantMindMapping.

The only real tip that I have to add is how to compile the *.Rnw file in Emacs. As shown on Simon Potter’s site, run the command M-n r; then, as described in Yihui Xie’s GitHub discussion “knitr support in other editors (Emacs/ESS/Vim/Eclipse…),” run M-n P. Note that this is a capital ‘P’.

The following documents were updated and compiled within seconds of each other! Given a more complicated set of statistical analyses, this kind of work could have taken hours and resulted in severe eyestrain or a headache.