Workflow with ESS, Knitr and R

With literate programming we can embed R code directly in our working LaTeX documents. Literate programming, often discussed alongside reproducible research, is usually attributed to Donald Knuth; more information can be found on the literate programming website. This means that minor updates to code no longer require hours of copying and pasting output into our working document.

Of course, Yihui Xie’s knitr site gives you basically everything you need to get started. To set up knitr in R, install and load it as you would any other R package, with install.packages("knitr") and library("knitr").
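As a minimal sketch of what an R chunk inside a LaTeX document looks like, the code below writes a tiny *.Rnw file and knits it from the R prompt. The file names, the chunk label and the toy data are made up for illustration, and knitr is assumed to be installed already.

```r
library(knitr)

# A tiny *.Rnw document: LaTeX with one R chunk and one inline \Sexpr{}
rnwLines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<mean-example, echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "The mean is \\Sexpr{mean(x)}.",
  "\\end{document}"
)

# Hypothetical file locations in a temporary directory
rnwPath <- file.path(tempdir(), "example.Rnw")
texPath <- file.path(tempdir(), "example.tex")
writeLines(rnwLines, rnwPath)

# knit() runs the chunks and writes a *.tex file that LaTeX can compile
knit(rnwPath, output = texPath, quiet = TRUE)
texLines <- readLines(texPath)
```

Changing the values in x and knitting again updates both the chunk output and the sentence that reports the mean, which is the point of the whole workflow.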

To set up Emacs so that knitr can be used with ESS, follow the fine instructions provided by Simon Potter and the blog constantMindMapping.

The only real tip I have to add is how to compile the *.Rnw file in Emacs. As shown on Simon Potter’s site, first run the command M-n r. Then, as shown in Yihui’s GitHub discussion “knitr support in other editors (Emacs/ESS/Vim/Eclipse…),” run M-n P. Note that this is a capital ‘P’.

The following documents were updated and compiled within seconds of each other! Given a more complicated set of statistical analyses, this kind of work could have taken hours and resulted in severe eyestrain or a headache.



Map Attribute Data in R using GGPLOT2

Click IrishRepublicCounties for a PDF version of the above map.

In this post I map some attribute data using the R software package GGPLOT2. The code primarily comes from James Cheshire’s excellent GIS site, Spatial Analysis. This site offers many resources, including a number of step-by-step tutorials on introductory and advanced GIS mapping in R. A couple of the tutorials I found useful describe how to map attribute data in R. His code is concise and well written, but doesn’t explicitly describe how to map observations made on a continuous attribute variable at discrete levels. This is standard in ArcGIS software and probably something people want to do or, at least, are used to doing. Think of this code as a mix between Cheshire’s posts Creating a Map with R and Creating a map with R using ggplot2.

In the following code I take the observed values of a continuous attribute variable, break them into intervals and plot them on a map of Irish Republic counties. The continuous variable used here was broken into five categories, though the code can be modified to accommodate a different number of break points. You’ll need the R packages maptools, ggplot2, RColorBrewer and classInt for this one.

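# Load the packages used below
library(maptools)
library(ggplot2)
library(RColorBrewer)
library(classInt)

# Identify and save 5 lower and 1 upper break values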
# i.e. [),[),[),[),[]
brks <- classIntervals(contVariable,n=5,style="quantile")    
# Save and extract break values       												
# Round values to two decimals
brks <- round(brks$brks,digits=2) 						

# Save categorical break value for each observed values of the continuous variable
catVariable <- findInterval(contVariable,brks,all.inside=TRUE)

# Add attribute data to SpatialPolygonsDataFrame shapefile object
counties <- spCbind(counties,catVariable)

# Make SpatialPolygonsDataFrame shapefile object compatible with GGPLOT2
# The function poly_coord_function.r is available through Cheshire's "Creating a map with R using ggplot2" post
names(counties)[1] <- "ID"
counties_geom <- poly_coords(counties)

# Plot and save map
map <- qplot(PolyCoordsY,PolyCoordsX,data=counties_geom,group=Poly_Name,fill=catVariable, geom="polygon")

# Create labels from break values
intLabels <- matrix(1:(length(brks)-1))
for(i in 1:length(intLabels )){intLabels [i] <- paste(as.character(brks[i]),"-",as.character(brks[i+1]))}

# Re-Map data
# Include a categorical legend
# Add the continuous break point labels to the legend
# catVariable takes the values 1 to 5, so those are used as the legend breaks
map + scale_fill_gradientn(colours=brewer.pal(5, "Set2"),guide="legend",breaks=1:5,labels=intLabels,name="contVariable name")
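The binning and labelling steps above can be tried without a shapefile. The sketch below uses made-up values for contVariable and base R's quantile() as a stand-in for classIntervals(..., style="quantile"), so the break values may not match what classInt returns in every case.

```r
# Made-up values standing in for contVariable (one value per county)
contVariable <- c(12.4, 3.1, 25.7, 8.8, 19.0, 30.2, 5.5, 14.9, 22.3, 27.6)

# Six quantile break values define five classes; quantile() is used here
# as a stand-in for classIntervals(), which is an assumption of this sketch
brks <- quantile(contVariable, probs = seq(0, 1, length.out = 6), names = FALSE)
brks <- round(brks, digits = 2)

# all.inside=TRUE keeps the maximum, which sits on the top break value,
# in class 5 instead of opening a sixth class
catVariable <- findInterval(contVariable, brks, all.inside = TRUE)

# One "lower - upper" label per class, built without a loop
intLabels <- paste(head(brks, -1), "-", tail(brks, -1))

print(table(catVariable))
print(intLabels)
```

Changing length.out to the number of classes plus one is all that is needed to use a different number of break points.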

Add Attribute Data to object of class SpatialPolygonsDataFrame in R

Here we discuss the R software package maptools, which lets us add attribute data to an object of class SpatialPolygonsDataFrame.

We can import GIS data, if stored as a shapefile, using the command gisData <- readShapePoly("NameOfShapeFile.shp").

We know it’s an object of class SpatialPolygonsDataFrame from the output given by entering the command class(gisData).

We can see the attributes of this data object by following the name of the object with an @ symbol and the word data, i.e. gisData@data. The output of this command should look like a typical data set from any introductory discussion of statistics, with variables along columns and subjects across rows. In fact, we can save information from this data object for later manipulation using a command such as gDVN <- gisData@data$VariableName.

At this point modifying the attribute data of a GIS object is easy. For instance, if you wanted to change the values of some variable from counts to percentages, you could divide each subject’s count on the variable of interest by the total number of counts. More concretely, we could divide the number of people in each area by the total population of the entire study region. This can be done using a command such as gisData@data$VariableName/n, where n is some number, possibly the total population size.
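As a rough illustration of the counts-to-percentages idea, the sketch below uses a plain data frame as a stand-in for gisData@data, so it runs without any GIS packages. The column names and population counts are made up.

```r
# Hypothetical attribute table standing in for gisData@data
attrData <- data.frame(ID = 1:4, Population = c(1200, 800, 1500, 500))

# Total population of the (made-up) study region
n <- sum(attrData$Population)

# Each area's share of the total, expressed as a percentage
attrData$PopPercent <- 100 * attrData$Population / n

print(attrData)
```

With a real SpatialPolygonsDataFrame the same arithmetic would be applied to gisData@data$VariableName.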

But what if we wanted to add a new attribute to the SpatialPolygonsDataFrame object? This can be done using the function spCbind from the R software package maptools. Something like gisData <- spCbind(gisData,1:nrow(gisData)), where nrow(gisData) is the number of rows in the attribute data matrix of your GIS object, should add a column of values, ranging from 1 to nrow(gisData) to your GIS data object attribute matrix.

As always, for more information consult the help files, in this case, by entering the command ?spCbind into an active R session.

Import *.csv Adjacency Matrix with Row and Column Names into R

Here is some quick and dirty code for entering an adjacency matrix into R. These data were originally manipulated in Excel 2010 and saved as a comma-delimited *.csv file. The original file had sorted actor names down the first column and the same names along the first row. No value was present in the upper left-hand cell of the original data matrix.

# Load network data
# Expected format: adjacency matrix with corresponding row and column names
# Expected file type: *.csv
# "year1.csv" is a placeholder file name; header=FALSE keeps the name row as data
year1<-read.csv("year1.csv",header=FALSE,stringsAsFactors=FALSE)
rNames<-year1[-1,1]                                   # Get row names
cNames<-as.character(unlist(year1[1,-1]))             # Get column names
year1<-apply(as.matrix(year1[-1,-1]),2,as.numeric)    # Get network matrix
year1[is.na(year1)]<-0                                # Set missing ties to 0
row.names(year1)<-rNames                              # Give row names
colnames(year1)<-cNames                               # Give column names
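To try these steps without an Excel export, the sketch below writes a three-actor adjacency matrix to a temporary *.csv file in the layout described above (empty upper left-hand cell, blank cells for missing ties) and reads it back. The actor names, the ties and the object names raw and net are made up for illustration.

```r
# Write a small made-up adjacency matrix to a temporary *.csv file
csvPath <- tempfile(fileext = ".csv")
writeLines(c(",Ann,Bob,Cat",
             "Ann,0,1,",
             "Bob,1,0,1",
             "Cat,,1,0"), csvPath)

# header=FALSE keeps the row of actor names as the first data row
raw <- read.csv(csvPath, header = FALSE, stringsAsFactors = FALSE)

rNames <- raw[-1, 1]                              # Actor names down column 1
cNames <- as.character(unlist(raw[1, -1]))        # Actor names along row 1
net <- apply(as.matrix(raw[-1, -1]), 2, as.numeric)
net[is.na(net)] <- 0                              # Blank cells become 0
rownames(net) <- rNames
colnames(net) <- cNames

print(net)
```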

CONvergence of iterated CORrelations (CONCOR) R function

KNOKE BUREAUCRACIES network data used here were obtained through UCINET software.

For an excellent method tutorial on blockmodeling with UCINET see the file Blockmodels.doc written by David Knoke and hosted on his SOC 8412 homepage.

# Read the matrices that will be stacked with their transposes
KNOKI <- read.table("KNOKI.txt",header=FALSE,sep="")
KNOKM <- read.table("KNOKM.txt",header=FALSE,sep="")

# CONCOR function
# CONvergence of iterated CORrelations
# Creates a square matrix of correlations of the column pairs of a matrix
CONCOR <- function(mat){
    colN <- ncol(mat)
    X <- matrix(0,nrow=colN,ncol=colN)
    for(i in 1:colN){
        for(j in i:colN){
            X[i,j] <- cor(mat[,i],mat[,j],method="pearson")
        }
    }
    # Mirror the upper triangle into the lower triangle
    X <- X+(t(X)-diag(diag(X)))
    return(X)
}
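The "iterated" part of CONCOR means feeding the correlation matrix back into the correlation step until the entries settle near +1 or -1. The sketch below is one way this might look, using base R's cor(), which returns the same column-by-column Pearson correlation matrix. The helper name iterateCor, the tolerance and the six-actor network are assumptions made for illustration, not the KNOKE data.

```r
# Made-up 6-actor network with two cliques (actors 1-3 and actors 4-6)
mat <- matrix(c(0,1,1,0,0,0,
                1,0,1,0,0,0,
                1,1,0,0,0,0,
                0,0,0,0,1,1,
                0,0,0,1,0,1,
                0,0,0,1,1,0), nrow = 6, byrow = TRUE)

# Hypothetical helper: correlate the columns repeatedly until the
# largest change between two passes falls below tol
iterateCor <- function(mat, maxIter = 100, tol = 1e-8) {
    X <- cor(mat)
    for (k in seq_len(maxIter)) {
        Xnew <- cor(X)
        converged <- max(abs(Xnew - X)) < tol
        X <- Xnew
        if (converged) break
    }
    X
}

X <- iterateCor(mat)

# Actors positively correlated with actor 1 form block 1, the rest block 2
blocks <- ifelse(X[, 1] > 0, 1, 2)

print(round(X, 2))
print(blocks)
```

Splitting on the sign of one column gives a two-block partition; applying the same procedure within each block would split it further.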


Stats Plushies

My friend Nichole has been doing her part to make statistics a family-friendly activity. She sells stuffed distributions, embroidered bibs, and printed shirts guaranteed to make your child more statsy (looking). Right now she’s running a contest on her Etsy site for a chance to win a smiling statistical distribution.

A Chi-Squared distribution to warm your hearts and minds.