Parse all Numbers from a String using regular expressions in R

The following command replaces every non-digit character in a string with an empty string. The remaining characters, a string of digits, are returned.

gsub(pattern="[^0-9]",replacement="",x=c("Manhattan, KS 66502"))


gsub(pattern="[^0-9]",replacement="",x=c("M1a2n3h4a5t6t7a8n9, KS 66502"))
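As a small sketch of what the two calls above return, the snippet below runs both inputs at once. It also shows one possible way to keep each run of digits as a separate number using the base R functions gregexpr() and regmatches(); the variable names are only illustrative.

```r
# The two example strings used above
x <- c("Manhattan, KS 66502", "M1a2n3h4a5t6t7a8n9, KS 66502")

# Drop every non-digit character; what is left is one string of digits per input
digits_only <- gsub(pattern = "[^0-9]", replacement = "", x = x)
print(digits_only)   # "66502" and "12345678966502"

# Alternative: keep each run of consecutive digits as its own element
numbers <- regmatches(x, gregexpr("[0-9]+", x))
print(numbers[[1]])  # "66502"
print(numbers[[2]])  # the single digits 1 to 9, followed by "66502"
```

Note that gsub() glues all digits together, so the second approach may be preferable when a string contains more than one number.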


Accessing All the Curl Options under R

RCurl, R’s interface to the curl library (libcurl), provides an integrated set of tools for interacting with remote servers. While it offers a number of useful functions, it still lacks a few sorely missed options (e.g., retry). It’s possible to write some of these missing functions in R and use them to extend RCurl, but it might just be easier to use the better maintained and fully functional curl program that comes with your computer. Under Mac OS X, the native curl program can be called from R using the command system().

For instance, we can download and save webpages one after another (retrying each download if it fails) by using the following R syntax, where n is the number of pages to download.

for(i in 1:n){
    system(paste("curl http://www.google.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=TRUE)
}

Similarly, setting wait=FALSE lets R start each download without waiting for the previous one to finish, so a number of webpages can be downloaded asynchronously. For instance,

for(i in 1:n){
    system(paste("curl http://www.yahoo.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=FALSE)
}

If you’re just downloading webpages, it’s easy enough to use the native curl program that comes with your computer; just call it through the R command system(). In this way, you can download some pages with curl and then parse the information from them at a later time.
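The command strings passed to system() in the loops above can be built and inspected before anything is downloaded. The following is a minimal sketch: curl_cmd is a hypothetical helper (not part of R or RCurl), and shQuote() is used here only as a precaution against special characters in the URL or file name.

```r
# Hypothetical helper: build the curl command line for page i
curl_cmd <- function(url, i, retries = 999) {
  out_file <- sprintf("sitePage%d.html", i)
  sprintf("curl %s --retry %d --output %s", shQuote(url), retries, shQuote(out_file))
}

# Build the commands for three pages and look at them first
cmds <- vapply(1:3, function(i) curl_cmd("http://www.google.com", i), character(1))
print(cmds)

# To actually run them one after another (requires curl and network access):
# for (cmd in cmds) system(cmd, wait = TRUE)
```

Printing the commands first makes it easy to spot a malformed URL or file name before hundreds of retries are triggered.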

Workflow with ESS, Knitr and R

With literate programming we can now embed R code in our working LaTeX documents. Literate programming, a foundation of reproducible research, is usually attributed to Donald Knuth; more information can be found on the literate programming website. This means that minor updates in code no longer require hours of copying and pasting output into our working document.

Of course, Yihui’s knitr site gives you basically everything you need to get started. To set up knitr in R, install and load it as you would any other R package: install.packages("knitr") and library("knitr").

To set up Emacs so that knitr can be used with ESS, follow the fine instructions provided by Simon Potter and the blog constantMindMapping.

The only real tip that I have to add is how to compile the *.Rnw file in Emacs. This can be done, as shown on Simon Potter’s site, by using the command M-n r, followed by the command M-n P, as shown in Yihui’s GitHub discussion “knitr support in other editors (Emacs/ESS/Vim/Eclipse…)”. Note that this is a capital ‘P’.
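Outside Emacs, the same kind of file can be processed from a plain R session. The sketch below writes a tiny, made-up *.Rnw document to a temporary directory and runs it through knitr::knit(), which produces a *.tex file; the file names and the chunk contents are assumptions chosen only for illustration, and the knit step is skipped if knitr is not installed.

```r
# A minimal, hypothetical Rnw document: LaTeX with one embedded R chunk
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<example, echo=TRUE>>=",
  "x <- c(1, 2, 3)",
  "mean(x)",
  "@",
  "The mean of x is \\Sexpr{mean(x)}.",
  "\\end{document}"
)
rnw_file <- file.path(tempdir(), "knitrExample.Rnw")
writeLines(rnw, rnw_file)

# Run the R chunks and write the resulting LaTeX file
if (requireNamespace("knitr", quietly = TRUE)) {
  tex_file <- knitr::knit(rnw_file,
                          output = file.path(tempdir(), "knitrExample.tex"),
                          quiet = TRUE)
  print(tex_file)
}
```

The resulting *.tex file can then be compiled with your usual LaTeX tools.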

The following documents were updated and compiled within seconds of each other! Given a more complicated set of statistical analyses, this kind of work could have taken hours and resulted in severe eyestrain or headache.

knitrExample

Map Attribute Data in R using GGPLOT2

Click IrishRepublicCounties for a PDF version of the above map

In this post I map some attribute data using the R software package GGPLOT2. The code primarily comes from James Cheshire’s excellent GIS site, Spatial Analysis. This site offers many resources, including a number of step-by-step tutorials on introductory and advanced GIS mapping in R. A couple of tutorials I found useful describe how to map attribute data in R. His code is concise and well written, but doesn’t explicitly describe how to map observations made on a continuous attribute variable at discrete levels. This is something standard with ArcGIS software and probably something people want to do, or, at least, are used to doing. Think of this code as a mix between Cheshire’s posts Creating a Map with R and Creating a map with R using ggplot2.

In the following code I essentially take the observed values on a continuous attribute variable, break them into intervals and plot them on a map of Irish Republic counties. The continuous variable used here was broken into five categories, though the code is modifiable enough to accommodate different numbers of break points. You’ll need to use the R packages maptools, ggplot2, RColorBrewer and classInt for this one.

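# Load the required packages
library(maptools)
library(ggplot2)
library(RColorBrewer)
library(classInt)

# Compute the 6 break values (5 lower bounds and 1 upper bound) that define 5 intervals
# i.e. [),[),[),[),[]
brks <- classIntervals(contVariable,n=5,style="quantile")

# Extract the break values and round them to two decimals
brks <- round(brks$brks,digits=2)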

# Save the interval (category) that each observed value of the continuous variable falls into
catVariable <- findInterval(contVariable,brks,all.inside=TRUE)

# Add attribute data to SpatialPolygonsDataFrame shapefile object
counties <- spCbind(counties,catVariable)

# Make SpatialPolygonsDataFrame shapefile object compatible with GGPLOT2
# The file poly_coords_function.r is available through Cheshire's "Creating a map with R using ggplot2" post
names(counties)[1] <- "ID"
source("~/poly_coords_function.r")
counties_geom <- poly_coords(counties)

# Plot and save map
map <- qplot(PolyCoordsY,PolyCoordsX,data=counties_geom,group=Poly_Name,fill=catVariable, geom="polygon")

# Create labels from break values
intLabels <- matrix(1:(length(brks)-1))
for(i in 1:length(intLabels )){intLabels [i] <- paste(as.character(brks[i]),"-",as.character(brks[i+1]))}

# Re-Map data
# Include a categorical legend
# Add the continuous break point label to legend				
map + scale_fill_gradientn(colours=brewer.pal(5, "Set2"),guide="legend",label=intLabels,name="contVariable name",min(contVariable),max(contVariable))
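The classification step above can be tried without any shapefile or extra package. The following is a minimal sketch using made-up data and base R only: quantile() stands in for classIntervals(..., style="quantile"), so the break values may differ slightly from those classInt would return.

```r
# Hypothetical continuous attribute, one value per county
set.seed(1)
contVariable <- runif(26, min = 0, max = 100)

# Six break values (five intervals) at the quintiles, rounded to two decimals
brks <- round(quantile(contVariable, probs = seq(0, 1, length.out = 6), names = FALSE), 2)

# Interval index (1 to 5) for each observation; all.inside=TRUE keeps the
# smallest and largest values inside the first and last interval
catVariable <- findInterval(contVariable, brks, all.inside = TRUE)

# Legend labels of the form "lower - upper", built without a loop
intLabels <- paste(head(brks, -1), "-", tail(brks, -1))

print(table(catVariable))
print(intLabels)
```

Building the labels with head() and tail() gives the same strings as the for loop above, one label per interval.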

Add Attribute Data to object of class SpatialPolygonsDataFrame in R

Here we discuss the R software package maptools, which lets us add attribute data to an object of class SpatialPolygonsDataFrame.

We can import GIS data, if stored as a shapefile, using the command gisData <- readShapePoly("NameOfShapeFile.shp").

We know it’s an object of class SpatialPolygonsDataFrame from the output given by entering the command class(gisData).

We can see the attributes of this data object by following the name of the object with an @ symbol and the word data, i.e. gisData@data. The output of this command should look like a typical data set from any introductory discussion of statistics, with variables in columns and subjects in rows. In fact, we can save information from this data object for later manipulation using a command such as gDVN <- gisData@data$VariableName.

At this point modifying the attribute data of a GIS object is easy. For instance, if you wanted to change the values of some variable from counts to percentages, you could divide each subject’s count on the variable of interest by the total number of counts. More concretely, we could divide the number of people in each area by the total population of the entire study region. This can be done using a command such as gisData@data$VariableName/n, where n is some number, possibly the total population size.

But what if we wanted to add a new attribute to the SpatialPolygonsDataFrame object? This can be done using the function spCbind from the R software package maptools. Something like gisData <- spCbind(gisData,1:nrow(gisData)), where nrow(gisData) is the number of rows in the attribute data matrix of your GIS object, should add a column of values, ranging from 1 to nrow(gisData) to your GIS data object attribute matrix.
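The attribute manipulations described above can be sketched on a tiny hand-built object instead of a shapefile. The example below assumes the sp package is installed (it is skipped otherwise); the two unit squares, the Population column and the PopShare column are all made up for illustration. It adds the new column by direct assignment to gisData@data, which is used here as a stand-in for spCbind.

```r
if (requireNamespace("sp", quietly = TRUE)) {
  # Hypothetical helper: a unit square starting at x0, with polygon ID id
  unit_square <- function(x0, id) {
    coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
    sp::Polygons(list(sp::Polygon(coords, hole = FALSE)), ID = id)
  }
  polys <- sp::SpatialPolygons(list(unit_square(0, "a"), unit_square(1, "b")))

  # Attribute table: row names must match the polygon IDs
  attrs <- data.frame(Population = c(200, 600), row.names = c("a", "b"))
  gisData <- sp::SpatialPolygonsDataFrame(polys, attrs)

  print(class(gisData))   # SpatialPolygonsDataFrame
  print(gisData@data)     # variables in columns, areas in rows

  # Counts to shares: divide each area's count by the regional total
  gisData@data$PopShare <- gisData@data$Population / sum(gisData@data$Population)

  # Add a simple index column, 1 to the number of rows
  gisData@data$Index <- seq_len(nrow(gisData@data))
  print(gisData@data)
}
```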

As always, for more information consult the help files, in this case, by entering the command ?spCbind into an active R session.

Import *.csv Adjacency Matrix with Row and Column Names into R

Here is some quick and dirty code for entering an adjacency matrix into R. These data were originally manipulated in Excel 2010 and saved as a comma-delimited *.csv file. The original file had sorted actor names down the first column and the same names along the first row. No value was present in the upper left-hand cell of the original data matrix.

# Load network data
# Expected format: adjacency matrix with corresponding row and column names
# Expected file type: *.csv
year1<-read.csv("networkData.csv",header=FALSE,stringsAsFactors=FALSE,sep=",")
rNames<-year1[-1,1]                                   # Get row names
cNames<-as.vector(as.character(year1[1,-1]))          # Get column names
year1<-apply(as.matrix(year1[-1,-1]),2,as.numeric)    # Get network matrix
year1[is.na(year1)]<-0                                # Set missing ties to 0
row.names(year1)<-rNames                              # Give row names
colnames(year1)<-cNames                               # Give column names
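For comparison, here is a self-contained sketch of a shorter route to the same kind of matrix. It writes a small, made-up three-actor file to a temporary location (the actor names and ties are hypothetical) and lets read.csv() pick up the row and column names directly through row.names=1 and check.names=FALSE.

```r
# Hypothetical adjacency matrix file with an empty upper left-hand cell
# and one missing tie (Ann -> Cat)
csv_lines <- c(",Ann,Bob,Cat",
               "Ann,0,1,",
               "Bob,1,0,1",
               "Cat,0,1,0")
csv_file <- file.path(tempdir(), "networkData.csv")
writeLines(csv_lines, csv_file)

# First row becomes the column names, first column becomes the row names
net <- as.matrix(read.csv(csv_file, header = TRUE, row.names = 1, check.names = FALSE))
net[is.na(net)] <- 0   # Set missing ties to 0

print(net)
```

check.names=FALSE matters when actor names contain spaces or other characters that R would otherwise alter in column names.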

CONvergence of iterated CORrelations (CONCOR) R function

KNOKE BUREAUCRACIES network data used here were obtained through UCINET software.

For an excellent method tutorial on blockmodeling with UCINET see the file Blockmodels.doc written by David Knoke and hosted on his SOC 8412 homepage.

# Stack matrices and matrix transposes
# as.matrix() keeps every piece a plain matrix so that rbind() stacks them by position
KNOKI <- as.matrix(read.table("KNOKI.txt",header=FALSE,sep=""))
KNOKM <- as.matrix(read.table("KNOKM.txt",header=FALSE,sep=""))
KNOKIT <- t(KNOKI)
KNOKMT <- t(KNOKM)
KNOK <- rbind(KNOKI,KNOKIT,KNOKM,KNOKMT)

# CONCOR function
# CONvergence of iterated CORrelations
# Creates a square matrix of correlations of the column pairs of a matrix
CONCOR <- function(mat){
    colN <- ncol(mat)
    X <- matrix(rep(0,times=colN*colN),nrow=colN,ncol=colN)
    for(i in 1:colN){
        for(j in i:colN){
            X[i,j] <- cor(mat[,i],mat[,j],method=c("pearson"))
        }
    }
    X <- X+(t(X)-diag(diag((X))))
    return(X)
}

KNOKSIM <- CONCOR(KNOK)
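The function above computes a single round of column correlations. A minimal sketch of the iteration that gives CONCOR its name follows; it is an illustration under stated assumptions, not the UCINET implementation. concor_iterate, max_iter and tol are hypothetical names, the data are simulated, and cor() applied to a matrix is used because it returns the same pairwise column correlations as the double loop.

```r
# Hypothetical sketch: correlate the columns, then keep correlating the
# resulting correlation matrix until it stops changing
concor_iterate <- function(mat, max_iter = 100, tol = 1e-8) {
  X <- cor(mat, method = "pearson")
  for (k in seq_len(max_iter)) {
    X_new <- cor(X, method = "pearson")
    converged <- max(abs(X_new - X)) < tol
    X <- X_new
    if (converged) break
  }
  X
}

# Simulated data: columns 1-3 follow one profile, columns 4-6 another
set.seed(42)
a <- rnorm(40)
b <- rnorm(40)
mat <- cbind(a + rnorm(40, sd = 0.3), a + rnorm(40, sd = 0.3), a + rnorm(40, sd = 0.3),
             b + rnorm(40, sd = 0.3), b + rnorm(40, sd = 0.3), b + rnorm(40, sd = 0.3))

M <- concor_iterate(mat)
print(round(M, 2))
```

In runs like this the entries typically move toward +1 and -1, and the sign pattern of any row can then be read as a split of the columns into two blocks.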

Making that Simple Red Sauce

The following tomato sauce recipe is an adaptation from the Goerl Family Menu, a delightful and witty ongoing food adventure, an exploration in the meaning and method of taste. Eat this dish knowing that those who made it before you cared about what they were doing and, at every step of the journey, sought to achieve a truly delightful experience.

Ingredients

  • 3 to 4 cloves of garlic, pressed
  • 1 x 6 oz can tomato paste
  • 2 x 28 oz cans whole peeled tomatoes, destemmed and pureed
  • 1 x large white onion, diced
  • 3 x tablespoons olive oil
  • s and p with command

Short directions

  • Do not burn the onions
  • Do not burn the garlic
  • Do not burn the olive oil
  • Simmer ingredients no less than 8 hours

Long directions

  • Add olive oil into a large pot and set heat to medium. Do not burn the olive oil.
  • Add onions to the olive oil and set the temperature to simmer.
  • Stir until the onions reach a golden orange-brown color. I suggest repeatedly stirring the onions until they reach an appetizing color, that is, if you’re worried about burning the onions. Do not burn the onions.
  • Press the garlic and, halfway through cooking the onions, say when the onions reach an early level of caramelization, add it to the onions.
  • Stir the mixture until the onions reach the right color and the garlic appears relaxed. Do not burn the garlic.
  • Create a well among the garlic and onions and add the tomato paste. Stir the tomato paste around the center of the pan, bringing it up to temperature, and then slowly mix in the onions and garlic. (Tomato paste instructions come in part from Gordon Ramsay’s Bolognaise recipe on Cookalong Live).
  • Add pureed tomatoes and an equal amount of water. Turn heat up to high.
  • As soon as the mixture begins to boil, set heat to simmer.
  • Stir occasionally and add water as necessary over the next eight hours.
  • Add salt and pepper with command after the first reduction.
  • Serve over pasta and enjoy!


Manhattan Farmers Market: Sunshine Baking Collective

Sunshine Baking Collective (SBC) brings wholesome, freshly baked goods to the Manhattan, Kansas Downtown Farmers Market. On most Saturdays, from 8:30 am to 1:00 pm, you can find an assortment of locally produced oven treats ranging from French-style breads to vegan cinnamon rolls, guaranteed to excite and satisfy the palate, or, as they say, they’ll try better next time. SBC provides a unique opportunity for the Manhattan community to sample new flavors, enjoy classic favorites, and eat a little better. If some Saturday you find yourself on the corner of 5th and Humboldt, why not stop by Sunshine Baking Collective and treat yourself to something truly worth eating?

Dates attended and items sold by SBC at the Downtown Farmers Market:

05/14/2011: French-style Batards, Zesty Chocolate Cupcakes, Cinnamon Rolls, Chocolate Chip Cookies, and Peanut Butter Cookies.

05/21/2011: Raisin Challahs, French-style Boules and Batards, Peanut Butter Chocolate Cupcakes, Biscotti, English Muffins and Cowboy Cookies.

06/04/2011: French-style Batards and Chocolate Chunk Peanut Butter Cookies.

06/18/2011: Demi-baguettes, Pecan Cashew Granola, Carrot Cake Cookies, Blueberry Gingerbread Chai Bars, English Muffins, Bagels, Bran Muffins, and Black Coffee.

07/02/2011: Chocolate Orange Apricot Croissants, Plain Croissants, Pecan Cashew Granola, Zucchini Carrot Quick Bread, and Oatmeal Cinnamon Raisin Cookies.


Want to know more about the Manhattan, Kansas Downtown Farmers Market? Check out the following informative links.

Read the official Downtown Farmers Market website [click here]

Listen to The Community Bridge broadcast (29:30) about this year’s farmers market [click here]