How to extract a network subgraph using R

In a previous post I wrote about highlighting a subgraph of a larger network graph. In response to that post, I was asked how to extract a subgraph from a larger graph while retaining all essential characteristics among the extracted nodes.

Vinay wrote:

Dear Will,
The code is well written and only highlights the members of a subgraph. I need to fetch them out from the main graph as a separate subgraph (including nodes and edges). Any suggestions please.

Thanks.

Extract subgraph
For a given list of subgraph members, we can extract their essential characteristics (i.e., tie structure and attributes) from a larger graph using the igraph function induced.subgraph(). For instance,

library(igraph)                   # Load R packages

set.seed(654654)                  # Set seed value, for reproducibility
g <- graph.ring(10)               # Generate a ring graph with 10 nodes
E(g)$label <- runif(10,0,1)       # Add a random edge attribute

# Plot graph
png('graph.png')
par(mar=c(0,0,0,0))
plot.igraph(g)
dev.off()

g2 <- induced.subgraph(g, 1:7)    # Extract subgraph

# Plot subgraph
png('subgraph.png')
par(mar=c(0,0,0,0))
plot.igraph(g2)
dev.off()
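As a quick check that induced.subgraph() really does retain attributes, the sketch below rebuilds the example above and inspects the result (the counts follow from inducing a subgraph on nodes 1 through 7 of a 10-node ring):

```r
library(igraph)

# Rebuild the example graph
set.seed(654654)
g <- graph.ring(10)              # Ring graph with 10 nodes and 10 edges
E(g)$label <- runif(10, 0, 1)    # Random edge attribute

# Extract the subgraph induced by nodes 1 through 7
g2 <- induced.subgraph(g, 1:7)

vcount(g2)       # 7: all seven requested nodes are retained
ecount(g2)       # 6: the ring ties 1-2, 2-3, ..., 6-7 among those nodes
E(g2)$label      # the edge attribute values carried over from g
```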

Graph
[plot of the full ring graph]

Subgraph
[plot of the extracted subgraph]

Managing Content with Nodes and Links: Why I won’t use NVivo 10 to Prepare for My Preliminary Exam

The preliminary exam in almost any graduate program requires organizing a tremendous amount of reading material. One of the preliminary exams I’m taking, the one in social science research methods, requires familiarity with over 40 unique sources spanning 18 distinct topics. That’s a reading list over four pages long! Furthermore, the exam requires that I write three ten-page papers over the course of three consecutive days on three unseen methodology questions, drawing from potentially any and all of the materials on the reading list. It’s a lot to read and a lot to recall. To organize the information in these reading materials, and to speed up my ability to recall topics, quotes, and my own notes on them, I tried using a qualitative data analysis (QDA) software system, a preparation strategy I really wish had worked better.

Generally speaking, QDA software systems allow researchers to organize qualitative data. These suites let researchers select content across a number of documents and classify their selections under different themes. Put slightly more technically, a researcher uses QDA software to select content across many documents, associates each selection with one or more nodes (i.e., themes), and can annotate both the content and the nodes. Content and themes created in this manner can then be relabeled and nested (or unnested) based on the sense-making of the person doing the research. Content selected in this way is recalled by simply double-clicking on the node associated with it. The result of all this work is a spidery network of content, which, as an organization method, offers some attractive qualities.

Organizing content, themes, and annotations by nodes and links is potentially a convenient, timesaving data organization strategy.

  • No longer must researchers copy and paste important quotes from their documents into separate files. Instead they can work directly on their documents and tag, with a flick of the mouse, whatever they think is important.
  • No longer must researchers work with long note outlines. Content is important, of course, but, when trying to make sense of a large collection of identified themes, content items can at times get in the way. Nodes and content are generally shown in QDA packages using separate windows, which simplifies the outline and allows for easier theme management. In this way a researcher can spend more time thinking about how their themes relate to one another and only look at quotes and annotations when they actually need them.
  • No longer must researchers keep track of page numbers. Each content item is tied to the original page of the document in which it was found. Page numbers, in this way, need only be written out by a researcher when they are actually ready to write about the content referenced. Front-loading page numbers is a lot of work, and needless work when the content items identified do not make it into the working document.

QDA software packages are a promising way in which researchers can spend more time reading and thinking about their content than explicitly managing it.

To investigate QDA, I looked into QSR International’s software suite NVivo 10. Plenty of great tutorials exist on how to use NVivo 10, put out both by QSR International and members of the NVivo community. For this reason, I’m going to spend more time talking about what I didn’t like about NVivo 10 than how to do specific things with it. I also offer suggestions as to how NVivo 10 could be made more effective.

Node Matrix Column Width Adjustments Change Other Column Widths

Before
[screenshot: node matrix column widths, before resizing]

After
[screenshot: node matrix column widths, after resizing]

  • Resizing the column ‘Created On’ resized all other columns as well, most notably the ‘References’ column. These other columns then need to be corrected, which requires extra work from the user. This needs to be fixed.

Node Column Names Misalign after Adjusting Column Widths in a Narrow UI Frame

Before
[screenshot: node column names, before resizing]

After
[screenshot: node column names, after resizing]

  • Resizing the column ‘Name’ misaligned the column names of the node matrix. Because of this, keeping track of column names now falls to the user, extra work they might prefer not to do. This needs to be fixed.

Can’t Easily Retrieve Content From a PDF File

Select and Associate Content with a Node

Open Node Frame

  • Content selected by ‘Region’ isn’t shown when opening an ‘Open Node…’ frame. Instead, the region coordinates of the selected content are shown. The point of retrieving content is to actually get the content, not a set of instructions as to where the content is located. Content retrieval executed in this way passes the burden of retrieval to the user.

View Selected Content


  • Selected content can be viewed in the ‘Open Node…’ frame under the ‘PDF’ tab. Even though the unselected part of the document is masked, the content is only found by scrolling around the document itself. This has the benefit of connecting the content with the page it’s on, but getting the content still requires the user to do the work. Is it possible, upon opening an ‘Open Node…’ frame, to generate and display an image of the region selected instead of the coordinates at which it is located? Or, alternatively, is it possible to make available the tools usually available when working with image files when working with PDF files?

Can’t ‘Insert Row’ Content with a PDF File

  • PDF files are not treated the same way as image files, though working with PDF files might be easier if they were.

Working with an Image File


  • When working with an image file, selected regions can be inserted into a table using the ‘Insert Row’ option. Doing this allows a user to more or less overlap a comment with a selected image region, a comment which can then be connected to a node and recalled as text when needed. In this way a scanned document can be coded, which, in a roundabout way, can include a PDF file, when the PDF file has first been exported to a collection of image files. This conversion process, however, is a lot of work when working with multiple PDF files, each containing multiple pages. Is it possible to include a bulk file conversion function with the software suite?

Annotations Not Displayed In-Line With Retrieved Content


  • Content annotations are retrieved and shown when opening an ‘Open Node…’ frame. However, because annotations are kept separate from the content items to which they refer, the user has to do added work when transferring their content to a working document. Instead of copying and pasting all retrieved content from an ‘Open Node…’ frame to a working document at once, the user must intersperse, in a manual piece-by-piece way, their annotations among the content items they transferred. It falls to the user to copy an annotation, search the working document for the annotation’s associated content item, and paste it into place. Why not give users the option, when viewing a node in an ‘Open Node…’ frame, to have their annotations inserted in-line with the content items themselves?

Can’t Quickly Unnode Content

  • Removing a selected content item from a node requires the user to first find the original content item they selected. This isn’t so bad, since the software mostly keeps track of this through nodes, but it still requires the user to find and select the item they originally selected, which can mean extra work on the part of the user. Why not let users deselect content from nodes through the ‘Open Node…’ frame?

Have to Readjust the UI for Every Document Opened

Adjusted Workspace
[screenshot: the workspace after manual adjustment]

New Workspace
[screenshot: the default workspace for a newly opened document]

  • Every time a user opens a document they must adjust the UI in order to work with it. With a lot of files, this means repeatedly doing the following sequence of steps: click ‘Click to edit’, move and readjust the frame, readjust the region-content table (when working with images), zoom in on the file viewed, and reselect ‘Nodes’ from the navigation pane. This is too much work. Why not make newly opened documents default to the last configuration specified by the user? Or, alternatively, why not implement a tab system of sorts where files can be opened into the user-adjusted workspace?

To be fair, NVivo 10 does everything I want a QDA application to do. It lets me select content, associate content with different nodes, nest and unnest nodes, modify node labels, annotate content and nodes, and, most importantly, recall everything with a simple mouse click. However, NVivo 10 falls short on content selection options and UI design, which in turn create extra work for the user. One scanned PDF document, for instance, can, when converted to a collection of images, require a user to manage about thirty separate files, files which demand a lot of unnecessary software fidgeting. This fidgeting with NVivo adds up fast and quickly outweighs the productivity gains of using the software. At this point in its development, I don’t recommend NVivo 10 to other graduate students looking for an effective means by which to better manage their preliminary exam materials.

Authorize a Twitter Data request in R

In order to get data from Twitter you have to let them know that you are in fact authorized to do so. The first step is to create an application with Twitter. The interested reader should explore the section “Where do I create an application?” under the FAQ for instructions on how to create a Twitter application. To authenticate a data request we simply need to send the appropriate credentials to Twitter along with our request. The site dev.twitter.com provides useful step-by-step guides on how to authenticate data requests (see specifically the documents creating a signature, percent encoding parameters, and authorizing a request), guides which we’ll explore here in R.

Install and load the R packages RCurl, bitops, digest, ROAuth and RJSONIO.

## Install R packages
install.packages('RCurl')
install.packages('bitops')
install.packages('digest')
install.packages('ROAuth')
install.packages('RJSONIO')

## Load R packages
library('bitops')
library('digest')
library('RCurl')
library('ROAuth')
library('RJSONIO')

Access your Twitter application’s oauth settings (available under the OAuth tools tab on your application page) and save them to a data frame object.

oauth <- data.frame(consumerKey='YoUrCoNsUmErKeY',
                    consumerSecret='YoUrCoNsUmErSeCrEt',
                    accessToken='YoUrAcCeSsToKeN',
                    accessTokenSecret='YoUrAcCeSsToKeNsEcReT')

Each time you request data from Twitter you will need to provide them with a unique, randomly generated string of alphanumeric characters.

string <- paste(sample(c(letters[1:26],0:9),size=32,replace=T),collapse='') # Generate a random string of alphanumeric characters
string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters

Generate and save the current GMT system time in seconds. Each Twitter data request must include the (approximate) time in seconds at which the request was made.

timestamp <- as.character(floor(as.numeric(as.POSIXct(Sys.time(),tz='GMT'))))
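To see what this produces, here is the same conversion applied to a fixed time (a hypothetical example date, rather than Sys.time()); as.numeric() on a POSIXct value gives seconds since 1970-01-01 UTC:

```r
# A fixed example time, for illustration only
t <- as.POSIXct('2013-05-26 00:00:00', tz='GMT')
as.character(floor(as.numeric(t)))   # "1369526400" seconds since the epoch
```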

Order the key/value pairs by the first letter of each key. In the current example you’ll notice that the labels to the left of the equal signs are in ascending order. Once ordered, we percent encode the string to create a parameter string.

# Percent encode parameters 1
par1 <- '&resources=statuses'
par2 <- gsub(',','%2C',par1,perl=TRUE) # Percent encode par

# Percent encode parameters 2
# Order the key/value pairs by the first letter of each key
ps <- paste('oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2,sep='')
ps2 <- gsub('%','%25',ps,perl=TRUE) 
ps3 <- gsub('&','%26',ps2,perl=TRUE)
ps4 <- gsub('=','%3D',ps3,perl=TRUE)
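The gsub() chain above hand-encodes just the characters that occur in these strings. As a sanity check on the encoding itself, base R's URLencode() (from the utils package) with reserved=TRUE performs the full percent encoding, and the two approaches agree on simple inputs:

```r
# Percent encode a small key/value string with base R
s <- 'a=b&c=d'
URLencode(s, reserved=TRUE)                   # "a%3Db%26c%3Dd"

# The same result via the post's gsub() approach
s2 <- gsub('&', '%26', gsub('=', '%3D', s))   # "a%3Db%26c%3Dd"
```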

The parameter string is then extended to include the HTTP method and the Twitter base URL so as to create a signature base string.

# Percent encode parameters 3
url1 <- 'https://api.twitter.com/1.1/application/rate_limit_status.json'
url2 <- gsub(':','%3A',url1,perl=TRUE) 
url3 <- gsub('/','%2F',url2,perl=TRUE) 

# Create signature base string
signBaseString <- paste('GET','&',url3,'&',ps4,sep='') 

We then create a signing key.

signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')

The signature base string and the signing key are used to create an oauth signature.

osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
osign641 <- base64(osign,encode=TRUE,mode='character')
osign642 <- gsub('/','%2F',osign641,perl=TRUE)
osign643 <- gsub('=','%3D',osign642,perl=TRUE)
osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

kv <- data.frame(nonce=nonce[1],timestamp=timestamp,osign=osign644[1])
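If you want to convince yourself that the HMAC-SHA1 step behaves as expected, digest's hmac() can be checked against a published test vector from RFC 2202 (this check is an aside, not part of the request flow):

```r
library(digest)   # provides hmac()

# RFC 2202, HMAC-SHA1 test case 2: key 'Jefe'
sig <- hmac(key='Jefe', object='what do ya want for nothing?',
            algo='sha1', serialize=FALSE)
sig   # "effcdf6ae5eb2fa2d27416d5f184df9c259a7c79"
```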

These results can then be passed to the function getURL() so as to download the desired Twitter data, such as status rate limits.

fromJSON(getURL(paste('https://api.twitter.com/1.1/application/rate_limit_status.json?resources=statuses&oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',sep='')))

Results
[screenshot: the returned rate-limit list]

To reduce repetition, I’ve wrapped the above code into an R function called keyValues.

keyValues <- function(httpmethod,baseurl,par1){	
# Generate a random string of letters and numbers
string <- paste(sample(c(letters[1:26],0:9),size=32,replace=T),collapse='') # Generate random string of alphanumeric characters
string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters

# Get the current GMT system time in seconds 
timestamp <- as.character(floor(as.numeric(as.POSIXct(Sys.time(),tz='GMT'))))

# Percent encode parameters 1
par2 <- gsub(',','%2C',par1,perl=TRUE) # Percent encode par

# Percent encode parameters 2
# Order the key/value pairs by the first letter of each key
ps <- paste('oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2,sep='')
ps2 <- gsub('%','%25',ps,perl=TRUE) 
ps3 <- gsub('&','%26',ps2,perl=TRUE)
ps4 <- gsub('=','%3D',ps3,perl=TRUE)

# Percent encode parameters 3
#url1 <- 'https://api.twitter.com/1.1/application/rate_limit_status.json'
url1 <- baseurl
url2 <- gsub(':','%3A',url1,perl=TRUE) 
url3 <- gsub('/','%2F',url2,perl=TRUE) 

# Create signature base string
signBaseString <- paste(httpmethod,'&',url3,'&',ps4,sep='') 

# Create signing key
signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')

# oauth_signature
osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
osign641 <- base64(osign,encode=TRUE,mode='character')
osign642 <- gsub('/','%2F',osign641,perl=TRUE)
osign643 <- gsub('=','%3D',osign642,perl=TRUE)
osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)

return(data.frame(hm=httpmethod,bu=baseurl,p=par1,nonce=nonce[1],timestamp=timestamp,osign=osign644[1]))
}

With this function, Twitter data requests now require less code on the user's end. Make sure to run the keyValues() function and the fromJSON() function within a few seconds of each other, or else Twitter won't respect your data request.

## Check rate limits
# Parameter options: help, users, search, statuses
kv <- keyValues(httpmethod='GET',baseurl='https://api.twitter.com/1.1/application/rate_limit_status.json',par1='&resources=statuses')

fromJSON(getURL(paste(kv$bu,'?','oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',kv$p,sep='')))

Download Twitter Data using JSON in R

Here we consider the task of downloading Twitter data using the R software package RJSONIO.


Before we can download Twitter data, we’ll need to prove to Twitter that we are in fact authorized to do so. I refer the interested reader to the post Twitter OAuth FAQ for instructions on how to set up an application with dev.twitter.com. Once we’ve set up an application with Twitter we can write some R code to communicate with Twitter about our application and get the data we want. Code from the post Authorize a Twitter Data request in R, specifically the keyValues() function, will be used in this post to handle our authentication needs when requesting data from Twitter.

## Install R packages
install.packages('bitops')
install.packages('digest')
install.packages('RCurl')
install.packages('ROAuth')
install.packages('RJSONIO')


## Load R packages
library('bitops')
library('digest')
library('RCurl')
library('ROAuth')
library('RJSONIO')
library('plyr')


## Set decimal precision
options(digits=22)


## OAuth application values
oauth <- data.frame(consumerKey='YoUrCoNsUmErKeY',
                    consumerSecret='YoUrCoNsUmErSeCrEt',
                    accessToken='YoUrAcCeSsToKeN',
                    accessTokenSecret='YoUrAcCeSsToKeNsEcReT')

keyValues <- function(httpmethod,baseurl,par1a,par1b){  	
# Generate a random string of letters and numbers
string <- paste(sample(c(letters[1:26],0:9),size=32,replace=T),collapse='') # Generate random string of alphanumeric characters
string2 <- base64(string,encode=TRUE,mode='character') # Convert string to base64
nonce <- gsub('[^a-zA-Z0-9]','',string2,perl=TRUE) # Remove non-alphanumeric characters
 
# Get the current GMT system time in seconds 
timestamp <- as.character(floor(as.numeric(as.POSIXct(Sys.time(),tz='GMT'))))
 
# Percent encode parameters 1
#par1 <- '&resources=statuses'
par2a <- gsub(',','%2C',par1a,perl=TRUE) # Percent encode par
par2b <- gsub(',','%2C',par1b,perl=TRUE) # Percent encode par
 
# Percent encode parameters 2
# Order the key/value pairs by the first letter of each key
ps <- paste(par2a,'oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',nonce[1],'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0',par2b,sep='')
ps2 <- gsub('%','%25',ps,perl=TRUE) 
ps3 <- gsub('&','%26',ps2,perl=TRUE)
ps4 <- gsub('=','%3D',ps3,perl=TRUE)
 
# Percent encode parameters 3
url1 <- baseurl
url2 <- gsub(':','%3A',url1,perl=TRUE) 
url3 <- gsub('/','%2F',url2,perl=TRUE) 
 
# Create signature base string
signBaseString <- paste(httpmethod,'&',url3,'&',ps4,sep='') 
 
# Create signing key
signKey <- paste(oauth$consumerSecret,'&',oauth$accessTokenSecret,sep='')
 
# oauth_signature
osign <- hmac(key=signKey,object=signBaseString,algo='sha1',serialize=FALSE,raw=TRUE)
osign641 <- base64(osign,encode=TRUE,mode='character')
osign642 <- gsub('/','%2F',osign641,perl=TRUE)
osign643 <- gsub('=','%3D',osign642,perl=TRUE)
osign644 <- gsub('[+]','%2B',osign643,perl=TRUE)
 
return(data.frame(hm=httpmethod,bu=baseurl,p=paste(par1a,par1b,sep=''),nonce=nonce[1],timestamp=timestamp,osign=osign644[1]))
}

Next, we need to figure out what kind of Twitter data we want to download. The Twitter REST API v1.1 Resources site provides a useful outline of the data we can get from Twitter; just read what is written under the Description sections. As an example, let's download some user tweets. To do this, we consult the specific resource on the REST API v1.1 page that corresponds with the action we want, here GET statuses/user_timeline. That resource page lists and describes the download options available for getting tweets from a specific user, so it's worth checking out.

Here we download the 100 most recent tweets (and re-tweets) made by the user ‘Reuters’.

## Download user tweets
# The API returns at most the latest 200 tweets per request
# Specify user name
user <- 'Reuters'
 
kv <- keyValues(httpmethod='GET',baseurl='https://api.twitter.com/1.1/statuses/user_timeline.json',par1a='count=100&include_rts=1&',par1b=paste('&screen_name=',user,sep=''))
 
theData1 <- fromJSON(getURL(paste(kv$bu,'?','oauth_consumer_key=',oauth$consumerKey,'&oauth_nonce=',kv$nonce,'&oauth_signature=',kv$osign,'&oauth_signature_method=HMAC-SHA1&oauth_timestamp=',kv$timestamp,'&oauth_token=',oauth$accessToken,'&oauth_version=1.0','&',kv$p,sep='')))

At this point in the post you should have the 100 most recent tweets made by the user ‘Reuters’, as well as the values of several variables Twitter records on each tweet. These are stored in a list data structure, so you are free to use R's list operations to explore what you have.

For instance, let’s see the tweets.

theData2 <- unlist(theData1)
names(theData2)
tweets <- theData2[names(theData2)=='text']

Periodically Run an R Script as a Background Process using launchd under OSX


Computers are great at doing repetitive things a lot, so why deprive them of doing what they do best by manually re-running the same code every night? Here we create a simple bash script to execute an R script and define a *.plist so that launchd, under OSX, can run it periodically. The *.plist code given here is configured to run a shell script every day at 8:00 PM.

R script

setwd('~/')
xBar <- mean(c(1,2,1,21,2,3,2))
write.csv(xBar,'rOutput.csv')

Save code as rCode.R to the ~/ directory.

Bash shell script

#!/bin/bash
/usr/bin/Rscript ~/rCode.R

Save code as rShellScript.sh file to the ~/ directory.

Execute chmod +x rShellScript.sh in the terminal to make this file runnable.

*.plist file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.rTask</string>
	<key>ProgramArguments</key>
	<array>
		<string>/Path/to/shell/script/rShellScript.sh</string>
	</array>
	<key>StartCalendarInterval</key>
	<dict>
		<key>Hour</key>
		<integer>20</integer>
		<key>Minute</key>
		<integer>0</integer>
	</dict>
</dict>
</plist>

Save code as com.rTask.plist file under path ~/Library/LaunchAgents.

Run the command launchctl load ~/Library/LaunchAgents/com.rTask.plist.

With that, launchd should now run our R script automatically every day at 8:00 PM. Fun, huh? For a lengthier description of how this all works, see the Creating Launch Daemons and Agents section of the Daemons and Services Programming Guide under the Mac Developer Library at developer.apple.com.

How to plot a network subgraph on a network graph using R

Here is an example of how to highlight the members of a subgraph on a plot of a network graph.

## Load R libraries
library(igraph)

# Set adjacency matrix
g <- matrix(c(0,1,1,1, 1,0,1,0, 1,1,0,1, 0,0,1,0),nrow=4,ncol=4,byrow=TRUE)

# Set adjacency matrix to graph object
g <- graph.adjacency(g,mode="directed")

# Add node attribute label and name values
V(g)$name <- c("n1","n2","n3","n4")

# Set subgraph members
c <- c("n1","n2","n3")

# Add edge attribute id values
E(g)$id <- seq(ecount(g))

# Extract subgraph
ccsg <- induced.subgraph(graph=g,vids=c)

# Extract edge attribute id values of subgraph
ccsgId <- E(ccsg)$id

# Set graph and subgraph edge and node colors and sizes
E(g)$color <- "grey"
E(g)$width <- 2
E(g)$arrow.size <- 1
E(g)$arrow.width <- 1
E(g)[ccsgId]$color <- "#DC143C" # Crimson
E(g)[ccsgId]$width <- 2
V(g)$size <- 4
V(g)$color <- "#00FFFF" # Cyan
V(g)$label.color <- "#00FFFF" # Cyan
V(g)$label.cex <- 1.5
V(g)[c]$label.color <- "#DC143C" # Crimson
V(g)[c]$color <- "#DC143C" # Crimson

# Set seed value
set.seed(40041)

# Set layout options
l <- layout.fruchterman.reingold(g)

# Plot graph and subgraph
plot.igraph(x=g,layout=l)

Simple, no?

Return all Column Names that End with a Specified Character using regular expressions in R

With the R functions grep() and names(), you can identify the columns of a data frame (or matrix) whose names meet some specified criteria.

Say we have the following data frame,

x <- data.frame(v1=c(1,2,3,4), v2=c(11,22,33,44), w1=c(1,2,3,4), w2=c(11,22,33,44))

  v1 v2 w1 w2
1  1 11  1 11
2  2 22  2 22
3  3 33  3 33
4  4 44  4 44

To return only those column names that end with a given character (e.g., the number 1), anchor the pattern to the end of the string with "$" and submit the R command grep(pattern="1$",x=names(x),value=TRUE) into the console.

[1] "v1" "w1"
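Putting it together, here is a minimal self-contained sketch (the column-subsetting line at the end is an extra illustration, not from the screenshots above):

```r
x <- data.frame(v1=c(1,2,3,4), v2=c(11,22,33,44),
                w1=c(1,2,3,4), w2=c(11,22,33,44))

# Column names ending in "1"; the "$" anchors the match to the end
grep(pattern='1$', x=names(x), value=TRUE)   # "v1" "w1"

# The matching positions can also be used to subset the columns themselves
x[, grep('1$', names(x))]
```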