Accessing All the Curl Options under R

The native curl package in R, RCurl, provides an integrated set of tools for interacting with remote servers, to say the least. While it provides a number of useful functions, it still lacks a few sorely missed options (e.g., retry). Of course, it’s still possible to write some of these missing functions in R, which can then be used to expand the functionality of the RCurl package, but, on the other hand, it might just be easier to use the better maintained and fully functional curl program that comes with your computer. Under Mac OS X, the native curl program can be accessed in R using the command system().

For instance, we can serially download and save webpages (and retry the process if it fails) by using the following R syntax.

for(i in 1:n){
    system(paste("curl http://www.google.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=TRUE)
}

Similarly, we can use some simple R syntax to asynchronously download a number of webpages. For instance,

for(i in 1:n){
    system(paste("curl http://www.yahoo.com --retry 999 --output sitePage",as.character(i),".html",sep=""),wait=FALSE)
}

If you’re just downloading webpages, it’s easy enough to use the native curl program that comes with your computer–just use the R command system(). In this way, you can download some pages with curl and then parse the information from them at a later time.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s