RCurl, the R package that wraps libcurl, provides an integrated set of tools for interacting with remote servers. While it provides a number of useful functions, it still lacks a few sorely missed options (e.g., retry). Of course, it’s still possible to write some of these missing functions in R and use them to extend the RCurl package, but it might just be easier to use the better maintained and fully functional curl program that comes with your computer. Under Mac OS X, the native curl program can be called from R with the function system().
For instance, we can serially download and save webpages (and retry the process if it fails) by using the following R syntax.
for (i in 1:n) {  # n: number of pages to fetch, defined beforehand
  # wait = TRUE blocks until this curl call has finished
  system(paste0("curl http://www.google.com --retry 999 --output sitePage", i, ".html"), wait = TRUE)
}
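When the URL or file name comes from a variable, it can help to quote the arguments and to check whether curl actually succeeded. The following is a minimal sketch, not a definitive implementation: the helper names build_curl_cmd and fetch_page and the retry count of 5 are assumptions made for illustration. It relies on system() returning the command's exit status when wait = TRUE.

```r
# Hypothetical helper: assemble the curl command line for one download.
# shQuote() protects URLs and file names that contain spaces or shell characters.
build_curl_cmd <- function(url, dest, retries = 5) {
  paste("curl", shQuote(url), "--retry", retries, "--output", shQuote(dest))
}

# Hypothetical helper: run the download and report whether curl succeeded.
fetch_page <- function(url, dest, retries = 5) {
  status <- system(build_curl_cmd(url, dest, retries), wait = TRUE)
  status == 0  # curl exits with status 0 on success
}

# Example usage (needs network access, so it is left commented out):
# ok <- fetch_page("http://www.google.com", "sitePage1.html")
```

Keeping the command construction in its own function makes it easy to print and inspect the exact command before running it.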
Similarly, we can use some simple R syntax to asynchronously download a number of webpages. For instance,
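for (i in 1:n) {  # n: number of pages to fetch, defined beforehand
  # wait = FALSE returns immediately, so the downloads run in the background
  system(paste0("curl http://www.yahoo.com --retry 999 --output sitePage", i, ".html"), wait = FALSE)
}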
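Because wait = FALSE returns before curl has finished, R has no direct way of knowing when the files are ready. One possible workaround, sketched here under the assumption of a Unix-like shell such as the one on Mac OS X, is to let curl write to a temporary ".part" file, rename it only on success, and poll for the final file names. The helper names async_curl_cmd and wait_for_files, the ".part" suffix, and the timeout values are all hypothetical.

```r
# Hypothetical helper: download to "<dest>.part", then rename on success,
# so the final file name only appears once curl has finished.
# Assumes a Unix-like shell (for && and mv).
async_curl_cmd <- function(url, dest, retries = 5) {
  part <- paste0(dest, ".part")
  paste("curl", shQuote(url), "--retry", retries, "--silent",
        "--output", shQuote(part), "&&", "mv", shQuote(part), shQuote(dest))
}

# Hypothetical helper: poll until every file in `paths` exists,
# or give up after `timeout` seconds. Returns TRUE or FALSE.
wait_for_files <- function(paths, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  while (!all(file.exists(paths))) {
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
  TRUE
}

# Example usage (needs network access, so it is left commented out):
# urls  <- c("http://www.yahoo.com", "http://www.google.com")
# dests <- paste0("sitePage", seq_along(urls), ".html")
# for (k in seq_along(urls)) system(async_curl_cmd(urls[k], dests[k]), wait = FALSE)
# all_done <- wait_for_files(dests, timeout = 120)
```

Without the rename step, a file can exist on disk while curl is still writing to it, so checking for the plain output name alone would not be a reliable completion signal.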
If you’re just downloading webpages, it’s easy enough to use the native curl program that comes with your computer: just use the R command system(). In this way, you can download some pages with curl and then parse the information from them at a later time.
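As a rough illustration of the "parse later" step, the saved files can be read back into R with readLines() once the downloads are done. The sketch below pulls out each page's title with a regular expression; the helper name page_title is hypothetical, and a regular expression is only a stand-in for a real HTML parser, which would be more robust on messy pages.

```r
# Hypothetical helper: return the <title> text of a saved HTML file,
# or NA if no title element is found.
page_title <- function(path) {
  html <- paste(readLines(path, warn = FALSE), collapse = " ")
  m <- regmatches(html, regexpr("<title[^>]*>[^<]*</title>", html, ignore.case = TRUE))
  if (length(m) == 0) return(NA_character_)
  trimws(gsub("<[^>]+>", "", m))  # strip the surrounding tags
}

# Example usage on the files saved by the loops above (left commented out):
# files  <- list.files(pattern = "^sitePage[0-9]+\\.html$")
# titles <- vapply(files, page_title, character(1))
```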