Reading Local Files
Loading flat files - read.table()
- this is the main function for reading data into R
- reads the data into RAM - large data sets can cause problems
- important parameters: file, header, sep, row.names, nrows
- Related functions: read.csv() (comma-separated, header = TRUE by default), read.csv2() (semicolon-separated, comma as the decimal mark)
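A minimal sketch of the basic call, assuming a small comma-separated file. The temporary file and its columns are made up for illustration and do not come from a real data set.

```r
# Hypothetical sample data written to a temporary file so the example is self-contained
csvFile <- tempfile(fileext = ".csv")
writeLines(c("id,city,speed",
             "1,Baltimore,35",
             "2,Towson,40",
             "3,Dundalk,30"), csvFile)

# read.table() defaults to whitespace separators and header = FALSE,
# so both must be set explicitly for a comma-separated file with a header row
cameraData <- read.table(csvFile, sep = ",", header = TRUE)

# read.csv() already uses sep = "," and header = TRUE
cameraData2 <- read.csv(csvFile)

head(cameraData)
```

For a plain file like this one, the two calls should return the same data frame; read.csv() is mostly a convenience wrapper with different defaults.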
Some more important parameters
quote - tells R which characters are used to quote values; quote="" means no quoting at all
na.strings - the string (or vector of strings) that represents a missing value
nrows - how many rows to read from the file
skip - number of lines to skip before starting to read
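The parameters above can be sketched together as follows. The file contents, the "unknown" marker for missing values, and the extra title line are assumptions chosen to exercise each argument.

```r
# Hypothetical file: one title line, a header row, then three data rows
rawFile <- tempfile(fileext = ".txt")
writeLines(c("Speed camera export",
             "id,street,speed",
             "1,O'Donnell St,35",
             "2,Main St,unknown",
             "3,Pratt St,30"), rawFile)

cameraData <- read.table(rawFile,
                         sep = ",",
                         header = TRUE,
                         skip = 1,               # ignore the title line
                         quote = "",             # treat the apostrophe as ordinary text
                         na.strings = "unknown", # read "unknown" as NA
                         nrows = 2)              # stop after two data rows

cameraData
```

Without quote="", read.table() would treat the apostrophe in O'Donnell as the start of a quoted value, which is a common cause of rows being merged or misread.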
Reading Excel Files
read.xlsx, read.xlsx2 are in library(xlsx)
write.xlsx() will write out an Excel file with similar arguments
The XLConnect package (see the XLConnect vignette) has more options for writing and manipulating Excel files.
In general, it is advisable to store your data in a database or in a .txt or .csv file, since these are easier to distribute.
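A rough sketch of a write and read round trip with the xlsx package, assuming the package and a working Java installation are available. The data frame and sheet name are hypothetical.

```r
library(xlsx)  # depends on rJava, so Java must be installed

# Hypothetical data frame used only for this illustration
cameraData <- data.frame(id = 1:3,
                         street = c("Main St", "Pratt St", "Light St"),
                         stringsAsFactors = FALSE)

xlsxFile <- tempfile(fileext = ".xlsx")
write.xlsx(cameraData, xlsxFile, sheetName = "cameras", row.names = FALSE)

# Read the whole first sheet
allRows <- read.xlsx(xlsxFile, sheetIndex = 1, header = TRUE)

# Read only part of the sheet: columns 1-2 and spreadsheet rows 1-3
# (row 1 is the header, so this yields two data rows)
subsetRows <- read.xlsx(xlsxFile, sheetIndex = 1,
                        colIndex = 1:2, rowIndex = 1:3)
```

colIndex and rowIndex are useful when only a block of a larger sheet is needed.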
Reading XML
library(XML) - the XML library
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode) - gets the name of the root node
Get the names of the nested elements under the root node: names(rootNode)
You can access parts of the XML document similar to how you access a list.
rootNode[[1]] - access the first element and its children
rootNode[[1]][[1]] - access the first child of the first element
xmlSApply(rootNode, xmlValue) - programmatically extract parts of the file; applies xmlValue to each child of the root node and returns the text of each child, with the text of its nested elements run together
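A small sketch of the list-style access described above. It parses an inline XML string (via asText = TRUE) instead of a file URL so that it runs on its own; the menu document is invented for illustration.

```r
library(XML)

# Hypothetical XML document, written without whitespace between elements
menuXml <- paste0(
  "<menu>",
  "<food><name>Waffles</name><price>5.95</price></food>",
  "<food><name>Toast</name><price>4.50</price></food>",
  "</menu>")

doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

rootName   <- xmlName(rootNode)          # name of the root element
childNames <- names(rootNode)            # names of the elements under the root
firstName  <- xmlValue(rootNode[[1]][[1]])  # text of the first child of the first element
allText    <- xmlSApply(rootNode, xmlValue) # text of each top-level child
```

Note that allText has one entry per food element, and each entry concatenates the text of that element's children.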
You can access information directly using XPath.
extract specific nodes using XPath: xpathSApply(rootNode, "//nodeName", xmlValue)
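A minimal sketch of XPath extraction with xpathSApply() from the XML package. The inline menu document and the node names "name" and "price" are assumptions made for the example.

```r
library(XML)

# Hypothetical XML document parsed from a string rather than a URL
menuXml <- paste0(
  "<menu>",
  "<food><name>Waffles</name><price>5.95</price></food>",
  "<food><name>Toast</name><price>4.50</price></food>",
  "</menu>")

doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

# "//name" matches every <name> node at any depth in the document
itemNames <- xpathSApply(rootNode, "//name", xmlValue)

# xmlValue returns text, so numeric fields need an explicit conversion
itemPrices <- as.numeric(xpathSApply(rootNode, "//price", xmlValue))
```

XPath queries like these require the document to be parsed with internal nodes, which is why useInternalNodes = TRUE is set.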
Reading JSON
library(jsonlite)
jsonData <- fromJSON(url)
names(jsonData)
Accessing nested objects in JSON
names(jsonData$objectName)
jsonData$objectName$subObject
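A sketch of nested access, using an inline JSON string in place of a URL so it can be run directly. The repository records and the "owner" object are hypothetical.

```r
library(jsonlite)

# Hypothetical JSON text standing in for a response fetched from a URL
repoJson <- '[
  {"name": "alpha", "owner": {"login": "alice", "id": 1}},
  {"name": "beta",  "owner": {"login": "bob",   "id": 2}}
]'

# An array of records is simplified to a data frame;
# the nested "owner" objects become a data frame column
jsonData <- fromJSON(repoJson)

topNames   <- names(jsonData)        # top-level fields
ownerNames <- names(jsonData$owner)  # fields of the nested object
logins     <- jsonData$owner$login   # one value per record
```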
Writing Data Frames to JSON
myJson <- toJSON(dataset, pretty = TRUE)
cat(myJson) # output to console
Convert back to Data Frame
dataset2 <- fromJSON(myJson)
head(dataset2)
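The write and read-back steps can be sketched end to end as below. The small data frame is a hypothetical stand-in for whatever dataset holds in practice.

```r
library(jsonlite)

# Hypothetical data frame for illustration
dataset <- data.frame(id = 1:3,
                      name = c("a", "b", "c"),
                      stringsAsFactors = FALSE)

# Serialize to JSON text; pretty = TRUE adds indentation and line breaks
myJson <- toJSON(dataset, pretty = TRUE)
cat(myJson)

# Parse the JSON text back into a data frame
dataset2 <- fromJSON(myJson)
head(dataset2)
```

For simple columns such as these the round trip should preserve the values, though some R-specific attributes (factors, for example) do not survive conversion to JSON.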