Objects and Attributes
R has five basic or "atomic" classes of objects:
- character
- numeric (real numbers)
- integer
- complex
- logical (TRUE/FALSE)
The most basic object is a vector.
A vector can only contain objects of the same class. The one exception to this rule is a list, which is represented as a vector but can contain objects of different classes.
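vector() creates a vector of a given class and length, filled with the default value for that class (it is empty when the length is 0)
Two arguments: mode (the class of the objects in the vector) and length (the length of the vector)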
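As a quick illustration (a sketch assuming a standard R session; the variable names are illustrative), the value vector() fills in depends on the mode it is given:

```r
# vector(mode, length) returns a vector filled with the default value for that mode
v_num   <- vector("numeric", length = 3)    # 0 0 0
v_lgl   <- vector("logical", length = 2)    # FALSE FALSE
v_chr   <- vector("character", length = 2)  # "" ""
v_list  <- vector("list", length = 2)       # a list of two NULL elements
v_empty <- vector()                         # defaults: mode = "logical", length = 0

print(v_num)
print(v_lgl)
print(v_chr)
print(length(v_empty))                      # 0
```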
Numbers
- generally treated as double precision real numbers
- to explicitly define an integer, specify the "L" suffix
- Inf represents infinity
- NaN represents not a number
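A small sketch of these rules in action (variable names are illustrative):

```r
x <- 1             # numeric: stored as a double-precision real number
y <- 1L            # integer, because of the "L" suffix
print(class(x))    # "numeric"
print(class(y))    # "integer"

pos_inf <- 1 / 0        # Inf
shrunk  <- 1 / pos_inf  # 0: Inf can be used in ordinary arithmetic
undef   <- 0 / 0        # NaN: the result is undefined
print(c(pos_inf, shrunk, undef))
```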
R objects can have attributes.
- names, dimnames
- dimensions (e.g., matrices, arrays)
- class
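- length (strictly an intrinsic property rather than a stored attribute: it is queried with length() and is not listed by attributes())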
- other user-defined attributes/metadata
Attributes of an object can be accessed using the attributes() function.
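A minimal sketch of reading and setting attributes; the attribute name "units" is a hypothetical, user-defined example, not something R defines:

```r
x <- c(12.5, 13.1, 11.8)
print(attributes(x))   # NULL: a plain vector carries no attributes

# Attach a user-defined attribute ("units" is an illustrative name, not a built-in one)
attr(x, "units") <- "cm"
names(x) <- c("a", "b", "c")

print(attributes(x))   # now a list holding both the names and the units
print(attr(x, "units"))
print(length(x))       # 3: length is reported by length(), not by attributes()
```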
Vectors and Lists
Creating Vectors
c() ("combine") can be used to create vectors of objects
> x <- c(0.5, 0.6)       # numeric
> x <- c(T, F)           # logical
> x <- 9:29              # integer
> x <- c(1+0i, 2+4i)     # complex
> x <- vector("numeric", length = 10)
> x
 [1] 0 0 0 0 0 0 0 0 0 0
Mixing Objects
> y <- c(1.7, "a")   # character
> y <- c(TRUE, 2)    # numeric
> y <- c("a", TRUE)  # character
implicit coercion - when objects of different classes are mixed in a vector, R converts every element to one common class, so the vector still holds a single class
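R picks the common class from a fixed hierarchy, logical < integer < numeric < complex < character, and the vector takes the "highest" class present. A short sketch (variable names are illustrative):

```r
# class() of each result shows which class "won"
r1 <- c(TRUE, 2L)      # logical + integer   -> integer
r2 <- c(2L, 1.5)       # integer + numeric   -> numeric
r3 <- c(1.5, 2+3i)     # numeric + complex   -> complex
r4 <- c(2+3i, "a")     # complex + character -> character
print(sapply(list(r1, r2, r3, r4), class))
print(c(TRUE, 2))      # 1 2: TRUE was coerced to 1
```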
Explicit Coercion
Objects can be explicitly coerced from one class to another using the as.* functions, if available.
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
Nonsensical coercion results in NAs.
> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA
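Coercion is applied element by element, so only the elements that cannot be converted become NA. A sketch using base R's suppressWarnings() to silence the coercion warning (values are illustrative):

```r
x <- c("1", "2.5", "abc")
n <- suppressWarnings(as.numeric(x))  # hide the "NAs introduced by coercion" warning
print(n)                              # 1.0 2.5 NA: only "abc" cannot be converted
print(is.na(n))

# as.logical() recognizes a few spellings such as "TRUE", "false", and "T"
b <- as.logical(c("TRUE", "false", "T", "yes"))
print(b)                              # TRUE FALSE TRUE NA
```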
Lists
> x <- list(1, "a", TRUE, 1+4i)
> x
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
When printed, list elements are labeled with double brackets ([[1]]), whereas vector output is indexed with single brackets ([1]).
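The same bracket distinction applies when indexing. A short sketch (variable names are illustrative): single brackets return a sub-list, while double brackets extract the element itself.

```r
x <- list(1, "a", TRUE, 1+4i)
sub   <- x[1]     # single brackets: a sub-list containing the first element
first <- x[[1]]   # double brackets: the first element itself
print(class(sub))        # "list"
print(class(first))      # "numeric"
print(sapply(x, class))  # each element keeps its own class
```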
Matrices
matrix - a vector with a dimension attribute
dimension - an attribute holding an integer vector of length 2: (nrow, ncol)
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
By default, matrices are filled column-wise: entries start in the upper-left corner and run down the first column, then down each following column.
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
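The column-wise default can be switched with matrix()'s byrow argument. A small sketch comparing the two fill orders (variable names are illustrative):

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: filled column by column
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row instead
print(m_col[1, ])   # 1 3 5
print(m_row[1, ])   # 1 2 3
print(m_row)
```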
Matrices can also be created directly from vectors by adding a dimension attribute.
> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
cbind() - creates a matrix by binding vectors together as columns
rbind() - creates a matrix by binding vectors together as rows
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
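A quick check of how the two functions relate (variable names are illustrative): binding the same vectors as rows gives the transpose of binding them as columns.

```r
x <- 1:3
y <- 10:12
by_col <- cbind(x, y)   # vectors become columns: 3 rows, 2 columns
by_row <- rbind(x, y)   # vectors become rows: 2 rows, 3 columns
print(dim(by_col))      # 3 2
print(dim(by_row))      # 2 3
print(all(t(by_col) == by_row))  # the row-bound matrix is the transpose of the column-bound one
```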
Factors
factor - used to represent categorical data, which can be unordered or ordered. A factor is stored as an integer vector in which each integer has a label (its level). Factors are commonly created by passing a character vector to factor().
> x <- factor(c("yes", "no", "yes", "yes", "no"))
> x
[1] yes no  yes yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> unclass(x)
[1] 2 1 2 2 1
attr(,"levels")
[1] "no"  "yes"
The order of the levels can be set using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.
> x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
> x
[1] yes yes no  yes no
Levels: yes no
By default, levels are sorted in alphabetical order.
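A sketch of inspecting levels and changing the baseline of an existing factor. It uses relevel() from the stats package (attached by default); this is one alternative to passing levels to factor(), and the variable names are illustrative:

```r
x <- factor(c("yes", "no", "yes", "yes", "no"))
print(levels(x))       # "no" "yes": alphabetical by default
print(as.integer(x))   # 2 1 2 2 1: the underlying integer codes

# relevel() moves one level to the front, making it the baseline
x2 <- relevel(x, ref = "yes")
print(levels(x2))      # "yes" "no"
print(as.integer(x2))  # 1 2 1 1 2
print(all(as.character(x2) == as.character(x)))  # the values themselves are unchanged
```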
Missing Values
Missing values are denoted by NA; NaN is used for undefined mathematical operations (such as 0/0).
is.na() tests whether objects are NA
is.nan() tests whether objects are NaN
NA values have a class too, so there are integer NAs, character NAs, etc. A NaN value is also NA, but an NA value is not NaN.
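A short sketch of the two tests and of typed NA constants (values are illustrative):

```r
x <- c(1, 2, NA, 10, 3)
print(is.na(x))    # FALSE FALSE  TRUE FALSE FALSE
print(is.nan(x))   # all FALSE: NA is not NaN

y <- c(1, 2, NaN, NA, 4)
print(is.na(y))    # FALSE FALSE  TRUE  TRUE FALSE: NaN also counts as NA
print(is.nan(y))   # FALSE FALSE  TRUE FALSE FALSE

# Typed NA constants
print(class(NA))             # "logical"
print(class(NA_integer_))    # "integer"
print(class(NA_character_))  # "character"
```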
Data Frames
- used to store tabular data; unlike a matrix, different columns can hold objects of different classes
- represented as a special type of list, where every element of the list has the same length
- each element of the list can be thought of as a column, and the length of each element is the number of rows
- have a special attribute called row.names
- usually created by calling read.table() or read.csv(), or directly with data.frame()
- can be converted to a matrix using data.matrix()
> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
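A sketch of per-column classes, row names, and conversion with data.matrix() (variable names are illustrative). data.matrix() turns logical and factor columns into integer codes so that the result can be a single-class matrix:

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
print(sapply(x, class))   # each column keeps its own class: foo "integer", bar "logical"
print(row.names(x))       # "1" "2" "3" "4"

m <- data.matrix(x)       # logical (and factor) columns are converted to integer codes
print(m[, "bar"])         # 1 1 0 0
print(is.matrix(m))
```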
Names Attribute
R objects can also have names, which is very useful for writing readable code and self-describing objects.
> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "baz")
> x
foo bar baz
  1   2   3
> names(x)
[1] "foo" "bar" "baz"
Lists can also have names.
> x <- list(a = 1, b = 2, c = 3)
> x
$a
[1] 1
$b
[1] 2
$c
[1] 3
Matrices can also have names, set for rows and columns with dimnames().
> m <- matrix(1:4, nrow = 2, ncol = 2)
> dimnames(m) <- list(c("a", "b"), c("c", "d"))
> m
  c d
a 1 3
b 2 4
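Names are what make code self-describing: elements can be selected by name instead of by position. A brief sketch (variable names are illustrative):

```r
x <- 1:3
names(x) <- c("foo", "bar", "baz")
print(x["bar"])      # element selected by name (the result keeps its name)

l <- list(a = 1, b = 2, c = 3)
print(l$b)           # 2
print(l[["c"]])      # 3

m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
print(m["b", "c"])   # 2: row "b", column "c"
```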
Summary of Data Types in R
- atomic classes: numeric, logical, character, integer, complex
- vectors, lists
- factors
- missing values
- data frames
- names