__Objects and Attributes__

R has five basic or "atomic" classes of objects:

- character
- numeric (real numbers)
- integer
- complex
- logical (TRUE/FALSE)

The most basic object is a __vector__.

A vector can only contain objects of the same class. The one exception to this rule is a list, which is represented as a vector but can contain objects of different classes.

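vector() creates a vector of a given class and length, filled with that class's default value (for example 0 for numeric, FALSE for logical, "" for character).

It takes two arguments: the class of the objects in the vector (mode) and the length of the vector.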

Numbers

- generally treated as double-precision real numbers
- to explicitly define an integer, add the "L" suffix (e.g., 1L)
- Inf represents infinity (e.g., 1 / 0)
- NaN represents "not a number", an undefined value (e.g., 0 / 0)
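These rules can be checked at the console with a short sketch. It uses only base R functions (class(), is.integer(), is.nan()); the variable names are illustrative.

```r
# Numeric literals are doubles unless suffixed with L
x <- 1
y <- 1L
print(class(x))          # "numeric"
print(class(y))          # "integer"
print(is.integer(x))     # FALSE: 1 is stored as a double

# Dividing a positive number by zero gives Inf; 0/0 is undefined and gives NaN
pos_inf <- 1 / 0
undefined <- 0 / 0
print(pos_inf)           # Inf
print(1 / pos_inf)       # 0
print(is.nan(undefined)) # TRUE
```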

R objects can have attributes.

- names, dimnames
- dimensions (e.g., matrices, arrays)
- class
- length
- other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
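A minimal sketch of attributes() on a few objects. The "units" attribute is a hypothetical user-defined attribute, set with the base function attr() to show that arbitrary metadata can be attached.

```r
# A plain vector carries no attributes
v <- 1:3
print(attributes(v))     # NULL

# A matrix carries a dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
print(attributes(m))     # $dim: 2 3

# Names are stored as an attribute too
names(v) <- c("a", "b", "c")
print(attributes(v))     # $names: "a" "b" "c"

# attr() reads or sets a single attribute, including user-defined ones
attr(v, "units") <- "cm" # hypothetical metadata, for illustration only
print(attr(v, "units"))  # "cm"
```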

__Vectors and Lists__

Creating Vectors

c() can be used to create vectors of objects

```r
> x <- c(0.5, 0.6)      # numeric
> x <- c(TRUE, FALSE)   # logical
> x <- 9:29             # integer
> x <- c(1+0i, 2+4i)    # complex
```

```r
> x <- vector("numeric", length = 10)
> x
 [1] 0 0 0 0 0 0 0 0 0 0
```

__Mixing Objects__

```r
> y <- c(1.7, "a")    # character
> y <- c(TRUE, 2)     # numeric
> y <- c("a", TRUE)   # character
```

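__coercion__ - when objects of different classes are mixed in a vector, R converts every element to one common class, choosing the most general class present (logical < integer < numeric < complex < character)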
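A small sketch that walks up the coercion order with class(); each pair of values is an arbitrary example.

```r
# R picks the most general class present:
# logical < integer < numeric < complex < character
print(class(c(TRUE, 2L)))    # "integer"
print(class(c(2L, 1.5)))     # "numeric"
print(class(c(1.5, 2+0i)))   # "complex"
print(class(c(2+0i, "a")))   # "character"

# The values themselves change to fit the common class
print(c(TRUE, FALSE, 2))     # 1 0 2
print(c(1.7, "a"))           # "1.7" "a"
```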

__Explicit Coercion__

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

```r
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
```

Nonsensical coercion results in NAs.

```r
> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA
```
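Coercion is element-wise, so only the nonsensical elements become NA. A minimal sketch with an illustrative mixed input; suppressWarnings() is used only to silence the "NAs introduced by coercion" warning.

```r
# Strings that look like numbers convert cleanly; the rest become NA
s <- c("1", "2.5", "abc")
n <- suppressWarnings(as.numeric(s))
print(n)          # 1.0 2.5  NA
print(is.na(n))   # FALSE FALSE TRUE

# as.logical() understands spellings such as "TRUE" and "false"; "yes" is not one of them
print(as.logical(c("TRUE", "false", "yes")))   # TRUE FALSE NA
```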

__Lists__

```r
> x <- list(1, "a", TRUE, 1+4i)
> x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i
```

When printed, list elements are labeled with double brackets ([[1]]), while vector elements are labeled with single brackets ([1]).
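The bracket styles also matter when extracting elements. A short sketch using the list from above: single brackets return a sub-list, double brackets return the element itself.

```r
x <- list(1, "a", TRUE, 1+4i)

# Single brackets return a sub-list (still class "list")
first_as_list <- x[1]
print(class(first_as_list))   # "list"

# Double brackets extract the element itself
first_elem <- x[[1]]
print(class(first_elem))      # "numeric"

# Each element keeps its own class
print(sapply(x, class))       # "numeric" "character" "logical" "complex"
```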

__Matrices__

__matrix__ - a vector with a __dimension__ attribute

__dimension__ - an attribute with an integer vector of length 2 (nrow, ncol)

```r
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
```

Matrices are filled column-wise by default: entries start in the upper-left corner and run down the first column, then down the second, and so on.

```r
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```
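Column-wise filling means element [i, j] sits at position (j - 1) * nrow + i of the underlying vector. A minimal sketch that checks this with base functions; i and j are arbitrary illustrative indices, and byrow = TRUE is matrix()'s option for filling by rows instead.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# as.vector() drops the dim attribute and returns the storage order
print(as.vector(m))                          # 1 2 3 4 5 6

# Element [i, j] sits at position (j - 1) * nrow + i
i <- 2; j <- 3
print(m[i, j])                               # 6
print(as.vector(m)[(j - 1) * nrow(m) + i])   # 6

# byrow = TRUE fills row by row instead
print(matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE))
```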

Matrices can also be created directly from vectors by adding a dimension attribute.

```r
> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
```

__cbind()__ - creates a matrix by binding vectors together as columns

__rbind()__ - creates a matrix by binding vectors together as rows

```r
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
```

__Factors__

__factor__ - used to represent categorical data, which can be __ordered__ or __unordered__. A factor is an integer vector in which each integer has a __label__ (its level). The input to factor() is usually a character vector.

```r
> x <- factor(c("yes", "no", "yes", "yes", "no"))
> x
[1] yes no  yes yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> unclass(x)
[1] 2 1 2 2 1
attr(,"levels")
[1] "no"  "yes"
```

The order of the levels can be set using the levels argument to factor(). This can be important in linear modeling because the first level is used as the baseline level.

```r
> x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
> x
[1] yes yes no  yes no
Levels: yes no
```

By default, the levels are sorted alphabetically.
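Changing the level order also changes which integer code each value gets, and therefore which level is the baseline. A small sketch comparing the default and explicit orders; relevel() comes from the stats package, which R attaches by default.

```r
answers <- c("yes", "no", "yes", "yes", "no")

# Default: levels sorted alphabetically, so "no" is the first (baseline) level
f_default <- factor(answers)
print(levels(f_default))       # "no" "yes"
print(as.integer(f_default))   # 2 1 2 2 1

# Explicit levels: "yes" becomes the first level and gets code 1
f_custom <- factor(answers, levels = c("yes", "no"))
print(levels(f_custom))        # "yes" "no"
print(as.integer(f_custom))    # 1 2 1 1 2

# relevel() moves a chosen level to the front of an existing factor
print(levels(relevel(f_default, ref = "yes")))   # "yes" "no"
```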

__Missing Values__

Missing values are denoted by NA, or by NaN for undefined mathematical operations such as 0/0.

is.na() tests whether objects are NA

is.nan() tests whether objects are NaN

NA values have a class too, so there are integer NAs, character NAs, etc. A NaN value is also NA, but an NA value is not NaN.
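A minimal sketch of the NA/NaN distinction and of NA taking on the class of its vector, using an illustrative vector x.

```r
x <- c(1, NA, NaN, 4)

# NA and NaN are both flagged by is.na()
print(is.na(x))    # FALSE TRUE TRUE FALSE

# Only the NaN is flagged by is.nan()
print(is.nan(x))   # FALSE FALSE TRUE FALSE

# NA takes on the class of the vector it lives in
print(class(c(1L, NA)))    # "integer"
print(class(c("a", NA)))   # "character"
print(class(NA))           # "logical": a bare NA is logical
```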

__Data Frames__


- used to store tabular data in which different columns can hold objects of different classes
- represented as a special type of list, where every element of the list has the same length
- each element of the list can be thought of as a column, and the length of each element is the number of rows
- have a special attribute called row.names
- usually created by calling read.table() or read.csv()
- can be converted to a matrix using data.matrix()

```r
> x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
```
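The list-based structure described above can be inspected directly. A short sketch on the same data frame: it checks that the columns are equal-length list elements, reads row.names, and converts to a matrix with data.matrix(), which turns logical columns into integers.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# A data frame is a list whose elements (columns) share one length
print(is.list(x))          # TRUE
print(sapply(x, length))   # foo 4, bar 4
print(sapply(x, class))    # foo "integer", bar "logical"

# row.names is stored as an attribute
print(row.names(x))        # "1" "2" "3" "4"

# data.matrix() converts every column to numbers and returns a matrix
mx <- data.matrix(x)
print(is.matrix(mx))       # TRUE
print(mx[, "bar"])         # 1 1 0 0
```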

__Names Attribute__

R objects can also have names, which is very useful for writing readable code and self-describing objects.

```r
> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "baz")
> x
foo bar baz
  1   2   3
> names(x)
[1] "foo" "bar" "baz"
```

Lists can also have names.

```r
> x <- list(a = 1, b = 2, c = 3)
> x
$a
[1] 1

$b
[1] 2

$c
[1] 3
```
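Names also make elements addressable without knowing their position. A minimal sketch on the same named list, using the $ and [[ ]] operators.

```r
x <- list(a = 1, b = 2, c = 3)

# Named elements can be extracted with $ or with [["name"]]
print(x$b)          # 2
print(x[["c"]])     # 3
print(names(x))     # "a" "b" "c"

# Single brackets with names return a smaller named list
print(x[c("a", "c")])
```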

Matrices can also have names, set through the dimnames attribute (a list of row names and column names).

```r
> m <- matrix(1:4, nrow = 2, ncol = 2)
> dimnames(m) <- list(c("a", "b"), c("c", "d"))
> m
  c d
a 1 3
b 2 4
```
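Once dimnames are set, the two halves can be read separately and elements can be indexed by name. A small sketch using base rownames() and colnames() on the same matrix.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))

# rownames() and colnames() read the two halves of dimnames
print(rownames(m))   # "a" "b"
print(colnames(m))   # "c" "d"

# Elements can be indexed by name instead of position
print(m["b", "c"])   # 2
print(m["a", ])      # named vector: c = 1, d = 3
```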

__Summary of Data Types in R__

- atomic classes: numeric, logical, character, integer, complex
- vectors, lists
- factors
- missing values
- data frames
- names