Coursera - R Programming - Week 1 - Subsetting R Objects

Basics

There are a number of operators that can be used to extract subsets of R objects.

[ - returns an object of the same class as the original (the main exception is matrix subsetting, covered below); can be used to select more than one element

[[ - used to extract elements of a list or data frame. It can only extract a single element, and the class of the returned object will not necessarily be a list or data frame

$ - used to extract elements of a list or data frame by name; semantics are similar to those of [[

    > x <- c("a", "b", "c", "c", "d", "a")
    > x[1]
    [1] "a"
    > x[2]
    [1] "b"
    > x[1:4]
    [1] "a" "b" "c" "c"
    > x[x > "a"]
    [1] "b" "c" "c" "d"
    > u <- x > "a"
    > u
    [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
    > x[u]
    [1] "b" "c" "c" "d"
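
As a small sketch of the first point above (the vector `v` and the result names are illustrative, not from the course), `[` keeps the class of the original vector whether you select by position or by a logical vector:

```r
v <- c(10.5, 20.1, 30.7, 40.2)   # illustrative numeric vector

by_position <- v[c(1, 3)]        # select elements 1 and 3
by_logical  <- v[v > 20]         # select elements greater than 20

class(v)                         # "numeric"
class(by_position)               # "numeric", same class as v
class(by_logical)                # "numeric", same class as v
```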
    

Lists

    > x <- list(foo = 1:4, bar = 0.6)
    

The first element is named foo; the second is named bar.

    > x[1] # returns list with sequence
    $foo
    [1] 1 2 3 4
    
    > x[[1]] # returns sequence from list
    [1] 1 2 3 4
    

If you can't remember the position of "bar" in the list, you can access it using its name rather than its index.

    > x$bar # returns element associated with "bar"
    [1] 0.6
    
    > x[["bar"]] # equivalent to above
    [1] 0.6
    
    > x["bar"] # returns list with element
    $bar
    [1] 0.6
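
To make the difference concrete, here is a minimal sketch that checks the class of each result (the names `as_list` and `as_value` are made up for illustration):

```r
x <- list(foo = 1:4, bar = 0.6)

as_list  <- x["bar"]     # single bracket: a list of length 1
as_value <- x[["bar"]]   # double bracket: the element itself

class(as_list)           # "list"
class(as_value)          # "numeric"
class(x$foo)             # "integer"
```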
    

To extract multiple elements from a list, use the [] operator.

    > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
    > x[c(1, 3)]
    $foo
    [1] 1 2 3 4
    
    $baz
    [1] "hello"
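
The same selection should also work with a character vector of names instead of positions. A short sketch (the name `by_name` is illustrative):

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

by_name <- x[c("foo", "baz")]     # select two elements by name
names(by_name)                    # "foo" "baz"
identical(by_name, x[c(1, 3)])    # TRUE: same result as selecting by position
```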
    

You can't use the [[]] or $ operators to extract multiple elements from a list.

The [[]] operator can be used with computed indices, such as a name stored in a variable; $ can only be used with literal names.

    > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
    > name <- "foo"
    > x[[name]]
    [1] 1 2 3 4
    > x$name
    NULL
    > x$foo
    [1] 1 2 3 4
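
One place this matters is when the names are only known at run time, for example when looping over a list. A minimal sketch, assuming you want the length of each element (`lengths_by_name` is a made-up name):

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# Each name is held in the variable `name`, so [[ ]] is needed;
# x$name would look for an element literally called "name".
lengths_by_name <- sapply(names(x), function(name) length(x[[name]]))

lengths_by_name   # named integer vector: foo = 4, bar = 1, baz = 1
```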
    

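[[]] can take an integer sequence, which indexes recursively into nested elements: x[[c(1, 3)]] extracts the third element of the first element.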

    > x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
    > x[[c(1, 3)]]
    [1] 14
    > x[[1]][[3]]
    [1] 14
    > x[[c(2, 1)]]
    [1] 3.14
    

Matrices

    > x <- matrix(1:6, 2, 3)
    > x
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    
    > x[1, 2]
    [1] 3
    > x[2, 1]
    [1] 2
    

Indices can also be missing, which selects the entire row or column.

    > x[1, ]
    [1] 1 3 5
    > x[, 2]
    [1] 3 4
    

By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 x 1 matrix. Likewise, a single row or column is returned as a plain vector. This behavior can be turned off by setting drop = FALSE.

    > x[1, 2, drop = FALSE]
         [,1]
    [1,]    3
    
    > x[1, , drop = FALSE]
         [,1] [,2] [,3]
    [1,]    1    3    5
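
A quick way to see what drop does is to compare dimensions. This sketch uses the same matrix with a single column (`col_vec` and `col_mat` are illustrative names):

```r
x <- matrix(1:6, 2, 3)

col_vec <- x[, 2]                 # default: dimensions are dropped
col_mat <- x[, 2, drop = FALSE]   # dimensions are kept

dim(col_vec)   # NULL, a plain integer vector
dim(col_mat)   # 2 1, still a matrix with one column
```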
    

Partial Matching

Partial matching of names is allowed with [[]] and $.

$ matches partially by default: it looks for a name in the list that begins with "a".

    > x <- list(aardvark = 1:5)
    > x$a
    [1] 1 2 3 4 5
    

By default, [[]] looks for a name that's an exact match.

    > x[["a"]]
    NULL
    

The exact = FALSE argument drops the exactness requirement.

    > x[["a", exact = FALSE]]
    [1] 1 2 3 4 5
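
One caveat, as far as I can tell from R's behavior: partial matching only succeeds when the prefix identifies a single name. The sketch below adds a second, hypothetical element `apple` so that "a" becomes ambiguous:

```r
x <- list(aardvark = 1:5, apple = "fruit")   # "apple" is added for illustration

ambiguous <- x$a                       # NULL: "a" is a prefix of both names
unique_aa <- x$aa                      # 1 2 3 4 5: only "aardvark" starts with "aa"
unique_ap <- x[["ap", exact = FALSE]]  # "fruit": only "apple" starts with "ap"
```

Because an ambiguous prefix quietly returns NULL, spelling names out in full is the safer habit in scripts.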
    

Removing Missing (NA) Values

    > x <- c(1, 2, NA, 4, NA, 5)
    > bad <- is.na(x)
    > x[!bad]
    [1] 1 2 4 5
    > y <- c("a", "b", NA, "d", NA, "f")
    > good <- complete.cases(x, y)
    > good
    [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
    > x[good]
    [1] 1 2 4 5
    > y[good]
    [1] "a" "b" "d" "f"
    

You can also use complete.cases to remove missing values from data frames. To get the rows of a data frame where all the values are not missing:

    > good <- complete.cases(dataframename)
    > dataframename[good, ]
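
Here is a self-contained sketch of the same idea. The data frame `df` and its columns are invented for illustration:

```r
# Hypothetical data: rows 2 and 3 each have one missing value
df <- data.frame(
  id    = 1:4,
  score = c(90, NA, 75, 88),
  grade = c("A", "B", NA, "B")
)

good  <- complete.cases(df)   # TRUE only for rows with no NA in any column
clean <- df[good, ]           # keep the complete rows, all columns

good          # TRUE FALSE FALSE TRUE
nrow(clean)   # 2
```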
    

Published January 18, 2015