# Vectors, factors and lists

This is all material that I expect you to know, but I want to emphasize some features that other writers may not. Because it's a review, it will be very sketchy.

# Atomic vectors

Atomic vectors are the most basic objects in R. All elements of an atomic vector share one mode; the modes we'll use are

1. numeric
2. character
3. logical

R makes little distinction between a number and a sequence of numbers: a single number is just a vector of length 1.
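Here is a quick check (`x` and `y` are throwaway names). A lone number has length 1, and `c()` joins vectors into one flat vector:

```r
x <- 3.7
length(x)        # 1: a single number is a numeric vector of length 1
is.vector(x)     # TRUE

# c() concatenates vectors; the result is flat, not nested
y <- c(x, c(1, 2), 5)
y                # 3.7 1.0 2.0 5.0
length(y)        # 4
```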

## Numeric vectors

A numeric vector arises, for example, when storing readings taken on a set of samples:

| names(v) | v   |
| -------- | --- |
| S1       | 1.6 |
| S2       | 2.1 |
| S3       | 1.5 |
| S4       | 1.9 |

The names attribute is stored as a character vector. Values assigned to it are converted to character.

``````v <- c(1.6, 2.1, 1.5, 1.9)
names(v) <- c("S1", "S2", "S3", "S4")
v
``````
``````##  S1  S2  S3  S4
## 1.6 2.1 1.5 1.9
``````
``````length(v)
``````
``````## [1] 4
``````
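You can also supply names directly inside `c()`, which builds a named vector in one step. The sketch below also shows that non-character names are converted to character. `v_named` and `u` are illustrative names:

```r
# Supply names as name = value pairs inside c()
v_named <- c(S1 = 1.6, S2 = 2.1, S3 = 1.5, S4 = 1.9)
names(v_named)    # "S1" "S2" "S3" "S4"
unname(v_named)   # 1.6 2.1 1.5 1.9

# Non-character names are converted to character
u <- c(10, 20)
names(u) <- 1:2
names(u)          # "1" "2"
```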

Names are optional, and assigning `NULL` removes them.

``````names(v) <- NULL
v
``````
``````## [1] 1.6 2.1 1.5 1.9
``````

An integer vector is a special type of numeric vector. The colon operator forms ranges of integers:

``````j <- 2:6
j
``````
``````## [1] 2 3 4 5 6
``````

You can fetch the mode of a vector with `mode`. For a basic vector like `v`, `class` returns the same answer.

``````mode(v)
``````
``````## [1] "numeric"
``````
``````class(v)
``````
``````## [1] "numeric"
``````
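`mode` and `class` agree for a plain double vector like `v`, but not for every vector. For an integer range like `j`, `mode` reports the broad mode, while `class` and `typeof` report the specific integer type:

```r
j <- 2:6
mode(j)      # "numeric"
class(j)     # "integer"
typeof(j)    # "integer"

v <- c(1.6, 2.1, 1.5, 1.9)
typeof(v)    # "double"
is.numeric(j) && is.numeric(v)  # TRUE: both count as numeric
```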

## Character vectors

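Character vectors hold strings. `length` counts the strings in a vector, `nchar` counts the characters in a string, and `as.numeric` converts strings that look like numbers: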

``````s <- c("a", "BBB", "TCGA", "1", "2.1")
length(s)
``````
``````## [1] 5
``````
``````length("BBB")
``````
``````## [1] 1
``````
``````nchar("BBB")
``````
``````## [1] 3
``````
``````mode(c(3, 2))
``````
``````## [1] "numeric"
``````
``````mode(c("3", "2"))
``````
``````## [1] "character"
``````
``````x2 <- as.numeric(c("3", "2"))
x2
``````
``````## [1] 3 2
``````
``````mode(x2)
``````
``````## [1] "numeric"
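An atomic vector holds only one mode, so `c()` coerces mixed inputs to a common mode. Character wins over numeric, and numeric wins over logical. In the other direction, `as.numeric` returns NA, with a warning, for a string it cannot parse:

```r
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
mode(mixed)      # "character"

mode(c(1, TRUE)) # "numeric": TRUE becomes 1

# "abc" is not a number, so it becomes NA; suppressWarnings hides the coercion warning
suppressWarnings(as.numeric(c("3", "abc")))  # 3 NA
```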
``````

## Logical vectors

A logical vector is a sequence of TRUE and FALSE values.

``````lv <- c(TRUE, FALSE, TRUE)
``````

Trick: `sum` can count the number of TRUE values in a logical vector.

``````sum(lv)
``````
``````## [1] 2
``````
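The trick works because arithmetic coerces TRUE to 1 and FALSE to 0. By the same coercion, `mean` gives the proportion of TRUE values, and `which` returns their positions:

```r
lv <- c(TRUE, FALSE, TRUE)
as.numeric(lv)   # 1 0 1
sum(lv)          # 2: number of TRUE values
mean(lv)         # 0.6666667: proportion of TRUE values
which(lv)        # 1 3: positions of the TRUE values
```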

Like other vectors, logical vectors can carry names, respond to `length`, and so on.

### Comparisons

Logical vectors usually arise from comparisons such as `==`, `!=`, `<`, `>`, and `%in%`. Comparisons work elementwise. A single value on one side is recycled across the whole vector on the other side.

``````1 < 2
``````
``````## [1] TRUE
``````
``````c(0, 1, 3) < 2
``````
``````## [1]  TRUE  TRUE FALSE
``````
``````c(0, 1, 3) == 1
``````
``````## [1] FALSE  TRUE FALSE
``````
``````c(0, 1, 3) != 1
``````
``````## [1]  TRUE FALSE  TRUE
``````
``````0 %in% c(0, 1, 3)
``````
``````## [1] TRUE
``````
``````c(0, 2) %in% c(0, 1, 3)
``````
``````## [1]  TRUE FALSE
``````

Comparisons can be combined into compound conditions using `&` (and), `|` (or), and `!` (not).
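For example, take a small throwaway vector `x`. `&` requires both conditions, `|` requires either one, and `!` negates. Like the comparisons themselves, these operators work elementwise:

```r
x <- c(0, 1, 3, 5)
x > 0 & x < 4       # FALSE  TRUE  TRUE FALSE
x < 1 | x > 4       #  TRUE FALSE FALSE  TRUE
!(x %in% c(1, 3))   #  TRUE FALSE FALSE  TRUE
```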

## Restricting and subsetting vectors

Positive integer vectors select elements of a vector by index (R counts from 1). Negative integers remove the corresponding elements.

``````v <- sample(1:100, size = 25)
v[15:19]
``````
``````## [1] 23 67 18 30 56
``````
``````v[25]
``````
``````## [1] 98
``````
``````v[-(3:25)]
``````
``````## [1] 60 68
``````
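Because `v` above is a random sample, your printed values will differ; calling `set.seed` first makes a sample reproducible. A fixed vector makes a few other indexing rules easier to see. Indices can repeat, an index past the end gives NA, and positive and negative indices can't be mixed:

```r
x <- c(10, 20, 30)
x[c(1, 1, 3)]   # 10 10 30: indices may repeat
x[-1]           # 20 30: drop the first element
x[5]            # NA: index beyond length(x)
# x[c(-1, 2)]   # error: positive and negative subscripts can't be mixed
```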

You can also select elements by name.

``````y <- rnorm(10)
names(y) <- paste("S", 1:10, sep = "")
y
``````
``````##       S1       S2       S3       S4       S5       S6       S7       S8
##  1.22968  0.48125  0.44532  0.43198  1.24532  1.28221 -0.21674  0.32227
##       S9      S10
## -1.31833 -0.04364
``````
``````y["S1"]
``````
``````##   S1
## 1.23
``````
``````y[c("S2", "S4")]
``````
``````##     S2     S4
## 0.4813 0.4320
``````

### Subsetting with a logical vector

Putting a logical vector of the same length inside `[ ]` forms the subvector of entries where it is TRUE.

``````a <- c("w", "XX", "z", "fractal")
ll <- c(TRUE, FALSE, FALSE, TRUE)
a[ll]
``````
``````## [1] "w"       "fractal"
``````
``````length(a[ll])
``````
``````## [1] 2
``````

This lets us use comparisons to select subvectors.

``````# First write an expression to select 'XX' from a
a == "XX"
``````
``````## [1] FALSE  TRUE FALSE FALSE
``````
``````# Use this to subset a
a[a == "XX"]
``````
``````## [1] "XX"
``````

It's more interesting with numerical vectors.
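With a numeric vector, the condition can be any comparison on the values themselves. Here is a small sketch with a made-up vector `x`:

```r
x <- c(4.2, 0.5, 7.1, 3.3, 9.8)
x > 4               # TRUE FALSE TRUE FALSE TRUE
x[x > 4]            # 4.2 7.1 9.8: keep only values above 4
x[x > 1 & x < 5]    # 4.2 3.3
sum(x > 4)          # 3: how many values exceed 4
```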

#### Example 1

Go to the sample data web page and run the first `source` command. This will load a vector samp_vec1 into your workspace.

1. What are the length and mode of the vector?
2. Define a subvector b consisting of the elements of samp_vec1 greater than its median.

#### Example 2

The above also loaded a vector samp_vec2.

1. Compute the `sum` of samp_vec2

What happened?

## Missing data

Missing values are recorded as NA. They complicate both the functions you apply to a vector and comparisons with it.

``````z1 <- c(1, 1, 2, 4, 5, NA, NA)
# Comparing with NA gives NA, not TRUE or FALSE, so == can't find missing values
z1 == NA
``````
``````## [1] NA NA NA NA NA NA NA
``````
``````# R has a special function for identifying missing values
is.na(z1)
``````
``````## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
``````
``````!is.na(z1)
``````
``````## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
``````
``````sum(is.na(z1))
``````
``````## [1] 2
``````

Missing values complicate comparisons and subsetting too.

``````z1 == 1
``````
``````## [1]  TRUE  TRUE FALSE FALSE FALSE    NA    NA
``````
``````z1[z1 == 1]
``````
``````## [1]  1  1 NA NA
``````

So I got 2 more elements than I wanted. The missing values might be 1, so `z1 == 1` is NA at those positions, and indexing with NA returns NA.

How do I get only the 1's? Also require that the entry is not missing:

``````z1[z1 == 1 & !is.na(z1)]
``````
``````## [1] 1 1
``````
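Besides adding `& !is.na(z1)` to the condition, two other idioms return only the 1's. `which` returns only the positions where the comparison is TRUE and skips NA, and `%in%` never returns NA:

```r
z1 <- c(1, 1, 2, 4, 5, NA, NA)
which(z1 == 1)        # 1 2: NA comparisons are skipped
z1[which(z1 == 1)]    # 1 1
z1 %in% 1             # TRUE TRUE FALSE FALSE FALSE FALSE FALSE
z1[z1 %in% 1]         # 1 1
```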

### na.rm option

Let's return to samp_vec2. How do we find the sum of the values that aren't missing? We could subset and then sum the resulting vector, but R has a quicker solution. Look up the help for `sum` and solve the problem.

## Assignment to subvectors

You may need to assign new values to entries in a vector, either to individual entries or to whole subvectors. Such assignments look like subsetting, except that the square brackets appear on the left side of `<-`.

To assign to individual entries:

``````y
``````
``````##       S1       S2       S3       S4       S5       S6       S7       S8
##  1.22968  0.48125  0.44532  0.43198  1.24532  1.28221 -0.21674  0.32227
##       S9      S10
## -1.31833 -0.04364
``````
``````y[2] <- 5
y["S10"] <- 0
y
``````
``````##      S1      S2      S3      S4      S5      S6      S7      S8      S9
##  1.2297  5.0000  0.4453  0.4320  1.2453  1.2822 -0.2167  0.3223 -1.3183
##     S10
##  0.0000
``````

You can assign a single value or sequence to a range of entries.

``````y[1:3] <- 1
y
``````
``````##      S1      S2      S3      S4      S5      S6      S7      S8      S9
##  1.0000  1.0000  1.0000  0.4320  1.2453  1.2822 -0.2167  0.3223 -1.3183
##     S10
##  0.0000
``````
``````y[1:3] <- c(0, -1, 2)
y
``````
``````##      S1      S2      S3      S4      S5      S6      S7      S8      S9
##  0.0000 -1.0000  2.0000  0.4320  1.2453  1.2822 -0.2167  0.3223 -1.3183
##     S10
##  0.0000
``````
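Two more details are shown here on a throwaway vector `x`. A value shorter than the target range is recycled, and assigning past the end lengthens the vector, filling any gap with NA:

```r
x <- c(1, 2, 3, 4)
x[1:4] <- c(0, 9)   # c(0, 9) is recycled across the four positions
x                   # 0 9 0 9

x[6] <- 7           # position 5 did not exist, so it becomes NA
x                   # 0 9 0 9 NA 7
length(x)           # 6
```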

A logical vector can also select the entries that receive the assignment. Predictably, the TRUE positions inside `[ ]` define the subsequence.

``````x3 <- c("a", "b", "Xi", "mu")
sel <- c(FALSE, FALSE, TRUE, TRUE)
x3[sel] <- c("x", "m")
x3
``````
``````## [1] "a" "b" "x" "m"
``````

As with subsetting, the logical vector usually comes from a comparison.

``````w <- rt(6, df = 4)  # get 6 random values from a t-distribution with 4 degrees of freedom
w
``````
``````## [1] -0.02800  0.04585 -1.66150 -0.90038 -1.73588 -1.60305
``````
``````w[w > median(w)] <- 20
w
``````
``````## [1] 20.000 20.000 -1.661 20.000 -1.736 -1.603
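A common use of this pattern is replacing missing values, here with 0. Whether 0 is a sensible fill value depends on the data:

```r
z1 <- c(1, 1, 2, 4, 5, NA, NA)
z1[is.na(z1)] <- 0   # assign 0 wherever is.na() is TRUE
z1                   # 1 1 2 4 5 0 0
```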
``````

# Factors

A factor looks like a character vector, but its possible values (its levels) are fixed by the context. For a factor describing gender, the two possibilities are "M" and "F". Internally, a factor is stored as integer codes together with the set of levels. This extra structure helps in some statistical modeling and data management.
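Here is a minimal sketch of the gender example, using a made-up vector `g`. By default, `factor` uses the sorted distinct values as the levels. `as.integer` exposes the codes, and assigning a value that is not a level produces NA:

```r
g <- factor(c("M", "F", "F", "M", "F"))
levels(g)        # "F" "M"
as.integer(g)    # 2 1 1 2 1: codes pointing into levels(g)
table(g)         # counts per level: F 3, M 2

# Assigning a value that is not a level gives NA (with a warning)
g2 <- g
suppressWarnings(g2[1] <- "X")
g2[1]            # NA
```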

# Lists

Much of your time in large-scale projects will be spent managing lists and data frames (which are also lists). A list is an indexed collection of R objects. What an index points to is called a component of the list. A list is like a vector, except that a component can hold any R object, including another list. Like a vector, a list can have names for its components.

``````aa <- list(c(1, 2), c("z", "ALN"), c(1, 2, 3.4, 5.6))
class(aa)
``````
``````## [1] "list"
``````
``````length(aa)
``````
``````## [1] 3
``````
``````names(aa)
``````
``````## NULL
``````
``````names(aa) <- c("V1", "V2", "V3")
# A quicker way to initialize with names
bb <- list(V1 = c("a", "1"), V2 = c(2, 2, 2))
bb
``````
``````## $V1
## [1] "a" "1"
##
## $V2
## [1] 2 2 2
``````

## Indexing a list

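Lists have two levels of indexing: the components, and the elements inside each component. The bracket level keeps them apart. A double bracket `[[ ]]`, or `$` with a name, extracts a single component.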

``````aa[[1]]
``````
``````## [1] 1 2
``````
``````bb$V1
``````
``````## [1] "a" "1"
``````

Each of these gives the first component as an object of its own class; the list structure is gone. A single bracket instead keeps the list structure:

``````aa[1]
``````
``````## $V1
## [1] 1 2
``````
``````class(aa[1])
``````
``````## [1] "list"
``````

This is a list of length 1.

To get sublists in general, do what you'd do for a vector.

``````aa[2:3]
``````
``````## $V2
## [1] "z"   "ALN"
##
## $V3
## [1] 1.0 2.0 3.4 5.6
``````
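Names work at both bracket levels, and `$` is shorthand for `[[ ]]` with a name. The snippet re-creates `aa` with the names V1 to V3 so it runs on its own. `comp` is an illustrative variable:

```r
aa <- list(V1 = c(1, 2), V2 = c("z", "ALN"), V3 = c(1, 2, 3.4, 5.6))
aa[["V2"]]          # "z" "ALN": the component itself
aa$V2               # same as aa[["V2"]]
aa[c("V1", "V3")]   # a sublist with components V1 and V3

# A name stored in a variable needs [[ ]]; aa$comp looks for a component called "comp"
comp <- "V3"
aa[[comp]]          # 1.0 2.0 3.4 5.6
aa$comp             # NULL
```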

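A double bracket range does not return a sublist. `[[ ]]` always extracts a single element; given a vector of indices, it descends into nested components one index at a time. So `aa[[2:3]]` means `aa[[2]][[3]]`, and it fails with a subscript-out-of-bounds error.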
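Here is a short check that re-creates `aa` so the snippet runs on its own. Each index in `[[ ]]` goes one level deeper, so a path that exists returns a single element:

```r
aa <- list(V1 = c(1, 2), V2 = c("z", "ALN"), V3 = c(1, 2, 3.4, 5.6))
aa[[c(1, 2)]]     # 2: same as aa[[1]][[2]]
aa[[1]][[2]]      # 2

# aa[[2]] has only two elements, so asking for a third fails
res <- tryCatch(aa[[2:3]], error = function(e) "error")
res               # "error"
```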

How would we get the first element of every component in a list? If the list components are all numeric vectors, how do we get the mean of each one? This is what `lapply` and its relatives are for.
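Here is a sketch answering both questions. `lapply` applies a function to each component and returns a list; `sapply` simplifies the result to a vector when it can. The anonymous function and the list `nums` are illustrative:

```r
aa <- list(V1 = c(1, 2), V2 = c("z", "ALN"), V3 = c(1, 2, 3.4, 5.6))

# First element of every component, kept as a list because the components differ in mode
firsts <- lapply(aa, function(x) x[1])
firsts$V2            # "z"

# Mean of each component of an all-numeric list
nums <- list(a = c(1, 2, 3), b = c(10, 20))
sapply(nums, mean)   # named numeric vector: a = 2, b = 15
```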