This is all material that I expect you to know, but I'll want to emphasize some features that other writers may not. This will be very sketchy because it's a review.

Atomic vectors are the most basic object in **R**. The *modes* you will meet most often are

- numeric
- character
- logical

**R** makes few distinctions between a number and a sequence of numbers.
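As a quick check of this point, the sketch below shows that a single number behaves as a vector of length 1. The variable `x` is purely illustrative and not part of the course data.

```r
x <- 5               # a "single number" (illustrative value)
length(x)            # it has a length, like any vector
is.vector(x)         # and it is reported as a vector
x[1]                 # so it can be indexed like one
identical(x, c(5))   # c() of one value gives the same object
```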

A numeric vector would arise when storing readings about some set of samples.

names(v) | v |
---|---|
S1 | 1.6 |
S2 | 2.1 |
S3 | 1.5 |
S4 | 1.9 |

The names attribute must be a character vector.

```
v <- c(1.6, 2.1, 1.5, 1.9)
names(v) <- c("S1", "S2", "S3", "S4")
v
```

```
## S1 S2 S3 S4
## 1.6 2.1 1.5 1.9
```

```
length(v)
```

```
## [1] 4
```

The names attribute is optional.

```
names(v) <- NULL
v
```

```
## [1] 1.6 2.1 1.5 1.9
```

An *integer* vector is a special type of numeric vector. It's easy to form ranges of integers like

```
j <- 2:6
j
```

```
## [1] 2 3 4 5 6
```

You can fetch the mode of a vector with `mode` or `class`.

```
mode(v)
```

```
## [1] "numeric"
```

```
class(v)
```

```
## [1] "numeric"
```
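For a vector of decimal numbers, `mode` and `class` give the same answer. They can differ for integer vectors, as this small sketch suggests; `typeof` is included here as a third view of how the values are stored.

```r
j <- 2:6        # an integer range
mode(j)         # integers still have mode "numeric"
class(j)        # but their class is "integer"
typeof(j)       # the storage type is also "integer"
is.numeric(j)   # integer vectors count as numeric
```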

Example:

```
s <- c("a", "BBB", "TCGA", "1", "2.1")
length(s)
```

```
## [1] 5
```

```
length("BBB")
```

```
## [1] 1
```

```
nchar("BBB")
```

```
## [1] 3
```

```
mode(c(3, 2))
```

```
## [1] "numeric"
```

```
mode(c("3", "2"))
```

```
## [1] "character"
```

```
x2 <- as.numeric(c("3", "2"))
x2
```

```
## [1] 3 2
```

```
mode(x2)
```

```
## [1] "numeric"
```

A logical vector is just a sequence of TRUE and FALSE instances.

```
lv <- c(TRUE, FALSE, TRUE)
```

Trick: `sum` can count the number of TRUE values in a logical vector.

```
sum(lv)
```

```
## [1] 2
```

Like other vectors, logical vectors can carry a names attribute, respond to the `length` command, etc.

Logical vectors arise from comparisons, normally `==`, `<`, `>`, `%in%`, etc. These comparisons are applied element by element across vectors.

```
1 < 2
```

```
## [1] TRUE
```

```
c(0, 1, 3) < 2
```

```
## [1] TRUE TRUE FALSE
```

```
c(0, 1, 3) == 1
```

```
## [1] FALSE TRUE FALSE
```

```
c(0, 1, 3) != 1
```

```
## [1] TRUE FALSE TRUE
```

```
0 %in% c(0, 1, 3)
```

```
## [1] TRUE
```

```
c(0, 2) %in% c(0, 1, 3)
```

```
## [1] TRUE FALSE
```

Comparison expressions can be strung into compound formulas using & (and) and | (or).
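A minimal sketch of compound comparisons, assuming an illustrative vector `x` that is not part of the course data:

```r
x <- c(0, 1, 3, 5)            # illustrative values

both   <- x > 0 & x < 5       # TRUE where both conditions hold
either <- x < 1 | x > 3       # TRUE where at least one condition holds
both
either

n_both <- sum(both)           # count the entries that satisfy both conditions
n_both
```

Both operators work element by element, so the result has the same length as `x`.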

Positive integer vectors select elements from a vector by index. Negative integers remove the corresponding elements.

```
v <- sample(1:100, size = 25)
v[15:19]
```

```
## [1] 23 67 18 30 56
```

```
v[25]
```

```
## [1] 98
```

```
v[-(3:25)]
```

```
## [1] 60 68
```

You can also select using the names attributes.

```
y <- rnorm(10)
names(y) <- paste("S", 1:10, sep = "")
y
```

```
## S1 S2 S3 S4 S5 S6 S7 S8
## 1.22968 0.48125 0.44532 0.43198 1.24532 1.28221 -0.21674 0.32227
## S9 S10
## -1.31833 -0.04364
```

```
y["S1"]
```

```
## S1
## 1.23
```

```
y[c("S2", "S4")]
```

```
## S2 S4
## 0.4813 0.4320
```

Putting a logical vector of the same length inside the [ ] forms a subvector consisting of those entries where it's true.

```
a <- c("w", "XX", "z", "fractal")
ll <- c(TRUE, FALSE, FALSE, TRUE)
a[ll]
```

```
## [1] "w" "fractal"
```

```
length(a[ll])
```

```
## [1] 2
```

This allows us to use comparison relations to select sequences.

```
# First write an expression to select 'XX' from a
a == "XX"
```

```
## [1] FALSE TRUE FALSE FALSE
```

```
# Use this to subset a
a[a == "XX"]
```

```
## [1] "XX"
```

It's more interesting with numerical vectors.
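For instance, reusing the four sample readings from earlier as a named vector (the name `readings` and the cutoffs 1.8, 1.5 and 2 are chosen only for illustration), a comparison picks out the samples of interest:

```r
readings <- c(S1 = 1.6, S2 = 2.1, S3 = 1.5, S4 = 1.9)

# Samples whose reading exceeds an (arbitrary) cutoff of 1.8
high <- readings[readings > 1.8]
high

# Samples inside a range, using a compound comparison
mid <- readings[readings >= 1.5 & readings < 2]
mid
```

The names travel with the selected values, so you can see which samples passed the test.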

Go to the sample data web page and run the first `source` command. This will load a vector samp_vec1 into your workspace.

- What are the length and mode of the vector?
- Define a subvector b consisting of the elements of samp_vec1 greater than the median.

The above also loaded a vector samp_vec2.

- Compute the `sum` of samp_vec2.

What happened?

Having missing values in a vector, which are recorded as NA, can complicate the application of functions and comparisons with the vector.

```
z1 <- c(1, 1, 2, 4, 5, NA, NA)
z1 == NA
```

```
## [1] NA NA NA NA NA NA NA
```

```
# R has a special function for identifying missing values
is.na(z1)
```

```
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
```

```
!is.na(z1)
```

```
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
```

```
sum(is.na(z1))
```

```
## [1] 2
```

Missing values complicate comparisons and subsetting too.

```
z1 == 1
```

```
## [1] TRUE TRUE FALSE FALSE FALSE NA NA
```

```
z1[z1 == 1]
```

```
## [1] 1 1 NA NA
```

So, I got 2 more elements than I wanted, because the missing values *could* be 1.

How do I get **only** the 1's?

```
z1[z1 == 1 & !is.na(z1)]
```

```
## [1] 1 1
```
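Combining the comparison with `!is.na()` is one way to keep only the true matches. Another option, sketched here, is `which`, which returns the positions where a condition is TRUE and silently drops the NA entries.

```r
z1 <- c(1, 1, 2, 4, 5, NA, NA)

idx <- which(z1 == 1)   # positions where the comparison is TRUE; NA is skipped
idx

ones <- z1[idx]         # subset by position instead of by logical vector
ones
```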

Let's return to samp_vec2. How do we find the sum of the values that aren't missing? We could subset and then sum the resulting vector, but **R** has a quicker solution. Look up the help for `sum` and solve the problem.

It may be necessary to assign new values to entries in a vector. This may be true for individual entries and for subvectors. Such assignments are similar to subsetting, but the square bracket is on the left side of a `<-`.

To assign to individual entries:

```
y
```

```
## S1 S2 S3 S4 S5 S6 S7 S8
## 1.22968 0.48125 0.44532 0.43198 1.24532 1.28221 -0.21674 0.32227
## S9 S10
## -1.31833 -0.04364
```

```
y[2] <- 5
y["S10"] <- 0
y
```

```
## S1 S2 S3 S4 S5 S6 S7 S8 S9
## 1.2297 5.0000 0.4453 0.4320 1.2453 1.2822 -0.2167 0.3223 -1.3183
## S10
## 0.0000
```

You can assign a single value or sequence to a range of entries.

```
y[1:3] <- 1
y
```

```
## S1 S2 S3 S4 S5 S6 S7 S8 S9
## 1.0000 1.0000 1.0000 0.4320 1.2453 1.2822 -0.2167 0.3223 -1.3183
## S10
## 0.0000
```

```
y[1:3] <- c(0, -1, 2)
y
```

```
## S1 S2 S3 S4 S5 S6 S7 S8 S9
## 0.0000 -1.0000 2.0000 0.4320 1.2453 1.2822 -0.2167 0.3223 -1.3183
## S10
## 0.0000
```

A logical vector can also be used to select the subsequence that takes the assignment. Predictably, the TRUE values of the vector within the [ ] define the subsequence.

```
x3 <- c("a", "b", "Xi", "mu")
sel <- c(FALSE, FALSE, TRUE, TRUE)
x3[sel] <- c("x", "m")
x3
```

```
## [1] "a" "b" "x" "m"
```

As with subsetting, the logical vector usually comes from an equation or inequality.

```
w <- rt(6, df = 4) # get 6 random values from a t-distribution with 4 degrees of freedom
w
```

```
## [1] -0.02800 0.04585 -1.66150 -0.90038 -1.73588 -1.60305
```

```
w[w > median(w)] <- 20
w
```

```
## [1] 20.000 20.000 -1.661 20.000 -1.736 -1.603
```
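The same pattern is a common way to overwrite missing values. In this sketch the vector `z` and the replacement value 0 are illustrative assumptions; whether 0 is a sensible stand-in depends entirely on your data.

```r
z <- c(3, NA, 7, NA, 1)   # illustrative vector with two missing values

z[is.na(z)] <- 0          # is.na(z) is TRUE exactly at the missing entries
z
```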

A *factor* behaves like a character vector whose possible values (its *levels*) are set by the context. For a factor describing gender the two possibilities are "M" and "F". The extra structure offered by the factor class helps in some statistical modeling and data management.
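A minimal sketch of the gender example, with made-up values and the hypothetical name `g`:

```r
# Five illustrative observations; the allowed values are fixed by `levels`
g <- factor(c("M", "F", "F", "M", "F"), levels = c("M", "F"))

class(g)          # "factor", not "character"
levels(g)         # the set of possible values
counts <- table(g)  # tabulates observations per level
counts
as.character(g)   # convert back to a plain character vector
```

Fixing the levels up front means a level still appears in `table` output even if no observation happens to take that value.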

Much of your time in large-scale projects will be spent managing lists and data.frames (which are also lists). A *list* is an indexed collection of **R** objects. What an index points to is called a *component* of the list. It's like a vector, but you can put anything in a component. Like a vector, a list can contain *names* for the components.

```
aa <- list(c(1, 2), c("z", "ALN"), c(1, 2, 3.4, 5.6))
class(aa)
```

```
## [1] "list"
```

```
length(aa)
```

```
## [1] 3
```

```
names(aa)
```

```
## NULL
```

```
names(aa) <- c("V1", "V2", "V3")
# A quicker way to initialize with names
bb <- list(V1 = c("a", "1"), V2 = c(2, 2, 2))
bb
```

```
## $V1
## [1] "a" "1"
##
## $V2
## [1] 2 2 2
```

We have lots of nested indexing here. The bracket level, single or double, keeps it clear.

```
aa[[1]]
```

```
## [1] 1 2
```

```
bb$V1
```

```
## [1] "a" "1"
```

Double brackets (or `$`) give the component itself, as an object of its own class. The list structure is gone.

```
aa[1]
```

```
## $V1
## [1] 1 2
```

```
class(aa[1])
```

```
## [1] "list"
```

This is a list of length 1.

To get sublists in general, do what you'd do for a vector.

```
aa[2:3]
```

```
## $V2
## [1] "z" "ALN"
##
## $V3
## [1] 1.0 2.0 3.4 5.6
```

A range inside double brackets, such as `aa[[2:3]]`, does not return a sublist; here it gives an error.
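The reason, as far as R's documentation for `[[` describes it, is that a vector index inside double brackets is applied recursively: `aa[[2:3]]` asks for the third element of the second component, which does not exist. The sketch below rebuilds `aa` so it runs on its own and uses `try` to capture the error rather than stop.

```r
aa <- list(V1 = c(1, 2), V2 = c("z", "ALN"), V3 = c(1, 2, 3.4, 5.6))

# Double brackets with a range: captured with try() so the script keeps going
attempt <- try(aa[[2:3]], silent = TRUE)
failed <- inherits(attempt, "try-error")
failed

# Single brackets with the same range return a sublist
sub <- aa[2:3]
names(sub)
```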

How would we get the first indexed slot of every component in a list? If the list components are all numeric vectors, how do we get the means of all of them? This is what `lapply` and its relatives are for.
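As a rough sketch of both questions: `lapply` applies a function to each component and returns a list, while its relative `sapply` tries to simplify the result to a vector. The list `nums` is a hypothetical all-numeric list introduced only for this example.

```r
aa <- list(V1 = c(1, 2), V2 = c("z", "ALN"), V3 = c(1, 2, 3.4, 5.6))

# First element of every component; the result is again a list
firsts <- lapply(aa, function(comp) comp[1])
firsts

# Hypothetical list whose components are all numeric
nums <- list(A = c(1, 2, 3), B = c(10, 20), C = c(2.5, 3.5))

# Mean of every component, simplified to a named numeric vector
means <- sapply(nums, mean)
means
```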