When working with a matrix of numbers, you will frequently want to apply a function to each row or each column, for example to get a sense of how the values are distributed. Here is an example.
Let's read in a matrix of gene expression values.
setwd("~/Documents/Computing with Data/7_Functions_matrices/")
load("./Data/expr_mat")
dim(expr_mat)
## [1] 22283 1269
expr_mat[1:3, 1:4]
## GSM79114 GSM79118 GSM79119 GSM79120
## 1007_s_at 9.990 9.536 10.189 9.682
## 1053_at 4.241 5.957 5.579 5.334
## 117_at 3.406 3.615 3.573 3.741
Each row holds the expression level of a "gene" as measured by one Affymetrix microarray probe set (22283 rows in all), and each of the 1269 columns corresponds to a breast cancer patient.
Suppose we want the interquartile range (IQR) of each row. The first method uses a for loop, computing the value one row at a time. Before entering the loop, create a vector to hold the results.
# Preallocate one slot per row, then fill the slots in row by row
iqr_v2 <- numeric(nrow(expr_mat))
for (i in seq_len(nrow(expr_mat))) {
  iqr_v2[i] <- IQR(expr_mat[i, ])
}
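The loop can be tried out without the expression data. The sketch below uses a small made-up matrix, `toy_mat` (a hypothetical stand-in for `expr_mat`), and uses `seq_len()` rather than `1:nrow()` so that a matrix with zero rows gives an empty loop instead of iterating over `c(1, 0)`.

```r
# Hypothetical toy data: 3 "genes" (rows) by 5 "patients" (columns)
toy_mat <- matrix(c(1, 2, 3, 4, 5,
                    2, 4, 6, 8, 10,
                    5, 5, 5, 5, 6),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("g1", "g2", "g3"), paste0("p", 1:5)))

# Preallocate the result, then fill it one row at a time
iqr_loop <- numeric(nrow(toy_mat))
for (i in seq_len(nrow(toy_mat))) {
  iqr_loop[i] <- IQR(toy_mat[i, ])
}
iqr_loop
# [1] 2 4 0
```

The third row is almost constant, so its first and third quartiles coincide and its IQR is 0.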
apply gives a more direct way. The code is shorter and easier to read, though not necessarily faster: apply still loops over the rows internally.
iqr_vals2 <- apply(expr_mat, 1, IQR)
iqr_vals2[1:3]
## 1007_s_at 1053_at 117_at
## 0.8371 0.6955 0.4377
apply also sets the names of the returned vector, taking them from the row names of the matrix; the for loop version leaves iqr_v2 unnamed.
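A quick way to see the difference in names is to run both versions on a small made-up matrix (`toy_mat` is hypothetical data, not from the notes). The values agree; only the names differ.

```r
# Hypothetical toy data with row names
toy_mat <- matrix(c(1, 2, 3, 4, 5,
                    2, 4, 6, 8, 10),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("g1", "g2"), paste0("p", 1:5)))

# Loop version: an unnamed numeric vector
iqr_loop <- numeric(nrow(toy_mat))
for (i in seq_len(nrow(toy_mat))) iqr_loop[i] <- IQR(toy_mat[i, ])
names(iqr_loop)    # NULL

# apply() version: the row names are carried over
iqr_apply <- apply(toy_mat, 1, IQR)
names(iqr_apply)   # "g1" "g2"

# Same values once the names are dropped
all.equal(iqr_loop, unname(iqr_apply))

# The loop result can be given the same names by hand
names(iqr_loop) <- rownames(toy_mat)
```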
result <- apply(X, MARGIN, FUN)   # X: the matrix; MARGIN: 1 = rows, 2 = columns; FUN: the function
With MARGIN = 1, each row of the matrix is passed, as a vector, to the function, and the results are collected into the output; with MARGIN = 2, each column is passed instead. Remember this.
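The effect of MARGIN is easiest to see on a tiny matrix. The sketch below (with a made-up matrix `m`) also shows one consequence worth remembering: when the function returns more than one value, apply puts each result in a *column* of the output, so a row-wise call comes back transposed relative to the input.

```r
m <- matrix(1:6, nrow = 3)   # hypothetical data: 3 rows, 2 columns
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

row_sums <- apply(m, 1, sum)   # one value per row:    5 7 9
col_sums <- apply(m, 2, sum)   # one value per column: 6 15

# range() returns two values (min, max) for each row
r <- apply(m, 1, range)
dim(r)   # 2 3: one column per original row
t(r)     # transpose back to one (min, max) row per original row
```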
apply usage
We can compute the mean of each column as
mean_col <- apply(expr_mat, 2, mean)
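For the common cases of sums and means, base R also provides `colMeans()`, `rowMeans()`, `colSums()` and `rowSums()`, which are typically faster than the equivalent apply call. A minimal check on made-up data (`m` is hypothetical) that the two approaches agree up to floating-point tolerance:

```r
set.seed(1)
m <- matrix(rnorm(20), nrow = 4)   # hypothetical data: 4 rows, 5 columns

mean_col_apply   <- apply(m, 2, mean)
mean_col_builtin <- colMeans(m)

# all.equal() allows for tiny rounding differences between the two
all.equal(mean_col_apply, mean_col_builtin)
```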
Very often, what you want to compute about a row isn't available as a predefined function that takes a single argument. In that case you can pass apply an anonymous function written inline.
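When the function you need exists but takes an extra argument, there are two equivalent options: wrap it in an anonymous function, or pass the extra argument through apply's `...`, which forwards further named arguments to FUN. The sketch uses the 90th percentile of each row of a made-up matrix `m` as the example.

```r
set.seed(2)
m <- matrix(rnorm(30), nrow = 3)   # hypothetical data: 3 rows, 10 columns

# Option 1: an anonymous function that fixes probs = 0.9
q90_anon <- apply(m, 1, function(v) quantile(v, probs = 0.9))

# Option 2: apply() passes probs = 0.9 on to quantile() via "..."
q90_dots <- apply(m, 1, quantile, probs = 0.9)

all.equal(q90_anon, q90_dots)
```

The anonymous-function form is more general, since it can combine several operations, as in the constant-row check that follows.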
How many rows in the expression matrix have constant expression values? These are the rows whose maximum equals their minimum.
sum(apply(expr_mat, 1, function(v) {
  max(v) == min(v)
}))
## [1] 0
# None of them are constant.
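Since the expression matrix has no constant rows, the count above is 0. A made-up matrix with one constant row (`toy_mat` is hypothetical) shows the same anonymous function finding it, and how the logical vector also identifies which row it is.

```r
toy_mat <- rbind(a = c(1, 2, 3),
                 b = c(4, 4, 4),   # constant row
                 c = c(0, 0, 1))

# TRUE for rows whose maximum equals their minimum
is_const <- apply(toy_mat, 1, function(v) max(v) == min(v))
is_const                      # a FALSE, b TRUE, c FALSE
sum(is_const)                 # 1
rownames(toy_mat)[is_const]   # "b"
```

If the matrix could contain NA values, max() and min() would return NA for those rows; adding na.rm = TRUE inside the function is one way to handle that.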
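When the function returns a single TRUE or FALSE, apply returns a logical vector with one element per row, which you can use to subset the rows of the matrix.
For example, restrict the matrix to the rows with IQR > 0.5: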
sub_mat <- expr_mat[apply(expr_mat, 1, function(x) {
  IQR(x) > 0.5
}), ]
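The same subsetting can be split into two steps: compute the row IQRs once, then compare them to the threshold. The sketch below does this on a made-up matrix (`toy_mat`, `iqr_vals` and `keep` are hypothetical names) and adds `drop = FALSE`, which keeps the result a matrix even when exactly one row passes the filter; without it, R simplifies a single selected row to a plain vector.

```r
toy_mat <- rbind(g1 = c(1, 2, 3, 4, 5),
                 g2 = c(5, 5, 5, 5, 6),
                 g3 = c(2, 4, 6, 8, 10))

# Row IQRs: g1 = 2, g2 = 0, g3 = 4
iqr_vals <- apply(toy_mat, 1, IQR)

keep <- iqr_vals > 0.5
sub_mat <- toy_mat[keep, , drop = FALSE]
rownames(sub_mat)   # "g1" "g3"

# With a stricter threshold only g3 passes; drop = FALSE keeps a 1 x 5 matrix
one_row <- toy_mat[iqr_vals > 3, , drop = FALSE]
dim(one_row)        # 1 5
```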