Applying a function to rows or columns of a matrix

When working with a matrix of numbers, you'll frequently want to apply a function to each row or column, for example to get a sense of how the values are distributed. Here is an example.

Let's read in a matrix of gene expression values.

setwd("~/Documents/Computing with Data/7_Functions_matrices/")
load("./Data/expr_mat")
dim(expr_mat)
## [1] 22283  1269
expr_mat[1:3, 1:4]
##           GSM79114 GSM79118 GSM79119 GSM79120
## 1007_s_at    9.990    9.536   10.189    9.682
## 1053_at      4.241    5.957    5.579    5.334
## 117_at       3.406    3.615    3.573    3.741

These are expression levels of "genes", each row as measured by an Affymetrix microarray probe, for 1269 breast cancer patients (one per column).

Compute the inter-quartile range of each gene (row)

The first method uses a for loop. It computes the values one row at a time. Before entering the for loop, define a variable to hold the results.

iqr_v2 <- numeric(nrow(expr_mat))
for (i in seq_len(nrow(expr_mat))) {
    iqr_v2[i] <- IQR(expr_mat[i, ])
}

apply gives a more direct way. The call is shorter, needs no preallocated result vector, and may be a little faster.

iqr_vals2 <- apply(expr_mat, 1, IQR)
iqr_vals2[1:3]
## 1007_s_at   1053_at    117_at 
##    0.8371    0.6955    0.4377

apply also sets the names of the returned vector, taking them from the row names of the matrix.
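
As a quick check that the two approaches agree, here is a minimal sketch on a small made-up matrix. The names toy_mat, iqr_loop and iqr_apply are hypothetical and used only for illustration; they are not part of the expression data above.

```r
# Toy stand-in for expr_mat (random numbers, not real expression data)
set.seed(1)
toy_mat <- matrix(rnorm(20), nrow = 4,
                  dimnames = list(paste0("gene", 1:4), paste0("s", 1:5)))

# Loop version: preallocate the result, then fill it one row at a time
iqr_loop <- numeric(nrow(toy_mat))
for (i in seq_len(nrow(toy_mat))) {
    iqr_loop[i] <- IQR(toy_mat[i, ])
}

# apply version: same values, with the row names carried over
iqr_apply <- apply(toy_mat, 1, IQR)

# The values match once the names are dropped from the apply result
all.equal(iqr_loop, unname(iqr_apply))
## [1] TRUE
names(iqr_apply)
## [1] "gene1" "gene2" "gene3" "gene4"
```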

Format of an apply

result <- apply(matrix, margin, function)   # margin: 1 = rows, 2 = columns

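apply slices the matrix along the chosen margin and passes each slice to the function as a vector: each row when the margin is 1, each column when the margin is 2. Keep this in mind when you write your own functions for apply.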
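
To make the role of the margin concrete, here is a small sketch on a made-up 2 x 3 matrix (m, row_sums, col_sums and row_ranges are hypothetical names). It also shows one behavior that is easy to miss: when the function returns a vector rather than a single number, apply stores each result as a column of the output.

```r
# A 2 x 3 toy matrix, filled column by column
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Margin 1: the function receives each row as a vector
row_sums <- apply(m, 1, sum)   # r1 = 1 + 3 + 5 = 9, r2 = 2 + 4 + 6 = 12

# Margin 2: the function receives each column as a vector
col_sums <- apply(m, 2, sum)   # c1 = 3, c2 = 7, c3 = 11

# A function returning two values per row gives a 2 x 2 result
# with one column per input row
row_ranges <- apply(m, 1, range)
dim(row_ranges)
## [1] 2 2
colnames(row_ranges)
## [1] "r1" "r2"
```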

Examples of apply usage

We can compute the mean of each column as

mean_col <- apply(expr_mat, 2, mean)
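
For the specific case of means, base R also provides colMeans and rowMeans, which should give the same values as the apply call. The sketch below compares the two on a made-up matrix (toy_mat and toy_na are hypothetical names) and shows that extra arguments such as na.rm can be passed through apply to the function.

```r
# Toy matrix with columns (1, 2), (3, 4), (5, 6)
toy_mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

mean_col_apply <- apply(toy_mat, 2, mean)
mean_col_base  <- colMeans(toy_mat)

all.equal(mean_col_apply, mean_col_base)
## [1] TRUE

# Arguments after the function are forwarded to it on every call
toy_na <- toy_mat
toy_na[1, 1] <- NA
mean_col_narm <- apply(toy_na, 2, mean, na.rm = TRUE)   # 2.0, 3.5, 5.5
```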

Very often what you want to compute about a row isn't available as a pre-defined function with a single input variable. In that case you can pass apply a function that you define on the spot.

How many rows in the expression matrix have constant expression values? These are ones with max = min.

sum(apply(expr_mat, 1, function(v) {
    max(v) == min(v)
}))
## [1] 0
# None of them are constant.
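
Since the real matrix has no constant rows, a made-up matrix that does contain some may make the idiom easier to follow. In this sketch toy_mat and is_const are hypothetical names.

```r
# Rows a and c are constant, row b is not
toy_mat <- rbind(a = c(2, 2, 2),
                 b = c(1, 2, 3),
                 c = c(5, 5, 5))

# One logical value per row: TRUE, FALSE, TRUE
is_const <- apply(toy_mat, 1, function(v) {
    max(v) == min(v)
})

# sum() counts the TRUE values
sum(is_const)
## [1] 2

# names() of the TRUE entries tells us which rows they are
names(is_const)[is_const]
## [1] "a" "c"
```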

When the function returns a single logical value for each row, apply returns a logical vector with one entry per row, and you can use that vector to subset the matrix.

Restrict the matrix to the rows with IQR > 0.5

sub_mat <- expr_mat[apply(expr_mat, 1, function(x) {
    IQR(x) > 0.5
}), ]
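
The same pattern can be written in two steps, which makes the logical vector available for inspection before subsetting. This is a sketch on made-up data (toy_mat, keep and sub_toy are hypothetical names). Adding drop = FALSE is a precaution so that the result stays a matrix even if only one row passes the filter.

```r
toy_mat <- rbind(g1 = c(1, 2, 3, 4),         # IQR = 1.5
                 g2 = c(5, 5.1, 5.2, 5.3),   # IQR = 0.15
                 g3 = c(0, 2, 4, 6))         # IQR = 3

# Step 1: one logical value per row
keep <- apply(toy_mat, 1, IQR) > 0.5

# Step 2: keep only the rows flagged TRUE
sub_toy <- toy_mat[keep, , drop = FALSE]

rownames(sub_toy)
## [1] "g1" "g3"
dim(sub_toy)
## [1] 2 4
```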