Why define functions

R has a rich collection of functions for performing the calculations needed in statistics. Some functions, like data.frame, produce structured objects that store input and output; others, like lm, fit a statistical model to data and output parameters that describe the fit. Much of learning R comes down to acquiring a large enough vocabulary of functions to solve your problems. The actual structure of the programs may be quite simple, but you still need to learn the names of the functions that do the required work. Beyond the core R distribution there are thousands of packages in which developers have written new functions for specialized tasks. However, even everyday users need to write their own functions.

Writing re-usable blocks of code

Quite often when carrying out a project, some set of tasks will need to be repeated with different input. To ensure that the analyses are carried out the same way each time, it is important to wrap the code into a named function that is called each time it's needed. Typically, the function is written in a library file with a .R extension, and that file is loaded with the source command into the R Markdown document that performs the analysis. Putting this code into its own function saves retyping the commands and ensures reproducibility by forcing exactly the same code to be run each time it is used.
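As a minimal sketch of this workflow, the code below writes a small function to a file and loads it with source. The file is a temporary one created with tempfile so that the example is self-contained; in a real project it would be an ordinary .R file kept alongside the analysis. The function name standardize is hypothetical.

```r
# Create a stand-in for a library file (a real project would use a saved .R file)
lib_file <- tempfile(fileext = ".R")
writeLines(c(
    "standardize <- function(x) {",
    "    (x - mean(x)) / sd(x)",
    "}"
), lib_file)

# Load the definitions from the file into the current session
source(lib_file)

# Exactly the same code now runs each time the function is called
s1 <- standardize(c(2, 4, 6))
s2 <- standardize(c(10, 20, 30, 40))
```

Both calls go through the single definition in the file, so a correction made there reaches every analysis that sources it.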

Sectioning of analyses into logical groups

In many analyses, the steps performed can be grouped into subsections that form fairly discrete chunks. Each chunk takes specific input and outputs a small collection of R objects for further use. When this is the case, it is good practice to collect these commands into a function. This helps the reader (often you, a month later) understand the overall structure of the analysis. It also encapsulates the inputs and outputs of the task, regardless of the number of intermediary objects and calculations done in the process.
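A sketch of this idea, using the built-in cars data set as a stand-in for real data: the hypothetical function fit_line takes a data frame as its input and returns a small list holding the fitted model and its two coefficients. The intermediate objects used along the way stay inside the function.

```r
# Hypothetical helper: one discrete chunk of an analysis wrapped as a function
fit_line <- function(dat) {
    model <- lm(dist ~ speed, data = dat)   # intermediate object
    coefs <- coef(model)                    # intermediate object
    list(model = model, intercept = coefs[[1]], slope = coefs[[2]])
}

# cars ships with R; it has the columns speed and dist
res <- fit_line(cars)
names(res)
```

A reader of the analysis only needs to know that fit_line turns a data frame into a fitted line, not how each step is carried out.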

Performing tasks on arrays and lists of possible inputs

The most common use of defined functions may be to apply a function to every row of a matrix, or to every component of a list. This is hard to understand without the examples to come later; however, it is the principal reason for learning about functions now.
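For a first look at what this means, the sketch below defines a hypothetical function value_range and applies it to every row of a matrix with apply and to every component of a list with sapply.

```r
# Hypothetical function: the spread between the largest and smallest value
value_range <- function(v) {
    max(v) - min(v)
}

# A 2 x 3 matrix, filled by column: the rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)
row_ranges <- apply(m, 1, value_range)   # the 1 means "apply over rows"

# A list whose components have different lengths
l <- list(a = c(1, 10), b = c(5, 5, 5), c = c(-2, 0, 3))
list_ranges <- sapply(l, value_range)
```

Here row_ranges holds one value per row of m, and list_ranges holds one named value per component of l.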

Structure of a function definition

The inputs to a function, often called the arguments, are separated by commas within the parentheses of the function call. For example, a function named myaddition that takes three numbers as input and returns another number as output would be called as

z <- myaddition(a, b, c)

The definition of myaddition would be found in a block of code that looks like

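myaddition <- function(x1, x2, x3) {
    y1 <- x1 + x2    # some calculation using the arguments
    y2 <- y1 + x3    # later steps can also use earlier results such as y1
    output <- y2     # ... any further steps ...
    output           # the last line names the object to return
}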

The output line names the R object you want the function to return. It is written on its own, without an <-, because the returned value is assigned to whatever sits on the left-hand side of the function call (z in the call above).

Each execution of a function has its own local environment. R objects, like y1 and y2 above, that are created while computing the function's output but are not explicitly returned are invisible to the outside world.
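The sketch below illustrates this with a hypothetical function sum_of_squares. After the call, only the returned value is available; the intermediate object squared_values cannot be found in the workspace.

```r
sum_of_squares <- function(x) {
    squared_values <- x^2           # intermediate object, local to the function
    total <- sum(squared_values)
    total                           # returned value
}

ss <- sum_of_squares(c(1, 2, 3))

# exists() reports whether a name can be found from where it is called
squares_visible <- exists("squared_values")
```

In this sketch ss holds the returned sum, while squares_visible is FALSE because squared_values lived only inside the call.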

Default arguments

Often, a function will have an argument that you can set but that takes one specific value in most uses. Such an argument is given a default value in the function definition. To use the default, simply leave the argument out of the function call.

For example, the function rnorm generates a random sample of the desired size from a normal distribution. The default is to use a distribution with mean 0 and standard deviation 1. You can get a vector of 1000 draws by

v <- rnorm(1000)
summary(v)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.7200 -0.6290  0.0330  0.0318  0.6860  3.0300

If you want 500 points from a distribution with mean 2 and standard deviation 4, use

w <- rnorm(500, mean = 2, sd = 4)
summary(w)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -11.300  -0.662   2.070   2.160   4.920  13.400

The help file for rnorm gives a good indication of how to set a default value for one argument of the function. In the functions you define, setting default values may not arise often, but be aware that most of the pre-defined functions you use in R have such values. To fully understand what a function is doing you need to know such details. Read the help file.
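In your own definitions, a default is given by writing name = value in the argument list. A minimal sketch, with a hypothetical function rescale whose arguments center and scale default to 0 and 1:

```r
# Hypothetical function: shift and scale a numeric vector
rescale <- function(x, center = 0, scale = 1) {
    (x - center) / scale
}

r1 <- rescale(c(2, 4, 6))                         # both defaults: x is unchanged
r2 <- rescale(c(2, 4, 6), center = 4)             # override one argument
r3 <- rescale(c(2, 4, 6), center = 4, scale = 2)  # override both
```

Naming the arguments in the call, as in center = 4, makes it clear which default is being replaced.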

Examples

Problem 1

Write a function that, for two variables \( x \), \( y \), returns the greatest integer less than or equal to \( x^{|y|} \).

Solution 1
Look up the help for round to find a family of such functions; floor is the one needed here. You might also look up the Arithmetic topic to learn that the exponentiation operator is ^. The function abs gives you the absolute value.

my_prob1 <- function(x, y) {
    z <- x^abs(y)
    w <- floor(z)
    w
}
# Test this with various inputs
a1 <- my_prob1(2, 4)
a1

## [1] 16

# Notice there are no other new variables in the workspace
a2 <- my_prob1(3, -48.32)
a2

## [1] 1.134e+23

my_prob1(0, 0)

## [1] 1
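The help page for round mentioned in the solution also documents ceiling and trunc. As a quick sketch of how these functions differ, particularly for negative numbers:

```r
vals <- c(2.7, -2.7)

down        <- floor(vals)     # greatest integer not above each value
up          <- ceiling(vals)   # smallest integer not below each value
toward_zero <- trunc(vals)     # drop the fractional part
nearest     <- round(vals)     # nearest integer

# Collect the results side by side for comparison
rbind(vals, down, up, toward_zero, nearest)
```

For positive values floor and trunc agree, but for negative values floor moves away from zero while trunc moves toward it, which is why floor is the right choice for Problem 1.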