What is a Function?
Posted on November 22, 2015
Let’s investigate what, underneath the covers, an R function really is, and how the components are used when R evaluates a function. In doing so, we’ll take a number of detours, as understanding how R function calls really work requires an understanding of some subtle details. By the end of this, we’ll be able to simulate what R does internally when it evaluates a function.
The Three Musketeers
As you might already know from adv-r (or, if you’re brave, the Technical Details section of ?"function"), functions are R objects made up of three components:
- A function body,
- An environment, and
- A set of formals.
Let’s look at these components individually, then see how they’re used together when evaluating a function.
The Function Body
The body of a function can be accessed with the body() function. Let’s take a peek at the body of a simple function:
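For instance, assuming a hypothetical function `f` whose body just prints its argument (the name and body are illustrative assumptions), a minimal sketch might look like this:

```r
# Hypothetical example function: `f` simply prints its argument.
f <- function(x) {
  print(x)
}

# The body is an unevaluated language object.
b <- body(f)
b

# It is a call whose first element is the symbol `{` and whose
# second element is the call print(x).
is.call(b)
as.list(b)
```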
This is just the unevaluated call, verbatim as we wrote it above. It’s made up of R language objects, called symbols and calls. It produces a result when evaluated within an environment.
In other words, pedantically, it’s a call to the { primitive function, which in turn contains a call to the print function with the x symbol as an argument.
Note that these symbols do not have any meaning until they are evaluated, and their meanings depend on which environment they are evaluated in.
Pedantically, the above function involves a call to the { symbol, which will eventually be resolved as base::"{" on evaluation, although one could potentially override this with their own definition of {. Of course, overriding this function is definitely not recommended!
You might be surprised to see that almost all of the syntactic elements in R actually become function calls after parsing!
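As a rough illustration of that last point, the sketch below quotes a few expressions and inspects the head of each parsed call. The variable names `e1`, `e2` and `e3` are only placeholders.

```r
# Operators, control flow and assignment all parse to ordinary calls.
e1 <- quote(x + y)
e2 <- quote(if (a) b else c)
e3 <- quote(z <- 1)

# The first element of each call is the symbol naming the function.
e1[[1]]
e2[[1]]
e3[[1]]

# The same functions can be called in prefix form.
three <- `+`(1, 2)
three
```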
But I digress. We understand now that the function body is just an unevaluated call. Let’s move on to the function environment.
The Function Environment
The environment associated with a function is typically the environment where the function was created. This is where symbols within the body() of a function that aren’t in the function formals will be resolved.
Typically, this environment is called the enclosing environment – this is where ‘global’, or ‘non-local’, symbols will be resolved. What does it enclose? And how exactly do function formals get resolved?
Every time a function is evaluated, a new environment is generated, and any passed formal arguments are bound within that environment. We’ll call this the local evaluation environment, and its parent environment is the aforementioned enclosing environment. The evaluation environment can be accessed by calling environment() within the function body. Let’s demonstrate this:
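One way to see this, as a small sketch with a hypothetical function `g`, is to return both the evaluation environment and its parent from inside the function:

```r
# Hypothetical function that reports its own environments.
g <- function() {
  local_env <- environment()  # the evaluation environment of this call
  list(local = local_env, parent = parent.env(local_env))
}

res <- g()

# The parent of the evaluation environment is the enclosing environment.
identical(res$parent, environment(g))

# Each call generates a fresh evaluation environment.
identical(g()$local, g()$local)
```

The first comparison should be TRUE and the second FALSE.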
Although it is typically bad form to have functions depend on global variables, it can be useful when functions are defined or returned within functions. Let’s illustrate with an example.
The following function make_fn() returns a function, whose enclosing environment is the local environment created when invoking make_fn() itself. Note that this means that each function returned by make_fn() gets its own copy of symbols generated when evaluating the function body:
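A possible make_fn() along these lines is sketched below. The counter it keeps is an assumption chosen for illustration, as are the names `a` and `b`.

```r
# Hypothetical make_fn(): each call creates a new local environment
# holding its own `count`, and returns a function enclosed by it.
make_fn <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies `count` in the enclosing environment
    count
  }
}

a <- make_fn()
b <- make_fn()

first  <- a()  # increments a's private count
second <- a()  # increments it again
other  <- b()  # b's count is independent of a's

c(first, second, other)
```

Here `a` and `b` share the same body and formals but have different enclosing environments, so their counters do not interfere.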
We now have a few key pieces of understanding in how R function evaluation works. When evaluating a function,
1. A new, local, environment is generated, with its parent environment being the enclosing environment of the called function,
2. The formals are bound within that local environment,
3. The function body is evaluated in that local environment.
We have the main pieces, but step 2 is a bit fuzzy. How exactly are the formals bound within that environment? How do default arguments work?
The Function Formals
The formals of a function can be accessed with formals(). It is an R object that maps argument names to default values (if they exist).
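For example, inspecting the formals of stats::rnorm() (the variable name `fmls` is just a placeholder):

```r
# formals() returns a pairlist mapping argument names to defaults.
fmls <- formals(stats::rnorm)

names(fmls)  # "n" "mean" "sd"
fmls$mean    # 0
fmls$sd      # 1
```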
We can see that stats::rnorm() takes three arguments: n, mean and sd; and the argument mean gets a default of 0, while sd gets a default of 1.
Note that, since n does not receive a default value, it is assigned the so-called ‘missing’ symbol. Attempting to evaluate this symbol directly throws an error:
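A small sketch of that failure, wrapped in tryCatch() so the script keeps running (the names `n_default` and `result` are placeholders):

```r
# Extract the 'missing' symbol stored as the default for `n`.
n_default <- formals(stats::rnorm)$n

# Evaluating the variable that holds it signals an error;
# tryCatch() captures the message instead of stopping.
result <- tryCatch(
  n_default,
  error = function(e) conditionMessage(e)
)
result
```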
In case you’re curious, you can create the so-called missing symbol with the call quote(expr = ). (Looks weird, I know.)
Just to confirm:
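One way to check this is to compare the stored default for `n` against a freshly created missing symbol (`is_missing` is a placeholder name):

```r
# The default stored for `n` is the same object as quote(expr = ).
is_missing <- identical(formals(stats::rnorm)$n, quote(expr = ))
is_missing  # TRUE
```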
Knowing what we already do about how function evaluation works, we just need to figure out how the formals are ‘inserted’ into the local function environment.
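As a starting point, here is a sketch with a hypothetical function foo() taking formals `a` and `b`, which simply lists what is bound in its evaluation environment:

```r
# Hypothetical foo(): list the names bound in the local environment.
# ls() only inspects names, so it does not evaluate the arguments.
foo <- function(a, b) {
  ls(environment())
}

bound <- foo(1, 2)
bound  # "a" "b"
```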
So, when foo() above is evaluated, the formals a and b have already been bound within the generated local environment. But there’s still the over-arching question: when, and how, are they bound exactly? If we pass an expression as an argument, e.g. foo(a = 1 + 2), when will 1 + 2 be evaluated?
Let’s illustrate with an example. Try to guess what the output of the following function call will be.
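A sketch of such a call, assuming a hypothetical function `f` that announces itself and then refers to its argument twice:

```r
# Hypothetical f(): prints a marker, then touches `x` two times.
f <- function(x) {
  cat("> f\n")
  x
  x
  invisible(NULL)
}

# The argument is an expression with a visible side effect.
f(x = cat("> x\n"))
```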
You might be surprised that > x is not printed until x is actually evaluated in the function body, and that it is only printed once. Why is that?
This demonstrates R’s lazy evaluation of function arguments. When R binds formals to the local generated function environment, it does so with a promise. Roughly speaking, a promise is a transient object that does not produce the result of its evaluation until explicitly requested. It is only evaluated once; after it has been evaluated, the resulting value is stored and returned on any subsequent evaluation of that object.
This means that, the first time x is referenced above, the expression cat("> x\n") is evaluated, and its result is returned and assigned to x (in this case, calling the cat() function just returns NULL). The second time, we’ve already ‘forced’ the promise, and so we don’t evaluate the expression again.
We now understand all the main pieces in how R evaluates a function! I am glossing over the details related to argument matching. Given what you know now, try to convince yourself that this stage simply maps function argument names to promises to be executed later – either with the default argument, or an argument provided by you, the user.
Pop quiz: can you predict the result of this function call?
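One variant of such a quiz is sketched below, under the assumption that the expression y <- TRUE is given as a default argument of a hypothetical function quiz(). A default-argument promise is evaluated in the function's own evaluation environment; an argument supplied at the call site would instead be evaluated in the caller's environment.

```r
# Hypothetical quiz(): `y` is never defined in the body itself.
# The default for `x` is a promise whose expression assigns to `y`.
quiz <- function(x = (y <- TRUE)) {
  x  # forces the promise
  y  # looked up after the promise has been forced
}

result <- quiz()
result
```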
Because x is a promise, with expression y <- TRUE, when that promise is evaluated we bind the symbol y with value TRUE in the executing environment. Therefore, attempting to access y after forcing the promise bound to x will succeed.
That said, please don’t actually write code like this.
Putting it Together
Given the above, let’s try simulating what happens during function evaluation ourselves. We’ll make use of one tool, delayedAssign(), which will allow us to generate a promise.
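The three steps can be walked through by hand roughly as follows. This is a sketch reusing a hypothetical function `f` that refers to its argument twice; `env` is a placeholder name.

```r
# Hypothetical function to simulate.
f <- function(x) {
  cat("> f\n")
  x
  x
  invisible(NULL)
}

# 1. Generate a new local environment whose parent is f's
#    enclosing environment.
env <- new.env(parent = environment(f))

# 2. Bind the formal `x` in that environment as a promise. The
#    expression is evaluated lazily in the current (calling) environment.
delayedAssign("x", cat("> x\n"),
              eval.env = environment(), assign.env = env)

# 3. Evaluate the function body in the local environment.
eval(body(f), envir = env)
```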
And voila – we’ve successfully simulated function evaluation in R. Now let’s put this all together in a function. We’ll call it evalf(), for ‘evaluate function’.
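What follows is one possible sketch of evalf(), not necessarily the implementation originally shown here. It assumes the target is a plain closure, ignores `...` in the target's formals, and relies on match.call() to pair the supplied argument expressions with formal names.

```r
# A simplified evalf(): evaluate the closure `fn` with arguments `...`
# by hand. Assumptions: `fn` is a closure and its `...` is not handled.
evalf <- function(fn, ...) {
  caller <- parent.frame()

  # Match the unevaluated argument expressions to formal names.
  matched  <- match.call(fn, substitute(fn(...)))
  supplied <- as.list(matched)[-1]
  fmls     <- formals(fn)

  # 1. New local environment, enclosed by fn's environment.
  env <- new.env(parent = environment(fn))

  # 2. Bind each formal as a promise.
  for (name in setdiff(names(fmls), "...")) {
    if (name %in% names(supplied)) {
      # Supplied arguments are evaluated in the caller's environment.
      do.call(delayedAssign, list(name, supplied[[name]], caller, env))
    } else if (!identical(fmls[[name]], quote(expr = ))) {
      # Defaults are evaluated in the local environment.
      do.call(delayedAssign, list(name, fmls[[name]], env, env))
    }
  }

  # 3. Evaluate the body in the local environment.
  eval(body(fn), envir = env)
}

# Hypothetical usage: the default for `y` refers to `x`.
add <- function(x, y = x * 2) x + y
total <- evalf(add, 1 + 2)
total
```

do.call() is used so that the stored expression, rather than the expression `supplied[[name]]` itself, becomes the promise's code.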
Voila!
Note that this scheme really is quite simplified – it does not capture what happens with S3 or S4 dispatch, and it also does not reflect what happens when so-called primitive functions are called:
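For instance, taking sum() as a representative primitive, none of the three components discussed above is available from R code:

```r
# Primitive functions are implemented in C and have no R-level
# body, formals or environment.
is.primitive(sum)  # TRUE
body(sum)          # NULL
formals(sum)       # NULL
environment(sum)   # NULL
```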
It does otherwise accurately portray how evaluation of ‘vanilla’ R functions works. Behind the scenes, R will handle dispatch and such using C code, and perform non-standard evaluation to handle things like the use of UseMethod() within a function body.
Hopefully this exercise has helped you piece together what happens behind the scenes in R function evaluation, and given you some understanding of how R’s lazy evaluation of function arguments works.
EDIT 1: Improved implementation of evalf(); the original version did not capture promises appropriately.