# An Introduction to Functional Programming
**Functional Programming (FP)**\index{functions!functional programming} is another way of thinking about how to organize programs. We talked about OOP---another way to organize programs---in the last chapter (chapter \@ref(an-introduction-to-object-oriented-programming)). So how do OOP and FP differ? To put it simply, FP focuses on functions instead of objects. Because we are talking a lot about functions in this chapter, we will assume you have read and understood section \@ref(functions).
Neither R nor Python is a purely functional language. For us, FP is a style that we can choose to let guide us, or that we can disregard. You can choose to employ a more functional style, or you can choose to use a more object-oriented style, or neither. Some people tend to prefer one style over the others, and others decide which to use depending on the task at hand.
More specifically, a functional programming style takes advantage of **first-class functions**\index{functions!first-class functions} and favors functions that are **pure**.\index{functions!pure functions}
1. **First-class functions** are [@struc_and_interp] functions that
- can be passed as arguments to other functions,
- can be returned from other functions, and
- can be assigned to variables or stored in data structures.
2. **Pure functions**
- return the same output if they are given the same input, and
- do not produce **side-effects**.
Side-effects are changes made to non-temporary variables, to the "state" of the program.
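To make this concrete, here is a minimal sketch (the function and variable names are made up for illustration). The first function is pure; the second one's output depends on a non-local variable, so the same inputs can produce different outputs.
```{r, collapse = TRUE}
pureAdd <- function(a, b){
  a + b                  # output depends only on the inputs; nothing else is touched
}
offset <- 1
impureAdd <- function(a, b){
  a + b + offset         # not pure: the result also depends on the non-local `offset`
}
impureAdd(1, 2)          # 4 ...
offset <- 100
impureAdd(1, 2)          # ... now 103, even though the inputs did not change
```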
We discussed (1) in the beginning of chapter \@ref(functions). If you have not used any other programming languages before, you might even take (1) for granted. However, using first-class functions can be difficult in other languages not mentioned in this text.
There is more to say about definition (2). This means you should keep your functions as *modular* as possible, unless you want your overall program to be much more difficult to understand. FP stipulates that
- **ideally functions will not refer to non-local variables;**
- **ideally functions will not (refer to and) modify non-local variables; and**
- **ideally functions will not modify their arguments.**
Unfortunately, violating the first of these three criteria is very easy to do in both of our languages. Recall our conversation about *dynamic lookup* in subsection \@ref(accessing-and-modifying-captured-variables). Both R and Python use dynamic lookup,\index{dynamic lookup} which means you can't reliably control *when* functions look for variables. Typos in variable names easily go undiscovered, and modified global variables can potentially wreak havoc on your overall program.
Fortunately it is difficult to modify global variables inside functions in both R and Python. This was also discussed in subsection \@ref(accessing-and-modifying-captured-variables). In Python, you need to make use of the `global` keyword (mentioned in section \@ref(passing-by-assignment-in-python)), and in R, you need to use the rare super assignment operator (it looks like `<<-`, and it was mentioned in \@ref(passing-by-value-in-r)). Because these two symbols are so rare, they can serve as signals to viewers of your code about when and where (in which functions) global variables are being modified.
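As a quick sketch of what that signal looks like in R (the names here are only illustrative), the super assignment operator makes the modification of a global variable stand out; the Python `global` keyword plays an analogous role.
```{r, collapse = TRUE}
counter <- 0
incrementCounter <- function(){
  counter <<- counter + 1   # <<- signals that a non-local variable is being modified
}
incrementCounter()
counter
```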
Last, violating the third criterion is easy in Python and difficult in R. This was discussed earlier in \@ref(modifying-a-functions-arguments). Python can mutate/change arguments that have a mutable type because it has *pass-by-assignment* semantics\index{pass-by-assignment} (mentioned in section \@ref(passing-by-assignment-in-python)), and R generally can't modify its arguments at all because it has *pass-by-value*\index{pass-by-value} semantics (section \@ref(passing-by-value-in-r)).\index{mutable in Python}
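For comparison, here is a minimal Python sketch (again with made-up names) of a function modifying its argument in place; an analogous R function would leave the caller's object untouched.
```{python, collapse = TRUE}
def append_zero(some_list):
    some_list.append(0.0)     # modifies the caller's list: a side-effect

my_data = [1.0, 2.0]
append_zero(my_data)
print(my_data)                # [1.0, 2.0, 0.0] -- the argument itself was changed
```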
This chapter avoids the philosophical discussion of FP. Instead, it takes the applied approach, and provides instructions on how to use FP in your own programs. I try to give examples of *how* you can use FP, and *when* these tools are especially suitable.
One of the biggest tip-offs that you should be using functional programming is if you need to evaluate a single function many times, or in many different ways. This happens quite frequently in statistical computing. Instead of copy/pasting similar-looking lines of code, you might consider *higher-order* functions that take your function as an input, and intelligently call it in all the many ways you want it to. A third option you might also consider is to use a loop (c.f. \@ref(loops)). However, that approach is not very functional, and so it will not be heavily-discussed in this section.
Another tip-off that you need FP is if you need many different functions that are all "related" to one another. Should you define each function separately, using excessive copy/paste-ing? Or should you write a function that can elegantly generate any function you need?
Not repeating yourself and re-using code is a primary motivation, but it is not the only one. Another motivation for **functional programming** is clearly explained in [Advanced R](https://adv-r.hadley.nz/fp.html)^[Even though this book only discusses *one* of our languages of interest, this quote applies to both languages.]:
> A functional style tends to create functions that can easily be analysed in isolation (i.e. using only local information), and hence is often much easier to automatically optimise or parallelise.
All of these sound like good things to have in our code, so let's get started with some examples!
## Functions as Function Inputs in R
Many of the most commonly-used functionals in R have names that end in "apply". The ones I discuss are `sapply()`, `vapply()`, `lapply()`, `apply()`, `tapply()` and `mapply()`. Each of these takes a function as one of its arguments. Recall that this is made possible by the fact that R has first-class functions.
### `sapply()` and `vapply()`
```{r, echo=F}
myDFrows <- 10
myDFcols <- 100
myDF <- data.frame(matrix(rnorm(myDFrows*myDFcols), ncol = myDFcols))
```
Suppose we have a `data.frame` that has `r nrow(myDF)` rows and `r ncol(myDF)` columns. What if we want to take the mean of each column?
An amateurish way to do this would be something like the following.
```{r, eval=F}
myFirstMean <- mean(myDF[,1])
mySecondMean <- mean(myDF[,2])
# ... so on and so forth ..
myHundredthMean <- mean(myDF[,100])
```
You will need one line of code for each column in the data frame! For data frames with a lot of columns, this becomes quite tedious. You should also ask yourself what happens to you and your collaborators when the data frame changes even slightly, or if you want to apply a different function to its columns. Third, the results are not stored in a single container. You are making it difficult on yourself if you want to use these variables in subsequent pieces of code.
::: {.rmd-details data-latex=""}
"Don't repeat yourself" (DRY) is an idea that's been around for a while and is widely accepted [@hunt2000pragmatic]. DRY is the opposite of [WET](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself#WET).\index{DRY code|see{WET code}}\index{WET code|see{DRY code}}
:::
Instead, prefer the use of `sapply()` in this situation. The "s" in `sapply()` stands for "simplified." In this bit of code `mean()` is called on each column of the data frame. `sapply()` applies the function over columns, instead of rows, because data frames are internally a `list` of columns.
```{r, collapse=TRUE}
myMeans <- sapply(myDF, mean)
myMeans[1:5]
```
Each call to `mean()` returns a `double` `vector` of length $1$. This is necessary if you want to collect all the results into a `vector`--remember, all elements of a `vector` have to have the same type. To get the same behavior, you might also consider using `vapply(myDF, mean, numeric(1))`.
In the above case, "simplify" referred to how one-hundred length-$1$ vectors were simplified into one length-$100$ `vector`. However, "simplified" does not necessarily imply that all elements will be stored in a `vector`. Consider the `summary` function, which returns a `double` `vector` of length $6$. In this case, one-hundred length-$6$ vectors were simplified into one $6 \times 100$ matrix.
```{r, collapse = TRUE}
mySummaries <- sapply(myDF, summary)
is.matrix(mySummaries)
dim(mySummaries)
```
::: {.rmd-details data-latex=""}
Another function that is worth mentioning is `replicate()`--it is a wrapper for `sapply()`. Consider a situation where you want to call a function many times with the same inputs. You might try something like `sapply(1:100, function(elem) { return(myFunc(someInput)) } )`. Another, more readable, way to do this is `replicate(100, myFunc(someInput))`.
:::
### `lapply()`
Functions that do not return types that fit neatly into a `vector`, `matrix`, or `array` might need their results stored in a `list`. In this situation, you would need `lapply()`. The "l" in `lapply()` stands for "list". `lapply()` always returns a `list` of the same length as its input.
```{r, collapse = TRUE}
regress <- function(y){ lm(y ~ 1) }
myRegs <- lapply(myDF, regress)
length(myRegs)
class(myRegs[[1]])
summary(myRegs[[12]])
```
### `apply()`
I use `sapply()` and `lapply()` the most, personally. The next most common function I use is `apply()`. You can use it to apply functions to *rows* of rectangular arrays instead of columns. However, it can also apply functions over columns, just as the other functions we discussed can.^[`apply()` is everyone's favorite whipping boy whenever it comes to comparing `apply()` against the other `*apply()` functions. This is because it is generally a little slower--it is written in R and doesn't call out to compiled C code. However, in my humble opinion, it doesn't matter all that much because the fractions of a second saved don't always add up in practice. ]
```{r, collapse = TRUE}
dim(myDF)
results <- apply(myDF, 1, mean)
results[1:4]
```
Another situation where it can be useful to apply a function to rows involves **predicate functions.** A predicate function is just a fancy name for a function that returns a Boolean. I use them to filter out rows of a `data.frame`. Without a predicate function, filtering rows might look something like this on our real estate data [@albemarle_county_gis_web] [@clay_ford].
```{r, collapse = TRUE, R.options = list(width = 65)}
albRealEstate <- read.csv("data/albemarle_real_estate.csv")
firstB <- albRealEstate$YearBuilt == 2006
secondB <- albRealEstate$Condition == "Average"
thirdB <- albRealEstate$City == "CROZET"
subDF <- albRealEstate[(firstB & secondB) | thirdB,]
str(subDF, strict.width = "cut")
```
Complicated filtering criteria can become quite wide, so I prefer to break the above code into three steps.
- Step 1: write a predicate function that returns TRUE or FALSE;
- Step 2: construct a `logical` `vector` by `apply()`ing the predicate over rows;
- Step 3: plug the `logical` `vector` into the `[` operator to remove the rows.
```{r, collapse = TRUE, R.options = list(width = 65)}
pred <- function(row){
  yrBuiltCorrect <- row['YearBuilt'] == 2006
  aveCond <- row['Condition'] == "Average"
  inCrozet <- row['City'] == "CROZET"
  (yrBuiltCorrect && aveCond) || inCrozet
}
whichRows <- apply(albRealEstate, 1, pred)
subDF <- albRealEstate[whichRows,]
str(subDF, strict.width = "cut")
```
### `tapply()`
`tapply()` can be very handy when you need it. We alluded to the definition before, in subsection \@ref(data-frames-in-r): a **ragged array** is a collection of arrays that all have potentially different lengths. I don't typically construct such an object and then pass it to `tapply()`. Rather, I let `tapply()` construct the ragged array for me. The first argument it expects is, to quote the documentation, "typically vector-like," while the second tells us how to break that `vector` into chunks. The third argument is a function that gets applied to each `vector` chunk.
If I wanted the average home price for each city, I could use something like this.
```{r, collapse = TRUE, R.options = list(width = 65)}
str(albRealEstate, strict.width = "cut")
length(unique(albRealEstate$City))
tapply(albRealEstate$TotalValue, list(albRealEstate$City), mean)[1:4]
```
You might be wondering why we put `albRealEstate$City` into a `list`. That seems kind of unnecessary. This is because `tapply()` can be used with multiple `factor`s--this will break down the `vector` input into a finer partition. The second argument must be one object, though, so all of these `factor`s must be collected into a `list`. The following code produces a "pivot table."
```{r, collapse = TRUE}
pivTable <- tapply(albRealEstate$TotalValue,
list(albRealEstate$City, albRealEstate$Condition),
mean)
pivTable[,1:5]
```
::: {.rmd-details data-latex=""}
For functions that return higher-dimensional output, you will have to use something like `by()` or `aggregate()` in place of `tapply()`. A small sketch with `aggregate()` follows below.
:::
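Here is a small sketch of what that can look like with `aggregate()`, using the same real estate data; the function applied to each group returns three numbers instead of one.
```{r, collapse = TRUE}
# quartiles of home value, by city -- one multi-number summary per group
aggregate(TotalValue ~ City, data = albRealEstate,
          FUN = quantile, probs = c(.25, .5, .75))
```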
### `mapply()`
The [documentation of `mapply()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/mapply.html) states `mapply()` is a multivariate version of `sapply()`. `sapply()` worked with univariate functions; the function was called multiple times, but each time with a single argument. If you have a function that takes *multiple arguments*, and you want those arguments to change each time the function is called, then you might be able to use `mapply()`.
Here is a short example. Regarding the `n=` argument of `rnorm()`, the documentation explains, "[i]f `length(n) > 1`, the length is taken to be the number required." This would be a problem if we want to sample three times from a mean $0$ normal first, then twice from a mean $100$ normal, and finally once from a mean $42$ normal distribution. We only get three samples when we want six!
```{r}
rnorm(n = c(3,2,1), mean = c(0,100,42), sd = c(.01, .01, .01))
mapply(rnorm, n = c(3,2,1), mean = c(0,100,42), sd = c(.01, .01, .01))
```
### `Reduce()` and `do.call()`
Unlike the other examples of functions that take other functions as inputs, `Reduce()` and `do.call()` don't have many outputs. Instead of collecting many outputs into a container, they just output one thing.
Let's start with an example: "combining" data sets. In section \@ref(reshaping-and-combining-data-sets) we talked about several different ways of combining data sets. We discussed stacking data sets on top of one another with `rbind()` (c.f. subsection \@ref(stacking-data-sets-and-placing-them-shoulder-to-shoulder)), stacking them side-by-side with `cbind()` (also in \@ref(stacking-data-sets-and-placing-them-shoulder-to-shoulder)), and intelligently joining them together with `merge()` (c.f. \@ref(merging-or-joining-data-sets)).
Now consider the task of combining *many* data sets. How can we combine three or more data sets into one? Also, how do we write DRY code and abide by the DRY principle? As the name of the subsection suggests, we can use either `Reduce()` or `do.call()` as a higher-order function. Just like the aforementioned `*apply()` functions, they take in either `cbind()`, `rbind()`, or `merge()` as a function input. Which one do we pick, though? The answer to that question deals with *how many arguments our lower-order function takes.*
Take a look at the documentation to `rbind()`. Its first argument is `...`, which is the [dot-dot-dot](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Dot_002ddot_002ddot) symbol. This means `rbind()` can take a varying number of `data.frame`s to stack on top of each other. In other words, `rbind()` is **variadic**.
On the other hand, take a look at the documentation of `merge()`. It only takes two `data.frame`s at a time^[Although, it is still variadic. The difference is that the `dot-dot-dot` symbol does not refer to a varying number of `data.frame`s, just a varying number of other things we don't care about in this example.]. If we want to combine many data sets, `merge()` needs a helper function.
This is the difference between `Reduce()` and `do.call()`. `do.call()` calls a function once on many arguments, so its function must be able to handle many arguments. On the other hand, `Reduce()` calls a binary function many times on pairs of arguments. `Reduce()`'s function argument gets called on the first two elements, then on the first output and the third element, then on the second output and fourth element, and so on.
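Before applying this to data frames, here is a minimal sketch of the difference using addition: `do.call()` makes one call with all of the arguments at once, while `Reduce()` chains together many calls of a binary function.
```{r, collapse = TRUE}
do.call(sum, list(1, 2, 3, 4))   # one call: sum(1, 2, 3, 4)
Reduce(`+`, list(1, 2, 3, 4))    # many calls: ((1 + 2) + 3) + 4
```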
Here is an initial example that makes use of four data sets `d1.csv`, `d2.csv`, `d3.csv`, and `d4.csv`. To start, ask yourself how we would read all of these in. There is a temptation to copy and paste `read.csv` calls, but that would violate the DRY principle. Instead, let's use `lapply()` with an anonymous function that constructs a file path string, and then uses it to read in the data set the string refers to.
```{r, collapse = TRUE}
numDataSets <- 4
dataSets <- paste0("d",1:numDataSets)
dfs <- lapply(dataSets,
function(name) read.csv(paste0("data/", name, ".csv")))
head(dfs[[3]])
```
Notice how the above code would only need to be changed by one character if we wanted to increase the number of data sets being read in!^[To make it even more flexible, we could write code that doesn't assume the files are all named the same way, or that they are all in the same directory together.]
Next, `cbind()`ing them all together can be done as follows. `do.call()` will call the function only once. `cbind()` takes many arguments at once, so this works. This code is even better than the above code in that if `dfs` becomes longer, or changes at all, *nothing* will need to be changed.
```{r, collapse = TRUE}
do.call(cbind, dfs) # DRY! :)
# cbind(df1,df2,df3,df4) # WET! :(
```
What if we wanted to `merge()` all these data sets together? After all, the `id` column appears to be repeating itself, and some data from `d2` isn't lining up.
```{r, collapse=TRUE}
Reduce(merge, dfs)
```
Again, this is very DRY code. Nothing would need to be changed if `dfs` grew. Furthermore, trying to `do.call()` the `merge()` function wouldn't work because it can only take two data sets at a time.
## Functions as Function Inputs in Python
### Functions as Function Inputs in Base Python
I discuss two functions from base Python that take functions as input. Neither returns a `list` or a `np.array`, but they do return different kinds of **iterables**, which are "objects capable of returning their members one at a time," [according to the Python documentation.](https://docs.python.org/3/glossary.html) `map()`, the function, will return objects of type `map`. `filter()`, the function, will return objects of type `filter`. Oftentimes we will just convert these to a container we are more familiar with.
#### `map()`
[`map()`](https://docs.python.org/3/library/functions.html#map) can call a function repeatedly using elements of a container as inputs. Here is an example of calculating outputs of a *spline* function, which can be useful for coming up with predictors in regression models. This particular spline function is $f(x) = (x-k)1(x \ge k)$, where $k$ is some chosen "knot point."
```{python, collapse=TRUE}
import numpy as np
my_inputs = np.linspace(start = 0, stop = 2*np.pi)
def spline(x):
    knot = 3.0
    if x >= knot:
        return x - knot
    else:
        return 0.0
output = list(map(spline, my_inputs))
```
We can visualize the mathematical function by plotting its outputs against its inputs. More information on visualization was given in subsection \@ref(visualization).
```{r spline-plot, out.width='80%', fig.align='center', fig.cap = "Our Spline Function", echo = F}
knitr::include_graphics("pics/spline.png")
```
`map()` can also be used like `mapply()`. In other words, you can apply it to two containers at once.
```{python, collapse=TRUE}
import numpy as np
x = np.linspace(start = -1., stop = 1.0)
y = np.linspace(start = -1., stop = 1.0)
def f(x,y):
    return np.log(x**2 + y**2)
list(map(f, x, y))[:3]
```
#### `filter()`
[`filter()`](https://docs.python.org/3/library/functions.html#filter) helps remove unwanted elements from a container. It returns an iterable of type `filter`, which we can iterate over or convert to a more familiar type of container. In this example, I iterate over it without converting it.
This code also provides our first example of a [**lambda function**](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) [@Lutz13]. Lambda functions are simply another way to define functions. Notice that in this example, we didn't have to name our function. In other words, it was **anonymous**. We can also save a few lines of code.
```{python, collapse = TRUE}
raw_data = np.arange(0,1.45,.01)
for elem in filter(lambda x : x**2 > 2, raw_data):
    print(elem)
```
### Functions as Function Inputs in Numpy
Numpy provides a [number of functions](https://numpy.org/doc/stable/reference/routines.functional.html) that facilitate working with `np.ndarray`s in a functional style. For example, [`np.apply_along_axis()`](https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html) is similar to R's `apply()`. `apply()` had a `MARGIN=` argument (`1` applies the function to each row, `2` to each column), whereas this function has an `axis=` argument (`0` applies the function to each column, `1` to each row).
```{python, collapse = TRUE}
import numpy as np
my_array = np.arange(6).reshape((2,3))
my_array
np.apply_along_axis(sum, 0, my_array) # summing columns
np.apply_along_axis(sum, 1, my_array) # summing rows
```
### Functional Methods in Pandas
Pandas\index{Pandas}' `DataFrame`s have an [`.apply()` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) that is very similar to `apply()` in R,^[You should know that a lot of special-case functions that you typically apply to rows or columns come built-in as `DataFrame` methods. For instance, `.mean()` would allow you to do something like `my_df.mean()`.] but again, just like the above function, you have to think about an `axis=` argument instead of a `MARGIN=` argument.
```{python, collapse = TRUE}
import pandas as pd
alb_real_est = pd.read_csv("data/albemarle_real_estate.csv")
alb_real_est.shape
alb_real_est.apply(len, axis=0) # length of columns
type(alb_real_est.apply(len, axis=1)) # length of rows
```
Another thing to keep in mind is that `DataFrame`s, unlike `ndarray`s, don't have to have the same type for all elements. If you have mixed column types, then summing rows, for instance, might not make sense. This just requires subsetting columns before `.apply()`ing a function to rows. Here is an example of computing each property's "score".
```{python, collapse = TRUE}
import pandas as pd
# alb_real_est.apply(sum, axis=1) # can't add letters to numbers!
def get_prop_score(row):
    return 2*row[0] + 3*row[1]
two_col_df = alb_real_est[['FinSqFt','LotSize']]
alb_real_est['Score'] = two_col_df.apply(get_prop_score, 1)
alb_real_est[['FinSqFt','LotSize','Score']].head(2)
```
`.apply()` also works with more than one function at a time.
```{python, collapse = TRUE}
alb_real_est[['FinSqFt','LotSize']].apply([sum, len])
```
If you do not want to waste two lines defining a function with `def`, you can use an anonymous lambda function. Be careful, though--if your function is complex enough, then your lines will get quite wide. For instance, this example is pushing it.
```{python, collapse = TRUE}
two_col_df.apply(lambda row : sum(row*[2,3]), 1)[:3]
```
The previous example `.apply()`s a *binary* function to each row. The function is binary because it takes two elements at a time. If you want to apply a *unary* function (i.e. one that takes one argument at a time) to every element, for each row and each column, then you can use [`.applymap()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap).
```{python, collapse = TRUE}
alb_real_est[['FinSqFt','LotSize']].applymap(lambda e : e + 1).head(3)
```
Last, we have a [`.groupby()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) method, which can be used to mirror the behavior of R's `tapply()`, `aggregate()` or `by()`. It can take the `DataFrame` it belongs to, and group its rows into multiple sub-`DataFrame`s. The collection of sub-`DataFrames` has a lot of the same methods that an individual `DataFrame` has (e.g. the subsetting operators, and the `.apply()` method), which can all be used in a second step of calculating things on each sub-`DataFrame`.
```{python, collapse = TRUE}
type(alb_real_est.groupby(['City']))
type(alb_real_est.groupby(['City'])['TotalValue'])
```
Here is an example that models some pretty typical functionality. It shows two ways to get the average home price by city. The first line groups the rows by which `City` they are in, extracts the `TotalValue` column in each sub-`DataFrame`, and then `.apply()`s the `np.average()` function on the sole column found in each sub-`DataFrame`. The second `.apply()`s a lambda function to each sub-`DataFrame` directly. More details on this "split-apply-combine" strategy can be found in the [Pandas documentation.](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)
```{python, collapse = TRUE}
grouped = alb_real_est.groupby(['City'])
grouped['TotalValue'].apply(np.average)
grouped.apply(lambda df : np.average(df['TotalValue']))
```
## Functions as Function Outputs in R
Functions that create and return other functions are sometimes called **function factories.** Functions are first-class objects in R, so it's easy to return them. What's more interesting is that supposedly temporary objects inside the outer function can be accessed during the call of the inner function after it's returned.
Here is a first quick example.
```{r, collapse = TRUE}
funcFactory <- function(greetingMessage){
  function(name){
    paste(greetingMessage, name)
  }
}
greetWithHello <- funcFactory("Hello")
greetWithHello("Taylor")
greetWithHello("Charlie")
```
Notice that **the `greetingMessage=` argument that is passed in, `"Hello"`, isn't temporary anymore.** It lives on so it can be used by all the functions created by `funcFactory()`. This is the most surprising aspect of writing function factories.
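To drive the point home, every function created by `funcFactory()` remembers the `greetingMessage=` it was created with (the extra greeting below is just for illustration).
```{r, collapse = TRUE}
greetWithHowdy <- funcFactory("Howdy")
greetWithHowdy("Taylor")
greetWithHello("Taylor")  # the original function still uses "Hello"
```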
Let's now consider a more complicated and realistic example. Let's implement a variance reduction technique called **common random numbers**.
Suppose $X \sim \text{Normal}(\mu, \sigma^2)$, and we are interested in approximating an expectation of a function of this random variable. Suppose that we don't know that
\begin{equation}
\mathbb{E}[\sin(X)] = \sin(\mu) \exp\left(-\frac{\sigma^2}{2}\right)
\end{equation}
for any particular choice of $\mu$ and $\sigma^2$, and instead, we choose to use the Monte Carlo method:
\begin{equation}
\hat{\mathbb{E}}[\sin(X)] = \frac{1}{n}\sum_{i=1}^n\sin(X^i)
\end{equation}
where $X^1, \ldots, X^n \overset{\text{iid}}{\sim} \text{Normal}(\mu, \sigma^2)$ is a large collection of draws from the appropriate normal distribution, probably coming from a call to `rnorm()`. In more realistic situations, the theoretical expectation might not be tractable, either because the random variable has a complicated distribution, or maybe because the functional is very complicated. In these cases, a tool like Monte Carlo might be the only available approach.
Here are two functions that calculate the above quantities for $n=1000$. `actualExpectSin()` is a function that computes the theoretical expectation for any particular parameter pair. `monteCarloSin()` is a function that implements the Monte Carlo approximate expectation.
```{r, collapse = TRUE}
n <- 1000 # don't hardcode variables that aren't passed as arguments!
actualExpectSin <- function(params){
  stopifnot(params[2] > 0) # second parameter is sigma
  sin(params[1])*exp(-.5*(params[2]^2))
}
monteCarloSin <- function(params){
  stopifnot(params[2] > 0)
  mean(sin(rnorm(n = n, mean = params[1], sd = params[2])))
}
# monteCarloSin(c(10,1))
```
One-off approximations aren't as interesting as visualizing many expectations for many parameter inputs. Below we plot the expectations for many different parameter vectors/configurations/settings.
```{r, collapse = TRUE, out.width='80%', fig.align='center', fig.cap = "Monte Carlo Approximations versus Exact Evaluations"}
muGrid <- seq(-10,10, length.out = 100)
sigmaGrid <- seq(.001, 5, length.out = 100)
muSigmaGrid <- expand.grid(muGrid, sigmaGrid)
actuals <- matrix(apply(muSigmaGrid, 1, actualExpectSin),
ncol = length(muGrid))
mcApprox <- matrix(apply(muSigmaGrid, 1, monteCarloSin),
ncol = length(muGrid))
par(mfrow=c(1,2))
contour(muGrid, sigmaGrid, actuals,
xlab = "mu", ylab = "sigma", main = "actual expects")
contour(muGrid, sigmaGrid, mcApprox,
xlab = "mu", ylab = "sigma", main = "mc without crn")
```
```{r, echo=FALSE}
par(mfrow=c(1,1))
```
There are three problems with this implementation:
- `monteCarloSin()` is not pure because it captures the `n` variable,
- the only way to increase the accuracy of the plot in the right panel is to increase `n`, and
- every time we re-run this code the plot on the right looks different.
If we wanted to use common random numbers, we could generate $Z^1, \ldots, Z^n \overset{\text{iid}}{\sim} \text{Normal}(0, 1)$, and use the fact that
\begin{equation}
X^i = \mu + \sigma Z^i
\end{equation}
This leads to the Monte Carlo estimate
\begin{equation}
\tilde{\mathbb{E}}[\sin(X)] = \frac{1}{n}\sum_{i=1}^n\sin(\mu + \sigma Z^i)
\end{equation}
Here is one function that naively implements Monte Carlo with common random numbers. We generate the collection of standard normal random variables once, globally. Each time you call `monteCarloSinCRNv1(c(10,1))`, you get the same answer.
```{r, collapse = TRUE}
commonZs <- rnorm(n=n)
monteCarloSinCRNv1 <- function(params){
  stopifnot(params[2] > 0)
  mean(sin(params[1] + params[2]*commonZs))
}
# monteCarloSinCRNv1(c(10,1))
```
Let's compare using common random numbers to going without. As you can see, common random numbers make the plot look "smoother." In other words, we increase our sampling accuracy without spending more computational time.
```{r, collapse=TRUE, out.width='80%', fig.align='center', fig.cap = "Monte Carlo: With and Without Common Random Numbers"}
mcApproxCRNv1 <- matrix(apply(muSigmaGrid, 1, monteCarloSinCRNv1),
ncol = length(muGrid))
par(mfrow=c(1,2))
contour(muGrid, sigmaGrid, mcApprox,
xlab = "mu", ylab = "sigma", main = "mc without crn")
contour(muGrid, sigmaGrid, mcApproxCRNv1,
xlab = "mu", ylab = "sigma", main = "mc with crn")
par(mfrow=c(1,1))
```
There are some new downsides to this implementation to consider:
- we have another global variable--a bunch of samples called `commonZs` floating around, and
- the dependence on the global variable for sample size is even further obscured.
We can fix these two problems very nicely by using a function factory.
```{r, collapse = TRUE}
makeMCFunc <- function(n = 1000){
  commonZs <- rnorm(n)
  function(params){
    stopifnot(params[2] > 0)
    mean(sin(params[1] + params[2]*commonZs))
  }
}
monteCarloSinCRNv2 <- makeMCFunc()
# now call monteCarloSinCRNv2 to approx. expectations
# e.g. monteCarloSinCRNv2(c(10,1))
```
This is much better because
- the desired sample size must be passed in as a function argument instead of being captured,
- the re-used standard normal variates are not in the global environment anymore, and
- a sensible default number of samples is provided in the event that the programmer forgets to specify one.
The inner function did in fact capture `commonZs`, but it captured it from the enclosing scope, not the global scope. Capturing isn't always a terrible idea. It would be difficult to modify these variables, so we don't need to worry about function behavior changing in unpredictable ways. Actually, capturing a variable instead of passing it in is an intelligent design choice--now the end-users of functions created by this factory don't need to worry about plugging in extra parameters.
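As a small demonstration of the payoff, each function made by `makeMCFunc()` carries its own copy of `commonZs`, so calling it twice with the same parameters gives identical answers without relying on any global variables.
```{r, collapse = TRUE}
monteCarloSinCRNv2(c(10, 1))  # the captured draws are re-used...
monteCarloSinCRNv2(c(10, 1))  # ...so repeated calls agree exactly
```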
Let's use $1000$ samples again and make sure this function works by comparing its output to the known true function. Run the following code on your own machine. Note the new Greek letters in the axis labels.
```{r, collapse=TRUE, eval=FALSE}
mcApproxCRNv2 <- matrix(apply(muSigmaGrid, 1, monteCarloSinCRNv2),
ncol = length(muGrid))
par(mfrow=c(1,2))
contour(muGrid, sigmaGrid, mcApprox,
xlab = expression(mu), ylab = expression(sigma))
contour(muGrid, sigmaGrid, mcApproxCRNv2,
xlab = expression(mu), ylab = expression(sigma))
par(mfrow=c(1,1))
```
## Functions as Function Outputs in Python
We can write function factories in Python, too. Here is another implementation of the first example from the previous section. Again, just as in R, the `str` passed in as `greeting_message` persists well after `func_factory()` has finished working.
```{python, collapse = TRUE}
def func_factory(greeting_message):
    def func(name):
        print(greeting_message + ' ' + name)
    return func
greet_with_hello = func_factory("Hello")
greet_with_hello("Taylor")
greet_with_hello("Charlie")
```
Let's consider another less trivial example. Recall the spline function from earlier in the chapter:
```{python, collapse = TRUE}
import numpy as np
def spline(x):
    knot = 3.0
    if x >= knot:
        return x - knot
    else:
        return 0.0
```
This function is limited in that it takes in only one element at a time. Unfortunately, we would not be able to provide an entire Numpy array as an argument (e.g. `spline(np.arange(3))`). Many functions do possess this array-friendly behavior, and it is generally advantageous to exploit it. If you recall our discussion about universal functions in section \@ref(vectorization-in-python), you might have grown accustomed to writing vectorized code.
Fortunately there's a way to automatically vectorize functions like the one above: [`np.vectorize()`](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html#numpy.vectorize). `np.vectorize()` takes in a unary function, and outputs a vectorized version of it that is able to take entire arrays as an input. Here's an example. Compare this to us using `map()` before.
```{python, collapse = TRUE}
my_inputs = np.linspace(start = 0, stop = 2*np.pi)
# make a vectorized function
vec_spline = np.vectorize(spline)
vec_spline(my_inputs)[:4]
# alternatively output = list(map(spline, my_inputs)) from last time
```
The above code doesn't just demonstrate how to return functions from a function. It is also an example of using functions as function inputs. When a function takes in and spits out functions, there is an alternative way to use it that is unique to Python. You can use **[function decorators](https://www.python.org/dev/peps/pep-0318/).**\index{functions!function decorators} You can *decorate* a function by using the `@` operator [@Lutz13].
If you decorate a function, it is equivalent to passing that function in to a function factory (aka outer function). That function will take the function you defined, alter it, and then give it back to you with the same name that you chose in the first place.
```{python, collapse = TRUE}
# instead of spline = np.vectorize(spline)
@np.vectorize
def spline(x):
    knot = 3.0
    if x >= knot:
        return x - knot
    else:
        return 0.0
spline(my_inputs)[:4]
```
### Writing Our Own Decorators
We can write our own function factory that can be used as decoration. The main restriction is that this function factory must take a function as an argument, too. This can sometimes be limiting. You might have noticed that the definition of `func_factory()` from earlier in this section did not do that. If you don't believe me, then as an exercise, after you read this section, you might try rewriting the example from \@ref(functions-as-function-outputs-in-r) that implements Monte Carlo sampling using common random numbers.
Before we get too ahead of ourselves, let's describe the basics. Here is our first decorator function `add_greeting()`.
```{python, collapse = TRUE}
def add_greeting(func):
    def wrapper(name):
        print('Salutations, ')
        func(name)
    return wrapper
```
The decorator `add_greeting()` returns a function that is an embellished version of the function it is given. When we decorate a function with it, it looks like this.
```{python, collapse = TRUE}
@add_greeting
def print_name(first_name):
    print(first_name)
```
You could get the same behavior by typing the following. They are equivalent!
```{python, collapse = TRUE}
def print_name(first_name):
    print(first_name)
print_name = add_greeting(print_name)
```
Things can get a little more complicated when your decorators take additional arguments.
```{python, collapse = TRUE, echo = F}
def add_greeting(greet):
    def decorator(func):
        def wrapper(name):
            print(greet)
            func(name)
        return wrapper
    return decorator
```
```{python, collapse = TRUE}
@add_greeting("How you doin'")
def print_name(first_name):
    print(first_name)
print_name('Taylor')
```
So how do we write decorators that accomplish this? The important thing to remember is that `@add_greeting("How you doin'")` in the previous code block is equivalent to writing this after the function definition: `print_name = add_greeting("How you doin'")(print_name)`. This is a function returning a function returning a function! The definition of `add_greeting()` could look something like this.
```{python, collapse = TRUE}
def add_greeting(greet):
    def decorator(func):
        def wrapper(name):
            print(greet)
            func(name)
        return wrapper
    return decorator
```
Now that you know how decorators work, you can feel comfortable using third-party ones. You might come across, for example, the `@jit` decorator from [Numba](https://numba.pydata.org/), which will translate your Python function into faster machine code, the `@lru_cache` decorator from the [`functools` module](https://docs.python.org/3/library/functools.html)--this can make your code faster by saving some of its outputs--or decorators that perform application specific tasks like [`@tf.function`](https://www.tensorflow.org/api_docs/python/tf/function) from Tensorflow.
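As one example of how a third-party decorator gets used, here is a small sketch with `@lru_cache` from the standard library's `functools` module; previously computed outputs of the decorated function are stored and re-used, which speeds up this naive recursion considerably.
```{python, collapse = TRUE}
from functools import lru_cache

@lru_cache(maxsize=None)   # remember every (input, output) pair seen so far
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(30)                    # fast, because repeated subproblems are cached
```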
## Exercises
### Python Questions
1.
Write a function decorator called `@log_dens_eval(left_bound, right_bound)`. When it decorates a function, say `func(x)`, it will not change that function's input or output, but it will verify that the input to the function (in this case `x`) is between `left_bound` and `right_bound`. If it is not, it will return negative infinity.
2.
The Split-Apply-Combine strategy might be useful in writing code for a tree-based model [@trees]. We won't discuss how these models are estimated, but we will write a function that generates another function that is able to generate predictions by stratifying the input/predictor/feature space.
a) Import the data `"winequality-red.csv"`, call it `wine`, and remove all columns except for `fixed acidity`, `volatile acidity`, and `quality`.
b) Write a function called `generate_pred_func(fixed_cutoff, vol_cutoff, dataset)`.
+ The `dataset` argument should be a Pandas `DataFrame` that has three columns called `fixed acidity`, `volatile acidity`, and `quality`.
+ The `fixed_cutoff` argument should be a floating point number that separates `fixed acidity` into two regions.
+ The `vol_cutoff` argument should be a floating point number that separates `volatile acidity` into two regions.
+ The function should return a function, say `func(fixed_acidity, volatile_acidity)`. The two arguments are floating points. This function should return the most frequent `quality` observation out of all points whose inputs lie in the corresponding region in the feature space.
After you finish the problem, you should have a definition of a `generate_pred_func()` that could be used as follows:
```{python, collapse = TRUE, eval=FALSE}
predict = generate_pred_func(fixed_cutoff=8, vol_cutoff=.5, dataset=wine)
predict(10,10)
```
3.
Let's predict what type of activity someone is doing based on measurements taken from their cell phone. We will begin implementing a **K-Nearest Neighbors (KNN) classifier** [@knn1] [@knn2].
Consider the data files `"X_train.txt"` and `"y_train.txt"` from [@Anguita2013APD], which is available from the UCI Machine Learning Repository [@uci_data]. The first data set consists of recorded movements from a cell phone, and the second data set consists of activity labels of people. Labels $1$ through $6$ correspond to walking, walking upstairs, walking downstairs, sitting, standing and laying, respectively.
a) Read in `"X_train.txt"` as a `DataFrame` called `x_train`.
b) Read in `"y_train.txt"` as a `DataFrame` called `y_train`.
c) Define a function called `standardize(arr)` that takes in an array-like and returns a standardized version of the array-like. Do this by subtracting from each element the overall mean and dividing each element by the standard deviation (use the length as a denominator, not length minus $1$).
d) Apply `standardize()` to each column and transform `x_train` by replacing all of its column by their standardized versions. Make sure to overwrite `x_train`. Do this in one line of code.
e) Write a function called `euclid_dist(arr1, arr2)` that calculates Euclidean distance between two points/array-likes.
f) What is the most common label among the 5 rows that are closest to the first row? Assign your answer to `my_predict`. Assume that the two data sets you imported have the same order. Don't include the first row in these $5$ *nearest neighbors*. Take care not to modify `x_train` or `y_train`.
### R Questions
1.
The density of a particular bivariate Gaussian distribution is
$$
f(x,y) = \frac{1}{2 \pi} \exp\left[ -\frac{x ^2 + y^2}{2} \right] \tag{1}.
$$
The random elements $X$ and $Y$, in this particular case, are independent, each have unit variance, and zero mean. In this case, the marginal for $X$ is a mean $0$, unit variance normal distribution:
$$
g(x) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{x ^2 }{2} \right] \tag{2}.
$$
a) Write a function called `fTwoArgs(x,y)` that takes two arguments, and returns the value of the above density in equation (1) at those two points.
b) Write a function called `fOneArg(vec)` that takes one argument: a length two `vector`. It should return a density in equation (1) evaluated at that point.
c) Write a function called `gOneArg(x)` that evaluates the density in (2).
d) Generate two sequences called `xPoints` and `yPoints`. Have them contain the twenty equally-spaced numbers going from $-3$ to $3$, inclusive.
e) Use `expand.grid()` to create a `data.frame` called `myGrid`. It should have two columns, and it should contain in its rows every possible pair of two points from the above sequences. The "x" coordinates should be in the first column.
f) Use `mapply()` to evaluate the bivariate density on every grid point. Store your results in a `vector` `mEvals`.
g) Use `apply()` to evaluate the bivariate density on every grid point. Store your results in a `vector` `aEvals`.
h) Use `sapply()` to evaluate the univariate density on every element of `xPoints`. Store your results in a `vector` `sEvals`.
i) Use `vapply()` to evaluate the univariate density on every element of `xPoints`. Store your results in a `vector` `vEvals`.
j) Use `lapply()` to evaluate the univariate density on every element of `xPoints`. Store your results in a `list` `lEvals`.
k) Generate two plots of the bivariate density. For one, use `persp()`. For the other, use `contour()`. Feel free to revive the code you used in Chapter 13's Exercise 1.
l) Generate a third plot of the univariate density. Feel free to revive the code you used in Chapter 13's Exercises.
2.
Write a function that reads in all of the data sets contained in any given folder.
+ The function should be called `readAllData(path, extensions)`.
+ The first argument, `path`, should be a string representing which folder you would like to search for files.
+ The second argument, `extensions`, should be a `character` `vector` of all the file extensions you are interested in reading in (e.g. `c('.txt','.csv')`). Be careful about using regular expressions!
+ The function should return a `list` of `data.frame`s.
3.
Consider the Militarized Interstate Disputes (v5.0) [@mid5] data sets again: `"MIDA 5.0.csv"`, `"MIDB 5.0.csv"`, `"MIDI 5.0.csv"`, and `"MIDIP 5.0.csv"`.
a) Read these data sets in as a `list` of `data.frame`s. Call it `dfList`.
b) Use `lapply()` and `do.call()` to calculate the biggest column average for all of these data sets. Store this average as `biggestAve`. Store the name of the column that had this biggest average as `whichBiggestAve`. Don't worry about storing which data set this column mean was found in.
c) Use `Reduce()` and `lapply()` to, once again, calculate the biggest column average. Store this number as `biggestAve2`.
<!-- ### Exercises -->
<!-- 1. Let $X_t$ be the price of a stock at time $t$. A *call (or put) option* is a type of derivative contract you can buy or sell. It's called a derivative because its value derives from the value of a stock. A single call contract (or put) gives the owner the right to buy (or sell) stock ($100$ shares) at a certain price. This certain price written into an options contract ($s$) is called the *strike price*. The day the option expires is called the *expiration date*. If a holder of the option uses his/her right to buy or sell the underlying stock, he/she is said to have *exercised* the option. -->
<!-- The price and value of options floats around randomly through time just like a stock does. Part of the value of an option is its *intrinsic value*. It is not the same as the overall value of an option, or its "fair price"--rather, the intrinsic value represents how much money its possible to guarantee at the current moment, if you exercised the option, and then liquidated the resulting position in the underlying. -->
<!-- For example, call owners hope the stock rises above the strike price so that they can buy the stock for a low price, and immediately sell it for a high price. If $X_t > s$, they could buy at $s$, and sell at $X_t$. In other words, the intrinsic value of a call option is -->
<!-- $$ -->
<!-- C^s_t = 100 \times \max(X_t - s, 0) -->
<!-- $$ -->
<!-- as long as $t$ is before the expiration date. If $X_t < s$, then the owner doesn't have to buy the stock, so the call at that time is intrinsically worthless. -->
<!-- Put owners hope the stock price falls below the strike price, because a put's intrinsic value is -->
<!-- $$ -->
<!-- P^s_t = 100 \times \max(s - X_t, 0) -->
<!-- $$ -->
<!-- Sometimes people buy and sell multiple contracts (with the same expiration date) at the same time. If they do, the intrinsic value of the entire position is additive. For example, if you bought two of the same put option as the one above, then you just double the above expression to get the intrinsic value. -->
<!-- As another example, if you buy (you can also sell) a *call spread*, you buy one call, and then sell another call with the same expiration and a higher strike price. Assuming $s_1 < s_2$ the intrinsic value is of the call spread with strikes $s_1$ and $s_2$ is -->
<!-- $$ -->
<!-- 100 \times \max(X_t - s_1, 0) - 100\times\max(X_t - s_2, 0) -->
<!-- $$ -->
<!-- a. In R, write a function called `getIntrinsicValueFunc` that takes three vectors as arguments, (`dirs`, `strikes`, `contractType`, `numContracts`), and returns a function. These arguments should all be the same length. The length of these inputs is the number of contracts in an options position. The function that `getIntrinsicValueFunc` returns should only take one argument: `prices`, a vector of prices of the underlying stock. The function that `getIntrinsicValueFunc` should return a vector of intrinsic values over time. -->
<!-- Because the arguments may only take on certain values, and because the function to be returned has a certain signature, a template has been provided below. -->
<!-- ```{r, eval=F} -->
<!-- getIntrinsicValueFunc <-function(dirs, strikes, contractType, numContracts){ -->
<!-- stopifnot(all(dirs %in% c("buy", "sell"))) -->
<!-- stopifnot(all(contractType %in% c("call", "put"))) -->
<!-- stopifnot(all(is.integer(strikes))) -->
<!-- stopifnot(all(is.integer(numContracts))) -->
<!-- stopifnot( (length(dirs) == length(strikes)) && (length(dirs) == length(contractType))) -->
<!-- usefulFunc <- function(prices){ -->
<!-- # TODO -->
<!-- } -->
<!-- return(usefulFunc) -->
<!-- } -->
<!-- ``` -->
<!-- b. Create an intrinsic value plot. The x-axis should be a grid of values: let's say from $100$ to $200$. On the y-axis plot the intrinsic value of buying a call spread with strikes $110$ and $120$. -->
<!-- c. Do the same thing as above but show what happens when you sell the same call spread. -->
<!-- d. Plot the intrinsic value of a bought put spread with strikes $110$ and $120$. -->
<!-- e. Plot the intrinsic value of a sold put spread with strikes $110$ and $120$. What is the difference between all four plots? -->
<!-- 2. Do the same thing, but in Python! -->
<!-- creating row masks in a data frame -->
<!-- .apply with weird predicate function -->
<!-- Optimization example -->
<!-- Importance sampling example -->
<!-- bootstrap -->
<!-- numerical integration -->
<!-- 3. verbal, open-ended wuestion about what happens when you make a typo in a variable name? R versus Python? -->
<!-- 4. try to think of the most complicated program that uses two functions that both use (and modify) globals -->