
Commit

intermediate edits
- some errata additions
- link allows setting index variables to zero in order to remove those
parameters from calculations
- updates to documentation: <- becomes = in Stan example code
- added cloglog and inv_cloglog, and added cloglog to the list of link functions that map2stan
recognizes
- replaced softmax with categorical_logit in the categorical map2stan template
- warnings added to lm link and sim methods (they remain untested)
Richard McElreath committed Aug 19, 2016
1 parent a309712 commit 4ac9dbf
Showing 9 changed files with 92 additions and 17 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,10 +1,10 @@
Package: rethinking
Type: Package
Title: Statistical Rethinking book package
Version: 1.59
Date: 2016-6-24
Version: 1.60
Date: 2016-6-27
Author: Richard McElreath
Maintainer: Richard McElreath <[email protected]>
Maintainer: Richard McElreath <[email protected]>
Imports: coda, MASS, mvtnorm, loo
Depends: rstan (>= 2.10.0), parallel, methods, stats, graphics
Description: Utilities for fitting and comparing models
10 changes: 10 additions & 0 deletions ERRATA.md
@@ -1,8 +1,14 @@
Statistical Rethinking book Errata
==========

page 13: "What does mean to take a limit..." is missing the word "it".

page 42: Just below R code box 2.6, the text says that map requires a list of start values. It does not, as long as priors are provided for each parameter. And indeed the example in box 2.6 does not contain a list of start values.

page 66, end of first paragraph: 'the right-hand plot' should be 'the bottom plot'.

page 76, Overthinking box, first paragraph: "You're computer already knows it" should read "Your computer..."

page 87: The marginal description of the model reads "mu ~ dnorm(156, 10)" but the model is Normal(178, 20). Same error on p 95 and in code 4.38. It is corrected in code 4.39.

page 95-96: dnorm(156,100) should be dnorm(178,100) in both the model presentation and the R code at the top of page 96.
@@ -17,6 +23,8 @@ page 196-200: The data.frame d has 17 cases. However in the discussion of the fo

page 212, the next-to-last sentence on the page refers to "the Rethinking box at the end of this section." That box is not in the text.

page 215, first paragraph: "despite it's plausible superiority" should be "despite its plausible superiority".

page 237, Exercise H1: "...index variable, as explained in Chapter 6." should be Chapter 5 (at least, that is where index variables first appear).

@@ -27,6 +35,8 @@ page 314: "Islands that are better network acquire or sustain more tool types.":

page 331, line 1: "a women" -> "a woman"

page 386, problem 12H1, first paragraph: 'By the year 200' should read 'By the year 2000'.

page 435: "FIGURE 14.4 display ... imputed neocortex values in blue ...
shown by the blue line segments". The imputed values are actually the
open black dots (and corresponding black line segments) as the caption
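The page 42 erratum above notes that map does not need a list of start values when every parameter has a prior. A minimal sketch along the lines of the book's globe-tossing example (variable names recalled from memory, not copied verbatim):

    library(rethinking)

    # globe tossing: 6 waters in 9 tosses
    # no start list supplied; map can initialize from the prior on p
    globe.qa <- map(
        alist(
            w ~ dbinom( 9 , p ) ,   # binomial likelihood
            p ~ dunif( 0 , 1 )      # flat prior on proportion of water
        ) ,
        data=list(w=6) )
    precis(globe.qa)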
13 changes: 13 additions & 0 deletions R/distributions.r
Expand Up @@ -17,6 +17,19 @@ logit <- function( x ) {
log( x ) - log( 1-x )
}

# complementary log log
# x is probability scale
cloglog <- function( x ) {
log(-log(1-x))
}

# inverse cloglog
inv_cloglog <- function( x ) {
p <- 1 - exp(-exp(x))
p <- ifelse( x==Inf , 1 , p )
p
}

pordlogit <- function( x , phi , a , log=FALSE ) {
a <- c( as.numeric(a) , Inf )
if ( length(phi) == 1 ) {
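A quick sanity check of the new cloglog pair above, as a sketch rather than part of the commit: inv_cloglog should undo cloglog on the probability scale.

    p <- c( 0.01 , 0.25 , 0.5 , 0.9 , 0.999 )
    x <- cloglog(p)                    # probabilities mapped onto the whole real line
    all.equal( inv_cloglog(x) , p )    # TRUE, up to floating-point error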
6 changes: 3 additions & 3 deletions R/map2stan-templates.r
@@ -966,13 +966,13 @@ dev <- dev + (-2)*(bernoulli_lpmf(0|PAR1) + binomial_lpmf(OUTCOME|PAR2,PAR3));",
"{
vector[PAR1] theta;
PAR2
OUTCOME ~ categorical( softmax(theta) );
OUTCOME ~ categorical_logit( theta );
}",
stan_dev =
"{
vector[PAR1] theta;
PAR2
dev <- dev + (-2)*categorical_lpmf( OUTCOME | softmax(theta) );
dev <- dev + (-2)*categorical_logit_lpmf( OUTCOME | theta );
}",
num_pars = 1,
par_names = c("prob"),
@@ -983,7 +983,7 @@
# k should be softmax call
# convert to list of names with indices
new_k <- as.character(k[[1]])
# add length to front, overwriting 'softmax' function name
# add length to front, overwriting 'softmax' or 'c' function name
num_scores <- length(new_k)-1
new_k[1] <- num_scores
# now need to build the code that populates the theta vector of inputs to softmax
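For context, the kind of formula that reaches this template wraps unnormalized scores in a softmax call; with the change above, map2stan hands those scores straight to Stan's categorical_logit rather than normalizing them first. A rough sketch of such a model, following the book's convention of pinning the first score to zero (the names d, career, and income are illustrative, not from the package tests, and map2stan's handling of the literal 0 is assumed):

    # three-category outcome career in 1..3, one predictor income
    f <- alist(
        career ~ dcategorical( softmax( 0 , s2 , s3 ) ) ,
        s2 <- a2 + b2*income ,
        s3 <- a3 + b3*income ,
        c(a2,a3) ~ dnorm(0,5) ,
        c(b2,b3) ~ dnorm(0,1)
    )
    # m <- map2stan( f , data=d )   # d assumed to contain career and income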
3 changes: 2 additions & 1 deletion R/map2stan.r
@@ -33,7 +33,8 @@ map2stan <- function( flist , data , start , pars , constraints=list() , types=l
inverse_links <- list(
log = 'exp',
logit = 'inv_logit',
logodds = 'inv_logit'
logodds = 'inv_logit',
cloglog = 'inv_cloglog'
)

suffix_merge <- "_merge"
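With cloglog now registered among the inverse links, a formula can use it on the left-hand side of a linear model, the same way logit and log are used. A hedged sketch for a binary outcome (the data frame d and the variables y and x are assumptions):

    f <- alist(
        y ~ dbinom( 1 , p ) ,
        cloglog(p) <- a + b*x ,    # map2stan should emit inv_cloglog on the Stan side
        a ~ dnorm(0,10) ,
        b ~ dnorm(0,1)
    )
    # m <- map2stan( f , data=d )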
59 changes: 52 additions & 7 deletions R/z_link-map2stan.r
@@ -9,18 +9,18 @@ setMethod("link", "map2stan",
function( fit , data , n=1000 , post , refresh=0.1 , replace=list() , flatten=TRUE , ... ) {

if ( class(fit)!="map2stan" ) stop("Requires map2stan fit")

# get samples from Stan fit
if ( missing(post) ) post <- extract.samples(fit,n=n)

# handle data
if ( missing(data) ) {
data <- fit@data
} else {
# make sure all variables same length
# weird vectorization errors otherwise
#data <- as.data.frame(data)
# new data housekeeping
# none yet, but index variable conversion further down
}

# get samples from Stan fit
if ( missing(post) )
post <- extract.samples(fit,n=n)

# check n and truncate accordingly
# if ( n==0 )

@@ -95,6 +95,51 @@ function( fit , data , n=1000 , post , refresh=0.1 , replace=list() , flatten=TR
lm_out[[ lm_names[i] ]] <- array( 0 , dim=c( n , n_cases , K ) )
}
}

if ( TRUE ) {
# check for index variables set to 0 (zero)
# in that case, set corresponding parameters to zero
# this allows users to generate predictions that ignore varying effects

# make list of index variables in model
idx_vars <- list()
if ( !is.null(fit@formula_parsed$vprior) ) {
for ( i in 1:length(fit@formula_parsed$vprior) ) {
idx_vars[[i]] <- fit@formula_parsed$vprior[[i]]$group
}#i
}
if ( length(idx_vars)>0 ) {
# check if any index variables are either missing from data or set to value zero
# in that case, replace corresponding pars_out (from vprior list) with zeros in post object
# this effectively removes the random effects, but also assumes zero-centered (non-centered) parameterization
for ( i in 1:length(idx_vars) ) {
do_fill <- FALSE
the_idx <- idx_vars[[i]]
the_effects <- fit@formula_parsed$vprior[[i]]$pars_out
if ( is.null(data[[the_idx]]) ) {
message( concat("Index '",the_idx,"' not found in data. Assuming all corresponding varying effects equal to zero: ",the_effects) )
# missing from data
# insert index 1 as dummy value
# will pull out zero in any event
data[[the_idx]] <- rep( 1 , n_cases_list[[1]] ) # could be in other likelihood, but in that case force user to be explicit and include index in data frame
do_fill <- TRUE
}#is.null
if ( any(data[[the_idx]]==0) ) {
# index has zero value, so replace with 1s and do fill
the_dims <- length( data[[the_idx]] )
data[[the_idx]] <- rep( 1 , the_dims )
do_fill <- TRUE
}#==0
if ( do_fill==TRUE ) {
# now replace the effects with zeros
for ( j in 1:length(the_effects) ) {
the_dims <- dim(post[[ the_effects[[j]] ]])
post[[ the_effects[[j]] ]] <- array( 0 , dim=the_dims )
}#j
}#do_fill
}#i
}
}

#init <- fit@start
init <- list()
4 changes: 4 additions & 0 deletions R/z_link_sim_methods.R
@@ -4,6 +4,8 @@
setMethod("link", "lm",
function( fit , data , n=1000 , post , refresh=0.1 , replace=list() , flatten=TRUE , ... ) {

warning("The link method for lm fits is not complete and not tested. Proceed with abundant caution.")

# extract samples --- will need for inline parameters e.g. sigma in likelihood
if ( missing(post) )
post <- extract.samples(fit,n=n)
@@ -48,6 +50,8 @@ function( fit , data , n=1000 , post , refresh=0.1 , replace=list() , flatten=TR
setMethod("sim", "lm",
function( fit , data , n=1000 , post , ll=FALSE , refresh=0.1 , ... ) {

warning("The link and sim methods for lm fits is not complete and not tested. Proceed with abundant caution.")

# extract samples --- will need for inline parameters e.g. sigma in likelihood
if ( missing(post) )
post <- extract.samples(fit,n=n)
6 changes: 3 additions & 3 deletions man/WAIC.Rd
@@ -71,16 +71,16 @@ model{
bP ~ normal( 0 , 1 );
a ~ normal( 0 , 10 );
for ( i in 1:N ) {
p[i] <- a + bP * prosoc[i];
p[i] = a + bP * prosoc[i];
}
y ~ binomial_logit( 1 , p );
}
generated quantities{
vector[N] p;
vector[N] log_lik;
for ( i in 1:N ) {
p[i] <- a + bP * prosoc[i];
log_lik[i] <- binomial_logit_log( y[i] , 1 , p[i] );
p[i] = a + bP * prosoc[i];
log_lik[i] = binomial_logit_lpmf( y[i] , 1 , p[i] );
}
}
'
2 changes: 2 additions & 0 deletions man/link.Rd
@@ -32,6 +32,8 @@ link( fit , data , n=1000 , ... )
This function is used internally by \code{\link{WAIC}}, \code{\link{sim}}, \code{\link{postcheck}}, and \code{\link{ensemble}}.

It is possible to replace components of the posterior distribution with simulated values. The \code{replace} argument should be a named \code{list} with replacement values. This is useful for marginalizing over varying effects. See the examples below for an example in which varying intercepts are marginalized this way.

It is easy to substitute zero values for any varying effect parameters in a model. In the \code{data} passed to \code{link}, either omit the relevant index variable or set the index variable to value zero. This will cause \code{link} to replace with zero all samples for any parameters corresponding to the index variable in a prior. For example, the prior declaration \code{aid[id] ~ dmvnorm2(0,sigma,Rho)} defines a vector of parameters \code{aid} that are indexed by the variable \code{id}. If \code{id} is absent from \code{data} when calling \code{link}, or set to value zero, then \code{link} will replace all samples for \code{aid} with value zero. This effectively removes the varying effect from posterior predictions. If the prior were instead \code{c(aid,bid)[id] ~ dmvnorm(0,sigma,Rho)}, then both \code{aid} and \code{bid} would be set to zero in all samples.
}
\value{
}
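To illustrate the behavior documented above, a usage sketch (the model, the data frame d, and all variable names are illustrative only): fit a varying-intercepts model, then request predictions that ignore the varying intercepts by omitting the grouping index from the new data, or by setting it to zero.

    # assumed fit: zero-centered varying intercepts a_group[group]
    # m <- map2stan(
    #     alist(
    #         y ~ dnorm( mu , sigma ) ,
    #         mu <- a + a_group[group] + b*x ,
    #         a_group[group] ~ dnorm( 0 , sigma_group ) ,
    #         a ~ dnorm(0,10) , b ~ dnorm(0,1) ,
    #         sigma ~ dcauchy(0,1) , sigma_group ~ dcauchy(0,1)
    #     ) , data=d )

    # new data without 'group': samples for a_group are replaced by zeros
    pred_dat <- list( x=seq( -2 , 2 , length.out=30 ) )
    mu <- link( m , data=pred_dat )

    # equivalent: include 'group' but set it to zero everywhere
    pred_dat2 <- list( x=seq( -2 , 2 , length.out=30 ) , group=rep( 0 , 30 ) )
    mu2 <- link( m , data=pred_dat2 )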
