This vignette describes how to write formulas using
ipmr
’s notation for models derived from parameter sets.
Parameter sets may be derived from mixed models, or fixed effects that
are discrete. The notation is meant to mirror the standard mathematical
notation used to represent the models. The examples here will primarily
use lme4 to illustrate how
to go from vital rate models to functional forms that you can use in
ipmr
. You are of course free to use whatever model fitting
software you like, but the number of different possibilities are far too
great to cover in this vignette.
The ipmr
notation is meant to make implementing models
more concise, as it allows you to avoid re-typing/copy+pasting the same
kernel formats and just substituting, for example, plot name or
transition year. It works by appending an index suffix to kernels, vital
rates, and parameters that can take multiple values (e.g. P
becomes P_yr
). We then add a named list where the names
correspond to the index and the entries correspond to the values it can
take. We pass pass the list to par_set_indices
(i.e. par_set_indices = list(yr = 1:5)
), and set
uses_par_sets = TRUE
.
NB: because of the way these indices are handled internally, they should always appear at the end of the name they modify. For example, use
s_slope_site
instead ofs_site_slope
. Multiple indices are fine as long as they chained together at the end (e.g.s_slope_site_year
). Models may still work if formatted with indices in the middle of a name, but they also may not (often with obscure error messages).
Some quick translations between mathematical notation, R model
syntax, and ipmr
style notation. In the ipmr
column, index suffixes that are appended are highlighted in
bold.
Suffix names are not restricted to the examples below, and can be anything you want them to be.
The mathematical notation in this vignette will take the following form:
Coefficients/parameters: α - intercept, β - slope, μ - mean, σ standard deviation.
Vital rates: subscripts on the coefficients/parameters. For example, αG for the intercept of a growth model.
Kernels: bivariate kernels are denoted with capital letters (e.g. a growth kernel G(z′, z)) and univariate functions are denoted with lower case letters (e.g. a survival function s(z)).
Indices: superscripts on coefficients. For example, βgsite for a site specific growth slope.
Probability density functions: these are denoted with fp, where p is replaced by the vital rate. For example, fG(z′, μG, σG) would denote a probability density function for growth.
Math Formula | R Formula | ipmr | Model Number |
---|---|---|---|
μGyr = αG + αGyr * βG * z |
size_2 ~ size_1 + (1 | year), family = gaussian()
|
mu_g_yr = g_int + g_int_yr + g_slope * z | 1 |
G(z′, z) = fG(z′, μgyr, σG) | g = dnorm(z_2, mu_g_yr, sd_g) | 1 | |
Logit(μsplot, yr) = αs + αsplot, yr + β * z |
surv ~ size_1 + (1 | year) + (1 | plot), family = binomial()
|
s_pl_yr = s_int + s_int_pl_yr + s_slope * z | 2 |
surv ~ size_1 + (1 | year) + (1 | plot), family = binomial()
|
s_pl_yr = s_int + s_int_pl + s_int_yr + s_slope * z | 2 | |
s = inv_logit(s_pl_yr) | 2 | ||
log(μryr) = αr + αryr + (βr + βryr) * z |
fec ~ size_1 + (size_1 | year), family = poisson()
|
r = exp(r_int + r_int_yr + (r_slope + r_slope_yr) * z) | 3 |
Internally, ipmr
generates the various levels using
expand.grid
. This means that every combination of the
values in par_set_indices
must exist in the
data_list
, or the model will fail with an error along the
lines of (Error in eval_tidy: object 'x_yz' not found
). In
order to exclude levels that do not exist in your data, you can add a
vector to the list in par_set_indices
called
drop_levels
. This should contain the values you wish to
exclude as a character vector. For example, if data are collected for
sites = c("a", "b", "c")
, and
years = c(2005:2008)
, but there is no data from site
"a"
in 2007, we can use
par_set_indices = list(site = c("a", "b", "c"), year = c(2005:2008), drop_levels = c("a_2007"))
.
When picking values to appear in par_set_indices
, it is
best to avoid longer words or phrases where words are separated by
underscores
(e.g. treatment = c("h_removal", "c_addition")
), as this
may cause problems when values are substituted into expressions.
CamelCase is generally safer in these cases (e.g. use
treatment = c("hRemoval", "cAddition")
instead of
treatment = c("h_removal", "c_addition")
).
These models are pretty common in the IPM literature. Examples include IPMs where a single site has been sampled over many years, or sampled many sites across one transition.
This example will use a hypothetical study that spans 5 transitions
and has 6 consecutive censuses. The temporal variation will be denoted
with a index/superscript appended to each term that it modifies (xyr/x_yr
).
To limit complexity, let’s pretend our exploratory analysis and model
selection indicated the best overall models included random year-specifc
intercepts for growth (gyr)
and survival (syr),
but not probability of reproduction (rp), recruit
production (rr), or the
recruit size distribution (rd). We will
work with a simple model to start (i.e. 1 continuous state variable, no
discrete states). This yields the following IPM (z, z′ represent
size/height/weight at time t
and t + 1, respectively):
n(z′, t + 1) = ∫LU[Pyr(z′, z) + F(z′, z)]n(z, t)dz
Pyr(z′, z) = syr(z) * gyr(z′, z)
F(z′, z) = rp(z) + rr(z) + rd(z′)
Note that only the P kernel
get the yr subscript
appended to them - this is the only one that our model selection process
decided had substantial year to year variation. Thus, we can actually
write the F kernel the exact
same way as we would in a simple IPM when we implement the model in
ipmr
.
The vital rate models could be written on paper as follows:
Growth
μGyr(z) = (αG + αGyr) + βG * z
Gyr(z′, z) = fG(z′, μGyr(z), σG)
Survival
Probability of reproducing
Recruit production
Recruit size distribution
and modeled in R as follows:
library(lme4)
library(ipmr)
grow_mod <- lmer(size_2 ~ size_1 + (1 | year), data = grow_data)
surv_mod <- glmer(surv ~ size_1 + (1 | year), data = surv_data, family = binomial)
repr_mod <- glm(repr ~ size_1, data = repr_data, family = binomial)
recr_mod <- glm(recr ~ size_1, data = recr_data, family = poisson)
rcsz_mu <- mean(rcsz_data$size_2)
rcsz_sd <- sd(rcsz_data$size_2)
We could then extract the fixed coefficients with the
fixef
method and the conditional modes of the random
effects via the ranef
method:
beta_gs <- fixef(grow_mod)
alpha_g_yrs <- ranef(grow_mod)
beta_ss <- fixef(surv_mod)
alpha_s_yrs <- ranef(surv_mod)
The non-mixed models can use the coef
method to extract
a vector of coefficients:
The next step is to transform these values into something
ipmr
can use. This must always be a named
list where the names of the list components are the names used
in the vital expressions or kernel formula. Transforming the βs is easier - the output from
fixef
is always a named vector, so we just use
as.list()
and change the names to whatever we want to use.
We then bundle it into one big list with all the fixed coefficients.
g_list <- as.list(beta_gs)
names(g_list) <- c("alpha_g", "beta_g")
s_list <- as.list(beta_ss)
names(s_list) <- c("alpha_s", "beta_s")
r_p_list <- as.list(recr_coef)
names(f_p_list ) <- c("alpha_r_p", "beta_r_p")
r_r_list <- as.list(repr_coef)
names(r_r_list) <- c("alpha_r_r", "beta_r_r")
r_d_list <- list(mu_r_d = rcsz_mu, sigma_r_d = rcsz_sd)
all_fixed_params <- c(g_list, s_list, r_p_list, r_r_list, r_d_list)
For the random effects, we probably need to do some a bit more
processing, though this really depends on which modeling framework you
use and the format of the outputs. For lme4
, it looks like
this:
g_alpha_list <- as.list(unlist(alpha_g_yrs))
names(g_alpha_list) <- paste("alpha_g_", 2001:2006, sep = "")
s_alpha_list <- as.list(unlist(alpha_s_yrs))
names(s_alpha_list) <- paste("alpha_s_", 2001:2006, sep = "")
First, we strip away all of lme4
-related attributes with
unlist()
, then convert it to the flat list format that we
need. Next, we set the names. Now, we can combine it with our
all_fixed_params
list and we have all of the parameters we
need for the kernel functions:
We are now ready to begin implementing the kernels. This model will be a simple, density independent, stochastic kernel-resampled model. The parts where the index syntax is used are highlighted by comments in the code block:
ex_ipm <-init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
# The _yr index is appended as a suffix with an underscore. Notice how the formula
# from Eq 2 is translated into the formula argument and the kernel name
name = "P_yr",
family = "CC",
formula = s_yr * g_yr,
# We also modify each parameter name is indexed as well.
# Here, I've split out the linear predictor and the inverse logit
# transformation into separate steps to avoid over cluttering
s_lin_p = alpha_s + alpha_s_yr + beta_s * z_1,
s_yr = 1 / (1 + exp(- s_lin_p)),
# We do the same with growth as we did with survival and the P_yr formula
g_yr = dnorm(z_2, mu_g_yr, sigma_g),
mu_g_yr = alpha_g + alpha_g_yr + beta_g * z_1,
data_list = all_params,
states = list(c("z")),
# We signal that the kernel has parameter sets
uses_par_sets = TRUE,
# And provide the values that each index can take as a named list.
# The name(s) in this list MUST match the index used in the expressions.
par_set_indices = list(yr = 2001:2006),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_yr")
) %>%
# This kernel doesn't get anything special because there are no
# time-varying parameter values.
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_r * r_d,
r_p_lin_p = alpha_r_p + beta_r_p * z_1,
r_p = 1 / (1 + exp( -r_p_lin_p)),
r_r = exp(alpha_r_r + beta_r_r * z_1),
r_d = dnorm(z_2, mu_r_d, sigma_r_d),
data_list = all_params,
states = list(c("z")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_yr", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("z", 2),
state_end = rep("z", 2)
)
) %>%
define_domains(
z = c(0.1, 10, 50)
) %>%
define_pop_state(
n_z = runif(50)
) %>%
make_ipm(
iterate = TRUE,
iterations = 100,
kernel_seq = sample(2001:2006, size = 100, replace = TRUE),
normalize_pop_size = TRUE
)
Sometimes, models will have discretely and continuously varying components as functions of different variables. For example, a study could consider multiple sites (discrete component) and spatial and/or temporal fluctuations in the environment at each site (continuous component). We’ll use a hypothetical study that models demography as a function of environment across multiple sites to illustrate this.
We’ll implement a general IPM of a perennial plant species. Our data
will come from 6 sites, "A"
- "F"
, and the
vital rate intercepts for the plants will vary across each one. Its
seeds can either enter a seed bank or recruit immediately, producing
recruits at the next time step. We’ll pretend that data from the recruit
size distribution was pooled across sites, and so does not vary with
space. The seed bank will be age structured and its parameters will be
space- and time-invariant. Seeds in the first year can either survive
(ssb1/s_sb1
)
and recruit (rsb1),
or survive and become two year old seeds. Two year old seeds can either
survive (ssb2/s_sb2
)
and recruit (rsb2/r_sb2
)
or die.
The vital rate information fed into our hypothetical regression
models are (ssite/s_site
),
growth (gsite/g_site
),
probability of reproduction (cpsite/c_p_site
),
and seed production (crsite/c_r_site
).
They are all functions of size (z
) and environmental
covariates (θt and θp, denoted
together in the kernel formulas as just θ).
The full IPM looks like this:
$$ n(z, t + 1) = \int_L^U [P^{site}(z', z,\theta) + r_i * F^{site}(z', z, \theta)]n(z, t)dz +\\ s_{{sb}_1} * r_{{sb}_1} * sb_1(t) * c_d(z') + \\s_{{sb}_2} * r_{{sb}_2} * sb_2(t) * c_d(z') $$
sb1(t + 1) = (1 − ri) * ∫LU[cpsite(z, θ) * crsite(z, θ)]n(z, t)dz
sb2(t + 1) = ssb1 * (1 − rsb1) * sb1(t)
The sub kernels are:
Psite(z′, z, θ) = ssite(z, θ) * Gsite(z′, z, θ)
Fsite(z′, z, θ) = cpsite(z, θ) * crsite(z, θ) * cd(z′)
θt ∼ Norm(μt, σt)
θp ∼ Gamma(αp, βp)
And the size/climate dependent vital rate functions and example R code are:
Logit(ssite(z, θ)) = (αs + αssite) + βsz * z + βst * θt + βsp * θp
glmer(survival ~ size + temp + precip + (1 | site), data = my_surv_data, family = binomial())
Gsite(z′, z, θ) = fG(z′, μGsite(z, θ), σG)
μGsite(z, θ) = (αG + αGsite) + βGz * z + βGt * θt + βGp * θp
lmer(size_next ~ size + temp + precip + (1 | site), data = my_grow_data)
Logit(cpsite(z, θ)) = (αcp + αcpsite) + βcpz * z + βcpt * θt + βcpp * θp
glmer(repro ~ size + temp + precip + (1 | site), data = my_repro_data, family = binomial())
log(crsite(z, θ)) = (αcr + αcrsite) + βcrz * z + βcrt * θt + βcrp * θp
glmer(flower_n ~ size + temp + precip + (1 | site), data = my_flower_data, family = poisson())
cd(z′) = fcd(z′, μcd, σcd)
mean(my_recr_data$size_next); sd(my_recr_data$size_next)
Since this example doesn’t work with actual data, we need to simulate some parameters for each site. Additionally, we’ll specify the distribution parameters for each θ and write a function that samples them once.
library(ipmr)
set.seed(51)
# Set up the sites, and the parameters
sites <- LETTERS[1:6]
all_pars <- list(
r_i = 0.6,
r_sb1 = 0.2,
r_sb2 = 0.3,
s_sb1 = 0.4,
s_sb2 = 0.1,
mu_c_d = 4,
sigma_c_d = 0.9,
sigma_g = 3,
beta_s_z = 2.5,
beta_s_prec = 0.3,
beta_s_temp = -0.5,
beta_g_z = 0.99,
beta_g_prec = 0.01,
beta_g_temp = 0.1,
beta_c_p_z = 0.4,
beta_c_p_prec = 0.6,
beta_c_p_temp = -0.3,
beta_c_r_z = 0.005,
beta_c_r_prec = -0.3,
beta_c_r_temp = 0.9
)
g_alphs <- rnorm(6, mean = 4.5, sd = 1) %>%
as.list() %>%
setNames(
paste('alpha_g_', sites, sep = "")
)
s_alphs <- rnorm(6, mean = -7, sd = 1.5) %>%
as.list() %>%
setNames(
paste('alpha_s_', sites, sep = "")
)
c_p_alphs <- rnorm(6, mean = -10, sd = 2) %>%
as.list() %>%
setNames(
paste('alpha_c_p_', sites, sep = "")
)
c_r_alphs <- runif(6, 0.01, 0.1) %>%
as.list() %>%
setNames(
paste('alpha_c_r_', sites, sep = "")
)
all_pars <- c(all_pars,
g_alphs,
s_alphs,
c_p_alphs,
c_r_alphs)
# This list contains parameters that define the distributions for environmental
# covariates
env_params <- list(
temp_mu = 12,
temp_sd = 2,
prec_shape = 1000,
prec_rate = 2
)
# This function samples the environmental covariate distributions and returns
# the values in a named list. We can reference the names in the list in our
# vital rate expressions.
sample_fun <- function(env_params) {
temp <- rnorm(1, env_params$temp_mu, env_params$temp_sd)
prec <- rgamma(1, env_params$prec_shape, env_params$prec_rate)
out <- list(temp = temp,
prec = prec)
return(out)
}
We’ve written out our model - now we can implement it. We’ll use the general, density independent, stochastic parameter resampled machinery for this.
ex_ipm <- init_ipm(sim_gen = "general",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = "P_site",
family = "CC",
# site is appended so that we don't have to write out s_A, s_B, etc.
formula = s_site * g_site * d_z,
s_site_lin = alpha_s_site +
beta_s_z * z_1 +
beta_s_prec * prec +
beta_s_temp * temp,
s_site = 1 / (1 + exp(-s_site_lin)),
mu_g_site = alpha_g_site +
beta_g_z * z_1 +
beta_g_prec * prec +
beta_g_temp * temp,
g_site = dnorm(z_2, mu_g_site, sigma_g),
data_list = all_pars,
states = list(c("z")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_site")
) %>%
define_kernel(
name = "F_site",
family = "CC",
formula = c_p_site * c_r_site * c_d * d_z,
c_p_site_lin = alpha_c_p_site +
beta_c_p_z * z_1 +
beta_c_p_prec * prec +
beta_c_p_temp * temp,
c_p_site = 1 / (1 + exp(-c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
beta_c_r_z * z_1 +
beta_c_r_prec * prec +
beta_c_r_temp * temp),
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1", "sb2")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
) %>%
define_kernel(
name = "go_sb_1_site",
family = "CD",
formula = (1 - r_i) * c_p_site * c_r_site * d_z,
c_p_site_lin = alpha_c_p_site +
beta_c_p_z * z_1 +
beta_c_p_prec * prec +
beta_c_p_temp * temp,
c_p_site = 1 / (1 + exp(-c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
beta_c_r_z * z_1 +
beta_c_r_prec * prec +
beta_c_r_temp * temp),
data_list = all_pars,
states = list(c("z", "sb1")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = FALSE
) %>%
define_kernel(
name = "go_sb_2",
family = "DD",
formula = s_sb1 * (1 - r_sb1),
data_list = all_pars,
states = list(c("sb1", "sb2")),
uses_par_sets = FALSE,
evict_cor = FALSE
) %>%
define_kernel(
name = "leave_sb_1",
family = "DC",
formula = s_sb1 * r_sb1 * c_d,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
) %>%
define_kernel(
name = "leave_sb_2",
family = "DC",
formula = s_sb2 * r_sb2 * c_d,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb2")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_site",
"F_site",
"go_sb_1_site",
"go_sb_2",
"leave_sb_1",
"leave_sb_2"),
int_rule = rep("midpoint", 6),
state_start = c("z", "z", "z", "sb1", "sb1", "sb2"),
state_end = c("z", "z", "sb1", "sb2", "z", "z")
)
) %>%
define_domains(
z = c(0.1, 100, 200)
) %>%
define_pop_state(
n_z = runif(200),
n_sb1 = 20,
n_sb2 = 20
) %>%
define_env_state(
env_state = sample_fun(env_params),
data_list = list(env_params = env_params,
sample_fun = sample_fun)
) %>%
make_ipm(
iterations = 20,
kernel_seq = sample(LETTERS[1:6], 20, replace = TRUE),
normalize_pop_size = TRUE
)
lambda(ex_ipm)