
Here, I’m going to tell you a story about open science. What is it? Why should we do it? Why aren’t people doing it? And what can we do to move towards open science? After that, I will share a success story in open science and describe what I am doing to promote open science in my own work.

- Open science means transparency in methods, data, and analyses.
- This means:
- Making your data publicly available for others to reuse
- Making your research results publicly available and transparent in the form of a scientific communication
- For example, by publishing in open access journals such as PLOS ONE, Royal Society Open Science, or other open access venues

- Using web-based tools to facilitate scientific collaboration, even before anything has been published:
- For example, posting projects on ResearchGate, the Open Science Framework, or GitHub

- Open science is scientific knowledge and communication that is open and independently verifiable, which promotes openness, integrity, and reproducibility
- This framework accumulates knowledge by sharing information and independently reproducing results

- Open science allows results to be reproduced and verified
- As scientists, we shouldn't trust the interpretation of results based on someone’s authority
- We should be able to recreate their evidence and findings
- We need to trust the evidence, not the argument

- Unfortunately, institutions offer few big incentives for practicing open science
- But there are incentives: increased citations, media attention, potential collaborators, job opportunities, and funding opportunities [check out this cool article on this topic]

- Our current science system supports closed science, which means there are more incentives not to share data or code. For example:
- Scientists’ reputations are typically measured by citations, h-indices, and download counts
- Productivity is measured by the number of papers in traditional journals with high impact factors, and the importance of a scientist’s work is measured by citation count
- These metrics then determine funding and promotions
- So, in an indirect way, future funding opportunities may suffer if we don’t keep playing by the current system’s rules.

- This system endorses minimal accountability, which facilitates the manufacturing of results
- For example, because there are typically multiple statistical tools for the same analysis, it is easy to pick the one that produces the most publishable result
- Publishers also prioritize clean, clear results over reality, which is messy and confusing
- And once results are published, there is little incentive to conduct replications to verify results and improve the precision of estimates
- In summary, we prioritize nice, positive results over reality, at the expense of knowledge
- BUT science doesn’t work the way we have set up the current system
- There is a clear decoupling between what is rewarded and what is real, which threatens the validity of results and future progress.
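To make the "pick the tool that flatters the result" point concrete, here is a hypothetical sketch (mine, not from the talk): the same simulated data analyzed with two defensible tests yields two different p-values, and across many such analyst choices it becomes easy to find one that crosses a significance threshold.

```r
set.seed(42)

# Two hypothetical groups with a modest true difference
x <- rnorm(20, mean = 0.0)
y <- rnorm(20, mean = 0.6)

# Two reasonable tests of the same comparison
p_param    <- t.test(x, y)$p.value        # parametric
p_nonparam <- wilcox.test(x, y)$p.value   # rank-based

c(p_param, p_nonparam)  # the two values differ
```

Neither test is "wrong" here; the problem is reporting only whichever one looks better.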


- In summary, the problems with the closed system include, but are not limited to:
- Disguising bad research
- Generating false hope
- Promoting progress inefficiently

- Within the past year, Dr. Christie Bahlai taught a course on open science at Michigan State University
- Students downloaded open data, analyzed it, provided their code on GitHub, and wrote the work up for publication
- One group in the class submitted their paper to Royal Society Open Science, and one of the reviewers went on GitHub, downloaded their data and code, and was able to recreate the results in their paper
- That reviewer then started a discussion with them about alternative ways to interpret the results and future directions for the work, making the paper stronger in the end

- Which is pretty cool, and really helpful.
- Check out the published paper here!

- These are exactly the kinds of things that open science aims to do:
- Promote openness
- Ensure reproducibility
- And facilitate collaborations and discussions

- Things we can do:
- Put data and metadata on websites for other people to use
- Document and release code
- And, if you are up for it, write blog posts announcing the papers we publish, using that platform to make the information easy to digest and access

- Things I am doing to promote open science in my own work:
- I provide the code and materials from my research projects on my website
- I blog about interesting models I am working on, giving others code to simulate and analyze data
- I create short videos about the labs I work in and some of the projects I’ve done with undergraduates, and I post them on YouTube
- Recently, I’ve started compiling the materials for published papers and providing open access to the article, code, and data all in one place: the Open Science Framework. I highly encourage people to check it out
- In the future, I would like to create open access software that lets people use the statistical models I develop more easily

- We, as a community, need to raise our expectations of what it means to complete a scientific project: is it just publishing the paper, or publishing the code, data, and the paper? We need to do this as a whole if we want to change the culture of science
- To increase openness, enhance integrity, and promote reproducibility

Before we begin, I’d like to lay out some information:

Figure 1. Graphical representation of our model describing the processes contributing to changes in abundance. Parameter names are explained in Table 1.

*The type of data we ‘collect’*: capture-mark-recapture- where each individual is captured, uniquely marked (e.g., elastomer tags, pit tag, toe clips, etc.), and released. Then, each season, new individuals are marked, and old individuals are recaptured. When an individual is captured, we record its state.
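As a toy illustration (mine, not the post’s data), one marked individual’s record over four seasons might look like the vector below, using the observation codes defined in the code later in the post (1 = seen uninfected, 2 = seen infected, 3 = not seen):

```r
# One marked individual's observed history across 4 seasons:
# caught uninfected, recaptured infected, missed, recaptured infected
ch <- c(1, 2, 3, 2)

# Occasions on which this individual was actually in hand
which(ch != 3)  # 1 2 4
```

Stacking one such row per individual gives the capture-history matrix that the model works with.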

*Our modeling approach:* Multi-state Jolly-Seber Model

*Statistical framework*: Bayesian

For more information on these models, I highly recommend the book *Bayesian population analysis using WinBUGS* by Marc Kéry and Michael Schaub.

Table 1 Parameter definitions and symbols.


#--------------- R code starts here

#--- Survey conditions

n.occasions <- 4 # Number of seasons

#---- Population parameters

# Super population size of uninfecteds

N_U <- 250

# Super population size of infecteds

N_I <- 200

# Super population size of all

N <- N_U + N_I

# Number of states

n.states <- 4

# Number of observable states

n.obs <- 3

#--- Define parameter values

# Uninfected survival probability

phi_U <- 0.9

# Infected survival probability

phi_I <- 0.7

#------ Entry probability

# Entry probability of uninfected hosts

gamma_U <- (1/(n.occasions-1))/2

# Entry probability of infected hosts

gamma_I <- (1/(n.occasions-1))/2
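A note on these entry probabilities (my reading of the code, not something stated in the post): gamma works out to 1/6 per occasion for each group, and since `rmultinom` renormalizes its probability vector, the entrants end up spread roughly uniformly over the four occasions:

```r
set.seed(1)

n.occasions <- 4
gamma_U <- (1 / (n.occasions - 1)) / 2   # = 1/6 per occasion

# Allocate 250 uninfected hosts to entry occasions;
# rmultinom rescales rep(1/6, 4) to a proper probability vector
B_U <- rmultinom(1, 250, rep(gamma_U, n.occasions))

sum(B_U)  # every host gets exactly one entry occasion, so this is 250
```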

#------ Transition probability

#(Uninfected to Infected)

beta_UI <- 0.2

#(Infected to Uninfected)

beta_IU <- 0.3

#------ Detection probability

# Uninfected detection probability

p_U <- 0.9

# Infected detection probability

p_I <- 0.7

#---- 1. State process matrix

#---- 4 state system

# 1. Not entered

# 2. Uninfected

# 3. Infected

# 4. Dead

PSI.state <- array(NA, dim = c(n.states, n.states, N, n.occasions-1))

for(i in 1:N){
  for(j in 1:(n.occasions-1)){   # note the parentheses: 1:n.occasions-1 would give 0:3
    PSI.state[,,i,j] <- matrix(c(1 - gamma_U - gamma_I, gamma_U,               gamma_I,               0,
                                 0,                     phi_U * (1 - beta_UI), phi_U * beta_UI,       1 - phi_U,
                                 0,                     phi_I * beta_IU,       phi_I * (1 - beta_IU), 1 - phi_I,
                                 0,                     0,                     0,                     1),
                               nrow = n.states, ncol = n.states, byrow = TRUE)
  }
}

# 2. Observational process matrix

#----- 3 observed states

# 1. Seen uninfected

# 2. Seen infected

# 3. Not seen

PSI.obs <- array(NA, dim = c(n.states, n.obs, N, n.occasions-1))

for(i in 1:N){
  for(j in 1:(n.occasions-1)){
    PSI.obs[,,i,j] <- matrix(c(0,   0,   1,
                               p_U, 0,   1 - p_U,
                               0,   p_I, 1 - p_I,
                               0,   0,   1),
                             nrow = n.states, ncol = n.obs, byrow = TRUE)
  }
}
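A quick sanity check worth running (my addition, not in the original post): each row of the state and observation matrices is a probability distribution over next states or observation events, so every row should sum to 1. A self-contained version using the parameter values above:

```r
# Parameter values as defined above
phi_U <- 0.9; phi_I <- 0.7
beta_UI <- 0.2; beta_IU <- 0.3
gamma_U <- 1/6; gamma_I <- 1/6
p_U <- 0.9; p_I <- 0.7

# One slice of the state process matrix
state <- matrix(c(1 - gamma_U - gamma_I, gamma_U, gamma_I, 0,
                  0, phi_U * (1 - beta_UI), phi_U * beta_UI, 1 - phi_U,
                  0, phi_I * beta_IU, phi_I * (1 - beta_IU), 1 - phi_I,
                  0, 0, 0, 1), nrow = 4, byrow = TRUE)

# One slice of the observation process matrix
obs <- matrix(c(0, 0, 1,
                p_U, 0, 1 - p_U,
                0, p_I, 1 - p_I,
                0, 0, 1), nrow = 4, byrow = TRUE)

stopifnot(all(abs(rowSums(state) - 1) < 1e-10))
stopifnot(all(abs(rowSums(obs) - 1) < 1e-10))
```

If either check fails after you change a parameter, a row no longer defines a valid distribution.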

#---- Function to simulate capture-recapture data
# Note: the detection probabilities p_U and p_I are read from the global environment

simul.js <- function(PSI.state, PSI.obs, gamma_U, gamma_I, N, N_U, N_I){

n.occasions <- dim(PSI.state)[4] + 1

# Generate the number of entering hosts per occasion
# (N is the superpopulation size; rmultinom normalizes the probability vector)
B_U <- rmultinom(1, N_U, rep(gamma_U, times = n.occasions))
B_I <- rmultinom(1, N_I, rep(gamma_I, times = n.occasions))

B <- B_U + B_I

CH.sur <- CH.p <- matrix(0, ncol = n.occasions, nrow = N)

# Define a vector with the occasion of entering the population

ent.occ_U <- ent.occ_I <- numeric()

for (t in 1:n.occasions){

ent.occ_U <- c(ent.occ_U, rep(t, B_U[t]))

ent.occ_I <- c(ent.occ_I, rep(t, B_I[t]))

}

ent.occ <- c(ent.occ_U, ent.occ_I)

# Assign entry states: 2 = uninfected, 3 = infected

for (i in 1:length(ent.occ_U)){ # for each uninfected entrant

CH.sur[i, ent.occ_U[i]] <- 2

}

for(i in (length(ent.occ_U)+1):length(ent.occ)){ # for each infected entrant

CH.sur[i, ent.occ[i]] <- 3

}

for (i in 1:N){

# Probability of capturing the individual during the first occasion

if(CH.sur[i, ent.occ[i]] == 2){

CH.p[i, ent.occ[i]] <- 1 * rbinom(1, 1, p_U)

}else{

CH.p[i, ent.occ[i]] <- 2 * rbinom(1, 1, p_I)

}

if (ent.occ[i] == n.occasions) next # If the entry occasion = the last occasion, go next

for(t in (ent.occ[i]+1):n.occasions){ # for each occasion after entry, up to the last

# Determine individual state history

sur <- which(rmultinom(1, 1, PSI.state[CH.sur[i, t-1], , i, t-1]) == 1)

CH.sur[i, t] <- sur

# Determine if the individual is captured

event <- which(rmultinom(1, 1, PSI.obs[CH.sur[i, t], , i, t-1]) == 1)

CH.p[i, t] <- event

} #t

} #i

#---- Reorder rows by entry occasion

CH.p <- CH.p[c(which(ent.occ == 1), which(ent.occ == 2),
               which(ent.occ == 3), which(ent.occ == 4)), ]

CH.sur <- CH.sur[c(which(ent.occ == 1), which(ent.occ == 2),
                   which(ent.occ == 3), which(ent.occ == 4)), ]

# Calculate the true number of individuals alive at each occasion
# (from the full state histories, before removing anyone)
Nt <- numeric(n.occasions)

for(i in 1:n.occasions){
  Nt[i] <- length(which(CH.sur[,i] > 0))
}

# Remove individuals never captured
CH <- CH.p * CH.sur
cap.sum <- rowSums(CH)
never <- which(cap.sum == 0)

if(length(never) > 0){
  CH.p <- CH.p[-never, ]
  CH.sur <- CH.sur[-never, ]  # keep capture and state histories aligned
}

#-----

CH <- CH.p
CH[is.na(CH)] <- 0

# Recode "not seen" (the last observation category) as 0,
# so that non-zero entries mark actual captures
CH[CH == dim(PSI.obs)[2]] <- 0

# Flag individuals never seen, or first seen on the last occasion
# (their histories carry no recapture information)
id <- numeric(0)

for(i in 1:dim(CH)[1]){
  z <- suppressWarnings(min(which(CH[i, ] != 0)))  # first capture occasion (Inf if never seen)
  if(z >= dim(CH)[2]) id <- c(id, i)
}

# Recode "not yet entered" (0) in the state histories as state 1
CH.sur[CH.sur == 0] <- 1

# Guard against an empty id: CH.p[-numeric(0), ] would drop every row
if(length(id) > 0){
  CH.p <- CH.p[-id, ]
  CH.sur <- CH.sur[-id, ]
}

return(list(CH.p = CH.p,
            CH.sur = CH.sur,
            B = B,
            Nt = Nt))

}

# Execute simulation function

sim <- simul.js(PSI.state = PSI.state, PSI.obs = PSI.obs,

gamma_U = gamma_U, gamma_I = gamma_I, N = N,

N_U = N_U, N_I = N_I)

#----------------------------#
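Once `sim` is in hand, you will usually want per-occasion summaries of the capture histories. Here is a self-contained toy sketch of that kind of summary (the 3 × 4 matrix is made up for illustration; codes follow the observation matrix: 1 = seen uninfected, 2 = seen infected, 3 = not seen, 0 = not yet entered):

```r
# Toy capture matrix: 3 individuals x 4 occasions
CH <- matrix(c(1, 1, 3, 2,
               0, 2, 2, 3,
               0, 0, 1, 1), nrow = 3, byrow = TRUE)

# Number of hosts observed in each state at each occasion
n.seen.U <- colSums(CH == 1)
n.seen.I <- colSums(CH == 2)

n.seen.U  # 1 1 1 1
n.seen.I  # 0 1 1 1
```

The same two lines applied to `sim$CH.p` give the observed uninfected and infected counts per season.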

Hope this was helpful! Feel free to shoot me an email with any questions at grace.direnzo@gmail.com.


Within the next couple of days, I'll be sharing a project I’ve been part of for several years but just recently started writing the code for.

Watch out for my new post on Multi-State Cormack-Jolly-Seber models!!