
Binomial self report #11

Merged: 30 commits merged into main on Sep 14, 2023

Conversation

jeremy-haynes (Collaborator)

Hey Nate,

I started on the binomial model for the self-report data and am a little stuck, so I figured a (hopefully) good way to get your eyes on it would be to make a new branch off of the fix-group-means-correlations branch (let me know if there's a better way of doing this). Where I'm stuck is that the model produces a lot (~90%) of divergent transitions (not good 🥲). I think it may have to do with the priors. I based them off of Kruschke, but I've also run into a similar issue before when trying to fit somewhat similar models, so I'm thinking I'm misunderstanding something.

I tried to set up the RMD file so you could fit the binomial Stan model to just the prdep_tot data (the code is designed to run through each measure, but there's a line that sets the scale of interest to just prdep_tot). There are some things I need to check with Tom about in terms of the data, but I don't think those would affect the model fit. Essentially, I've subtracted each scale's minimum possible score from both the participant's score and the maximum possible score, and I need him to confirm some of those mins/maxes (letting you know in case that does have some effect I'm unaware of).

Let me know what you think and if it'd be easier to meet about it I'd be happy to set up a time.

Thanks!
Jeremy

@Nathaniel-Haines (Collaborator) left a comment

@jeremy-haynes this looks good! I added some notes on what needs to change to fix divergences—let me know if you have any questions on that.

Also, I merged the branch you have this one set to merge into (fix-group-means-correlations) into main... 😅 That said, you should just be able to set this branch to merge into main as well!

1_IGT_PP/Code/Stan/binomial_selfreport.stan (review comment, resolved)
1_IGT_PP/Code/R/0_Preprocess/SelfReportPreprocess.R (review comment, resolved)
@Nathaniel-Haines changed the base branch from fix-group-means-correlations to main on June 7, 2023 03:05
@Nathaniel-Haines (Collaborator)

Also, I merged the branch you have this one set to merge into (fix-group-means-correlations) into main... 😅 That said, you should just be able to set this branch to merge into main as well!

@jeremy-haynes I went ahead and changed it FYI

@jeremy-haynes marked this pull request as ready for review on June 7, 2023 14:07

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines ,

I think I've got the ind-mu models for the IGT and self-report data working. That is, the models where we use all of the data from both sessions to estimate each parameter individually (i.e., not a separate parameter for each session). So far, these models are fit to the IGT and self-report data separately (i.e., not as a joint model). I believe those are working correctly; at least there don't seem to be 90% divergent transitions (or other problems) after you helped on the last part haha.

I was feeling a bit confident, so I took a crack at the joint model where we fit the ORL to the IGT data and the binomial model to the self-report data simultaneously, so that we can obtain correlations between the ORL and binomial model parameters (RMD here and Stan here). I get an error that I cannot seem to diagnose (below). Because it's quite possible I am totally off the mark with this joint model, I figured now would be a good time to touch base with you so I don't waste time diagnosing this model if it's not what we were thinking. I'm happy to work through this via messaging or meeting; I could meet this week or next, whenever is convenient for you.

I also changed several file names in my last commit. I did this because I was getting confused about which models the files were referencing (e.g., the linear model was called a joint ind-mu model, etc.); however, that might mess with how some of your gitignored files (e.g., the .rds model fits) are referenced. Feel free to undo the last commit if that messes things up on your end too much.

Hope all is well!
Jeremy

Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can't start sampling from this initial value.
...
Chain 1: Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1: Exception: modela06c64065160__namespace::log_prob: theta[sym1__] is nan, but must be greater than or equal to 0 (in 'string', line 78, column 2 to column 36)
Chain 1: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

@Nathaniel-Haines (Collaborator) left a comment

@jeremy-haynes this all looks good! Nice job putting together the joint self-report + ORL model, it checks out on my end 👍

I just made a comment re the path name to the Stan data needing to be changed (based on what the name is when the data are preprocessed and saved out). Actually, I wonder if that had something to do with the trouble running the model on your end? E.g., perhaps you have data with the current name, but it is not the data we mean to pass to the model? Because after running the preprocessing code and then changing the name as I suggest in the comment, the model does sample without issue on my end 🤔

The only other high-level comment I have is that, when it comes to these types of joint models, it is really important to check that the ordering of participants in the stan_data is the same across behavioral and self-report measures. Perhaps we could save the subject IDs in the stan list when doing the preprocessing for both the ORL and self-reports, and then check that the lists are identical when bringing the data together for the modeling step?
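
Something along these lines is what I have in mind (sketch only; the `subjIDs` element and the self-report file name are placeholders rather than whatever the preprocessing scripts actually save out):

```r
# Sketch only: the element name `subjIDs` and the self-report file name are
# placeholders, not necessarily what the preprocessing scripts actually produce.
library(here)

igt_stan  <- readRDS(here("1_IGT_PP", "Data", "1_Preprocessed", "stan_ready_IGT_ORL.rds"))
self_stan <- readRDS(here("1_IGT_PP", "Data", "1_Preprocessed", "stan_ready_selfreport.rds"))

# Stop the fitting script right away if the participant orderings differ
stopifnot(identical(igt_stan$subjIDs, self_stan$subjIDs))
```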

1_IGT_PP/Code/R/1_Analyses/fitting_ORL_lineartime.Rmd (review comment, resolved)

## Load Data
```{r}
igt_data = readRDS(here("1_IGT_PP", "Data", "1_Preprocessed", "stan_ready_ORL_IGT.rds"))
```

Same here re the name needing to be stan_ready_IGT_ORL.rds

@jeremy-haynes (Collaborator, Author)

@Nathaniel-Haines that sounds good! The incorrect data-referencing would definitely be an issue haha! I'll get to work on updating some of this. And I definitely agree, adding the subject IDs is wise to make sure everything is referencing the right people; I'll get to work on that as well. Thanks for checking it out!

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines ,

I wanted to touch base with you on two things with fitting the Joint ORL-Binomial models for the IGT & self-report data (before I forget them😅).

As an update, so far, the models seem to be fitting fine. I suppose the problem I had before with the "rejecting initial value..." did have something to do with referencing the wrong datafile 🤷‍♂️. I've been fitting the models one at a time and they take a while but so far so good.

The first thing I wanted to touch base with you on is how I added checks for the IDs between the IGT and self-report data. I added code to the preprocessing files to save the participant IDs for the IGT data and the self-report data, and then in the RMD for fitting the model, I put in a check (see here) to make sure that those are the same. (btw, the permalinks are super cool on here!). I feel good that those are working correctly and that the IDs are the same across the IGT and self-report data, but if you notice anything, let me know.

The second thing I wanted to check with you about is making sure I'm labeling the posteriors correctly for the correlation matrix. Based on how the correlation matrix is set up and how we use the SD derived from that correlation matrix in the joint ORL-binomial stan code, I believe that we are essentially setting up the following 6x6 correlation matrix:
[image: the 6x6 correlation matrix described above]

If we extract the posterior distributions from the Stan fit, we get a 16000x6x6 array of correlation coefficients (i.e., orl_posteriors <- extract(orl_fit)... orl_posteriors$R). Rows should be iterations, and I'm assuming the columns and matrix slices correspond to the rows and columns, respectively, of the correlation matrix above. Would that be the correct assumption? For example, orl_posteriors$R[,1,1] should be the posterior distribution of Rs between Arew and Arew (thus, all equal to 1, which it is), and orl_posteriors$R[,2,1] should be the posterior distribution of Rs between Arew and Apun... I have some, admittedly complicated, code that runs through the posterior distribution of Rs in this way (here). Let me know what you think.

Thanks,
Jeremy

@Nathaniel-Haines (Collaborator)

The first thing I wanted to touch base with you on is how I added checks for the IDs between the IGT and self-report data. I added code to the preprocessing files to save the participant IDs for the IGT data and the self-report data, and then in the RMD for fitting the model, I put in a check (see here) to make sure that those are the same. (btw, the permalinks are super cool on here!). I feel good that those are working correctly and that the IDs are the same across the IGT and self-report data, but if you notice anything, let me know.

@jeremy-haynes yes this check is great! Helps us know that everything is aligned properly. I made a minor suggestion about a way to check more explicitly, but definitely not necessary and more just a readability/style thing.

If we extract the posterior distributions from the Stan fit, we get a 16000x6x6 array of correlation coefficients (i.e., orl_posteriors <- extract(orl_fit)... orl_posteriors$R). Rows should be iterations, and I'm assuming the columns and matrix slices correspond to the rows and columns, respectively, of the correlation matrix above. Would that be the correct assumption? For example, orl_posteriors$R[,1,1] should be the posterior distribution of Rs between Arew and Arew (thus, all equal to 1, which it is), and orl_posteriors$R[,2,1] should be the posterior distribution of Rs between Arew and Apun... I have some, admittedly complicated, code that runs through the posterior distribution of Rs in this way (here). Let me know what you think.

Yes, you are 100% correct! Basically, for each iteration i, R[i, , ] is a full correlation matrix, and R[i,1,2] is the correlation for the given pair of indices (1 and 2 in this case). So, the posterior on a given correlation is the distribution of values for those indices across all iterations i. If you want to easily get the mean correlation matrix, you can do something like apply(R, c(2,3), mean), and it will return the mean for each posterior correlation in the format of the matrix itself. More generally, the apply function is great for summarizing posteriors that are in the form of a matrix. You just specify which dimensions you want to get the result back in (i.e., passing in c(2,3) means you want back the second and third indices), then pass in any function of interest (e.g., mean, sd), and you will get a result that is a bit easier to interpret given it retains the matrix format.
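
For instance, here is a rough sketch of that workflow (this assumes orl_fit is the stanfit for the joint model and R is the 16000x6x6 array you described):

```r
# Assumes `orl_fit` is the stanfit for the joint model and R is the
# iterations x 6 x 6 correlation array described above.
library(rstan)

orl_posteriors <- extract(orl_fit)
R <- orl_posteriors$R

round(apply(R, c(2, 3), mean), 2)   # 6 x 6 matrix of posterior mean correlations
round(apply(R, c(2, 3), sd), 2)     # 6 x 6 matrix of posterior SDs

# Posterior for one pair, e.g. Arew (index 1) with Apun (index 2)
quantile(R[, 2, 1], probs = c(.025, .5, .975))
```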

Commit: "Makes ID checking syntax more clear" (Co-authored-by: Nathaniel Haines <[email protected]>)
@jeremy-haynes (Collaborator, Author)

@Nathaniel-Haines thanks for checking these out! I definitely support more readability, so I committed that. And that sounds good about the correlation matrix; it's awesome that I can just use apply. I definitely made that way harder than it needed to be. Thanks!

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines, I've run into an issue with two of the joint ORL-binomial models that I'm a little stuck on. Specifically, the models fit to the basfunsk and BIS self-report data return the warning/error message below (I only put the basfunsk plots here for illustration).

[image: warning/error message]

It seems to have issues with the mus and choice_pred values, but I don't see the issue in the output. For example, when I make traceplots of the mus, I get the following:

[image: traceplots of the mus]

I've looked at the self-report data and I'm not seeing an issue. In addition, I would think that would not affect the ORL parameters (although I assume it all interacts to some extent). Do you have any thoughts?

@jeremy-haynes (Collaborator, Author)

Darn, the hyperlinks in my last comment don't seem to be working. Essentially, I added the files in the commit labeled "added carryover parameter to joint model (cow)", and the important lines of Stan code are 180-195.

@Nathaniel-Haines (Collaborator)

I added the carryover parameter to the joint model. I did a quick fit (2000 iterations w/ 1000 warmup) and everything seems relatively similar to before. It might be worth double-checking that I implemented it correctly. The RMD for running it is here. I added several lines of code to the joint model in a new Stan file here, but the area that may need a careful look is here. Let me know what you think!

@jeremy-haynes I looked it over and you implemented it well! The only issue I could note was that you would want to add the weighting to the generated quantities block as well in order to generate the posterior predictions and log likelihood values correctly. But that would not influence the parameter estimates themselves. I ran a small model too, and it seems that things are as you said: not a huge difference here (although the weighting parameter is quite high! suggesting there are some carry-over effects, just not anything super substantial).

That said, the estimates themselves are just rather uncertain; they aren't necessarily low. E.g., here is the correlation I get for Arew:

[image: posterior of the test-retest correlation for Arew]

The mean is low-ish, and the CI does indeed overlap 0, but that is primarily because the uncertainty is just so high. I assume the results would be a little clearer with more data, but for now I would not say this necessarily bodes poorly for reliability. It could be that the play-pass version is just less informative than the original version wrt the model parameters 🤔

I think it would be reasonable to move forward with just the original test-retest model results; nobody has published results for this particular model-task combination, and it could be informative for others interested in the task. Curious what you think?

@Nathaniel-Haines (Collaborator) commented Jul 17, 2023

I have been playing around with different fictive updating rules, and one that I tried was the following:

// Likelihood - predict choice as a function of utility
        choice[i,t,s] ~ categorical_logit(to_vector(utility[card[i,t,s], :]));
        
        // After choice, calculate prediction error
        PEval      = outcome[i,t,s] - ev[card[i,t,s], choice[i,t,s]];     // Value prediction error
        PEfreq     = sign[i,t,s] - ef[card[i,t,s], choice[i,t,s]];        // Win-frequency prediction error
        PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]]; // Fictive win-frequency prediction error (same option, across decks)
        ef_chosen = ef[card[i,t,s], choice[i,t,s]];
        
        if (outcome[i,t,s] >= 0) {  // If participant DID NOT lose
          // Fictively update the chosen option's expected win-frequency across the other decks
          ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Apun[i,s] * PEfreq_fic;
          // Update what participant chose
          ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Arew[i,s] * PEfreq;
          ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Arew[i,s] * PEval;
        } else { // If participant DID lose
          // Fictively update the chosen option's expected win-frequency across the other decks
          ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Arew[i,s] * PEfreq_fic;
          // Update what participant chose
          ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Apun[i,s] * PEfreq;
          ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Apun[i,s] * PEval;
        }

In the ORL for the original IGT, the fictive updating is across decks, given that there is only one choice option per deck (i.e., just play, no pass). When generalizing the model, I originally decided to make the fictive updating work within decks (i.e., between play and pass) as opposed to across decks. The code above does fictive updating across decks for the same option (i.e., if play was chosen, play is fictively updated across decks), which is more similar to the original ORL model.

When fitting this model, things look a bit different. Basically, reliability looks much better for Arew, slightly worse for Apun, and similar (if I remember correctly) for other parameters:

[image: reliability estimates under the across-deck fictive updating]

Also, the posterior predictions look similar to before, indicating that fit is not compromised by this change:

[image: posterior predictions, session 1]

Anyway, could be worth exploring this model more! Let me know what you think, and if I can answer any questions this all brings to mind. (NOTE: A couple chains were not converging well for the model I fit, in part likely due to the fact that I fit so few samples. We might need to modify priors if this continues with more samples, but would first want to just run the model with all samples to see if it is still an issue. The plots I have above are after removing those bad chains)

@Nathaniel-Haines (Collaborator)

For completeness, here are posterior predictions for the session two data from the same model (above are session 1):

[image: posterior predictions, session 2]

@hollysully (Owner)

Wow! Changing the fictive updating really changed reliability for Arew—that is exciting!

@jeremy-haynes (Collaborator, Author)

Interesting. I've wondered about that part of the model; mostly, I've had a hard time wrapping my head around how it affected the value of each option. So, previously (when we were fitting it initially to this data as you describe above), fictive updating basically only affected the card that you were presented with, and only when you passed on it (because the expected win frequency of passing is always 0, so there is no fictive updating for that); right? The change makes sense to me. Essentially, when you play on a deck, that changes the expected win frequency of the other decks. Are the participants told that some decks are good and others are bad? I think that would further justify it, because they might be making a comparison across decks. Also, just to make sure I understand, the way it's coded would not cause fictive updating of the other decks (or at least their plays) when the participant passes on a deck, right?

Also, one thought I had this morning is that we do have another dataset with this task. We could train this model on these data and then fit it to the parents of the longitudinal dataset (e.g., just for Time 1 and Time 2). Just a thought.

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines ,

I wanted to touch base on how the model-fitting is going based on our conversation last week. I was able to fit the updated fictive models to the IGT and SIGT data. One model uses the fictive updating as you coded it, wherein fictive updating occurs across decks for both plays and passes, with the updating being independent for plays and passes. That is, we replaced PEfreq_fic = -sign[i,t,s]/1 - ef[card[i,t,s], (3-choice[i,t,s])]; in the original joint model with PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]]; in what I call the updatedfic model. The other model only updates fictive learning across decks for plays, not passes. Essentially, I did the following in the Stan model I call the playfic model:
if(choice[i,t,s]==1) { PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]]; } else { PEfreq_fic = ef[:, choice[i,t,s]]; }
When I fit these models to the data, two odd things happen.

First, when I fit the joint versions of both models to the 2-session IGT data, the chains look terrible for session 1 parameters; session 2 parameter chains look relatively good I think. Below are the group-level parameter traceplots from the playfic model (person-level parameter chains show a similar pattern):
[image: group-level parameter traceplots from the playfic model]
I am currently refitting one of the models with tighter priors on the mus... normal(0,.5), but to be honest, that's a bit of a guess.

The second oddity is that these two models appear to be identical: the estimates and traceplots look the same for the joint models AND for the models I fit to the SIGT data (the monetary task). For the SIGT, those fit better; the traceplots look fine, although there are 32 divergent transitions (for both types of fictive updating models). I'm working on PPC plots for the SIGT data now.

If you have any insights or thoughts, let me know. Also, in my last push, I caused a conflict (I think because I updated the regular ORL model from the SIGT). I'm not 100% sure of the best way to resolve it: should I just delete the lines I think are problematic and mark it as resolved? Anyway, thanks and I hope all is well!

@Nathaniel-Haines (Collaborator)

@jeremy-haynes Thanks for the update! Ah yeah, it looks like convergence is still an issue with more iterations... I was worried about this, as it showed up a bit with the small number of samples I used originally. I think the issue is not actually too bad, though: if you do traceplot(orl_fit, pars = c("mu_p", "lp__")), you will see that the log posterior is much lower for the bad chain than for the good chains. To me, this indicates that the bad chain is just sampling away in a non-optimal region of the posterior, and that we should be able to fix this with more informative priors, a different parameterization of the model, or even by setting better initial values for the chains. This all contrasts with a case where, e.g., the log posterior for the "bad" chain is similar to the "good" chains, which would indicate more of a parameter identifiability issue re the structure of the model itself.
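
As a quick sketch of that check (this assumes orl_fit is the stanfit object for the joint model):

```r
# Assumes `orl_fit` is the stanfit for the joint model
library(rstan)

lp <- extract(orl_fit, pars = "lp__", permuted = FALSE)  # iterations x chains x 1 array
round(apply(lp, 2, mean), 1)  # per-chain mean log posterior; a bad chain sits well below the rest

traceplot(orl_fit, pars = c("mu_p", "lp__"))
```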

In either case, I am working on the three different potential solutions I listed above and will report back once I've made headway. Re this point:

Essentially, I did the following in the Stan model I call the playfic model:
if(choice[i,t,s]==1) { PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]]; } else { PEfreq_fic = ef[:, choice[i,t,s]]; }

I think this is likely not the best implementation for only having fictive updating for play. The above is still doing fictive updating for pass, but it is setting the fictive prediction error to the current EF value. If we simply want to not update the EF values for unchosen decks when passing, the following rule would achieve that:

if(choice[i,t,s]==1) { 
    PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]]; 
} else { 
    PEfreq_fic = rep_vector(0, 4); 
}

Then, in the updating equation, the fictive update will have no effect on the EF since the fictive PE is 0 for all decks. Let me know if that makes sense!

@Nathaniel-Haines (Collaborator) commented Jul 31, 2023

Oh, and for the conflict: this is normal! It happened here because the 1_SIGT_PP/Code/Stan/igt_orl.stan file on the main branch must have been changed, and you have also changed it here in this branch. So there are now two different versions of the file, and GitHub does not know which should be preserved. Before merging this branch, we will just have to resolve the conflict. Happy to meet and walk through that process at some point!

@Nathaniel-Haines (Collaborator) left a comment

OK so after troubleshooting, I found that the convergence issues were a combination of the uninformative priors and the initial values used to kick-off MCMC sampling.

Of these, it was mostly the initial values causing issues. By default, Stan will randomly sample between -2 and 2 to set initial values for each parameter for each chain. But sometimes a value in [-2,2] is not realistic given the model. Instead, you can pass an init argument to Stan that generates initial values for parameters of interest in a custom way. In the suggestion below, I am setting initial values for the group mean parameters (mu_p) in such a way that the initial values are randomly sampled from a region a bit closer to where we expect chains will end up. This basically fixes things! As seen below:

[image: traceplots after using the custom initial values]

However, I realized that more informative priors could also make estimation a bit more robust (see the recommended change below). Before, the mu_p ~ normal(0, 1) prior implied uniform priors on the group-level mean learning rates and the K parameter. Learning rates are generally around .1 to .5 for models/tasks like the IGT, and so I set some priors to better capture this. When combined with the initial value function, this leads not only to better convergence, but K now shows some improvement in reliability from before:

[image: reliability with the updated priors and initial values]

Curious what you think! I have not applied this to the playfic model yet, but I imagine the same change to the priors + initial values function should fix the issues there too 🤓. Feel free to take or leave the prior recommendation; the initial value fix is the most important here.
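
For reference, the init idea looks roughly like the sketch below; the parameter name (mu_p), its length, and the sampling ranges here are placeholders rather than the exact values in my suggested change:

```r
# Rough sketch of the init-function idea; the parameter name (mu_p), its
# length, and the ranges below are placeholders, not the exact values used.
library(rstan)

init_fun <- function() {
  list(mu_p = runif(6, min = -0.5, max = 0.5))  # start group means near 0 rather than Uniform(-2, 2)
}

orl_fit <- sampling(
  orl_model,          # assumed: compiled stanmodel for the joint ORL-binomial model
  data   = stan_data,
  chains = 4,
  iter   = 2000,
  warmup = 1000,
  init   = init_fun   # called once per chain to generate initial values
)
```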

1_IGT_PP/Code/Stan/igt_orl_joint_updatedfic.stan (review comment, resolved)
@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines ,

Sorry for the delayed response. I've been working on a couple of things that I had wanted to finish before replying (probably getting a little too mad-scientist-ish on some of this haha). Anyway, I think this all makes sense. I do have a couple of questions, just so I understand.

  1. When looking at the log-posterior for the bad chain, the fact that it is below the other chains indicates that the chain is sampling a parameter space that has a lower 'probability' relative to the other chains' parameter space. Would this be similar to comparing the log likelihoods where we would want the parameter that maximizes the log likelihood?
  2. I saw that when you tightened the priors for the mus, you 'loosened' the priors for the sigmas. Could you tell me why you did that? I ask because I'm still trying to grasp the interplay between those and how they affect estimation.

I reran the model with the updated priors and starting values, and the chains definitely improved. There are a couple of areas that could probably be better (Session 2 sigma Arew looks to have an odd chain; so do a couple of the BLUPs). If I wanted to fix those, would tightening the priors further be the way to go, do you think?
[image: traceplots after rerunning with the updated priors and starting values]

I've also been doing a bit of tweaking with the model and found some potential avenues to consider. I started working with a model in which updating only occurs whenever a play is made (here). Essentially, I put in an if(choice[i,t,s]==1){//update}, which improved reliability, notably for Arew. At this point, however, I wasn't confident that K (memory decay) was being used properly (that is, assuming that memory decay happens on every trial regardless of whether a play or pass is made). So, I put in an else statement to update the perseverance and utility of plays when a pass is made (this also occurs for plays, but everything else updates as well; the changes are here and the posteriors for reliability are below). From my understanding, this implementation would essentially keep the utility for passes equal to 0 always (let me know if my intuition is incorrect here).

In this model, I've been using your implementation of fictive updating across decks, which I found to be important when I accidentally fit the model without it (the PPCs looked terrible without the change to fictive updating). Before posting this today, I realized I had made a coding error, so I reran the model with just 2000 iterations (1000 of which are warmup). Despite the fewer iterations, the model seems to converge fine, and in a quick LOOIC comparison between 1) this model, 2) the original play-or-pass joint model that we were fitting, and 3) the model with just your change to fictive updating, this newer model fits best. That said, when I fit the same model to the play-or-pass monetary S-IGT data (that is, from the separate sample that did the PP-IGT and PP-SIGT once), this is not the case. Suffice it to say, maybe I'm going down a rabbit hole, but I'd be interested in your thoughts in any case.

[image: reliability posteriors for the play-update model]

Two-Timepoint Monetary IGT Dataset

Model                                      elpd_diff  se_diff
Joint ORL Playfic Playupdate Modified K          0.0      0.0
Joint ORL                                      -41.8     32.4
Joint ORL Updated Fic                          -55.4     21.3

Single-Timepoint Monetary IGT Dataset from S-IGT Project

Model                                      elpd_diff  se_diff
ORL                                              0.0      0.0
ORL Updated Fic                                -19.8     26.3
ORL Playfic Playupdate Modified K            -6166.6    500.0

Also, I'd definitely be interested in meeting to figure out the best way to resolve the conflict. I don't think it's too pressing or messing anything up, so whenever you have time, let me know and we can meet about it.

Hope all is well - Jeremy

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines I just realized that I had implemented the model for the single-timepoint ORL Playfic Playupdate Modified K. I'm re-running it now and will post the results of the model comparison once they come in.

@jeremy-haynes (Collaborator, Author)

Hey @Nathaniel-Haines I just realized that I had implemented the model for the single-timepoint ORL Playfic Playupdate Modified K. I'm re-running it now and will post the results of the model comparison once they come in.

Ok, I just finished re-running the model and the model comparisons, which are below.

Model                                elpd_diff  se_diff
ORL                                        0.0      0.0
ORL Updatedfic                           -19.8     26.3
ORL Playfic Playupdate Modified K        -45.7     50.9

The original ORL is still the best-fitting, although the standard errors of the differences are pretty high and would make the elpd_diff overlap with 0. From my understanding, this makes it a little ambiguous which model is best-fitting, but I'm less familiar with this type of model comparison, so I'd be interested in your thoughts. For these, I only ran the models for 2000 iterations (with 1000 warmup) to get some results out before our meeting tomorrow, so I'm running the larger model now, which may finish before our meeting, but I'm not 100% sure. Anyway, we can discuss tomorrow. Hope all is well! - Jeremy

@jeremy-haynes (Collaborator, Author) commented Aug 18, 2023

Hey @Nathaniel-Haines, I hope things are good on your end. I tried fitting the ORL without the perseverance term (i.e., excluding K and betaP), and it's having trouble with convergence. I even tried some of the changes to the priors and starting values, and that still didn't quite help. Tom's thought was that this probably means that model is just not a good fit. I have the code for that if you'd like to see it, but I haven't pushed it so as not to throw too much on here that we probably won't use.

I also calculated the log-Arew/Apun ratios and correlated these with the self-report measures. These show largely similar relations to what we observed with the individual Arew and Apun parameters, so Tom figures that, for simplicity, we can just talk about the individual parameter-self-report correlations. I did push these in case you wanted to take a look (overall correlations & session-specific correlations).

As far as I can tell, I think we're heading towards using the model in which fictive updating occurs across decks (instead of within decks) and updating only occurs on plays; however, when passes occur, perseverance is updated. Here's the RMD and Stan file for that model. I've also put some plots below to give you a summary of the model. For now, I'm going to work on writing the data analysis and results as if we're going with this model, but let me know what you think of all this and if you have any feedback, notes, etc. Thanks - Jeremy

Traceplots
[image]

PPC Plots
[image]

Estimates
[image]

Reliability Estimates
[image]

@jeremy-haynes (Collaborator, Author) commented Aug 18, 2023

As I was writing, I realized that one part of the playfic_playupdate_modifiedK model might need to be tweaked. Specifically, when a participant passes, perseverance is updated for plays across all decks. I wrote that as
pers[:,1] = pers[:,1] / (1 + K_tr);
(see also here) but I think I misunderstood perseverance here. Shouldn't it still be

pers[card[i,t,s], 1] = 1;
pers = pers / (1 + K_tr);

Let me know what you think

@Nathaniel-Haines (Collaborator)

Ok, I just finished re-running the model and the model comparisons, which are below.

Model                                elpd_diff  se_diff
ORL                                        0.0      0.0
ORL Updatedfic                           -19.8     26.3
ORL Playfic Playupdate Modified K        -45.7     50.9

The original ORL is still the best-fitting, although the standard errors of the differences are pretty high and would make the elpd_diff overlap with 0. From my understanding, this makes it a little ambiguous which model is best-fitting, but I'm less familiar with this type of model comparison, so I'd be interested in your thoughts. For these, I only ran the models for 2000 iterations (with 1000 warmup) to get some results out before our meeting tomorrow, so I'm running the larger model now, which may finish before our meeting, but I'm not 100% sure. Anyway, we can discuss tomorrow. Hope all is well! - Jeremy

@jeremy-haynes yes, you are exactly right! When the difference is this uncertain, it is safe to assume that we cannot make a clear judgement call that one model is better than the other. There are other things we could do to test the models with more scrutiny (e.g., fitting to just one timepoint, predicting for the same participants at the second timepoint, and scoring the models accordingly), but that will likely produce a similar conclusion. In cases like this, we can always report multiple models, but selecting a "final" model to use for inference comes down to more than just the fit. We can also report the estimates of interest from each model; it seems there is not a whole lot of variability between the different models, so I don't expect the substantive results to vary by much either.
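
For reference, the kind of comparison we are talking about can be run with the loo package along these lines (this assumes each fit saves a log_lik matrix in generated quantities; the object names are illustrative):

```r
# Assumes each stanfit saves a log_lik matrix in generated quantities;
# the fit object names here are illustrative.
library(rstan)
library(loo)

loo_orl        <- loo(extract_log_lik(orl_fit))
loo_updatedfic <- loo(extract_log_lik(updatedfic_fit))
loo_playfic    <- loo(extract_log_lik(playfic_fit))

loo_compare(list(ORL = loo_orl,
                 `ORL Updatedfic` = loo_updatedfic,
                 `ORL Playfic Playupdate Modified K` = loo_playfic))
# An elpd_diff within roughly 2 SEs of 0 does not clearly favor either model
```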

@Nathaniel-Haines (Collaborator)

As I was writing, I realized that one part of the playfic_playupdate_modifiedK model might need to be tweaked. Specifically, when a participant passes, perseverance is updated for plays across all decks. I wrote that as pers[:,1] = pers[:,1] / (1 + K_tr); (see also here) but I think I misunderstood perseverance here. Shouldn't it still be

pers[card[i,t,s], 1] = 1;
pers = pers / (1 + K_tr);

Let me know what you think

@jeremy-haynes apologies for the delayed responses! Things have been extra busy at Ledger, and I only now had some time to come back to this.

Overall, I think this is a solid set of results! Your reasoning throughout the last few posts above makes sense to me, and I agree on converging toward the model that only updates on plays, and then does fictive updating across decks as opposed to within decks. I think this model is also true to the original ORL.

Re the perseverance rule, I think it should give the same result in either case. I think you can even simplify this code a bit by updating this part:

if (choice[i,t,s]==1) {
  // After choice, calculate prediction error
  PEval      = outcome[i,t,s] - ev[card[i,t,s], choice[i,t,s]];
  PEfreq     = sign[i,t,s] - ef[card[i,t,s], choice[i,t,s]];
  PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]];
  ef_chosen  = ef[card[i,t,s], choice[i,t,s]];

  if (outcome[i,t,s] >= 0) {
    ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Apun[i,s] * PEfreq_fic;
    ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Arew[i,s] * PEfreq;
    ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Arew[i,s] * PEval;
  } else {
    ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Arew[i,s] * PEfreq_fic;
    ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Apun[i,s] * PEfreq;
    ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Apun[i,s] * PEval;
  }

  pers[card[i,t,s], choice[i,t,s]] = 1;
  pers = pers / (1 + K_tr);

  utility = ev + ef * betaF[i,s] + pers * betaP[i,s];
} else { // else pass
  pers[:,1] = pers[:,1] / (1 + K_tr);
  utility[:,1] = ev[:,1] + ef[:,1] * betaF[i,s] + pers[:,1] * betaP[i,s];
}
}

to:

if (choice[i,t,s]==1) {
  // After choice, calculate prediction error
  PEval      = outcome[i,t,s] - ev[card[i,t,s], choice[i,t,s]];
  PEfreq     = sign[i,t,s] - ef[card[i,t,s], choice[i,t,s]];
  PEfreq_fic = -sign[i,t,s]/3 - ef[:, choice[i,t,s]];
  ef_chosen  = ef[card[i,t,s], choice[i,t,s]];

  if (outcome[i,t,s] >= 0) {
    ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Apun[i,s] * PEfreq_fic;
    ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Arew[i,s] * PEfreq;
    ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Arew[i,s] * PEval;
  } else {
    ef[:, choice[i,t,s]] = ef[:, choice[i,t,s]] + Arew[i,s] * PEfreq_fic;
    ef[card[i,t,s], choice[i,t,s]] = ef_chosen + Apun[i,s] * PEfreq;
    ev[card[i,t,s], choice[i,t,s]] = ev[card[i,t,s], choice[i,t,s]] + Apun[i,s] * PEval;
  }

  pers[card[i,t,s], choice[i,t,s]] = 1;
}
pers = pers / (1 + K_tr);
utility = ev + ef * betaF[i,s] + pers * betaP[i,s];
}

Reason being: if it is a play, updating occurs as normal per the if statement, and the pers term is set to 1 for the played deck. But then you can just get rid of the else statement entirely, because you always want to decay pers regardless of play or pass (and the pers term for pass will always be 0 anyway). The utility is then always calculated at the end; there is no need to specify updating just the play index for each deck, because the pass indices will be unchanged.

Actually, now that I write this out, it is clear to me that ev, ef, and pers will now only take on non-zero values for the play options, and the pass option will always be 0. Therefore, utility will also only be non-zero for the play index. So there is now no need to have a play vs. pass index for each deck. I will work on modifying the script to show what this change translates to in Stan 🤓

@Nathaniel-Haines (Collaborator) commented Aug 20, 2023

@jeremy-haynes OK I have added the simplified version of the model here. Basically, I just got rid of some parts of the code that became unnecessary as we changed how it was functioning. I think this version is nice because it is easier to reason about despite it being identical to the igt_orl_joint_playfic_playupdate_modifiedK.stan version. I ran it with a few hundred iterations and am including some plots below to show that it is the same as what you had before:

[images: plots from the simplified model, matching the earlier results]

Anyway, I am signed off on moving forward with this model! Great job BTW in getting it to this point; I know from experience that it is quite time consuming (and often unclear IF you will get to a point where you are happy with it... ha), but I would say all your work has paid off!

NOTE: Oh and before we merge, happy to meet about resolving the git conflict!

@jeremy-haynes (Collaborator, Author) left a comment

Ahh yes, this is much simpler, cool!

@jeremy-haynes (Collaborator, Author) left a comment

@Nathaniel-Haines This is a bit of an aside, so no worries about putting this on the back-burner. I was looking back at the linear time model, and I don't think Stan is calculating the mu for session 2 correctly. I changed it in this commit; however, I don't think I'm calculating it correctly either 🤔. I initially assumed that mu_pr[2, ] represents the average slope effect across participants and that adding that back into mu_pr[1, ] would give us the mu for the parameters at session 2, but that doesn't seem right because it assumes day = 1, right? So, would I calculate an average day and multiply that by the slope mu, e.g., (mean(day[,2]) * mu_pr[2,])? In addition, I wasn't sure how to incorporate the standard deviation of the slopes, e.g., sigma_Arew[2]. Let me know what you think.

Thanks,
Jeremy

@jeremy-haynes merged commit de352f4 into main on Sep 14, 2023