Binomial self report #11
Conversation
@jeremy-haynes this looks good! I added some notes on what needs to change to fix divergences—let me know if you have any questions on that.
Also, I merged to main the branch you had this one set to merge into 😅. That said, you should just be able to set this branch to merge into main as well!
@jeremy-haynes I went ahead and changed it, FYI.
Hey @Nathaniel-Haines, I think I've got the ind-mu models for the IGT and self-report data working; that is, the models where we use all of the data from both sessions to estimate each parameter individually (i.e., not a separate parameter per session). So far, these models are fit to the IGT and self-report data separately (i.e., not as a joint model). I believe those are working correctly; at least there don't seem to be ~90% divergent transitions (or other problems) after you helped with the last part haha.

I was feeling a bit confident, so I took a crack at the joint model where we wanted to fit the ORL to the IGT data and the binomial model to the self-report data simultaneously, so that we could obtain correlations between the ORL and binomial model parameters (RMD here and stan here). I get an error that I cannot seem to diagnose (below). Because it's quite possible I am totally off the mark with this joint model, I figured now would be a good time to touch base so I don't waste time diagnosing this model if it's not what we were thinking. I'm happy to work through this via messaging or a meeting; I could meet this week or next, whenever is convenient for you.

I also changed several file names in my last commit because I was getting confused about which models the files were referencing (e.g., the linear model was called a joint ind-mu model, etc.); however, that might mess with how some files are referenced for your gitignored files (e.g., the .rds models). Feel free to undo the last commit if that messes things up on your end too much. Hope all is well!

Chain 1: Rejecting initial value:
@jeremy-haynes this all looks good! Nice job putting together the joint self-report + ORL model, it checks out on my end 👍
I just made a comment re the path name to the stan data needing to be changed (based on what the name is when preprocessed and saved out). Actually, I wonder if that had something to do with the trouble running the model on your end? e.g., perhaps you have data with the current name, but it is not the data we mean to pass to the model? Because after running the preprocessing code and then changing the name as I suggest in the comment, the model does sample without issue on my end 🤔
The only other high-level comment I have is that, when it comes to these types of joint models, it is really important to check that the ordering of participants in the `stan_data` is the same across the behavioral and self-report measures. Perhaps we could save the subject IDs in the stan list when doing the preprocessing for both the ORL and the self-reports, and then check that the lists are identical when bringing the data together for the modeling step?
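For instance, a minimal sketch of that check, assuming both preprocessing scripts save a vector of subject IDs into their stan-ready lists (the object and element names here are placeholders):

```r
# Sketch only: subjIDs is a placeholder for however the IDs get saved into
# each stan-ready list during preprocessing.
stopifnot(identical(stan_data_igt$subjIDs, stan_data_selfreport$subjIDs))
```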
## Load Data
```{r}
igt_data = readRDS(here("1_IGT_PP", "Data", "1_Preprocessed", "stan_ready_ORL_IGT.rds"))
```
Same here re the name needing to be `stan_ready_IGT_ORL.rds`.
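i.e., something like (same call as above, just with the corrected file name):

```r
igt_data = readRDS(here("1_IGT_PP", "Data", "1_Preprocessed", "stan_ready_IGT_ORL.rds"))
```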
@Nathaniel-Haines that sounds good! The incorrect data-referencing would be an issue haha! This all sounds good; I'll get to work on updating some of this. And I definitely agree, adding the subject IDs is wise to make sure it's all referencing the right people. I'll get to work on that as well. Thanks for checking it out!
Hey @Nathaniel-Haines, I wanted to touch base with you on two things with fitting the joint ORL-binomial models for the IGT & self-report data (before I forget them 😅). As an update, the models seem to be fitting fine so far. I suppose the problem I had before with the "rejecting initial value..." error did have something to do with referencing the wrong data file 🤷‍♂️. I've been fitting the models one at a time and they take a while, but so far so good.

The first thing I wanted to touch base with you on is how I added checks for the IDs between the IGT and self-report data. I added code to the preprocessing files to save the participant IDs for the IGT data and the self-report data, and then in the RMD for fitting the model I put in a check (see here) to make sure that those are the same (btw, the permalinks are super cool on here!). I feel good that those are working correctly and that the IDs are the same across the IGT and self-report data, but if you notice anything, let me know.

The second thing I wanted to check with you about is making sure I'm labeling the posteriors correctly for the correlation matrix. Based on how the correlation matrix is set up and how we use the SD derived from that correlation matrix in the joint ORL-binomial stan code, I believe that we are essentially setting up a 6x6 correlation matrix across the ORL and binomial parameters. If we extract the posterior distributions from the stan fit, we get a 16000x6x6 array of correlation coefficients.

Thanks!
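P.S. Here is roughly how I'm picturing extracting and labeling those correlations (an R sketch; the Stan parameter name `R`, the fit object name, and the label for the binomial parameter are guesses on my part):

```r
# Sketch: pull the posterior draws of the correlation matrix and label them.
# "R" is a guess at the Stan parameter name, orl_binomial_fit is a placeholder
# fit object, and "theta_SR" is a stand-in label for the binomial parameter.
corr_draws <- rstan::extract(orl_binomial_fit, pars = "R")$R   # 16000 x 6 x 6 array
par_names  <- c("Arew", "Apun", "K", "betaF", "betaP", "theta_SR")
dimnames(corr_draws) <- list(NULL, par_names, par_names)

# Posterior mean of each pairwise correlation:
apply(corr_draws, c(2, 3), mean)
```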
@jeremy-haynes yes, this check is great! It helps us know that everything is aligned properly. I made a minor suggestion about a way to check more explicitly, but it's definitely not necessary, more just a readability/style thing.

Yes, you are 100% correct! Basically, each iteration contains a full 6x6 correlation matrix, so you can summarize across iterations (e.g., with `apply`).
@Nathaniel-Haines thanks for checking these out! I definitely support more readability, so I committed that. And that sounds good about the correlation matrix. It's awesome that I could use `apply`; I definitely made that way harder than it needed to be. Thanks!
Hey @Nathaniel-Haines, I've run into an issue with two of the joint ORL-binomial models that I'm a little stuck on. Specifically, the models fit to the basfunsk and BIS self-report data return the warning/error message below (I only put the basfunsk plots up for illustration). It seems to have issues with the mus and `choice_pred` values, but I don't see the issue in the output. For example, when I make traceplots of the mus, I get the following: [traceplots of the mus]

I've looked at the self-report data and I'm not seeing an issue. In addition, I would think that this would not affect the ORL parameters (although I assume it all interacts to some extent). Do you have any thoughts?
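For reference, I'm making those traceplots roughly like this (`fit` and `mu_p` are placeholders for the stanfit object and group-level mean parameter name in this model):

```r
# Rough sketch of the traceplot call; "mu_p" is a guess at the group-level
# mean parameter name and "fit" is the stanfit object for this model.
rstan::traceplot(fit, pars = "mu_p", inc_warmup = FALSE)
```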
Darn, the hyperlinks in my last comment don't seem to work. Essentially, I added the files in the commit labeled "added carryover parameter to joint model (cow)" and the important lines of stan code are 180-195.
@jeremy-haynes I looked it over and you implemented it well! The only issue I could note was that you would want to add the weighting to the … as well.

That said, the estimates themselves are just rather uncertain; they aren't necessarily low. e.g., here is an example of the correlation I get for …: [plot of the posterior for that correlation]

The mean is low-ish, and the CI does overlap 0, but that is primarily due to the fact that the uncertainty is just so high. I assume that the results would be a little clearer with more data, but for now I would not say that this necessarily bodes poorly for reliability. It could be that the play-pass version is just less informative than the original version with respect to the model parameters 🤔

I think it would be reasonable to move forward with just the original test-retest model results—nobody has published results for this particular model-task combination, and it could be informative for others interested in the task. Curious what you think?
Wow! Changing the fictive updating really changed the reliability for Arew—that is exciting!
Interesting. I've wondered about that part of the model; mostly, I've had a hard time wrapping my head around how it affected the value of each option. So, previously (when we were fitting it to this data initially, as you describe above), fictive updating basically only affected the card that you were presented with, and only when you passed on it (because the expected win frequency of passing is always 0, so there is no fictive updating for that); right?

The change makes sense to me. Essentially, when you play on a deck, that changes the expected win frequency of the other decks. Are the participants told that some decks are good and others are bad? I think that would further justify it, because they might be making a comparison across decks. Also, just to make sure I understand: the way it's coded would not cause fictive updating of the other decks (or at least their plays) when the participant passes on a deck, right?

Also, one thought I had this morning is that we do have another dataset with this task. We could train this model on these data and then fit it to the parents of the longitudinal dataset (e.g., just for Time 1 and Time 2). Just a thought.
Hey @Nathaniel-Haines, I wanted to touch base on how the model-fitting is going based on our conversation last week. I was able to fit the updated fictive models to the IGT and SIGT data. One model uses the fictive updating as you coded it, wherein fictive updating occurs across decks for plays and passes, with the updating being independent with respect to plays and passes. That is, we replaced …

First, when I fit the joint versions of both models to the 2-session IGT data, the chains look terrible for the session 1 parameters; the session 2 parameter chains look relatively good, I think. Below are the group-level parameter traceplots from the playfic model (the person-level parameter chains show a similar pattern): [group-level parameter traceplots]

The second oddity is that I believe these two models are identical. The estimates and traceplots look the same for the joint models AND for the models I fit to the SIGT data (the monetary task). For the SIGT, those fit better; the traceplots look fine, although there are 32 divergent transitions (for both types of fictive updating models). I'm working on PPC plots for the SIGT data now. If you have any insights or thoughts, let me know.

Also, in my last push, I caused a conflict (I think because I updated the regular ORL model from the SIGT). I'm not 100% sure of the best way to resolve it: should I just delete the lines I think are problematic and mark it as resolved? Anyway, thanks, and I hope all is well!
@jeremy-haynes Thanks for the update! Ah yeah, it looks like convergence is still an issue with more iterations. I was worried about this, as it showed up a bit with the small number of samples I used originally. I think the issue is not actually too bad though—if you do …

In either case, I am working on the three different potential solutions I listed above and will report back once I've made headway. re this point:
I think this is likely not the best implementation for only having fictive updating on plays. The above is still doing fictive updating on passes, but it is setting the fictive prediction error to the current EF value. If we simply want to not update the EF values for unchosen decks when passing, a rule along the following lines would achieve that:
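Roughly, in R-style pseudocode (the actual rule lives in the Stan model, and all names here are guesses):

```r
# Sketch of the idea: on a pass, the fictive prediction error is zero for
# every deck, so the subsequent EF update leaves the unchosen decks untouched.
# On a play, the usual fictive PE applies (the exact form below is assumed).
fictive_pe <- function(choice, sign_out, ef, n_decks = 4) {
  if (choice == "pass") {
    rep(0, n_decks)                    # fictive PE of 0 => no change to EF
  } else {
    -sign_out / (n_decks - 1) - ef     # assumed play-case fictive PE, as in the original ORL
  }
}

# e.g., fictive_pe("pass", sign_out = 1, ef = rep(0.2, 4)) returns c(0, 0, 0, 0)
```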
Then, in the updating equation, the fictive update will have no effect on the EF, since the fictive PE is 0 for all decks. Let me know if that makes sense!
Oh, and for the conflict—this is normal! It happened here because the …
OK, so after troubleshooting, I found that the convergence issues were a combination of the uninformative priors and the initial values used to kick off MCMC sampling.

Of these, it was mostly the initial values causing issues. By default, Stan will randomly sample between -2 and 2 to set initial values for each parameter for each chain. But sometimes a value in [-2, 2] is not realistic given the model. Instead, you can pass an `init` argument to Stan that generates initial values for parameters of interest in a custom way. In the suggestion below, I am setting initial values for the group mean parameters (`mu_p`) in such a way that the initial values are randomly sampled from a region a bit closer to where we expect the chains will end up. This basically fixes things, as seen below: [traceplots after the fix]

However, I realized that more informative priors could also make estimation a bit more robust (see the recommended change below). Before, the `mu_p ~ normal(0, 1)` prior implied uniform priors on the group-level mean learning rates and the K parameter. Learning rates are generally around .1 to .5 for models/tasks like the IGT, so I set some priors to better capture this. When combined with the initial value function, this leads not only to better convergence, but K now shows some improvement in reliability from before: [updated reliability estimates]

Curious what you think! I have not applied this to the `playfic` model yet, but I imagine the same change to the priors + initial values function should fix the issues there too 🤓. Feel free to take or leave the prior recommendation—the initial value fix is the most important here.
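The suggestion is along these lines (a sketch only; the parameter name `mu_p`, its length, and the exact sampling region are assumptions about this particular model):

```r
# Sketch of a custom init function passed to rstan::sampling() via `init`;
# mu_p and its length are assumptions about this model's parameter block.
init_fun <- function(chain_id = 1) {
  list(
    # Start the group-level means in a plausible region rather than
    # Stan's default Uniform(-2, 2) draws on the unconstrained scale.
    mu_p = rnorm(5, mean = 0, sd = 0.25)
  )
}

# fit <- rstan::sampling(orl_model, data = stan_data, init = init_fun, chains = 4)
```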
Hey @Nathaniel-Haines , Sorry for the delayed response. I've been working on a couple of things that I had wanted to finish before replying (probably getting a little too mad-scientist-ish on some of this haha). Anyway, I think this all makes sense. I do have a couple of questions, just so I understand.
I reran the model with the updated priors and starting values and the chains definitely improved. There are a couple of areas that could probably be better (the Session 2 sigma Arew looks to have an odd chain; so do a couple of the BLUPs). If I wanted to fix those, would tightening the priors further be the way to go, do you think?

I've also been doing a bit of tweaking with the model and found some potential avenues to consider. I started working with a model in which updating only occurs whenever a play is made (here). Essentially I put an …

[Results plots: Two-Timepoint Monetary IGT Dataset; Single-Timepoint Monetary IGT Dataset from S-IGT Project]

Also, I'd definitely be interested in meeting to figure out the best way to resolve the conflict. I don't think it's too pressing or messing anything up, so whenever you have time, let me know and we can meet about it. Hope all is well - Jeremy
Hey @Nathaniel-Haines, I just realized that I had implemented the model for the single-timepoint ORL Playfic Playupdate Modified K. I'm re-running it now and will post the results of the model comparison once they come in.
Ok, I just finished re-running the model, and the model comparisons are below. The original ORL is still the best-fitting, although the standard errors of the differences are pretty high and would make the elpd_diff overlap with 0. From my understanding, this makes it a little ambiguous which model fits best, but I'm less familiar with this type of model comparison, so I'd be interested in your thoughts. For these, I only ran the model for 2000 iterations (with 1000 warmup) to get some results out before our meeting tomorrow, so I'm running the larger model now, which may finish before our meeting, but I'm not 100% sure. Anyway, we can discuss stuff tomorrow. Hope all is well! - Jeremy
Hey @Nathaniel-Haines, I hope things are good on your end. I tried fitting the ORL without the perseverance term (i.e., excluding k and betaP) and it's having trouble with convergence. I even tried some of the changes to the priors and starting values, and that still didn't quite help. Tom's thought on that was that it probably means the model is just not a good fit. I have the code for that if you'd like to see it, but I haven't pushed it so as not to throw too much on here that we'll probably not use.

I also calculated the log Arew/Apun ratios and correlated these with the self-report measures. These show largely similar relations to what we observed with the individual Arew and Apun parameters, so Tom figures that, for simplicity, we can just talk about the individual parameter/self-report correlations. I did push these in case you wanted to take a look (overall correlations & session-specific correlations).

As far as I can tell, I think we're heading towards using the model in which fictive updating occurs across decks (instead of within decks) and updating only occurs on plays; however, when passes occur, perseverance is updated. Here are the RMD and stan file for that model. I've also put some plots below to give you a summary of the model. For now, I'm going to work on writing the data analysis and results as if we're going with this model, but let me know what you think of all this and if you have any feedback, notes, etc. Thanks - Jeremy
As I was writing, I realized that one part of the playfic_playupdate_modifiedK model might need to be tweaked. Specifically, when a participant passes, perseverance is updated for plays across all decks. I wrote that as …
Let me know what you think.
@jeremy-haynes yes, you are exactly right! When the difference is this uncertain, it is safe to assume that we cannot make a clear judgement call that one model is better than the other. There are other things we could do to test the models with more scrutiny (e.g., fitting to just one timepoint, predicting for the same participants at the second timepoint, and scoring the models accordingly), but that will likely produce a similar conclusion. In cases like this, we can always report multiple models, but selecting a "final" model to use for inference comes down to more than just the fit. We can also report the estimates of interest from each model—it seems there is not a whole lot of variability between the different models, so I don't expect the substantive results to vary by much either.
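For concreteness, the kind of comparison we're talking about looks roughly like this in R (fit object names are placeholders, and this assumes each Stan model writes out a `log_lik` matrix):

```r
library(loo)

# Sketch of the elpd comparison; fit_orl and fit_playfic are placeholder
# stanfit objects, each assumed to save log_lik in generated quantities.
loo_orl     <- loo(extract_log_lik(fit_orl, parameter_name = "log_lik"))
loo_playfic <- loo(extract_log_lik(fit_playfic, parameter_name = "log_lik"))

loo_compare(loo_orl, loo_playfic)
# Rough rule of thumb: if |elpd_diff| is small relative to its SE (say,
# within ~2 SEs of 0), the data do not clearly favor either model.
```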
@jeremy-haynes apologies for the delayed responses! Things have been extra busy at Ledger, and I only now had some time to come back to this. Overall, I think this is a solid set of results! Your reasoning throughout the last few posts makes sense to me, and I agree on converging toward the model that only updates on plays, and then does fictive updating across decks as opposed to within decks. I think this model is also true to the original ORL.

re the perseverance rule, I think it should give the same result in either case. I think you can even simplify this code a bit by updating this part:

…

to:

…

Reason being—if it is a play, updating occurs as normal per the … Actually, now that I write this out, it is clear to me that …
@jeremy-haynes OK, I have added the simplified version of the model here. Basically, I just got rid of some parts of the code that became unnecessary as we changed how the model functions. I think this version is nice because it is easier to reason about, despite being identical to the … model.

Anyway, I am signed off on moving forward with this model! Great job, BTW, in getting it to this point. I know from experience that it is quite time consuming (and often unclear IF you will get to a point where you are happy with it... ha), but I would say all your work has paid off!

NOTE: Oh, and before we merge, happy to meet about resolving the git conflict!
Ahh yes, this is much simpler, cool!
@Nathaniel-Haines This is a bit of an aside, so no worries about putting this on the back-burner. I was looking back at the linear time model, and I don't think stan is calculating the mu for session 2 correctly. I changed it in this commit; however, I don't think that I'm calculating it correctly either 🤔. I initially assumed that `mu_pr[2, ]` represents the average slope effect across participants, and that adding it back into `mu_pr[1, ]` would give us the mu for the parameters at session 2, but that doesn't seem right because it assumes day = 1, right? So, would I calculate an average day and multiply that by the slope mu (e.g., `mean(day[,2]) * mu_pr[2,]`)? In addition, I wasn't sure how to incorporate the standard deviation of the slopes (e.g., `sigma_Arew[2]`). Let me know what you think.
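Something like this is what I'm picturing (an R sketch; it follows the names above, but the dimensions of `mu_pr` and any back-transform are guesses on my part):

```r
# Rough sketch of the session-2 group means I have in mind; names follow the
# comment above, but the array dimensions and any probit back-transform are guesses.
mu_pr_draws <- rstan::extract(fit, pars = "mu_pr")$mu_pr  # iterations x 2 x n_params
mean_day2   <- mean(day[, 2])                             # average days elapsed at session 2
mu_ses2_pr  <- mu_pr_draws[, 1, ] + mean_day2 * mu_pr_draws[, 2, ]  # intercept + slope * mean day
# If the parameters live on the probit scale, map back afterwards (e.g., pnorm() for the learning rates).
```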
Thanks,
Jeremy
Hey Nate,
I started on the binomial model for the self-report data and am a little stuck, so I figured that a (hopefully) good way for you to take a look would be to make a new branch off of the fix-group-means-correlations branch (let me know if there was a better way of doing this). Where I'm stuck is that the model produces a lot (~90%) of divergent transitions (not good 🥲). I think it may have to do with the priors. I based them off of Kruschke, but I've also run into a similar issue before when trying to fit somewhat similar models, so I'm thinking I'm misunderstanding something.
I tried to set up the RMD file so you could fit the binomial stan model to just the prdep_tot data (the code is designed to run through each measure, but there's a line that sets the scale of interest to just prdep_tot). There are some things I need to check with Tom about in terms of the data, but I don't think those would affect the model fit. Essentially, I've subtracted each scale's minimum possible score from both the participant's score and the maximum possible score, and I need him to confirm some of those mins/maxes (letting you know in case that does have some effect I'm unaware of).
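Concretely, the rescaling looks roughly like this (column names and the example values are just placeholders):

```r
# Sketch of the rescaling described above; column names and values are placeholders.
selfreport <- data.frame(
  subjID    = c(1, 2),
  score     = c(12, 18),  # raw scale scores (made-up values)
  scale_min = 10,         # minimum possible score on the scale
  scale_max = 40          # maximum possible score on the scale
)
selfreport$successes <- selfreport$score - selfreport$scale_min      # binomial "successes"
selfreport$trials    <- selfreport$scale_max - selfreport$scale_min  # binomial number of "trials"
```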
Let me know what you think, and if it'd be easier to meet about it, I'd be happy to set up a time.
Thanks!
Jeremy