Add some tools for rendering Rmarkdown documents in a capsule? #1

Open
cderv opened this issue Jan 6, 2020 · 5 comments

Comments

@cderv

cderv commented Jan 6, 2020

Hello @MilesMcBain!

Happy new year! Hope you're fine!

I am opening this issue to start a discussion about an interesting use case.
Today, there was a question on RStudio Community about using one renv lockfile per Rmd document, so that a document can be rendered in its own locked environment with specific packages.
https://community.rstudio.com/t/renv-lockfile-per-rmarkdown-doc/

I found this idea very interesting and it made me think about your {capsule} package. I am willing to try prototyping some functions to help do that, but first I wanted to have your thoughts, given the work you've done on this package and what you encounter on a daily basis, and to know whether capsule already brings a solution to this or not. It seems to me you could have only one lockfile per project for all documents in a project, where the library calls are all in the same packages.R file. I may be missing something though...

If you think it is interesting, do you think capsule is a good place for such functions? Or should they live elsewhere?

As you can see in the small attempt I made, I think I am very close to the challenge you are trying to deal with.

I have some other ideas, and I think it would be very powerful to have a companion lockfile (or even a way to embed it inside an Rmd, let's be audacious) and use it to render an Rmd analysis in its own temporary capsule, without having to keep one project with a renv library per Rmd document.

When you have some spare time, I would be very interested to read your 💭 on this!

Thank you!

@MilesMcBain
Owner

I am now finally starting to get interested in this idea too, now that I can snapshot quickly.

I've made the lockfiles generated by capshot() able to be minified, which might help. Although I am thinking it would be ideal to have a solution that would work in both HTML and PDF documents. The first thing that comes to mind is encoding the lockfile as an image.

@cderv
Author

cderv commented Jul 16, 2021

Hi Miles,

I did not closely follow the why of capshot() and how it is implemented, so I am not sure which one is the minified version. capshot_str()? For now, I have just read the post on it (https://milesmcbain.micro.blog/2021/07/15/unlocking-fast-rstats.html). I would have thought that playing with the type argument in renv::snapshot() would be enough to get faster lockfile creation adapted to your workflow. Or maybe playing with the snapshot.type setting so that renv:::lockfile (unfortunately not exported) would help create a lockfile more quickly for your usage.

Anyway, regarding the embedding, did you see that renv has a new mechanism that could be adapted to an Rmd file, or even to a single script outside of a project? It is renv::use() - see the article for usage: https://rstudio.github.io/renv/articles/use.html

There is also an experimental function called renv::embed() to help insert a lockfile into a document. For an Rmd, it will insert the lockfile as a renv::use() call in a chunk.

I am just sharing this in case you don't know. It is always interesting to see what others do. I know you also have the using package, which probably does something similar to renv::use(), but with a different mechanism and purpose.

Regarding the main topic here, how do you see things working with embedding as an image? You would call capsule::run_*() or a new capsule::render_*() and it would find the lockfile to use?

Initially, I had in mind something not necessarily embedded, but an easy way of rendering a document in a capsule. Something like capsule::render(rmd_file, lockfile, ...=), or simply capsule::run() where we could pass the lockfile or the packages.R script. However, something embedded could be more interesting for a single Rmd file. There is always the limitation that rmarkdown's own deps are needed to run a chunk. renv::use() in a chunk works ok for this, but I have not looked at all the implications when it is run in an interactive session (because the temp lib dir is still active after rendering with rmarkdown::render()). When run in a background session or job (like with the Knit button), it works quite well. Overall, I find it a good idea for single-file dependency management.

These are somewhat unstructured thoughts, but I am taking the opportunity of this issue to discuss this.
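To make the capsule::render(rmd_file, lockfile, ...) idea concrete, here is a minimal sketch of what such a helper might look like. `render_in_capsule()` is a hypothetical name and is not part of {capsule}. The sketch assumes {callr}, {renv} and {rmarkdown} are installed, and that the lockfile itself lists rmarkdown and its dependencies, since they are needed to render at all.

```r
# Hypothetical helper (not part of {capsule}): render an Rmd in a fresh R
# session whose library is set up from the given lockfile.
render_in_capsule <- function(rmd_file, lockfile, ...) {
  rmd_file <- normalizePath(rmd_file, mustWork = TRUE)
  lockfile <- normalizePath(lockfile, mustWork = TRUE)

  callr::r(
    function(rmd_file, lockfile, render_args) {
      # Assumption: the lockfile includes rmarkdown and its dependencies.
      # renv::use() installs the recorded packages into a temporary
      # library and activates it for this child session only.
      renv::use(lockfile = lockfile)
      do.call(rmarkdown::render, c(list(input = rmd_file), render_args))
    },
    args = list(
      rmd_file = rmd_file,
      lockfile = lockfile,
      render_args = list(...)
    ),
    show = TRUE  # stream the child session's output to the console
  )
}
```

A call might look like `render_in_capsule("report.Rmd", "report.lock", output_format = "html_document")`, where both file names are placeholders. Running in a child session means the temporary library does not stay active in the interactive session afterwards.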

@MilesMcBain
Owner

MilesMcBain commented Jul 16, 2021

Thanks for the info Christophe!

So I've had a few performance issues with {renv} snapshotting that I describe here: rstudio/renv#774

Part of it seems to be with using certain kinds of repos, e.g. r-universe, but even without those deps it's sluggish. I guess a lot of this is validation, and sometimes it makes network calls to do that.

Yeah, using has similar syntax and does some similar things, but it still has the fundamental difference that it doesn't mess with your .libPaths(). I remember seeing an issue for renv::use() but I didn't know it was implemented, so thanks for pointing that out.

Regarding rendering, you're thinking very much in the same direction as me. Some function to render using a lockfile, like capsule::render, is a great idea.

I was talking about a different kind of embedding though. Not embedding a lockfile to be used internally by {rmarkdown}/{knitr} as the library, but embedding the lockfile as a full description of the packages used, in case that becomes useful in the future. For example, you have a result you need to reproduce following someone's report, but you get a slightly different output. You could make extracting and diffing a lockfile for your env vs the author's a one-liner.

Thinking about this some more though, this is kind of moot if you have access to the author's source repository, since you could look at it as of the report date and read the lockfile from there if it was committed.
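As a rough illustration of the "diff your env against the author's" idea, the sketch below compares package versions from two lockfiles. `lock_versions()` and `diff_versions()` are hypothetical helpers, not functions from {capsule} or {renv}. Reading the file assumes {jsonlite} is installed and relies on the renv lockfile layout, a JSON object with a `Packages` entry whose elements each carry a `Version` field. How the lockfile would be extracted from a rendered document is left open here.

```r
# Hypothetical helper: named character vector of package -> version,
# read from an renv-style lockfile on disk (requires {jsonlite}).
lock_versions <- function(path) {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  vapply(lock$Packages, function(p) p$Version, character(1))
}

# Hypothetical helper: rows for packages whose version differs, or that
# appear in only one of the two environments (shown as NA on one side).
diff_versions <- function(mine, theirs) {
  pkgs <- sort(union(names(mine), names(theirs)))
  out <- data.frame(
    package = pkgs,
    mine = unname(mine[pkgs]),
    theirs = unname(theirs[pkgs]),
    stringsAsFactors = FALSE
  )
  differs <- is.na(out$mine) | is.na(out$theirs) | out$mine != out$theirs
  out[differs, , drop = FALSE]
}

# Made-up versions, purely for illustration.
mine <- c(dplyr = "1.0.7", rlang = "0.4.11")
theirs <- c(dplyr = "1.0.6", rlang = "0.4.11", knitr = "1.33")
diff_versions(mine, theirs)
```

With two lockfiles on disk, the comparison becomes `diff_versions(lock_versions("renv.lock"), lock_versions("author.lock"))`, where both paths are placeholders.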

@MilesMcBain
Owner

MilesMcBain commented Jul 16, 2021

> renv::use() in a chunk works ok for this, but I have not looked at all the implications when it is run in an interactive session (because the temp lib dir is still active after rendering with rmarkdown::render()). When run in a background session or job (like with the Knit button), it works quite well. Overall, I find it a good idea for single-file dependency management.

Yes! So the way capsule::run() used to work was that it hot swapped the library paths for the current session, kind of like how I think the isolate option for renv::use works. This turned out to be a bit of a reproducibility issue, since you could get 'leakage' between libraries. For example, if you used renv::use() inside an Rmd and rendered it in the interactive session, then all of rmarkdown's deps from the original library paths would be the ones on the search path, and not any different versions specified inside the Rmd with use, since a package won't be attached again if it is already attached.

I hope that makes sense. I eventually rewrote capsule::run to always run in a separate session to avoid this.
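The leakage described above can be seen with base R alone. This is a minimal sketch, not what capsule::run() did internally: it swaps `.libPaths()` to an empty temporary library (a stand-in for a capsule library) and checks that a namespace loaded beforehand stays loaded from its original location.

```r
# Load a package, and record where it was loaded from.
loadNamespace("tools")
path_before <- getNamespaceInfo("tools", "path")

# Hot swap the library paths, as an in-session capsule would.
old_paths <- .libPaths()
new_lib <- file.path(tempdir(), "capsule-lib")  # hypothetical empty library
dir.create(new_lib, showWarnings = FALSE)
.libPaths(new_lib)

# The namespace is still loaded, and still from the original location:
# changing the library paths does not reload anything.
still_loaded <- "tools" %in% loadedNamespaces()
path_after <- getNamespaceInfo("tools", "path")
identical(path_before, path_after)

# Restore the original library paths.
.libPaths(old_paths)
```

In a fresh session nothing has been loaded from the original library yet, which is why running in a separate process avoids the problem.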

@MilesMcBain
Owner

So actually this leakage issue is something you need to be careful of with use. Any code higher up in the script than the use call might attach packages that can't be reattached by use. So while it's fine to say 'always put it at the top of your script' you need to make sure the receiver of the script isn't doing anything in their R profile?
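One way to detect this situation is to check what is already loaded before calling use. The helper below is a hypothetical sketch in base R, not something {renv} or {capsule} provides. It reports which of the packages you intend to pin are already in the session, for instance because an R profile loaded them.

```r
# Hypothetical guard: which of the packages we intend to pin are already
# loaded in this session (e.g. by an R profile or earlier code)?
already_loaded <- function(pkgs) {
  intersect(pkgs, loadedNamespaces())
}

# "base" is always loaded; the second name is a made-up package.
already_loaded(c("base", "notARealPkg123"))
```

A script could call this just before its use call and warn or stop if the result is non-empty, rather than silently running against the versions that were loaded first.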

2 participants