python computing supplement #24
Comments
We would make a For libraries... ¯\(ツ)/¯ I'd like to avoid extra complexity but would defer to the python community for those decisions. |
I would also like to work on the python supplement. @bmreiniger, may we collaborate on this? |
I'll also toss in my hat for a collaboration on sklearn/python code. Could be a fun project! |
I'd like to work on this too. I have a decent knowledge of python for ML (Kaggle notebooks GM). As already mentioned, it is difficult to imagine working without pandas, sklearn, and matplotlib. If plotnine is mentioned as a replacement for matplotlib, I should mention polars, which has a grammar closer to the tidyverse and is significantly better than pandas. |
There is another plotting option, lets-plot |
I too would like to work on the python code. @bmreiniger, let's collaborate on this? |
I'll recant my previous statement:
Use whatever libraries you see fit. We use a ton of R packages to make the book (that's the way R is); use anything that you think makes the best results. |
I suggest creating a starter repo using the structure and styling of the |
Also, I can export the data sets to a more suitable format to Python to ingest. What do you suggest? csv? |
I'll probably be more useful on content, but I have a little site deployment experience; when I get some time I'll draft something. If anybody else knows more and/or has more time, jump in. My first thoughts:
As for data format, csv is probably fine, at least until something comes up to suggest otherwise. On plotting, I'd lean toward starting out with matplotlib (and using the plotting functionality of pandas and sklearn), and if anyone can make much nicer plots much more easily with another package, then make a PR for us all to look at. Similarly, I'd start with pandas, but if @lcrmorin or others can make something look nicer (or much faster, even for the toy datasets I imagine we'll have here?) using polars, then let's see that and decide together. |
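To make the csv suggestion concrete, here is a minimal sketch of what ingesting an exported file with pandas might look like. The column names, values, and the `load_exported_csv` helper are hypothetical, invented for illustration; nothing here reflects the actual exported data sets. One thing it shows is that csv carries no type information, so a column that was an R factor arrives as plain strings and its levels have to be declared again on the python side.

```python
import io

import pandas as pd

# Hypothetical stand-in for a data set exported from R as csv.
# The column names and values are invented for illustration only.
csv_text = """class,predictor_a,predictor_b
low,1.5,10
high,2.25,12
medium,,11
"""


def load_exported_csv(source, categorical=None):
    """Read a csv file and re-declare selected columns as categoricals.

    `categorical` maps a column name to its list of levels, because the
    level information of an R factor is not stored in the csv itself.
    """
    df = pd.read_csv(source)
    for column, levels in (categorical or {}).items():
        df[column] = pd.Categorical(df[column], categories=levels)
    return df


df = load_exported_csv(
    io.StringIO(csv_text),
    categorical={"class": ["low", "medium", "high"]},
)

# Inspect the resulting column types; the empty field becomes a missing value.
print(df.dtypes)
print(df)
```

If re-declaring types like this becomes tedious for the real data sets, that might be the "something" that suggests a typed format instead of csv.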
A (very) rough demo for option 1: https://bmreiniger.github.io/aml4td-demo-computing-python/chapters/whole-game.html |
I like that! Sphinx-Gallery from option 3 looks nice as well, but this is an area I'm not well versed in, so I don't have a strong opinion. On the subject of plotting libraries, another option I'm fond of is using the Seaborn objects API: https://seaborn.pydata.org/tutorial/objects_interface.html This allows one to approximate a ggplot-like grammar of graphics using method chaining. As it says in the docs, it's still early in development but might be worth trying out. |
We've experimented with side-by-side R/python code and I've never seen it work all that well. I think that it should be Python only. Based on other things that I've done, many of the people consuming the main site and these computing pages are not going to be well versed in Python or R. We'll need to strike a balance between helpful content for beginners and more experienced readers (including "how to install" docs). That said, I think that @bmreiniger's options 1 and 2 are good (but I've never seen Sphinx-Gallery until now and don't know if that works with Quarto). |
The demo looks good! There are some nice Posit Python packages for tables and interactivity and many others unrelated to Posit (obviously).
I was asked to discuss a PR or maybe a pip about this pre-pandemic. ¯\(ツ)/¯ There will be a lot of inconsistencies where R or Python have different (or more extensive) capabilities. It doesn't have to be a perfect reproduction of what is on the main site. |
Best way to go is usually to stratify by |
I think Sphinx would be used instead of Quarto. I'd like to put the same sort of demo together for that, but I suspect it'll end up being a similar amount of setup/work, with a very slight benefit of being pure .py scripts, and the detriment of being styled very differently from the rest of the project (barring a lot of work in defining a sphinx style/template). I had some trouble getting renv set up, but now have a working demo of R+python in tabsets. So it seems approach (1) is probably best, and I'll try to clean it up, complete with a python env. (Maybe I'll still demo sphinx for the sake of having done it.) So, another early question: which environment manager? I'd suggest conda or Pipenv; I find conda more intuitive, and Pipenv more rigorous. |
I want to keep the repos on Quarto just so that they are in one format. :-/ You can use Jupyter notebooks or basic Python chunks; you won't need R for anything. |
I've got a start in my org, if folks want to collaborate there. Ideally at some point it'd get moved under the aml4td org (with a name change)? |
I'd be interested in helping with a python computing supplement.
Did you have a format in mind? It seems likely that after the setup section, most sections could be tightly coupled between the R and python versions, which suggests maybe having two independent repositories isn't ideal? I think Quarto supports panelsets (as "tabsets"); that strikes me as a nice way to display the two, but also would mean both codes should be updated when a change is made.
One other thing that would be nice to decide on early: which python plotting library to use? plotnine mimics ggplot, matplotlib is already used by sklearn+pandas, others are slicker...