Host the Python docs on ReadTheDocs #5

Open
JulienPalard opened this issue May 11, 2021 · 40 comments

Comments

@JulienPalard
Member

From time to time the discussion arises about moving docs.python.org to readthedocs; there's even an experimental python.readthedocs.io.

I think there are many pros and cons to using readthedocs.

I'd personally be in favor of it if we (the PSF) support them (the cpython Doc is a big one with a lot of traffic; it takes around 24h of CPU to build all versions × languages with all PDF A4, PDF letter, HTML, plaintext, epub).

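On the other hand, it's not an easy task: among other things, docs.python.org is not only about generating the docs but also about hosting history:

  • https://docs.python.org/release/ (this is all the tagged commits in the cpython repo)
  • https://docs.python.org/2.0/ (yes we keep old versions, and I'm not willing to see this disappear)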

@JulienPalard
Member Author

ping @pradyunsg because you're speaking a lot about readthedocs :)
ping @humitos because you're from readthedocs :)

@humitos
Contributor

humitos commented May 11, 2021

Disclaimer: I work at Read the Docs.

I think there's many pros and cons of using readthedocs.

I'm not going to start selling Read the Docs because that's not my topic 😄. Currently, I'm more interested in making sure that the migration is possible from a technical standpoint, and in mentioning some notes to consider.

I'd personally be in favor of it if we (the PSF) support them

What kind of support do you refer to here? Take into account that Read the Docs by default adds EthicalAds to the documentation it hosts. I suppose there may be some different opinions about this, and I think it's worth having a conversation about the different possibilities we can explore.

the cpython Doc is a big one with a lot of traffic, it take around 24h of CPU to build all versions × languages with all PDF A4, PDF letter, HTML, plaintext, epub

We are currently using Read the Docs for the Spanish translation of the CPython documentation. It takes ~500 seconds to build the HTML with our smallest builder (2 CPU, 2 GB RAM, -j auto). It takes ~500 seconds more to build the PDF.

Each commit merged into an active English branch should trigger a rebuild of all the translations to update them as well. So, one single merge will trigger 1 (version) x 8 (languages) = 8 builds * ~1000 seconds = ~2 hours (is my math correct? 🤔 )

Edit: I put 8 versions originally, but that's not correct. Only one version needs to be built: the version where the merge was done. If it was done in 3.10, only 3.10 + its translations need to be rebuilt.

Edit: the unit of the result was incorrect. I changed "2 minutes" to "2 hours", thanks to @hugovk for noting this.
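
The estimate above can be reproduced with a few lines of Python. This is only a sketch using the rough figures quoted in this comment (about 500 seconds each for HTML and PDF, 8 languages, builds running one after another); the function name is made up for illustration.

```python
def rebuild_seconds(versions: int, languages: int, seconds_per_build: int) -> int:
    """Total builder time for one merge, assuming builds run sequentially."""
    return versions * languages * seconds_per_build


# ~500 s for HTML plus ~500 s for PDF, per the Spanish translation build.
per_build = 500 + 500

total = rebuild_seconds(versions=1, languages=8, seconds_per_build=per_build)
hours, remainder = divmod(total, 3600)
print(f"{total} s is about {hours} h {remainder // 60} min")
```

If the builders run in parallel, the wall-clock time would be lower than this total, so the number is better read as resource usage than as latency.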

Take into account that translations may require some extra work to be able to clone the CPython repository to get the source files. In the Spanish translation we are using a git submodule for this and then changing the Sphinx.srcdir to point to it in the conf.py file.

Note that Read the Docs only supports one PDF output file. So, all the pages are together in one big PDF file, instead of being split into the tutorial and others as in the current setup. There is an issue to discuss supporting multiple PDF outputs at readthedocs/readthedocs.org#2045

among other things docs.python.org is not only about generating the docs but also hosting history:
https://docs.python.org/release/ (this is all the tagged commits in the cpython repo)

This page looks like a simple index of a directory containing all the versions. There is not a direct match of this to a feature on RTD that allows this listing. I'd suppose that you may need to create an RST file or include an HTML page that lists all the current releases.

Also, a URL like /release/ is not supported either. It needs to have a prefix of /lang/version/ on a multi-version project. However, a redirect could be created: /release/ -> /en/stable/release/.

https://docs.python.org/2.0/ (yes we keep old versions, and I'm not willing to see this disappear)

This shouldn't be a problem if there is a way to re-generate them (e.g. running sphinx-build from branch 2.0 works). Unfortunately, Read the Docs does not support uploading pre-built documentation. See readthedocs/readthedocs.org#1083

To summarize, I think there are some initial things raised here:

  • Ads on documentation
  • building time (resource usage)
  • multiple PDF output
  • plaintext format not supported
  • build old releases
  • page listing all versions
  • translations may need extra work

Hopefully, this helps to understand some of the technical requirements and adds some value to the conversation about the work effort required.

@hugovk
Member

hugovk commented May 11, 2021

Each commit merged into an active branch from English, should trigger a re-build for all the translations to update them as well. So, one single merge will trigger 1 (version) x (8 languages) = 8 builds * ~1000 seconds = ~2 minutes (is my math correct? 🤔 )

~8,000 seconds = ~133 minutes = ~2h13m

Edit: I put 8 versions originally, but that's not correct. Only one version is required to be built. The version where the merge was done. If it was done in 3.10, only 3.10 + its translations need to be re-built.

~1,000 seconds = ~17 minutes

@ericholscher

ericholscher commented May 11, 2021

Also, a URL like /release/ is not supported either. It needs to have a prefix of /lang/version/ on a multi-version project. However, a redirect could be created: /release/ -> /en/stable/release/.

We can mostly support custom URLs like this now. It's a beta feature, but if it's a blocker we can certainly manage it. That said, we don't currently support directory indexing, but we could create a version of the page that's branded properly in the docs theme.

What kind of support do you refer to here? Take into account that Read the Docs by default adds EthicalAds on the documentation it hosts. I suppose there may be some different opinions about this and I think it worth having a conversation about the different possibilities we can explore.

Just to be explicit, we're happy to host the Python docs without ads. I know there has been some discussion about having PSF sponsors included on the pages, and we've had separate conversations with the PSF team about us enabling that with our open source EthicalAds platform. We can definitely discuss having an additional business relationship with the PSF for hosting the docs, but I don't consider it a prerequisite. We get a lot of value out of Python, and are happy to support the projects we use heavily ad-free if needed (we already do this with Sphinx).

Unfortunately, Read the Docs does not support uploading pre-built documentation. See readthedocs/readthedocs.org#1083

We could upload pre-built docs if they don't ever change as a one-off. It's not supported on the platform, but we can manually do it if needed.

I'm not going to start selling Read the Docs because that's not my topic 😄 --Currently, I'm more interested in making sure that "the migration is possible from the technical aspect" and mention some notes to consider.

Agreed. In the past there were some technical issues that would have prevented this, but I think we've solved most of them. We can do a bit more work to fully replicate the docs.python.org setup with existing URLs on our side if you'd like to see a tech demo, but we'd hope to do that after there was an agreement on moving to RTD from the team.

My primary non-technical selling point to hosting docs on RTD is that any work that goes into building features will now be shared with the entire Python community via RTD, instead of work done on a custom docs hosting tool. Similarly, any work the community does to improve RTD will automatically flow back to Python itself. Hopefully these additional resources will make doc hosting better for the whole ecosystem.

Please let us know if you have any questions about the product features or feasibility of hosting, as this progresses.

@pradyunsg
Member

ping @pradyunsg because you're speaking a lot about readthedocs :)

Only because a lot of what the docs build scripts are doing is something provided by RTD as well. 😅

And RTD has the benefit of being a platform, and thus being able to provide things like PR previews.

@ericholscher

Just wanted to note that this might also be worth thinking about while we're doing #1. It would be awesome if we could get a new theme and not have to implement all the various Python-specific code to support it. We'd be interested in helping with this, if it was of interest.

Not sure what kind of 👍 we'd need to move forward on it, but I'd love to get more effort going into RTD integration instead of 1-off integrations of custom build logic. We've made good progress on a few of the above issues, and would be happy to do custom stuff for Python in the places where we don't have platform support.

@encukou changed the title from "ReadTheDocs" to "Host the Python docs on ReadTheDocs" on Mar 14, 2022

@hugovk
Member

hugovk commented Jan 19, 2024

I think it's a great idea to host the docs on Read the Docs:

  • transparency of builds for all, no need to bug someone to SSH in to a server
  • PR previews (we're using an RTD site just for this, it's extremely useful)
  • can have builds for each merge to the branch, no need to wait 24h+ for the cron
  • the RTD team are very responsive and helpful
  • RTD is open source and Python

We've been discussing this in docs-community monthly meetings and Discord, and there seems to be support for moving forward.


We have a large build matrix of languages, versions, and output formats (HTML, PDF, EPUB, etc). I would suggest an incremental approach, starting with just English, dev and HTML.

Once we're happy with it, we can remove that particular build from the server and add another language/version/output.

Perhaps doing English+HTML first for the main builds (currently /dev, /3 and maybe /3.11) will give the biggest initial benefit. Then other languages+HTML. It might be that we keep some builds on the server for much longer, like PDF/EPUB etc. I think this is fine, the reduced load of moving HTML builds off the server will improve the queue time for other builds.


More concretely, I think the first thing to do is deal with the HTML language/version switcher. Currently this is added on the server builds via https://github.com/python/docsbuild-scripts.

We'll need a similar switcher for RTD builds.

I don't think it necessarily needs to be identical to the legacy one, but should be similar because people will be switching to/from the old and new sites.

We would develop it on something like https://cpython-previews.readthedocs.io (which is used for PR previews) and when happy, switch https://docs.python.org to it, stop the old server HTML build, and move to the next increment.

Thoughts?

@JulienPalard
Member Author

Thoughts?

I don't know how RTD survives financially today (yes I know it has been very hard).

Building more often than we build now (for every push, with PDF US letter, PDF A4, epub, txt, downloadable HTML) will take a lot of resources, more than the 4 virtual CPUs that we already use 24 hours a day.

I also don't know how RTD pays for the bandwidth, but I bet docs.python.org consumes some bandwidth (@ewdurbin any numbers?). It's probably nothing compared to PyPI, but still, let's check that. On the PSF side the bandwidth is sponsored by Fastly, but on the RTD side I don't know...

I think that the PSF and RTD should talk about money here: the PSF has to ensure, at the very least, that we don't penalize RTD. But that's not enough; I think the PSF should back up RTD in case RTD loses a vital sponsor (CPU/bandwidth provider or something) to ensure the future. @ericholscher @ewdurbin.

My non-financial view on this: RTD has proven they are the Python documentation hosting platform. Using RTD will make cpython a good citizen of the ecosystem. In other words, There should be one-- and preferably only one --obvious way to do it., and it's clearly not our little snowflake.

@hugovk
Member

hugovk commented Jan 19, 2024

Building more often:

If we don't want to build every merge, we can write a custom command to skip builds. For example, we should only do so for Doc/ changes, and could come up with extra logic to limit builds.

Docs: https://docs.readthedocs.io/en/stable/build-customization.html#cancel-build-based-on-a-condition
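
As a rough illustration of that "only build for Doc/ changes" rule, here is a minimal Python sketch. It relies on the exit code 183 that the linked Read the Docs page describes for cancelling a build; the function names are hypothetical, and obtaining the list of changed paths (for example from `git diff --name-only`) is left out.

```python
# Exit code that the Read the Docs documentation describes for cancelling a build.
CANCEL_BUILD_EXIT_CODE = 183


def touches_docs(changed_paths, prefix="Doc/"):
    """Return True if any changed path lives under the docs directory."""
    return any(path.startswith(prefix) for path in changed_paths)


def exit_code_for(changed_paths):
    """Return 0 to let the build continue, or 183 to ask for cancellation."""
    return 0 if touches_docs(changed_paths) else CANCEL_BUILD_EXIT_CODE


print(exit_code_for(["Doc/library/os.rst", "Lib/os.py"]))
print(exit_code_for(["Lib/os.py"]))
```

A build job could pass the result to `sys.exit()` early on, so merges that don't touch Doc/ never reach the Sphinx step. Extra conditions to limit builds would go into `touches_docs`.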

Some numbers:

During the 30-day Plausible trial, we had 6.5 million total visits and 10.9 million pageviews to the 3.11 and 3 (=3.12) English sites.

@AA-Turner
Member

We could feasibly add a proxy/cache via Fastly in front of Read the Docs; we do this for PEPs. That might also solve keeping /2.0/ etc.

A

@ncoghlan

Echoing support for the proxy/cache concept, I don't think there was ever a philosophical objection to hosting on RTD, just technical and sustainability questions. Both the PSF and RTD are much better equipped these days to have the sustainability discussion directly between PSF and RTD staff than either were back when RTD was still relatively new, which leaves the technical side of things for the docs community to consider.

If the existing Fastly + Nginx docs endpoint is retained, with just the build and hosting of the version specific docs for actively maintained versions shifted to RTD, then it may be feasible to migrate new docs builds without needing to worry about too many one-off CPython-docs-specific hosting capabilities in RTD (instead leaving those in the existing PSF infrastructure).

I do think it would be important to be explicit about the intended benefits of the hosting change, though. As far as I am aware, the main CPython docs builds are still just periodic (daily?), which would make the commit triggered builds in RTD one of the biggest improvements on offer.

@methane
Member

methane commented Jan 23, 2024

Another benefit of using Read the Docs is the server side search.

@ericholscher

Just wanted to chime in here from the RTD perspective. RTD has a similar relationship with sponsored CDN hosting, where we aren't paying for bandwidth. So we don't have much worry about traffic or CDN support there. If it's a benefit to y'all to be able to put your own CDN in front, that's something we can support, but we already have our CDN configured to purge automatically on build, redirects, and other settings change, which won't flow through. That's likely not a huge deal, but might lead to delays in content changes, so something to consider.

I don't want to block this discussion on sustainability -- RTD is able to host the Python docs without any specific payment -- as noted above we're in a much better place these days, with our server costs and CDN sponsored and a full-time team working on the platform. We would still love some kind of support from the PSF, but it's not required. We'd just ask that the fact that RTD is hosting it is shown somewhere on the pages, perhaps in the footer.

Overall, we're excited to be able to host the official Python docs, and want to work with y'all to make it successful. I know @humitos has already chimed in here, but we're both available to answer any questions that y'all might have, as needed.

@ewdurbin
Member

Thanks for jumping in @ericholscher.

From PSF infra side, I'm a huge +1 on moving away from bespoke docs build infra as much as is possible. Given that RTD's CDN is sponsored as well I see no reason to involve our CDN in front of RTD's CDN (except perhaps as a mechanism for transition).

From the PSF general side, we always like to acknowledge our hosting providers, so assuming it is doable from the docs side of things, adding a "hosted by" is 100% fine. Additional considerations would be another conversation with our Director of Resource Development. I'm happy to start that conversation once we have established a technical plan and committed to RTD.

@ewdurbin
Member

ewdurbin commented Jan 23, 2024

The only "hmmmmmm" I can think of off the top of my head is that we currently have good access-logging of docs.python.org requests that have been used in the past for some analysis of "top pages" and "top content", particularly by translation teams. Does RTD provide any insight into traffic?

Edit: Yes. With a 30 day lookback. I believe this is generally sufficient.

@encukou
Member

encukou commented Jan 24, 2024

we currently have good access-logging of docs.python.org requests that have been used in the past for some analysis of "top pages" and "top content" particularly by translations teams

Hmm, do we? Hugo & co. recently ran a Plausible trial to get some metrics, after failing to get them from server logs.
If b.p.o metrics are better but getting them to the relevant people involves asking busy server admins, I'd say basic monthly stats would be an improvement :)

@ewdurbin
Member

ewdurbin commented Jan 24, 2024

we currently have good access-logging of docs.python.org requests that have been used in the past for some analysis of "top pages" and "top content" particularly by translations teams

Hmm, do we? Hugo & co. recently ran a Plausible trial to get some metrics, after failing to get them from server logs. If b.p.o metrics are better but getting them to the relevant people involves asking busy server admins, I'd say basic monthly stats would be an improvement :)

Yes, currently Fastly logs are streamed to a server in real time for "right now" analysis and rotated off after 7 days. All logs are archived in segments to S3 for three years.

Getting some basics from the real-time is pretty straightforward, we have only used the archives a couple times (more often for other projects like python.org).

If b.p.o metrics are better but getting them to the relevant people involves asking busy server admins, I'd say basic monthly stats would be an improvement :)

I agree! I will investigate automating pulling and storing the basic stats long-term if/when we switch over to RTD.

@hugovk
Member

hugovk commented Jan 24, 2024

From PSF general side, we always like to acknowledge our hosting providers so assuming it is doable from the docs side of things adding a hosted by is 100% fine.

This should be doable from the docs side. The footer is defined in python-docs-theme, we can add a variable that adds the acknowledgement for RTD builds.

@ewdurbin
Member

This should be doable from the docs side. The footer is defined in python-docs-theme, we can add a variable that adds the acknowledgement for RTD builds.

If I recall from a past discussion of adding sponsor logos to the docs, the rub there is that we only would want to include it in the online builds and not the offline builds. Is that possible?

@hugovk
Member

hugovk commented Jan 24, 2024

Good point, we can use the READTHEDOCS environment variable or define our own in RTD builds.
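
A minimal sketch of how that check might look in Python, assuming the documented behaviour that Read the Docs sets `READTHEDOCS` to the string `True` during its builds. The `hosted_by` key is a hypothetical variable name chosen for this example, not something the theme defines.

```python
import os


def footer_context(environ=os.environ):
    """Return extra template variables for the footer.

    `hosted_by` is a hypothetical variable name used only for this sketch.
    """
    on_rtd = environ.get("READTHEDOCS") == "True"
    return {"hosted_by": "Read the Docs"} if on_rtd else {}


# Offline builds (no READTHEDOCS variable) get no acknowledgement.
print(footer_context({}))
print(footer_context({"READTHEDOCS": "True"}))
```

In a Sphinx project the returned dict could be merged into `html_context` in conf.py, so the footer template only renders the link when the key is present.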

@hugovk
Member

hugovk commented Jan 26, 2024

Please see PR to add a "Hosted by" link to the theme's footer if a variable is defined: python/python-docs-theme#165.

@ericholscher

we currently have good access-logging of docs.python.org requests that have been used in the past for some analysis of "top pages" and "top content" particularly by translations teams

Hmm, do we? Hugo & co. recently ran a Plausible trial to get some metrics, after failing to get them from server logs. If b.p.o metrics are better but getting them to the relevant people involves asking busy server admins, I'd say basic monthly stats would be an improvement :)

Yes, currently Fastly logs are streamed to a server in real time for "right now" analysis and rotated off after 7 days. All logs are archived in segments to S3 for three years.

Getting some basics from the real-time is pretty straightforward, we have only used the archives a couple times (more often for other projects like python.org).

Of note, RTD does have some basic analytics functionality built in. It's not got a ton of features, but it might be useful for the basics like "what are the top pages people are reading?": https://docs.readthedocs.io/en/stable/analytics.html

humitos added a commit to readthedocs/cpython that referenced this issue Mar 18, 2024
Integrate the new Read the Docs Addons JavaScript into the Python Docs Sphinx
theme to render versions and languages selector nicely.

References:

* Discord thread: https://discord.com/channels/935215565872693329/1159601953265942589
* Implementation of Addons JavaScript `CustomEvent`: readthedocs/addons#64
* Conversation about using Read the Docs: python/docs-community#5
@jayaddison

There's no sense of urgency here, but to avoid missing the boat: could I request a stay (also known as: a delay, a pause) on any decision here for another year (+1y) or so?

After getting involved in Sphinx development a year or so ago (-1yr), I think I'm nearing effective-contributor status for the client-side HTML search functionality it contains - and which could be considered a competitor to what I perceive as the most clearly-stated and agreed-with benefit to RTD that I find in this thread so far (server-side search).

Given another six months I believe it'll be possible to improve the performance and relevance of client-side search in Sphinx noticeably, without sacrificing user resource consumption (bandwidth, CPU, battery-life, user-experience), privacy, or the ability for enthusiasts to inspect, learn from and/or tweak the code locally or themselves. Another six months beyond that would allow time for evaluation, feedback, adjustments and decision-making.

@hugovk
Member

hugovk commented Jul 18, 2024

Hello, great to hear Sphinx search is being improved and thank you for your work on Sphinx! But I'm not too keen on delaying by a year because the docs build is currently very slow: python/docsbuild-scripts#169 from October talks of ~27 hours for a complete rebuild.

I recently set up a repo to monitor time between each version's HTML deploy, and it's now stretching up to 50 hours:

Time between deploys; last one is time since last deploy:
Version 3.12: ['38.8 hours', '39.6 hours', '58.5 hours', '39.7 hours', '39.2
hours', '41.3 hours', '45.0 hours', '42.8 hours', '41.0 hours', '38.1 hours',
'44.0 hours', '44.1 hours', '40.9 hours', '47.0 hours', '46.4 hours', '45.2
hours', '43.0 hours', '46.5 hours', '48.0 hours', '49.1 hours', '42.2 hours',
'4.6 hours ago']
Version 3.13: ['39.0 hours', '40.4 hours', '57.5 hours', '40.4 hours', '40.1
hours', '42.6 hours', '44.2 hours', '42.2 hours', '39.7 hours', '40.3 hours',
'43.7 hours', '44.4 hours', '40.7 hours', '47.8 hours', '46.1 hours', '44.4
hours', '43.6 hours', '47.4 hours', '49.1 hours', '47.5 hours', '31.1 hours
ago']
Version 3.14: ['36.3 hours', '39.2 hours', '37.7 hours', '24.4 hours', '37.2
hours', '39.0 hours', '41.1 hours', '44.6 hours', '42.6 hours', '42.1 hours',
'37.1 hours', '43.1 hours', '43.9 hours', '43.5 hours', '43.1 hours', '47.6
hours', '45.5 hours', '42.9 hours', '45.2 hours', '47.5 hours', '49.7 hours',
'46.0 hours', '16.4 hours ago']

I disagree that server-side search is the most-wanted benefit in this thread (it was mentioned once), for me it's definitely build and deploy time.
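
For reference, intervals like the ones listed above can be derived from deploy timestamps along these lines. This is a sketch, not the monitoring repo's actual code, and the timestamps are made up purely to exercise the function.

```python
from datetime import datetime


def hours_between(timestamps):
    """Return the gaps, in hours, between consecutive ISO 8601 timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [
        round((later - earlier).total_seconds() / 3600, 1)
        for earlier, later in zip(times, times[1:])
    ]


# Made-up deploy times, not real data from docs.python.org.
deploys = ["2024-07-01T00:00:00", "2024-07-02T14:48:00", "2024-07-04T06:24:00"]
print(hours_between(deploys))
```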

@jayaddison

Thank you for the context @hugovk - I'll follow along with the PDF build-time concern and will offer any assistance if I can (although I'll also be cautious not to get in the way or distract from existing progress on it; neither PDF nor PDF in Sphinx are areas of expertise for me).

@humitos
Contributor

humitos commented Jul 18, 2024

Also note that hosting the documentation on Read the Docs doesn't mean that you have to use the Read the Docs search; you can stick with Sphinx's default search if it works better for you. You can even start with the Read the Docs search first and then swap to Sphinx's default search once it's improved.

Regarding the migration, do we have a set of "next steps" already defined? Let me know if there is anything else I can do to help with this process.

@hugovk
Member

hugovk commented Jul 19, 2024

First a summary of where we're up to:

  • We have the 3.11-3.13 and main branches building on RTD at https://cpython-previews.readthedocs.io and they automatically show in the version selector, via an RTD API.

  • We've linked a Spanish translation project on RTD to the main one, and it similarly shows in the language selector.

Next steps:

  • The version selector at https://docs.python.org goes all the way back to 2.6. I doubt it's worth importing all those to RTD (some might not even build), and they're no longer updated (2.6-3.7) or rarely updated (3.8-3.11).

  • And similarly for languages, we'll still want to link to the existing ones before importing them to RTD, so we can do an incremental rollout.

  • Do we need something to define which language/versions are on RTD, and which are not?

@jayaddison

I'd like to apologize for my recent comment where I misrepresented the gist of this thread so far, and also to withdraw my request for a pause.

I did not take the time that I should have done to read and understand the existing commentary, and I am sorry for that. I'd like to be a constructive and collaborative contributor, and will unsubscribe and distance myself from this thread until I can learn from that mistake and figure out how to proceed.

@hugovk
Member

hugovk commented Jul 20, 2024

@jayaddison No problem whatsoever! You're very welcome to contribute here, and I greatly appreciate your work on Sphinx, and across the Python ecosystem. 👍

@humitos
Contributor

humitos commented Jul 22, 2024

And similarly for languages, we'll still want to link to the existing ones before importing them to RTD, so we can do an incremental rollout.

I can help you with this by opening a PR that adds hard-coded URLs in https://github.com/python/cpython/blob/main/Doc/tools/static/rtd_switcher.js#L27. Sounds good?

Do we need something to define which language/versions are on RTD, and which are not?

I'd start by hosting one version/language on Read the Docs first and keep all the other ones in the current hosting. This will minimize any initial risk and will help maintainers to get familiar with the platform (general usage, build times, build output logs, workflow, etc) before performing the full migration.

I'm sure during this period there will be different opinions (good and bad ones 😅 ), limitations found, workflow changes and other things that we will need to keep discussing, adapting to and documenting. However, once the initial setup is done, all the following versions and translations will be a lot easier to add.

In case we follow this idea of having one version/language on Read the Docs, should we set up cpython.readthedocs.io (or python.readthedocs.io) and create a proxy on your side from docs.python.org/{language}/{version}/ to point to the cpython.readthedocs.io/{language}/{version}/ URL? Does this make sense?

Example: let's say we define Spanish 3.12 to be hosted on Read the Docs. Then https://docs.python.org/es/3.12/ should proxy to https://cpython.readthedocs.io/es/3.12/

Then we can keep adding more versions/languages using the same technique. Finally, when everything is migrated, we can just add docs.python.org as a canonical domain on the Read the Docs project and there won't be any need to keep the proxy up anymore.
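
To make the routing idea concrete, here is a small Python sketch of the decision such a proxy would make. It is only an illustration: the `MIGRATED` allow-list and the function name are hypothetical, `cpython.readthedocs.io` is the name floated above rather than a project that is known to exist, and `LEGACY_ORIGIN` merely stands in for the current docs server.

```python
# Hypothetical allow-list: (language, version) pairs already hosted on Read the Docs.
MIGRATED = {("es", "3.12")}

RTD_ORIGIN = "https://cpython.readthedocs.io"  # name proposed in this thread
LEGACY_ORIGIN = "https://docs.python.org"      # stand-in for the existing server


def upstream_for(path: str) -> str:
    """Pick the upstream URL for a request path such as "/es/3.12/library/os.html"."""
    parts = path.lstrip("/").split("/", 2)
    if len(parts) >= 2 and (parts[0], parts[1]) in MIGRATED:
        return RTD_ORIGIN + path
    return LEGACY_ORIGIN + path


print(upstream_for("/es/3.12/"))
print(upstream_for("/fr/3.12/"))
```

Adding another version/language would then mean adding one more pair to the allow-list, and the allow-list disappears once docs.python.org becomes the canonical domain.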

@hugovk
Member

hugovk commented Jul 23, 2024

@humitos Sounds good, thanks! I definitely agree we want a gradual rollout, starting with one version/language.

And similarly for languages, we'll still want to link to the existing ones before importing them to RTD, so we can do an incremental rollout.

I can help you with this by opening a PR that adds hard-coded URLs in python/cpython@main/Doc/tools/static/rtd_switcher.js#L27. Sounds good?

Yes, or would it be better to have the URLs in a file hosted outside the CPython repo? That way we don't need to worry about backporting the list to each version branch, which we try and avoid for security-only branches.

https://github.com/python/devguide/blob/main/include/release-cycle.json is a good candidate for this (it's used to generate the chart and tables at https://devguide.python.org/versions/).
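
As a sketch of what reading that file could look like, the Python below filters an inline sample for versions that are not end-of-life. The sample is illustrative only: the `branch` and `status` keys and the status values reflect my understanding of the file's shape and should be checked against the real release-cycle.json; the function name is made up.

```python
import json

# Assumed excerpt: illustrative data, not copied from the real file.
SAMPLE = """
{
  "3.14": {"branch": "main", "status": "feature"},
  "3.13": {"branch": "3.13", "status": "prerelease"},
  "3.12": {"branch": "3.12", "status": "bugfix"},
  "3.8":  {"branch": "3.8",  "status": "security"},
  "2.7":  {"branch": "2.7",  "status": "end-of-life"}
}
"""


def switcher_versions(raw: str) -> list[str]:
    """Return versions that are not end-of-life, newest first."""
    data = json.loads(raw)
    live = [v for v, info in data.items() if info["status"] != "end-of-life"]
    # Sort numerically so that "3.8" does not land above "3.14".
    return sorted(live, key=lambda v: tuple(int(p) for p in v.split(".")), reverse=True)


print(switcher_versions(SAMPLE))
```

Keeping the list in one externally hosted file means every version branch can read the same data, which avoids the backporting concern mentioned above.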

In case we follow this idea of having one version/language on Read the Docs, should we setup cpython.readthedocs.io (or python.readthedocs.io) and create a proxy on your side from docs.python.org/{language}/{version}/ to point the cpython.readthedocs.io/{language}/{version}/ URL? Does this make sense?

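Hmm, these names are already taken but abandoned:

  • readthedocs.org/projects/python - by some guy named Eric :)
  • readthedocs.org/projects/cpython - by a third party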

Do you think we should rename the existing cpython-previews (used for PR previews) to one of those, or set one of those up as a new account?

But yes to a proxy, when we have the first one ready to go. So let's get a demo working with a single version/language still on RTD first, test that out, then proxy it.

@humitos
Contributor

humitos commented Jul 23, 2024

Yes, or would it be better to have the URLs in a file hosted outside the CPython repo?

Cool. I will try to find some time this week to open a PR, and we can discuss the best approach there so we don't derail this issue with technical details.

Hmm, these names are already taken but abandoned:
readthedocs.org/projects/python - by some guy named Eric :)

Heh, yeah, this was created as a mirror and as a way to reserve the name as well. I'm sure we can free this slug and use it for Python itself 😄 . I will ask Eric just in case, tho.

readthedocs.org/projects/cpython - by a third party

Indeed, this project is abandoned and we can follow https://docs.readthedocs.io/en/stable/abandoned-projects.html to recover the slug.

In case both are available, which one do you prefer: cpython or python?

Do you think we should rename the existing cpython-previews (used for PR previews) to one of those, or set one of those up as a new account?

We can, yes. It will require re-building all the versions and languages after changing the slug; but I think that's not an issue.

But yes to a proxy, when we have the first one ready to go. So let's get a demo working with a single version/language still on RTD first, test that out, then proxy it.

Excellent 💯

humitos added a commit to readthedocs/docsbuild-scripts that referenced this issue Jul 23, 2024
_Note this is just a proof of concept to start the conversation._

While working on the idea to generate the version/language selectors and combine
them with the data fetched from Read the Docs API, I realized that if we are
migrating only _one_ version to Read the Docs and using a proxy to serve it at
the official `docs.python.org` domain, there is no need to use a different
version/language selector at all --since all the URLs will be the same and all the
JavaScript logic will be the same. The proxy will do the magic to redirect to
Read the Docs _only_ the versions configured in the proxy [^1].

However, since when building on Read the Docs the variables `VERSIONS` and
`LANGUAGES` are not passed, we need to populate them dynamically with JavaScript
when the page is served.

**ToDo**:

- Populate `all_languages` in the same way.
  Do we have a JSON file from where we can populate the `LANGUAGES` variable?
- Move this `switchers.js` file to
  https://github.com/python/cpython/tree/main/Doc/tools/static

----

I'm opening a PR here to show what I'm thinking and discuss if this is the
approach we want to follow. BTW, the code is not tested. I just wrote it as an
example to show what I'm thinking is the direction.

Related:
- python/python-docs-theme#193
- python/docs-community#5

[^1]: Once all the versions/languages are migrated to Read the Docs, we won't
require the `release-cycle.json` nor other file to populate the `LANGUAGES`
variable because this data will come from Read the Docs Addons API.
@humitos
Contributor

humitos commented Jul 23, 2024

After working on python/docsbuild-scripts#179 and finding myself blocked there, I took a deeper look at the PR that moves the selectors into the theme (python/python-docs-theme#193) and generates them at build time.

Since we talked about first serving one version on Read the Docs and configuring a proxy on docs.python.org to serve that version directly from Read the Docs, there is no need (yet) to use the Read the Docs Addons API to generate the selectors, as we currently do in https://cpython-previews.readthedocs.io/en/main/, because we won't have all the available versions hosted on Read the Docs and the API won't return all of them.

Something like the PR that generates the selectors at build time using external files (.json for the versions and .toml for the languages) would be enough, at this stage, to build with the correct selectors both on Read the Docs and with the current docsbuild-scripts.

Here is my proposal to continue with the migration of the documentation to Read the Docs in gradual phases:

Migration proposal to host docs on Read the Docs

Phase I: initial version migrated

  1. Set up the cpython or python project on Read the Docs.
  2. Merge the PR that generates the switchers at build time (Add language and version switchers python-docs-theme#193).
  3. Build one version on Read the Docs and check that everything is fine (e.g. 3.10).
  4. Configure the proxy on docs.python.org to serve that version from Read the Docs. That is, https://docs.python.org/3.10/ should proxy to https://cpython.readthedocs.io/en/3.10/, for example.

With that, we should have everything in place for the initial tests.

Phase II: add another version

Once we are happy with the previous tests, we can add more versions/languages to Read the Docs:

  1. Activate a new version on Read the Docs to build it (e.g. 3.11).
  2. Configure the proxy to point the new version to be served from Read the Docs.

Phase III: add another language

If everything is going well with the versions added, we can follow with translations/languages:

  1. Set up a cpython-{language} project on Read the Docs.
  2. Activate and build all the versions we want for this translation.
  3. Configure the proxy to point the new language to be served from Read the Docs.

Phase IV: final

At some point we will have added all the versions and languages to Read the Docs and there won't be more builds happening outside Read the Docs.

  1. Remove the code to generate the selectors at build time and use the Read the Docs Addons API directly (see https://github.com/python/cpython/blob/main/Doc/tools/static/rtd_switcher.js)
  2. Set up docs.python.org as the canonical domain on Read the Docs
  3. Delete configuration for the proxy

Let me know if this proposal makes sense or you think we should adapt it. I tried to make one step at a time to be able to revert the changes if something goes in an unexpected way. I'm happy to receive feedback and work together with any modification you may have to the proposal.

@hugovk
Member

hugovk commented Jul 24, 2024

Thanks, this sounds like a good plan. 👍

(Little nitpick, although I know the example versions are just to illustrate :) 3.10 and 3.11 are in the security-only phase, so we try not to touch them. 3.12 aka /3/ is the most visited, so best not to start with that; probably dev version 3.14 first, then pre-release 3.13.)

@ewdurbin
Member

one consideration is the long tail legacy documentation hosted on docs.python.org. there’s quite a bit.

it certainly seems preferable to move hosting to RTD for the whole domain due to cache invalidation alone.

is there any way for us to host the older versions on RTD or will we need to keep fronting with Fastly to split traffic between branches built on RTD and the rest?

many of the files on the backends are probably safe to consider static. if we moved them to an object store like AWS S3, is it possible to have RTD handle routing to them?

@humitos
Contributor

humitos commented Jul 25, 2024

is there any way for us to host the older versions on RTD or will we need to keep fronting with Fastly to split traffic between branches built on RTD and the rest?

There are a few possibilities here that we can explore if needed.

I started creating a proof of concept with the idea I consider the best approach. It consists of downloading the HTMLZip for the old "static" version we want to serve from Read the Docs and manually uploading it, skipping the normal build process.

I followed these steps:

  1. Download the HTML .tar.bz2 [1] of version 3.9 for the Spanish translation [2] from https://docs.python.org/es/3.9/download.html
  2. Upload its content to Read the Docs' S3 bucket next to the other versions of the project
  3. Manually create a new 3.9 version for the project in the database

The result can be seen at https://cpython-previews.readthedocs.io/es/3.9/
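The upload step can be pictured roughly as follows: unpack the archive and map each file to the location where the version would be served. This is only a sketch. The key layout `<language>/<version>/<path>` and the function name `storage_keys` are assumptions that mirror the public URL, not Read the Docs' actual bucket layout, and the archive is assumed to wrap its files in a single top-level directory.

```python
import io
import tarfile


def storage_keys(archive: bytes, language: str, version: str) -> dict[str, bytes]:
    """Map each file in a .tar.bz2 docs archive to a storage key.

    The key layout "<language>/<version>/<path>" is an assumption that
    mirrors the public URL; the real bucket layout may differ.
    """
    keys: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the archive's top-level directory; entries that are
            # not inside one are skipped.
            _, _, relative = member.name.partition("/")
            if not relative:
                continue
            extracted = tar.extractfile(member)
            if extracted is not None:
                keys[f"{language}/{version}/{relative}"] = extracted.read()
    return keys
```

The returned mapping could then be handed to whatever client writes to the object store; that part is deliberately left out because it depends on infrastructure not described in this thread.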


Because it follows the standard Read the Docs pattern and only skips the build process, everything keeps working as if the version had been built on Read the Docs:

  • the flyout (and version selector) shows 3.9 as an available version
  • the 3.9 version is listed in the dashboard's list of versions: https://app.readthedocs.org/projects/cpython-previews-es/
  • it can be hidden from the dashboard in case you don't want to expose it to your users
  • it can be eventually de-activated from the dashboard to delete it permanently

The only differences are:

  • there are no build logs for this version
  • this version cannot be re-built [3]

Let me know if all of this makes sense to you and if you have any questions.

Footnotes

  1. I'm not sure if this file is exactly the same version that is hosted at https://docs.python.org/es/3.9/ --if it's not, you can provide the correct one to upload

  2. I chose this version as an example, but it could be any other

  3. In case rebuilding is strictly needed at some point in the future we can either follow this same approach to re-upload it, or reconfigure this version to be built on Read the Docs

@humitos
Contributor

humitos commented Jul 25, 2024

probably dev version 3.14 first, then pre-release 3.13

Makes sense to me 👍🏼

@AA-Turner
Member

I wanted to quickly write to apologise as I haven't had the time needed to give this a considered response yet -- I hope to do so on Monday.

A

@humitos
Contributor

humitos commented Aug 16, 2024

@AA-Turner friendly ping 😊. I'd really like to know your thoughts about this migration plan

@m-aciek

m-aciek commented Sep 8, 2024

Setup cpython-{language} project on Read the Docs.

Nit-picky comment on the migration plan: could the name for language repos be python-{language}? I believe this documentation documents more than just one implementation (contains ABI docs as well), so using python instead of cpython for identification shouldn't be an abuse.
