GH-121970: Extract pydoc_topics into a new extension #129116

Merged: 8 commits merged into python:main on Jan 21, 2025

@AA-Turner AA-Turner commented Jan 21, 2025

This also simplifies the pydoc-topics builder. Grouping the topic labels by docname improves the speed of topics generation from ~68s to ~57s (19% faster) from a cold state, and from ~13s to ~3.4s (3.8x faster) when re-using the pickled documents.
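
A minimal sketch of the grouping idea, assuming a label table that maps each topic label to a `(docname, anchor)` pair. The table contents and the `group_by_docname` helper are illustrative names, not the builder's actual code. Grouping means each document only needs to be loaded once, however many topics it contains.

```python
from collections import defaultdict

# Hypothetical label table: topic label -> (docname, anchor id).
# The entries are illustrative, not read from a real Sphinx environment.
LABELS = {
    "assert": ("reference/simple_stmts", "assert"),
    "break": ("reference/simple_stmts", "break"),
    "debugger": ("library/pdb", "debugger"),
    "formatstrings": ("library/string", "formatstrings"),
}


def group_by_docname(labels):
    """Group (label, anchor) pairs under the document that defines them."""
    grouped = defaultdict(list)
    for label, (docname, anchor) in labels.items():
        grouped[docname].append((label, anchor))
    return dict(grouped)


grouped = group_by_docname(LABELS)
for docname, topics in grouped.items():
    # An expensive per-document step (such as unpickling a doctree) would
    # run here once, and every topic in `topics` would reuse the result.
    print(docname, [label for label, _ in topics])
```

With four labels spread over three documents, the per-document step runs three times instead of four; the saving grows with the number of topics that share a document.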

The representation of topics.py also changes from the default pprint.pformat output of:

topics = {'assert': 'The "assert" statement\n'
           '**********************\n'
           '\n'
           'Assert statements are a convenient way to insert debugging '
           'assertions\n'
           'into a program:\n'

to a simpler representation using triple single quotes (save for when ''' appears in the body):

topics = {
    'assert': r'''The "assert" statement
**********************

Assert statements are a convenient way to insert debugging assertions
into a program:
'''

This representation is both nicer to read and is 63% of the file size of the current topics.py (518KB vs 830KB). Line count also decreases from 17,486 to 12,782.
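
One way such a representation could be produced is sketched below. The helper names `topic_repr` and `render_topics` are hypothetical, and falling back to `repr()` for awkward strings is an assumption of this sketch rather than a description of what the extension does.

```python
def topic_repr(text: str) -> str:
    """Return a source literal for text, preferring a raw triple-quoted form."""
    # A raw ''' literal cannot safely hold text that contains ''', ends with
    # a quote or a backslash, or contains carriage returns (which newline
    # normalisation in source files would alter). Fall back to repr() then.
    needs_fallback = (
        "'''" in text
        or "\r" in text
        or text.endswith(("'", "\\"))
    )
    if needs_fallback:
        return repr(text)
    return f"r'''{text}'''"


def render_topics(topics: dict[str, str]) -> str:
    """Render a topics dict as the source of a Python module."""
    lines = ["topics = {"]
    for name, body in topics.items():
        lines.append(f"    {name!r}: {topic_repr(body)},")
    lines.append("}")
    return "\n".join(lines) + "\n"


sample = {
    "assert": 'The "assert" statement\n**********************\n',
    "tricky": "contains ''' in the body",
}
source = render_topics(sample)

# Round-trip check: executing the generated source must give back the input.
namespace = {}
exec(source, namespace)
assert namespace["topics"] == sample
```

The round-trip check at the end is the important part: whatever quoting rule is used, the generated module has to evaluate to exactly the original strings.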

Tested by running:

>>> import runpy
>>> topics_old = runpy.run_path("Doc/build/topics_old.py")['topics']
>>> topics_new = runpy.run_path("Doc/build/topics_new.py")['topics']
>>> assert list(topics_old) == list(topics_new) # check order
>>> [k for k in topics_old if topics_old[k] != topics_new[k]]
['debugger', 'formatstrings']

The 'formatstrings' change is trailing whitespace on the >>> for num in range(5,12): line:

>>> fs_old_stripped = '\n'.join(map(str.rstrip, topics_old['formatstrings'].splitlines()))
>>> fs_new_stripped = '\n'.join(map(str.rstrip, topics_new['formatstrings'].splitlines()))
>>> assert fs_old_stripped == fs_new_stripped

The 'debugger' change is "Ctrl-C" to "Ctrl"-"C", but I'm not sure what caused this:

>>> import difflib
>>> print('\n'.join(difflib.unified_diff(topics_old['debugger'].splitlines(), topics_new['debugger'].splitlines())))
--- 
+++ 
@@ -186,9 +186,9 @@
    originate in a module that matches one of these patterns. [1]
 
    By default, Pdb sets a handler for the SIGINT signal (which is sent
-   when the user presses "Ctrl-C" on the console) when you give a
+   when the user presses "Ctrl"-"C" on the console) when you give a
    "continue" command. This allows you to break into the debugger
-   again by pressing "Ctrl-C".  If you want Pdb not to touch the
+   again by pressing "Ctrl"-"C".  If you want Pdb not to touch the
    SIGINT handler, set *nosigint* to true.
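
The manual checks above could be folded into one helper, roughly as follows. This is a sketch: `classify_differences` and `show_diff` are hypothetical names, both dicts are assumed to have the same keys, and the sample data is made up to mirror the two kinds of change seen here.

```python
import difflib


def _strip_trailing(text):
    """Remove trailing whitespace from every line of text."""
    return "\n".join(line.rstrip() for line in text.splitlines())


def classify_differences(old, new):
    """Split changed keys into whitespace-only and substantive changes."""
    whitespace_only, substantive = [], []
    for key in old:
        if old[key] == new[key]:
            continue
        if _strip_trailing(old[key]) == _strip_trailing(new[key]):
            whitespace_only.append(key)
        else:
            substantive.append(key)
    return whitespace_only, substantive


def show_diff(old_text, new_text):
    """Return a unified diff of two topic bodies as a single string."""
    return "\n".join(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""
        )
    )


# Made-up sample data: one whitespace-only change, one substantive change.
old = {"a": "x \ny", "b": 'press "Ctrl-C"', "c": "same"}
new = {"a": "x\ny", "b": 'press "Ctrl"-"C"', "c": "same"}

whitespace_only, substantive = classify_differences(old, new)
for key in substantive:
    print(show_diff(old[key], new[key]))
```

In practice `old` and `new` would be the `topics` dicts loaded with `runpy.run_path`, as in the session above.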

cc @hugovk as 3.14 release manager as this does change the format of Lib/pydoc_data/topics.py. I'm happy when doing backports to preserve the current format (pformat) if release managers would prefer.

A


📚 Documentation preview 📚: https://cpython-previews--129116.org.readthedocs.build/

@AA-Turner AA-Turner requested a review from hugovk as a code owner January 21, 2025 03:52
@AA-Turner AA-Turner added docs Documentation in the Doc dir skip news needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Jan 21, 2025

hugovk commented Jan 21, 2025

On macOS, make -C Doc pydoc-topics goes from 14s (cold) and 7s (warm) to 9s and 2s.

The file also reduced from 830 KB to 519 KB.

However, I don't have any ['debugger', 'formatstrings'] differences:

Python 3.13.1 (v3.13.1:06714517797, Dec  3 2024, 14:00:22) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import runpy
>>> topics_old = runpy.run_path("topics_old.py")['topics']
>>> topics_new = runpy.run_path("topics_new.py")['topics']
>>> assert list(topics_old) == list(topics_new) # check order
>>> [k for k in topics_old if topics_old[k] != topics_new[k]]
[]
>>>

topics_old_and_new.zip

@AA-Turner
Member Author

Wonderful! I think I may have been testing with Sphinx 8.1 vs 8.2 (unreleased), as I made a change to how the :kbd: role is implemented, which would explain the Ctrl-C change.

@hugovk hugovk left a comment

> cc @hugovk as 3.14 release manager as this does change the format of Lib/pydoc_data/topics.py. I'm happy when doing backports to preserve the current format (pformat) if release managers would prefer.

I'm okay with it, but then there are no backports to 3.14 🙃

@AA-Turner
Member Author

Ok, let's merge this and then ask @Yhg1s what he thinks about backporting to 3.13 and 3.12. I'm assuming the backports will fail due to the change to Lib/pydoc_data/topics.py.

A

@AA-Turner AA-Turner merged commit 01bcf13 into python:main Jan 21, 2025
53 checks passed
@miss-islington-app

Thanks @AA-Turner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @AA-Turner, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 01bcf13a1c5bfca5124cf2e0679c9d1b25b04708 3.13

@miss-islington-app

Sorry, @AA-Turner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 01bcf13a1c5bfca5124cf2e0679c9d1b25b04708 3.12
