Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove u string prefix from docs #1174

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

verhovsky
Copy link

@verhovsky verhovsky commented Jan 16, 2025

I did a search and replace for u" and (?<!')\bu' and checked each case. I also updated some strings to include the character instead of \u and \x literals since that's what the Python 3 REPL does and I also noticed some literals in docstrings were not escaped correctly (you need to escape the backslash unless it's a r""" docstring, otherwise the docstring will contain the actual character instead of the literal)

Copy link
Member

@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Generally I'm on board with this, but there are some inconsistencies and weirdness here.

u'apples\u3001oranges\u548cpears'
'apples、oranges和pears'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do some of the changes have escaped sequences replaced with characters and others don't? 🤔

I think I'd rather not change any escape sequences to the unescaped versions, just so it's easy to see e.g. the difference between - – — 😄

Copy link
Author

@verhovsky verhovsky Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said,

I also updated some strings to include the character instead of \u and \x literals since that's what the Python 3 REPL does

I don't think I missed any, some sequences are escaped because they're unprintable characters that are escaped by the Python REPL, like \u2009. I think for console examples the example should print exactly what the REPL outputs, we could argue about what to put in the code that causes the REPL to print that string though.

Like we can have

>>> 'apples\u3001oranges\u548cpears'
'apples、oranges和pears'

but we definitely shouldn't have

>>> 'apples\u3001oranges\u548cpears'
'apples\u3001oranges\u548cpears'

because that's not what actually happens in the REPL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also separated the 3 things I did into 3 separate commits for easier review.

@@ -94,7 +94,7 @@ def datetime(
>>> from babel.dates import get_timezone
>>> fmt = Format('en_US', tzinfo=get_timezone('US/Eastern'))
>>> fmt.datetime(datetime(2007, 4, 1, 15, 30))
u'Apr 1, 2007, 11:30:00\u202fAM'
'Apr 1, 2007, 11:30:00\\u202fAM'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does a single slash turn into two here? (And in other places in the diff.)

Copy link
Author

@verhovsky verhovsky Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained in the PR

I also noticed some literals in docstrings were not escaped correctly (you need to escape the backslash unless it's a r""" docstring, otherwise the docstring will contain the actual character instead of the literal)

It's because this is a string within a string (a docstring) that starts here

"""Return a date and time formatted according to the given pattern.

Someone pasted REPL output directly into the docstring without escaping the backslashes, the other way to do this right is to use a r""" docstring which some of the code does.

You can see that this is wrong if you open a Python REPL and run this code

from babel.support import Format
fmt = Format('en_US', tzinfo=get_timezone('US/Eastern'))
help(fmt.datetime)

you'll see this:

Screenshot 2025-01-16 at 05 07 16

but if you actually type the code into the repl the character is escaped

Screenshot 2025-01-16 at 05 08 35

@verhovsky
Copy link
Author

verhovsky commented Jan 16, 2025

It would be good to add doctest to babel to check that the REPL output in the examples actually matches what you get if you were to run the code, but I'm not going to work on that.

@akx
Copy link
Member

akx commented Jan 24, 2025

It would be good to add doctest to babel to check that the REPL output in the examples actually matches what you get if you were to run the code, but I'm not going to work on that.

We do have that...

babel/conftest.py

Lines 22 to 24 in 98b9562

def pytest_collect_file(file_path: Path, parent):
if _is_relative(file_path, babel_path) and file_path.suffix == '.py':
return DoctestModule.from_parent(parent, path=file_path)

@verhovsky
Copy link
Author

verhovsky commented Jan 24, 2025

I thought it would catch the incorrectly escaped strings but it turns out doctest thinks that

['E d.\\u2009–\\u2009', 'E d.M.']

and

['E d.\u2009\u2009', 'E d.M.']

are the same, which is surprising because they obviously are not but they wont fix it python/cpython#129257 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants