Remove u string prefix from docs #1174

verhovsky · 2025-01-16T10:47:44Z

I did a search and replace for u" and (?<!')\bu' and checked each case. I also updated some strings to include the character instead of \u and \x literals since that's what the Python 3 REPL does. I also noticed some literals in docstrings were not escaped correctly (you need to escape the backslash unless it's a r""" docstring, otherwise the docstring will contain the actual character instead of the literal).
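
The second search pattern can be tried on a few sample lines. This is a minimal sketch using Python's re module; the sample strings are made up for illustration, and applying the pattern with re.sub is an assumption about how the search and replace was carried out.

```python
import re

# The second pattern from the description: a `u` at a word boundary,
# directly followed by a single quote and not preceded by one.
SINGLE_QUOTED = re.compile(r"(?<!')\bu'")

samples = [
    "x = u'abc'",   # prefixed literal: the u is dropped
    "x = 'menu'",   # ends in u': no word boundary before the u, left alone
    "x = 'u'",      # the one-character string 'u': excluded by the lookbehind
]

for line in samples:
    print(SINGLE_QUOTED.sub("'", line))

# The plain `u"` search has no such guards, so it also hits ordinary strings
# that merely end in u, which is why each match still needs a manual check.
print(bool(re.search('u"', 'x = "menu"')))
```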

akx

Thank you for the PR! Generally I'm on board with this, but there are some inconsistencies and weirdness here.

akx · 2025-01-16T11:18:07Z

babel/lists.py

-    u'apples\u3001oranges\u548cpears'
+    'apples、oranges和pears'


Why do some of the changes have escaped sequences replaced with characters and others don't? 🤔

I think I'd rather not change any escape sequences to the unescaped versions, just so it's easy to see e.g. the difference between - – — 😄

Like I said,

I also updated some strings to include the character instead of \u and \x literals since that's what the Python 3 REPL does

I don't think I missed any. Some sequences are still escaped because they're unprintable characters that the Python REPL itself escapes, like \u2009. I think console examples should print exactly what the REPL outputs, though we could argue about what to put in the code that causes the REPL to print that string.

Like we can have

>>> 'apples\u3001oranges\u548cpears'
'apples、oranges和pears'

but we definitely shouldn't have

>>> 'apples\u3001oranges\u548cpears'
'apples\u3001oranges\u548cpears'

because that's not what actually happens in the REPL.
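
The REPL behaviour described here can be checked with repr(), which is what the interactive prompt uses to echo a bare expression. A minimal sketch, using the strings from this thread; the variable names are only for illustration.

```python
# repr() is what the REPL uses to echo the value of an expression.
printable = 'apples\u3001oranges\u548cpears'
thin_space = 'E d.\u2009'

# Printable characters such as the CJK ones are shown as themselves.
assert repr(printable) == "'apples、oranges和pears'"

# U+2009 THIN SPACE is not printable, so repr() emits the escape sequence.
assert repr(thin_space) == "'E d.\\u2009'"

# str.isprintable() reports the same distinction.
assert '\u3001'.isprintable()
assert not '\u2009'.isprintable()
```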

I also separated the 3 things I did into 3 separate commits for easier review.

akx · 2025-01-16T11:23:04Z

babel/support.py

@@ -94,7 +94,7 @@ def datetime(
        >>> from babel.dates import get_timezone
        >>> fmt = Format('en_US', tzinfo=get_timezone('US/Eastern'))
        >>> fmt.datetime(datetime(2007, 4, 1, 15, 30))
-        u'Apr 1, 2007, 11:30:00\u202fAM'
+        'Apr 1, 2007, 11:30:00\\u202fAM'


Why does a single slash turn into two here? (And in other places in the diff.)

I explained in the PR

I also noticed some literals in docstrings were not escaped correctly (you need to escape the backslash unless it's a r""" docstring, otherwise the docstring will contain the actual character instead of the literal)

It's because this is a string within a string (a docstring) that starts here

babel/babel/support.py

Line 91 in 3000762

"""Return a date and time formatted according to the given pattern.

Someone pasted REPL output directly into the docstring without escaping the backslashes. The other way to do this right is to use an r""" docstring, which some of the code does.

You can see that this is wrong if you open a Python REPL and run this code

    from babel.dates import get_timezone
    from babel.support import Format
    fmt = Format('en_US', tzinfo=get_timezone('US/Eastern'))
    help(fmt.datetime)

you'll see this:

but if you actually type the code into the REPL, the character is escaped in the output.
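
The difference between the three ways of writing the docstring can be seen without babel at all. A minimal sketch with three hypothetical functions (unescaped, escaped and raw are made-up names, not babel code):

```python
def unescaped():
    """
    'a\u202fb'
    """


def escaped():
    """
    'a\\u202fb'
    """


def raw():
    r"""
    'a\u202fb'
    """


# A regular docstring interprets \u202f, so it holds the real character.
assert '\u202f' in unescaped.__doc__
assert '\\u202f' not in unescaped.__doc__

# Doubling the backslash, or using a raw docstring, keeps the six-character
# escape sequence that the REPL would actually print.
assert '\\u202f' in escaped.__doc__
assert escaped.__doc__ == raw.__doc__
```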

verhovsky · 2025-01-16T12:39:15Z

It would be good to add doctest to babel to check that the REPL output in the examples actually matches what you get if you were to run the code, but I'm not going to work on that.

akx · 2025-01-24T07:40:05Z

It would be good to add doctest to babel to check that the REPL output in the examples actually matches what you get if you were to run the code, but I'm not going to work on that.

We do have that...

babel/conftest.py

Lines 22 to 24 in 98b9562

    
    def pytest_collect_file(file_path: Path, parent):
        if _is_relative(file_path, babel_path) and file_path.suffix == '.py':
            return DoctestModule.from_parent(parent, path=file_path)

verhovsky · 2025-01-24T10:27:00Z

I thought it would catch the incorrectly escaped strings but it turns out doctest thinks that

['E d.\\u2009–\\u2009', 'E d.M.']

and

['E d.\u2009–\u2009', 'E d.M.']

are the same, which is surprising because they obviously are not, but they won't fix it: python/cpython#129257 (comment)
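
The behaviour can be reproduced with doctest's public OutputChecker. A minimal sketch with a shortened version of the string above; the likely cause, going by the CPython source rather than anything stated in this thread, is that doctest folds both the expected and the actual output to backslash-escaped ASCII before comparing them.

```python
import doctest

checker = doctest.OutputChecker()

want_escaped = "'E d.\\u2009'\n"    # docstring text with a literal backslash
want_unescaped = "'E d.\u2009'\n"   # docstring text with a real thin space
got = "'E d.\\u2009'\n"             # what repr() actually produces

# The two expected strings are different...
assert want_escaped != want_unescaped

# ...but doctest accepts either one as matching the real output.
assert checker.check_output(want_escaped, got, 0)
assert checker.check_output(want_unescaped, got, 0)
```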

verhovsky added 3 commits January 16, 2025 03:31

Remove u string prefix

73c9106

Convert unicode literals to characters

5c8b75a

Escape unicode literals

3000762

akx requested changes Jan 16, 2025

Stop removing u prefix in doctest

02decbe
