UnicodeEncodeError: 'latin-1' codec can't encode character '\u2082' in position 289702: Body ('₂') is not valid Latin-1. #346

Open
wlad opened this issue Nov 17, 2021 · 1 comment

wlad commented Nov 17, 2021

I'm trying to POST an XML file which has elements like <items id="text">SpO₂</items>. The request fails with the following error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2082' in position 289702: Body ('₂') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

Traceback (most recent call last):
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/utils.py", line 138, in decorator
    return func(*args, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/RequestsOnSessionKeywords.py", line 60, in post_on_session
    response = self._common_request("post", session, url,
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/RequestsKeywords.py", line 37, in _common_request
    resp = method_function(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.9/http/client.py", line 1257, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1302, in _send_request
    body = _encode(body, 'body')
  File "/usr/lib/python3.9/http/client.py", line 164, in _encode
    raise UnicodeEncodeError(

Here is how I send the request (${file} is loaded via the Get File keyword):

${resp}=            POST On Session      ${SUT}    /definition/template/adl1.4   expected_status=anything
                    ...                  data=${file}    headers=${headers}

If I remove the ₂ from the payload, the request succeeds.

What am I missing? The XML file actually starts with:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

robinmatz (Contributor) commented Dec 30, 2021

Hi @wlad,
I was able to reproduce the error.

The cause of this behavior is in Python's http.client module.

Here, we have the following code:

def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
        return data.encode("latin-1")
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None

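The line return data.encode("latin-1") is where the error occurs.

As you can see, it tries to encode the data as Latin-1 whenever the body is a str, disregarding the <?xml version="1.0" encoding="utf-8" standalone="yes"?> declaration in the XML file. The character ₂ (U+2082) has no Latin-1 representation, so the call fails.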
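
To see the same failure without sending anything over the network, here is a minimal plain-Python sketch. It does not call http.client itself; it only repeats the encode step shown above on the sample element from the issue. The variable names (body, failure, utf8_body) are made up for illustration.

```python
# Sample payload taken from the issue; any str containing U+2082 behaves the same.
body = '<items id="text">SpO₂</items>'

# http.client encodes a str body as Latin-1. U+2082 is outside the
# Latin-1 range (0-255), so this raises UnicodeEncodeError.
failure = None
try:
    body.encode("latin-1")
except UnicodeEncodeError as err:
    # Record which character could not be encoded.
    failure = body[err.start:err.end]

# Encoding explicitly produces a bytes object. A bytes body is not
# passed through the Latin-1 encode step, so it is sent as-is.
utf8_body = body.encode("utf-8")

print(repr(failure))
print(utf8_body)
```

This is why the error message suggests calling body.encode('utf-8'): once the body is bytes, there is nothing left to encode.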

This issue has been raised in requests, too: psf/requests#1822 (comment)

There is a workaround. If you modify your test case like this, the request should succeed:

# Read the file as UTF-8 (the Get File default), then encode it to bytes explicitly
${file}=    Get File    /path/to/file.xml
${file_utf8}=    Evaluate    $file.encode("utf-8")
${resp}=    POST On Session    ${SUT}    /definition/template/adl1.4    expected_status=anything
...    data=${file_utf8}    headers=${headers}

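Alternatively, read the file as Latin-1 and pass the resulting string unchanged. Every byte then maps to exactly one character, so the Latin-1 encoding step in http.client reproduces the original UTF-8 bytes: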

${file}=    Get File    /path/to/file.xml    encoding=latin-1
${resp}=    POST On Session    ${SUT}    /definition/template/adl1.4    expected_status=anything
...    data=${file}    headers=${headers}
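
Both approaches can be checked in plain Python. The sketch below is only an illustration under assumptions: file_bytes stands in for the UTF-8 file on disk, and the other names are hypothetical. It compares the bytes that would end up on the wire in each case.

```python
# Stand-in for the UTF-8 encoded file content on disk.
original = '<items id="text">SpO₂</items>'
file_bytes = original.encode("utf-8")

# Approach 1: read as UTF-8, then encode to UTF-8 explicitly.
# The body is bytes, so it is sent unchanged.
body_bytes = file_bytes.decode("utf-8").encode("utf-8")

# Approach 2: read as Latin-1 and send the str. The implicit
# Latin-1 encode maps every character back to its original byte.
as_latin1 = file_bytes.decode("latin-1")
wire_bytes = as_latin1.encode("latin-1")

# Pitfall: reading as Latin-1 and then encoding as UTF-8 does not
# round-trip; each non-ASCII byte is expanded into two bytes.
double_encoded = file_bytes.decode("latin-1").encode("utf-8")

print(body_bytes == file_bytes)
print(wire_bytes == file_bytes)
print(double_encoded == file_bytes)
```

Either approach should put the original UTF-8 bytes on the wire; the thing to avoid is combining a Latin-1 read with a UTF-8 encode.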
