Unicode characters incorrectly escaped in encode #334
Hi @samhh, thanks for submitting the issue! This indeed looks like unexpected behaviour 😞

```
λ: show "ü"
"\"\\252\""
```

It looks like smarter handling of Unicode characters is required to preserve TOML semantics. The relevant code to change is here: `tomland/src/Toml/Type/Printer.hs`, line 141 (commit 38d74d3).
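The GHCi output above shows the root cause. Haskell's `show` escapes non-ASCII characters with a *decimal* code point, while TOML's `\u` escape takes *hexadecimal* digits. Below is a minimal sketch contrasting the two renderings of ü (U+00FC). The names `haskellRendering` and `tomlRendering` are illustrative only and do not come from tomland.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Haskell's 'show' escapes a non-ASCII character such as U+00FC (ü) using
-- its *decimal* code point, so the result contains the characters \252.
-- That is Haskell string syntax, not a valid TOML escape.
haskellRendering :: String
haskellRendering = show "\252"

-- TOML's \uXXXX escape expects exactly four *hexadecimal* digits,
-- so the same character has to be written as \u00FC (252 = 0xFC).
tomlRendering :: String
tomlRendering = "\"\\u" ++ pad4 (map toUpper (showHex (ord '\252') "")) ++ "\""
  where
    -- Left-pad the hex digits with zeros to a width of four.
    pad4 h = replicate (4 - length h) '0' ++ h
```

In GHCi, `putStrLn haskellRendering` prints `"\252"`, which a TOML parser rejects. `putStrLn tomlRendering` prints `"\u00FC"`, which is the form the TOML spec expects.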
A more interesting question is why such errors weren't caught by our property tests. 🤔 🤔 🤔
I've implemented the requested feature; now I only need to add tests and figure out why this wasn't caught by our test cases. The code needs some cleaning, but it works. :-)
Our tests were OK, but they were testing something different. The text was generated with Unicode characters, but they were written like …
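One way a test could catch this class of bug is to check that every escape sequence in the printer's output is one that TOML actually allows. A generated `"ü"` would then fail as soon as it was printed as `\252`. The sketch below is a hypothetical oracle, not part of tomland's test suite. It assumes the TOML v1.0.0 escape set (`\b \t \n \f \r \" \\ \uXXXX \UXXXXXXXX`) and takes the string body without its surrounding quotes.

```haskell
import Data.Char (isHexDigit)

-- | Hypothetical test oracle (not part of tomland): does the body of an
-- encoded TOML basic string (quotes stripped) contain only escape
-- sequences permitted by TOML v1.0.0?
validTomlEscapes :: String -> Bool
validTomlEscapes [] = True
validTomlEscapes ('\\' : rest) = case rest of
    c : more | c `elem` "btnfr\"\\" -> validTomlEscapes more
    'u' : more -> hexRun 4 more
    'U' : more -> hexRun 8 more
    _ -> False   -- e.g. Haskell's decimal escape \252
  where
    -- Exactly k hex digits must follow \u or \U.
    hexRun k s =
      let (digits, more) = splitAt k s
      in length digits == k && all isHexDigit digits && validTomlEscapes more
validTomlEscapes (_ : more) = validTomlEscapes more
```

In a property test, this check would run on the printer's output for arbitrary generated `Text`, rather than only comparing values after a round trip.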
* [#334] parse and unparse tests
* removed parsing and unparsing tests
* [#334] showUnicodeText
* [#334] escaping Unicode characters as well as regular characters
* [#334] resolved issue with escaping regular unescaped chars
* added tests, but they are not in use
* examples.hs reverted to original content
* [#334] changes requested by chshersh
The `tomland` library currently has some issues with Unicode characters, which make it unusable for us right now. In the future we could try to migrate again, but for now we should probably stick with YAML. See kowainik/tomland#334.
At least, that's what I think is happening. In a REPL, the following will fail:
Looking at the encoding, here's what we're given:
And passing that into decode will fail per the above:
Looking at the TOML spec, it looks like Unicode characters should be encoded with a `\u` prefix. Modifying the string to contain an extra `u0` allows encoding to succeed, and I think that's what we want, given that's roughly the decimal output in an online Unicode converter. But I'm pretty ignorant about character encoding and am honestly not sure if that's the right output. 😄