UnicodeEncodeError in mergerfs.balance #106
Funny this should come up for you, as someone else ran into something similar with my other tool, trapexit/scorch#38. And yeah... I addressed it by forcing UTF-8 in the `TextIOWrapper`. It appears to work in their case, at least as far as I was able to reproduce it. I've been waiting for confirmation before adding it to all my Python tools.
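A minimal sketch of what forcing UTF-8 in the `TextIOWrapper` could look like, assuming the tool re-wraps the binary buffers behind `sys.stdout` and `sys.stderr`. The helper names `utf8_text_stream` and `force_utf8_stdio` are hypothetical, and this is not the actual mergerfs.balance or scorch code.

```python
import io
import sys


def utf8_text_stream(binary_stream, errors="surrogateescape"):
    """Hypothetical helper: wrap a binary stream so text is always written as UTF-8.

    The encoding is fixed instead of coming from locale.getpreferredencoding(False),
    so a locale change at runtime cannot switch the output encoding.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        errors=errors,        # only matters for what UTF-8 cannot encode (lone surrogates)
        line_buffering=True,  # flush after every newline, like an interactive terminal
    )


def force_utf8_stdio(errors="surrogateescape"):
    """Hypothetical helper: swap sys.stdout/sys.stderr for UTF-8 wrappers (call once at startup)."""
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name)
        if not hasattr(stream, "buffer"):
            continue  # e.g. already replaced by a StringIO or a test harness
        stream.flush()  # write out anything still buffered in the old wrapper
        setattr(sys, name, utf8_text_stream(stream.buffer, errors))
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8", errors="backslashreplace")` changes the existing stream in place, which avoids putting a second wrapper around the same buffer.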
Yes, it seems really likely now that something is messing with the locale settings during the runtime of
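One way to test the suspicion that the locale changes while the script runs is to log the relevant encodings at startup and again right before a failing `print`. The helper name `encoding_report` is hypothetical; the calls it makes are standard library functions.

```python
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect the encodings that affect decoding and printing paths."""
    return {
        # What a TextIOWrapper uses by default when no encoding is given.
        "preferred": locale.getpreferredencoding(False),
        # Current LC_CTYPE setting of this process (queried, not changed).
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
        # Encoding of the sys.stdout object currently in place (may be None for replacements).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding Python uses to decode names returned by os.walk()/os.listdir().
        "filesystem": sys.getfilesystemencoding(),
    }
```

For example, `print(encoding_report(), file=sys.stderr)` once at startup and once inside the `except UnicodeEncodeError` branch would show whether any of these values differ between the two points.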
I'm not sure why I went with `surrogateescape` over `backslashreplace`. It might just have been what I found while searching.
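The two error handlers behave quite differently when encoding. `surrogateescape` only turns lone surrogates in the range U+DC80 to U+DCFF (Python's representation of undecodable bytes) back into raw bytes; any other character the target codec cannot represent still raises. `backslashreplace` replaces any unencodable character with an escape sequence. A small comparison, assuming an ASCII output encoding such as the one a `C`/`POSIX` locale can produce; this would be consistent with `surrogateescape` not helping for some files, though that is a guess.

```python
def try_encode(text, encoding, errors):
    """Return the encoded bytes, or None if the codec raises UnicodeEncodeError."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


samples = {
    "accented": "caf\u00e9",     # an ordinary non-ASCII character
    "undecodable": "caf\udce9",  # raw byte 0xE9 carried as a surrogate escape
}

for label, text in samples.items():
    for errors in ("strict", "surrogateescape", "backslashreplace"):
        print(label, errors, try_encode(text, "ascii", errors))
```

For the accented string only `backslashreplace` succeeds; for the surrogate-escaped one, `surrogateescape` restores the original byte and `backslashreplace` prints a readable `\udce9` escape.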
Thanks, this seems to work, but as I said, I'm currently not re-balancing the same files I was re-balancing earlier. I'll keep an eye on it for now.
Hey, I have seen that this was an issue before and that you tried to solve it by replacing `sys.stdout` and `sys.stderr` with a `TextIOWrapper` that hopefully just `surrogateescape`s encoding errors... but this didn't work for me for some files, so I ended up wrapping the `print` lines in `try`/`except UnicodeEncodeError` blocks.

When I took a closer look, I realized there's something strange going on. You shouldn't need the `TextIOWrapper` workaround at all. Every path that `find_a_file` (`os.walk`) returns should be a regular string (Unicode code points), i.e. it has already been successfully decoded by Python at this point. If you then try to re-encode these paths through the `TextIOWrapper` instance that uses `locale.getpreferredencoding(False)` as its encoding, the encoding might actually fail, because that's not necessarily the encoding the filesystem itself uses (I think). What is strange, however, is that `locale.getpreferredencoding(False)` definitely returns `UTF-8` for me, so I'm confused how the encoding error could have arisen in the first place (I definitely don't have filenames/paths that fall outside the UTF-8 table). It feels like something is messing with the locale while the script is running... `rsync`, maybe? I saw in its source code that it likes to set `LC_CTYPE` to an empty string, but I'm not sure whether this would result in a different `locale.getpreferredencoding(False)` output.

My suggestion would be to just force `UTF-8` encoding in the `TextIOWrapper` instances, which IMHO should be able to encode every string you throw at it into bytes. Even if the output in the console might then look a bit funny (because of a locale mismatch), that's better than the script crashing. Maybe this isn't a good idea; I'm not sure. Unfortunately I can't test this very well, because I'm already "past" the files that caused errors and didn't take note of their names. Maybe you have some input on this?
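To illustrate the reasoning about `os.walk` paths: on POSIX, Python decodes each file name using the filesystem encoding with the `surrogateescape` error handler (the same as `os.fsdecode()`), so a name containing bytes that are invalid in that encoding still comes back as a `str`, just with lone surrogates in it. Forcing UTF-8 on the output side handles every ordinary character, but a strict UTF-8 encoder rejects those surrogates, so the error handler still matters. This sketch assumes a UTF-8 filesystem encoding, and the file name is made up for illustration.

```python
def to_utf8(text, errors="strict"):
    """Encode text as UTF-8, returning None if the codec raises UnicodeEncodeError."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


# Hypothetical on-disk name stored as Latin-1: 0xE9 ("é" in Latin-1)
# is not a valid UTF-8 sequence on its own.
raw_name = b"caf\xe9.mkv"

# With a UTF-8 filesystem encoding, os.walk()/os.fsdecode() would turn
# the invalid byte into the lone surrogate U+DCE9 instead of raising.
walked_name = raw_name.decode("utf-8", "surrogateescape")

# A name that really is UTF-8 decodes to ordinary code points.
good_name = "caf\u00e9.mkv"

print(to_utf8(good_name))                         # strict UTF-8 is enough here
print(to_utf8(walked_name))                       # None: surrogates are not allowed
print(to_utf8(walked_name, "surrogateescape"))    # original bytes restored
print(to_utf8(walked_name, "backslashreplace"))   # readable escape instead of raw bytes
```

So forcing UTF-8 removes the dependency on the locale, and pairing it with `surrogateescape` or `backslashreplace` also covers names that were never valid UTF-8 on disk.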