UnicodeEncodeError when running integrity check on file with emoji #38

Open
danny-wu opened this issue Jul 7, 2020 · 11 comments

@danny-wu

danny-wu commented Jul 7, 2020

I get the following error when I try to run the integrity check on a file that has an emoji in its name. This happens even after I renamed the file; I suspect the name is baked into the database now. Here is the error:


Traceback (most recent call last):
  File "/volume1/system/scorch/scorch.py", line 640, in inst_check
    newfi = get_fileinfo(filepath)
  File "/volume1/system/scorch/scorch.py", line 404, in get_fileinfo
    st = os.lstat(filepath)
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f917' in position 56: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/volume1/system/scorch/scorch.py", line 1647, in <module>
    main()
  File "/volume1/system/scorch/scorch.py", line 1626, in main
    rv = rv | func(opts,directory,db,dbremove)
  File "/volume1/system/scorch/scorch.py", line 707, in inst_check
    print_filepath(filepath,actions,total,opts.quote)
  File "/volume1/system/scorch/scorch.py", line 432, in print_filepath
    print(s,end=end)
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f917' in position 66: ordinal not in range(128)

@trapexit
Owner

trapexit commented Jul 7, 2020

Are you using the latest release? Those line numbers don't match master or the 1.0.0 release.

@danny-wu
Author

I just updated to the latest release and I am still getting this error. It is trivially reproducible for me: I just need to append a file whose name contains an emoji to a database. After that, check always fails on that file.

Traceback (most recent call last):
  File "/volume1/system/scorch/scorch.py", line 644, in inst_check
    newfi = get_fileinfo(filepath)
  File "/volume1/system/scorch/scorch.py", line 408, in get_fileinfo
    st = os.lstat(filepath)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-43: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/volume1/system/scorch/scorch.py", line 1651, in <module>
    main()
  File "/volume1/system/scorch/scorch.py", line 1630, in main
    rv = rv | func(opts,directory,db,dbremove)
  File "/volume1/system/scorch/scorch.py", line 711, in inst_check
    print_filepath(filepath,actions,total,opts.quote)
  File "/volume1/system/scorch/scorch.py", line 436, in print_filepath
    print(s,end=end)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 52-53: ordinal not in range(128)

@danny-wu
Author

danny-wu commented Jul 11, 2020

I am testing whether applying

export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'

before running scorch resolves the issue. I'll keep you updated.

EDIT: Unfortunately, that does not work. Checking from the CLI, I'm already running with UTF-8 anyway, even without setting these exports manually.

@trapexit
Owner

Can you please provide the filename? I can't test things otherwise.

@trapexit
Owner

I created a file named 🤗 and it handles it fine. The fact that it's saying it's using ascii implies something with the environment. I'm not super familiar with all of the ways Python might pick its encoding/decoding, but I'll look around.
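One way to see which encodings Python actually picked is to print them once from an interactive shell and once from the environment that fails. This is a minimal diagnostic sketch; `encoding_report` is a hypothetical helper, not part of scorch. On Linux, Python 3.5 derives these values from the locale at startup, so a C/POSIX locale yields 'ascii' for the filesystem encoding; Python 3.7 and later coerce that locale to UTF-8 (PEP 538/540).

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from its environment at startup."""
    return {
        # Used to turn str paths into bytes for calls such as os.lstat().
        "filesystem": sys.getfilesystemencoding(),
        # Used when opening text files without an explicit encoding.
        "preferred": locale.getpreferredencoding(False),
        # Used by print(); None if stdout has no encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, enc in encoding_report().items():
        print("{}: {}".format(name, enc))
```

If the shell reports 'utf-8' but the failing run reports 'ascii', the problem lies in how that run's environment is set up rather than in the database.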

@danny-wu
Author

Hi, I did some more research, and I believe the issue specifically affects scripts run from the Synology Task Scheduler:

https://stackoverflow.com/questions/38174485/python3-unicodeencodeerror-when-run-via-synology-task-scheduler

So this doesn't appear to be a bug in scorch but rather in Synology. Unfortunately, I'm now getting this error instead:

admin@sweet:~$ python3 /volume1/system/scorch/scorch.py -v -d /volume1/system/scorch/allpersonal.db check /volume1/personal/ -f ".+\/(Thumbs\.db|@eaDir.*|.DS_Store|#recycle.*)" -F
Traceback (most recent call last):
  File "/volume1/system/scorch/scorch.py", line 1651, in <module>
    main()
  File "/volume1/system/scorch/scorch.py", line 1624, in main
    db = read_db(opts.dbpath)
  File "/volume1/system/scorch/scorch.py", line 1481, in read_db
    db = read_db_from_fd(f)
  File "/volume1/system/scorch/scorch.py", line 1447, in read_db_from_fd
    for row in reader:
  File "/volume1/@appstore/py3k/usr/local/lib/python3.5/gzip.py", line 287, in read1
    return self._buffer.read1(size)
  File "/volume1/@appstore/py3k/usr/local/lib/python3.5/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/volume1/@appstore/py3k/usr/local/lib/python3.5/gzip.py", line 469, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid block type

I get this error even on a backup of my database. The DB file is unchanged, so it's not corrupt; it's the reading of the file that seems to go wrong once the proper UTF-8 encoding is set.

I guess I might have to start over...

@danny-wu
Author

danny-wu commented Jul 11, 2020

Suggestion: as this script may commonly be run on a Synology, and the end user would not notice this issue until a non-ASCII filename appears, it might be helpful to detect this condition at startup by deliberately creating a file with a non-ASCII name, stating it, and then removing it.
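A start-up check along those lines could look roughly like this: create a file with a non-ASCII name, `lstat` it, and remove it. This is a sketch only; the function name, the probe filename, the use of the system temp directory, and the warning text are all illustrative assumptions, not scorch behavior.

```python
import os
import sys
import tempfile


def can_handle_non_ascii_paths(directory=None):
    """Create, stat and remove a file with a non-ASCII name.

    Returns False if the interpreter cannot encode the name (for example
    when its filesystem encoding is 'ascii'), True otherwise.
    """
    directory = directory or tempfile.gettempdir()
    # Hypothetical probe name; '\U0001f917' is the emoji from the traceback.
    path = os.path.join(directory, "scorch-probe-\U0001f917")
    try:
        with open(path, "w"):
            pass
        os.lstat(path)
    except UnicodeEncodeError:
        return False
    finally:
        # Best-effort cleanup; the file may never have been created.
        try:
            os.remove(path)
        except (OSError, UnicodeEncodeError):
            pass
    return True


if __name__ == "__main__":
    if not can_handle_non_ascii_paths():
        print("warning: this Python cannot encode non-ASCII file names; "
              "check the locale (LANG/LC_ALL) of the environment running it",
              file=sys.stderr)
```

Failing fast with a warning like this would surface the problem before any file with an unusual name is ever reached.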

@trapexit
Owner

How old was the version of scorch you were using? The DB changed from a plain text file to a gzipped file years ago. I removed the non-gzip loading in the 1.0 release because it hadn't been supported in a long while.

I'm sure there is a transparent workaround for the encoding issue, but I need to know exactly what's going on.
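One possible transparent workaround, sketched under the assumption that file names on disk are UTF-8 (as with the emoji name in this report) and that the script runs on a POSIX system such as DSM: pass bytes paths to `os.lstat()` so the interpreter's 'ascii' filesystem encoding is never consulted, and print through an explicitly UTF-8 text wrapper. `lstat_utf8` and `utf8_text_stream` are hypothetical helpers, not existing scorch functions.

```python
import io
import os
import sys


def lstat_utf8(filepath):
    """os.lstat() that ignores the interpreter's filesystem encoding.

    Assumes on-disk names are UTF-8. str paths are encoded explicitly;
    'surrogateescape' restores the raw bytes of names that os.listdir()
    decoded with that handler. bytes paths are passed through unchanged.
    """
    if isinstance(filepath, str):
        filepath = filepath.encode("utf-8", "surrogateescape")
    return os.lstat(filepath)


def utf8_text_stream(binary_stream):
    """Wrap a binary stream (e.g. sys.stdout.buffer) so text is written as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            errors="replace", line_buffering=True)
```

At startup, `sys.stdout = utf8_text_stream(sys.stdout.buffer)` would make later `print()` calls emit UTF-8, which covers the second traceback (from `print_filepath`), while calling `lstat_utf8()` in place of `os.lstat()` covers the first (from `get_fileinfo`).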

@trapexit
Owner

I see what's going on. Thanks for the link. Let me see what I can do.

@trapexit
Owner

:bump:
