File Metadata error when parsing HDoujin Downloader's info.json files inside zip files #40

Dystasia · 2020-09-13T04:25:34Z

File Metadata parser fails for info.json files generated from HDoujin Downloader when inside zip files. Same info.json when extracted parses with no issues whatsoever.

Here is the plugin.log:

Sep-09 00:16:49--INFO pluginctx.file-metadata.main: Attempting with DataType.eze
Sep-09 00:16:49--WARNING pluginctx.file-metadata.extractors.common: An error occured while trying to parse file into a dict
Sep-09 00:16:49--INFO pluginctx.file-metadata.main: Skipping DataType.eze
Sep-09 00:16:49--INFO pluginctx.file-metadata.main: Attempting with DataType.hdoujin
Sep-09 00:16:49--WARNING pluginctx.file-metadata.extractors.common: An error occured while trying to parse file into a dict
Sep-09 00:16:49--INFO pluginctx.file-metadata.main: Skipping DataType.hdoujin

Let me know if you need an example, but this is happening with all my files.

Dystasia · 2020-09-14T10:13:35Z

Actually, it is not all of them. I am trying to identify the differences but I am guessing it has something to do with the structure of some info files.

Dystasia · 2020-09-14T18:54:42Z

OK, I found the issue. It has something to do with special characters in info.json files that are inside a zip. This JSON works when unzipped but not when zipped:

zatsuna · 2020-09-14T20:57:03Z

@Dystasia
I only have zip and rar files. I did some testing and here's what I found out.
The File Metadata plugin finds and successfully adds tags, but only if the gallery folder is unzipped. I don't have any unzipped galleries, so I didn't notice this before. I have many .zip galleries and none of them work with File Metadata.
It worked fine with .zip galleries in HPX from a year before.

Also, I don't get duplicate galleries with unzipped folders when scanning for new galleries. If galleries are zipped, I always get duplicates of every gallery regardless of "Scan only for new galleries" option being selected. Every scan adds another duplicate.

These two issues are probably related to each other as they both are solved by unzipping.

Dystasia · 2020-09-17T07:20:27Z

Just an update of how I attempted to fix this.

First, the exception actually thrown when trying to parse is:
'charmap' codec can't decode byte 0x9d in position 314: character maps to <undefined>

This probably means the file is being read without UTF-8 encoding.
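As a rough illustration of that exception (not the plugin's code): the heart character in the sample description encodes to the UTF-8 bytes E2 9D A4, and 0x9D has no mapping in cp1252. The sketch below assumes cp1252 is the locale default on the affected Windows machine, which the thread does not confirm.

```python
# U+2764 (heavy black heart) as UTF-8 bytes: b'\xe2\x9d\xa4'
heart_utf8 = "\u2764".encode("utf-8")

# Assumption: the file was decoded with the Windows locale default, cp1252.
try:
    heart_utf8.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding the same bytes as UTF-8 round-trips without error.
round_trip = heart_utf8.decode("utf-8")

print(error_message)
```

The message has the same shape as the one above; only the byte position differs because the input here is three bytes long.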

The reading and parsing of the file is happening in:

plugins/plugins/File Metadata/extractors/common.py

Lines 85 to 86 in 6472a37

    with fs.open("r", **kw) as f:
        d = json.load(f)

even though the encoding seems to get set at:

plugins/plugins/File Metadata/extractors/common.py

Lines 82 to 83 in 6472a37

    if not fs.inside_archive:
        kw['encoding'] = 'utf-8'

This doesn't seem to work for compressed info.json files, since the encoding is only set when the file is not inside an archive. When I remove the if condition, I get the exception:
open() got an unexpected keyword argument 'encoding'

I can't see the contents of hpx.command.CoreFS, even though the documentation states it is a file handler/wrapper, so I'm stuck: I don't know the interface of this class or how to force the encoding another way.

@twiddli do you have any input? Is this something that needs to be fixed in HPX core instead of the plugin?

twiddli · 2020-09-17T19:10:42Z

Hello, thank you guys for the troubleshooting. This is such a weird issue as I still can't repro it yet.
Creating a zip file with an info.json with the contents:

{ "manga_info": { "title": "Bad Girl", "original_title": "", "author": [], "artist": [ "INAGO" ], "circle": [], "scanlator": [], "translator": [], "publisher": "FAKKU", "description": "It’s because I’m a good student…that I need some stimulation. ❤", "status": "", "chapters": "N/A", "pages": 20, "tags": { "Misc": [ "Schoolgirl Outfit", "Creampie", "Deepthroat", "Exhibitionism", "Glasses", "Hentai", "Humiliation", "Loli", "Masturbation", "Teacher", "Toys", "Uncensored", "X-Ray" ] }, "type": "", "language": [ "English" ], "released": "", "reading_direction": "", "characters": [], "series": "", "parody": [ "Original Work" ], "url": "https://hentainexus.com/read/6019" } }

works totally fine, I even put the character ❤ in the filename for good measure and got no issues.

Can you check if the file is utf-8 encoded?
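One way to run that check without extracting the archive is sketched below with Python's standard zipfile module. The helper name `member_is_utf8` and the in-memory test archive are made up for illustration and are not part of HPX or the plugin.

```python
import io
import zipfile


def member_is_utf8(archive, member):
    """Return True if the raw bytes of `member` inside `archive` decode as UTF-8."""
    with zipfile.ZipFile(archive) as zf:
        data = zf.read(member)  # always bytes, no decoding happens here
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Build a small in-memory zip with one UTF-8 member and one cp1252 member.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("utf8.json", '{"title": "caf\u00e9"}'.encode("utf-8"))
    zf.writestr("cp1252.json", '{"title": "caf\u00e9"}'.encode("cp1252"))

utf8_ok = member_is_utf8(sample, "utf8.json")
cp1252_ok = member_is_utf8(sample, "cp1252.json")
```

For a real gallery, the first argument would be the path to the .zip file and the second the name of the info.json entry as stored in the archive.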

Also, for more insight on what's happening on that line of code, it checks if the file is inside the archive and omits specifying the encoding because the archive handler from the std lib doesn't accept an encoding parameter when opening files from inside the archive. I think this is because it is assumed the encoding is utf-8.

Saving the info.json file inside the archive with a different encoding than utf-8, I get this error: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page. suggesting that it expects utf-8 for all text files.
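For comparison, here is a minimal sketch of reading JSON from inside a zip with an explicit encoding, using only the standard library. It does not use hpx.command.CoreFS, whose interface isn't shown in this thread; `load_json_from_zip` is a hypothetical helper. `zipfile.ZipFile.open` returns a binary stream and takes no encoding argument, so the encoding is chosen by whatever wraps that stream as text.

```python
import io
import json
import zipfile


def load_json_from_zip(archive, member, encoding="utf-8"):
    """Parse a JSON member of a zip archive, decoding it explicitly."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open yields bytes; the text layer is added separately.
        with zf.open(member) as raw:
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                return json.load(text)


# In-memory archive with a description ending in U+2764, stored as raw UTF-8.
payload = {"manga_info": {"title": "Bad Girl", "description": "stimulation. \u2764"}}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("info.json", json.dumps(payload, ensure_ascii=False).encode("utf-8"))

info = load_json_from_zip(buf, "info.json")
```

Passing the encoding at the text-wrapping step avoids falling back to the platform default, which is the fallback the 'charmap' error earlier in the thread points to.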

zatsuna · 2020-09-18T17:18:36Z

All my files generated by E-Hentai Downloader have a UTF-8 info.txt.

Sample info file:
info.txt

Dystasia mentioned this issue Sep 14, 2020

Can't add any tags happypandax/happypandax#211

Closed
