Can't find words in Kobo dictionary generated by dictgen #16

Open
Ceiyne opened this issue Jan 30, 2021 · 3 comments

@Ceiyne

Ceiyne commented Jan 30, 2021

This is my first time using dictgen, so I apologize if this is actually user error.

I have been trying to convert JMdict to a Kobo dictionary. I used PyGlossary to generate a df file from JMdict, and then used dictgen to create the Kobo dictionary. After I installed it (using the custom-dict folder), the Kobo saw the dictionary, but none of the words I tried to look up could be found. All of these words are present in the JMdict source file.

When I ran dictgen, I used the default options. It ran without errors and said that it successfully wrote 190,800 entries.

I did a little troubleshooting but didn't come up with anything solid. Here are a few notes from what I checked:

  1. I looked at the df file that PyGlossary generated, and it appeared to be in the correct format based on what I see on your documentation page. I also checked the words I was trying to look up from the book, and they were present in this file as well.
  2. I looked at the zip file that dictgen generated, and on the surface it looked like my other Kobo dictionaries. It contained many files with two-character names like xy.html, and those files contained unreadable data.
  3. I looked at your existing issues but didn't see anything similar. I saw one issue where your notes mentioned a "no words found" bug triggered by spaces in the dictionary filename, but I did not have spaces. I tried a few different names to make sure it wasn't a naming issue, things like: dicthtml-test.zip and dicthtml-test-test.zip
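
For anyone repeating the check in item 2, here is a minimal Python sketch for testing whether a word made it into the generated zip at all. It assumes the two-character .html files are gzip-compressed HTML, which would explain the "unreadable data"; that is an assumption about the format, not something confirmed in this thread. `shards_containing` is a hypothetical helper, not part of dictgen or PyGlossary.

```python
import gzip
import zipfile
import zlib


def shards_containing(zip_source, needle):
    """Return the names of .html members whose text contains `needle`.

    Assumption: each .html member is gzip-compressed HTML. Members that
    do not decompress are searched as raw bytes instead.
    """
    hits = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".html"):
                continue  # skip anything that is not an .html member
            raw = zf.read(name)
            try:
                data = gzip.decompress(raw)
            except (OSError, EOFError, zlib.error):
                data = raw  # not gzip after all; search the raw bytes
            if needle in data.decode("utf-8", errors="replace"):
                hits.append(name)
    return hits
```

Calling `shards_containing("dicthtml-test.zip", word)` with one of the missing words shows which file, if any, holds the entry. If the word is in the zip but the Kobo still cannot find it, the entry was written and the problem is more likely in how the device looks it up.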

PyGlossary is capable of creating Kobo dictionaries as well, so in case you were wondering why I didn't just do that... I tried that method but had issues there as well. With PyGlossary the generated dictionary worked to some extent -- the Kobo would return the correct dictionary entries for many words. But there were also a lot of words that could not be found despite being present in JMdict. So, I thought I'd try working with dictgen instead.

@pgaskin
Owner

pgaskin commented Jan 30, 2021

If this is the same Japanese issue reported on PyGlossary, note that the Kobo dictionary implementation in PyGlossary was derived from dictutil. 🙂

If it's not, can you provide a few examples of words which aren't found?

Also, just to check, what's your firmware version?

@Ceiyne
Author

Ceiyne commented Jan 30, 2021

Yeah, it's basically the same one. The one over there was from when I used PyGlossary to do the whole conversion, and the one here was from when I used PyGlossary to make the df file and dictgen to create the dicthtml.zip.

I'm on the latest (as far as I know) firmware, 4.25.15875.

@pgaskin
Owner

pgaskin commented Jan 30, 2021

Yep, that would have essentially the same result since PyGlossary's logic is based on dictutil.

#14 is quite high up on my to-do list, but I haven't had enough contiguous free time to work on it yet. I'll probably end up doing it towards the middle of this year. Even though that doesn't implement the Japanese algorithms, it should be possible to work around the prefixing differences entirely as a temporary hack using the new prefix exception mechanism.
