Use NLP on OCR as a complementary feature for off-category-classification #7

Open
Tracked by #2
alexgarel opened this issue Mar 21, 2022 · 2 comments

@alexgarel
Member

alexgarel commented Mar 21, 2022

The first idea is to inject the whole OCR text into the features (maybe as an input distinct from the other features) and see what we can do.
As OCR text has a lot of noise, some attention mechanism might be necessary.
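To make the attention idea concrete, here is a minimal, hypothetical sketch of dot-product attention pooling in plain Python. In the real model this would be a learned layer inside the TensorFlow model. Each OCR token vector is scored against a query vector, the scores go through a softmax, and the pooled vector is the weighted sum, so noisy tokens can end up with almost no weight instead of diluting a plain average. The embeddings, the `query` vector and the function names are illustrative assumptions, not part of the project.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    token_vectors: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool a variable-length list of OCR token vectors into one vector.

    Returns (pooled_vector, attention_weights). `query` stands in for a
    learned parameter (hypothetical fixed values here).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # Score each token by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    # Weighted sum over tokens, one output coordinate at a time.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, token_vectors))
        for i in range(dim)
    ]
    return pooled, weights


# Hypothetical 2-d embeddings: two informative tokens and one noise token.
tokens = [[4.0, 0.0], [3.0, 0.5], [0.0, 0.1]]
pooled, weights = attention_pool(tokens, query=[1.0, 0.0])
```

In a Keras model the query would be a trainable weight and the token vectors would come from an embedding layer. The pooling step itself is still the same softmax-weighted average.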

Keep following the approach of doing preprocessing inside the TensorFlow pipeline, to ease model deployment.

alexgarel changed the title from "Use NLP on OCR as a complementary feature for off-category-classification (Mathilde)" to "Use NLP on OCR as a complementary feature for off-category-classification" on Mar 21, 2022
@teolemon
Member

@alexgarel
Member Author

In the above file:

The "source" field gives the source file name, e.g. "source": "/50414727/1.json", and the "content" field contains the OCR data.

The barcode is in the folder names, and the file name corresponds to the photo number.
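Here is a minimal sketch of reading one such record. The file format is not stated here, so it assumes one JSON object per line, with the "source" and "content" keys described above. The barcode is rebuilt by joining all folder names, so this also works if the barcode is split over nested folders. The structure of "content" is left untouched. `parse_source` and `read_ocr_record` are hypothetical helper names.

```python
import json
from pathlib import PurePosixPath
from typing import Any, Dict, Tuple


def parse_source(source: str) -> Tuple[str, str]:
    """Split a "source" path into (barcode, photo_number)."""
    path = PurePosixPath(source)
    # Drop the root "/" and join the remaining folder names into the barcode.
    folders = [part for part in path.parent.parts if part != "/"]
    return "".join(folders), path.stem


def read_ocr_record(line: str) -> Dict[str, Any]:
    """Parse one JSON line (assumed format) and attach barcode and photo number."""
    record = json.loads(line)
    barcode, photo_number = parse_source(record["source"])
    return {
        "barcode": barcode,
        "photo_number": photo_number,
        "content": record["content"],  # OCR data, kept as-is
    }


# "content" is an empty placeholder here; real records hold the OCR output.
rec = read_ocr_record('{"source": "/50414727/1.json", "content": {}}')
```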

For barcode, see https://github.com/openfoodfacts/robotoff/blob/master/robotoff/off.py#L88
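The lookup can also go the other way, for example to find the OCR file for a product barcode taken from the category dataset. The sketch below assumes the usual Open Food Facts image-folder convention: a barcode of 9 or more digits is split into 3/3/3/remaining-digit folders, while shorter barcodes such as `50414727` stay in a single folder. Check this convention against the robotoff code linked above before relying on it. `barcode_folders` and `barcode_to_source` are hypothetical names.

```python
import re
from typing import List

# Assumed Open Food Facts folder convention: three groups of three digits,
# then the rest. Barcodes shorter than 9 digits do not match and stay whole.
_BARCODE_SPLIT = re.compile(r"^(...)(...)(...)(.*)$")


def barcode_folders(barcode: str) -> List[str]:
    """Return the folder names that hold a barcode's files."""
    if not barcode.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    match = _BARCODE_SPLIT.match(barcode)
    if match is None:
        return [barcode]
    # Drop the empty last group for exactly 9-digit barcodes.
    return [group for group in match.groups() if group]


def barcode_to_source(barcode: str, photo_number: int) -> str:
    """Build the OCR "source" path for a barcode and photo number."""
    return "/" + "/".join(barcode_folders(barcode)) + f"/{photo_number}.json"
```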

Projects
Status: To discuss and validate