word level annotations? #1

Closed
vsooda opened this issue Mar 19, 2018 · 3 comments

vsooda commented Mar 19, 2018

Thanks for the great dataset.

I looked into the dataset, and it is character-based: you run detection with a separate category per character to do recognition. My approach is different: I detect the word bbox first and then recognize the text.

I could write code to convert the annotations to a word-level format, but that is time-consuming. Could you also offer word-level annotations? They would be much easier to use for someone like me.

yuantailing (Owner) commented

See each_char in pythonapi/anno_tools.py. For each block, just compute the bounding box of all char['polygon'] (or char['adjusted_bbox']) entries and concatenate char['text'].

yuantailing commented Mar 19, 2018

It may look like this.

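from __future__ import print_function
import json

def each_word(anno):
    """Yield ((x, y, w, h), text) for every block of one annotation record."""
    for block in anno['annotations']:
        xx, yy = [], []
        s = ''
        for char in block:
            # Collect every polygon vertex of every character in the block.
            for xy in char['polygon']:
                xx.append(xy[0])
                yy.append(xy[1])
            # Only characters flagged is_chinese contribute to the text.
            if char['is_chinese']:
                s += char['text']
        # Axis-aligned box around all vertices, as (x, y, width, height).
        yield (min(xx), min(yy), max(xx) - min(xx), max(yy) - min(yy)), s

if __name__ == '__main__':
    # Read only the first JSON record of the file as a demo.
    with open('../data/annotations/train.jsonl') as f:
        anno = json.loads(f.readline())
    for bbox, s in each_word(anno):
        print(bbox, s)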
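
If you would rather start from char['adjusted_bbox'] than from the polygon, a variant might look like the sketch below. It assumes adjusted_bbox is stored as [x, y, w, h], so please check that against the data before relying on it. The names word_bbox_from_adjusted and each_word_adjusted are made up for this example and are not part of pythonapi, and the demo record is hand-written rather than taken from the dataset.

```python
from __future__ import print_function


def word_bbox_from_adjusted(block):
    """Union of the per-character boxes in one block, as (x, y, w, h).

    Assumption: every char['adjusted_bbox'] is [x, y, w, h].
    """
    x0 = min(c['adjusted_bbox'][0] for c in block)
    y0 = min(c['adjusted_bbox'][1] for c in block)
    x1 = max(c['adjusted_bbox'][0] + c['adjusted_bbox'][2] for c in block)
    y1 = max(c['adjusted_bbox'][1] + c['adjusted_bbox'][3] for c in block)
    return (x0, y0, x1 - x0, y1 - y0)


def each_word_adjusted(anno):
    """Yield (bbox, text) per block, skipping blocks with no characters."""
    for block in anno['annotations']:
        if not block:
            # min()/max() would raise ValueError on an empty block.
            continue
        text = ''.join(c['text'] for c in block if c['is_chinese'])
        yield word_bbox_from_adjusted(block), text


if __name__ == '__main__':
    # Hand-made record for illustration only, not real dataset content.
    demo = {'annotations': [[
        {'adjusted_bbox': [10, 20, 30, 40], 'is_chinese': True, 'text': u'\u4e2d'},
        {'adjusted_bbox': [45, 18, 30, 44], 'is_chinese': True, 'text': u'\u6587'},
    ]]}
    for bbox, s in each_word_adjusted(demo):
        print(bbox, s)
```

To convert a whole file rather than one record, the same generator can be called once per line of the .jsonl file.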

vsooda commented Mar 19, 2018

awesome! thank you very much!
