How do I obtain text-line images and labels? #43

Closed
wksuixin opened this issue Mar 3, 2021 · 2 comments

Comments

@wksuixin

wksuixin commented Mar 3, 2021

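  1. A text-line image can be obtained by cropping the minimum bounding rectangle of the characters in that line. How do I obtain the text-line label? Mainly, I am not sure whether the order in which individual characters appear in the train.jsonl file matches the order of the characters within the text line.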
@yuantailing
Owner

Hello, the instance order is consistent with the reading order.

sentence:
[instance_0, instance_1, instance_2, ...]                 # MUST NOT be empty

instance:
{
    polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]],    # x, y are floating-point numbers
    text: str,                                            # the length of the text MUST be exactly 1
    is_chinese: bool,
    attributes: [attr_0, attr_1, attr_2, ...],            # MAY be an empty list
    adjusted_bbox: [xmin, ymin, w, h],                    # x, y, w, h are floating-point numbers
}

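Annotators generally annotate in the natural reading order. In the structure above, the order of the instances in the sentence array preserves the annotation order. Therefore, to obtain the label of a text line, simply concatenate the instance.text fields in order.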
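
The concatenation described above, together with the cropping rectangle mentioned in the question, could be sketched roughly as follows. This is only a minimal illustration, not the repository's own tooling: the helper names `line_label`, `line_bbox`, and `iter_lines` are hypothetical, the top-level key `"annotations"` holding the list of sentences is an assumption about the layout of each train.jsonl line, and the rectangle is a simple axis-aligned box rather than a rotated minimum-area rectangle.

```python
import json


def line_label(sentence):
    """Join instance.text in stored (annotation) order to form the line label."""
    return "".join(instance["text"] for instance in sentence)


def line_bbox(sentence):
    """Return the axis-aligned box (xmin, ymin, xmax, ymax) that encloses
    every character polygon in the sentence."""
    xs = [x for instance in sentence for x, _ in instance["polygon"]]
    ys = [y for instance in sentence for _, y in instance["polygon"]]
    return min(xs), min(ys), max(xs), max(ys)


def iter_lines(jsonl_line, key="annotations"):
    """Yield (label, bbox) for each sentence in one line of the .jsonl file.

    ASSUMPTION: the sentences are stored under `key` in each record;
    adjust `key` if the actual file uses a different field name.
    """
    record = json.loads(jsonl_line)
    for sentence in record.get(key, []):
        yield line_label(sentence), line_bbox(sentence)
```

The returned box can then be passed to whatever image library is in use to crop the line image. If the text is strongly rotated, an axis-aligned box will include extra background, so a rotated rectangle fitted to the polygon points may be the better choice.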

@wksuixin
Author

wksuixin commented Mar 3, 2021

OK, thank you.

@wksuixin wksuixin closed this as completed Mar 3, 2021