Hello, could you please share the Chinese character set you built? Your paper says the dataset includes 3850 distinct Chinese characters #5

Banyueqin · 2018-03-21T07:42:17Z

No description provided.

yuantailing · 2018-03-21T07:55:54Z

Please take a look at the annotation format and extract the characters from the annotations yourself.

yuantailing · 2018-03-21T08:01:06Z

Example code:

import json

from pythonapi import anno_tools

if __name__ == '__main__':
    # Collect every distinct character text from the train and val annotations.
    s = set()
    for path in ('../data/annotations/train.jsonl', '../data/annotations/val.jsonl'):
        with open(path) as f:
            for line in f:  # one JSON-encoded image annotation per line
                anno = json.loads(line)
                for char in anno_tools.each_char(anno):
                    s.add(char['text'])
    print(s)

Banyueqin · 2018-03-21T08:04:28Z

Thanks.

pycoco · 2019-10-24T19:24:51Z

Why do I get only 3768 characters?

yuantailing · 2019-10-25T08:06:52Z

> Why do I get only 3768 characters?

Some character categories appear only in the test set.

pycoco · 2019-10-25T08:34:38Z

Thank you. I used the code above to generate the dictionary, but I still get a KeyError when training. I don't know why.
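The thread does not say where the KeyError is raised, so the following is only a guess at one possible cause: a dictionary built from the train and val annotations has no entry for a character that appears elsewhere (for example, only in the test set), and a plain `vocab[ch]` lookup fails. A minimal sketch of a lookup with a fallback index is shown below. The names `UNK`, `build_vocab`, and `encode` are hypothetical and are not part of the repository's code.

```python
UNK = '<unk>'  # hypothetical placeholder token, not part of the dataset


def build_vocab(chars):
    """Map each distinct character to an index; index 0 is reserved for unknowns."""
    vocab = {UNK: 0}
    for ch in sorted(set(chars)):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Look up each character, falling back to the unknown index instead of raising KeyError."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


if __name__ == '__main__':
    vocab = build_vocab('中文字库')
    # '存' is not in the vocabulary, so it maps to the unknown index.
    print(encode('中文库存', vocab))
```

Whether mapping unseen characters to a shared index is acceptable depends on the training code. The alternative is to filter such samples out before training.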

yuantailing closed this as completed Mar 21, 2018
