albertofwb/ofdtotext

ofd2txt

Screenshot

Usage

From the command line

python3 ofd_test.py 1.ofd

From Python code

from ofdtotext import OFDFile


doc = OFDFile('test.ofd')
print(doc.get_text())

ref

The core code is adapted from ofd2img.

How it works

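First, the code from the ofd2img project is used to unzip the .ofd file (like a .docx, it is a ZIP archive). The XML inside is then converted to JSON with an online converter, which makes the nesting of the text nodes easy to see. Defining the following data structures, one per level, is enough to extract all the text: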

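class TextCode:
    # Leaf level: keeps the text of a single TextCode node.
    def __init__(self, text_code):
        self.text = text_code.text


class TextObject:
    # Collects the TextCode child of every child of the node passed in.
    def __init__(self, text_obj):
        self.text_code = [TextCode(i['TextCode']) for i in text_obj.children]


class Layer:
    # Keeps the raw TextObject entry found under a Layer node.
    def __init__(self, layer):
        self.text_obj = layer['TextObject']


class Content:
    # The Layer node is handed to TextObject, which walks the layer's children.
    def __init__(self, content):
        self.layer = TextObject(content['Layer'])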
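
The same unzip-then-walk idea can be sketched with only the standard library. This is a minimal illustration, not the code used by this project or by ofd2img: the functions `extract_text` and `build_sample_ofd` are hypothetical names, and the sketch assumes that page text is stored in files named `Content.xml` and that elements use the namespace `http://www.ofdspec.org/2016`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed OFD namespace; adjust it if your files declare a different one.
OFD_NS = "http://www.ofdspec.org/2016"


def extract_text(ofd_source):
    """Return the text of every TextCode element, one per line, in archive order."""
    lines = []
    with zipfile.ZipFile(ofd_source) as archive:
        for name in archive.namelist():
            # Assumption: page text lives in files named Content.xml.
            if not name.endswith("Content.xml"):
                continue
            root = ET.fromstring(archive.read(name))
            # iter() walks Content -> Layer -> TextObject -> TextCode for us.
            for node in root.iter(f"{{{OFD_NS}}}TextCode"):
                if node.text:
                    lines.append(node.text)
    return "\n".join(lines)


def build_sample_ofd():
    """Build a tiny in-memory archive shaped like one OFD page (demo data only)."""
    page = (
        f'<ofd:Page xmlns:ofd="{OFD_NS}"><ofd:Content><ofd:Layer>'
        '<ofd:TextObject ID="1"><ofd:TextCode X="0" Y="10">Hello</ofd:TextCode></ofd:TextObject>'
        '<ofd:TextObject ID="2"><ofd:TextCode X="0" Y="20">OFD</ofd:TextCode></ofd:TextObject>'
        '</ofd:Layer></ofd:Content></ofd:Page>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Doc_0/Pages/Page_0/Content.xml", page)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(extract_text(build_sample_ofd()))
```

`extract_text` accepts either a path or a file-like object, so it can be pointed at a real `.ofd` file as well as the in-memory sample. It ignores glyph positions, so text comes out in file order rather than reading order.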

About

Extract the text from OFD documents
