Cross-language Patent Similarity Learning

1. Model Overview

A Siamese adversarial neural network framework is built on a Siamese neural network combined with adversarial training. The framework targets both the training of multilingual, cross-language patent text representation models and their use in retrieval. It uses contrastive loss as the training loss function, with the aim of fine-tuning the underlying text representation models more effectively. The framework and the fine-tuned models are validated through several groups of comparative experiments on two kinds of self-built data: parallel patent corpora for the low-resource languages Thai and Vietnamese, and a parallel patent corpus covering seven languages (Thai, Vietnamese, German, French, Japanese, Korean, and Russian).
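As a rough illustration of the contrastive loss mentioned above, the following is a minimal sketch in plain Python for a single pair of embeddings. It is not taken from the repository's code: the function names, the Euclidean distance, the factor of 0.5, and the default `margin` of 1.0 are all assumptions made for this example, and the actual implementation may use a different distance or margin.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, label, margin=1.0):
    """Contrastive loss for one pair of embeddings (hypothetical helper).

    label = 1: the two texts are versions of the same patent (pull together).
    label = 0: the two texts come from different patents (push apart
               until they are at least `margin` away from each other).
    """
    d = euclidean_distance(u, v)
    positive_term = label * d ** 2
    negative_term = (1 - label) * max(margin - d, 0.0) ** 2
    return 0.5 * (positive_term + negative_term)
```

With this formulation, a matching pair is penalized by its squared distance, while a non-matching pair contributes to the loss only when it lies closer than the margin.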

2. Comparison of Cross-language Training Results

2.1 Retrieval Performance Evaluation

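Retrieval is evaluated on a database that holds, for each patent, both the original low-resource-language text and its Chinese translation. The representation of the low-resource-language text is used as the query to retrieve the N nearest text representations, and the metric is computed from the rank at which the representation of the corresponding Chinese translation appears.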
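The evaluation procedure described above can be sketched as follows. This is an illustrative example only: the README does not name the exact metric or similarity measure, so cosine similarity, the hit rate at N, the mean reciprocal rank, the helper names, and the toy vectors are all assumptions. The query itself is excluded from the candidate list, which is also an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translation_rank(patent_id, source_lang, database, target_lang="zh"):
    """1-based rank of the Chinese translation when querying with the original.

    `database` maps (patent_id, language) to an embedding vector.
    """
    query_key = (patent_id, source_lang)
    query = database[query_key]
    # Every other entry in the database is a retrieval candidate.
    candidates = [key for key in database if key != query_key]
    candidates.sort(
        key=lambda key: cosine_similarity(query, database[key]),
        reverse=True,
    )
    return candidates.index((patent_id, target_lang)) + 1


def hit_rate_at_n(ranks, n):
    """Fraction of queries whose translation appears in the top n results."""
    return sum(rank <= n for rank in ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Toy database with made-up embeddings ("th" = Thai, "zh" = Chinese).
database = {
    ("P1", "th"): [1.0, 0.0, 0.0],
    ("P1", "zh"): [0.9, 0.1, 0.0],
    ("P2", "th"): [0.0, 1.0, 0.0],
    ("P2", "zh"): [0.0, 0.6, 0.8],
    ("P3", "zh"): [0.0, 0.9, 0.1],  # distractor from another patent
}

ranks = [translation_rank(pid, "th", database) for pid in ("P1", "P2")]
```

In this toy data the translation of P1 is retrieved first, while the translation of P2 is outranked by the distractor, so `ranks` is `[1, 2]`, the hit rate at 1 is 0.5, and the mean reciprocal rank is 0.75.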

2.2 Illustration of Representation Ability

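Each patent has three text representations: one for the Chinese version, one for the English version, and one for the original low-resource-language text. In the figure, points labeled with the same number belong to the same patent.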

ChaneMo/Cross-language-Patent-Similarity-Learning
