Commit

new structure
faneshion committed Dec 7, 2017
1 parent e6bc0e1 commit d13495c
Showing 136 changed files with 26,672 additions and 14,294 deletions.
17 changes: 2 additions & 15 deletions .gitignore
@@ -1,28 +1,15 @@
*.pyc
*.txt
*.log
*.swp
*.bak
*.weights
*.trec
*.ranklist
*.DS_Store
*.mq2007
matchzoo/*.txt
data/mq2008/*
data/mq2007
data/toutiao
data/example/*
data/toutiao_jieba_new
data/robust/*
build/
dist/
log/*
matchzoo/log/*
qrels.*
trec_eval
matchzoo/lydev/*
#matchzoo/models/*.config
matchzoo/run_submit_gypsum_jobs_wikiqa.py
matchzoo/run_model.py
matchzoo/run_model_wraper.py
log/*
.idea/
12 changes: 8 additions & 4 deletions MatchZoo.egg-info/PKG-INFO
@@ -1,7 +1,7 @@
Metadata-Version: 1.1
Name: MatchZoo
Version: 1.0
Summary: MatchingZoom is a toolkit for text matching.It was developed with a focus on enabling fast experimentation.
Version: 0.2.0
Summary: MatchZoo is a toolkit for text matching. It was developed with a focus on facilitating the designing, comparing and sharing of deep text matching models.
Home-page: https://github.com/faneshion/MatchZoo
Author: Yixing Fan, Liang Pang, Jianpeng Hou, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
Author-email: [email protected]
@@ -12,6 +12,10 @@ Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: BSD License
Classifier: License :: OSI Approved :: Apache License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
1 change: 1 addition & 0 deletions MatchZoo.egg-info/SOURCES.txt
@@ -25,4 +25,5 @@ matchzoo/metrics/evaluations.py
matchzoo/metrics/rank_evaluations.py
matchzoo/utils/__init__.py
matchzoo/utils/rank_io.py
matchzoo/utils/roc_auc.py
matchzoo/utils/utility.py
2 changes: 2 additions & 0 deletions MatchZoo.egg-info/requires.txt
@@ -3,3 +3,5 @@ tensorflow >= 1.1.0
nltk >= 3.2.3
numpy >= 1.12.1
six >= 1.10.0
h5py >= 2.7.0
tqdm >= 4.19.4
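The two new entries raise the install requirements with `h5py >= 2.7.0` and `tqdm >= 4.19.4`. As a rough sketch (not part of MatchZoo), a `requires.txt`-style `>=` line can be parsed and checked against an installed version string with plain tuple comparison. This assumes purely numeric dotted versions; pre-release tags such as `1.0rc1` are out of scope, and a real installer would use a full version parser instead.

```python
import re


def version_tuple(version):
    """Turn '2.7.0' into (2, 7, 0); numeric components only (assumption)."""
    return tuple(int(part) for part in version.split("."))


def parse_requirement(line):
    """Split a line like 'h5py >= 2.7.0' into ('h5py', (2, 7, 0))."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_.\-]+)\s*>=\s*(\d+(?:\.\d+)*)\s*", line)
    if match is None:
        raise ValueError("unsupported requirement line: %r" % line)
    name, version = match.groups()
    return name, version_tuple(version)


def satisfies(installed, minimum):
    """True if the installed version tuple is at least the minimum tuple."""
    # Pad with zeros so that (2, 7) compares equal to (2, 7, 0).
    width = max(len(installed), len(minimum))
    a = tuple(installed) + (0,) * (width - len(installed))
    b = tuple(minimum) + (0,) * (width - len(minimum))
    return a >= b
```

For example, `parse_requirement("tqdm >= 4.19.4")` gives `("tqdm", (4, 19, 4))`, and `satisfies(version_tuple("4.19.4"), (4, 19, 4))` is `True`.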
20 changes: 10 additions & 10 deletions README.md
@@ -1,5 +1,5 @@
<div align='center'>
<img src="./data/matchzoo-logo.png" width = "400" alt="图片名称" align=center />
<img src="./docs/_static/images/matchzoo-logo.png" width = "400" alt="图片名称" align=center />
</div>

---
@@ -55,14 +55,14 @@ In the main directory, this will install the dependencies automatically.

For usage examples, you can run
```
python main.py --phase train --model_file ./models/arci_ranking.config
python main.py --phase predict --model_file ./models/arci_ranking.config
python matchzoo/main.py --phase train --model_file examples/toy_example/config/arci_ranking.config
python matchzoo/main.py --phase predict --model_file examples/toy_example/config/arci_ranking.config
```
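These commands pass `matchzoo/main.py` a `--phase` flag (`train` or `predict`) and a `--model_file` pointing at a `.config` file. A minimal sketch of an entry point accepting the same two flags is below. It assumes the `.config` file is JSON, and it is not the actual `main.py`: the dispatch simply returns the phase and the parsed config, since the real training and prediction logic is not shown in this commit.

```python
import argparse
import json


def build_parser():
    # Same two flags as the README commands; both are required in this sketch.
    parser = argparse.ArgumentParser(description="Sketch of a MatchZoo-style entry point")
    parser.add_argument("--phase", choices=["train", "predict"], required=True)
    parser.add_argument("--model_file", required=True, help="path to a .config file")
    return parser


def load_config(path):
    # Assumption: the .config file is a JSON document.
    with open(path) as f:
        return json.load(f)


def main(argv=None):
    args = build_parser().parse_args(argv)
    config = load_config(args.model_file)
    # Hypothetical dispatch point: a real tool would train or predict here.
    return args.phase, config
```

Calling `main(["--phase", "predict", "--model_file", path])` returns `("predict", config)`; an unknown phase such as `--phase eval` is rejected by `argparse` through `choices`.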

## Overview
The architecture of the MatchZoo toolkit is described in the Figure in what follows,
<div align='center'>
<img src="./data/matchzoo.png" width = "400" height = "200" alt="图片名称" align=center />
<img src="./docs/_static/images/matchzoo.png" width = "400" height = "200" alt="图片名称" align=center />
</div>
There are three major modules in the toolkit, namely data preparation, model construction, training and evaluation, respectively. These three modules are actually organized as a pipeline of data flow.
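To make the data-preparation stage a little more concrete: for pairwise ranking, this commit imports several `PairGenerator` classes in `matchzoo/inputs/__init__.py`. The hypothetical sketch below shows the standard pairwise construction from `(query_id, doc_id, label)` relevance records, pairing each document with every lower-labelled document of the same query. It is not taken from MatchZoo's source, and `make_pairs` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def make_pairs(records):
    """Build (query_id, pos_doc, neg_doc) triples from (query_id, doc_id, label) records.

    A pair is emitted whenever two documents of the same query have different
    labels; the higher-labelled document is treated as the positive one.
    """
    by_query = defaultdict(list)
    for qid, did, label in records:
        by_query[qid].append((did, label))

    pairs = []
    for qid in sorted(by_query):
        for (d1, l1), (d2, l2) in combinations(by_query[qid], 2):
            if l1 > l2:
                pairs.append((qid, d1, d2))
            elif l2 > l1:
                pairs.append((qid, d2, d1))
            # Equal labels express no preference, so no pair is emitted.
    return pairs
```

With records `("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0), ("q2", "d4", 0), ("q2", "d5", 2)`, the result is `[("q1", "d1", "d2"), ("q1", "d1", "d3"), ("q2", "d5", "d4")]`.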

@@ -87,11 +87,11 @@ Here, we adopt <a href="https://www.microsoft.com/en-us/download/details.aspx?id

Take the DRMM as an example. In training phase, you can run
```
python main.py --phase train --model_file models/wikiqa_config/drmm_wikiqa.config
python matchzoo/main.py --phase train --model_file examples/wikiqa/config/drmm_wikiqa.config
```
In testing phase, you can run
```
python main.py --phase predict --model_file models/wikiqa_config/drmm_wikiqa.config
python matchzoo/main.py --phase predict --model_file examples/wikiqa/config/drmm_wikiqa.config
```

We have compared 10 models, the results are as follows.
@@ -166,12 +166,12 @@ We have compared 10 models, the results are as follows.
</table>
The loss of each models are described in the following figure,
<div align='center'>
<img src="./data/matchzoo.wikiqa.loss.png" width = "550" alt="图片名称" align=center />
<img src="./docs/_static/images/matchzoo.wikiqa.loss.png" width = "550" alt="图片名称" align=center />
</div>

The MAP of each models are depicted in the following figure,
<div align='center'>
<img src="./data/matchzoo.wikiqa.map.png" width = "550" alt="图片名称" align=center />
<img src="./docs/_static_images/matchzoo.wikiqa.map.png" width = "550" alt="图片名称" align=center />
</div>
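The figure reports MAP (mean average precision). For reference, here is a minimal sketch of the standard definition. Average precision for one query is the mean of precision@k over the ranks k at which a relevant document appears, and MAP averages this over queries. This is not MatchZoo's `rank_evaluations.py`, and details such as tie handling or label thresholds may differ there.

```python
def ranked_labels(scores, labels):
    """Order relevance labels by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked):
    """AP for one query, given relevance labels in ranked order (label > 0 = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label > 0:
            hits += 1
            precision_sum += hits / rank  # precision@rank at each relevant hit
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """MAP over a list of queries, each given as a ranked list of labels."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)
```

For the ranked labels `[1, 0, 1, 0]`, AP is `(1/1 + 2/3) / 2 = 5/6`; combined with a second query `[0, 1]` (AP `0.5`), MAP is `(5/6 + 0.5) / 2`.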
Here, the DRMM_TKS is a variant of DRMM for short text matching. Specifically, the matching histogram is replaced by a top-k maxpooling layer and the remaining part are fixed.
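To illustrate the difference: DRMM summarizes each query term's similarities to all document terms as a fixed-length matching histogram, while DRMM_TKS keeps only the k largest similarities. A minimal sketch on one row of a toy cosine-similarity matrix (one query term against every document term) is below. The bin count, the separate exact-match bin, the zero padding and the function names are assumptions for illustration, not MatchZoo's implementation.

```python
import heapq


def matching_histogram(row, num_bins=5):
    """Count-based histogram of similarities in [-1, 1] (DRMM-style sketch).

    The last bin is reserved for exact matches (similarity == 1.0); the other
    num_bins - 1 bins split [-1, 1) into equal-width intervals.
    """
    hist = [0] * num_bins
    width = 2.0 / (num_bins - 1)
    for sim in row:
        if sim >= 1.0:
            hist[-1] += 1
        else:
            idx = int((sim + 1.0) / width)
            hist[min(idx, num_bins - 2)] += 1
    return hist


def top_k_pooling(row, k=3):
    """Keep the k largest similarities, padding with 0.0 if the row is shorter than k."""
    top = heapq.nlargest(k, row)
    return top + [0.0] * (k - len(top))
```

For the row `[1.0, 0.2, -0.3, 0.7]`, `matching_histogram` gives `[0, 1, 1, 1, 1]` and `top_k_pooling` gives `[1.0, 0.7, 0.2]`. Both produce a fixed-length vector per query term regardless of document length; the histogram discards which similarities were largest, while top-k keeps their actual values.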

@@ -297,11 +297,11 @@ Development Teams

Acknowledgements
=====
We would like to express our appreciation to the following people for contributing source code to MatchZoo, including [Yixing Fan](https://scholar.google.com/citations?user=w5kGcUsAAAAJ&hl=en), [Liang Pang](https://scholar.google.com/citations?user=1dgQHBkAAAAJ&hl=zh-CN), [Liu Yang](https://sites.google.com/site/lyangwww/), [Lijuan Chen](), [Jianpeng Hou](https://github.com/HouJP), [Zhou Yang](), [Niuguo cheng](https://github.com/niuox) etc..
We would like to express our appreciation to the following people for contributing source code to MatchZoo, including [Yixing Fan](https://scholar.google.com/citations?user=w5kGcUsAAAAJ&hl=en), [Liang Pang](https://scholar.google.com/citations?user=1dgQHBkAAAAJ&hl=zh-CN), [Liu Yang](https://sites.google.com/site/lyangwww/), [Yukun Zheng](), [Lijuan Chen](), [Jianpeng Hou](https://github.com/HouJP), [Zhou Yang](), [Niuguo cheng](https://github.com/niuox) etc..

Feedback and Join Us
=====
Feel free to post any questions or suggestions on [GitHub Issues](https://github.com/faneshion/MatchZoo/issues) and we will reply to your questions there. You can also suggest adding new deep text maching models into MatchZoo and apply for joining us to develop MatchZoo together.
<div align='center'>
<img src="./data/matchzoo-group.jpeg" width = "200" alt="图片名称" align=center />
<img src="./docs/_static/images/matchzoo-group.jpeg" width = "200" alt="图片名称" align=center />
</div>
3 changes: 3 additions & 0 deletions Team.md
@@ -11,6 +11,9 @@ The following people contributed to the development of the MatchZoo project:
- **Liu Yang (Core Developer)**
- PhD. student from Center for Intelligent Information Retrieval, University of Massachusetts Amherst
- [HomePage](https://sites.google.com/site/lyangwww/)
- **Yukun Zheng (Core Developer)**
- master student from Tsinghua University
- [HomePage]()
- **Zhou Yang (Core Developer)**
- Master student from Chongqing University of Technology
- [HomePage]()
44 changes: 39 additions & 5 deletions build/lib/matchzoo/inputs/__init__.py
@@ -1,5 +1,39 @@
# note
from pair_generator import PairGenerator
from pair_generator import DRMM_PairGenerator
from list_generator import ListGenerator
from list_generator import DRMM_ListGenerator
# note
from __future__ import absolute_import
import six
from keras.utils.generic_utils import deserialize_keras_object

from .point_generator import PointGenerator
from .point_generator import Triletter_PointGenerator
from .point_generator import DRMM_PointGenerator

from .pair_generator import PairGenerator
from .pair_generator import Triletter_PairGenerator
from .pair_generator import DRMM_PairGenerator
from .pair_generator import PairGenerator_Feats
from .list_generator import ListGenerator
from .list_generator import Triletter_ListGenerator
from .list_generator import DRMM_ListGenerator
from .list_generator import ListGenerator_Feats

def serialize(generator):
    return generator.__name__

def deserialize(name, custom_objects=None):
    return deserialize_keras_object(name,
                                    module_objects=globals(),
                                    custom_objects=custom_objects,
                                    printable_module_name='loss function')

def get(identifier):
    if identifier is None:
        return None
    if isinstance(identifier, six.string_types):
        identifier = str(identifier)
        return deserialize(identifier)
    elif callable(identifier):
        return identifier
    else:
        raise ValueError('Could not interpret '
                         'loss function identifier:', identifier)
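The new `get()` resolves a generator either from its class name, looked up among the module's globals through Keras' `deserialize_keras_object`, or from an object that is already callable; `serialize()` goes the other way through `__name__`. The sketch below reproduces that round trip without Keras, using a plain dictionary as a hypothetical stand-in for `globals()` and placeholder classes. It assumes Python 3, so `str` replaces `six.string_types`. In the committed code, `printable_module_name` and the `ValueError` text still say "loss function"; the sketch uses a generator-specific message instead.

```python
class PairGenerator:
    """Placeholder standing in for a real generator class."""


class ListGenerator:
    """Placeholder standing in for a real generator class."""


# Hypothetical registry playing the role of globals() in the real module.
REGISTRY = {cls.__name__: cls for cls in (PairGenerator, ListGenerator)}


def serialize(generator):
    return generator.__name__


def deserialize(name, custom_objects=None):
    # In this sketch, user-supplied objects take precedence over the registry.
    lookup = dict(REGISTRY)
    lookup.update(custom_objects or {})
    if name not in lookup:
        raise ValueError("Unknown generator: %s" % name)
    return lookup[name]


def get(identifier):
    if identifier is None:
        return None
    if isinstance(identifier, str):
        return deserialize(identifier)
    if callable(identifier):
        return identifier  # classes and functions pass through unchanged
    raise ValueError("Could not interpret generator identifier: %r" % (identifier,))
```

Here `get("PairGenerator")` returns the `PairGenerator` class, `serialize(get("ListGenerator"))` returns `"ListGenerator"`, and a non-callable, non-string identifier such as `42` raises `ValueError`.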
