Suggestions filtering
This step assumes that suggestions ranking is enabled.
Matches in match_dict can offer multiple suggestions, which fall into two types:
I. Grammatical / syntactic variants, e.g. people / person, a / an
Example:
- (A)
Foo a fo fo.
- (B)
Foo an fo fo.
If A is ranked higher, you can hide B (most likely incorrect) by setting filter_suggestions to True:
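from replacy import ReplaceMatcher

# nlp (a loaded spaCy pipeline) and match_dict are assumed to be defined already
lm_path = '/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True)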
replaCy does the guessing for you. Expect suggestions to be filtered if:
- variants have the same lemma, e.g. kid / kids, walk / walked
- variants are DET, e.g. a / an
- variants consist of more than one word, e.g. think / think of
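The three conditions above can be illustrated with a small standalone sketch. This is not replaCy's actual implementation: the Token class and the looks_grammatical function are hypothetical names introduced here, and the sketch assumes that meeting any one condition is enough and that "variants are DET" means every variant is a determiner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a parsed token (not a spaCy object)."""
    text: str
    lemma: str
    pos: str


def looks_grammatical(variants: List[List[Token]]) -> bool:
    """Return True if the variants meet any of the three conditions above."""
    # Filtering only makes sense for two or more non-empty competing variants.
    if len(variants) < 2 or any(not v for v in variants):
        return False
    # Third condition: some variant has more than one word (think / think of).
    if any(len(v) > 1 for v in variants):
        return True
    words = [v[0] for v in variants]
    # Second condition: every variant is a determiner (a / an).
    if all(w.pos == "DET" for w in words):
        return True
    # First condition: all variants share one lemma (kid / kids).
    return len({w.lemma for w in words}) == 1


kid = [Token("kid", "kid", "NOUN")]
kids = [Token("kids", "kid", "NOUN")]
big = [Token("big", "big", "ADJ")]
huge = [Token("huge", "huge", "ADJ")]
print(looks_grammatical([kid, kids]))  # True
print(looks_grammatical([big, huge]))  # False
```

Lexical variants such as big / huge meet none of the conditions, which is why they are handled by a count limit instead.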
II. Lexical variants, e.g. tall / big / huge
Example:
- (A)
Foo foo big foo.
- (B)
Foo foo huge foo.
- (C)
Foo foo tall foo.
Set default_max_count to any integer n to display only the top n suggestions, e.g. default_max_count=2 would display just A and B.
lm_path='/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True, default_max_count=2)
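As a rough illustration of what a top-n limit does, the sketch below ranks scored suggestions and keeps the first max_count of them. It is independent of replaCy: top_suggestions is a hypothetical helper, the scores are made-up numbers, and it assumes that a higher score means a better suggestion.

```python
from typing import List, Optional, Tuple


def top_suggestions(
    scored: List[Tuple[str, float]], max_count: Optional[int] = None
) -> List[str]:
    """Rank suggestions best-first and keep at most max_count of them."""
    # Assumption: a higher score is better (e.g. a log-probability).
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if max_count is not None:
        ranked = ranked[:max_count]
    return [text for text, _ in ranked]


# Made-up scores, for illustration only.
scored = [
    ("Foo foo tall foo.", -14.2),
    ("Foo foo big foo.", -9.1),
    ("Foo foo huge foo.", -10.3),
]
print(top_suggestions(scored, max_count=2))
# ['Foo foo big foo.', 'Foo foo huge foo.']
```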
Additionally, any suggestion item can have a MAX_COUNT property, which overrides the above rules (filter_suggestions and default_max_count can then be omitted), e.g.
"suggestions": [
[
{
"TEXT": {"IN": ["big", "huge", "tall", "high"]},
"MAX_COUNT": 3
}
]
],
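The override can be pictured as a simple fallback: use the item's own MAX_COUNT when present, otherwise the matcher-wide default. The helper below is a hypothetical sketch of that precedence, not code taken from replaCy.

```python
from typing import Any, Dict, Optional


def effective_max_count(
    item: Dict[str, Any], default_max_count: Optional[int] = None
) -> Optional[int]:
    """Prefer the item's own MAX_COUNT; otherwise use the default (may be None)."""
    return item.get("MAX_COUNT", default_max_count)


with_limit = {"TEXT": {"IN": ["big", "huge", "tall", "high"]}, "MAX_COUNT": 3}
without_limit = {"TEXT": {"IN": ["big", "huge", "tall", "high"]}}

print(effective_max_count(with_limit, default_max_count=2))     # 3
print(effective_max_count(without_limit, default_max_count=2))  # 2
```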
Add debug=True to get information about accepted and suppressed suggestions, along with their MAX_COUNT:
lm_path='/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True, default_max_count=2, debug=True)