[feature](analysis) add new Chinese tokenizer IK #269
base: clucene
```diff
 CL_NS_DEF(analysis)

 enum class AnalyzerMode {
     Default,
     All,
-    Search
+    Search,
+    IK_Smart,
```
Suggest separating the IK and Jieba enums for better clarity.
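Following the reviewer's suggestion, here is a minimal sketch of what separating the two families might look like. The Jieba values are copied from the diff context. `IKMode`, its `Smart`/`MaxWord` values, and the `parse_ik_mode` helper are hypothetical names and are not part of the PR. The `ik_smart`/`ik_max_word` strings follow the mode names used by the original Java analysis-ik plugin.

```cpp
#include <optional>
#include <string_view>

// Hypothetical split: Jieba keeps AnalyzerMode (values from the diff above),
// and IK gets its own scoped enum so the two families cannot be mixed up.
enum class AnalyzerMode { Default, All, Search };

// Hypothetical IK modes, named after analysis-ik's "ik_smart" / "ik_max_word".
enum class IKMode { Smart, MaxWord };

// Hypothetical helper: map a user-facing mode string to an IK mode.
// Returns std::nullopt for anything that is not an IK mode.
std::optional<IKMode> parse_ik_mode(std::string_view s) {
    if (s == "ik_smart") return IKMode::Smart;
    if (s == "ik_max_word") return IKMode::MaxWord;
    return std::nullopt;
}
```

Because both are distinct `enum class` types, passing a Jieba mode where an IK mode is expected becomes a compile error rather than a silent mismatch.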
```cpp
if (store_size_ >= children_array_.size()) {
    children_array_.resize(store_size_ + 1);
}
// Insert while keeping the children sorted
```
Suggest using English comments.
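The reviewed snippet grows `children_array_` and then inserts a child while keeping the array ordered. Below is a minimal sketch of that operation. It assumes each child is a node keyed by a single code point and that `children_array_[0, store_size_)` holds the occupied slots in sorted order. `DictNode`, `key`, and `get_or_insert_child` are hypothetical names, not the PR's actual code.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical trie node modeled on the reviewed snippet: the first
// store_size_ entries of children_array_ are occupied and sorted by key.
struct DictNode {
    char32_t key = 0;
    std::vector<std::unique_ptr<DictNode>> children_array_;
    std::size_t store_size_ = 0;

    // Return the child for `c`, inserting it at its sorted position if absent.
    DictNode* get_or_insert_child(char32_t c) {
        // Binary search the occupied prefix for the first key >= c.
        std::size_t lo = 0, hi = store_size_;
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (children_array_[mid]->key < c) lo = mid + 1; else hi = mid;
        }
        if (lo < store_size_ && children_array_[lo]->key == c) {
            return children_array_[lo].get();  // already present
        }
        // Grow storage when all slots are used (same check as the snippet).
        if (store_size_ >= children_array_.size()) {
            children_array_.resize(store_size_ + 1);
        }
        // Shift larger keys one slot to the right, then place the new child.
        for (std::size_t i = store_size_; i > lo; --i) {
            children_array_[i] = std::move(children_array_[i - 1]);
        }
        children_array_[lo] = std::make_unique<DictNode>();
        children_array_[lo]->key = c;
        ++store_size_;
        return children_array_[lo].get();
    }
};
```

Keeping the slots sorted lets lookups use binary search. The shift on insert costs O(n) per new child, which stays cheap while a node has only a few children.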
Branch updated from f538355 to e45e3f3.
LGTM
Branch updated from e45e3f3 to 35881b0.
Migrate analysis-ik from Java to C++, implement basic tokenization functionality, and integrate it into CLucene.
Branch updated from 35881b0 to eecf99b.
Support IK tokenizer for inverted index:
- Migrate analysis-ik from Java to C++ and implement basic tokenization functionality.

The major differences from the original Java code are as follows:

Major changes to the original code:
- `/src/test/data/contribs-lib/analysis/chinese/speed-test-text.txt` (红楼梦, Dream of the Red Chamber) for testing.
- Add IK tokenizer configuration, initialization entry, and dictionary loading logic.
- Add the IK tokenization mode entry (a temporary mode entry) in `AnalyzerMode`.
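The description mentions dictionary loading logic and basic tokenization. As a rough illustration only, the sketch below loads a one-word-per-line UTF-8 word list and segments text with greedy forward maximum matching. This is a swapped-in simplification. IK's actual segmenter combines several sub-segmenters and, in smart mode, an ambiguity arbiter, and none of that is modeled here. `Dictionary`, `load`, `load_string`, `utf8_chars`, and `fmm_segment` are hypothetical names.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Split UTF-8 text into code points, each kept as its own UTF-8 substring.
// Assumes well-formed UTF-8 input.
std::vector<std::string> utf8_chars(const std::string& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3 : 4;
        out.push_back(s.substr(i, len));
        i += len;
    }
    return out;
}

// Hypothetical word dictionary loaded from a one-word-per-line list.
struct Dictionary {
    std::unordered_set<std::string> words;
    std::size_t max_len = 1;  // longest word, in code points

    void load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
            if (line.empty()) continue;
            std::size_t n = utf8_chars(line).size();
            if (n > max_len) max_len = n;
            words.insert(line);
        }
    }

    // Convenience wrapper for in-memory word lists.
    void load_string(const std::string& contents) {
        std::istringstream in(contents);
        load(in);
    }
};

// Greedy forward maximum matching: at each position take the longest
// dictionary word, falling back to a single code point.
std::vector<std::string> fmm_segment(const Dictionary& dict, const std::string& text) {
    std::vector<std::string> chars = utf8_chars(text);
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < chars.size()) {
        std::size_t best = 1;
        std::string candidate;
        for (std::size_t len = 1; len <= dict.max_len && i + len <= chars.size(); ++len) {
            candidate += chars[i + len - 1];
            if (dict.words.count(candidate)) best = len;
        }
        std::string token;
        for (std::size_t k = 0; k < best; ++k) token += chars[i + k];
        tokens.push_back(token);
        i += best;
    }
    return tokens;
}
```

For example, with a dictionary containing 红楼 and 红楼梦, the text 红楼梦很长 segments as 红楼梦 / 很 / 长, because the longest match wins at each position.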