A Chinese term classifier based on web search results
Files in this repository include:
- Scripts to retrieve term features from search engines, cleanse the raw feature lists, and sample the data
- Dictionary used for Chinese text segmentation
- Sample data sets, including:
  - Input term sets, in files whose names begin with drugList-
  - Raw term-feature matrices generated from different search engines and term sets, in files whose names begin with drugFeature-
  - The exact training and testing sets used in this study, in files whose names end with -TestTrain
You will need 7-Zip (http://www.7-zip.org/) to decompress the files.
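To show how a segmentation dictionary like the one listed above is commonly used, here is a minimal forward maximum matching (FMM) sketch. It is not necessarily the segmenter used in this study. The dictionary format (one word per line, with the word as the first whitespace-separated field) and the names `load_dictionary` and `fmm_segment` are hypothetical.

```python
from typing import Iterable, List, Set


def load_dictionary(lines: Iterable[str]) -> Set[str]:
    """Build a word set from dictionary lines.

    Assumed (hypothetical) format: one entry per line, word first;
    any extra fields such as frequency or tag are ignored.
    """
    words: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            words.add(parts[0])
    return words


def fmm_segment(text: str, words: Set[str], max_len: int = 0) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    if max_len <= 0:
        # Default window: the length of the longest dictionary word.
        max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, with the words 阿司匹林, 肠溶 and 肠溶片 loaded, `fmm_segment("阿司匹林肠溶片", words)` yields `["阿司匹林", "肠溶片"]`: the longer entry 肠溶片 wins over 肠溶, which is the defining behavior of maximum matching.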