A simple topic detector.
TrTopicter is a pre-built package that ships with a machine learning model for detecting the topic of Turkish text. The language of the text is identified first, so that non-Turkish text, which would produce inaccurate results, is not analyzed. The deployed model was trained on nearly 30,000 annotated Turkish sentences/paragraphs and achieves an average F1 score of 94.37%. Analyzing a text of more than 300 characters takes less than 1 ms, and resource usage is only 6 MB.
You can easily install TrTopicter via PyPI. It has been tested on Windows 8/10, Ubuntu 18.04/20.04, and macOS Catalina 10.15.7.
pip install trtopicter
Supported topics:
- politics
- economy
- health
- sport
- technology
- culture
- religion
- justice
Text preprocessing includes:
- Case-folding to lowercase
- Punctuation, numbers, and white space removal
- Stop words removal (Credits: Zemberek-NLP)
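The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names, the regular expression, and the tiny stop-word set are all assumptions (the real list is credited to Zemberek-NLP and is not reproduced here). The explicit handling of `I` and `İ` is also an assumption, added because Python's default `lower()` does not follow Turkish casing rules.

```python
import re

# Hypothetical stop-word sample for illustration only; the package credits
# Zemberek-NLP for its real list, which is not reproduced here.
STOP_WORDS = {"ve", "bir", "bu", "için", "ile", "de", "da"}


def turkish_lower(text: str) -> str:
    """Case-fold to lowercase using Turkish rules for dotted/dotless I."""
    # str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot,
    # so both capitals are mapped explicitly before lowercasing the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text: str) -> str:
    """Lowercase, drop punctuation/digits/extra whitespace, drop stop words."""
    text = turkish_lower(text)
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)
    # split() with no arguments also collapses runs of whitespace.
    tokens = text.split()
    return " ".join(token for token in tokens if token not in STOP_WORDS)


cleaned = preprocess("Bu, 2023 yılında IŞIK ve İstanbul için bir HABER!")
print(cleaned)
```

Replacing unwanted characters with a space rather than deleting them keeps words that were joined only by punctuation from being merged into one token.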
{
  "LANGUAGE_IDENTIFICATION": {
    "limit": {
      "character": 500
    },
    "probability_threshold": 0.2
  },
  "DOMAIN_DETECTION": {
    "limit": {
      "character": 500
    },
    "probability_threshold": 0.5
  }
}
- character: Number of characters threshold for detection (data type: integer).
- probability_threshold: Probability threshold for detection (data type: float).
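As a sanity check on a configuration like the one above, the following sketch parses the JSON and verifies the documented data types. The `validate_config` helper and the range checks (a non-negative character count, a probability between 0 and 1) are assumptions made for illustration. How the package itself loads or validates its configuration is not described here.

```python
import json

# The configuration shown above, embedded as a string so the sketch is
# self-contained. Where the package stores this file is not assumed.
CONFIG_TEXT = """
{
  "LANGUAGE_IDENTIFICATION": {
    "limit": {"character": 500},
    "probability_threshold": 0.2
  },
  "DOMAIN_DETECTION": {
    "limit": {"character": 500},
    "probability_threshold": 0.5
  }
}
"""

SECTIONS = ("LANGUAGE_IDENTIFICATION", "DOMAIN_DETECTION")


def validate_config(config: dict) -> dict:
    """Hypothetical helper: check types and ranges of both sections."""
    for section in SECTIONS:
        limit = config[section]["limit"]["character"]
        threshold = config[section]["probability_threshold"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int):
            raise TypeError(f"{section}: 'character' must be an integer")
        if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
            raise TypeError(f"{section}: 'probability_threshold' must be a float")
        if limit < 0:
            raise ValueError(f"{section}: 'character' must not be negative")
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"{section}: 'probability_threshold' must be in [0, 1]")
    return config


config = validate_config(json.loads(CONFIG_TEXT))
print(config["DOMAIN_DETECTION"]["probability_threshold"])
```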
You can use the TrTopicter package to determine the topic of Turkish text easily:
from trtopicter import TrTopicter
topicter = TrTopicter()
result = topicter.get_topic("Your Turkish text goes here.")
print(result)
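To label several texts in one pass, a small wrapper around `get_topic` may be convenient. The `label_texts` helper below is hypothetical and not part of TrTopicter. It relies only on the `get_topic` call shown above and makes no assumption about the type of the returned value, which is stored as-is.

```python
from typing import Any, Iterable, List, Tuple


def label_texts(detector: Any, texts: Iterable[str]) -> List[Tuple[str, Any]]:
    """Hypothetical helper: pair each non-empty text with its detected topic.

    `detector` is any object exposing get_topic(text), such as a
    TrTopicter instance.
    """
    results = []
    for text in texts:
        # Skip empty or whitespace-only strings; there is nothing to classify.
        if not text.strip():
            continue
        results.append((text, detector.get_topic(text)))
    return results


# Usage with the real package (requires `pip install trtopicter`):
#   from trtopicter import TrTopicter
#   for text, topic in label_texts(TrTopicter(), ["First text.", "Second text."]):
#       print(topic, text)
```

Creating the detector once and reusing it for every text avoids repeating whatever setup the constructor performs.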
Our to-do list includes:
- Expanding the number of supported topics
- Adding Cython support