Releases: huggingface/tokenizers
Node v0.8.3
node-v0.8.3 Adding rust release CI.
Python v0.11.5
[#895] Add wheel support for Python 3.10
Rust v0.11.1
Python v0.11.3
Node v0.8.2
[#884] Fixing bad deserialization following inclusion of a default for Punctuation
Node v0.8.1
Fixing various backward compatibility bugs (old serialized files couldn't be deserialized anymore).
Python v0.11.4
[#884] Fixing bad deserialization following inclusion of a default for Punctuation
Python v0.11.2
Fixes #868
Python v0.11.1
[#860] Adding TruncationSide to TruncationParams.
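To illustrate what a truncation side means in practice, here is a minimal pure-Python sketch. It is not the library's implementation: the `truncate` helper, its parameters, and the sample ids are hypothetical and only show the difference between dropping tokens from the right and from the left.

```python
from typing import List


def truncate(ids: List[int], max_length: int, side: str = "right") -> List[int]:
    """Hypothetical helper: keep at most `max_length` ids, dropping from `side`."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(ids) <= max_length:
        # Nothing to drop; return a copy so the caller's list is untouched.
        return list(ids)
    if side == "right":
        # Drop ids from the end, keeping the beginning of the sequence.
        return ids[:max_length]
    # side == "left": drop ids from the start, keeping the end of the sequence.
    return ids[len(ids) - max_length:]


ids = [101, 7592, 2088, 999, 102]  # made-up token ids for illustration
right = truncate(ids, 3, side="right")
left = truncate(ids, 3, side="left")
print(right)
print(left)
```

Truncating on the left is typically useful when the end of the input carries the most relevant context, for example the latest turns of a dialogue.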
Python v0.11.0
Fixed
- [#585] Conda version should now work on old CentOS
- [#844] Fixing interaction between is_pretokenized and trim_offsets.
- [#851] Doc links
Added
- [#657]: Add SplitDelimiterBehavior customization to Punctuation constructor
- [#845]: Documentation for Decoders.
Changed
- [#850]: Added a feature gate to enable disabling http features
- [#718]: Fix WordLevel tokenizer determinism during training
- [#762]: Add a way to specify the unknown token in SentencePieceUnigramTokenizer
- [#770]: Improved documentation for UnigramTrainer
- [#780]: Add Tokenizer.from_pretrained to load tokenizers from the Hugging Face Hub
- [#793]: Saving a pretty JSON file by default when saving a tokenizer
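As a rough illustration of what the pretty-JSON change in #793 means for the saved file, the sketch below uses only Python's standard `json` module. The `config` dictionary is a made-up stand-in, not the library's actual serialization schema; the point is only that compact and indented output encode the same data.

```python
import json

# Hypothetical, simplified stand-in for a serialized tokenizer.
config = {
    "version": "1.0",
    "model": {"type": "WordLevel", "vocab": {"[UNK]": 0, "hello": 1}},
}

# Compact form: a single line with no extra whitespace.
compact = json.dumps(config, separators=(",", ":"))

# Pretty form: indented and spread over several lines, easier to read and diff.
pretty = json.dumps(config, indent=2)

# Both forms decode to the same data; only the layout differs.
assert json.loads(compact) == json.loads(pretty)

print(compact)
print(pretty)
```

The pretty form is larger on disk but much easier to inspect and compare between versions.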