- Backward compatibility with Rust 1.21.0 and earlier.
- Downstream dependency: `num-traits` 0.1 → 0.2
This release brings extended hyphenation, a substantial API reform, and a more flexible approach to dictionary loading.
- API reform. The `hyphenation` API is now based on concrete dictionary types (“hyphenators”), and includes a richer set of methods and capabilities. Refer to the documentation for more.
- Full-text hyphenation has been removed. To hyphenate words within text runs, see the notes on segmentation.
- Hyphenation patterns are not normalized by default. Normalization is now behind feature flags.
- Hyphenation dictionaries are bundled with the artifact, and need only be rebuilt if normalization is required.
- Hyphenation dictionaries are not embedded by default; to enable embedding, use the `embed_all` feature.
- Extended (“non-standard”) hyphenation, available for Hungarian and Catalan. Extended hyphenation can improve orthography by altering or inserting characters around word breaks.
- Supported languages now include:
  - Belarusian (be)
  - Ecclesiastical Latin (la-x-liturgic)
  - Pāli (pi)
- Special casing. Words containing characters that change byte size when lowercased (such as the Turkish “İ”) are now hyphenated correctly.
- Hyphenation dictionaries are ~40% smaller.
- Hyphenation is ~20% faster.
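
The special-casing note above can be illustrated with the standard library alone. The sketch below does not use the `hyphenation` API; it only shows that lowercasing “İ” (U+0130) changes a word's UTF-8 length, so a byte offset computed on a lowercased copy need not point at the same place in the original word. The helper `lowercase_byte_delta` and the sample word are assumptions made for illustration.

```rust
/// Hypothetical helper: how many bytes a word gains (or loses, if negative)
/// when it is lowercased.
fn lowercase_byte_delta(word: &str) -> isize {
    word.to_lowercase().len() as isize - word.len() as isize
}

fn main() {
    // "İ" (U+0130) takes 2 bytes in UTF-8.
    let word = "İstanbul";
    let lower = word.to_lowercase();

    // Rust lowercases U+0130 to "i" followed by U+0307 (combining dot above),
    // which takes 1 + 2 = 3 bytes: one byte more than the original character.
    assert_eq!(lower, "i\u{307}stanbul");
    assert_eq!(lowercase_byte_delta(word), 1);

    // The same byte offset therefore lands in different places in each string.
    assert_eq!(&lower[3..], "stanbul");
    assert_eq!(&word[3..], "tanbul");

    println!("{} -> {} ({:+} byte)", word, lower, lowercase_byte_delta(word));
}
```

Any scheme that finds breaks in a lowercased copy and then applies them to the original word has to account for this shift; how the crate does so internally is not described here.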
- `serde` → 1.0
- `bincode` → 1.0
- Slightly reduced embedded pattern size.
- Unicode normalization can be configured via build flags.
- Patterns are now serialized by `bincode` at build time.
- Hyphenation is faster by:
  - 10–20%, generally;
  - 40–60%, for lowercase ASCII words (in a mostly ASCII language).
- Embedded pattern size increased by ~25%.
- The `Standard` hyphenating iterator implements `size_hint()` and `ExactSizeIterator`.
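
To show what an exact `size_hint()` provides, here is a minimal standard-library sketch. `Segments` is a hypothetical iterator written for this example, not the crate's `Standard` iterator, and the break indices are supplied by hand rather than computed from a dictionary.

```rust
/// Hypothetical iterator over the segments of a word, split at known breaks.
struct Segments<'a> {
    word: &'a str,
    breaks: &'a [usize], // ascending byte indices, each on a char boundary
    start: usize,        // byte index where the next segment begins
    done: bool,          // set once the final segment has been yielded
}

impl<'a> Segments<'a> {
    fn new(word: &'a str, breaks: &'a [usize]) -> Self {
        Segments { word, breaks, start: 0, done: false }
    }
}

impl<'a> Iterator for Segments<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.done {
            return None;
        }
        match self.breaks.split_first() {
            Some((&end, rest)) => {
                let segment = &self.word[self.start..end];
                self.start = end;
                self.breaks = rest;
                Some(segment)
            }
            None => {
                // No breaks left: yield the remainder of the word once.
                self.done = true;
                Some(&self.word[self.start..])
            }
        }
    }

    // The remaining count is always known: one segment per break, plus one.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = if self.done { 0 } else { self.breaks.len() + 1 };
        (remaining, Some(remaining))
    }
}

// Sound because `size_hint` returns identical lower and upper bounds.
impl<'a> ExactSizeIterator for Segments<'a> {}

fn main() {
    let breaks = [2, 6];
    let segments = Segments::new("hyphenation", &breaks);
    // `len()` is available without consuming the iterator.
    let mut out = Vec::with_capacity(segments.len());
    out.extend(segments);
    assert_eq!(out.join("-"), "hy-phen-ation");
}
```

With an exact length, callers can preallocate buffers or use adapters that require `ExactSizeIterator`.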
- Reverts a change made in 0.3.0: the `Hyphenation` trait only performs word hyphenation. Full-text hyphenation is now part of the `FullTextHyphenation` trait.
- Language support: added Church Slavonic.
- Updated patterns to hyph-utf8.json v0.1.0.0.
  Previously, patterns containing multibyte letters with combining marks could be mapped to mismatching scores of incorrect length, due to normalization issues. The following languages were affected:
  - Assamese
  - Bengali
  - Panjabi
  - Sanskrit
- The pattern repository is now bundled with the crate, and no longer requires manual initialization.
- Hyphenation of lowercase words is ~10% faster.
- `Patterns` and `Exceptions` now return preexisting values on `insert()`, analogously to `HashMap`.
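
The `HashMap` convention referred to above is that `insert()` returns `None` for a new key and `Some(old_value)` when it replaces an existing entry. The sketch below shows that behavior with a hypothetical `PatternStore` wrapper; it is not the crate's `Patterns` type, and the score vectors are arbitrary placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a pattern store, backed by a `HashMap`.
struct PatternStore {
    map: HashMap<String, Vec<u8>>,
}

impl PatternStore {
    fn new() -> Self {
        PatternStore { map: HashMap::new() }
    }

    /// Inserts a pattern and returns the scores it replaced, if any.
    fn insert(&mut self, pattern: &str, scores: Vec<u8>) -> Option<Vec<u8>> {
        self.map.insert(pattern.to_string(), scores)
    }
}

fn main() {
    let mut store = PatternStore::new();

    // First insertion: nothing was stored under this key before.
    assert_eq!(store.insert("hy", vec![0, 1, 0]), None);

    // Second insertion: the previous scores are handed back to the caller.
    assert_eq!(store.insert("hy", vec![0, 2, 0]), Some(vec![0, 1, 0]));
}
```

Returning the old value lets a caller detect accidental overwrites without a separate lookup.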
- Language loading time is halved.
- Word hyphenation is ~15% faster.
- The `Hyphenation` trait accepts full text as well as individual words. `Standard` hyphenators expose a `punctuate_with()` method.
- Loading a language is ~30% faster.
- Hyphenation patterns are stored as JSON.