Accents, DIN 91379, non-Latin scripts
To process text containing letters composed of multiple Unicode glyphs, e.g. letters with accents, it is necessary to compute the correct positioning of the glyphs and encode these positions in the resulting PDF file.
OpenPDF can process such texts starting with release 1.3.24.
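As a rough illustration of why such letters need glyph positioning, the following sketch uses only the JDK's `java.text.Normalizer`. It is not part of OpenPDF; the class and method names are made up for this example. It shows that some accented letters collapse into a single precomposed code point, while others (here `M` followed by U+0302 COMBINING CIRCUMFLEX ACCENT) remain a base letter plus a combining mark that the renderer has to place.

```java
import java.text.Normalizer;

// Hypothetical helper class for illustration only; not part of OpenPDF.
class CombiningSequenceDemo {

    /** Counts the Unicode code points in the NFC-normalized form of the text. */
    static int codePointsAfterNfc(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    /** Returns true if a combining mark is still present after NFC normalization. */
    static boolean hasCombiningMark(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.codePoints().anyMatch(cp -> {
            int type = Character.getType(cp);
            return type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;
        });
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301";      // e + combining acute, has a precomposed form
        String mCircumflex = "M\u0302"; // M + combining circumflex, no precomposed form

        System.out.println(codePointsAfterNfc(eAcute) + " " + hasCombiningMark(eAcute));
        System.out.println(codePointsAfterNfc(mCircumflex) + " " + hasCombiningMark(mCircumflex));
    }
}
```

Text for which `hasCombiningMark` returns true cannot be rendered correctly by drawing one glyph per character at its default advance; the mark has to be positioned relative to its base letter using the font's positioning data.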
Internally, OpenPDF uses Java2D's built-in routines for glyph layout, reordering, and substitution. In Java 9 and newer, these routines rely on the HarfBuzz shaping library.
We tested this approach with letters conforming to "DIN 91379: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" (and the predecessor DIN SPEC 91379) which describes a subset of Unicode consisting mainly of Latin letters and diacritic signs. This standard will be mandatory for the data exchange of the German administration with citizens and businesses from Nov. 2024.
Processing text in other languages and scripts with this approach should be possible. You are invited to try it and share the results.
import com.lowagie.text.pdf.LayoutProcessor;
...
LayoutProcessor.enableKernLiga(); // since 1.3.31
In versions before 1.3.31, kerning and ligatures are not supported; enable the processor with:
LayoutProcessor.enable(); // before 1.3.31
Provide an OpenType font containing the necessary characters and positioning information, e.g. a font from the Google Noto fonts. If no OpenType font is provided, LayoutProcessor will do nothing.
float fontSize = 12.0f;
String fontDir = "com/lowagie/examples/fonts/";
FontFactory.register(fontDir+"noto/NotoSans-Regular.ttf", "notoSans");
Font notoSans = FontFactory.getFont("notoSans", BaseFont.IDENTITY_H, true, fontSize);
Java's Bidi class is used to deduce the text direction for each chunk of text.
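The sketch below illustrates how `java.text.Bidi` classifies a chunk of text. It uses only the JDK and does not reproduce OpenPDF's internal logic; the class name, the method name, and the choice of `DIRECTION_DEFAULT_LEFT_TO_RIGHT` as fallback are assumptions made for this example.

```java
import java.text.Bidi;

// Hypothetical helper class for illustration only; not part of OpenPDF.
class BidiDirectionDemo {

    /** Describes the direction of a chunk of text as "LTR", "RTL" or "MIXED". */
    static String direction(String chunk) {
        // Base direction is taken from the first strong character;
        // left-to-right is assumed if the chunk contains none.
        Bidi bidi = new Bidi(chunk, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return "LTR";
        }
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        return "MIXED";
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // Arabic letters only

        System.out.println(direction("OpenPDF"));
        System.out.println(direction(arabic));
        System.out.println(direction("PDF " + arabic));
    }
}
```

A chunk that mixes scripts is reported as `MIXED`, which is one situation where fixing the direction per font, as described next, may be the more predictable option.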
Optionally, you can specify the text direction per font.
Font notoSansArabic = getFont(fontDir+"noto/NotoSansArabic-Regular.ttf", "notoSansArabic", fontSize);
LayoutProcessor.setRunDirectionRtl(notoSansArabic);
Process the document or form as usual.
GlyphLayoutDocumentDin91379.java
GlyphLayoutDocumentBidiPerFont.java
GlyphLayoutDocumentKernLigaPerFont.java
- DIN 91379 (English Wikipedia)
- DIN 91379 (German Wikipedia)
- DIN 91379 Characters and Sequences (GitHub)
- String.Latin+ 1.2 (extended and commented version of DIN SPEC 91379 in German, free download)
- DIN SPEC 91379: Characters in Unicode for the electronic processing of names and data exchange in Europe (free download after registration)
- DIN 91379:2022-08: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM (access chargeable)
- Decision of IT Planungsrat 2022/51 (in German)
- Noto Latin, Greek, Cyrillic fonts, see Google, GitHub
- HarfBuzz text shaping library