Localization for as many languages as possible #1474
Yes, fair criticism. There is neither linguistic nor typographical justification for that difference.
I'm of two minds about including keys we don't yet use, but I'll stew on that while I work on other stuff going in this release.
Fantastic work, I meant to look into pulling together sources myself...
I think this is correct. If it's not we actually have bigger issues with some of our hyphenation patterns which I think were lifted from TeX.
I think that's right, as far as I know.
I may object to this. The whole point of this string was having something in each language that is not an actual localization used in SILE output, so that tests can check that the right localization is being used. Having a substitution is important for this. Don't worry about fixing it, I'll look into it.
I think I want to scope FTL terms by package, so I'll look into fixing this and perhaps others that didn't get scoped by package.
Yup, possibly. That's what I actually tried for chapters etc. (with the parametrized
I am not even hiding where I am heading to for a revised/augmented book class 🤣 ... Actually the month names are one of the things I didn't import from Babel.... Though this could be useful for BibTeX (but I only worked on that package after...). Yet, there are other complexities there anyway (e.g. issue numbers such as "no. 5" are not only a translation issue and would need a command hook, e.g. "n° 5" in French (with a superscript o) and similarly in some other languages.)

As for the "Hello World", we can use any of the useful strings instead of it, or instantiate it as needed for demonstration purposes. It avoids having to guess it for 70+ languages... It's also very idiomatic and inherently difficult - e.g. for French I wouldn't be that at ease to find a translation... ("Bonjour Monde" is ungrammatical, an article would be needed, but for languages with genders, that's quickly messy and the pattern cannot be general any longer).
On the contrary, keying off of the potential use is exactly what we want. In fact all the keys should be tightly scoped to their intended use. This is where Fluent stands head and shoulders above other localization systems: it is not just a key/value store that leaves the intended use of strings up to the programmer to sort out, it enables the translator to know enough context about the actual usage to provide a natural and accurate translation without the programmer having to understand the complexities of target languages.

Assembling translations out of smaller building blocks is possible and in some cases a good idea, but again the way Fluent handles this is putting the translator in charge. The smaller building block translations should be private to the FTL file and not exposed to the programmer (in our case SILE & documents). Using the ToC as our case in point, at the moment the only context SILE uses this in is the header, but let's say it also spat out a CLI message saying what it was doing. You could use a term like this:

    -tableofcontents = Table of Contents
    tableofcontents-header = { -tableofcontents }
    cli-generating-toc = Now generating { -tableofcontents }

This is somewhat contrived and clumsy, but the point is to demonstrate how a term would be used. SILE would not be able to access the term, only the public messages. Each language could use or not use terms as they saw fit as an implementation detail.
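To make the term/message split above concrete, here is a minimal toy sketch in Python. It is not the real Fluent implementation and not how SILE loads FTL files: the `parse` and `format_message` helpers are hypothetical names, and the parser only understands single-line `key = value` entries with term references. It only illustrates that keys starting with `-` stay private while messages are the public surface.

```python
import re

FTL_SOURCE = """\
-tableofcontents = Table of Contents
tableofcontents-header = { -tableofcontents }
cli-generating-toc = Now generating { -tableofcontents }
"""

# Matches a term reference such as "{ -tableofcontents }".
TERM_REF = re.compile(r"\{\s*(-[A-Za-z][\w-]*)\s*\}")


def parse(source):
    """Split 'key = value' lines into private terms and public messages."""
    terms, messages = {}, {}
    for line in source.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Keys starting with "-" are terms; everything else is a message.
        (terms if key.startswith("-") else messages)[key] = value
    return terms, messages


def format_message(key, terms, messages):
    """Resolve a public message; terms are only reachable through messages."""
    if key not in messages:
        raise KeyError(f"no public message named {key!r}")
    return TERM_REF.sub(lambda m: terms[m.group(1)], messages[key])


terms, messages = parse(FTL_SOURCE)
print(format_message("cli-generating-toc", terms, messages))
# prints "Now generating Table of Contents"
```

Asking this toy resolver for `-tableofcontents` directly raises a `KeyError`, which mirrors the idea that the caller only ever sees the full contextual messages.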
I kind of figured ;-)
Yes, we're going to have to think about how to handle SILE command hooks in messages. Should we just post process the message as SIL format input?

    no-number = N\super{o} { $num }

I haven't thought through what the syntax & processing impacts of that would be but certainly that already hits a syntax conflict with braces that would not be easy to work around while staying compatible with other Fluent tooling and not being hideously ugly/unwieldy for translators. Assuming XML format on post processing might be easier, e.g.:

    no-number = N<super>o</super> { $num }
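A quick way to see the brace conflict is to look at what a Fluent-style parser would treat as a placeable in each variant. The sketch below uses a deliberately naive regular expression as a stand-in for Fluent's placeable syntax (an assumption for illustration, not the real grammar), and `placeables` is a hypothetical helper name.

```python
import re

# Naive approximation of a Fluent placeable: anything between "{" and "}".
PLACEABLE = re.compile(r"\{\s*([^{}]*?)\s*\}")

sil_style = r"N\super{o} { $num }"
xml_style = "N<super>o</super> { $num }"


def placeables(pattern):
    """Return the contents of every brace pair a Fluent-like parser would see."""
    return PLACEABLE.findall(pattern)


print(placeables(sil_style))  # ['o', '$num']: the SIL argument looks like a reference
print(placeables(xml_style))  # ['$num']: only the real variable is picked up
```

In the SIL variant the command argument `{o}` is indistinguishable from a reference to something named `o`, whereas the XML variant leaves the braces to Fluent alone.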
Yes, that's exactly why it's a useful demo term! I don't think we need to have it for every language, but it would make a good demonstration case for how French can implement it one way and a language with grammatical genders for names could implement it a different way. Having this tech demo string to play with in tests & docs be something other than a string actually used in outputs gives us the flexibility to play with it and update docs and demos without it ever being a breaking change for anyone. If we used some existing key for that we would be limited to how that key was actually used in practice.
@Omikhleia Did you generate these with a script that I could perhaps use to re-generate them with some tweaks? I have a few bulk edits I want to make (like using Fluent terms and package namespaces for keys) but it might be easier with access to the original script rather than writing one from scratch.
Lovingly but manually crafted
Roger that, and no problem. I just thought it was worth asking. By the way, when I do jobs like that, something I usually do is commit the automatic stuff first, with the command I used to generate it in the commit message so I could redo that step later if needed, then follow up with commits for the manual bits. That has served me well sometimes when I want to come back later and tweak the automatic stuff and re-apply the manual tweaks on top of the new base.
This reverts commit 17d3ce6 and bits of previous commits.
We *want* to fail with an error if localized strings are requested for non-languages, so providing generic English stand-ins seems like a counter-productive move. Additionally for anybody starting a new localization it would be better to start with a language similar to their target because implementation details might differ. With simple key/value lookups it might seem like any language could be a template for any other, but Fluent is much more flexible than that.
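As a rough illustration of the "fail loudly" behavior argued for here, the following Python sketch does a strict lookup with no English stand-in. Everything in it is hypothetical: the `CATALOGUE` dictionary, the `localize` function, and the `MissingLocalization` error are made up for the example, and the key name is borrowed from the term example earlier in this thread rather than from SILE's actual files.

```python
# Hypothetical in-memory catalogue; real data would come from i18n/*.ftl files.
CATALOGUE = {
    "en": {"tableofcontents-header": "Table of Contents"},
    "fr": {"tableofcontents-header": "Table des matières"},
}


class MissingLocalization(LookupError):
    """Raised when a language or key has no localization."""


def localize(lang, key):
    """Strict lookup: no silent fallback to English."""
    try:
        return CATALOGUE[lang][key]
    except KeyError:
        raise MissingLocalization(
            f"no localization for {key!r} in {lang!r}"
        ) from None


print(localize("fr", "tableofcontents-header"))  # prints "Table des matières"
```

Requesting a language that is not in the catalogue raises `MissingLocalization` instead of quietly returning the English string, so a missing localization shows up immediately.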
Just a heads up my current thinking on this is that I really don't want to include keys for things we don't currently use. I also want to stick to Fluent norms of only exposing the full contextual translations and making any partial translation such as the keywords used here private terms. I plan on keeping these commits around to cherry pick from as we add classes/packages/features that use these keys, but will cherry pick just the ones we use so far for the next release.
Why not just make these keys private, and move on?
Anyhow, are you going to fix, first of all, the fluent scope-leaking stuff?
While lovingly hand crafted, they don't reflect the structure of how we're going to use keys as a namespace and they are getting in the way of automatically reprocessing these files. Also we don't really want to include commented keys for untranslated terms because that is one more thing to maintain and get out of sync without necessarily providing a benefit. We want people to reference similar languages when translating.

```console
sed -i -e '/^#/d' *.ftl
sed -i -e '/^$/d' *.ftl
```

```console
sed -i -e '/Rerun SILE/d' *.ftl
git checkout -- i18n/en.ftl
```

In the future on an as-needed basis these could be converted to terms, but so far there is no use case and doing this should be a translator's implementation choice, not something SILE suggests. Prepend something to all files, then:

```vim
silent normal! 0gg/^appendix df=xd$nf{v%p
silent normal! 0gg/^part df=xd$nf{v%p
silent normal! 0gg/^chapter df=xd$nf{v%p
```

Then touch up eu, hr, lt
Refactoring the package system with modules attached to classes fixed most of the scope leak issues with Fluent. There may be more, but the ones I know of are fixed. Do note that the global fluent instance just uses whatever locale it was last set to, so it must be poked with the current document language every time it is used. I do expect to wrap this in some better abstractions as we sort out setting scope in general (see #1327 and others).

I'm leaving this PR open because there is still lots of good stuff in here to be mined. For the next release I'm only including localization of strings we use internally, but I'm not ruling out preloading more; we just need to work out the key naming a bit.

Just for the record I have a partially refactored version of the branch in this PR in my fork here that (with the commands noted in some of the commits here) makes it easy to fetch strings from a
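The note above about the global Fluent instance keeping whatever locale it was last set to can be sketched as follows. This is a stand-in written in Python, not SILE's Lua code: `GlobalLocalizer`, `set_locale`, `get_message`, and `localized` are hypothetical names, and the two-entry catalogue is made up for the example.

```python
class GlobalLocalizer:
    """Stand-in for a shared localization instance with one mutable locale."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.locale = "en"

    def set_locale(self, locale):
        self.locale = locale

    def get_message(self, key):
        return self.catalogue[self.locale][key]


fluent = GlobalLocalizer({
    "en": {"chapter": "Chapter"},
    "fr": {"chapter": "Chapitre"},
})


def localized(document_language, key):
    """Re-set the locale on every call instead of trusting leftover state."""
    fluent.set_locale(document_language)
    return fluent.get_message(key)


fluent.set_locale("fr")                 # some earlier code switched the locale
stale = fluent.get_message("chapter")   # "Chapitre", even if the document is English
fresh = localized("en", "chapter")      # "Chapter"
print(stale, fresh)
```

A wrapper like `localized` is one simple way to do the "poke it every time" step in a single place; a better abstraction would presumably carry the language with the scope instead of mutating shared state.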
This old PR never came to fruition as-is and now has conflicts. Due to inactivity, and as part of backlog cleaning, I would have been tempted to close it, but the topic was interesting. But we did address the important parts of it. On one hand:
However, it never happened in 2+ years and the conflicts and recent changes won't help... Moreover, on the other hand, anyone is free, as I did back then, to "mine" Babel and other existing resources and propose something to SILE in today's state of the art. So I'm adding "pending closure" as a label, as a last chance for contributors to have their say.
I have some local notes on things I still hope to mine out of this, so let's keep it around until I finish. The merge conflicts aren't worth fixing en masse, but cherry picking out of here is still useful; my doing so is pending some other language handling changes.
I was a bit annoyed since my first attempts with SILE 0.10 that in English, with the standard book class, I would get "Chapter 1" whereas in French I would still get only "1". Now that we have that fluent thing with i18n files, I decided to give it a try. After all, seeing what was done for Norwegian and Esperanto, it ought to be a matter of "just" providing a few translation strings... Er... Wait, we just have a few lone languages supported? And how are we expecting this to go on, language after language, and at the same time be sort of future proof? (What about "parts", "list of figures", "list of tables", etc. -- which some of us may already have in their classes)...
This PR therefore:

- Attempts to provide i18n files for as many supported languages as possible
  - for languages I kind of know: Except in very rare cases, I eventually ended up following Babel. (E.g. an exception to this is the ToC header, which resolved to "Contents" in Babel for `en`, rather than our "Table of Contents" - I kept the latter; a contrary example corresponds mainly to cases where differences were actually a matter of taste, e.g. "table des figures" vs. "table des illustrations")
- Handle a few complex cases
  - `el` expects to be, for now the localization file is a link to `el-monoton` (corresponding to modern Greek)
  - `nb` and `no` have the same localization, but `nn` has its own... (Our recent Norwegian friend only provided the translations for Bokmål, but Nynorsk is slightly different in written form... I followed Babel once again, but also checked a few differences on https://ordbokene.no to be sure)
- Get rid of the awkward `hello { $name }` patterns... `en` and `tr` as they are used as examples in the Manual, but honestly, I'd want them to go away.
- Get rid of `book-chapter-title-pre` and change `book:chapter:post`. These were mixing translation issue and (vertical) spacing, this was IMHO pretty bad and the translation part shall all be done in the fluent templates... Currently, the only language using a `\medskip` instead of a `\par` is Japanese... Honestly again, unless there's a real good reason, I'd expect this to go away too...
- Change the toc key. Anyway, the Manual ("c08-language") mentioned `toc-heading` which did not exist... It was actually `toc-title`, I changed that to `tableofcontents` anyway, just because.
- One annoying thing for languages without available localization is that a `\tableofcontents` would yield nothing at all in the first output (no header, but also no message at first SILE run...). So even if the proposed pattern file is mostly commented out in these cases, some of the keys preferably have to be present to avoid that ugly case (i.e. using the English wording if nothing else is currently available). That's true too for some bibliography patterns (it's still better than nothing to at least get the name parameter...)

Ah ah, now we can play...
I started investigating this, and I am finding some other issues... (E.g. that fluent thing seems to leak beyond its expected scope, with possible code smell at one point -- I will go on looking before opening a dedicated report, but it's kind of orthogonal to this very PR).