Localization Proposal
The main goal is to make it possible to present MathJax's user interface elements in languages other than English. This includes things like the MathJax menu, the About MathJax dialog, the loading messages, and the various error messages produced by the input jax. This document describes a proposal for the underlying code and data structures for implementing this in MathJax.
The code must be able to handle the following:
- expressions with substitution values (e.g., "file xxx not found")
- plural forms (e.g. "loaded xx file" versus "loaded xx files")
- multiple forms for a word (e.g., "Post" as a verb versus "Post" as a noun)
- HTML-snippets as defined in MathJax (since many dialogs are constructed from these)
- fallback to English when translations are not available
- translations for dynamically loaded components
- components that may not all come from the same location
- third-party translations
The mechanism for specifying the selected language has yet to be determined, but the page author should be able to give a default language, and users should be able to override that if they choose.
A new `Localization` object will be added to the `MathJax` variable to handle localization functions. This will include the data needed for the translations into the selected language, the methods to be called for obtaining those translations, and the methods needed for loading and registering translations.
Currently all messages used in MathJax are in English, and the text of these messages is usually hard-coded as literal strings at the locations where the messages are used. (Some messages are constructed on the fly from smaller pieces. These messages may need to be handled differently to allow for easier translation.) This is convenient since it is easy to see what message will be produced at any particular point, but in order to allow MathJax to be localized, these strings will need to be replaced by function calls that obtain the translation appropriate for the selected language.
One approach would be to use these message strings as the keys for looking up the translations, but this would make it harder to modify the English messages if rewording were required, or if spelling errors were found. Instead, each message will have an ID string that will be used to identify the phrase so that the English can be changed without requiring all the translation files to be modified to reflect the change. This also has the advantage that the same word or phrase, when used in different ways, can have different identifiers, so "Post" as a verb and "Post" as a noun can be translated differently, if necessary.
The basic means of obtaining the string to use for a message to display to the user is to call the `_()` method of the `MathJax.Localization` object, passing the string id and the English phrase. For example,
MathJax.Message.Set("Typesetting complete");
could be replaced by
MathJax.Message.Set(_("TC","Typesetting complete"));
where `"TC"` is the identifier for the message `"Typesetting complete"`, provided you have defined
var _ = function () {return MathJax.Localization._.apply(MathJax.Localization,arguments)};
earlier. (Since most of MathJax is defined within a function closure, making such function shortcuts is straightforward.)
The advantage of having both the identifier and the English string together is that
- You still can see the actual English message at the location in the code where it is used.
- The English version is available to use as a fallback if the phrase has not been translated into the selected language.
- The English translation doesn't need to be loaded separately (i.e., you don't need to load two language files, the selected one, plus English for fallback, and English users won't need to download any language files at all).
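The fallback behaviour described above can be sketched as follows. This is only an illustration, not the proposed implementation: the `lookup` function, the shape of the `translations` table, and the French phrase are assumptions made for the example.

```javascript
// Hypothetical translation table: locale -> id -> translated phrase.
var translations = {
  fr: { TC: "Composition terminée" }
};

// Return the translation for `id` in `locale`, or the English phrase
// supplied by the caller when no translation is available.
function lookup(locale, id, english) {
  var table = translations[locale];
  if (table && Object.prototype.hasOwnProperty.call(table, id)) {
    return table[id];
  }
  return english;
}

console.log(lookup("fr", "TC", "Typesetting complete")); // translated phrase
console.log(lookup("de", "TC", "Typesetting complete")); // English fallback
```

Because the English phrase travels with the call, no English data file is needed for the fallback case.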
Using short identifiers can lead to collisions if not handled carefully. To help avoid this, we introduce identifier domains that are used to isolate collections of identifiers for one component of MathJax from those for another component. For example, each input jax could have its own domain, as could each extension. This means you only have to worry about collisions within your own domain, and so can more easily manage the uniqueness of the ids in use.
To use a domain with your id, pass `_()` an array consisting of the domain and the id in place of the id. For example, the TeX input jax could use
TEX.Error(_(["TeX","mb"],"Missing Close Brace"));
to get the message with id `"mb"` in the domain `"TeX"`. Note that the local definition for `_()` within the TeX input jax could be
var _ = function (id) {return MathJax.Localization._.apply(MathJax.Localization,[ ["TeX",id] ].concat([].slice.call(arguments,1)))};
in which case the message above could become
TEX.Error(_("mb","Missing Close Brace"));
This lets you avoid having to repeat the domain within every call to `_()` in the input jax. (It would also be possible for `TEX.Error()` to call `_()` for you, but see below for information about obtaining the translation data.)
The default domain is `"*"`.
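As a rough sketch of how domains keep identical ids apart, the following uses a hypothetical per-domain table and a hypothetical `forDomain` helper that builds a domain-bound shortcut. None of these names or translations come from MathJax itself; they are assumptions for illustration.

```javascript
// Hypothetical per-domain tables for one locale: domain -> id -> phrase.
var domains = {
  "TeX": { mb: "Accolade fermante manquante" },
  "*":   { mb: "Mo" }
};

// Accept either a bare id (default domain "*") or a [domain, id] pair,
// and fall back to the English phrase when nothing is found.
function lookup(id, english) {
  var domain = "*";
  if (Array.isArray(id)) { domain = id[0]; id = id[1]; }
  var table = domains[domain];
  if (table && Object.prototype.hasOwnProperty.call(table, id)) {
    return table[id];
  }
  return english;
}

// Build a shortcut bound to one domain, as an input jax might do locally.
function forDomain(domain) {
  return function (id) {
    var rest = Array.prototype.slice.call(arguments, 1);
    return lookup.apply(null, [[domain, id]].concat(rest));
  };
}

var _ = forDomain("TeX");
console.log(_("mb", "Missing Close Brace")); // looked up in the "TeX" domain
console.log(lookup("mb", "MB"));             // same id, default "*" domain
```

The same id `"mb"` resolves to different phrases because each lookup is confined to its own domain.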
Many messages need to include words that are not available until run time (like file names, or a token that is causing an error, etc.). To include such values in a message, pass the values to `_()` following the main message string, and use `%1`, `%2`, etc., within the message to indicate where to put the additional strings. For example
MathJax.Message.Set(_("fnf","File %1 not found",file));
or
TEX.Error(_("seen","'%1' seen where '%2' was expected",token,delimiter));
Note that the extra arguments can be used in any order (in particular, a translation may put them in a different order), so
TEX.Error(_("seen","'%2' was expected where '%1' was seen",token,delimiter));
would also be valid.
Although it would be rare to need more than 9 additional parameters, you can use `%10`, `%11`, etc., to get the 10th, 11th, and so on. If you need a parameter to be followed directly by a number, use `%{1}0` rather than `%10`.
A `%` followed by a non-number (and not matching `%\{\d+\}` as a regular expression) generates just the character following the percent, so `%%` is a literal `%`, and `%:` would generate just `:`.
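The substitution rules above can be sketched with a single regular expression. The `substitute` function is a hypothetical helper, not part of MathJax, and leaving a placeholder untouched when no matching argument was supplied is an assumption, since the proposal does not say what should happen in that case.

```javascript
// Replace %N and %{N} with the N-th value in `args` (1-based); any other
// character following a % is emitted by itself, so "%%" yields "%".
function substitute(message, args) {
  return message.replace(
    /%(?:\{(\d+)\}|(\d+)|([\s\S]))/g,
    function (match, braced, plain, other) {
      if (other !== undefined) return other;
      var index = parseInt(braced !== undefined ? braced : plain, 10) - 1;
      var value = args[index];
      // Assumption: keep the placeholder if no such argument was supplied.
      return value === undefined ? match : String(value);
    }
  );
}

console.log(substitute("File %1 not found", ["a.js"]));   // File a.js not found
console.log(substitute("'%2' expected, '%1' seen", ["x", "}"]));
console.log(substitute("%{1}0%% done", [5]));             // 50% done
```

Trying the braced form before the plain digits is what lets `%{1}0` mean "parameter 1 followed by a 0" while `%10` means "parameter 10".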
If a message must be represented differently depending on a particular numeric value (say to distinguish between "1 file loaded" and "2 files loaded"), replace the message by an array consisting of the numeric value followed by the strings to use when that value is 1, 2, 3, etc., where the last string is used if the numeric value is outside the number of entries given. For example,
MathJax.Message.Set(_("fl",[n,"%1 file loaded","%1 files loaded"],n));
would select `"%1 file loaded"` when `n` is 1, and `"%1 files loaded"` for any other value of `n`. Then that string is used as the message, with the value of `n` inserted for `%1` (since `n` is passed as the third parameter to `_()`).
If you need a different value for 0, for example, you could use something like
MathJax.Message.Set(_("fl",[n+1,"No files loaded","%1 file loaded","%1 files loaded"],n));
to select the string based on `n+1` rather than `n`.
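A minimal sketch of the selection rule follows, assuming a hypothetical `selectPlural` helper. It treats an array whose first entry is a number as a plural specification and uses the last form whenever the number does not correspond to one of the entries given.

```javascript
// Pick the form for the numeric value at the head of the array:
// 1 selects the first form, 2 the second, and so on; any value outside
// 1..forms.length selects the last form.
function selectPlural(message) {
  if (!Array.isArray(message) || typeof message[0] !== "number") {
    return message; // not a plural specification, use as-is
  }
  var n = message[0];
  var forms = message.slice(1);
  var inRange = Number.isInteger(n) && n >= 1 && n <= forms.length;
  return forms[inRange ? n - 1 : forms.length - 1];
}

console.log(selectPlural([1, "%1 file loaded", "%1 files loaded"])); // %1 file loaded
console.log(selectPlural([7, "%1 file loaded", "%1 files loaded"])); // %1 files loaded

// Shifting by one gives 0 its own form, as in the n+1 example above.
var n = 0;
console.log(selectPlural([n + 1, "No files loaded", "%1 file loaded", "%1 files loaded"]));
```

The selected string would then go through parameter substitution as for any other message.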
A number of the dialogs used in MathJax are defined using HTML snippets, which allow you to encode an HTML DOM fragment using JavaScript objects. These can include things like bold and italic indicators, as well as other styling or layout. While it is possible to break these into pieces to pass to `_()` separately, it may be better to allow the translator to translate the complete snippet, so that styling and layout can be properly adjusted for the target language. Thus `_()` allows a complete HTML snippet in place of the message string (and will return an HTML snippet rather than a string literal). E.g.,
MathJax.HTML.Element("span",{},_("dtn",["Do this ",["b",null,["now!"]]]));
would get the translation for the snippet (that is effectively `Do this <b>now!</b>`) and put it in a `<span>`.
If the snippet depends on a numeric value for its plural form, then you can use an array that consists of a number followed by the various HTML snippets; the snippet corresponding to the given numeric value will be selected (just as it was for strings above). E.g.,
MathJax.HTML.Element("span",null,_("fl",[n,
  ["%1 ",["b",null,"file"]," loaded"],
  ["%1 ",["b",null,"files"]," loaded"]
],n));
would return a DOM element representing `<span>1 <b>file</b> loaded</span>` if `n` is 1, but `<span>3 <b>files</b> loaded</span>` if `n` is 3.
Note that parameter substitution is performed on the strings of the snippet that will become text in the DOM fragment that is generated from the snippet.
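One way to picture substitution inside a snippet is a recursive walk that rewrites only the strings that become text nodes. The sketch below is an assumption about how this could work, using hypothetical helpers `substitute` and `substituteSnippet`; for brevity the substitution handles only the plain `%1`, `%2` forms, and attribute objects are left untouched.

```javascript
// Simplified substitution: handles only %1, %2, ... (for brevity).
function substitute(text, args) {
  return text.replace(/%(\d+)/g, function (match, n) {
    var value = args[parseInt(n, 10) - 1];
    return value === undefined ? match : String(value);
  });
}

// Walk a snippet: strings are text; arrays are [tag, attributes, contents].
function substituteSnippet(snippet, args) {
  return snippet.map(function (item) {
    if (typeof item === "string") return substitute(item, args);
    var contents = item[2];
    if (typeof contents === "string") contents = [contents];
    return [item[0], item[1], contents ? substituteSnippet(contents, args) : contents];
  });
}

var result = substituteSnippet(["%1 ", ["b", null, ["files"]], " loaded"], [3]);
console.log(JSON.stringify(result)); // ["3 ",["b",null,["files"]]," loaded"]
```

The returned value is still a snippet, so it can be handed to a DOM-building routine in the same way as the original.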
Some words or phrases may be used in more than one way, and these may require different translations. For example, "Post" may be used as a verb as a button label, while "Post" as a noun could refer to a blog post. These may need to be translated into different words or phrases in another language. Since a translator will be presented with the same word ("Post") in both cases, you may need to give the translator more help in determining how the word will be used. You do this by providing an extra argument following the message string (or array) that indicates the extra data to be shown to the translator. For example
_("pn","Post",{form:"noun"})
or
_("pv","Post",{form:"verb"})
Note that the id is different for these two, so there will be two values for the translator; the `form` tells the translator how the word is used. The value for `form` can be anything that will help the translator figure out how best to translate the word, e.g.,
_("pcol","Post",{form:"column name"})
In fact, you can supply as much meta-data between the braces as you would like. [I'm not sure yet how this will be used, other than `form`, but it gives flexibility for the future.]
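Since the meta-data object is optional and shares its position with the substitution values, a `_()`-style function has to tell the two apart. The sketch below shows one possible convention, which is an assumption rather than part of the proposal: a plain (non-array) object in third position is taken to be meta-data. The `parseCall` helper is hypothetical.

```javascript
// Split the arguments of a `_()`-style call into id, message, meta-data,
// and substitution values.
function parseCall(id, message) {
  var rest = Array.prototype.slice.call(arguments, 2);
  var meta = {};
  // Assumption: a non-null, non-array object in third position is meta-data.
  var first = rest[0];
  if (rest.length && first !== null && typeof first === "object" && !Array.isArray(first)) {
    meta = rest.shift();
  }
  return { id: id, message: message, meta: meta, args: rest };
}

console.log(parseCall("pn", "Post", { form: "noun" }).meta.form);        // noun
console.log(parseCall("fnf", "File %1 not found", "a.js").args.length);  // 1
```

Under this convention the meta-data never reaches the substitution step, so it cannot be confused with the value for `%1`.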
The `MathJax.Localization` object holds the data for the various translations, as well as the service routines for adding to the translations and retrieving translations.
The methods in `MathJax.Localization` include:
- `_(id,message[,form][,arguments])`: The function described in detail above that returns the translated string for a given id.
- `setLocale(locale)`: Sets the selected locale to the given one, e.g. `MathJax.Localization.setLocale("fr");`
- `addTranslation(locale,domain,def)`: Defines (or adds to) the translation data for the given `locale` and `domain`. The `def` is the definition to be merged with the current translation data (if it exists) or to be used as the complete definition (if not). The data format is described below.
- `locale`: The currently selected locale, e.g., `"fr"`. This is set by the `setLocale()` method, and should not be modified by hand.
- `directory`: The URL for the localization data files. This can be overridden for individual languages or domains (see below). The default is `[MathJax]/localization`.
- `strings`: This is the main data structure that holds the translation strings. It consists of an entry for each language that MathJax knows about, e.g., there would be an entry with key `fr` whose value is the data for the French translation. Initially, these simply reference the files that define the translation data, which MathJax will load when needed. After the file is loaded, they will contain the translation data as well. This is described in more detail below.
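To make the roles of these members concrete, here is a small stand-in object showing only the data handling. It is not MathJax code. It assumes that `def` has the same shape as a domain entry (an object with a `strings` map), and the French phrases are purely illustrative.

```javascript
// Hypothetical stand-in for MathJax.Localization (data handling only).
var Localization = {
  locale: "en",
  directory: "[MathJax]/localization",
  strings: {},

  setLocale: function (locale) {
    this.locale = locale;
  },

  // Merge `def` into the data for (locale, domain), creating entries as needed.
  addTranslation: function (locale, domain, def) {
    var lang = this.strings[locale] || (this.strings[locale] = {});
    if (!lang.domains) lang.domains = {};
    var data = lang.domains[domain] || (lang.domains[domain] = { strings: {} });
    if (!data.strings) data.strings = {};
    for (var id in def.strings) {
      if (Object.prototype.hasOwnProperty.call(def.strings, id)) {
        data.strings[id] = def.strings[id];
      }
    }
  }
};

// Two calls for the same locale and domain accumulate rather than overwrite.
Localization.addTranslation("fr", "hub", { strings: { fnf: "Fichier '%1' introuvable" } });
Localization.addTranslation("fr", "hub", { strings: { fl: ["%1 fichier chargé", "%1 fichiers chargés"] } });
Localization.setLocale("fr");
console.log(Object.keys(Localization.strings.fr.domains.hub.strings).join(","));
```

Merging per id is what would allow a third-party extension to contribute strings to a language without replacing what is already registered.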
Each language has its own data in the `MathJax.Localization.strings` structure. This structure holds data about the translation, plus the translated strings for each domain.
A typical example might be
fr: {
version: "1.0",
directory: "[MathJax]/localization/fr", // optional
file: "fr.js", // optional
isLoaded: true, // set when loaded
font: "...", // optional
meta: {
translator: "...", // other metadata could be added
},
domains: {
hub: {
version: "1.0",
file: "http://somecompany.com/MathJax/localization/fr/hub.js", // optional
strings: {
fnf: "File '%1' not found",
fl: ["%1 file loaded","%1 files loaded"],
...
}
},
TeX: {
...
},
"*": {
...
},
...
}