Localization Proposal
The main goal is to make it possible to present MathJax's user interface elements in languages other than English. This includes things like the MathJax menu, the About MathJax dialog, the loading messages, and the various error messages produced by the input jax. This document describes a proposal for the underlying code and data structures for implementing this in MathJax.
The code must be able to handle the following:
- expressions with substitution values (e.g., "file xxx not found")
- plural forms (e.g. "loaded xx file" versus "loaded xx files")
- multiple forms for a word (e.g., "Post" as a verb versus "Post" as a noun)
- HTML-snippets as defined in MathJax (since many dialogs are constructed from these)
- fallback to English when translations are not available
- translations for dynamically loaded components
- components that may not all come from the same location
- third-party translations
The mechanism for specifying the selected language has yet to be determined, but the page author should be able to give a default language, and users should be able to override that if they choose.
A new `Localization` object will be added to the `MathJax` variable to handle localization functions. This will include the data needed for the translations into the selected language, the methods to be called for obtaining those translations, and the methods needed for loading and registering translations.
Currently all messages used in MathJax are in English, and the text of these messages is usually hard-coded as literal strings at the locations where the messages are used. (Some messages are constructed on the fly from smaller pieces. These messages may need to be handled differently to allow for easier translation.) This is convenient since it is easy to see what message will be produced at any particular point, but in order to allow MathJax to be localized, these strings will need to be replaced by function calls that obtain the translation appropriate for the selected language.
One approach would be to use these message strings as the keys for looking up the translations, but this would make it harder to modify the English messages if rewording were required, or if spelling errors were found. Instead, each message will have an ID string that will be used to identify the phrase so that the English can be changed without requiring all the translation files to be modified to reflect the change. This also has the advantage that the same word or phrase, when used in different ways, can have different identifiers, so "Post" as a verb and "Post" as a noun can be translated differently, if necessary.
The basic means of obtaining the string to use for a message to display to the user is to call the _()
method of the MathJax.Localization
object, passing the string id and the English phrase. For example,
MathJax.Message.Set("Typesetting complete");
could be replaced by
MathJax.Message.Set(_("TC","Typesetting complete"));
where `"TC"` is the identifier for the message `"Typesetting complete"`, provided you have defined
var _ = function () {return MathJax.Localization._.apply(MathJax.Localization,arguments)};
earlier. (Since most of MathJax is defined within a function closure, making such function shortcuts is straightforward.)
The advantage of having both the identifier and the English string together is that
- You still can see the actual English message at the location in the code where it is used.
- The English version is available to use as a fallback if the phrase has not been translated into the selected language.
- The English translation doesn't need to be loaded separately (i.e., you don't need to load two language files, the selected one, plus English for fallback, and English users won't need to download any language files at all).
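The lookup-with-fallback behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the `Localization` object here is a simplified stand-in for the proposed `MathJax.Localization`, it handles only the default domain, and the French string is made-up sample data.

```javascript
// Hypothetical, simplified stand-in for MathJax.Localization (not the real object).
var Localization = {
  locale: "fr",
  strings: {
    // Sample data only; the French text is illustrative.
    fr: { domains: { "_": { strings: { TC: "Composition terminée" } } } }
  },
  // Return the translation for id in the default domain, or the English phrase.
  _: function (id, phrase) {
    var data = this.strings[this.locale];
    var domain = data && data.domains && data.domains["_"];
    var strings = domain && domain.strings;
    if (strings && Object.prototype.hasOwnProperty.call(strings, id)) {
      return strings[id];
    }
    return phrase; // fallback: the English text given at the call site
  }
};

// Local shortcut; note that it must return the result of the call.
var _ = function () {
  return Localization._.apply(Localization, arguments);
};
```

With this sketch, `_("TC","Typesetting complete")` yields the French string while the locale is `"fr"`, and the English phrase for any id or locale that has no translation, without loading a separate English file.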
Using short identifiers can lead to collisions if not handled carefully. To help avoid this, we introduce identifier domains that are used to isolate collections of identifiers for one component of MathJax from those for another component. For example, each input jax could have its own domain, as could each extension. This means you only have to worry about collisions within your own domain, and so can more easily manage the uniqueness of the ids in use.
To use a domain with your id, pass _()
an array consisting of the domain and the id in place of the id. For example, the TeX input jax could use
TEX.Error(_(["TeX","mb"],"Missing Close Brace"));
to get the message with id `"mb"` in the domain `"TeX"`. Note that the local definition for `_()` within the TeX input jax could be
var _ = function (id) {return MathJax.Localization._.apply(MathJax.Localization,[ ["TeX",id] ].concat([].slice.call(arguments,1)))};
in which case the message above could become
TEX.Error(_("mb","Missing Close Brace"));
This lets you avoid having to repeat the domain within every call to _()
in the input jax. (It would also be possible for TEX.Error()
to call _()
for you, but see below for information about obtaining the translation data.)
The default domain is `"_"`.
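One way to avoid rewriting the domain-bound shortcut in every component is a small factory function. The sketch below is an assumption, not part of the proposal: `makeLocalizer` is a hypothetical name, and the `Localization._` stand-in merely reports which domain and id it was asked for so that the forwarding is visible.

```javascript
// Hypothetical stand-in for MathJax.Localization._ that only reports its inputs.
var Localization = {
  _: function (id, phrase) {
    var domain = "_", key = id;              // "_" is the default domain
    if (Array.isArray(id)) { domain = id[0]; key = id[1]; }
    return domain + ":" + key + ":" + phrase;
  }
};

// Build a shortcut bound to one domain (makeLocalizer is an illustrative name).
function makeLocalizer(domain) {
  return function (id) {
    // Forward everything after the id unchanged (message, parameters, ...).
    var rest = Array.prototype.slice.call(arguments, 1);
    return Localization._.apply(Localization, [[domain, id]].concat(rest));
  };
}

// Inside the TeX input jax, for example:
var _ = makeLocalizer("TeX");
```

Here `_("mb","Missing Close Brace")` is forwarded as `Localization._(["TeX","mb"],"Missing Close Brace")`.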
Many messages need to include words that are not available until run time (like file names, or a token that is causing an error, etc.). To include such values in a message, pass the values to _()
following the main message string, and use %1
, %2
, etc., within the message to indicate where to put the additional strings. For example
MathJax.Message.Set(_("fnf","File %1 not found",file));
or
TEX.Error(_("sw","'%1' seen where '%2' was expected",token,delimiter));
Note that the extra arguments can be used in any order (in particular, a translation may put them in a different order), so
TEX.Error(_("sw","'%2' was expected where '%1' was seen",token,delimiter));
would also be valid.
Although it would be rare to need more than 9 additional parameters, you can use `%10`, `%11`, etc., to get the 10th, 11th, and so on. If you need a parameter to be followed directly by a number, use `%{1}0` rather than `%10`.
A `%` followed by a non-number (and not matching `%\{\d+\}` as a regular expression) generates just the character following the percent, so `%%` is a literal `%`, and `%:` would generate just `:`.
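The substitution rules above can be expressed with a single regular expression. The following is a minimal sketch: `substitute` is a hypothetical helper name, and replacing a missing parameter with the empty string is an assumption, since the proposal does not say what should happen in that case.

```javascript
// Apply %N, %{N}, and %<char> rules to a message (substitute is an illustrative name).
function substitute(message, args) {
  return message.replace(/%(\d+|\{(\d+)\}|.)/g, function (match, spec, braced) {
    // braced holds the digits of a %{N} form; otherwise spec is digits or one character.
    var n = braced !== undefined ? braced : spec;
    if (!/^\d+$/.test(n)) return spec;        // %% -> %, %: -> :
    var value = args[parseInt(n, 10) - 1];    // parameters are numbered from 1
    return value === undefined ? "" : String(value); // assumption: missing -> ""
  });
}
```

For example, `substitute("%{1}0 items",[5])` gives `"50 items"`, whereas `"%10"` would refer to the tenth parameter.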
If a message must be represented differently depending on a particular numeric value (say to distinguish between "1 file loaded" and "2 files loaded"), replace the message by an array consisting of the numeric value followed by the strings to use when that value is 1, 2, 3, etc., where the last string is used if the numeric value is outside the number of entries given. For example,
MathJax.Message.Set(_("fl",[n,"%1 file loaded","%1 files loaded"],n));
would select `"%1 file loaded"` when `n` is 1, and `"%1 files loaded"` for any other value of `n`. Then that string is used as the message, with the value of `n` inserted for `%1` (since `n` is passed as the third parameter to `_()`).
If you need a different value for 0, for example, you could use something like
MathJax.Message.Set(_("fl",[n+1,"No files loaded","%1 file loaded","%1 files loaded"],n));
to select the string based on `n+1` rather than `n`.
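The selection rule for plural arrays can be sketched as below. `selectPlural` is a hypothetical name, and treating a non-integer value as "outside the entries given" (so that the last string is used) is an assumption.

```javascript
// Pick the string for a plural array [n, form1, form2, ...]; plain messages pass through.
function selectPlural(message) {
  if (!Array.isArray(message) || typeof message[0] !== "number") return message;
  var n = message[0], forms = message.slice(1);
  // Values 1..forms.length pick the matching entry; anything else picks the last one.
  var inRange = Math.floor(n) === n && n >= 1 && n <= forms.length;
  return forms[inRange ? n - 1 : forms.length - 1];
}
```

With `[n,"%1 file loaded","%1 files loaded"]`, a value of 0 falls outside the entries and therefore selects `"%1 files loaded"`, which is why the `n+1` form is needed to get a special message for zero.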
A number of the dialogs used in MathJax are defined using HTML snippets, which allow you to encode an HTML DOM fragment using JavaScript objects. These can include things like bold and italic indicators, as well as other styling or layout. While it is possible to break these into pieces to pass to _()
separately, it may be better to allow the translator to translate the complete snippet, so that styling and layout can be properly adjusted for the target language. Thus _()
allows a complete HTML snippet in place of the message string (and will return an HTML snippet rather than a string literal). E.g.,
MathJax.HTML.Element("span",{},_("dtn",["Do this ",["b",null,["now!"]]]));
would get the translation for the snippet (that is effectively Do this <b>now!</b>
) and put it in a <span>
.
If the snippet depends on a numeric value for its plural form, then you can use an array that consists of a number followed by the various HTML snippets; the snippet corresponding to the given numeric value will be selected (just as it was for strings above). E.g.,
MathJax.HTML.Element("span",null,_("fl",[n,
  ["%1 ",["b",null,["file"]]," loaded"],
  ["%1 ",["b",null,["files"]]," loaded"]
],n));
would return a DOM element representing `<span>1 <b>file</b> loaded</span>` if `n` is 1, but `<span>3 <b>files</b> loaded</span>` if `n` is 3.
Note that parameter substitution is performed on the strings of the snippet that will become text in the DOM fragment that is generated from the snippet.
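Substitution inside a snippet can be done by walking the snippet and touching only its text strings. The sketch below assumes a snippet is an array whose entries are either strings or `[tag, attributes, contents]` triples with `contents` itself a snippet; `substituteSnippet` and `fill` are hypothetical names, and `fill` handles only the plain `%N` form for brevity.

```javascript
// Replace %1, %2, ... in one text string (a simplified filler for this sketch).
function fill(text, args) {
  return text.replace(/%(\d+)/g, function (match, n) {
    return String(args[parseInt(n, 10) - 1]);
  });
}

// Return a copy of the snippet with parameters substituted into its text nodes.
function substituteSnippet(snippet, args) {
  return snippet.map(function (item) {
    if (typeof item === "string") return fill(item, args);
    // item is [tagName, attributes, contents]; tag and attributes are left alone.
    var copy = [item[0], item[1]];
    if (item.length > 2) copy.push(substituteSnippet(item[2], args));
    return copy;
  });
}
```

Tag names and attributes are copied unchanged, so a `%` that happens to appear in an attribute value is not treated as a parameter reference.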
Some words or phrases may be used in more than one way, and these may require different translations. For example, "Post" may be a verb when used as a button label, while "Post" as a noun could refer to a blog post. These may need to be translated into different words or phrases in another language. Since a translator will be presented with the same word ("Post") in both cases, you may need to give the translator more help in determining how the word will be used. You do this by providing an extra argument following the message string (or array) that indicates the extra data to be shown to the translator. For example
_("pn","Post",{form:"noun"})
or
_("pv","Post",{form:"verb"})
Note that the id is different for these two, so there will be two values for the translator; the form
tells the translator how the word is used. The value for form
can be anything that will help the translator figure out how best to translate the word, e.g.,
_("pcol","Post",{form:"column name"})
In fact, you can supply as much meta-data between the braces as you would like. [I'm not sure yet how this will be used, other than form
, but it gives flexibility for the future.]
The MathJax.Localization
object holds the data for the various translations, as well as the service routines for adding to the translations, and retrieving translations.
The methods in MathJax.Localization
include:
- _(id,message[,form][,arguments])
- The function described in detail above that returns the translated string for a given id.
- setLocale(locale)
- Sets the selected locale to the given one, e.g.
MathJax.Localization.setLocale("fr");
- addTranslation(locale,domain,def)
- Defines (or adds to) the translation data for the given `locale` and `domain`. The `def` is the definition to be merged with the current translation data (if it exists) or to be used as the complete definition (if not). The data format is described below.
- fontFamily()
- Get the font-family needed to display text in the selected language. Returns `null` if no special font is required.
- locale
- The currently selected locale, e.g., `"fr"`. This is set by the `setLocale()` method, and should not be modified by hand.
- directory
- The URL for the localization data files. This can be overridden for individual languages or domains (see below). The default is `[MathJax]/localization`.
- strings
- This is the main data structure that holds the translation strings. It consists of an entry for each language that MathJax knows about, e.g., there would be an entry with key `fr` whose value is the data for the French translation. Initially, these simply reference the files that define the translation data, which MathJax will load when needed. After the file is loaded, they will contain the translation data as well. This is described in more detail below.
Each language has its own data in the MathJax.Localization.strings
structure. This structure holds data about the translation, plus the translated strings for each domain.
A typical example might be
fr: {
  version: "1.0",
  directory: "[MathJax]/localization/fr", // optional
  file: "fr.js", // optional
  isLoaded: true, // set when loaded
  font: "...", // optional
  meta: {
    translator: "...", // other metadata could be added
  },
  domains: {
    hub: {
      version: "1.0",
      file: "http://somecompany.com/MathJax/localization/fr/hub.js", // optional
      isLoaded: true,
      strings: {
        fnf: "File '%1' not found",
        fl: ["%1 file loaded","%1 files loaded"],
        ...
      }
    },
    TeX: {
      ...
    },
    "_": {
      ...
    },
    ...
  }
}
The fields have the following meanings:
- version
- The version of the translation data.
- directory
- An optional value that can be used to override the directory where the translation files for this language are stored. The default is to add the locale identifier to the end of `MathJax.Localization.directory`, so the value given in the example above is the default value, and could be omitted.
- file
- The name of the file containing the translation data for this language. The default is the locale identifier with `.js` appended, so the value given in the example above is the default value, and could be omitted.
- isLoaded
- This is set to true when MathJax has loaded the data for this language. Typically, when a language is registered with MathJax, the data file isn't loaded at that point. It will be loaded when it is first needed, and when that happens, this value is set.
- font
- This is a font-family (or list of font-families) that should be used when text in this language is displayed. If not present, then no special font is needed.
- meta
- This is an object that contains the meta-data about the translation. Such information can include the name of the translator, the date of the translation, etc.
- domains
- This is an object that contains the translation strings for this language, grouped by domain. Each domain has an entry, and its value is an object that contains the translation strings for that domain. The format is described in more detail below.
Each domain for which there are translations has an entry in the locale's domains
object. These store the following information:
- version
- The version of the data for this domain
- file
- If the domain data is stored in a separate file from the rest of the language's data (e.g., a third-party extension that is not stored on the CDN may have translation data that is provided by the third party), this property tells where to obtain the translation data. In the example above, the data is provided by another company via a complete URL. The default value is the locale's directory with the domain name appended and `.js` appended to that.
- isLoaded
- This is set to `true` when the data file has been loaded.
- strings
- This is an object that contains the actual translated strings. The keys are the message identifiers described in the section on "Getting a Translated String" above, and the values are the translations, or arrays of translations (see the section on "Plural Forms" above), or translated HTML snippets (see the section on "HTML Snippets" above).
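The default file locations described above could be computed roughly as follows. This is a sketch under assumptions: `localeFile` and `domainFile` are hypothetical helper names, a `/` is assumed as the separator between directory and file name, and a domain's `file` value is assumed to be used exactly as given.

```javascript
// root plays the role of MathJax.Localization.directory; data is one locale's entry.
function localeDirectory(root, locale, data) {
  return data.directory || root + "/" + locale;
}

// File holding the locale's main translation data.
function localeFile(root, locale, data) {
  return localeDirectory(root, locale, data) + "/" + (data.file || locale + ".js");
}

// File holding one domain's data; an explicit file overrides the default location.
function domainFile(root, locale, data, domain) {
  var entry = (data.domains || {})[domain] || {};
  return entry.file || localeDirectory(root, locale, data) + "/" + domain + ".js";
}
```

With a root of `[MathJax]/localization` and an empty `fr` entry, these return `[MathJax]/localization/fr/fr.js` for the locale and `[MathJax]/localization/fr/TeX.js` for the `TeX` domain.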
Typically, for languages stored on the CDN, MathJax will register the language with a call like
MathJax.Localization.addTranslation("fr",null,{});
which will create an fr
entry in the localization data that will be tied to the [MathJax]/localization/fr
directory, and the [MathJax]/localization/fr/fr.js
file. That directory could contain individual files for the various domains, or the fr.js
file could contain combined data that includes the most common domains, leaving only the lesser-used domains in separate files.
An example fr.js
file could be
MathJax.Localization.addTranslation("fr",null,{
version: "1.0",
meta: {
translator: "Joe Green"
},
domains: {
"_": {},
TeX: {},
Menu: {}
}
});
This would declare that there are translation files for the _
, TeX
, and Menu
domains, and that these will be loaded individually from their default file names in the default directory of [MathJax]/localization/fr
. Other domains will not be translated unless they register themselves via a command like
MathJax.Localization.addTranslation("fr","Zoom",{});
in which case the domain's data file will be loaded automatically when needed.
One could preload translation strings by including them in the fr.js
file:
MathJax.Localization.addTranslation("fr",null,{
version: "1.0",
meta: {
translator: "Joe Green"
},
domains: {
"_": {
isLoaded: true,
strings: {
'fnf': "Fichier `%1` non trouvé",
...
}
},
TeX: {
isLoaded: true,
strings: {
'mb': "Accolade de fermeture manquante",
...
}
},
Menu: {}
}
});
Here the _
and TeX
strings are preloaded, while the Menu
strings will be loaded on demand.
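The merging behavior of `addTranslation()` might look something like the sketch below. It is an assumption about one possible implementation, not the proposal's code: the `strings` variable stands in for `MathJax.Localization.strings`, `merge` is a hypothetical helper, and arrays (plural forms and HTML snippets) are replaced whole rather than merged element by element.

```javascript
var strings = {};  // stand-in for MathJax.Localization.strings

// Recursively copy source into target; arrays and primitives overwrite.
function merge(target, source) {
  Object.keys(source).forEach(function (key) {
    var value = source[key];
    if (value && typeof value === "object" && !Array.isArray(value)) {
      var current = target[key];
      if (!current || typeof current !== "object" || Array.isArray(current)) {
        target[key] = {};
      }
      merge(target[key], value);
    } else {
      target[key] = value;
    }
  });
  return target;
}

// A null domain means def describes the whole locale; otherwise just one domain.
function addTranslation(locale, domain, def) {
  var data = strings[locale] || (strings[locale] = { domains: {} });
  if (domain) {
    var domains = data.domains || (data.domains = {});
    merge(domains[domain] || (domains[domain] = {}), def);
  } else {
    merge(data, def);
  }
}
```

Calling `addTranslation("fr","Zoom",{})` after the main `fr` registration then simply adds an empty `Zoom` entry next to the existing domains, leaving their data untouched.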
A third party extension could include
MathJax.Localization.addTranslation("fr","myExtension",{
file: "http://myserver.com/MathJax/localization/myExtension/fr.js"
});
to add French translations for the myExtension
domain (used by the extension) so that they would be obtained from the third-party server when needed.
A third party could provide a translation for a language not covered by the MathJax CDN by using
MathJax.Localization.addTranslation("ko",null,{
  directory: "http://mycompany.com/MathJax/localization/ko"
});
and providing a `ko.js` file in their `MathJax/localization/ko` directory that defines the details of their translation. If the Korean (`ko`) locale is selected, MathJax will load `http://mycompany.com/MathJax/localization/ko/ko.js` and any other domain files when they are needed.
In order to make working with MathJax data convenient for translators, we will need to provide the translation strings in one of the standard formats, like .po
for example. The usual approach is to have a program that scans the code for the _()
calls and builds the data file from that, and that should work with MathJax as well. The .po
format supports the domain approach, as well as plural forms, and the idea of multiple forms (e.g., verb versus noun). HTML snippets should be translated into HTML strings, so that
["Do it ",["b",null,["now!"]]]
would become
"Do it <b>now!</b>"
for translation. The translator would produce an HTML version of the phrase with tags in the proper place (which will be translated back into an HTML snippet for use in MathJax at a later point).
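The snippet-to-HTML direction of this conversion can be sketched as below. The assumptions are that a snippet is an array of strings and `[tag, attributes, contents]` triples, that attribute values are simple strings (nested values such as style objects are not handled), and that every element gets an explicit closing tag; `snippetToHTML` and `escapeText` are hypothetical names.

```javascript
// Escape characters that would otherwise be read as markup.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize a snippet to an HTML string for presentation to a translator.
function snippetToHTML(snippet) {
  return snippet.map(function (item) {
    if (typeof item === "string") return escapeText(item);
    var tag = item[0], attrs = item[1] || {}, contents = item[2] || [];
    var attrText = Object.keys(attrs).map(function (name) {
      var value = escapeText(String(attrs[name])).replace(/"/g, "&quot;");
      return " " + name + '="' + value + '"';
    }).join("");
    return "<" + tag + attrText + ">" + snippetToHTML(contents) + "</" + tag + ">";
  }).join("");
}
```

The reverse step, turning the translator's HTML back into a snippet, would need an HTML parser and is not shown here.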
There are two complications to automating the collection of the strings needing translation. The first is that the use of local definitions for `_()` hides the use of the domain, so there will need to be special processing to obtain the domains. It may be possible to recognize the local definition (if we use a common syntax for that) so that the domain can be handled automatically. Alternatively, one could use special comments to mark the domain regions so that the collection program will be able to handle the domains properly. The latter is probably more reliable, but takes extra steps to be sure to include the comments.
The second issue is if `_()` is called from within a routine that has the message strings passed to it (so that the message passed to `_()` is not a string literal). This would be the case, for example, if `TEX.Error()` were made to call `_()` for you. Such shorthands are very convenient (and reduce the code size), so it would be good to be able to accommodate this case as well. One approach would be to use comments again to tell the collector program what other functions to treat like `_()`.
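A very rough sketch of the comment-based approach to domains follows. Everything about it is an assumption: the `// @l10n-domain NAME` marker is an invented syntax, `collectMessages` is a hypothetical name, and the pattern only recognizes calls whose id and message are both double-quoted string literals (plural arrays, snippets, and explicit `[domain,id]` arrays are skipped). A real collector would more likely work from a proper JavaScript parser.

```javascript
// Scan source text for _("id","message") calls, tracking domain marker comments.
function collectMessages(source) {
  var found = [], domain = "_";   // "_" is the default domain
  var pattern = /\/\/\s*@l10n-domain\s+(\S+)|\b_\(\s*"([^"]+)"\s*,\s*"((?:[^"\\]|\\.)*)"/g;
  var match;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1]) {
      domain = match[1];          // marker comment: switch domain from here on
    } else {
      found.push({ domain: domain, id: match[2], message: match[3] });
    }
  }
  return found;
}
```

Run over a file that contains a `// @l10n-domain TeX` comment followed by `TEX.Error(_("mb","Missing Close Brace"));`, this records the message under the `TeX` domain even though the call itself never names it.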
The collector will be run over the various MathJax components that have message strings, and produce an individual .po file for each component. These can be combined to make one large .po file for translators, or translators could handle them individually. Certainly in the case of third-party extensions, their files will be translated separately.
Once the translations are obtained (as new .po
files), we need a second program to turn these into the .js
files that MathJax needs, in the formats described above. We may want to have control files that tell which domains to combine in the main language file, and which to make as individual domain files.