
Localization Proposal

dpvc edited this page Feb 6, 2013 · 11 revisions

Localization Proposal -- Draft

Goals

The main goal is to make it possible to present MathJax's user interface elements in languages other than English. This includes things like the MathJax menu, the About MathJax dialog, the loading messages, and the various error messages produced by the input jax. This document describes a proposal for the underlying code and data structures for implementing this in MathJax.

The code must be able to handle the following:

  • expressions with substitution values (e.g., "file xxx not found")
  • plural forms (e.g., "loaded xx file" versus "loaded xx files")
  • multiple forms for a word (e.g., "Post" as a verb versus "Post" as a noun)
  • HTML-snippets as defined in MathJax (since many dialogs are constructed from these)
  • fallback to English when translations are not available
  • translations for dynamically loaded components
  • components that may not all come from the same location
  • third-party translations

The mechanism for specifying the selected language has yet to be determined, but the page author should be able to give a default language, and users should be able to override that if they choose.

Overview

A new Localization object will be added to the MathJax variable to handle localization functions. This will include the data needed for the translations into the selected language, the methods to be called for obtaining those translations, and the methods needed for loading and registering translations.

Currently all messages used in MathJax are in English, and the text of these messages is usually hard-coded as literal strings at the locations where the messages are used. (Some messages are constructed on the fly from smaller pieces. These messages may need to be handled differently to allow for easier translation.) This is convenient since it is easy to see what message will be produced at any particular point, but in order to allow MathJax to be localized, these strings will need to be replaced by function calls that obtain the translation appropriate for the selected language.

One approach would be to use these message strings as the keys for looking up the translations, but this would make it harder to modify the English messages if rewording were required, or if spelling errors were found. Instead, each message will have an ID string that will be used to identify the phrase so that the English can be changed without requiring all the translation files to be modified to reflect the change. This also has the advantage that the same word or phrase, when used in different ways, can have different identifiers, so "Post" as a verb and "Post" as a noun can be translated differently, if necessary.

Getting a Translated String

The basic means of obtaining the string to use for a message to display to the user is to call the _() method of the MathJax.Localization object, passing the string id and the English phrase. For example,

MathJax.Message.Set("Typesetting complete");

could be replaced by

MathJax.Message.Set(_("TC","Typesetting complete"));

where "TC" is the identifier for the message "Typesetting complete", and provided you have defined

var _ = function () {return MathJax.Localization._.apply(MathJax.Localization,arguments)};

earlier. (Since most of MathJax is defined within a function closure, making such function shortcuts is straightforward.)

The advantage of having both the identifier and the English string together is that

  1. You still can see the actual English message at the location in the code where it is used.
  2. The English version is available to use as a fallback if the phrase has not been translated into the selected language.
  3. The English translation doesn't need to be loaded separately (i.e., you don't need to load two language files, the selected one, plus English for fallback, and English users won't need to download any language files at all).
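Items 2 and 3 amount to a simple rule: use the translation if one exists, otherwise use the English text supplied at the call site. A minimal sketch of that rule, assuming a hypothetical `strings` table and a hypothetical `lookup()` helper (domains, plural forms, and substitution are left out):

```javascript
// Hypothetical translation table: locale -> message id -> translated text.
// There is no "en" entry: the English text always comes from the call site.
var strings = {
  fr: { TC: "Composition terminée" }
};

// Return the translation of `id` in `locale`, or the English text passed
// by the caller when the locale or the id has no translation.
function lookup(locale, id, english) {
  var table = strings[locale];
  if (table && Object.prototype.hasOwnProperty.call(table, id)) {
    return table[id];
  }
  return english;
}

lookup("fr", "TC", "Typesetting complete"); // "Composition terminée"
lookup("de", "TC", "Typesetting complete"); // "Typesetting complete"
lookup("en", "TC", "Typesetting complete"); // "Typesetting complete"
```

English users never touch the table at all, which is why no English language file is needed.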

IDs and Domains

Using short identifiers can lead to collisions if not handled carefully. To help avoid this, we introduce identifier domains that are used to isolate collections of identifiers for one component of MathJax from those for another component. For example, each input jax could have its own domain, as could each extension. This means you only have to worry about collisions within your own domain, and so can more easily manage the uniqueness of the IDs in use.

To use a domain with your id, pass _() an array consisting of the domain and the id in place of the id. For example, the TeX input jax could use

TEX.Error(_(["TeX","mb"],"Missing Close Brace"));

to get the message with id "mb" in the domain "TeX". Note that the local definition for _() within the TeX input jax could be

var _ = function (id) {return MathJax.Localization._.apply(MathJax.Localization,[["TeX",id]].concat([].slice.call(arguments,1)))};

in which case the message above could become

TEX.Error(_("mb","Missing Close Brace"));

This lets you avoid having to repeat the domain within every call to _() in the input jax. (It would also be possible for TEX.Error() to call _() for you, but see below for information about obtaining the translation data.)

The default domain is "_".
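A sketch of how an id might be resolved against a domain, assuming a hypothetical per-locale table keyed first by domain and then by id (the actual data layout is described under "Translation Data" below). The helpers `splitId()` and `lookupInDomain()` are illustrative only:

```javascript
// Hypothetical translations for one locale, keyed by domain, then by id.
var domains = {
  _:   { TC: "Composition terminée" },
  TeX: { mb: "Accolade de fermeture manquante" }
};

// An id is either a bare string (default domain "_") or [domain, id].
function splitId(id) {
  return Array.isArray(id) ? { domain: id[0], key: id[1] } : { domain: "_", key: id };
}

// Find the translation for an id, or fall back to the English text.
function lookupInDomain(id, english) {
  var parts = splitId(id);
  var table = domains[parts.domain];
  if (table && Object.prototype.hasOwnProperty.call(table, parts.key)) {
    return table[parts.key];
  }
  return english;
}

lookupInDomain(["TeX", "mb"], "Missing Close Brace"); // "Accolade de fermeture manquante"
lookupInDomain("mb", "Missing Close Brace");          // "Missing Close Brace" (no "mb" in "_")
lookupInDomain("TC", "Typesetting complete");         // "Composition terminée"
```

Because each domain is a separate table, another component can reuse the id "mb" in its own domain without colliding with the TeX entry.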

Substitutions

Many messages need to include words that are not available until run time (like file names, or a token that is causing an error, etc.). To include such values in a message, pass the values to _() following the main message string, and use %1, %2, etc., within the message to indicate where to put the additional strings. For example

MathJax.Message.Set(_("fnf","File %1 not found",file));

or

TEX.Error(_("sw","'%1' seen where '%2' was expected",token,delimiter));

Note that the extra arguments can be used in any order (in particular, a translation may put them in a different order), so

TEX.Error(_("sw","'%2' was expected where '%1' was seen",token,delimiter));

would also be valid.

Although it would be rare to need more than 9 additional parameters, you can use %10, %11, etc., to get the 10th, 11th, and so on. If you need a parameter to be followed directly by a number, use %{1}0 rather than %10.

A % followed by a non-number (and not matching %\{\d+\} as a regular expression) generates just the character following the percent, so %% is a literal %, and %: would generate just :.
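The substitution rules above map naturally onto a single regular expression. A minimal sketch, with a hypothetical `substitute()` function; what to do with a reference to a value that was not supplied (including %0) is not specified in the proposal, and leaving it unchanged is an assumption:

```javascript
// Expand %-escapes in a message (args[0] is the value for %1):
//   %1, %2, ..., %10, ...  -> the corresponding value
//   %{n}                   -> value n, so "%{1}0" is value 1 followed by "0"
//   %<any other character> -> that character, so "%%" gives "%"
// References to values that were not supplied are left as written (assumption).
function substitute(text, args) {
  return text.replace(/%(?:(\d+)|\{(\d+)\}|(.))/g,
    function (match, plain, braced, other) {
      if (other !== undefined) return other;
      var index = parseInt(plain !== undefined ? plain : braced, 10) - 1;
      return index >= 0 && index < args.length ? String(args[index]) : match;
    });
}

substitute("File %1 not found", ["fr.js"]);                      // "File fr.js not found"
substitute("'%2' was expected where '%1' was seen", ["x", "}"]); // "'}' was expected where 'x' was seen"
substitute("%{1}0%% done", ["5"]);                               // "50% done"
```

Since `\d+` is greedy, %10 is read as the tenth value rather than the first value followed by "0", which is exactly why the %{1}0 form is needed.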

Plural Forms

If a message must be represented differently depending on a particular numeric value (say to distinguish between "1 file loaded" and "2 files loaded"), replace the message by an array consisting of the numeric value followed by the strings to use when that value is 1, 2, 3, etc., where the last string is used if the numeric value is outside the number of entries given. For example,

MathJax.Message.Set(_("fl",[n,"%1 file loaded","%1 files loaded"],n));

would select "%1 file loaded" when n is 1, and "%1 files loaded" for any other value of n. Then that string is used as the message, with the value of n inserted for %1 (since n is passed as the third parameter to _()).

If you need a different value for 0, for example, you could use something like

MathJax.Message.Set(_("fl",[n+1,"No files loaded","%1 file loaded","%1 files loaded"],n));

to select the string based on n+1 rather than n.
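Selecting a plural form is a bounds-checked array index. A sketch with hypothetical names `selectForm()` and `selectPlural()`; treating non-integer values as "outside" the list is an assumption:

```javascript
// Pick the form for `value`: forms[0] when value is 1, forms[1] when it is 2,
// and so on; the last form is used for any value outside 1..forms.length.
// Non-integer values are treated as outside the range (assumption).
function selectForm(value, forms) {
  var isIndex = Math.floor(value) === value && value >= 1 && value <= forms.length;
  return isIndex ? forms[value - 1] : forms[forms.length - 1];
}

// Unpack the [value, form1, form2, ...] array written at the call site.
function selectPlural(spec) {
  return selectForm(spec[0], spec.slice(1));
}

var n = 0;
selectPlural([n, "%1 file loaded", "%1 files loaded"]);                        // "%1 files loaded"
selectPlural([n + 1, "No files loaded", "%1 file loaded", "%1 files loaded"]); // "No files loaded"
```

Keeping `selectForm()` separate from the call-site array means the same logic could be applied to a stored translation that lists only the forms (like the `fl` entry in the translation data described later), using the value from the call site.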

HTML Snippets

A number of the dialogs used in MathJax are defined using HTML snippets, which allow you to encode an HTML DOM fragment using JavaScript objects. These can include things like bold and italic indicators, as well as other styling or layout. While it is possible to break these into pieces to pass to _() separately, it may be better to allow the translator to translate the complete snippet, so that styling and layout can be properly adjusted for the target language. Thus _() allows a complete HTML snippet in place of the message string (and will return an HTML snippet rather than a string literal). E.g.,

MathJax.HTML.Element("span",{},_("dtn",["Do this ",["b",null,["now!"]]]));

would get the translation for the snippet (that is effectively Do this <b>now!</b>) and put it in a <span>.

If the snippet depends on a numeric value for its plural form, then you can use an array that consists of a number followed by the various HTML snippets; the snippet corresponding to the given numeric value will be selected (just as it was for strings above). E.g.,

MathJax.HTML.Element("span",null,_("fl",[n,
  ["%1 ",["b",null,["file"]]," loaded"],
  ["%1 ",["b",null,["files"]]," loaded"]
],n));

would return a DOM element representing <span>1 <b>file</b> loaded</span> if n is 1, but <span>3 <b>files</b> loaded</span> if n is 3.

Note that parameter substitution is performed on the strings of the snippet that will become text in the DOM fragment that is generated from the snippet.
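Restricting substitution to text means walking the snippet and leaving tag names and attributes alone. A sketch, assuming the snippet shape used above (an array of strings and `[tag, attributes, contents]` items); `substituteSnippet()` is a hypothetical name, and `substitute()` repeats the %-rules from the "Substitutions" section:

```javascript
// %-expansion for plain strings (args[0] is %1); see "Substitutions".
function substitute(text, args) {
  return text.replace(/%(?:(\d+)|\{(\d+)\}|(.))/g,
    function (match, plain, braced, other) {
      if (other !== undefined) return other;
      var index = parseInt(plain !== undefined ? plain : braced, 10) - 1;
      return index >= 0 && index < args.length ? String(args[index]) : match;
    });
}

// Apply substitution to every text string in a snippet. Tag names and
// attribute objects are passed through untouched, so only text that ends
// up in the DOM is affected.
function substituteSnippet(snippet, args) {
  return snippet.map(function (item) {
    if (typeof item === "string") return substitute(item, args);
    var contents = item[2];
    if (typeof contents === "string") {
      contents = substitute(contents, args);
    } else if (Array.isArray(contents)) {
      contents = substituteSnippet(contents, args);
    }
    return [item[0], item[1], contents];
  });
}

substituteSnippet(["%1 ", ["b", null, ["files"]], " loaded"], [3]);
// ["3 ", ["b", null, ["files"]], " loaded"]
```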

Specifying a Form

Some words or phrases may be used in more than one way, and these may require different translations. For example, "Post" may be used as a verb in a button label, while "Post" as a noun could refer to a blog post. These may need to be translated into different words or phrases in another language. Since a translator will be presented with the same word ("Post") in both cases, you may need to give the translator more help in determining how the word will be used. You do this by providing an extra argument following the message string (or array) that indicates the extra data to be shown to the translator. For example

_("pn","Post",{form:"noun"})

or

_("pv","Post",{form:"verb"})

Note that the id is different for these two, so there will be two values for the translator; the form tells the translator how the word is used. The value for form can be anything that will help the translator figure out how best to translate the word, e.g.,

_("pcol","Post",{form:"column name"})

In fact, you can supply as much meta-data between the braces as you would like. [I'm not sure yet how this will be used, other than form, but it gives flexibility for the future.]
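Because the form object is optional, _() has to tell it apart from the substitution values that may follow the message. One possible rule, sketched with a hypothetical `parseArgs()` helper, is to treat a plain (non-array) object immediately after the message as the metadata:

```javascript
// Split the arguments of a call like _("pn","Post",{form:"noun"},...) into
// id, message, metadata, and substitution values. Assumes a plain object
// right after the message is metadata; anything else in that position is
// the first substitution value.
function parseArgs(args) {
  var rest = Array.prototype.slice.call(args, 2);
  var meta = null;
  var first = rest[0];
  if (first !== null && typeof first === "object" && !Array.isArray(first)) {
    meta = rest.shift();
  }
  return { id: args[0], message: args[1], meta: meta, values: rest };
}

parseArgs(["pn", "Post", { form: "noun" }]);      // meta {form:"noun"}, values []
parseArgs(["fnf", "File %1 not found", "fr.js"]); // meta null, values ["fr.js"]
```

Under this rule a plain object cannot be the first substitution value, which seems acceptable since substitution values are normally strings or numbers.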

The Localization Data

The MathJax.Localization object holds the data for the various translations, as well as the service routines for adding to the translations, and retrieving translations.

Methods

The methods in MathJax.Localization include:

_(id,message[,form][,arguments])
The function described in detail above that returns the translated string for a given id.
setLocale(locale)
Sets the selected locale to the given one, e.g. MathJax.Localization.setLocale("fr");
addTranslation(locale,domain,def)
Defines (or adds to) the translation data for the given locale and domain. The def is the definition to be merged with the current translation data (if it exists) or to be used as the complete definition (if not). The data format is described below.
fontFamily()
Get the font-family needed to display text in the selected language. Returns null if no special font is required.

Properties

locale
The currently selected locale, e.g., "fr". This is set by the setLocale() method, and should not be modified by hand.
directory
The URL for the localization data files. This can be overridden for individual languages or domains (see below). The default is [MathJax]/localization.
strings
This is the main data structure that holds the translation strings. It consists of an entry for each language that MathJax knows about, e.g., there would be an entry with key `fr` whose value is the data for the French translation. Initially, these entries simply reference the files that define the translation data, which MathJax will load when needed. After a file is loaded, its entry will contain the translation data as well. This is described in more detail below.
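The relationship between these properties and the setLocale() and fontFamily() methods can be pictured with a stripped-down stand-in object. Everything here is hypothetical: the locale "xx" and its font are made up, file loading and the _() method are omitted, and having setLocale() ignore unregistered locales is an assumption the proposal does not make:

```javascript
// Hypothetical stand-in for MathJax.Localization (no loading, no _()).
var Localization = {
  locale: "en",
  directory: "[MathJax]/localization",
  strings: {
    fr: {},                                 // registered, no special font
    xx: { font: "'Example Font', serif" }   // made-up locale with a font
  },

  // Select a locale; unregistered locales other than English are ignored
  // (assumption). Returns the locale now in effect.
  setLocale: function (locale) {
    if (locale === "en" || this.strings[locale]) this.locale = locale;
    return this.locale;
  },

  // Font family for the selected locale, or null if none is needed.
  fontFamily: function () {
    var data = this.strings[this.locale];
    return (data && data.font) || null;
  }
};

Localization.setLocale("xx");
Localization.fontFamily(); // "'Example Font', serif"
```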

Translation Data

Each language has its own data in the MathJax.Localization.strings structure. This structure holds data about the translation, plus the translated strings for each domain.

A typical example might be

fr: {
  version: "1.0",
  directory: "[MathJax]/localization/fr",    // optional
  file: "fr.js",                             // optional
  isLoaded: true,                            // set when loaded
  font: "...",                               // optional
  meta: {
    translator: "...",                       // other metadata could be added
  },
  domains: {
    hub: {
      version: "1.0",
      file: "http://somecompany.com/MathJax/localization/fr/hub.js",  // optional
      isLoaded: true,
      strings: {
        fnf: "File '%1' not found",
        fl: ["%1 file loaded","%1 files loaded"],
        ...
      }
    },
    TeX: {
      ...
    },
    "_": {
      ...
    },
    ...
  }
}

The fields have the following meanings:

version
The version of the translation data.
directory
An optional value that can be used to override the directory where the translation files for this language are stored. The default is to add the locale identifier to the end of `MathJax.Localization.directory`, so the value given in the example above is the default value, and could be omitted.
file
The name of the file containing the translation data for this language. The default is the locale identifier with .js appended, so the value given in the example above is the default value, and could be omitted.
isLoaded
This is set to true when MathJax has loaded the data for this language. Typically, when a language is registered with MathJax, the data file isn't loaded at that point. It will be loaded when it is first needed, and when that happens, this value is set.
font
This is a font-family (or list of font-families) that should be used when text in this language is displayed. If not present, then no special font is needed.
meta
This is an object that contains the meta-data about the translation. Such information can include the name of the translator, the date of the translation, etc.
domains
This is an object that contains the translation strings for this language, grouped by domain. Each domain has an entry, and its value is an object that contains the translation strings for that domain. The format is described in more detail below.

Domain Data

Each domain for which there are translations has an entry in the locale's domains object. These store the following information:

version
The version of the data for this domain
file
If the domain data is stored in a separate file from the rest of the language's data (e.g., a third-party extension that is not stored on the CDN may have translation data that is provided by the third party), this property tells where to obtain the translation data. In the example above, the data is provided by another company via a complete URL. The default value is the locale's directory followed by the domain name with .js appended.
isLoaded
This is set to true when the data file has been loaded.
strings
This is an object that contains the actual translated strings. The keys are the message identifiers described in the section on "Getting a Translated String" above, and the values are the translations, arrays of translations (see the section on "Plural Forms" above), or translated HTML snippets (see the section on "HTML Snippets" above).
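The default locations for a language's data file and for each domain's file reduce to a small path calculation. A sketch under stated assumptions: the function names are hypothetical, `[MathJax]` is treated as an opaque prefix that MathJax's loader would expand, and a domain's `file` value is assumed to be a complete URL that is used as-is:

```javascript
// Stand-in for MathJax.Localization.directory.
var rootDirectory = "[MathJax]/localization";

// Directory for a locale: its `directory` override, else root + "/" + locale.
function localeDirectory(locale, data) {
  return data.directory || rootDirectory + "/" + locale;
}

// Main data file for a locale: `file` (default "<locale>.js") in its directory.
function localeFile(locale, data) {
  return localeDirectory(locale, data) + "/" + (data.file || locale + ".js");
}

// Data file for one domain: the domain's own `file` (assumed to be a full
// URL) or "<domain>.js" in the locale's directory.
function domainFile(locale, data, domain) {
  var info = (data.domains && data.domains[domain]) || {};
  return info.file || localeDirectory(locale, data) + "/" + domain + ".js";
}

localeFile("fr", {});                              // "[MathJax]/localization/fr/fr.js"
domainFile("fr", { domains: { TeX: {} } }, "TeX"); // "[MathJax]/localization/fr/TeX.js"
```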

Registering a Translation

Typically, for languages stored on the CDN, MathJax will register the language with a call like

MathJax.Localization.addTranslation("fr",null,{});

which will create an fr entry in the localization data that will be tied to the [MathJax]/localization/fr directory, and the [MathJax]/localization/fr/fr.js file. That directory could contain individual files for the various domains, or the fr.js file could contain combined data that includes the most common domains, leaving only the lesser-used domains in separate files.

An example fr.js file could be

MathJax.Localization.addTranslation("fr",null,{
  version: "1.0",
  meta: {
    translator: "Joe Green"
  },
  domains: {
    "_": {},
    TeX: {},
    Menu: {}
  }
});

This would declare that there are translation files for the _, TeX, and Menu domains, and that these will be loaded individually from their default file names in the default directory of [MathJax]/localization/fr. Other domains will not be translated unless they register themselves via a command like

MathJax.Localization.addTranslation("fr","Zoom",{});

in which case the domain's data file will be loaded automatically when needed.
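Registration relies on addTranslation() merging new definitions into whatever is already known. A minimal sketch of that merge, using a hypothetical module-level `strings` object in place of MathJax.Localization.strings: plain objects (such as `domains`, `meta`, and a domain's `strings`) are merged key by key, while strings and arrays (plural forms, HTML snippets) replace any earlier value:

```javascript
// Hypothetical store standing in for MathJax.Localization.strings.
var strings = {};

// Recursively copy `source` into `target`, merging nested plain objects.
function merge(target, source) {
  Object.keys(source).forEach(function (key) {
    var value = source[key];
    if (value && typeof value === "object" && !Array.isArray(value)) {
      if (!target[key] || typeof target[key] !== "object") target[key] = {};
      merge(target[key], value);
    } else {
      target[key] = value;   // strings, numbers, arrays, null
    }
  });
  return target;
}

// Add to (or create) the data for a locale, or for one domain of it.
function addTranslation(locale, domain, def) {
  if (!strings[locale]) strings[locale] = { domains: {} };
  var data = strings[locale];
  if (domain) {
    if (!data.domains[domain]) data.domains[domain] = {};
    merge(data.domains[domain], def);
  } else {
    merge(data, def);
  }
}

addTranslation("fr", null, {});
addTranslation("fr", null, { version: "1.0", domains: { _: {}, TeX: {}, Menu: {} } });
addTranslation("fr", "Zoom", {});
Object.keys(strings.fr.domains); // ["_", "TeX", "Menu", "Zoom"]
```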

One could preload translation strings by including them in the fr.js file:

MathJax.Localization.addTranslation("fr",null,{
  version: "1.0",
  meta: {
    translator: "Joe Green"
  },
  domains: {
    "_": {
      isLoaded: true,
      strings: {
        'fnf': "Fichier `%1` non trouvé",
        ...
      }
    },
    TeX: {
      isLoaded: true,
      strings: {
        'mb': "Accolade de fermeture manquante",
        ...
      }
    },
    Menu: {}
  }
});

Here the _ and TeX strings are preloaded, while the Menu strings will be loaded on demand.
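The difference between preloaded and on-demand domains comes down to the isLoaded flag. A sketch of how a loader might decide which domain files still need fetching; `pendingDomains()` is a hypothetical name, and the file loading itself is not shown:

```javascript
// Names of the domains in a locale entry that still need their data file,
// i.e. those not marked isLoaded: true.
function pendingDomains(localeData) {
  var domains = localeData.domains || {};
  return Object.keys(domains).filter(function (name) {
    return domains[name].isLoaded !== true;
  });
}

pendingDomains({
  domains: {
    _:    { isLoaded: true, strings: {} },
    TeX:  { isLoaded: true, strings: {} },
    Menu: {}
  }
}); // ["Menu"]
```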

A third party extension could include

MathJax.Localization.addTranslation("fr","myExtension",{
  file: "http://myserver.com/MathJax/localization/myExtension/fr.js"
});

to add French translations for the myExtension domain (used by the extension) so that they would be obtained from the third-party server when needed.

A third party could provide a translation for a language not covered by the MathJax CDN by using

MathJax.Localization.addTranslation("ko",null,{
  directory: "http://mycompany.com/MathJax/localization/ko"
});

and providing a ko.js file in their MathJax/localization/ko directory that defines the details of their translation. If the Korean (ko) locale is selected, MathJax will load http://mycompany.com/MathJax/localization/ko/ko.js and any other domain files when they are needed.

The Translation Files
