Skip to content

plurimath/asciimath2unitsml

Repository files navigation

DEPRECATION NOTICE: This repository is no longer maintained. Please visit Plurimath for the latest updates and features.

Gem Version Build Status Pull Requests Commits since latest

asciimath2unitsml

Convert Units expressions via MathML to UnitsML

This gem converts MathML incorporating UnitsML expressions (based on the Ascii representation provided by NIST) into MathML complying with https://www.w3.org/TR/mathml-units/, with UnitsML markup embedded in it, and with unique identifiers for each distinct unit, prefix, and dimension. Dimensions are automatically inserted corresponding to each unit. Units expressions are identified in MathML as <mtext>unitsml(…​)</mtext>, which in turn can be identified in AsciiMath as "unitsml(…​)".

The consuming document is meant to deduplicate the instances of UnitsML markup with the same identifier, and potentially remove them to elsewhere in the document or another document.

Notation

The unitsml() expression consists of a unit string. The units used in unitsml() are taken from the UnitsDB database as updated by Ribose: https://github.com/unitsml/unitsdb. Units are given as an ASCII based code, consisting of multiplication or division of single units, each of which is defined as a Prefix (taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml), unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml), and exponent; e.g. mm*s^-2.

The conventions used for writing units are:

  • ^ for exponents, e.g. m^-2

  • * to combine two units by multiplication; e.g. m*s^-2.

  • / to combine two units by division;

  • u for μ (micro-)

For more on units notation, see Units Notation.

The unitsml() can take additional optional parameters, giving further information for the UnitsML to be generated:

  • unitsml(unit-string, quantity: ID) provides the UnitsDB identifier for the quantity being measured (taken from https://github.com/unitsml/unitsdb/blob/master/quantities.yaml). For example, unitsml(s, quantity: NISTq109) indicates that the second is used to measure period duration. If a single quantity is associated with the unit in UnitsDB (as given in https://github.com/unitsml/unitsdb/blob/master/units.yaml), that quantity is added automatically; otherwise, no quantity is added unless explicitly nominated in this way.

  • unitsml(unit-string, name: NAME) provides a name for the unit, if one is not already available from UnitsDB. For example, unitsml(cal_th/cm^2, name: langley).

  • unitsml(unit-string, symbol: SYMBOL) provides an alternate symbol for the unit, in AsciiMath. The unit-string gives the canonical representation of the unit, but SYMBOL is what will be rendered. For example, unitsml(cal_th/cm^2, name: langley, symbol: La), or unitsml(mm*s^-2, symbol: mm cdot s^-2). (All variables in SYMBOL are rendered upright, as is the default for units.)

  • unitsml(unit-string, multiplier: SYMBOL) provides an alternate symbol for the multiliper of units. The options are an XML entity, or the values space or nospace (for which see discussion under Usage).

Standalone prefixes can be recognised by replacing the unit with hyphen; so unitsml(p-) corresponds to the standalone prefix "pico" (and is rendered as "p").

The gem also supports fundamental units, e.g. unitsml(e) for the atomic unit of charge, e, and symbols for dimensions. The latter are entered as dim_XXX, where XXX is their established symbol:

Symbol Dimension

dim_L

Length

dim_M

Mass

dim_T

Time

dim_I

Electric Current

dim_Theta

Thermodynamic Temperature

dim_N

Amount of Substance

dim_J

Luminous Intensity

dim_phi

Plane Angle (dimensionless)

e.g. unitsml(dim_I) for the dimension of electric current, 𝖨.

Rendering

The output of the gem is MathML, with MathML unit expressions (expressed as <mi>, complying with MathML Units) cross-referenced to UnitsML definitions embedded in the MathML.

The gem follows the MathML Units convention of inserting a spacing invisible times operator (<mo rspace='thickmathspace'>⁢</mo>) between any numbers (<mn>) and unit expressions in MathML, and representing units in MathML as non-italic variables (<mi mathvariant='normal'>).

Space is not inserted between a number and a unit expression, when that unit expression wholly consists of punctuation: 1 m, 1 °C, but 9° 7′ 22″.

Example

9 "unitsml(C^3*A)"

is converted into:

<math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
    <mn>9</mn>
         <mo rspace='thickmathspace'>&#x2062;</mo>
         <mrow xref='U_C3.A'>
           <msup>
             <mrow>
               <mi mathvariant='normal'>C</mi>
             </mrow>
             <mrow>
               <mn>3</mn>
             </mrow>
           </msup>
           <mo>&#xB7;</mo>
           <mi mathvariant='normal'>A</mi>
         </mrow>

         <Unit xmlns='http://unitsml.nist.gov/2005' xml:id='U_C3.A' dimensionURL='#D_T3I4'>
           <UnitSystem name='SI' type='SI_derived' xml:lang='en-US'/>
           <UnitName xml:lang='en'>C^3*A</UnitName>
           <UnitSymbol type='HTML'>C<sup>3</sup> &#xB7; A</UnitSymbol>
           <UnitSymbol type='MathML'>
             <math xmlns='http://www.w3.org/1998/Math/MathML'>
               <mrow>
                 <msup>
                   <mrow>
                     <mi mathvariant='normal'>C</mi>
                   </mrow>
                   <mrow>
                     <mn>3</mn>
                   </mrow>
                 </msup>
                 <mo>&#xB7;</mo>
                 <mi mathvariant='normal'>A</mi>
               </mrow>
             </math>
           </UnitSymbol>
           <RootUnits>
             <EnumeratedRootUnit unit='coulomb' powerNumerator='3'/>
             <EnumeratedRootUnit unit='ampere'/>
           </RootUnits>
         </Unit>
         <Dimension xmlns='http://unitsml.nist.gov/2005' xml:id='D_T3I4'>
           <Time symbol='T' powerNumerator='3'/>
           <ElectricCurrent symbol='I' powerNumerator='4'/>
         </Dimension>

  </mrow>
</math>

Usage

The converter is run as:

c = Asciimath2UnitsML::Conv.new()
c.Asciimath2UnitsML('1 "unitsml(mm*s^-2)"') # AsciiMath string containing UnitsML
c.MathML2UnitsML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\
  "<mtext>unitsml(kg^-2)</mtext></math>") # AsciiMath string containing <mtext>unitsml()</mtext>
c.MathML2UnitsML(Nokogiri::XML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\
  "<mtext>unitsml(kg^-2)</mtext></math>")) # Nokogiri parse of MathML document containing <mtext>unitsml()</mtext>

The converter class may be initialised with options:

  • multiplier is the symbol used to represent the multiplication of units. By default, following MathML Units, the symbol is middle dot (&#xB7). An arbitrary UTF-8 string can be supplied instead; it will be encoded as XML entities. The value :space is rendered as a spacing invisible times in MathML (<mo rspace='thickmathspace'>⁢</mo>), and as a non-breaking space in HTML. The value :nospace is rendered as a non-spacing invisible times in MathML (<mo>⁢</mo>), and is not rendered in HTML.

Units Notation

The units used in unitsml() are taken from the UnitsDB database as updated by Ribose: https://github.com/unitsml/unitsdb. Units are given as an ASCII based code, consisting of multiplication or division of single units, each of which is defined as a Prefix (taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml), unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml), and exponent; e.g. mm*s^-2.

In case of ambiguity, the interpretation with no prefix is prioritised over the interpretation as a unit; so ct is interpreted as hundredweight, rather than centi-ton. Exceptionally, kg is decomposed into kilo-gram rather than treated as a basic unit, for consistency with other prefixes of grams. (Prefixed units appear in UnitsDB, and are indicated as prefixed: true.)

A unit may have multiple symbols; these are registered separately in units.yaml, as entries under unit_symbols. These different symbols will be recognised as the same Unit in the UnitsML markup, but the original symbol will be retained in the MathML expression. So an expression like 1 unitsml(mL) will be recognised as referring to microlitres; the expression will be given under its canonical rendering ml in UnitsML markup, but the MathML rendering referencing that UnitsML expression will keep the notation mL.

The symbols used for units can be highly ambiguous; in order to guarantee accurate parsing, the symbols used to data enter units are unambiguous in units.yaml. They may be found as the entries for unit_symbols/id under each unit. For example, B is ambiguous between bel (as in decibel) and byte; they are kept unambiguous by using bel_B and byte_B to refer to them, although they will still both be rendered as B.

The following table is the current list of ambiguous symbols, which are disambiguated in the symbol ids used. This table can be generated (in Asciidoc format) through Asciimath2UnitsML::Conv.new().ambig_units:

Symbol Unit + ID

minute (minute of arc): '

foot: '_ft

minute: '_min

minute (minute of arc): prime

foot: prime_ft

minute: prime_min

second (second of arc): "

second: "_s

inch: "_in

second (second of arc): dprime

second: dprime_s

inch: dprime_in

″Hg

conventional inch of mercury: "Hg

conventional inch of mercury: dprime_Hg

inch of mercury (32 degF): "Hg_32degF

inch of mercury (60 degF): "Hg_60degF

inch of mercury (32 degF): dprime_Hg_32degF

inch of mercury (60 degF): dprime_Hg_60degF

hp

horsepower: hp

horsepower (UK): hp_UK

horsepower, water: hp_water

horsepower, metric: hp_metric

horsepower, boiler: hp_boiler

horsepower, electric: hp_electric

Btu

British thermal unit_IT: Btu

British thermal unit (mean): Btu_mean

British thermal unit (39 degF): Btu_39degF

British thermal unit (59 degF): Btu_59degF

British thermal unit (60 degF): Btu_60degF

a

are: a

year (365 days): a_year

year, tropical: a_tropical_year

year, sidereal: a_sidereal_year

d

day: d

darcy: darcy

day, sidereal: d_sidereal

inHg

conventional inch of mercury: inHg

inch of mercury (32 degF): inHg_32degF

inch of mercury (60 degF): inHg_60degF

inH2O

conventional inch of water: inH_2O

inch of water (39.2 degF): inH_2O_39degF

inch of water (60 degF): inH_2O_60degF

min

minute: min

minim: minim

minute, sidereal: min_sidereal

pc

parsec: pc

pica (printer’s): pica_printer

pica (computer): pica_computer

t

metric ton: t

long ton: ton_long

short ton: ton_short

B

bel: bel_B

byte: byte_B

cmHg

conventional centimeter of mercury: cmHg

centimeter of mercury (0 degC): cmHg_0degC

cmH2O

conventional centimeter of water: cmH_2O

centimeter of water (4 degC): cmH_2O_4degC

cup

cup (US): cup

cup (FDA): cup_label

D

debye: D

darcy: Darcy

ft

foot: ft

foot (based on US survey foot): ft_US_survey

ftH2O

conventional foot of water: ftH_2O

foot of water (39.2 degF): ftH_2O_39degF

gi

gill (US): gi

gill [Canadian and UK (Imperial)]: gi_imperial

h

hour: h

hour, sidereal: h_sidereal

′Hg

conventional foot of mercury: 'Hg

conventional foot of mercury: prime_Hg

ħ

natural unit of action: h-bar

atomic unit of action: h-bar_atomic

me

natural unit of mass: m_e

atomic unit of mass: m_e_atomic

in

inch: in

inch (based on US survey foot): in_US_survey

K

kelvin: K

kayser: kayser

L

liter: L

lambert: Lambert

lb

pound (avoirdupois): lb

pound (troy or apothecary): lb_troy

mi

mile: mi

mile (based on US survey foot): mi_US_survey

mil

mil (length): mil

angular mil (NATO): mil_nato

oz

ounce (avoirdupois): oz

ounce (troy or apothecary): oz_troy

pt

point (printer’s): pt_printer

point (computer): pt_computer

rad

radian: rad

rad (absorbed dose): rad_radiation

s

second: s

second, sidereal: s_sidereal

tbsp

tablespoon: tbsp

tablespoon (FDA): tbsp_label

ton

ton of TNT (energy equivalent): ton_TNT

ton of refrigeration (12 000 Btu_IT/h): ton_refrigeration

tsp

teaspoon: tsp

teaspoon (FDA): tsp_label

yd

yard: yd

yard (based on US survey foot): yd_US_survey

°

degree (degree of arc): deg

γ

gamma: gamma

μ

micron: micron

ohm: Ohm

Å

angstrom: Aring

ħ

natural unit of action in eV s: h-bar_eV_s

abΩ

abohm: abohm

(abΩ)-1

abmho: abS

aW

abwatt: aW (Cardelli)

b

barn: barn

Btuth

British thermal unit_th: Btu_th

°C

degree Celsius: degC

calIT

I.T. calorie: cal_IT

calth

thermochemical calorie: cal_th

°F

degree Fahrenheit: degF

a0

atomic unit of length: a_0

c

natural unit of velocity: c

c0

natural unit of velocity: c_0

e

atomic unit of charge: e

Eh

atomic unit of energy: e_h

μin

microinch: uin

°K

kelvin: degK

kcalIT

kilocalorie_IT: kcal_IT

kcalth

kilocalorie_th: kcal_th

mmH2O

conventional millimeter of water: mmH_2O

°R

degree Rankine: degR

ƛC

natural unit of length: lambda-bar_C