Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Added CONTRIBUTING.md

[1.0.0] - 2017-10-15

Added

  • New "transformer" model based on the paper "Attention Is All You Need"
  • Options specific to the transformer model
  • Linear learning rate warmup with and without initial value
  • Cyclic learning rate warmup
  • More options for learning rate decay, including optimizer history reset and repeated warmup
  • Continuous inverse-square-root decay of the learning rate (--lr-decay-inv-sqrt) based on the number of updates; see the sketch after this list
  • Exposed optimizer parameters (e.g., momentum for Adam)
  • Version of deep RNN-based models compatible with Nematus (--type nematus)
  • Synchronous SGD training for multiple GPUs (enable with --sync-sgd)
  • Dynamic construction of complex models with different encoders and decoders, currently only available through the C++ API
  • Option --quiet to suppress output to stderr
  • Option to choose different variants of the optimization criterion: mean cross-entropy, perplexity, cross-entropy sum
  • In-process translation for validation, which uses the same memory as training
  • Label smoothing
  • Added CHANGELOG.md
  • Swish activation function as the default for the transformer model (https://arxiv.org/pdf/1710.05941.pdf)
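
Since the warmup and inverse-square-root decay entries above describe a learning-rate formula, a small sketch may help. This is a minimal illustration assuming the standard schedule from the transformer literature (linear warmup to the base rate, then decay proportional to 1/sqrt(updates)); the exact formula behind --lr-decay-inv-sqrt may differ, and every name below is hypothetical:

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Hypothetical sketch of linear warmup followed by inverse-square-root
// decay in the number of updates. Illustrative only; this is not Marian's
// actual implementation of --lr-decay-inv-sqrt.
double learningRate(double baseLr, int step, int warmupSteps) {
  if (step < warmupSteps)
    return baseLr * step / warmupSteps;  // linear warmup toward baseLr
  // after warmup, decay proportionally to 1/sqrt(step)
  return baseLr * std::sqrt((double)warmupSteps) / std::sqrt((double)step);
}

int main() {
  for (int step : {1000, 4000, 16000, 64000})
    std::printf("update %6d -> lr %.6f\n", learningRate(3e-4, step, 4000));
  return 0;
}
```

At the warmup boundary the two branches agree at the base rate; afterwards the rate halves with every fourfold increase in the number of updates.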

Changed

  • Renamed "s2s" binary to "marian-decoder"
  • Renamed "rescorer" binary to "marian-scorer"
  • Renamed "server" binary to "marian-server"
  • Renamed option name --dynamic-batching to --mini-batch-fit
  • Unified cross-entropy-based validation, which now supports perplexity and other cross-entropy variants
  • Changed --normalize from bool to float, allowing the length normalization weight to be adjusted; scores are normalized as score / pow(length, arg). See the sketch after this list.
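
The --normalize change above states the normalization formula directly, so a tiny worked example may be useful. This is a minimal sketch of score / pow(length, arg); the score and length values are made up for illustration:

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Length normalization as described above: the model score is divided by
// pow(length, arg). arg = 0 leaves the score unchanged; arg = 1 yields a
// per-token average. Score and length below are hypothetical.
double normalizedScore(double score, int length, double arg) {
  return score / std::pow((double)length, arg);
}

int main() {
  double logProb = -12.6;  // hypothetical sentence-level log-probability
  int length = 9;          // hypothetical target length in tokens
  for (double arg : {0.0, 0.6, 1.0})
    std::printf("arg=%.1f -> normalized score %.3f\n",
                arg, normalizedScore(logProb, length, arg));
  return 0;
}
```

With arg = 1.0 this reduces to the per-token average log-probability, a common choice when comparing hypotheses of different lengths.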

Removed

  • Temporarily removed gradient dropping (--drop-rate X) pending refactoring.