Simultaneous Speech Translation

Codebase for simultaneous speech translation experiments, based on fairseq.

Implemented

  - Encoder
  - Streaming Models

Setup

  1. Install fairseq:

     git clone https://github.com/pytorch/fairseq.git
     cd fairseq
     git checkout 4a7835b
     python setup.py build_ext --inplace
     pip install .

  2. (Optional) Install apex for faster mixed-precision (fp16) training.
  3. Install dependencies:

     pip install -r requirements.txt

  4. Update submodules:

     git submodule update --init --recursive

Pre-trained model

An ASR model with an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC/cross-entropy loss.

| MuST-C (WER) | en-de (V2) | en-es |
| --- | --- | --- |
| dev | 9.65 | 14.44 |
| tst-COMMON | 12.85 | 14.02 |
| model | download | download |
| vocab | download | download |
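The joint CTC/cross-entropy objective used for pre-training can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual criterion; the tensor shapes, the use of index 0 as the blank/pad symbol, and the `ctc_weight` interpolation are assumptions.

```python
import torch
import torch.nn.functional as F

def joint_ctc_ce_loss(encoder_logits, decoder_logits, src_lengths,
                      targets, target_lengths, blank=0, ctc_weight=0.3):
    """Interpolate a CTC loss on the encoder output with a cross-entropy
    loss on the decoder output (shapes and weighting are assumptions).

    encoder_logits: (T, B, V) frame-level logits from the speech encoder
    decoder_logits: (B, U, V) token-level logits from the Transformer decoder
    targets:        (B, U) transcription token ids (0 reserved for blank/pad)
    """
    # CTC expects log-probabilities of shape (T, B, V)
    log_probs = F.log_softmax(encoder_logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, src_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    # Token-level cross-entropy on the autoregressive decoder output
    ce = F.cross_entropy(decoder_logits.transpose(1, 2), targets,
                         ignore_index=blank)
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```

The CTC branch regularizes the encoder toward monotonic alignments, which is particularly helpful for streaming encoders such as Emformer.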

Sequence-level Knowledge Distillation

| MuST-C (BLEU) | en-de (V2) |
| --- | --- |
| valid | 31.76 |
| distillation | download |
| vocab | download |
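For context, sequence-level knowledge distillation trains the student on the teacher's beam-search outputs in place of the gold references; the distillation file above presumably contains such teacher-generated targets. A minimal sketch of the data-preparation idea, where `teacher_translate` is a hypothetical stand-in for running the teacher model's beam search:

```python
def build_distillation_set(sources, teacher_translate):
    """Sequence-level KD: pair each source with the teacher's own best
    hypothesis, which becomes the student's training target.

    teacher_translate is a hypothetical callable mapping a source
    sentence to the teacher's top beam-search hypothesis.
    """
    return [(src, teacher_translate(src)) for src in sources]

# Toy usage: an "uppercase" teacher standing in for a real MT model
pairs = build_distillation_set(["wie geht es", "danke"], str.upper)
```

Training on the teacher's simplified output distribution is a common way to make small or simultaneous models easier to optimize.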

Citation

Please consider citing our paper:

@inproceedings{chang22f_interspeech,
  author={Chih-Chiang Chang and Hung-yi Lee},
  title={{Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={5175--5179},
  doi={10.21437/Interspeech.2022-10627}
}