ArabicTransformer: Efficient Large Arabic Language Model with Funnel Transformer and ELECTRA Objective

Abstract

Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, as demonstrated by both AraBERT and AraELECTRA, yields impressive results on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside the Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and the ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using fewer computational resources than other BERT-based models.
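To give a rough feel for why compressing the sequence of hidden states lowers cost, here is a minimal sketch in plain Python. It assumes three encoder blocks with stride-2 pooling between them and counts only self-attention cost, taken as proportional to the squared sequence length per layer. It ignores the feed-forward layers, the decoder, and the fact that the first layer of each block pools only the queries, so the number it prints is an illustration, not a measurement from the paper. The function names are hypothetical.

```python
import math


def pooled_lengths(seq_len, num_blocks=3, stride=2):
    """Sequence length seen by each encoder block, assuming pooling
    with the given stride between consecutive blocks."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(math.ceil(lengths[-1] / stride))
    return lengths


def relative_attention_cost(seq_len, layers_per_block, stride=2):
    """Self-attention cost of the funnel encoder divided by the cost of
    running the same number of layers at full sequence length.
    Cost per layer is approximated as length squared."""
    lengths = pooled_lengths(seq_len, len(layers_per_block), stride)
    funnel = sum(n * length ** 2 for n, length in zip(layers_per_block, lengths))
    full = sum(layers_per_block) * seq_len ** 2
    return funnel / full


print(pooled_lengths(512))                       # [512, 256, 128]
print(relative_attention_cost(512, [6, 6, 6]))   # 0.4375
```

Under these simplifying assumptions, a B6-6-6 encoder spends well under half the self-attention compute of an 18-layer encoder that keeps the full sequence length throughout.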

Pre-Trained Models ( PyTorch + PT + TensorFlow )

ArabicTransformer small (B4-4-4) Link
ArabicTransformer intermediate (B6-6-6) Link
ArabicTransformer large (B8-8-8) Link
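The labels in parentheses follow the Funnel Transformer block notation, where each number is the count of layers in one encoder block. The sketch below parses such a label into a list of block sizes and reports the total encoder depth. The helper name `parse_block_layout` is hypothetical. The resulting list has the same shape as the `block_sizes` argument of `FunnelConfig` in Hugging Face Transformers, but a released checkpoint ships its own configuration, so this is only meant to make the naming scheme concrete.

```python
import re


def parse_block_layout(label):
    """Turn a layout label such as 'B6-6-6' into a list of
    per-block layer counts, e.g. [6, 6, 6]."""
    match = re.fullmatch(r"B(\d+(?:-\d+)*)", label.strip())
    if match is None:
        raise ValueError(f"not a block layout label: {label!r}")
    return [int(part) for part in match.group(1).split("-")]


# Layout labels as listed in this README.
models = {
    "small": "B4-4-4",
    "intermediate": "B6-6-6",
    "large": "B8-8-8",
}

for name, label in models.items():
    blocks = parse_block_layout(label)
    print(f"{name}: blocks={blocks}, encoder layers={sum(blocks)}")
```

Read this way, the small, intermediate, and large models have 12, 18, and 24 encoder layers respectively, each split evenly across three blocks.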

Google Colab Examples

Text classification with ArabicTransformer using PyTorch/XLA on TPU or PyTorch on GPU (better reproducibility, but slower).

Text classification with ArabicTransformer using the Keras API on TPU (faster, but reproducibility is no better than with PyTorch/XLA).

Question answering (TyDi QA / Quran QA datasets) with ArabicTransformer.

@inproceedings{alrowili-shanker-2021-arabictransformer-efficient,
    title = "{A}rabic{T}ransformer: Efficient Large {A}rabic Language Model with Funnel Transformer and {ELECTRA} Objective",
    author = "Alrowili, Sultan  and
      Shanker, Vijay",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.108",
    pages = "1255--1261",
    abstract = "Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, demonstrated by both AraBERT and AraELECTRA, shows an impressive result on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.",
}
