Geographic Distribution and Development of Arabic LLMs. Model names with the same color indicate collaborative development efforts between different countries.
- OpenITI corpus (v1.2) Link.
- 1.5 Billion Words Arabic Corpus Link.
- OSIAN Corpus Link.
- Gigaword Corpus Link.
- OSCAR Corpus Link.
- Arabic Wikipedia Dump Link.
- ArabicText 2022 Link.
- AraC4 Link.
- Maktabah Link.
- TyDi dataset Link.
- ARCD Link.
- CC100-Arabic Link.
- OpenSubtitles2016 corpus Link.
- AraNews Link.
- Hindawi Link.
- CALLHOME Egyptian Arabic Transcripts Link.
- Babylon Levantine Arabic Transcripts Link.
- Levantine Arabic QT Training Data Set 4 Transcripts Link.
- Levantine Arabic QT Training Data Set 5 Transcripts Link.
- Gulf Arabic Conversational Telephone Transcripts Link.
- Iraqi Arabic Conversational Telephone Transcripts Link.
- Levantine Arabic Conversational Telephone Transcripts Link.
- Fisher Levantine Arabic Conversational Telephone Transcripts Link.
- AOC Dataset Link.
- Arabic-Dialect/English Parallel Text Link.
- PADIC Corpus Link.
- Curras Corpus Link.
- BOLT Egyptian Arabic SMS/Chat and Transliteration Link.
- SDC (Shami Dialect Corpus) Link.
- Gumar Corpus Link.
- MADAR Corpus Link.
- Habibi Corpus Link.
- NADI 2020 Corpus Link.
- QADI Corpus Link.
- Darija-SFT-Mixture dataset Link.
- AraBERT Link.
- MARBERT Link.
- ARBERT Link.
- QARiB Link.
- SudaBERT Link.
- AraELECTRA Link.
- AraGPT2 Link.
- CAMeLBERT Link.
- JABER Link.
- SABER Link.
- AraBART Link.
- AraLegal-BERT Link.
- AraRoBERTa Link.
- DziriBERT Link.
- TunBERT Link.
- DarijaBERT Link.
- AraMUS Link.
- MorRoBERTa Link.
- MorrBERT Link.
- JASMINE Link.
- AraQA Link.
- ArabianGPT Link.
- AraPOEMBERT Link.
- SaudiBERT Link.
- AlcLaM Link.
- AraStories Link.
- EgyBERT Link.
- Atlas-Chat Link.
If you use this survey in your work, please cite our paper:
BibTeX
@misc{mashaabi2024survey,
  title={A Survey of Large Language Models for Arabic Language and its Dialects},
  author={Malak Mashaabi and Shahad Al-Khalifa and Hend Al-Khalifa},
  year={2024},
  institution={iWAN Research Group, College of Computer and Information Sciences, King Saud University},
  url={https://arxiv.org/abs/2410.20238},
}