Darija Dataset

A continuously growing collection of Moroccan Darija data for training AI models.

Current Dataset Size

About the Dataset

This dataset focuses on Moroccan Darija. It was created by using a large language model (LLM) to translate the Alpaca dataset into Moroccan Darija. Once the dataset reaches a certain milestone, it will be continuously updated on Hugging Face.
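The translation step can be pictured as a field-by-field pass over Alpaca records. Each record in the original Alpaca data has the keys `instruction`, `input`, and `output`. The sketch below is illustrative only and does not reproduce this project's actual pipeline. The `translate` callable is a hypothetical stand-in for whatever LLM call produces the Darija text, and the sample record is abbreviated.

```python
import json
from typing import Callable, Dict, List

# Keys used by records in the original Alpaca data.
ALPACA_FIELDS = ("instruction", "input", "output")


def translate_record(record: Dict[str, str],
                     translate: Callable[[str], str]) -> Dict[str, str]:
    """Translate each Alpaca field of one record while keeping its structure.

    `translate` is a hypothetical placeholder for an LLM call that turns
    English text into Moroccan Darija.
    """
    translated = {}
    for field in ALPACA_FIELDS:
        text = record.get(field, "")
        # Many Alpaca records have an empty "input"; skip the LLM call for those.
        translated[field] = translate(text) if text.strip() else ""
    return translated


def translate_dataset(records: List[Dict[str, str]],
                      translate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Apply translate_record to every record, preserving order."""
    return [translate_record(r, translate) for r in records]


if __name__ == "__main__":
    sample = [{
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1. Eat a balanced diet...",
    }]
    # Dummy translator for demonstration; a real pipeline would call an LLM here.
    fake_translate = lambda text: f"[darija] {text}"
    print(json.dumps(translate_dataset(sample, fake_translate),
                     ensure_ascii=False, indent=2))
```

Keeping the three keys unchanged means the translated records can be used with the same tooling as the original Alpaca data. Skipping empty fields also avoids wasting LLM calls on blank inputs.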

How It's Used

The dataset can be used to build machine learning models, chatbots, and translation tools, and to support research projects. It helps bridge the gap between modern AI technologies and the rich linguistic landscape of Morocco.

This dataset is part of an ongoing effort to digitally preserve and promote Moroccan Darija. The dataset size count is updated automatically as new examples are added to the collection.
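The README does not describe how the size count is computed. As a minimal sketch, assume the examples are stored one JSON object per line (JSONL). That storage format is an assumption, not something the project states. Under that assumption, counting could look like this:

```python
import json
from pathlib import Path


def count_examples(path: Path) -> int:
    """Count the records in a JSONL file, skipping blank lines.

    Each non-blank line is parsed so that malformed entries raise an
    error instead of being counted silently.
    """
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            count += 1
    return count
```

For example, `count_examples(Path("darija_alpaca.jsonl"))` would return the number of examples. The file name in that call is hypothetical. A job that runs this after each addition could then refresh the displayed count.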