AI4Bharat's open-source dataset covers 22 Indian languages and is intended to support future language technology development.
IndicVoices collects 7,348 hours of audio to develop IndicASR, supporting all languages in India's constitution.
AI4Bharat shares a blueprint for data collection that future projects in other multilingual regions around the world can follow.
1,639 hours of the dataset have already been transcribed, providing a foundation for building models in the 22 languages.
Bhashini aims to create a National Public Digital Platform for language-based services, promoting AI and technology.
Over 70 research institutes, including IITs and AI4Bharat, benefit from Bhashini's support for innovative language solutions.
Bhashini's CEO, Amitabh Nag, expects the dataset to play a role in shaping language models and their use cases.
The dataset's open nature eliminates barriers, enabling startups and academia to innovate with native voice datasets.
The government can use the dataset to extend services, particularly in remote areas, and to enhance citizen engagement.