Developing for Indic languages | Gemma and Navarasa
- Published May 13, 2024
- While many early large language models were predominantly trained on English language data, the field is rapidly evolving. Newer models are increasingly being trained on multilingual datasets, and there's a growing focus on developing models specifically for the world’s languages. However, challenges remain in ensuring equitable representation and performance across diverse languages, particularly those with less available data and computational resources.
Gemma, Google's family of open models, is designed to address these challenges by enabling the development of projects in non-Germanic languages. Its tokenizer and large token vocabulary make it particularly well-suited for handling diverse languages. Watch how developers in India used Gemma to create Navarasa - a fine-tuned Gemma model for Indic languages.
Watch the full keynote: ua-cam.com/users/liveXEzRZ35u...
To watch this keynote with American Sign Language (ASL) interpretation, please click here: ua-cam.com/users/live6rP2rEWs...
#GoogleIO #GoogleIO2024
Subscribe to our Channel: / google
Find us on X: / google
Watch us on TikTok: / google
Follow us on Instagram: / google
Join us on Facebook: / google - Science & Technology
Let's ensure support for the preservation of the endangered Crimean Tatar language. Our NGO is ready to help you
This is a necessary step in this process of evolution. Being able to communicate but not forcing people to learn a specific language. Keep the cultural norms of their own society but still being able to communicate will be amazing :)
Pumped for you to try it
Awesome, Superb, excellent!! Excited to explore this, Thanks Google!
Ready to make magic happen together! ✨
Another excellent step. Thank you !!
We can't wait to see everything you create ✨
Superb folks….many many congratulations
Can't wait to hear what you think
Congrats to Navarasa and excellent initiative by Google in showing LLM innovation from India.
Having followed this space in detail, I must say that Indic LLMs are becoming a big deal now, and some even surpass GPT-4.
Socket AI labs recently unveiled an LLM called Pragna 1B, which has a more efficient tokenizer than GPT-4 for Indic languages.
GenVR Research unveiled AryaBhatta Gemma LLM, a Gemma model trained on 6 million plus Indic cultural data points (10x more SFT data than most Indic LLMs). It is currently the leader on the Indic LLM leaderboard and also on the Microsoft Pratiksha leaderboard, and it became the first Indic LLM to surpass GPT-4 on human evals in the Microsoft Pratiksha study. A Gemma finetune again.
OpenBioLLM70B is the current leader on the Medical LLM leaderboard. It is finetuned on llama-3-70B and was created in India.
These three models (one made from scratch, one on Gemma and one on llama-3) show that Indians can surpass GPT-4 despite our funding crunch.
Microsoft Pariksha study *
Can't wait to tell Sam that yes, we Indians can do it.
Superb!
That's really great to see all languages and cultures, followed by thousands and even millions, treated equally.
Thank you, Google, for this initiative. I am proud to be in such a world.
thank you Google!!!!
Rest in Peace OLA Krutrim.
Really needed. Long due!
Next is to ensure every language in Indonesia is covered as well. There are so many different languages here too. Please include Indonesia in the many Google projects that typically launch only in the United States first.
Amazing!
nice, thanksss
I wonder if I could use this to maintain the Javanese language
Google is upping the game every day. This will be really helpful to Indians if it actually delivers.
Excited for you to try it
Damn! Google replied to your comment XD
Try what? Google is doomed, check the comments. I'll make sure Google isn't there in 2060, 6th generation computer. Your AI is stupid, I understand, but sorry @Google
This is awesome. Thank you google.
Please include Bengali too 🥺
Thank you 🙏 Google...
Hope it works well
This will definitely help understand Sanskrit and will help to learn the language as well.
Thrilled you're ready to play around with it!
Thank you, Google
#teampixel
hope this encourages big companies to translate their content to indian languages 😅
In the chords of the Universe/
Nature listens/
To carry on/
Being/without ruffling the Earth!//
Great initiative!
Can we do it for Nepalese too please?
Small market, I think. How much data does the Nepalese language generate? I recently made one for Sanskrit, using Llama 3, to understand old books, so try it if you are a developer. You are from the land where Panini (the father of linguistics) did his research.
Google dhanyawaad from entire India.
Now my mom can also search in google using her native language Telugu.. Thanks TeluGoogu.. :)
Single, I live in Brazil
I sure hope you don't bring India's religious and class bigotry to the entire world.
Insightful comment!
Meanwhile Muslims demanding Sharia in UK, Germany and creating chaos in Europe
If you think bigotry is the exclusive domain of India and Indians it reveals both your bigotry and ignorance. Are you not even following what’s happening on college campuses in the US?
This io was ass as always