Spain Launches Open-Source Multilingual AI Project

Spain Announces Open-Source Language Model Project to Boost AI Competitiveness in Latin America

Spanish AI startups stand to gain from a new open-source language model project, which aims to improve language accuracy and fluency in Spanish, Basque, Catalan, Galician, and Valencian.

In a significant move aimed at boosting Spain’s artificial intelligence (AI) competitiveness in the Latin American market, the Spanish government has announced the development of an open-source large language model (LLM) trained on Spanish (Castellano), Basque, Catalan, Galician, and Valencian. The project, which involves a range of public and private organizations, was unveiled by Spanish Prime Minister Pedro Sánchez at the Mobile World Congress in Barcelona.

The new LLM project is expected to enable Spanish AI startups to better compete in the vast markets of Latin America and the Spanish-speaking communities of the US. Carlos KiK, CTO and cofounder of Barcelona-based AI startup AiMA Beyond AI, emphasized the urgency of the project, saying, “We need this project very much to compete with the American tech companies. If we don’t move quickly, one of the big ones will come and impose their Spanish model on us.”

The LLM will be developed through a public-private partnership among the Barcelona Supercomputing Center (BSC), the Spanish Supercomputing Network, the Royal Spanish Academy, and the Association of Spanish Language Academies. Albert Cañigueral, the BSC’s tech transfer director for AI and language technology projects, hopes the LLM will be comparable to OpenAI’s GPT-3 model and will be released by the summer, assuming the center’s MareNostrum 5 supercomputer comes into operation this spring as planned.

The initiative will build upon two existing BSC projects, Aina on Catalan and Ilenia on Spanish and other regional languages, which have been gathering written data and recording speech in many parts of Spain. Once the model is released, the project’s second phase will focus on ensuring that it is adopted by industry and public institutions.

The BSC data, which will not include licensed content, is already accessible to companies of all sizes and has been used by Google to improve its PaLM 2 model, according to Cañigueral. A handful of startups and publicly funded projects, such as Clibrain and Latxa, have already been developing LLMs trained with data in these languages, and Cañigueral says they will also benefit from the BSC initiative.

KiK believes the BSC LLM project could substantially improve the language accuracy of products built by AI startups in Spain, which currently rely on LLMs trained with up to 90% English-language data. The new model would save developers time, because they would no longer need to introduce modifications to make the software sound more natural, and it would enable AI companions to identify local dialects and adapt their responses accordingly.

In summary, the Spanish government’s open-source LLM project has the potential to significantly enhance the competitiveness of Spanish AI startups in the Latin American market, while also improving language accuracy and fluency in various Spanish dialects and regional languages. The collaboration between public and private organizations and the involvement of Latin American countries in training the LLM will ensure its relevance and usefulness to users from any Spanish-speaking country.

