Google is getting more languages, but will it have nuances?

LIMA, Peru (AP) – About 10 million people speak Quechua, but try to automatically translate e-mail. letters and text messages to the most widely used indigenous language family in America were nearly impossible.

That changed on Wednesday, when Google added Quechua and many other languages ​​to its digital translation service.

The Internet giant says the new artificial intelligence technology has made it possible to expand Google Translate’s global language repertoire. She added 24 of them this week, including Quechua and other local South American languages ​​such as Guarani and Aymara. It also adds many widely used African and South Asian languages ​​that were lacking in popular technology products.


“We’ve looked at languages ​​with a very large, under-served language,” Isaac Caswell, a Google researcher, told reporters.

News from the California company’s annual I / O technology demonstrations can be celebrated in many parts of the world. However, it is also likely to receive criticism from those who are frustrated with previous technology products who have failed to understand the nuances of their language or culture.

Quechua was the lingua franca of the Inca Empire, which stretched from present-day southern Colombia to central Chile. His status began to decline after the Spanish conquest of Peru more than 400 years ago.

Adding it to Google’s recognizable languages ​​is a great victory for Quechua language activists such as Louis Illaccanqui, a Peruvian who has created Qichwa 2.0, a site that includes dictionaries and language learning resources.

“This will help give Quechua and Spanish the same status,” said Illaccanqui, who was not involved in the Google project.

Illaccanqui, whose name in Quechua means “you are lightning,” said the translator will also help keep the language to a new generation of young people and teens, “who speak Quechua and Spanish at the same time and are fascinated by social networks.”

Caswell called the news a “very technological step forward” because it was not yet possible to add languages ​​if researchers could not find a large enough flow of online text, such as digital books, newspapers, or social media entries. AI systems to learn from.

U.S. technology giants don’t have great achievements to make their language technologies work well outside of the richest markets. This problem also makes it harder for them to detect dangerous misinformation on their platforms. Until this week, Google Translate was offered in European languages ​​such as Frisian, Maltese, Icelandic and Corsican (each with less than 1 million speakers), but not in East African languages ​​such as Oromo and Tigrinya, which are spoken by millions.

New languages ​​will be introduced this week. They are not yet understood by Google Voice Assistant, so they are currently being translated from text to text only. Google said it was working to add speech recognition and other features, such as the ability to translate the mark by pointing a camera at it.

This will be important for languages ​​that are mostly spoken, such as Quechua, especially in the field of health, because many Peruvian-speaking doctors and nurses work in rural areas and “cannot understand patients who speak Quechua mostly,” Illaccanqui said.

“The other frontier, or challenge, is working with language,” said Arturo Oncevay, a Peruvian machine translation researcher at the University of Edinburgh who formed a research coalition to improve Indigenous language technology across America. “Native American languages ​​are traditionally verbal.

In its report, Google warned that the quality of translations into the newly added languages ​​”still lags far behind” other supported languages, such as English, Spanish, and German, and noted that the models will “make mistakes and show bias.” “However, the company only included languages ​​if its AI systems met a certain payment threshold,” Caswell said.

“If there are a lot of cases where this is very wrong, then we wouldn’t include it,” he said. – Even if 90% of the translations are perfect, but 10% are nonsense, for us it is a little too much. “

Google said its products now support 133 languages. The latest 24 is the largest single group that has been added since 2010. Google has added 16 new languages. The expansion was made possible by what Google calls a “zero-frame” or “zero-resource” machine translation model that never learns to translate into another language.

Facebook and Instagram parent company Meta unveiled a similar concept last year called Universal Speech Translator.

The Google model works by training “one giant neural AI model” in about 100 data-rich languages, which is learned by applying to hundreds of other languages ​​they don’t know, Caswell said. “Imagine if you’re a big polyglot and you’re just starting to read novels in another language, you can start collecting what it might mean to your knowledge of the language in general,” he said.

He said the new group ranges from smaller languages ​​such as bark, spoken by about 800,000 people in northeast India, to more widely used languages ​​such as lingala, spoken by about 45 million people across Central Africa.

That was more than 15 years ago – in 2006. – Microsoft has received a lot of attention in South America with its software feature that translates familiar Microsoft menus and commands into Quechua. However, this was before the current wave of AI progress in real-time translation.

Harvard University’s linguist Américo Mendoza-Mori, who speaks Quechua, said Google’s attention to the language provides the necessary visibility in places like Peru, where Quechua is still lacking in many public services. The survival of many of these languages ​​”will depend on their use in the digital context,” he said.

Another linguist, Roberto Zariquiey, said he was skeptical that Google could create an effective tool for reviving Quechua, Aymara or Guarani without more active involvement of community groups in the region.

“Languages ​​are closely linked to life, culture, ethnic groups and political organizations,” said Zariquiey, a linguist at the Pontifical Catholic University of Peru. “That should be taken into account.”

New languages ​​added: Assamese, Aymara, Bambara, Bhojpuri, Dhivehi, Dogri, Sheep, Guarani, Ilocano, Konkani, Cryo, Lingala, Luganda, Maithili, Meitekton (Manipuri), Mizo, Oromo, Quechua, Sanskrit, Sepedi, Sorani Kurdish, Tigrinya, Tsonga and Twi.

O’Brien reported from Providence, Rhode Island.

Godfrey Kemp

"Bacon fanatic. Social media enthusiast. Music practitioner. Internet scholar. Incurable travel advocate. Wannabe web junkie. Coffeeaholic. Alcohol fanatic."

Leave a Reply

Your email address will not be published.