

Imagine you come across a message that could contain life-saving information. But there's a problem: you don't understand a word. You're not even sure which of the world's thousands of languages it is written in.

If the message is in French or Spanish, typing it into an automatic translation engine will instantly solve the mystery and produce a solid answer in English. But many other languages still defy machine translation, including languages spoken by millions of people, such as Wolof, Luganda, Twi and Ewe in Africa. That's because the algorithms that power these engines learn from human translations – ideally, millions of words of translated text.

There is an abundance of such material for languages like English, French, Spanish and German, thanks to multilingual institutions like the Canadian parliament, the United Nations and the European Union. Their human translators churn out streams of translated transcripts and other documents. The European Parliament alone produces a data trove of 1.37 billion words in 23 languages over a decade.

No such data mountain exists, however, for languages that may be widely spoken but not as prolifically translated. They are known as low-resource languages. The fallback machine-training material for these languages consists of religious publications, including the much-translated Bible. But this amounts to a narrow dataset, and is not enough to train accurate, wide-ranging translation robots.

Google Translate currently offers the ability to communicate in around 108 different languages while Microsoft's Bing Translator offers around 70 languages. Yet there are more than 7,000 spoken languages around the world, and at least 4,000 with a writing system.

That language barrier can pose a problem for anyone who needs to gather precise, global information in a hurry – including intelligence agencies.

"I would say the more interested an individual is in understanding the world, the more one must be able to access data that are not in English," says Carl Rubino, a programme manager at IARPA, the research arm of US intelligence services. "Many challenges we face today, such as economic and political instability, the Covid-19 pandemic, and climate change, transcend our planet – and, thus, are multilingual in nature."

Training a human translator or intelligence analyst in a new language can take years. Even then, it may not be enough for the task at hand. "In Nigeria, for instance, there are over 500 languages spoken," Rubino says. "Even our most world-renowned experts in that country may understand just a small fraction of those, if any."

To break that barrier, IARPA is funding research to develop a system that can find, translate and summarise information from any low-resource language, whether it is in text or speech.
