New AI instantly translates 5,000-year-old cuneiform tablets

Translation is not simply a matter of exchanging a word for a corresponding word in another language. A high-quality translation requires that the translator understand how both languages string thoughts together and then use that knowledge to create a translation that retains the linguistic nuances of the original, which native speakers understand effortlessly.

As difficult as this process is, it is nothing compared to the challenge of translating an ancient language into a modern language. These translators must not only resurrect extinct languages from written sources, but also have a deep understanding of how the cultures that produced those sources evolved over the centuries. If that weren’t enough, their sources are often fragmented, leaving crucial context lost in time.

Because of this, the number of people able to translate languages from antiquity is small, and their best efforts are often outpaced by the volume of texts unearthed by archaeologists.

Take ancient Akkadian. This early Semitic language is one of the best attested of the ancient world. Hundreds of thousands of Akkadian texts (more than a million, by some estimates) have been discovered and are today held in museums and universities. Many have even been digitized and posted online. Each has the potential to teach us about the life, politics, and beliefs of early civilizations, but this knowledge remains locked behind the time and manpower required to translate them.

To help change that, a multidisciplinary team of archaeologists and computer scientists has developed an artificial intelligence that can translate Akkadian almost instantly and unlock the historical record held in these 5,000-year-old tablets.

A display case with many different cuneiform tablets.

Hundreds of thousands of cuneiform tablets are held in museum and university collections, but many remain untranslated because the process is slow and few people have the expertise to do it. (Credit: Phillip Tellis/Wikimedia Commons)

Akkadian lost (and found)

Akkadian was the native language of the Akkadian Empire, which arose around 2300 BC thanks to the conquests of its founder, Sargon the Great. As a spoken language, Akkadian would eventually split into Assyrian and Babylonian dialects before being completely supplanted by Aramaic in the early first millennium BC. Today it is a truly extinct language, with no daughter languages to carry on its legacy.

As a written language, however, Akkadian proved more enduring. The empire borrowed the cuneiform script from its predecessor, the Sumerian civilization. This writing system used a reed stylus to imprint wedge-shaped glyphs into wet clay tablets before firing them (hence the name cuneiform, which literally means “wedge-shaped” in Latin). Even after Aramaic supplanted Akkadian as the common language of the region, scholars continued to write in Akkadian cuneiform well into the first century AD. Even in ancient times, it seems, scholars and academics were incredibly stubborn.

This traditional mindset has had an unintended advantage for modern archaeologists as well. While cuneiform could be written on papyrus, it was most often inscribed on clay or stone. These materials stand up much better to the fires and floods that have destroyed more fragile writing media. And while time is cruel to all things (archaeologists rarely discover cuneiform tablets in pristine condition), that durability is one reason Akkadian writing is so well attested in the historical record.

“Ironically, destructive fires have preserved some of the largest libraries in ancient Mesopotamia, because they were made of clay. By contrast, all of the papyrus libraries of ancient Egypt were burned or reduced to dust, although many individual codices survive,” writes linguist Steven Roger Fischer in A History of Writing.

Even with such linguistic riches, translating these ancient libraries correctly is no mean feat. Beyond the challenges already mentioned, Akkadian writing is polyvalent. That is, its cuneiform signs can have different readings depending on how each one functions in a sentence. There are many reasons for this development, but according to Fischer, one reason the Akkadians never simplified is that they “seemed to be bound by tradition and a self-imposed efficiency.” That traditional mindset led them to continue using the Sumerian script for a language very different from Sumerian. (When it comes to historical scholarship, you win some, you lose some.)

Therefore, translating Akkadian is a two-step process. First, scholars must transliterate the cuneiform signs. That is, they take the cuneiform and rewrite it using similar-sounding letters from the alphabet of the target language. An example that most readers will be familiar with is the Arabic word الله, which is translated into English as “God” but transliterated as “Allah.” This transliteration is the closest the Latin alphabet can get to reproducing the word as it sounds in Arabic. Scholars then take their transliteration of the text and translate it into a modern language.
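
The two steps, and the way a single sign can carry more than one reading, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how scholars or the research team actually work: the tables READINGS and GLOSSARY and both function names are hypothetical, and the data covers one sign (AN, which can stand for the Akkadian word ilum, “god,” or for the bare syllable “an”).

```python
# Illustrative data only: one sign and two of its readings.
# A real sign list has hundreds of signs, many with more readings.
READINGS = {
    "\U0001202D": {            # the cuneiform sign AN
        "logogram": "ilum",    # read as the Akkadian word for "god"
        "syllable": "an",      # read as a bare sound inside a longer word
    },
}
GLOSSARY = {"ilum": "god"}     # toy lexicon: transliteration -> English


def transliterate(sign: str, role: str) -> str:
    """Step 1: pick the Latin-letter reading that fits the sign's role."""
    return READINGS[sign][role]


def translate(word: str) -> str:
    """Step 2: look the transliterated word up in the lexicon."""
    return GLOSSARY[word]


AN = "\U0001202D"
print(transliterate(AN, "syllable"))              # an
print(translate(transliterate(AN, "logogram")))   # god
```

The hard part in practice is the role argument: deciding which reading a sign takes in context is exactly the judgment that requires years of training.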

Fast-acting AI for immediate results

As you can imagine, this can be a long and laborious process, and it takes years of training and dedication to learn how to do it right. To help speed things up, the research team developed a neural machine translation model for Akkadian cuneiform, the same technology under the hood of Google Translate.

The team trained the AI model on a sample of cuneiform texts from the Open Richly Annotated Cuneiform Corpus and taught it to translate in two distinct ways. First, the AI model learned to translate Akkadian from transliterations of the original texts. It also learned to translate cuneiform symbols directly. More specifically, it translated Unicode glyphs of cuneiform text generated by another time-saving tool that automatically produces Unicode from an image of the original tablet.
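
To make the two input forms concrete, here is a minimal Python sketch of how each might be split into units before reaching a model. The code point ranges are the cuneiform blocks defined by the Unicode Standard; everything else (the function names and the choice of one token per sign or per whitespace-separated word) is an assumption for illustration and is not taken from the team's code.

```python
# Unicode blocks that hold cuneiform characters.
CUNEIFORM_BLOCKS = [
    (0x12000, 0x123FF),  # Cuneiform
    (0x12400, 0x1247F),  # Cuneiform Numbers and Punctuation
    (0x12480, 0x1254F),  # Early Dynastic Cuneiform
]


def is_cuneiform(ch: str) -> bool:
    """True if the single character ch falls inside a cuneiform block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CUNEIFORM_BLOCKS)


def sign_tokens(text: str) -> list:
    """Input form 1: one token per cuneiform sign, other characters dropped."""
    return [ch for ch in text if is_cuneiform(ch)]


def word_tokens(transliteration: str) -> list:
    """Input form 2: one token per whitespace-separated transliterated word."""
    return transliteration.split()


line = "\U0001202D \U00012000"   # two signs separated by a space
print([f"U+{ord(s):05X}" for s in sign_tokens(line)])  # ['U+1202D', 'U+12000']
print(word_tokens("a-na be-li-ia"))                     # ['a-na', 'be-li-ia']
```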

The AI model then had to figure out how to handle the nuances of the various genres in the sample, such as the difference between literary works and administrative letters, as well as how to handle the changes seen in cuneiform writing over the millennia it was in use. The AI model was then tested using Bilingual Evaluation Understudy 4 (BLEU4), an algorithm used to assess machine-translated text.

In its transliteration-to-English test, the team’s AI model scored 37.47. In its cuneiform-to-English test, it scored 36.52. Both scores were above the target baseline and within the range of a high-quality translation. And there was a surprising result: the model was able to reproduce the genre nuances of every sentence in the test. While this wasn’t one of the researchers’ goals, they note in the study that it could open up possibilities for uses beyond translation.

“In almost all cases, whether the [translation] is correct or not, the genre is recognizable,” the team writes. “A promising future scenario would have the [model] show the user a list of sources they based their translations on, which would also be particularly useful for academic purposes.”

The team published the findings in the peer-reviewed journal PNAS Nexus. They have also published their research and source code on GitHub at Akkademia.

A stone with cuneiform writing lying on the ground.

While clay and stone tablets may withstand the ravages of time better than papyrus, they often still lie fragmented and may lack crucial context. (Credit: homocosmicos / Adobe Stock)

The future of the past looks brighter

As promising as the initial results are, there is still work to be done. In both tests, some of the sentences were mistranslated. And like other AI models, this one is prone to hallucinations, moments where the answer has no connection to the source. In one instance, the human translator produced the line “Why should we (even) take the case before a Libbi-Ali man?” The AI translation: “I’m in the old town in the old town.” (A little off.)

All in all, the AI model works best when translating short to medium-length sentences. It also works better with more formal genres, such as royal decrees and administrative documents, than it does with literary genres such as myths, hymns, and prophecies. With more training on a larger data set, the researchers note in the study, they aim to improve its accuracy. In time, they hope their AI model can serve as a virtual assistant for human scholars. The AI can quickly deliver a rough translation, while the scholar can refine it with his or her knowledge of historical languages, cultures, and people.

“Hundreds of thousands of clay tablets engraved in cuneiform script document the political, social, economic and scientific history of ancient Mesopotamia. However, most of these documents remain untranslated and inaccessible due to their number and the limited amount of experts able to read them,” the team writes in the study.

“This is another important step towards the preservation and dissemination of the cultural heritage of ancient Mesopotamia.”

Image Source : bigthink.com