DeepL Edizioni

Tim Parks
As machine translation software grows more sophisticated, could it entirely replace human translators?
An illustration of a robot wearing a Hawaiian shirt and straw hat and holding a guidebook to Italy upside down, in front of the Leaning Tower of Pisa

Illustration by Lucas Adams

Introducing his new publishing house, Orville Press, at an event in Milan on February 23, Matteo Codignola, once one of the pillars of the prestigious Adelphi Edizioni, was asked if he intended to give due importance to translators and translation. “We take translation very seriously,” he replied, “although in the future I expect the job will be done by computers.” This drew a collective groan from the audience, some of whom, I recognized, were themselves translators.

Is Codignola right? What would be the consequences if he were?

The day after the presentation I agreed to check a translation from Italian into English. An Italian colleague was applying to teach Italian literature at a foreign university and had translated her course proposal herself. Such texts necessarily adopt a certain jargon and follow a certain standard style. Out of curiosity I put a paragraph of the original Italian into the translation software DeepL:

Una particolare attenzione sarà dedicata alla letteratura del Risorgimento e in particolare al genere del romanzo di formazione in senso lato, in opere in cui in cui il Bildungsroman si ibrida con il genere del romanzo storico (Ippolito Nievo, Alessandro Manzoni, Ugo Foscolo) o caratterizzate da una cornice fiabesca (Carlo Collodi, Edmondo De Amicis, Giovanni Verga). Tra i romanzi scritti dopo la prima guerra mondiale, verranno considerati romanzi di tema edipico (Alberto Moravia, Elsa Morante) e un romanzo “medio” come Il giardino dei Finzi-Contini di Giorgio Bassani. Per la contemporaneità, verranno considerati romanzi di formazione mancata di Niccolò Ammaniti ed Elena Ferrante.

Generally considered one of the most advanced machine translation tools, DeepL offered:

Special attention will be devoted to the literature of the Risorgimento and in particular to the genre of the broadly defined bildungsroman, in works in which the Bildungsroman is hybridized with the genre of the historical novel (Ippolito Nievo, Alessandro Manzoni, Ugo Foscolo) or characterized by a fairy-tale setting (Carlo Collodi, Edmondo De Amicis, Giovanni Verga). Among the novels written after World War I, the following will be considered novels with an Oedipal theme (Alberto Moravia, Elsa Morante) and an “average” novel such as Il giardino dei Finzi-Contini by Giorgio Bassani. For the contemporary, novels of formation will be considered missed by Niccolò Ammaniti and Elena Ferrante.

To think that this is achieved in an instant is impressive. DeepL has even eliminated the unhappy repetition of “particular” in the first sentence. However, there are glaring mistakes: “the following will be considered novels with an Oedipal theme” is miles away from the Italian “verranno considerati romanzi di tema edipico,” literally “novels with an Oedipal theme will be considered.” An “average” novel surely cannot be what was meant as a description of Bassani’s masterpiece, The Garden of the Finzi-Continis. The last sentence is incomprehensible.

If we put the same paragraph into Google Translate, the most commonly used translation software, and look at these three moments, we have: “novels with an Oedipal theme (Alberto Moravia, Elsa Morante) and an ‘average’ novel such as The Garden of the Finzi-Continis by Giorgio Bassani will be considered.” Google makes the same error with “average” but gets the verb use right, albeit in a structure that feels stretched, throwing in the passive future at the end of a long sentence. As for the last sentence, it gives: “For the contemporary, novels of a lack of training by Niccolò Ammaniti and Elena Ferrante will be considered.” This can’t be right.

Smaller details are equally puzzling. DeepL does not capitalize Bildungsroman on first use, but does on the second. This is presumably because the first use is a correct translation of the Italian “romanzo di formazione,” the Italian phrase for bildungsroman. But then the Italian uses the word “Bildungsroman” and the translation keeps the capital, unperturbed by the inconsistency.

Both software programs offer “a fairy-tale setting,” but the Italian “una cornice fiabesca” suggests that we are not talking about a setting and even less a fairy tale, but about the framing of the novel as fable. And while Google was superior to DeepL when translating “verranno considerati,” it makes a syntactical hash with the verbs in the first sentence:

in works in which the Bildungsroman hybridizes with the genre of the historical novel (Ippolito Nievo, Alessandro Manzoni, Ugo Foscolo) or characterized by a fairy-tale setting

For some reason an “is” is missing: “is characterized by.” In parenthesis, it’s hard to imagine an original English text using the metaphor “hybridize” in this way.

*

At this point it’s worth stopping to reflect on how this software works. It does not recognize the meaning of the words and choose words that mean the same thing in the other language. It does not “understand” the text. It does not know what is being talked about. Its database contains huge quantities of already existing translations, some good, some not so good. The software identifies clusters of words and syntactical patterns, searches for the same clusters and patterns in its database, finds how they have been translated in the past, and, using its statistical competence, predicts the correct, or a correct, translation.
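The process just described can be caricatured in a few lines of Python. This is only a toy sketch of the idea of predicting from precedent, not a description of how DeepL or Google Translate is actually built: modern tools rely on neural networks rather than a literal lookup table, and the phrase table, its counts, and the function name `predict` are all invented here for illustration.

```python
from collections import Counter

# Hypothetical toy "database" of past translations: each Italian phrase maps
# to the English renderings seen before and how often each one appeared.
# The phrases are taken from the essay; the counts are invented.
PRECEDENTS = {
    "romanzo di formazione": Counter({"bildungsroman": 40, "novel of formation": 10}),
    "medio": Counter({"average": 90, "medium": 8, "middlebrow": 2}),
}


def predict(phrase):
    """Return the most frequent past rendering and its share of all precedents."""
    seen = PRECEDENTS.get(phrase)
    if not seen:
        # Nothing comparable has been translated before.
        return None, 0.0
    rendering, count = seen.most_common(1)[0]
    return rendering, count / sum(seen.values())


if __name__ == "__main__":
    for phrase in ("romanzo di formazione", "medio", "formazione mancata"):
        print(phrase, "->", predict(phrase))
```

Under these invented counts, the sketch picks whatever rendering dominates the precedents, whether or not it suits the sentence at hand, and it has nothing to offer for a phrase it has never seen.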

So when DeepL gives “the following will be considered novels with an Oedipal theme,” this is because any number of standard Italian texts will have clusters of the variety “verranno considerati romanzi storici le opere di Walter Scott,” which will have been correctly translated as “Walter Scott’s works will be considered historical novels.” However, because English usually proceeds with a subject before the verb, whereas in Italian the subject can be implied but not stated, the software has introduced “the following,” which is then understood to refer to the novelists named in parentheses. In short, it has misidentified its cluster. My colleague’s translation here read: “the course will consider the ‘Oedipal’ novels of Alberto Moravia and Elsa Morante.” The problem for the software is that it cannot know that the sentence makes no sense. It cannot ask itself what has gone wrong.

Speaking of Il giardino dei Finzi-Contini, it is intriguing that DeepL does not translate the title, perhaps because there are English translations where the text is discussed with its Italian title. But Google does. They have different databases, no doubt. Both, however, describe the novel as “average.” In fact, the Italian spoke of “un romanzo ‘medio,’” which is jargon in a certain area of literary criticism for a novel meant for a large, middlebrow public. The Italian deployed inverted commas to point up this special usage. The machine’s algorithms were apparently unable to account for these, and went with the translation of “medio” used in an overwhelming majority of precedents. It has no inkling that describing The Garden of the Finzi-Continis as average can only raise a smile.

The final sentence of the Italian reads:

Per la contemporaneità, verranno considerati romanzi di formazione mancata di Niccolò Ammaniti ed Elena Ferrante.

Per la contemporaneità here can only have the sense “as regards contemporary literature.” A “romanzo di formazione,” as we have seen, is a bildungsroman, a novel that considers the “formation” of the protagonist. But in this case the “formazione” is “mancata,” or manqué, as the French would say—“failed.” The idea then is a bildungsroman whose protagonist does not achieve the desired maturity. “Novels of formation will be considered missed,” offers DeepL, missing the point. “Novels of a lack of training,” gives Google, lacking the training to do any better. It’s worth noting that these translation tools never signal that they are not sure about a meaning. The user, that is, is not alerted to the fact that one prediction has a sounder statistical basis than another.

Of course, the chief editor of Orville Press did not say that machine translation was good enough now. He said in the future. None of the errors I’ve described mean machine translation is useless. In fields where such software is used, a human editor (known as a post-editor) typically goes through the automated translation to pick up mistakes and incongruities and sort them out.

So what of the future? If you ask the celebrated ChatGPT what the difference is between its knowledge and human knowledge, it replies:

As an AI language model, my knowledge is derived from the data and algorithms that were used to train me, whereas human knowledge comes from experience, learning, and understanding gained through observation, experimentation, and analysis.

One significant difference between human knowledge and my knowledge is that humans have the ability to directly experience the world through their senses, whereas I only have access to information that has been inputted into my system. While I can process vast amounts of data quickly and efficiently, I do not have the ability to interpret that data in the same way that humans can.

This acknowledgment would seem to place a limit on what can be done with machine translation as we know it. The software’s database can be expanded, its ability to select appropriate clusters of words and then predict a correct translation can be fine-tuned, but there remains the problem that the software does not experience texts or the world the texts refer to. Phrases which have been frequently used and translated before will be translated in the same way; everything that is new or unusual will present a problem. The software can hardly be expected to look for an innovative way of expressing such novelty in its target language. It does not get excited by novelty. It has no investment in renewing the language. It cannot savor the text it produces.

*

This brings us to a larger problem, beyond issues of accuracy. The style that Italian academics use in their syllabi is rather different from the style used by British or American academics. And the difference between academic copy and a tourist brochure, art catalogue, or political speech is greater still. The software cannot recognize this context; it has not been trained to reframe a text in a particular style, genre, or format. Nor is it in the brief of the post-editor to start reorganizing all the syntax as professional translators often do; if it were, the process might well take even longer than old-fashioned manual translation. Thus the widespread use of machine translation will very likely fill the world with texts that may be grammatically correct and even semantically accurate, yet nevertheless alien to the spirit of the language they were written in. I have often used a passage from a rather lush Italian tourist brochure as a model in my translation courses:

La limpida poesia del paesaggio circostante su cui scendono tramonti dolcissimi, la terra ubertosa con lunghi filari di pioppi e pigre correnti di fiumi e canali, la gente vigorosa e laboriosa della vasta zona agricola ed industriale (semplice e tenace nelle proprie tradizioni) fanno come da corona al gruppo storico della città che la saggezza esemplare delle amministrazioni locali ha opportunamente rispettato.

Here is DeepL’s version:

The limpid poetry of the surrounding landscape over which sweetest sunsets descend, the uber-rich land with long rows of poplars and lazy streams of rivers and canals, the vigorous and hard-working people of the vast agricultural and industrial area (simple and tenacious in their traditions) act as a crown to the historical group of the city that the exemplary wisdom of the local government has duly respected.

And here is my translation:

The lyrical beauty of the surrounding landscape with its soft-hued sunsets, the rich, arable fields bordered by poplar rows and gently flowing waterways, add a crowning aura to a city whose ancient center has very sensibly been preserved intact. Meantime, within and without the town, the energetic, hard-working people of the busy North Italian plain carry on their simple, time-honored traditions.

One translates a tourist brochure with its seductive function in mind, adapting the content to the target culture. If DeepL is more faithful in recording “the exemplary wisdom of the local government,” a professional translator has the advantage of knowing that this flattery (the brochure was indeed produced for the local government) will hardly be effective in persuading British and American tourists to visit the charming city of Mantua.


© 1963-2024 NYREV, Inc. All rights reserved.