Sensible: L2 Translation Assistance by Emulating the Manual Post-Editing Process

This paper describes the Post-Editor Z system submitted to the L2 writing assistant task in SemEval-2014. The aim of the task is to build a translation assistance system to translate untranslated sentence fragments. This is not unlike the task of post-editing, where human translators improve machine-generated translations. Post-Editor Z emulates the manual process of post-editing by (i) crawling and extracting parallel sentences that contain the untranslated fragments from a Web-based translation memory, (ii) extracting the possible translations of the fragments indexed by the translation memory and (iii) applying simple cosine-based sentence similarity to rank possible translations for the untranslated fragment.


Introduction
In this paper, we present a collaborative submission between Saarland University and Nanyang Technological University to the L2 Translation Assistant task in SemEval-2014. Our team name is Sensible and the participating system is Post-Editor Z (PEZ). The L2 Translation Assistant task concerns the translation of an untranslated fragment from a partially translated sentence. For instance, given a sentence, "Ich konnte Bärbel noch on the border in einen letzten S-Bahn-Zug nach Westberlin setzen.", the aim is to provide an appropriate translation for the underlined phrase (on the border), i.e. an der Grenze.
The aim of the task is not unlike the task of post-editing, where human translators correct errors in machine-generated translations. The main difference is that in the context of post-editing the source text is provided. A translation workflow that incorporates post-editing begins with a source sentence, e.g. "I could still sit on the border in the very last tram to West Berlin.", and the human translator is provided with a machine-generated translation containing untranslated fragments, such as the previous example. Sometimes "fixing" the translation simply requires substituting the appropriate translation for the untranslated fragment.
This work is licensed under a Creative Commons Attribution 4.0 International Licence. Page numbers and proceedings footer are added by the organisers. Licence details: http://creativecommons.org/licenses/by/4.0/

Related Tasks and Previous Approaches
The L2 writing assistant task lies between machine translation and crosslingual word sense disambiguation (CLWSD) or crosslingual lexical substitution (CLS) (Lefever and Hoste, 2013; Mihalcea et al., 2010). While CLWSD systems resolve the correct semantics of the translation by providing the correct lemma in the target language, CLS also attempts to provide the correct form of the translation with the right morphology. Machine translation tasks focus on producing translations of whole sentences/documents, while crosslingual word sense disambiguation targets a single lexical item.
Previously, CLWSD systems have tried distributional semantics and string matching methods (Tan and Bond, 2013), unsupervised clustering of word alignment vectors (Apidianaki, 2013) and supervised classification-based approaches trained on local context features for a window of three words containing the focus word (van Gompel, 2010; van Gompel and van den Bosch, 2013; Rudnick et al., 2013). Interestingly, Carpuat (2013) approached the CLWSD task with a statistical MT system.
Rather than concatenating the outputs of CLWSD/CLS systems and then dealing with the resulting reordering issue, and in response to the task organizers' call to avoid implementing a full machine translation system to tackle the task, we designed PEZ as an Automatic Post-Editor (APE) that attempts to resolve untranslated fragments.

Automatic Post-Editors
APEs target various types of MT errors from determiner selection (Knight and Chander, 1994) to grammatical agreement (Mareček et al., 2011). Untranslated fragments from machine translations are the result of out-of-vocabulary (OOV) words.
Previous approaches to the handling of untranslated fragments include using a pivot language to translate the OOV word(s) into a third language and then back into the source language, thereby extracting paraphrases of the OOV word(s) (Callison-Burch and Osborne, 2006), combining sub-lexical/constituent translations of the OOV word(s) to generate the translation (Huang et al., 2011) or finding paraphrases of the OOV words that have available translations (Marton et al., 2009; Razmara et al., 2013). However, the simplest approach to handling untranslated fragments is to increase the size of the parallel data. The web is vast, and a human translator would consult it when encountering a word that he/she cannot translate easily. The most human-like approach to post-editing a foreign untranslated fragment is to search the web or a translation memory and choose the most appropriate translation of the fragment from the search results, given the context of the machine-translated sentence.
The PEZ system was designed to emulate the manual post-editing process by (i) first crawling a web-based translation memory, (ii) then extracting parallel sentences that contain the untranslated fragments and the corresponding translations of the fragments indexed by the translation memory and (iii) finally ranking them based on cosine similarity of the context words.

System Description
The PEZ system consists of three components, viz. (i) a Web Translation Memory (WebTM) crawler, (ii) the XLING reranker and (iii) a longest ngram/string match module.

WebTM Crawler
Given the query fragment and the context sentence, "Die Frau kehrte alone nach Lima zurück", the crawler queries www.bab.la and returns sentences containing the untranslated fragment with various possible translations, e.g.:
• isoliert : Darum sollten wir den Kaffee nicht isoliert betrachten.
The retrieval mechanism is based on the fact that the target translations of the queried word/phrase are bolded on a web-based TM and thus can be easily extracted by manipulating the text between <bold>...</bold> tags. Although the indexed translations were easy to extract, there were only a few instances where the translations were embedded between the bold tags on the web-based TM.
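The extraction step can be illustrated with a minimal sketch. It assumes the fetched page snippets literally contain <bold>...</bold> markup as described above; the actual markup served by the web-based TM, the function names and the regular expression are illustrative assumptions, not the PEZ implementation.

```python
import re
from collections import Counter

# Assumption: translations are wrapped in literal <bold>...</bold> tags,
# as described in the text; real pages may use different markup.
BOLD_RE = re.compile(r"<bold>(.*?)</bold>", re.IGNORECASE | re.DOTALL)


def extract_indexed_translations(snippet):
    """Return every non-empty string found between bold tags in one snippet."""
    return [m.strip() for m in BOLD_RE.findall(snippet) if m.strip()]


def most_frequent_translation(snippets):
    """Return the most frequent bolded translation, or None if none is found."""
    counts = Counter()
    for snippet in snippets:
        counts.update(extract_indexed_translations(snippet))
    return counts.most_common(1)[0][0] if counts else None
```

For example, extract_indexed_translations("Darum sollten wir den Kaffee nicht <bold>isoliert</bold> betrachten.") yields the single candidate isoliert; a snippet without bold tags yields an empty list, which corresponds to the low-coverage case mentioned above.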

XLING Reranker
XLING is a light-weight cosine-based sentence similarity script used in the previous CLWSD shared task in SemEval-2013 (Tan and Bond, 2013). Given the sentences from the WebTM crawler, the reranker first removes all stopwords from the sentences and then ranks the sentences based on the number of overlapping stems.
In situations where there are no overlapping content words from the sentences, XLING falls back on the most common translation of the untranslated fragment.
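The reranking idea can be sketched as follows. This is not the XLING script itself: the stopword list is a toy list, the "stemmer" is a crude prefix truncation standing in for a real one, and the function names are hypothetical. The sketch compares the context sentence with the target-language side of each TM sentence and falls back on the most common translation when nothing overlaps.

```python
import math
import re
from collections import Counter

# Toy stopword list (assumption); a real system would use a full list.
STOPWORDS = {"die", "der", "das", "den", "und", "nicht", "wir", "nach", "in"}


def stems(sentence):
    """Bag of crude stems: lowercase tokens, stopwords removed, cut to 5 chars."""
    tokens = re.findall(r"\w+", sentence.lower())
    return Counter(t[:5] for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two bags of stems."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(context, candidates):
    """Rank translations; candidates is a list of (translation, tm_sentence)."""
    ctx = stems(context)
    scored = [(cosine(ctx, stems(sent)), trans) for trans, sent in candidates]
    if all(score == 0.0 for score, _ in scored):
        # No overlapping content words: fall back on translation frequency.
        return [t for t, _ in Counter(t for t, _ in candidates).most_common()]
    ranked, seen = [], set()
    for _, trans in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if trans not in seen:
            seen.add(trans)
            ranked.append(trans)
    return ranked
```

With the context sentence "Die Frau kehrte alone nach Lima zurück", a TM sentence that shares the stems of Frau, kehrte and zurück would be ranked above the isoliert example, which shares no content words with the context.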

Longest Ngram/String Matches
Due to the low coverage of the indexed translations on the web TM, it is necessary to extract more candidate translations. Assuming little knowledge of the target language, a human translator would find parallel sentences containing the untranslated fragment and resort to finding repeating phrases that occur among the target language sentences.
By simply spotting the repeating word/string from the target language sentences it is possible to guess that the possible candidates for "history book" are Geschichtsbücher or Geschichtsbüchern. Computationally, this can be achieved by looking for the longest matching ngrams or the longest matching string across the target language sentences fetched by the WebTM crawler.
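A minimal sketch of the longest-string variant is given below. It assumes the candidates are collected by taking the longest common substring of every pair of target language sentences and counting how often each one recurs; the pairwise strategy, the min_len threshold and the function names are assumptions for illustration rather than the PEZ implementation.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def longest_common_substring(a, b):
    """Longest contiguous character sequence shared by strings a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def candidate_strings(target_sentences, min_len=4):
    """Candidates ordered by how many sentence pairs share them, then length."""
    counts = Counter()
    for s1, s2 in combinations(target_sentences, 2):
        common = longest_common_substring(s1, s2).strip()
        if len(common) >= min_len:  # skip trivial matches (assumed threshold)
            counts[common] += 1
    return sorted(counts, key=lambda c: (-counts[c], -len(c)))
```

Given a handful of German sentences that each contain Geschichtsbücher or Geschichtsbüchern, the function surfaces both forms as candidates for "history book", which can then be passed to the reranker.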

System Runs
We submitted three system runs to the L2 writing assistant task in SemEval-2014.
1. WebTM: a baseline configuration which outputs the most frequent indexed translation of the untranslated fragment from the Web TM.
2. XLING: reranks the WebTM outputs based on cosine similarity.
3. PEZ: similar to XLING, but when the WebTM fetches no output, the system looks for the longest common substring and reranks the outputs based on cosine similarity.
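The relationship between the three runs can be summarised in a short sketch. The function names and the similarity callable (a function mapping a candidate translation to a score) are hypothetical, introduced only to show how each run extends the previous one.

```python
from collections import Counter


def webtm_run(indexed_translations):
    """Run 1: most frequent indexed translation, or None if the TM has none."""
    if not indexed_translations:
        return None
    return Counter(indexed_translations).most_common(1)[0][0]


def xling_run(indexed_translations, similarity):
    """Run 2: pick the indexed translation with the highest similarity score."""
    if not indexed_translations:
        return None
    return max(indexed_translations, key=similarity)


def pez_run(indexed_translations, string_matches, similarity):
    """Run 3: like XLING, but use string-match candidates when the index is empty."""
    pool = indexed_translations or string_matches
    if not pool:
        return None
    return max(pool, key=similarity)
```

The only difference between the second and third run in this sketch is the fallback pool, which is why the PEZ run can answer items for which the other two runs produce no output.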

Evaluation
The evaluation of the task is based on three metrics, viz. absolute accuracy (acc), word-based accuracy (wac) and recall (rec). Absolute accuracy measures the number of fragments that match the gold translation of the untranslated fragments. Word-based accuracy assigns a score according to the longest consecutive matching substring between output fragment and reference fragment. Recall accounts for the number of fragments for which output was given (regardless of whether it was correct). Table 1 presents the results for the best evaluation scores of the PEZ system runs for the English to German (en-de), English to Spanish (en-es), French to English (fr-en) and Dutch to English (nl-en) evaluations. Figure 1 presents the word accuracy of the system runs for both best and out-of-five (oof) evaluation.
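The three metrics can be sketched as below. Since the exact word accuracy formula is not reproduced in this text, the sketch makes two assumptions: the longest consecutive match is counted in word tokens, and it is normalised by the length of the longer of the two fragments. All scores are expressed as fractions of the total number of test items, and matching is case-sensitive.

```python
def longest_common_token_run(output, reference):
    """Length (in tokens) of the longest consecutive run shared by both fragments."""
    out, ref = output.split(), reference.split()
    best = 0
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(out) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if out[i - 1] == ref[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def word_accuracy(output, reference):
    """Assumed normalisation: longest match over the longer fragment's length."""
    denom = max(len(output.split()), len(reference.split()))
    return longest_common_token_run(output, reference) / denom if denom else 0.0


def evaluate(outputs, references):
    """outputs[i] is the system answer for item i, or None if no answer was given."""
    total = len(references)
    answered = [(o, r) for o, r in zip(outputs, references) if o is not None]
    return {
        "acc": sum(o == r for o, r in answered) / total,
        "wac": sum(word_accuracy(o, r) for o, r in answered) / total,
        "rec": len(answered) / total,
    }
```

Under these assumptions, an output of "der Grenze" against the reference "an der Grenze" scores zero on absolute accuracy but two thirds on word accuracy, which illustrates how partial matches separate the two metrics.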

Results
The results show that using the longest ngram/string match improves the recall and subsequently the accuracy and word accuracy of the system. However, this is not true when guessing untranslated fragments from L1 English to L2. This is due to the low recall of the system when searching for the untranslated fragment in French and Dutch, where the number of English words/phrases indexed in the TM is much larger than for other languages.

Error Analysis
We manually inspected the English-German outputs from the PEZ system and identified several particularities of the outputs that account for the low performance of the system for this language pair.

Weird Expressions in the TM
When attempting to translate Nevertheless in the context of "Nevertheless hat sich die neue Bundesrepublik Deutschland unter amerikanischem Druck an der militärischen Einmischung auf dem Balkan beteiligt.", where the gold translation is Trotzdem or Nichtsdestotrotz, the PEZ system retrieves the following sentence pair, which contains the rarely used expression nichtsdestoweniger from a literally translated sentence pair in the TM:
• EN: But nevertheless it is a fact that nobody can really recognize their views in the report.
Another example of a weird expression occurs when translating "husband" in the context of "In der Silvesternacht sind mein husband und ich auf die Bahnhofstraße gegangen.". PEZ provided a less common yet valid translation, Gemahl, instead of the gold translation Mann. In this case, it is also a matter of register: in a more formal register one would use Gemahl instead of Mann.

Missing / Additional Words from Matches
When extracting candidate translations from the TM index or longest ngram/string, there are several matches where the PEZ system outputs a partial phrase or a phrase with additional tokens, which causes the disparity between the absolute accuracy and word accuracy. An instance of missing words is as follows:
• Input: Eine genetische Veranlagung plays a decisive role.
An instance of the addition of superfluous words is as follows:
• Input: Geräte wie Handys sind not permitted wenn sie nicht unterrichtlichen Belangen dienen.

Case Sensitivity
For the English-German evaluation, there are several instances where the PEZ system produces the correct translation of the phrase but in lower case, and this resulted in poorer accuracy. This issue is specific to German as the target language and possibly contributes to the lower scores compared to the English-Spanish evaluation.

Conclusion
In this paper, we presented the PEZ automatic post-editor system in the L2 writing assistant task in SemEval-2014. The PEZ post-editing system is a resource-lean approach that provides translations for untranslated fragments based on no prior training data and simple string manipulations from a web-based translation memory.
The PEZ system attempts to emulate the process of a human translator post-editing out-of-vocabulary words from a machine-generated translation. The best configuration of the PEZ system involves a simple string search for the longest common ngram/string from the target language sentences, without word/phrasal alignment and without the need to handle word reordering for multi-token untranslated fragments.