An upLIFTing Tale, Part 2: Fragment Matching For Your Translation Memory
Figure 1 – fragment match
Figure 1 shows a screenshot of an English-French translation, in a segment where there’s no fuzzy match result, but one fragment match. How did it get there? The symbol shown beside the fragment match means a ‘whole TU’ match, that is, the fragment was found to match a whole TU in the TM.
Figure 2 shows some of the TM content, including that TU; maybe it was the caption on a picture of Gerard Schram in an earlier document. When our document has a fragment matching the whole source text of a TU, Studio can propose a translation of that fragment just by retrieving the translation of that TU.
Figure 2 – TU providing a ‘whole TU’ fragment match
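To make the ‘whole TU’ idea concrete, here is a minimal Python sketch. It is not how Studio implements fragment matching; it assumes a toy tokenizer (lowercase, split on whitespace, trim punctuation) and an in-memory list of TUs with invented content. It slides each TU’s source text across the segment and reports every span where that whole source text appears, together with the TU whose translation can be proposed.

```python
from dataclasses import dataclass

# Punctuation trimmed from token edges by the toy tokenizer below (an assumption).
PUNCTUATION = ".,;:!?\"'()“”‘’"


@dataclass(frozen=True)
class TranslationUnit:
    source: str
    target: str


def tokenize(text: str) -> list[str]:
    """Toy tokenizer: lowercase, split on whitespace, trim surrounding punctuation."""
    tokens = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION).lower()
        if word:
            tokens.append(word)
    return tokens


def whole_tu_fragment_matches(segment: str, tm: list[TranslationUnit]) -> list[tuple[int, int, TranslationUnit]]:
    """Return (start, end, tu) for every segment span equal to a TU's whole source text."""
    seg_tokens = tokenize(segment)
    matches = []
    for tu in tm:
        tu_tokens = tokenize(tu.source)
        n = len(tu_tokens)
        if n == 0 or n > len(seg_tokens):
            continue
        # Slide the TU source across the segment, one token at a time.
        for start in range(len(seg_tokens) - n + 1):
            if seg_tokens[start:start + n] == tu_tokens:
                matches.append((start, start + n, tu))
    return matches


# Invented TM content, for illustration only.
tm = [
    TranslationUnit("Annual report", "Rapport annuel"),
    TranslationUnit("Please read the minutes.", "Veuillez lire le procès-verbal."),
]

for start, end, tu in whole_tu_fragment_matches("Please read the annual report before the meeting.", tm):
    print(start, end, tu.target)  # prints: 3 5 Rapport annuel
```

The second TU shares its first three words with the segment but does not match as a whole, so it is not proposed; that is the difference between a ‘whole TU’ fragment match and a partial overlap.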
Figure 3 – ‘whole TU’ fragment search settings
Studio 2017 gives you these ‘whole TU’ matches for all TMs, including those created in earlier Studio versions. Because TMs can produce a lot of fragment matches, Studio has settings to help show only those most likely to be useful. Depending on what your text and TM are like, you may want to change those settings, using the ‘Search’ options in Project Settings outlined in Figure 3.
Figure 4 (see below) shows the fragment matches I get if I lower Minimum significant words to ‘1’; I’m now seeing a two-word match that has the very common word ‘our’. If I lower Minimum words to ‘1’, I get some one-word matches, shown in Figure 5 (see below). Usually I’d keep the minimum higher, so as not to clutter the results with ‘obvious’ translations, but for some texts, one-word matches may be more important.
Figure 4 – two-word fragment match, one ‘significant’ word
Figure 5 – one-word fragment matches
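The effect of the two minimums can be sketched as a simple filter over candidate fragments. This Python example rests on assumptions: ‘significant’ is taken to mean ‘not on a stop-word list’, the stop-word list is hypothetical, and the candidate fragments and threshold values are invented; Studio’s own definition of a significant word may differ.

```python
# Hypothetical stop-word list; "significant" is assumed here to mean
# "not a stop word", which may not match Studio's own definition.
STOP_WORDS = {"a", "an", "the", "our", "of", "to", "and", "in", "for", "is"}


def passes_minimums(tokens: list[str], min_words: int, min_significant_words: int) -> bool:
    """True if the fragment has enough words and enough significant words."""
    significant = [t for t in tokens if t.lower() not in STOP_WORDS]
    return len(tokens) >= min_words and len(significant) >= min_significant_words


def filter_fragments(candidates: list[list[str]], min_words: int,
                     min_significant_words: int) -> list[list[str]]:
    """Keep only candidate fragments that satisfy both minimums."""
    return [c for c in candidates if passes_minimums(c, min_words, min_significant_words)]


# Invented candidate fragments.
candidates = [["our", "team"], ["team"], ["digital", "marketing", "ecosystem"]]

# Stricter thresholds: only the three-word fragment survives.
print(filter_fragments(candidates, min_words=2, min_significant_words=2))
# One significant word is enough: the two-word match containing "our" appears.
print(filter_fragments(candidates, min_words=2, min_significant_words=1))
# One word is enough: the one-word match appears too.
print(filter_fragments(candidates, min_words=1, min_significant_words=1))
```

Lowering each threshold in turn mirrors the progression from Figure 4 to Figure 5: first a fragment dominated by a common word gets through, then single words do.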
‘Whole TU’ matches can be a big help, but we can get much more from our TMs with ‘part TU’ matches. For that, we first need to ensure the TM is set up for fragment alignment. Paul Filkin has written a great article to complement the documentation on that, explaining the steps and how to process a batch of TMs, available here.
Once that’s done, we can change the Minimum words and Minimum significant words settings for these matches as well, not forgetting to ensure the ‘TU fragment’ matching option is enabled, as shown in Figure 6 (see below). For this English-French translation, we then get fragment matches like the one shown in Figure 7 (see below).
Figure 6 – ‘TU fragment’ search settings
Figure 7 – ‘part TU’ fragment matches
For this segment, Studio has automatically shown us the fragment match for ‘digital marketing ecosystem’ and its translation, which is much better than (say) having to use concordance search to find it. We can also see the TU context of the match, to check that the fragment translation really is suitable. In addition, because the matches come straight from the TM, if the TM content changes (say, that TU is amended, removing the ‘de’ to read “écosystème marketing numérique” instead), the fragment translation given afterwards will change, too. So, how are those fragment translations being retrieved?
The key to that is the fragment alignment information added to the TM. This is easier to visualise with a simpler example, like that shown in Figure 8. Here, we can see how the source and target words, or spans of words, have been aligned as a result of the statistical analysis process. If I have another document to translate with a sentence containing the words “a screening procedure”, those words will match the same words in this TU, and the alignment information in the TU enables upLIFT to propose a fragment translation suggestion, “une procédure de détection”.
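One way to picture how alignment links turn a matched source span into a target fragment is the consistency check used for phrase extraction in phrase-based statistical MT. The Python sketch below applies that check; whether upLIFT uses the same rule is an assumption, and the TU and its alignment links are invented to resemble the ‘screening procedure’ example rather than copied from Figure 8.

```python
from typing import Optional

# A word alignment is a set of (source_index, target_index) links.
Alignment = set[tuple[int, int]]


def find_span(tokens: list[str], fragment: list[str]) -> Optional[tuple[int, int]]:
    """Locate the first occurrence of fragment in tokens as a half-open span."""
    n = len(fragment)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == fragment:
            return i, i + n
    return None


def project_fragment(target: list[str], alignment: Alignment,
                     src_start: int, src_end: int) -> Optional[str]:
    """Project a source span onto the target through the alignment links."""
    linked = [t for s, t in alignment if src_start <= s < src_end]
    if not linked:
        return None  # nothing aligned, so no translation can be proposed
    tgt_start, tgt_end = min(linked), max(linked) + 1
    # Consistency check: no target word inside the span may be linked to a
    # source word outside the fragment. Unaligned target words ("de") are allowed.
    for s, t in alignment:
        if tgt_start <= t < tgt_end and not (src_start <= s < src_end):
            return None
    return " ".join(target[tgt_start:tgt_end])


# Invented TU and alignment links, for illustration only.
source = "adopt a screening procedure".split()
target = "adopter une procédure de détection".split()
alignment = {(0, 0), (1, 1), (2, 4), (3, 2)}  # screening->détection, procedure->procédure

span = find_span(source, "a screening procedure".split())
if span is not None:
    print(project_fragment(target, alignment, *span))  # prints: une procédure de détection
```

Note that a span such as ‘a screening’ gets no suggestion here: its links reach ‘détection’, but the target span between ‘une’ and ‘détection’ also contains ‘procédure’, which belongs to a source word outside the fragment.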
The same information can be used in upLIFT Match Repair. If my document contains the sentence “It seems therefore desirable to adopt a monitoring procedure”, it will get a fuzzy match with this TU – and we can see that the link between ‘screening’ and ‘détection’ tells Studio which French word corresponds to the non-matching English word, so that it can be replaced with a different French word to get a ‘repaired’ fuzzy match. There’s a lot more to upLIFT Match Repair than that, though, and we’ll see more about it in part 3 of this “upLIFTing tale”.
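Here is a very stripped-down Python sketch of the repair idea, using an invented TU in which ‘screening’ is linked to ‘détection’. It only handles a single substituted word, and the `lexicon` dictionary standing in for wherever the replacement French word comes from is a hypothetical placeholder; real upLIFT Match Repair does considerably more.

```python
from typing import Optional


def repair_match(tu_source: list[str], tu_target: list[str], alignment: set[tuple[int, int]],
                 new_source: list[str], lexicon: dict[str, str]) -> Optional[str]:
    """Repair a fuzzy match that differs from the TU source by one substituted word."""
    if len(tu_source) != len(new_source):
        return None  # this sketch only handles same-length, one-word substitutions
    diffs = [i for i, (old, new) in enumerate(zip(tu_source, new_source)) if old != new]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    linked = [t for s, t in alignment if s == i]
    if len(linked) != 1:
        return None  # the changed word must be linked to exactly one target word
    j = linked[0]
    if any(t == j and s != i for s, t in alignment):
        return None  # that target word also translates other source words
    replacement = lexicon.get(new_source[i])
    if replacement is None:
        return None  # no translation available for the new word
    repaired = list(tu_target)
    repaired[j] = replacement
    return " ".join(repaired)


# Invented TU, alignment and lexicon, for illustration only.
tu_source = "adopt a screening procedure".split()
tu_target = "adopter une procédure de détection".split()
alignment = {(0, 0), (1, 1), (2, 4), (3, 2)}
lexicon = {"monitoring": "surveillance"}  # hypothetical source of the new word's translation

print(repair_match(tu_source, tu_target, alignment,
                   "adopt a monitoring procedure".split(), lexicon))
# prints: adopter une procédure de surveillance
```

The alignment is what makes the edit safe: without the link from ‘screening’ to ‘détection’, there would be no way to tell which French word to swap.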