The past and present of translation memory technology

Camille Avila 06 Feb 2019 10 min read

The idea of a translation memory was considered as early as the 1970s and developed further in the 1980s. However, it wasn’t until the 1990s that the breakthrough in translation memories came about for SDL, with Translator’s Workbench for Windows. This was the first truly widely used TM engine, first in 16-bit from 1995, then in 32-bit from 1998 (earlier generations were DOS-based and reached much smaller communities, although the DOS versions had some success in the early 90s).

Why did we have this breakthrough? Machine translation was evolving at the time, but its quality was considered too poor. Windows PCs were also becoming mainstream in organizations and private homes, so freelance translators and translators in organizations alike started adopting more technology to help them cope with the rise of digital content. Additionally, having solutions dedicated to audiences with specific needs, such as the Freelance Edition, was seen as a big positive.

You could say that translation memories are both the heart and the brain of the CAT tool. However, this technology was initially received with some scepticism. Fast forward to the present day and it’s hard to imagine life without translation memory. Since the 1990s, Trados has maintained its translation memory development, always looking to improve it for our customers.


The evolution of the translation memory

When SDL acquired Trados in 2005, the translation memory was redesigned from the ground up for Trados Studio and Trados GroupShare. One of our key goals was to plug the gaps customers had reported with the Workbench TM engine over the years. Closing these gaps meant adding concordance search in the target language, introducing the concepts of context and structure matches, and building a fully XML standards-based engine.


Extending translation memory capabilities

Our translation memories are extremely versatile and have evolved over the years to offer more productivity-focused features. Let’s use AutoSuggest Dictionaries as an example.

These are created from your translation memory content and provide you with phrases or fragments via AutoSuggest during the translation process itself. We then have Concordance Search, which searches a translation memory for words or chunks of text that do not appear as matches from a termbase or other sources.
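To make the idea of a concordance lookup concrete, here is a minimal sketch. It assumes a TM is simply a list of (source, target) pairs; the `concordance_search` function and the sample data are hypothetical and are not the Trados API. A real engine would use an index and fuzzy matching rather than a plain substring scan.

```python
from typing import List, Tuple

TranslationUnit = Tuple[str, str]  # (source segment, target segment)


def concordance_search(tm: List[TranslationUnit], query: str,
                       in_target: bool = False) -> List[TranslationUnit]:
    """Return every TU whose source (or target) text contains the query."""
    needle = query.casefold()
    side = 1 if in_target else 0  # 0 = source side, 1 = target side
    return [tu for tu in tm if needle in tu[side].casefold()]


# Illustrative data only.
tm = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("The power cable is included.", "Le câble d'alimentation est fourni."),
    ("Close the lid.", "Fermez le couvercle."),
]

hits = concordance_search(tm, "power")
target_hits = concordance_search(tm, "couvercle", in_target=True)
```

The `in_target` flag mirrors the idea of searching the target language side of the TM as well as the source side.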

AutoSuggest Dictionaries and Concordance Search should be fairly well known to everyday CAT tool users, but as SDL TMs have evolved, some more intricate features have been added that are also very useful and that you might not be aware of.

As well as the usual sentence-based segmentation, our translation memories support paragraph-based segmentation. This can be useful when translating from or into Asian languages, where the sequence of ideas can differ from Western languages, so it is often better to translate whole paragraphs rather than individual segments. Interestingly, paragraph-based segmentation could have a bit of a comeback with neural machine translation (NMT), as it could ensure that translators see the entire context of a paragraph rather than translating segment by segment.

We can also provide context in a TM through the use of Document Structure, which is unique to Trados. This means we don’t just distinguish Context Matches; we can also use the structural context of a segment in the document (index marker, heading, list item, etc.). It is often necessary to translate segments differently depending on their structural context. For instance, an index entry will be written in lower case in English, whereas the same segment would need upper case in a heading.
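As a rough illustration of why structural context matters, the sketch below stores translations keyed on both the source text and a structure label. The context labels, the `lookup` function and the data are assumptions made for this example; they do not describe how Trados stores Document Structure internally.

```python
from typing import Dict, Optional, Tuple

# Key: (source text, structural context); value: target text.
# The context labels ("index", "heading") are illustrative assumptions.
StructuredTM = Dict[Tuple[str, str], str]


def lookup(tm: StructuredTM, source: str, context: str) -> Optional[str]:
    """Prefer a translation stored for the same structural context;
    otherwise fall back to any translation of the same source text."""
    exact = tm.get((source, context))
    if exact is not None:
        return exact
    for (stored_source, _stored_context), target in tm.items():
        if stored_source == source:
            return target
    return None


# The same German source needs different English casing by context.
tm: StructuredTM = {
    ("Installationsanleitung", "index"): "installation guide",
    ("Installationsanleitung", "heading"): "Installation Guide",
}

as_heading = lookup(tm, "Installationsanleitung", "heading")
as_index = lookup(tm, "Installationsanleitung", "index")
as_list_item = lookup(tm, "Installationsanleitung", "list item")
```

With a plain source-only key, one of the two translations would overwrite the other; keying on the structure as well lets both coexist.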


Flexibility

The flexibility of our translation memories can really be seen in our industry-unique App Store. Trados Studio itself offers various ways to manage and maintain your TMs, but you can benefit from more advanced ways of working with various apps. For example, you can extract source text, target text, or both, in different file formats, with apps such as the ones listed below:

  1. SDLXliff2Tmx
  2. TmConvert

With the increasing focus on data and data protection, we even offer the ability to anonymize data in your TM with the Trados Data Protection Suite app, available to download from the App Store.


Scalability

At Trados we have always been passionate about what we call ‘scale up and down’. For us, it’s key to have a translation memory that scales up to hundreds of simultaneous users and, equally, scales down to the individual user working locally on a PC that might not even be connected to the internet, with every scenario in between covered too.

In all cases, the experience and performance must be as good as possible. Achieving this requires a design with different storage mechanisms and ways of working in the software. We refer to these as a ‘file-based’ way of working in a local desktop environment and a ‘server-based’ way of working where several users share the same resource at the same time.

Our file-based TMs are ideal and very efficient for individual users or very small teams of up to three people. Beyond that, a server-based product is available for optimal efficiency.

Our server-based TMs can serve hundreds of users (Trados Studio and Trados GroupShare) and ensure more consistent translations by providing controlled, time-limited access to centralized translation memories. Sharing assets in real time during translation increases the rate of content reuse in a way that is not possible in a desktop-only environment.

By offering both file-based translation memories with extended productivity functionality and server-based sharing, our approach to TM collaboration supports the freelance translator as well as LSPs and corporations that deal with large volumes of translation projects on ever-shorter turnaround times.


The emergence of upLIFT translation memory technology

After many years of continuous evolution of the TM, the launch of Trados Studio 2017 marked a real milestone with the introduction of upLIFT technology, turning the ‘workhorse’ of a CAT tool into something even more intelligent.

Earlier in this blog, we talked about AutoSuggest Dictionaries and Concordance Search as great productivity extensions of the TM. However, one downside was the manual interaction needed to set them up and work with them. This all changed with upLIFT technology, or ‘Fragment Recall’.

The underlying technology of Fragment Recall is a process called fine-grained alignment. Since a TM contains pairs of aligned segments – that is, translation memory units (TUs) – operations at the segment level are straightforward, such as fuzzy matching a segment and retrieving the stored translation proposal. Operations below segment level are more challenging, such as matching just part of a TU segment (e.g. a phrase or term within a sentence) and retrieving the corresponding part of the translation. This all changed in Trados Studio 2017 as Fragment Recall made it possible to see these Whole TU fragments automatically without the user having to do anything.
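The difference between segment-level and sub-segment operations can be sketched in a few lines. This example uses Python's standard `difflib` as a stand-in similarity metric; it is not the scoring Trados Studio uses, and the function names, the 0.7 threshold and the data are assumptions. Note that the fragment helper only finds the shared part on the source side. Retrieving the corresponding part of the translation is exactly what requires fine-grained alignment, and the sketch does not attempt it.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

TranslationUnit = Tuple[str, str]  # (source segment, target segment)


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] (difflib ratio, a stand-in metric)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_fuzzy_match(tm: List[TranslationUnit], segment: str,
                     threshold: float = 0.7
                     ) -> Optional[Tuple[float, TranslationUnit]]:
    """Segment level: return the best-scoring TU at or above the threshold."""
    best: Optional[Tuple[float, TranslationUnit]] = None
    for tu in tm:
        score = similarity(segment, tu[0])
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tu)
    return best


def longest_shared_fragment(segment: str, tu_source: str) -> str:
    """Below segment level: longest run of words shared with a TU source."""
    a, b = segment.lower().split(), tu_source.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])


# Illustrative data only.
tm = [
    ("Open the file menu.", "Ouvrez le menu Fichier."),
    ("Save the document.", "Enregistrez le document."),
]

match = best_fuzzy_match(tm, "Open the edit menu.")
fragment = longest_shared_fragment("Click the file menu now", "Open the file menu")
```

Here the fuzzy match works on whole segments, while the fragment lookup picks out a phrase inside a segment that has no usable whole-segment match.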

Since that launch in 2016, Fragment Recall has been refined and improved. Icon tips now show where a fragment match originated, and you can reject fuzzy matches that have been repaired automatically by Trados Studio as part of the Fuzzy Match Repair functionality.

And the improvements have not stopped there. A new feature called LookAhead introduced with Service Release 1 of Trados Studio 2017 provides faster access to translation memory (TM) search results by retrieving TM results in the background. When you move to a segment that you are translating, Trados Studio performs a look-up on the following two segments while you’re working on the active segment. The benefit? Almost instantaneous results every time you change segments, as the search results (if any) will already have been ‘retrieved’ for you.
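The look-ahead idea can be sketched as a small cache that fetches results for the active segment and the following two. This is a simplified, synchronous illustration under our own assumptions: the `LookAheadCache` class and the `fake_search` function are hypothetical, and a real tool would run the prefetch in the background rather than inline.

```python
from typing import Callable, Dict, List, Tuple


class LookAheadCache:
    """Caches TM results for the active segment and the next few segments."""

    def __init__(self, segments: List[str],
                 search: Callable[[str], List[str]], window: int = 2) -> None:
        self.segments = segments
        self.search = search
        self.window = window  # how many following segments to prefetch
        self.cache: Dict[int, List[str]] = {}
        self.searches_run = 0

    def _fetch(self, index: int) -> None:
        # Only search for valid segments that are not cached yet.
        if 0 <= index < len(self.segments) and index not in self.cache:
            self.cache[index] = self.search(self.segments[index])
            self.searches_run += 1

    def activate(self, index: int) -> Tuple[List[str], bool]:
        """Return (results, was_already_cached) for a valid segment index."""
        was_cached = index in self.cache
        self._fetch(index)
        for ahead in range(index + 1, index + 1 + self.window):
            self._fetch(ahead)  # a real tool would do this in the background
        return self.cache[index], was_cached


def fake_search(segment: str) -> List[str]:
    """Stand-in for a TM lookup."""
    return [f"match for {segment}"]


segments = ["One.", "Two.", "Three.", "Four."]
cache = LookAheadCache(segments, fake_search)
results0, hit0 = cache.activate(0)  # searches segments 0, 1 and 2
results1, hit1 = cache.activate(1)  # segment 1 is already cached
```

By the time the translator moves to the second segment, its results are already in the cache, and only one new search (for the fourth segment) is needed.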


Making it easier to add new content

Of course, managing and working with your translation memories is important, but getting content into them matters just as much.

Whether you are new to CAT tools or not, translation alignment is an efficient way to create translation assets straight away by making use of existing content to create translation memories. In Service Release 1 for Trados Studio 2019, we have made the process of aligning content much more versatile and easier to use by adding new alignment selection and connection capabilities, as well as advanced split and search functionalities.


Improving translation memory functionality even further

We have improved the accuracy of both context and fuzzy matches to provide more matches than ever before. Not only have we improved the way context matches are calculated to achieve higher accuracy, but we have also enhanced stemming for Western languages, which provides better fuzzy matching.

In addition, we made improvements in recognizing half/full-width characters for the Japanese language, typical in DTP situations, which we believe is a real step forward for this market.
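To show what the half-width and full-width problem looks like, here is a minimal sketch using Unicode NFKC normalization from Python's standard library. This is one common way to fold width variants together; we are not claiming it is how Trados Studio does it, and the helper names are our own. NFKC also folds other compatibility characters, so it may be more aggressive than a production matcher would want.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold half-width katakana and full-width digits/Latin via NFKC."""
    return unicodedata.normalize("NFKC", text)


def same_ignoring_width(a: str, b: str) -> bool:
    """Compare two strings after folding width variants together."""
    return normalize_width(a) == normalize_width(b)


half = "ﾃｽﾄ123"      # half-width katakana with ASCII digits
full = "テスト１２３"  # full-width katakana with full-width digits

strings_differ = half != full
match_after_folding = same_ignoring_width(half, full)
```

The two strings look alike to a reader but differ character by character, so a naive comparison would treat them as different segments.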

This latest Service Release shows that refining the TM has not stopped and that it is still possible to enhance it even further. It’s great to see how big innovations (such as AutoSuggest, upLIFT Fragment Recall and Fuzzy Match Repair) have been joined by smaller developments (such as improved stemming and fuzzy matching in Trados Studio 2019), making it possible to dramatically increase leverage from TMs.

As you can see, the translation memory has come a long way over the years. Translation memories continue to be developed with new features, making it easier than ever before to use and manage your TM.

Camille Avila
Author

Senior Product Marketing Manager
Camille is a Senior Product Marketing Manager at RWS, with eight years of experience in the localization industry. Currently, she oversees the Trados portfolio, with a dedicated focus on translation technology for the corporate markets. Her role is to assist corporations in effectively communicating with their customers by ensuring their content is understood, in any language.