A translation memory (TM) segments the source content into “chunks” that are then compared to the translation archive. Several technologies are used to make active and intelligent comparisons, manipulating the segments to find and supply the best translation matches. The best matches between source and target languages are called ICE (In-Context Exact) matches: the TM has found an exact match and has confirmed that it is used in the same context, which is why an ICE match is considered better than a plain 100% match. Lesser matches include a 100% match without that context confirmation and partial matches of 90%, 80% and so on; the partial matches are referred to as “fuzzy matches.” These are likely matches but should be reviewed by a human translator to ensure accuracy. Many translator contracts give a discount for handling fuzzy matches, since there is a high probability that the suggested translation is correct and the translator only needs to confirm or tweak the suggested output.
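The chunking and percentage scoring described above can be illustrated with a small Python sketch. It is not SDL's matching algorithm: it assumes a naive sentence splitter, uses the standard library's `difflib.SequenceMatcher` similarity ratio as a stand-in for a commercial match score, and the archive contents and the 75% fuzzy threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical translation archive: English source segment -> German target segment.
TM = {
    "Click Save to store your changes.": "Klicken Sie auf Speichern, um Ihre Änderungen zu speichern.",
    "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
}

# Hypothetical cut-off below which no suggestion is offered.
FUZZY_THRESHOLD = 75


def segment(text: str) -> List[str]:
    """Naive segmentation: split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def best_match(source: str) -> Tuple[int, Optional[str]]:
    """Return (score 0-100, suggested target) for the most similar archived segment."""
    best_score, best_target = 0, None
    for archived_source, target in TM.items():
        # Stand-in scoring: character-level similarity, truncated so that
        # only identical strings reach 100.
        score = int(SequenceMatcher(None, source, archived_source).ratio() * 100)
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def classify(score: int) -> str:
    """Label a score as an exact match, a fuzzy match, or no match."""
    if score == 100:
        return "exact"
    if score >= FUZZY_THRESHOLD:
        return "fuzzy"
    return "no match"


doc = "Click Save to store your changes. The scanner is out of paper."
for seg in segment(doc):
    score, suggestion = best_match(seg)
    print(f"{score:3d}% {classify(score):8s} {seg} -> {suggestion}")
```

The second segment differs from an archived one by a single word, so it comes back as a fuzzy match whose suggestion still needs a translator's review. Commercial systems use their own, more elaborate scoring; the character-level ratio is used here only because it ships with Python.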
Translation memories also improve over time. Each new translation is added to the TM, increasing future match rates, translation speed, accuracy and consistency, which saves enterprises time and money. SDL has found that companies with a history of using a TM can achieve an ICE match rate higher than 70% for very technical documentation, reducing translation costs by millions of dollars per year.
SDL has been creating, managing and enhancing industry-leading translation memory technology for over 20 years. Because SDL WorldServer is the number one translation management system in the industry, SDL is continuously challenged by enterprise customers to enhance TM technology so that it drives down their cost of translation while ensuring quality and consistency. The rest of this post highlights the technology used to help TMs correctly identify archived translations and match them to new content.
The importance of context
The whole goal of a TM is to maximize the reuse of previously translated content, reducing the cost of human translation while improving the quality and consistency of the translated deliverables. The challenge for the technology is to ensure that only translations appropriate for the source content are identified. In English, for example, the same word can be used as a noun and as a verb, whereas many other languages use two different words for these purposes. (In fact, the word “set” has 464 definitions in Webster’s Dictionary, and “run” has 396.) The only way to tell which translation is best is through the context in which the word is used in a sentence. Context is established by looking at the content both before and after a particular segment, and a match that also agrees on this context is an ICE match. Good TM technology must include the concept of context in matching content, and this concept should apply to content across documents and document types.
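A minimal sketch of context-aware lookup, assuming (as the paragraph describes) that a segment's context is simply the source segments immediately before and after it. The `TranslationUnit` structure, the `lookup` function and the German sample entries are hypothetical illustrations, not WorldServer's data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TranslationUnit:
    """One archived translation plus the source segments that surrounded it."""
    prev_source: Optional[str]
    source: str
    next_source: Optional[str]
    target: str


def lookup(tm: List[TranslationUnit], prev_seg: Optional[str], seg: str,
           next_seg: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (match type, suggested target); an ICE match outranks a plain 100% match."""
    exact = [tu for tu in tm if tu.source == seg]
    for tu in exact:
        # Same text AND same neighbours: the usage context is confirmed.
        if tu.prev_source == prev_seg and tu.next_source == next_seg:
            return "ICE", tu.target
    if exact:
        # Same text, different neighbours: the meaning may differ, so a human should check.
        return "100%", exact[0].target
    return "none", None


# Hypothetical archive: "Run" is a verb in the first context and a noun in the second.
TM = [
    TranslationUnit("Select the program.", "Run", "The program starts.", "Ausführen"),
    TranslationUnit("Test results:", "Run", "Passed", "Durchlauf"),
]

print(lookup(TM, "Test results:", "Run", "Passed"))  # context matches the second unit
print(lookup(TM, "Open the file.", "Run", None))     # text matches, context does not
```

The second call shows why a context-free 100% match still needs review: without matching neighbours the sketch cannot tell which sense of “Run” applies, so it simply returns the first candidate.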
Translation Memory Technologies
SDL has developed and uses many different technologies to ensure that previous translations can be leveraged, including some specific technologies that maximize this leverage without sacrificing quality. Some of these are:
- TMs are multidirectional – Multinational companies don’t just translate from one language into many, but from many languages into many. The SDL WorldServer translation memory is bidirectional, meaning source and target can be interchanged. A rich TM benefits translations in both directions.
- WorldServer TM customization – Repair rules, matching rules, tokenization, penalties and other behavior can be customized to meet the specific needs of the enterprise.
- Full audit trail – For heavily regulated industries and others that want full accountability, all changes to the TM and its rules are logged with the date and the individual who made each change.
- Locking / Auto-locking – ICE matches can be automatically locked to prevent translators from editing them. Similarly, any match can be manually locked / unlocked given the appropriate permissions.
- Auto Translation and Auto-Repair Algorithms – Some content submitted for translation doesn’t actually require translation. If this content changes, it can degrade the translation memory match (i.e. to less than 100%) even though nothing actually needs to be translated, and the segment then incurs a translation cost. A date is an example: if only a date changes in the content, that change triggers the need for translation. WorldServer has algorithms that identify content of this nature, automatically update it in the target content and repair the match to a higher percentage, even 100% in some cases (this is configurable). Similarly, segments that contain only non-translatable content are automatically translated without incurring a charge.
- Auto Split/Merge – WorldServer’s translation memory also contains logic to determine whether a higher translation memory match could be achieved by splitting a given segment into two or by merging it with an adjacent segment. When this functionality is enabled, it performs the split or merge automatically to maximize your translation memory reuse.
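The date example under Auto Translation and Auto-Repair can be sketched as placeable handling: dates are masked before matching, and the new dates are written into the archived target. This is a hypothetical illustration, not WorldServer's algorithm. It assumes dates appear only in ISO 8601 `YYYY-MM-DD` form, that old and new dates pair up by position, and that a repaired match scores 100% unless configured otherwise; the archive entry, the `repaired_score` parameter and the `NON_TRANSLATABLE` pattern are also assumptions.

```python
import re
from typing import Optional, Tuple

# Assumption: only ISO 8601 dates (YYYY-MM-DD) are treated as placeables.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Assumption: a segment made only of digits, whitespace and number/date punctuation
# needs no translation.
NON_TRANSLATABLE = re.compile(r"[\d\s.,:/-]+")

# Hypothetical archive entry.
TM = {
    "The offer expires on 2023-05-01.": "Das Angebot läuft am 2023-05-01 ab.",
}


def mask(text: str) -> str:
    """Replace every date with a fixed token so that date changes don't affect matching."""
    return DATE.sub("{DATE}", text)


def auto_repair(source: str, repaired_score: int = 100) -> Tuple[int, Optional[str]]:
    """Return (score, target), swapping new dates into an archived translation if needed."""
    if source in TM:
        return 100, TM[source]
    for tm_source, tm_target in TM.items():
        if mask(tm_source) != mask(source):
            continue
        # Pair old and new dates by their position in the source segment.
        mapping = dict(zip(DATE.findall(tm_source), DATE.findall(source)))
        repaired = DATE.sub(lambda m: mapping.get(m.group(0), m.group(0)), tm_target)
        return repaired_score, repaired
    return 0, None


def auto_translate(source: str) -> Optional[str]:
    """Copy segments that contain only non-translatable content straight to the target."""
    return source if NON_TRANSLATABLE.fullmatch(source) else None


print(auto_repair("The offer expires on 2024-12-31."))
print(auto_translate("2024-12-31"))
```

Masking keeps the changed date from lowering the match, so the segment is repaired rather than sent for translation. Real content would also need locale-specific date formats in the target, which this sketch does not attempt.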
This is just a partial list of the technologies that support SDL WorldServer translation memory. Many of these technologies are protected by the more than 170 patents SDL holds for language tools. We encourage organizations with large volumes of content that require translation to look into the advantages that a translation management system as technically advanced as SDL WorldServer brings to the process.