Content Architecture for Multilingual Delivery

The Source Content Problem

Most localisation quality problems are attributed to the translation process: poor MT output, inadequate post-editing, variable vendor quality. A more common root cause is the quality of the source content. Content that is difficult to translate is not translated well, regardless of the translator's skill, the MT engine's capability, or the post-editing effort applied.

Difficult-to-translate source content has identifiable characteristics: idiomatic expressions that are language-specific and do not translate literally; complex sentence structures that create syntactic ambiguity; embedded cultural references that require cultural adaptation rather than linguistic translation; inconsistent terminology that forces translators to make terminology decisions that should have been made upstream; and long sentences, for which MT error rates tend to rise with sentence length.
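Inconsistent terminology is the easiest of these characteristics to catch automatically. A minimal sketch in Python, assuming a hypothetical term base that maps each approved term to variants writers should avoid (`TERM_BASE` and `find_term_variants` are illustrative names, not an existing tool), could flag variants before content reaches translation. It matches whole words only and cannot tell legitimate uses apart (for example "login" as a noun), so its output is a prompt for review rather than a verdict.

```python
import re
from collections import Counter

# Hypothetical term base: approved term -> variants writers should not use.
# A real term base would come from the team's terminology management system.
TERM_BASE = {
    "sign in": ["log in", "login", "log on"],
    "dashboard": ["control panel", "home screen"],
}


def find_term_variants(text: str, term_base: dict[str, list[str]]) -> dict[str, Counter]:
    """Count non-approved variants of each approved term found in `text`.

    Matching is case-insensitive and uses word boundaries, so "login"
    does not match inside "logins".
    """
    lowered = text.lower()
    report: dict[str, Counter] = {}
    for approved, variants in term_base.items():
        hits: Counter = Counter()
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            count = len(re.findall(pattern, lowered))
            if count:
                hits[variant] = count
        if hits:
            report[approved] = hits
    return report


source = (
    "Log in to open the dashboard. "
    "If the login fails, return to the control panel and log in again."
)
for approved, hits in find_term_variants(source, TERM_BASE).items():
    print(f"Use '{approved}' instead of: {dict(hits)}")
```

For this sample, the check reports two occurrences of "log in" and one of "login" against "sign in", and one "control panel" against "dashboard".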

Controlled Language Principles for Localisation-Ready Content

Sentence length: Limit sentences to 25 words for MT-destined content and 20 words for content going directly to human translation. Translation error rates rise steeply as sentences get longer. Express one idea per sentence.

Active voice: Passive constructions are generally harder to translate and more likely to produce MT errors. Active voice is more translatable, more readable, and produces better MT output.

Terminology consistency: Use every product name, feature name, and technical term consistently throughout the source content, in line with the approved term base. Inconsistent terminology in the source produces inconsistent translations that post-editing alone cannot resolve.
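The sentence-length and active-voice rules can be partly automated as a pre-flight check on source content. The Python sketch below works under stated assumptions: sentences are split naively on terminal punctuation, and passive voice is detected with a regular-expression heuristic rather than a grammatical parser, so it misses irregular participles such as "sent" and flags some adjectives such as "is open". The function names are hypothetical; the 25- and 20-word limits are the ones given for sentence length.

```python
import re

# Word limits from the sentence-length rule: 25 for MT-destined content,
# 20 for content sent directly to human translators.
MAX_WORDS = {"mt": 25, "human": 20}

# Crude passive-voice heuristic (an assumption, not a parser): a form of
# "to be", an optional -ly adverb, then a word ending in -ed or -en.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> list[str]:
    # Naive split after ".", "!" or "?" followed by whitespace;
    # abbreviations such as "e.g." will cause false splits.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_sentences(text: str, route: str = "mt") -> list[tuple[str, list[str]]]:
    """Return (sentence, problems) pairs for sentences that break a rule."""
    limit = MAX_WORDS[route]
    findings = []
    for sentence in split_sentences(text):
        problems = []
        words = len(sentence.split())
        if words > limit:
            problems.append(f"{words} words (limit {limit})")
        if PASSIVE.search(sentence):
            problems.append("possible passive voice")
        if problems:
            findings.append((sentence, problems))
    return findings


sample = (
    "The settings file is created by the installer. "
    "Click Save to keep your changes. "
    "When you open the dashboard for the first time after upgrading from an earlier release, "
    "the application checks your licence, downloads any pending updates and asks you "
    "to confirm your notification preferences before continuing."
)
for sentence, problems in check_sentences(sample):
    print("; ".join(problems), "|", sentence)
```

The first sentence is flagged as possibly passive and the third as 34 words long; the second passes both checks. Passing `route="human"` applies the stricter 20-word limit.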

Structural Design for Reuse

Content designed as reusable components reduces localisation volume, and therefore cost, by enabling translation reuse. A component translated once can be assembled into multiple content experiences without retranslation. The saving grows with the reuse rate: a component reused across five different content assemblies requires one translation, not five.
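The arithmetic can be made concrete. A minimal Python sketch, using made-up components, word counts, and assemblies (all hypothetical, not real data), compares the source words sent for translation when each component is translated once against retranslating it in every assembly that uses it. It assumes exact component reuse and a translation cost that scales with word count.

```python
# Hypothetical components: id -> source word count (illustrative figures).
COMPONENTS = {
    "safety-warning": 40,
    "install-steps": 180,
    "support-contact": 25,
}

# Hypothetical assemblies: each is an ordered list of component ids.
ASSEMBLIES = {
    "quick-start": ["safety-warning", "install-steps", "support-contact"],
    "user-guide": ["safety-warning", "install-steps", "support-contact"],
    "help-page": ["install-steps", "support-contact"],
    "release-notes": ["support-contact"],
    "faq": ["safety-warning", "support-contact"],
}


def words_to_translate(
    components: dict[str, int],
    assemblies: dict[str, list[str]],
    reuse: bool = True,
) -> int:
    """Source words sent for translation into one target language."""
    if reuse:
        # Each distinct component is translated once, however often it is used.
        used = {cid for ids in assemblies.values() for cid in ids}
        return sum(components[cid] for cid in used)
    # Without reuse, every occurrence in every assembly is translated again.
    return sum(components[cid] for ids in assemblies.values() for cid in ids)


with_reuse = words_to_translate(COMPONENTS, ASSEMBLIES)
without_reuse = words_to_translate(COMPONENTS, ASSEMBLIES, reuse=False)
print(with_reuse, without_reuse, f"{1 - with_reuse / without_reuse:.1%} saved")
# prints: 245 785 68.8% saved
```

Here `support-contact` appears in all five assemblies but is translated once: 25 words instead of 125. Multiplying both totals by the number of target languages gives the full localisation volume, so the absolute saving scales with language count.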

Key Takeaways

1. Most translation quality problems originate in source content, not in the translation process — investing in source quality is the highest-leverage localisation improvement.

2. Controlled language principles — sentence length limits, active voice, terminology consistency — directly improve translation quality and MT performance.

3. Component-based content architecture reduces localisation cost through translation reuse — a component translated once can be assembled into multiple content experiences without retranslation.