ECM.DEV · Methodology

How the Localisation Cost Estimator works

Every coefficient the model uses, where each one comes from, and why we think it is roughly right. Where we are not sure, we say so.

Model version v0.2 · Last reviewed 2026-04-22 · Next scheduled review July 2026

What this page is

This is not a brochure. It is the working paper that underlies the Localisation Cost Estimator at ecm.dev/assessment/localisation-cost. It sets out every coefficient the model uses, where each one comes from, and why we think it is roughly right. Where we are not sure, we say so.

The estimator is a research preview. That status is not a disclaimer — it is the truth about where the model sits today. Version 0.2 of the model was built from public benchmarks, published LLM pricing, and informed estimates based on ECM.dev's own work with content operations. It has not yet been pressure-tested against a large dataset of real organisational spend, which is why every output carries a ±30% confidence band and why every user is invited to tell us whether the number they see feels right.

The model will be refined quarterly. Each refresh will appear as a dated entry in the changelog at the bottom of this page, showing what changed and why. Where user feedback has pulled a coefficient in a particular direction, the changelog will say so.

We publish all of this because we think it is the only honest way to do it. A cost estimator that hides its assumptions is not a cost estimator; it is a rhetorical device.

Why six layers

Most localisation cost calculators model two things: the number of words, and the cost per word. That approach describes a supply chain built in 2005. It is not wrong, exactly; it is just radically incomplete.

Real content operations carry five kinds of cost beyond per-word translation. The tool surfaces all six:

Translation
The per-word layer that traditional models show. Human rates, machine translation with post-edit, AI translation with evaluation — blended into a single effective rate that depends on the organisation's AI maturity.
Production
Project management, linguistic quality assurance, in-country review, desktop publishing, multimedia adaptation, engineering integration. Typically 25–50% on top of translation in an AI-augmented operation, 30–60% in a traditional one. Usually invisible in vendor quotes because it is buried inside a per-word rate.
Channel adaptation
The same translated asset costs more to finish for video than for a knowledge base than for a UI string than for a voice assistant. AI compresses this curve for text-heavy channels and is starting to do the same for audio, but raises it for brand-sensitive channels like marketing.
System and governance
TMS subscriptions, connectors, glossary management, brand governance, vendor management overhead, internal review cycles. Persistent and hidden.
AI operations
The cost centres introduced by AI-in-the-loop: LLM API spend, prompt library maintenance, evaluation infrastructure, AI governance, human-in-the-loop review of AI output, regulatory compliance overhead where it applies. In mature operations this is 8–18% of total spend; in immature operations where AI is bolted onto legacy workflows, it can be higher and deliver less value.
Friction
Rework, version drift, missed reuse, fragmented vendor brief-back, duplicate AI spend across teams, brand voice erosion from unreviewed AI output. Usually 15–25% of total spend, almost never measured.

The friction layer is why the model exists. It is the category of cost that legacy calculators cannot see and that operators know is there but cannot quantify.

The coefficients, layer by layer

All monetary values are USD at a 1M-word reference scale. Volume-sensitive coefficients are multiplied by a scale factor explained further down. User-facing display converts to EUR via a snapshot FX rate (refreshed weekly in production).

Translation layer

The translation unit rate is a blended portfolio rate across the language pairs English to French, German, Spanish, Italian, Dutch, Japanese, Chinese (simplified), Korean, Polish, Portuguese, and Swedish — the language set covered by most European-enterprise localisation operations. The blend weights are approximately equal, with slight emphasis on the major European languages.

Three unit rates underlie the translation layer:

  • Human translation: $0.16 per source word. Source: CSA Research 2024 Rate Survey (as reported in Slator's 2024 LSP Pricing Review); cross-referenced with Nimdzi Insights 2024 rate benchmarks. Refreshes are expected to pull this slightly upward in 2026 based on tightening linguist supply in some language pairs.
  • Machine translation with human post-edit: $0.06 per source word. Source: Slator 2024 MT Pricing Report, blended across major LSP tiers. This figure has been falling steadily as MT quality improves and post-edit effort per word declines; the 2026 refresh will likely adjust downward.
  • AI translation with evaluation pass: $0.03 per source word. Source: ECM.dev estimate, calibrated against published LLM API pricing and typical prompt-plus-eval workflow cost. This is the least well-anchored unit rate in the model because public benchmarks for production-grade AI translation pricing remain thin. User feedback is especially valuable here.

The effective translation rate is a weighted blend of these three unit rates, with weights determined by the organisation's AI maturity level (see “How AI maturity reshapes cost” below).

Translation memory leverage is applied as a discount on the blended rate, ranging from 10% for rare or one-off content to 40% for continuously updated content. Source: CSA/Nimdzi TM-leverage benchmarks and ECM.dev project experience.
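The arithmetic of this layer can be sketched in a few lines. This is an illustrative sketch, not the estimator's actual code: the unit rates are the three documented above, the blend weights are the Level 2 weights given in "How AI maturity reshapes cost" (50% human, 40% MT+PE, 10% AI+eval), and the function name and shape are our own.

```python
# Illustrative sketch of the translation layer, not the production model.
RATES = {"human": 0.16, "mtpe": 0.06, "ai_eval": 0.03}  # USD per source word

def translation_cost(words: int, weights: dict, tm_leverage: float) -> float:
    """Blended per-word rate, discounted by TM leverage (0.10 to 0.40)."""
    blended = sum(RATES[k] * w for k, w in weights.items())
    return words * blended * (1.0 - tm_leverage)

# A 1M-word operation at the documented Level 2 blend, with 25% TM leverage:
level_2 = {"human": 0.50, "mtpe": 0.40, "ai_eval": 0.10}
cost = translation_cost(1_000_000, level_2, tm_leverage=0.25)
# blended rate = 0.107 USD/word, so roughly 80,250 USD
```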

Production layer

Production adds a multiplier on top of translation, depending on content type. The model carries two values per content type — a base multiplier for a traditional operation and an AI-mature multiplier for an operation where AI has compressed part of the production work. The effective multiplier interpolates between the two, weighted by AI maturity level.

  • Marketing: 1.60 base, 1.35 AI-mature. Transcreation and brand review resist automation. Source: CSA project-premium survey plus ECM.dev estimate.
  • Product and UI: 1.40 base, 1.20 AI-mature. String management, engineering integration, contextual QA. Source: Slator tech-localisation benchmarks.
  • Support and knowledge base: 1.30 base, 1.10 AI-mature. Among the highest AI compression because review is lighter. Source: ECM.dev estimate.
  • Legal and regulatory: 1.80 base, 1.65 AI-mature. In-country legal review remains substantially human. Source: ECM.dev estimate; limited public benchmarks.
  • Video: 3.20 base, 2.60 AI-mature. Subtitling, dubbing, sync, audio QA. Source: Slator 2024 Media Localisation Report.
  • Training and learning: 1.60 base, 1.35 AI-mature. Multimedia plus pedagogical review. Source: Nimdzi LMS benchmarks.
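The interpolation between base and AI-mature multipliers can be sketched as follows. The base/AI-mature pairs are the documented values above; the linear weight of maturity ÷ 4 is the rule described in "How AI maturity reshapes cost". The function and dictionary names are ours, for illustration only.

```python
# Hedged sketch of the production-multiplier interpolation described above.
PRODUCTION = {  # (base, AI-mature)
    "marketing":  (1.60, 1.35),
    "product_ui": (1.40, 1.20),
    "support_kb": (1.30, 1.10),
    "legal":      (1.80, 1.65),
    "video":      (3.20, 2.60),
    "training":   (1.60, 1.35),
}

def production_multiplier(content_type: str, maturity: int) -> float:
    """Linear blend between base and AI-mature, weighted by maturity 0-4."""
    base, mature = PRODUCTION[content_type]
    w = maturity / 4.0
    return base * (1.0 - w) + mature * w

# A Level 2 marketing operation sits halfway between 1.60 and 1.35,
# i.e. a multiplier of about 1.475 on top of translation cost.
```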

Channel adaptation layer

Channel factors are multipliers applied to translated volume. The web channel is the baseline at 1.0x. Factors range from 0.95x for email (the most AI-compressible channel) to 2.60x for video/audio. Each channel carries a base and AI-mature factor; the model interpolates by maturity, as with production.

Channel factors and sources:

  • Web: 1.00 base, 1.00 AI-mature (baseline).
  • Mobile app: 1.10 base, 1.05 AI-mature. UI constraints.
  • In-product strings: 1.20 base, 1.10 AI-mature. Engineering integration.
  • Video/audio: 2.60 base, 2.10 AI-mature. Source: Slator 2024 Media Localisation Report.
  • Print: 1.30 base, 1.20 AI-mature. DTP and proofing.
  • Email: 1.00 base, 0.95 AI-mature. The most compressible channel.
  • Social: 1.10 base, 0.95 AI-mature. Short-form, AI-adaptable.
  • Voice and chat: 1.40 base, 1.15 AI-mature. Emerging benchmarks.

The in-product, voice/chat, and social factors carry the most uncertainty because public benchmarks are thin for these channels in a post-AI operating model.

System and governance layer

Four components:

The base tooling cost is $30,000/year at the 1M-word reference scale, scaled by the volume scale factor. This represents a typical mid-market TMS subscription plus connectors, glossary tooling, and integration maintenance. Source: Slator TMS pricing 2024 plus ECM.dev estimate. The scale factor ensures a 100,000-word operation is not charged enterprise-level tooling cost; a 10M-word operation carries tooling commensurate with its scale.

Glossary management runs at 3% of translation cost — a CSA governance benchmark for well-maintained operations. Poorly maintained glossaries cost more in rework, which the friction layer captures.

Vendor management overhead runs at 7% of translation cost. Source: CSA 2024 vendor-management survey. Single-LSP operations sit lower; multi-LSP or in-house operations sit higher.

Internal review runs at 12% of translation cost. This is an ECM.dev estimate based on client patterns; the figure varies widely in practice (from 5% in streamlined ops to 25% in review-heavy cultures), which is why the layer carries significant uncertainty.

AI Operations layer

This layer is the newest and the weakest-anchored — the one where user feedback will move coefficients most.

LLM API spend runs at $0.80 per 1,000 translated words at full AI intensity. This is a blend of Claude, GPT-4-class, and smaller models at typical loc-workflow prompting patterns. Source: published LLM API pricing as of May 2025. This coefficient is almost certainly stale at the time of any given refresh; LLM pricing has historically dropped roughly 10× every 18–24 months. The quarterly refresh cadence exists primarily to address this coefficient.

Prompt library and evaluation infrastructure: $25,000/year at 1M-word reference scale, scaled by volume and modulated by AI intensity. This covers maintaining prompt sets, writing and running evaluation suites, debugging drift, and keeping the AI layer honest. Source: ECM.dev estimate.

AI governance baseline: $35,000/year at 1M-word reference scale. Brand guardrails, safety review, model operations. Scales with volume and AI intensity. Source: ECM.dev estimate.

Regulated-industry governance uplift: $80,000/year at 1M-word reference scale, applied only when the user indicates a regulated-industry context. Covers EU AI Act obligations, financial services sectoral compliance, healthcare data governance, and similar. Scales sub-linearly with volume because compliance cost has a large fixed component — a 100,000-word regulated operation still needs most of the same compliance infrastructure as a 1M-word one. Source: ECM.dev estimate; limited public benchmarks exist for AI-specific compliance overhead.

Human-in-the-loop AI review: 22% uplift on AI-touched translation work. Hallucination detection, brand voice verification, contextual spot-checks. Scales with translation cost and AI intensity. Source: ECM.dev estimate based on client patterns; genuinely varies from 10% to 40% depending on content sensitivity.

AI intensity by maturity level modulates all of the above: Level 0 = 0.0 (no AI, no AI ops cost), Level 1 = 0.3 (ad-hoc AI), Level 2 = 0.7 (systematic MT+PE), Level 3 = 1.2 (AI in creation and translation), Level 4 = 1.6 (fully AI-native). The intensity curve reflects that AI-native operations spend materially more on the AI operations layer than partially-adopted ones, even as they spend less on translation — because sophisticated AI workflows require serious investment in evaluation, governance, and prompt infrastructure.
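The intensity schedule and the LLM API coefficient combine straightforwardly. Below is an illustrative sketch: the intensity values per level and the $0.80 per 1,000-word rate are the documented coefficients; the linear cost shape and all the names are our reading of the text, not the estimator's actual code.

```python
# Sketch of how AI intensity modulates raw LLM API spend.
AI_INTENSITY = {0: 0.0, 1: 0.3, 2: 0.7, 3: 1.2, 4: 1.6}
LLM_RATE_PER_1K_WORDS = 0.80  # USD at full intensity (May 2025 snapshot)

def llm_api_spend(translated_words: int, maturity: int) -> float:
    """LLM API cost scales linearly with volume and with AI intensity."""
    return (translated_words / 1_000) * LLM_RATE_PER_1K_WORDS * AI_INTENSITY[maturity]

# 1M translated words at Level 2 (intensity 0.7):
# 1,000 x 0.80 x 0.7 = 560 USD of raw API spend.
# At Level 4 (intensity 1.6) the same volume costs 1,280 USD,
# reflecting that AI-native operations spend more on this layer.
```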

Friction layer

The friction coefficient is derived from three inputs the user provides directly:

Friction = min(30%, 5% base + rework × 4% + fragmentation × 3% + AI coordination gap × 5%)

The base coefficient of 5% represents a well-run, integrated operation. Each additional point on the rework scale (0–3) adds 4%; each point on the tooling fragmentation scale (0–3) adds 3%; each point on the AI coordination gap (0–3) adds 5%. The highest weight sits on AI coordination because this is the single most predictive signal of an organisation that has bought AI tools but not redesigned the operating model to absorb them. The coefficient is capped at 30% — past that, the organisation is probably not running a functional operation and the model's other assumptions start to break down.

Friction cost applies to the sum of translation, production, channel adaptation, and AI operations. It does not apply to system and governance, which are largely fixed regardless of how well or badly the operation runs.
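The friction formula above, and the base it applies to, can be sketched directly. The weights, cap, and the exclusion of system and governance are as documented; the function names are illustrative.

```python
# Sketch of the friction layer as defined above.
def friction_coefficient(rework: int, fragmentation: int, ai_gap: int) -> float:
    """5% base plus weighted 0-3 scales, capped at 30%."""
    raw = 0.05 + rework * 0.04 + fragmentation * 0.03 + ai_gap * 0.05
    return min(0.30, raw)

def friction_cost(translation, production, channel, ai_ops, coeff):
    """Friction applies to these four layers; system and governance
    are excluded because they are largely fixed."""
    return coeff * (translation + production + channel + ai_ops)

# A mid-friction profile (rework 1, fragmentation 2, AI gap 1):
# 0.05 + 0.04 + 0.06 + 0.05 = 0.20, i.e. 20% of the variable layers.
# A worst-case profile (3, 3, 3) would give 0.41 raw, capped at 0.30.
```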

How AI maturity reshapes cost

The model carries a single AI maturity dial from Level 0 to Level 4. It does three things simultaneously:

It reshapes the translation rate blend. At Level 0, the blend is 100% human translation. At Level 2, it is 50% human, 40% machine translation with post-edit, 10% AI with evaluation. At Level 4, it is 10% human, 20% MT+PE, 70% AI+eval. The blended unit rate drops from $0.16 per word at Level 0 to $0.05 per word at Level 4 — a 3× compression on the translation layer alone.

It changes the production and channel multipliers. Each multiplier interpolates between its “base” (traditional) value and its “AI-mature” value, with the weight equal to maturity_level ÷ 4. A fully traditional operation uses base multipliers; a fully AI-native one uses mature multipliers; everywhere in between is a linear blend. The interpolation is linear for simplicity; in reality it is probably non-linear, and this is a known simplification the refresh cycle may address.

It scales the AI Operations layer. As maturity rises, AI operations cost rises materially — more LLM API spend, more eval infrastructure, more governance. This is why the tool shows the AI Operations layer growing (not shrinking) as organisations become more AI-mature. Mature AI-native operations pay for sophistication; they get it back in the other layers.

This three-way interaction is the core insight the model wants to make visible. Moving up one maturity level is not a one-dimensional improvement. It redistributes cost across layers rather than eliminating it — and in some cases (small operations, operations with high friction) the redistribution is not net-positive unless the underlying operating model is fixed first.

Volume scaling — why fixed costs are not fixed

An earlier version of this model treated base tooling, AI governance, and prompt infrastructure as flat per-year costs. That worked for 1M-word operations and larger but produced absurd results at smaller scales — a 100,000-word operation was charged the same $60,000 base tooling as a 10M-word enterprise.

Version 0.2 applies a scale factor to all nominally-fixed costs:

scale_factor = max(0.15, (volume ÷ 1,000,000)^0.6)

The 0.6 exponent produces genuine sub-linear scaling: a 10× increase in volume produces roughly a 4× increase in fixed costs, reflecting the real economics of content operations tooling. The 0.15 floor prevents the fixed-cost layers collapsing to near-zero for very small operations, which would overstate how efficiently they can be run.

The regulated-industry uplift uses a gentler blend: reg_scale = 0.5 + 0.5 × scale_factor. Compliance cost has a large genuinely-fixed component — a regulated operation needs the same basic AI Act documentation, audit trails, and governance framework whether it translates 100,000 or 10 million words per year. The fixed share of the regulated uplift is roughly half; the other half scales with volume.
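Both scaling formulas are simple enough to sketch; this uses the documented expressions verbatim, with illustrative function names.

```python
# Sketch of v0.2 volume scaling for nominally fixed costs.
def scale_factor(volume_words: int) -> float:
    """Sub-linear scaling, floored at 0.15 for very small operations."""
    return max(0.15, (volume_words / 1_000_000) ** 0.6)

def reg_scale(volume_words: int) -> float:
    """Regulated uplift: roughly half fixed, half volume-scaled."""
    return 0.5 + 0.5 * scale_factor(volume_words)

# 10M words: 10 ** 0.6 is roughly 3.98, i.e. about 4x the fixed costs.
# 100k words: 0.1 ** 0.6 is roughly 0.25, above the 0.15 floor.
# 20k words: hits the 0.15 floor.
```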

Confidence band

All outputs carry a ±30% confidence band. That is wide. It is wide deliberately.

At version 0.2, the model has not been pressure-tested against a large dataset of real organisational spend. Our coefficients are sourced from public benchmarks where those exist and ECM.dev estimates where they do not. For some coefficients — notably the AI maturity rate blends at Levels 3 and 4, the AI governance figures, and the friction weights — public benchmark data is genuinely thin and our estimates carry real uncertainty.

A ±30% band is consistent with the practice in early-stage industry cost models where the direction of the number is defensible but the precise value depends on organisational specifics we cannot capture in a 25-input web form. As user feedback accumulates and coefficient drift is identified, the band will narrow. The target is ±20% by the end of the first 90 days post-launch, and ±15% by the end of the first year.

Refresh cadence and changelog commitment

The model is refreshed on a quarterly cadence. Each refresh is dated, documented, and justified in this page's changelog. Refreshes may include: updated public benchmark data (CSA, Slator, Nimdzi), updated LLM API pricing snapshots, revised coefficients informed by user feedback signals, and structural changes to the model itself where evidence warrants them.

During the first 90 days — the research preview phase — we will refresh coefficients more frequently than quarterly, informally, as user feedback accumulates. The first public quarterly refresh will consolidate those informal updates and establish the rhythm going forward.

If we make a material change that would shift a user's prior estimate by more than 15%, we will say so explicitly in the changelog and explain why. Users who provided email addresses when using the tool will be notified of significant changes.

Known limitations

The model has real limitations. We list them here because the alternative is pretending they do not exist, which is a more expensive kind of dishonesty.

Language-portfolio assumption. The translation rate blend assumes a roughly European-enterprise language portfolio. Organisations with heavily Asian, Middle Eastern, or emerging-market language mixes may see outputs that are 10–20% off in either direction. Version 0.3 is planned to address this by adding a language-portfolio weighting input.

Single maturity dial. The AI maturity model treats “maturity” as a single dial. In reality, organisations often sit at different maturity levels across different parts of their operation (sophisticated AI in support content, legacy workflow in legal, for example). Version 0.3 may introduce a per-content-type maturity input to capture this.

Friction weights. Friction weights are ECM.dev estimates and will move most as feedback accumulates. We think the direction of the weights is right (AI coordination is the highest-signal dimension, rework second, fragmentation third) but the absolute magnitudes carry real uncertainty.

Delivery-model assumption. The model assumes a blended supply of LSP and internal-team delivery. Organisations that have fully in-sourced localisation or fully AI-automated translation without any LSP relationship may see the model under- or over-counting vendor management overhead.

LLM pricing timescale. LLM pricing moves on a different timescale than localisation rates. Our quarterly refresh cadence is likely too slow to track LLM API pricing precisely. The model compensates by carrying the LLM coefficient as a clearly-flagged item with a visible “last updated” date.

Binary regulated flag. The regulated-industry input is binary. Real regulation is a spectrum — EU AI Act obligations differ from FDA 21 CFR Part 11, which differ from GDPR cross-border transfer requirements. Future versions may introduce regulation-type-specific uplifts.

How user feedback feeds back into the model

Every estimate the tool produces invites a response: does this feel right? Users can answer “too low,” “about right,” “too high,” or “not sure,” with an optional one-line field for “where we missed it.”

Feedback is captured alongside the hashed, anonymised input profile that produced it — not the user's identity, not the organisation's identity, just the pattern of inputs that led to a particular output. Over time, systematic patterns in the feedback (a particular profile consistently producing “too high” estimates, for example) tell us where coefficients need to move.
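One way such a hashed, anonymised profile can be built is sketched below. This is not ECM.dev's actual pipeline — the serialisation choice, the field names, and the truncation length are all illustrative assumptions; the point is only that identical input patterns hash to the same bucket without storing any identity.

```python
# Hedged sketch of anonymised feedback capture (illustrative, not the
# actual pipeline): hash the input profile so feedback can be grouped
# by input pattern without storing who submitted it.
import hashlib
import json

def profile_hash(inputs: dict) -> str:
    """Stable hash of the input profile: same inputs, same bucket."""
    canonical = json.dumps(inputs, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

feedback_record = {
    "profile": profile_hash({"volume": 250_000, "languages": 6, "maturity": 2}),
    "verdict": "too high",  # too low / about right / too high / not sure
    "note": "vendor overhead looks steep",
}
```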

We do not treat user feedback as ground truth — a user who expected a lower number may be feeding their preference rather than their reality. But we do treat it as signal, especially when it is consistent across many users with similar profiles. Coefficient adjustments informed by user feedback will be flagged in the changelog.

If you work in content operations and want to go deeper than the feedback form allows — for example if you want to walk us through your actual spend to help pressure-test specific coefficients — we would be grateful. There is a link at the bottom of the estimator output page to book 25 minutes.

Changelog

22 April 2026 — v0.2 (research preview launch)

Fixed-cost layers (base tooling, AI governance, prompt library, regulated uplift) now scale sub-linearly with volume using scale_factor = max(0.15, (volume/1M)^0.6). Previous v0.1 treated these as flat and produced implausibly high costs for small operations.

Default scenario lowered from 1M words / 8 languages to 250k / 6 languages, representative of a mid-sized international firm rather than a global enterprise.

Base tooling baseline reduced from $60,000 (v0.1) to $30,000 (v0.2) at the 1M-word reference scale; AI governance baseline reduced from $50,000 to $35,000; prompt library baseline reduced from $40,000 to $25,000; regulated uplift reduced from $120,000 to $80,000. All four of these reductions reflect a more realistic picture of what a 1M-word operation actually spends when tooling and governance are not gold-plated.

Earlier versions

v0.1 was internal only; not publicly released.

Corrections, challenges, and suggestions: contact us.
