LLM Token Costs: The 'Non-English Tax' Is Real, Says New Analysis

A recent deep dive into the tokenization mechanics of leading large language models (LLMs) has found a substantial cost disparity driven by the language of the input. The analysis shows that prompting models such as OpenAI’s GPT and Anthropic’s Claude in languages other than English can consume significantly more tokens for the same content. Because API usage is billed per token, this penalty, which the analysis dubs the “non-English tax,” translates directly into higher operational costs.

English serves as the baseline, with a 1x multiplier; each figure below indicates how many tokens the same content requires in another language relative to English. Efficiency drops for every other language measured. Hindi input incurs a 1.37x multiplier on OpenAI and nearly 3x on Anthropic; Arabic sees 1.31x on OpenAI and almost 3x on Anthropic; Chinese comes in at 1.15x and 1.71x, respectively. Spanish fares relatively well among non-English languages at 1.2x on OpenAI, though it rises to about 1.62x on Anthropic. On some models the increases are steeper still: French reaches 1.79x, Russian 2x, and Hindi as high as 3.24x. Tokenizers also differ from one another: a Spanish phrase that costs 19 tokens with a GPT-based tokenizer might cost 27 with Anthropic's.
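The multipliers above can be turned into rough budget estimates. The following is a minimal sketch that assumes token count scales linearly with the reported multiplier. It includes only the figures stated precisely in the analysis (the "nearly 3x" Anthropic values are left out), and the price per million tokens is a hypothetical placeholder rather than any provider's actual rate. The helper names `estimate_tokens` and `estimate_cost` are illustrative.

```python
"""Sketch: turn reported language multipliers into token and cost estimates.

Assumptions: multipliers come from the analysis; the price per million
tokens is a hypothetical placeholder, not any provider's real rate.
"""

# Per-provider multipliers relative to the same content written in English.
# Only precisely stated figures are included; approximate "nearly 3x"
# values (Hindi and Arabic on Anthropic) are deliberately omitted.
MULTIPLIERS: dict[str, dict[str, float]] = {
    "openai": {"english": 1.0, "hindi": 1.37, "arabic": 1.31,
               "chinese": 1.15, "spanish": 1.2},
    "anthropic": {"english": 1.0, "chinese": 1.71, "spanish": 1.62},
}


def estimate_tokens(english_tokens: int, language: str, provider: str) -> int:
    """Estimate a translated prompt's token count from its English count."""
    multiplier = MULTIPLIERS[provider][language]
    return round(english_tokens * multiplier)


def estimate_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD at a flat per-token rate (hypothetical pricing model)."""
    return tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 3.0  # USD per million tokens, placeholder only
    base = 1_000  # tokens needed for the English version of a prompt
    for provider, table in MULTIPLIERS.items():
        for language in table:
            tokens = estimate_tokens(base, language, provider)
            cost = estimate_cost(tokens, HYPOTHETICAL_PRICE)
            print(f"{provider:<10}{language:<10}{tokens:>6} tokens  ${cost:.4f}")
```

Under this linear model, a prompt that takes 1,000 tokens in English would take about 1,370 tokens in Hindi on OpenAI and about 1,620 in Spanish on Anthropic. The 19-versus-27 Spanish example reads the same way: 27 / 19 ≈ 1.42, or roughly 42% more tokens on the Anthropic tokenizer for that phrase.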

The analysis attributes this inefficiency to training data and tokenizer design. LLMs are trained predominantly on English corpora, and their tokenizers are optimized accordingly: subword vocabularies built with methods such as byte-pair encoding assign single tokens to frequent English words and word fragments, while text in other languages, especially in non-Latin scripts, is more often split into many smaller pieces. As a result, other languages can require more tokens to express the same semantic content. The effect appears across models, but its size varies. Models with tokenizers suited to Chinese, including the Chinese-developed DeepSeek and Kimi as well as Google's Gemini, tokenize Chinese input more efficiently thanks to their training and tokenizer design. Setting these specialized cases aside, Spanish often ranks as the second most efficient language after English, owing to its widespread use and its similarities to English, such as a shared Latin script.
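The mechanism can be illustrated with a toy byte-level byte-pair encoding (BPE) tokenizer. This is a simplified sketch, not any vendor's tokenizer: the training corpus, merge count, and sample sentences are made up for illustration, and production tokenizers are trained on vastly larger, mixed-language data. It learns merges from English-only text, so English compresses well, Spanish benefits only from a few shared letter pairs, and Hindi (3 UTF-8 bytes per Devanagari character) gets no merges at all.

```python
"""Toy byte-level BPE: merges learned from English-only text compress English
far better than other languages. Illustrative only."""
from collections import Counter

Token = bytes


def apply_merge(tokens: list[Token], pair: tuple[Token, Token]) -> list[Token]:
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    merged: list[Token] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def to_bytes(text: str) -> list[Token]:
    """Start from single UTF-8 bytes, as byte-level BPE does."""
    return [bytes([b]) for b in text.encode("utf-8")]


def train_bpe(corpus: str, num_merges: int) -> list[tuple[Token, Token]]:
    """Greedily learn the most frequent adjacent pair, up to `num_merges` times."""
    tokens = to_bytes(corpus)
    merges: list[tuple[Token, Token]] = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:  # no pair repeats, so nothing is worth merging
            break
        merges.append(pair)
        tokens = apply_merge(tokens, pair)
    return merges


def encode(text: str, merges: list[tuple[Token, Token]]) -> list[Token]:
    """Apply the learned merges in training order."""
    tokens = to_bytes(text)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


if __name__ == "__main__":
    english_corpus = "the cat sat on the mat and the dog sat on the log " * 20
    merges = train_bpe(english_corpus, num_merges=30)
    samples = {
        "English": "the cat sat on the mat",
        "Spanish": "el gato se sentó en la alfombra",
        "Hindi": "बिल्ली चटाई पर बैठी",
    }
    for name, text in samples.items():
        print(f"{name:<8} {len(encode(text, merges)):>3} tokens")
```

The three sentences say roughly the same thing, yet the token counts rank English lowest and Hindi highest, mirroring in miniature the skew the analysis describes.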

To mitigate these costs, the analysis recommends two strategies. The most straightforward is to interact with models in English, which yields the lowest token counts on English-optimized tokenizers. The second, unconventional but effective, is a highly simplified, ‘caveman-like’ communication style. Stripping prompts of complex grammar and vocabulary, and having the model reply in the same terse register, sharply cuts token counts: a response that might naturally take 70 tokens could shrink to 20 with a ‘uga uga’ approach, a reduction of about 71% (the analysis cites savings of up to 75%). Both strategies underline how much prompt engineering and language choice matter in managing LLM operational costs.
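The savings arithmetic is easy to check. This minimal sketch assumes cost scales linearly with token count; the per-million-token price and monthly response volume are hypothetical placeholders, and the function names are illustrative.

```python
"""Sketch: token and cost savings from terser ("caveman-style") responses.

Assumes cost is linear in token count; the price and volume below are
hypothetical placeholders, not real provider figures.
"""


def percent_saving(tokens_before: int, tokens_after: int) -> float:
    """Share of tokens (and hence per-token cost) removed, as a percentage."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100 * (tokens_before - tokens_after) / tokens_before


def monthly_cost(responses: int, tokens_per_response: int,
                 usd_per_million: float) -> float:
    """Total USD cost for a month of responses at a flat per-token rate."""
    return responses * tokens_per_response * usd_per_million / 1_000_000


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 10.0        # USD per million output tokens (placeholder)
    HYPOTHETICAL_VOLUME = 1_000_000  # responses per month (placeholder)
    verbose = monthly_cost(HYPOTHETICAL_VOLUME, 70, HYPOTHETICAL_PRICE)
    terse = monthly_cost(HYPOTHETICAL_VOLUME, 20, HYPOTHETICAL_PRICE)
    print(f"saving: {percent_saving(70, 20):.1f}%")  # saving: 71.4%
    print(f"verbose: ${verbose:,.2f}  terse: ${terse:,.2f}")
```

The 70-to-20 example works out to about a 71% reduction; reaching exactly 75% would mean cutting, for instance, a 100-token response to 25 tokens.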