Tokens per Word: GPT-5 vs Claude vs GPT-4, Measured (2026)

We ran the same seven-language passage, plus code, JSON, Markdown, emoji, and CSV samples, through five tokenizers — exact counts from tiktoken for the GPT family and from Anthropic's official count-tokens API for Claude. Here is what a word really costs, and the full dataset is free to download.

On this page

Why tokens per word decides your bill

Every large language model bills by the token, never by the word. The exchange rate between those two units is where API budgets quietly drift. Most planning guides repeat the same rule of thumb: one token is about three quarters of an English word. That figure is roughly right for English on a modern tokenizer, and increasingly wrong for everything else: other languages, source code, structured data, and emoji all convert at their own rates.

Published numbers on this are surprisingly thin, so we measured it. This article reports exact token counts for the same content across five tokenizers and three model families, with the corpus and results downloadable below. If you budget LLM usage in any language other than English, the differences are large enough to change your projections.

The dataset and how it was measured

The corpus has 13 samples. Seven are human translations of the same 94-word passage about editing, in English, Spanish, Portuguese, French, German, Chinese, and Japanese, so the cross-language comparison holds meaning constant rather than length. The other six cover the text developers actually send to models: Python, JavaScript, a JSON order record, a Markdown document, an emoji-heavy social post, and CSV numeric data.

Counts for the GPT family come from tiktoken, OpenAI's published tokenizer, so they are exact: o200k_base (GPT-5, GPT-4o, the o-series), cl100k_base (GPT-4, GPT-3.5), and the GPT-3 era p50k_base for historical contrast. Claude counts come from Anthropic's official count-tokens API endpoint, which reports the billable figure per model. The endpoint counts the whole request, so we measured the fixed message envelope (6 tokens on Opus 4.8, 7 on Sonnet 4.6 and Haiku 4.5) and subtracted it, then verified the calibration with a doubling check that came back with zero drift. Absolute Claude counts carry about one token of uncertainty; ratios are unaffected.

Gemini is excluded from the measurements because Google does not publish its tokenizer and we had no countTokens access to verify against; we would rather scope the data honestly than estimate.

Tokens per word by language

The headline table. Same passage, same meaning, five tokenizers:

LanguageWordsGPT-5 (o200k)Tokens/wordGPT-4 (cl100k)Claude Sonnet 4.6Claude Opus 4.8
English941101.17110116177
Spanish1071431.34172184256
Portuguese1021371.34176188241
French1091531.40194207275
German931591.71203245324
Chinesen/a159n/a223217216
Japanesen/a205n/a268241240

English is the cheapest language in every column: 110 tokens for 94 words on GPT-5, or about 1.17 tokens per word. The popular 0.75-words-per-token rule holds almost exactly for English prose. Spanish runs 1.34 tokens per word on the same encoding, Portuguese 1.34, French 1.40, and German, with its long compounds, 1.71. Chinese and Japanese have no whitespace word boundaries, so per-word figures are not applicable; the next section compares them on equal meaning instead.

Same meaning, different price

Because all seven passages say the same thing, the fairest question is: what does it cost to express identical meaning in each language? Taking English as the baseline:

Languagevs English, GPT-5 (o200k)vs English, GPT-4 (cl100k)vs English, Claude Sonnet 4.6
Spanish+30%+56%+59%
Portuguese+25%+60%+62%
French+39%+76%+78%
German+45%+85%+111%
Chinese+45%+103%+87%
Japanese+86%+144%+108%

On GPT-5, expressing this passage in Spanish costs 30% more tokens than in English; Portuguese costs 25% more, and Japanese 86% more. The penalty grows on older encodings: the same Spanish passage that costs +30% on o200k cost +56% on GPT-4's cl100k, and the GPT-3 era p50k encoding needed 222 tokens for it, more than double its English equivalent. Anyone running multilingual workloads inherited those legacy ratios in their intuition, and they are now badly out of date.

The o200k effect: three GPT generations

The encoding history explains the shift. p50k and cl100k were trained heavily on English; o200k doubled the vocabulary to around 200,000 tokens and allocated far more of it to non-English text. For Spanish, the progression is 222 tokens (GPT-3 era) to 172 (GPT-4) to 143 (GPT-5) for the identical passage. Chinese improved even more sharply: 223 tokens on cl100k against 159 on o200k, a 29% drop.

The improvement is not universal. Our JavaScript sample is one honest counterexample: it costs 140 tokens on cl100k and 149 on o200k, slightly more on the newer encoding. English prose and Python were essentially flat. o200k's gains went to human languages, not to code.

Claude counts twice: Opus 4.8 vs Sonnet 4.6

The least documented result in the dataset: Anthropic's count-tokens endpoint reports two distinct counting regimes across its current models. Sonnet 4.6 and Haiku 4.5 return identical counts for every sample in the corpus. Opus 4.8 reports substantially higher figures for the same text, which matches Anthropic's own migration notes that Opus 4.7 and later count tokens differently.

SampleSonnet 4.6 / Haiku 4.5Opus 4.8Opus vs Sonnet
English prose1161771.53x
Spanish prose1842561.39x
German prose2453241.32x
Python code2082541.22x
JSON2492841.14x
Chinese2172161.00x
Japanese2412401.00x

The inflation is concentrated in Latin-script text, where Opus reports roughly 1.3 to 1.5 times the Sonnet count. On Chinese and Japanese the two regimes nearly coincide. This matters for budgeting because the billable unit differs by model: Opus 4.8 at $5 per million input tokens does not cost 1.67 times Sonnet 4.6 at $3 for English prose; measured end to end it costs about 2.5 times as much per word, because each word registers as more tokens. The cost table below uses each model's own measured counts.

Code, JSON, and CSV cost more than prose

Per character, structured text is far denser than prose. Punctuation, brackets, quotes, and digits fragment into many small tokens:

SampleCharactersGPT-5 tokensTokens per 100 chars
English prose57211019.2
Markdown document63916225.4
Python code66716725.0
JavaScript code63614923.4
Social text with emoji2838831.1
JSON order record52121441.1
CSV numeric data41623757.0

CSV numeric data is the most expensive input in the corpus at 57 tokens per 100 characters, three times the density of English prose. Dates, IDs, decimals, and percent signs tokenize one fragment at a time. The practical advice: when you pipe spreadsheets or logs into a model, the character count will mislead you; count tokens on a representative chunk first, and consider summarizing or sampling numeric tables before sending them whole.

Emoji are expensive

The social-media sample packs 11 emoji into 283 characters. Each emoji costs one to three tokens on o200k, and skin-tone or compound variants cost more. The sample lands at 88 GPT-5 tokens, a per-character density between prose and code. For chat products that process social text at scale, emoji are a real line item, not a rounding error.

What a million words costs

Converting measured tokens per word into input cost at current published prices (GPT-5 $1.25, GPT-5 mini $0.25, GPT-4o $2.50, Claude Haiku 4.5 $1.00, Sonnet 4.6 $3.00, Opus 4.8 $5.00 per million input tokens) gives the number a budget owner actually wants, the cost to process one million words:

LanguageGPT-5GPT-5 miniGPT-4oHaiku 4.5Sonnet 4.6Opus 4.8
English$1.46$0.29$2.93$1.23$3.70$9.41
Spanish$1.67$0.33$3.34$1.72$5.16$11.96
Portuguese$1.68$0.34$3.36$1.84$5.53$11.81
French$1.75$0.35$3.51$1.90$5.70$12.61
German$2.14$0.43$4.27$2.63$7.90$17.42

Two readings of this table. First, language overhead compounds with model choice: a million German words through Opus 4.8 costs $17.42 against $1.46 for English through GPT-5, a 12x spread for the same volume of meaning. Second, input pricing is cheap everywhere in absolute terms; the ratios matter when you multiply by output tokens, which typically cost four to five times the input rate and follow similar per-language inflation.

Reproduce the numbers

The full dataset and corpus are free to download and reuse with attribution (CC BY 4.0):

To check the GPT figures, run any sample through tiktoken with the o200k_base or cl100k_base encoding. To check Claude, call Anthropic's count-tokens endpoint with the sample as a single user message and subtract the envelope as described above. To get a feel for the numbers interactively, paste any corpus sample into our browser-local Token Counter: it runs the real o200k encoding client side, so the GPT counts match this dataset exactly and your text never leaves the page. For background on what a token is in the first place, see the Token Counter complete guide.

Count your own text

Exact GPT-5 token counts in your browser. Nothing is uploaded.

Open the Token Counter

Sources and further reading

Frequently asked questions

How many tokens is one English word?

About 1.17 tokens on GPT-5's o200k encoding, measured on standard prose. Claude Sonnet 4.6 reports about 1.23 tokens per English word, and Claude Opus 4.8 reports about 1.88 because its counting changed from the 4.7 generation onward. The old rule that a token is three quarters of a word holds for English on modern GPT encodings.

Does Spanish use more tokens than English?

Yes. Expressing the same meaning in Spanish costs about 30% more tokens than English on GPT-5, about 56% more on GPT-4's cl100k encoding, and roughly 59% more on Claude Sonnet 4.6, all measured on a parallel passage. Portuguese behaves similarly at 25% to 62% depending on the tokenizer.

Why is GPT-5 so much better at non-English text than GPT-4?

GPT-5 uses the o200k encoding, which roughly doubled the vocabulary to 200,000 tokens and allocated much more of it to non-English words. The same Spanish passage that needed 172 tokens on GPT-4's cl100k needs 143 on o200k, and Chinese dropped 29%. Code saw little or no improvement.

Why does Claude Opus 4.8 report more tokens than Sonnet 4.6?

Anthropic updated token counting from Opus 4.7 onward, and the official count-tokens endpoint reflects it: Opus 4.8 reports roughly 1.3 to 1.5 times the Sonnet 4.6 count for the same Latin-script text, while Chinese and Japanese counts stay nearly identical. Since billing follows each model's own count, Opus costs more per word than its price per token suggests.

Is CSV data really more expensive than prose?

Per character, yes, by about three times. Our CSV sample measured 57 GPT-5 tokens per 100 characters against 19 for English prose, because digits, decimals, dates, and separators fragment into many small tokens. Count a representative chunk before sending large tables to a model.

Can I download and reuse this dataset?

Yes. The corpus and all measurements are published under CC BY 4.0 at textkit.tech/data, in CSV and JSON form. Cite textkit.tech when you reuse them. Every number is reproducible with tiktoken and Anthropic's free count-tokens endpoint using the method described in the article.

Keep reading

Written by . We build the tools we write about. Try the Token Counter used in this post.