Token Counter: The Complete Guide (2026)
Tokens are the unit a language model reads, the unit you pay for, and the unit that fills a context window. Here is what a token actually is, why the count matters on every API call, and how to count GPT tokens exactly, and estimate Claude and Gemini tokens closely, without sending your prompt anywhere.
- What a token actually is
- Why token count is the number that matters
- How tokenization works under the hood
- Exact for OpenAI, approximate for Claude and Gemini
- Counting tokens before an API call
- Fitting the context window
- Tokens vs words vs characters
- Why a browser-local token counter matters
- Three ways to count tokens
- Common mistakes
What a token actually is
A token is a chunk of text, usually shorter than a word. When a language model reads your prompt, it does not see letters or words. It sees a sequence of tokens, each mapped to a numeric ID from the vocabulary the model was trained with. The conversion happens through byte pair encoding, an algorithm that starts from raw bytes and repeatedly merges the most frequent adjacent pairs into single tokens. Common English words end up as one token. Rarer or longer words split into two or three. Punctuation and line breaks are tokens too, and the space before a word is usually folded into that word's token rather than dropped.
The practical rule of thumb for English is about four characters per token, or roughly three quarters of a word. So 1,000 tokens is around 750 words and 100 tokens is a short paragraph. That ratio is only an average. The real count depends on the exact characters and the specific model, which is why a counter that runs the real encoding beats any rule of thumb when the number has to be right.
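As a quick illustration of that rule of thumb, here is a minimal sketch of a character-based estimator. The function names are made up for this example, and the ratios are the article's averages for English, not anything a real tokenizer guarantees.

```python
import math

CHARS_PER_TOKEN = 4      # rough average for English prose
WORDS_PER_TOKEN = 0.75   # roughly three quarters of a word per token

def estimate_tokens(text: str) -> int:
    """Estimate token count from character length (English-only heuristic)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_words(tokens: int) -> int:
    """Estimate how many English words a token budget holds."""
    return round(tokens * WORDS_PER_TOKEN)

print(estimate_tokens("x" * 400))  # 100
print(estimate_words(1000))        # 750
```

This is good enough for a sanity check and nothing more. Code, JSON, and non-English text can land well away from the estimate.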
Why token count is the number that matters
Word count tells you how long something reads. Token count tells you three things that decide whether your model call works and what it costs.
Cost. Model APIs price by the token, quoted per million tokens, and they charge input and output separately. The token count of your prompt is, quite literally, the input bill. Multiply by your call volume and you have your spend.
Context window. Every model holds a maximum number of tokens at once, called the context window. GPT-4o offers 128,000 tokens; some models go higher. A prompt plus its expected response has to fit inside that window. If it does not, the call is rejected, or a layer in front of the model silently truncates the input and the model answers from a half-read prompt.
Latency. More tokens take longer to process, both to read your input and to generate output. When response time matters, the token budget is a speed budget too.
How tokenization works under the hood
Byte pair encoding builds a vocabulary by scanning a huge corpus and merging the most common pairs of symbols over and over. The result is a fixed set of tokens, from single bytes up to whole common words, plus the merge rules to turn any new text into that vocabulary. OpenAI ships this as a library called tiktoken, and the specific vocabulary is called an encoding. GPT-4o, GPT-4.1, and the o-series use an encoding named o200k_base. GPT-4 Turbo and GPT-3.5 use an older one named cl100k_base. Same text, different encoding, slightly different token count.
This is why code tokenizes differently from prose. Indentation, brackets, operators, and camelCase identifiers fragment into many small tokens, so a block of code is often denser in tokens than an equally long block of English. Non-English scripts behave differently again. The only way to know for sure is to run the encoding.
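The merge loop described above can be sketched in a few lines. This is a toy trainer for illustration only, with hypothetical function names. Production tokenizers such as tiktoken also pre-split text with a regular expression and train on a very large corpus, which this sketch skips.

```python
from collections import Counter

def most_frequent_pair(seq):
    """Return the most common adjacent pair in the sequence, or None."""
    pairs = Counter(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Start from raw UTF-8 bytes and apply `num_merges` greedy merges."""
    seq = list(text.encode("utf-8"))
    merges = {}      # (left, right) -> new token id
    next_id = 256    # ids 0..255 are reserved for the raw bytes
    for _ in range(num_merges):
        pair = most_frequent_pair(seq)
        if pair is None:
            break
        merges[pair] = next_id
        seq = merge_pair(seq, pair, next_id)
        next_id += 1
    return seq, merges

tokens, merges = train_bpe("aaabdaaabac", num_merges=3)
print(len("aaabdaaabac"), "bytes ->", len(tokens), "tokens")
```

Each merge trades one vocabulary slot for a shorter sequence, which is why frequent words collapse into single tokens while rare strings stay fragmented.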
Exact for OpenAI, approximate for Claude and Gemini
OpenAI publishes its tokenizer, so token counts for OpenAI models can be exact. A browser library called gpt-tokenizer runs the same encodings as tiktoken client side, which is what powers the exact figure in the TextKit Token Counter. For a given piece of text, the count the tool shows for an OpenAI model matches the tokenizer OpenAI bills against, to the token. Chat requests add a few formatting tokens per message on top of the text itself, so a billed total can run slightly above the raw text count.
Claude and Gemini use their own tokenizers, and those are not published as browser libraries. For those models, the honest move is a close approximation rather than a fake-precise number. Anthropic's tokenizer lands reasonably close to cl100k_base for typical English, so an estimate based on cl100k_base is usually a workable approximation there; expect more drift on code and non-English text. Treat Claude and Gemini figures as planning estimates, not billing figures, and confirm against the provider's own token-counting endpoint or usage reporting for anything that has to be exact.
Counting tokens before an API call
The cheapest API call is the one you sized correctly before you sent it. Paste a representative prompt into a token counter, pick the model, and read the count and the estimated cost. If you are sending a system prompt plus a few examples plus user input on every request, count the fixed part once. That fixed overhead is paid on every single call, so trimming it has a multiplier effect across thousands of requests.
For cost, remember the estimate covers input only. A real round trip also pays for the model's response at a separate output rate. To budget the whole call, add your expected output tokens at the output price. A chat reply might be a few hundred tokens; a long generated document might be a few thousand.
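The round-trip arithmetic is simple enough to write down. The sketch below assumes per-million-token pricing as described above. The prices in the example are placeholders, not any provider's current rates.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call in dollars, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Placeholder prices for illustration only: $2.50 in, $10.00 out per million.
per_call = call_cost(1_200, 300, input_price_per_m=2.50, output_price_per_m=10.00)
print(f"per call: ${per_call:.4f}")
print(f"per 10,000 calls: ${per_call * 10_000:.2f}")
```

In this example the 300 output tokens cost as much as the 1,200 input tokens, which is why budgeting the input side alone undercounts the bill.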
Fitting the context window
Long documents are where token counting earns its keep. Before you feed a contract, a transcript, or a codebase to a model, count it. If it fits the window with room left for the response, send it. If it does not, you have to split it into chunks. Counting tokens lets you size each chunk so it fits with overlap, which matters for retrieval pipelines and summarization chains where a chunk that overflows is a chunk that gets truncated.
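A minimal sketch of token-sized chunking with overlap follows. It works on any list of token IDs. With tiktoken you would get that list from `encode` and turn each chunk back into text with `decode`. The function name and parameters are assumptions for this example.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

print(chunk_tokens(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Cutting on token boundaries guarantees each chunk fits, but a boundary can fall mid-word or mid-character when decoded, so many pipelines prefer to snap chunk edges to sentence or paragraph breaks and use the token count only as the size limit.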
The same logic applies to conversation history. A chat that keeps appending turns eventually fills the window. Knowing the running token total tells you when to summarize earlier turns or drop them.
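One simple policy for a running chat is to keep the most recent turns that fit a token budget and drop the rest. The sketch below assumes a `count` callable that returns the token count of a string. In practice that would wrap a real tokenizer; here a word count stands in so the example runs without dependencies.

```python
def trim_history(turns, budget, count):
    """Keep the newest turns whose combined token count fits within `budget`."""
    kept, total = [], 0
    for turn in reversed(turns):      # walk from newest to oldest
        cost = count(turn)
        if total + cost > budget:
            break                     # this turn and everything older is dropped
        kept.append(turn)
        total += cost
    kept.reverse()                    # restore chronological order
    return kept, total

def word_count(s):
    """Stand-in counter for the demo: counts words, not real tokens."""
    return len(s.split())

history = ["one two three", "four five", "six", "seven eight nine ten"]
print(trim_history(history, budget=7, count=word_count))
# (['four five', 'six', 'seven eight nine ten'], 7)
```

Dropping is the bluntest option. The same loop can instead hand the dropped turns to a summarizer and prepend the summary, at the cost of one extra model call.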
Tokens vs words vs characters
The three counts answer different questions and rarely agree. A word counter tells you how long a piece reads to a person. A character counter tells you whether a post fits a platform limit like Twitter's 280. A token counter tells you what a model sees and charges. Punctuation-heavy text, JSON, and source code all push the token-to-word ratio up, because symbols and structure fragment into extra tokens. If you work with the JSON formatter and feed structured data to a model, expect the token count to run higher than the word count would suggest.
Why a browser-local token counter matters
Prompts are often the most sensitive text a team handles. They carry proprietary instructions, customer records, unreleased copy, and internal context. Many online token counters send that text to a server to count it, which means the prompt you are measuring leaves your machine. A browser-local counter runs the encoding in the page itself, so the text never travels. You can confirm it by opening the browser network panel and checking that no requests fire while you type. For anyone counting tokens on confidential prompts, that is the difference between a safe check and a quiet data leak.
Three ways to count tokens
In the browser. The fastest path for a one-off check or a quick cost estimate. The TextKit Token Counter loads the real tiktoken encoding locally and counts as you type, with no upload. Best when you are drafting a prompt or sizing a document.
In code. For production, call tiktoken in Python or gpt-tokenizer in JavaScript so your application counts tokens the same way the API bills them. This is how you enforce a budget or pre-flight a request before sending it.
In the provider playground. OpenAI's own tokenizer page shows the split for short snippets. Useful for seeing how a specific phrase breaks into tokens, less useful for long documents or cost math.
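For the in-code route, a pre-flight check might look like the sketch below. `tiktoken.get_encoding` and `Encoding.encode` are real tiktoken calls, but `make_counter` and `preflight` are hypothetical helper names, and the window and reserve values are assumptions you would set per model. tiktoken is imported lazily so the budgeting logic can be tested with any counter.

```python
def make_counter(encoding_name: str = "o200k_base"):
    """Build a token counter backed by tiktoken (pip install tiktoken)."""
    import tiktoken  # third-party; imported here so preflight() has no hard dependency
    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))

def preflight(prompt: str, count, context_window: int, reserved_output: int) -> dict:
    """Check that the prompt plus a reserved output budget fits the window."""
    used = count(prompt)
    remaining = context_window - used - reserved_output
    return {"prompt_tokens": used, "fits": remaining >= 0, "remaining": remaining}

# Usage (requires tiktoken to be installed):
# count = make_counter("o200k_base")
# report = preflight(my_prompt, count, context_window=128_000, reserved_output=2_000)
# if not report["fits"]:
#     raise ValueError(f"prompt too long by {-report['remaining']} tokens")
```

This counts the raw prompt text only. Chat-formatted requests add a small per-message overhead, so leave a little headroom rather than filling the window to the last token.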
Common mistakes
Counting characters and assuming tokens. The four-characters-per-token rule is an average, not a guarantee. Code and non-English text break it. When the number matters, run the encoding.
Forgetting the output bill. Input tokens are only half the call. A verbose response can cost more than the prompt. Budget both.
Ignoring the system prompt. Fixed instructions sent on every call are paid every time. They are the easiest tokens to forget and the most expensive to leave bloated.
Using one model's count for another. o200k_base and cl100k_base produce different counts for the same text. Count against the model you will actually call.
Frequently asked questions
How many words is 1,000 tokens?
About 750 words of typical English, since a token averages roughly four characters or three quarters of a word. The ratio shifts for code and non-English text, which use more tokens per word, so treat 750 as a guide and run the real encoding when the number has to be exact.
Are token counts the same across models?
No. GPT-4o, GPT-4.1, and the o-series use the o200k_base encoding, while GPT-4 Turbo and GPT-3.5 use cl100k_base. The same text produces a slightly different token count under each. Always count against the model you plan to call.
Can I count Claude and Gemini tokens exactly?
Not in the browser. Anthropic and Google do not publish their tokenizers as client-side libraries, so a browser tool can only estimate, usually by borrowing the cl100k_base encoding, which lands close for English. For exact Claude or Gemini counts, use the provider's own token-counting endpoint or usage reporting.
Why does my code use more tokens than my prose?
Code fragments into many small tokens. Indentation, brackets, operators, and split identifiers each cost tokens, so a block of code is usually denser than an equally long block of English. JSON and other structured data behave the same way.
Does counting tokens send my text anywhere?
Not with a browser-local counter. The TextKit Token Counter runs the tiktoken encoding in your browser from a static file, so your prompt is never uploaded, logged, or stored. You can verify this in your browser network panel.
How do I estimate the cost of an API call?
Multiply input tokens by the model's input price per million tokens, then add expected output tokens at the output price. The Token Counter estimates the input side with an editable rate; add your output estimate for the full round-trip cost.
Written by SAVI. We build the tools we write about. Try the Token Counter used in this post.