Count GPT and Claude tokens as you type. Your text never leaves your browser.
Exact OpenAI count via the tiktoken o200k encoding, computed in your browser.
A token counter tells you how many tokens a piece of text becomes when a large language model reads it. Tokens are the unit that GPT-4o, GPT-4, the o-series, and Claude actually process, and they are the unit you pay for on every API call and the unit that fills a model's context window. This tool counts them exactly for OpenAI models, right in your browser, and estimates the API cost as you type. Nothing is uploaded.
A token is a chunk of text, usually a short run of characters rather than a whole word. The model breaks your text into tokens with a process called byte pair encoding, then works with the numeric IDs of those tokens. Common English words are often a single token. Longer or rarer words split into several. Spaces, punctuation, and the leading space before a word all count. As a rough guide, English runs about four characters per token, so 1,000 tokens is roughly 750 words, but the real number depends on the exact text and the model.
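To make the merging idea concrete, here is a toy sketch of byte pair encoding. It is not the tiktoken algorithm or vocabulary: it assumes the merges are learned from the input text itself, whereas a real tokenizer applies a fixed merge table learned in advance from a large corpus. The function name `bpe_merges` is made up for this illustration.

```python
from collections import Counter


def bpe_merges(text: str, num_merges: int) -> list[str]:
    """Toy BPE: start from single characters and repeatedly merge
    the most frequent adjacent pair into one token."""
    tokens = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges gain nothing
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens
```

Each merge shortens the token list while the joined tokens still spell the original text, which is why frequent sequences end up as single tokens and rare ones stay split.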
Three reasons. First, cost: API pricing is quoted per million tokens, separately for input and output, so the token count of your prompt determines the bill. Second, context windows: every model has a maximum number of tokens it can hold at once, and a prompt that exceeds the window is rejected or silently truncated. Third, latency: longer prompts take longer to process. If you build with LLMs, the token count is the number you budget against on all three axes.
OpenAI publishes its tokenizer, so this tool gives exact counts for OpenAI models. GPT-4o, GPT-4.1, GPT-4o mini, and the o-series use the o200k encoding. GPT-4 Turbo and GPT-3.5 use the older cl100k encoding. The tool runs the real tiktoken byte pair encoding for both, so the count for your text matches OpenAI's tokenizer token for token.
Claude and Gemini use their own tokenizers, which are not published as browser libraries. For those models the tool shows a careful approximation based on the cl100k encoding, which lands within a small margin for typical English prose. The label switches to "approx" so you always know which number is exact and which is an estimate. For Claude and Gemini, treat the figure as a close planning estimate, not a billing figure.
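The selection rule described above can be sketched as a small lookup. This is an illustration, not the tool's code: the model identifier strings and the `pick_encoding` helper are assumptions, and the encoding names use tiktoken's spellings (`o200k_base`, `cl100k_base`) for what this page calls o200k and cl100k.

```python
import re

# Hypothetical model identifiers; adjust to the names you actually use.
O200K_MODELS = {"gpt-4o", "gpt-4o-mini", "gpt-4.1"}
CL100K_MODELS = {"gpt-4-turbo", "gpt-3.5-turbo"}
O_SERIES = re.compile(r"^o\d")  # assumed naming pattern, e.g. "o1", "o3-mini"


def pick_encoding(model: str) -> tuple[str, bool]:
    """Return (encoding_name, is_exact) for a model name."""
    name = model.lower()
    if name in O200K_MODELS or O_SERIES.match(name):
        return "o200k_base", True
    if name in CL100K_MODELS:
        return "cl100k_base", True
    # Everything else (Claude, Gemini, unknown names) is approximated
    # with cl100k and flagged as an estimate.
    return "cl100k_base", False
```

The boolean is what would drive the "approx" label: `True` means the count is exact for that model, `False` means it is a planning estimate.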
The estimate multiplies your token count by the price per million input tokens for the selected model. Each model preset loads a representative input price, and you can edit the rate field to match your exact contract, a cached or batch rate, or output pricing. The estimate covers input tokens only. A real API call also bills the model's response, so add your expected output tokens at the output rate to budget a full round trip.
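The arithmetic is simple enough to sketch. The function below is a minimal illustration, not the tool's implementation; `estimate_cost` and its parameter names are hypothetical, and the prices in the usage lines are placeholders rather than real rates.

```python
def estimate_cost(
    input_tokens: int,
    input_price_per_million: float,
    output_tokens: int = 0,
    output_price_per_million: float = 0.0,
) -> float:
    """Cost of one call: tokens times the per-million rate,
    computed separately for input and output."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder rates for illustration only, not real prices.
input_only = estimate_cost(12_000, input_price_per_million=2.0)
round_trip = estimate_cost(12_000, 2.0, output_tokens=800, output_price_per_million=8.0)
```

Leaving the output arguments at their defaults reproduces the input-only figure; passing both gives the full round-trip budget.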
Staying inside a context window. Before sending a long document to a model, paste it here to confirm it fits. If GPT-4o gives you a 128,000-token window and your document is 140,000 tokens, you know to chunk it before the call fails.
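As a sketch of that check, using the numbers from the example above: the helper names are hypothetical, and reserving room for the model's reply is an added assumption, since the response generally has to fit in the same window as the prompt.

```python
def fits_context(prompt_tokens: int, context_window: int, reserved_output_tokens: int = 0) -> bool:
    """True if the prompt plus any space reserved for the reply fits the window."""
    return prompt_tokens + reserved_output_tokens <= context_window


def tokens_over(prompt_tokens: int, context_window: int, reserved_output_tokens: int = 0) -> int:
    """How many tokens must be cut or moved to another chunk (0 if it fits)."""
    return max(0, prompt_tokens + reserved_output_tokens - context_window)


fits = fits_context(140_000, 128_000)     # the document from the example
overflow = tokens_over(140_000, 128_000)  # tokens that need to go elsewhere
```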
Estimating API cost before you build. Paste a representative prompt, pick the model, and read the cost. Multiply by your expected call volume to forecast spend before writing a line of code.
Trimming prompts. System prompts and few-shot examples are billed on every call. Counting tokens shows which instructions are expensive and lets you cut the ones that do not earn their place.
Comparing models. The same text becomes a different number of tokens under o200k and cl100k. Switching the model preset shows the difference, which matters when you are choosing between models on cost.
Chunking for embeddings and RAG. Embedding models and retrieval pipelines work in fixed token windows. Counting tokens lets you size chunks so each one fits with room for overlap.
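One common way to size chunks is a sliding window over the token IDs. The sketch below assumes you already have the text as a list of token IDs from whichever tokenizer your embedding model uses; `chunk_tokens` is a hypothetical helper, not part of this tool.

```python
def chunk_tokens(tokens: list[int], chunk_size: int, overlap: int) -> list[list[int]]:
    """Split token IDs into windows of at most chunk_size tokens,
    where each window repeats the last `overlap` tokens of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

With ten tokens, a chunk size of 4, and an overlap of 1, this yields three windows that share one token at each boundary.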
Word count and character count answer different questions. A word counter tells you how long a piece reads to a human. A character counter tells you whether a post fits a platform limit. A token counter tells you what a language model sees and charges. The three rarely match: punctuation-heavy text, code, and non-English scripts all push the token-to-word ratio around. Code in particular produces more tokens per word than prose because symbols, indentation, and identifiers fragment into many small tokens.
Prompts are often sensitive. They can contain proprietary instructions, customer data, unreleased copy, or internal context. Most online token counters send your text to a server to count it. This one does not. The tiktoken encoding runs entirely in your browser, so the prompt you are measuring never leaves your device. You can confirm this by opening your browser's network tab and watching it stay silent as you type.
When you choose an OpenAI model, the tool loads the matching tiktoken encoding once and caches it. Every keystroke is encoded locally and the exact token count appears, along with characters, words, the characters-per-token ratio, and the estimated cost. The encoding files are served as static assets from this site, so there is no API call and no third-party request. Until the encoding finishes loading on the first use, the tool shows a quick estimate, then upgrades to the exact count automatically.
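The load-once, estimate-until-ready behavior can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the tool's source: `LazyTokenCounter` is a hypothetical class, the fallback uses the four-characters-per-token rule of thumb, and `load_byte_encoder` is a stand-in that emits one token per UTF-8 byte so the example runs without any tokenizer data.

```python
import math
from typing import Callable, List, Optional, Tuple

Encoder = Callable[[str], List[int]]


class LazyTokenCounter:
    """Returns a rough estimate until the encoder is loaded, then exact counts."""

    def __init__(self, load_encoder: Callable[[], Encoder]) -> None:
        self._load_encoder = load_encoder
        self._encode: Optional[Encoder] = None

    def load(self) -> None:
        # Load the encoder only once; later calls reuse the cached one.
        if self._encode is None:
            self._encode = self._load_encoder()

    def count(self, text: str) -> Tuple[int, bool]:
        """Return (token_count, is_exact)."""
        if self._encode is None:
            return math.ceil(len(text) / 4), False  # rule-of-thumb estimate
        return len(self._encode(text)), True


def load_byte_encoder() -> Encoder:
    # Stand-in encoder (one token per UTF-8 byte), NOT a real tokenizer.
    return lambda text: list(text.encode("utf-8"))
```

A caller would show the estimate from `count` immediately, call `load` in the background, and re-run `count` once it returns to upgrade to the exact figure.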
Yes, for OpenAI models. The tool runs the real tiktoken byte pair encoding for the o200k and cl100k encodings, so the count matches what OpenAI bills. Claude and Gemini counts are close approximations, clearly labeled as such, because those tokenizers are not available as browser libraries.
GPT-4o, GPT-4.1, GPT-4o mini, and the o-series use o200k. GPT-4 Turbo and GPT-3.5 Turbo use cl100k. The tool picks the right encoding automatically when you select a model.
No. The estimate covers input tokens only. A full API call also bills the output the model generates, at a separate output rate. Add your expected output tokens at the output price to budget a complete round trip.
No. The encoding runs entirely in your browser. Your prompt is never sent to a server, logged, or stored. The tokenizer data is a static file served from this site, not an API.
For typical English, byte pair encoding merges common letter sequences into single tokens, which works out to about four characters per token on average. Code, rare words, and non-English text change that ratio, which is why the tool shows the live characters-per-token figure for your specific text.
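The live ratio is just characters divided by tokens. A minimal sketch, assuming the token count comes from elsewhere; `text_stats` is a hypothetical helper, and splitting on whitespace is only one possible definition of a word.

```python
def text_stats(text: str, token_count: int) -> dict:
    """Characters, whitespace-separated words, and characters per token."""
    characters = len(text)
    words = len(text.split())
    # Guard against division by zero for empty input.
    ratio = characters / token_count if token_count else 0.0
    return {"characters": characters, "words": words, "chars_per_token": ratio}
```

A ratio well below four suggests text that fragments heavily, such as code or non-English scripts; a ratio near or above four is typical of plain English prose.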
Yes. Each model preset loads a representative input price, and the rate field is editable. Set it to your negotiated rate, a batch or cached rate, or the output rate to estimate response cost.