NEW: How strong is your B2B pipeline? Score it in 2 minutes →

Token


A unit of text processed by an AI model, roughly equivalent to a word fragment, used to measure and limit input and output size.

What is a token?

A token is the smallest unit of text an AI model processes. Roughly speaking, one token equals about three to four characters of English text, which means one word is typically one to two tokens. Punctuation, spaces, and unusual characters may each consume a separate token. The token count of your prompt determines both what the model can process and what you pay for each API call.
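The character-based rule of thumb above can be sketched as a quick estimator. This is a planning heuristic only, assuming roughly four characters per English token; your provider's real tokeniser gives the billing-accurate count:

```python
def estimate_tokens(text: str) -> int:
    """Rough planning estimate: ~4 characters per token for English text.
    Use the provider's real tokeniser for billing-accurate counts."""
    return max(1, round(len(text) / 4))

# A 100-character English snippet lands near 25 tokens under this heuristic.
print(estimate_tokens("abcd" * 25))  # → 25
```

Building an estimator like this into a prompt-testing workflow catches oversized templates before they reach production volume.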

Understanding tokens matters practically because AI API pricing is typically quoted per token, not per word or per request. A 1,000-token system prompt sent with every API call across 50,000 monthly requests adds up quickly. Token optimisation, the practice of reducing prompt length without reducing clarity, is a real cost lever in high-volume AI workflows.
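The overhead described above is easy to make concrete. The per-token rate below is a hypothetical figure for illustration, not any provider's published price:

```python
# Hypothetical numbers illustrating system-prompt overhead at scale.
system_prompt_tokens = 1_000       # sent with every call
monthly_requests = 50_000
price_per_million_input = 3.00     # hypothetical input rate, USD

overhead_tokens = system_prompt_tokens * monthly_requests
overhead_cost = overhead_tokens / 1_000_000 * price_per_million_input
print(f"{overhead_tokens:,} tokens -> ${overhead_cost:.2f}/month")
# → 50,000,000 tokens -> $150.00/month
```

The system prompt alone consumes 50 million input tokens a month before a single user message is counted, which is why trimming templates pays off at volume.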

Tokens also determine how much content you can include in a single call. The model's context window is measured in tokens, so knowing your token counts helps you plan what fits. A typical LinkedIn post is 50 to 150 tokens. A one-page document might be 600 to 800. A full transcript or long PDF chapter can be 5,000 to 20,000 tokens.
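Planning what fits can be sketched as a simple budget check. The window size, output reservation, and per-item counts below are hypothetical estimates in the ranges the paragraph above describes:

```python
# Planning sketch: will these pieces fit in a hypothetical 8,000-token
# window, leaving room for the model's reply? Counts are rough estimates.
context_window = 8_000
reserved_for_output = 1_000

pieces = {
    "system prompt": 1_200,
    "one-page brief": 700,
    "call transcript": 5_500,
}
total_input = sum(pieces.values())
fits = total_input <= context_window - reserved_for_output
print(total_input, fits)  # → 7400 False
```

The transcript alone would have fit, but stacked with the prompt and brief it crowds out the output budget; a check like this flags the problem before the call fails or truncates.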

Different languages tokenise differently. English is typically the most efficient. Languages with longer words, non-Latin scripts, or complex morphology consume more tokens per word than English, which can meaningfully increase costs for multilingual campaigns.

A practical token strategy for outbound teams: audit your most-used prompt templates quarterly, identify repetitive phrasing or instructions already encoded in fine-tuned models, and strip them. Every 100 tokens removed from a prompt template saves money at scale and can actually improve output quality by reducing noise in the instructions the model must process.

What separates a useful AI term from AI theater is whether it reduces manual work without creating new accuracy or compliance risk. The strongest teams define exactly where the model is allowed to help, what still needs human review, and which failure modes are unacceptable before they automate anything. The term is most useful when defined alongside Context window, Prompt, and Structured output.


Token — example

An outbound agency processes 10,000 email drafts per month using the Claude API. Their standard system prompt is 1,800 tokens, covering tone instructions, ICP context, brand voice examples, and output format rules. A developer audits the prompt and identifies 600 tokens of redundant examples that duplicate the tone instructions already present.

After stripping the redundant examples, the prompt drops to 1,200 tokens. At 10,000 calls per month, this saves 6 million tokens. At their provider's rate, this represents a 30% reduction in monthly AI spend with no measurable drop in output quality. The audit takes two hours. The saving is ongoing.
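The arithmetic behind the example above is worth spelling out. The per-token rate is again a hypothetical figure; the dollar saving scales linearly with whatever your provider actually charges:

```python
# Savings from the prompt audit example; the USD rate is hypothetical.
before, after = 1_800, 1_200        # system prompt tokens, pre/post audit
calls_per_month = 10_000
price_per_million_input = 3.00      # hypothetical input rate, USD

tokens_saved = (before - after) * calls_per_month
dollars_saved = tokens_saved / 1_000_000 * price_per_million_input
print(f"{tokens_saved:,} tokens, ${dollars_saved:.2f} saved per month")
# → 6,000,000 tokens, $18.00 saved per month
```

Because the saving recurs on every call, a one-off two-hour audit keeps paying back for as long as the template stays in production.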

A mid-market SaaS team typically applies token budgeting to a narrow workflow first, usually lead research, outbound drafting, or support triage. They connect it to their existing knowledge base, define a small review queue, and test it on one segment before rolling it out across the whole go-to-market motion. They also document how their token budgets relate to the Context window and their Prompt templates, so the practice is not trapped inside one team.

Frequently asked questions


How can I count tokens before sending a prompt to avoid surprises?
Use your AI provider's tokeniser tool. OpenAI offers the tiktoken library, and Anthropic provides token counting via the API. For rough estimation, divide your character count by 4 for English text. Build token counting into your prompt testing workflow so you know the cost before scaling a template.

Why do non-English prompts cost more than I expected?
Non-Latin scripts, complex morphology, and some European languages tokenise less efficiently than English. A Spanish or German prompt with the same word count may require 20 to 40% more tokens. If you run multilingual campaigns at volume, test token counts per language and factor this into your per-language cost model.

What is the difference between input tokens and output tokens?
Input tokens are everything you send to the model: your system prompt, user message, and any context. Output tokens are what the model generates in response. Most providers charge different rates for input and output, with output tokens typically costing two to four times more. This means long-form outputs are disproportionately expensive compared to short structured responses.

How do I reduce token count without reducing output quality?
Replace verbose instructions with examples. One concrete example often communicates more precisely than a paragraph of description. Remove redundant formatting instructions if you are already using structured output schemas. Strip any boilerplate from documents before pasting them in. Use bullet points over prose where structure is what you need.

Do tokens carry over between separate API calls in the same campaign?
No. Each API call is independent. Tokens from one call do not carry into the next. This is why context management matters in multi-turn workflows. If a task requires memory across calls, you need to explicitly include prior conversation history or task state in each new prompt, which adds tokens to every subsequent call.
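The input/output pricing split above is the main reason long-form generation costs more than it looks. A small sketch with hypothetical rates (here output priced at 4x input, within the typical 2x to 4x range):

```python
# Why long outputs dominate cost: output tokens often price 2-4x input.
# Both rates below are hypothetical, not any provider's published pricing.
input_rate = 3.00    # USD per million input tokens
output_rate = 12.00  # USD per million output tokens (4x input here)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call under the hypothetical rates above."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

short = call_cost(input_tokens=2_000, output_tokens=150)   # structured reply
long = call_cost(input_tokens=2_000, output_tokens=1_500)  # long-form draft
print(f"${short:.4f} vs ${long:.4f}")  # → $0.0078 vs $0.0240
```

The two calls have identical inputs, yet the long-form draft costs roughly three times as much, which is why constraining output length and format is often the cheapest optimisation available.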

Pipeline OS Newsletter

Build qualified pipeline

Get weekly tactics to generate demand, improve lead quality, and book more meetings.

Trusted by industry leaders

Ready to build qualified pipeline?

Book a call to see if we're the right fit, or take the 2-minute quiz to get a clear starting point.
