Context window


The maximum amount of text an AI model can process in one prompt, including instructions, examples, and inputs.

What is Context window?

The context window is the total amount of text an AI model can hold in memory during a single interaction, measured in tokens. It includes everything the model processes at once: your system instructions, the user input, examples you provide, documents you paste in, and the model's own previous responses. Once you exceed the context window, the model begins dropping content, typically from the earliest part of the conversation or document.

In B2B workflows, context window size determines what kinds of tasks are feasible. A small context window limits you to short prompts and brief outputs. A large window lets you paste in full documents, entire CRM records, or long conversation histories for the model to reason across. Current frontier models offer windows ranging from 8,000 to over 200,000 tokens, which represents roughly 6,000 to 150,000 words.

Understanding token distribution within the context window matters. If you have a 100,000-token window and your document is 90,000 tokens, you have only 10,000 tokens for your instructions, output, and any additional context. Running out mid-task produces truncated or degraded outputs without a clear error message, which is harder to debug than an obvious failure.
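The budget arithmetic above can be sketched as a pre-flight check. This is a minimal sketch that uses a rough four-characters-per-token heuristic; production code should use the provider's actual tokeniser, and the window and output-budget numbers here are illustrative, not tied to any specific model.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # A real tokeniser (the provider's own) gives exact counts.
    return max(1, len(text) // 4)

def check_budget(instructions: str, document: str,
                 window: int = 100_000, output_budget: int = 8_000) -> int:
    """Return the tokens left for extra context; negative means over budget."""
    used = estimate_tokens(instructions) + estimate_tokens(document)
    return window - used - output_budget
```

A negative result means the call will truncate or silently degrade; checking before sending turns that into an explicit, debuggable error.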

A common mistake is assuming that a larger context window produces better reasoning across long documents. Most models show degraded attention to content in the middle of very long inputs, a phenomenon sometimes called the lost-in-the-middle problem. Critical instructions and key facts should appear at the beginning or end of the prompt, not buried in the middle of a long document block.
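One way to act on this is to have the task instructions bracket the long document rather than sit inside it. A minimal sketch (the section label is arbitrary):

```python
def build_prompt(instructions: str, document: str) -> str:
    # Models attend most reliably to the start and end of the input,
    # so the instructions appear before AND after the long document
    # block instead of being buried in the middle.
    return (
        f"{instructions}\n\n"
        f"DOCUMENT:\n{document}\n\n"
        f"Reminder of the task: {instructions}"
    )
```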

For B2B outreach and enrichment at scale, context window management is a cost lever as well as a quality lever. Every token costs money. Stripping unnecessary whitespace, removing boilerplate from documents before feeding them in, and using structured data formats instead of prose paragraphs all reduce token consumption while maintaining output quality.
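The savings from structure and whitespace are easy to demonstrate. A sketch comparing a pretty-printed record against a compact serialisation of the same data (the field names are made up for illustration):

```python
import json

record = {  # hypothetical CRM record
    "company": "Acme Corp",
    "stage": "negotiation",
    "last_contact": "2024-05-01",
    "notes": "Pricing call went well; security review pending.",
}

pretty = json.dumps(record, indent=4)                 # readable, token-heavy
compact = json.dumps(record, separators=(",", ":"))   # same data, fewer characters

# Fewer characters means fewer tokens for the same information.
savings = len(pretty) - len(compact)
```

The same principle applies to stripped boilerplate and collapsed whitespace: the model sees identical facts at a lower token cost.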

What separates a useful AI term from AI theater is whether it reduces manual work without creating new accuracy or compliance risk. The strongest teams define exactly where the model is allowed to help, what still needs human review, and which failure modes are unacceptable before they automate anything. The concept is most useful when defined alongside Token, RAG, and Prompt template.

Context window — example

A RevOps team wants to use AI to summarise deal history and generate next-step recommendations from CRM notes. Their average deal record, when exported as text, is 12,000 tokens. They initially try to batch five records per prompt to save API calls. At 60,000 tokens of input plus 3,000 tokens of instructions, they hit the context limit.

After restructuring, they strip CRM boilerplate, compress notes to key facts, and process records individually. Each call is 4,000 tokens and fits comfortably within the window, producing clean, consistent summaries. Processing the five records in separate calls means more API requests, but it eliminates truncation errors and produces reliable outputs. The lesson: fitting more in is not always better.
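The overflow the team hit could have been caught up front by computing how many records actually fit per call. A sketch, assuming a 64,000-token window consistent with the failure described (the source does not name the model or its exact limit):

```python
def max_records_per_call(record_tokens: int, window: int,
                         instruction_tokens: int, output_budget: int) -> int:
    # Tokens left after instructions and reserved output space,
    # divided by the per-record size.
    available = window - instruction_tokens - output_budget
    return max(0, available // record_tokens)

# Original plan: five raw 12,000-token records plus 3,000 tokens of
# instructions, with 2,000 tokens reserved for output.
raw_fit = max_records_per_call(12_000, 64_000, 3_000, 2_000)  # only 4 fit, not 5
```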

A mid-market SaaS team typically applies context window management to a narrow workflow first: lead research, outbound drafting, or support triage. They connect the model to their existing knowledge base, define a small review queue, and test on one segment before rolling it out across the whole go-to-market motion. They also document how the concept relates to Token and RAG so the definition is not trapped inside one team.

Frequently asked questions

How do I know if my prompt is hitting the context window limit?
Most API providers return an error or truncate silently. The cleaner approach is to count tokens before sending using your provider's tokeniser; OpenAI and Anthropic both offer token-counting tools. Set a budget ceiling at 80% of the model's context window to leave room for outputs and buffer.

Does a larger context window always mean I should use a bigger model?
Not necessarily. Larger context windows usually come with higher cost per token and slower response times. Use the smallest model whose context window is adequate for your specific task. If your workflow typically uses 4,000 tokens, a 200,000-token model is expensive overkill.

What is the difference between the context window and the model's knowledge?
The context window is what the model can process in a single call. The model's trained knowledge is what it absorbed during training, which has a cutoff date and is stored in the model weights. Information you paste into the context window is temporary and task-specific; the model's trained knowledge is permanent and general. They work independently.

Can I use the context window to teach the model about my company?
You can include company background, ICP definitions, messaging guidelines, and examples in the context window as a system prompt. This gives the model that information for the current session but does not persist to the next call. If you need this context reliably across all calls, either include it in every prompt or use fine-tuning to encode it into the model's behaviour.

Why do my outputs get worse when I make my prompts much longer?
Very long prompts can dilute the model's attention across competing instructions. Models attend more reliably to content near the start and end of a prompt. If your prompts have grown beyond 2,000 to 3,000 tokens, audit them for redundancy: remove examples that do not add new information and consolidate overlapping instructions into single clear directives.
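Because nothing persists between calls, company context has to travel with every request. A sketch in the chat-message shape most providers accept (the context string and product details are illustrative):

```python
COMPANY_CONTEXT = (
    "You write outbound emails for Acme, a payroll platform for "
    "mid-market logistics companies. Tone: direct, no buzzwords."
)  # hypothetical company background

def build_messages(user_input: str) -> list:
    # The system message is re-sent on every call; the model retains
    # nothing from previous requests.
    return [
        {"role": "system", "content": COMPANY_CONTEXT},
        {"role": "user", "content": user_input},
    ]
```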

Related terms

Token
RAG
Prompt template
