Context window

From Wikipedia, the free encyclopedia

The context window of a large language model (LLM) is the maximum amount of text or other tokenized input available to the model at one time when generating output. It is usually measured in tokens, which are units produced by the model's tokenizer rather than words or characters. In practical terms, the context window is the material the model can "see" while producing a response; anything outside that window is not directly available unless it is summarized, retrieved, or provided again. A longer context window can allow a model to work with longer prompts, conversations, documents, codebases, or retrieved passages without first compressing or discarding as much information.[1]
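The token budget described above can be illustrated schematically. The following is a toy sketch, not any production tokenizer: real LLM tokenizers use subword schemes such as byte-pair encoding rather than whitespace splitting, and the `fit_to_context` truncation policy is only one of several strategies a chat system might use.

```python
def tokenize(text: str) -> list[str]:
    # Stand-in for a real subword tokenizer: splits on whitespace.
    return text.split()

def fit_to_context(tokens: list[str], window: int) -> list[str]:
    # Keep only the most recent `window` tokens, as a chat system
    # might when a conversation history outgrows the context window;
    # everything earlier is no longer visible to the model.
    return tokens[-window:]

history = "user: hi assistant: hello user: what is a context window"
tokens = tokenize(history)
visible = fit_to_context(tokens, window=6)
print(len(tokens), visible)  # 10 tokens total, only the last 6 remain visible
```

In this sketch the opening exchange falls outside the window, mirroring how earlier turns of a long conversation become unavailable unless they are summarized or re-supplied.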

The practical size of context windows has increased rapidly as LLM systems have developed. Some models are limited by the sequence lengths used during training, while attention variants and positional-encoding methods can allow models to operate on longer sequences than those seen during training.[2] By the mid-2020s, long-context systems had reported context windows ranging from hundreds of thousands to millions of tokens; Google researchers reported Gemini 1.5 evaluations on retrieval tasks at up to 10 million tokens.[3]
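One such positional method, the linear biases of Press et al. (ALiBi), replaces position embeddings with a distance-proportional penalty added to each attention logit; because the penalty depends only on query–key distance, it can extrapolate to sequences longer than those seen in training. A minimal sketch follows (scalar arithmetic only; a real implementation applies these biases across whole attention matrices):

```python
def alibi_slopes(n_heads: int) -> list[float]:
    # Head-specific slopes from the ALiBi paper: for n heads
    # (assumed to be a power of two), the h-th head uses 2^(-8h/n).
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(query_pos: int, key_pos: int, slope: float) -> float:
    # Penalty added to the attention logit: zero at distance zero,
    # increasingly negative for distant keys, so attention decays
    # with distance at a head-specific rate.
    return -slope * (query_pos - key_pos)

slopes = alibi_slopes(8)               # [0.5, 0.25, ..., 2**-8]
bias = alibi_bias(100, 90, slopes[0])  # -0.5 * 10 = -5.0
```

Because no absolute position is encoded, the same bias rule applies unchanged at positions far beyond the training length, which is the basis of the extrapolation claim.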

A larger context window does not necessarily mean that a model can use the entire context equally well. In "Lost in the Middle", Liu et al. found that performance on long-context tasks was often worse when relevant information appeared in the middle of an input rather than near the beginning or end.[4] Other benchmarks have assessed long-context capability using tasks that go beyond simple retrieval, including multi-document question answering, long-dialogue understanding, code repository understanding, and structured-data reasoning.[5][6]
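This positional effect can be probed with a simple "needle in a haystack" test in the spirit of these evaluations. The following is a schematic sketch: `ask_model` is a hypothetical stand-in for an actual LLM call, and real benchmarks such as RULER use more varied and harder tasks than literal string retrieval.

```python
def build_haystack(needle: str, depth: float, n_filler: int = 1000) -> str:
    # Insert a known fact (the "needle") at a chosen relative depth
    # in repetitive filler text.
    filler = ["The sky was a pleasant shade of blue that day."] * n_filler
    pos = int(depth * len(filler))
    return " ".join(filler[:pos] + [needle] + filler[pos:])

def probe(ask_model, needle: str, question: str, answer: str) -> dict:
    # Score retrieval at several depths; `ask_model` is a hypothetical
    # callable mapping a prompt string to a model response string.
    return {d: answer in ask_model(build_haystack(needle, d) + "\n" + question)
            for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Sweeping the depth this way produces an accuracy curve over needle position; a dip near depth 0.5 relative to the ends would indicate the "lost in the middle" effect.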

References

  1. ^ Nir Ratner et al., "Parallel Context Windows for Large Language Models", Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023, pp. 6383-6402, doi:10.18653/v1/2023.acl-long.352.
  2. ^ Ofir Press et al., "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation", arXiv:2108.12409, 2021.
  3. ^ Machel Reid et al., "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context", arXiv:2403.05530, 2024.
  4. ^ Nelson F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts", Transactions of the Association for Computational Linguistics, 2024, vol. 12, pp. 157-173, doi:10.1162/tacl_a_00638.
  5. ^ Cheng-Ping Hsieh et al., "RULER: What's the Real Context Size of Your Long-Context Language Models?", arXiv:2404.06654, 2024.
  6. ^ Yushi Bai et al., "LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks", Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025, pp. 3639-3664, doi:10.18653/v1/2025.acl-long.183.