Momentarily Reconstructed Contexts: A New Approach to LLM Usage
This post introduces our recent research; for further details, please refer to the draft paper. The paper proposes a new framework to address the “cumulative context” problem that arises when large language models (LLMs) process long-term dialogues. The framework reconstructs only the minimal essential information at each moment in the conversation, enabling the model to focus on critical facts and naturally discard unnecessary data.
Limitations of Cumulative Context
Most LLM-based dialogue systems to date have relied on accumulating all previous turns in the prompt. Although many implementation variants exist, they remain essentially cumulative-context approaches. As a conversation grows longer, this leads to:
- Weight Dilution: As the number of tokens increases, even important information fails to receive sufficient attention.
- Noise Amplification: Outdated or irrelevant information remains, making it difficult for the model to concentrate on what is currently needed.
As a result, both model performance and computational efficiency tend to degrade in later stages of the conversation.
Entropy-Based Analysis of Attention
The paper interprets the model’s ability to maintain focus using the information-theoretic measure “entropy.” High entropy indicates that attention is broadly dispersed across many tokens, while low entropy suggests concentration on a smaller set of crucial tokens. Traditional cumulative context approaches increase entropy as the dialogue progresses. However, by employing the “momentarily reconstructed minimal context” method and removing unnecessary information, entropy can be reduced, helping the model concentrate more effectively on essential content.
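The contrast above can be made concrete with a small computation. The sketch below, which is an illustration rather than the paper's actual measurement code, computes the Shannon entropy of an attention distribution: a uniform spread over a long cumulative context yields high entropy, while concentration on a few tokens of a reconstructed minimal context yields low entropy.

```python
import numpy as np

def attention_entropy(weights):
    """Shannon entropy (in nats) of an attention distribution.

    High entropy: attention dispersed over many tokens.
    Low entropy: attention concentrated on a few crucial tokens.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize to a probability distribution
    w = w[w > 0]             # drop zero-weight tokens (0 * log 0 -> 0)
    return float(-(w * np.log(w)).sum())

# Attention diluted uniformly over a long cumulative context (1000 tokens)
diluted = np.ones(1000) / 1000

# Attention focused on a few key tokens of a reconstructed minimal context
focused = np.array([0.6, 0.3, 0.05, 0.05])

print(attention_entropy(diluted))   # log(1000) ≈ 6.91
print(attention_entropy(focused))   # ≈ 0.97, far lower
```

The uniform case attains the maximum entropy log(N) for N tokens, which is why entropy grows as cumulative context lengthens; pruning the context is what pulls the value back down.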
Retrieval-Based Minimal Context Reconstruction
The core mechanism for implementing this approach is retrieval. Instead of re-inserting the entire conversation history each time, the dialogue is stored in an external database. Techniques such as BM25 (lexical ranking), HNSW (approximate nearest-neighbor vector search), and RRF (reciprocal rank fusion) are used to swiftly select only the information most relevant to the current user prompt. The most recent utterance is always included to maintain conversational flow, and only the top K results are kept while strictly managing input length. In this way, each turn’s context is “small, meaningful, and precisely suited to the current query.”
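The selection step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the turn ids, the two input rankings (standing in for BM25 and HNSW results), and the RRF constant k=60 are all assumptions made for the example.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of document ids.

    Each document scores 1 / (k + rank + 1) per list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def build_context(bm25_ranking, vector_ranking, latest_utterance_id, top_k=3):
    """Reconstruct a minimal context for the current turn: fuse the two
    rankings, keep only the top K, and always include the most recent
    utterance to preserve conversational flow."""
    fused = rrf_fuse([bm25_ranking, vector_ranking])
    context = fused[:top_k]
    if latest_utterance_id not in context:
        context.append(latest_utterance_id)
    return context

# Hypothetical turn ids ranked by each retriever for the current prompt
bm25 = ["t7", "t2", "t9", "t4"]   # lexical (BM25-style) ranking
vect = ["t7", "t2", "t4", "t1"]   # vector (HNSW-style) ranking

print(build_context(bm25, vect, latest_utterance_id="t9", top_k=3))
# → ['t7', 't2', 't4', 't9']
```

Note that the reconstructed context is rebuilt from scratch each turn: nothing carries over unless the retrievers select it again, apart from the always-included latest utterance.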
Distinguishing from Retrieval-Augmented Generation (RAG) Approaches
While Retrieval-Augmented Generation (RAG) approaches also incorporate external knowledge into model responses, many RAG methods simply accumulate retrieved documents or preserve the entire conversation history, failing to fundamentally resolve the cumulative context problem. Moreover, such methods are not primarily aimed at addressing context length issues.
In contrast, the “momentary reconstruction” approach proposed in the paper represents a new direction: it rebuilds the context from scratch at every turn. Even if previously used information was once useful, it is decisively excluded if no longer needed. This allows for stable, high-quality context maintenance regardless of conversation length.
Wittgenstein’s Perspective: A Totality of Facts, Not Things
This methodology transcends a mere technical improvement, connecting to philosophical perspectives on meaning formation. Philosopher Ludwig Wittgenstein wrote that “the world is the totality of facts, not of things.” Applying this to LLM context management suggests selecting only the “facts” needed at each moment rather than stockpiling all “things” (tokens). Thus, the conversation’s context is not a fixed accumulation but a dynamic formation, rearranging necessary facts as required.
This viewpoint also aligns with Wittgenstein’s notion of “language games,” where meaning is not static but constantly reshaped through interaction. Similarly, LLM context can be viewed as a transient, dynamically reconstructed concept at each moment of interaction, providing a philosophical and logical basis for treating context as an active element rather than a passive input. Modern philosophy of language and linguistics offer a theoretical foundation for this approach.
Expected Benefits
This new framework is anticipated to provide:
- Improved Computational Efficiency: Eliminating unnecessary tokens reduces processing costs.
- Accurate and Consistent Responses: Concentrating on essential information enhances response quality.
- Easier Domain Knowledge Integration: Retrieval allows selective incorporation of domain knowledge only when needed.
Ultimately, LLM-based conversation can evolve beyond simply accumulating all information, moving toward a paradigm where context is meticulously optimized at every step.
Conclusion
The concept of “momentarily reconstructed minimal contexts” highlights new possibilities for LLM-based dialogue handling. By combining entropy analysis, retrieval-driven dynamic filtering, and Wittgenstein’s philosophical insights, this approach surpasses mere performance optimization. It offers a foundation for conversational AI to evolve in terms of information management and meaning formation.
As this approach is applied and refined in various scenarios, LLMs are expected to handle long-term conversations more efficiently and clearly. This advancement will enable conversational AI to achieve more meaningful and systematic context formation capabilities.