Why AI “Forgets” Things
Understanding how AI systems “remember” relevant details, and how their “memory” architecture determines whether they truly benefit users
When using AI, sometimes it feels like you have to keep repeating the same details, even when these are clearly stored on the platform. The core issue is that these systems can’t reliably retrieve relevant details, so they appear to “forget” things they shouldn’t.
AI solutions need persistent memory to be useful. When these systems can “remember” the right details, users get the full benefit of the model’s “intelligence.” This can noticeably enhance the system’s performance and the user experience.
However, memory improvements don't only benefit users. They also give platforms more insight into user behavior. When systems retain too much information without giving users meaningful visibility or control, it becomes unclear whose interests the platform is really serving.
When we understand how AI’s memory works, we can assess whether the system’s underlying architecture is designed to facilitate effective recall, and whether the system’s “memory” is truly working for users or against them.
What AI’s Memory Actually Is
AI systems can accomplish impressive things because the underlying models possess powerful reasoning capabilities. However, the models themselves are stateless: they cannot “remember” anything between requests. To operate “intelligently,” they need a way to retain, use, and update information across interactions. AI systems provide this by layering memory components on top of the model, allowing it to “remember” relevant information. This is how we give AI memory.
Context windows are just the most visible part of AI’s memory, not the only part.
Most of the discourse around AI’s memory focuses on models’ context windows, which represent the maximum amount of information, measured in tokens, that an AI model can process in a single request. Everything the model factors into its reasoning must fit within this window: the system prompt, conversation history, attached files, retrieved details, and the response itself. Since the context window is finite, older data must be removed to accommodate new data as conversations grow longer. While context windows are important, the system stores critical information in other memory modules as well.
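To make the constraint concrete, here is a minimal sketch of how a finite window forces older turns out of the model’s view. The token counter and function names are invented for illustration; real systems use actual tokenizers and far more sophisticated policies.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per word."""
    return len(text.split())

def build_context(system_prompt: str, history: list[str], window: int) -> list[str]:
    """Keep the system prompt, then fit as many of the most recent
    turns as the window allows, dropping the oldest first."""
    budget = window - count_tokens(system_prompt)
    kept: list[str] = []
    for turn in reversed(history):  # walk newest-first
        cost = count_tokens(turn)
        if cost > budget:
            break  # the oldest turns fall out of the window
        kept.append(turn)
        budget -= cost
    return [system_prompt] + list(reversed(kept))

history = [
    "my name is Ada",
    "I prefer short answers",
    "what is a token?",
    "and a context window?",
]
context = build_context("You are a helpful assistant.", history, window=15)
# The earliest turns no longer fit, so the model never "sees" them.
```

Once "my name is Ada" is trimmed away, the model has no idea the user ever said it, which is exactly the "forgetting" users experience.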
These other modules contain the vital “memories” that actually allow the system to improve its responses. During each interaction, relevant details are pulled from different memory modules, filling up the context window. Since the window size limits how much information the model can retain, AI labs actively work to make it larger. However, if the system is not filling in the right information from previous interactions, a bigger window doesn’t help. This is why ensuring the system can effectively retrieve information matters.
We underestimate how challenging it is for AI systems to “remember” things.
We tend to interpret the world through a human lens, so we can’t help but draw parallels between human memory and AI memory. We feel frustrated when chatbots forget simple details, even though we ourselves forget things all the time. We expect that machines that log every single piece of information shouldn’t “forget” anything. But this assumes that an AI system’s “mind” is similar to our own: because the system stores everything “perfectly,” we believe recalling details should be trivial for machines that can do so many impressive things. In reality, retrieval is much more challenging for AI systems than it seems.
We also underestimate just how capable human minds are. We can’t store information as precisely as machines can, but we can still surface relevant details when necessary. We dismiss how computationally challenging this task is because it often feels effortless to us: our minds search through all the information we have accumulated over a lifetime and pull up the exact details we need at any given moment. While our memories are imperfect, our brains still hold an enormous amount of information, and we’re generally good at recalling the most important parts of it. This is much harder for an AI system to do.
The Full Memory Stack
AI systems store massive volumes of data. Every chat message exchange, every artifact generated, and every system you’ve connected to the platform is a piece of data that the model may need to examine before responding. It also needs to examine all its internal knowledge and any relevant online information. Models cannot process all this data at once, at least not fast enough or cost-effectively enough to be useful and valuable to users. Therefore, AI systems require multiple types of memory to effectively store and retrieve context.
The raw data stored within the system serves as a repository of information (e.g., conversation history, user profiles, connected databases). This data must be distilled into more useful and usable forms. Relevant information is organized into short-term memory (immediately relevant context) and long-term memory (critical context from past conversations). Optimized memory management ensures that models can rapidly access relevant information and minimize response times.
Short-term memory represents all the information the model needs to track within the current conversation. This is the immediate conversational context the model needs to make decisions and generate coherent responses across multiple exchanges. As the conversation grows longer, irrelevant past details are “forgotten,” and relevant ones are summarized to make room for newer information. This memory does not persist beyond the current session.
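The trimming-and-summarizing behavior described above can be sketched roughly as follows. The `summarize` function here is a toy stand-in for what would really be a model-generated summary, and the buffer structure is an assumption for illustration.

```python
def summarize(turns: list[str]) -> str:
    """Toy stand-in for a model-generated summary: in a real system,
    the model itself would compress the old turns into a few sentences."""
    return "Summary: " + "; ".join(t[:20] for t in turns)

def compact(history: list[str], max_turns: int) -> list[str]:
    """When the buffer exceeds max_turns, collapse the oldest turns into
    a single summary entry so the most recent turns stay verbatim."""
    if len(history) <= max_turns:
        return history
    old, recent = history[:-max_turns], history[-max_turns:]
    return [summarize(old)] + recent

buf = ["turn 1 ...", "turn 2 ...", "turn 3 ...", "turn 4 ..."]
compacted = compact(buf, max_turns=2)
# Oldest two turns become one summary entry; the last two stay intact.
```

When the session ends, this buffer is discarded; anything worth keeping has to be promoted to long-term memory, which is where the harder design problems live.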
Working memory represents the information the model needs to track as it reasons and executes multi-step tasks in the current session. While short-term memory stores all immediately relevant details, working memory holds all the information needed for reasoning, planning, and decision-making: the details from the short-term memory, active knowledge (generated from the reasoning process), information retrieved from long-term memory, and system-level information (e.g., available tools, user profiles, connected systems).
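One way to picture working memory is as a container that gathers these four inputs at response time. The field names below are illustrative assumptions, not any framework’s actual API.

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Illustrative grouping of the inputs described above."""
    short_term: list[str] = field(default_factory=list)   # current-session turns
    scratchpad: list[str] = field(default_factory=list)   # active reasoning steps
    retrieved: list[str] = field(default_factory=list)    # pulled from long-term memory
    system_info: dict = field(default_factory=dict)       # tools, user profile, connections

    def as_prompt_sections(self) -> list[str]:
        """Everything here ultimately competes for context-window space."""
        return self.short_term + self.retrieved + self.scratchpad

wm = WorkingMemory(
    short_term=["user asked for a refund"],
    retrieved=["policy: refunds within 30 days"],
    system_info={"tools": ["lookup_order"]},
)
```

The key point the structure makes visible: working memory is assembled fresh for each reasoning step, drawing on both the session and the stores that persist beyond it.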
Long-term memory represents all the information the model may need to reference when responding in the current conversation. This memory persists across multiple sessions, allowing AI systems to “learn” from accumulated interaction history and deliver more personalized and intelligent responses over time. Long-term memory can be classified into three distinct types that each serve a different purpose:
Episodic memory stores information about specific past experiences that can help the system make better decisions in the future. This helps it avoid past mistakes, optimize strategies, and provide more informed responses.
Semantic memory stores factual knowledge that can help the system make decisions requiring general and domain-specific context (e.g., definitions, rules). This is essentially the system’s knowledge about the world and its specific use case, often backed by internal or external knowledge bases.
Procedural memory stores information on specific skills, tasks, or processes that can help the system recall how to perform routine actions without having to reason each time. As an AI system’s procedural memory grows, it becomes more capable, useful, and efficient, making this type of memory especially critical for agentic systems that need to act on behalf of users.
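The three stores above can be sketched as a toy long-term memory. Real systems typically back these with vector databases and semantic search; plain substring matching is used here only to keep the sketch self-contained.

```python
class LongTermMemory:
    """Toy long-term memory split into the three types described above."""

    def __init__(self):
        self.episodic: list[str] = []    # past experiences and their outcomes
        self.semantic: list[str] = []    # facts, definitions, rules
        self.procedural: list[str] = []  # learned how-to steps

    def recall(self, kind: str, query: str) -> list[str]:
        """Return stored entries of one kind that mention the query term."""
        store: list[str] = getattr(self, kind)
        return [m for m in store if query.lower() in m.lower()]

ltm = LongTermMemory()
ltm.episodic.append("Retrying the export with a smaller batch fixed the timeout")
ltm.semantic.append("Exports are limited to 10,000 rows per request")
ltm.procedural.append("To export: open Reports, choose CSV, click Download")

facts = ltm.recall("semantic", "export")  # surfaces the row-limit rule
```

Even in this toy version, the hard part is visible: the system has to decide which store to query, with what terms, at the right moment, all before the model ever sees the memory.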
Why AI Memory Is Hard To Get Right
The context window is a hard architectural limit.
AI systems have to retrieve relevant details when generating responses, but the model’s context window and token limits serve as critical constraints. The system must fit all necessary context within this window. However, these systems tend to pay more attention to information at the beginning and end of conversations, losing track of details in the middle (the “Lost-in-the-Middle Problem”). Even when the model identifies every relevant detail, all that context may not fit within the window. Long conversations also lead to context rot, where responses degrade as context accumulates. Even when the window is managed well, the system still has to judge what to retrieve from long-term memory, and it might fail to do this accurately.
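One common mitigation for the Lost-in-the-Middle Problem is to reorder retrieved passages so the highest-ranked ones sit at the edges of the context, where models attend best, letting the weakest sink to the middle. A minimal sketch of that reordering:

```python
def order_for_attention(ranked: list[str]) -> list[str]:
    """Alternate passages (given best-first) between the front and back
    of the context, so the lowest-ranked end up in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, passage in enumerate(ranked):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

ranked = ["best", "second", "third", "fourth", "fifth"]
reordered = order_for_attention(ranked)
# "best" and "second" land at the edges; "fifth" sinks to the middle.
```

This is a workaround, not a fix: it reshuffles what fits in the window, but it cannot help if the relevant detail was never retrieved in the first place.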
The system struggles to judge what’s worth remembering.
AI systems have to distill critical details across multiple interactions and sessions to build long-term memories. They have to recognize related information over time and combine it without duplicating details or creating contradictions. But ensuring relevant information is remembered and unnecessary details are forgotten is tricky because the system may evaluate information incorrectly. It may remove critical details and retain unnecessary ones, which can degrade the system’s reasoning. As conversation history grows, identifying relevant context from previous interactions becomes harder.
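A write policy of the kind described above might look like the sketch below. The importance heuristic is entirely invented for illustration; production systems typically ask a model to score candidates and use embedding similarity, not word sets, to detect duplicates.

```python
def importance(candidate: str) -> float:
    """Toy heuristic: longer, preference-like statements score higher."""
    score = min(len(candidate.split()) / 10, 1.0)
    if any(w in candidate.lower() for w in ("prefer", "always", "never")):
        score += 0.5
    return score

def should_write(candidate: str, memories: list[str], threshold: float = 0.6) -> bool:
    """Store a candidate only if it seems important and isn't already known."""
    if importance(candidate) < threshold:
        return False  # not worth remembering
    duplicate = any(
        set(candidate.lower().split()) == set(m.lower().split())
        for m in memories
    )
    return not duplicate

memories = ["user prefers metric units"]
should_write("ok", memories)                                  # too trivial
should_write("User prefers metric units", memories)           # duplicate
should_write("user never wants emails after 6pm", memories)   # worth storing
```

Every failure mode in the paragraph above maps to a bug in a policy like this: a bad threshold discards critical details, a weak duplicate check creates contradictions, and both get worse as the memory store grows.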
The system has to constantly make cost-performance tradeoffs.
AI systems must balance supplying relevant context with optimizing computation costs and response times. If the system provides too much context to keep response quality high, users end up spending more tokens and waiting longer. While the difference may be minimal per request, costs add up for power users and enterprise customers. If the system provides too little context, users get worse responses, which can erode customer satisfaction and increase churn. So, the system must selectively include only relevant information, but this assumes that it can accurately determine relevance in the first place.
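The tradeoff can be framed as a budgeting problem: rank candidate snippets by relevance per token and add them until the budget runs out. The greedy strategy and the example numbers below are illustrative assumptions, not any platform’s actual policy.

```python
def select_context(candidates: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Greedy cost-performance tradeoff over (text, relevance, token_cost)
    tuples: prefer snippets with the most relevance per token spent."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen: list[str] = []
    spent = 0
    for text, _, cost in ranked:
        if spent + cost <= budget:
            chosen.append(text)
            spent += cost
    return chosen

candidates = [
    ("full order history", 0.9, 800),
    ("last order summary", 0.8, 60),
    ("shipping preference", 0.6, 20),
    ("unrelated small talk", 0.1, 40),
]
selected = select_context(candidates, budget=100)
# Cheap, highly relevant snippets win; the 800-token dump is skipped.
```

The catch is the same one noted above: the relevance scores feeding this selection have to come from somewhere, and if the system scores them wrong, no budgeting strategy can save the response.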
Why The System’s Memory Architecture Matters
Memory Has to Be Deliberately Designed
An AI system’s “memory” is clearly more than just the collection of all the data stored in the system. It’s the distillation of relevant details from the things you have explicitly told AI (e.g., instructions, requirements, goals), inputs you have provided (e.g., documents, datasets, images), and relevant information from past interactions and connected systems. This is the only context the model has when responding to you. If something you previously mentioned did not make it into the model’s memory, the model has no idea it exists.
Product teams have to make deliberate architectural decisions to ensure the model can build memory across conversations, learn user preferences, and retrieve relevant context when needed. The system must effectively manage different types of memories to ensure the model gets the right context when deciding how to respond. Every time an AI product “remembers” a key detail, that’s one less thing users have to “tell” the system, further eliminating friction from the user experience. This convenience is a direct result of the product team’s technical decisions.
Product Decisions Impact How Effective AI’s Memory Can Be
Most teams building AI products underestimate the importance of designing different memory types. They approach the system’s memory as they would traditional software’s—ensuring there are mechanisms in place to efficiently store and retrieve information. But they forget that the model may not always correctly decide what to retrieve and when. Even the most “intelligent” models can be rendered ineffective by poorly designed memory systems.
Product teams must ensure they design memory in a way that helps the model find information more effectively and efficiently. Not every AI system will need all memory types. For example, an AI customer support agent might need episodic memory for customer history across tickets, an AI legal assistant might need semantic memory for information on laws, regulations, and past cases, and an AI travel platform may need procedural memory for navigating different travel websites.
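The product examples above amount to a per-product memory configuration. The sketch below is purely hypothetical—the product names and keys are not any real framework’s schema—but it shows how such decisions could be made explicit rather than left implicit in the architecture.

```python
# Hypothetical mapping of products to the memory types they enable,
# mirroring the examples above.
MEMORY_CONFIG = {
    "support_agent":   {"episodic": True,  "semantic": True,  "procedural": False},
    "legal_assistant": {"episodic": False, "semantic": True,  "procedural": False},
    "travel_platform": {"episodic": True,  "semantic": False, "procedural": True},
}

def enabled_memories(product: str) -> list[str]:
    """List which memory types a given product actually maintains."""
    return [kind for kind, on in MEMORY_CONFIG[product].items() if on]

legal = enabled_memories("legal_assistant")  # semantic only
```

Making the choice explicit forces the team to answer the questions that follow: each `True` in a table like this implies a retrieval mechanism, a persistence policy, and an ongoing cost.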
Product teams must ask several critical technical questions: Which memory types does their product need? What information should persist within sessions and across sessions? How should they design their memory retrieval mechanisms? and so on. Teams must consider how they need to design their system’s memory to support the intended user experience. The sheer amount of information these systems accumulate makes the ability to recall the right information at the right moment even more important.
What’s Driving The Industry’s Interest In Improving AI’s Memory
Memory Can Be A Competitive Advantage
AI platforms gather more information about users over time through conversations, building a more detailed profile of their needs and preferences. This makes the experience more personalized, but it also makes it harder for users to switch platforms, since a new one won’t have the same “memories” as the previous one. While users can transfer memories, they won’t immediately get the same level of personalization they had before.
The issue is that the memories users can access don’t necessarily represent everything the system “remembers” about them. Even if users could transfer their entire conversation history, the new system would still need to reconstruct relevant “memories.” If users have invested months or years in a platform, their experience will be very personalized. They won’t be willing to give that up easily, and companies know it.
Memory also influences how useful AI agents can be. Companies know that agents that can recall what matters, without being prompted repeatedly, are much more valuable to users. Without reliable memory, an agent repeats mistakes it should have learned from, asks for information it should already have, and loses track of critical context on long-running tasks. As the industry grows increasingly focused on agents, memory architecture will become a competitive differentiator.
As competition intensifies, platforms will increasingly treat memory as a moat: the more personalized the experience, the harder it becomes for users to walk away. So, companies have little incentive to make memory more portable, since this just makes it easier for users to leave. Instead, they are working on making memory even better to deepen their hold over users. This is not great for user choice, but this is where the market seems to be heading.
Memory Can Be Used To Extract More Value From User Data
Tech companies collect every piece of potentially useful data from our interactions, and they are actively working to embed AI more deeply in users’ lives. This will give them an enormous amount of data on users and their behavior. However, finding actionable insights in increasingly large volumes of user data is challenging. They will undoubtedly use AI to analyze this information and identify new ways to exploit it. If they can design better memory systems, they can do this much more efficiently and cost-effectively.
OpenAI CEO Sam Altman has said that memory will have a major impact on how powerful and personalized AI can be. Improving memory directly enhances the degree of personalization that’s possible within products. However, it’s quite likely that companies will just use this to enhance their efforts to increase engagement and influence users. Users may not realize just how much the system may “remember” about them. Every revealing detail mentioned (e.g., indicating a psychological vulnerability, unconscious bias) may become part of the system’s memory, even if it’s only shared once.
While memory improvements will certainly make AI systems more useful, users still need to know what specific information is shaping their experience and have meaningful control over what the system should commit to memory. Otherwise, they have no way to know whether the system is truly helping them or just exploiting their personal and emotional data in ways that are not in their best interests. Therefore, users need to clearly understand what an AI system knows about them, how it uses that information, and whether they can manage its memories.
Conclusion
Memory is a critical mechanism that allows an AI system to apply a model’s “intelligence.” It’s what makes an AI feel genuinely useful. The best AI solutions derive their value primarily from how they implement the model within their system. Technical choices made by product teams determine whether AI can actually deliver on the promise of intelligent, personalized assistance. Teams must make the right architectural decisions: if these systems remember too little, this can diminish the user experience, and if they remember too much, this can degrade user trust.
Better memory allows an AI system to recall your preferences, learn your habits, and anticipate your needs, but it also allows it to exploit your data in ways you can’t see and haven’t consented to. As AI becomes more embedded in daily life, product teams have to consider whether their AI system’s memory has been implemented responsibly, and users have to consider whether they are truly getting good outcomes from these systems. The convenience of a system that “remembers” you is real, but so is the cost of not knowing exactly what it remembers and how that information influences your interactions.
Thanks For Reading
If you haven’t already, please consider subscribing and sharing this newsletter with a friend. I hope you have a great week!
Key References
Analytics India Magazine | The Race to Give AI Models Infinite Memory
arXiv | Long Term Memory: The Foundation of AI Self-Evolution
Center for Democracy & Technology | A Roadmap for Responsible Approaches to AI Memory
DataCamp | How Does LLM Memory Work: Building Context-Aware AI Applications
Data Science Dojo | What is the Role of Memory in Agentic AI Systems?
IBM | What is AI agent memory?, When AI remembers everything


