Build Hour: Agent Memory Patterns

In this Build Hour session, Michaela, Emry, and Brian explore agent memory patterns, focusing on context engineering techniques such as trimming, summarization, and routing that optimize AI agent performance and manage token limits. Through live demos and best practices, they show how memory strategies improve conversation quality, enable long-term personalization, and support scalable AI applications built on OpenAI’s APIs and Agents SDK.

The session opens with Michaela from the startup marketing team introducing a deep dive into agent memory patterns with Emry and Brian from OpenAI’s solution architecture team. Emry, a solution architect specializing in AI agents, leads the discussion on context engineering, which forms the foundation of agent memory. The session builds on previous tutorials about building agents from scratch and using the Responses API, this time focusing on how to manage memory effectively in AI agents. The goal is to equip viewers with the best practices, tools, and expertise needed to scale AI applications using OpenAI’s APIs and models.

Emry begins by defining context engineering as both an art and a science, emphasizing the importance of managing what information is included in the model’s context to optimize performance. He explains that context engineering encompasses prompt engineering, memory management, state and history tracking, and retrieval techniques. The session highlights three core memory strategies: reshape and fit (trimming and summarizing context), isolate and route (delegating context to sub-agents), and extract and retrieve (long-term memory management). These strategies help address challenges like context burst, conflict, poisoning, and noise, which can degrade agent performance.
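The "isolate and route" strategy can be illustrated with a minimal sketch. The class and function names below are illustrative, not part of the Agents SDK: the key idea is that each sub-agent holds its own history list, so a long conversation or large tool output in one domain never crowds another agent's context window.

```python
class SubAgent:
    """A sub-agent with an isolated context window (illustrative sketch)."""

    def __init__(self, name: str, instructions: str):
        self.name = name
        # Isolated context: only this agent's own turns ever live here.
        self.history = [{"role": "system", "content": instructions}]

    def handle(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        # A real agent would call a model here; we return a canned reply.
        reply = f"[{self.name}] handling: {user_message}"
        self.history.append({"role": "assistant", "content": reply})
        return reply


def route(query: str, agents: dict[str, SubAgent]) -> SubAgent:
    """Keyword router: pick the sub-agent whose keyword appears in the
    query, falling back to a general-purpose agent."""
    for keyword, agent in agents.items():
        if keyword != "general" and keyword in query.lower():
            return agent
    return agents["general"]
```

Because the router hands each query to exactly one sub-agent, the other agents' histories stay untouched, which is precisely how delegation keeps any single context window small.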

A live demo showcases an IT troubleshooting agent built with OpenAI’s Agents SDK, illustrating how memory impacts conversation quality. Without memory, the agent forgets earlier issues and repeats questions; with memory enabled, it maintains context across turns and gives more intelligent, reliable responses. The demo also visualizes token usage and demonstrates how context burst occurs when too much information, such as extensive tool outputs, floods the context window. Emry explains how trimming, compaction, and summarization techniques can manage token limits by selectively retaining or compressing information to keep the context fresh and relevant.
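A trimming pass that keeps the history under a token budget might look like the sketch below. This is an assumption-laden simplification: tokens are approximated by word count (a real system would use a tokenizer such as tiktoken), and the function names are invented for illustration.

```python
def estimate_tokens(message: dict) -> int:
    """Very rough token estimate: one token per whitespace-separated word.
    A production system would use a real tokenizer instead."""
    return len(message["content"].split())


def trim_history(history: list[dict], budget: int) -> list[dict]:
    """Drop the oldest turns until the history fits the token budget,
    always keeping the system message at index 0."""
    system, turns = history[0], history[1:]
    while turns and sum(estimate_tokens(m) for m in [system] + turns) > budget:
        turns.pop(0)  # drop the oldest user/assistant turn first
    return [system] + turns
```

Running this before each model call keeps the context window from growing without bound, at the cost of forgetting the oldest turns outright, which is why the session pairs trimming with summarization.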

The session further explores best practices for prompt design to avoid conflicts and noise, such as using explicit, structured language and minimizing overlapping tool definitions. Emry details engineering techniques like context trimming (dropping older turns), compaction (removing older tool outputs), and summarization (compressing prior messages into structured summaries). He discusses how to balance these approaches based on use case needs, session length, and task dependencies. The demo then shows these techniques enabled in the agent, where summarization creates dense memory objects that improve continuity across sessions.
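Compaction and summarization can be sketched as two small helpers. Again, these are hypothetical functions written for this recap, not Agents SDK APIs: compaction stubs out all but the most recent tool outputs, and summarization collapses older turns into one summary message via any callable (in practice, a model call).

```python
def compact_history(history: list[dict], keep_last_tool_outputs: int = 1) -> list[dict]:
    """Replace all but the most recent tool outputs with a short stub,
    preserving the conversational turns themselves."""
    tool_indices = [i for i, m in enumerate(history) if m["role"] == "tool"]
    to_stub = set(tool_indices[:-keep_last_tool_outputs] if keep_last_tool_outputs
                  else tool_indices)
    return [{**m, "content": "[tool output elided]"} if i in to_stub else m
            for i, m in enumerate(history)]


def summarize_history(history: list[dict], summarizer, keep_recent: int = 2) -> list[dict]:
    """Compress all but the most recent turns into one summary message.
    `summarizer` is any callable mapping text -> summary (e.g. a model call)."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    if not old:
        return history
    summary = summarizer("\n".join(m["content"] for m in old))
    return [{"role": "system",
             "content": f"Summary of earlier turns: {summary}"}] + recent
```

Compaction is the cheaper of the two (no model call, nothing semantically lost from the dialogue itself), while summarization trades a model call for a dense memory object that survives across sessions.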

Finally, Emry addresses long-term memory management, including injecting summarized memories into new sessions for personalized interactions, and discusses scaling strategies for handling many users with individual or shared memory pools. He emphasizes the importance of deciding what to remember and forget, using temporal tags and memory consolidation to prune stale information. The session concludes with a Q&A covering libraries for context engineering, evaluation methods for memory effectiveness, hierarchical memory scopes, and scaling considerations. Resources such as cookbooks and the Agents SDK are shared to help viewers implement these memory patterns in their own AI agents.