Context
When using Deepagents with token streaming enabled, particularly in scenarios involving multiple subagents running in parallel, you may experience significant memory spikes (1-1.5 GB or higher). This typically occurs when using streamMode: ['messages', 'updates'] and iterating over stream chunks as JavaScript objects. Each token generates an AIMessageChunk object with metadata, and complex tasks can create tens of thousands of these objects, leading to memory pressure.
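For reference, here is a minimal sketch of the pattern that triggers the spikes; the agent construction and input shape are hypothetical placeholders, and the relevant parts are the streamMode setting and the per-chunk for await loop:

```typescript
// Sketch of the problematic pattern (agent setup and input are hypothetical).
// With combined stream modes, each yield is a [mode, payload] tuple, and every
// "messages" payload carries a freshly allocated AIMessageChunk.
const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"] }
);

for await (const [mode, payload] of stream) {
  if (mode === "messages") {
    const [chunk, metadata] = payload; // one AIMessageChunk per LLM token
    // ...forward chunk content to the UI...
  }
}
```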
Answer
There are several approaches to resolve memory issues when streaming tokens with Deepagents:
Option 1: Use Callbacks Instead of Stream Iteration (Recommended)
Instead of iterating over each chunk with a for await loop, pass a callbacks array in the agent.stream config containing your own handleLLMNewToken(token: string): void handler. This approach avoids instantiating a new object per token and eliminates the memory spikes.
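As a sketch, assuming the standard LangChain callback interface (the input shape is a hypothetical placeholder):

```typescript
// Sketch: receive tokens through a callback handler instead of materializing
// an AIMessageChunk per token in application code. handleLLMNewToken is the
// standard LangChain hook fired once per streamed LLM token.
const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  {
    streamMode: "updates", // "messages" mode is no longer needed for tokens
    callbacks: [
      {
        handleLLMNewToken(token: string): void {
          process.stdout.write(token); // forward the raw string to your UI
        },
      },
    ],
  }
);

for await (const update of stream) {
  // Handle node-level state updates; token text already arrived via the callback.
}
```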
Option 2: Use Event Stream Encoding
If you can work with a serialized format, use encoding: "text/event-stream" in your stream configuration. This encoding is optimized for frontend use and can be passed directly to a Response object, avoiding the creation of individual JavaScript objects for each token.
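A sketch of how this can look in an HTTP handler, assuming the agent exposes LangChain's standard streamEvents interface and a runtime with a web-standard Response (the route and request shape are hypothetical):

```typescript
// Sketch: return pre-serialized server-sent events directly to the client.
// With encoding "text/event-stream", streamEvents yields SSE-framed bytes,
// so no per-token JavaScript message objects are built in your handler.
export async function POST(req: Request): Promise<Response> {
  const { message } = await req.json(); // hypothetical request body shape

  const eventStream = agent.streamEvents(
    { messages: [{ role: "user", content: message }] },
    { version: "v2", encoding: "text/event-stream" }
  );

  return new Response(eventStream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```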
Option 3: Disable Subgraph Streaming
If you only need streaming for the main agent and not subagents, you can set subgraphs: false when calling graph.stream:
```
graph.stream(..., { subgraphs: false })
```

Note that this option may have limited effectiveness, as the configuration can still propagate to subagents in some cases.
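With hypothetical inputs filled in, the call might look like:

```typescript
// Sketch with hypothetical input: stream only the top-level graph's output.
const stream = await graph.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"], subgraphs: false }
);
```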
Understanding the Root Cause
The memory issue occurs because:
- Messages mode yields one AIMessageChunk per LLM token across all graph levels, including subagents.
- Each LangChain message object carries significant metadata (lc_serializable metadata, content arrays, tool_call_chunks, etc.).
- When using ['messages', 'updates'] stream modes, LangGraph maintains state for both modes simultaneously.
- A single run with multiple tool calls and subagent invocations can generate 10,000+ chunk objects.
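To confirm that this is what you are hitting, here is a hypothetical diagnostic that counts message chunks and samples heap growth over a run (heap numbers are approximate, since garbage collection timing varies):

```typescript
// Hypothetical diagnostic: count "messages" chunks and sample heap usage.
// The input is a placeholder; heapUsed is a rough signal, not a precise measure.
let chunkCount = 0;
const startHeap = process.memoryUsage().heapUsed;

const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"] }
);

for await (const [mode] of stream) {
  if (mode === "messages") chunkCount++;
}

const heapDeltaMb = (process.memoryUsage().heapUsed - startHeap) / 1024 / 1024;
console.log(`${chunkCount} message chunks, ~${heapDeltaMb.toFixed(1)} MB heap growth`);
```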
The callback approach (Option 1) is the most effective solution: it provides the token-level granularity needed for UI updates while avoiding the memory overhead of per-token object instantiation.