Context
When using Deepagents with token streaming enabled, particularly in scenarios involving multiple subagents running in parallel, you may experience significant memory spikes (1-1.5 GB or higher). This typically occurs when using streamMode: ['messages', 'updates'] and iterating over stream chunks as JavaScript objects. Each token generates an AIMessageChunk object with metadata, and complex tasks can create tens of thousands of these objects, leading to memory pressure.
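For reference, here is a minimal sketch of the pattern that triggers the spikes; the agent construction and input shape are hypothetical placeholders, and the relevant parts are the streamMode setting and the per-chunk for await loop:

```typescript
// Sketch of the problematic pattern (agent setup and input are hypothetical).
// With combined stream modes, each yield is a [mode, payload] tuple, and every
// "messages" payload carries a freshly allocated AIMessageChunk.
const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"] }
);

for await (const [mode, payload] of stream) {
  if (mode === "messages") {
    const [chunk, metadata] = payload; // one AIMessageChunk per LLM token
    // ...forward chunk content to the UI...
  }
}
```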
Answer
There are several approaches to resolve memory issues when streaming tokens with Deepagents:
Option 1: Use Callbacks Instead of Stream Iteration (Recommended)
Instead of iterating over each chunk with a for await loop, pass a callbacks array in the agent.stream config containing your own handleLLMNewToken(token: string): void handler. This approach avoids instantiating a new object per token and eliminates the memory spikes.
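As a sketch, assuming the standard LangChain callback interface (the input shape is a hypothetical placeholder):

```typescript
// Sketch: receive tokens through a callback handler instead of materializing
// an AIMessageChunk per token in application code. handleLLMNewToken is the
// standard LangChain hook fired once per streamed LLM token.
const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  {
    streamMode: "updates", // "messages" mode is no longer needed for tokens
    callbacks: [
      {
        handleLLMNewToken(token: string): void {
          process.stdout.write(token); // forward the raw string to your UI
        },
      },
    ],
  }
);

for await (const update of stream) {
  // Handle node-level state updates; token text already arrived via the callback.
}
```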
Option 2: Use Event Stream Encoding
If you can work with a serialized format, use encoding: "text/event-stream" in your stream configuration. This encoding is optimized for frontend use and can be passed directly to a Response object, avoiding the creation of individual JavaScript objects for each token.
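A sketch of how this can look in an HTTP handler, assuming the agent exposes LangChain's standard streamEvents interface and a runtime with a web-standard Response (the route and request shape are hypothetical):

```typescript
// Sketch: return pre-serialized server-sent events directly to the client.
// With encoding "text/event-stream", streamEvents yields SSE-framed bytes,
// so no per-token JavaScript message objects are built in your handler.
export async function POST(req: Request): Promise<Response> {
  const { message } = await req.json(); // hypothetical request body shape

  const eventStream = agent.streamEvents(
    { messages: [{ role: "user", content: message }] },
    { version: "v2", encoding: "text/event-stream" }
  );

  return new Response(eventStream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```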
Option 3: Disable Subgraph Streaming
If you only need streaming for the main agent and not subagents, you can set subgraphs: false when calling graph.stream:
```
graph.stream(..., { subgraphs: false })
```

Note that this option may have limited effectiveness, as the configuration can still propagate to subagents in some cases.
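With hypothetical inputs filled in, the call might look like:

```typescript
// Sketch with hypothetical input: stream only the top-level graph's output.
const stream = await graph.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"], subgraphs: false }
);
```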
Understanding the Root Cause
The memory issue occurs because:
- Messages mode yields one AIMessageChunk per LLM token across all graph levels, including subagents.
- Each LangChain message object carries significant metadata (lc_serializable metadata, content arrays, tool_call_chunks, etc.).
- When using ['messages', 'updates'] stream modes, LangGraph maintains state for both modes simultaneously.
- A single run with multiple tool calls and subagent invocations can generate 10,000+ chunk objects.
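To confirm that this is what you are hitting, here is a hypothetical diagnostic that counts message chunks and samples heap growth over a run (heap numbers are approximate, since garbage collection timing varies):

```typescript
// Hypothetical diagnostic: count "messages" chunks and sample heap usage.
// The input is a placeholder; heapUsed is a rough signal, not a precise measure.
let chunkCount = 0;
const startHeap = process.memoryUsage().heapUsed;

const stream = await agent.stream(
  { messages: [{ role: "user", content: "Run the research task" }] },
  { streamMode: ["messages", "updates"] }
);

for await (const [mode] of stream) {
  if (mode === "messages") chunkCount++;
}

const heapDeltaMb = (process.memoryUsage().heapUsed - startHeap) / 1024 / 1024;
console.log(`${chunkCount} message chunks, ~${heapDeltaMb.toFixed(1)} MB heap growth`);
```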
The callback approach (Option 1) is the most effective solution: it provides the token-level granularity needed for UI updates while avoiding the memory overhead of per-token object instantiation.