Context
After upgrading LangGraph from version 0.3.x to 0.4.0 or higher, you may notice that token-by-token streaming is no longer working as expected. Instead of receiving partial messages as the LLM generates tokens, you only receive complete messages at the end of the response. This issue commonly occurs when using ReAct agents with stream_mode="messages".
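For reference, the expected token-by-token pattern with stream_mode="messages" looks like the loop below. This is a self-contained sketch: fake_stream stands in for the real graph stream, and the token strings and metadata are invented for illustration.

```python
# Stand-in for graph.stream(inputs, stream_mode="messages"), which yields
# (message_chunk, metadata) pairs as the LLM produces tokens. The tokens
# and metadata here are invented for illustration.
def fake_stream():
    for token in ["Hel", "lo", "!"]:
        yield token, {"langgraph_node": "agent"}

# Expected behavior: partial output appears as each token arrives,
# rather than one complete message at the end.
collected = []
for message_chunk, metadata in fake_stream():
    collected.append(message_chunk)  # real message chunks expose .content
print("".join(collected))  # Hello!
```

After the upgrade, the symptom is that a loop like this yields a single complete message instead of many small chunks.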
Answer
This behavior change was introduced in LangGraph 0.4.0+ due to modifications in how streaming works with subgraphs. Here are the solutions to restore token-by-token streaming:
Solution 1: Enable Subgraph Streaming
The most straightforward solution is to enable subgraph streaming in your configuration:
1. Add stream_subgraphs to your stream configuration.
2. Update your stream mode to include subgraph streaming:

```json
{
  "stream_mode": ["messages", "values"],
  "stream_subgraphs": true
}
```

Note that with subgraph streaming enabled, message types will have a postfix format like <message_type>|react:xxx.
Solution 2: Update Frontend Packages (for UI Integration)
If you're using a frontend UI that connects to your agent, ensure your LangChain packages are compatible:
1. Update your frontend LangChain packages to version 1.0+:

```shell
npm install @langchain/core@latest @langchain/langgraph-sdk@latest
```

2. Restart your development server.
3. Perform a hard refresh in your browser (Cmd+Shift+R or Ctrl+Shift+R).
Solution 3: Filter Intermediate Outputs (for create_agent)
If you're using create_agent from langchain.agents and experiencing duplicate content, filter out intermediate decision points:
```python
async for chunk in agent.astream(inputs):
    # Only stream final outputs, not intermediate decisions
    if chunk.get('metadata', {}).get('step_type') == 'final':
        yield chunk
```

Additional Resources
For more detailed guidance on streaming configurations, refer to the LangGraph streaming documentation.