Context
When deploying a LangGraph agent, you may encounter a timeout error stating "Queue Deployment is not ready after 600 seconds" even when using template agents. This issue can occur due to configuration problems or version compatibility issues with the LangSmith Platform Helm chart.
Answer
This timeout issue is commonly caused by two main factors that should be checked in order:
1. Check Your Helm Chart Version
The most common cause of this issue is using Helm chart version 0.11.14, which was yanked due to a bug in queue reconciliation logic. If you're running this version, upgrade immediately:
Check your current Helm chart version
Upgrade from version 0.11.14 to 0.11.15 or later
The upgrade will include listener RBAC fixes that resolve the deployment timeout
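The version check can be scripted. The sketch below is only an illustration: it assumes you have saved the output of `helm list -o json` (for the namespace that holds your installation) and that each entry carries a `name` and a `chart` field of the form `<chart-name>-X.Y.Z`. The chart name `langsmith` in the usage note is an assumption, and the helper names are hypothetical.

```python
import json

# The chart release this article identifies as yanked.
YANKED_VERSION = (0, 11, 14)


def parse_chart_version(chart: str) -> tuple:
    """Turn a Helm 'chart' field such as 'langsmith-0.11.14' into (0, 11, 14).

    Assumes a plain X.Y.Z version with no pre-release suffix.
    """
    version = chart.rsplit("-", 1)[-1]
    return tuple(int(part) for part in version.split("."))


def releases_on_yanked_chart(helm_list_json: str) -> list:
    """Return the names of releases that are running the yanked chart version."""
    releases = json.loads(helm_list_json)
    return [
        release["name"]
        for release in releases
        if parse_chart_version(release["chart"]) == YANKED_VERSION
    ]
```

For example, passing the JSON for a release whose `chart` field is `langsmith-0.11.14` returns that release's name, which means it should be upgraded to 0.11.15 or later.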
2. Verify N_JOBS_PER_WORKER Configuration
If upgrading doesn't resolve the issue, check your worker configuration. Look for this log entry:
N_JOBS_PER_WORKER is 0. Skipping queue.
If you see this message, the worker configuration is incorrect:
Set N_JOBS_PER_WORKER = "5" in your .env file
Place this setting after your model configuration (e.g., after LLMAAS_MODEL_NAME = "Meta-Llama-33-70B-Instruct")
Rebuild and redeploy your application
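Both checks in this step can be automated against text you have already collected. This is a minimal sketch, assuming the pod logs and the `.env` contents are available as strings; the function names are hypothetical, and the parser handles only simple `KEY = "value"` lines like the ones shown above.

```python
from typing import Optional

# The exact log line quoted in this article.
SKIP_MESSAGE = "N_JOBS_PER_WORKER is 0. Skipping queue."


def queue_skipped(log_text: str) -> bool:
    """Return True if the logs contain the 'Skipping queue' message."""
    return SKIP_MESSAGE in log_text


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in simple .env text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None
```

If `queue_skipped` returns True and `read_env_value(env_text, "N_JOBS_PER_WORKER")` returns None or "0", the setting is missing or disabled and should be added as described above.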
3. Additional Troubleshooting
If the issue persists after trying the above solutions:
Remove any custom auth configuration from your langgraph.json file temporarily
Delete all existing deployments and tracing projects before creating a new deployment
Collect pod logs using the troubleshooting script available in the LangChain documentation
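For the first step in this list, one way to disable custom auth temporarily is to produce a copy of the configuration without it. The sketch below assumes the custom auth settings sit under a top-level "auth" key in langgraph.json; check your own file before relying on that. The function name is hypothetical.

```python
import json


def without_auth(config_text: str) -> str:
    """Return langgraph.json content with the top-level 'auth' entry removed.

    Assumption: custom auth is configured under a top-level "auth" key.
    All other settings are left as they are.
    """
    config = json.loads(config_text)
    config.pop("auth", None)  # no error if the key is not present
    return json.dumps(config, indent=2)
```

Keep a backup of the original file so the auth configuration can be restored once the deployment succeeds.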
The Helm chart version upgrade (from 0.11.14 to 0.11.15+) resolves this issue in most cases, as version 0.11.14 contained a known bug that prevented proper queue deployment.