Context
When scaling your platform to a higher number of instances (e.g., 32), you may encounter PostgreSQL "too many connections" or "too many clients already" errors. This occurs because each backend pod establishes multiple database connections, and the total number of connections can exceed PostgreSQL's connection limit.
Answer
This issue happens when your application pods try to establish more database connections than PostgreSQL allows. Each backend pod uses connection pooling with a default pool size, so when you run many pods (backend, platform-backend, and queue pods), the total number of connections can exceed the database limit. **Note:** For LangGraph API deployments, the default `LANGGRAPH_POSTGRES_POOL_MAX_SIZE` is 150 connections per replica, which can quickly exhaust database connections when scaling.
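To see why scaling exhausts connections, here is a rough worst-case budget calculation (a sketch; the pod counts and pool size below are illustrative, not taken from your cluster):

```python
def total_connections(pods_by_type: dict[str, int], pool_max_size: int) -> int:
    """Worst-case estimate: every pod fills its connection pool."""
    return sum(pods_by_type.values()) * pool_max_size

# Illustrative numbers: 32 instances of each pod type, pool size 3.
pods = {"backend": 32, "platform-backend": 32, "queue": 32}
print(total_connections(pods, pool_max_size=3))  # 288, well above PostgreSQL's default max_connections of 100
```

Even a small per-pod pool multiplies quickly once replica counts grow.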
Short-term solutions:
1. Increase PostgreSQL connection limit:

```sql
ALTER SYSTEM SET max_connections = 200;
SELECT pg_reload_conf();
```

2. Reduce connection pool size per pod:

For LangGraph API deployments, set `LANGGRAPH_POSTGRES_POOL_MAX_SIZE`:

```
# Example: for 100 max_connections with buffer
LANGGRAPH_POSTGRES_POOL_MAX_SIZE=40
```

For other deployments, set `ASYNCPG_POOL_MAX_SIZE`. First inspect the current settings:

```bash
kubectl get pods -l app.kubernetes.io/name=langsmith
kubectl exec -it <any-backend-pod> -- env | grep ASYNCPG
kubectl exec -it <any-platform-backend-pod> -- env | grep ASYNCPG
```

Then modify your helm values.yaml:

```yaml
queue:
  deployment:
    extraEnv:
      - name: "ASYNCPG_POOL_MAX_SIZE"
        value: "2"
```

3. Add connection management:
Set ASYNCPG_POOL_MIN_SIZE = 1 to reduce idle connections and review PostgreSQL connection timeouts.
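The minimum pool size can be set with the same `extraEnv` pattern (a sketch against the `queue` deployment; apply the equivalent to the other deployments as needed):

```yaml
queue:
  deployment:
    extraEnv:
      - name: "ASYNCPG_POOL_MIN_SIZE"
        value: "1"
```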
Long-term recommendations:
Use external PostgreSQL instead of in-cluster PostgreSQL, as managed services handle connection limits better and are more production-ready
Implement connection pooling with PgBouncer
Consider adding PostgreSQL read replicas to distribute read load
Implement connection retry logic with exponential backoff on the application side
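The retry recommendation above can be sketched as follows (a hypothetical helper, not part of any library here; `connect` stands in for a wrapper around your driver's connect call):

```python
import random
import time

def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `connect()` with exponential backoff and jitter.

    `connect` is any callable that raises on failure (hypothetical
    stand-in for your database driver's connect call).
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Double the delay each attempt, with up to 2x random jitter.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)

# Example: a connect call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("too many clients already")
    return "connection"

print(connect_with_backoff(flaky, sleep=lambda _: None))  # connection
```

Jitter spreads retries out so that many pods recovering at once do not stampede the database simultaneously.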
The connection calculation varies by deployment type:
LangGraph API: Total connections = (number of replicas) × LANGGRAPH_POSTGRES_POOL_MAX_SIZE (default 150)
Other deployments: Total connections = (number of backend pods + platform-backend pods + queue pods) × ASYNCPG_POOL_MAX_SIZE
For example, 10 LangGraph API replicas with default settings can establish up to 1,500 connections. Ensure your total stays below PostgreSQL's max_connections setting, leaving buffer for superuser connections and overhead.
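To pick a safe per-pod pool size, you can invert the formula above (a sketch; the 20-connection superuser/overhead buffer is an assumption, adjust for your setup):

```python
def max_pool_size(max_connections: int, total_pods: int, buffer: int = 20) -> int:
    """Largest per-pod pool size keeping total connections within budget."""
    return (max_connections - buffer) // total_pods

# 10 replicas against max_connections=100 with a 20-connection buffer:
print(max_pool_size(100, 10))  # 8
```

The result is what you would set as `LANGGRAPH_POSTGRES_POOL_MAX_SIZE` (or `ASYNCPG_POOL_MAX_SIZE`) for that replica count.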