Context
When scaling a self-hosted (on-prem) LangSmith installation to multiple instances, you may encounter PostgreSQL connection errors with the message "too many clients already". This typically occurs when the connections opened by all instances together exceed PostgreSQL's configured max_connections limit.
Answer
There are several solutions to resolve PostgreSQL connection limit issues, ranging from immediate fixes to long-term architectural improvements:
Short-term solutions:
Increase PostgreSQL connection limit:
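ALTER SYSTEM SET max_connections = 200;
Restart PostgreSQL for the new value to take effect. max_connections is only read at server start, so SELECT pg_reload_conf(); alone does not apply it.
Optimize connection pool settings in your deployment: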
Reduce ASYNCPG_POOL_MAX_SIZE from 3 to 2
Set ASYNCPG_POOL_MIN_SIZE = 1 to reduce idle connections
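Review and adjust PostgreSQL connection timeouts (for example idle_in_transaction_session_timeout) so that abandoned sessions release their connections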
Long-term solutions:
Use a managed PostgreSQL service that handles connection limits better (e.g., Google Cloud SQL with managed connection pooling)
Put a connection pooler such as PgBouncer in front of PostgreSQL
Add PostgreSQL read replicas to distribute read load
Implement connection retry logic with exponential backoff in your application
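The retry item above can be sketched in Python as follows. This is a minimal illustration, not LangSmith's own implementation: the function name retry_with_backoff, its parameters, and the use of ConnectionError as the retryable error are all assumptions. In a real deployment you would pass whichever exception your database driver raises for "too many clients already".

```python
import random
import time


def retry_with_backoff(operation, *, retries=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a retryable error, wait and try again.

    Hypothetical helper. The wait before retry N is a random value between
    0 and min(max_delay, base_delay * 2**N), so pods that fail at the same
    moment do not all reconnect at the same moment.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Retrying only smooths over short spikes. If the pods' combined pool sizes permanently exceed max_connections, the retries will eventually be exhausted, so this complements the pool sizing and pooler options rather than replacing them.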
Choose among these solutions based on your deployment size and requirements. For example, if you have 44 total pods with 3 connections each, the pods alone can open 44 x 3 = 132 connections. Your PostgreSQL max_connections setting must accommodate at least that many, plus headroom for superuser-reserved connections and any other clients, or you need to reduce your connection pool settings accordingly.
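The sizing arithmetic can be captured in a small Python sketch. The function names are hypothetical, and the default of 3 reserved connections is an assumption based on PostgreSQL's default superuser_reserved_connections; adjust both arguments to match your server and any other clients that connect to it.

```python
def required_connections(pods, pool_max_size, reserved=3, other_clients=0):
    """Connections PostgreSQL must allow if every pod fills its pool."""
    return pods * pool_max_size + reserved + other_clients


def max_pool_size(max_connections, pods, reserved=3, other_clients=0):
    """Largest per-pod pool size that fits under max_connections."""
    available = max_connections - reserved - other_clients
    return max(available // pods, 0)


print(required_connections(44, 3, reserved=0))  # 132, the figure used above
print(required_connections(44, 3))              # 135 with 3 reserved connections
print(max_pool_size(100, 44))                   # 2
print(max_pool_size(200, 44))                   # 4
```

Under these assumptions, 44 pods fit within a max_connections of 100 only with a pool size of 2, which is why lowering ASYNCPG_POOL_MAX_SIZE from 3 to 2 is listed as a short-term fix.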
Resources
https://docs.langchain.com/langsmith/troubleshooting
https://docs.langchain.com/langsmith/self-host-scale
https://docs.langchain.com/langsmith/script-running-pg-support-queries
https://cloud.google.com/sql/docs/postgres/managed-connection-pooling