Problem
When running load tests against a self-hosted LangSmith instance, you may encounter rate limit errors like:
langsmith.utils.LangSmithRateLimitError: Rate limit exceeded for https://your-langsmith-url/api/v1/runs/multipart
HTTPError('429 Client Error: Too Many Requests for url: https://your-langsmith-url/api/v1/runs/multipart', '')
This can also manifest as missing traces in the LangSmith UI, where traces appear incomplete or fail to be recorded.
Cause
While LangSmith Cloud has documented rate limits (6000 requests/10 seconds for the /runs/multipart endpoint), 429 errors in self-hosted deployments are often caused by infrastructure components, not LangSmith itself.
Common sources of 429 errors in self-hosted environments:
Web Application Firewall (WAF) - Rate limiting rules blocking high-volume traffic
Load Balancer - Request rate limits or connection limits
Ingress Controller - nginx or other ingress rate limiting configurations
API Gateway - Throttling or quota policies, if an API gateway sits in front of LangSmith
Solution
Step 1: Identify the Source of 429 Errors
Check if LangSmith is actually generating the 429 errors:
Review LangSmith backend pod logs for 429 responses
If no 429s appear in LangSmith logs, the error is coming from an upstream infrastructure component
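One way to run this check is to scan exported backend log lines for 429 status codes. The sketch below is a minimal example that assumes an access-log-style line in which the status code follows the quoted request; the real LangSmith log format may differ, so the pattern and the function name count_429s are illustrative assumptions.

```python
import re
from typing import Iterable

# Assumed log shape: ... "POST /api/v1/runs/multipart HTTP/1.1" 429 ...
# Adjust the pattern to match the actual log format of your deployment.
STATUS_429 = re.compile(r'"\s+429\b')


def count_429s(lines: Iterable[str]) -> int:
    """Count log lines whose HTTP status field is 429."""
    return sum(1 for line in lines if STATUS_429.search(line))


# Hypothetical sample lines, for illustration only.
sample = [
    '10.0.0.5 - "POST /api/v1/runs/multipart HTTP/1.1" 202 0',
    '10.0.0.5 - "POST /api/v1/runs/multipart HTTP/1.1" 429 0',
]
print(count_429s(sample))
```

For example, save the output of kubectl logs for the backend pods to a file and pass the open file to count_429s. A count of zero during a load test that produced 429s on the client side points to an upstream component.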
Step 2: Check Infrastructure Rate Limits
WAF (Web Application Firewall):
Review WAF rules for request rate limits
Check for blocked requests in WAF logs/metrics
Example: A WAF rule blocking IPs that send >10,000 requests in a 5-minute window
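To see why such a rule is easy to hit during a load test, convert the window limit into a sustained request rate. The sketch below is plain arithmetic rather than a WAF API; the function names are illustrative, and it assumes a constant request rate from a single source IP.

```python
def average_rate_limit(max_requests: int, window_seconds: float) -> float:
    """Sustained requests/second allowed by a limit of max_requests per window."""
    return max_requests / window_seconds


def exceeds_window(requests_per_second: float, max_requests: int, window_seconds: float) -> bool:
    """True if a constant request rate sends more than max_requests in one window."""
    return requests_per_second * window_seconds > max_requests


# The WAF example above: 10,000 requests per 5-minute (300 s) window.
print(round(average_rate_limit(10_000, 300), 1))  # 33.3
print(exceeds_window(50, 10_000, 300))  # True: 15,000 requests in 300 s
print(exceeds_window(30, 10_000, 300))  # False: 9,000 requests in 300 s
```

A load generator sending only about 34 requests per second from one IP would therefore trip this example rule within five minutes.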
Load Balancer (AWS ALB, etc.):
Check for rate limiting or throttling configurations
Review connection limits and request quotas
Ingress Controller:
Check nginx ingress annotations for rate limiting:
nginx.ingress.kubernetes.io/limit-rps: "100"
nginx.ingress.kubernetes.io/limit-connections: "10"
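When an Ingress carries many annotations, a small filter can surface the rate limiting ones. The sketch below assumes the annotations have already been loaded into a dict (for example from the metadata.annotations field of the Ingress manifest); the function name and the sample values are illustrative.

```python
from typing import Dict

# Prefix shared by the ingress-nginx rate limiting annotations
# (limit-rps, limit-connections, and related settings).
LIMIT_PREFIX = "nginx.ingress.kubernetes.io/limit-"


def rate_limit_annotations(annotations: Dict[str, str]) -> Dict[str, str]:
    """Return only the annotations that configure ingress-nginx rate limiting."""
    return {k: v for k, v in annotations.items() if k.startswith(LIMIT_PREFIX)}


# Hypothetical annotations copied from an Ingress manifest.
ingress_annotations = {
    "nginx.ingress.kubernetes.io/proxy-body-size": "25m",
    "nginx.ingress.kubernetes.io/limit-rps": "100",
    "nginx.ingress.kubernetes.io/limit-connections": "10",
}
print(rate_limit_annotations(ingress_annotations))
```

An empty result suggests the ingress controller is not rate limiting through annotations, although limits may still be set elsewhere in its configuration.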
Step 3: Adjust Infrastructure Settings
Once you have identified the component that returns the 429s, do one of the following:
Allowlist LangSmith traffic from rate limiting rules
Increase rate limit thresholds to accommodate load testing volumes
Disable rate limiting for internal/trusted traffic sources during load tests
Step 4: Scale LangSmith for High Throughput
For sustained high-throughput workloads, ensure your self-hosted LangSmith is properly scaled:
# Helm values.yaml for high-throughput scenarios
platformBackend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
queue:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    targetCPUUtilizationPercentage: 70
backend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
Important Notes
Self-hosted LangSmith's internal rate limits are not currently configurable via Helm values
Running a single backend pod is insufficient for load testing scenarios
Keep your LangSmith deployment up to date, as newer versions include performance improvements and rate limiter fixes
The LangSmith Python SDK includes retry logic for transient 429 errors, but sustained rate limiting will still cause trace loss
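The difference between transient and sustained rate limiting can be illustrated with a generic retry loop. This is a sketch of exponential backoff in general, not the SDK's actual implementation; the function name, attempt count, and delays are assumptions chosen for illustration.

```python
import time
from typing import Callable


def send_with_retries(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns a non-429 status or attempts run out.

    Returns True if the request got through, False if it was dropped.
    """
    for attempt in range(max_attempts):
        if send() != 429:
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff between attempts: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * 2 ** attempt)
    return False


# Transient limit: the first two attempts are rejected, the third succeeds.
transient = iter([429, 429, 202])
print(send_with_retries(lambda: next(transient), sleep=lambda s: None))  # True

# Sustained limit: every attempt is rejected, so the payload is dropped.
print(send_with_retries(lambda: 429, sleep=lambda s: None))  # False
```

Retries only help when the limit clears within the retry budget; under sustained limiting every attempt fails and the trace data is lost.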