Context
Self-hosted LangSmith instances may fail when Redis runs out of disk space or memory. This typically shows up as backend pods crashing with errors such as "No space left on device" or "MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk." When this happens, your LangSmith instance can become unavailable and return 404 or 403 errors.
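To confirm the MISCONF case, you can check whether Redis's last background snapshot failed. The rdb_last_bgsave_status field is a standard part of the output of redis-cli INFO persistence. This is a sketch only: it assumes the bundled Redis pod is named langsmith-redis-0 (matching the PVC name data-langsmith-redis-0) and that redis-cli is available in its container. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the status of the last background RDB save ("ok" or "err") from
# `redis-cli INFO persistence` output read on stdin.
# INFO output uses CRLF line endings, so strip the trailing \r first.
rdb_save_status() {
  tr -d '\r' | awk -F: '$1 == "rdb_last_bgsave_status" { print $2 }'
}

# Succeed when the last snapshot failed. With Redis's default
# stop-writes-on-bgsave-error yes, this is the state in which Redis rejects
# writes with the MISCONF error.
rdb_save_failed() {
  [ "$(rdb_save_status)" = "err" ]
}
```

Example: kubectl exec langsmith-redis-0 -- redis-cli INFO persistence | rdb_save_failed && kubectl exec langsmith-redis-0 -- df -h /data. This shows free space on the volume only when the snapshot has failed. The /data path is also an assumption; it is the data directory of the official Redis image.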
Answer
There are several ways to resolve Redis disk space issues, depending on your setup:
Immediate Resolution
For internal Redis (bundled with the Helm chart): Increase the size of the Redis persistent volume claim (PVC) by running:
kubectl patch pvc data-langsmith-redis-0 -p '{"spec":{"resources":{"requests":{"storage":"NEW_SIZE"}}}}'
Replace NEW_SIZE with your desired size (e.g., "500Gi"). Kubernetes only allows this if the PVC's StorageClass has allowVolumeExpansion set to true.
Clear Redis data: Because Redis stores only ephemeral data, you can safely restart or clear it. You may lose a small backlog of queued jobs, but this restores functionality.
Restart backend pods: After resolving the Redis issue, restart the LangSmith backend pods so they re-establish their connections to Redis.
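You can script the immediate steps above roughly as follows. This is a sketch, not an official tool. The pod name langsmith-redis-0 and the deployment names langsmith-backend, langsmith-platform-backend, and langsmith-queue are assumptions based on a Helm release named langsmith. Check your actual names with kubectl get pods,deployments and override the variables to match. When DRY_RUN=1 is set, every function prints its kubectl command instead of running it.

```shell
#!/usr/bin/env bash
# Assumed defaults for a Helm release named "langsmith"; override via environment.
NAMESPACE="${NAMESPACE:-default}"
REDIS_PVC="${REDIS_PVC:-data-langsmith-redis-0}"
REDIS_POD="${REDIS_POD:-langsmith-redis-0}"
BACKEND_DEPLOYMENTS="${BACKEND_DEPLOYMENTS:-langsmith-backend langsmith-platform-backend langsmith-queue}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the patch that sets a new storage request on the PVC.
pvc_patch_json() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Step 1: grow the Redis PVC, e.g. resize_redis_pvc 500Gi
# (requires an expandable StorageClass).
resize_redis_pvc() {
  if ! printf '%s' "$1" | grep -Eq '^[0-9]+(Mi|Gi|Ti)$'; then
    echo "invalid size '$1' (expected something like 500Gi)" >&2
    return 1
  fi
  run kubectl -n "$NAMESPACE" patch pvc "$REDIS_PVC" -p "$(pvc_patch_json "$1")"
}

# Step 2a: restart Redis by deleting its pod; the StatefulSet recreates it.
restart_redis() {
  run kubectl -n "$NAMESPACE" delete pod "$REDIS_POD"
}

# Step 2b (optional): drop all keys; any queued jobs are lost.
# Redis rejects write commands such as FLUSHALL while it reports MISCONF.
flush_redis() {
  run kubectl -n "$NAMESPACE" exec "$REDIS_POD" -- redis-cli FLUSHALL
}

# Step 3: rolling-restart each backend deployment and wait for it to be ready.
restart_backends() {
  local d
  for d in $BACKEND_DEPLOYMENTS; do
    run kubectl -n "$NAMESPACE" rollout restart "deployment/$d" || return 1
    run kubectl -n "$NAMESPACE" rollout status "deployment/$d" --timeout=300s || return 1
  done
}
```

A typical sequence is to run DRY_RUN=1 resize_redis_pvc 500Gi to review the command, and then run resize_redis_pvc 500Gi, restart_redis, and restart_backends. Because each step is a separate function, you can skip the flush when a resize alone is enough.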
Long-term Solutions
Switch to external Redis: For production environments, use an external Redis instance instead of the bundled one. A managed external Redis can provide better reliability and high availability than the bundled deployment. Configure this in your Helm values file.
Optimize Redis settings: Shorten how long run data is kept in Redis by lowering its TTL in your configuration:
settings:
  redisRunsExpirySeconds: "3600"  # 1 hour instead of the default 12 hours
Enable blob storage: Configure blob storage to offload larger payloads from Redis, and lower the blob threshold so that more data goes to blob storage:
blobStorageThresholdBytes: 10240  # 10 KB threshold
Scale your Redis instance: Make sure your Redis instance has adequate memory (we recommend at least 20 GB to start) and CPU resources.
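You can sanity-check the TTL and memory advice above with a back-of-the-envelope model. This model is our simplification, not something LangSmith documents: the run data Redis holds at steady state is roughly ingest rate × average payload size × TTL. Under this model, moving from the default 12 hours (43200 s) to 1 hour cuts the footprint twelvefold, and offloading large payloads to blob storage lowers the average payload size. The rates and sizes in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Rough steady-state model (an assumption, not a LangSmith formula):
# bytes held in Redis ~= runs per second * average bytes per run * TTL in seconds.
# It ignores Redis overhead, fragmentation, and keys unrelated to runs.
estimate_redis_bytes() {
  local runs_per_sec="$1" avg_bytes="$2" ttl_seconds="$3"
  echo $(( runs_per_sec * avg_bytes * ttl_seconds ))
}

# Convert bytes to whole gigabytes (10^9 bytes), rounding down.
to_gb() {
  echo $(( $1 / 1000000000 ))
}
```

Take a hypothetical workload of 10 runs per second averaging 50,000 bytes each. For this workload, to_gb "$(estimate_redis_bytes 10 50000 43200)" prints 21, while a 3600-second TTL gives 1. Treat the result as an order-of-magnitude guide when choosing a memory size, not as a replacement for measuring actual usage.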
Prevention
Set up monitoring for your Redis instance to track memory usage, CPU utilization, and disk space. Follow the scaling documentation for production-ready configurations.
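Before you set up a proper metrics pipeline, a minimal ad-hoc check for memory and disk could look like the sketch below. It parses used_memory and maxmemory from redis-cli INFO memory (both reported in bytes, with maxmemory set to 0 when no limit is configured) and the Capacity column of df -P. The function names and the 80% threshold in the usage lines are our own choices. The pod name langsmith-redis-0 and the /data path are assumptions about the bundled deployment.

```shell
#!/usr/bin/env bash
# Print one field from `redis-cli INFO` output read on stdin (INFO uses CRLF line endings).
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Print used_memory as a whole-number percentage of maxmemory,
# or "unlimited" when maxmemory is 0 or missing (no limit configured).
redis_memory_pct() {
  local info used max
  info="$(cat)"
  used="$(printf '%s\n' "$info" | info_field used_memory)"
  max="$(printf '%s\n' "$info" | info_field maxmemory)"
  if [ "${max:-0}" -eq 0 ]; then
    echo "unlimited"
  else
    echo $(( used * 100 / max ))
  fi
}

# Print the Capacity column (without "%") from `df -P <path>` output read on stdin.
disk_used_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed ("alert") when the percentage read on stdin is at or above the threshold.
over_threshold() {
  local pct
  pct="$(cat)"
  [ "$pct" != "unlimited" ] && [ "$pct" -ge "$1" ]
}
```

Example usage:
kubectl exec langsmith-redis-0 -- redis-cli INFO memory | redis_memory_pct | over_threshold 80 && echo "Redis memory above 80% of maxmemory"
kubectl exec langsmith-redis-0 -- df -P /data | disk_used_pct | over_threshold 80 && echo "Redis volume above 80% full"
For ongoing monitoring, feeding the same signals into your existing alerting system is more robust than running these checks by hand.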
Note: The basic Redis deployment included with the Helm chart is not highly available and is not recommended for production use. Always use external Redis for production environments.
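Switching to external Redis is a Helm values change. The sketch below generates a values override and checks that the new Redis answers PING before you upgrade. The redis.external.enabled and redis.external.connectionUrl keys reflect our understanding of the LangSmith chart, so check them against the values.yaml for your chart version. The URL, file names, release name, and chart reference in the usage lines are placeholders.

```shell
#!/usr/bin/env bash
# Emit a Helm values fragment that points LangSmith at an external Redis.
# Key names are assumed; confirm them against your chart's values.yaml.
external_redis_values() {
  local url="$1"
  cat <<EOF
redis:
  external:
    enabled: true
    connectionUrl: "${url}"
EOF
}

# Succeed only if the Redis at the given URL replies PONG to PING.
check_redis_url() {
  [ "$(redis-cli -u "$1" PING 2>/dev/null)" = "PONG" ]
}
```

Example usage:
external_redis_values "redis://redis.internal.example:6379" > redis-values.yaml
check_redis_url "redis://redis.internal.example:6379" && helm upgrade langsmith langchain/langsmith -f values.yaml -f redis-values.yaml
Checking connectivity first means a typo in the URL fails before the upgrade, instead of leaving the backend pods unable to reach Redis.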