Context
When running experiments on datasets with large inputs (600K-700K characters per row) in self-hosted LangSmith, you may notice that some runs are completely missing from the experiment results rather than showing as failed. This typically occurs when the backend attempts to compress large payloads but encounters filesystem issues.
Answer
This issue is caused by a failure in the LangSmith backend's compression process. The backend compresses payloads above a certain size before storing them, and when the pod has no accessible temporary directory, that compression step raises a FileNotFoundError while trying to write to /tmp.
To resolve this issue:
1. Mount a writable /tmp directory for your LangSmith backend pods in your Kubernetes deployment configuration.
2. Ensure the environment variables for handling large inputs are set:
   - MAX_SIZE_POST_BODY_FIELD_KB: set to a high value (e.g., 50000)
   - MAX_TOKEN_CALCULATION_KB_SIZE: set appropriately (e.g., 5000)
   - TOKEN_CALCULATION_TIMEOUT_SEC: increase the timeout for large inputs (e.g., 30)
3. Verify that S3 storage is properly configured if you are using blob storage for large objects.
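The /tmp mount and environment variables can be sketched as a Kubernetes Deployment fragment. This is illustrative only: the container and volume names are placeholders for whatever your chart uses, and the values are the example starting points above, not tuned recommendations.

```yaml
# Illustrative fragment for the LangSmith backend Deployment.
# Container/volume names are placeholders; values are examples.
spec:
  template:
    spec:
      containers:
        - name: langsmith-backend        # placeholder container name
          env:
            - name: MAX_SIZE_POST_BODY_FIELD_KB
              value: "50000"
            - name: MAX_TOKEN_CALCULATION_KB_SIZE
              value: "5000"
            - name: TOKEN_CALCULATION_TIMEOUT_SEC
              value: "30"
          volumeMounts:
            - name: tmp                  # writable scratch space for compression
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}                   # ephemeral per-pod scratch volume
```

An emptyDir volume is usually sufficient here, since the compression scratch files do not need to survive pod restarts.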
The root cause is that when processing large datasets, LangSmith's compression mechanism requires access to temporary file storage. Without proper /tmp directory mounting, the compression fails silently, causing runs to be dropped rather than showing error messages in the UI.
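To see why a missing temp directory produces dropped runs rather than a visible error, here is a minimal Python sketch, not LangSmith's actual code, of a size-gated compression step that writes to a temporary directory. The function name and threshold are hypothetical; the point is that the tempfile call raises FileNotFoundError as soon as the directory does not exist:

```python
import gzip
import os
import tempfile

def compress_to_tempdir(payload: bytes, tmp_dir: str = "/tmp") -> str:
    """Hypothetical sketch: gzip a large payload to a temp file
    before upload. Fails with FileNotFoundError if tmp_dir is
    not mounted or does not exist."""
    fd, path = tempfile.mkstemp(suffix=".gz", dir=tmp_dir)
    with os.fdopen(fd, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            gz.write(payload)
    return path

# Simulate a pod without a mounted /tmp: the write fails immediately.
try:
    compress_to_tempdir(b"x" * 1024, tmp_dir="/nonexistent-tmp")
except FileNotFoundError:
    print("compression failed: temp dir missing")
```

If the backend swallows this exception instead of surfacing it, the run is silently dropped, which matches the missing-rows symptom described above.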
After implementing these changes, your experiments should process all rows in your dataset, including those with very large inputs.