Context
When upgrading self-hosted LangSmith from Helm chart version 0.11.x to 0.12.x, you may encounter several issues including:
Migration job OOM errors - The feedbackDataMigration or other PostgreSQL migration jobs may run out of memory (OOMKilled) during the upgrade process
Blank screen after SSO login - Some users may experience a blank screen after authenticating via AWS SSO or other identity providers
Bulk export validation failures - After upgrade, bulk export destinations may fail validation with errors about include_bucket_in_prefix
Bulk export format_version errors - Existing bulk exports may fail with ValidationError: Input should be 'v1' or 'v2_beta' due to incomplete migrations
These issues are more likely to occur on deployments with:
Large amounts of historical data (200+ days of traces)
High queue worker counts (60+ workers)
Large ClickHouse memory consumption (30+ GB)
Answer
Issue 1: Migration Job OOM Errors (feedbackDataMigration)
Root Cause: The feedbackDataMigration schema migration job processes all existing feedback data in memory. Unlike horizontally-scaled workloads, this single migration pod must process ALL feedback data, which can exceed default memory limits on deployments with extensive historical data.
Solution: Increase memory allocation for migration jobs before upgrading.
In your Helm values file, add or update the migration job resources:
migrations:
  resources:
    requests:
      memory: "8Gi"
      cpu: "1"
    limits:
      memory: "16Gi"
      cpu: "2"
Recommended memory settings based on deployment age:
< 90 days of data: 4Gi limit
90-180 days of data: 8Gi limit
180+ days of data: 16Gi limit (or higher)
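The tiers above can be captured in a small helper, for example when generating a values file per environment. This is a minimal sketch: the function name is hypothetical, and because the list names 180 days in both the "90-180" and "180+" tiers, the sketch assumes the larger limit applies at exactly 180 days.

```python
def recommended_migration_memory_limit(days_of_data: int) -> str:
    """Return the suggested migration job memory limit for a deployment age.

    Assumption: exactly 180 days falls into the larger (16Gi) tier.
    """
    if days_of_data < 0:
        raise ValueError("days_of_data must be non-negative")
    if days_of_data < 90:
        return "4Gi"
    if days_of_data < 180:
        return "8Gi"
    # The article notes that this tier may need more than 16Gi.
    return "16Gi"


for days in (30, 120, 200):
    print(days, recommended_migration_memory_limit(days))
```

Treat the result as a starting point; deployments with unusually dense feedback data may need a higher limit than their age alone suggests.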
Important: After a failed migration attempt:
Collect diagnostics immediately using the diagnostics script before Kubernetes garbage collection removes logs
Tail migration pod logs during the upgrade:
kubectl logs -f job/langsmith-pg-migrations -n <namespace>
Monitor memory usage during migration
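One way to act on the monitoring step is to compare the pod's reported memory against its configured limit. The sketch below assumes you already have a usage figure as a Kubernetes quantity string (for instance from kubectl top pod, which requires metrics-server in the cluster). The helper names and the 0.9 warning threshold are hypothetical, and only the binary suffixes Ki, Mi, Gi and Ti are handled.

```python
# Binary suffixes used by Kubernetes memory quantities (subset only).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as "8Gi" or "512Mi" to bytes."""
    quantity = quantity.strip()
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count with no suffix


def memory_utilization(usage: str, limit: str) -> float:
    """Fraction of the limit currently in use."""
    return parse_memory(usage) / parse_memory(limit)


def near_oom(usage: str, limit: str, threshold: float = 0.9) -> bool:
    """True when usage is at or above the (hypothetical) warning threshold."""
    return memory_utilization(usage, limit) >= threshold


print(memory_utilization("12Gi", "16Gi"))
print(near_oom("15000Mi", "16Gi"))
```

If usage approaches the limit well before the migration finishes, raising the limit before the next attempt is likely cheaper than waiting for another OOMKilled pod.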
Issue 2: Blank Screen After SSO Login
Root Cause: This issue may be related to session handling changes in the 0.12.x release or browser cache conflicts after the upgrade.
Solution:
Have affected users clear browser cache and cookies for the LangSmith domain
Try an incognito/private browser window
If the issue persists, check backend logs for authentication errors:
kubectl logs -l app=langsmith-backend -n <namespace> | grep -i "auth\|sso\|session"
Issue 3: Bulk Export Destination Validation Failures
Root Cause: Helm chart 0.12.x introduced the include_bucket_in_prefix parameter for bulk export destinations. Existing destinations may require this parameter to be explicitly set.
Solution: When creating or updating bulk export destinations, add the include_bucket_in_prefix parameter:
{
  "destination_type": "s3",
  "config": {
    "bucket_name": "your-bucket-name",
    "prefix": "langsmith-exports",
    "s3_region": "us-east-1",
    "include_bucket_in_prefix": true
  }
}
For existing destinations that were working before the upgrade, setting "include_bucket_in_prefix": true should restore functionality.
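If you keep destination payloads in scripts or config, the parameter can be added programmatically before resubmitting them. The following is a minimal sketch that only manipulates the payload dictionary. The helper name is hypothetical and no LangSmith API call is made or implied.

```python
import copy
import json


def with_bucket_prefix_flag(destination: dict, value: bool = True) -> dict:
    """Return a copy of the payload with include_bucket_in_prefix set.

    An explicit value already present in the payload is left untouched.
    """
    updated = copy.deepcopy(destination)
    config = updated.setdefault("config", {})
    config.setdefault("include_bucket_in_prefix", value)
    return updated


existing = {
    "destination_type": "s3",
    "config": {
        "bucket_name": "your-bucket-name",
        "prefix": "langsmith-exports",
        "s3_region": "us-east-1",
    },
}
print(json.dumps(with_bucket_prefix_flag(existing), indent=2))
```

Working on a copy keeps the original payload available for comparison if the destination still fails validation.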
Issue 4: Bulk Export format_version Validation Errors
Symptoms:
ValidationError: 1 validation error for BulkExport
format_version
  Input should be 'v1' or 'v2_beta' [type=enum, input_value=None, input_type=NoneType]
The /bulk-exports API may also fail to list existing exports with the same error.
Root Cause: The database migration that sets the default format_version value for existing bulk exports may not have completed successfully. This migration should update all existing bulk export records with NULL format_version to 'v1'.
The relevant migrations are:
9e3fe47a4500 → bulk export format_version (adds the column)
ce08d43fb55d → bulk export format_version default (sets default value for existing records)
Solution:
Verify the issue by checking the bulk_exports table:
SELECT id, format_version FROM bulk_exports WHERE format_version IS NULL;
Apply the fix manually if records have NULL format_version:
UPDATE bulk_exports SET format_version = 'v1' WHERE format_version IS NULL;
Verify the alembic version to confirm migrations ran:
SELECT * FROM alembic_version;
For Helm 0.12.31 (app 0.12.69), the expected alembic version should be 09f3b8e4b21f or later.
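To see what the check-and-fix statements do before running them against production, they can be rehearsed on a throwaway table. The sketch below uses an in-memory SQLite database purely as a stand-in for PostgreSQL. The two-column table and the sample rows are assumptions for illustration and do not reflect the real bulk_exports schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table is reduced to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bulk_exports (id TEXT PRIMARY KEY, format_version TEXT)")
conn.executemany(
    "INSERT INTO bulk_exports VALUES (?, ?)",
    [("a", None), ("b", "v1"), ("c", "v2_beta"), ("d", None)],
)


def null_format_version_ids(connection):
    """IDs of rows that would trigger the format_version validation error."""
    rows = connection.execute(
        "SELECT id FROM bulk_exports WHERE format_version IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


before = null_format_version_ids(conn)

with conn:  # commits on success, rolls back if the block raises
    cursor = conn.execute(
        "UPDATE bulk_exports SET format_version = 'v1' WHERE format_version IS NULL"
    )
    updated_rows = cursor.rowcount

after = null_format_version_ids(conn)
print(before, updated_rows, after)
```

The UPDATE only touches rows where format_version is NULL, so exports already marked v1 or v2_beta are unaffected. On the real database, running the UPDATE inside a transaction gives the same option to roll back.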
Pre-Upgrade Checklist
Before upgrading from 0.11.x to 0.12.x:
Review the changelog for breaking changes at Self-hosted LangSmith Changelog
Note the breaking change in v0.12.0: The langgraphPlatform option is deprecated. Use config.deployment instead:
# Old (deprecated)
langgraphPlatform:
  enabled: true
# New (v0.12.0+)
config:
  deployment:
    enabled: true
Increase migration job memory as described above
Backup your databases before upgrading
Run the upgrade during a maintenance window when you can monitor logs and respond to issues
Have the diagnostics script ready to capture logs immediately if issues occur
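For the langgraphPlatform deprecation noted in the checklist, the key move can be scripted if you manage many values files. This is a rough sketch operating on values already loaded into a dictionary (loading and writing YAML is left out). The function name is hypothetical, and it handles only the enabled flag shown in the changelog note.

```python
import copy


def migrate_langgraph_platform(values: dict) -> dict:
    """Move langgraphPlatform.enabled to config.deployment.enabled.

    Returns a new dict; other keys under langgraphPlatform are kept as-is.
    """
    migrated = copy.deepcopy(values)
    legacy = migrated.get("langgraphPlatform")
    if isinstance(legacy, dict) and "enabled" in legacy:
        enabled = legacy.pop("enabled")
        deployment = migrated.setdefault("config", {}).setdefault("deployment", {})
        # An explicit new-style setting wins over the deprecated one.
        deployment.setdefault("enabled", enabled)
        if not legacy:
            del migrated["langgraphPlatform"]
    return migrated


old_values = {"langgraphPlatform": {"enabled": True}}
print(migrate_langgraph_platform(old_values))
```

Review the result by hand before applying it, since any remaining langgraphPlatform keys are not covered by the note above.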
Post-Upgrade Verification
After the upgrade completes:
Verify all pods are running:
kubectl get pods -n <namespace>
Verify migration jobs completed:
kubectl get jobs -n <namespace>
# Both langsmith-pg-migrations and langsmith-ch-migrations should show COMPLETIONS: 1/1
Check the migration logs:
kubectl logs job/langsmith-pg-migrations -n <namespace>
Look for Running upgrade ... -> ... messages confirming migrations ran
Test bulk exports if you use this feature
Test SSO login with multiple users
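The job and log checks above can be automated by parsing the command output. The sketch below assumes you have captured the text of kubectl get jobs and of the migration pod logs. Both function names are hypothetical. The log pattern assumes Alembic's usual "Running upgrade <from> -> <to>" line format with hexadecimal revision IDs.

```python
import re


def incomplete_jobs(kubectl_jobs_output: str) -> list:
    """Return names of jobs whose COMPLETIONS column is not N/N."""
    lines = [line.split() for line in kubectl_jobs_output.strip().splitlines()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    name_col = header.index("NAME")
    completions_col = header.index("COMPLETIONS")
    pending = []
    for row in rows:
        done, _, wanted = row[completions_col].partition("/")
        if done != wanted:
            pending.append(row[name_col])
    return pending


# Matches "Running upgrade <from> -> <to>"; <from> is empty for the first revision.
_UPGRADE = re.compile(r"Running upgrade\s+(\S*)\s*->\s*([0-9a-f]+)")


def applied_revisions(migration_log: str) -> list:
    """Return the target revision IDs found in 'Running upgrade' log lines."""
    return [match.group(2) for match in _UPGRADE.finditer(migration_log)]
```

An empty list from incomplete_jobs together with a non-empty list from applied_revisions suggests the migrations ran to completion. An empty revision list can also mean the database was already up to date, so read the two results together.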