Summary
Self-hosted LangSmith upgrades to chart versions 0.13.31–0.13.36 cause the go-backend to panic on startup with a 401 CannotVerifyCopySource error when Azure Blob Storage has firewall restrictions and uses connection string/account key authentication.
Issue Description
After upgrading a self-hosted LangSmith instance from Helm chart version 0.13.28 or later to any version from 0.13.31 through 0.13.36, the langsmith-go-backend container panics on startup during a blob storage health check.
The panic originates at storage/s3.go:106 in NewBlobStorageClient, in a "copy blob test" that performs a server-side copy operation against Azure Blob Storage.
Affected components:
- langsmith-platform-backend (langsmith-go-backend image)
- langsmith-ingest-queue (uses the same image)
Error message example:
panic: blob-storage health-check failed: copy blob test failed: PUT
https://<account>.blob.core.windows.net/<container>/langsmith-health-check/copy-object-test-...
RESPONSE 401: Server failed to authenticate the request.
ERROR CODE: CannotVerifyCopySource
CopySourceErrorCode: NoAuthenticationInformation
Additionally, the same underlying bug silently affected TTL tier copy operations (run data promotion from ttl_s/ to ttl_l/), not just the startup health check. Environments with TTL tiering configured may have experienced failures in that path as well.
Environment
- Product: LangSmith (Self-Hosted)
- Helm Chart Versions Affected: 0.13.31 through 0.13.36 (Go backend image 0.13.35)
- Last Known Working Version: 0.13.28
- Platform: Azure Kubernetes Service (AKS)
- Storage: Azure Blob Storage with firewall enabled (no public access)
- Authentication Method: Connection string / account key
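To check a deployed chart version against the affected range listed above, a minimal Go sketch along these lines may help. The function name isAffectedChart is hypothetical, and the check covers only the range stated in this article (0.13.31 through 0.13.36); versions between the last known working release and 0.13.31 are not classified here.

```go
package main

import "fmt"

// isAffectedChart reports whether version (e.g. "0.13.33") falls inside
// the affected range stated in this article: 0.13.31 through 0.13.36.
// Hypothetical helper; unparseable input is treated as not affected.
func isAffectedChart(version string) bool {
	var major, minor, patch int
	n, err := fmt.Sscanf(version, "%d.%d.%d", &major, &minor, &patch)
	if err != nil || n != 3 {
		return false
	}
	return major == 0 && minor == 13 && patch >= 31 && patch <= 36
}

func main() {
	for _, v := range []string{"0.13.28", "0.13.31", "0.13.36", "0.13.37"} {
		fmt.Printf("%s affected: %t\n", v, isAffectedChart(v))
	}
}
```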
Cause
The copy-blob health check, introduced between chart versions 0.13.28 and 0.13.31, uses Azure's CopyFromURL API. This API requires the source URL to be either publicly accessible or signed with a SAS token.
When the Azure Blob Storage account is behind a firewall (no public access) and authentication is performed via a connection string or account key, the source URL used in the copy operation is constructed without any authentication credentials. Azure therefore cannot read the source blob to perform the server-side copy, resulting in a 401 CannotVerifyCopySource / NoAuthenticationInformation error.
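The difference between a bare source URL and a SAS-signed one can be illustrated with a small Go sketch. It relies only on the fact that an Azure SAS token carries its signature in the sig query parameter; the account, container, and blob names are placeholders, and hasSASSignature is a hypothetical helper rather than anything from the LangSmith codebase.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasSASSignature reports whether rawURL carries a non-empty "sig" query
// parameter, which Azure SAS tokens use for the signature. It checks only
// that the parameter is present, not that the signature is valid.
func hasSASSignature(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return u.Query().Get("sig") != ""
}

func main() {
	// Placeholder account, container, and blob names.
	bare := "https://myaccount.blob.core.windows.net/mycontainer/langsmith-health-check/source"
	// Placeholder SAS parameters; the sig value is not a real signature.
	signed := bare + "?sv=2022-11-02&se=2030-01-01T00%3A00%3A00Z&sr=b&sp=r&sig=PLACEHOLDER"

	fmt.Println(hasSASSignature(bare))   // false
	fmt.Println(hasSASSignature(signed)) // true
}
```

A copy source shaped like the first URL gives Azure nothing to authenticate the read with, which matches the NoAuthenticationInformation code in the error above.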
Workaround
Pin the Go backend image to the last known working version by setting the following in your Helm values:
images:
  platformBackendImage:
    tag: "0.13.28"
Note: This workaround bypasses the health check but also reverts to an older backend version, potentially missing other fixes and features. Use only as a temporary measure.
Resolution
Upgrade to Helm chart version 0.13.37 or later. This release includes a fix that:
1. Generates a short-lived SAS read URL for the source blob before calling CopyFromURL, which works correctly with private containers authenticating via account key.
2. Falls back to asynchronous copy with polling for authentication methods that do not support SAS generation (e.g., managed identity).
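The two-path behavior described above can be sketched in Go as follows. This is an illustration of the control flow only, not the actual fix: the copier interface, the errSASUnsupported sentinel, and the fakeCopier stand-in are all hypothetical names, and no real Azure SDK calls are made.

```go
package main

import (
	"errors"
	"fmt"
)

// errSASUnsupported is a hypothetical sentinel returned when the configured
// credential cannot sign a SAS (for example, managed identity).
var errSASUnsupported = errors.New("credential cannot sign a SAS")

// copier is a hypothetical abstraction over the blob storage client.
type copier interface {
	SignedReadURL(blob string) (string, error)  // short-lived SAS read URL
	SyncCopy(sourceURL, dst string) error       // server-side copy from a URL
	AsyncCopyAndPoll(srcBlob, dst string) error // start async copy, poll until done
}

// copyBlob prefers a synchronous copy from a SAS-signed source URL and
// falls back to async copy with polling when a SAS cannot be generated.
// It returns which path was taken ("sync" or "async").
func copyBlob(c copier, src, dst string) (string, error) {
	signed, err := c.SignedReadURL(src)
	switch {
	case err == nil:
		return "sync", c.SyncCopy(signed, dst)
	case errors.Is(err, errSASUnsupported):
		return "async", c.AsyncCopyAndPoll(src, dst)
	default:
		return "", err
	}
}

// fakeCopier is an in-memory stand-in used only to exercise copyBlob.
type fakeCopier struct{ canSign bool }

func (f fakeCopier) SignedReadURL(blob string) (string, error) {
	if !f.canSign {
		return "", errSASUnsupported
	}
	return "https://example.invalid/" + blob + "?sig=PLACEHOLDER", nil
}
func (f fakeCopier) SyncCopy(sourceURL, dst string) error       { return nil }
func (f fakeCopier) AsyncCopyAndPoll(srcBlob, dst string) error { return nil }

func main() {
	for _, c := range []fakeCopier{{canSign: true}, {canSign: false}} {
		path, err := copyBlob(c, "ttl_s/run", "ttl_l/run")
		fmt.Println(path, err)
	}
}
```

Any SAS-generation error other than the sentinel is returned as-is, so an unrelated failure is not hidden behind the fallback.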
Steps:
1. Update your Helm chart to version 0.13.37 or later.
2. Remove any image tag pin (e.g., images.platformBackendImage.tag: "0.13.28") from your Helm values.
3. Deploy the upgrade and verify that the langsmith-go-backend pods start without panic errors.
4. If TTL tiering is configured, confirm that run data promotion between ttl_s/ and ttl_l/ is functioning correctly.