Overview
Two common issues affect Agent Builder (and Insights) on self-hosted LangSmith deployments:

1. The Agent Builder pod fails to start due to a license verification failure or stale deployment/session conflicts after a reinstall.
2. Agent Builder features fail at runtime because pods cannot resolve the external hostname from inside the cluster (a DNS issue).
Issue 1 — Agent Builder Pod Fails to Start
Symptoms
- Agent Builder pod crashes on startup with `ValueError: License verification failed`
- Bootstrap job hangs or times out during `helm upgrade`
- Bootstrap returns `409 Conflict` — a tracing project named `agent-builder` or `clio` already exists
Cause
The license key was not updated before running the bootstrap job, or stale deployment/session records from a previous install are blocking re-creation.
Resolution
Step 1 — Clean up stale deployments via the API
Get the session IDs for `agent-builder` and `clio`:

```bash
curl -X GET "https://<your-hostname>/api/v1/sessions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "X-Tenant-ID: YOUR_WORKSPACE_ID" \
  -H "Content-Type: application/json" | jq '.[] | select(.name == "clio" or .name == "agent-builder") | {name, id}'
```

Delete each deployment (along with its tracing project):
```bash
curl -X DELETE "https://<your-hostname>/api/v2/deployments/<DEPLOYMENT_ID>?delete_tracing_project=true" \
  -H "x-api-key: YOUR_API_KEY"
```

Step 2 — If API deletion is blocked, use SQL
```sql
-- Soft delete (recommended — triggers reconciler cleanup)
UPDATE host_projects
SET status = 'AWAITING_DELETE', updated_at = now()
WHERE name IN ('clio', 'agent-builder') AND tenant_id = '<TENANT_UUID>';

-- Also clean up associated tracing projects if needed
DELETE FROM tracer_sessions
WHERE id = '<TRACER_SESSION_ID>';
```

Step 3 — Re-run the bootstrap
```bash
helm upgrade langsmith langsmith/langsmith -n <namespace> -f values.yaml
```

Step 4 — If the Agent Builder UI doesn't appear after the bootstrap, restart the frontend
```bash
kubectl rollout restart deployment langsmith-frontend -n <namespace>
```

Note: If the license key was updated after the bootstrap job ran, reinstall the chart so the bootstrap picks up the new key.
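If you prefer scripting the Step 1 lookup instead of piping through `jq`, the same filter can be expressed in plain Python. This is a sketch, not part of the product: the sample payload below is illustrative, and only the `name` and `id` fields are assumed to exist in the real `/api/v1/sessions` response.

```python
import json

# Illustrative sample of a /api/v1/sessions response (the real payload
# carries more fields; only name and id matter for cleanup).
sample = json.loads("""
[
  {"name": "clio", "id": "11111111-1111-1111-1111-111111111111"},
  {"name": "agent-builder", "id": "22222222-2222-2222-2222-222222222222"},
  {"name": "my-other-project", "id": "33333333-3333-3333-3333-333333333333"}
]
""")

# Mirrors: jq '.[] | select(.name == "clio" or .name == "agent-builder") | {name, id}'
stale = [{"name": s["name"], "id": s["id"]}
         for s in sample
         if s["name"] in ("clio", "agent-builder")]

for entry in stale:
    print(f"{entry['name']}: {entry['id']}")
```

The resulting `id` values are what you pass as `<DEPLOYMENT_ID>` to the DELETE call in Step 1.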
Issue 2 — Agent Builder Fails at Runtime (DNS / External Hostname)
Symptoms
- Agent Builder UI loads but features don't work (threads, assistants, and crons return errors)
- Pods cannot reach `SMITH_BACKEND_ENDPOINT` or `GO_ENDPOINT`
- `nslookup <your-hostname>` fails from inside the cluster
- The hostname is only resolvable via `/etc/hosts` on the host machine, not inside the cluster
Cause
The agent bootstrap script hardcodes `SMITH_BACKEND_ENDPOINT`, `GO_ENDPOINT`, `HOST_BACKEND_ENDPOINT`, and `MCP_SERVER_URL` using the external hostname from `config.hostname`. Pods cannot resolve this hostname from inside the cluster when it is not backed by real DNS (e.g. a dummy domain set only in `/etc/hosts`).
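As a quick way to reproduce the symptom from any environment, the resolvability check behind `nslookup` can be sketched with Python's standard `socket` module (this helper is my own, not part of LangSmith):

```python
import socket

def resolvable(hostname: str) -> bool:
    """True if this network namespace can resolve the hostname via DNS or /etc/hosts."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# "localhost" resolves everywhere; a dummy domain that exists only in the
# host machine's /etc/hosts will return False when run from inside a pod.
print(resolvable("localhost"))
```

Running the equivalent check from inside an affected pod against your external hostname confirms the DNS gap described above.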
Resolution
Option A — Patch the LGP custom resource directly (immediate workaround)
```bash
# Find the agent-builder LGP CR
kubectl get lgp -n <namespace>

# Edit it
kubectl edit lgp agent-builder -n <namespace>
```

In `spec.serverSpec.env`, update these entries to use internal K8s service names:
```
SMITH_BACKEND_ENDPOINT=http://<release-name>-backend.<namespace>.svc.cluster.local:1984
GO_ENDPOINT=http://<release-name>-platform-backend.<namespace>.svc.cluster.local:1986
HOST_BACKEND_ENDPOINT=http://<release-name>-host-backend.<namespace>.svc.cluster.local:1985
MCP_SERVER_URL=http://<release-name>-platform-backend.<namespace>.svc.cluster.local:1986/mcp
```

Replace `<release-name>` and `<namespace>` with your Helm release name and namespace.
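All four endpoints follow the same `<release>-<service>.<namespace>.svc.cluster.local:<port>` pattern, so the substitution can be templated when you manage the CR from automation. A minimal Python sketch (the helper and the placeholder release/namespace values are my own; service names and ports come from the list above):

```python
def internal_endpoint(service: str, release: str, namespace: str,
                      port: int, path: str = "") -> str:
    """Build an in-cluster service URL matching the pattern in Option A."""
    return f"http://{release}-{service}.{namespace}.svc.cluster.local:{port}{path}"

# Placeholder Helm release name and namespace.
release, ns = "langsmith", "langsmith-ns"
env = {
    "SMITH_BACKEND_ENDPOINT": internal_endpoint("backend", release, ns, 1984),
    "GO_ENDPOINT": internal_endpoint("platform-backend", release, ns, 1986),
    "HOST_BACKEND_ENDPOINT": internal_endpoint("host-backend", release, ns, 1985),
    "MCP_SERVER_URL": internal_endpoint("platform-backend", release, ns, 1986, "/mcp"),
}
for key, value in env.items():
    print(f"{key}={value}")
```

Printing the mapping as `KEY=value` lines gives you exactly the entries to paste into `spec.serverSpec.env`.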
⚠ These env vars will be overwritten on the next `helm upgrade`, since the bootstrap re-derives them from `config.hostname`.
Option B — Add a CoreDNS hosts block
Add your hostname to the CoreDNS ConfigMap so it resolves to the ingress IP from inside the cluster:
```
hosts {
    <INGRESS_IP> <your-hostname>
    fallthrough
}
```

Then restart CoreDNS:

```bash
kubectl rollout restart deployment coredns -n kube-system
```

Verify resolution from inside a pod:

```bash
kubectl exec -n <namespace> <any-pod> -- nslookup <your-hostname>
```

Notes
- From v0.13.17 onward, Agent Builder communicates with backend services via Kube DNS by default. Upgrading to this version or later eliminates the need for the workarounds above (except for the default MCP server, which still requires external DNS).
- The root cause of the LGP DNS issue: unlike `tool-server` and `trigger-server` (which use internal K8s service names), the bootstrap script sets `SMITH_BACKEND_ENDPOINT`, `GO_ENDPOINT`, `HOST_BACKEND_ENDPOINT`, and `MCP_SERVER_URL` to the external hostname — routing internal traffic externally.
- The `X-Tenant-ID` header in API calls is the workspace ID, found in the LangSmith UI URL.
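If you template the CoreDNS ConfigMap from automation rather than editing it by hand, the Option B `hosts` stanza can be rendered programmatically. A sketch under my own naming (the function and example values are illustrative):

```python
def coredns_hosts_block(ingress_ip: str, hostname: str) -> str:
    """Render the CoreDNS hosts plugin stanza shown in Option B."""
    return (
        "hosts {\n"
        f"    {ingress_ip} {hostname}\n"
        "    fallthrough\n"
        "}\n"
    )

print(coredns_hosts_block("10.0.0.5", "langsmith.internal.example"))
```

The `fallthrough` directive matters: without it, CoreDNS stops at the hosts plugin for unmatched names instead of passing them to the next resolver.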