Context
After a version upgrade or several successive Helm upgrades of a self-hosted LangSmith instance, every UI tab returns 403 Forbidden and the Agent Builder page fails to load. The platform-backend logs show:
pgx error collecting org auth info
err: too many rows in result set
This error repeats on every authenticated request, affecting all users in the organization.
Root Cause
The platform-backend (langsmith-go-backend) uses pgx.CollectExactlyOneRow to query organization auth information. This query joins the provider_users table filtered by ls_user_id. If there are duplicate rows in provider_users for the same ls_user_id, the query returns multiple rows and fails, causing the backend to return 403 Forbidden on every request.
Duplicates can be created when:
Multiple Helm upgrades are performed with configuration changes (e.g., changing initialOrgName or auth settings between deployments)
The auth bootstrap job runs multiple times under different conditions
Users are provisioned through both basic auth and another flow (e.g., invite or API)
Diagnosis
1) Confirm the error in platform-backend logs:
kubectl logs -n langsmith -l app.kubernetes.io/component=<release-name>-platform-backend --tail=100 | grep "too many rows"
Look for repeated lines containing "pgx error collecting org auth info" and "err: too many rows in result set". Note the ls_user_id value from the log entries.
2) Check for duplicate provider_users records:
SELECT id, provider, provider_user_id, ls_user_id, saml_provider_id, email, full_name,
hashed_password IS NOT NULL as has_password, created_at
FROM provider_users
WHERE ls_user_id = '<ls_user_id_from_logs>'
ORDER BY created_at;
If this returns more than one row, the duplicate is confirmed.
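The check above covers one user at a time. To see whether other users are affected too, a query along these lines may help. It is a minimal sketch that assumes only the provider_users table and the ls_user_id column already used in this article, and it lists every ls_user_id that has more than one record:

```sql
-- List every user with more than one provider_users record.
-- Read-only: this does not modify any data.
SELECT ls_user_id, count(*) AS record_count
FROM provider_users
GROUP BY ls_user_id
HAVING count(*) > 1
ORDER BY record_count DESC;
```

An empty result means no user currently has duplicate records. Each ls_user_id returned can then be inspected with the per-user query from step 2.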
Resolution
The fix is to ensure exactly one provider_users record exists per user. For basic auth deployments, the correct record must have provider = 'email', provider_user_id = NULL, and a valid hashed_password.
Important: Identify which record is the original basic auth record before deleting anything. The original record is typically the one created at the same timestamp as the user account (check the users table created_at), has provider = 'email', and has a non-null hashed_password. Deleting the wrong record will break login.
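To compare the candidate records against the user account's creation time, a side-by-side query such as the following can be used. This is a sketch, not a verified part of the product schema: it assumes the users table exposes ls_user_id and created_at, as the text above and the recovery statement later in this article imply.

```sql
-- Show each provider_users record next to the creation time of the
-- matching users row. Read-only: this does not modify any data.
-- Assumption: users.ls_user_id and users.created_at exist as described.
SELECT
    pu.id,
    pu.provider,
    pu.provider_user_id,
    pu.hashed_password IS NOT NULL AS has_password,
    pu.created_at AS provider_record_created_at,
    u.created_at  AS user_created_at
FROM provider_users pu
JOIN users u ON u.ls_user_id = pu.ls_user_id
WHERE pu.ls_user_id = '<ls_user_id_from_logs>'
ORDER BY pu.created_at;
```

The record to keep is typically the row where provider is 'email', has_password is true, and provider_record_created_at matches user_created_at. Note the id of the other row for use as <duplicate_record_id>.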
Run the fix in a single transaction to avoid any intermediate broken state:
BEGIN;
-- Remove the duplicate record (the one that is NOT the original basic auth record)
DELETE FROM provider_users WHERE id = '<duplicate_record_id>';
-- Verify exactly one record remains with correct attributes
SELECT id, provider, provider_user_id, hashed_password IS NOT NULL as has_password
FROM provider_users WHERE ls_user_id = '<ls_user_id>';
COMMIT;
After the SQL fix, restart the platform-backend and backend to clear the Redis auth cache:
kubectl rollout restart deployment <release-name>-platform-backend -n langsmith
kubectl rollout restart deployment <release-name>-backend -n langsmith
If the wrong record was deleted (login breaks after the fix), re-insert the correct basic auth record by copying the password hash from the users table:
BEGIN;
DELETE FROM provider_users WHERE ls_user_id = '<ls_user_id>';
INSERT INTO provider_users (id, provider, provider_user_id, ls_user_id, saml_provider_id, email, full_name, hashed_password, created_at)
SELECT
gen_random_uuid(), 'email', NULL, u.ls_user_id, NULL,
lower(u.email), coalesce(u.full_name, ''), u.hashed_password, now()
FROM users u WHERE u.ls_user_id = '<ls_user_id>';
COMMIT;
Then restart the platform-backend and backend pods again.