Summary
Severe latency with the GPT-4.1 model and unresolved runs in a Kubernetes-based self-hosted LangSmith setup were caused by a bug in OpenSSL v3.0.17, leading to memory corruption and pod crashes.
Issue Description
Users experienced significant performance degradation with the GPT-4.1 model, with response times exceeding two minutes. Concurrently, runs in the LangSmith platform were getting stuck in a "pending" state or appearing to hang despite a "success" status. Examination of the pod logs langgraph-queue-546ff6589-fcgp8 revealed the following errors, indicating a memory crash:
DEBUG | lex_machina.tools.vault:get_document_ids:196 - vault_api_url: https://domain.com/api
double free or corruption (out)
Fatal Python error: Aborted
Current thread 0x00007bf7fe7fc6c0 (most recent call first):
File "/usr/local/lib/python3.11/ssl.py", line 1382 in do_handshake
File "/usr/local/lib/python3.11/ssl.py", line 1104 in _create
memory crash.Environment
Products: LangSmith, LangGraph, GPT-4.1
Platform: Kubernetes (on-premise)
Cloud: Self-hosted cloud environment
Operating System: Linux (within the container)
Cause
The root cause of the issue was identified as a bug in OpenSSL v3.0.17 within the queue container. This bug, documented in an active GitHub issue, causes a segmentation fault when using a shared SSL context in a multi-threaded application, leading to memory corruption and crashes. The issue was triggered by parallel tool calls, one of which was to a Vault API.
Workaround
As a temporary workaround, the team switched from the GPT-4.1 model to the GPT-4o model and disabled parallel tool calls by setting parallel_tool_calls=false. This mitigated the issue by avoiding the conditions that triggered the OpenSSL bug.
Resolution
The definitive resolution is to address the OpenSSL bug in the container. This can be achieved by either downgrading or upgrading the OpenSSL version to a stable release that does not have this issue. After adjusting the OpenSSL version, parallel tool execution in LangGraph can be re-enabled.
References
OpenSSL GitHub Issue: https://github.com/openssl/openssl/issues/28171