## Problem

When using `init_chat_model` with a `timeout` parameter, the actual timeout behavior may be longer than expected. For example, setting `timeout=30` might result in waiting 90+ seconds before a timeout error is raised.
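One way to confirm the symptom is to time the failing call yourself. A minimal sketch (the `llm.invoke(...)` call in the comment is a placeholder for your own client setup):

```python
import time

def measure_wall_time(call) -> float:
    """Return how long call() runs before returning or raising."""
    start = time.monotonic()
    try:
        call()
    except Exception:
        pass  # we only care about the elapsed time, not the error
    return time.monotonic() - start

# With a real client you'd pass something like:
#   measure_wall_time(lambda: llm.invoke("ping"))
# and compare the result against the timeout you configured.
```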
## Causes

Three factors can contribute to this:

| Issue | Cause | Multiplier |
|---|---|---|
| Retry multiplication | `max_retries=2` default (3 total attempts) | 3× timeout |
| Incomplete timeout config | Float only sets read timeout (OpenAI) | Connection hangs |
| DNS multi-IP | N A records tried sequentially | N× timeout |
## Cause 1: Retries Multiply the Timeout

Both the OpenAI and Anthropic SDKs default to `max_retries=2`, meaning 3 total attempts. Each attempt waits for the full timeout duration before retrying.

Example: `timeout=30` with default retries = 90s actual timeout.
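The arithmetic behind this can be sketched as follows (ignoring the SDK's additional backoff delay between attempts, which adds even more time in practice):

```python
def effective_timeout(timeout: float, max_retries: int) -> float:
    """Worst-case wall-clock wait: one initial attempt plus each retry,
    each waiting the full per-attempt timeout before the error surfaces."""
    return (1 + max_retries) * timeout

print(effective_timeout(30.0, 2))  # SDK default max_retries=2 -> 90.0
print(effective_timeout(30.0, 0))  # retries disabled -> 30.0
```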
Solution: Set `max_retries=0`:

```python
from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "gpt-4o",
    model_provider="openai",
    timeout=30.0,
    max_retries=0,  # Disable retries
)
```

## Cause 2: Float Timeout May Not Set Connection Timeout (OpenAI)
For OpenAI-based models, a simple float like `timeout=30` sets the read timeout, not the connection timeout. This means requests to unreachable endpoints may not time out as expected.

Solution: Use `httpx.Timeout` for fine-grained control:

```python
import httpx
from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "gpt-4o",
    model_provider="openai",
    timeout=httpx.Timeout(
        connect=30.0,  # Connection timeout (like curl --connect-timeout)
        read=60.0,     # Time to wait for response data
        write=30.0,    # Time to wait when sending data
        pool=30.0,     # Time to wait for a connection from the pool
    ),
    max_retries=0,
)
```

### For Anthropic Provider
A float timeout works for the connection timeout, but still set `max_retries=0`:

```python
from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "claude-sonnet-4-20250514",
    model_provider="anthropic",
    timeout=30.0,
    max_retries=0,
)
```

Note: Anthropic does not support `httpx.Timeout` - only float values are accepted.
## Cause 3: DNS Multi-IP Timeout Multiplication

Even with `max_retries=0` and a proper `httpx.Timeout` configuration, timeouts may still be longer than expected if your endpoint's hostname resolves to multiple IP addresses (A records).

### Problem

When a hostname resolves to multiple IP addresses, httpx tries each IP sequentially, applying the full connect timeout to each attempt:

```
3 IPs × 10s timeout = 30s actual timeout
```

This happens because httpx doesn't implement Happy Eyeballs (RFC 8305), which would race connections in parallel like browsers do.
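To illustrate, a sketch of the worst case when every resolved IP is unreachable (sequential attempts, no Happy Eyeballs):

```python
def worst_case_connect_time(num_a_records: int, connect_timeout: float) -> float:
    """Each unreachable IP consumes the full connect timeout before
    httpx moves on to the next A record, so the waits add up."""
    return num_a_records * connect_timeout

print(worst_case_connect_time(3, 10.0))  # 3 IPs x 10s -> 30.0
print(worst_case_connect_time(1, 10.0))  # single A record -> 10.0
```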
### How to Check Your DNS

```bash
# Check how many A records your endpoint has
dig +short your-endpoint.com A
```

Or in Python:

```python
import socket

_, _, ips = socket.gethostbyname_ex("your-endpoint.com")
print(f"Found {len(ips)} A records: {ips}")
```

### Solution: Dynamic Timeout Based on DNS
Calculate the per-IP timeout by dividing your desired total timeout by the number of DNS A records:
```python
import socket

import httpx
from langchain.chat_models import init_chat_model

hostname = "your-endpoint.com"
desired_timeout = 10.0

# Get the number of A records
_, _, ips = socket.gethostbyname_ex(hostname)
per_ip_timeout = desired_timeout / len(ips)
print(f"Found {len(ips)} IPs, using {per_ip_timeout:.2f}s per IP")

llm = init_chat_model(
    "gpt-4o",
    model_provider="openai",
    base_url=f"https://{hostname}",
    timeout=httpx.Timeout(
        connect=per_ip_timeout,
        read=60.0,
        write=10.0,
        pool=10.0,
    ),
    max_retries=0,
)
```

### Helper Function
```python
import socket

import httpx
from langchain.chat_models import init_chat_model

def get_timeout_for_hostname(hostname: str, desired_timeout: float) -> httpx.Timeout:
    """
    Calculate a timeout accounting for multiple DNS A records.

    Args:
        hostname: The endpoint hostname (without https://)
        desired_timeout: Total desired connect timeout in seconds

    Returns:
        httpx.Timeout configured for predictable behavior
    """
    try:
        _, _, ips = socket.gethostbyname_ex(hostname)
        per_ip_timeout = desired_timeout / max(len(ips), 1)
        print(f"DNS has {len(ips)} A records, using {per_ip_timeout:.1f}s connect timeout")
    except socket.gaierror:
        per_ip_timeout = desired_timeout
    return httpx.Timeout(
        connect=per_ip_timeout,
        read=60.0,
        write=10.0,
        pool=10.0,
    )

# Usage
hostname = "your-endpoint.com"
timeout = get_timeout_for_hostname(hostname, desired_timeout=10.0)

llm = init_chat_model(
    "gpt-4o",
    model_provider="openai",
    base_url=f"https://{hostname}",
    timeout=timeout,
    max_retries=0,
)
```

## Complete Example
Combining all solutions for predictable timeout behavior:

```python
import socket

import httpx
from langchain.chat_models import init_chat_model

# Configuration
hostname = "your-endpoint.com"
desired_connect_timeout = 10.0
read_timeout = 60.0

# Account for multiple DNS A records
try:
    _, _, ips = socket.gethostbyname_ex(hostname)
    per_ip_timeout = desired_connect_timeout / len(ips)
    print(f"Endpoint has {len(ips)} A records, using {per_ip_timeout:.1f}s per IP")
except socket.gaierror as e:
    print(f"DNS resolution failed: {e}")
    per_ip_timeout = desired_connect_timeout

# Create the LLM with a predictable timeout
llm = init_chat_model(
    "gpt-4o",
    model_provider="openai",
    base_url=f"https://{hostname}",
    timeout=httpx.Timeout(
        connect=per_ip_timeout,
        read=read_timeout,
        write=10.0,
        pool=10.0,
    ),
    max_retries=0,
)
```

## Summary
| Provider | Timeout Type | Retries | DNS Handling | Behavior |
|---|---|---|---|---|
| OpenAI | float | default (2) | None | Unpredictable |
| OpenAI | `httpx.Timeout` | 0 | None | Predictable (single IP) |
| OpenAI | `httpx.Timeout` | 0 | Dynamic | Predictable (multi IP) |
| Anthropic | float | 0 | N/A | Predictable |
## Key Takeaways

- Always set `max_retries=0` when you need predictable timeout behavior
- Use `httpx.Timeout` for OpenAI to control the connection timeout
- Check DNS A records if the timeout is still longer than expected; divide your timeout by the IP count
## Technical Background

### Why httpx doesn't race connections
httpx uses httpcore as its transport layer, which calls anyio for async networking.
While anyio has supported Happy Eyeballs via its `happy_eyeballs_delay` parameter since v1.2.0, httpcore doesn't pass this parameter through. This means connections are tried sequentially rather than in parallel.
### Happy Eyeballs (RFC 8305)

Happy Eyeballs is an algorithm that races connection attempts to multiple IP addresses in parallel, using the first successful connection. This is what browsers and curl do, resulting in faster and more predictable connection times.

Python's asyncio has supported this natively since 3.8:

```python
await loop.create_connection(
    protocol_factory,
    host='example.com',
    port=443,
    happy_eyeballs_delay=0.25,  # 250ms between attempts
)
```

But httpx/httpcore don't use this feature yet.