## Context
Users deploying applications on LangSmith Deployment need to understand the available hardware resources and scaling capabilities to properly plan their deployments and ensure optimal performance for their workloads.
## Answer
Each LangSmith Deployment runs with the following specifications:
- CPU: 2 cores per container
- Memory: 2 GB RAM per container
- Containers: autoscales up to 10 containers
Important considerations:
- Each worker is limited to 10 concurrent runs by default (`N_JOBS_PER_WORKER`).
- Autoscaling is triggered based on:
  - CPU usage (75% threshold)
  - Memory usage (75% threshold)
  - Pending runs (~10 per container)
- Connection timeouts: long-running requests (>10 minutes) may encounter timeout errors. Configure appropriate HTTP timeouts and implement retry logic for operations that may exceed the default timeout limits.
- You may experience brief pending-job spikes during container warm-up, even when CPU/memory usage is low.
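The retry advice above can be sketched with a simple exponential-backoff wrapper. This is a minimal illustration using only the standard library; `call_graph` in the usage comment is a hypothetical stand-in for whatever long-running request your application makes:

```python
import random
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))

# Usage with a hypothetical request function:
# result = with_retries(lambda: call_graph(payload))
```

Libraries such as `tenacity` provide the same pattern with more options, but a small wrapper like this is often enough for a single call site.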
To optimize performance:
- Adjust `N_JOBS_PER_WORKER` to reduce queuing.
- Monitor your deployment's resource usage through the deployment monitoring tab.
- Consider breaking up memory-intensive workloads across multiple deployments if needed.
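For example, `N_JOBS_PER_WORKER` is set as an environment variable on the deployment. The value below is illustrative, not a recommendation; choose it based on your workload's per-run memory footprint, since all concurrent runs on a worker share the container's 2 GB:

```shell
# Deployment environment variable (illustrative value)
N_JOBS_PER_WORKER=20
```

Raising it increases concurrency per container but reduces the memory available to each run; lowering it does the reverse.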
For long-running operations, configure HTTP timeouts appropriately:
```python
import httpx
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="your-model",
    timeout=httpx.Timeout(
        timeout=1200.0,  # 20 minutes total
        read=1200.0,     # 20 minutes read timeout
        connect=30.0,    # 30 seconds to establish connection
        write=30.0,      # 30 seconds for write operations
    ),
    max_retries=2,
)
```
Note: Resource limits cannot be increased beyond these specifications. If you need additional capacity, consider architectural changes to your application to work within these constraints.
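To make the ceiling concrete, here is a back-of-envelope estimate of maximum concurrency under these limits, using the default `N_JOBS_PER_WORKER` of 10 and assuming one worker process per container (an assumption, not a documented guarantee):

```python
max_containers = 10        # autoscaling ceiling
jobs_per_worker = 10       # default N_JOBS_PER_WORKER
ram_per_container_gb = 2   # per-container memory limit

# Assumes one worker per container (illustrative assumption).
max_concurrent_runs = max_containers * jobs_per_worker
ram_per_run_mb = ram_per_container_gb * 1024 / jobs_per_worker

print(max_concurrent_runs)  # 100 concurrent runs at full scale
print(ram_per_run_mb)       # 204.8 MB available per concurrent run
```

If individual runs need more than roughly 200 MB, lowering `N_JOBS_PER_WORKER` or splitting the workload across deployments keeps each run within the memory budget.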