## Context
Users deploying applications on LangSmith Deployment need to understand the available hardware resources and scaling capabilities to properly plan their deployments and ensure optimal performance for their workloads.
## Answer
Each LangSmith Deployment runs with the following specifications:
- CPU: 2 cores per container
- Memory: 2 GB RAM per container
- Containers: autoscales up to 10 containers
Important considerations:
- Each worker is limited to 10 concurrent runs by default (`N_JOBS_PER_WORKER`).
- Autoscaling is triggered based on:
  - CPU usage (75% threshold)
  - Memory usage (75% threshold)
  - Pending runs (~10 per container)
- Connection timeouts: long-running requests (>10 minutes) may encounter timeout errors. Configure appropriate HTTP timeouts and implement retry logic for operations that may exceed the default timeout limits.
- You may experience brief pending-job spikes during container warm-up, even when CPU/memory usage is low.
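The retry advice above can be sketched with a simple exponential-backoff wrapper. This is a minimal illustration using only the standard library; `call_graph` in the usage comment is a hypothetical stand-in for whatever long-running request your application makes:

```python
import random
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))

# Usage with a hypothetical request function:
# result = with_retries(lambda: call_graph(payload))
```

Libraries such as `tenacity` provide the same pattern with more options, but a small wrapper like this is often enough for a single call site.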
To optimize performance:
- Adjust `N_JOBS_PER_WORKER` to reduce queuing.
- Monitor your deployment's resource usage through the deployment monitoring tab.
- Consider breaking up memory-intensive workloads across multiple deployments if needed.
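For example, `N_JOBS_PER_WORKER` is set as an environment variable on the deployment. The value below is illustrative, not a recommendation; choose it based on your workload's per-run memory footprint, since all concurrent runs on a worker share the container's 2 GB:

```shell
# Deployment environment variable (illustrative value)
N_JOBS_PER_WORKER=20
```

Raising it increases concurrency per container but reduces the memory available to each run; lowering it does the reverse.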
For long-running operations, configure HTTP timeouts appropriately:
```python
import httpx
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="your-model",
    timeout=httpx.Timeout(
        timeout=1200.0,  # 20 minutes total
        read=1200.0,     # 20 minutes read timeout
        connect=30.0,    # 30 seconds to establish connection
        write=30.0,      # 30 seconds for write operations
    ),
    max_retries=2,
)
```
Note: Resource limits cannot be increased beyond these specifications. If you need additional capacity, consider architectural changes to your application to work within these constraints.
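To make the ceiling concrete, here is a back-of-envelope estimate of maximum concurrency under these limits, using the default `N_JOBS_PER_WORKER` of 10 and assuming one worker process per container (an assumption, not a documented guarantee):

```python
max_containers = 10        # autoscaling ceiling
jobs_per_worker = 10       # default N_JOBS_PER_WORKER
ram_per_container_gb = 2   # per-container memory limit

# Assumes one worker per container (illustrative assumption).
max_concurrent_runs = max_containers * jobs_per_worker
ram_per_run_mb = ram_per_container_gb * 1024 / jobs_per_worker

print(max_concurrent_runs)  # 100 concurrent runs at full scale
print(ram_per_run_mb)       # 204.8 MB available per concurrent run
```

If individual runs need more than roughly 200 MB, lowering `N_JOBS_PER_WORKER` or splitting the workload across deployments keeps each run within the memory budget.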