Context
When deploying LangSmith agents on Google Kubernetes Engine (GKE) using the Gateway API, you may encounter health check failures. This occurs because the GCP Load Balancer performs health checks on port 80 at path `/`, but LangSmith agents expose their health endpoint at `/ok` on port 8000. This mismatch causes the Load Balancer to incorrectly mark healthy agent pods as unhealthy, preventing proper traffic routing.
Answer
This is a known limitation of the GKE Gateway API implementation. The external load balancer health check for a GKE Gateway cannot be disabled and is always created for backends attached to the Gateway. Its port and path can be overridden with a HealthCheckPolicy, which is a GKE-specific resource and not part of the standard Ingress or Gateway API specification.
There are two approaches to resolve this issue:
Option 1: Create Manual HealthCheckPolicy Resources (Temporary Workaround)
For each agent deployment, create a HealthCheckPolicy that configures the correct port and path:
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: agent-name-healthcheck
  namespace: langsmith
spec:
  default:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 2
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000
        requestPath: /ok
  targetRef:
    group: ""
    kind: Service
    name: ${agent-service-name}
This approach requires manual creation for every agent deployment and is not scalable for production environments.
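Until you migrate, generating the policies from a list of Service names can reduce the manual effort. The following is a minimal sketch in Python, not an official LangSmith tool. It assumes the agent Services live in the `langsmith` namespace and expose `/ok` on port 8000 as described above. The function names and the example Service names (`agent-a`, `agent-b`) are hypothetical. It emits JSON rather than YAML so that it needs only the standard library; `kubectl apply` accepts both formats.

```python
import json


def health_check_policy(service_name: str, namespace: str = "langsmith") -> dict:
    """Build a HealthCheckPolicy manifest for one agent Service.

    The values mirror the manual example: HTTP check on port 8000 at /ok.
    """
    return {
        "apiVersion": "networking.gke.io/v1",
        "kind": "HealthCheckPolicy",
        "metadata": {
            "name": f"{service_name}-healthcheck",
            "namespace": namespace,
        },
        "spec": {
            "default": {
                "checkIntervalSec": 10,
                "timeoutSec": 5,
                "healthyThreshold": 2,
                "unhealthyThreshold": 2,
                "config": {
                    "type": "HTTP",
                    "httpHealthCheck": {"port": 8000, "requestPath": "/ok"},
                },
            },
            "targetRef": {"group": "", "kind": "Service", "name": service_name},
        },
    }


def policy_list(service_names: list[str], namespace: str = "langsmith") -> dict:
    """Wrap one policy per Service in a List so kubectl can apply them together."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [health_check_policy(name, namespace) for name in service_names],
    }


if __name__ == "__main__":
    # Hypothetical Service names; replace with your actual agent Services.
    print(json.dumps(policy_list(["agent-a", "agent-b"]), indent=2))
```

If the script is saved as, say, `gen_policies.py` (a hypothetical file name), its output can be piped to `kubectl apply -f -`. You still need to re-run it whenever an agent deployment is added, so this only eases the workaround and does not remove the underlying limitation.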
Option 2: Use Envoy Gateway (Recommended)
The recommended long-term solution is to migrate from GKE Gateway to Envoy Gateway, which does not require external health checks and avoids this compatibility issue entirely.
Additional Configuration
If you continue using the workaround, set the `ingressHealthCheckEnable` value in your configuration to false. Otherwise, deployments can time out even though the agent backends are healthy.
Note: LangSmith does not officially support GKE Gateway due to this health check limitation. For production deployments, migrating to Envoy Gateway is strongly recommended.