Deployment guide
Production deployment patterns and recommendations.
SDK in production
- Set `timeout` to 30s or higher for LLM calls
- Enable retry logic (SDK v2 does this by default with exponential backoff)
- Use connection pooling — create one Client instance per process
- Cache prompt definitions (they rarely change) — SDK does this automatically for 5 minutes
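The retry behaviour described above can be sketched in plain Python. This is a minimal illustration of exponential backoff, not the SDK's actual implementation; `with_retries` and its parameters are hypothetical names.

```python
import random
import time

def with_retries(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry fn with exponential backoff plus jitter — a sketch of the
    behaviour this guide attributes to SDK v2, not the real code."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted, surface the last error
            # delay doubles each attempt (1s, 2s, 4s, ...) with small jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Any flaky network call can be wrapped the same way, e.g. `with_retries(lambda: client.run(prompt))`.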
Environment variables
```
PH_API_KEY=...
PH_BASE_URL=https://api.mlpipeline-cloud.com/v1
PH_TIMEOUT_SECONDS=30
PH_MAX_RETRIES=3
```
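One way to consume these variables is a small settings object that mirrors the defaults above. The `PH_*` names come from this guide; the loader itself is only a sketch, not part of the SDK.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str = "https://api.mlpipeline-cloud.com/v1"
    timeout_seconds: int = 30
    max_retries: int = 3

def load_settings(env=os.environ):
    # PH_API_KEY is required; the rest fall back to the defaults above
    return Settings(
        api_key=env["PH_API_KEY"],
        base_url=env.get("PH_BASE_URL", Settings.base_url),
        timeout_seconds=int(env.get("PH_TIMEOUT_SECONDS", Settings.timeout_seconds)),
        max_retries=int(env.get("PH_MAX_RETRIES", Settings.max_retries)),
    )
```

Reading the environment once at startup (rather than per request) also pairs naturally with the one-client-per-process recommendation above.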
Observability
Enable request logging:
```python
client = Client(api_key="...", debug=True)
# Logs to stderr with trace IDs
```
Forward metrics to Datadog:
```python
from promptlayer_hub.integrations import DatadogMiddleware

client.add_middleware(DatadogMiddleware(statsd_host="localhost"))
```
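The middleware hook can be illustrated with a plain-Python sketch. `add_middleware` follows the snippet above, but everything else here (the `on_request` callback, the stand-in `Client`) is an assumption, not the real SDK internals.

```python
import time

class TimingMiddleware:
    """Hypothetical middleware that records request latency in-process;
    a real integration would forward these samples to statsd/Datadog."""
    def __init__(self):
        self.samples = []

    def on_request(self, name, elapsed_s):
        self.samples.append((name, elapsed_s))

class Client:
    # minimal stand-in exposing add_middleware like the guide's SDK
    def __init__(self):
        self._middleware = []

    def add_middleware(self, mw):
        self._middleware.append(mw)

    def run(self, name, fn):
        start = time.perf_counter()
        try:
            return fn()
        finally:
            # notify every registered middleware, even on failure
            elapsed = time.perf_counter() - start
            for mw in self._middleware:
                mw.on_request(name, elapsed)
```

Running the timing in a `finally` block means failed requests are measured too, which is usually what you want from latency metrics.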
Cost controls
- Set per-user and per-workspace budget limits in dashboard
- Enable cost alerts (email + webhook) at 80% / 100% of budget
- Use prompt caching where possible — reduces token usage by 30-70%
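The 80% / 100% alert rule might be evaluated like this. This is a sketch of the dashboard logic described above; actual delivery (email + webhook) happens server-side and is omitted.

```python
def budget_alerts(spend, budget, thresholds=(0.8, 1.0)):
    """Return which alert thresholds the current spend has crossed.
    Sketch of the 80%/100% cost-alert rule; not the dashboard's code."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in thresholds if spend >= t * budget]
```

For example, `budget_alerts(85, 100)` reports that the 80% threshold has been crossed, while a spend at or above the full budget trips both alerts.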
High availability
Enterprise plan: multi-region deployment with automatic failover, recovery point objective (RPO) under 1 minute and recovery time objective (RTO) under 5 minutes. Contact sales for details.