Deployment guide

Production deployment patterns and recommendations.

SDK in production

  • Set the request timeout to 30 s or higher; LLM calls routinely run long
  • Enable retry logic (SDK v2 retries by default with exponential backoff)
  • Reuse connections: create one Client instance per process
  • Cache prompt definitions (they rarely change); the SDK caches them automatically for 5 minutes
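The one-client-per-process bullet can be sketched with a lazily created, process-wide accessor. A minimal sketch, assuming the package and Client constructor named elsewhere in this guide; a stand-in object is returned here so the sketch runs without the SDK installed:

```python
# One shared client per process, created lazily on first use.
import functools

def make_client():
    # In a real service, per the bullets above (assumed constructor):
    #   from promptlayer_hub import Client
    #   return Client(api_key=os.environ["PH_API_KEY"], timeout=30)
    return object()  # stand-in so the sketch runs without the SDK

@functools.lru_cache(maxsize=1)
def get_client():
    # lru_cache with no arguments caches the single instance.
    return make_client()
```

Every call site then uses get_client() instead of constructing its own Client, so the process shares one connection pool.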

Environment variables

PH_API_KEY=...
PH_BASE_URL=https://api.mlpipeline-cloud.com/v1
PH_TIMEOUT_SECONDS=30
PH_MAX_RETRIES=3
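A minimal sketch of loading these variables with the defaults shown above (the helper name and dict shape are illustrative, not part of the SDK):

```python
# Read the PH_* settings, falling back to the documented defaults.
import os

def load_config(env=os.environ):
    return {
        "api_key": env["PH_API_KEY"],  # required, no default
        "base_url": env.get("PH_BASE_URL", "https://api.mlpipeline-cloud.com/v1"),
        "timeout_seconds": float(env.get("PH_TIMEOUT_SECONDS", "30")),
        "max_retries": int(env.get("PH_MAX_RETRIES", "3")),
    }
```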

Observability

Enable request logging:

from promptlayer_hub import Client

client = Client(api_key="...", debug=True)
# Logs to stderr with trace IDs

Forward metrics to Datadog:

from promptlayer_hub.integrations import DatadogMiddleware
client.add_middleware(DatadogMiddleware(statsd_host="localhost"))

Cost controls

  • Set per-user and per-workspace budget limits in the dashboard
  • Enable cost alerts (email + webhook) at 80% and 100% of budget
  • Use prompt caching where possible; it can cut token usage by 30-70%
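A hedged sketch of routing the cost-alert webhook above by threshold. The payload field name (budget_used_pct) is an assumption for illustration; check the dashboard's webhook documentation for the real schema:

```python
# Map a cost-alert webhook body to an escalation decision.
import json

def alert_severity(body: str) -> str:
    payload = json.loads(body)
    pct = payload.get("budget_used_pct", 0)  # hypothetical field name
    if pct >= 100:
        return "page"    # budget exhausted: page on-call
    if pct >= 80:
        return "notify"  # warning threshold from the bullets above
    return "ignore"
```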

High availability

Enterprise plan: multi-region deployment with automatic failover, RPO (recovery point objective) under 1 minute, RTO (recovery time objective) under 5 minutes. Contact sales for details.