Configure horizontal pod autoscaling to automatically adjust the number of replicas based on resource utilization.

Field Reference

| Field | Type | Description |
| --- | --- | --- |
| `enabled` | boolean | Enable autoscaling |
| `minInstances` | integer | Minimum number of replicas |
| `maxInstances` | integer | Maximum number of replicas |
| `cpuThresholdPercent` | integer | CPU usage threshold (0-100) |
| `memoryThresholdPercent` | integer | Memory usage threshold (0-100) |

Basic Configuration

```yaml
services:
  - name: api
    # ...
    autoscaling:
      enabled: true
      minInstances: 2
      maxInstances: 10
      cpuThresholdPercent: 80
      memoryThresholdPercent: 80
```
When autoscaling is enabled, the `instances` field is ignored; the autoscaler manages replica count automatically.
For high availability, set `minInstances` to at least 3. See High Availability Applications for more details.

How It Works

When either CPU or memory usage exceeds your configured threshold, Porter automatically adds replicas. When usage drops, replicas are removed (down to your minimum).
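Threshold-based autoscalers commonly compute the target replica count with a proportional formula, as in Kubernetes' Horizontal Pod Autoscaler: scale the current count by the ratio of observed usage to the threshold, then clamp to the configured bounds. Porter's exact algorithm isn't specified here, so the following is an illustrative sketch, not the actual implementation:

```python
import math

def desired_replicas(current, usage_pct, threshold_pct, min_instances, max_instances):
    """Illustrative HPA-style proportional formula (assumption, not Porter's spec):
    grow the replica count with observed usage / threshold, clamped to bounds."""
    desired = math.ceil(current * usage_pct / threshold_pct)
    return max(min_instances, min(max_instances, desired))

# 4 replicas averaging 90% CPU against a 60% threshold -> target 6 replicas
print(desired_replicas(4, 90, 60, 2, 10))
```

Note how the clamp keeps the result inside `minInstances`/`maxInstances` even when usage is far above or below the threshold.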

Example: Autoscaling in Action

Consider an API service with this configuration:
```yaml
autoscaling:
  enabled: true
  minInstances: 2
  maxInstances: 10
  cpuThresholdPercent: 60
  memoryThresholdPercent: 80
```
Here’s how the autoscaler responds to changing load:
| Time | Avg CPU | Avg Memory | Replicas | What Happens |
| --- | --- | --- | --- | --- |
| t=0 | 30% | 40% | 2 | Baseline: both metrics below thresholds |
| t=1 | 75% | 50% | 4 | CPU (75%) exceeds 60% threshold → scale up |
| t=2 | 90% | 60% | 6 | CPU still high → continue scaling up |
| t=3 | 55% | 85% | 8 | CPU stabilized, but memory (85%) exceeds 80% → scale up |
| t=4 | 45% | 70% | 8 | Both metrics below thresholds → no change (cooldown period) |
| t=5 | 40% | 50% | 5 | Sustained low usage → scale down |
| t=6 | 35% | 45% | 2 | Continue scaling down to minimum |
Key behaviors:
  • Either metric triggers scaling: If CPU or memory exceeds its threshold, replicas are added
  • Both must be low to scale down: Replicas are only removed when both CPU and memory are below their thresholds
  • Respects bounds: Replicas never drop below minInstances (2) or exceed maxInstances (10)
  • Gradual changes: The autoscaler adjusts incrementally, not all at once, to avoid oscillation
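The either-up/both-down rule can be sketched as a small decision function. This is illustrative only; the default threshold values below come from the example configuration above:

```python
def scaling_direction(cpu_pct, mem_pct, cpu_threshold=60, mem_threshold=80):
    """Return +1 to add replicas, or -1 when eligible to remove replicas
    (actual removal also waits out the cooldown period)."""
    if cpu_pct > cpu_threshold or mem_pct > mem_threshold:
        return +1   # either metric over its threshold -> scale up
    return -1       # both metrics below thresholds -> eligible to scale down

# t=1 in the table above: CPU at 75% exceeds the 60% threshold -> scale up
print(scaling_direction(75, 50))
```

At t=3 in the table, CPU has recovered (55%) but memory (85%) still trips its threshold, so the function returns +1, matching the table's scale-up step.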

Custom Metrics Autoscaling (Prometheus)

Scale based on application-specific metrics like queue length, request latency, or custom business metrics.
| Field | Type | Description |
| --- | --- | --- |
| `customAutoscaling.prometheusMetricCustomAutoscaling.metricName` | string | Prometheus metric name |
| `customAutoscaling.prometheusMetricCustomAutoscaling.threshold` | number | Threshold value to trigger scaling |
| `customAutoscaling.prometheusMetricCustomAutoscaling.query` | string | Custom PromQL query (optional; defaults to the metric name) |
```yaml
services:
  - name: api
    # ...
    autoscaling:
      enabled: true
      minInstances: 1
      maxInstances: 10
      customAutoscaling:
        prometheusMetricCustomAutoscaling:
          metricName: "http_requests_per_second"
          threshold: 100
          query: "rate(http_requests_total[5m])"
```
Custom metrics autoscaling requires Prometheus to be accessible in your cluster. See Custom Metrics and Autoscaling for setup details.
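Whether the threshold is compared per replica or against the aggregate metric is an implementation detail not spelled out here. The sketch below assumes a per-replica target (similar to how Kubernetes handles external metrics) and is illustrative only:

```python
import math

def desired_from_metric(total_metric_value, threshold_per_replica,
                        min_instances, max_instances):
    """Illustrative (assumption, not Porter's spec): target enough replicas
    that each handles at most `threshold_per_replica` of the aggregate metric."""
    desired = math.ceil(total_metric_value / threshold_per_replica)
    return max(min_instances, min(max_instances, desired))

# 800 req/s total against a threshold of 100 req/s per replica -> 8 replicas
print(desired_from_metric(800, 100, 1, 10))
```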

Temporal Autoscaling

Scale Temporal workflow workers based on task queue depth. Porter monitors your Temporal task queues and automatically adjusts worker count.
Temporal autoscaling requires a Temporal integration to be configured. See Temporal Autoscaling for setup details.
| Field | Type | Description |
| --- | --- | --- |
| `temporalAutoscaling.temporalIntegrationId` | string | UUID of the Temporal integration |
| `temporalAutoscaling.taskQueue` | string | Name of the Temporal task queue to monitor |
| `temporalAutoscaling.targetQueueSize` | integer | Number of queued tasks each replica should handle (e.g., a target of 10 with 100 tasks queued → 10 replicas) |
```yaml
services:
  - name: temporal-worker
    # ...
    autoscaling:
      enabled: true
      minInstances: 2
      maxInstances: 50
      temporalAutoscaling:
        temporalIntegrationId: "550e8400-e29b-41d4-a716-446655440000"
        taskQueue: "my-task-queue"
        targetQueueSize: 10
```
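The `targetQueueSize` semantics described above imply a simple ratio: replicas ≈ queued tasks ÷ target queue size, clamped to the configured bounds. A minimal sketch of that arithmetic (illustrative, not Porter's implementation):

```python
import math

def temporal_worker_count(queue_depth, target_queue_size,
                          min_instances, max_instances):
    """One replica per `target_queue_size` queued tasks, clamped to bounds."""
    desired = math.ceil(queue_depth / target_queue_size)
    return max(min_instances, min(max_instances, desired))

# The example above: targetQueueSize 10 with 100 tasks queued -> 10 replicas
print(temporal_worker_count(100, 10, 2, 50))
```

With an empty queue the result clamps to `minInstances` (2), and a very deep queue clamps to `maxInstances` (50).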