Replicas & autoscaling

A Replica is a single running instance of a service's promoted deployment. Each replica is isolated in its own namespaces, cgroup, and overlay rootfs, and is reachable on a private Unix socket. See Runtime isolation for the mechanics.

Autoscaling

The autoscaler (ADR-018) runs a periodic control loop that reads each service's cgroup CPU and memory usage and decides whether to add or drain replicas. It supports scale-to-zero with a single-flight cold start: only one replica is started at a time. Scaling is bounded by a host resource ledger that reserves headroom so the node is never over-committed.
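To make the interaction between the single-flight cold start and the ledger concrete, here is a minimal sketch. It is not the ADR-018 implementation: `HostLedger`, `ColdStartGate`, their methods, and the per-replica resource figures are hypothetical names chosen for illustration, and the admission rule (committed resources must stay within capacity minus headroom) is an assumption about how a ledger of this kind could behave.

```python
from dataclasses import dataclass


@dataclass
class HostLedger:
    """Hypothetical node ledger: capacity minus reserved headroom is schedulable."""

    capacity_cpu_millis: int
    capacity_mem_bytes: int
    headroom_cpu_millis: int
    headroom_mem_bytes: int
    committed_cpu_millis: int = 0
    committed_mem_bytes: int = 0

    def try_reserve(self, cpu_millis: int, mem_bytes: int) -> bool:
        """Commit resources for one replica, or refuse if headroom would be consumed."""
        cpu_limit = self.capacity_cpu_millis - self.headroom_cpu_millis
        mem_limit = self.capacity_mem_bytes - self.headroom_mem_bytes
        if self.committed_cpu_millis + cpu_millis > cpu_limit:
            return False
        if self.committed_mem_bytes + mem_bytes > mem_limit:
            return False
        self.committed_cpu_millis += cpu_millis
        self.committed_mem_bytes += mem_bytes
        return True


class ColdStartGate:
    """Single-flight gate for one scaled-to-zero service (illustrative only)."""

    def __init__(self, ledger: HostLedger, cpu_millis: int, mem_bytes: int) -> None:
        self.ledger = ledger
        self.cpu_millis = cpu_millis
        self.mem_bytes = mem_bytes
        self.ready_replicas = 0
        self.start_in_flight = False

    def on_request(self) -> str:
        """Decide what to do with an incoming request."""
        if self.ready_replicas > 0:
            return "route"   # a replica is already serving
        if self.start_in_flight:
            return "wait"    # join the cold start that is already running
        if not self.ledger.try_reserve(self.cpu_millis, self.mem_bytes):
            return "reject"  # starting a replica would eat into the headroom
        self.start_in_flight = True
        return "start"       # this caller triggers the one and only start

    def on_replica_ready(self) -> None:
        """Called once the started replica is ready to serve."""
        self.start_in_flight = False
        self.ready_replicas += 1
```

In this sketch the first request to an idle service returns `"start"`, concurrent requests return `"wait"` until `on_replica_ready()` is called, and a service whose replica does not fit within the schedulable capacity gets `"reject"` without changing the ledger.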

An AutoscalePolicy on a service carries:

| Field | Meaning |
| --- | --- |
| min_replicas / max_replicas | Bounds; min_replicas: 0 enables scale-to-zero |
| target_cpu_pct / target_mem_pct | Utilization targets that drive scale-up |
| scale_down_cooldown_s | Wait before draining after load drops |
| idle_timeout_s | Idle time before scaling to zero |
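As a rough illustration of how these fields could combine into a single decision per tick, the sketch below models the policy as a dataclass and derives a desired replica count. Only the field names come from the table above. The `decide` function, its parameters, the one-replica-per-tick step, and the exact ordering of the rules are assumptions for illustration and may differ from what ADR-018 specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalePolicy:
    min_replicas: int
    max_replicas: int
    target_cpu_pct: float
    target_mem_pct: float
    scale_down_cooldown_s: float
    idle_timeout_s: float

    def __post_init__(self) -> None:
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")


def decide(policy: AutoscalePolicy, replicas: int, cpu_pct: float, mem_pct: float,
           since_load_drop_s: float, idle_s: float) -> int:
    """Return the desired replica count for one tick (hypothetical rule set)."""
    if replicas == 0:
        # Waking from zero is request-driven (cold start), not decided here.
        return 0
    if cpu_pct > policy.target_cpu_pct or mem_pct > policy.target_mem_pct:
        # Either utilization target exceeded: add one replica, up to the bound.
        return min(replicas + 1, policy.max_replicas)
    if policy.min_replicas == 0 and idle_s >= policy.idle_timeout_s:
        # Scale-to-zero is enabled and the service has been idle long enough.
        return 0
    floor = max(policy.min_replicas, 1)  # only the idle timeout may reach zero
    if since_load_drop_s >= policy.scale_down_cooldown_s and replicas > floor:
        # Load has stayed low past the cooldown: drain one replica.
        return replicas - 1
    return replicas
```

Separating the cooldown drain from the idle timeout keeps a briefly quiet service at one replica, so it does not pay a cold start until it has been idle for the full `idle_timeout_s`.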

Tuning the node ledger

The control-loop interval and host-wide headroom are configured with environment variables (see Configuration):

  • DENIA_AUTOSCALE_INTERVAL_S (default 15) — control-loop tick.
  • DENIA_AUTOSCALE_HEADROOM_CPU_MILLIS (default 1000) — reserved CPU.
  • DENIA_AUTOSCALE_HEADROOM_MEM_BYTES (default 512 MiB) — reserved memory.
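The sketch below shows one way a process might read these three variables and fall back to the documented defaults. The variable names and defaults come from the list above. `LedgerConfig` and `load_ledger_config` are hypothetical names, and the sketch assumes each value is a plain decimal integer (for example, memory given in bytes with no unit suffix), which the source does not confirm.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LedgerConfig:
    interval_s: int = 15                            # control-loop tick
    headroom_cpu_millis: int = 1000                 # reserved CPU
    headroom_mem_bytes: int = 512 * 1024 * 1024     # reserved memory (512 MiB)


def load_ledger_config(env: Mapping[str, str] = os.environ) -> LedgerConfig:
    """Read the autoscaler settings, using the documented defaults when unset."""
    defaults = LedgerConfig()

    def read_int(name: str, default: int) -> int:
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            return default
        return int(raw)  # assumption: plain integers only; raises ValueError otherwise

    cfg = LedgerConfig(
        interval_s=read_int("DENIA_AUTOSCALE_INTERVAL_S", defaults.interval_s),
        headroom_cpu_millis=read_int(
            "DENIA_AUTOSCALE_HEADROOM_CPU_MILLIS", defaults.headroom_cpu_millis),
        headroom_mem_bytes=read_int(
            "DENIA_AUTOSCALE_HEADROOM_MEM_BYTES", defaults.headroom_mem_bytes),
    )
    if cfg.interval_s <= 0:
        raise ValueError("DENIA_AUTOSCALE_INTERVAL_S must be positive")
    if cfg.headroom_cpu_millis < 0 or cfg.headroom_mem_bytes < 0:
        raise ValueError("headroom values must not be negative")
    return cfg
```

Passing the environment as a mapping keeps the function easy to test: `load_ledger_config({})` yields the defaults, and `load_ledger_config({"DENIA_AUTOSCALE_INTERVAL_S": "5"})` overrides only the tick interval.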

Ownership of replicas is handed off explicitly from the deploy path to the autoscaler, so a new deployment and the autoscaler never fight over the same replica (ADR-028).