Alethia Labs

Runner Scaling

How Runners scale from zero to one when jobs arrive and back to zero when idle.

Runners run on AWS ECS Fargate with scale-to-zero enabled by default. When a job is queued, Console immediately invokes a Lambda scaler via its Function URL to start a Runner container. An EventBridge rule also triggers the scaler every minute as a fallback. When no jobs remain, the same Lambda scales the service back to zero after five consecutive idle checks (about 5 minutes).

[Figure: Runner Scaling Architecture]

Why Scale-to-Zero

Each Runner container ships Terraform, kubectl, Helm, AWS CLI, Google Cloud SDK, Azure CLI, and Infracost, roughly 500 MB of tooling in total. Running these containers 24/7 across multiple regions is expensive, yet infrastructure provisioning happens in bursts: a user clicks "Apply," the Runner works for 5–15 minutes, then sits idle for hours.

Scale-to-zero eliminates idle costs entirely. The platform only pays for compute when jobs are actually running.

Architecture

Four components work together:

Component | Role
Console | Invokes the Lambda scaler instantly via Function URL when a job is queued
Lambda scaler | Queries Supabase for queued jobs, adjusts ECS service desired count
EventBridge rule | Triggers the Lambda every 1 minute as a fallback (also drives scale-down)
ECS Fargate | Runs Runner containers, pulls image from GHCR on scale-up

The Lambda function runs in eu-west-1 but manages ECS services across multiple regions (currently eu-west-1 and eu-central-1). Each Runner deployment gets its own ECS cluster, service, VPC, and secrets.

Scale-Up Flow

Job Created

A user clicks "Plan," "Apply," or "Destroy" in Console (or runs alethia plan / alethia apply / alethia destroy). A provision_jobs row is inserted with status QUEUED.

Console Notifies the Scaler

Immediately after inserting the job, Console sends a fire-and-forget POST to the Lambda scaler's Function URL. The Lambda queries Supabase's REST API for the number of QUEUED jobs, reading the total from the Content-Range response header. If the direct call fails, the next scheduled EventBridge invocation picks up the queued job within 60 seconds.
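As a rough illustration of the count lookup, the sketch below parses a Content-Range value of the kind PostgREST (the layer behind Supabase's REST API) returns when an exact count is requested, for example 0-0/5 or */0. The function name is hypothetical, and how the scaler actually issues the request is not described on this page.

```python
def parse_queued_count(content_range: str) -> int:
    """Return the total row count from a Content-Range value.

    Accepts values such as '0-0/5' (rows returned, total 5) or '*/0'
    (no rows, total 0). Raises ValueError when the total is missing or
    unknown ('0-24/*'), so the caller can fall back to the next check.
    """
    # Everything after the last '/' is the total; the range part is ignored.
    _, sep, total = content_range.strip().rpartition("/")
    if not sep or not total.isdigit():
        raise ValueError(f"no usable total in Content-Range: {content_range!r}")
    return int(total)
```

Raising on an unknown total, rather than treating it as zero, avoids counting a failed lookup as an idle check; whether the real scaler behaves this way is an assumption.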

ECS Scales Up

If queued jobs exist and the service's desiredCount is 0, the Lambda calls ecs:UpdateService to set desiredCount to 1.
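The scale-up rule can be sketched as a small function. This is a minimal illustration rather than the scaler's actual code: the function name and parameters are hypothetical, and ecs is assumed to be a boto3 ECS client (or any object exposing the same update_service method).

```python
from typing import Any


def scale_up_if_needed(
    ecs: Any, cluster: str, service: str, queued: int, desired_count: int
) -> bool:
    """Set desiredCount to 1 when jobs are queued and the service is at zero.

    desired_count is assumed to have been read beforehand, for example from
    ecs.describe_services(...)["services"][0]["desiredCount"].
    Returns True if an update was issued.
    """
    if queued > 0 and desired_count == 0:
        # Maps to the ecs:UpdateService action.
        ecs.update_service(cluster=cluster, service=service, desiredCount=1)
        return True
    # Nothing queued, or a Runner is already requested: leave the service alone.
    return False
```

Because the update is only issued when desiredCount is 0, a direct Console call and an EventBridge tick arriving together result in at most one Runner being requested.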

Container Starts

ECS Fargate pulls the Runner Docker image from GHCR (ghcr.io/bobikenobi12/runner:latest), injects secrets from AWS Secrets Manager, and starts the container. This takes roughly 30–60 seconds depending on image cache state.

Runner Claims Job

The Runner authenticates with Console using its worker token, enters its poll loop, and claims the queued job atomically via FOR UPDATE SKIP LOCKED. See Job Queue Pattern for claiming details.
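For readers unfamiliar with FOR UPDATE SKIP LOCKED, the sketch below shows the general shape of an atomic claim in PostgreSQL. It is an illustration only: the provision_jobs table and the QUEUED and PROCESSING statuses come from this page, while the id, worker_id, and created_at columns, the function name, and the use of a DB-API cursor with %s placeholders are assumptions. Where the claim actually executes (Console or elsewhere) is covered in Job Queue Pattern.

```python
from typing import Any, Optional

# Hypothetical claim statement: column names are assumptions.
CLAIM_SQL = """
UPDATE provision_jobs
SET status = 'PROCESSING', worker_id = %s
WHERE id = (
    SELECT id
    FROM provision_jobs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id
"""


def claim_next_job(cursor: Any, worker_id: str) -> Optional[Any]:
    """Claim the oldest queued job, or return None if none is available.

    Rows locked by another claimer are skipped instead of waited on, so two
    Runners polling at the same moment never receive the same job.
    """
    cursor.execute(CLAIM_SQL, (worker_id,))
    row = cursor.fetchone()
    return row[0] if row else None
```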

Cold start latency: ~30–60 seconds from job creation to job claimed, dominated by ECS task startup (image pull + container init). The Lambda is invoked instantly by Console, so there is no polling delay. Subsequent jobs while the Runner is already running are claimed within seconds.

Scale-Down Flow

When the Lambda detects zero queued jobs but the ECS service is running (desiredCount > 0), it increments an idle counter for that service:

Check 1: 0 queued, 1 running → idle 1/5
Check 2: 0 queued, 1 running → idle 2/5
Check 3: 0 queued, 1 running → idle 3/5
Check 4: 0 queued, 1 running → idle 4/5
Check 5: 0 queued, 1 running → scale DOWN to 0

After 5 consecutive idle checks (5 minutes), the Lambda sets desiredCount to 0. ECS drains the running task and the Runner shuts down gracefully.

If a new job arrives during the idle countdown, the counter resets to zero immediately.
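The idle countdown can be modelled as a pure state transition, sketched below. The function and constant names are hypothetical, and where the scaler stores the counter between invocations is not stated on this page, so the sketch simply takes the previous value as an argument.

```python
from typing import Tuple

IDLE_THRESHOLD = 5  # consecutive idle checks before scaling down


def next_state(queued: int, desired_count: int, idle_checks: int) -> Tuple[int, int]:
    """Return (new_desired_count, new_idle_checks) for one scaler check."""
    if queued > 0:
        # Work is waiting: make sure a Runner is requested and reset the counter.
        return max(desired_count, 1), 0
    if desired_count == 0:
        # Already at zero: nothing to count.
        return 0, 0
    idle_checks += 1
    if idle_checks >= IDLE_THRESHOLD:
        # Fifth consecutive idle check: scale down and start over.
        return 0, 0
    return desired_count, idle_checks


# Replay the trace above: five idle checks against one running task.
desired, idle = 1, 0
for _ in range(5):
    desired, idle = next_state(0, desired, idle)
```

After the loop, desired is 0, matching the fifth line of the trace; a queued job at any earlier check would have reset idle to 0.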

Heartbeat Monitoring

While running, a Runner sends a heartbeat to Console every 30 seconds via POST /api/workers/heartbeat. Each heartbeat includes the Runner's binary version and resets its status to ONLINE.

If no heartbeat arrives for 60 seconds, Console marks the Runner as OFFLINE. Jobs stuck in PROCESSING with no log activity for 5 minutes are marked FAILED; see Failure Recovery for details.
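The two timeouts can be expressed as simple timestamp comparisons. The sketch below is illustrative: the function names are hypothetical, and treating a gap of exactly 60 seconds or exactly 5 minutes as expired is an assumption about boundary behaviour.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_TIMEOUT = timedelta(seconds=60)  # Runner marked OFFLINE after this
STUCK_JOB_TIMEOUT = timedelta(minutes=5)   # PROCESSING job marked FAILED after this


def runner_status(last_heartbeat: datetime, now: datetime) -> str:
    """Return 'ONLINE' or 'OFFLINE' based on the age of the last heartbeat."""
    return "OFFLINE" if now - last_heartbeat >= HEARTBEAT_TIMEOUT else "ONLINE"


def job_is_stuck(status: str, last_log_at: datetime, now: datetime) -> bool:
    """True when a PROCESSING job has produced no log activity for 5 minutes."""
    return status == "PROCESSING" and now - last_log_at >= STUCK_JOB_TIMEOUT


# Example: a Runner whose last heartbeat arrived 90 seconds ago.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status = runner_status(now - timedelta(seconds=90), now)
```

With heartbeats sent every 30 seconds, a 60-second timeout tolerates one missed heartbeat before the Runner is shown as OFFLINE.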
