Operating the AWS agent

Procedures that apply to a formae agent deployed on AWS, regardless of whether you followed the ECS Express or Standard ECS walkthrough.

Updating the agent

Roll the agent to a newer image tag by registering a new task definition revision and forcing a service deployment. Find the target tag at GitHub Releases.

NEW_TAG=<new-version>

# Clone the current task def, swap the image, register a new revision.
aws ecs describe-task-definition --task-definition formae-agent \
  --query 'taskDefinition' \
  | jq --arg image "ghcr.io/platform-engineering-labs/formae:$NEW_TAG" '
      del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
          .compatibilities, .registeredAt, .registeredBy)
      | .containerDefinitions[0].image = $image
    ' > /tmp/agent-td.json

NEW_REV=$(aws ecs register-task-definition \
  --cli-input-json file:///tmp/agent-td.json \
  --query 'taskDefinition.revision' --output text)

# Force a new deployment and wait for it to stabilise.
aws ecs update-service \
  --service formae-agent \
  --task-definition formae-agent:$NEW_REV \
  --force-new-deployment

aws ecs wait services-stable --services formae-agent
rm /tmp/agent-td.json

For Standard ECS deployments, also pass --cluster formae to the update-service and wait calls (Express uses the default cluster).
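
One way to avoid editing each command by hand is to keep the cluster name in a variable and let a small helper emit the flag. The sketch below is illustrative rather than part of formae: `CLUSTER`, `cluster_flag`, and `agent_deployments` are names made up here. It also prints each deployment's rollout state, which can help confirm that the new revision has taken over.

```shell
# CLUSTER is an assumed variable: set it to "formae" for Standard ECS,
# leave it empty for ECS Express (default cluster).
CLUSTER="${CLUSTER:-}"

# Print "--cluster <name>" when CLUSTER is set, nothing otherwise.
cluster_flag() {
  if [ -n "$CLUSTER" ]; then
    printf '%s %s' --cluster "$CLUSTER"
  fi
}

# Show status, rollout state, and task definition for each deployment.
agent_deployments() {
  # $(cluster_flag) is intentionally unquoted so it splits into two words.
  aws ecs describe-services $(cluster_flag) --services formae-agent \
    --query 'services[0].deployments[].[status,rolloutState,taskDefinition]' \
    --output text
}
```

With `CLUSTER=formae` the call targets the Standard ECS cluster; with `CLUSTER` unset it falls through to the default cluster that Express uses.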

If you're running a custom image (e.g. with the RDS CA bundle or extra plugins), substitute your registry path for the --arg image value above.

Important: The CLI enforces a strict version match against the agent. After rolling the agent, install the matching CLI version on your local machine:

formae upgrade --version <new-version>

Rolling back

To roll back, list the most recent task definition revisions, then point the service at the previous one:

aws ecs list-task-definitions --family-prefix formae-agent --sort DESC --max-items 5

aws ecs update-service --service formae-agent \
  --task-definition formae-agent:<previous-revision> --force-new-deployment

aws ecs wait services-stable --services formae-agent
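
If you would rather not read the revision number off the list, it can be derived from the task definition the service is currently running. This is a sketch under one loud assumption: that the revision numbered one below the current one is the revision you want and is still active, which does not hold if revisions were skipped or deregistered. `previous_revision` and `rollback_agent` are hypothetical helper names.

```shell
# Turn a task definition ARN (or family:revision) into "<family>:<revision - 1>".
previous_revision() {
  td="${1##*/}"        # strip the ARN prefix, e.g. formae-agent:42
  family="${td%:*}"    # formae-agent
  rev="${td##*:}"      # 42
  if [ "$rev" -le 1 ]; then
    echo "no earlier revision than $td" >&2
    return 1
  fi
  printf '%s:%s\n' "$family" "$((rev - 1))"
}

# Look up the running task definition and redeploy the one before it.
# Express shown; add --cluster formae to both calls for Standard ECS.
rollback_agent() {
  current=$(aws ecs describe-services --services formae-agent \
    --query 'services[0].taskDefinition' --output text) || return 1
  target=$(previous_revision "$current") || return 1
  aws ecs update-service --service formae-agent \
    --task-definition "$target" --force-new-deployment
}
```

Follow `rollback_agent` with the same `aws ecs wait services-stable` call shown above.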

ECS Express services keep the canary-deployment rollback alarm enabled by default, so failed deployments roll back automatically. For Standard ECS, configure deployment circuit breakers on the service if you want similar behaviour.
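
For Standard ECS, the circuit breaker can be switched on with a single `update-service` call. This is a minimal sketch, assuming the service and cluster names used elsewhere on this page; `enable_circuit_breaker` is a hypothetical wrapper name, and you may want to review the service's other deployment settings at the same time.

```shell
# Enable the ECS deployment circuit breaker with automatic rollback
# on the Standard ECS service (cluster "formae", service "formae-agent").
enable_circuit_breaker() {
  aws ecs update-service \
    --cluster formae \
    --service formae-agent \
    --deployment-configuration \
      'deploymentCircuitBreaker={enable=true,rollback=true}'
}
```

With `rollback=true`, ECS returns the service to the last deployment that completed successfully when a new deployment fails to reach a steady state.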

Custom images with extra plugins

The stock formae image ships with the standard plugin metapackage: AWS, Azure, GCP, OCI, OVH, and the auth-basic schema. Most real deployments need plugins beyond this set, such as Grafana for observability, Kubernetes, or your own custom plugins.

To add plugins, build a custom image FROM the stock formae image and run the install script for each additional plugin:

FROM ghcr.io/platform-engineering-labs/formae:<version>

USER root
RUN apt-get update && apt-get install -y --no-install-recommends jq curl && \
    HOME=/home/pel /bin/bash -e -c "$(curl -fsSL https://hub.platform.engineering/get/setup.sh)" \
      -- install --yes --channel stable grafana@<plugin-version> && \
    HOME=/home/pel /bin/bash -e -c "$(curl -fsSL https://hub.platform.engineering/get/setup.sh)" \
      -- install --yes --channel stable k8s@<plugin-version> && \
    apt-get remove -y jq curl && apt-get autoremove -y --purge && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
USER pel

jq and curl are required by the install script but aren't kept in the final image. All plugin installs run in the same RUN block so those tools are present when needed and removed at the end. Pin specific plugin versions for reproducibility.

If you're also baking in the RDS Global CA bundle for Standard ECS, combine both patterns in a single Dockerfile.

Build and push to your registry, then reference the custom image in the task definition. Roll the agent via the Updating the agent procedure above.
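
The build-and-push step might look like the sketch below. The repository name `formae-agent-custom` and the helper `build_and_push` are placeholders invented for illustration; the registry host is whatever you already use, and the function assumes you are logged in to it and are running from the directory that holds the Dockerfile.

```shell
# Build the custom image from the Dockerfile in the current directory,
# push it, and print the full image reference on success.
#   $1 = registry host/path (placeholder), $2 = image tag
build_and_push() {
  registry="$1"
  version="$2"
  image="$registry/formae-agent-custom:$version"
  # --pull refreshes the base image named in FROM before building.
  docker build --pull -t "$image" . >&2 &&
    docker push "$image" >&2 &&
    printf '%s\n' "$image"
}
```

The printed reference is the value to substitute for the `--arg image` argument when you roll the agent.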

Aurora master credential rotation

If Aurora is the agent's datastore (Data API or direct PostgreSQL), note that Aurora's managed master credential rotation (the default 30-day cycle) updates the master password on the cluster but does not update the agent's formae-config Secrets Manager secret, which holds the connection string. The agent will fail to connect after the next rotation.

The simplest mitigation is to disable managed rotation on the cluster's master credential and rotate manually when you choose to:

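aws rds modify-db-cluster \
  --db-cluster-identifier formae-db \
  --no-manage-master-user-password \
  --master-user-password '<new-master-password>' \
  --apply-immediately

Turning managed rotation off requires a new master password in the same call, passed with --master-user-password; RDS then deletes the managed secret. Update the connection string in the formae-config secret to match, and keep supplying --master-user-password whenever you change the password later.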

In theory you could keep rotation enabled and wire a Lambda that re-renders the formae-config secret whenever the cluster's master credential rotates. We don't currently ship a reference implementation; disabling rotation is the practical recommendation.

Either way: address this before the 30-day mark — discovering it after the first rotation is no fun.
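
To see whether a cluster is affected and when the next rotation is due, you can ask RDS for the managed secret and then ask Secrets Manager for its rotation schedule. This is a sketch, assuming the `formae-db` cluster identifier used above; `master_secret_arn` and `rotation_status` are hypothetical helper names.

```shell
# Print the ARN of the managed master secret, or "None" when the
# master credential is not managed by Secrets Manager.
#   $1 = cluster identifier (defaults to formae-db)
master_secret_arn() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "${1:-formae-db}" \
    --query 'DBClusters[0].MasterUserSecret.SecretArn' \
    --output text
}

# Show whether rotation is enabled, its rules, and the next rotation date.
rotation_status() {
  arn=$(master_secret_arn "$1") || return 1
  if [ "$arn" = "None" ]; then
    echo "master credential is not managed by Secrets Manager"
    return 0
  fi
  aws secretsmanager describe-secret --secret-id "$arn" \
    --query '{enabled: RotationEnabled, rules: RotationRules, next: NextRotationDate}'
}
```

Checking the reported schedule on your own cluster is worthwhile, since it tells you how long you have before the first rotation.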

Backups

The agent itself is stateless; everything recoverable lives in the datastore. Aurora's default automated backups (1-day retention, point-in-time recovery within the retention window) cover the formae state. For longer retention, either raise the cluster's --backup-retention-period (up to 35 days) or run scheduled aws rds create-db-cluster-snapshot jobs.
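
Both options can be sketched as small wrappers around the AWS CLI. The function names and the `formae-db-manual-` snapshot prefix are assumptions chosen for illustration; the cluster identifier matches the one used in the rotation section. Scheduling the snapshot function (cron, EventBridge, or similar) is left to your environment.

```shell
# Raise automated backup retention on the cluster.
#   $1 = retention in days
set_backup_retention() {
  aws rds modify-db-cluster \
    --db-cluster-identifier formae-db \
    --backup-retention-period "$1" \
    --apply-immediately
}

# Take a manual cluster snapshot with a UTC-timestamped identifier.
snapshot_agent_db() {
  stamp=$(date -u +%Y%m%d-%H%M%S)
  aws rds create-db-cluster-snapshot \
    --db-cluster-identifier formae-db \
    --db-cluster-snapshot-identifier "formae-db-manual-$stamp"
}
```

For example, `set_backup_retention 14` keeps two weeks of point-in-time recovery. Manual snapshots are kept until you delete them, so a scheduled job should also prune old ones.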