Emergencies and incidents

Something's broken. Fix it fast.

Scenarios covered:

I want to make a quick fix during an incident
I want to understand why an operation failed

I want to make a quick fix during an incident

When to use this

Use this workflow when you:

Need to apply an urgent patch to a live resource with minimal blast radius — a security group is missing an ingress rule, a config value is wrong, a resource needs to be added immediately
Want to make a targeted change without risking the rest of your infrastructure
Made a hotfix via the console and want to capture it in code

Patch mode is your friend here. It creates and updates only — never destroys. Minimal blast radius, maximum speed.

What you'll do

Assess the situation — check what's currently deployed
Write a surgical forma — target exactly what needs to change
Apply with patch mode — fast, additive, safe
Verify the fix — confirm the change landed

Step-by-step

Before you begin: review the baseline prerequisites and make sure you know the stack and target for the affected resources.

Step 1: Assess what's deployed

Check the current state of the resources you need to fix:

formae inventory resources --query="stack:production"

Narrow it down:

formae inventory resources --query="stack:production type:AWS::EC2::SecurityGroup"

If you need the full properties of a specific resource, extract it:

formae extract --query="stack:production label:api-sg" current-state.pkl

This gives you a Pkl file with the resource's current configuration — a useful starting point for your fix.

Step 2: Write the fix

Create a minimal forma that targets only what needs to change. You don't need to include every resource in the stack — just the ones you're fixing.

Example: Open an emergency port on a security group

amends "@formae/forma.pkl"
import "@formae/formae.pkl"
import "@aws/aws.pkl"
import "@aws/ec2/securitygroup.pkl"

forma {
  new formae.Stack {
    label = "production"
  }

  new formae.Target {
    label = "prod-target"
    config = new aws.Config {
      region = "us-east-1"
    }
  }

  new securitygroup.SecurityGroup {
    label = "api-sg"
    groupDescription = "API security group"
    ingress {
      new {
        fromPort = 443
        toPort = 443
        protocol = "tcp"
        cidrBlocks { "0.0.0.0/0" }
      }
      new {
        fromPort = 8080
        toPort = 8080
        protocol = "tcp"
        cidrBlocks { "10.0.0.0/8" }  // Emergency: allow internal health checks
      }
    }
  }
}

Note: This example uses AWS. The same workflow applies to any technology that has a formae plugin.

Tip: The file doesn't need to live in your repo. Create it anywhere — /tmp/hotfix.pkl works fine. You can formalize it later.

Step 3: Simulate first (if you have 30 seconds)

Even under pressure, a quick simulation can prevent making things worse:

formae apply --mode patch --simulate hotfix.pkl

This runs instantly — no cloud API calls. It shows exactly what will be created or updated.

Step 4: Apply the fix

formae apply --mode patch --watch hotfix.pkl

If you're confident and want to skip the confirmation prompt:

formae apply --mode patch --watch --yes hotfix.pkl

If the change conflicts with existing state

If formae detects a conflict (someone else changed the resource since the last sync), use --force:

formae apply --mode patch --watch --force hotfix.pkl

--force tells formae to overwrite the current state with your forma, even if external changes were detected.

Step 5: Verify the fix

Confirm the change is live:

formae inventory resources --query="stack:production label:api-sg"

Or check the command status:

formae status command --output-layout=detailed

Step 6: Stop a runaway command (if needed)

If you applied something and it's making things worse:

formae cancel

This cancels the most recent in-progress command. Operations already completed won't be rolled back, but pending operations will stop.

Tips + gotchas

Tip	Details
Temporary files are fine	Write the fix in `/tmp`, apply it, formalize later. Don't let Git workflows slow you down during an incident.

Common gotcha: If you apply a patch fix and later run formae apply --mode reconcile with an older forma that doesn't include your fix, reconcile will revert the emergency change. Always merge emergency fixes into your committed code before the next reconcile.

What's next

Goal	Guide
Detect out-of-band changes automatically	Core Concepts → Synchronization
Cancel a running command	CLI → Cancel
Monitor command history	CLI → Status

I want to understand why an operation failed

When to use this

Use this workflow when you:

An apply or destroy returned a failure and you need to figure out what went wrong
A pipeline is failing and the summary output doesn't tell you enough
Want to see which specific resource caused the problem

What you'll do

Check recent command status — find the failed command
Switch to detailed output — see per-resource results
Dig into the specific failure — isolate the root cause

Step-by-step

Before you begin: review the baseline prerequisites.

Step 1: Check recent commands

See your most recent command status:

formae status command

This shows a summary view — command ID, status, start time.

Step 2: Find failed commands

Filter to just failures:

formae status command --query="status:Failed"

To see only your own commands (not other team members'):

formae status command --query="client:me status:Failed"

Step 3: Get the detailed breakdown

Switch to the detailed layout to see individual resource update states:

formae status command --output-layout=detailed

This shows a tree view with each resource update and its status. Failed entries stand out — look for them to find which resource caused the problem.

For a specific command by ID:

formae status command --query="id:01JXYZABC1234567890" --output-layout=detailed

Step 4: Get machine-readable output for scripts

In a pipeline, parse the failure programmatically:

formae status command --query="status:Failed" \
  --output-consumer=machine \
  --output-schema=json

Step 5: Check the agent logs

If the status output isn't enough, open the agent log:

tail -100 ~/.pel/formae/log/formae.log

Look for error messages near the timestamp of the failed command.

Tips + gotchas

Tip	Details
Query fields for status	`id`, `status` (NotStarted, InProgress, Success, Failed, Canceled), `client` (use `client:me`), `command` (apply, destroy, sync), `stack`
Watch mode	Add `--watch` to refresh status every 2 seconds. Useful for monitoring a running command.

Common gotcha: The summary layout only shows the overall command status. If you see "Failed" but don't know why, you almost certainly need --output-layout=detailed to see which resource update failed and the error message.

What's next

Goal	Guide
Fix the problem quickly	I want to make a quick fix during an incident
Cancel a stuck command	CLI → Cancel
CLI reference	CLI → Status