Emergencies and incidents
Something's broken. Fix it fast.
Scenarios covered:
I want to make a quick fix during an incident
When to use this
Use this workflow when you:
- Need to apply an urgent patch to a live resource with minimal blast radius — a security group is missing an ingress rule, a config value is wrong, a resource needs to be added immediately
- Want to make a targeted change without risking the rest of your infrastructure
- Made a hotfix via the console and want to capture it in code
Patch mode is your friend here. It creates and updates only — never destroys. Minimal blast radius, maximum speed.
What you'll do
- Assess the situation — check what's currently deployed
- Write a surgical forma — target exactly what needs to change
- Apply with patch mode — fast, additive, safe
- Verify the fix — confirm the change landed
Step-by-step
Before you begin: review the baseline prerequisites and make sure you know the stack and target for the affected resources.
Step 1: Assess what's deployed
Check the current state of the resources you need to fix:
formae inventory resources --query="stack:production"
Narrow it down:
formae inventory resources --query="stack:production type:AWS::EC2::SecurityGroup"
If you need the full properties of a specific resource, extract it:
formae extract --query="stack:production label:api-sg" current-state.pkl
This gives you a Pkl file with the resource's current configuration — a useful starting point for your fix.
Step 2: Write the fix
Create a minimal forma that targets only what needs to change. You don't need to include every resource in the stack — just the ones you're fixing.
Example: Open an emergency port on a security group
amends "@formae/forma.pkl"
import "@formae/formae.pkl"
import "@aws/aws.pkl"
import "@aws/ec2/securitygroup.pkl"
forma {
new formae.Stack {
label = "production"
}
new formae.Target {
label = "prod-target"
config = new aws.Config {
region = "us-east-1"
}
}
new securitygroup.SecurityGroup {
label = "api-sg"
groupDescription = "API security group"
ingress {
new {
fromPort = 443
toPort = 443
protocol = "tcp"
cidrBlocks { "0.0.0.0/0" }
}
new {
fromPort = 8080
toPort = 8080
protocol = "tcp"
cidrBlocks { "10.0.0.0/8" } // Emergency: allow internal health checks
}
}
}
}
Note: This example uses AWS. The same workflow applies to any technology that has a formae plugin.
Tip: The file doesn't need to live in your repo. Create it anywhere —
/tmp/hotfix.pklworks fine. You can formalize it later.
Step 3: Simulate first (if you have 30 seconds)
Even under pressure, a quick simulation can prevent making things worse:
formae apply --mode patch --simulate hotfix.pkl
This runs instantly — no cloud API calls. It shows exactly what will be created or updated.
Step 4: Apply the fix
formae apply --mode patch --watch hotfix.pkl
If you're confident and want to skip the confirmation prompt:
formae apply --mode patch --watch --yes hotfix.pkl
If the change conflicts with existing state
If formae detects a conflict (someone else changed the resource since the last sync), use --force:
formae apply --mode patch --watch --force hotfix.pkl
--force tells formae to overwrite the current state with your forma, even if external changes were detected.
Step 5: Verify the fix
Confirm the change is live:
formae inventory resources --query="stack:production label:api-sg"
Or check the command status:
formae status command --output-layout=detailed
Step 6: Stop a runaway command (if needed)
If you applied something and it's making things worse:
formae cancel
This cancels the most recent in-progress command. Operations already completed won't be rolled back, but pending operations will stop.
Tips + gotchas
| Tip | Details |
|---|---|
| Temporary files are fine | Write the fix in /tmp, apply it, formalize later. Don't let Git workflows slow you down during an incident. |
Common gotcha: If you apply a patch fix and later run
formae apply --mode reconcilewith an older forma that doesn't include your fix, reconcile will revert the emergency change. Always merge emergency fixes into your committed code before the next reconcile.
What's next
| Goal | Guide |
|---|---|
| Detect out-of-band changes automatically | Core Concepts → Synchronization |
| Cancel a running command | CLI → Cancel |
| Monitor command history | CLI → Status |
I want to understand why an operation failed
When to use this
Use this workflow when you:
- An apply or destroy returned a failure and you need to figure out what went wrong
- A pipeline is failing and the summary output doesn't tell you enough
- Want to see which specific resource caused the problem
What you'll do
- Check recent command status — find the failed command
- Switch to detailed output — see per-resource results
- Dig into the specific failure — isolate the root cause
Step-by-step
Before you begin: review the baseline prerequisites.
Step 1: Check recent commands
See your most recent command status:
formae status command
This shows a summary view — command ID, status, start time.
Step 2: Find failed commands
Filter to just failures:
formae status command --query="status:Failed"
To see only your own commands (not other team members'):
formae status command --query="client:me status:Failed"
Step 3: Get the detailed breakdown
Switch to the detailed layout to see individual resource update states:
formae status command --output-layout=detailed
This shows a tree view with each resource update and its status. Failed entries stand out — look for them to find which resource caused the problem.
For a specific command by ID:
formae status command --query="id:01JXYZABC1234567890" --output-layout=detailed
Step 4: Get machine-readable output for scripts
In a pipeline, parse the failure programmatically:
formae status command --query="status:Failed" \
--output-consumer=machine \
--output-schema=json
Step 5: Check the agent logs
If the status output isn't enough, open the agent log:
tail -100 ~/.pel/formae/log/formae.log
Look for error messages near the timestamp of the failed command.
Tips + gotchas
| Tip | Details |
|---|---|
| Query fields for status | id, status (NotStarted, InProgress, Success, Failed, Canceled), client (use client:me), command (apply, destroy, sync), stack |
| Watch mode | Add --watch to refresh status every 2 seconds. Useful for monitoring a running command. |
Common gotcha: The summary layout only shows the overall command status. If you see "Failed" but don't know why, you almost certainly need
--output-layout=detailedto see which resource update failed and the error message.
What's next
| Goal | Guide |
|---|---|
| Fix the problem quickly | I want to make a quick fix during an incident |
| Cancel a stuck command | CLI → Cancel |
| CLI reference | CLI → Status |