10 - Error Handling
This section covers error handling patterns and the OperationErrorCode type in depth.
NotFound Revisited
We've already seen NotFound in action during 06 - Read and 08 - Delete. This error code deserves special attention because its meaning changes depending on the operation:
| Operation | NotFound Meaning | Agent Behavior |
|---|---|---|
| Read | Resource was deleted out-of-band | Marks resource as deleted in inventory |
| Delete | Resource already gone | Treats as success (idempotent) |
| Update | Resource disappeared before update | Retries (recoverable error) |
| Status | Operation tracking lost | Retries (recoverable error) |
The agent interprets NotFound according to the operation that returned it, so your plugin should return NotFound whenever the resource doesn't exist rather than trying to implement these semantics itself.
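To make the table concrete, here is a minimal sketch of the kind of dispatch it describes. It is illustrative only, not the agent's actual code: the `Operation` and `Action` types and the `onNotFound` function are hypothetical names invented for this example.

```go
package main

// Operation and Action are hypothetical stand-ins for this sketch only;
// they are not SDK types.
type Operation string

const (
	OpRead   Operation = "Read"
	OpDelete Operation = "Delete"
	OpUpdate Operation = "Update"
	OpStatus Operation = "Status"
)

type Action string

const (
	ActionMarkDeleted    Action = "mark-deleted-in-inventory"
	ActionTreatAsSuccess Action = "treat-as-success"
	ActionRetry          Action = "retry"
)

// onNotFound mirrors the table above: the same NotFound code leads to a
// different outcome depending on which operation reported it. The second
// return value is false for operations the table does not cover.
func onNotFound(op Operation) (Action, bool) {
	switch op {
	case OpRead:
		return ActionMarkDeleted, true
	case OpDelete:
		return ActionTreatAsSuccess, true
	case OpUpdate, OpStatus:
		return ActionRetry, true
	default:
		return "", false
	}
}
```

The point for plugin authors is that none of this logic belongs in the plugin: it only reports NotFound, and the caller decides what that means.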
OperationErrorCode
When an operation fails, set the ErrorCode field to tell the agent what went wrong:
type OperationErrorCode string

const (
	// Validation and request errors
	OperationErrorCodeInvalidRequest OperationErrorCode = "InvalidRequest"
	OperationErrorCodeNotUpdatable   OperationErrorCode = "NotUpdatable"

	// Authentication and authorization
	OperationErrorCodeAccessDenied                 OperationErrorCode = "AccessDenied"
	OperationErrorCodeInvalidCredentials           OperationErrorCode = "InvalidCredentials"
	OperationErrorCodeUnauthorizedTaggingOperation OperationErrorCode = "UnauthorizedTaggingOperation"

	// Resource state errors
	OperationErrorCodeNotFound         OperationErrorCode = "NotFound"
	OperationErrorCodeAlreadyExists    OperationErrorCode = "AlreadyExists"
	OperationErrorCodeResourceConflict OperationErrorCode = "ResourceConflict"
	OperationErrorCodeNotStabilized    OperationErrorCode = "NotStabilized"

	// Service errors
	OperationErrorCodeThrottling              OperationErrorCode = "Throttling"
	OperationErrorCodeServiceLimitExceeded    OperationErrorCode = "ServiceLimitExceeded"
	OperationErrorCodeServiceInternalError    OperationErrorCode = "ServiceInternalError"
	OperationErrorCodeServiceTimeout          OperationErrorCode = "ServiceTimeout"
	OperationErrorCodeGeneralServiceException OperationErrorCode = "GeneralServiceException"

	// Infrastructure errors
	OperationErrorCodeNetworkFailure  OperationErrorCode = "NetworkFailure"
	OperationErrorCodeInternalFailure OperationErrorCode = "InternalFailure"

	// Formae-specific
	OperationErrorCodeDependencyFailure OperationErrorCode = "DependencyFailure"
	OperationErrorCodePluginNotFound    OperationErrorCode = "PluginNotFound"
)
Recoverable vs Non-Recoverable Errors
The agent classifies errors into two categories:
- Recoverable errors - Transient failures that might succeed on retry. The agent will retry up to a configurable number of attempts (default: 3 retries, so 4 total attempts).
- Non-recoverable errors - Permanent failures that won't be resolved by retrying. The agent fails the operation immediately.
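The difference between the two categories can be sketched as a small retry loop. This is an illustration of the behavior described above, not the agent's implementation: `runWithRetries` and its callback signature are hypothetical, and the delay between attempts is left out to keep the sketch short.

```go
package main

// maxRetries reflects the default stated above: 3 retries, so 4 attempts in total.
const maxRetries = 3

// runWithRetries calls attempt until it succeeds, reports a non-recoverable
// failure, or the retry budget is exhausted. The callback returns whether the
// attempt succeeded and, if it failed, whether the failure is recoverable.
// It returns the number of attempts made and whether the operation succeeded.
func runWithRetries(attempt func() (ok bool, recoverable bool)) (attempts int, succeeded bool) {
	for attempts = 1; attempts <= maxRetries+1; attempts++ {
		ok, recoverable := attempt()
		if ok {
			return attempts, true
		}
		if !recoverable {
			// Non-recoverable: fail immediately without retrying.
			return attempts, false
		}
	}
	// Every attempt failed with a recoverable error.
	return maxRetries + 1, false
}
```

With the default budget, an operation that keeps failing with a recoverable code is attempted four times, while one that fails with a non-recoverable code is attempted once.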
Recoverable Errors
Use these when the failure might be temporary:
| Error Code | When to Use | Retry Behavior |
|---|---|---|
| Throttling | Rate limited by the infrastructure API | Exponential backoff (doubles each attempt, max 30s) |
| ServiceInternalError | Infrastructure returned 5xx error | Fixed delay between retries |
| ServiceTimeout | Infrastructure operation timed out | Fixed delay between retries |
| NetworkFailure | Connection failed, DNS error, etc. | Fixed delay between retries |
| InternalFailure | Plugin-side unexpected error | Fixed delay between retries |
| NotStabilized | Resource not yet ready | Fixed delay between retries |
| NotFound | Resource doesn't exist (context-dependent) | Fixed delay between retries |
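The Throttling backoff can be pictured with a short sketch. The doubling and the 30-second cap come from the table; the 1-second starting delay is an assumption made for this example (the initial value is not stated here), and `throttlingDelay` is a hypothetical helper name.

```go
package main

import "time"

const (
	baseDelay = 1 * time.Second  // assumption: the initial delay is not documented here
	maxDelay  = 30 * time.Second // cap stated in the table above
)

// throttlingDelay returns the wait before the given retry (1-based):
// baseDelay, 2*baseDelay, 4*baseDelay, ... capped at maxDelay.
func throttlingDelay(retry int) time.Duration {
	d := baseDelay
	for i := 1; i < retry; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}
```

Under these assumptions the delays would run 1s, 2s, 4s, 8s, 16s and then stay at 30s.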
Non-Recoverable Errors
Use these when retrying won't help:
| Error Code | When to Use |
|---|---|
| InvalidRequest | Malformed request, missing required fields, invalid property values |
| NotUpdatable | Attempted to change an immutable property |
| AccessDenied | Principal lacks permission for this operation |
| InvalidCredentials | Authentication failed (bad username/password/token) |
| AlreadyExists | Create called but resource already exists |
| ResourceConflict | Resource is busy or in an incompatible state |
| ServiceLimitExceeded | Quota exceeded (won't change without user action) |
| DependencyFailure | A resource this depends on failed |
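When writing plugin tests it can help to encode the two tables as a lookup, for example to assert that a failure path returns a code from the category you intended. The sketch below is a hypothetical test helper, not part of the SDK: `ErrorCode` is a local stand-in for `OperationErrorCode`, and treating every code outside the recoverable table as non-recoverable is an assumption for codes that neither table lists.

```go
package main

// ErrorCode is a local stand-in for the SDK's OperationErrorCode so that the
// sketch compiles on its own.
type ErrorCode string

// recoverableCodes lists exactly the codes from the "Recoverable Errors" table.
var recoverableCodes = map[ErrorCode]bool{
	"Throttling":           true,
	"ServiceInternalError": true,
	"ServiceTimeout":       true,
	"NetworkFailure":       true,
	"InternalFailure":      true,
	"NotStabilized":        true,
	"NotFound":             true,
}

// isRecoverable reports whether a code appears in the recoverable table.
// Assumption: anything else is treated as non-recoverable in this sketch.
func isRecoverable(code ErrorCode) bool {
	return recoverableCodes[code]
}
```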
StatusMessage
The StatusMessage field provides human-readable context about the error. Always set it when returning a failure:
// Good: specific and actionable
StatusMessage: fmt.Sprintf("failed to connect to %s:%s: %v", host, port, err)
// Good: includes the underlying error
StatusMessage: fmt.Sprintf("failed to create file: %v", err)
// Bad: too vague
StatusMessage: "operation failed"
// Bad: missing underlying cause
StatusMessage: "connection error"
The agent displays StatusMessage in CLI output and logs, so make it descriptive enough that operators can diagnose the issue.
Mapping Infrastructure Errors
When your plugin calls an external API, map the response to the appropriate error code. Here's a pattern from the OVH plugin:
func mapAPIError(err error) resource.OperationErrorCode {
	errStr := err.Error()
	switch {
	case strings.Contains(errStr, "404"), strings.Contains(errStr, "not found"):
		return resource.OperationErrorCodeNotFound
	case strings.Contains(errStr, "409"), strings.Contains(errStr, "conflict"):
		return resource.OperationErrorCodeAlreadyExists
	case strings.Contains(errStr, "401"), strings.Contains(errStr, "unauthorized"):
		return resource.OperationErrorCodeAccessDenied
	case strings.Contains(errStr, "403"), strings.Contains(errStr, "forbidden"):
		return resource.OperationErrorCodeAccessDenied
	case strings.Contains(errStr, "400"), strings.Contains(errStr, "bad request"):
		return resource.OperationErrorCodeInvalidRequest
	case strings.Contains(errStr, "429"), strings.Contains(errStr, "rate limit"):
		return resource.OperationErrorCodeThrottling
	case strings.Contains(errStr, "500"), strings.Contains(errStr, "internal server error"):
		return resource.OperationErrorCodeServiceInternalError
	case strings.Contains(errStr, "503"), strings.Contains(errStr, "service unavailable"):
		return resource.OperationErrorCodeServiceInternalError
	case strings.Contains(errStr, "quota"):
		return resource.OperationErrorCodeServiceLimitExceeded
	default:
		return resource.OperationErrorCodeInternalFailure
	}
}
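Matching on substrings can misfire when an error message happens to contain digits such as "500" in an identifier. If your API client exposes the numeric HTTP status (an assumption; not every client does), a switch on the status code is a sturdier variant. The sketch below mirrors the status mappings of the function above; `mapHTTPStatus` and the local `ErrorCode` type are hypothetical stand-ins so the example compiles on its own, and the message-based "quota" case is left out because it has no status code.

```go
package main

import "net/http"

// ErrorCode is a local stand-in for resource.OperationErrorCode.
type ErrorCode string

const (
	CodeNotFound             ErrorCode = "NotFound"
	CodeAlreadyExists        ErrorCode = "AlreadyExists"
	CodeAccessDenied         ErrorCode = "AccessDenied"
	CodeInvalidRequest       ErrorCode = "InvalidRequest"
	CodeThrottling           ErrorCode = "Throttling"
	CodeServiceInternalError ErrorCode = "ServiceInternalError"
	CodeInternalFailure      ErrorCode = "InternalFailure"
)

// mapHTTPStatus maps a numeric HTTP status to an error code, using the same
// pairings as the string-matching function above.
func mapHTTPStatus(status int) ErrorCode {
	switch status {
	case http.StatusNotFound:
		return CodeNotFound
	case http.StatusConflict:
		return CodeAlreadyExists
	case http.StatusUnauthorized, http.StatusForbidden:
		return CodeAccessDenied
	case http.StatusBadRequest:
		return CodeInvalidRequest
	case http.StatusTooManyRequests:
		return CodeThrottling
	case http.StatusInternalServerError, http.StatusServiceUnavailable:
		return CodeServiceInternalError
	default:
		return CodeInternalFailure
	}
}
```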
For our SFTP plugin, we only need to handle a few cases:
func mapSFTPError(err error) resource.OperationErrorCode {
	if errors.Is(err, asyncsftp.ErrNotFound) {
		return resource.OperationErrorCodeNotFound
	}
	if errors.Is(err, asyncsftp.ErrPermissionDenied) {
		return resource.OperationErrorCodeAccessDenied
	}
	// Network and other errors are internal failures
	return resource.OperationErrorCodeInternalFailure
}
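One reason to prefer `errors.Is` over string matching is that it keeps working when the error has been wrapped with extra context via `%w`. The following self-contained sketch demonstrates this with hypothetical sentinel errors standing in for `asyncsftp.ErrNotFound` and `asyncsftp.ErrPermissionDenied`; `ErrorCode`, `mapError`, and `statFile` are likewise names made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorCode is a local stand-in for resource.OperationErrorCode.
type ErrorCode string

const (
	CodeNotFound        ErrorCode = "NotFound"
	CodeAccessDenied    ErrorCode = "AccessDenied"
	CodeInternalFailure ErrorCode = "InternalFailure"
)

// Hypothetical sentinels standing in for the asyncsftp package's errors.
var (
	errNotFound         = errors.New("not found")
	errPermissionDenied = errors.New("permission denied")
	errConnReset        = errors.New("connection reset") // an unrelated error
)

// mapError has the same shape as mapSFTPError above.
func mapError(err error) ErrorCode {
	if errors.Is(err, errNotFound) {
		return CodeNotFound
	}
	if errors.Is(err, errPermissionDenied) {
		return CodeAccessDenied
	}
	return CodeInternalFailure
}

// statFile simulates a client call that wraps the sentinel with context.
// errors.Is still finds errNotFound because %w preserves the error chain.
func statFile(path string) error {
	return fmt.Errorf("stat %s: %w", path, errNotFound)
}
```

Here `mapError(statFile("/data/report.txt"))` yields `NotFound` even though the message text begins with the added context.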
Never Return Go Errors for Expected Conditions
Plugin methods return (Result, error). Reserve the error return for truly exceptional conditions like panics or bugs. Expected conditions should be communicated through the result:
// Correct: NotFound is expected, return it in the result
if errors.Is(err, asyncsftp.ErrNotFound) {
	return &resource.ReadResult{
		ErrorCode: resource.OperationErrorCodeNotFound,
	}, nil // No Go error
}

// Incorrect: don't return a Go error for expected conditions
if errors.Is(err, asyncsftp.ErrNotFound) {
	return nil, fmt.Errorf("resource not found") // Don't do this
}
Complete Error Handling Example
Here's how a well-structured Create implementation handles errors:
func (p *Plugin) Create(ctx context.Context, req *resource.CreateRequest) (*resource.CreateResult, error) {
	// Validation error - non-recoverable
	props, err := parseFileProperties(req.Properties)
	if err != nil {
		return &resource.CreateResult{
			ProgressResult: &resource.ProgressResult{
				Operation:       resource.OperationCreate,
				OperationStatus: resource.OperationStatusFailure,
				ErrorCode:       resource.OperationErrorCodeInvalidRequest,
				StatusMessage:   fmt.Sprintf("invalid properties: %v", err),
			},
		}, nil
	}

	// Connection error - recoverable
	client, err := p.getClient(req.TargetConfig)
	if err != nil {
		return &resource.CreateResult{
			ProgressResult: &resource.ProgressResult{
				Operation:       resource.OperationCreate,
				OperationStatus: resource.OperationStatusFailure,
				ErrorCode:       resource.OperationErrorCodeNetworkFailure,
				StatusMessage:   fmt.Sprintf("failed to connect: %v", err),
			},
		}, nil
	}

	// Start operation...
}
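Because every failure path builds the same result shape, a small constructor can cut the repetition. The sketch below uses local stand-in structs that copy the field names from the example above so that it compiles on its own; in a real plugin you would use the `resource` package types and constants instead. `createFailure` is a hypothetical helper, and the string values "Create" and "Failure" are placeholders rather than the SDK's actual constant values.

```go
package main

import "fmt"

// ProgressResult and CreateResult are local stand-ins mirroring the field
// names used in the example above; they are not the SDK types.
type ProgressResult struct {
	Operation       string
	OperationStatus string
	ErrorCode       string
	StatusMessage   string
}

type CreateResult struct {
	ProgressResult *ProgressResult
}

// createFailure builds a failed CreateResult with the given error code and a
// formatted, human-readable status message.
func createFailure(code string, format string, args ...interface{}) *CreateResult {
	return &CreateResult{
		ProgressResult: &ProgressResult{
			Operation:       "Create",  // placeholder for resource.OperationCreate
			OperationStatus: "Failure", // placeholder for resource.OperationStatusFailure
			ErrorCode:       code,
			StatusMessage:   fmt.Sprintf(format, args...),
		},
	}
}
```

With such a helper, the connection branch would shrink to something like `return createFailure("NetworkFailure", "failed to connect: %v", err), nil`.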
Summary
| Aspect | Guideline |
|---|---|
| Error classification | Use recoverable codes for transient failures, non-recoverable for permanent ones |
| StatusMessage | Always include a descriptive message with the underlying error |
| NotFound | Return it consistently - the agent handles context-specific semantics |
| Go errors | Only for truly exceptional conditions, not expected failures |
| Error mapping | Create a helper function to map infrastructure errors to SDK codes |
With proper error handling, your plugin integrates smoothly with the agent's retry logic and provides clear feedback to users when things go wrong.
Next: 11 - Conformance & CI - Set up conformance tests and continuous integration