10 - Error Handling

This section covers error handling patterns and the OperationErrorCode type in depth.

NotFound Revisited

We've already seen NotFound in action during 06 - Read and 08 - Delete. This error code deserves special attention because its meaning changes depending on the operation:

| Operation | NotFound Meaning | Agent Behavior |
| --- | --- | --- |
| Read | Resource was deleted out-of-band | Marks resource as deleted in inventory |
| Delete | Resource already gone | Treats as success (idempotent) |
| Update | Resource disappeared before update | Retries (recoverable error) |
| Status | Operation tracking lost | Retries (recoverable error) |

The agent interprets NotFound according to the operation that returned it, so your plugin should return it whenever the resource doesn't exist instead of trying to implement these semantics itself.
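The table above can be read as a small dispatch on the operation type. The sketch below only restates the table in code; the `Operation` type, its constants, and `notFoundAction` are hypothetical names invented for illustration and are not the agent's real implementation.

```go
package main

import "fmt"

// Operation is a local stand-in for the operation kinds named in the table.
// It is not an SDK type.
type Operation string

const (
	OpRead   Operation = "Read"
	OpDelete Operation = "Delete"
	OpUpdate Operation = "Update"
	OpStatus Operation = "Status"
)

// notFoundAction returns the behavior the table lists for a NotFound result
// from the given operation. Hypothetical helper, for illustration only.
func notFoundAction(op Operation) string {
	switch op {
	case OpRead:
		return "mark deleted in inventory"
	case OpDelete:
		return "treat as success"
	case OpUpdate, OpStatus:
		return "retry"
	default:
		return "unspecified"
	}
}

func main() {
	for _, op := range []Operation{OpRead, OpDelete, OpUpdate, OpStatus} {
		fmt.Printf("%s: %s\n", op, notFoundAction(op))
	}
}
```

The point for plugin authors is that none of this branching belongs in the plugin: it reports NotFound and the caller decides what that means.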

OperationErrorCode

When an operation fails, set the ErrorCode field to tell the agent what went wrong:

type OperationErrorCode string

const (
    // Validation and request errors
    OperationErrorCodeInvalidRequest   OperationErrorCode = "InvalidRequest"
    OperationErrorCodeNotUpdatable     OperationErrorCode = "NotUpdatable"

    // Authentication and authorization
    OperationErrorCodeAccessDenied                  OperationErrorCode = "AccessDenied"
    OperationErrorCodeInvalidCredentials            OperationErrorCode = "InvalidCredentials"
    OperationErrorCodeUnauthorizedTaggingOperation  OperationErrorCode = "UnauthorizedTaggingOperation"

    // Resource state errors
    OperationErrorCodeNotFound         OperationErrorCode = "NotFound"
    OperationErrorCodeAlreadyExists    OperationErrorCode = "AlreadyExists"
    OperationErrorCodeResourceConflict OperationErrorCode = "ResourceConflict"
    OperationErrorCodeNotStabilized    OperationErrorCode = "NotStabilized"

    // Service errors
    OperationErrorCodeThrottling              OperationErrorCode = "Throttling"
    OperationErrorCodeServiceLimitExceeded    OperationErrorCode = "ServiceLimitExceeded"
    OperationErrorCodeServiceInternalError    OperationErrorCode = "ServiceInternalError"
    OperationErrorCodeServiceTimeout          OperationErrorCode = "ServiceTimeout"
    OperationErrorCodeGeneralServiceException OperationErrorCode = "GeneralServiceException"

    // Infrastructure errors
    OperationErrorCodeNetworkFailure   OperationErrorCode = "NetworkFailure"
    OperationErrorCodeInternalFailure  OperationErrorCode = "InternalFailure"

    // Formae-specific
    OperationErrorCodeDependencyFailure OperationErrorCode = "DependencyFailure"
    OperationErrorCodePluginNotFound    OperationErrorCode = "PluginNotFound"
)

Recoverable vs Non-Recoverable Errors

The agent classifies errors into two categories:

  • Recoverable errors - Transient failures that might succeed on retry. The agent will retry up to a configurable number of attempts (default: 3 retries, so 4 total attempts).
  • Non-recoverable errors - Permanent failures that won't be resolved by retrying. The agent fails the operation immediately.
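As a rough illustration of the two categories, the following sketch shows a retry loop with a budget of 3 retries (4 attempts in total) that stops immediately on a non-recoverable failure. The `outcome` type and `runWithRetries` function are assumptions made for this example; the agent's actual retry code is not shown in this guide, and delays between attempts are left out here.

```go
package main

import "fmt"

// outcome is a hypothetical stand-in for the result of one attempt.
type outcome struct {
	ok          bool // the attempt succeeded
	recoverable bool // on failure: whether a retry might help
}

// runWithRetries calls op until it succeeds, fails with a non-recoverable
// outcome, or the retry budget is used up. It returns the last outcome and
// the number of attempts made (at most maxRetries+1).
func runWithRetries(maxRetries int, op func(attempt int) outcome) (outcome, int) {
	var last outcome
	attempts := 0
	for attempts <= maxRetries {
		attempts++
		last = op(attempts)
		if last.ok || !last.recoverable {
			break
		}
	}
	return last, attempts
}

func main() {
	// A transient failure that clears on the third attempt.
	res, n := runWithRetries(3, func(attempt int) outcome {
		return outcome{ok: attempt == 3, recoverable: true}
	})
	fmt.Printf("ok=%v attempts=%d\n", res.ok, n)
}
```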

Recoverable Errors

Use these when the failure might be temporary:

| Error Code | When to Use | Retry Behavior |
| --- | --- | --- |
| Throttling | Rate limited by the infrastructure API | Exponential backoff (doubles each attempt, max 30s) |
| ServiceInternalError | Infrastructure returned 5xx error | Fixed delay between retries |
| ServiceTimeout | Infrastructure operation timed out | Fixed delay between retries |
| NetworkFailure | Connection failed, DNS error, etc. | Fixed delay between retries |
| InternalFailure | Plugin-side unexpected error | Fixed delay between retries |
| NotStabilized | Resource not yet ready | Fixed delay between retries |
| NotFound | Resource doesn't exist (context-dependent) | Fixed delay between retries for Update and Status; Read and Delete are not retried (see NotFound Revisited) |
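To make the Throttling behavior concrete, here is a minimal sketch of a delay that doubles on each retry and is capped at 30 seconds. The 30s cap comes from the table above; the 1-second base delay, the `backoffDelay` function, and the 1-based retry numbering are assumptions for illustration, since this guide does not state the agent's starting delay.

```go
package main

import (
	"fmt"
	"time"
)

// maxBackoff is the cap stated in the table above.
const maxBackoff = 30 * time.Second

// backoffDelay returns the wait before the given retry (1-based): base for
// retry 1, doubled for each later retry, never more than maxBackoff.
// Hypothetical helper; the base delay is an assumption of this sketch.
func backoffDelay(base time.Duration, retry int) time.Duration {
	if base <= 0 || retry < 1 {
		return 0
	}
	d := base
	for i := 1; i < retry; i++ {
		d *= 2
		if d >= maxBackoff {
			return maxBackoff // stop early so d can never overflow
		}
	}
	if d > maxBackoff {
		return maxBackoff
	}
	return d
}

func main() {
	for retry := 1; retry <= 6; retry++ {
		fmt.Printf("retry %d: wait %v\n", retry, backoffDelay(time.Second, retry))
	}
}
```

With a 1-second base, the sixth retry would otherwise wait 32 seconds, so that is where the cap first applies.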

Non-Recoverable Errors

Use these when retrying won't help:

| Error Code | When to Use |
| --- | --- |
| InvalidRequest | Malformed request, missing required fields, invalid property values |
| NotUpdatable | Attempted to change an immutable property |
| AccessDenied | Principal lacks permission for this operation |
| InvalidCredentials | Authentication failed (bad username/password/token) |
| AlreadyExists | Create called but resource already exists |
| ResourceConflict | Resource is busy or in an incompatible state |
| ServiceLimitExceeded | Quota exceeded (won't change without user action) |
| DependencyFailure | A resource this depends on failed |
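The two tables can be summarized as a lookup. The sketch below uses a local copy of a few error codes and a hypothetical `isRecoverable` helper; it is not part of the SDK. Codes that neither table lists (for example PluginNotFound) fall through to "not recoverable" here, which is an assumption of this sketch rather than documented agent behavior.

```go
package main

import "fmt"

// OperationErrorCode is a local stand-in for the SDK type of the same name.
type OperationErrorCode string

const (
	CodeThrottling           OperationErrorCode = "Throttling"
	CodeServiceInternalError OperationErrorCode = "ServiceInternalError"
	CodeServiceTimeout       OperationErrorCode = "ServiceTimeout"
	CodeNetworkFailure       OperationErrorCode = "NetworkFailure"
	CodeInternalFailure      OperationErrorCode = "InternalFailure"
	CodeNotStabilized        OperationErrorCode = "NotStabilized"
	CodeNotFound             OperationErrorCode = "NotFound"
	CodeInvalidRequest       OperationErrorCode = "InvalidRequest"
	CodeAccessDenied         OperationErrorCode = "AccessDenied"
)

// recoverable lists the codes from the "Recoverable Errors" table.
// NotFound is included, although its handling depends on the operation.
var recoverable = map[OperationErrorCode]bool{
	CodeThrottling:           true,
	CodeServiceInternalError: true,
	CodeServiceTimeout:       true,
	CodeNetworkFailure:       true,
	CodeInternalFailure:      true,
	CodeNotStabilized:        true,
	CodeNotFound:             true,
}

// isRecoverable reports whether a code appears in the recoverable table.
// Unlisted codes return false (an assumption of this sketch).
func isRecoverable(code OperationErrorCode) bool {
	return recoverable[code]
}

func main() {
	for _, c := range []OperationErrorCode{CodeThrottling, CodeInvalidRequest} {
		fmt.Printf("%s recoverable: %v\n", c, isRecoverable(c))
	}
}
```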

StatusMessage

The StatusMessage field provides human-readable context about the error. Always set it when returning a failure:

// Good: specific and actionable
StatusMessage: fmt.Sprintf("failed to connect to %s:%s: %v", host, port, err)

// Good: includes the underlying error
StatusMessage: fmt.Sprintf("failed to create file: %v", err)

// Bad: too vague
StatusMessage: "operation failed"

// Bad: missing underlying cause
StatusMessage: "connection error"

The agent displays StatusMessage in CLI output and logs, so make it descriptive enough that operators can diagnose the issue.

Mapping Infrastructure Errors

When your plugin calls an external API, map the response to the appropriate error code. Here's a pattern from the OVH plugin:

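// mapAPIError maps an API error to an SDK error code by matching substrings
// of the error text. Matching is case-sensitive and the cases are checked in
// order, so the first match wins.
func mapAPIError(err error) resource.OperationErrorCode {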
    errStr := err.Error()

    switch {
    case strings.Contains(errStr, "404"), strings.Contains(errStr, "not found"):
        return resource.OperationErrorCodeNotFound
    case strings.Contains(errStr, "409"), strings.Contains(errStr, "conflict"):
        return resource.OperationErrorCodeAlreadyExists
    case strings.Contains(errStr, "401"), strings.Contains(errStr, "unauthorized"):
        return resource.OperationErrorCodeAccessDenied
    case strings.Contains(errStr, "403"), strings.Contains(errStr, "forbidden"):
        return resource.OperationErrorCodeAccessDenied
    case strings.Contains(errStr, "400"), strings.Contains(errStr, "bad request"):
        return resource.OperationErrorCodeInvalidRequest
    case strings.Contains(errStr, "429"), strings.Contains(errStr, "rate limit"):
        return resource.OperationErrorCodeThrottling
    case strings.Contains(errStr, "500"), strings.Contains(errStr, "internal server error"):
        return resource.OperationErrorCodeServiceInternalError
    case strings.Contains(errStr, "503"), strings.Contains(errStr, "service unavailable"):
        return resource.OperationErrorCodeServiceInternalError
    case strings.Contains(errStr, "quota"):
        return resource.OperationErrorCodeServiceLimitExceeded
    default:
        return resource.OperationErrorCodeInternalFailure
    }
}

For our SFTP plugin, we only need to handle a few cases:

func mapSFTPError(err error) resource.OperationErrorCode {
    if errors.Is(err, asyncsftp.ErrNotFound) {
        return resource.OperationErrorCodeNotFound
    }
    if errors.Is(err, asyncsftp.ErrPermissionDenied) {
        return resource.OperationErrorCodeAccessDenied
    }
    // Network and other errors are internal failures
    return resource.OperationErrorCodeInternalFailure
}
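Because this mapping relies on `errors.Is`, it only works if intermediate layers wrap the sentinel error with `%w`. The sketch below demonstrates the difference using local sentinel errors that stand in for `asyncsftp.ErrNotFound` and `asyncsftp.ErrPermissionDenied`; the wrapping helpers and the file path are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// OperationErrorCode is a local stand-in for the SDK type of the same name.
type OperationErrorCode string

const (
	CodeNotFound        OperationErrorCode = "NotFound"
	CodeAccessDenied    OperationErrorCode = "AccessDenied"
	CodeInternalFailure OperationErrorCode = "InternalFailure"
)

// Local sentinels standing in for the asyncsftp errors used above.
var (
	errNotFound         = errors.New("not found")
	errPermissionDenied = errors.New("permission denied")
)

// mapSFTPError mirrors the mapping above, using the local sentinels.
func mapSFTPError(err error) OperationErrorCode {
	if errors.Is(err, errNotFound) {
		return CodeNotFound
	}
	if errors.Is(err, errPermissionDenied) {
		return CodeAccessDenied
	}
	return CodeInternalFailure
}

// wrapKeepingChain adds context with %w, so errors.Is can still see err.
func wrapKeepingChain(err error) error {
	return fmt.Errorf("stat /data/report.txt: %w", err)
}

// wrapLosingChain adds context with %v, which flattens err to text only.
func wrapLosingChain(err error) error {
	return fmt.Errorf("stat /data/report.txt: %v", err)
}

func main() {
	fmt.Println(mapSFTPError(wrapKeepingChain(errNotFound)))
	fmt.Println(mapSFTPError(wrapLosingChain(errNotFound)))
}
```

The second call falls through to InternalFailure because the sentinel is no longer in the error chain, which would turn an expected NotFound into a retried failure.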

Never Return Go Errors for Expected Conditions

Plugin methods return (Result, error). Reserve the error return for truly exceptional conditions like panics or bugs. Expected conditions should be communicated through the result:

// Correct: NotFound is expected, return it in the result
if errors.Is(err, asyncsftp.ErrNotFound) {
    return &resource.ReadResult{
        ErrorCode: resource.OperationErrorCodeNotFound,
    }, nil  // No Go error
}

// Incorrect: don't return a Go error for expected conditions
if errors.Is(err, asyncsftp.ErrNotFound) {
    return nil, fmt.Errorf("resource not found")  // Don't do this
}

Complete Error Handling Example

Here's how a well-structured Create implementation handles errors:

func (p *Plugin) Create(ctx context.Context, req *resource.CreateRequest) (*resource.CreateResult, error) {
    // Validation error - non-recoverable
    props, err := parseFileProperties(req.Properties)
    if err != nil {
        return &resource.CreateResult{
            ProgressResult: &resource.ProgressResult{
                Operation:       resource.OperationCreate,
                OperationStatus: resource.OperationStatusFailure,
                ErrorCode:       resource.OperationErrorCodeInvalidRequest,
                StatusMessage:   fmt.Sprintf("invalid properties: %v", err),
            },
        }, nil
    }

    // Connection error - recoverable
    client, err := p.getClient(req.TargetConfig)
    if err != nil {
        return &resource.CreateResult{
            ProgressResult: &resource.ProgressResult{
                Operation:       resource.OperationCreate,
                OperationStatus: resource.OperationStatusFailure,
                ErrorCode:       resource.OperationErrorCodeNetworkFailure,
                StatusMessage:   fmt.Sprintf("failed to connect: %v", err),
            },
        }, nil
    }

    // Start operation...
}
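The failure branches above repeat the same result literal. One way to reduce that repetition is a small constructor, sketched here with local stand-in types that copy only the field names used in the example. The `createFailure` helper, the plain-string field types, and the literal values "Create" and "Failure" are assumptions; a real plugin would use the SDK's types and constants instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ProgressResult and CreateResult are local stand-ins that copy the field
// names from the example above. They are not the SDK definitions.
type ProgressResult struct {
	Operation       string
	OperationStatus string
	ErrorCode       string
	StatusMessage   string
}

type CreateResult struct {
	ProgressResult *ProgressResult
}

// createFailure builds a failed Create result with a formatted message.
// Hypothetical helper; the string values below are placeholders.
func createFailure(code, format string, args ...any) *CreateResult {
	return &CreateResult{
		ProgressResult: &ProgressResult{
			Operation:       "Create",
			OperationStatus: "Failure",
			ErrorCode:       code,
			StatusMessage:   fmt.Sprintf(format, args...),
		},
	}
}

func main() {
	err := errors.New("connection refused")
	res := createFailure("NetworkFailure", "failed to connect to %s:%d: %v", "sftp.example.com", 22, err)
	fmt.Println(res.ProgressResult.ErrorCode, "-", res.ProgressResult.StatusMessage)
}
```

Each failure branch then becomes a single `return createFailure(...), nil`, which keeps the error code and the descriptive message side by side.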

Summary

| Aspect | Guideline |
| --- | --- |
| Error classification | Use recoverable codes for transient failures, non-recoverable for permanent ones |
| StatusMessage | Always include a descriptive message with the underlying error |
| NotFound | Return it consistently - the agent handles context-specific semantics |
| Go errors | Only for truly exceptional conditions, not expected failures |
| Error mapping | Create a helper function to map infrastructure errors to SDK codes |

With proper error handling, your plugin integrates smoothly with the agent's retry logic and provides clear feedback to users when things go wrong.


Next: 11 - Conformance & CI - Set up conformance tests and continuous integration