10 - Error Handling

This section covers error handling patterns and the OperationErrorCode type in depth.

NotFound Revisited

We've already seen NotFound in action during 06 - Read and 08 - Delete. This error code deserves special attention because its meaning changes depending on the operation:

Operation	NotFound Meaning	Agent Behavior
Read	Resource was deleted out-of-band	Marks resource as deleted in inventory
Delete	Resource already gone	Treats as success (idempotent)
Update	Resource disappeared before update	Retries (recoverable error)
Status	Operation tracking lost	Retries (recoverable error)

The agent interprets NotFound according to the operation in progress, so your plugin should return it whenever the resource doesn't exist and leave the semantics to the agent.
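To make the point concrete, here is a minimal sketch of a delete path that reports NotFound as-is rather than converting it to success. The `DeleteResult` struct, `fakeStore`, and `errNotFound` are hypothetical stand-ins invented for this example; the real SDK result types may look different.

```go
package main

import (
    "errors"
    "fmt"
)

// Stand-ins for the SDK types; the real definitions may differ.
type OperationErrorCode string

const (
    OperationErrorCodeNotFound        OperationErrorCode = "NotFound"
    OperationErrorCodeInternalFailure OperationErrorCode = "InternalFailure"
)

// DeleteResult is a simplified, hypothetical result type.
type DeleteResult struct {
    Success       bool
    ErrorCode     OperationErrorCode
    StatusMessage string
}

// errNotFound stands in for a backend "no such file" error.
var errNotFound = errors.New("no such file")

// fakeStore is a hypothetical in-memory backend keyed by path.
type fakeStore map[string]bool

func (s fakeStore) remove(path string) error {
    if !s[path] {
        return errNotFound
    }
    delete(s, path)
    return nil
}

// deleteFile reports NotFound unchanged. Deciding that a missing
// resource counts as "already deleted" is left to the agent.
func deleteFile(s fakeStore, path string) DeleteResult {
    err := s.remove(path)
    switch {
    case err == nil:
        return DeleteResult{Success: true}
    case errors.Is(err, errNotFound):
        return DeleteResult{
            ErrorCode:     OperationErrorCodeNotFound,
            StatusMessage: fmt.Sprintf("delete %s: %v", path, err),
        }
    default:
        return DeleteResult{
            ErrorCode:     OperationErrorCodeInternalFailure,
            StatusMessage: fmt.Sprintf("delete %s: %v", path, err),
        }
    }
}

func main() {
    s := fakeStore{"/tmp/a.txt": true}
    fmt.Printf("%+v\n", deleteFile(s, "/tmp/a.txt")) // first call removes the file
    fmt.Printf("%+v\n", deleteFile(s, "/tmp/a.txt")) // second call reports NotFound
}
```

The plugin code stays the same for every operation; only the agent's reaction to the code differs.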

OperationErrorCode

When an operation fails, set the ErrorCode field to tell the agent what went wrong:

type OperationErrorCode string

const (
    // Validation and request errors
    OperationErrorCodeInvalidRequest   OperationErrorCode = "InvalidRequest"
    OperationErrorCodeNotUpdatable     OperationErrorCode = "NotUpdatable"

    // Authentication and authorization
    OperationErrorCodeAccessDenied                  OperationErrorCode = "AccessDenied"
    OperationErrorCodeInvalidCredentials            OperationErrorCode = "InvalidCredentials"
    OperationErrorCodeUnauthorizedTaggingOperation  OperationErrorCode = "UnauthorizedTaggingOperation"

    // Resource state errors
    OperationErrorCodeNotFound         OperationErrorCode = "NotFound"
    OperationErrorCodeAlreadyExists    OperationErrorCode = "AlreadyExists"
    OperationErrorCodeResourceConflict OperationErrorCode = "ResourceConflict"
    OperationErrorCodeNotStabilized    OperationErrorCode = "NotStabilized"

    // Service errors
    OperationErrorCodeThrottling              OperationErrorCode = "Throttling"
    OperationErrorCodeServiceLimitExceeded    OperationErrorCode = "ServiceLimitExceeded"
    OperationErrorCodeServiceInternalError    OperationErrorCode = "ServiceInternalError"
    OperationErrorCodeServiceTimeout          OperationErrorCode = "ServiceTimeout"
    OperationErrorCodeGeneralServiceException OperationErrorCode = "GeneralServiceException"

    // Infrastructure errors
    OperationErrorCodeNetworkFailure   OperationErrorCode = "NetworkFailure"
    OperationErrorCodeInternalFailure  OperationErrorCode = "InternalFailure"

    // Formae-specific
    OperationErrorCodeDependencyFailure OperationErrorCode = "DependencyFailure"
    OperationErrorCodePluginNotFound    OperationErrorCode = "PluginNotFound"
)

Recoverable vs Non-Recoverable Errors

The agent classifies errors into two categories:

Recoverable errors - Transient failures that might succeed on retry. The agent retries these a configurable number of times (default: 3 retries, so 4 attempts in total).
Non-recoverable errors - Permanent failures that won't be resolved by retrying. The agent fails the operation immediately.

Recoverable Errors

Use these when the failure might be temporary:

Error Code	When to Use	Retry Behavior
`Throttling`	Rate limited by the infrastructure API	Exponential backoff (doubles each attempt, max 30s)
`ServiceInternalError`	Infrastructure returned 5xx error	Fixed delay between retries
`ServiceTimeout`	Infrastructure operation timed out	Fixed delay between retries
`NetworkFailure`	Connection failed, DNS error, etc.	Fixed delay between retries
`InternalFailure`	Plugin-side unexpected error	Fixed delay between retries
`NotStabilized`	Resource not yet ready	Fixed delay between retries
`NotFound`	Resource doesn't exist (context-dependent: retried for Update and Status, not for Read or Delete)	Fixed delay between retries
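The Throttling schedule in the table can be sketched as a small function. This is an illustration of "doubles each attempt, max 30s" only, not the agent's actual code; the 1-second base delay is an assumption, since the base value is not stated here.

```go
package main

import (
    "fmt"
    "time"
)

// backoffDelay returns the wait before the given retry (1-based):
// base for the first retry, doubled for each later one, capped at limit.
func backoffDelay(base, limit time.Duration, retry int) time.Duration {
    d := base
    for i := 1; i < retry; i++ {
        d *= 2
        if d >= limit {
            return limit // stop early so d can never overflow
        }
    }
    if d > limit {
        return limit
    }
    return d
}

func main() {
    // base is an assumed value; limit comes from the table above.
    const base, limit = time.Second, 30 * time.Second
    for retry := 1; retry <= 6; retry++ {
        fmt.Printf("retry %d: wait %v\n", retry, backoffDelay(base, limit, retry))
    }
}
```

Under that assumed base, the default three retries would wait 1s, 2s, and 4s; the 30s cap only comes into play if the retry count is configured higher.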

Non-Recoverable Errors

Use these when retrying won't help:

Error Code	When to Use
`InvalidRequest`	Malformed request, missing required fields, invalid property values
`NotUpdatable`	Attempted to change an immutable property
`AccessDenied`	Principal lacks permission for this operation
`InvalidCredentials`	Authentication failed (bad username/password/token)
`AlreadyExists`	Create called but resource already exists
`ResourceConflict`	Resource is busy or in an incompatible state
`ServiceLimitExceeded`	Quota exceeded (won't change without user action)
`DependencyFailure`	A resource this depends on failed
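The two tables can be summarized as a lookup. The sketch below is a reading aid, not the agent's implementation: `isRecoverable` is a hypothetical helper, and treating codes that appear in neither table as non-recoverable is an assumption.

```go
package main

import "fmt"

// OperationErrorCode mirrors the string type shown earlier in this section.
type OperationErrorCode string

// recoverable lists the codes from the "Recoverable Errors" table.
// Codes absent from this map are treated as non-recoverable, which is
// an assumption for codes that appear in neither table.
var recoverable = map[OperationErrorCode]bool{
    "Throttling":           true,
    "ServiceInternalError": true,
    "ServiceTimeout":       true,
    "NetworkFailure":       true,
    "InternalFailure":      true,
    "NotStabilized":        true,
    "NotFound":             true, // context-dependent, see "NotFound Revisited"
}

// isRecoverable reports whether a retry might succeed for the given code.
func isRecoverable(code OperationErrorCode) bool {
    return recoverable[code]
}

func main() {
    codes := []OperationErrorCode{"Throttling", "NotFound", "AccessDenied", "InvalidRequest"}
    for _, code := range codes {
        fmt.Printf("%-16s recoverable=%t\n", code, isRecoverable(code))
    }
}
```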

StatusMessage

The StatusMessage field provides human-readable context about the error. Always set it when returning a failure:

// Good: specific and actionable
StatusMessage: fmt.Sprintf("failed to connect to %s:%s: %v", host, port, err)

// Good: includes the underlying error
StatusMessage: fmt.Sprintf("failed to create file: %v", err)

// Bad: too vague
StatusMessage: "operation failed"

// Bad: missing underlying cause
StatusMessage: "connection error"

The agent displays StatusMessage in CLI output and logs, so make it descriptive enough that operators can diagnose the issue.

Mapping Infrastructure Errors

When your plugin calls an external API, map the response to the appropriate error code. Here's a pattern from the OVH plugin:

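// mapAPIError maps an API error to an SDK error code by matching on the
// error's message text. Cases are checked in order; the first match wins.
func mapAPIError(err error) resource.OperationErrorCode {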
    errStr := err.Error()

    switch {
    case strings.Contains(errStr, "404"), strings.Contains(errStr, "not found"):
        return resource.OperationErrorCodeNotFound
    case strings.Contains(errStr, "409"), strings.Contains(errStr, "conflict"):
        return resource.OperationErrorCodeAlreadyExists
    case strings.Contains(errStr, "401"), strings.Contains(errStr, "unauthorized"):
        return resource.OperationErrorCodeAccessDenied
    case strings.Contains(errStr, "403"), strings.Contains(errStr, "forbidden"):
        return resource.OperationErrorCodeAccessDenied
    case strings.Contains(errStr, "400"), strings.Contains(errStr, "bad request"):
        return resource.OperationErrorCodeInvalidRequest
    case strings.Contains(errStr, "429"), strings.Contains(errStr, "rate limit"):
        return resource.OperationErrorCodeThrottling
    case strings.Contains(errStr, "500"), strings.Contains(errStr, "internal server error"):
        return resource.OperationErrorCodeServiceInternalError
    case strings.Contains(errStr, "503"), strings.Contains(errStr, "service unavailable"):
        return resource.OperationErrorCodeServiceInternalError
    case strings.Contains(errStr, "quota"):
        return resource.OperationErrorCodeServiceLimitExceeded
    default:
        return resource.OperationErrorCodeInternalFailure
    }
}
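Matching on message text is coarse: a message such as "timeout after 500ms" contains "500" without being a server error. If your API client exposes the numeric HTTP status, a mapping on that value avoids the ambiguity. The following is a minimal sketch of that alternative, not the OVH plugin's code; `mapHTTPStatus` is a hypothetical helper, the local type stands in for the SDK's, and mapping every 5xx status to ServiceInternalError is an assumption that generalizes the 500 and 503 cases above.

```go
package main

import (
    "fmt"
    "net/http"
)

// Local stand-ins for the SDK error codes used in this section.
type OperationErrorCode string

const (
    OperationErrorCodeNotFound             OperationErrorCode = "NotFound"
    OperationErrorCodeAlreadyExists        OperationErrorCode = "AlreadyExists"
    OperationErrorCodeAccessDenied         OperationErrorCode = "AccessDenied"
    OperationErrorCodeInvalidRequest       OperationErrorCode = "InvalidRequest"
    OperationErrorCodeThrottling           OperationErrorCode = "Throttling"
    OperationErrorCodeServiceInternalError OperationErrorCode = "ServiceInternalError"
    OperationErrorCodeInternalFailure      OperationErrorCode = "InternalFailure"
)

// mapHTTPStatus mirrors the string-based mapping above, keyed on the
// numeric status instead of the error text.
func mapHTTPStatus(status int) OperationErrorCode {
    switch {
    case status == http.StatusNotFound:
        return OperationErrorCodeNotFound
    case status == http.StatusConflict:
        return OperationErrorCodeAlreadyExists
    case status == http.StatusUnauthorized, status == http.StatusForbidden:
        return OperationErrorCodeAccessDenied
    case status == http.StatusBadRequest:
        return OperationErrorCodeInvalidRequest
    case status == http.StatusTooManyRequests:
        return OperationErrorCodeThrottling
    case status >= 500 && status <= 599:
        // Assumption: treat every 5xx as a service-side error.
        return OperationErrorCodeServiceInternalError
    default:
        return OperationErrorCodeInternalFailure
    }
}

func main() {
    for _, status := range []int{404, 409, 403, 429, 503, 418} {
        fmt.Printf("%d -> %s\n", status, mapHTTPStatus(status))
    }
}
```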

For our SFTP plugin, we only need to handle a few cases:

func mapSFTPError(err error) resource.OperationErrorCode {
    if errors.Is(err, asyncsftp.ErrNotFound) {
        return resource.OperationErrorCodeNotFound
    }
    if errors.Is(err, asyncsftp.ErrPermissionDenied) {
        return resource.OperationErrorCodeAccessDenied
    }
    // Anything else, including network errors, falls back to InternalFailure
    return resource.OperationErrorCodeInternalFailure
}
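One reason to prefer `errors.Is` over comparing error values directly is that it still matches after the error has been wrapped with `%w` on its way up the call stack. The sketch below demonstrates this with local sentinel errors standing in for the `asyncsftp` ones; `wrap`, `mapError`, and the sentinels are hypothetical names for illustration.

```go
package main

import (
    "errors"
    "fmt"
)

// Local stand-in for the SDK error code type.
type OperationErrorCode string

const (
    OperationErrorCodeNotFound        OperationErrorCode = "NotFound"
    OperationErrorCodeAccessDenied    OperationErrorCode = "AccessDenied"
    OperationErrorCodeInternalFailure OperationErrorCode = "InternalFailure"
)

// Sentinel errors standing in for the client library's exported ones.
var (
    errNotFound         = errors.New("file does not exist")
    errPermissionDenied = errors.New("permission denied")
    errTimeout          = errors.New("i/o timeout")
)

// wrap adds context while keeping the original error in the chain (%w).
func wrap(op string, err error) error {
    return fmt.Errorf("%s: %w", op, err)
}

// mapError follows the same shape as mapSFTPError above.
func mapError(err error) OperationErrorCode {
    if errors.Is(err, errNotFound) {
        return OperationErrorCodeNotFound
    }
    if errors.Is(err, errPermissionDenied) {
        return OperationErrorCodeAccessDenied
    }
    return OperationErrorCodeInternalFailure
}

func main() {
    err := wrap("read config", wrap("stat /data/report.csv", errNotFound))
    fmt.Println(err)           // the message carries both layers of context
    fmt.Println(mapError(err)) // still classified by the wrapped sentinel
}
```

Note that this only works when intermediate layers wrap with `%w`; formatting the error with `%v` discards the chain and the mapping falls through to the default.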

Never Return Go Errors for Expected Conditions

Plugin methods return (Result, error). Reserve the error return for truly exceptional conditions, such as a bug in the plugin or corrupted internal state. Expected conditions, including every failure that has an OperationErrorCode, should be communicated through the result:

// Correct: NotFound is expected, return it in the result
if errors.Is(err, asyncsftp.ErrNotFound) {
    return &resource.ReadResult{
        ErrorCode: resource.OperationErrorCodeNotFound,
    }, nil  // No Go error
}

// Incorrect: don't return a Go error for expected conditions
if errors.Is(err, asyncsftp.ErrNotFound) {
    return nil, fmt.Errorf("resource not found")  // Don't do this
}

Complete Error Handling Example

Here's how a well-structured Create implementation handles errors:

func (p *Plugin) Create(ctx context.Context, req *resource.CreateRequest) (*resource.CreateResult, error) {
    // Validation error - non-recoverable
    props, err := parseFileProperties(req.Properties)
    if err != nil {
        return &resource.CreateResult{
            ProgressResult: &resource.ProgressResult{
                Operation:       resource.OperationCreate,
                OperationStatus: resource.OperationStatusFailure,
                ErrorCode:       resource.OperationErrorCodeInvalidRequest,
                StatusMessage:   fmt.Sprintf("invalid properties: %v", err),
            },
        }, nil
    }

    // Connection error - recoverable
    client, err := p.getClient(req.TargetConfig)
    if err != nil {
        return &resource.CreateResult{
            ProgressResult: &resource.ProgressResult{
                Operation:       resource.OperationCreate,
                OperationStatus: resource.OperationStatusFailure,
                ErrorCode:       resource.OperationErrorCodeNetworkFailure,
                StatusMessage:   fmt.Sprintf("failed to connect: %v", err),
            },
        }, nil
    }

    // Start operation...
}
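The failure branches above repeat the same four fields, so a small constructor can keep them consistent. The following is a sketch under stated assumptions: the types are simplified local stand-ins for the SDK's, the string values given to `OperationCreate` and `OperationStatusFailure` are placeholders, and `failure` is a hypothetical helper name.

```go
package main

import "fmt"

// Simplified stand-ins for the SDK types used in the example above.
type (
    Operation          string
    OperationStatus    string
    OperationErrorCode string
)

const (
    OperationCreate                  Operation          = "Create"  // placeholder value
    OperationStatusFailure           OperationStatus    = "Failure" // placeholder value
    OperationErrorCodeInvalidRequest OperationErrorCode = "InvalidRequest"
    OperationErrorCodeNetworkFailure OperationErrorCode = "NetworkFailure"
)

// ProgressResult carries only the fields used in this section.
type ProgressResult struct {
    Operation       Operation
    OperationStatus OperationStatus
    ErrorCode       OperationErrorCode
    StatusMessage   string
}

// failure builds a failed ProgressResult with a formatted StatusMessage,
// so every error path sets all four fields the same way.
func failure(op Operation, code OperationErrorCode, format string, args ...interface{}) *ProgressResult {
    return &ProgressResult{
        Operation:       op,
        OperationStatus: OperationStatusFailure,
        ErrorCode:       code,
        StatusMessage:   fmt.Sprintf(format, args...),
    }
}

func main() {
    r := failure(OperationCreate, OperationErrorCodeNetworkFailure,
        "failed to connect: %v", "dial tcp: connection refused")
    fmt.Printf("%+v\n", *r)
}
```

With such a helper, each branch in Create shrinks to a single return statement, and it becomes harder to forget the StatusMessage.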

Summary

Aspect	Guideline
Error classification	Use recoverable codes for transient failures, non-recoverable for permanent ones
StatusMessage	Always include a descriptive message with the underlying error
NotFound	Return it consistently - the agent handles context-specific semantics
Go errors	Only for truly exceptional conditions, not expected failures
Error mapping	Create a helper function to map infrastructure errors to SDK codes

With proper error handling, your plugin integrates smoothly with the agent's retry logic and provides clear feedback to users when things go wrong.

Next: 11 - Conformance & CI - Set up conformance tests and continuous integration