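- Configure a dead-letter queue (DLQ) on every asynchronously invoked Lambda function: `DeadLetterConfig: { TargetArn: !GetAtt MyDLQ.Arn }`, where the target is an SQS queue or SNS topic. Events that still fail after Lambda's retries land there for replay or alerting instead of being dropped silently. The function's execution role needs permission to send to the target (`sqs:SendMessage` or `sns:Publish`).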
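- For asynchronously invoked functions whose side effects are not idempotent, set `MaximumRetryAttempts: 0` in the function's event invoke config (`AWS::Lambda::EventInvokeConfig`, or `EventInvokeConfig` in SAM). By default, Lambda retries a failed asynchronous invocation twice, which can repeat side effects such as charges or outbound emails. Stream sources such as Kinesis are configured on the event source mapping instead, and their defaults differ.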
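- For non-proxy (custom) Lambda integrations in API Gateway REST APIs, map errors to HTTP status codes with regex selection patterns in the integration responses, e.g. `ValidationError.*` → 400, `NotFoundException.*` → 404, and a catch-all pattern → 500. The patterns are matched against the Lambda error's `errorMessage`, not its error type, so start each message with a stable prefix (e.g. `throw new Error('ValidationError: missing id')`).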
- Add CloudWatch alarms on the `AWS/Lambda` `Errors` and `Throttles` metrics, or on an error rate computed with metric math as `Errors / Invocations`, so you are alerted before failures cascade to downstream consumers.
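- On Kinesis and DynamoDB Streams event source mappings, set `MaximumRetryAttempts` and `MaximumRecordAgeInSeconds`. Both default to -1 (retry until the record expires), so a single failing record can block its shard for the stream's whole retention period. Add an on-failure destination (`DestinationConfig.OnFailure`) to keep a record of discarded batches. SQS event sources do not support these two settings: control SQS retries with the source queue's redrive policy (`maxReceiveCount`) and its own DLQ.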
- With Lambda proxy integrations, return structured error objects from the handler: `{ statusCode: 422, body: JSON.stringify({ code: 'VALIDATION_ERROR', message: '...' }) }`. API Gateway passes the returned `statusCode` straight to the client, so integration-response selection patterns do not apply to proxy integrations.
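- In Step Functions, add a `Catch` field to every `Task` state: `Catch: [{ ErrorEquals: ['States.ALL'], Next: 'HandleError', ResultPath: '$.error' }]`. `ResultPath: '$.error'` keeps the original input and adds the error details under `error`, so the handler state sees both. Without a catcher, any unhandled error fails the whole execution instead of being routed to your error handling.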
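- Set `ReservedConcurrentExecutions` on critical functions. It guarantees them a share of the account's concurrency and also caps them at that value, so other functions cannot starve them and they cannot overwhelm downstream systems. Invocations above the cap are throttled, so pair it with dead-letter queues and a CloudWatch alarm on the `Throttles` metric.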
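- Enable AWS X-Ray active tracing on Lambda and API Gateway to pinpoint failures across distributed calls. Use `Tracing: Active` on an `AWS::Serverless::Function` (SAM) or `TracingConfig: { Mode: Active }` in plain CloudFormation, plus `TracingEnabled: true` on the API Gateway stage.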
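With proxy integrations, the error-to-status mapping lives in the handler itself. The following is a minimal TypeScript sketch that assumes a Node.js runtime behind a proxy integration. Several parts are hypothetical: the `HttpError` class, the `{"id": ...}` request shape, and the `NOT_FOUND`/`INTERNAL_ERROR` codes. Only `VALIDATION_ERROR` and the 422 status come from the list above. `ProxyEvent` and `ProxyResult` are local stand-ins for the fuller types in the `@types/aws-lambda` package.

```typescript
// Minimal stand-ins for the proxy event/result shapes; @types/aws-lambda
// provides fuller APIGatewayProxyEvent / APIGatewayProxyResult types.
interface ProxyEvent {
  body: string | null;
}

interface ProxyResult {
  statusCode: number;
  headers?: Record<string, string>;
  body: string;
}

// Hypothetical application error carrying its HTTP status and a stable code.
class HttpError extends Error {
  constructor(
    readonly statusCode: number,
    readonly code: string,
    message: string,
  ) {
    super(message);
    this.name = "HttpError";
    // Keep `instanceof HttpError` working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

function jsonResponse(statusCode: number, payload: unknown): ProxyResult {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Converts any thrown value into a structured error response.
function toErrorResponse(err: unknown): ProxyResult {
  if (err instanceof HttpError) {
    return jsonResponse(err.statusCode, { code: err.code, message: err.message });
  }
  // Unexpected failure: log it (ends up in CloudWatch Logs) but do not leak details.
  console.error(err);
  return jsonResponse(500, { code: "INTERNAL_ERROR", message: "Internal server error" });
}

// Hypothetical handler: expects a JSON body like {"id": "..."}; only id "42" exists.
export async function handler(event: ProxyEvent): Promise<ProxyResult> {
  try {
    let input: unknown;
    try {
      input = JSON.parse(event.body ?? "{}");
    } catch {
      throw new HttpError(422, "VALIDATION_ERROR", "body must be valid JSON");
    }
    const id = (input as { id?: unknown } | null)?.id;
    if (typeof id !== "string") {
      throw new HttpError(422, "VALIDATION_ERROR", "'id' must be a string");
    }
    if (id !== "42") {
      throw new HttpError(404, "NOT_FOUND", `item ${id} not found`);
    }
    return jsonResponse(200, { id });
  } catch (err) {
    return toErrorResponse(err);
  }
}
```

Centralizing the mapping in `toErrorResponse` keeps status codes consistent across handlers. It returns an opaque 500 for anything unexpected, while the full error still reaches CloudWatch Logs through `console.error`.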
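The Step Functions `Catch` rule can also be enforced mechanically before a definition is deployed. The following is a minimal TypeScript sketch that assumes the state machine definition is built as a plain object. `withDefaultCatch`, the `ReserveStock`/`ChargeCard`/`HandleError` states, and the placeholder ARNs are hypothetical. Only top-level states are visited, so tasks inside `Parallel` or `Map` branches would need a recursive walk.

```typescript
// Minimal subset of the Amazon States Language (ASL) shapes used here.
interface Catcher {
  ErrorEquals: string[];
  Next: string;
  ResultPath?: string;
}

interface State {
  Type: string;
  Catch?: Catcher[];
  [field: string]: unknown;
}

interface StateMachine {
  StartAt: string;
  States: Record<string, State>;
}

// Returns a copy of `def` in which every top-level Task state lacking a
// States.ALL catcher gets one appended (ASL requires States.ALL to be in the
// last catcher). The handler state is skipped so it never routes to itself.
function withDefaultCatch(def: StateMachine, handlerState: string): StateMachine {
  if (!(handlerState in def.States)) {
    throw new Error(`handler state "${handlerState}" is not defined`);
  }
  const catchAll: Catcher = {
    ErrorEquals: ["States.ALL"],
    Next: handlerState,
    ResultPath: "$.error", // keep the original input, add error details under $.error
  };
  const states: Record<string, State> = {};
  for (const [name, state] of Object.entries(def.States)) {
    const hasCatchAll = (state.Catch ?? []).some((c) => c.ErrorEquals.includes("States.ALL"));
    states[name] =
      state.Type === "Task" && name !== handlerState && !hasCatchAll
        ? { ...state, Catch: [...(state.Catch ?? []), { ...catchAll }] }
        : state;
  }
  return { ...def, States: states };
}

// Hypothetical two-step workflow; the Lambda ARNs are placeholders.
const definition: StateMachine = {
  StartAt: "ReserveStock",
  States: {
    ReserveStock: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:ReserveStock",
      Next: "ChargeCard",
    },
    ChargeCard: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:ChargeCard",
      Catch: [{ ErrorEquals: ["CardDeclined"], Next: "HandleError", ResultPath: "$.error" }],
      End: true,
    },
    HandleError: { Type: "Fail", Error: "OrderFailed", Cause: "A task failed" },
  },
};

const safe = withDefaultCatch(definition, "HandleError");
console.log(JSON.stringify(safe, null, 2));
```

The function appends the catch-all rather than replacing existing catchers, so specific catchers a task already has (like `CardDeclined` above) keep priority. ASL evaluates catchers in order.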