Skip to content

Batch Processor Bug - Batch Item Failure Report does not include all message IDs that failed processing #966

Closed
@kimberlyamandalu

Description

@kimberlyamandalu

When returning the batchItemFailures dictionary, I think it is supposed to return a list of dictionaries. In my understanding, this is how the batchItemFailures object must look like:

{
    "batchItemFailures": [
        {
            "itemIdentifier": "msgId1"
        },
        {
            "itemIdentifier": "msgId2"
        }
    ]
}

I checked out the documentation here: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting

While testing I noticed that the batch processor cleanup was always returning just one message ID from the batch, even when there are multiple messages that failed.

Looks like there might be a bug in the implementation where the failed message IDs are collected.
File:
https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/utilities/batch/base.py

Lines:
https://github.com/awslabs/aws-lambda-powertools-python/blob/27e502298795f5720f546e09011ae7290a50a443/aws_lambda_powertools/utilities/batch/base.py#L408-L422

I think the collection methods should be returning a list.

    def _collect_sqs_failures(self):
        if self.model:
            return [{"itemIdentifier": msg.messageId} for msg in self.fail_messages]
        return [{"itemIdentifier": msg.message_id} for msg in self.fail_messages]


    def _collect_kinesis_failures(self):
        if self.model:
            # Pydantic model uses int but Lambda poller expects str
            return [{"itemIdentifier": msg.kinesis.sequenceNumber} for msg in self.fail_messages]
        return [{"itemIdentifier": msg.kinesis.sequence_number} for msg in self.fail_messages]


    def _collect_dynamodb_failures(self):
        if self.model:
            return [{"itemIdentifier": msg.dynamodb.SequenceNumber} for msg in self.fail_messages]
        return [{"itemIdentifier": msg.dynamodb.sequence_number} for msg in self.fail_messages]

As part of the proposed changes above, the following line also needs to be updated to avoid returning a list of list of dictionaries.
https://github.com/awslabs/aws-lambda-powertools-python/blob/27e502298795f5720f546e09011ae7290a50a443/aws_lambda_powertools/utilities/batch/base.py#L388

        self.batch_response = {"batchItemFailures": messages}

I have mainly been working on SQS Batch Processing for my job so I was able to test the proposed changes locally only for the SQS collection method.

Please let me know if I am mistaken or this is a valid bug.
Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions