Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay decoding a message's contents in DTFx.AzureStorage #1110

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

davidmrdavid
Copy link
Collaborator

@davidmrdavid davidmrdavid commented May 30, 2024

We recently encountered a case where the Azure Functions scale controller was failing to make a scaling decision due to the following error:

System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
   at System.Convert.FromBase64CharPtr(Char* inputPtr, Int32 inputLength)
   at System.Convert.FromBase64String(String s)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage.get_AsString()
   at DurableTask.AzureStorage.Storage.QueueMessage..ctor(CloudQueueMessage cloudQueueMessage) in /_/src/DurableTask.AzureStorage/Storage/QueueMessage.cs:line 26
   at DurableTask.AzureStorage.Storage.Queue.PeekMessageAsync() in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 193
   at DurableTask.AzureStorage.Monitoring.DisconnectedPerformanceMonitor.GetQueueLatencyAsync(Queue queue) in /_/src/DurableTask.AzureStorage/Monitoring/DisconnectedPerformanceMonitor.cs:line 243

As can be seen in the stack trace, this error is thrown when trying to Peek a given queue, which we use to observe the message age amongst other metadata.

The error occurs because, after receiving an azure queue message (of type CloudQueueMessage), we wrap it in our own abstraction called QueueMessage. When constructing this object, we call the AsString method of CloudQueueMessage, which obtains the message contents, and store them in a class property named Message. This method can throw if the message cannot be decoded, yielding the error above.

This is a kind of poison message scenario that can affect the Scale Controller. In the Scale Controller's case, we don't really need to decode the message's contents, as all we need is message and queue metadata such as the average message Age, the number of messages in the queue, and so on.

As a result - this PR moves the call to AsString from the CloudQueueMessage constructor to instead occur in the implementation of it's Message property. This will prevent the ScaleController from error'ing out, as it never needs to access the message's actual contents.

As an aside - I meant to write a test for this, but the CloudQueueMessage type is sealed, so it cannot be mock'ed, and newer versions of the Azure Storage SDK are more robust against undecodable errors. So I decided against it.

// For example, if the message was encoded with a different encoding than the one used to decode it, this will throw.
// There are many cases where we don't actually need the message contents, such as when we're just trying to peek at the message's Age
// in the ScaleController. Therefore, we delay the decoding of the message until it's actually needed.
return this.CloudQueueMessage.AsString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this value be cached, to avoid deserializing the CloudQueueMessage multiple times?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants