Description
Opened on Mar 1, 2024
Feature Request
Bulk support doesn't exist today in the Go SDK. My understanding is that it would be complex to implement and is unlikely to be done anytime soon, if at all. Still, I'm curious whether the Transactional Batch support added in #17795 allows an application developer to get at least some of the throughput benefits of bulk, or if there are fundamental differences between them.
Based on this, this, and the .NET implementation, my understanding is that Bulk is essentially `TransactionalBatch` with the enhancements that it:
- Operates on physical partitions rather than on logical keys by automatically mapping keys to their ranges/partitions (including detecting and handling partition splits)
- Automatically dispatches filled batches when they hit size/count limits
- Automatically dispatches partially filled batches on a timer
- Automatically handles congestion / retry (especially around same-partition writes)
- Sets `x-ms-cosmos-batch-atomic` to `false` and `x-ms-cosmos-batch-continue-on-error` to `true`
- Others I'm sure I've missed 😄
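For illustration, the second and third points above (dispatching filled batches on a count limit, and partially filled ones on a timer) seem implementable at the application level with stdlib-only code. This is a minimal sketch under my own assumptions: the `flush` callback, the `op` type, and the limit of 100 operations per batch are mine, not SDK behavior (100 is the documented per-batch operation cap for transactional batch):

```go
package main

import (
	"fmt"
	"sync"
)

// op stands in for a single item operation destined for one logical partition key.
type op struct {
	pk   string
	body []byte
}

// batcher groups operations by partition key and hands a filled slice to
// flush as soon as a key reaches maxOps. flush would wrap
// NewTransactionalBatch + ExecuteTransactionalBatch in a real application.
type batcher struct {
	mu     sync.Mutex
	maxOps int
	byPK   map[string][]op
	flush  func(pk string, ops []op)
}

func newBatcher(maxOps int, flush func(string, []op)) *batcher {
	return &batcher{maxOps: maxOps, byPK: map[string][]op{}, flush: flush}
}

// add queues one operation; a filled partition key is dispatched immediately.
func (b *batcher) add(o op) {
	b.mu.Lock()
	b.byPK[o.pk] = append(b.byPK[o.pk], o)
	var full []op
	if len(b.byPK[o.pk]) >= b.maxOps {
		full = b.byPK[o.pk]
		delete(b.byPK, o.pk)
	}
	b.mu.Unlock()
	if full != nil {
		b.flush(o.pk, full)
	}
}

// flushAll dispatches every partially filled batch; calling it from a
// time.Ticker approximates the timer-based dispatch described above.
func (b *batcher) flushAll() {
	b.mu.Lock()
	pending := b.byPK
	b.byPK = map[string][]op{}
	b.mu.Unlock()
	for pk, ops := range pending {
		b.flush(pk, ops)
	}
}

func main() {
	flushed := map[string]int{}
	b := newBatcher(3, func(pk string, ops []op) { flushed[pk] += len(ops) })
	for i := 0; i < 7; i++ {
		b.add(op{pk: "tenant-a"}) // fills two batches of 3, leaves 1 pending
	}
	b.add(op{pk: "tenant-b"}) // stays pending until the timer fires
	b.flushAll()              // what a ticker callback would do
	fmt.Println(flushed["tenant-a"], flushed["tenant-b"]) // 7 1
}
```

The hard part, of course, is what `flush` should do when a batch fails partway or gets throttled, which is exactly where the .NET bulk pipeline's retry logic comes in.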
Some of these are relatively easy for an application to implement, but (1) isn't possible because the Go SDK lacks any concept of physical partitions/ranges today, and `TransactionalBatch` (like all operations) accepts only a single `partitionKey`.
So I guess my question is: is `NewTransactionalBatch(partitionKey)` at all useful for bulk operations even without knowledge of physical partitions, assuming you can fill batches per `partitionKey`? With sequential requests it still seems useful to batch them, but I imagine as soon as you try to parallelize the batches you're going to hit severe throttling, even if you tried to implement some sort of congestion control at the `partitionKey` level (edit: although perhaps the `x-ms-documentdb-partitionkeyrangeid` response header could help).
More generally, what are the recommendations for optimizing throughput given the limitations of the Go SDK?