Use computed average document size to support usage through Atlas Data Federation #132
Hi @guillotjulien, thanks for the PR. I have added a ticket, SPARK-442, to track calculating avgObjSize if it is not available.
I think this looks worthwhile. I would recommend using avgObjSize
if it is available and calculating it otherwise.
453aa29 to bc92d09
I extracted the logic to get the average object size in I retested locally, and
Hi @rozza, any chance you'd have time to look at this again?
@guillotjulien apologies for the delay. Just an update: I aim to do a patch release on the 6th of May, and it will incorporate this. At the moment there appears to be a compile error, which I'll make sure is fixed before the release.
@rozza yes, sorry, I missed an import for After running it locally as per the README, I was able to build the connector and run the tests to completion. I'll push the fixed version (there were also a couple of places in my code that Spotless reformatted).
…a Federation The new implementation stops relying on the storageStats property, which is not recognized as a valid property by the $collStats aggregation stage on a Data Federation endpoint. This made it impossible to use the SamplePartitioner, PaginateBySizePartitioner and AutoBucketPartitioner with a Data Federation endpoint. From what I could see, the storageStats property was only used to access avgObjSize, which can be computed from the size and the number of documents of a collection. When connected to a federated MongoDB instance, stats are retrieved via the collStats command, whereas the $collStats aggregation stage is used for standard MongoDB instances. The two paths differ because the collStats command is faster but has been deprecated since MongoDB 6.2. However, it doesn't seem to be deprecated for Data Federation as far as I can tell.
bc92d09 to adf635e
Hi @guillotjulien, I've opened #133, building upon this approach and hopefully simplifying the logic. Ross
Closed, as this work was the basis of #133. Many thanks, @guillotjulien, for getting this across the line.
The new implementation stops relying on the storageStats property, which is not recognized as a valid property by the $collStats aggregation stage on a Data Federation endpoint.
This made it impossible to use the SamplePartitioner, PaginateBySizePartitioner and AutoBucketPartitioner in that situation.
Example:
From what I could see, the storageStats property was only used to access avgObjSize, which can be computed by dividing the size of a collection by its number of documents.
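To illustrate the computation, here is a minimal sketch in Python rather than the connector's actual code. The function name `average_document_size` is hypothetical, and `stats` is assumed to be a plain mapping holding the `size`, `count` and (optionally) `avgObjSize` fields of a collection stats result. It prefers `avgObjSize` when the server reports it and otherwise falls back to `size / count`.

```python
from typing import Any, Mapping


def average_document_size(stats: Mapping[str, Any]) -> float:
    """Return the average document size in bytes for a stats result.

    Hypothetical helper: `stats` is assumed to expose `size`, `count`
    and optionally `avgObjSize`.
    """
    # Prefer the server-provided value when it is present.
    avg = stats.get("avgObjSize")
    if avg is not None:
        return float(avg)

    # Otherwise derive it from the total data size and document count.
    count = stats.get("count", 0)
    if not count:
        # Empty collection: avoid dividing by zero.
        return 0.0
    return float(stats.get("size", 0)) / count
```

Returning 0.0 for an empty collection is a choice made for this sketch; a caller may prefer to treat that case separately.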
When connected to a federated MongoDB instance, stats are retrieved via the collStats command, whereas the $collStats aggregation stage is used for standard MongoDB instances. The two paths differ because the collStats command is faster but has been deprecated since MongoDB 6.2. However, it doesn't seem to be deprecated for Data Federation as far as I can tell (from https://www.mongodb.com/docs/atlas/data-federation/supported-unsupported/diagnostic-commands/#collstats).
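The choice between the two ways of fetching stats could look roughly like the sketch below. It is in Python for brevity and only builds the request documents without contacting a server. The function name `build_stats_request` and the `is_data_federation` flag are hypothetical, and the use of the `storageStats` option on standard deployments is an assumption.

```python
from typing import Any, Dict, List, Tuple, Union

StatsRequest = Tuple[str, Union[Dict[str, Any], List[Dict[str, Any]]]]


def build_stats_request(collection: str, is_data_federation: bool) -> StatsRequest:
    """Pick how to ask the server for collection stats (hypothetical helper)."""
    if is_data_federation:
        # Data Federation: use the collStats database command.
        return ("command", {"collStats": collection})
    # Standard deployment: use the $collStats aggregation stage.
    # Requesting storageStats here is an assumption of this sketch.
    return ("aggregate", [{"$collStats": {"storageStats": {}}}])
```

With PyMongo, for instance, the command document could be passed to `Database.command` and the pipeline to `Collection.aggregate`.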