


@rozza commented Jan 15, 2025

Spark expects Hadoop configuration properties to be prefixed with `spark.hadoop.`. However, documentation on the web often omits this prefix when setting filesystem configuration. See, for example, the Azure storage docs.
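To make the prefix rule concrete, here is a simplified, stdlib-only Java model of how Spark derives Hadoop settings from its own properties. It is not Spark's code. The class and method names are hypothetical, and real Spark also copies a few other settings. What it illustrates is that a bare `fs.` key is dropped, while the same key set with the `spark.hadoop.` prefix is kept with the prefix stripped.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (not Spark's implementation) of how Spark
 * builds a Hadoop configuration from Spark properties: only keys that start
 * with "spark.hadoop." are copied, and that prefix is stripped.
 */
final class SparkHadoopPrefixSketch {
    static final String SPARK_HADOOP_PREFIX = "spark.hadoop.";

    static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(SPARK_HADOOP_PREFIX)) {
                // "spark.hadoop.fs.foo" becomes "fs.foo" in the Hadoop configuration
                hadoopConf.put(key.substring(SPARK_HADOOP_PREFIX.length()), e.getValue());
            }
            // Keys without the prefix, such as a bare "fs.foo", are ignored here
        }
        return hadoopConf;
    }

    private SparkHadoopPrefixSketch() {}
}
```

Under this model, setting `spark.hadoop.fs.example.key` makes `fs.example.key` visible to Hadoop, whereas setting `fs.example.key` directly, as the unprefixed examples in such docs do, never reaches the Hadoop configuration.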

The problem for the connector is that its `MongoOffset` support only uses the `SparkContext.hadoopConfiguration()` helper method, which omits any non-prefixed configuration. This improvement therefore adds any configuration prefixed with `fs.` to the Hadoop configuration, ensuring that `MongoOffset`'s use of the Hadoop filesystem includes those settings.
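The fix can be modeled the same way. The following is a hypothetical sketch, not the connector's actual code. It takes the Hadoop settings Spark already built plus the Spark properties, and copies every key starting with `fs.` across. It assumes plain `Map`s in place of Hadoop's `Configuration` and Spark's runtime config, and it assumes an unprefixed `fs.` value overrides an existing Hadoop value with the same key; the real change may resolve such conflicts differently.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the idea in this PR: start from the Hadoop
 * configuration Spark already built, then also copy any Spark property whose
 * key starts with "fs." so filesystem settings reach the offset storage code.
 * Plain Maps stand in for Hadoop's Configuration and Spark's runtime config.
 */
final class FsConfMergeSketch {
    static final String FS_PREFIX = "fs.";

    static Map<String, String> withFsSettings(
            Map<String, String> hadoopConf, Map<String, String> sparkConf) {
        // Copy so the caller's Hadoop settings are not mutated
        Map<String, String> merged = new HashMap<>(hadoopConf);
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(FS_PREFIX)) {
                // Assumption: the unprefixed fs.* value wins on a key clash
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    private FsConfMergeSketch() {}
}
```

Filtering on `fs.` keeps the change narrow: only filesystem settings are promoted, so unrelated Spark properties such as `spark.app.name` do not leak into the Hadoop configuration.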

SPARK-438

@rozza requested a review from katcharov January 16, 2025 09:38

@katcharov left a comment


LGTM!

@rozza merged commit 560d495 into mongodb:main Jan 22, 2025
21 of 24 checks passed
@rozza deleted the SPARK-438 branch January 22, 2025 10:41
