[GLUTEN-2638][VL]fix s3 endpoint configuration (apache#2643)
Set the S3 endpoint only when instance credentials are not used.

(Fixes: issue-2638)
yma11 authored Aug 8, 2023
1 parent 429fca0 commit 28ed9ad
Showing 2 changed files with 8 additions and 3 deletions.
8 changes: 6 additions & 2 deletions cpp/velox/compute/VeloxInitializer.cc
@@ -171,9 +171,13 @@ void VeloxInitializer::init(const std::unordered_map<std::string, std::string>&
         {"hive.s3.aws-secret-key", awsSecretKey},
     });
   }
-
+  // Only need to set s3 endpoint when not use instance credentials.
+  if (useInstanceCredentials != "true") {
+    s3Config.insert({
+        {"hive.s3.endpoint", awsEndpoint},
+    });
+  }
   s3Config.insert({
-      {"hive.s3.endpoint", awsEndpoint},
       {"hive.s3.ssl.enabled", sslEnabled},
       {"hive.s3.path-style-access", pathStyleAccess},
   });
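
The effect of this hunk can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the actual Gluten code: `buildS3Config` is a hypothetical helper introduced here, and its parameters stand in for the values that `VeloxInitializer::init` reads from the Spark configuration. Only the `hive.s3.*` keys and the `useInstanceCredentials != "true"` check are taken from the diff.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of Gluten): assembles the S3-related entries
// shown in the hunk from values that have already been parsed.
std::unordered_map<std::string, std::string> buildS3Config(
    const std::string& useInstanceCredentials,
    const std::string& awsEndpoint,
    const std::string& sslEnabled,
    const std::string& pathStyleAccess) {
  std::unordered_map<std::string, std::string> s3Config;

  // Forward the endpoint only when instance credentials are not in use.
  if (useInstanceCredentials != "true") {
    s3Config.emplace("hive.s3.endpoint", awsEndpoint);
  }

  // These two settings are passed through in both modes.
  s3Config.emplace("hive.s3.ssl.enabled", sslEnabled);
  s3Config.emplace("hive.s3.path-style-access", pathStyleAccess);
  return s3Config;
}
```

With `useInstanceCredentials` set to `"true"`, the returned map has no `hive.s3.endpoint` entry at all, which matches the documentation note that `spark.hadoop.fs.s3a.endpoint` is ignored in that mode.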
3 changes: 2 additions & 1 deletion docs/get-started/VeloxS3.md
@@ -31,6 +31,7 @@ S3 also provides other methods for accessing, you can also use instance credenti
```
spark.hadoop.fs.s3a.use.instance.credentials true
```
Note that in this case, "spark.hadoop.fs.s3a.endpoint" won't take effect, as Gluten will use the endpoint set during instance creation.

## Configuring S3 IAM roles
You can also use IAM role credentials by setting the following configurations. Instance credentials take priority over IAM role credentials.
@@ -60,4 +61,4 @@ spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for
spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable O_DIRECT on cache write, default false.
```

It's recommended to mount SSDs at the cache path to get the best local caching performance. When the Spark context starts, the cache files are allocated under "spark.gluten.sql.columnar.backend.velox.cachePath" with a UUID-based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten cannot reuse older caches for now, and the old cache files remain in place after the Spark context shuts down.
