-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename files written by PageSink #14974
Conversation
rebase a bit maybe? |
5b2a34e
to
3e915b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level design direction:
- Create a new interface in ConnectorMetadata called
updateMetadata
in SPI - Create a new service
HiveFileRenamer
for Hive connector in presto-hive and inject it into HiveMetadata. This is to move your current logic inContinuousTaskStatusFetcher
into thisHiveFileRenamer
. So that you can call the interfaceupdateMetadata
to tunnel the logic into it. - Change the field name in
TaskStatus
to be calledMetadataUpdates
. - Create a new connector interface
ConnectorDistributedMetadataUpdater
in SPI. This is to replaceFileNameProvider
. Make sure to name the interface in a generic way (to avoid word like 'file', 'name', etc). This class is not injected. But to created from class loader. (CheckConnectorPlanOptimizerProvider
as an example). - Create a new class
HiveDistributedMetadataUpdater
to inheritConnectorPlanOptimizerProvider
. This is to replaceFileNameManager
. Move all your logic there. SqlTask
should callHiveDistributedMetadataUpdater
to update the renamed files.HiveDistributedMetadataUpdater
will be tunneled intoHivePageSink
just likeFileNameProvider
.
0dbf2bf
to
c89651f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on SPI change. Once we fix SPI, other parts should be easy to proceed
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorPageSinkProvider.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/MetadataUpdateRequest.java
Outdated
Show resolved
Hide resolved
...spi/src/main/java/com/facebook/presto/spi/connector/ConnectorDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/MetadataUpdateResult.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
@Inject | ||
public HiveDistributedMetadataUpdater() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We might need to inject something light-weighted to track the lifecycle of a query.
SqlTaskManager
is the closest I can think of. But that is too heavy and not accessible through connectors. We may in the end hook callbacks from HivePageSink to remove dead queries. - clean up hashmap by hooking it to HiveFileWriter::commit/rollback
- add an assertion to any integration or smoke tests to assert we don't leak requests. Check 9f89256 as an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- take a look
ConnectorPlanOptimizer
andConnectorPlanOptimizerProvider
ConnectorDistributedMetadataUpdater
<->ConnectorPlanOptimizer
. It will be per query/task- Create a new
ConnectorDistributedMetadataUpdaterProvider
that is a singleton. Inject it intoLocalExecutionPlanner
and somehow return it backSqlTask
?. - In
LocalExecutionPlanner
, when you create a newTableWriterOperatorFactory
, create a newConnectorDistributedMetadataUpdater
instance and pass it in.
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadataUpdate.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveDistributedMetadataUpdater.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we pass the updater into table writer instead of injecting it, we will need to modify the SPI createPageSink
. To do it, we could rename PageSinkProperties
into PageSinkContext
as a separate commit. Then put the updater inside PageSinkContext
so we don't have to change the interface again.
00f121b
to
5100427
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Rename PageSinkProperties to PageSinkContext" LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Add ConnectorMetadataUpdater to handle worker's metadata updates"
- there is a typo":wq" in the commit message
- haven't reviewed in details yet. One quick comment: we probably need to use a session property to guard if we actually wanna do send/receive any requests for
HiveMetadataUpdater
.
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadataUpdate.java
Outdated
Show resolved
Hide resolved
...to-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadataUpdaterProvider.java
Outdated
Show resolved
Hide resolved
...to-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadataUpdaterProvider.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadataUpdater.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/TaskMetadataContext.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/TaskMetadataContext.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Add ConnectorMetadataUpdateHandle resolver" LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level comments of this PR: we need tests; a lot of unit tests and integration tests. Probably enable the feature for all write tests for hive connector. We also need to make sure the file names are in sequential order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Rename files written by HivePageSink": make rename async
presto-hive-common/src/main/java/com/facebook/presto/hive/filesystem/ExtendedFileSystem.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HivePageSink.java
Outdated
Show resolved
Hide resolved
c77545f
to
33d310c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"POST the metadata results to worker": we need good ways to purge queries on coordinators
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveFileRenamer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Add config parameters to enable hive file renaming": LGTM
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Disable listFiles call to underlying storage": let's remove this commit in this PR. That is more like the standalone piece to review in the next PR
First pass done. Feel free to re-request my reviews once the comments are addressed. |
aec75b5
to
eeb61e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall look good to me. Minor comments only
presto-hive/src/main/java/com/facebook/presto/hive/HivePageSink.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveFileRenamer.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveFileRenamer.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveFileRenamer.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveFileRenamer.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/metadata/MetadataManager.java
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/server/remotetask/TaskInfoFetcher.java
Outdated
Show resolved
Hide resolved
0ee1a54
to
6c40f9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good work! Nits only
presto-main/src/main/java/com/facebook/presto/execution/SqlTask.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/TaskManager.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
Collect all the metadata update requests and send them to coordinator as part of the TaskInfo.
This PR enables the workers to send / receive metadata requests /responses from coordinator.
If this feature is enabled for Hive, then all the files written by the HivePageSink for a given partition will have increasing whole numbers as file names. (For ex: Lets say this we enabled this feature and ds=2020-09-30 is written by HivePageSink. Then we can expect the file names of that partition to be 0,1,2,3..N etc ).
depended by https://github.com/facebookexternal/presto-facebook/pull/1125