HDDS-1620. Implement Volume Write Requests to use Cache and DoubleBuffer. #884
Conversation
Rebased with trunk, and added audit logging to volume requests. (HDDS-1600 got checked in, so we can add audit logs for HA requests.)
/retest
...op-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/OMClientRequest.java
Thank you @bharatviswa504 for working on this.
I have not finished reviewing yet. Test classes are pending. But posting the comments I have so far.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
...one-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/VolumeRequestHelper.java
...manager/src/main/java/org/apache/hadoop/ozone/om/response/volume/OMVolumeCreateResponse.java
Updated from 44c032e to 99ece62.
Thank You @hanishakoneru for the review.
💔 -1 overall
This message was automatically generated.
...one-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
OmVolumeArgs omVolumeArgs = null;
try {
  omVolumeArgs = getVolumeInfo(omMetadataManager, volume);
But then why do we need to get the first volume lock? It is a read operation only.
So that the owner does not get modified while the ownerName is being read.
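The point of the exchange above is that even a pure read needs the volume lock, so a concurrent writer cannot change the owner mid-read. A minimal sketch of that idea, using a plain `ReentrantReadWriteLock` as a hypothetical stand-in for the OM lock manager (the names `VolumeLockSketch`, `setOwner`, and `getOwner` are illustrative, not the real Ozone API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for the OM volume lock.
public class VolumeLockSketch {
  private final Map<String, String> volumeOwner = new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void setOwner(String volume, String owner) {
    lock.writeLock().lock();          // writers exclude readers
    try {
      volumeOwner.put(volume, owner);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Even a pure read takes the (read) lock, so the owner cannot be
  // modified while we are looking at it.
  public String getOwner(String volume) {
    lock.readLock().lock();
    try {
      return volumeOwner.get(volume);
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    VolumeLockSketch sketch = new VolumeLockSketch();
    sketch.setOwner("vol1", "ozone");
    System.out.println(sketch.getOwner("vol1"));
  }
}
```

A read-write lock keeps concurrent reads cheap while still serializing them against owner updates.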
protected void setUpdatedOmRequest(OMRequest updatedOMRequest) {
  Preconditions.checkNotNull(updatedOMRequest);
  this.omRequest = updatedOMRequest;
}
We are not updating omRequest in OmClientRequest anymore?
Yes, that is the reason I removed this method. This was done based on the above comment on execute. Will revisit during the merge of HA/Non-HA code.
return getOmRequest().toBuilder().setUserInfo(getUserInfo())
    .setCreateBucketRequest(newCreateBucketRequest.build()).build();
}
We are not setting the new omRequest in OMClientRequest anymore?
Yes.
@@ -451,6 +456,21 @@ private boolean startsWith(byte[] firstArray, byte[] secondArray) {
public boolean isVolumeEmpty(String volume) throws IOException {
  String volumePrefix = getVolumeKey(volume + OM_KEY_PREFIX);

  if (bucketTable instanceof TypedTable) {
When will this check be false?
When bucketTable is not an instance of TypedTable, as only TypedTables have a cache.
This is more of a safety check: if someone changes bucketTable to an RDBTable in the future, we skip the logic inside this if check, because cacheIterator throws NotImplementedException for RDBTable.
This is more of a safety check, in case someone changes the bucketTable to RDBTable in the future
But that would change all of our assumptions about how HA requests work, correct? Perhaps it is safer to fail rather than ignore it.
Okay, will update it
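The reviewer's suggestion above is to fail fast on an unexpected table type instead of silently skipping the cache logic. A hedged sketch of what that could look like; `Table`, `TypedTable`, and `RdbTable` here are simplified stand-ins for the real Ozone interfaces, and `checkBucketTable` is a hypothetical helper:

```java
// Simplified stand-ins for the real Ozone table types.
interface Table {}
class TypedTable implements Table {}
class RdbTable implements Table {}

public class FailFastCheck {
  // Fail fast: an unexpected table type would break the HA request
  // assumptions, so surface it instead of skipping the cache logic.
  static void checkBucketTable(Table bucketTable) {
    if (!(bucketTable instanceof TypedTable)) {
      throw new IllegalStateException(
          "bucketTable is expected to be a TypedTable but was "
              + bucketTable.getClass().getSimpleName());
    }
    // ... safe to use the TypedTable cache iterator here ...
  }

  public static void main(String[] args) {
    checkBucketTable(new TypedTable());   // passes silently
    try {
      checkBucketTable(new RdbTable());   // fails fast
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Failing fast turns a silent behavioral change into an immediate, debuggable error.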
  omMetadataManager.getLock().releaseVolumeLock(volume);
}

// We cannot acquire user lock holding volume lock, so released volume
This looks risky. Can the state of the world change between releasing the lock and reacquiring it?
Also, why do we need the user lock in the first place?
From my understanding, the user lock is for protecting userTable, where we maintain the list of volumes for that particular user.
What about the release/reacquire issue? Same issue exists in OMVolumeSetOwnerRequest.
Example:
SetOwner: volume V1, old owner ozone, new owner hadoop
CreateVolume: volume V7, owner ozone
If we don't acquire the lock for the old owner before reading from the table, we might see an issue. This is to protect against that kind of scenario.
And holding the lock for the user is to protect the volumeList which we store for the user.
Bharat - let's discuss. What I mean is that the release/reacquire pattern is usually a bad idea. It may be okay here; however, it would be better to eliminate it.
Okay..
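One common way to eliminate the release/reacquire pattern discussed above is to impose a fixed acquisition order, taking the user lock before the volume lock so both can be held together without deadlock. A minimal, hypothetical sketch under that assumption (the class and lock names are illustrative; the real OM lock manager has its own ordering rules):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: fixed lock ordering (user before volume) lets the
// critical section hold both locks, avoiding a release/reacquire window.
public class LockOrderSketch {
  private final ReentrantLock userLock = new ReentrantLock();
  private final ReentrantLock volumeLock = new ReentrantLock();

  public void setOwner(Runnable criticalSection) {
    userLock.lock();        // always first: protects the user -> volume list
    try {
      volumeLock.lock();    // always second: protects the volume entry
      try {
        criticalSection.run();   // both tables updated under both locks
      } finally {
        volumeLock.unlock();
      }
    } finally {
      userLock.unlock();
    }
  }

  public static void main(String[] args) {
    LockOrderSketch sketch = new LockOrderSketch();
    sketch.setOwner(() -> System.out.println("owner updated"));
  }
}
```

Because every code path acquires the two locks in the same order, no thread can hold the volume lock while waiting for the user lock, which is the deadlock the release/reacquire workaround was trying to avoid.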
...e-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/OMVolumeDeleteRequest.java
Thank You @arp7 for the review.
+1 pending CI.
/retest
Thank you @bharatviswa504 for working on this.
Thank You @hanishakoneru and @arp7 for the review.
Implement Volume write requests to use the OM Cache and double buffer.
In this Jira we will add the changes to implement volume operations. HA and Non-HA will have different code paths for now, but once all requests are implemented there will be a single code path.
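The double-buffer idea behind this PR can be sketched very roughly: writes append to an in-memory buffer while the previously filled buffer is flushed to the DB, and the two are swapped between flushes. A hedged, minimal illustration; the real OzoneManagerDoubleBuffer batches typed DB operations and flushes asynchronously, and the class below is a hypothetical simplification:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal double-buffer sketch: one buffer accepts writes
// while the other (already swapped out) is being flushed to the DB.
public class DoubleBufferSketch {
  private List<String> currentBuffer = new ArrayList<>();
  private List<String> readyBuffer = new ArrayList<>();

  public synchronized void add(String op) {
    currentBuffer.add(op);               // cheap in-memory append
  }

  // Swap buffers and return the batch to flush. The caller must finish
  // flushing the returned batch before calling swap() again, because the
  // drained buffer is cleared and reused for new writes.
  public synchronized List<String> swap() {
    List<String> toFlush = currentBuffer;
    currentBuffer = readyBuffer;
    currentBuffer.clear();               // reuse the drained buffer
    readyBuffer = toFlush;
    return toFlush;
  }

  public static void main(String[] args) {
    DoubleBufferSketch buf = new DoubleBufferSketch();
    buf.add("createVolume v1");
    buf.add("setOwner v1");
    List<String> batch = buf.swap();     // flush this batch in background
    System.out.println(batch.size());    // prints 2
  }
}
```

The payoff is that request handlers only pay for an in-memory append; the expensive DB write happens in batches on a separate flush path.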