Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor commit lock mechanism from HiveTableOperations #5877

Closed
wants to merge 1 commit into from

Conversation

szlta
Copy link
Contributor

@szlta szlta commented Sep 28, 2022

HiveTableOperations has grown to a substantial size and got pretty complex.
A large portion of the logic is the implementation of the locking mechanism which I think could be separated out to reduce this complexity.

Let me know what you think @pvary @szehon-ho

Change-Id: I32ef3305c9df091ad1ccc994a542ab9b54d89cfa

try {
if (state.get().equals(LockState.WAITING)) {
// Retry count is the typical "upper bound of retries" for Tasks.run() function. In fact,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could you please reformat the comment?

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HiveCommitLock {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to implement the standard java.util.concurrent.locks.Lock interface?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do, we may also make it clear in the doc that unlike standard Lock, this is not multi thread safe? (ie, it is not safe to call acquire lock from multiple threads) If my observation is right, I guess its worth documenting in any case

@pvary
Copy link
Contributor

pvary commented Sep 29, 2022

@szlta: any particular change to highlight for the review? Also, please check the failing tests

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its good to refactor it, but definitely need to spend a bit of time to review this critical code. Gave a pass

"Failed to heartbeat for hive lock. %s",
hiveLockHeartbeat.encounteredException.getMessage());
if (!commitLock.isHeartbeatInProgress()) {
throw new CommitFailedException("Failed to heartbeat for hive lock.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel here we may be losing some information in the stack? in the case that encounteredException is not null?

&& hiveLockHeartbeat.encounteredException == null;
}

private void acquireLockFromHms() throws UnknownHostException, TException, InterruptedException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should method be named acquireHmsLock to be consistent with release?

LOG.warn("Failed to unlock {}.{}", databaseName, tableName, e);
}
}
if (hiveLockHeartbeat != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In original code, cancelling seemed to be run before unlock, should we keep it that way?

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HiveCommitLock {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do, we may also make it clear in the doc that unlike standard Lock, this is not multi thread safe? (ie, it is not safe to call acquire lock from multiple threads) If my observation is right, I guess its worth documenting in any case

private final ClientPool<IMetaStoreClient, TException> metaClients;
private final ScheduledExecutorService exitingScheduledExecutorService;

private Optional<Long> hmsLockId = Optional.empty();
Copy link
Collaborator

@szehon-ho szehon-ho Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering, is there justification to make these Optional versus just null checks?

There are some inteillij warnings because we are using this. Looking into it, there are some who say that Java Optional is not meant for a general purpose Maybe type: https://stackoverflow.com/questions/26327957/should-java-8-getters-return-optional-type/26328555#26328555 hence the intellij warnings. It seems its meant more for return types of methods to indicate that its possible something is null. Of course its more of a debate, but just noticing we dont use a whole lot of Optional elsewhere and may be worth to keep it simple.

I do realize that originally lockId is optional, but mentioning because we are adding another optional (jvmLock). Wdyt?

}

private void acquireJvmLock() {
if (jvmLock.isPresent()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im just wondering, as we check both (jvmLock and lockId) for null, it seems it will protect only the case where single caller calls acquire twice? It wont protect from multi thread calls as these are only optional and not atomic, and method acquire is not synchronized. Would a simple boolean suffice then to avoid two null checks?

"HMS lock ID=%s already acquired for table %s.%s",
hmsLockId.get(), databaseName, tableName));
}
final LockComponent lockComponent =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do realize its copy and paste, but I know from @aokolnychyi 's comments , we dont tend to use final in inline variable as there's little signfiicance. Could we remove them here?

"Waiting for lock on table %s.%s", databaseName, tableName));
}
} catch (InterruptedException e) {
Thread.interrupted(); // Clear the interrupt status flag
Copy link
Collaborator

@szehon-ho szehon-ho Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I see its copy and paste, but its weird, it seems this is checking a boolean and ignoring the value. Could you check this?

}

private void releaseHmsLock() {
if (hmsLockId.isPresent()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a simple boolean acquired (as mentioned in my other comment), we can just replace these theese two null checks with a single check on release(). As these are both private methods, checking both seems excessive?

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 20, 2024
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants