[Remote Translog] Introduce remote translog transfer support #4480

Merged: 15 commits, Nov 22, 2022

Collaborator

@Bukhtawar Bukhtawar commented Sep 10, 2022

Signed-off-by: Bukhtawar Khan bukhtawa@amazon.com

Description

This change introduces support for a blob-store-based remote translog store. It contains the remote transfer service interactions and the snapshot data that needs to be uploaded per commit.

Remote Translog Files and Location

  1. .tlg file: base64_uuid/index_uuid/shard_id/primary_term/translog
  2. .ckp file: base64_uuid/index_uuid/shard_id/primary_term/translog
  3. metadata file: base64_uuid/index_uuid/shard_id/primary_term/metadata

Contents of Remote Translog Files

  1. .tlg_N file: Same as .tlg file created on local disk per generation
  2. .ckp_N file: Same as .ckp file created on local disk per generation
  3. metadata file: metadata__primary_term__generation__timestamp
    1. primary term
    2. generation
    3. timestamp
    4. mapping of generation → primary_term (this represents the path in the remote store, since the complete path can be derived from it)
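
The layout above can be sketched in a few lines of Java. This is only an illustration, not the code in this PR: the class `RemoteTranslogPaths` and its method names are hypothetical, and it assumes that `metadata__primary_term__generation__timestamp` is the name of the metadata file and that the timestamp is a plain number.

```java
public final class RemoteTranslogPaths {
    private RemoteTranslogPaths() {}

    // Directory that holds both the .tlg and .ckp files for one primary term.
    public static String translogDir(String base64Uuid, String indexUuid, int shardId, long primaryTerm) {
        return String.join("/", base64Uuid, indexUuid, Integer.toString(shardId), Long.toString(primaryTerm), "translog");
    }

    // Directory that holds the metadata files for one primary term.
    public static String metadataDir(String base64Uuid, String indexUuid, int shardId, long primaryTerm) {
        return String.join("/", base64Uuid, indexUuid, Integer.toString(shardId), Long.toString(primaryTerm), "metadata");
    }

    // Assumed metadata file name: metadata__primary_term__generation__timestamp.
    public static String metadataFileName(long primaryTerm, long generation, long timestamp) {
        return String.join("__", "metadata", Long.toString(primaryTerm), Long.toString(generation), Long.toString(timestamp));
    }

    public static void main(String[] args) {
        System.out.println(translogDir("b64", "idx", 0, 3L));   // b64/idx/0/3/translog
        System.out.println(metadataDir("b64", "idx", 0, 3L));   // b64/idx/0/3/metadata
        System.out.println(metadataFileName(3L, 7L, 1000L));    // metadata__3__7__1000
    }
}
```

Because the primary term is a path component, knowing the generation → primary_term mapping is enough to rebuild the full remote path of any generation's files.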

Tests: work in progress

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar Bukhtawar marked this pull request as ready for review September 20, 2022 08:35
@Bukhtawar Bukhtawar requested review from a team and reta as code owners September 20, 2022 08:35
@Bukhtawar Bukhtawar changed the title WIP: Introduce remote translog transfer support [Remote Translog] Introduce remote translog transfer support Sep 20, 2022

@sachinpkale
Member

Pushed a commit which fixes #4820 as well.

Comment on lines +121 to +125
try {
    return readByte() & 0xFF;
} catch (EOFException e) {
    return -1;
}
Collaborator

Why do we need this change?

Collaborator Author

This fixes an interface contract that OpenSearch was breaking. It is unrelated to the rest of this change.

Member

It looks like we have a base contract test for a stream input. Can you create an issue to create a contract test for this class since you've got the full context for this specific miss?
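
For context on the contract discussed in this thread: `java.io.InputStream.read()` is specified to return the next byte as an `int` in the range 0 to 255, or -1 at end of stream, rather than throwing. A minimal sketch of the pattern in the diff follows. `ByteArrayBackedInput` is a hypothetical stand-in, not the OpenSearch class touched by this PR, and its `readByte()` is assumed to throw `EOFException` once the data is exhausted.

```java
import java.io.EOFException;
import java.io.InputStream;

public class ByteArrayBackedInput extends InputStream {
    private final byte[] data;
    private int pos;

    public ByteArrayBackedInput(byte[] data) {
        this.data = data;
    }

    // Stand-in for a readByte() that signals end of data with an exception.
    public byte readByte() throws EOFException {
        if (pos >= data.length) {
            throw new EOFException();
        }
        return data[pos++];
    }

    // InputStream contract: return 0..255 for a byte, -1 at end of stream.
    @Override
    public int read() {
        try {
            // Mask with 0xFF so that negative bytes map to 128..255 and never collide with -1.
            return readByte() & 0xFF;
        } catch (EOFException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        ByteArrayBackedInput in = new ByteArrayBackedInput(new byte[] { (byte) 0xFF });
        System.out.println(in.read()); // 255
        System.out.println(in.read()); // -1
    }
}
```

A contract test of the kind requested here would assert exactly these two behaviours: the unsigned byte value for data, and -1 (not an exception) once the stream is exhausted.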

@Bukhtawar Bukhtawar requested a review from gbbafna November 16, 2022 08:56
@Bukhtawar Bukhtawar requested a review from gbbafna November 16, 2022 17:58

Iterable<String> remoteTransferPath,
ActionListener<TransferFileSnapshot> listener
) {
assert remoteTransferPath instanceof BlobPath;
Member

Since you currently only have one instance of a TransferService, I would recommend not introducing an interface and doing things like this. It's easy enough to refactor out an interface if/when you need it, but until you really need it you can only make educated guesses about what the generic interface should look like. Not a huge deal, I'll defer to your preference.

CHANGELOG.md (outdated)
}
translogTransferSnapshot.setMinTranslogGeneration(highestGenMinTranslogGeneration);

assert this.primaryTerm == highestGenPrimaryTerm : "inconsistent primary term";
Member

Again with me complaining about asserts :) but I would enforce these constraints unconditionally. Performance doesn't seem like a concern since the work is done unconditionally. Same applies for line 50.

I won't block on this issue if you strongly prefer asserts, but regardless we should have unit tests for these constraints.

Collaborator Author

I'll stick with this for now and make changes as needed once we have added tests for this.
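
To make the trade-off in this thread concrete: a Java `assert` is only evaluated when the JVM runs with assertions enabled (`-ea`), so it can be silently skipped in production, while an explicit check always runs. The sketch below is illustrative only. `PrimaryTermCheck` and its methods are hypothetical names, and the choice of `IllegalStateException` is an assumption, not something this PR uses.

```java
public final class PrimaryTermCheck {
    private PrimaryTermCheck() {}

    // Evaluated only when assertions are enabled (-ea); a no-op otherwise.
    public static void checkWithAssert(long expected, long actual) {
        assert expected == actual : "inconsistent primary term";
    }

    // Enforced unconditionally, regardless of JVM flags.
    public static void checkAlways(long expected, long actual) {
        if (expected != actual) {
            throw new IllegalStateException(
                "inconsistent primary term: expected [" + expected + "] but was [" + actual + "]"
            );
        }
    }

    public static void main(String[] args) {
        checkAlways(3L, 3L); // passes silently
        try {
            checkAlways(3L, 4L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Either variant can be covered by a unit test. The unconditional form has the advantage that the test does not depend on how the test JVM is configured.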

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.shard.SegmentReplicationIndexShardTests.testNRTReplicaPromotedAsPrimary

@@ -50,6 +50,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Relax visibility of the HTTP_CHANNEL_KEY and HTTP_SERVER_CHANNEL_KEY to make it possible for the plugins to access associated Netty4HttpChannel / Netty4HttpServerChannel instance ([#4638](https://github.com/opensearch-project/OpenSearch/pull/4638))
- Use ReplicationFailedException instead of OpensearchException in ReplicationTarget ([#4725](https://github.com/opensearch-project/OpenSearch/pull/4725))
- Migrate client transports to Apache HttpClient / Core 5.x ([#4459](https://github.com/opensearch-project/OpenSearch/pull/4459))
- Support remote translog transfer for request level durability([#4480](https://github.com/opensearch-project/OpenSearch/pull/4480))
Member

This should go in the Unreleased 2.x section provided this will be backported.

Collaborator Author

Ack

@Bukhtawar Bukhtawar merged commit a8b5d91 into opensearch-project:main Nov 22, 2022
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Jan 4, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-4480-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a8b5d91bc8b6b2a79fa042d22775959043dac145
# Push it to GitHub
git push --set-upstream origin backport/backport-4480-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-4480-to-2.x.

gbbafna pushed a commit to gbbafna/OpenSearch that referenced this pull request Jan 4, 2023
…rch-project#4480)

* Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
sachinpkale pushed a commit to sachinpkale/OpenSearch that referenced this pull request Jan 9, 2023
…rch-project#4480)

* Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Jan 9, 2023
…5693)

* [Backport 2.x] Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 9, 2023
* Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
gbbafna pushed a commit to gbbafna/OpenSearch that referenced this pull request Jan 10, 2023
…rch-project#4480)

* Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 10, 2023
…5773)

* Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
kotwanikunal pushed a commit that referenced this pull request Jan 25, 2023
…5693)

* [Backport 2.x] Introduce remote translog transfer support

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Labels
backport 2.x Backport to 2.x branch

5 participants