-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Translog] Add support for downloading files from remote translog #5649
[Remote Translog] Add support for downloading files from remote translog #5649
Conversation
This Draft PR is built on #5638. Once the parent PR is merged, I will rebase the branch and will change the PR state to |
4e136fe
to
422f993
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
422f993
to
80da036
Compare
Gradle Check (Jenkins) Run Completed with:
|
@@ -3082,7 +3082,14 @@ public void startRecovery( | |||
executeRecovery("from store", recoveryState, recoveryListener, this::recoverFromStore); | |||
break; | |||
case REMOTE_STORE: | |||
executeRecovery("from remote store", recoveryState, recoveryListener, this::restoreFromRemoteStore); | |||
final Repository remoteTranslogRepo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could set remoteTranslogRepo
as null here and then get rid of else block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to push down some of this logic to restoreFromRemoteStore
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remoteTranslogRepo
needs to be final as it is provided as argument to a lambda, so we can't initialize it twice.
restoreFromRemoteStore
does not have reference to repositoriesService
to fetch the repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use Optional
instead?
if (repository != null) { | ||
FileTransferTracker fileTransferTracker = new FileTransferTracker(shardId); | ||
assert repository instanceof BlobStoreRepository : "repository should be instance of BlobStoreRepository"; | ||
BlobStoreRepository blobStoreRepository = (BlobStoreRepository) repository; | ||
TranslogTransferManager translogTransferManager = new TranslogTransferManager( | ||
new BlobStoreTransferService( | ||
blobStoreRepository.blobStore(), | ||
indexShard.getThreadPool().executor(ThreadPool.Names.TRANSLOG_TRANSFER) | ||
), | ||
blobStoreRepository.basePath().add(shardId.getIndex().getUUID()).add(String.valueOf(shardId.id())), | ||
fileTransferTracker, | ||
fileTransferTracker::exclusionFilter | ||
); | ||
RemoteFsTranslog.download(translogTransferManager, indexShard.shardPath().resolveTranslog()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move this to a method syncTranslogFromRemoteTranslogStore
for ease of read?
Files.delete(file); | ||
} | ||
Map<String, String> generationToPrimaryTermMapper = translogMetadata.getGenerationToPrimaryTermMapper(); | ||
for (long i = translogMetadata.getGeneration(); i >= translogMetadata.getMinTranslogGeneration(); i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if segments upload is lagging, then just downloading the most recent translog might not be enough. Can we create a tracking issue? If it exists already, pls do share.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already tracking it here: #3754
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More details are present in #5567 .
public boolean downloadTranslog(String primaryTerm, String generation, Path location, boolean latest) throws IOException { | ||
logger.info("Downloading translog files with: Primry Term = {}, Generation = {}, Location = {}", primaryTerm, generation, location); | ||
String checkpointFilename = "translog-" + generation + ".ckp"; | ||
if (latest) { | ||
checkpointFilename = "translog.ckp"; | ||
} | ||
if (Files.exists(location.resolve(checkpointFilename)) == false) { | ||
try ( | ||
InputStream checkpointFileInputStream = transferService.downloadBlob( | ||
remoteBaseTransferPath.add(primaryTerm), | ||
"translog-" + generation + ".ckp" | ||
) | ||
) { | ||
Files.copy(checkpointFileInputStream, location.resolve(checkpointFilename)); | ||
} | ||
} | ||
String translogFilename = "translog-" + generation + ".tlog"; | ||
if (Files.exists(location.resolve(translogFilename)) == false) { | ||
try ( | ||
InputStream translogFileInputStream = transferService.downloadBlob( | ||
remoteBaseTransferPath.add(primaryTerm), | ||
"translog-" + generation + ".tlog" | ||
) | ||
) { | ||
Files.copy(translogFileInputStream, location.resolve(translogFilename)); | ||
} | ||
} | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do something like this -
public boolean downloadTranslog(String primaryTerm, String generation, Path location, boolean latest) throws IOException {
logger.info("Downloading translog files with: Primry Term = {}, Generation = {}, Location = {}", primaryTerm, generation, location);
String ckpFileName = "translog-" + generation + ".ckp";
if (latest) {
String ckpWithoutGenerationFileName = "translog.ckp";
downloadToFS(ckpFileName, ckpWithoutGenerationFileName, location, primaryTerm);
}
// Download Checkpoint file from remote and store on FS
downloadToFS(ckpFileName, location, primaryTerm);
// Download translog file from remote and store on FS
String translogFilename = "translog-" + generation + ".tlog";
downloadToFS(translogFilename, location, primaryTerm);
return true;
}
private void downloadToFS(String fileName, Path location, String primaryTerm) throws IOException {
downloadToFS(fileName, fileName, location, primaryTerm);
}
private void downloadToFS(String remoteFileName, String localFileName, Path location, String primaryTerm) throws IOException {
if (Files.exists(location.resolve(localFileName)) == false) {
try (InputStream inputStream = transferService.downloadBlob(remoteBaseTransferPath.add(primaryTerm), remoteFileName)) {
Files.copy(inputStream, location.resolve(localFileName));
}
}
}
private static class MetadataFilenameComparator implements Comparator<String> { | ||
@Override | ||
public int compare(String metadaFilename1, String metadaFilename2) { | ||
// Format of metadata filename is <Primary Term>__<Generation>__<Timestamp> | ||
String[] filenameTokens1 = metadaFilename1.split(METADATA_SEPARATOR); | ||
String[] filenameTokens2 = metadaFilename2.split(METADATA_SEPARATOR); | ||
for (int i = 0; i < filenameTokens1.length; i++) { | ||
if (filenameTokens1[i].equals(filenameTokens2[i]) == false) { | ||
return (int) (Long.parseLong(filenameTokens1[i]) - Long.parseLong(filenameTokens2[i])); | ||
} | ||
} | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a WARN log here - we should not come across such situation where 2 metadata file comparison yield 0. Also, could we add log when we have to resolve to timestamp for comparing? I dont think using timestamp is fair as clocks are not synchronised across the nodes in the distributed system setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree with @ashking94's concerns here. However, putting WARN statements in a comparator could lead to craziness in the log. We don't really have control over how many times and against which pairs of files the sort algorithm will invoke the comparison. Is there somewhere else we can add these statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also not inclined to add the log in the compare
method as it may not provide good debug insights. But as I think more, both the cases: comparing timestamp as well as returning 0 should happen in exceptional cases (we think it should not happen at all). Also, it is not feasible to add this logic in upload flow as it means reading the last uploaded file each time. IMO, we should add these logs in this method. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 metadata file comparison yield 0.
: This will never happen , as there can't be two files with same name in a remote directory .
Timestamp comparison is a tricky thing here - We can check this in upload flow as it has FileTransferTracker
. However , we don't add metadata files in that . Alternate is we just throw RuntimeException
from readMetadata()
? This is better than logs (we can miss it ) and silent failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expectation is that two files with the same primary term and generation should never exist, right? If so, then can we just not include the timestamp in the filename? Then it would fail at upload time, when the file was generated that violated the expected invariant, which seems better than failing hard here (because I think the system would end up stuck because it would keep hitting this same error until something changed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sachinpkale in case of a primary-primary relocation, the _ can be same or will be different for the xlog upload? If yes, then failing upload might as well be not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expectation is that two files with the same primary term and generation should never exist, right?
This should be extremely rare but should still technically possible for isolated writers during primary-primary relocation like what @sachinpkale mentioned. The timestamp serves as a discriminator in all those cases, based on a last writer wins policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine on high level, pls make the necessary changes.
Files.delete(file); | ||
} | ||
Map<String, String> generationToPrimaryTermMapper = translogMetadata.getGenerationToPrimaryTermMapper(); | ||
for (long i = translogMetadata.getGeneration(); i >= translogMetadata.getMinTranslogGeneration(); i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More details are present in #5567 .
Map<String, String> generationToPrimaryTermMapper = translogMetadata.getGenerationToPrimaryTermMapper(); | ||
for (long i = translogMetadata.getGeneration(); i >= translogMetadata.getMinTranslogGeneration(); i--) { | ||
String generation = Long.toString(i); | ||
translogTransferManager.downloadTranslog( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can download tlog files concurrently like uploads . Since this is not in critical write path and exercised only in failver, its okay to take that as a follow up. We can create a ToDo for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created tracking issue: #5660
@@ -128,6 +132,51 @@ public boolean transferSnapshot(TransferSnapshot transferSnapshot, TranslogTrans | |||
} | |||
} | |||
|
|||
public boolean downloadTranslog(String primaryTerm, String generation, Path location, boolean latest) throws IOException { | |||
logger.info("Downloading translog files with: Primry Term = {}, Generation = {}, Location = {}", primaryTerm, generation, location); | |||
String checkpointFilename = "translog-" + generation + ".ckp"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace translog-
with TRANSLOG_FILE_PREFIX
, .ckp
with CHECKPOINT_SUFFIX, .tlog
with TRANSLOG_FILE_SUFFIX
.
We can also use Translog#getFilename
, Translog#getCommitCheckpointFileName
to generate tlog filename .
if (latest) { | ||
checkpointFilename = "translog.ckp"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not put the latest
logic here, but handle it inRemoteFsTranslog
.
FileTransferTracker fileTransferTracker = new FileTransferTracker(shardId); | ||
assert repository instanceof BlobStoreRepository : "repository should be instance of BlobStoreRepository"; | ||
BlobStoreRepository blobStoreRepository = (BlobStoreRepository) repository; | ||
TranslogTransferManager translogTransferManager = new TranslogTransferManager( | ||
new BlobStoreTransferService( | ||
blobStoreRepository.blobStore(), | ||
indexShard.getThreadPool().executor(ThreadPool.Names.TRANSLOG_TRANSFER) | ||
), | ||
blobStoreRepository.basePath().add(shardId.getIndex().getUUID()).add(String.valueOf(shardId.id())), | ||
fileTransferTracker, | ||
fileTransferTracker::exclusionFilter | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this to RemoteFSTranslog#download
? Creating translogTransferManager
, fileTransferTracker
etc is responsibility of RemoteFSTranslog
only .
We can then try to move the common parts of the ctor and download
into separate functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is RemoteFsTranslog#download
is static method. Even if we create the instances of tracker in download method, we will not be able to re-use it in RemoteFsTranslog
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on that , but at least the logic of creation of fileTransferTracker
, translogTransferManager
will reside in RemoteFSTranslog
only . We will not be reuse the instances, but reuse the code .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, do we assume that FileTransferTracker
and TranslogTransferManager
should not be initialized outside RemoteFsTranslog? Currently, they are public classes so can be instantiated outside.
As I understand, pagination is handled in the repository specific implementation (I have verified this in repository-S3). |
With #5662 and its associated ToDo in #5677 , we will be cleaning up the metadata files with every flush . So it should get taken care of automatically. |
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
eaec42a
to
d2b2611
Compare
Gradle Check (Jenkins) Run Completed with:
|
This approach might not be deterministic and download latencies unpredictable depending on what rate we ingest and how many pages we end up in. We need a mechanism to always have the most latest entries in the first page to guarantee high predictability. |
Created a tracking issue: #5696 |
…log (opensearch-project#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com>
…log (opensearch-project#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com>
…hanges (#5757) * Introduce TranslogFactory for Local/Remote Translog support (#4172) * Introduce TranslogFactory for Local/Remote Translog support Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> * [Remote Translog] Introduce remote translog with upload functionality (#5392) * Introduce remote translog with upload functionality Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawa@amazon.com> * Enable creation of indices using Remote Translog (#5638) * Enable creation of indices using Remote Translog behind a setting and feature flag Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * [Remote Translog] Add support for downloading files from remote translog (#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com> * Integrate remote translog download on failover (#5699) * Integrate remote translog download on failover Signed-off-by: Ashish Singh <ssashish@amazon.com> Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Ashish Singh <ssashish@amazon.com>
…hanges (opensearch-project#5757) * Introduce TranslogFactory for Local/Remote Translog support (opensearch-project#4172) * Introduce TranslogFactory for Local/Remote Translog support Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> * [Remote Translog] Introduce remote translog with upload functionality (opensearch-project#5392) * Introduce remote translog with upload functionality Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawa@amazon.com> * Enable creation of indices using Remote Translog (opensearch-project#5638) * Enable creation of indices using Remote Translog behind a setting and feature flag Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * [Remote Translog] Add support for downloading files from remote translog (opensearch-project#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com> * Integrate remote translog download on failover (opensearch-project#5699) * Integrate remote translog download on failover Signed-off-by: Ashish Singh <ssashish@amazon.com> Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Ashish Singh <ssashish@amazon.com>
* Introduce TranslogFactory for Local/Remote Translog support (#4172) * Introduce TranslogFactory for Local/Remote Translog support Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> * [Remote Translog] Introduce remote translog with upload functionality (#5392) * Introduce remote translog with upload functionality Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawa@amazon.com> * Enable creation of indices using Remote Translog (#5638) * Enable creation of indices using Remote Translog behind a setting and feature flag Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * [Remote Translog] Add support for downloading files from remote translog (#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com> * Integrate remote translog download on failover (#5699) * Integrate remote translog download on failover Signed-off-by: Ashish Singh <ssashish@amazon.com> Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Ashish Singh <ssashish@amazon.com> Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Ashish Singh <ssashish@amazon.com> Co-authored-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com>
…hanges (#5757) * Introduce TranslogFactory for Local/Remote Translog support (#4172) * Introduce TranslogFactory for Local/Remote Translog support Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> * [Remote Translog] Introduce remote translog with upload functionality (#5392) * Introduce remote translog with upload functionality Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawa@amazon.com> * Enable creation of indices using Remote Translog (#5638) * Enable creation of indices using Remote Translog behind a setting and feature flag Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> * [Remote Translog] Add support for downloading files from remote translog (#5649) * Add support to download translog from remote store during recovery Signed-off-by: Sachin Kale <kalsac@amazon.com> * Integrate remote translog download on failover (#5699) * Integrate remote translog download on failover Signed-off-by: Ashish Singh <ssashish@amazon.com> Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Gaurav Bafna <gbbafna@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Ashish Singh <ssashish@amazon.com>
Description
.tlg
and.ckp
files from remote translog.Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.