Skip to content

[SPARK-19812] YARN shuffle service fails to relocate recovery DB acro… #17748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.util.List;
import java.util.Map;

Expand Down Expand Up @@ -340,17 +339,17 @@ protected Path getRecoveryPath(String fileName) {
* when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise
* it will uses a YARN local dir.
*/
protected File initRecoveryDb(String dbFileName) {
protected File initRecoveryDb(String dbName) {
if (_recoveryPath != null) {
File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName);
File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName);
if (recoveryFile.exists()) {
return recoveryFile;
}
}
// db doesn't exist in recovery path go check local dirs for it
String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs");
for (String dir : localDirs) {
File f = new File(new Path(dir).toUri().getPath(), dbFileName);
File f = new File(new Path(dir).toUri().getPath(), dbName);
if (f.exists()) {
if (_recoveryPath == null) {
// If NM recovery is not enabled, we should specify the recovery path using NM local
Expand All @@ -363,25 +362,29 @@ protected File initRecoveryDb(String dbFileName) {
// make sure to move all DBs to the recovery path from the old NM local dirs.
// If another DB was initialized first just make sure all the DBs are in the same
// location.
File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName);
if (!newLoc.equals(f)) {
Path newLoc = new Path(_recoveryPath, dbName);
Path copyFrom = new Path(f.toURI());
if (!newLoc.equals(copyFrom)) {
logger.info("Moving " + copyFrom + " to: " + newLoc);
try {
Files.move(f.toPath(), newLoc.toPath());
// The move here needs to handle moving non-empty directories across NFS mounts
FileSystem fs = FileSystem.getLocal(_conf);
fs.rename(copyFrom, newLoc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much more expensive is this for non nfs cases ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't measure it but can't imagine there is much difference. I didn't see any noticable delays in NM startup during testing. It is a one time operation on startup to move the DB if changing to use the recovery dir. If you have a specific concern here I can go do some measurements?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementations looked different when I looked at LocalFileSystem - and was wondering if there was a perf implementation to the change.
If it is a one time thing used infrequently, then it might not be a concern.

} catch (Exception e) {
// Fail to move recovery file to new path, just continue on with new DB location
logger.error("Failed to move recovery file {} to the path {}",
dbFileName, _recoveryPath.toString(), e);
dbName, _recoveryPath.toString(), e);
}
}
return newLoc;
return new File(newLoc.toUri().getPath());
}
}
}
if (_recoveryPath == null) {
_recoveryPath = new Path(localDirs[0]);
}

return new File(_recoveryPath.toUri().getPath(), dbFileName);
return new File(_recoveryPath.toUri().getPath(), dbName);
}

/**
Expand Down