Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race condition issue on reading spilled file #7538

Merged
merged 6 commits into from
Sep 14, 2023

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Sep 13, 2023

Which issue does this PR close?

Closes #7537,
Closes #7523
Closes #7546

Rationale for this change

This issue seems a potential race condition issue.
To improve the stability, this issue need to be fixed.

What changes are included in this PR?

This issue happens when the parent directory of spilled files are deleted before being read.
The root cause is TempDir of the parent directory can be dropped if this async block exits before the reading task starts like as follows.

  1. Exit the async block
  2. sorter and it's members are dropped including DiskManager::local_dirs.
  3. TempFile in local_dirs are dropped then the corresponding temporary directories are deleted recursively.
  4. The reading task starts

To avoid breaking API compatibility, this change proposes to not use TempDir and let the DiskManager directly creates temporary spilled file in local_dirs.

UPDATE: After the discussion, I prefer the second solution below.

I have another solution but it breaks the API compatibility of DiskManager::create_tmp_file, which returns a pair of Arc<TempDir> and NamedTempFile to ensure TempDir lives long enough.

Spilled files are represented as NamedTempFile and temporary files are deleted when NamedTempFile::drop is called.
So, I think not using TempDir is not a problem.

Are these changes tested?

Added new test.

Are there any user-facing changes?

No.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sarutak . Very excellent debugging ❤️

I think this PR will result in DataFusion leaving many temporary directories around, which is likely not ideal for many users.

I am not sure if you have seen the fix for the test from @viirya in #7534

I have another solution but it breaks the API compatibility of DiskManager::create_tmp_file, which returns a pair of Arc and NamedTempFile to ensure TempDir lives long enough.

I think this is the solution that we should pursue as it is the easiest to use long term. Perhaps we can change the API to return something like

/// A named temporary file and the directory it lives in, which may be cleaned up on drop
struct RefCountedTempFile {
  /// directory in which temporary files are created
  temp_dir: Arc<TempDir>,
  ///  the temporary file
  tempfile: NamedTempFile,
}

impl RefCountedTempFile {
  pub fn path(&self) -> &Path {..}
  pub fn inner(&self) -> &NamedTempFile {...}
...
}

impl DiskManager {

pub fn create_tmp_file(
    &self,
    request_description: &str
) -> Result<RefCountedTempFile, DataFusionError>
...
}

Another potential solution I thought of was to pass a reference to the DiskManager to read_spill_as_stream to prevent the file from being deleted. However, that seems like a hard API to use and get right as all callers would have to know temp files were only scoped to the lifetime of the DiskManager

/// If `None` an error will be returned (configured not to spill)
local_dirs: Mutex<Option<Vec<TempDir>>>,
local_dirs: Mutex<Option<Vec<PathBuf>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that the implication here is that the directories in local_dirs will not be removed (though all the contents will be removed, when the NamedTempFile are dropped)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution uses directories specified, or returned by std::env::temp_dir(), which is usually /tmp on Linux.
So, these directories are not deleted when the NamedTempFile are dropped.

@sarutak
Copy link
Member Author

sarutak commented Sep 13, 2023

Thank you for the comment @alamb .

I am not sure if you have seen the fix for the test from @viirya in #7534

I've seen that PR but I don't think it's just an issue of the test.

I think this is the solution that we should pursue as it is the easiest to use long term. Perhaps we can change the API to return something like

This is what I implied. If we can change the API, I'll propose the second solution.

Another potential solution I thought of was to pass a reference to the DiskManager to read_spill_as_stream to prevent the file from being deleted

Yes, I also thought of this solution. But I thought it's too much.

@alamb
Copy link
Contributor

alamb commented Sep 13, 2023

Suggested next actions:

  1. I think we should merge Fix flaky test_sort_fetch_memory_calculation test #7534 to get the CI stable again
  2. I filed DiskManager's temp files can be deleted too early #7546 to track the underlying problem
  3. We should fix DiskManager's temp files can be deleted too early #7546 (either using this approach in this PR or some other approach)

Thank you @viirya and @sarutak for all the work

@github-actions github-actions bot added the core Core DataFusion crate label Sep 13, 2023
@sarutak
Copy link
Member Author

sarutak commented Sep 13, 2023

I think this PR will result in DataFusion leaving many temporary directories around, which is likely not ideal for many users.

@alamb
BTW, the previous solution doesn't leave many temporary directories. Temporary files are created in the directories user specified, or the directory std::env::temp_dir() returnes, which is usually /tmp on Linux.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR looks good -- thank you @sarutak .

I've seen that PR but I don't think it's just an issue of the test.

I agree -- I filed #7546 to track the actual bug (as #7537 and #7523 track the test failures)

}
Ok(())
})
/// A named temporary file and the directory it lives in, which may be clean up on drop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A named temporary file and the directory it lives in, which may be clean up on drop
/// A wrapper around a [`NamedTemporaryFile`] that also contains a reference
/// to the temporary directory it is in. The file is cleaned up on drop.

/// A named temporary file and the directory it lives in, which may be clean up on drop
#[derive(Debug)]
pub struct RefCountedTempFile {
/// directory in which temporary files are created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// directory in which temporary files are created
/// directory in which temporary files are created (Arc is held to ensure
/// it is not cleaned up prior to the NamedTempFile)

}
}

/// A wrapper around a [`NamedTemporaryFile`] that also contains a reference
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's NamedTempFile, not NamedTemporaryFile. I'll fix it.

@viirya
Copy link
Member

viirya commented Sep 13, 2023

I am not sure if you have seen the fix for the test from @viirya in #7534

I've seen that PR but I don't think it's just an issue of the test.

Thanks @alamb and @sarutak.

Yea, as I commented in #7534 and @alamb's description in #7546, for DataFusion users this is not likely an issue except for the cases like the test that you construct physical query plan (like we did internally also) and execute it. So my idea was to propose a simplest fix to the test. As this doesn't look like a urgent bug to fix, I was thinking to find some time later to propose another fix to avoid TaskContext to be dropped in sort execution plan, but not long after I saw this PR was proposed. 😄

}
}

/// A wrapper around a [`NamedTempFile`] that also contains a reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A wrapper around a [`NamedTempFile`] that also contains a reference
/// A wrapper around a [`NamedTempFile`] that also contains a reference to its parent temporary directory

Comment on lines 153 to 154
/// directory in which temporary files are created (Arc is held to ensure
/// it is not cleaned up prior to the NamedTempFile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// directory in which temporary files are created (Arc is held to ensure
/// it is not cleaned up prior to the NamedTempFile)
/// The reference to the directory in which temporary files are created to ensure
/// it is not cleaned up prior to the NamedTempFile

@sarutak
Copy link
Member Author

sarutak commented Sep 13, 2023

So my idea was to propose a simplest fix to the test. As this doesn't look like a urgent bug to fix, I was thinking to find some time later to propose another fix to avoid TaskContext to be dropped in sort execution plan, but not long after I saw this PR was proposed. 😄

Yeah, thank you for recovering CI quickly!

@alamb alamb merged commit 2845143 into apache:main Sep 14, 2023
21 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 14, 2023

Thanks again @sarutak

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
3 participants