Skip to content
This repository was archived by the owner on Jul 16, 2024. It is now read-only.

Fix error when /tmp is a symlink #221

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion greenplumpython/experimental/file.py
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,7 @@ def _install_on_server(pkg_dir: str, requirements: str) -> str:
def _install_packages(db: gp.Database, requirements: str):
tmp_archive_name = f"tar_{uuid.uuid4().hex}"
# FIXME: Windows client is not supported yet.
local_dir = pathlib.Path("/") / "tmp" / tmp_archive_name / "pip"
local_dir = (pathlib.Path("/") / "tmp").resolve() / tmp_archive_name / "pip"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard code /tmp path, libs like tempfile is preferred to be used. https://docs.python.org/3/library/tempfile.html

/tmp is a convention, and users can actually set the temp directory to something else.

Copy link
Contributor Author

@xuebinsu xuebinsu Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using tempfile per your suggestion.

While I am OK with the switch, I don't quite understand why tempfile is preferred:

  1. If, as you said, users can actually set the temp directory to something else, then /tmp is not a hard-coded path, but can be set to point to anywhere by user.
  2. Based on my testing , tempfile also creates files and directories under /tmp, the same as the previous scheme.
  3. tempfile will delete the files upon exiting the context. Would this make debugging harder?

Could you please share your thoughts on these issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If, as you said, users can actually set the temp directory to something else, then /tmp is not a hard-coded path, but can be set to point to anywhere by user.

Yes, see https://www.baeldung.com/linux/change-tmp-directory-path

Based on my testing , tempfile also creates files and directories under /tmp, the same as the previous scheme.

Not quite sure about the question.

tempfile will delete the files upon exiting the context. Would this make debugging harder?

There is argument for this

class tempfile.TemporaryDirectory(suffix=None, prefix=None, dir=None, ignore_cleanup_errors=False, *, delete=True)
The delete parameter can be used to disable cleanup of the directory tree upon exiting the context. While it may seem unusual for a context manager to disable the action taken when exiting the context, it can be useful during debugging or when you need your cleanup behavior to be conditional based on other logic.


[¶](https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, are you OK with using directory under /tmp for storing the files here?

Why tempfile is preferred over the previous scheme?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, are you OK with using directory under /tmp for storing the files here?

I prefer to have a subdir to store temporary files, for example, store tmp files in /tmp/sOmEUuid/ actually, won't this be the same with previouse code? previously, tmp_archive_name is a uuid and it is the subdir name in /tmp. tempfile.TemporaryDirectory() does the same thing.

Why tempfile is preferred over the previous scheme?

  • it handles all platforms temp directory
  • it handles customized TMPDIR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The previous code stores temp files in file named /tmp/tar_<uuid>. I think it's similar to what tempfile does without setting something like TMPDIR.

BTW, I think all files handled in this module are temp files and can be deleted at the end of the function:

  • For _from_files(), the data will eventually be parsed from files and written to tables in databases;
  • For _install_packages(), the packages will eventually be installed to a Python environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the doc https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory:

Changed in version 3.12: Added the delete parameter.

We will need to use GD to "pin" the temp directory to make it available to multiple UDFs.

local_dir.mkdir(parents=True)
cmd = [
sys.executable,
Expand Down