Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move file existence checks into field validators #649

Open
effigies opened this issue May 4, 2023 · 5 comments
Open

Move file existence checks into field validators #649

effigies opened this issue May 4, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@effigies
Copy link
Contributor

effigies commented May 4, 2023

Right now we have a weird special case for files and directories:

pydra/pydra/engine/specs.py

Lines 211 to 235 in 426564e

if (
field.type in [File, Directory]
or "pydra.engine.specs.File" in str(field.type)
or "pydra.engine.specs.Directory" in str(field.type)
):
self._file_check(field)
def _file_check(self, field):
"""checking if the file exists"""
if isinstance(getattr(self, field.name), list):
# if value is a list and type is a list of Files/Directory, checking all elements
if field.type in [ty.List[File], ty.List[Directory]]:
for el in getattr(self, field.name):
file = Path(el)
if not file.exists() and field.type in [File, Directory]:
raise FileNotFoundError(
f"the file {file} from the {field.name} input does not exist"
)
else:
file = Path(getattr(self, field.name))
# error should be raised only if the type is strictly File or Directory
if not file.exists() and field.type in [File, Directory]:
raise FileNotFoundError(
f"the file {file} from the {field.name} input does not exist"
)

These would make more sense as validators on the fields themselves, rather than special-casing inside Pydra, as right now we are only capable of handling File/Directory and list of File/Directory.

To avoid losing functionality, we could improve the @pydra.mark.task decorator to convert Paths into fields with an existence check. And @tclose's cmd_arg suggestion could probably help out by making these implicit, as well.

@effigies effigies added the enhancement New feature or request label May 4, 2023
@tclose
Copy link
Contributor

tclose commented May 4, 2023

If the file and directory classes inside fileformats are adopted by convention then this functionality is covered (+ magic number checking) when the objects are instantiated.

To avoid losing functionality, we could improve the @pydra.mark.task decorator to convert Paths into fields with an existence check.

I didn't quite follow this, what is a "field" in this case. As I think we discussed with @djarecka and @ghisvail on another thread, to work nicely with fileformats, paths need to be somehow "cast" to the expected fileformat class. Not sure if this is related to what you are suggesting

@djarecka
Copy link
Collaborator

djarecka commented May 4, 2023

@effigies - I don't fully follow, what do you mean be "convert Paths into fields"?

As for the validation, I think at the beginning we assumed that we want to distinguish between output file that would be a string in the input/output_spec and an existing input file that has a type of File (or Directory). I think nipype there is exist=True for this?

@effigies
Copy link
Contributor Author

effigies commented May 5, 2023

Yeah, I'm not positive what words everyone uses. I'm pretty sure in_file: Path = attr.field() is a thing we can do in an input spec, and when we make functions, we would just do def f(in_file: Path) and let @task do the work of converting the function signature to an input spec. If we stop special-casing File/Directory and doing existence checks in BaseSpec, then when we do the conversions, the attr.field() object that we create would need to be given a validator.

Not sure if this is clear...

@tclose
Copy link
Contributor

tclose commented May 5, 2023

Sorry, still not 100% clear to me (but I'm struggling after an interrupted night's sleep).

I like the idea of pydra.mark.task creating an attrs class with attrs.fields for the input spec if it doesn't already (and that is what you are suggesting).

However, if you use a fileformats class then existence+format validation will occur before the object is passed to a the input field (on initialisation of the File/Directory object) and validation in the input spec wouldn't be necessary. The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template and similar to the explicitly defined output types, which would trigger validation at that point.

@ghisvail
Copy link
Collaborator

ghisvail commented May 5, 2023

The only issue would be output/downstream-input validation, but this could be handled by casting the outputs generated by output_file_template and similar to the explicitly defined output types, which would trigger validation at that point.

On that note, it still bothers me that a field with output_file_template must be of type string. It is not any string, it's a pathlike string representing a future file.

Under click or typer, this would be declared as a click.Path(exists=False, file_okay=True, dir_okay=False, ...) or a click.File(lazy=True).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants