Path handling with Input #2427

@fetimo

Hi!

I noticed some strange behaviour when using the Path input and I'm not sure if this is a feature or a bug.

If I have a training definition like so:

from cog import Input, Path

def train(
  dataset_path: Path = Input("Path to dataset directory.")
) -> TrainingOutput:
  print(dataset_path)

and invoke it with cog train -i dataset_path="training_data", where training_data is a local directory, it throws an error complaining that the value doesn't use one of the http, https, or data URL schemes.

{
  "detail": [
    {
      "loc": [
        "body",
        "input",
        "dataset_path",
        "is-instance[Path]"
      ],
      "msg": "Input should be an instance of Path",
      "type": "is_instance_of"
    },
    {
      "loc": [
        "body",
        "input",
        "dataset_path",
        "function-plain[validate()]"
      ],
      "msg": "Value error, '' is not a valid URL scheme. 'data', 'http', or 'https' is supported.",
      "type": "value_error"
    }
  ]
}
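For what it's worth, the empty scheme ('') in the second error matches what a standard URL parser reports for a bare relative path. Here is a minimal sketch using only Python's standard library. It illustrates the symptom and is not Cog's actual validation code; `SUPPORTED_SCHEMES` and `check_scheme` are hypothetical names I made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list mirroring the schemes named in the error message.
SUPPORTED_SCHEMES = {"data", "http", "https"}


def check_scheme(value: str) -> str:
    """Return the URL scheme of `value`, raising ValueError if unsupported."""
    # A bare relative path such as "training_data" has no "scheme:" prefix,
    # so urlparse reports an empty string for its scheme.
    scheme = urlparse(value).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"'{scheme}' is not a valid URL scheme.")
    return scheme
```

Under that assumption, a local directory name would be rejected before any filesystem lookup happens, which would explain why the message mentions URL schemes rather than a missing file.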

If I then point it at a hosted file and run it again with cog train -i dataset_path="https://my-bucket.com/bucket", it passes schema validation but fails to download the file and create a temporary file. The file is a public zip hosted on GCP; I've had success with Cloudinary, so this is odd. I'm not sure what's causing the error here, but the docs say that "[Path] represents a path to a file on disk.", so it's a bit surprising that you can also pass it a URL.

I've solved it by changing it slightly to:

from cog import Input, Path

def train(
  dataset_path: str = Input("Path to dataset directory.")
) -> TrainingOutput:
  dataset_path = Path(dataset_path)
  print(dataset_path)

Note that it now declares dataset_path as a plain str and converts it to a Path in the function body itself.
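A slightly more defensive variant of this workaround could validate the string before using it. The sketch below relies only on `pathlib` from the standard library so it runs without Cog installed; `resolve_dataset_dir` is a hypothetical helper, not part of Cog, and it uses `pathlib.Path` rather than `cog.Path`.

```python
from pathlib import Path


def resolve_dataset_dir(raw: str) -> Path:
    """Convert a string input to an absolute Path, requiring a directory."""
    # expanduser() handles "~", resolve() makes the path absolute.
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    return path
```

Calling something like this at the top of train() would fail early with a clear message if the directory is missing, instead of failing later during training.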

Is the first example, passing a local filepath and getting an error, a bug? And is the behaviour of Path and Input expected?
