
[FEA]: Unify the FileSourceStage, MultiFileSource and DirectoryWatcher functionality #976

Open

Description

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of the problem this feature solves

Currently, there are 2 stages and a utility class that can read files and push them into the pipeline: MultiFileSource, FileSourceStage, and DirectoryWatcher. All 3 are very similar but have slightly different features. Having very similar but slightly different functionality can be confusing, and it makes it difficult to combine functionality from 2 of them at the same time (e.g. DirectoryWatcher with multiple search patterns).

Describe your ideal solution

This should combine the features of all 3 into a single stage to make things easier for users. Instead of having to decide which stage to use based on the features they want, users would have 1 stage with the capabilities of all 3 and options to configure the functionality.

For example, the FileSourceStage should be able to support the following:

  • FileSource(watch=True, files=["my_directory/*.json"])
    • Enable watching for new files that match the glob pattern (i.e. the directory watcher stage)
  • FileSource(watch=True, files=["s3://my_bucket/my_directory/*.json"])
    • Enable watching an s3 bucket (combining the directory watcher and multi-file source)
  • FileSource(files=["local_directory1/*.json", "local_directory2/*.json"])
    • Use multiple globs (the multi-file source functionality)

The end goal is a single stage that has the capabilities of all 3.
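The options listed above could map onto one stage roughly as follows. This is a minimal, standard-library-only sketch under stated assumptions: `FileSourceOptions`, `expand_patterns`, and `iter_files` are hypothetical names, not existing Morpheus APIs, and only local paths are handled (via `glob.glob`). Remote URLs such as `s3://...` would presumably need a filesystem abstraction (for example fsspec) in place of `glob.glob`, which is not shown here.

```python
import glob
import os
import time
from dataclasses import dataclass
from typing import Iterator, List, Optional, Set


@dataclass
class FileSourceOptions:
    """Hypothetical options for a unified file source (names are illustrative only)."""
    files: List[str]                  # one or more glob patterns, e.g. ["dir1/*.json", "dir2/*.json"]
    watch: bool = False               # keep polling for new matches after the first pass
    poll_interval: float = 1.0        # seconds to sleep between polls when watch=True
    max_polls: Optional[int] = None   # stop after this many passes (None = poll forever)


def expand_patterns(patterns: List[str]) -> List[str]:
    """Expand every glob pattern and return a sorted, de-duplicated list of local files."""
    matches: Set[str] = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern, recursive=True))
    # Drop directories that a pattern may have matched; only files are emitted.
    return sorted(p for p in matches if os.path.isfile(p))


def iter_files(opts: FileSourceOptions) -> Iterator[str]:
    """Yield each matching file once; if opts.watch is set, keep polling for new ones."""
    seen: Set[str] = set()
    polls = 0
    while True:
        for path in expand_patterns(opts.files):
            if path not in seen:
                seen.add(path)
                yield path
        polls += 1
        # Without watch, a single pass is the multi-file source behavior.
        if not opts.watch or (opts.max_polls is not None and polls >= opts.max_polls):
            return
        # With watch, sleep and re-scan: the directory watcher behavior.
        time.sleep(opts.poll_interval)
```

Tracking already-emitted paths in a `seen` set is what lets overlapping globs and repeated polls coexist: each file is yielded once, no matter how many patterns match it or how many polls observe it.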

Describe any alternatives you have considered

No response

Additional context

This is a follow-on issue that will help with #975.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request

Metadata

Assignees

Labels

Type

No type

Projects

  • Status

    Blocked

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
