Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
Currently, there are 2 stages and a utility class that can read files and push them into the pipeline: MultiFileSource, FileSourceStage, and DirectoryWatcher. All 3 are very similar but have slightly different features. This overlap is confusing and makes it difficult to combine functionality from 2 of them at the same time (e.g. DirectoryWatcher with multiple search patterns).
Describe your ideal solution
This should combine the features of all 3 into a single stage to make it easier for users. Instead of needing to decide which stage to use based on the features a user wants, there will be 1 stage with the capability of all 3 and options to configure the functionality.
For example, the FileSourceStage should be able to support the following:
FileSource(watch=True, files=["my_directory/*.json"])
- Enable watching for new files that match the glob pattern (i.e. the directory watcher stage)
FileSource(watch=True, files=["s3://my_bucket/my_directory/*.json"])
- Enable watching an s3 bucket (combining the directory watcher and multi-file source)
FileSource(files=["local_directory1/*.json", "local_directory2/*.json"])
- Use multiple glob patterns at once (the multi-file source functionality)
The end goal is a single stage which has the capability of all 3.
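To make the combined behavior concrete, here is a rough sketch of how the `files` and `watch` options could interact. This is not existing Morpheus code: the class name `FileSourceSketch`, the `poll_interval` option, and the `batches` method with its `max_polls` argument are all hypothetical and used only for illustration. It handles local paths only, using the standard library; remote URLs such as `s3://...` would need a filesystem abstraction (for example something like fsspec), which is left out here.

```python
import glob
import time
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class FileSourceSketch:
    """Hypothetical stand-in for the proposed unified stage (not the Morpheus API)."""

    files: List[str]             # one or more glob patterns (multi-file source behavior)
    watch: bool = False          # keep polling for new matches (directory watcher behavior)
    poll_interval: float = 1.0   # assumed option: seconds to wait between polls
    _seen: Set[str] = field(default_factory=set, init=False, repr=False)

    def _new_matches(self) -> List[str]:
        # Expand every pattern and keep only paths not emitted before.
        matches: Set[str] = set()
        for pattern in self.files:
            matches.update(glob.glob(pattern))
        new = sorted(matches - self._seen)
        self._seen.update(new)
        return new

    def batches(self, max_polls: Optional[int] = None) -> Iterator[List[str]]:
        """Yield lists of newly discovered file paths.

        With watch=False the patterns are expanded once. With watch=True the
        patterns are re-expanded every poll_interval seconds; max_polls is an
        assumed escape hatch so the loop can be bounded in tests.
        """
        polls = 0
        while True:
            new = self._new_matches()
            if new:
                yield new
            polls += 1
            if not self.watch or (max_polls is not None and polls >= max_polls):
                return
            time.sleep(self.poll_interval)
```

With this shape, `FileSourceSketch(files=["local_directory1/*.json", "local_directory2/*.json"])` would expand both globs once, while passing `watch=True` would keep polling the same patterns, so a user picks behavior through options rather than by choosing a stage.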
Describe any alternatives you have considered
No response
Additional context
This is a follow-on issue that will help with #975.
Code of Conduct
- I agree to follow this project's Code of Conduct
- I have searched the open feature requests and have found no duplicates for this feature request
Projects
Status
Blocked