
Files.inboundAdapter watchService - ignore subdirectories #3557

Closed
szilardk opened this issue Apr 28, 2021 · 5 comments · Fixed by #8596

Comments

@szilardk

szilardk commented Apr 28, 2021

Expected Behavior

Files.inboundAdapter(new File(dir))
        .useWatchService(true)

should avoid listing the files from all subdirectories

directory structure

/workDir
     /done
     /failed

I would like to read all the files from /workDir and move them to done/failed after processing. In my scenario it is not useful to scan the subdirectories; it only takes time when they contain a lot of files.
I had a look at WatchServiceDirectoryScanner.walkDirectory, where Files.walkFileTree is used. That method takes an "int maxDepth" argument. Would it make sense to expose this?
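
To illustrate what the maxDepth argument of Files.walkFileTree does, here is a minimal sketch using only the JDK. It is not the code of WatchServiceDirectoryScanner; the class name ShallowWalk and the method listFiles are made up for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical helper, not part of Spring Integration.
final class ShallowWalk {

    // Collects the regular files found at most maxDepth levels below root.
    static List<Path> listFiles(Path root, int maxDepth) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, EnumSet.noneOf(FileVisitOption.class), maxDepth,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        // Directories sitting exactly at maxDepth are reported
                        // through visitFile as well, so keep regular files only.
                        if (attrs.isRegularFile()) {
                            files.add(file);
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // Depth 1: only the direct children of root are considered.
        listFiles(root, 1).forEach(System.out::println);
    }
}
```

With maxDepth set to 1, /workDir/done and /workDir/failed are seen as entries but never descended into, so their contents cost nothing during the walk.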

Current Behavior
The current implementation looks for files in all subdirectories.

Context
what i use now is

Files.inboundAdapter(new File(dir))
        .useWatchService(true)
        .filter 

With the filter I can eliminate everything I do not need. It would be even better if the subdirectories were not scanned at all, since in my case there are 10K-100K files.

@szilardk szilardk added status: waiting-for-triage The issue need to be evaluated and its future decided type: enhancement labels Apr 28, 2021
@artembilan
Member

Why not just use the plain polling behavior of Files.inboundAdapter() instead of abusing a WatchService, whose purpose is really to let us scan the whole file tree?

See docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#watch-service-directory-scanner
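
For comparison, a flat, non-recursive listing of a single directory can be sketched with the JDK alone. This only illustrates the idea of scanning one level without a file-tree walk; it is not Spring Integration's scanner code, and the class name FlatListing is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Integration.
final class FlatListing {

    // Lists the regular files directly inside dir; subdirectories are
    // neither returned nor opened.
    static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, Files::isRegularFile)) {
            for (Path entry : stream) {
                files.add(entry);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        listFiles(dir).forEach(System.out::println);
    }
}
```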

@artembilan artembilan added status: waiting-for-reporter Needs a feedback from the reporter and removed status: waiting-for-triage The issue need to be evaluated and its future decided labels Apr 28, 2021
@szilardk
Author

Thank you for the reply.
I will try using the plain polling behavior or the WatchService with the filter. I will have to do some performance tests to see which one works better in my case.
Are there any guidelines on when to use the WatchService and when not to use it?

@artembilan
Member

Well, one of them is, of course, about walking through the whole file tree.
Another one is to react to events in the file system, such as updates to files or their removal.

There should not be any performance difference, since both approaches are handled by the SourcePollingChannelAdapter.

@kodecharlie

As discussed, use-watch-service=true implies a full directory-tree scan. This is, in fact, the documented behavior in the spring.io reference. But intuitively, it seems that watch-service should offer an option to regulate the recursion. Someone mentioned exposing a maxDepth property that is already natively supported in the watch-service logic. That is one way, although limiting in its own way. Possibly a better solution is to inject some kind of filter into the watch-service that regulates which subdirectories are scanned. If the filter is *, then all subdirectories are scanned; if the filter is empty, then none; if the filter is a regex, then only those subdirectories that match are scanned.

I don't have a special use case that warrants this behavior. But after looking carefully at the docs and reading the source code for FileReadingMessageSource, this just seems like reasonable out-of-the-box behavior.
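
The three filter cases described above can be sketched with a Predicate<Path> consulted in preVisitDirectory. This is a JDK-only illustration under stated assumptions, not the implementation in FileReadingMessageSource; FilteredWalk, listFiles and nameMatches are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring Integration.
final class FilteredWalk {

    // Walks root and descends only into subdirectories accepted by dirFilter.
    static List<Path> listFiles(Path root, Predicate<Path> dirFilter) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // The root is always entered; the filter applies to subdirectories.
                if (dir.equals(root) || dirFilter.test(dir)) {
                    return FileVisitResult.CONTINUE;
                }
                // Rejected directories are skipped without reading their entries.
                return FileVisitResult.SKIP_SUBTREE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    files.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return files;
    }

    // Accepts directories whose own name fully matches the given regex.
    static Predicate<Path> nameMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return dir -> pattern.matcher(dir.getFileName().toString()).matches();
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        listFiles(root, dir -> true).forEach(System.out::println);        // "*": all subdirectories
        listFiles(root, dir -> false).forEach(System.out::println);       // empty: none
        listFiles(root, nameMatches("done")).forEach(System.out::println); // regex: matching only
    }
}
```

Because SKIP_SUBTREE is returned before a rejected directory is opened, a subdirectory holding many files adds no listing cost to the walk.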

@artembilan
Member

I find your suggestions reasonable, so we will fix this in the upcoming 6.0 with two options: int maxDepth and Predicate<Path> watchDirFilter.

@artembilan artembilan added in: file and removed status: waiting-for-reporter Needs a feedback from the reporter labels Nov 9, 2021
@artembilan artembilan added this to the 6.0.x milestone Nov 9, 2021
@artembilan artembilan modified the milestones: 6.0.x, 6.1.x Jan 17, 2023
@artembilan artembilan self-assigned this Apr 13, 2023
@artembilan artembilan modified the milestones: 6.1.x, 6.1.0-RC1 Apr 13, 2023
artembilan added a commit to artembilan/spring-integration that referenced this issue Apr 13, 2023
Fixes spring-projects#3557

* Expose a `watchMaxDepth` on the `FileReadingMessageSource` for its `Files.walkFileTree()` API usage
* Add `watchDirPredicate` option to the `FileReadingMessageSource` to skip sub-tree for `Files.walkFileTree()`
scanning according to some condition against directory `Path`
garyrussell added a commit that referenced this issue Apr 13, 2023
* GH-3557: Add maxDepth, dirPredicate to FileReadMS

Fixes #3557

* Expose a `watchMaxDepth` on the `FileReadingMessageSource` for its `Files.walkFileTree()` API usage
* Add `watchDirPredicate` option to the `FileReadingMessageSource` to skip sub-tree for `Files.walkFileTree()`
scanning according to some condition against directory `Path`

* Fix language in docs

Co-authored-by: Gary Russell <grussell@vmware.com>

---------

Co-authored-by: Gary Russell <grussell@vmware.com>