-
Notifications
You must be signed in to change notification settings - Fork 675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include remote path in FileHolder #5911
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
ab13b83
to
0a2ae72
Compare
Looking at the constructor of FileHolder, it's not straight forward to just add a FileHolder( Path inputFile ) {
assert inputFile
this.sourceObj = inputFile
this.storePath = real(inputFile)
this.stageName = norm(inputFile.getFileName())
}
FileHolder( def origin, Path path ) {
assert origin != null
assert path != null
this.sourceObj = origin
this.storePath = path
this.stageName = norm(path.getFileName())
}
protected FileHolder( def source, Path store, def stageName ) {
this.sourceObj = source
this.storePath = store
this.stageName = norm(stageName)
} I just added a class RemoteFileHolder which has a different set of constructors. However there are many more alternative solutions -- e.g. by adding a setter to FileHolder so it doesn't need to get passed to the constructor. @bentsherman do you have any suggestions on which solution you think would be best? |
modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy
Outdated
Show resolved
Hide resolved
This reverts commit 0a2ae72. Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
4cf7ad5
to
1793efc
Compare
Looks good. Thanks for your help 😄 Do you understand how to use this in your setup? I will also try it with nf-prov. Let me know if you need help with how to test on your end |
I wonder whether something needs to be updated in the constructors: FileHolder( Path inputFile ) {
assert inputFile
this.sourceObj = inputFile
this.storePath = real(inputFile)
this.stageName = norm(inputFile.getFileName())
}
FileHolder( def origin, Path path ) {
assert origin != null
assert path != null
this.sourceObj = origin
this.storePath = path // ← does this need to become `real(path)`?
this.stageName = norm(path.getFileName())
}
protected FileHolder( def source, Path store, def stageName ) {
this.sourceObj = source
this.storePath = store // ← does this need to become `real(store)`?
this.stageName = norm(stageName)
} Cfr the use case: I'm working on a nf-lamin plugin which allows users to run pre-existing Nextflow workflows and create records in the LaminDB regarding:
This package will use the observer to detect various events and then pass that information on to LaminDB. |
I think you should have a look at this on-going effort #5715 |
I think this PR will benefit all of us -- the provenance store, nf-prov, and nf-lamin. Fundamentally it's about being able to trace staged files to their remote source, which we currently cannot do The only snag is that this PR currently will change the task hash. When a task has a foreign input file, the locally staged path is used for the hash, not the remote path. This might actually be better, as it would decouple the task hash from the foreign file staging. But I wonder if the locally staged path was used for a reason |
I have it working with nf-prov! nextflow-io/nf-prov#45 You can see the before/after in the linked PR. Turns out it doesn't affect the task hash. I think this is because the FilePorter uses the original URL to construct the hash for the stage directory, and since the file is hashed by either base name + size + timestamp (standard) or contents (deep), the remote file and staged file produce the same hash @rcannood to your point about doing |
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Alternative solution to #5907 to implement #5905.
I didn't change any other calls of FileHolder except the one @bentsherman suggested in #5907.