-
Notifications
You must be signed in to change notification settings - Fork 18
Add array support for samplesheets #128
Conversation
|
Relates to #81 |
mirpedrol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking good! thanks for the contribution 😄
We should add tests for this. You can add a samplesheet and json schema with the required structure in testResources here, and use it as in other tests.
Also add documentation, I think samplesheet schema specification will be the best place.
Don't forget to update the changelog! 😄
plugins/nf-validation/src/main/nextflow/validation/SamplesheetConverter.groovy
Outdated
Show resolved
Hide resolved
1d839c7 to
bccdd9f
Compare
…e that boolean values in json/yaml are not treated as null before casting
|
Ok. So. State of play after this latest set of changes:
What is shown to be supported:
What is not shown to be supported (some of this may work by chance):
There is also obviously not any support yet in N.B. It would let me merge this branch without any approvals which seems risky - should probably make it so all PRs require at least 1 approval! |
|
Hi, I'll take a look now. But just a heads up that a lot of the non-supported features will probably work if #141 is finished |
nvnieuwk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome LGTM!
|
What does changing to |
|
It has support for all major versions of the JSON schema, while the current library is missing support the two newest ones (which is why we had to implement |
|
Oooh unique as part of schema would be good! But the issue isn't the validation of most of those things - I think anything that is valid jsonschema Draft7 will validate OK. It's rather the conversion to an NFType and the support in the Plus we don't have any internal checks to warn against features of a schema that are not supported for |
mirpedrol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks amazing!!
I think we would also add a note in the documentation, your comment above seems a good summary to add there, especially the limitations.
And don't forget to add your contribution to the CHANGLOG 😄
|
@awgymer can you also add an error if the user gives a CSV or TSV samplesheet when there is an array in the schema? |
Yes. I will need to work out the best place to raise this error. |
…samplesheet is passed. Add tests for this behaviour
|
Now raises an error if |
|
Awesome!!! So this should be done now? Or should I wait a bit more for a review? |
|
This should be complete with the suggestions from the prior reviews now! So feel free to go ahead and re-review the changes |
nvnieuwk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
There is a slight wrinkle I did not account for: This means that currently someone could supply a valid v7 schema with tuple-type I'm not sure if we should just leave it unsupported and breaking, or add an explicit error if we detect This also ties in to whether we ever intend for a single |
|
I'd prefer to add an error for now as we are actively looking into adding support for all types of JSON schema. So let's try to not break much that could impact that transition 😁 |
|
So, as always, it's a case of "supported in validation but not in conversion". Where do you think is best to raise the error? Only in the converter code? |
Yeah the conversion will always be more limited I think, but it's getting better (thanks to you mostly now 😉) IMO it should only be in the places where it can cause issues in the future so only in the conversion code should be fine |
|
Turns out that I have lied! If you make This is all due to the |
Ah that needs fixing too then 🔨 |
|
I don't know if there is some good way to effectively have a schema for our schema 😂 The This is because tuple-type supports e.g. |
|
My best long-term proposal for this would be to make the However obviously that would be backwards-incompatible with everyone's current schema |
We considered this at first but eventually decided not to do it with the formats because it would add a load of new formats that would have to be added into Tower and the nf-core CLI (but I agree in hindsight that it would maybe have the better solution). Also we're going to start up some discussions to only support the 2020-12 draft of JSON schema |
|
What shall we do in the current situation then? I think array support is useful and the current implementation works for all schema versions for standard Since this was never supported previously, is it possible to push it through with a clear note in the docs that |
We have a meta-schema for the parameters schema, we could add one for the samplesheet schema if this helps. Also, something to consider for the schema-builder tooling once we implement it for samplesheet schemas. |
|
As far as I know, no one uses the |
|
So What happens is you define
Pre 2020 both types used the 2020 changes this to make tuple-type have it's own keyword So I think for our "meta-schema" we need to have it define |
From what I've seen during my tests in #141, the |
|
So feel free to merge this @awgymer 🚀 |
Had to open afresh as my attempt to rebase after I branched off another feature branch spectacularly failed
This Draft PR demonstrates the feasibility of adding limited support for simple array fields to the
fromSamplesheetmethod (they is already undocumented support for fulljsonschemaspec in thevalidateFilemethod).The proposal for what to support would be:
metaThe code so far has no tests in the plugin but has been demonstrated to work almost as it does for flat files with one major exception:
There is currently no support forexistschecking with file/directory format fields within an array.Array file/directory existence is now supported.
The following
samplesheet.yaml:With the following
schema.json:{ "$schema": "http://json-schema.org/draft-07/schema", "$id": "https://raw.githubusercontent.com/nf-validation/example/master/assets/schema_input.json", "title": "nf-validation example - params.input schema", "description": "Schema for the file provided with params.input", "type": "array", "items": { "type": "object", "properties": { "sample": { "type": "string", "pattern": "^\\S+$", "errorMessage": "Sample name must be provided and cannot contain spaces" }, "fastq": { "type": "array", "items": { "type": "string", "pattern": "^\\S+\\.f(ast)?q\\.gz$" }, "errorMessage": "FastQ files cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" }, "strandedness": { "type": "string", "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'", "enum": ["forward", "reverse", "unstranded"] } }, "required": ["sample", "fastq", "strandedness"] } }Gives the following in
nextflow:Things to consider:
"type": "array"(we don't currently do this and it technically is supported for validation)"type": "array"with"prefixItems"(this is the tuple-type validation)N.B.: I tried to use polymorphism (is this the right term) to have different methods depending on the input type, but everything just went to the method declared for
Object(the fallback) so if anyone can help me understand that I would appreciate it