This repository was archived by the owner on Aug 20, 2024. It is now read-only.

Commit 4113068

Merge pull request #141 from nvnieuwk/move-json-validator
Update the JSON schema validator library + major refactor
2 parents 0a0ba1b + cf0da97 commit 4113068

83 files changed

+2179
-1610
lines changed


CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -1,5 +1,31 @@
11
# nextflow-io/nf-validation: Changelog
22

3+
# Version 2.0.0dev
4+
5+
:warning: This version contains a number of breaking changes. Please read the changelog carefully before upgrading. :warning:
6+
7+
To migrate your schemas please follow the [migration guide](https://nextflow-io.github.io/nf-validation/latest/migration_guide/)
8+
9+
## New features
10+
11+
- Added the `uniqueEntries` keyword. This keyword takes a list of strings corresponding to names of fields that need to be a unique combination. e.g. `uniqueEntries: ['sample', 'replicate']` will make sure that the combination of the `sample` and `replicate` fields is unique. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
12+
13+
## Changes
14+
15+
- Changed the JSON schema draft from `draft-07` to `draft-2020-12`. See the [2019-09](https://json-schema.org/draft/2019-09/release-notes) and [2020-12](https://json-schema.org/draft/2020-12/release-notes) release notes for all changes ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
16+
- Removed all validation code from the `.fromSamplesheet()` channel factory. The validation is now solely done in the `validateParameters()` function. A custom error message will now be displayed if any error has been encountered during the conversion ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
17+
- Removed the `unique` keyword from the samplesheet schema. You should now use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or `uniqueEntries` instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
18+
- Removed the `skip_duplicate_check` option from the `fromSamplesheet()` channel factory and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
19+
- `.fromSamplesheet()` now does dynamic typecasting instead of using the `type` fields in the JSON schema, because of the complexity of `draft-2020-12` JSON schemas. The impact should be small, but some values may end up with a different type than in earlier versions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
20+
- `.fromSamplesheet()` now sets all missing values to `[]` instead of the type-specific defaults (a consequence of the previous point). The impact should be small, because `[]` also evaluates to `false` when used in conditions. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
21+
22+
## Improvements
23+
24+
- Setting the `exists` keyword to `false` will now check if the path does not exist ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
25+
- The `schema` keyword will now work in all schemas. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
26+
- Improved the error messages ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
27+
- `.fromSamplesheet()` now supports deeply nested samplesheets ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
28+
329
# Version 1.1.3 - Asahikawa
430

531
## Improvements

README.md

Lines changed: 4 additions & 3 deletions
@@ -14,7 +14,7 @@ This [Nextflow plugin](https://www.nextflow.io/docs/latest/plugins.html#plugins)
1414
- 📋 Validate the contents of supplied sample sheet files
1515
- 🛠️ Create a Nextflow channel with a parsed sample sheet
1616

17-
Supported sample sheet formats are CSV, TSV and YAML (simple).
17+
Supported sample sheet formats are CSV, TSV, JSON and YAML.
1818

1919
## Quick Start
2020

@@ -31,7 +31,7 @@ This is all that is needed - Nextflow will automatically fetch the plugin code a
3131
> [!NOTE]
3232
> The snippet above will always try to install the latest version, good to make sure
3333
> that the latest bug fixes are included! However, this can cause difficulties if running
34-
> offline. You can pin a specific release using the syntax `nf-validation@0.3.2`
34+
> offline. You can pin a specific release using the syntax `nf-validation@2.0.0`
3535
3636
You can now include the plugin helper functions into your Nextflow pipeline:
3737

@@ -58,7 +58,7 @@ ch_input = Channel.fromSamplesheet("input")
5858
## Dependencies
5959

6060
- Java 11 or later
61-
- <https://github.com/everit-org/json-schema>
61+
- <https://github.com/harrel56/json-schema>
6262

6363
## Slack channel
6464

@@ -75,3 +75,4 @@ We would like to thank the key contributors who include (but are not limited to)
7575
- Nicolas Vannieuwkerke ([@nvnieuwk](https://github.com/nvnieuwk))
7676
- Kevin Menden ([@KevinMenden](https://github.com/KevinMenden))
7777
- Phil Ewels ([@ewels](https://github.com/ewels))
78+
- Arthur ([@awgymer](https://github.com/awgymer))

docs/migration_guide.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
---
2+
title: Migration guide
3+
description: Guide to migrate pipelines using nf-validation pre v2.0.0 to after v2.0.0
4+
hide:
5+
- toc
6+
---
7+
8+
# Migration guide
9+
10+
This guide is intended to help you migrate your pipeline from older versions of the plugin to version 2.0.0 and later.
11+
12+
## Major changes in the plugin
13+
14+
The following list shows the major breaking changes introduced in version 2.0.0:
15+
16+
1. The JSON schema draft has been updated from `draft-07` to `draft-2020-12`. See [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
17+
2. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) now instead.
18+
3. The `dependentRequired` keyword now behaves as specified in JSON schema. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information.
19+
20+
A full list of changes can be found in the [changelog](../CHANGELOG.md).
21+
22+
## Updating your pipeline
23+
24+
If you aren't using any special features in your schemas, you can simply update your `nextflow_schema.json` file using the following command:
25+
26+
```bash
27+
sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' nextflow_schema.json
28+
```
29+
30+
This will replace the old schema draft specification (`draft-07`) with the new one (`2020-12`), and the old keyword `definitions` with the new notation `defs`.
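To make the effect of the two substitutions easier to inspect, here is a minimal Python sketch that mirrors them. It is not part of the plugin; the function name `migrate_schema_text` and the example schema (including the group name `input_output_options`) are made up for illustration.

```python
OLD_DRAFT = "http://json-schema.org/draft-07/schema"
NEW_DRAFT = "https://json-schema.org/draft/2020-12/schema"


def migrate_schema_text(text: str) -> str:
    """Apply the same two substitutions as the sed command, in the same order."""
    # Literal match of the draft-07 URL (sed would treat each '.' as "any character").
    text = text.replace(OLD_DRAFT, NEW_DRAFT)
    # Like `s/definitions/defs/g`, this rewrites every occurrence of the word,
    # including `$ref` targets and free text such as descriptions.
    return text.replace("definitions", "defs")


# Hypothetical schema used only for this example.
example = """{
  "$schema": "http://json-schema.org/draft-07/schema",
  "definitions": {
    "input_output_options": { "description": "Input definitions" }
  },
  "allOf": [{ "$ref": "#/definitions/input_output_options" }]
}"""

migrated = migrate_schema_text(example)
print(migrated)
```

Because the replacement is global, the word `definitions` is also rewritten inside descriptions and other free text, so it is worth reviewing the resulting diff before committing.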
31+
32+
!!! note
33+
Repeat this command for every JSON schema you use in your pipeline. e.g. for the default samplesheet schema:
34+
```bash
sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' assets/schema_input.json
```
35+
36+
If you are using any special features in your schemas, you will need to update your schemas manually. Please refer to the [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
37+
38+
However, here are some guides to the more common migration patterns:
39+
40+
### Updating `unique` keyword
41+
42+
When you use `unique` in your schemas, you should update it to use `uniqueItems` or `uniqueEntries` instead.
43+
44+
If you used the `unique:true` field, you should update it to use `uniqueItems` like this:
45+
46+
=== "Before v2.0"
47+
```json hl_lines="9"
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "unique": true
            }
        }
    }
}
```
48+
49+
=== "After v2.0"
50+
```json hl_lines="12"
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string"
            }
        }
    },
    "uniqueItems": true
}
```
51+
52+
If you used the `unique: ["field1", "field2"]` field, you should update it to use `uniqueEntries` like this:
53+
54+
=== "Before v2.0"
55+
```json hl_lines="9"
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "unique": ["sample"]
            }
        }
    }
}
```
56+
57+
=== "After v2.0"
58+
```json hl_lines="12"
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string"
            }
        }
    },
    "uniqueEntries": ["sample"]
}
```
59+
60+
### Updating `dependentRequired` keyword
61+
62+
When you use `dependentRequired` in your schemas, you should update it like this:
63+
64+
=== "Before v2.0"
65+
```json hl_lines="12"
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "properties": {
        "fastq_1": {
            "type": "string",
            "format": "file-path"
        },
        "fastq_2": {
            "type": "string",
            "format": "file-path",
            "dependentRequired": ["fastq_1"]
        }
    }
}
```
66+
67+
=== "After v2.0"
68+
```json hl_lines="14 15 16"
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "fastq_1": {
            "type": "string",
            "format": "file-path"
        },
        "fastq_2": {
            "type": "string",
            "format": "file-path"
        }
    },
    "dependentRequired": {
        "fastq_2": ["fastq_1"]
    }
}
```
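The rule expressed by the object-level `dependentRequired` can be sketched in a few lines of Python. This is only an illustration of the standard JSON schema semantics, not the code used by the plugin or its validation library; the helper name `check_dependent_required` and the message wording are assumptions.

```python
def check_dependent_required(instance: dict, dependent_required: dict) -> list:
    """Return one message per missing dependency.

    Standard JSON schema rule: if a key of `dependent_required` is present
    in the instance, every property listed for that key must be present too.
    """
    errors = []
    for trigger, needed in dependent_required.items():
        if trigger not in instance:
            continue  # the rule only applies when the trigger property is present
        for name in needed:
            if name not in instance:
                errors.append(f"'{name}' is required when '{trigger}' is set")
    return errors


# Same rule as in the "After v2.0" schema: fastq_2 requires fastq_1.
rules = {"fastq_2": ["fastq_1"]}

print(check_dependent_required({"fastq_1": "a.fq", "fastq_2": "b.fq"}, rules))
print(check_dependent_required({"fastq_1": "a.fq"}, rules))
print(check_dependent_required({"fastq_2": "b.fq"}, rules))
```

Only the last call reports a problem: `fastq_2` is given without `fastq_1`. Supplying `fastq_1` alone is fine, because the dependency runs in one direction only.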

docs/nextflow_schema/create_schema.md

Lines changed: 9 additions & 0 deletions
@@ -46,6 +46,15 @@ go to the pipeline root and run the following:
4646
nf-core schema build
4747
```
4848

49+
!!! warning
50+
51+
The current version of `nf-core` tools (v2.12.1) does not support the new schema draft used in `nf-validation`. Running this command after building the schema will convert the schema to the right draft:
52+
53+
```bash
54+
sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' nextflow_schema.json
55+
```
56+
A new version of the nf-core schema builder will be available soon. Keep an eye out!
57+
4958
The tool will run the `nextflow config` command to extract your pipeline's configuration
5059
and compare the output to your `nextflow_schema.json` file (if it exists).
5160
It will prompt you to update the schema file with any changes, then it will ask if you

docs/nextflow_schema/nextflow_schema_specification.md

Lines changed: 68 additions & 34 deletions
@@ -30,24 +30,24 @@ You can find more information about JSON Schema here:
3030

3131
## Definitions
3232

33-
A slightly strange use of a JSON schema standard that we use for Nextflow schema is `definitions`.
33+
A slightly strange use of a JSON schema standard that we use for Nextflow schema is `defs`.
3434

3535
JSON schema can group variables together in an `object`, but then the validation expects this structure to exist in the data that it is validating.
3636
In reality, we have a very long "flat" list of parameters, all at the top level of `params.foo`.
3737

38-
In order to give some structure to log outputs, documentation and so on, we group parameters into `definitions`.
39-
Each `definition` is an object with a title, description and so on.
40-
However, as they are under `definitions` scope they are effectively ignored by the validation and so their nested nature is not a problem.
38+
In order to give some structure to log outputs, documentation and so on, we group parameters into `defs`.
39+
Each `def` is an object with a title, description and so on.
40+
However, as they are under `defs` scope they are effectively ignored by the validation and so their nested nature is not a problem.
4141
We then bring the contents of each definition object back to the "flat" top level for validation using a series of `allOf` statements at the end of the schema,
4242
which reference the specific definition keys.
4343

4444
<!-- prettier-ignore-start -->
4545
```json
4646
{
47-
"$schema": "http://json-schema.org/draft-07/schema",
47+
"$schema": "https://json-schema.org/draft/2020-12/schema",
4848
"type": "object",
4949
// Definition groups
50-
"definitions": { // (1)!
50+
"defs": { // (1)!
5151
"my_group_of_params": { // (2)!
5252
"title": "A virtual grouping used for docs and pretty-printing",
5353
"type": "object",
@@ -64,7 +64,7 @@ which reference the specific definition keys.
6464
},
6565
// Contents of each definition group brought into main schema for validation
6666
"allOf": [
67-
{ "$ref": "#/definitions/my_group_of_params" } // (6)!
67+
{ "$ref": "#/defs/my_group_of_params" } // (6)!
6868
]
6969
}
7070
```
@@ -77,7 +77,7 @@ which reference the specific definition keys.
7777
5. Shortened here for the example, see below for full parameter specification.
7878
6. A `$ref` line like this needs to be added for every definition group
7979

80-
Parameters can be described outside of the `definitions` scope, in the regular JSON Schema top-level `properties` scope.
80+
Parameters can be described outside of the `defs` scope, in the regular JSON Schema top-level `properties` scope.
8181
However, they will be displayed as ungrouped in tools working off the schema.
8282

8383
## Nested parameters
@@ -115,8 +115,7 @@ Any parameters that _must_ be specified should be set as `required` in the schem
115115

116116
!!! tip
117117

118-
Make sure you do not set a default value for the parameter, as then it will have
119-
a value even if not supplied by the pipeline user and the required property will have no effect.
118+
Make sure you set `null` as the default value for the parameter. Otherwise, it will have a value even if the pipeline user does not supply one, and the `required` property will have no effect.
120119

121120
This is not done with a property key like other things described below, but rather by naming
122121
the parameter in the `required` array in the definition object / top-level object.
@@ -164,13 +163,13 @@ Variable type, taken from the [JSON schema keyword vocabulary](https://json-sche
164163
- `number` (float)
165164
- `integer`
166165
- `boolean` (true / false)
166+
- `object` (currently only supported for file validation, see [Nested parameters](#nested-parameters))
167+
- `array` (currently only supported for file validation, see [Nested parameters](#nested-parameters))
167168

168169
Validation checks that the supplied parameter matches the expected type, and will fail with an error if not.
169170

170-
These JSON schema types are _not_ supported (see [Nested paramters](#nested-parameters)):
171+
This JSON schema type is _not_ supported:
171172

172-
- `object`
173-
- `array`
174173
- `null`
175174

176175
### `default`
@@ -223,7 +222,7 @@ If validation fails, this `errorMessage` is printed instead, and the raw JSON sc
223222
For example, instead of printing:
224223

225224
```
226-
ERROR ~ * --input: string [samples.yml] does not match pattern ^\S+\.csv$ (samples.yml)
225+
* --input (samples.yml): "samples.yml" does not match regular expression [^\S+\.csv$]
227226
```
228227

229228
We can set
@@ -239,9 +238,21 @@ We can set
239238
and get:
240239

241240
```
242-
ERROR ~ * --input: File name must end in '.csv' cannot contain spaces (samples.yml)
241+
* --input (samples.yml): File name must end in '.csv' cannot contain spaces
243242
```
244243

244+
### `deprecated`
245+
246+
!!! example "Extended key"
247+
248+
A boolean JSON flag that instructs anything using the schema that this parameter/field is deprecated and should not be used. This can be useful to generate messages telling the user that a parameter has changed between versions.
249+
250+
JSON schema states that this is an informative key only, but in `nf-validation` this will cause a validation error if the parameter/field is used.
251+
252+
!!! tip
253+
254+
Using the [`errorMessage`](#errormessage) keyword can be useful to provide more information about the deprecation and what to use instead.
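As a rough illustration of the behaviour described above, the following Python sketch flags every supplied parameter whose schema entry sets `deprecated` to `true`, and prefers the `errorMessage` text when one is given. The function name, the fallback message and the parameter names `reads` and `input` are hypothetical; the plugin's own implementation and message format may differ.

```python
def find_deprecated_usage(params: dict, properties: dict) -> list:
    """Return (name, message) pairs for supplied parameters marked as deprecated."""
    problems = []
    for name in params:
        spec = properties.get(name, {})
        if spec.get("deprecated") is True:
            # Prefer the schema's own errorMessage; fall back to a generic text.
            message = spec.get("errorMessage", f"'{name}' is deprecated")
            problems.append((name, message))
    return problems


# Hypothetical schema properties used only for this example.
properties = {
    "reads": {
        "type": "string",
        "deprecated": True,
        "errorMessage": "Use --input instead of --reads",
    },
    "input": {"type": "string"},
}

print(find_deprecated_usage({"input": "samples.csv"}, properties))
print(find_deprecated_usage({"reads": "sample.fastq"}, properties))
```

A deprecated parameter that is never supplied produces no error in this sketch; only its use is reported.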
255+
245256
### `enum`
246257

247258
An array of enumerated values: the parameter must match one of these values exactly to pass validation.
@@ -325,11 +336,6 @@ Formats can be used to give additional validation checks against `string` values
325336
The `format` key is a [standard JSON schema key](https://json-schema.org/understanding-json-schema/reference/string.html#format),
326337
however we primarily use it for validating file / directory path operations with non-standard schema values.
327338

328-
!!! note
329-
330-
In addition to _validating_ the strings as the provided format type, nf-validation also _coerces_ the parameter variable type.
331-
That is: if the schema defines `params.input` as a `file-path`, nf-validation will convert the parameter from a `String` into a `Nextflow.File`.
332-
333339
Example usage is as follows:
334340

335341
```json
@@ -342,7 +348,7 @@ Example usage is as follows:
342348
The available `format` types are below:
343349

344350
`file-path`
345-
: States that the provided value is a file. Does not check its existence, but it does check that the path is not a directory.
351+
: States that the provided value is a file. Does not check its existence, but it does check if the path is not a directory.
346352

347353
`directory-path`
348354
: States that the provided value is a directory. Does not check its existence, but if it exists, it does check that the path is not a file.
@@ -351,11 +357,11 @@ The available `format` types are below:
351357
: States that the provided value is a path (file or directory). Does not check its existence.
352358

353359
`file-path-pattern`
354-
: States that the provided value is a globbing pattern that will be used to fetch files. Checks that the pattern is valid and that at least one file is found.
360+
: States that the provided value is a glob pattern that will be used to fetch files. Checks that the pattern is valid and that at least one file is found.
355361

356362
### `exists`
357363

358-
When a format is specified for a value, you can provide the key `exists` set to true in order to validate that the provided path exists.
364+
When a format is specified for a value, you can provide the key `exists` set to true in order to validate that the provided path exists. Set this to `false` to validate that the path does not exist.
359365

360366
Example usage is as follows:
361367

@@ -367,18 +373,9 @@ Example usage is as follows:
367373
}
368374
```
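The three possible states of `exists` (set to `true`, set to `false`, or not set) can be summarised with a small Python sketch. It only handles local paths and is meant as an illustration under that assumption; the helper name `check_exists` and the message wording are not taken from the plugin.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_exists(path: str, exists: Optional[bool]) -> Optional[str]:
    """Return an error message, or None when the check passes or is not requested."""
    if exists is None:
        return None  # keyword not set: no existence check at all
    found = Path(path).exists()
    if exists and not found:
        return f"{path} does not exist"
    if not exists and found:
        return f"{path} exists but should not"
    return None


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "results.txt"
    present.touch()
    missing = Path(tmp) / "missing.txt"
    # True means "validation passed" for that combination.
    outcomes = [
        check_exists(str(present), True) is None,   # exists, as required
        check_exists(str(missing), True) is None,   # required but absent
        check_exists(str(missing), False) is None,  # absent, as required
        check_exists(str(present), False) is None,  # present but must not be
        check_exists(str(missing), None) is None,   # keyword unset: no check
    ]

print(outcomes)
```

Setting `exists` to `false` is therefore a real check in its own right, which is useful for guarding output paths that must not be overwritten.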
369375

370-
!!! note
371-
372-
If `exists` is set to `false`, this validation is ignored. Does not check if the path exists.
373-
374-
!!! note
375-
376-
If the parameter is set to `null`, `false` or an empty string, this validation is ignored. It does not check if the path exists.
377-
378376
!!! note
379377

380378
If the parameter is an S3 URL path, this validation is ignored.
381-
Use `--validationS3PathCheck` or set `params.validationS3PathCheck = true` to validate them.
382379

383380
### `mimetype`
384381

@@ -404,8 +401,7 @@ Should only be set when `format` is `file-path`.
404401

405402
!!! tip
406403

407-
Setting this field is key to working with sample sheet validation and channel generation,
408-
as described in the next section of the nf-validation docs.
404+
Setting this field is key to working with sample sheet validation and channel generation, as described in the next section of the nf-validation docs.
409405

410406
These schema files are typically stored in the pipeline `assets` directory, but can be anywhere.
411407

@@ -448,3 +444,41 @@ Specify a minimum / maximum value for an integer or float number length with `mi
448444
The JSON schema doc also mentions `exclusiveMinimum`, `exclusiveMaximum` and `multipleOf` keys.
449445
Because nf-validation uses stock JSON schema validation libraries, these _should_ work for validating keys.
450446
However, they are not officially supported within the Nextflow schema ecosystem and so some interfaces may not recognise them.
447+
448+
## Array-specific keys
449+
450+
### `uniqueItems`
451+
452+
All items in the array should be unique.
453+
454+
- See the [JSON schema docs](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems)
455+
for details.
456+
457+
```json
458+
{
459+
"type": "array",
460+
"uniqueItems": true
461+
}
462+
```
463+
464+
### `uniqueEntries`
465+
466+
!!! example "Non-standard key"
467+
468+
The combination of all values in the given keys should be unique. For this key to work, you need to make sure the array items are of type `object` and contain the keys in the `uniqueEntries` list.
469+
470+
```json
471+
{
472+
"type": "array",
473+
"items": {
474+
"type": "object",
475+
"uniqueEntries": ["foo", "bar"],
476+
"properties": {
477+
"foo": { "type": "string" },
478+
"bar": { "type": "string" }
479+
}
480+
}
481+
}
482+
```
483+
484+
This schema tells `nf-validation` that the combination of `foo` and `bar` should be unique across all objects in the array.
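The uniqueness rule itself is simple to express. The Python sketch below is one possible way to check it, assuming the field values are hashable (strings, numbers and so on); it is not the plugin's implementation, and the name `find_duplicate_entries` is made up for this example.

```python
def find_duplicate_entries(rows: list, unique_entries: list) -> list:
    """Return the indices of rows whose combination of values was already seen."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        # Build the combination from the listed fields only; other fields are ignored.
        key = tuple(row.get(field) for field in unique_entries)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


rows = [
    {"foo": "a", "bar": "1"},
    {"foo": "a", "bar": "2"},  # same foo, different bar: still a unique combination
    {"foo": "a", "bar": "1"},  # repeats the first combination
]

print(find_duplicate_entries(rows, ["foo", "bar"]))
print(find_duplicate_entries(rows, ["foo"]))
```

Checking the pair `foo` and `bar` flags only the third row, whereas checking `foo` alone flags the second and third rows. This is the difference between one combined `uniqueEntries` constraint and a uniqueness constraint on a single field.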
