index.qmd: 43 additions & 24 deletions
@@ -1,7 +1,6 @@
1
1
---
2
2
title: Data Validation Error Format
3
3
subtitle: Version 0.1.0
4
-
date: 2025-07-10
5
4
#doi: 10.5281/zenodo......
6
5
authors:
7
6
- name: Jakob Voß
@@ -24,7 +23,7 @@ The specification of **Data Validation Error Format** has two goals:
24
23
Last but not least the format should help to better separate validation and presentation of validation results, so both can be solved by different applications.
25
24
26
25
:::{.callout-caution}
27
-
The format is strictly limited to errors and error positions. Neither does it include other kinds of analysis results such as statistics and summaries of documents, nor does in include details about validation such as test cases, schema rules, and individual constraints. Errors can be linked to additional information with error types but the semantics of these types is out of the scope of this specification.
26
+
The format is strictly limited to **errors** and **error positions**. It includes neither other kinds of analysis results, such as statistics and summaries of documents, nor details about validation, such as test cases, schema rules, and individual constraints. Errors can be linked to additional information with error types, but the semantics of these types is out of the scope of this specification.
28
27
:::
29
28
30
29
## Overview
@@ -156,10 +155,12 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMEND
156
155
157
156
Only section @sec-errors to @sec-dimensions, excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
158
157
159
-
Specific support of Data Validation Error Format by an application depends on two options. Both MUST be documented by applications:
158
+
Specific support of Data Validation Error Format by an application depends on:
160
159
161
-
1. Support of either the full format or only [**positions**](#positions) in condense form being [**locator maps**](#locator-map)
162
-
2. The set of supported [**dimensions**](#sec-dimensions)
160
+
1. the set of supported [**dimensions**](#sec-dimensions), and
161
+
2. whether [**positions**](#positions) are supported in both full form ([**locators**](#locators)) and condensed form ([**locator maps**](#locator-map)), or only in the latter.
162
+
163
+
Both MUST be documented by applications.
163
164
164
165
# Errors {#sec-errors}
165
166
@@ -192,8 +193,7 @@ An error can have a **position**. A position is given
192
193
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if there is at most one locator per dimension and no locator has nested errors.
193
194
194
195
::: {.callout-note}
195
-
Locators of the same positions should refer to roughly the "same" part of a document or at least have a common intersection.
196
-
This requirement is difficult to formalize because locators refer to different document models, so it is no normative part of this specification yet.
196
+
Locators of the same position should refer to roughly the "same" part of a document or at least have a common intersection. This requirement is difficult to formalize because locators refer to different document models, so it is not a normative part of this specification.
197
197
:::
198
198
199
199
[locator format]: #locator-formats
@@ -275,26 +275,29 @@ Applications MAY restrict their support of Data Validation Error Format to posit
275
275
276
276
# Dimensions {#sec-dimensions}
277
277
278
-
A **dimension** is a defined method to address parts of a document. Each dimension has:
278
+
A **dimension** is a defined method to address elements of a document. Each dimension has:
279
279
280
280
- a unique **name**, being a string that starts with a lowercase letter `a` to `z`, optionally followed by a sequence of lowercase letters, digits `0` to `9` and/or `-`.
281
281
282
282
- a **locator format**, being a formal language of Unicode strings to encode references to parts of a document. The strings of the language are called **addresses**.
283
283
284
284
- a **document model** matching the **locator format**.
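
The naming rule for dimensions can be checked with a regular expression. The following is a minimal sketch, not part of the specification; the function name `is_valid_dimension_name` is hypothetical.

```python
import re

# A lowercase letter a-z, optionally followed by lowercase letters,
# digits 0-9 and/or hyphens (hypothetical helper, not defined by the spec).
DIMENSION_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_dimension_name(name: str) -> bool:
    """Return True if the whole string matches the dimension name rule."""
    return DIMENSION_NAME.fullmatch(name) is not None
```

Under this reading, names such as `linecol` and `rfc5147` are accepted, while `Line`, `1st` and the empty string are rejected.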
285
285
286
+
Some dimensions imply a document model on addressed elements. For instance a [line number] addresses a character string and a [JSON Pointer] addresses a JSON value.
287
+
286
288
Applications SHOULD support the following dimensions:
`offset` | [offset number] | sequence of elements | -
293
+
`char` | [character number] | character string | character
294
+
`line` | [line number] | sequence of character strings | character string
295
+
`linecol` | [line and column] | sequence of character strings | character
296
+
`cell` | [cell reference] | tabular data | -
297
+
`cells` | [cell range] | tabular data | tabular data
298
+
`file` | [file path] | directory tree | -
299
+
`jsonpointer` | [JSON Pointer] | JSON value | JSON value
300
+
`xpath` | [XML Path Expression] | XML or compatible hierarchies | XML or character string
298
301
299
302
<!--
300
303
A **validator** is an executable function that transforms a **document** into a (possibly empty) set of **errors**.
@@ -309,6 +312,12 @@ The set of normative locator formats has not been finally specified yet. The fin
309
312
310
313
See [appendix](#sec-additional-dimensions) for more dimensions to be discussed.
311
314
315
+
:::{.callout-warning}
316
+
Dimensions are a subset of query languages. An address of a dimension locates exactly *one* element of a document, whereas a query language (e.g. JSONPath or full XPath) often locates a set of elements.
317
+
:::
318
+
319
+
warning
320
+
312
321
### Sequential document models
313
322
314
323
#### Offset number
@@ -329,13 +338,21 @@ Possibly requires some more detailled specification. For instance line number de
329
338
330
339
#### Line and Column
331
340
332
-
[Line number] and [character number] within the line, separated by colon `:`.
341
+
The **line and column** locator format with name `linecol` is used to reference a character in a sequence of character strings. The locator value consists of a [line number] and a [character number] within the line, separated by a colon (`:`).
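
A `linecol` value could be split into its two parts as sketched below. This is only an illustration under assumptions: the function name `parse_linecol` is hypothetical, both parts are assumed to be plain decimal integers, and the sketch does not check where counting starts, since that is left to the line number and character number formats.

```python
import re

# Assumption: both parts are sequences of ASCII digits.
LINECOL = re.compile(r"([0-9]+):([0-9]+)")


def parse_linecol(value: str) -> tuple[int, int]:
    """Split a linecol value into (line number, character number)."""
    match = LINECOL.fullmatch(value)
    if match is None:
        raise ValueError(f"not a line and column value: {value!r}")
    return int(match.group(1)), int(match.group(2))
```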
333
342
334
343
### Tabular document models
335
344
345
+
:::{.callout-note}
346
+
Tabular data is known from spreadsheet software and CSV files. The tabular document model does *not* include table headers.
347
+
:::
348
+
336
349
#### Cell reference
337
350
338
-
The **cell reference** locator format with name `cell` is used to reference a cell or a range of cells in a table as known from spreadsheet software. The locator value consists of a pair of column and row, optionally followed by colon (`:`) and another pair of column and row. Columns are given in hexavigesimal system (A=1, B=2..., Z=26, AA=27, AB=28...) and rows are given by numbers, starting from 1.
351
+
The **cell reference** locator format with name `cell` is used to reference a single cell in tabular data. The locator value consists of a pair of column and row. Columns are given in the hexavigesimal system (A=1, B=2, ..., Z=26, AA=27, AB=28, ...) and rows are given by numbers, starting from 1.
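
The hexavigesimal column numbering can be illustrated with a short conversion. The sketch below is not normative: the names `column_number` and `parse_cell` are hypothetical, and it assumes uppercase column letters and row numbers without leading zeros, neither of which is stated above.

```python
import re

# Assumptions: uppercase letters A-Z for the column,
# a positive row number without leading zeros.
CELL = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def column_number(letters: str) -> int:
    """Convert hexavigesimal column letters to a number (A=1, Z=26, AA=27)."""
    number = 0
    for letter in letters:
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


def parse_cell(reference: str) -> tuple[int, int]:
    """Split a cell reference into (column number, row number)."""
    match = CELL.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a cell reference: {reference!r}")
    return column_number(match.group(1)), int(match.group(2))
```

With these assumptions, `parse_cell("AB12")` yields column 28 and row 12, matching the enumeration A=1, ..., Z=26, AA=27, AB=28.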
352
+
353
+
#### Cell range
354
+
355
+
The **cell range** locator format with name `cells` is used to reference a range of connected cells in tabular data. The locator value consists of a cell reference, optionally followed by a colon (`:`) and another cell reference.
339
356
340
357
### Hierarchical document models
341
358
@@ -349,9 +366,9 @@ Depending in the document model, file names may be defined as binary string inst
349
366
350
367
#### JSON Pointer
351
368
352
-
...
369
+
The **JSON Pointer** locator format with name `jsonpointer` is used to reference a JSON value within a JSON value. The locator value and its semantics are defined in [RFC 6901].
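
How a `jsonpointer` address selects a value can be sketched as follows. This is a simplified illustration of pointer evaluation as described in RFC 6901, operating on JSON values already parsed into Python dictionaries and lists; the function name `resolve_pointer` is hypothetical and the sketch is not a reference implementation.

```python
import re

# Array indexes are "0" or digits without leading zeros.
ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document, pointer: str):
    """Return the value that a JSON Pointer references within a JSON value."""
    if pointer == "":
        return document  # the empty pointer references the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0" so that "~01" becomes "~1", not "/".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if ARRAY_INDEX.fullmatch(token) is None:
                raise KeyError(token)
            index = int(token)
            if index >= len(current):
                raise KeyError(token)
            current = current[index]
        elif isinstance(current, dict):
            current = current[token]  # raises KeyError if the member is missing
        else:
            raise KeyError(token)  # scalars have no children
    return current
```

For example, with the document `{"foo": ["bar", "baz"], "a/b": 1}` the pointer `/foo/0` references `"bar"` and `/a~1b` references `1`.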
353
370
354
-
See <https://datatracker.ietf.org/doc/html/rfc6901>
@@ -393,6 +410,7 @@ TODO: Subset of XPath, see <https://www.w3.org/TR/xpath20/#id-path-expressions>
393
410
394
411
-[JSON Schema](https://json-schema.org/) schema language
395
412
-[XPath] XML Path Language
413
+
-[RFC 9457](https://datatracker.ietf.org/doc/html/rfc9457) defines an extensible error format with fields `type` (`types` in this format), `status`, `title`, `detail` (`message` in this format), `instance` (`position` but as one URI). The example given in the specification is similar to this format.
396
414
397
415
# Appendices {.unnumbered}
398
416
@@ -410,10 +428,11 @@ The following [dimensions](#sec-dimensions) are not normative part of the specif
`fq` | format and path | all binary formats supported by [fq] (see @lst-fq)
414
431
`rfc5147` | [RFC 5147](https://tools.ietf.org/html/rfc5147) | characters and lines
415
-
`rfc7111` | [RFC 7111](https://tools.ietf.org/html/rfc7111) | tabular date
432
+
`rfc7111` | [RFC 7111](https://tools.ietf.org/html/rfc7111) | tabular data
416
433
`id` | Unicode string | data models that refer to elements with an identifier
434
+
`fq` | format and path | all binary formats supported by [fq] (see @lst-fq)
435
+
`files` | File patterns | directory tree
417
436
418
437
Dimension `rfc5147`, in contrast to `char` and `line`, also supports ranges. `rfc7111`, in contrast to `cell`, also supports ranges and multi-selection.