Commit bdccfa4

committed
Update
1 parent db444a6 commit bdccfa4

2 files changed

+45
-24
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,3 +4,5 @@
44
*.tmp
55

66
.venv
7+
8+
**/*.quarto_ipynb

index.qmd

Lines changed: 43 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
---
22
title: Data Validation Error Format
33
subtitle: Version 0.1.0
4-
date: 2025-07-10
54
#doi: 10.5281/zenodo......
65
authors:
76
- name: Jakob Voß
@@ -24,7 +23,7 @@ The specification of **Data Validation Error Format** has two goals:
2423
Last but not least the format should help to better separate validation and presentation of validation results, so both can be solved by different applications.
2524

2625
:::{.callout-caution}
27-
The format is strictly limited to errors and error positions. Neither does it include other kinds of analysis results such as statistics and summaries of documents, nor does in include details about validation such as test cases, schema rules, and individual constraints. Errors can be linked to additional information with error types but the semantics of these types is out of the scope of this specification.
26+
The format is strictly limited to **errors** and **error positions**. Neither does it include other kinds of analysis results such as statistics and summaries of documents, nor does it include details about validation such as test cases, schema rules, and individual constraints. Errors can be linked to additional information with error types, but the semantics of these types are outside the scope of this specification.
2827
:::
2928

3029
## Overview
@@ -156,10 +155,12 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMEND
156155

157156
Only sections @sec-errors to @sec-dimensions, excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
158157

159-
Specific support of Data Validation Error Format by an application depends on two options. Both MUST be documented by applications:
158+
Specific support of Data Validation Error Format by an application depends on:
160159

161-
1. Support of either the full format or only [**positions**](#positions) in condense form being [**locator maps**](#locator-map)
162-
2. The set of supported [**dimensions**](#sec-dimensions)
160+
1. the set of supported [**dimensions**](#sec-dimensions), and
161+
2. whether [**positions**](#positions) are supported in both full form ([**locators**](#locators)) and condensed form ([**locator maps**](#locator-map)), or only in the latter.
162+
163+
Both MUST be documented by applications.
163164

164165
# Errors {#sec-errors}
165166

@@ -192,8 +193,7 @@ An error can have a **position**. A position is given
192193
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if there is at most one locator per dimension and no locator has nested errors.
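
Both directions of this transformation can be sketched as follows. This is only an illustration: it assumes a locator is an object with the hypothetical keys `dimension`, `address`, and an optional `errors` list, and that a locator map maps dimension names to addresses. These key names are assumptions and are not defined in this passage.

```python
def map_to_locators(locator_map):
    """Expand a locator map into an equivalent list of locators."""
    return [{"dimension": dim, "address": addr} for dim, addr in locator_map.items()]


def locators_to_map(locators):
    """Condense locators into a locator map, or return None if not possible."""
    result = {}
    for locator in locators:
        if locator.get("errors"):
            return None  # nested errors cannot be expressed in a locator map
        if locator["dimension"] in result:
            return None  # more than one locator for the same dimension
        result[locator["dimension"]] = locator["address"]
    return result
```

Returning `None` for the non-reversible cases is a design choice of this sketch; an application might equally raise an error.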
193194

194195
::: {.callout-note}
195-
Locators of the same positions should refer to roughly the "same" part of a document or at least have a common intersection.
196-
This requirement is difficult to formalize because locators refer to different document models, so it is no normative part of this specification yet.
196+
Locators of the same position should refer to roughly the "same" part of a document or at least have a common intersection. This requirement is difficult to formalize because locators refer to different document models, so it is not a normative part of this specification.
197197
:::
198198

199199
[locator format]: #locator-formats
@@ -275,26 +275,29 @@ Applications MAY restrict their support of Data Validation Error Format to posit
275275

276276
# Dimensions {#sec-dimensions}
277277

278-
A **dimension** is a defined method to address parts of a document. Each dimension has:
278+
A **dimension** is a defined method to address elements of a document. Each dimension has:
279279

280280
- a unique **name**, being a string that starts with a lowercase letter `a` to `z`, optionally followed by a sequence of lowercase letters, digits `0` to `9` and/or `-`.
281281

282282
- a **locator format**, being a formal language of Unicode strings to encode references to parts of a document. The strings of this language are called **addresses**.
283283

284284
- a **document model** matching the **locator format**.
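
The naming rule for dimensions can be checked with a regular expression. The following is a minimal sketch; the function name `is_dimension_name` is hypothetical and not part of the specification.

```python
import re

# A lowercase letter a-z, optionally followed by lowercase letters, digits, or "-".
DIMENSION_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_dimension_name(name: str) -> bool:
    """Return True if the string is a syntactically valid dimension name."""
    return DIMENSION_NAME.fullmatch(name) is not None
```

For instance, `jsonpointer` and `rfc5147` pass this check, while `Line` and `1line` do not.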
285285

286+
Some dimensions imply a document model on addressed elements. For instance a [line number] addresses a character string and a [JSON Pointer] addresses a JSON value.
287+
286288
Applications SHOULD support the following dimensions:
287289

288-
name | locator format | document model
289-
---------------:|-----------------------|----------------------
290-
`offset` | [offset number] | sequence of elements
291-
`char` | [character number] | sequence of characters or code points
292-
`cell` | [cell reference] | tabular data
293-
`file` | [file path] | directory tree
294-
`line` | [line number] | sequence of lines
295-
`linecol` | [line and column] | sequence of characters with line breaks
296-
`jsonpointer` | [JSON Pointer] | JSON
297-
`xpath` | [XML Path Expression] | XML or compatible hierarchies
290+
name | locator format | document model | element model
291+
:---------------|-------------------------|-----------------------------------------|------------------
292+
`offset` | [offset number] | sequence of elements | -
293+
`char` | [character number] | character string | character
294+
`line` | [line number] | sequence of character strings | character string
295+
`linecol` | [line and column] | sequence of character strings | character
296+
`cell` | [cell reference] | tabular data | -
297+
`cells` | [cell range] | tabular data | tabular data
298+
`file` | [file path] | directory tree | -
299+
`jsonpointer` | [JSON Pointer] | JSON value | JSON value
300+
`xpath` | [XML Path Expression] | XML or compatible hierarchies | XML or character string
298301

299302
<!--
300303
A **validator** is an executable function that transforms a **document** into a (possibly empty) set of **errors**.
@@ -309,6 +312,12 @@ The set of normative locator formats has not been finally specified yet. The fin
309312

310313
See [appendix](#sec-additional-dimensions) for more dimensions to be discussed.
311314

315+
:::{.callout-warning}
316+
Dimensions are a subset of query languages. A dimension value locates *one* element of a document, whereas a query language (e.g. JSONPath or full XPath) often locates a set of elements.
317+
:::
318+
319+
warning
320+
312321
### Sequential document models
313322

314323
#### Offset number
@@ -329,13 +338,21 @@ Possibly requires some more detailled specification. For instance line number de
329338

330339
#### Line and Column
331340

332-
[Line number] and [character number] within the line, separated by colon `:`.
341+
The **line and column** locator format with name `linecol` is used to reference a character in a sequence of character strings. The locator value consists of a [line number] and a [character number] within the line, separated by a colon (`:`).
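
A `linecol` address could be split into its two parts as sketched below. The sketch assumes both parts are plain decimal numbers; where counting starts is defined by the line number and character number formats and is not checked here. The function name `parse_linecol` is hypothetical.

```python
import re

# Two decimal numbers separated by a colon (assumption: ASCII digits only).
LINECOL = re.compile(r"([0-9]+):([0-9]+)")


def parse_linecol(address: str) -> tuple[int, int]:
    """Split a line-and-column address into (line, column)."""
    match = LINECOL.fullmatch(address)
    if match is None:
        raise ValueError(f"not a line and column address: {address!r}")
    return int(match.group(1)), int(match.group(2))
```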
333342

334343
### Tabular document models
335344

345+
:::{.callout-info}
346+
Tabular data is known from spreadsheet software and CSV files. The tabular document model does *not* include table headers.
347+
:::
348+
336349
#### Cell reference
337350

338-
The **cell reference** locator format with name `cell` is used to reference a cell or a range of cells in a table as known from spreadsheet software. The locator value consists of a pair of column and row, optionally followed by colon (`:`) and another pair of column and row. Columns are given in hexavigesimal system (A=1, B=2..., Z=26, AA=27, AB=28...) and rows are given by numbers, starting from 1.
351+
The **cell reference** locator format with name `cell` is used to reference a single cell in tabular data. The locator value consists of a pair of column and row. Columns are given in the bijective hexavigesimal (base-26) system, which has no zero digit (A=1, B=2..., Z=26, AA=27, AB=28...), and rows are given by numbers, starting from 1.
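
The column numbering can be illustrated with a short sketch. It assumes uppercase letters only and row numbers without leading zeros, neither of which is stated above; the function names are hypothetical.

```python
import re

# Column letters followed by a row number starting from 1
# (assumption: uppercase letters, no leading zeros).
CELL = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def column_number(letters: str) -> int:
    """Convert column letters to a number: A=1, ..., Z=26, AA=27, AB=28, ..."""
    number = 0
    for letter in letters:
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


def parse_cell(address: str) -> tuple[int, int]:
    """Split a cell reference into (column, row), both counted from 1."""
    match = CELL.fullmatch(address)
    if match is None:
        raise ValueError(f"not a cell reference: {address!r}")
    return column_number(match.group(1)), int(match.group(2))
```

With this sketch, `parse_cell("AB12")` gives column 28 and row 12.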
352+
353+
#### Cell range
354+
355+
The **cell range** locator format with name `cells` is used to reference a range of connected cells in tabular data. The locator value consists of a cell reference, optionally followed by a colon (`:`) and another cell reference.
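
A `cells` address could be parsed as sketched below. The sketch treats a single cell reference as a range that starts and ends at the same cell, and it does not check the relative order of the two corners; both choices are assumptions, as are the uppercase-only column letters and the function names.

```python
import re

# One cell reference, optionally followed by ":" and a second cell reference.
CELLS = re.compile(r"([A-Z]+)([1-9][0-9]*)(?::([A-Z]+)([1-9][0-9]*))?")


def _column_number(letters: str) -> int:
    """Convert column letters to a number: A=1, ..., Z=26, AA=27, ..."""
    number = 0
    for letter in letters:
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


def parse_cells(address: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split a cell range into ((column, row), (column, row))."""
    match = CELLS.fullmatch(address)
    if match is None:
        raise ValueError(f"not a cell range: {address!r}")
    start = (_column_number(match.group(1)), int(match.group(2)))
    if match.group(3) is None:
        return start, start  # a single cell is a range of size one
    end = (_column_number(match.group(3)), int(match.group(4)))
    return start, end
```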
339356

340357
### Hierarchical document models
341358

@@ -349,9 +366,9 @@ Depending in the document model, file names may be defined as binary string inst
349366

350367
#### JSON Pointer
351368

352-
...
369+
The **JSON Pointer** locator format with name `jsonpointer` is used to reference a JSON value within a JSON value. The locator value and its semantics are defined in [RFC 6901].
353370

354-
See <https://datatracker.ietf.org/doc/html/rfc6901>
371+
[RFC 6901]: https://datatracker.ietf.org/doc/html/rfc6901
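
How a `jsonpointer` address locates a JSON value can be sketched as follows. This is a simplified reading of RFC 6901 operating on parsed JSON (Python `dict` and `list`), not a complete implementation; for example, the `-` array token is not supported. The function name `resolve_pointer` is hypothetical.

```python
import re

# Array indexes are "0" or digits without leading zeros (RFC 6901, section 4).
ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document, pointer: str):
    """Return the JSON value that the pointer references within document."""
    if pointer == "":
        return document  # the empty pointer references the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    value = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as required by RFC 6901.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(value, list):
            if ARRAY_INDEX.fullmatch(token) is None:
                raise KeyError(token)
            value = value[int(token)]
        elif isinstance(value, dict):
            value = value[token]
        else:
            raise KeyError(token)  # scalars have no members
    return value
```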
355372

356373
#### XML Path Expression
357374

@@ -393,6 +410,7 @@ TODO: Subset of XPath, see <https://www.w3.org/TR/xpath20/#id-path-expressions>
393410

394411
- [JSON Schema](https://json-schema.org/) schema language
395412
- [XPath] XML Path Language
413+
- [RFC 9457](https://datatracker.ietf.org/doc/html/rfc9457) defines an extensible error format with fields `type` (`types` in this format), `status`, `title`, `detail` (`message` in this format), `instance` (`position` but as one URI). The example given in the specification is similar to this format.
396414

397415
# Appendices {.unnumbered}
398416

@@ -410,10 +428,11 @@ The following [dimensions](#sec-dimensions) are not normative part of the specif
410428

411429
name | locator format | document models
412430
------------|------------------|------------------
413-
`fq` | format and path | all binary formats supported by [fq] (see @lst-fq)
414431
`rfc5147` | [RFC 5147](https://tools.ietf.org/html/rfc5147) | characters and lines
415-
`rfc7111` | [RFC 7111](https://tools.ietf.org/html/rfc7111) | tabular date
432+
`rfc7111` | [RFC 7111](https://tools.ietf.org/html/rfc7111) | tabular data
416433
`id` | Unicode string | data models that refer to elements with an identifier
434+
`fq` | format and path | all binary formats supported by [fq] (see @lst-fq)
435+
`files` | File patterns | directory tree
417436

418437
Dimension `rfc5147`, in contrast to `char` and `line`, also supports ranges. `rfc7111`, in contrast to `cell`, also supports ranges and multi-selection.
419438
