Support newline-delimited JSON and fix YAML multi-doc vs. sequence #1168

neiser · 2025-09-05T12:17:27Z

First of all, sorry for this unsolicited PR!

The changes here fix an inconsistency when handling YAML multi documents, such as

a
---
b

vs. a single YAML document which is a sequence on top-level (resulting in a config which can accidentally be cast to []any):

- a
- b

Due to an unfortunate cast to []any in the engine package, this situation was handled incorrectly and it was impossible to run Rego validations on those fundamentally different structures correctly, as both were handled as the same config inputs to Rego policy "a" and "b". With the change, the inputs stay for the multi-doc (first case), but changes to the correct single input ["a", "b"] for the single-document (second case).

Also, newline-delimited JSON (ndjson extension) can be seen as some multi-document JSON, and I've adjusted the existing JSON parser similarly to the YAML parser such that multiple documents in one file are parsed as []any with length>1.
Such ndjson input is convenient when conftesting logs from sources. I've added an example for this.
In fact, there's not even an \n as delimiter needed, as we just continue parsing documents until we encounter EOF.

This required some larger changes across all parsers, and I hope you see the benefit of this change.

Most importantly, the code in engine is streamlined and does not cast the parsed configurations, leading to the initially described confusion with YAML.

I've also took the chance and simplified the format command, which now consistently uses two spaces instead of tab for separation (this was previously depending on whether --combine was used or not).

I could also offer to make the tests a bit more beautiful using the famous testify package. Would you accept/merge this as well?

Let me know what you think!

This removes the clumsy cast to []any in engine by changing the Parser interface. Also, use the official go-yaml library from the YAML foundation, and give Parser an io.Reader which is more memory efficient for some implementations Signed-off-by: Andreas Grub <grub@posteo.de>

Signed-off-by: Andreas Grub <grub@posteo.de>

jalseth · 2025-09-16T20:36:05Z

Adding a ndjson parser seems reasonable to me, but this is a huge PR that touches many parts of the code base. From a quick skim, I don't see why the changes are needed to the existing parsers. Please split into smaller PRs with each addressing one specific change to the functionality of the code.

neiser · 2025-09-17T06:29:21Z

Please split into smaller PRs with each addressing one specific change to the functionality of the code.

@jalseth Thank you for looking at this PR!

As managing multiple PRs on top of each other is some effort, would you also be fine if I split up the first commit of this PR in somewhat smaller parts?

I think I can create the following sequence of commits which you can review one by one, I hope:

Change Parser interface to return []any and remove the cast in the engine package, fixing the multi-doc vs. sequence confusion
Change Parser interface to take an io.Reader
Simplify and align format command output
Simpler YAML parsing using io.Reader, using the yaml/go-yaml lib
Simpler JSON parsing code using the io.Reader approach with accidental support for ndjson and using a library to strip the UTF-8 BOM

Would that be separated enough? I fear that the first two steps (touching Parser interface) are still quite a large commits, but the changes are rather repeating for each existing parser. I might even be simpler for me to combine step 1 und 2 into one commit. Would that be fine?

neiser force-pushed the support-ndjson-as-multidoc branch 4 times, most recently from d9156c4 to 24908ea Compare September 5, 2025 19:20

neiser added 2 commits September 5, 2025 21:39

feat: support ndjson extension using json parser

c833538

Signed-off-by: Andreas Grub <grub@posteo.de>

neiser force-pushed the support-ndjson-as-multidoc branch from 24908ea to c833538 Compare September 5, 2025 19:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support newline-delimited JSON and fix YAML multi-doc vs. sequence #1168

Support newline-delimited JSON and fix YAML multi-doc vs. sequence #1168

Uh oh!

neiser commented Sep 5, 2025 •

edited

Loading

Uh oh!

jalseth commented Sep 16, 2025

Uh oh!

neiser commented Sep 17, 2025

Uh oh!

Uh oh!

Support newline-delimited JSON and fix YAML multi-doc vs. sequence #1168

Are you sure you want to change the base?

Support newline-delimited JSON and fix YAML multi-doc vs. sequence #1168

Uh oh!

Conversation

neiser commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jalseth commented Sep 16, 2025

Uh oh!

neiser commented Sep 17, 2025

Uh oh!

Uh oh!

neiser commented Sep 5, 2025 •

edited

Loading