Skip to content

Conversation

neiser
Copy link

@neiser neiser commented Sep 5, 2025

First of all, sorry for this unsolicited PR!

The changes here fix an inconsistency when handling YAML multi documents, such as

a
---
b

vs. a single YAML document which is a sequence on top-level (resulting in a config which can accidentally be cast to []any):

- a
- b

Due to an unfortunate cast to []any in the engine package, this situation was handled incorrectly and it was impossible to run Rego validations on those fundamentally different structures correctly, as both were handled as the same config inputs to Rego policy "a" and "b". With the change, the inputs stay for the multi-doc (first case), but changes to the correct single input ["a", "b"] for the single-document (second case).

Also, newline-delimited JSON (ndjson extension) can be seen as some multi-document JSON, and I've adjusted the existing JSON parser similarly to the YAML parser such that multiple documents in one file are parsed as []any with length>1.
Such ndjson input is convenient when conftesting logs from sources. I've added an example for this.
In fact, there's not even an \n as delimiter needed, as we just continue parsing documents until we encounter EOF.

This required some larger changes across all parsers, and I hope you see the benefit of this change.

Most importantly, the code in engine is streamlined and does not cast the parsed configurations, leading to the initially described confusion with YAML.

I've also took the chance and simplified the format command, which now consistently uses two spaces instead of tab for separation (this was previously depending on whether --combine was used or not).

I could also offer to make the tests a bit more beautiful using the famous testify package. Would you accept/merge this as well?

Let me know what you think!

@neiser neiser force-pushed the support-ndjson-as-multidoc branch 4 times, most recently from d9156c4 to 24908ea Compare September 5, 2025 19:20
This removes the clumsy cast to []any in engine by changing the Parser interface.

Also, use the official go-yaml library from the YAML foundation,
and give Parser an io.Reader which is more memory efficient for some implementations

Signed-off-by: Andreas Grub <grub@posteo.de>
Signed-off-by: Andreas Grub <grub@posteo.de>
@neiser neiser force-pushed the support-ndjson-as-multidoc branch from 24908ea to c833538 Compare September 5, 2025 19:40
@jalseth
Copy link
Member

jalseth commented Sep 16, 2025

Adding a ndjson parser seems reasonable to me, but this is a huge PR that touches many parts of the code base. From a quick skim, I don't see why the changes are needed to the existing parsers. Please split into smaller PRs with each addressing one specific change to the functionality of the code.

@neiser
Copy link
Author

neiser commented Sep 17, 2025

Please split into smaller PRs with each addressing one specific change to the functionality of the code.

@jalseth Thank you for looking at this PR!

As managing multiple PRs on top of each other is some effort, would you also be fine if I split up the first commit of this PR in somewhat smaller parts?

I think I can create the following sequence of commits which you can review one by one, I hope:

  1. Change Parser interface to return []any and remove the cast in the engine package, fixing the multi-doc vs. sequence confusion
  2. Change Parser interface to take an io.Reader
  3. Simplify and align format command output
  4. Simpler YAML parsing using io.Reader, using the yaml/go-yaml lib
  5. Simpler JSON parsing code using the io.Reader approach with accidental support for ndjson and using a library to strip the UTF-8 BOM

Would that be separated enough? I fear that the first two steps (touching Parser interface) are still quite a large commits, but the changes are rather repeating for each existing parser. I might even be simpler for me to combine step 1 und 2 into one commit. Would that be fine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants