
Conversation

@joelostblom (Contributor) commented Nov 27, 2025

Description

I think we need to go through many types of error messages manually and see whether the way each error is presented via explain is helpful. This debug option helps with that by always printing the properties dict, its issues, and the output from explain. It also lets us easily do this for all test messages.

Here is an example of an error message where all the ^^^^^ markers are not that helpful:

[Image: screenshot of an error message with unhelpful ^^^^^ markers]

This is now easy to detect by running the test suite with the debug option enabled, but we still need to decide what to do with these errors by opening a new issue for each of them and discussing the best solution.

Needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

Comment on lines +135 to +139
if os.getenv("CDP_DEBUG"):
rprint("", properties)
rprint(*issues)
rprint(explain(issues))

@joelostblom (Contributor, Author) commented:

To activate, you can run a script like this:

CDP_DEBUG=true uv run joel_checking.py 
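
The contents of joel_checking.py aren't shown in this PR; a hypothetical minimal version (mirroring the one-liner further down) could be as small as:

import check_datapackage as cdp

# Any datapackage dict with an issue will trigger the CDP_DEBUG output.
cdp.check({"resources": [{"title": "Title"}]})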

I tried fancier solutions with the rich logging module, but I could not find a way to get pytest to pipe through the colors of rich's output when using the logging handler. With printing, it "just works" (tm).

There is also no built-in way to pass a debug-level flag to Python's logging module, so we would end up reading an env variable or making our script accept a command-line parameter, both of which seemed like unnecessary code.
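
For reference, a minimal sketch of what that avoided logging-based setup might look like (the handler wiring here is an assumption, not code from this PR):

import logging
import os

from rich.logging import RichHandler

# The debug level has to come from the environment, since logging has no
# built-in CLI flag for it.
level = logging.DEBUG if os.getenv("CDP_DEBUG") else logging.WARNING
logging.basicConfig(level=level, format="%(message)s", handlers=[RichHandler()])

logging.getLogger("check_datapackage").debug("properties: %s", {"name": "example"})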


As an example of output, let's run this command:

CDP_DEBUG=true uv run python -c 'import check_datapackage as cdp; cdp.check({"resources": [{"title": "Title"}]})'

which gives us:

[Image: debug output showing the properties dict, the issues, and the two explain messages]

It is easy to inspect the two error messages from explain this way. In this case, the first one seems to make sense, whereas the second doesn't make it clear what went wrong (it seems like we passed something to name, so why is it telling us we didn't?). Follow-up on @martonvago's comment in #208 (comment).

Contributor commented:

I put a comment on the issue about this too 😅

Comment on lines +4 to +14
def pytest_report_teststatus(report, config):
    if os.getenv("CDP_DEBUG"):
        if report.when == "call":
            # Add newlines to separate test results
            category = report.outcome
            shortletter = "\n\n"  # dot / F / X / etc.
            verbose = "\n\n"  # ("PASSED", "FAILED", ...)

            return category, shortletter, verbose

    return None
@joelostblom (Contributor, Author) commented:

We can already get debug output from tests by running them with:

CDP_DEBUG=true uv run pytest -sv tests/test_check.py 

However, it gets hard to see which output belongs to which test since there is no separation:

[Image: interleaved test output with no separation between tests]

("PASSED" belongs to the previous test and tests melts into each other)

With this little code snippet in conftest.py, the output is easier to read:

[Image: debug output with blank lines separating each test]

(I'm thinking we don't need the test status since we would just use this to look at the formatting, but it could easily be added back in.)

@joelostblom joelostblom marked this pull request as ready for review November 27, 2025 10:40
@joelostblom joelostblom requested a review from a team as a code owner November 27, 2025 10:40
@joelostblom (Contributor, Author) commented:

@seedcase-project/developers I'll mark this as ready for review to get some early thoughts on whether this will be useful and whether you'd prefer a different approach.

@martonvago (Contributor) left a comment:

I think this is very helpful for getting a general idea about the types of messages!
I cannot say whether we want the debug option to actually be part of the released thing; I leave that to Luke :P


About the wording of the messages (not sure if you're interested in that at this point, but I think Luke wanted to rewrite them):

One thing to note is that while there are many tests, they test cdp-specific behaviour, not the jsonschema output, so there are presumably more message types than what shows up in the tests. jsonschema formats its messages based on the JSON schema validator (their "validator", our "type"), so getting a list of the validators used in the Data Package JSON schema could also give us an idea of how many message types there are (see the sketch below).

But yeah, the answer is likely a bit too many for this to be a pleasant task.
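
To make that schema-inspection idea concrete, here is a rough sketch (not code from this PR; the keyword list is deliberately partial, and datapackage_schema stands in for the loaded schema dict) that walks a schema and collects the JSON schema keywords it uses:

SOME_KEYWORDS = {"required", "type", "minItems", "minLength", "pattern", "enum", "format"}

def collect_keywords(schema, found=None):
    # Recursively gather the validation keywords a (possibly nested) schema uses.
    found = set() if found is None else found
    if isinstance(schema, dict):
        found.update(key for key in schema if key in SOME_KEYWORDS)
        for value in schema.values():
            collect_keywords(value, found)
    elif isinstance(schema, list):
        for item in schema:
            collect_keywords(item, found)
    return found

# e.g. collect_keywords(datapackage_schema) -> {"required", "type", ...}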

Now, jsonschema can be customised in various ways, including using a custom validator to check the input object. In a custom validator it should be possible to tweak the behaviour of each individual JSON schema rule (so minItems, required, type, etc.) and redefine the messages. I don't know if this would be easier to do than remapping the default messages after validation, but it's something to explore.
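
For instance, a minimal sketch of that customisation (an assumption about how it could look, not code from this PR), overriding the message for the required rule via jsonschema's validators.extend:

from jsonschema import ValidationError
from jsonschema.validators import Draft202012Validator, extend

def required_with_custom_message(validator, required, instance, schema):
    # Re-implement the 'required' rule so missing fields get our own message.
    if not validator.is_type(instance, "object"):
        return
    for field in required:
        if field not in instance:
            yield ValidationError(f"The field '{field}' is missing.")

CustomValidator = extend(Draft202012Validator, {"required": required_with_custom_message})

schema = {"type": "object", "required": ["name"]}
for error in CustomValidator(schema).iter_errors({"title": "Title"}):
    print(error.message)  # -> The field 'name' is missing.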

@lwjohnst86 lwjohnst86 moved this from Todo to In Progress in Iteration planning Nov 30, 2025
@lwjohnst86 lwjohnst86 moved this from In Progress to In Review in Iteration planning Nov 30, 2025
