
Conversation

@JakobJingleheimer
Member

@JakobJingleheimer JakobJingleheimer commented Nov 10, 2025

This PR adds support for an xfail directive (similar to skip and todo), whereby a user can flag a test-case as expected to fail.

Co-Authored-By: Alejandro Espa <98526766+vespa7@users.noreply.github.com>
@JakobJingleheimer JakobJingleheimer added the test_runner Issues and PRs related to the test runner subsystem. label Nov 10, 2025
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/test_runner

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Nov 10, 2025
@vassudanagunta
Contributor

vassudanagunta commented Nov 24, 2025

There is long-established precedent for xfail.

What is the commitment level and timeline for this? I ask because I've already implemented xfail, xpass and some other additional feature extensions to node:test in a yet-to-be-released node-test-extra package (like fs-extra is to node:fs). I'd be delighted if xfail were subsumed into node:test. Because I've put a lot of thought into it, I might have something to contribute to nodejs on this. I could write up my thoughts, collaborate with @JakobJingleheimer, and/or create an alternate PR.

lmk, please

@JakobJingleheimer
Member Author

JakobJingleheimer commented Nov 24, 2025

Ah, sure. I'm not particular about the name as long as it's intuitive (which xfail seems to be).

I expect this could land pretty quickly: quite a few collaborators had previously voiced support for it, and to my knowledge there have been no general concerns raised against the feature itself.

Aside from the name (which is trivial to update in the PR), the remaining work is:

  • find where the result data gets passed to reporters (and add this property to that—the actual implementation work there is probably ~5 seconds)
  • update the docs

Creating competing PRs is never desirable and generally pisses off the person / people already working on it.

Happy to collaborate on the feature; but aside from influencing the name, I think this is very straightforward and has very little implementation work, almost none of it remaining.

In terms of timeline to finish this, I expect to have time this week (the past ~2 weeks were very tumultuous for me).

@vassudanagunta
Contributor

Additional features that my version has:

  • option to specify the expected error. I think this is really important. If for example you are documenting/reproducing/tracking via a test a bug that is in released code, you will want it to xfail, since it ought not fail CI, but only if it fails in the expected way.

  • command line option to treat xfails as regular tests, i.e. treating them as test failures. This is useful when you want to see the failure details, which most reporters only show for failed tests. With node-test-extra, it will also show all debug/console logging, which is otherwise suppressed.

    • I am considering augmenting that command line option with a filter arg, so you can filter which xfails are treated as regular tests.

I think that's it, though I'll have to take a look at
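For illustration only, here is a rough sketch of how those two extras might be surfaced; node-test-extra is unreleased, so the option shape and the command-line flag below are assumptions, not its actual API:

// Hypothetical: an xfail that only counts as "expected" when the failure matches.
test('reproduces issue #382', { xfail: { reason: 'issue #382', error: /cannot read properties/i } }, () => {
  // ...
});

// Hypothetical switch to run xfails as regular tests so failure details and logs are shown:
//   node --test --treat-xfail-as-fail
//   node --test --treat-xfail-as-fail="issue #382"   (speculative filter argument)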

@JakobJingleheimer
Member Author

Alex and I considered those and pushed them to v2 to get the base implementation landed sooner (many users probably won't need the v2 extras, so no need to make them wait for things they don't want/need). If you already have an implementation, I'm happy to defer to you on that once this lands 🙂

@vassudanagunta
Contributor

vassudanagunta commented Nov 24, 2025

Deferring to v2 makes sense, but I think you should quickly consider how it would be added to the API, just to avoid potential API thrashing or getting painted into a corner. In the implementation, you have a string value given for xfail that is treated as a reporting message, just like todo. That makes sense:

test('this should do that', { xfail: 'bug #382' }, (t) => {
   //...
});

How would you, in v2, specify the expected failure? An xfail-error property that specifies the error message or pattern? Something that can be parsed out of the existing xfail property? Or within the test function, e.g. via a new assert.xfail method that specifies both the expected value and also the expected erroneous value? Or...?
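To make those alternatives concrete, here are some purely illustrative shapes; none of these property or method names exist, they only restate the options above:

// (a) a separate property for the expected error (names are made up):
test('this should do that', { xfail: 'bug #382', xfailError: /timeout/i }, (t) => {
  // ...
});

// (b) the expected error parsed out of the existing xfail string (delimiter is made up):
test('this should do that', { xfail: 'bug #382 :: /timeout/i' }, (t) => {
  // ...
});

// (c) a hypothetical in-body assertion:
test('this should do that', { xfail: 'bug #382' }, (t) => {
  assert.xfail(expected, actual, expectedErroneous);
});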

@vassudanagunta
Contributor

vassudanagunta commented Nov 24, 2025

Arguably a new kind of assert might be better than adding an xfail marker as a test property.

assert.xfail(expected, actual, expectedErroneous?, message?)

OTOH, assertions of failures maybe should be reserved for testing that the code throws the correct exception under specific conditions.

@JakobJingleheimer
Member Author

That's contrary to the intention of this feature: the test should be identical to the non-failing version. This is a flag to flip.

assert.xfail is a different feature 🙂

@JakobJingleheimer
Member Author

you have a string value given for xfail that is treated as a reporting message, just like todo. That makes sense:


test('this should do that', { xfail: 'bug #382' }, (t) => {

   //...

});

We were thinking in the case of it.fail, any failure is okay.

In the case of { fail }, it would accept true, a string, or a regex to compare against err.message (and maybe an instance of Error/AssertionError).

For all of these, there is no API thrashing as they're all truthy / falsy, which makes them additive to the current API.
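A rough sketch of that matching rule, assuming the failure is surfaced as an Error (this is not the PR's implementation, just a restatement of the rule above):

function xfailMatches(option, err) {
  if (option === true) return true;                          // any failure is accepted
  if (typeof option === 'string') return err.message === option;
  if (option instanceof RegExp) return option.test(err.message);
  if (option instanceof Error) return err.message === option.message; // the "maybe" case above
  return false;                                              // falsy: not an xfail at all
}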

@vassudanagunta
Contributor

vassudanagunta commented Nov 24, 2025

It all sounds good to me. I'm glad this is getting into node:test. Thank you!

As to reporting, I think you have three options:

  1. xfail is a special case of pass. Choose this if you want existing reporters that don't know about xfail to report it as a regular pass.

  2. xfail is a special case of todo. Choose this if you want existing reporters that don't know about xfail to report it as a todo. This makes conceptual sense, since it is something that shouldn't be treated as "all green", something that should be highlighted as needing to be addressed.

  3. You don't worry about breaking existing reporters, and you want xfail to be reported distinctly from pass and todo.

In my implementation, I chose option 2. A failing xfail generates a test todo event, where the todo message is prefixed with "xfail:". A passing xfail generates a test fail event, with an error indicating that it was an xfail that unexpectedly passed.

This allows my xfail aware reporter to produce a more informative report (it formats xfail results differently from both pass and todo), while degrading gracefully for legacy reporters.

@JakobJingleheimer
Member Author

I (and I think Alex too?) was imagining the result of it.xfail would go into the existing pass (when it throws) or fail (when it doesn't throw) and would additionally have a property expectedFail: true (not married to the prop name either—if a well known one exists that isn't crazy, happy to go with that, especially if reporters will already support it).

Reporters that support the extra prop do so, and those that don't just behave "normally" (even the uninformed report is still correct).
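For example, a minimal custom reporter along those lines might look like the sketch below; the expectedFail property name and its exact placement in the event data are still open, so treat it as a placeholder:

// xfail-aware.mjs — sketch only; uses the documented async-generator reporter API.
export default async function* xfailAwareReporter(source) {
  for await (const { type, data } of source) {
    if (type === 'test:pass') {
      // With the proposed design, a pass carrying the flag means "failed as expected".
      yield data.expectedFail ? `✔ ${data.name} (expected failure)\n` : `✔ ${data.name}\n`;
    } else if (type === 'test:fail') {
      // A fail carrying the flag means the test did not fail as expected.
      yield data.expectedFail ? `✖ ${data.name} (failed to fail)\n` : `✖ ${data.name}\n`;
    }
  }
}

Run via node --test --test-reporter=./xfail-aware.mjs; an xfail-unaware reporter would simply keep printing ordinary passes and failures, which is the graceful degradation described above.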

@vassudanagunta
Contributor

OK, that's what I meant by option 1. Since I've implemented an xfail-aware custom reporter, it's all good for my purposes.

[screenshot: xfail-aware reporter output]

@JakobJingleheimer
Member Author

Huzzah! TAP reporter is now consuming/surfacing xfail. The others aren't yet because I can't find where they compose their output text.

TAP reporter output consuming 'xfail' 🎉
Spec reporter output missing 'xfail'

@vassudanagunta
Contributor

vassudanagunta commented Dec 6, 2025

The others aren't yet because I can't find where they compose their output text

Jakob, both spec and dot reporters rely on:

function formatTestReport(type, data, prefix = '', indent = '', hasChildren = false, showErrorDetails = true) {

I know this as I spent a lot of time trying to reverse engineer the semantics of TestsStream, which isn't documented nearly enough for people to write custom reporters without inferring semantics by reading all of the built-in reporter code. I will open a discussion or issue on that topic when I get the chance.

But as part of my reverse engineering, I ended up paying off some test_runner tech debt in precisely this part of the code: #59700. It seems to be falling through the cracks in PR limbo. Maybe we can get that merged? It isn't strictly necessary for your PR, but since we'll have review eyes on this area of code, it would be efficient to do both at the same time?



it.todo('sync pass todo', () => {
it.xfail('sync expect fail (method)', () => {
Contributor

I think you should also add a "sync pass xfail with message" like there is with todo and skip.

This opens up the question of what that should look like for TAP. In my own reporter, I append #XFAIL to the message. The discussion on TAP Directives for XPASS and XFAIL that I linked to in the main thread debates #XFAIL vs #TODO. The latter makes a lot of sense too... see my comment that conceptually an xfail is a special case of todo.
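For reference, the two conventions under discussion would surface in TAP output roughly like this (illustrative only; neither mapping is decided here):

ok 3 - sync expect fail (method) # XFAIL bug #382
ok 3 - sync expect fail (method) # TODO xfail: bug #382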

Contributor

Maybe the folks behind the TAP spec, TAPE and tapjs/node-tap can chime in? @isaacs @ljharb

Member Author

@JakobJingleheimer JakobJingleheimer Dec 7, 2025

Do you mean in the case of specifying it via config object? If so, the value that would subsequently be supported would be the expected failure we talked about in #60669 (comment), so:

it('should add', { xfail: 'Expected 2 to equal 1' }, () => {  });

where the test-case must fail with an error whose message is 'Expected 2 to equal 1'

Or as a regex:

it('should add', { xfail: /expected 2 to equal 1/i }, () => {  });

When it fails with a different error, xfail would then not "pass" (and would be marked "failed", reported as failed, etc).

That would then be used in the test-case reporter output like

✔ should add # Expected 2 to equal 1
✔ should add # /expected 2 to equal 1/i
✖ should add # Expected 2 to equal 1
✖ should add # /expected 2 to equal 1/i

But since that's part of v1.1, I think for now it should not handle any value other than truthy/falsy.

Contributor

I think both a reason (like todo and skip) and an expected error should be supported. Example reasons:

✔ should add #XFAIL Issue #389

✔ should add #XFAIL Can't fix until SemVer Major increment

I'm building a system where released bugs are tracked both as a bug tracking issue and as an XFAIL test case.

I took your comment to mean (update of the current documentation to illustrate):

  • skip <boolean> | <string> If truthy, the test is skipped. If a string is provided, that string is displayed in the test results as the reason for skipping the test. Default: false.
  • todo <boolean> | <string> If truthy, the test is marked as TODO. If a string is provided, that string is displayed in the test results as the reason why the test is TODO. Default: false.
  • xfail <boolean> | <string> | <RegExp> If truthy, the test is marked as XFAIL. If a string is provided, that string is displayed in the test results as the reason why the test is expected to fail. If a RegExp is provided, only a failure that matches is permitted. Default: false.

Now I see my interpretation was wrong. In any case, that approach would allow EITHER a reason OR an expected error, NOT BOTH. I think both should be supported.

But as you say, we can punt until v1.1. 🤫🤐🙃
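One purely hypothetical option shape that would allow both at once (not part of this PR, only illustrating the point):

it('should add', { xfail: { reason: 'Issue #389', error: /expected 2 to equal 1/i } }, () => {
  // ...
});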

@JakobJingleheimer
Member Author

Jakob, both spec and dot reporters rely on:

function formatTestReport(type, data, prefix = '', indent = '', hasChildren = false, showErrorDetails = true) {

Huzzah, yes, that's it. Thanks!

@JakobJingleheimer JakobJingleheimer marked this pull request as ready for review December 7, 2025 19:26
@codecov

codecov bot commented Dec 7, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (bd3a202) to head (241e5dd).
⚠️ Report is 204 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/test_runner/reporter/utils.js 33.33% 1 Missing and 1 partial ⚠️
lib/internal/test_runner/tests_stream.js 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60669      +/-   ##
==========================================
- Coverage   88.55%   88.53%   -0.02%     
==========================================
  Files         703      703              
  Lines      208077   208449     +372     
  Branches    40083    40210     +127     
==========================================
+ Hits       184254   184548     +294     
- Misses      15841    15905      +64     
- Partials     7982     7996      +14     
Files with missing lines Coverage Δ
lib/internal/test_runner/harness.js 91.82% <100.00%> (ø)
lib/internal/test_runner/reporter/tap.js 98.23% <100.00%> (+0.01%) ⬆️
lib/internal/test_runner/test.js 97.36% <100.00%> (-0.01%) ⬇️
lib/internal/test_runner/tests_stream.js 89.65% <75.00%> (-0.35%) ⬇️
lib/internal/test_runner/reporter/utils.js 90.38% <33.33%> (-2.76%) ⬇️

... and 86 files with indirect coverage changes


@JakobJingleheimer
Member Author

PR is quasi-blocked by a bug in the Validate commit message workflow (pending fix: nodejs/core-validate-commit#130).

switch (type) {
case 'test:fail': {
yield reportTest(data.nesting, data.testNumber, 'not ok', data.name, data.skip, data.todo);
yield reportTest(data.nesting, data.testNumber, 'not ok', data.name, data.skip, data.todo, data.xfail);
Member

If we decide to move forward with xfail, we will need to update all integrated reporters!

Member Author

Didn't I already?

Member

I can see only the integration for tap; did you check spec, dot, etc.?

Member Author

Spec and dot use the same util, and I did Spec 🙂

Member

puuuurfect, can we add some tests to cover this behaviour for the different embedded reporters? 😁

Member

I think we should add the new xfail here: test/fixtures/test-runner/output/output.js.
This fixture should be used by the snapshot tests for all the different reporters

Member Author

Are you thinking I do it there instead of the output test I already did, or in addition to it? I can see reasons for either:

  • More coverage to catch something.

  • More things to change, which is a maintenance burden.

Member

If I'm not mistaken, currently no tests cover the actual output of xfail across all integrated reporters.

My only concern is that any behavior that becomes clearly defined should be tested to avoid unintentional breakage.

IMHO, it's fine to remove the existing test if it no longer adds value once other tests are added.

Regarding the maintenance burden: these are snapshot tests, so they're usually quick to update.

Comment on lines +234 to +237
These are tests written to otherwise pass but there is some issue (often
upstream) causing them to fail. Rather than using `skip` or `todo`, which would
effectively silence the failure, the test would instead be marked `xfail` (`x`
from "expected" + `fail`). That way, when the issue is resolved, that is
Member

IMHO, we should provide more behaviour-first documentation for this feature, rather than focusing on a specific use case!

Member Author

I was thinking to add that to the test runner Learn article. The sentiment for the API docs seems to be to document purely what it is. I don't have an opinion.

Member

ava calls this "failing", which seems a bit clearer to me than "xfail"

Member

My biggest concern about this feature isn't the feature itself, but the naming 😬


Contributor

jest and ava, so yes? 🤷🏾‍♂️

Contributor

@vassudanagunta vassudanagunta Dec 18, 2025

I think xfail is more intuitive but I defer to you guys. More important to me are the semantics. Is failing a subclass of todo or pass? For protocols like TAP and downstream consumers that don't know about it, it should be mapped as a subclass of one or the other. Likewise for existing custom node:test reporters.

I think todo. A failing test is one that is flagged for future action. It shouldn't be easily ignored.

(My last argument for xfail is embedded above: there's gonna be a lot of confusion when people use the very common phrase "failing test".)

Member

A failing test that fails correctly, as a todo, makes sense to me, as long as a failing test that passes incorrectly is a fail :-)

Fair point on the discussion confusion - doesn't change my position, but it's a good thing for others to consider.

Contributor

A failing test that fails correctly, as a todo, makes sense to me, as long as a failing test that passes incorrectly is a fail :-)

That's how I see it, exactly.

Member Author

"Successfully failed" and "failed to fail"; not confusing at all 🤪

I don't particularly care about the name—I want the functionality.


I'm not sure I understand this "subclass of todo". What does todo have to do with this? They seem fairly different to me:

A todo can pass or fail without affecting the suite's result.

An xfail must fail correctly, and it does affect the suite's result.
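A small illustration of that difference, under the semantics described above (the xfail half is the proposal in this PR, not shipped behaviour):

it.todo('tracked for later', () => {
  throw new Error('reported, but does not fail the suite');
});

it.xfail('expected to fail', () => {
  // If this body throws, the test is reported as passing; if it completes, the suite fails.
});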



it.todo('sync pass todo', () => {
it.xfail('sync expect fail (method)', () => {
Member

I would suggest adding a test similar to test-runner-todo-skip-tests.js too, as this specific test is mainly meant to test the output 😁

Member

@avivkeller avivkeller left a comment

Neato!
