Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structurally compare js objects rather than YAML #404

Merged
merged 3 commits into from
Sep 28, 2021

Conversation

Bj0rnen
Copy link
Contributor

@Bj0rnen Bj0rnen commented Sep 16, 2021

We are using the TestRunner to test our own kpt functions. For our use-cases, we have found the output of failed test cases to be insufficient in helping us locate the actual problems. That is because the test assertion compares two rendered YAMLs. These YAML strings are simply too long and the testing library (jasmine) truncates the strings, typically before the actual difference has appeared.

With this change, we compare JavaScript objects instead, and the assertion errors show a vastly improved structural diff between the configs. Now, we have been careful here to not change the behavior in terms of when tests succeed or fail. We've done that by still converting the configs to YAML and then converting them back from YAML. For us, this has been working exactly as intended.

@emmmile

Before vs after

Both before and after the change, diffs are printed like Expected <foo> to equal <bar>.. Keep an eye out for the to equal part in the before diff below:

Expected 'apiVersion: v1
items:
- apiVersion: bigtable.cnrm.cloud.google.com/v1beta1
  kind: BigtableInstance
  metadata:
    name: testtest-bigtable-instance
  spec:
    cluster:
    - clusterId: testtest-gr1-1
      numNodes: 0
      zone: gcp-region1-b
    - clusterId: testtest-gr2-1
      numNodes: 0
      zone: gcp-region2-b
    - clusterId: testtest-gr3-1
      numNodes: 0
      zone: gcp-region3-b
    displayName: testtest-bigtable-instance
- apiVersion: storage.example.com/v1alpha1
  kind: BigtableBackupSchedule
  metadata:
    name: testtest-bigtable-instance
- apiVersion: storage.example.com/v1alpha1
  kind: DatabaseAutoscaler
  metadata:
    name: testtest-gr3-1
  spec:
    configuration:
      pagerdutykey: xyz
      parameters:
        cpuTarget: 0.7
        maxNodes: 100
        minNodes: 5
      strategy: StandardStrategy
      version: v1alpha1
    databaseName: >-
      //bigtable.googleapis.com/projects/my-project/instances/testtest-bigtable-instance/clusters/testtest-gr3- ... to equal 'apiVersion: v1
items:
- apiVersion: bigtable.cnrm.cloud.google.com/v1beta1
  kind: BigtableInstance
  metadata:
    name: testtest-bigtable-instance
  spec:
    cluster:
    - clusterId: testtest-gr1-1
      numNodes: 0
      zone: gcp-region1-b
    - clusterId: testtest-gr2-1
      numNodes: 0
      zone: gcp-region2-b
    - clusterId: testtest-gr3-1
      numNodes: 0
      zone: gcp-region3-b
    displayName: testtest-bigtable-instance
- apiVersion: storage.example.com/v1alpha1
  kind: BigtableBackupSchedule
  metadata:
    name: testtest-bigtable-instance
- apiVersion: storage.example.com/v1alpha1
  kind: DatabaseAutoscaler
  metadata:
    name: testtest-gr3-1
  spec:
    configuration:
      pagerdutykey: xyz
      parameters:
        cpuTarget: 0.7
        maxNodes: 100
        minNodes: 5
      strategy: StandardStrategy
      version: v1alpha1
    databaseName: >-
      //bigtable.googleapis.com/projects/my-project/instances/testtest-bigtable-instance/clusters/testtest-gr3- ....

And here's the after diff for the same test case:

Expected $.results[0].message = 'Word 'bigtable' in instance name is redundant. Complete blocklist: bigtable,cluster,database,instance.' to equal 'Word 'instance' in instance name is redundant. Complete blocklist: bigtable,cluster,database,instance.'.
Expected $.results[1].message = 'Word 'instance' in instance name is redundant. Complete blocklist: bigtable,cluster,database,instance.' to equal 'Word 'bigtable' in instance name is redundant. Complete blocklist: bigtable,cluster,database,instance.'.

The before diff has many issues:

  1. Displaying the whole YAML encoding is too verbose.
  2. Worst of all, the sheer length of the YAML encoding results in a truncated diff. Notice how the "expected" and "actual" both end with .... In this test case (and a majority of cases for us), the differences weren't even in the Kubernetes objects part of the Configs objects; they were in the results part. So the displayed, truncated, YAML strings are, in fact, 100% equal. The before diff gives us zero information about differences between "expected" and "actual".
  3. Since the whole Configs objects are rendered as YAML, Kubernetes objects and results included, the diffing only reports that there is a single difference. In the after diff, we see two differences.

From the after diff, we can conclude that the two results result[0] and result[1] were simply placed in the wrong order in the results array.

@google-cla
Copy link

google-cla bot commented Sep 16, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no Google CLA needed label Sep 16, 2021
@emmmile
Copy link

emmmile commented Sep 16, 2021

👍 (I've worked on this with @Bj0rnen)

@Bj0rnen
Copy link
Contributor Author

Bj0rnen commented Sep 16, 2021

@googlebot I signed it!

@google-cla google-cla bot added cla: yes Google CLA OK and removed cla: no Google CLA needed labels Sep 16, 2021
@mengqiy
Copy link
Contributor

mengqiy commented Sep 17, 2021

@Bj0rnen and @emmmile Thanks for creating the PR!
Can you please paste the before vs after?

@emmmile
Copy link

emmmile commented Sep 18, 2021

@mengqiy on Monday we will restart working on this and we will provide some more information. You mean an example of how it looks with this change and how it was before?

About the typing problem, either we use load() (that has return type any) or set the type of valueOf() to any directly. I don't think we really need to reload the objects as Configs so maybe we can loosen the types a bit.

@Bj0rnen
Copy link
Contributor Author

Bj0rnen commented Sep 21, 2021

Hi @mengqiy! I have added a before vs after to the original comment. I think everything should be in order now. Does it look good from your end?

indent: 2,
noArrayIndent: true,
skipInvalid: true,
sortKeys: true,
});
return safeLoad(yaml);
Copy link
Contributor

@mengqiy mengqiy Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to safeDump and then safeLoad?
Can we simply compare the ResourceList returned by configs.toResourceList()? Are there any downsides?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it's a roundabout way to implement it. But in practice, returning configs.toResourceList() causes many test failures for us. Mainly these two types of inequalities:

Expected $.items[0].metadata not to have properties
        namespace: undefined

and

Expected $.items[0] to be a kind of IAMPolicyResource, but was Object({ apiVersion: 'iam.cnrm.cloud.google.com/v1alpha2', kind: 'IAMPolicy', metadata: Object({ name: 'iam-policy', namespace: 'project-level' }), spec: Object({ resourceRef: Object({ apiVersion: 'test', kind: 'Test' }), bindings: [  ] }) }).

The first kind of inequality is easy to understand. safeDump elides all undefined properties.

The second kind of inequality results from the call to deepCopy on line 102

const configs = input.deepCopy();

I haven't dug into the implementation of Configs.deepCopy. But basically, before the equality comparison happens, the input object goes through a deepCopy and the expectedOutput object doesn't. That deepCopy is apparently not "equality-safe", however whatever differences it creates disappear again upon safeDump.

We have already relied on the semantics of post-yaml comparison in our test cases. 44/118 of our test cases break if we directly return configs.toResourceList() from valueOf. safeDump returns a string. And if strings s and t are equal (and valid YAML), then safeLoad(s) and safeLoad(t) must also be equal (safeDump is deterministic, surely). The proposed change can't break existing tests.

Alternatives

Patch the differences

I suppose it's possible to address the two types of equality differences that we have observed above. We can do a deepCopy of the o object that goes into the comparison. We can explicitly filter out undefined properties with some recursive function. Or maybe modify our kpt functions to be meticulous about this, so that all the undefineds match up.

But how can we know that these two equality differences are the only ones that could appear? Someone else might depend on other aspects of safeDump in their tests. It seems a bit arbitrary to patch over these two things specifically.

Rich YAML comparison

Maybe the expect(...).toEqual(...) can be replaced with some YAML-specific equality comparison function, which takes two YAMLs and outputs a rich diff on the strings? For example a line-based diff. Then we can go back to only using safeDump and not safeLoad. That should probably also output readable diffs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation.

@mengqiy
Copy link
Contributor

mengqiy commented Sep 24, 2021

This seems to be the best option at the moment. And it doesn't introduce any breaking changes to any user-facing functions (valueOf is an internal functions).
I guess we can safely merge it.
@Bj0rnen @emmmile Please let me know if I misunderstood anything.

@emmmile
Copy link

emmmile commented Sep 27, 2021

@mengqiy All our points have been well described by @Bj0rnen and I don't think you misunderstood anything :)

@Bj0rnen
Copy link
Contributor Author

Bj0rnen commented Sep 27, 2021

Yep. All good from me 👍

@mengqiy mengqiy merged commit 37e38c1 into GoogleContainerTools:master Sep 28, 2021
@Bj0rnen Bj0rnen deleted the patch-1 branch September 29, 2021 14:08
@emmmile
Copy link

emmmile commented Sep 29, 2021

@mengqiy is this going to be released automatically?

@mengqiy
Copy link
Contributor

mengqiy commented Sep 29, 2021

No. It needs some manual work to release it.
@emmmile Are you blocked by this?

@emmmile
Copy link

emmmile commented Sep 29, 2021

No I wouldn't say that we are blocked, but I guess would be good to understand if the new version is days/weeks/months ahead :) Thanks!

@mengqiy
Copy link
Contributor

mengqiy commented Sep 29, 2021

You can expect it in the next 7 days.

@mengqiy mengqiy mentioned this pull request Oct 8, 2021
@mengqiy
Copy link
Contributor

mengqiy commented Oct 11, 2021

In case you missed it, kpt-functions@0.15.1 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Google CLA OK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants