Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPUnit should not create a diff when expected or actual string are too large #4918

Open
arnaud-lb opened this issue May 10, 2019 · 8 comments
Labels
type/performance Issues related to resource consumption (time and memory)

Comments

@arnaud-lb
Copy link

Q A
PHPUnit version 7.x
PHP version 7.1
Installation Method Composer

When comparing values that happen to not be equal, phpunit might try to display a nice diff. If the values are large enough, a memory-efficient implementation is used to compute the diff (MemoryEfficientLongestCommonSubsequenceCalculator).

The diff algorithm can take hours to compute a diff on accidentally large strings.

@sebastianbergmann sebastianbergmann transferred this issue from sebastianbergmann/phpunit May 10, 2019
@Darmaru
Copy link

Darmaru commented Dec 4, 2019

The same issue reproduced for me too

@SpacePossum
Copy link
Contributor

Unless a bug can be found in the MemoryEfficientLongestCommonSubsequenceCalculator you can try using the TimeEfficientLongestCommonSubsequenceCalculator for these cases.

You might have better luck having this resolved by finding the original authors of the calculators and discuss the details based on you input with them, as the details in the current report(s) are not enough to work with.

@arnaud-lb
Copy link
Author

I guess that the algorithms are pretty standard, and that they work as intended. One is time efficient (but memory hungry), the other is memory efficient (and time hungry).

If that's the case, and if there is no more efficient diffing algorithm, the issue is in PHPUnit: It should not try to generate a full minimal diff when it's not going to finish in an acceptable time. It could fallback to just finding the first difference in two inputs (which would not require to find the longuest common substring), or to use an algorithm that only tries to find an approximation of the minimal possible diff.

@SpacePossum
Copy link
Contributor

If that's the case, and if there is no more efficient diffing algorithm, the issue is in PHPUnit:

I didn't check if PHPUnit has some logic to select a calculator, or leaves it to this package; @ see:
https://github.com/sebastianbergmann/diff/blob/3.0.2/src/Differ.php#L193

It might be worth checking if this selection logic is working as intended. Maybe someone who is effected by this can check?

@cxlblm
Copy link

cxlblm commented Oct 24, 2020

I have the same issue, Can we use the yetanotherape/diff-match-patch package to replace the original diff algorithm.

@SpacePossum
Copy link
Contributor

Did you try to make diff with yetanotherape/diff-match-patch , did work considerable faster? Do you maybe have some test scenario/data to see the difference?

@arnaud-lb
Copy link
Author

I've created a reproducer / demo here: sebastianbergmann/diff#107

I've also tried yetanotherape/diff-match-patch: Unfortunately it uses too much memory.

@sebastianbergmann sebastianbergmann transferred this issue from sebastianbergmann/diff Mar 8, 2022
@sebastianbergmann sebastianbergmann changed the title phpunit can hang for tens of minutes in MemoryEfficientLongestCommonSubsequenceCalculator PHPUnit should not create a diff when expected or actual string are too large Mar 8, 2022
@sebastianbergmann sebastianbergmann added the type/performance Issues related to resource consumption (time and memory) label Mar 8, 2022
@FlorianSara
Copy link

I've been working towards a solution. Rather than removing the diff, I think it would be nice to provide a hint about what's going wrong.

To do this, we could implement a method "getPartialDiff" in the ComparisonFailure class (sebastian/comparator/src/ComparisonFailure.php);
This method would first check that $expected and $actual are not bigger than a decided threshold.
If they are, it would search for the first diff, and reduce $expected and $actual to a small number of lines around this diff.
It would then call $differ->diff(...) just like the "getDiff" method, but with the reduced inputs.

phpunit should then use this method instead of "getDiff" when printing the exception (in phpunit/src/Framework/TestFailure.php:47).

Here is an example of output :

--- Expected
+++ Actual
@@ @@
-skipping 1337 lines...
    unchanged line 1
    unchanged line 2
    unchanged line 3
-   this line is missing
+   this line should not be here
    unchanged line 1
    unchanged line 2
    unchanged line 3
+skipping 42000 lines...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/performance Issues related to resource consumption (time and memory)
Projects
None yet
Development

No branches or pull requests

6 participants