Speed up the semantic alignment loop in Javascript #103
In the JavaScript version, when using diff_cleanupSemantic(), some diffs cause the semantic alignment loop to run many times. This happens when comparing a file containing a long chunk of characters with a similar file that contains the same chunk twice in succession, i.e.:

File 1:
<chunk A><chunk B><chunk C>

File 2:
<chunk A><chunk B><chunk B><chunk C>

When this happens, the loop runs as many times as there are characters in chunk B. This can get quite expensive because three new strings are created in every iteration. This PR replaces those strings with faster index manipulations.
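For reference, one right-shift step in the current loop has roughly this shape (paraphrased, not a verbatim copy of the source):

```js
// One right-shift step of the alignment walk, old style: every step
// allocates three fresh strings, so walking across a long chunk B costs
// O(|B|) iterations that each copy on the order of the total text length.
function shiftEditRightOnce(equality1, edit, equality2) {
  return [
    equality1 + edit.charAt(0),               // new string 1
    edit.substring(1) + equality2.charAt(0),  // new string 2
    equality2.substring(1)                    // new string 3
  ];
}

// shiftEditRightOnce('the c', 'at c', 'ame.') -> ['the ca', 't ca', 'me.']
```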
I tried to translate the algorithm line by line with the objective of changing nothing in its behaviour. Basically, instead of tracking 3 strings (equality1, edit, equality2), I track 1 string (buffer) and 2 indices (editStart, editEnd). They are related this way:
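Roughly, with illustrative example values:

```js
// Sketch of the invariant between the old three strings and the new
// buffer + index representation (the values here are only for illustration):
var equality1 = 'the ', edit = 'cat ', equality2 = 'came home.';

var buffer = equality1 + edit + equality2;   // 'the cat came home.'
var editStart = equality1.length;            // 4
var editEnd = editStart + edit.length;       // 8

// equality1 === buffer.substring(0, editStart)
// edit      === buffer.substring(editStart, editEnd)
// equality2 === buffer.substring(editEnd)
```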
The other change I made was to the loop condition. The original code shifts the edit left as far as possible (using the common suffix between equality1 and edit) and then shifts right one character at a time until the first characters of edit and equality2 differ. I changed that to counting the common prefix between edit and equality2 and adding it to the amount of right shift, which gives the total number of shifts required up front.
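A sketch of that bound computation (not the exact code from this PR; diff_commonPrefix and diff_commonSuffix are the library's existing helpers):

```js
var diff_match_patch = require('diff-match-patch');  // npm package name assumed
var dmp = new diff_match_patch();

var equality1 = 'the ', edit = 'cat ', equality2 = 'cat sat.';

// The edit can move left by the length of its common suffix with equality1
// and right by the length of its common prefix with equality2, so together
// these bound the whole walk without rebuilding any strings.
var leftShift = dmp.diff_commonSuffix(equality1, edit);   // 1
var rightShift = dmp.diff_commonPrefix(edit, equality2);  // 4
var totalShifts = leftShift + rightShift;                 // 5

// The loop can then step editStart/editEnd through totalShifts + 1 candidate
// positions, scoring each one by index.
console.log(leftShift, rightShift, totalShifts);
```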
I used the following benchmark to force the loop to run an arbitrary number of times:
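Roughly (the chunk contents and sizes below are placeholders; this assumes the library is loaded from the diff-match-patch npm package):

```js
var diff_match_patch = require('diff-match-patch');  // npm package name assumed
var dmp = new diff_match_patch();

// Chunk B appears once in text1 and twice in a row in text2, so the
// alignment loop has to walk the full length of B.
var chunkA = 'alpha '.repeat(200);
var chunkB = 'bravo '.repeat(5000);   // B's length controls the iteration count
var chunkC = 'charlie '.repeat(200);

var text1 = chunkA + chunkB + chunkC;
var text2 = chunkA + chunkB + chunkB + chunkC;

var diffs = dmp.diff_main(text1, text2, false);

var start = Date.now();
dmp.diff_cleanupSemantic(diffs);
console.log('diff_cleanupSemantic took ' + (Date.now() - start) + ' ms');
```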
Here are the timings I got for diff_cleanupSemantic() before and after this PR.