Rearranging documents - and question about the future development goals of this repo #5

@metabench

My first question is about the scope of this project: is it intended to grow by adding new functionality (that is, breaking changes that would require a version increase) that makes the overall system more complex but the patches smaller, while keeping an eye on time complexity?

Specifically, it appears that if two large elements in the body of an HTML document swapped places, the change would not be encoded efficiently. Are there plans to improve that? Is there interest in improving it?
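To make the swapped-elements case concrete, here is a minimal sketch of a copy/insert style delta, where the patch may reference any offset in the old text, so a moved block becomes a cheap copy instead of a delete plus a literal insert. This is not textdiff-create's format or algorithm; the names `createCopyInsertDelta`, `applyDelta` and `SEED`, and the op shapes, are hypothetical and only illustrate the idea.

```javascript
// Sketch only: a greedy copy/insert delta, NOT the textdiff-create patch format.
// Ops (hypothetical shapes): ['copy', offset, length] reads from the source,
// ['insert', text] carries literal characters.
const SEED = 8; // assumed minimum match length worth encoding as a copy

function createCopyInsertDelta(source, target, seed = SEED) {
  // Index each seed-length substring of the source by its first position.
  const index = new Map();
  for (let i = 0; i + seed <= source.length; i++) {
    const key = source.slice(i, i + seed);
    if (!index.has(key)) index.set(key, i);
  }

  const ops = [];
  let literal = '';
  let t = 0;
  while (t < target.length) {
    const s = index.get(target.slice(t, t + seed));
    if (s === undefined) {
      literal += target[t++]; // no known match here, keep as literal text
      continue;
    }
    // Extend the seed match for as long as both strings agree.
    let len = seed;
    while (s + len < source.length && t + len < target.length &&
           source[s + len] === target[t + len]) len++;
    if (literal) { ops.push(['insert', literal]); literal = ''; }
    ops.push(['copy', s, len]);
    t += len;
  }
  if (literal) ops.push(['insert', literal]);
  return ops;
}

function applyDelta(source, ops) {
  return ops
    .map(op => (op[0] === 'copy' ? source.slice(op[1], op[1] + op[2]) : op[1]))
    .join('');
}

// Two large elements that swap places between versions.
const a = '<div id="a">' + 'alpha '.repeat(50) + '</div>';
const b = '<div id="b">' + 'bravo '.repeat(50) + '</div>';
const v1 = '<body>' + a + b + '</body>';
const v2 = '<body>' + b + a + '</body>';

const delta = createCopyInsertDelta(v1, v2);
const literalChars = delta
  .filter(op => op[0] === 'insert')
  .reduce((n, op) => n + op[1].length, 0);
console.log(delta.length, 'ops,', literalChars, 'literal chars for', v2.length, 'chars');
```

The trade-off is the one the question is about: the patch format and the applier become more complex, and this greedy first-match lookup is neither optimal nor tuned for speed (a real implementation would likely use a rolling hash rather than slicing every substring).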

I'm working on this problem right now as part of a larger programming problem. So far textdiff-create has been fast and efficient on my somewhat difficult test case, but I also see that efficiency improvements are suggested in #4.

This project touches on ongoing computer science research, and I am curious whether this project and its issues are a place where such things are worth discussing, or whether textdiff-create already does what it does well enough and developers are not paying much ongoing attention to it.

@icflorescu I appreciate what you have already done in creating and publishing these repos, and I don't want to put unwelcome time pressure on you if your attention is on things other than this and related repos. Personally, I am at a stage where I am paying attention to some intricacies of string diffing: I've been looking into LCS and suffix trees built with Ukkonen's algorithm, though I have not yet been entirely successful in extracting the LCS from one or more suffix trees. Then I found this library and saw that it works easily. Together with the compression in issue 1 and Brotli compression, it gets the diff between 60KB of JS from two structurally similar JS files (they differ in things like some URLs, text strings, and web tracking codes) down to 1.9KB, which is very encouraging. Taking 84ms, that is a good combination of speed, compression, and ease of use. What I am curious about, without feeling that you are obligated in any way, is whether this repo would be a good place to work on and discuss possible improvements with you and others, test them, and then push them as new versions as they get implemented.
