Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented diffJson #28

Merged
merged 7 commits into from
Nov 29, 2014
75 changes: 72 additions & 3 deletions diff.js
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,11 @@ var JsDiff = (function() {
while (newPos+1 < newLen && oldPos+1 < oldLen && this.equals(newString[newPos+1], oldString[oldPos+1])) {
newPos++;
oldPos++;

this.pushComponent(basePath.components, newString[newPos], undefined, undefined);
var value = newString[newPos];
if (this.useLongestToken && oldString[oldPos].length > value.length) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that it's been a while since I've looked at this. How does useLongestToken change the behavior here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being used to discriminate between two lines of pretty-printed, serialized JSON where one of them has a dangling comma and the other doesn't. Turns out including the dangling comma yields the nicest output. For example when diffing {"foo": 123, "bar": 456, "quux": 789} and {"foo": 123, "bar": 456} you'll get:

 {
   "foo": 123,
   "bar": 456,
-  "quux": 456
 }

rather than:

 {
   "foo": 123,
   "bar": 456
-  "quux": 456
 }

I think it reads better with that dangling comma after 123.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. This makes total sense but we should certainly comment it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a9191cb (added part of the above as a comment)

value = oldString[oldPos];
}
this.pushComponent(basePath.components, value, undefined, undefined);
}
basePath.newPos = newPos;
return oldPos;
Expand Down Expand Up @@ -183,6 +186,63 @@ var JsDiff = (function() {
return retLines;
};

JsonDiff = new Diff();
// Discriminate between two lines of pretty-printed, serialized JSON where one of them has a
// dangling comma and the other doesn't. Turns out including the dangling comma yields the nicest output:
JsonDiff.useLongestToken = true;
JsonDiff.tokenize = LineDiff.tokenize;
JsonDiff.equals = function(left, right) {
return LineDiff.equals(left.replace(/,([\r\n])/g, '$1'), right.replace(/,([\r\n])/g, '$1'));
};

var objectPrototypeToString = Object.prototype.toString;

// This function handles the presence of circular references by bailing out when encountering an
// object that is already on the "stack" of items being processed.
function canonicalize(obj, stack, replacementStack) {
stack = stack || [];
replacementStack = replacementStack || [];

var i;

for (var i = 0 ; i < stack.length ; i += 1) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this loop doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It checks that the item being canonicalized is not already on the stack (ie. that it's working on a circular structure). This avoids infinite recursion, and it's safe to bail out because that object will canonicalized eventually.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I thought but was not certain. A few things here:

  1. Can we have a test for the circular object handling?
  2. Shouldn't this be returning the canonicalized object value, not the original value (or perhaps some other "[circular]" indicator). Right now it seems like this only defers the circular concerns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Sure, in d786a63 I added an implicit test that makes sure canonicalize doesn't result in an infinite loop by checking that JSON.stringify throws an error.
  2. Hmm, I you're right. It's a bit of a moot point as circular references will cause JSON.stringify to error out immediately after, but looking at canonicalize isolated I agree that it's the wrong behavior. Will try to find a fix for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to find a fix for that.

Fixed in d12c46d and added tests. The tests required me to expose canonicalize as part of the API.

if (stack[i] === obj) {
return replacementStack[i];
}
}

var canonicalizedObj;

if ('[object Array]' === objectPrototypeToString.call(obj)) {
stack.push(obj);
canonicalizedObj = new Array(obj.length);
replacementStack.push(canonicalizedObj);
for (i = 0 ; i < obj.length ; i += 1) {
canonicalizedObj[i] = canonicalize(obj[i], stack, replacementStack);
}
stack.pop();
replacementStack.pop();
} else if (typeof obj === 'object' && obj !== null) {
stack.push(obj);
canonicalizedObj = {};
replacementStack.push(canonicalizedObj);
var sortedKeys = [];
for (var key in obj) {
sortedKeys.push(key);
}
sortedKeys.sort();
for (i = 0 ; i < sortedKeys.length ; i += 1) {
var key = sortedKeys[i];
canonicalizedObj[key] = canonicalize(obj[key], stack, replacementStack);
}
stack.pop();
replacementStack.pop();
} else {
canonicalizedObj = obj;
}
return canonicalizedObj;
};

return {
Diff: Diff,

Expand All @@ -191,6 +251,13 @@ var JsDiff = (function() {
diffWordsWithSpace: function(oldStr, newStr) { return WordWithSpaceDiff.diff(oldStr, newStr); },
diffLines: function(oldStr, newStr) { return LineDiff.diff(oldStr, newStr); },

diffJson: function(oldObj, newObj) {
return JsonDiff.diff(
typeof oldObj === 'string' ? oldObj : JSON.stringify(canonicalize(oldObj), undefined, " "),
typeof newObj === 'string' ? newObj : JSON.stringify(canonicalize(newObj), undefined, " ")
);
},

diffCss: function(oldStr, newStr) { return CssDiff.diff(oldStr, newStr); },

createPatch: function(fileName, oldStr, newStr, oldHeader, newHeader) {
Expand Down Expand Up @@ -360,7 +427,9 @@ var JsDiff = (function() {
ret.push([(change.added ? 1 : change.removed ? -1 : 0), change.value]);
}
return ret;
}
},

canonicalize: canonicalize
};
})();

Expand Down
37 changes: 37 additions & 0 deletions test/canonicalize.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
const VERBOSE = false;

var diff = require('../diff');

function getKeys(obj) {
var keys = [];
for (var key in obj) {
if (obj.hasOwnProperty(key)) {
keys.push(key);
}
}
return keys;
}

describe('#canonicalize', function() {
it('should put the keys in canonical order', function() {
getKeys(diff.canonicalize({b: 456, a: 123})).should.eql(['a', 'b']);
});

it('should dive into nested objects', function() {
var canonicalObj = diff.canonicalize({b: 456, a: {d: 123, c: 456}});
getKeys(canonicalObj.a).should.eql(['c', 'd']);
});

it('should dive into nested arrays', function() {
var canonicalObj = diff.canonicalize({b: 456, a: [789, {d: 123, c: 456}]});
getKeys(canonicalObj.a[1]).should.eql(['c', 'd']);
});

it('should handle circular references correctly', function() {
var obj = {b: 456};
obj.a = obj;
var canonicalObj = diff.canonicalize(obj);
getKeys(canonicalObj).should.eql(['a', 'b']);
getKeys(canonicalObj.a).should.eql(['a', 'b']);
});
});
60 changes: 60 additions & 0 deletions test/diffTest.js
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,66 @@ describe('#diffLines', function() {
});
});

describe('#diffJson', function() {
it('should accept objects', function() {
diff.diffJson(
{a: 123, b: 456, c: 789},
{a: 123, b: 456}
).should.eql([
{ value: '{\n "a": 123,\n "b": 456,\n', added: undefined, removed: undefined },
{ value: ' "c": 789\n', added: undefined, removed: true },
{ value: '}', added: undefined, removed: undefined }
]);
});

it('should accept objects with nested structures', function() {
diff.diffJson(
{a: 123, b: 456, c: [1, 2, {foo: 'bar'}, 4]},
{a: 123, b: 456, c: [1, {foo: 'bar'}, 4]}
).should.eql([
{ value: '{\n "a": 123,\n "b": 456,\n "c": [\n 1,\n', added: undefined, removed: undefined },
{ value: ' 2,\n', added: undefined, removed: true },
{ value: ' {\n "foo": "bar"\n },\n 4\n ]\n}', added: undefined, removed: undefined }
]);
});

it('should accept already stringified JSON', function() {
diff.diffJson(
JSON.stringify({a: 123, b: 456, c: 789}, undefined, " "),
JSON.stringify({a: 123, b: 456}, undefined, " ")
).should.eql([
{ value: '{\n "a": 123,\n "b": 456,\n', added: undefined, removed: undefined },
{ value: ' "c": 789\n', added: undefined, removed: true },
{ value: '}', added: undefined, removed: undefined }
]);
});

it('should ignore trailing comma on the previous line when the property has been removed', function() {
var diffResult = diff.diffJson(
{a: 123, b: 456, c: 789},
{a: 123, b: 456});
diff.convertChangesToXML(diffResult).should.equal('{\n &quot;a&quot;: 123,\n &quot;b&quot;: 456,\n<del> &quot;c&quot;: 789\n</del>}');
});

it('should ignore the missing trailing comma on the last line when a property has been added after it', function() {
var diffResult = diff.diffJson(
{a: 123, b: 456},
{a: 123, b: 456, c: 789});
diff.convertChangesToXML(diffResult).should.equal('{\n &quot;a&quot;: 123,\n &quot;b&quot;: 456,\n<ins> &quot;c&quot;: 789\n</ins>}');
});

it('should throw an error if one of the objects being diffed has a circular reference', function() {
var circular = {foo: 123};
circular.bar = circular;
(function () {
diff.diffJson(
circular,
{foo: 123, bar: {}}
);
}).should.throw('Converting circular structure to JSON');
});
});

describe('convertToDMP', function() {
it('should output diff-match-patch format', function() {
var diffResult = diff.diffWords('New Value ', 'New ValueMoreData ');
Expand Down