-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delta diff: client side implementation of an expanded delta lake diff #5232
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and reacty.
Have some comments.
webui/src/lib/api/index.js
Outdated
@@ -477,6 +477,20 @@ class Repositories { | |||
throw new Error(await extractError(response)); | |||
} | |||
} | |||
|
|||
async otfDiff(repoId, leftRef, rightRef, table_path = "", type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async otfDiff(repoId, leftRef, rightRef, table_path = "", type) { | |
async otfDiff(repoId, leftRef, rightRef, tablePath = "", type) { |
Not a fan of pushing commented out code, but if you're going with it, the argument should be commented out - or used in some fake way - or given as _.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed naming, as for commenting out the code - i'm going to change in literally a few days so I think it is ok to keep it this way for now,
webui/src/styles/globals.css
Outdated
|
||
.table-operation-expansion { | ||
color: black; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
// The list of available operations is based on: https://docs.databricks.com/delta/history.html#operation-metrics-keys | ||
const deltaLakeOperationToDiffType = new Map([ | ||
["WRITE" , DiffType.Added], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like server side logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a Map and not a simple JS object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we've agreed that the type of diff will be returned from lakeFS within the response.
Created a PR to reflect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnnyaug good call, thanks! changed the client to use the operation type returned from the response
const diffType = deltaLakeOperationToDiffType.get(operation); | ||
return diffType !== undefined ? diffType : DiffType.Changed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const diffType = deltaLakeOperationToDiffType.get(operation); | |
return diffType !== undefined ? diffType : DiffType.Changed; | |
return deltaLakeOperationToDiffType.get(operation) || DiffType.Changed; |
(and then this function is not really needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
]); | ||
|
||
export const TableDiff = ({repo, leftRef, rightRef, tablePath}) => { | ||
let { error, loading, response } = useAPI(() => repositories.otfDiff(repo.id, leftRef, rightRef, tablePath, OtfType.Delta), []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point this element becomes a DeltaDiff rather than a TableDiff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, renamed accordingaly
<td className="pl-lg-10 col-10 table-operation-details"> | ||
<strong>Timestamp:</strong> {operationTimestamp} | ||
<br/> | ||
<strong>Commit Info:</strong> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the commit info here? Is it a lakeFS commit? (I hope so)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit info is the terminology used in the delta lake transaction log. see the definition in https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html
Commit info - Contains information around the commit, which operation was made, from where and at what time.
const JSONContentStr = JSON.stringify(content, null, 2); | ||
const JSONContent = JSON.parse(JSONContentStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of these 2 lines? It looks like a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this is not needed, because I'm parsing here the following object included in the otfDiff response:
operation_content:
type: object
description: free form content describing the returned operation diff
and the object is a map of key-value pairs, so I don't need to convert it into a json first.
@Jonathan-Rosenberg should we clarify it in API doc? in it not relly a free form but a collection of key-value pairs.
parsedContent += `${key}: ` | ||
let parsedVal = ""; | ||
try { | ||
parsedVal = JSON.parse(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will look like [object Object]
if val
is parsed to a Javascript object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, it looks like this -
"[\"(spark_catalog.delta.lakefs://meetup-repo/experiment-2-branch/raw/.
rating < 4.0D) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"]"
parsed into
predicate: (spark_catalog.delta.lakefs://meetup-repo/experiment-2-branch/raw/.
rating < 4.0D) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
did you mean something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code takes the result of JSON.parse and concatenates it with a string, consequently calling toString
on the result, which returns [object Object]
for a POJO.
function parseOperationContent(content) { | ||
let parsedContent = ""; | ||
const JSONContentStr = JSON.stringify(content, null, 2); | ||
const JSONContent = JSON.parse(JSONContentStr); | ||
for (let key in JSONContent) { | ||
const val = JSONContent[key]; | ||
parsedContent += `${key}: ` | ||
let parsedVal = ""; | ||
try { | ||
parsedVal = JSON.parse(val); | ||
} catch (err) { | ||
parsedVal = val; | ||
} | ||
parsedContent += parsedVal + "\n"; | ||
} | ||
return parsedContent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separation of concerns - convert the object to what you want, handle the printing/formatting elsewhere.
function parseOperationContent(content) { | |
let parsedContent = ""; | |
const JSONContentStr = JSON.stringify(content, null, 2); | |
const JSONContent = JSON.parse(JSONContentStr); | |
for (let key in JSONContent) { | |
const val = JSONContent[key]; | |
parsedContent += `${key}: ` | |
let parsedVal = ""; | |
try { | |
parsedVal = JSON.parse(val); | |
} catch (err) { | |
parsedVal = val; | |
} | |
parsedContent += parsedVal + "\n"; | |
} | |
return parsedContent; | |
function safeJsonParse(val) { | |
try { | |
return JSON.parse(val); | |
} | |
catch (err) { | |
return val; | |
} | |
} | |
function safeJsonParseValues(obj) { | |
Object.keys(obj).forEach(k => { | |
obj[k] = safeJsonParse(obj[k]); | |
}) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -195,7 +207,9 @@ export const ChangesTreeContainer = ({results, showExperimentalDeltaDiffButton = | |||
</span> | |||
</Card.Header> | |||
<Card.Body> | |||
<Table borderless size="sm"> | |||
{tableDiffState.isExpanded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we went with separate Table/Object diff screens (or am I reading the code wrong?).
{tableDiffState.isExpanded | |
{tableDiffState.isShown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you are good, done thx
webui/src/constants.ts
Outdated
Added = "added", | ||
Removed = "removed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added = "added", | |
Removed = "removed", | |
Created = "created", | |
Dropped = "dropped", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, I changed the enum to represent table operation types to be consistent with the server
webui/src/lib/api/index.js
Outdated
// } | ||
// return response.json(); | ||
// const mockRes = '{"results": []}' | ||
const mockRes = '{"results": [{"version": "1", "timestamp": 1515491537026, "operation": "INSERT", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "2", "timestamp": 1515491537346, "operation": "DELETE", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "14", "timestamp": 1674393286223, "operation": "DELETE", "operation_content": {"predicate": "[\\"(spark_catalog.delta.lakefs://meetup-repo/experiment-2-branch/raw/.`rating` < 4.0D) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\"]"}}]}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're missing DiffType
:
const mockRes = '{"results": [{"version": "1", "timestamp": 1515491537026, "operation": "INSERT", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "2", "timestamp": 1515491537346, "operation": "DELETE", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "14", "timestamp": 1674393286223, "operation": "DELETE", "operation_content": {"predicate": "[\\"(spark_catalog.delta.lakefs://meetup-repo/experiment-2-branch/raw/.`rating` < 4.0D) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\"]"}}]}' | |
const mockRes = '{"diff_type": "changed", "results": [{"id": "1", "timestamp": 1515491537026, "operation": "INSERT", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "2", "timestamp": 1515491537346, "operation": "DELETE", "operation_content": {"mode": "Append","partitionBy": "[]"}}, {"version": "14", "timestamp": 1674393286223, "operation": "DELETE", "operation_content": {"predicate": "[\\"(spark_catalog.delta.lakefs://meetup-repo/experiment-2-branch/raw/.`rating` < 4.0D) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\"]"}}]}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added thx
Linked Issue
Closes #5001
Change Description
New Feature
Add a GUI for showing the diff between two delta lake tables. the delta lake table diff view shows the diff for a single table at each point and colors the operations according to their type.
Testing Details
Tested manually the compare, commits and uncommitted changes views.



see attached screenshots:
Additional info