-
Notifications
You must be signed in to change notification settings - Fork 81
feat(webui): Dump query results from Presto into MongoDB. #1150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new utility module for inserting Presto query results into MongoDB has been introduced. The Presto search API route is updated to extract the MongoDB instance, ensure its availability, and insert processed Presto results into MongoDB collections named after the search job ID. Additional logging and error handling are incorporated. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant FastifyAPI
participant Presto
participant MongoDB
Client->>FastifyAPI: Initiate Presto search
FastifyAPI->>Presto: Execute Presto query
Presto-->>FastifyAPI: Stream query results (rows, columns)
FastifyAPI->>MongoDB: insertPrestoRowsToMongo(rows, columns, jobId)
MongoDB-->>FastifyAPI: Insertion result
FastifyAPI-->>Client: Respond with query/job status
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit Configuration File
Files:
🧠 Learnings (6)📓 Common learnings
📚 Learning: in `components/log-viewer-webui/client/src/api/query.js`, the `extractjsonresp` type definition is a...
Applied to files:
📚 Learning: for internal apis in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of pa...
Applied to files:
📚 Learning: user junhaoliao requested creating a github issue to track server-side sql query error handling impr...
Applied to files:
📚 Learning: when implementing socket.io namespaces or connection pooling in a client-server architecture (like t...
Applied to files:
📚 Learning: the current implementation assumes single-threaded execution, so race conditions in functions like `...
Applied to files:
🔇 Additional comments (4)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/webui/server/src/routes/api/presto-search/index.ts
(2 hunks)components/webui/server/src/routes/api/presto-search/typings.ts
(1 hunks)components/webui/server/src/routes/api/presto-search/utils.ts
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/webui/server/src/routes/api/presto-search/index.ts
components/webui/server/src/routes/api/presto-search/typings.ts
components/webui/server/src/routes/api/presto-search/utils.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
📚 Learning: in `components/log-viewer-webui/client/src/api/query.js`, the `extractjsonresp` type definition is a...
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Applied to files:
components/webui/server/src/routes/api/presto-search/index.ts
components/webui/server/src/routes/api/presto-search/typings.ts
components/webui/server/src/routes/api/presto-search/utils.ts
📚 Learning: the new webui architecture uses fastify with pino logging instead of the previous winston-based logg...
Learnt from: junhaoliao
PR: y-scope/clp#939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-22T04:01:43.409Z
Learning: The new webui architecture uses Fastify with Pino logging instead of the previous Winston-based logging system that was removed during the webui refactoring.
Applied to files:
components/webui/server/src/routes/api/presto-search/index.ts
🔇 Additional comments (8)
components/webui/server/src/routes/api/presto-search/index.ts (2)
9-9
: LGTM!The import statement correctly brings in the utility function from the local utils module.
19-29
: MongoDB availability check is well implemented.The extraction of MongoDB instance and validation with proper error throwing ensures the plugin fails fast if MongoDB is unavailable. The error message is clear and descriptive.
components/webui/server/src/routes/api/presto-search/typings.ts (3)
1-2
: LGTM!The import statements correctly import the necessary types from their respective packages.
5-12
: Interface definition is well-structured and type-safe.The
InsertPrestoRowsToMongoProps
interface properly defines all required properties with appropriate types. The use ofunknown[][]
for data and{name: string}[]
for columns accurately reflects the Presto query result structure.
14-14
: LGTM!The export statement correctly exports the interface for use in other modules.
components/webui/server/src/routes/api/presto-search/utils.ts (3)
1-1
: LGTM!The import statement correctly imports the type interface from the local typings module.
58-58
: LGTM!The export statement correctly exports the main utility function.
43-47
: Boolean expression violates coding guidelines.The condition
false === isResolved
should be written as!isResolved
according to the coding guidelines which preferfalse == <expression>
rather than!<expression>
. However, this specific case uses strict equality (===
) instead of loose equality (==
), so the guideline doesn't directly apply. Still, for consistency with the codebase style, consider using the boolean negation.- if (false === isResolved) { + if (!isResolved) {Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some coding style thoughts if you're open to them.
Validation:
The query results are in the database.
const prestoRowToObject = ( | ||
row: unknown[], | ||
columns: {name: string}[] | ||
): Record<string, unknown> => { | ||
const obj: Record<string, unknown> = {}; | ||
columns.forEach((col, idx) => { | ||
obj[col.name] = row[idx]; | ||
}); | ||
|
||
return obj; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this function provides little abstraction. It's best to put the content inside
const resultDocs = data.map((row) => prestoRowToObject(row, columns));
What's your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ur right dosent provide much functionality, but i thought the code was difficult to read/understand, so i felt wrapping it in a function and adding jsdoc would make more readable
const insertPrestoRowsToMongo = ({ | ||
columns, | ||
data, | ||
isResolved, | ||
log, | ||
mongoDb, | ||
searchJobId, | ||
}: InsertPrestoRowsToMongoProps): void => { | ||
if (false === isResolved) { | ||
log.error("Presto data received before searchJobId was resolved; skipping insert."); | ||
|
||
return; | ||
} | ||
|
||
if (0 < data.length && searchJobId) { | ||
const collection = mongoDb.collection(searchJobId); | ||
const resultDocs = data.map((row) => prestoRowToObject(row, columns)); | ||
collection.insertMany(resultDocs).catch((err: unknown) => { | ||
log.error(err, "Failed to insert Presto results into MongoDB"); | ||
}); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is just taking all the variables from the data callback as its parameters.
It would be much better to just inline it.
If we really want to keep it, it needs a few changes:
- don't pass in
isResolved
. That error check should happen directly inside thedata
callback. - don't pass in
log
. This function can returninsertMany
's promise directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay i kept function but made changes. I want to keep the utility since i feel like this function will get more complicated over time, and want to keep the main route cleaner
return; | ||
} | ||
|
||
if (0 < data.length && searchJobId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to check searchJobId here. In the previous code, it was a Nullable type. However, since we've already established that searchJobId will always be resolved before the data arrives, checking for undefined is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay its gone, and i moved this check outside the utility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/webui/server/src/routes/api/presto-search/utils.ts (1)
14-24
: Consider inlining this function or adding validation.Based on previous feedback, this function provides minimal abstraction and could be inlined directly in the
insertPrestoRowsToMongo
function. If kept separate, it should validate thatrow.length === columns.length
to prevent undefined values or missed data.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/webui/server/src/routes/api/presto-search/index.ts
(3 hunks)components/webui/server/src/routes/api/presto-search/utils.ts
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/webui/server/src/routes/api/presto-search/utils.ts
components/webui/server/src/routes/api/presto-search/index.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
📚 Learning: in `components/log-viewer-webui/client/src/api/query.js`, the `extractjsonresp` type definition is a...
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Applied to files:
components/webui/server/src/routes/api/presto-search/utils.ts
components/webui/server/src/routes/api/presto-search/index.ts
📚 Learning: for internal apis in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of pa...
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:35-41
Timestamp: 2024-11-19T19:52:43.429Z
Learning: For internal APIs in `components/log-viewer-webui/client/src/api/query.js`, runtime validation of parameters may not be necessary since the APIs are not exposed to end users, and JsDoc type annotations may be sufficient.
Applied to files:
components/webui/server/src/routes/api/presto-search/utils.ts
📚 Learning: user junhaoliao requested creating a github issue to track server-side sql query error handling impr...
Learnt from: junhaoliao
PR: y-scope/clp#1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.
Applied to files:
components/webui/server/src/routes/api/presto-search/utils.ts
components/webui/server/src/routes/api/presto-search/index.ts
📚 Learning: when implementing socket.io namespaces or connection pooling in a client-server architecture (like t...
Learnt from: davemarco
PR: y-scope/clp#892
File: components/log-viewer-webui/client/src/api/socket/MongoCollectionSocket.ts:32-38
Timestamp: 2025-05-09T18:05:42.814Z
Learning: When implementing Socket.IO namespaces or connection pooling in a client-server architecture (like the MongoDB collection socket system in the log-viewer-webui), coordinated changes are needed on both client and server sides, making it appropriate to track as a TODO rather than implement piecemeal.
Applied to files:
components/webui/server/src/routes/api/presto-search/utils.ts
components/webui/server/src/routes/api/presto-search/index.ts
📚 Learning: the current implementation assumes single-threaded execution, so race conditions in functions like `...
Learnt from: haiqi96
PR: y-scope/clp#569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.
Applied to files:
components/webui/server/src/routes/api/presto-search/index.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/webui/server/src/routes/api/presto-search/index.ts (2)
19-29
: LGTM! Proper MongoDB dependency validation.The MongoDB database extraction and validation ensures the plugin only registers when MongoDB is available, preventing runtime errors during query processing.
64-71
: Race condition properly addressed.The
isResolved
check effectively prevents the race condition identified in previous reviews by ensuring MongoDB insertion only occurs aftersearchJobId
is resolved.
const insertPrestoRowsToMongo = ({ | ||
columns, | ||
data, | ||
mongoDb, | ||
searchJobId, | ||
}: InsertPrestoRowsToMongoProps): Promise<InsertManyResult<Document>> => { | ||
const collection = mongoDb.collection(searchJobId); | ||
const resultDocs = data.map((row) => prestoRowToObject(row, columns)); | ||
return collection.insertMany(resultDocs); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Improved implementation addresses previous concerns.
The function has been significantly improved from previous versions by:
- Removing the
isResolved
parameter (now handled by caller) - Removing the
log
parameter (function returns promise directly) - Returning the promise instead of fire-and-forget with
.catch()
However, the previous feedback about minimal abstraction still applies - this function essentially wraps a single insertMany
call and could be inlined in the caller for better code clarity.
🤖 Prompt for AI Agents
In components/webui/server/src/routes/api/presto-search/utils.ts around lines 36
to 45, the insertPrestoRowsToMongo function is a minimal wrapper around a single
insertMany call and could be inlined in the caller for better clarity. Remove
this function entirely and move its logic directly into the caller where it is
used, passing the transformed data to insertMany without an extra abstraction
layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thanks for the change
Validation:
The query results are in the database.
components/webui/server/src/routes/api/presto-search/typings.ts
Outdated
Show resolved
Hide resolved
@junhaoliao can you approve this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring to @hoophalab ‘s review
Description
PR takes results from presto query and puts them into mongoDB. Since presto data is sent as arrays, the rows are first converted to objects by adding the column names as keys, then they are put into mongo as objects. This should be better than just inserting the unmodified rows if we eventually need to run filters on the data using mongo.
Checklist
breaking change.
Validation performed
Data appears in mongo
Summary by CodeRabbit