-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support system.parquet_files
table
#25002
Conversation
A shell for the `system` table provider was added to the QueryExecutorImpl which currently does not do anything, but will enable us to tie the different system table providers into it. The QueryLog was elevated from the `Database`, i.e., namespace provider, to the QueryExecutorImpl, so that it lives accross queries.
The system.queries table is now accessible, when queries are initiated in debug mode, which is not currently enabled via the HTTP API, therefore this is not yet accessible unless via the gRPC interface. The system.queries table lists all queries in the QueryLog on the QueryExecutorImpl.
8bcfeec
to
457e7d1
Compare
457e7d1
to
f85fa78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SegmentState in the write buffer actually has the information for the last 1,000 persisted segments and their parquet files: https://github.com/influxdata/influxdb/blob/main/influxdb3_write/src/write_buffer/segment_state.rs#L45. This is populated on startup.
It may be a good idea to pull the parquet files information from there, rather than the persister/object store. When the buffer gets the ability to persist some table data ahead of the segment rolling over, that state will be kept there in memory and won't show up in a segment info file until after the segment rolls over.
I think I'll be refactoring the structure of SegmentState when doing that. So maybe wait until after that to change this?
That makes sense and seems simpler. Giving the system table provider read access to the influxdb/influxdb3_write/src/write_buffer/segment_state.rs Lines 216 to 232 in 2381cc6
I can at least try that out and flag if there is an issue with doing so. |
@pauldix - after taking a quick look at using the // T -> TimeProvider
// W -> Wal
struct SegmentState<T, W> {
/* ... */
} The reason it gets messy is because I need to get access via the trait Bufferer {
/* ... */
fn persister(&self) -> Arc<dyn Persister>;
} If you are re-thinking the segment state, it may be helpful to have a trait, e.g., trait SegmentStateProvider {
fn get_parquet_files(&self) -> Vec<ParquetFile>;
/* Other methods you are considering for this API */
} Then that would be more easily extracted via the I can leave this for now, since you're planning to refactor, don't want to make doing so more difficult with changes here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM code wise. I'll defer to @pauldix though if there's anything else we need to get into this PR, but from a code perspective it looks great.
Planning to wait on #25144 and any related refactoring to the |
Closing in favour of #25225 |
This PR is intended to follow #25000.
Extend system table support to allow queries against
system.parquet_files
, a new system table that allows queries for parquet files associated with a given database and table.Currently, the query will only look back on 1,000 segments to get associated parquet files, we may want to adjust that, or make it more flexible.
Two unit tests were added to
query_executor
for this, instead of E2E integration tests, as more control was needed over file persistence than what the E2E test harness provides.Example
The
system.parquet_files
table can be queried like so:And responds accordingly:
Other Changes
Another change in this PR was to go back to using dynamic dispatch for passing around the
Persister
. The associated types and generics made using thePersister
a pain in theParquetFilesTable
implementation, so I removed those in favour of usingArc<dyn Persister>
, as we had been doing previously (see 42d36d5).Closes #24988