fix: query bugs with buffer #25213

Merged: pauldix merged 4 commits into main from pd/fix-query-buffer on Aug 7, 2024

pauldix (Member) commented on Aug 5, 2024

This fixes three different bugs with the buffer. The first was that aggregations would fail because projection was pushed down to the in-buffer data, which still needs de-duplication to be called on it. The test in influxdb3/tests/server/query.rs catches that.
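
As a rough illustration of why the ordering matters, the sketch below de-duplicates rows on a (host, time) key before projecting down to the one field an aggregation needs. The `Row` type, the `host` and `usage` columns, and the last-write-wins rule are assumptions made for the example; this is not the DataFusion plan used in influxdb3, and the real failure may have looked different.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-buffer row: one tag, a timestamp, and one field.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    host: &'static str,
    time: i64,
    usage: f64,
}

/// De-duplicate on the key (host, time); a later write replaces an earlier one.
/// This only works while the key columns are still present in the data.
fn dedup(rows: &[Row]) -> Vec<Row> {
    let mut latest: BTreeMap<(&str, i64), Row> = BTreeMap::new();
    for row in rows {
        latest.insert((row.host, row.time), row.clone());
    }
    latest.into_values().collect()
}

/// Project down to the single field that something like SUM(usage) needs.
/// After this step the key columns are gone, so de-duplication is no longer possible.
fn project_usage(rows: &[Row]) -> Vec<f64> {
    rows.iter().map(|r| r.usage).collect()
}

/// De-duplicate first, with all key columns available, then project and aggregate.
fn sum_usage(rows: &[Row]) -> f64 {
    project_usage(&dedup(rows)).iter().sum()
}
```

With the rows ("a", 1, 1.0), ("a", 1, 3.0), and ("b", 1, 2.0), `sum_usage` aggregates two rows after de-duplication, whereas summing `project_usage` over the raw rows also counts the overwritten value.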

I also added a test in write_buffer/mod.rs to ensure that data is correctly queryable across different states: data only in the buffer, data only in parquet files, and data across both. This exposed two more bugs. In the first, the parquet data was being doubled up (parquet chunks were being created both in the write buffer mod and in the queryable buffer). In the second, the timestamp min/max on the table buffer would panic if the buffer was empty.
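
One common way to avoid a panic on an empty buffer is to return an `Option` and let the caller decide what an empty range means. The helper below is a minimal sketch of that idea; the function name and the plain `&[i64]` input are assumptions, and the actual fix in this PR may be shaped differently.

```rust
/// Hypothetical helper: min and max timestamp of a buffer, or None if it is empty.
fn timestamp_min_max(times: &[i64]) -> Option<(i64, i64)> {
    // `split_first` returns None for an empty slice, so `?` exits early
    // instead of indexing or unwrapping on missing data.
    let (first, rest) = times.split_first()?;
    let range = rest
        .iter()
        .fold((*first, *first), |(min, max), &t| (min.min(t), max.max(t)));
    Some(range)
}
```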

Closes #25206, #25215

pauldix added the v3 label on Aug 5, 2024

mgattozzi (Contributor) left a comment

The changes are fine, but I had a suggestion/question around projection.

@@ -68,39 +66,19 @@ impl QueryableBuffer {
         db_schema: Arc<DatabaseSchema>,
         table_name: &str,
         filters: &[Expr],
-        projection: Option<&Vec<usize>>,
+        _projection: Option<&Vec<usize>>,

mgattozzi (Contributor):

Do we not need this anymore? If so, we should remove it unless we plan to do projection here again, but it seems like we don't want to.

pauldix (Member, Author):

We will want to at some point. That is part of the signature for the ChunkContainer that the write buffer implements. Ideally, we'd use the passed-in projection and have all record batches include all tags/keys, time, and any fields in the projection. All other fields can be bypassed. I've logged #25220 to track that work.
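
The rule described here (keep every tag/key column and time, plus only the projected fields) could be sketched roughly as follows. `ColumnKind` and `widen_projection` are hypothetical names invented for this example; the sketch works on plain column indices and does not reflect the ChunkContainer signature or whatever #25220 ends up implementing.

```rust
/// Hypothetical classification of a table column.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnKind {
    Tag,
    Time,
    Field,
}

/// Returns the column indices to read from the buffer: all tag and time
/// columns (needed for de-duplication) plus any field in the projection.
/// A projection of `None` means "all columns".
fn widen_projection(schema: &[ColumnKind], projection: Option<&Vec<usize>>) -> Vec<usize> {
    schema
        .iter()
        .enumerate()
        .filter(|(idx, kind)| match **kind {
            // Key columns are always kept, whether or not they were requested.
            ColumnKind::Tag | ColumnKind::Time => true,
            // Fields are kept only when requested (or when nothing was projected).
            ColumnKind::Field => projection.map_or(true, |p| p.contains(idx)),
        })
        .map(|(idx, _)| idx)
        .collect()
}
```

For a schema of (tag, field, field, time) and a projection of column 2, this keeps columns 0, 2, and 3, so the unrequested field at index 1 is the only one bypassed.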

hiltontj (Contributor) left a comment

One discretionary comment - otherwise LGTM.

Comment on lines 798 to 818
let object_store: Arc<dyn ObjectStore> = Arc::new(InMemory::new());
let persister = Arc::new(PersisterImpl::new(Arc::clone(&object_store)));
let time_provider = Arc::new(MockProvider::new(Time::from_timestamp_nanos(0)));
let level_0_duration = Level0Duration::new_5m();
let write_buffer = WriteBufferImpl::new(
    Arc::clone(&persister),
    Arc::clone(&time_provider),
    level_0_duration,
    crate::test_help::make_exec(),
    WalConfig {
        level_0_duration: Duration::from_secs(60),
        max_write_buffer_size: 100,
        flush_interval: Duration::from_millis(10),
        snapshot_size: 2,
    },
)
.await
.unwrap();
let session_context = IOxSessionContext::with_testing();
let runtime_env = session_context.inner().runtime_env();
register_iox_object_store(runtime_env, "influxdb3", Arc::clone(&object_store));

hiltontj (Contributor):

There is a setup helper below that does all of these steps, although to use it here, it looks like you would need to parameterize the different parts of the WalConfig in the function signature.

pauldix (Member, Author):

Parameterized in 1eed007

pauldix requested a review from mgattozzi on August 6, 2024 at 20:05

Commit "fix: fix wal replay and buffer snapshot": Fixes two problems uncovered by adding to the write_buffer/mod.rs test. Ensures we can replay wal data and that snapshots work properly with replayed data.

pauldix merged commit 43877be into main on Aug 7, 2024
13 checks passed

pauldix (Member, Author) commented on Aug 7, 2024

Going to merge this in as @mgattozzi is out today. I can remove the _projection parameter in a follow-up PR or do #25220.

pauldix deleted the pd/fix-query-buffer branch on August 7, 2024 at 20:00

mgattozzi added a commit that referenced this pull request on Sep 5, 2024:

* fix: wal skip persist and notify if empty buffer (#25211)

* fix: wal skip persist and notify if empty buffer

This fixes the WAL so that it will skip persisting a file and notifying the file notifier if the wal buffer is empty.

* fix: fix last cache persist test

* fix: make ParquetChunk fields and mod chunk pub (#25219)

* fix: make ParquetChunk fields and mod chunk pub

This doesn't affect anything in the OSS version, but these changes are
needed for Pro as part of our compactor work.

* fix: cargo deny failure

* fix: query bugs with buffer (#25213)

* fix: query bugs with buffer

This fixes three different bugs with the buffer. First was that aggregations would fail because projection was pushed down to the in-buffer data that de-duplication needs to be called on. The test in influxdb3/tests/server/query.rs catches that.

I also added a test in write_buffer/mod.rs to ensure that data is correctly queryable when combining with different states: only data in buffer, only data in parquet files, and data across both. This showed two bugs, one where the parquet data was being doubled up (parquet chunks were being created in write buffer mod and in queryable buffer). The second was that the timestamp min max on table buffer would panic if the buffer was empty.

* refactor: PR feedback

* fix: fix wal replay and buffer snapshot

Fixes two problems uncovered by adding to the write_buffer/mod.rs test. Ensures we can replay wal data and that snapshots work properly with replayed data.

* fix: run cargo update to fix audit

* feat: use host identifier prefix in object store paths (#25224)

This enforces the use of a host identifier prefix in all object store
paths (currently, for parquet files, catalog files, and snapshot files).

The persister retains the host identifier prefix, and uses it when
constructing paths.

The WalObjectStore also holds the host identifier prefix, so that it can
use it when saving and loading WAL files.

The influxdb3 binary requires a new argument 'host-id' to be passed that
is used to specify the prefix.

* feat: add `system.parquet_files` table (#25225)

This extends the system tables available with a new `parquet_files` table
which will list the parquet files associated with a given table in a
database.

Queries to system.parquet_files must provide a table_name predicate to
specify the table name of interest.

The files are accessed through the QueryableBuffer.

In addition, a test was added to check success and failure modes of the
new system table query.

Finally, the Persister trait had its associated error type removed. This
was somewhat of a consequence of how I initially implemented this change,
but I felt it cleaned the code up a bit, so I kept it in the commit.

* fix: un-pub QueryableBuffer and fix compile errors (#25230)

* refactor: Make Level0Duration part of WAL (#25228)

* refactor: Make Level0Duration part of WAL

I noticed this during some testing and cleanup with other PRs. The WAL had its own level_0_duration and the write buffer had a different one, which would cause some weird problems if they weren't the same. This refactors Level0Duration to be in the WAL and fixes up the tests.

As an added bonus, this surfaced a bug where multiple L0 blocks getting persisted in the same snapshot wasn't supported. So now snapshot details can have many files per table.

* fix: have persisted files always return in descending data time order

* fix: sort record batches for test verification

* fix: main (#25231)

* feat: Add last cache create/delete to WAL (#25233)

* feat: Add last cache create/delete to WAL

This moves the LastCacheDefinition into the WAL so that it can be serialized there. This ended up being a pretty large refactor to get the last cache creation to work through the WAL.

I think I also stumbled on a bug where the last cache wasn't getting initialized from the catalog on reboot so that it wouldn't actually end up caching values. The refactored last cache persistence test in write_buffer/mod.rs surfaced this.

Finally, I also had to update the WAL so that it would persist if there were only catalog updates and no writes.

Fixes #25203

* fix: typos

* feat: Catalog apply_catalog_batch only updates if new (#25236)

* feat: Catalog apply_catalog_batch only updates if new

This updates the Catalog so that when applying a catalog batch it only updates the inner catalog and bumps the sequence number and updated tracker if there are new updates in the batch. Also does validation that the catalog batch schema is compatible with any existing.

Closes #25205

* feat: only persist catalog when updated (#25238)

* chore: ignore sqlx rustsec advisory (#25252)

* feat: Add FileIndex type to influxdb3_index

This commit does two important things:

1. It creates a new influxdb3_index crate under influxdb3_pro to contain
   all indexing logic and types that we might create for influxdb3_pro
2. It creates our first index type, the FileIndex, which is part of #20

Note we're starting off with just file ids as this will let us set up
the logic for creating and working with the `FileIndex` inside of the
compactor first. Later we can add row groups as that logic is a bit more
complicated in nature.

The `FileIndex` contains methods to lookup, insert, and delete items
from the index as needed and an associated test to make sure it works as
expected.

Note that the `FileIndex` is meant to have one created for each database
table that has an index created for it. Later on when it's being
integrated into the compactor a `FileIndex` will be returned per
compaction of a given table. We'll later integrate this into the
`WriteBuffer` for querying as well as adding this to the WAL so that
indexes can be recreated as needed.

---------

Co-authored-by: Paul Dix <paul@pauldix.net>
Co-authored-by: Trevor Hilton <thilton@influxdata.com>

Successfully merging this pull request may close these issues.

Create integration test for WriteBufferImpl to validate wal flushing and snapshotting
3 participants