
Unify/sync dump with authority store #3494

Merged · @oxade merged 8 commits into main from unify_dump_with_authority_store on Jul 26, 2022
Conversation

@oxade (Contributor) commented Jul 26, 2022

This PR is a follow-up to the original tool; it makes it so the DB debug tool code stays in sync with the authority store codebase.
Due to demand for DB debugging, I will add dump as a native feature of typed_store, but in the meantime this PR keeps things in order.
I also added a test which ensures that every table in the authority store can indeed be dumped. This way, a table cannot be added to the store without also being added to the dump list.
CLI behavior of the tool is unchanged.

Next steps:

  • Checkpoint and lock service dump
  • Use procedural macro to autogenerate debug features for dumping. This will reduce code duplication, but requires a bit more work.
  • Implement robust debugging in typed_store to reduce code duplication/re-implementation in Sui

// TODO: condense with macros
pub fn dump(&self, table_name: &str) -> anyhow::Result<BTreeMap<String, String>> {
    Ok(match table_name {
        OBJECTS_TABLE_NAME => {
@oxade (Contributor, Author) commented Jul 26, 2022

Next steps: these match arms will be auto-generated by a procedural macro for any table which uses DBMap.
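For illustration, here is a minimal sketch of the shape such generated code could take, assuming each table exposes an iterator of (key, value) pairs. The names Dump and Table are hypothetical stand-ins, not the final typed_store API:

use std::collections::BTreeMap;
use std::fmt::Debug;

// Hypothetical trait a derive/procedural macro could implement for every
// DBMap-backed table, replacing the hand-written match arms in dump().
trait Dump {
    fn dump(&self) -> BTreeMap<String, String>;
}

// Stand-in for a DBMap column family: anything iterable as (key, value).
struct Table<K, V> {
    rows: Vec<(K, V)>,
}

impl<K: Debug, V: Debug> Dump for Table<K, V> {
    fn dump(&self) -> BTreeMap<String, String> {
        self.rows
            .iter()
            .map(|(k, v)| (format!("{:?}", k), format!("{:?}", v)))
            .collect()
    }
}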

use crate::db_tool::db_dump::{dump_table, list_tables};

#[tokio::test]
async fn db_dump_population() -> Result<(), anyhow::Error> {
@oxade (Contributor, Author) commented Jul 26, 2022

Note: this test ensures that every table in the data store can be dumped, which prevents folks from adding tables without adding them to the dump list. Once we auto-generate the dump code, we will no longer need this test.
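As an outline (the real test is in this PR; the exact signatures of list_tables and dump_table are assumed here), the coverage check amounts to:

#[tokio::test]
async fn db_dump_population_sketch() -> Result<(), anyhow::Error> {
    // Assumed signatures: list_tables(path) -> Result<Vec<String>> and
    // dump_table(path, table_name) -> Result<BTreeMap<String, String>>.
    let path = tempfile::tempdir()?.into_path();
    // ... populate an authority store at `path` ...
    for table in list_tables(path.clone())? {
        // Every table must be dumpable; a table added to the store but not
        // wired into dump() fails here.
        dump_table(path.clone(), &table)?;
    }
    Ok(())
}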

let executed_sequence =
    DBMap::reopen(&db, Some(EXEC_SEQ_TABLE_NAME)).expect("Cannot open CF.");

let (
@oxade (Contributor, Author) commented:

This opening and naming pattern is used in many places around the code. We are investigating macros we can provide from mysten-infra to reduce the noise.
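One hypothetical shape for such a helper is a declarative macro that expands one line per table into the repeated DBMap::reopen pattern (the macro name and layout are illustrative only):

macro_rules! reopen_tables {
    ($db:expr, $($table:ident => $cf:expr),* $(,)?) => {
        $(
            // Expands to the same reopen-and-name pattern written by hand above.
            let $table = DBMap::reopen($db, Some($cf))
                .expect(concat!("Cannot open CF ", stringify!($cf)));
        )*
    };
}

// Usage sketch:
// reopen_tables!(&db,
//     executed_sequence => EXEC_SEQ_TABLE_NAME,
//     objects => OBJECTS_TABLE_NAME,
// );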

@oxade oxade enabled auto-merge (squash) July 26, 2022 06:32
@oxade oxade disabled auto-merge July 26, 2022 06:46
@oxade oxade requested a review from patrickkuo July 26, 2022 16:05
@lxfind (Contributor) commented Jul 26, 2022

Thanks. How does open_cf_opts_secondary work? Does it eventually make a clone of the entire db on the fly and then read from it?

@oxade (Contributor, Author) commented Jul 26, 2022

> Thanks. How does open_cf_opts_secondary work? Does it eventually make a clone of the entire db on the fly and then read from it?

No, it does not clone the whole db. It reads the files in the primary db path in a lock-free mode, but it additionally needs to store a few info logs of its own; these go in the secondary path. To keep in sync with primary data changes, we use catch-up logic to apply the primary WAL to the secondary's memtables upon every dump request. This is because the secondary cannot see the primary's memtables, as it's in a different memory space.
So the only things copied are info logs to storage, and some memtables to RAM.
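To make that flow concrete, here is a minimal sketch using the rocksdb crate directly, which typed_store's open_cf_opts_secondary builds on; the paths are placeholders:

use rocksdb::{Options, DB};

fn read_live_db() -> Result<(), rocksdb::Error> {
    let opts = Options::default();
    // The secondary reads the primary's SST files in place; only its own
    // info logs are written to the secondary path.
    let db = DB::open_as_secondary(&opts, "/path/to/primary", "/path/to/secondary")?;
    // Replay the primary's WAL into this instance's memtables so reads also
    // see writes the primary has not flushed yet.
    db.try_catch_up_with_primary()?;
    if let Some(value) = db.get(b"some-key")? {
        println!("{:?}", value);
    }
    Ok(())
}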

@velvia (Contributor) left a comment

Ok, so I realize now that this change is basically

  1. Porting the older db_dump code into a store_tables
  2. and separating out store_tables from authority_store.

I have two thoughts:

  • Sooner or later, I'm going to refactor most of these tables into separate stores like an ObjectStore, CertificateStore etc. So, while this move of tables to a separate StoreTables is functionally fine, I wonder if it would be better to minimize the changes now and keep dump() where it is without moving all the tables, as stuff will get refactored later anyways
  • See my other comment but it wouldn't be much more work to create a trait like DumpableDBMap which would make the code much cleaner without having to match on every table. I think it could be doable without macros....

Anyways my request for change mainly has to do with my other comment about limits.

OBJECTS_TABLE_NAME => {
    self.objects.try_catch_up_with_primary()?;
    self.objects
        .iter()
@velvia (Contributor) commented:

We need to put a limit here. This will work today, but when we have millions of TXes, that BTreeMap you're building will quickly run out of memory. I'd rather we quickly change the limits than build something which does not have limits and may OOM.

Also, it wouldn't be much work to put in a trait like DumpableDBMap in typed_store crate which implements a Dump function for each key and value. Then, we would not need the match, and any table that implements DumpableDBMap could easily be dumped. If the API is good, we also would not need to dump everything out.

@oxade (Contributor, Author) commented Jul 26, 2022

> We need to put a limit here. This will work today, but when we have millions of TXes, that BTreeMap you're building will quickly run out of memory. I'd rather we quickly change the limits than build something which does not have limits and may OOM.
>
> Also, it wouldn't be much work to put in a trait like DumpableDBMap in typed_store crate which implements a Dump function for each key and value. Then, we would not need the match, and any table that implements DumpableDBMap could easily be dumped. If the API is good, we also would not need to dump everything out.

For the trait, yes, that's the plan. It will be introduced as we add support to more tables.

> Ok, so I realize now that this change is basically
>
>   1. Porting the older db_dump code into a store_tables
>   2. and separating out store_tables from authority_store.
>
> I have two thoughts:
>
>   • Sooner or later, I'm going to refactor most of these tables into separate stores like an ObjectStore, CertificateStore etc. So, while this move of tables to a separate StoreTables is functionally fine, I wonder if it would be better to minimize the changes now and keep dump() where it is without moving all the tables, as stuff will get refactored later anyways
>   • See my other comment but it wouldn't be much more work to create a trait like DumpableDBMap which would make the code much cleaner without having to match on every table. I think it could be doable without macros....
>
> Anyways my request for change mainly has to do with my other comment about limits.

The reason for this code change is to reduce code duplication. There are many improvements to be made, but this PR addresses just a sliver.
I agree we should factor out the stores, but for now we can go incrementally rather than wait for everything.

Regarding using a trait: yes, that was the initial plan and it's a good idea, but it still puts dump logic in Sui. I decided this is a feature we will want in Narwhal too, so it's best if it's part of typed_store natively.

Keep in mind this PR is not the final form of the debug utility. It is merely a first evolution in our ability to support debugging features.

@oxade (Contributor, Author) commented Jul 26, 2022

For limits, you want us to limit how much data we dump?
It makes sense, but I don't think we're at the point yet where it's a concern.
I can add pagination in the next PR instead, but that is not the core of this PR.

@velvia (Contributor) commented:

It's just that we want to avoid data structures that can grow unbounded, as a principle.
Just .iter().take(LIMIT) should suffice for now.

We can discuss pagination later; agreed that it should be pushed off.
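A minimal sketch of that bound, with the per-table logic factored into a generic helper; DUMP_LIMIT is a hypothetical constant, and the iterator stands in for DBMap::iter():

use std::collections::BTreeMap;
use std::fmt::Debug;

// Hypothetical cap so the dump's BTreeMap cannot grow without bound.
const DUMP_LIMIT: usize = 10_000;

fn dump_bounded<K: Debug, V: Debug>(
    entries: impl Iterator<Item = (K, V)>,
) -> BTreeMap<String, String> {
    entries
        .take(DUMP_LIMIT)
        .map(|(k, v)| (format!("{:?}", k), format!("{:?}", v)))
        .collect()
}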

@oxade (Contributor, Author) commented:

The size limit improvement will be tracked in #3532.

oxade and others added 2 commits July 26, 2022 17:32
Co-authored-by: Lu Zhang <8418040+longbowlu@users.noreply.github.com>
@oxade oxade requested a review from velvia July 26, 2022 21:45
@oxade oxade enabled auto-merge (squash) July 26, 2022 21:50
@velvia (Contributor) left a comment

@oxade will create PR with follow-on issues

@oxade oxade merged commit 855f882 into main Jul 26, 2022
@oxade oxade deleted the unify_dump_with_authority_store branch July 26, 2022 22:28
@oxade (Contributor, Author) commented Jul 26, 2022

> @oxade will create PR with follow-on issues

#3532

punwai pushed a commit that referenced this pull request Jul 27, 2022