Core: Implement BaseMetastoreCatalog.registerTable() #5037

Merged: 12 commits, Jul 22, 2022

Contributor

@Mehul2500 Mehul2500 commented Jun 14, 2022

Migrating tables between different catalogs requires a registerTable() function on the destination catalog side.
This PR therefore implements registerTable() in BaseMetastoreCatalog.

Closes #4995

@Mehul2500 Mehul2500 changed the title [WIP] Implementation of BaseMetastoreCatalog.registerTable() Implementation of BaseMetastoreCatalog.registerTable() Jun 16, 2022
@Mehul2500 Mehul2500 changed the title Implementation of BaseMetastoreCatalog.registerTable() Core: Implement BaseMetastoreCatalog.registerTable() Jun 16, 2022
@ajantha-bhat
Member

@pvary, @openinx, @szehon-ho, @RussellSpitzer, @rdblue: Can one of the maintainers please approve the first-time contributor flow for this PR?

@@ -356,6 +360,81 @@ public void testListTables() {
    Assertions.assertThat(catalog.tableExists(TABLE_IDENTIFIER)).isTrue();
  }

  private void testRegister(TableIdentifier identifier, String metadataVersionFiles) {
Contributor

Could we separate these tests into Nessie-specific and general ones?
Could we run the "general" tests on every catalog, and move testRegisterTable and testRegisterExistingTable from HiveTableTest to this general place?

Contributor Author

These test cases are specific to Nessie.

Contributor

If I understand correctly, the PR moves HiveCatalog.registerTable to BaseMetastoreCatalog, so it will be available for every catalog that inherits from BaseMetastoreCatalog. So we should have a test verifying that the method works as expected for every catalog that implements registerTable.
I understand that there are some Nessie-specific test cases, but I think the basic functionality should be available and tested for all of the catalogs.
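
To illustrate why moving the method into the base class makes it available to every subclass, here is a minimal sketch. It is not the Iceberg implementation: SketchBaseCatalog, SketchInMemoryCatalog, the two hook methods, and the use of plain strings for identifiers and metadata locations are all hypothetical simplifications chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a metastore-backed catalog base class.
// Identifiers and metadata are plain strings here; this is NOT the Iceberg API.
abstract class SketchBaseCatalog {

  // Hook: does the concrete catalog already know this identifier?
  protected abstract boolean tableExists(String identifier);

  // Hook: persist the identifier -> metadata file location pointer.
  protected abstract void commitPointer(String identifier, String metadataFileLocation);

  // Written once in the base class, inherited by every subclass.
  public String registerTable(String identifier, String metadataFileLocation) {
    if (identifier == null || identifier.isEmpty()) {
      throw new IllegalArgumentException("Invalid identifier: " + identifier);
    }
    if (metadataFileLocation == null || metadataFileLocation.isEmpty()) {
      throw new IllegalArgumentException("Cannot register an empty metadata file location");
    }
    if (tableExists(identifier)) {
      throw new IllegalStateException("Table already exists: " + identifier);
    }
    commitPointer(identifier, metadataFileLocation);
    return identifier;
  }
}

// One hypothetical subclass; it only supplies the two hooks.
class SketchInMemoryCatalog extends SketchBaseCatalog {
  private final Map<String, String> pointers = new HashMap<>();

  @Override
  protected boolean tableExists(String identifier) {
    return pointers.containsKey(identifier);
  }

  @Override
  protected void commitPointer(String identifier, String metadataFileLocation) {
    pointers.put(identifier, metadataFileLocation);
  }

  // Helper for checks: the stored metadata location, or null if absent.
  String metadataLocation(String identifier) {
    return pointers.get(identifier);
  }
}
```

In this sketch, registering a new identifier stores its pointer, and registering the same identifier a second time throws. Those two behaviors roughly correspond to what testRegisterTable and testRegisterExistingTable would be expected to check for each catalog.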

Contributor Author

I will add these test cases and ping you once they are in, thanks.

Contributor

I think we are still missing:

  • DynamoDbCatalog
  • GlueCatalog

I do not like the duplicated test code, and I think it would be good to have a general Catalog API test for checking the functionality of the new Catalog implementations. Something like TestCatalog but for all/most of the catalog API methods. We can do it in a different PR, but it would be good to have them. The test for registerTable() is a good candidate for these common tests.

Contributor Author

@Mehul2500 Mehul2500 Jul 4, 2022

I have put in test cases for Glue and DynamoDb as "Ignore" because I currently do not have an AWS account for testing them. I am working on it.
Also, I will address the general Catalog API test in a follow-up PR.

Contributor Author

I got my AWS account set up and have amended the test cases for the Glue and DynamoDb catalogs. @pvary, please have a look at the test cases.

Contributor

@kbendick kbendick Jul 13, 2022

> I do not like the duplicated test code, and I think it would be good to have a general Catalog API test for checking the functionality of the new Catalog implementations. Something like TestCatalog but for all/most of the catalog API methods. We can do it in a different PR, but it would be good to have them. The test for registerTable() is a good candidate for these common tests.

We do have CatalogTests, which is what TestRESTCatalog is built off of. It’s nice in that it has some methods for subclasses to declare if the catalog being tested supports certain features, so individual tests can exit early if they don’t support an optional feature (similar to JUnit Assume).

For example, a test can exit early if the catalog in question doesn’t support namespace properties.

Might be good to see if that can be made to fit this need or can otherwise get some ideas from it. I believe Nessie also implements these tests.
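
As a rough illustration of the pattern described here, the sketch below shows a shared test base whose subclasses declare which optional features they support, so a test can return early when a feature is missing. It is not the actual CatalogTests class: SketchCatalogTests, its subclasses, the method names, and the `executed` list used to record which tests did real work are hypothetical, and it uses plain Java rather than JUnit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a shared catalog test base; not the real CatalogTests.
abstract class SketchCatalogTests {
  // Records the tests that actually ran their body (for illustration only).
  protected final List<String> executed = new ArrayList<>();

  // Subclasses override this to declare that an optional feature is unsupported.
  protected boolean supportsNamespaceProperties() {
    return true;
  }

  public void testCreateTable() {
    executed.add("testCreateTable");
  }

  public void testNamespaceProperties() {
    if (!supportsNamespaceProperties()) {
      return; // exit early, similar in spirit to a JUnit assumption
    }
    executed.add("testNamespaceProperties");
  }

  // Runs every shared test and returns the ones that did real work.
  public List<String> runAll() {
    executed.clear();
    testCreateTable();
    testNamespaceProperties();
    return new ArrayList<>(executed);
  }
}

// A catalog that supports everything inherits all shared tests unchanged.
class SketchFullCatalogTests extends SketchCatalogTests {}

// A catalog without namespace properties opts out with a single override.
class SketchLimitedCatalogTests extends SketchCatalogTests {
  @Override
  protected boolean supportsNamespaceProperties() {
    return false;
  }
}
```

Under these assumptions, a register-table test could live in the shared base in the same way, with each catalog's test class inheriting it rather than duplicating it.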

Contributor Author

@Mehul2500 Mehul2500 Jul 14, 2022

Using CatalogTests is a good suggestion, but it would not resolve the duplicated test code discussed in the comments above.
Reason:
The duplication involves two test cases, testRegisterTable() and testRegisterExistingTable(), in three of the catalog test files:

  • Jdbc Catalog
  • Ecs Catalog
  • Hadoop Catalog

Of these, only the Jdbc catalog can use CatalogTests; the others do not use it.
Thus, CatalogTests would not help with the duplication raised in this comment.

@pvary has also mentioned that we should consider a Catalog API test similar to CatalogTests, but covering all the catalogs.
As decided, that belongs in a follow-up PR; here, let's focus only on the registerTable functionality in BaseMetastoreCatalog.

@github-actions github-actions bot added the DELL label Jun 29, 2022
Member

@ajantha-bhat ajantha-bhat left a comment

LGTM

  • Manually verified the integration tests with my AWS account.
  • Agree with @pvary that the test cases can be refactored into a common class that the other test classes extend to avoid duplication. But this can be done in a follow-up PR.

@Mehul2500 Mehul2500 requested a review from rdblue July 10, 2022 10:59
@Mehul2500
Contributor Author

@pvary and @rdblue, could you please review this PR?

Contributor

@pvary pvary left a comment

Let's see if all of us agree that it is OK to do the Catalog API test in a different PR.
If no objections arrive, we can merge this PR in a few days.

@pvary pvary merged commit 3c0fd8f into apache:master Jul 22, 2022
@pvary
Contributor

pvary commented Jul 22, 2022

Merged the PR.
Thanks @Mehul2500 for the work, and @ajantha-bhat for the review.

@Mehul2500, if you have time, please go ahead with the Catalog API test PR.

Thanks,
Peter

Successfully merging this pull request may close these issues.

Nessie: Implement Catalog.registerTable() for NessieCatalog
7 participants