Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#4018] feat(core): Add tag management logic for Tag System (Part 1) #4019

Merged
merged 15 commits into from
Jul 9, 2024

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This PR tracks the work of adding the core logics for tag management.

Why are the changes needed?

This is a part of work for adding tag support in Gravitino.

Fix: #4018

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UTs added.

@jerryshao jerryshao self-assigned this Jul 2, 2024
public void insertTag(TagEntity tagEntity, boolean overwritten) throws IOException {
Namespace ns = tagEntity.namespace();
String metalakeName = ns.level(0);
AtomicReference<Long> metalakeId = new AtomicReference<>(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any thread-safe problem here? If not, I think Long[] metalakeId = new Long[] {metalakeId} may be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just a preference, I think both are fine.

@jerryshao jerryshao changed the title [#4018] feat(core): Add tag management logic for Tag System [#4018] feat(core): Add tag management logic for Tag System (Part 1) Jul 2, 2024
+ MetalakeMetaMapper.TABLE_NAME
+ " mm on tm.metalake_id = mm.metalake_id"
+ " WHERE mm.metalake_name = #{metalakeName} AND tm.deleted_at = 0")
List<TagPO> listTagPOsByMetalake(@Param("metalakeName") String metalakeName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If metalake doesn't exist, it will return empty collection instead of throwing NoSuchMetalakeException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I have already checked this outside in TagManager.
  2. Even in the other implementation you will still get an empty collection if it happens that after the check and before the select the metalake is deleted.

Copy link
Contributor

@qqqttt123 qqqttt123 Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to put the check into one transaction in later pull request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hesitate to do so, it is overkill to put the check in one transaction. Also, I was thinking about the behavior of the list, for example, if we're going to list the entities under a non-existent namespace, should it return an empty list or throw a namespace not exist exception? Both are reasonable, I need to think a bit about this.

@jerryshao
Copy link
Contributor Author

@xloya @yuqi1129 @lw-yang can you please help me to review this? I use the transactions here for multiple step queries, I'm not so sure if it is correct or not, also I'm very sure we can really keep the consistency, or we have to introduce the lock here.


try {
SessionUtils.doMultipleWithCommit(
() -> metalakeId.set(MetalakeMetaService.getInstance().getMetalakeIdByName(metalakeName)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that select... cannot block the delete operation, so we need to have a metalake lock here to avoid user deleting the metalake while we're querying it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to we need use select xxx for update to obtain the write lock here, I'm not very it will cause dead lock or other problem or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One choice is to use select...for update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to avoid using tag lock or tree lock, instead using transactions to keep consistency, but seems it is not achievable, still we have to introduce lock here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the current design, I think we'd better add locks in the upper layer to avoid duplication of internal locks and database locks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is a upper lock mechanism like tree lock, using it is a reasonable solution.

+ " + EXTRACT(MICROSECOND FROM CURRENT_TIMESTAMP(3)) / 1000"
+ " WHERE tmo.tag_id IN (SELECT tm.tag_id FROM "
+ TagMetaMapper.TAG_TABLE_NAME
+ " tm WHERE tm.metalake_id IN (SELECT mm.metalake_id FROM "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using JOIN in the sub queries here? As far as I know, the performance of the IN condition will be poor when the amount of data is large, so we'd better try to reduce the use of IN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H2 doesn't support JOIN, so I use subquery here.


try {
SessionUtils.doMultipleWithCommit(
() -> metalakeId.set(MetalakeMetaService.getInstance().getMetalakeIdByName(metalakeName)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the current design, I think we'd better add locks in the upper layer to avoid duplication of internal locks and database locks.

UNIQUE KEY `uk_ti_mi_del` (`tag_id`, `metadata_object_id`, `deleted_at`),
KEY `idx_tid` (`tag_id`),
KEY `idx_mid` (`metadata_object_id`)
Copy link
Collaborator

@xloya xloya Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any query using this condition in the current SQL, so it seems that this index is unnecessary now. But we can also reserve this index if we are sure that we will need to query based on this condition in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be used in the next PR.

@jerryshao
Copy link
Contributor Author

@yuqi1129 @xloya @lw-yang please help to review again.

@yuqi1129
Copy link
Contributor

yuqi1129 commented Jul 8, 2024

Overall, LGTM is good except for a minor issue. Please let me know if it has been resolved or if there is any doubt.

Copy link
Contributor

@yuqi1129 yuqi1129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yuqi1129
Copy link
Contributor

yuqi1129 commented Jul 8, 2024

@qqqttt123 @lw-yang
Do you have any additional comments on the matter?

@lw-yang
Copy link
Contributor

lw-yang commented Jul 9, 2024

LGTM

1 similar comment
@xloya
Copy link
Collaborator

xloya commented Jul 9, 2024

LGTM

@jerryshao jerryshao merged commit 178eb37 into apache:main Jul 9, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Add core logic for tag management (part 1)
5 participants