-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#4018] feat(core): Add tag management logic for Tag System (Part 1) #4019
Conversation
public void insertTag(TagEntity tagEntity, boolean overwritten) throws IOException { | ||
Namespace ns = tagEntity.namespace(); | ||
String metalakeName = ns.level(0); | ||
AtomicReference<Long> metalakeId = new AtomicReference<>(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any thread-safe problem here? If not, I think Long[] metalakeId = new Long[] {metalakeId}
may be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a preference, I think both are fine.
core/src/main/java/com/datastrato/gravitino/storage/relational/service/TagMetaService.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/datastrato/gravitino/tag/TagManager.java
Outdated
Show resolved
Hide resolved
+ MetalakeMetaMapper.TABLE_NAME | ||
+ " mm on tm.metalake_id = mm.metalake_id" | ||
+ " WHERE mm.metalake_name = #{metalakeName} AND tm.deleted_at = 0") | ||
List<TagPO> listTagPOsByMetalake(@Param("metalakeName") String metalakeName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If metalake doesn't exist, it will return empty collection instead of throwing NoSuchMetalakeException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I have already checked this outside in TagManager.
- Even in the other implementation you will still get an empty collection if it happens that after the check and before the select the metalake is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to put the check into one transaction in later pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hesitate to do so, it is overkill to put the check in one transaction. Also, I was thinking about the behavior of the list, for example, if we're going to list the entities under a non-existent namespace, should it return an empty list or throw a namespace not exist exception? Both are reasonable, I need to think a bit about this.
core/src/main/java/com/datastrato/gravitino/storage/relational/mapper/TagMetaMapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/datastrato/gravitino/storage/relational/service/TagMetaService.java
Outdated
Show resolved
Hide resolved
|
||
try { | ||
SessionUtils.doMultipleWithCommit( | ||
() -> metalakeId.set(MetalakeMetaService.getInstance().getMetalakeIdByName(metalakeName)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that select...
cannot block the delete
operation, so we need to have a metalake lock here to avoid user deleting the metalake while we're querying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to we need use select xxx for update
to obtain the write lock here, I'm not very it will cause dead lock or other problem or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One choice is to use select...for update
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to avoid using tag lock or tree lock, instead using transactions to keep consistency, but seems it is not achievable, still we have to introduce lock here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the current design, I think we'd better add locks in the upper layer to avoid duplication of internal locks and database locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is a upper lock mechanism like tree lock, using it is a reasonable solution.
+ " + EXTRACT(MICROSECOND FROM CURRENT_TIMESTAMP(3)) / 1000" | ||
+ " WHERE tmo.tag_id IN (SELECT tm.tag_id FROM " | ||
+ TagMetaMapper.TAG_TABLE_NAME | ||
+ " tm WHERE tm.metalake_id IN (SELECT mm.metalake_id FROM " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using JOIN
in the sub queries here? As far as I know, the performance of the IN
condition will be poor when the amount of data is large, so we'd better try to reduce the use of IN
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H2 doesn't support JOIN, so I use subquery here.
...main/java/com/datastrato/gravitino/storage/relational/mapper/TagMetadataObjectRelMapper.java
Outdated
Show resolved
Hide resolved
|
||
try { | ||
SessionUtils.doMultipleWithCommit( | ||
() -> metalakeId.set(MetalakeMetaService.getInstance().getMetalakeIdByName(metalakeName)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the current design, I think we'd better add locks in the upper layer to avoid duplication of internal locks and database locks.
UNIQUE KEY `uk_ti_mi_del` (`tag_id`, `metadata_object_id`, `deleted_at`), | ||
KEY `idx_tid` (`tag_id`), | ||
KEY `idx_mid` (`metadata_object_id`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any query using this condition in the current SQL, so it seems that this index is unnecessary now. But we can also reserve this index if we are sure that we will need to query based on this condition in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be used in the next PR.
Overall, LGTM is good except for a minor issue. Please let me know if it has been resolved or if there is any doubt. |
core/src/main/java/com/datastrato/gravitino/tag/TagManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@qqqttt123 @lw-yang |
LGTM |
1 similar comment
LGTM |
What changes were proposed in this pull request?
This PR tracks the work of adding the core logics for tag management.
Why are the changes needed?
This is a part of work for adding tag support in Gravitino.
Fix: #4018
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UTs added.