Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove synchronized blocks in Thread isAlive and isDead methods #20415

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

babsingh
Copy link
Contributor

@babsingh babsingh commented Oct 25, 2024

Made threadRef volatile so that the most up-to-date value is seen
by all the threads.

threadRef = NO_REF assignment will happen once, and similarly, the
started field is also set once during initialization.

A user of these methods will probably invoke them multiple times
until the desired result is achieved. A delay in getting the most
up-to-date value should not hinder the functionality of these
methods.

To sum up, removing synchronization in these methods improves perf
by reducing lock contention without hindering the functionality.

Reference to PR 1FJMO7Q has been removed and a comment related to
ThreadGroup has been updated since the code seems to have evolved
since the comments were written.

Related: #20414

Signed-off-by: Babneet Singh sbabneet@ca.ibm.com

@babsingh
Copy link
Contributor Author

jenkins test sanity amac jdk17

@babsingh
Copy link
Contributor Author

babsingh commented Oct 25, 2024

Updated a typo in the commit message. The previously launched PR build should still be good: https://openj9-jenkins.osuosl.org/view/Pull%20Requests/job/PullRequest-OpenJ9/6285. There was a timeout with java/lang/Thread/IsAlive.java.IsAlive. Made threadRef volatile to resolve this issue.

@@ -746,10 +746,8 @@ public static boolean interrupted() {
* @see Thread#start
*/
public final boolean isAlive() {
synchronized (lock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have the source as to why these locks were acquired here in the first place? I suspect it has something to do with guarding the volatile variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They seem to exist since the beginning as means to protect access to shared variables. I didn't find a special reason for their use here. Based on how on how these methods will be used, they should function fine without synchronization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked in the extension repo code for j.l.Thread that we use for the more recent JDK versions, and isAlive already doesn't have the synchronized block.isDead still does though. Are you opening a patch for the extension repos as well to remove the isDead lock?

Copy link
Contributor Author

@babsingh babsingh Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you opening a patch for the extension repos as well to remove the isDead lock?

Yes (only for JDK21+).

@@ -765,9 +763,7 @@ public final boolean isAlive() {
*/
private boolean isDead() {
// Has already started, is not alive anymore, and has been removed from the ThreadGroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment (and the doc comment) suggests we are doing some type of checking w.r.t. removal from a ThreadGroup. Do we know if started indicates anything about ThreadGroup removal? Either way, while we are here we should ensure that the comment is accurate.

Copy link
Contributor Author

@babsingh babsingh Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

threadRef can be NULL before the thread starts and after it exits. started = TRUE makes sure that we only pick up the exit case and implies that thread has been removed from the ThreadGroup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be your suggestion for a rewrite?

Copy link
Contributor Author

@babsingh babsingh Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a reference to a very old issue: PR 1FJMO7Q; I believe it is no longer accessible. They might have seen an issue related to ThreadGroup and isDead method.

May be, change it to the following to clarify and highlight the implicit relationship?

Suggested change
// Has already started, is not alive anymore, and has been removed from the ThreadGroup
/* threadRef can be NO_REF before the thread starts and after it exits.
* Using started, only account for the exit case where threadRef is NO_REF.
* This assures that the thread has been removed from the ThreadGroup.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implies that thread has been removed from the ThreadGroup.

It was this implication that isn't clear to me. I'd've thought that:
// Had already started and is not alive anymore.
would alone be appropriate as the comment.

The only place I could find where the thread is removed from the ThreadGroup is here.

But that isn't through the exit() path. That is in the failure to start() path. Exit simply nulls out the group member.

Copy link
Contributor Author

@babsingh babsingh Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. It looks like the code might have changed; but, the comment was never updated. We can probably remove the reference to the old issue and ThreadGroup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I am thinking too. But I am also wondering now if we are missing a call to this.group.remove(this) during the exit sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In OpenJ9's ThreadGroup.java (JDK17-), there is a comment which is repeated twice: Dead Threads not removed from ThreadGroups. There might have been a behavior change where dead threads were no longer removed from the ThreadGroup. Also, methods like ThreadGroup.activeCount are anticipating dead threads since they are doing checks like isAlive.

In JDK21+, ThreadGroup in the extension repo no longer maintains a list of threads; instead, it relies upon Thread.getAllThreads().

ThreadGroups are not widely used and considered unsafe (ref). Since no functional issues are reported, it might not be worth investigating this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

return started && threadRef == NO_REF;
}
/* Has already started and is not alive anymore. */
return started && threadRef == NO_REF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest for readability and match with the extension repo versions:
return (started && (threadRef == NO_REF))

Made threadRef volatile so that the most up-to-date value is seen
by all the threads.

threadRef = NO_REF assignment will happen once, and similarly, the
started field is also set once during initialization.

A user of these methods will probably invoke them multiple times
until the desired result is achieved. A delay in getting the most
up-to-date value should not hinder the functionality of these
methods.

To sum up, removing synchronization in these methods improves perf
by reducing lock contention without hindering the functionality.

Reference to PR 1FJMO7Q has been removed and a comment related to
ThreadGroup has been updated since the code seems to have evolved
since the comments were written.

Related: eclipse-openj9#20414

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
@ThanHenderson
Copy link
Contributor

lgtm

@tajila
Copy link
Contributor

tajila commented Oct 26, 2024

jenkins test sanity alinux64 jdk17

@tajila
Copy link
Contributor

tajila commented Oct 26, 2024

jenkins test extended.system zlinux64 jdk17

@tajila
Copy link
Contributor

tajila commented Oct 26, 2024

jenkins test extended.functional plinux64 jdk17

@tajila
Copy link
Contributor

tajila commented Oct 26, 2024

jenkins test extended.system zlinux jdk17

@tajila
Copy link
Contributor

tajila commented Oct 26, 2024

jenkins test extended.functional plinux jdk17

@tajila
Copy link
Contributor

tajila commented Oct 28, 2024

Im okay with these changes. Just want to double check that we are not regressing on an issue I recall from before where there was a race condition with a thread dying and interrupt() or something like that being called which results it an intermittent crash.

@tajila
Copy link
Contributor

tajila commented Oct 28, 2024

Actually, it was probably getStackTrace and the thread dying. Anyways, Ill see if I can find the test.

@ThanHenderson
Copy link
Contributor

The other uses of locks that wrap isAlive() weren't removed e.g.

synchronized (lock) {
if (!isAlive()) {
return EMPTY_STACK_TRACE;
}
t = getStackTraceImpl();
}
return J9VMInternals.getStackTrace(t, false);

So it's likely that the test you are referring to wouldn't regress.

@tajila
Copy link
Contributor

tajila commented Oct 29, 2024

I found the test, it was an issue with throwing an exception while the thread was dying, so its unaffected byt these changes.

@tajila tajila merged commit 17ee5e7 into eclipse-openj9:master Oct 29, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants