WIP thread cpu time #175

Closed
wants to merge 10 commits into from

nickrobinson251

PR Description

What does this PR do?

WIP on https://relationalai.atlassian.net/browse/RAI-29088

Checklist

Requirements for merging:

  • I have opened an issue or PR upstream on JuliaLang/julia: <link to JuliaLang/julia>
  • I have removed the port-to-* labels that don't apply.
  • I have opened a PR on raicode to test these changes:

github-actions bot added the port-to-v1.10 and port-to-master labels Aug 28, 2024
uint64_t scheduler_time;
uint64_t lock_spin_time;
uint64_t gc_time;
} jl_timing_tls_states_t;

TODO: should this be like GC_Num and have a corresponding struct on the Julia side, so on that side we work with the struct rather than individual numbers?

downside: a user-facing struct can't be extended later without breaking callers, so we probably just want to expose functions that return numbers... we could still pass the data from C to the Julia side as a struct, but idk if that gains us much tbh
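
One way to picture the "functions that return numbers" option is a single accessor keyed by a selector, so the struct layout never leaks to callers. This is only a sketch: the `example_*` names and the selector enum are hypothetical and not part of the PR; only the field names are taken from the struct in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal layout; field names follow the struct in the diff. */
typedef struct {
    uint64_t start_time;
    uint64_t sleep_time;
    uint64_t scheduler_time;
    uint64_t lock_spin_time;
    uint64_t gc_time;
} example_timing_t;

/* Hypothetical selector: adding a stat later only appends an enumerator. */
typedef enum {
    EXAMPLE_STAT_SLEEP,
    EXAMPLE_STAT_SCHEDULER,
    EXAMPLE_STAT_LOCK_SPIN,
    EXAMPLE_STAT_GC
} example_stat_t;

/* Return one counter; unknown selectors yield 0. */
uint64_t example_timing_get(const example_timing_t *t, example_stat_t which)
{
    assert(t != NULL);
    switch (which) {
    case EXAMPLE_STAT_SLEEP:     return t->sleep_time;
    case EXAMPLE_STAT_SCHEDULER: return t->scheduler_time;
    case EXAMPLE_STAT_LOCK_SPIN: return t->lock_spin_time;
    case EXAMPLE_STAT_GC:        return t->gc_time;
    }
    return 0;
}
```

With this shape, callers only ever see a `uint64_t`, so new counters can be added without changing anything they depend on.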

base/timing.jl Outdated
thread_up_time() = ccall(:jl_thread_up_time, UInt64, ())
thread_user_time() = ccall(:jl_thread_user_time, UInt64, ())
# thread_user_time(tid::Integer) = ccall(:jl_thread_user_time, UInt64, (Cint,), Cint(tid))
# function thread_user_time(pool::Symbol=:all)

i'm thinking an interface like this makes sense, where you can optionally get stats per threadpool?

we also need to change this to be able to return specific stats (like sleep_time, gc_time etc.) rather than just user_time

and planning to do the aggregation on the Julia side, as you can see
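
The per-threadpool aggregation described here is planned for the Julia side; the sketch below only illustrates the shape of that aggregation, written in C. The sample type, the integer pool ids, and `EXAMPLE_POOL_ALL` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "sum over every pool". */
#define EXAMPLE_POOL_ALL (-1)

/* One per-thread reading of some stat (gc_time, sleep_time, ...). */
typedef struct {
    int pool;        /* hypothetical threadpool id */
    uint64_t value;  /* the stat read from that thread */
} example_thread_sample_t;

/* Sum the stat over all threads in `pool`, or over all threads. */
uint64_t example_sum_stat(const example_thread_sample_t *samples, size_t n, int pool)
{
    assert(samples != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (pool == EXAMPLE_POOL_ALL || samples[i].pool == pool)
            total += samples[i].value;
    }
    return total;
}
```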

@@ -523,6 +527,7 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *trypoptask, jl_value_t *q,
assert(jl_atomic_load_relaxed(&ptls->sleep_check_state) == not_sleeping);
uv_mutex_unlock(&ptls->sleep_lock);
JULIA_DEBUG_SLEEPWAKE( ptls->sleep_leave = cycleclock() );
ptls->timing_tls.sleep_time += jl_hrtime() - tsleep0;

currently sleep_time is a subset of scheduler_time (so we might want to rename scheduler_time to make that clear, or do the extra accounting so that we stop accumulating scheduler_time when we start accumulating sleep_time?)
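
A minimal sketch of the "extra accounting" option, where time asleep is charged to sleep_time only and the rest of the scheduler pass goes to scheduler_time. The helper and its parameters are hypothetical; the PR itself accumulates inline with `jl_hrtime()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t sleep_time;
    uint64_t scheduler_time;
} example_sched_t;

/* Charge one pass through the scheduler. [sched_start, sched_end) spans the
 * whole pass; `slept` is the portion of that span spent asleep. */
void example_charge_scheduler_pass(example_sched_t *c, uint64_t sched_start,
                                   uint64_t sched_end, uint64_t slept)
{
    assert(c != NULL && sched_end >= sched_start);
    uint64_t total = sched_end - sched_start;
    assert(slept <= total);
    c->sleep_time += slept;
    c->scheduler_time += total - slept;  /* exclusive of sleep */
}
```

With exclusive counters the two fields can be summed or subtracted independently, at the cost of one more subtraction per pass.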

src/threading.c Outdated
{
jl_ptls_t ptls = jl_current_task->ptls;
jl_timing_tls_states_t *timing = &ptls->timing_tls;
return jl_thread_up_time() - timing->gc_time - timing->lock_spin_time;

should be

Suggested change
- return jl_thread_up_time() - timing->gc_time - timing->lock_spin_time;
+ return jl_thread_up_time() - timing->gc_time - timing->lock_spin_time - timing->scheduler_time;

but also maybe this isn't the right API, and we should instead have a jl_thread_timing_stats(int tid) that populates a struct and do all the arithmetic on the Julia side
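
The arithmetic in the suggested change can be written against a populated stats struct roughly as follows. This is a sketch, not the PR's API: `example_thread_stats_t` and `example_user_time` are hypothetical, and the clamp to zero is an added safety choice. sleep_time is not subtracted separately because it is currently a subset of scheduler_time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one thread's counters. */
typedef struct {
    uint64_t start_time;
    uint64_t sleep_time;      /* currently included in scheduler_time */
    uint64_t scheduler_time;
    uint64_t lock_spin_time;
    uint64_t gc_time;
} example_thread_stats_t;

/* user time = up time - gc - lock spin - scheduler, clamped at zero. */
uint64_t example_user_time(const example_thread_stats_t *s, uint64_t now)
{
    assert(s != NULL && now >= s->start_time);
    uint64_t up = now - s->start_time;
    uint64_t overhead = s->gc_time + s->lock_spin_time + s->scheduler_time;
    return overhead < up ? up - overhead : 0;
}
```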

src/threading.c Outdated
while (1) {
if (owner == NULL && jl_atomic_cmpswap(&lock->owner, &owner, self)) {
lock->count = 1;
jl_profile_lock_acquired(lock);
jl_record_lock_spin_time(jl_hrtime() - t0);

currently time spent in runtime-internal spin locks and in Julia-side SpinLocks accumulates into the same field... idk if we want to separate those (i guess some uses of Julia-side SpinLocks are "internal", not just in user code, so i'm leaning towards continuing to accumulate both into the same field)
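
If the two sources ever did need to be told apart, one low-cost middle ground would be to bucket by lock kind internally and still report a single total. The sketch below assumes a hypothetical `example_lock_kind_t` tag passed at the recording site; nothing like it exists in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for where the spinning happened. */
typedef enum {
    EXAMPLE_LOCK_RUNTIME,         /* runtime-internal spin locks */
    EXAMPLE_LOCK_JULIA_SPINLOCK,  /* Julia-side SpinLock */
    EXAMPLE_LOCK_KINDS
} example_lock_kind_t;

typedef struct {
    uint64_t spin_time[EXAMPLE_LOCK_KINDS];
} example_lock_stats_t;

/* Add `elapsed` to the bucket for `kind`. */
void example_record_spin(example_lock_stats_t *s, example_lock_kind_t kind,
                         uint64_t elapsed)
{
    assert(s != NULL);
    assert((int)kind >= 0 && (int)kind < EXAMPLE_LOCK_KINDS);
    s->spin_time[kind] += elapsed;
}

/* The single combined number, equivalent to one shared field. */
uint64_t example_total_spin(const example_lock_stats_t *s)
{
    assert(s != NULL);
    uint64_t total = 0;
    for (int k = 0; k < EXAMPLE_LOCK_KINDS; k++)
        total += s->spin_time[k];
    return total;
}
```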

static uint64_t jl_thread_start_time;
void jl_set_thread_start_time(void)
{
jl_thread_start_time = jl_hrtime();

this is a global shared by all threads, which technically isn't correct since threads will start at very slightly different times, but i think this is fine at least for a first pass?
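
For a later pass, a per-thread start time could look roughly like this. The sketch uses C11 `_Thread_local` and takes the timestamp as a parameter so it stays deterministic; the `example_*` names are hypothetical, and in the runtime the value would more likely live in the per-thread timing struct.

```c
#include <assert.h>
#include <stdint.h>

/* Each thread gets its own copy, set once when the thread starts. */
static _Thread_local uint64_t example_thread_start_time;

void example_set_thread_start_time(uint64_t now)
{
    example_thread_start_time = now;
}

/* Up time of the calling thread, measured from its own start. */
uint64_t example_thread_up_time(uint64_t now)
{
    assert(now >= example_thread_start_time);
    return now - example_thread_start_time;
}
```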

return task;

}

this is the fast-path for task-switches (i think?), but i think we've concluded this shouldn't add too much overhead (given jl_hrtime is a vdso call)... still need to verify that experimentally though
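
A rough way to check the overhead experimentally is to time a tight loop of clock reads. The sketch below assumes a POSIX system and uses `clock_gettime(CLOCK_MONOTONIC, ...)` as a stand-in for `jl_hrtime()`; whether the two cost the same is itself an assumption to verify, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock read in nanoseconds (stand-in for jl_hrtime). */
static uint64_t example_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Average cost of one clock read over `iterations` back-to-back calls. */
double example_avg_clock_read_ns(uint32_t iterations)
{
    assert(iterations > 0);
    volatile uint64_t sink = 0;  /* keeps the loop from being optimized away */
    uint64_t t0 = example_now_ns();
    for (uint32_t i = 0; i < iterations; i++)
        sink += example_now_ns();
    uint64_t t1 = example_now_ns();
    (void)sink;
    return (double)(t1 - t0) / (double)iterations;
}
```

The number this produces is only a lower bound on the real cost in the task-switch path, since a tight loop keeps everything hot in cache.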

@@ -3798,6 +3798,8 @@ JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
jl_safepoint_end_gc();
jl_gc_state_set(ptls, old_state, JL_GC_STATE_WAITING);
JL_PROBE_GC_END();
// Time how long GC took.
ptls->timing_tls.gc_time += jl_hrtime() - t1;

GC time for the thread coordinating the GC

@@ -173,6 +176,7 @@ void jl_safepoint_wait_gc(void)
uv_cond_wait(&safepoint_cond, &safepoint_lock);
uv_mutex_unlock(&safepoint_lock);
}
ptls->timing_tls.gc_time += jl_hrtime() - t0;

GC time for the other threads
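
Since the same thread can coordinate one collection and wait at the safepoint during another, both sites would feed one per-thread counter. A minimal sketch, assuming both sites are meant to accumulate rather than overwrite; the helper is hypothetical and takes timestamps as parameters instead of calling `jl_hrtime()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t gc_time;
} example_gc_stats_t;

/* Add one GC interval to the thread's running total, whether the thread
 * coordinated the collection or waited for it at the safepoint. */
void example_charge_gc(example_gc_stats_t *s, uint64_t begin, uint64_t end)
{
    assert(s != NULL && end >= begin);
    s->gc_time += end - begin;
}
```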

uint64_t start_time;
uint64_t sleep_time;
uint64_t scheduler_time;
uint64_t lock_spin_time;

to do: add compile_time (which could be a subset of lock_spin_time i guess, or we could stop accumulating lock_spin_time when compile_time starts?)

in future maybe we could split lock_spin_time to have timing for a few important internal locks (like the codegen_lock) but i think that can be follow-up work?

nickrobinson251 pushed a commit that referenced this pull request Sep 11, 2024
Stdlib: Tar
URL: https://github.com/JuliaIO/Tar.jl.git
Stdlib branch: master
Julia branch: master
Old commit: 81888a3
New commit: 1114260
Julia version: 1.12.0-DEV
Tar version: 1.10.0 (Does not match)
Bump invoked by: @StefanKarpinski
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaIO/Tar.jl@81888a3...1114260

```
$ git log --oneline 81888a3..1114260
1114260 Accept other string types for all string arguments (fix #179) (#180)
a2e39d6 Bump julia-actions/cache from 1 to 2 (#178)
152d12e Bump julia-actions/setup-julia from 1 to 2 (#177)
5012536 Fix Codecov (#176)
9b5460b Add `public` declarations using `eval` (#175)
4e9d73a Add docstring for Tar module (#173)
38a4bf4 Bump codecov/codecov-action from 3 to 4 (#172)
166deb3 [CI] Switch to `julia-actions/cache` (#171)
d0085d8 Hardcode doc edit backlink (#164)
7e83ed7 [NFC] fix some wonky formatting (#168)
6269b5b Bump actions/checkout from 3 to 4 (#163)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
d-netto mentioned this pull request Sep 20, 2024