Skip to content

Add GC-safe regions around some ccalls #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 28, 2025

Conversation

kpamnany
Copy link
Contributor

@kpamnany kpamnany commented Oct 25, 2023

Specifically, around LZ4_compress_fast, LZ4_compress_destSize and LZ4_decompress_safe. If these are called on large data (on the order of GBs), the calls could take multiple seconds during which GC cannot run without these GC-safe regions.

Ref: JuliaLang/julia#51574

Please note that when JuliaLang/julia#49933 lands, these changes should be subsumed by whatever that requires.

@kpamnany
Copy link
Contributor Author

CI failure seems unrelated?

Copy link

@kuszmaul kuszmaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I suggested some changes to the capitalization of comments to make them consistent with the other comments in the file.

@kpamnany
Copy link
Contributor Author

This needs some additional tweaks for correctness -- coming shortly.

@kpamnany kpamnany force-pushed the kp/gc-safe-some-cccalls branch 2 times, most recently from adf1dc8 to 0ecf4f8 Compare October 26, 2023 16:49
@kpamnany
Copy link
Contributor Author

This is ready to go. Thanks to @vchuravy for his input!

@kpamnany
Copy link
Contributor Author

kpamnany commented Nov 2, 2023

Bump. Attn: @ViralBShah or @omus or @sjkelly.

@ViralBShah
Copy link
Contributor

ViralBShah commented Nov 2, 2023

@kpamnany invited you as a member of the org. Please merge and tag. Is the windows failure relevant? I think earlier PRs were not failing on windows.

@kpamnany
Copy link
Contributor Author

kpamnany commented Nov 2, 2023

The segfault is coming from within this ccall. I haven't touched this part of the code, but the caller of that function, here, is playing fast and loose with pointer calls. Presumably there is a missing GC.@preserve? Would love some more eyes on this... @vtjnash, please advise?

@vtjnash
Copy link
Contributor

vtjnash commented Nov 2, 2023

There does not appear to be any uses of out_buffer (aside from pointer), so there is nothing that requires the GC to bother allocating it. It is pretty risky calling pointer in that caller, as then there is nothing for GC.@preserve to preserve later in the callee.

@kpamnany kpamnany force-pushed the kp/gc-safe-some-cccalls branch from 0ecf4f8 to db79d9c Compare November 3, 2023 00:19
@kpamnany
Copy link
Contributor Author

kpamnany commented Nov 3, 2023

There's a lot of pointer calls in there without any @preserves or any uses of the Array for which the pointer is taken. Right at the beginning of the file, here for instance.

Seems like this package needs some cleaning up.

@vtjnash
Copy link
Contributor

vtjnash commented Nov 3, 2023

Very strangely sketchy way to make unsafe_store slower and more unsafe

@ViralBShah
Copy link
Contributor

@vtjnash What exactly should be done?

@vtjnash
Copy link
Contributor

vtjnash commented Nov 3, 2023

Someone needs to go rewrite that code without the calls to pointer

@kpamnany
Copy link
Contributor Author

kpamnany commented Nov 6, 2023

We have this in package cleanup on our TODO list. Unfortunately, that TODO list is very long at the moment, so I don't know how soon we'll get to this. Cc: @NHDaly

@nhz2
Copy link
Member

nhz2 commented Feb 1, 2024

I have fixed a few of the pointer issues in #44 and CI is passing again. There are still more issues in the file, but this is a start. I don't have merge permissions, so someone else needs to merge this. CC: @vtjnash @kpamnany

@ViralBShah
Copy link
Contributor

@nhz2 I have invited you to the org. Can you accept and see if you are able to merge?

@nhz2
Copy link
Member

nhz2 commented Feb 1, 2024

Thanks, I am part of the JuliaIO org, but for some reason, I still see "You’re not authorized to merge this pull request." I was able to merge JuliaIO/CodecBzip2.jl#25 so there must be something special about the settings in this repo.

@ViralBShah
Copy link
Contributor

Ah. Most of the repos had only read as the default permission for the team. I updated those - and you should be able to merge now. Glad we got this fixed!

@nhz2
Copy link
Member

nhz2 commented Feb 1, 2024

I still don't have permission to merge on this repo. I see "The base branch restricts merging to authorized users. Learn more about protected branches."

@ViralBShah
Copy link
Contributor

ViralBShah commented Feb 1, 2024

I don't see you as a member of JuliaIO, and it says invitation pending. In fact there were two past invitations that timed out. I don't have a direct link I can send you for accepting the invitation, as far as I can tell.

@ViralBShah
Copy link
Contributor

Never mind - I was misspelling your id. I have made you owner.

@KristofferC
Copy link
Member

I merged that PR for now. Tell me if we should make a new version as well.

@ViralBShah
Copy link
Contributor

Might be good to get this one in and then release, but @nhz2 may be the best person to suggest.

@nhz2
Copy link
Member

nhz2 commented Feb 2, 2024

I'll release a new version to get #43 in and fix PkgEval. I'm not sure what jl_gc_safe_enter does exactly, and couldn't find any docs for it, so I'm not the person to review if this PR is ready.

@ViralBShah
Copy link
Contributor

@vtjnash Can you review and possibly fix this PR?

Specifically, around `LZ4_compress_fast`, `LZ4_compress_destSize` and
`LZ4_decompress_safe`. If these are called on large data (on the order
of GBs), the calls could take multiple seconds during which GC cannot
run without these GC-safe regions.
@kpamnany kpamnany force-pushed the kp/gc-safe-some-cccalls branch from abcacee to e31fa66 Compare March 13, 2025 17:04
Copy link

codecov bot commented Mar 13, 2025

Codecov Report

Attention: Patch coverage is 97.05882% with 1 line in your changes missing coverage. Please review.

Project coverage is 98.10%. Comparing base (b3319f7) to head (4257ece).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/gcsafe_ccall.jl 96.77% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
- Coverage   98.21%   98.10%   -0.11%     
==========================================
  Files           8        9       +1     
  Lines         392      423      +31     
==========================================
+ Hits          385      415      +30     
- Misses          7        8       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kpamnany
Copy link
Contributor Author

kpamnany commented Mar 13, 2025

Sorry for leaving this PR hanging. I just updated it.

JuliaLang/julia#49933 has landed, but will only be in 1.12 so this still makes sense to do.

Okay with merging @nhz2?

@vchuravy
Copy link

GPUToolbox.jl is fairly lightweight and has an implementation of the macro that will be compatible with 1.12

@kpamnany
Copy link
Contributor Author

Feels weird to add that dependency just for the macro though.

@vchuravy
Copy link

Feel free to internalize it or copy it into Compat.jl

@nhz2
Copy link
Member

nhz2 commented Mar 16, 2025

I have questions about using the @gcsafe_ccall macro in other Codec packages.

  1. Is it ok to use @gcsafe_ccall when the C function may call Julia functions for memory management, like in https://github.com/JuliaIO/CodecBzip2.jl/blob/bbccc86dc4577193cc5236516a1f4cf33cadf9b1/src/libbzip2.jl#L41-L42 ?

  2. How should the calling convention be specified? libbzip2 needs stdcall on WIN32.

Copied from GPUToolbox.jl.
@kpamnany
Copy link
Contributor Author

I've updated this PR to copy in @gsafe_ccall from GPUToolbox.jl as Kristoffer's point in the Compat.jl PR made sense to me -- better to duplicate a little code than to pull in unnecessary dependencies.

@nhz2: if you are ccalling a function that calls back into Julia, that is not GC-safe.

@vchuravy: can I get a little help with the CI failures on Julia 1.6 please? I do not know how @inline has changed. Also, I do not know the answer to @nhz2's second question.

@kpamnany kpamnany force-pushed the kp/gc-safe-some-cccalls branch from 0ce15b6 to d1e5e08 Compare March 27, 2025 22:40
@kpamnany
Copy link
Contributor Author

Turns out that support for using @inline as a call-site annotation was only added in 1.8. And I don't think I can do anything more about code coverage.

So this is good to go, AFAICT. Would love to have someone take a quick look before merging.

@KristofferC
Copy link
Member

KristofferC commented Mar 28, 2025

Would it make sense to add 1.12-nightly and nightly to the CI matrix since there are some different branches that will be hit for those?

@kpamnany
Copy link
Contributor Author

Apart from that, looks okay to you Kristoffer?

@KristofferC KristofferC merged commit 1a5932d into JuliaIO:master Mar 28, 2025
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants