Skip to content

Conversation

iximeow
Copy link

@iximeow iximeow commented May 21, 2025

this is a rather broad change, but the overall intention is to move f.seq_hdr.as_ref().unwrap() and f.frame_hdr.as_ref().unwrap() as far up the call tree as the highest unconditional execution of that statement. in most cases these unwraps are truly dead due to their equivalent execution by some caller several frames up, and in some functions (like decode_b and decode_sb) these are redundantly checked many times.

there doesn't seem to be a substantial effect on runtime or instruction count here, but this does delete many unreachable unwrap()s and might help legibility some. this also reduces a release build's size by about 2.6kb.

i didn't see any point that branches the Some-ness of these Arcs, which is particularly interesting. looks like that's all intended to be decided somewhere higher up when scheduling work? it's not exactly clear to me what a principled boundary would look like - "functions in thread_task.rs may not rely on header presence, called functions in the rest of rav1d should be able to"?

this is a pretty rough patch as-is, mostly wondering what your interest level is here! there's a somewhat more meaningful change to be had by grouping {Rav1dContext, Rav1dTaskContext, Rav1dFrameData(WithHeaders)} together into a single parameter, because the combination of number of parameters, number of calls, and inner calls not being tail calls, does add up to a mild (like.. ~0.5%) amount of overall overhead.

this is a rather broad change, but the overall intention is to move
`f.seq_hdr.as_ref().unwrap()` and `f.frame_hdr.as_ref().unwrap()` as far
up the call tree as the highest unconditional execution of that
statement. in most cases these unwraps are truly dead due to their
equivalent execution by some caller several frames up, and in some
functions (like decode_b and decode_sb) these are redundantly checked
many times.

there doesn't seem to be a substantial effect on runtime or instruction
count here, but this does delete many unreachable unwrap()s and might
help legibility some.
let ss_ver = chroma & (f.sr_cur.p.p.layout == Rav1dPixelLayout::I420) as c_int;
let ss_hor = chroma & (f.sr_cur.p.p.layout != Rav1dPixelLayout::I444) as c_int;
let frame_hdr = &***f.frame_hdr.as_ref().unwrap();
let frame_hdr = &f.frame_hdr;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and an obviousish second step to improve this patch would be to clean up this kinda thing. emphasis on "do the broad strokes seem useful" :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which lines are you referring to? The ss_ver, ss_hor stuff? Or the f.frame_hdr stuff you changed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, all the random bindings of f.frame_hdr and f.seq_hdr that are scattered around. in most cases it seems better to just write f.frame_hdr instead of frame_hdr at the use sites.

@djc
Copy link
Collaborator

djc commented May 23, 2025

I think it's a great idea to statically clarify whether these values are expected to be there via the type system. I don't really love the elaborate Rav1dDataFrameWithHeaders name -- and maybe the "burden of ugly name" should be the other way -- that is, the data frame with maybe headers should be named as such?

Comment on lines +891 to +896
// Safety: as an implementation detail of Rust today, Arc<T> is just a glorified
// NonNull<ArcInner<T>>. this will be niche elided into a `Some` when transmuted to
// `Option<Arc<T>>`. this is not guaranteed to remain so, and `Arc<T, A>` with non-ZST
// allocators will trivially violate this.
//
// to repeat: safe only as an implementation detail of Rust to date.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was curious if the UCG might have something more concrete to cite as i was writing this and instead realized the Option::Some(Arc<T>) <=> Arc<T> games really only work as an implementation detail instead. this takes advantage of two things that are not guaranteed:

  • repr(Rust) structs with one non-ZST field have the same layout as that field (e.g. Arc<T> hast the layout of a NonNull<ArcInner<T>>)
  • Arc here will be allocated with a ZST allocator (such as System, Global, etc)

i don't think this can be made to work in a stable way, unfortunately.

Comment on lines +865 to +868
// Safety: we just asserted that seq_hdr and frame_hdr are Some, and Option<Arc<T>> has the
// same layout as Arc<T>. further, `Rav1dFrameData` and `Rav1dFrameDataWithHeaders` have
// the same layout with repr(C) ensuring their field ordering is identical.
unsafe { std::mem::transmute(self) }
Copy link
Author

@iximeow iximeow May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing that would be stable here is to have the two FrameData{,MaybeHeaders} structs hold *const DRav1d<...> and Option<*const DRav1d<...>> respectively, and access them through methods that turn them into refs as appropriate. it would almost be fine, but it's still unfortunately hairy in at least one case i'll note below...

Comment on lines 4943 to 4944
let _ = mem::take(&mut f.seq_hdr);
let _ = mem::take(&mut f.frame_hdr);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any time the seq_hdr/frame_hdr fields are taken from like this, it would be absolutely critical to reconstruct the Arc<T> if these were Some so as to not leak the header refcount. seems pretty subtle and easy to mistake. might be less bad with a Rav1dFrameDataMaybeData::drop_state() or something, but ouch.

Comment on lines 5085 to 5086
let _ = mem::take(&mut f.seq_hdr);
let _ = mem::take(&mut f.frame_hdr);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same "easy to improperly discard *_hdr" issue here. though, on_error probably ought to use the same hypothetical Rav1dFrameData::drop_state.

@iximeow
Copy link
Author

iximeow commented May 26, 2025

that is, the data frame with maybe headers should be named as such?

gave a swing at better names, and unfortunately after going and fixing up all the references i realized the Safety claim isn't defensible as-is. @djc i've left some comments about how i think it could still be made to work, but i'd defer to you and other maintainers on if this is too fiddly.

@kkysen kkysen self-requested a review May 27, 2025 07:09
Copy link
Collaborator

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I took a while to look through this.

there doesn't seem to be a substantial effect on runtime or instruction count here, but this does delete many unreachable unwrap()s and might help legibility some. this also reduces a release build's size by about 2.6kb.

So if this doesn't improve performance noticeably, it's definitely going to have to be sound and not depend on std internals. It is a nice cleanup, though, and if we can figure out a way to do it without transmuting, then it'd be a good idea.

this is a pretty rough patch as-is, mostly wondering what your interest level is here! there's a somewhat more meaningful change to be had by grouping {Rav1dContext, Rav1dTaskContext, Rav1dFrameData(WithHeaders)} together into a single parameter, because the combination of number of parameters, number of calls, and inner calls not being tail calls, does add up to a mild (like.. ~0.5%) amount of overall overhead.

This sounds more promising. This is just from the number of calls and stack usage?

let ss_ver = chroma & (f.sr_cur.p.p.layout == Rav1dPixelLayout::I420) as c_int;
let ss_hor = chroma & (f.sr_cur.p.p.layout != Rav1dPixelLayout::I444) as c_int;
let frame_hdr = &***f.frame_hdr.as_ref().unwrap();
let frame_hdr = &f.frame_hdr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which lines are you referring to? The ss_ver, ss_hor stuff? Or the f.frame_hdr stuff you changed?


// Safety: we just asserted that seq_hdr and frame_hdr are Some, and Option<Arc<T>> has the
// same layout as Arc<T>. further, `Rav1dFrameData` and `Rav1dFrameDataWithHeaders` have
// the same layout with repr(C) ensuring their field ordering is identical.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be beneficial to later remove the #[repr(C)]s from a lot of these types (see #1339), so in general it'd be best to not depend on it if we don't need to.

src/internal.rs Outdated
}
}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +849 to +853
pub(crate) struct Rav1dFrameData {
pub seq_hdr: Arc<DRav1d<Rav1dSequenceHeader, Dav1dSequenceHeader>>,
pub frame_hdr: Arc<DRav1d<Rav1dFrameHeader, Dav1dFrameHeader>>,
pub content: Rav1dFrameContent,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you just made each of these fields references?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could work, and i think i see why you ask - it's a short path from there to a change like this without a transmute involved. i was hesitant to do that because of the extra pointer chasing, though it's probably not actually all that bad.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this works, we can also remove the Arc<DRav1d<R, D>> parts and just store &R, so we won't have the Arc deref overhead (although that will hopefully go away soon).

@kkysen kkysen added refactoring Cleaning up transpiled code performance labels Jun 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance refactoring Cleaning up transpiled code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants