-
Notifications
You must be signed in to change notification settings - Fork 60
Be stricter about sequence and frame header presence #1407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
this is a rather broad change, but the overall intention is to move `f.seq_hdr.as_ref().unwrap()` and `f.frame_hdr.as_ref().unwrap()` as far up the call tree as the highest unconditional execution of that statement. in most cases these unwraps are truly dead due to their equivalent execution by some caller several frames up, and in some functions (like decode_b and decode_sb) these are redundantly checked many times. there doesn't seem to be a substantial effect on runtime or instruction count here, but this does delete many unreachable unwrap()s and might help legibility some.
let ss_ver = chroma & (f.sr_cur.p.p.layout == Rav1dPixelLayout::I420) as c_int; | ||
let ss_hor = chroma & (f.sr_cur.p.p.layout != Rav1dPixelLayout::I444) as c_int; | ||
let frame_hdr = &***f.frame_hdr.as_ref().unwrap(); | ||
let frame_hdr = &f.frame_hdr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and an obviousish second step to improve this patch would be to clean up this kinda thing. emphasis on "do the broad strokes seem useful" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which lines are you referring to? The ss_ver
, ss_hor
stuff? Or the f.frame_hdr
stuff you changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, all the random bindings of f.frame_hdr
and f.seq_hdr
that are scattered around. in most cases it seems better to just write f.frame_hdr
instead of frame_hdr
at the use sites.
I think it's a great idea to statically clarify whether these values are expected to be there via the type system. I don't really love the elaborate |
// Safety: as an implementation detail of Rust today, Arc<T> is just a glorified | ||
// NonNull<ArcInner<T>>. this will be niche elided into a `Some` when transmuted to | ||
// `Option<Arc<T>>`. this is not guaranteed to remain so, and `Arc<T, A>` with non-ZST | ||
// allocators will trivially violate this. | ||
// | ||
// to repeat: safe only as an implementation detail of Rust to date. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was curious if the UCG might have something more concrete to cite as i was writing this and instead realized the Option::Some(Arc<T>) <=> Arc<T>
games really only work as an implementation detail instead. this takes advantage of two things that are not guaranteed:
repr(Rust)
structs with one non-ZST field have the same layout as that field (e.g.Arc<T>
hast the layout of aNonNull<ArcInner<T>>
)Arc
here will be allocated with a ZST allocator (such as System, Global, etc)
i don't think this can be made to work in a stable way, unfortunately.
// Safety: we just asserted that seq_hdr and frame_hdr are Some, and Option<Arc<T>> has the | ||
// same layout as Arc<T>. further, `Rav1dFrameData` and `Rav1dFrameDataWithHeaders` have | ||
// the same layout with repr(C) ensuring their field ordering is identical. | ||
unsafe { std::mem::transmute(self) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the thing that would be stable here is to have the two FrameData{,MaybeHeaders}
structs hold *const DRav1d<...>
and Option<*const DRav1d<...>>
respectively, and access them through methods that turn them into refs as appropriate. it would almost be fine, but it's still unfortunately hairy in at least one case i'll note below...
let _ = mem::take(&mut f.seq_hdr); | ||
let _ = mem::take(&mut f.frame_hdr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any time the seq_hdr
/frame_hdr
fields are take
n from like this, it would be absolutely critical to reconstruct the Arc<T>
if these were Some
so as to not leak the header refcount. seems pretty subtle and easy to mistake. might be less bad with a Rav1dFrameDataMaybeData::drop_state()
or something, but ouch.
let _ = mem::take(&mut f.seq_hdr); | ||
let _ = mem::take(&mut f.frame_hdr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same "easy to improperly discard *_hdr
" issue here. though, on_error
probably ought to use the same hypothetical Rav1dFrameData::drop_state
.
gave a swing at better names, and unfortunately after going and fixing up all the references i realized the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I took a while to look through this.
there doesn't seem to be a substantial effect on runtime or instruction count here, but this does delete many unreachable unwrap()s and might help legibility some. this also reduces a release build's size by about 2.6kb.
So if this doesn't improve performance noticeably, it's definitely going to have to be sound and not depend on std
internals. It is a nice cleanup, though, and if we can figure out a way to do it without transmuting, then it'd be a good idea.
this is a pretty rough patch as-is, mostly wondering what your interest level is here! there's a somewhat more meaningful change to be had by grouping {Rav1dContext, Rav1dTaskContext, Rav1dFrameData(WithHeaders)} together into a single parameter, because the combination of number of parameters, number of calls, and inner calls not being tail calls, does add up to a mild (like.. ~0.5%) amount of overall overhead.
This sounds more promising. This is just from the number of calls and stack usage?
let ss_ver = chroma & (f.sr_cur.p.p.layout == Rav1dPixelLayout::I420) as c_int; | ||
let ss_hor = chroma & (f.sr_cur.p.p.layout != Rav1dPixelLayout::I444) as c_int; | ||
let frame_hdr = &***f.frame_hdr.as_ref().unwrap(); | ||
let frame_hdr = &f.frame_hdr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which lines are you referring to? The ss_ver
, ss_hor
stuff? Or the f.frame_hdr
stuff you changed?
|
||
// Safety: we just asserted that seq_hdr and frame_hdr are Some, and Option<Arc<T>> has the | ||
// same layout as Arc<T>. further, `Rav1dFrameData` and `Rav1dFrameDataWithHeaders` have | ||
// the same layout with repr(C) ensuring their field ordering is identical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be beneficial to later remove the #[repr(C)]
s from a lot of these types (see #1339), so in general it'd be best to not depend on it if we don't need to.
src/internal.rs
Outdated
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) struct Rav1dFrameData { | ||
pub seq_hdr: Arc<DRav1d<Rav1dSequenceHeader, Dav1dSequenceHeader>>, | ||
pub frame_hdr: Arc<DRav1d<Rav1dFrameHeader, Dav1dFrameHeader>>, | ||
pub content: Rav1dFrameContent, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you just made each of these fields references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that could work, and i think i see why you ask - it's a short path from there to a change like this without a transmute involved. i was hesitant to do that because of the extra pointer chasing, though it's probably not actually all that bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this works, we can also remove the Arc<DRav1d<R, D>>
parts and just store &R
, so we won't have the Arc
deref overhead (although that will hopefully go away soon).
this is a rather broad change, but the overall intention is to move
f.seq_hdr.as_ref().unwrap()
andf.frame_hdr.as_ref().unwrap()
as far up the call tree as the highest unconditional execution of that statement. in most cases these unwraps are truly dead due to their equivalent execution by some caller several frames up, and in some functions (like decode_b and decode_sb) these are redundantly checked many times.there doesn't seem to be a substantial effect on runtime or instruction count here, but this does delete many unreachable
unwrap()
s and might help legibility some. this also reduces a release build's size by about 2.6kb.i didn't see any point that branches the
Some
-ness of theseArc
s, which is particularly interesting. looks like that's all intended to be decided somewhere higher up when scheduling work? it's not exactly clear to me what a principled boundary would look like - "functions inthread_task.rs
may not rely on header presence, called functions in the rest ofrav1d
should be able to"?this is a pretty rough patch as-is, mostly wondering what your interest level is here! there's a somewhat more meaningful change to be had by grouping
{Rav1dContext, Rav1dTaskContext, Rav1dFrameData(WithHeaders)}
together into a single parameter, because the combination of number of parameters, number of calls, and inner calls not being tail calls, does add up to a mild (like.. ~0.5%) amount of overall overhead.