Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Considering redefining jboolean as either an bool or our own #[repr(u8)] enum #19

Closed
rib opened this issue Jan 25, 2023 · 4 comments · Fixed by #23
Closed

Considering redefining jboolean as either an bool or our own #[repr(u8)] enum #19

rib opened this issue Jan 25, 2023 · 4 comments · Fixed by #23

Comments

@rib
Copy link
Contributor

rib commented Jan 25, 2023

While looking at jni-rs/jni-rs#400 it was noted that there's a risk that code can easily cause undefined behaviour for the JVM by setting boolean array elements to values other than 0 or 1.

This isn't something that's specific to array elements though; anywhere that Rust returns a jboolean to Java it needs to ensure the byte value is either 0 or 1.

Since jboolean is currently simply an alias for u8 (pub type jboolean = u8;) then it's very easy to return arbitrary invalid boolean values to the JVM.

My initial thought was that we could represent a jboolean as:

#[repr(u8)]
enum jboolean {
    False=0,
    True=1
}

and it was then also noted by @argv-minus-one that this is essentially how the Rust Reference defines bool: https://doc.rust-lang.org/reference/types/boolean.html

My initial reservation with simply defining jboolean as an alias for bool is that Rust doesn't generally guarantee much in terms of stable ABI and I wouldn't like to potentially depending on an implementation detail. e.g. the Rust Reference says "this book is not normative. It may include details that are specific to rustc itself, and should not be taken as a specification for the Rust language."

It looks like historically bool was assumed to be defined in terms of the C _Bool type (and several FFI projects have made that assumption) which wouldn't technically guarantee that bool were one byte on all platforms, ref: https://internals.rust-lang.org/t/rusts-stability-story-should-apply-to-bool-in-ffi/6305/8

Considering the follow up discussion here: rust-lang/rust#46176 it looks like the consensus was that bool is defined to be the same as C's _Bool but it was also noted that that means it's one byte one all platforms supported by Rust.

A bool is also explicitly documented as being one byte here: https://doc.rust-lang.org/std/mem/fn.size_of.html

I think it's fair to conclude that Rust's bool type is basically guaranteed to always be 1 byte and to also use the values 1 and 0 for true and false - which is what we want for jboolean

@argv-minus-one
Copy link

JNI jboolean is also defined as being exactly 8 bits in the JNI spec. So, the sizes are definitely the same.

C99 specPDF sections 6.2.5 and 6.3.1.2 imply that 0 and 1 are the only valid bit patterns for a C _Bool. Since Rust bool's set of valid bit patterns is the same as C _Bool, then they must also be 0 and 1. Therefore, it should be safe to reinterpret the bits of a Rust bool as a JNI jboolean.

The definition of jboolean in the JNI spec is accompanied by two constants, JNI_FALSE and JNI_TRUE, with the values 0 and 1 respectively. This implies that those are the only two valid bit patterns for a jboolean, but the JNI spec doesn't actually say that. I doubt a JVM actually will use a value other than 1 to mean “true” in a jboolean, though, because if it did, then C code containing a statement like if (x == JNI_TRUE) (which at least one programmer does) would behave incorrectly. Therefore, it's probably safe to reinterpret the bits of a JNI jboolean as a Rust bool, probably, maybe 😅.

@rib
Copy link
Contributor Author

rib commented Jan 26, 2023

I doubt a JVM actually will use a value other than 1 to mean “true”

hehe, I'm definitely not so sure about that :) I'd be willing to bet the opposite I think - that most will interpret anything except zero as true.

cpu instructions are often pretty good at telling you when a result was zero or not zero so I'd imagine it could be common to take advantage of that and implement true as "not 0" which should be perfectly valid for any JVM or a Rust compiler I expect. (because they are allowed to assume that booleans are only 0 or 1 and 1 also happens to be "not 0" which can be simpler to determine.)

So it seems like it would be a perfect opportunity for undefined behaviour in the sense that maybe some implementations really do only consider 1 as true but others consider "not 0" as true and then if you have code that breaks the rules and stores a 4 in a boolean it might behave differently for different implementations.

@rib
Copy link
Contributor Author

rib commented Feb 14, 2023

For reference here, the jni-sys repo has now been moved to the jni-rs organization and we are now in a better position to make this change to the jni-sys crate. (See: #17)

@rib rib transferred this issue from jni-rs/jni-rs Jun 9, 2023
@rib
Copy link
Contributor Author

rib commented Jun 9, 2023

Before really looking to switch jboolean to be an alias for bool I have tried to double check in case there is some guarantee that a JNI/JVM implementation will treat any non-zero value as TRUE - or more importantly that there would be roundtrip guarantees that would mean other u8 values need to be preserved.

Some useful links:

I certainly haven't seen any hint of there being any kind of roundtrip guarantees for other values, more the opposite in fact since it looks like different JVM implementations will do their own 0/1 normalization of booleans at different times - which don't appear to be well specified.

Overall there is no strong consensus on whether values besides 0 and 1 can be stored in a jboolean and so my interpretation of that is simply that it is "undefined" behaviour to store values other than 0 or 1.

In practice, because JVMs have tried to account for how C treats non-zero values as truthy then other values are quite likely to be considered truthy and/or get normalized to 1 but that's not guaranteed.

It does still look to me like bool would be a preferable alias for jboolean instead of u8

@rib rib closed this as completed in #23 Aug 31, 2023
rib added a commit that referenced this issue Aug 31, 2023
JNI only strictly defines two valid values for a `jboolean`
and there's no consensus on whether other values greater than one will
be interpreted as TRUE in all situations.

The safest interpretation is to say that it will lead to undefined
behaviour to pass any value besides zero or one as a `jboolean`.

Addresses jni-rs/jni-rs#400
Closes #19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants