
Why "The 'de lifetime should not appear in the type to which the Deserialize impl applies"? #2190

Open
grelative opened this issue Mar 17, 2022 · 11 comments


Serde's documentation about lifetimes mentions this:

The 'de lifetime should not appear in the type to which the Deserialize impl applies.

 // Do not do this. Sooner or later you will be sad.
 impl<'de> Deserialize<'de> for Q<'de> {

 // Do this instead.
 impl<'de: 'a, 'a> Deserialize<'de> for Q<'a> {

But it doesn't give an explanation. I'm curious what the drawback is of using 'de as the lifetime of the type to which the Deserialize impl applies.


Mingun commented Mar 17, 2022

Probably https://users.rust-lang.org/t/how-to-end-borrow-in-this-code/72719/3?u=mingun can give you an answer:

...

// `&'static [u8]` in particular implements
// `for<'any> BufferedInput<'any, ()>`
impl<'long: 'short, 'short> BufferedInput<'short, ()> for &'long [u8] {
    fn read(&mut self, _buf: ()) -> &'short [u8] {
        self
    }
}

...
You may wonder why we had to spell this out, and the answer is that lifetimes in trait parameters are invariant -- they can't automatically shrink, like you're used to with &[u8], say. The above change makes the implementation covariant by breaking up the lifetime of the implemented trait ('short) and the lifetime of the implementer ('long) -- every concrete &'long [u8] gets an implementation for every 'short that is shorter.
...

But I totally agree that such an explanation should be placed in the documentation.
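
A self-contained sketch of the quoted point (my own example; the BufferedInput trait shape is assumed from the linked thread): with the 'long: 'short split, &'static [u8] satisfies the higher-ranked bound, which a conflated single-lifetime impl would not provide.

trait BufferedInput<'short, Buf> {
    fn read(&mut self, buf: Buf) -> &'short [u8];
}

// Every `&'long [u8]` implements the trait for every shorter `'short`.
impl<'long: 'short, 'short> BufferedInput<'short, ()> for &'long [u8] {
    fn read(&mut self, _buf: ()) -> &'short [u8] {
        // Copy the `&'long [u8]` out of `self` and let covariance shrink it to `'short`.
        *self
    }
}

// This bound is only satisfiable thanks to the lifetime split in the impl above.
fn takes_any<T: for<'any> BufferedInput<'any, ()>>(_input: T) {}

fn main() {
    let input: &'static [u8] = b"hello";
    takes_any(input);
}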


grelative commented Mar 17, 2022

impl<'long: 'short, 'short> BufferedInput<'short, ()> for &'long [u8] {
    fn read(&mut self, _buf: ()) -> &'short [u8] {
        self
    }
}

Yeah, this impl of BufferedInput has the benefit of satisfying the HRTB for<'any> &'static [u8]: BufferedInput<'any, ()>.
But the same trick does not apply to Deserialize: here the bound goes the other way ('de must outlive 'a), and there is no lifetime that every other lifetime outlives, so we cannot form an HRTB for<'any> Q<'???>: Deserialize<'any> that the following impl would satisfy.

 impl<'de: 'a, 'a> Deserialize<'de> for Q<'a> {}
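
For reference, that higher-ranked bound for<'de> Deserialize<'de> is exactly what serde's DeserializeOwned trait stands for, which is why a type that borrows from the input can never be DeserializeOwned. A minimal sketch (my own example, not from the thread):

use serde::de::DeserializeOwned;

fn requires_owned<T: DeserializeOwned>() {}

fn main() {
    requires_owned::<String>(); // ok: `String` borrows nothing, so it implements `Deserialize<'de>` for every `'de`
    // requires_owned::<&str>(); // error: `&'a str` only implements `Deserialize<'de>` when `'de: 'a`
}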

ia0 commented May 14, 2024

I'm also curious to see an example that does not compile with impl<'de> Deserialize<'de> for Q<'de> but does with impl<'de: 'a, 'a> Deserialize<'de> for Q<'a>, because I'm under the impression that it should always be possible to instantiate Deserialize with a proper lifetime (even if it means doing an explicit reborrow &*serialized_data before eventually calling deserialize()).

If there is no such example, then I would suggest removing this warning from the documentation and adding a #[serde(lifetime = "'a")] attribute to control the Deserialize<'a> lifetime in the generated implementation.
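
As an illustration of the reborrow idea, a minimal sketch (my own, assuming serde_json) of the simple case where the deserializer is built locally; the later counterexample involves a deserializer received as a function parameter:

use serde::de::{Deserialize, Deserializer};

struct Q<'a>(&'a str);

// The conflated impl that the documentation warns against.
impl<'de> Deserialize<'de> for Q<'de> {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let s: &'de str = Deserialize::deserialize(deserializer)?;
        Ok(Q(s))
    }
}

fn main() {
    let data = String::from(r#""hello""#);
    // The reborrow `&*data` lets the compiler pick whatever `'de` it needs,
    // so the conflated impl causes no trouble here.
    let q: Q = serde_json::from_str(&*data).unwrap();
    println!("{}", q.0);
}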


dtolnay commented May 14, 2024

picoHz/mediatype#15

ia0 commented May 14, 2024

Thanks for the link! However, I believe this is a recursive problem. If there were a #[serde(lifetime = 'a)] attribute, I would expect the following to also work:

#[derive(Deserialize)]
#[serde(lifetime = 'a)]
pub struct FileMetadata<'a> {
    pub file_path: PathBuf,
    pub last_updated: SystemTime,
    #[serde(borrow)]
    pub mediatype: MediaType<'a>,
}

What I want to see is a problem using <MediaType<'a> as Deserialize<'a>>::deserialize() directly, because I believe it is always possible to choose the right lifetime when calling deserialize(). In the case of using derive(Deserialize), choosing the right lifetime would be done with the serde(lifetime = 'a) attribute.


dtolnay commented May 14, 2024

I believe it is always possible to choose the right lifetime when calling deserialize().

This is not the case. Here is a more complete example.

use serde::de::{Deserialize, Deserializer};
use std::borrow::Cow;

#[derive(Debug)]
struct Pair<'a> {
    array: Cow<'a, [&'a str]>,
    string: &'a str,
}

impl<'de> Deserialize<'de> for Pair<'de> { // FAIL
//impl<'de: 'a, 'a> Deserialize<'de> for Pair<'a> { // ok
    fn deserialize<D: Deserializer<'de>>(_deserializer: D) -> Result<Self, D::Error> {
        unimplemented!()
    }
}

pub fn repro<'de, D: Deserializer<'de>>(deserializer: D) -> Result<(), D::Error> {
    let mut pair = Pair::deserialize(deserializer)?;

    let string = pair.array.join(" ");
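    // With the FAIL impl, `pair` is forced to be `Pair<'de>`, where `'de` is chosen by
    // the caller, so it cannot shrink to the lifetime of the local `string`. With the
    // ok impl, `pair` can be `Pair<'a>` for a shorter, locally inferred `'a`.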
    pair.string = &string;
    println!("{:?}", pair);
    Ok(())
}

ia0 commented May 14, 2024

Thanks! This was what I was looking for.

To sum up the reproduction requirements:

  1. The deserializable type must be invariant on its lifetime. (If covariant, there's no problem.)
  2. The deserializer must be a function parameter, so its lifetime is at least the function scope. (If the deserializer is built in the same function, there's no problem.)
  3. The deserialized object wants a lifetime smaller than the function scope. (If the lifetime doesn't need to be smaller, there's no problem.)

Breaking any of the requirements makes the example work. Requirements (1) and (3) can't be changed because they are user-controlled. Requirement (2) could be circumvented by allowing lifetime parameters of functions to be smaller than the function scope. This doesn't work in Rust today, but even assuming it were possible, there would be another issue: the function returns D::Error, which must refer to the long lifetime. This part can't be worked around. Here is a sketch:

pub fn repro<'de, D>(deserializer: D) -> Result<(), D::<'de>::Error> // hypothetical syntax
where D: for<'a> (Deserializer<'a> where 'de: 'a) // hypothetical syntax
{
    let mut pair = Pair::deserialize(deserializer)?; // this would not compile,
    // because it would return D::<'a>::Error instead of D::<'de>::Error
}

So for types that are invariant in their lifetime, one has to dissociate the lifetime of the deserializer (covariant) from that of the deserialized type (invariant) by using 'de: 'a. However, for covariant types (the most common ones), I still expect that conflating those two lifetimes (which have the same variance) is not an issue. I can see, though, why the serde documentation doesn't want to mention variance when telling people to avoid using the same lifetime: those who need to be warned are often those who would not understand variance. Those who do understand it will just be surprised, search for an issue about it, find this one, and see that the warning doesn't concern them.
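
To make requirement (1) concrete, here is a standalone sketch (my own, not from the comments above) of the variance difference at play: &'a str can shrink, while Cow<'a, [&'a str]> cannot, because the inner [&'a str] sits behind the invariant ToOwned associated-type projection.

// Covariant: a longer borrow coerces to a shorter one, so `&'a str` can shrink.
fn shrink_str<'short, 'long: 'short>(s: &'long str) -> &'short str {
    s
}

// Invariant: the same shrink is rejected for `Cow<'a, [&'a str]>`:
//
// fn shrink_cow<'short, 'long: 'short>(
//     c: std::borrow::Cow<'long, [&'long str]>,
// ) -> std::borrow::Cow<'short, [&'short str]> {
//     c // error: the compiler rejects the lifetime coercion
// }

fn main() {
    let owned = String::from("hello");
    let shortened: &str = shrink_str(&owned);
    println!("{shortened}");
}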


dtolnay commented May 14, 2024

for covariant types (the most common ones), I still expect that conflating those two lifetimes (which have the same variance) is not an issue.

I disagree with this assessment. Conflating lifetimes in the impl for a covariant type makes the covariant type unusable in downstream impls, some of which may involve invariant lifetimes. So there are no types that shouldn't follow the guidance on the website. The guidance on the website is accurate ("sooner or later you will be sad") and the rule is easy to follow.

struct Bad<'a>(&'a str);
impl<'de> Deserialize<'de> for Bad<'de> {...}

#[derive(Deserialize)]
struct Thing<'a> {
    array: Cow<'a, [&'a str]>,
    bad: Bad<'a>,
}

ia0 commented May 15, 2024

I believe this is a shortcoming of serde due to the use of intermediate/helper traits when implementing the deserialize function. This is fine and good practice in itself; however, Rust treats lifetime parameters of traits as invariant, which prevents this kind of code architecture (that's a shortcoming of Rust, but see the first section below for a generic work-around).

Here is a formal proof of my claim using the Rust type system as a logic system. I copy and document it in the rest of this comment.

Covariance helper

This could be a crate (with a Contravariant trait too). It fixes the Rust issue of lifetime parameters of traits being invariant (see DeserializeCov below for an example) but has some ergonomics issues, because the trait is not implemented by the type itself but by a proof that the type is covariant.

/// Covariant type-level functions.
///
/// # Safety
///
/// - Self::Type<'a> must be covariant in 'a.
/// - Self::cast() must be the identity function.
unsafe trait Covariant {
    type Type<'a>;
    fn cast<'b, 'a: 'b>(x: Self::Type<'a>) -> Self::Type<'b> {
        // SAFETY: Self::Type<'a> is covariant in 'a.
        unsafe { core::mem::transmute(x) }
    }
}

/// Proves that a type-level function is covariant.
///
/// `impl_covariant!(Name = for<'a> Type)` proves that `Type` (which may mention `'a`) is covariant
/// in `'a` and names the proof `Name`. This name is needed to safely cast `Type`.
macro_rules! impl_covariant {
    ($n:ident = for<$l:lifetime> $t:ty) => {
        pub enum $n {}
        // SAFETY: Self::cast() is the identity function, which proves that Self::Type<'a> is
        // covariant in 'a by type-checking.
        unsafe impl Covariant for $n {
            type Type<$l> = $t;
            fn cast<'b, 'a: 'b>(x: Self::Type<'a>) -> Self::Type<'b> {
                x
            }
        }
    };
}

This macro only works when the type is not generic. When the type is generic, like Foo<'a, T>, one needs to use unsafe directly. Ideally the safe-transmute project will make all of this unnecessary.

Deserialize trait for covariant types

/// Simpler version of the Deserialize trait for covariant types.
///
/// There's no need for 2 lifetimes, because the result can always be cast to a shorter lifetime.
trait DeserializeCov: Covariant {
    fn deserialize<'a, D: Deserializer<'a>>(deserializer: D) -> Result<Self::Type<'a>, D::Error>;
}

/// Implements the more complex version of the Deserialize trait for covariant types.
fn deserialize<'a, 'de: 'a, T: DeserializeCov, D: Deserializer<'de>>(
    deserializer: D,
) -> Result<T::Type<'a>, D::Error> {
    Ok(T::cast(T::deserialize(deserializer)?))
}

This last function is actually the proof that one only needs fn<'a, D: Deserializer<'a>>(D) -> Result<Foo<'a>, D::Error> to implement the more general fn<'a, 'de: 'a, D: Deserializer<'de>>(D) -> Result<Foo<'a>, D::Error> when Foo<'a> is covariant in 'a.

Fixing the "Bad" example

pub struct Bad<'a>(pub &'a str);
impl<'a> Deserialize<'a> for Bad<'a> {
    fn deserialize<D: Deserializer<'a>>(_: D) -> Result<Self, D::Error> {
        unimplemented!()
    }
}
impl_covariant!(BadCov = for<'a> Bad<'a>); // Bad is covariant
impl DeserializeCov for BadCov {
    fn deserialize<'a, D: Deserializer<'a>>(deserializer: D) -> Result<Self::Type<'a>, D::Error> {
        Bad::deserialize(deserializer)
    }
}
// Helper for serde (I didn't manage to use `deserialize()` directly and had to instantiate it myself).
fn deserialize_bar<'a, 'de: 'a, D: Deserializer<'de>>(
    deserializer: D,
) -> Result<Bad<'a>, D::Error> {
    deserialize::<'a, 'de, BadCov, D>(deserializer)
}

#[derive(Deserialize)]
pub struct Thing<'a> {
    #[serde(borrow)] // not sure why this is needed, but it doesn't seem like it's hiding the issue
    pub array: Cow<'a, [&'a str]>,
    #[serde(deserialize_with = "deserialize_bar")] // commenting this shows which issue is being fixed
    pub bad: Bad<'a>,
}

Conclusion so far

There are a few things to consider when implementing impl<'a> Deserialize<'a> for Foo<'a> (let's call it unified-lifetime) instead of the more general impl<'a, 'de: 'a> Deserialize<'de> for Foo<'a> (let's call it bound-lifetime):

  • Is Foo<'x> covariant in 'x? If yes, then all good. If no, continue below.
  • Is D: Deserializer<'x> covariant in 'x? I initially thought that it would be enough, but that's wrong. I forgot that deserialize() returns an error that may mention 'x. So there are a few cases:
    • Is D::Error: 'static or is the error quickly converted to something static (e.g. with map_err() or unwrap())? If yes, then all good (I expect a similar proof as above to be possible, casting the deserializer before calling deserialize()). If no, continue below.
    • Is D::Error contravariant in 'x? If yes, then all good (I expect a similar proof as above to be possible, casting the deserializer before calling deserialize(), then casting the error with map_err()). If no, continue below.
    • We have 'x free in D::Error. In this case, one needs to use bound-lifetime, renaming 'de to 'err (the lifetime free in the error) and 'a to 'ok (the lifetime free in the result), since it makes sense for the user to instantiate those two lifetimes differently.
  • Neither Foo<'x> nor D: Deserializer<'x> is covariant in 'x. In this case, one needs to use bound-lifetime.

The remaining question for me is whether something is missing from that conclusion to characterize the cases when it's ok to use unified-lifetime versus bound-lifetime.

That said, this conclusion is theoretical and not always practical (requiring the Covariant trait defined above with non-optimal ergonomics). In particular, it is not practical when (not exhaustive):

  • Using generics (either on the deserializer or the type).
  • Nesting within a type using derive(Deserialize).

So from a practical perspective, unless you're ready to fight the Rust type system, you'd better use the bound-lifetime. That said, I'll continue using the unified-lifetime for learning purposes (and to test the limits of the Rust type system). But that's my problem.

ia0 added a commit to ia0/serde that referenced this issue May 15, 2024
This PR is for discussion and related to serde-rs#2190. As discussed there, the benefit is not obvious (except for providing more control to the user).

With this PR, a user can use `#[serde(lifetime = 'a)]` to control the lifetime of the implementation of `Deserialize`. In particular, the user is now able to control all the implementation parameters when using `#[serde(bound = "", lifetime = 'a)]`.

See https://github.com/ia0/wasefire/blob/ccf7bc30ac8440b30981b2f39b734de10d1a037c/crates/protocol/src/lib.rs#L65 for an example in real code.

Mingun commented May 16, 2024

I would also like to note that, in my opinion, the name 'de is an unfortunate choice. You might assume it is the lifetime of the data owned by the Deserializer, but this is not true. It is the lifetime of the content that leaks through the Deserializer into the target type, so a better name would be 'content:

let content: &'content str = "...";
let de = Deserializer::new(content); // borrows `content`

struct Type<'a>(&'a str); // expected to borrow from `content`, not from `de`

let data: Type<'content> = Type::deserialize(de)?;
// `de` is dropped here, but `data` stays live because it borrows from `content`, not from `de`

ia0 commented May 16, 2024

Yes, 'de is (the lifetime of) the data, which is why it is covariant all the way from 'data down to 'a:

  • The data with lifetime 'data is borrowed by the deserializer with lifetime 'de shorter than 'data.
  • The data is cut into pieces to either:
    • Build the error with the same 'de lifetime.
    • Build the deserialized type borrowing the pieces at lifetime 'a shorter than 'de.

Currently, the deserialize() implementation does the reborrow directly on the pieces it cuts, which is why it needs both 'de and 'a. If either of the following is true, then only one lifetime is needed:

  • The deserialized type is covariant, in which case it can be converted from 'de to 'a after deserialization.
  • The deserializer is covariant and its error is contravariant, in which case it can be converted from 'de to 'a before deserialization and its error can be converted from 'a to 'de after deserialization.

If both are false, then two lifetimes are needed and the conversion must happen inside the deserialization implementation when the lifetimes are only present covariantly (essentially &'de [u8]).

For generality, the serde documentation recommends always using two lifetimes (in case both the deserializer and the deserialized type are invariant).

My claim is that the deserialized type is always covariant right after deserialization; it's just that the Rust type system doesn't know it. Right after deserialization, the only occurrences of the data lifetime within the deserialized type are pieces of the data, and thus covariant. The problem is that users usually conflate the concept of a structured view into serialized data with the concept of a data structure, by using the same type for both. The typical example is Cow<'a, [&'a str]> as showcased in #2190 (comment). Rust doesn't know that Cow<'a, [&'a str]> is covariant in 'a when the variant is Cow::Borrowed, because it doesn't statically know the variant just by looking at the type. By dissociating the view type &'a [&'a str] from the data type Cow<'a, [&'a str]>, Rust is able to see that the deserialized type is covariant, because it's just a view into the data. But this is not convenient to use, so most users just cheat and say that Cow<'a, [&'a str]> should be deserializable. That's fine.
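
To illustrate the "Rust doesn't statically know the variant" point, a sketch (my own): both payloads of Cow<'a, [&'a str]> are in fact covariant, so the shrink can be written manually by destructuring, even though the type as a whole is invariant.

use std::borrow::Cow;

// The type-level coercion is rejected (invariance), but shrinking by hand works,
// because each variant's payload is covariant in the lifetime.
fn shrink<'short, 'long: 'short>(
    c: Cow<'long, [&'long str]>,
) -> Cow<'short, [&'short str]> {
    match c {
        Cow::Borrowed(b) => Cow::Borrowed(b), // `&'long [&'long str]` shrinks
        Cow::Owned(v) => Cow::Owned(v),       // `Vec<&'long str>` shrinks
    }
}

fn main() {
    let strings = ["a", "b"];
    let cow: Cow<[&str]> = Cow::Borrowed(&strings);
    let shorter = shrink(cow);
    println!("{:?}", shorter);
}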

EDIT: The rest of this comment doesn't work. When implementing Deserialize for &[T], one needs to show that T: 'a when defining type Type<'a> = &'a [T]. But we can't refer to 'a in the impl generics. So the type for which we implement Deserialize must be a fake/skeleton one (just a way to look up an implementation), but this goes against the requirement that the lookup works when using the type itself (as needed by the derive macro).

That said, I think serde went with its current design because when it was initially written, there were no generic associated types. Today, I wonder whether defining the Deserialize trait as follows would not be a better option:

trait Deserialize {
    type Type<'a>;
    fn deserialize<'a, 'de: 'a, D: Deserializer<'de>>(deserializer: D) -> Result<Self::Type<'a>, D::Error>;
}

This would embed into the trait definition the fact that the deserialize implementation is converting from 'de to 'a, instead of asking all implementers to pay attention and use 2 lifetimes.

On the other hand, we would now need to ask implementers to have Self be of the form Self::Type<'a> for some impl-generic 'a (or for all 'a if Self doesn't borrow the data), so that derive(Deserialize) can have Field::deserialize(deserializer)? be of type Field.
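
For concreteness, a sketch (my own; DeserializeGat and StrView are hypothetical names, not serde API) of what an implementation of such a GAT-based trait could look like for a simple borrowing view, using the kind of skeleton lookup type the EDIT above objects to:

use serde::{Deserialize, Deserializer};

// Hypothetical GAT-based trait from the comment above (not serde's actual API).
trait DeserializeGat {
    type Type<'a>;
    fn deserialize<'a, 'de: 'a, D: Deserializer<'de>>(
        deserializer: D,
    ) -> Result<Self::Type<'a>, D::Error>;
}

// Skeleton type used only to look up the implementation.
enum StrView {}

impl DeserializeGat for StrView {
    type Type<'a> = &'a str;
    fn deserialize<'a, 'de: 'a, D: Deserializer<'de>>(
        deserializer: D,
    ) -> Result<&'a str, D::Error> {
        // Borrow a `&'de str` from the input; covariance then shrinks it to `'a`.
        <&'de str as Deserialize<'de>>::deserialize(deserializer)
    }
}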
