Add scoped thread-local encoding and decoding contexts to cstore. #30140


Merged
merged 1 commit into rust-lang:master on Dec 9, 2015

Conversation

michaelwoerister
Member

With this commit, metadata encoding and decoding can make use of thread-local encoding and decoding contexts. These allow implementers of serialize::Encodable and Decodable to access information and
data structures that would otherwise not be available to them. For example, we can automatically translate def-id and span information during decoding because the decoding context knows which crate the data is decoded from. It also allows making ty::Ty decodable, because the context has access to the ty::ctxt that is needed for creating ty::Ty instances.

Some notes:

  • tls::with_encoding_context() and tls::with_decoding_context() (as opposed to their unsafe versions) try to keep the TLS data from getting out of sync by checking that the encoder/decoder passed in is actually the same as the one stored in the context. This should prevent accidentally reading from the wrong decoder. (A rough sketch of the intended usage pattern follows these notes.)
  • There are no real tests in this PR. I had a unit test for some of the core aspects of the TLS implementation, but it was rather brittle: a lot of code for mocking ty::ctxt, crate_metadata, etc., while not actually testing very much. The code will soon be exercised by the first incremental compilation auto-tests, which rely on MIR being properly serialized. However, if people think that some tests should be added before this can land, I'll try to provide some that make sense.
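
As a rough sketch of the intended usage pattern (not the PR's exact code: the closure shape of tls::with_decoding_context, the decode_external_def_id helper, and the translate_def_id method are all assumptions made for this example), a Decodable impl could look roughly like this:

// Sketch only. Read the raw DefId as the foreign crate wrote it, then ask the
// thread-local decoding context, which knows which crate the data comes from,
// to map it into the local crate's numbering.
impl Decodable for DefId {
    fn decode<D: Decoder>(d: &mut D) -> Result<DefId, D::Error> {
        let external_def_id = try!(decode_external_def_id(d)); // hypothetical raw decode
        Ok(tls::with_decoding_context(d, |dcx| {
            dcx.translate_def_id(external_def_id)
        }))
    }
}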

r? @nikomatsakis

@arielb1
Contributor

arielb1 commented Dec 1, 2015

AST autoserialization causes a ton of metadata bloat and a decent amount of compile-time bloat. I am not sure I would want to stick with MIR autoserialization if it runs into the same problem. I am not aware of the current astencode manual serialization causing problems (I looked into the topic, and the most annoying part of serializing MIR seems to be dealing with ConstVal and its annoying NodeId contents, but that should probably be refactored anyway).

@oli-obk
Contributor

oli-obk commented Dec 2, 2015

@arielb1: I'm improving ConstVal to get rid of the NodeId stuff. See some explanation of the problems with that here: https://internals.rust-lang.org/t/removing-const-eval-duplication-of-labor-between-librustc-and-librustc-trans/1786

@michaelwoerister
Member Author

Sorry for not responding earlier; I was ill last week.

Some thoughts:

  1. It's not auto-serialization per se that causes the bloat, but its unfortunate interaction with the verbose RBML format. We don't have to stick with RBML in the future.
  2. The Ty and Substs values in the MIR use the tyencode/tydecode infrastructure, so they won't contribute to data bloat even now. The rest of the MIR data structures will, though.
  3. While discussing this with @nikomatsakis before starting the implementation, we came to the conclusion that it would be great for maintainability and refactorability if we could get rid of as much serialization boilerplate code as possible. I think the way to go is to make the compiler's autoserialization support more powerful, so that it can support space-efficient encodings and deal with context-dependent (de-)serialization (although I'm not convinced that the TLS-based implementation of the latter is the best long-term solution). A sketch of what the context-dependent part could look like follows below.
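
To make point 3 concrete, the general shape is to install the context for the duration of a top-level encode (or decode) call, so that Encodable/Decodable impls buried inside autogenerated code can reach it. A minimal sketch, assuming a hypothetical tls::enter_encoding_context entry point (the PR's actual entry point may be named differently), where ecx is the encoding context, rbml_w the RBML encoder, and mir the value being written:

// Install the encoding context around the top-level encode call; inside the
// closure, Encodable impls for Ty, DefId, Span, etc. can retrieve it through
// tls::with_encoding_context().
tls::enter_encoding_context(ecx, rbml_w, |rbml_w| {
    mir.encode(rbml_w)
});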

@arielb1
Contributor

arielb1 commented Dec 7, 2015

I don't recall having maintainability problems with the old astencode manual-decoding code; it mostly just works. Our MIR format is not really supposed to change rapidly.

astencode for Substs is fairly bloated; luckily, it is not used that often. I would, however, prefer to encode (AdtDef, Substs) as a Ty.

@michaelwoerister force-pushed the tls-encoding branch 2 times, most recently from 0b1e599 to e254867 on December 8, 2015 at 18:46
@nikomatsakis
Contributor

I'm pretty happy with this PR. Writing serialization code is painful: not so much a source of bugs (though I've had a few from tyencode), but really annoying, and to little purpose. I don't think autogenerated serialization needs to be inefficient. Any inefficiencies we see can easily be overcome by implementing serialization by hand for a particular type; you can then use the autogenerated code as a building block, by writing things like:

// serialize:
serialize((&self.foo, &self.bar)); // serialize first 2 fields, but not self.span

// deserialize:
let (foo, bar) = deserialize();
MyType { foo: foo, bar: bar, span: DUMMY_SP }
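
A slightly fuller sketch of the same idea, assuming a type like struct MyType { foo: u32, bar: String, span: Span }, the in-tree serialize::Decodable trait of the time, try! rather than ? (which did not exist yet), and syntax's DUMMY_SP placeholder span:

impl Decodable for MyType {
    fn decode<D: Decoder>(d: &mut D) -> Result<MyType, D::Error> {
        // Reuse the autogenerated tuple decoding as a building block, then
        // fill in the field that was deliberately not serialized.
        let (foo, bar): (u32, String) = try!(Decodable::decode(d));
        Ok(MyType { foo: foo, bar: bar, span: DUMMY_SP })
    }
}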

@nikomatsakis
Contributor

@bors r+

@bors
Collaborator

bors commented Dec 8, 2015

📌 Commit e254867 has been approved by nikomatsakis

@bors
Collaborator

bors commented Dec 9, 2015

⌛ Testing commit e254867 with merge 56670b1...

@bors
Collaborator

bors commented Dec 9, 2015

💔 Test failed - auto-win-msvc-64-opt

@michaelwoerister
Member Author

@bors r=nikomatsakis

@bors
Collaborator

bors commented Dec 9, 2015

📌 Commit f65823e has been approved by nikomatsakis

bors added a commit that referenced this pull request Dec 9, 2015
@bors
Collaborator

bors commented Dec 9, 2015

⌛ Testing commit f65823e with merge eebf674...

bors merged commit f65823e into rust-lang:master on Dec 9, 2015