-
Notifications
You must be signed in to change notification settings - Fork 905
Allows to introspect Python modules from cdylib: first step #3977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for moving this forward! The idea of using custom data sections is new to me. I see the upsides of it, though I am slightly worried by the extra complexity of needing to worry about linker details in yet another way.
I agree that having the macros generate file(s) is unlikely to be the right solution 👍
For the library which converts the metadata to In fact, I wonder if for option (3) explored here then the using a test to generate and update the
If we agree that using a test to update stubs is a good solution, then I think the choice between runtime code like (2) and data segments like (3) is probably just influenced by whatever is easier for us to implement. We might even be able to swap back and forth between these two options as an implementation detail as we learn. |
Yes! To have played a bit with both, runtime code like (2) is way easier (the difference between this MR and #2454 is quite significant). If I try to summarize the pros of each approach: Approach 3: add introspection to the cdylib and let maturin write the stubs on build:
Approach 4: use a test to write/update stubs:
|
|
You make a very good point that automated tests to update a One thing that I expect is that Overall I don't have a good sense of whether option 3 or 4 is better. In an ideal world we might offer both options. Which one do you think would meet your needs better at present? Maybe we start by implementing that and we learn a lot by doing so! |
I would love to avoid people to handwrite customization in stubs because it makes automatically updating the stubs when Rust code is changed very hard. Imho automatically updating stubs to reflect changes in the Rust code is the main value proposition of auto-generating stubs in the first place. An idea: Add entry points in PyO3 macros to extend the stubs. For example (rought idea, not sure about the actual details): #[pymodule]
#[py(extra_stub_content = "
K = VarType('T')
V = VarType('V')
")]
mod module {
#[pyclass(stub_parent = [collections::abc::Mapping::<K,V>])]
struct CustomDict;
#[pymethods]
impl CustomDict {
#[pyo3(signature = (key: K), return_type = V | None)]
fn get(key: &Bound<'_, PyAny>) -> Option<Bound<'_, PyAny>> {
}
}
}would generate K = VarType('T')
K = VarType('V')
class CustomDict(collections.abc.Mapping[K,V]):
def get(key: K) -> V | None: ...This way stubs would stay auto generated but can be improved by the author. A possible way to mix options 3 and 4:
|
|
Agreed that having the proc macros be able to collect all the necessary information would be nice. I think only time will tell whether they can meet all user needs! I'm slightly wary of coupling to cc @messense do you see any concerns with adding this to So what's next steps here? Do you want me to start reviewing this code, or will you push more first? Regarding the data sections, I happened to hear yesterday that UniFFI's proc macros can do something similar about shipping definitions in the shared library, so it might be interesting to look at / ask them how that was implemented. |
No concern, I think a |
|
Thank you!
Yes! My hope is to cover as many as possible.
Additionaly to
Thank you! I'm going to have a look at it.
I think the current draft already shows the relevant direction, a very high level code review to check if it's going in the good direction would be welcome. Maybe wait for me to have a look at UniFFI, I might change a bit this MR if I find interesting things there. Thank you! |
|
This is very exciting, looking forward to being able to generate type stubs! Currently we have this lengthy and hard to maintain Python script for doing so, which we have to update by hand: https://github.com/Chia-Network/chia_rs/blob/main/wheel/generate_type_stubs.py This would be a major improvement. Happy to help out however I can (testing, implementation, whatever) as time allows, to hopefully get this out the door 😄 |
|
@Rigidity Thank you! I plan to work on this MRs to get the basics done. Then there will be a lot of features to incrementally add on it (support for all PyO3 features...) so help will coding and testing will be much welcome! |
e82fc35 to
cb4cfa7
Compare
|
Sorry for the very long reaction delay (a lot of priorities + vacations).
I had a look at uniffi, they basically use the same approach as us: embedding the metadata in the binary and then parsing the binary using the same @davidhewitt If you have time, may you have a quick look at the MR to see if the global design goes into a good direction? If yes, I will fix a lot of shortcut I took and get the MR ready for review. |
ed37f96 to
2f0d8dc
Compare
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes I'm happy with us proceeding with this approach!
I'm definitely convinced by technical direction here for the generation process; the main risk I still see is how to give users the full power to customise the stubs with generic args etc.
I think we can learn by that piece-by-piece as we proceed.
I think so too; I also imagine we might want to strip the custom link section after the stubs have been extracted, it feels like it's probably easier to do that by having a distinct section. |
Thank you! I agree on the risk. My guess is that we will introduce a set of macro arguments for that, but getting them right won't be easy. |
6ee4940 to
82e672f
Compare
|
Note: I am more than happy to rebase this MR if someone is willing to review it |
|
@davidhewitt do you know if you or any other maintainers will have time in the foreseeable future? |
|
Do you have any estimates on how much this will increase the size of the binary, especially for larger projects? Did you consider using something more concise like CBOR since this data doesn't need to be human-readable once it's embedded? |
|
@bschoenmaeckers This is a good point. My assumption is that most people that care about binary size will strip the binary before publishing it but it might not be always true. Moving to CBOR is a great idea but might make the data building code a bit more complex (it already quite abuses the const expression evaluation mechanism). |
Let's get this merged first and look into optimizing the format if necessary in the next iteration. |
|
Yes. I finally have some promising Fridays ahead so I will do my utmost to review this in the next week or two. It is long overdue and deserves to move forwards. |
|
@Tpt do you wanna rebase in response to #3977 (comment) ? |
|
@davidhewitt Amazing! I am quite packed this week but I will do the rebase early next week. @abrisco Thanks for the ping |
d35274f to
683e3be
Compare
683e3be to
a5e309a
Compare
|
With sincerest apologies my childcare fell through today so I did not get the productive day I had hoped for. Next week is disrupted for other reasons. This PR is still my top priority to review however I think the realistic timeframe is now Friday in two weeks' time. |
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have finally reviewed. This looks good to me to move forward with, and I am very sorry I blocked this for so long.
Just a few small tidy ups, and then let's merge and proceed with building this out 👍
|
|
||
| #[derive(Default)] | ||
| struct ConcatenationBuilder { | ||
| elements: Vec<TokenStream>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these are either paths (referring to string constants), or just chunks of strings. Would it make sense to build an enum over these instead of using TokenStream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thank you for spotting this. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reverted back my code to an enum on String and TokenStream because in the follow MR on type signature I will need to inject more complex expressions than just a path. Is it fine with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, we'll stick with that and can always reconsider later 👍
| let mut content = ConcatenationBuilder::default(); | ||
| self.add_to_serialization(&mut content); | ||
| let content = content.into_token_stream(pyo3_crate_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seems to me that the content is defined to be JSON payloads matching the Chunk enum in pyo3-introspection? Maybe shall we document / link to that, it took me a while to find...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I have added a line to the module doc comments. is it good enough?
|
@davidhewitt Thank you so much for the review. I have applied your changes suggestions and rebased the MR. Except if you prefer for me to do something else first, my next step is going to rebase and polish the type signature extraction. I have a very rough draft on my hard drive. |
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me! Let's do that in incremental follow-ups; this PR is long overdue merge so let's get some progress now 👍
|
Thanks again for all your patience while I went through a very busy time 😮💨 |
|
I have nothing to say besides |
|
Thanks to everyone that was involved!!! |
This is a first step to introspect Python modules built by PyO3
A missing piece in the story listed in #2454 is how tools like Maturin move the introspection information generated how by PyO3 into to type stubs files included in the built wheels.
I see three approaches for it:
pyo3-macrosgenerate a file with the stubs after having processed all macros of the crate. This has the advantage of being self-contained in the crate but falls short in cases like python classes declared in a crate but exposed in an other crate: there is no guarantees that proc macros of a crate and its dependencies are compiled in the same process and that proc macros will still be able to write files in the future (like with proposal to run them a WASM sandbox).__pyo3_stubs_MODULE_NAMEfunction However, for the build system to execute it, a compatible Python interpreter must be present to link with and a compatible CPU or VM to run it, making generation when doing cross-compilation very hard. I guess it's what Python Interface (.pyi) generation and runtime inspection #2454 was heading toward.Architecture:
pyo3_data0section that contains a JSON "chunk" with the introspection data. Code inpyo3-macros-backend/src/introspection.rs. I had to do some bad hack to generate the segments properly via Ruststaticelements.PYO3_INTROSPECTION_IDconstants, allowing the code building the JSON chunk to get the global id of eg. a classCviaC::PYO3_INTROSPECTION_ID. This allows chunks to refer to other chunks easily (eg. to list module elements). A bad hack is used to generate the ids (TypeId::ofwould have been a nicer approach but is not const on Rust stable yet).0at the end ofpyo3_data0is a version number, allowing breaking changes in the introspection format.pyo3-introspectioncrate parses the binary usinggoblin(library also used by Maturin), fetches all thepyo3_data0segments (only implemented for macOS Match-O in this experiment), and builds nice data structures.pyo3-introspectioncrate would implement ato_stubsfunction converting the data structures to Type stubs.pyo3-introspectionhas an integration tests doing introspection of thepyo3-pytestslibrary.Current limitations:
FromPyObject::type_inputinto an associated constant or a const function and similarly forIntoPy::type_output. This is mandatory in order to allow to make use of them in thestaticvalues added to the generated binary.