Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share configuration between collections #290

Open
LucasPickering opened this issue Jul 21, 2024 · 17 comments
Open

Share configuration between collections #290

LucasPickering opened this issue Jul 21, 2024 · 17 comments
Labels
feature New behavior to enable things that were previously not possible

Comments

@LucasPickering
Copy link
Owner

Did you search for existing issues already?
Yes :)

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

It should be possible to import elements of a collection from one file to another. For example, if you use the same authentication process for several microservices, and each service has its own slumber.yml file, it would be nice to reuse the chain/request that provides authentication across all collections, so you only have to write it once.

Describe the solution you'd like
A clear and concise description of what you want to happen

A potential solution would be to support JSONRef is some parts of the collection. For example:

#common.yml
chains:
  username:
    source: !command
      command: [whoami]
    trim: both
  password:
    source: !prompt
      message: Password
    sensitive: true
  auth_token:
    source: !request
      recipe: login
      trigger: !expire 12h
    selector: $.form

requests:
  login: !request
    method: POST
    url: "{{host}}/anything/login"
    headers:
      Accept: application/json
    body: !form_urlencoded
      username: "{{username}}"
      password: "{{chains.password}}"
#slumber.yml
chains:
  auth_token: !reference common.yml#/chains/auth_token

requests:
  get_users: !request
    name: Get Users
    method: GET
    url: "{{host}}/get"
    authentication: !bearer "{{chains.auth_token}}"

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered

  • Inheritance, where a file can declare extends: common.yml and transparently use everything from it
    • Might be more magical and complicated to implement

Additional context
Add any other context or screenshots about the feature request here

@LucasPickering LucasPickering added qol Improvements that make usage smoother, without introducing new functionality feature New behavior to enable things that were previously not possible and removed qol Improvements that make usage smoother, without introducing new functionality labels Jul 21, 2024
@anussel5559
Copy link
Contributor

@LucasPickering Is this a feature you'd still like to get added? I would love to look in to implementing this next (after I wrap up the ChainSource::Select stuff!).

When I initially started using slumber for testing my work's APIs this was one of the first "ah, bummer" things I hit (we use a micro-service architecture with shared auth - as you noted would be one of the biggest advantages of such a feature).

@LucasPickering
Copy link
Owner Author

@anussel5559 Yes definitely, but this is a very significant feature. The first step is figuring out a design for this. Do you have any thoughts on how you would want it to work?

@anussel5559
Copy link
Contributor

anussel5559 commented Sep 4, 2024

yea! happy to brainstorm here.

Some of the complication/intricacies I see are:

  • resource ids have to be globally unique
  • a Collection is a stand-alone concept, having it's own resources. These map 1:1 to a config file, and the rest of the app has no concept of multiple collections.
  • probably other things?

I think the easiest approach for users to grok might be one where they can build a inherits from concept, with a extends keyword or similar.

One approach for dealing with the id concerns is making the shape of the extends resource an object containing a relative file path and an assigned id - when we parse the extended collection, the resource ids in that extended collection are then prefixed with the assigned collection id thereby creating globally unique resource ids.

This allows the user to intuit how to reference a resource across collections. We can additionally panic/quit parsing if we detect any collection with a duplicate id as part of the Collection hierarchy.

This might look like this:

--- base.yaml
profiles:  
  works:
    name: This Works
    data:
      host: https://httpbin.org
      username: xX{{chains.username}}Xx
      user_guid: abc123
chains:
  username:
    source: !command
      command: [whoami]
    trim: both

--- service1.yaml
extends: 
  - file: 'base.yaml' # relative filepath
    id: base
profiles:
  works:
    name: This Works
    data:
      host: https://httpbin.org
      username: xX{{base.chains.username}}Xx
      user_guid: abc123
chains:
  password:
    source: !prompt
      message: Password
    sensitive: true
requests:
  login: !request
    method: POST
    url: "{{host}}/anything/login"
    authentication: !basic
      username: "{{username}}"
      password: "{{chains.password}}"

The link is explicit in the target collection, and the collection can extend multiple additional collection.

There's still some hand-waving to think through here, but hopefully this is a thread to start pulling on?

@LucasPickering
Copy link
Owner Author

LucasPickering commented Sep 5, 2024

@anussel5559 This is a good start. My first thought is that you use the words "extends" and "inheritance" but this feels more like "import" and "composition" to me. You're declaring an external resource (the "extended" collection) and assigning it an alias, then reference that alias selectively to import data from the external resource. To be clear, I prefer this approach over "true" inheritance (where everything from the parent is implicitly brought in), just want to be clear on terminology. I used an inheritance structure for a previous project env-select, and it worked alright there but it opens up a rats nest of both behavior and implementation issues (e.g. how deep do you go when merging data? do arrays concat or overwrite? etc).

With your design, I'm curious how you would reference things other than chains from the imported file. E.g. if you wanted login to be a common recipe in base.yaml, could you bring the entire recipe into service1.yaml with a single import statement?

This feels functionally similar to the option I've been leaning toward of using URIs to reference objects between . I would take a look at the openapiv3 crate. The OpenAPI spec makes heavy use of this pattern, allowing you to easily reference complex objects within a file or between files. (Note: In my original post I called this JSONRef but after doing more research I'm not sure if that's exactly what OpenAPI uses. It looks like JSONRef is an extension of JSONSchema, and Slumber collections certainly aren't JSONSchema compliant).

The key to this in openapiv3's API is the ReferenceOr type. With an equivalent in Slumber, any field that we want to be a referencable we would just wrap in ReferenceOr. For example, the Recipe struct might look like:

pub struct Recipe {
    #[serde(skip)] // This will be auto-populated from the map key
    pub id: RecipeId,
    pub name: Option<String>,
    pub method: Method,
    pub url: ReferenceOr<Template>,
    pub body: Option<ReferenceOr<RecipeBody>>,
    pub authentication: Option<ReferenceOr<Authentication>>,
    pub query: ReferenceOr<Vec<(String, Template)>>,
    pub headers: ReferenceOr<IndexMap<String, Template>>,
}

There are certainly some design decisions to be made around what should/shouldn't be referenceable, but that's the general idea.

Example

This is an extension of the example in the original post, to show some more use cases:

#common.yml
chains:
  username:
    source: !command
      command: [whoami]
    trim: both
  password:
    source: !prompt
      message: Password
    sensitive: true
  auth_token:
    source: !request
      recipe: login
      trigger: !expire 12h
    selector: $.form

requests:
  login: !request
    method: POST
    url: "{{host}}/anything/login"
    headers:
      Accept: application/json
    body: !form_urlencoded
      username: "{{username}}"
      password: "{{chains.password}}"

  get_current_user: !request
    method: POST
    url: "{{host}}/users/current"
    headers: !reference #/recipes/login/headers
    authentication: !bearer "{{chains.auth_token}}"
#service.yml
chains:
  auth_token: !reference common.yml#/chains/auth_token

requests:
  get_current_user: !reference common.yaml#/recipes/get_current_user

  get_users: !request
    name: Get Users
    method: GET
    url: "{{host}}/get"
    authentication: !bearer "{{chains.auth_token}}"

Pros

  • Explicit
  • Uses existing standards (URIs) and patterns (from OpenAPI)
  • Powerful

Cons

  • Implementation is complex
  • We need to explicitly declare what can be referenceable
  • Complicates and slows down collection loads

Implementation

The implementation for this I think would look something like:

  • Create ReferenceOr type and stick it in various places throughout the collection definition
  • Implementation deserialization for ReferenceOr, which would involve URI parsing
    • In the past I had checked if the url crate would work for this, but iirc that's specifically only for web URLs so it doesn't support URIs where the root is a file path (or no root at all, for same-file references)
  • Add a post-deserialize step to the collection that resolves all references
    • This would be fallible, in case any references are invalid
    • We have a minimal version of this in the openapi importer already. It'd be great to share code with that but realistically probably wouldn't work because the reference types are different.

We would also need to figure out how to make this as transparent as possible in the rest of the app. After loading and resolving, references should be transparent to any code that uses the collection.

That's a lot of words. Thoughts?

@anussel5559
Copy link
Contributor

Thank you for adding some precision to my language there! I tend to be a little too flippant with my word choice, so my apologies.

Ah - I DO like this referential approach, but I worry a bit that it might feel awkward to wield. There's a bit of repetition between the two files - in your example the base collection defined the auth_token chain, and that had to be re-defined in the service collection. While the whole chain didn't have to be re-defined (thus the power of the reference) it feels funky / wonky - overall that feels like a minor detail though.

In my head, this import and composition approach doesn't HAVE to deal with merging data and the "realm of assumptions" that comes from trying to combine two datasets together, at least not initially - we could iterate, first allowing chains from a dependent collection to be directly leveraged (since chains don't really have a visual component), then iterate to profiles and recipes. And id name-spacing allows things to combine pretty seamlessly. There's likely a large swath of how the app interprets templates that would have to change to support indexing in to a dependent collection.

That's where I'll leave my thoughts on the import and composition approach though - ultimately I think the referential approach, due to its explicitness, may serve the app the best long-term. That along with having it feel like an OpenAPI $ref is pretty powerful.

@LucasPickering
Copy link
Owner Author

@anussel5559 To me, explicitness is probably the most important consideration for this system. I've spent a lot of time at my job looking at Gitlab CI configuration files, which support an inheritance-ish structure where you can arbitrarily import resources from other files/repos. It means reading CI config turns into a game of repo browsing and ctrl-f-ing. I really want to avoid that.

I think the repetition of the chain definition here is a good thing, because of the explicitness. I look at it as akin to a use statement in Rust; it says "I'm pulling in this external resource to use here". Like Rust, this scheme requires that every external resource that's used is explicitly pulled in. You may not be shocked to learn that I hate wildcard use statements in Rust, and I'd like to avoid anything similar here. While the ID of the chain is repeated here, the definition is not repeated. That means changes to the upstream definition will be reflected downstream automatically.

Another thing I just thought of: Right now, Slumber watches the collection file and reloads automatically when it changes. As part of this scheme, we should consider collecting a list of upstream collection files as well, and watching all of them so that upstream changes automatically reload as well. It might also be worthwhile to include a diffing step to make sure we only reload when things that we actually care about were modified.

@anussel5559
Copy link
Contributor

Awesome - I think that all makes a ton of sense. I'm still happy to take a shot at building this out if that makes sense to you!

@LucasPickering
Copy link
Owner Author

@anussel5559 Yeah go for it! Similar to the the select list, try to break this into a few MRs. Let me know if you want any more input on it.

@anussel5559
Copy link
Contributor

@LucasPickering I wanted to get some thoughts down on the approach here. One thing that's been bugging me is when to do the reference resolution.

I'm starting to feel pretty strongly that during app startup, AFTER deserialization, there should be a step of reference resolution. This way we can collect all the unique files from the reference URIs spread throughout the collection, parse the unique files once and then assign in the appropriate struct based on the URIs fragment. If we place reference resolution inside of deserialization, we run the risk of slowdown, ultimately parsing potentially the same file any number of times.

If that makes sense, I think I'd do something along the lines of:

  1. build ReferenceOr struct, with serialization/deserialization.
  2. Create a CollectionWithReferences version of Collection - this would be the target of serde deserialization from the raw collection file, containing ReferenceOr in the expected fields. This is ideally a wrapper around the existing Collection in a way that avoids having to write two nearly identical Collection structs.
  3. update app startup to parse and then call a resolve_references or similar trait on the CollectionWithReferences, which will ultimately return a Collection (thus the references become transparent to the rest of the app).
    a. This is where the most hand-waving comes in. this reference resolver would need to: collect unique file references, parse (recursively) each of those collection files, walk the list of references and lookup the appropriate struct (via reference fragment) and clone that in to the final Collection that gets returned.

How does that land / jive with your thoughts on the outcomes here?

@LucasPickering
Copy link
Owner Author

@anussel5559 Yeah, I think that's the right way to do it, with an intermediate step between deserialization and usage.

I really don't like the idea of maintaining two versions of the entire collection tree, so I'm thinking it might be best to do all of this in a separate crate that can be generalized. It could use a derive macro to make it easy to mark which fields are resolvable. An example:

#[derive(Debug, Default, Serialize, Deserialize, Resolve)]
pub struct Collection {
    #[reference]
    pub profiles: IndexMap<ProfileId, Profile>,
    #[reference]
    pub chains: IndexMap<ChainId, Chain>,
    #[reference]
    pub recipes: RecipeTree,
    pub _ignore: serde::de::IgnoredAny,
}

and the expanded code would look something like:

#[derive(Debug, Default, Serialize, Deserialize)]
pub struct Collection {
    #[reference]
    pub profiles: IndexMap<ProfileId, Profile>,
    #[reference]
    pub chains: IndexMap<ChainId, Chain>,
    #[reference]
    pub recipes: RecipeTree,
    pub _ignore: serde::de::IgnoredAny,
}

#[derive(Debug, Default, Serialize, Deserialize)]
pub struct __ResolveCollection {
    pub profiles: IndexMap<ProfileId, ReferenceOr<Profile>>,
    pub chains: IndexMap<ChainId, ReferenceOr<Chain>>,
    pub recipes: ReferenceOr<RecipeTree>,
    pub _ignore: serde::de::IgnoredAny,
}

impl Resolve for __ResolveCollection {
    type Output = Collection;

    fn resolve(self) -> Result<Self::Output, ResolveError> {
        Ok(Collection {
            profiles: self.profiles.into_iter().map(|(key, value)| Ok((key, value.resolve()?))).try_collect()?,
            chains: self.chains.into_iter().map(|(key, value)| Ok((key, value.resolve()?))).try_collect()?
            recipes: self.recipes.resolve()?,
            _ignore: self._ignore,
        })
    }
}

This definitely adds complexity to the implementation but I think encapsulating all that in a separate library will make it much more maintainable. Are you interested in doing that? If not I'm happy to do the library. Also happy to collaborate on it.

@anussel5559
Copy link
Contributor

yea, that's exactly how I was thinking of it! I'm still happy to build that library, though I'll probably move somewhat slow. I found a VERY similar crate (OptionalStruct) that nearly does what we want though it is strict in that it only supports wrapping with the Option enum. But that will be a good template for a more generalized macro that wraps struct fields in any enum that implements a resolve trait or similar.

@anussel5559
Copy link
Contributor

Alright - I've got a fully working implementation of this macro here! I still have some documentation, test cases, and publishing to work out. The applicable_generic file under the main tests dir details how I think we'd leverage the macro.

Effectively, we'll have to implement the ResolveToBase trait on whatever enum we choose to wrap the Collection in (initially the Reference arm of the resolution match could just be unimplemented to allow us to iterate and such). Eventually we'd want to implement a ResolveRef trait or similar for each type that may have a reference. In my head, this method would:

  • parse the URI string (or use the parsed variant if we build in uri parsing as part of deserialization)
  • lookup the referenced resource in the appropriate, parsed referenced collection and return that (or panic / return an Err result)

Then, all that has to happen is:

  • serde deserializes collection in to enum-ified Collection (leverage the macro)
  • collect / de-dupe all external file references, deserialize them (new method)
  • call .build on the enum-ified Collection (built in to enum-ified collection, once we implement ResolveToBase trait)

Lemme know what you think so far!

@LucasPickering
Copy link
Owner Author

@anussel5559 I'm on vacation so I haven't had time to look through all the code of enumify_struct, but my first concern is that the reference resolution still has to be written manually. I think that should be handled by the macro.

@anussel5559
Copy link
Contributor

no rush! I'm not sure I follow though - the macro is designed to take in any enum, so it doesn't seem like it could ever know about how to resolve all the variants of its wrapping enum to the underlying struct type without being told what to do.

@LucasPickering
Copy link
Owner Author

@anussel5559 I'm actually not even sure if what I'm suggesting is possible. I'm going to play around with it a little bit and see if I can figure out something that works

@LucasPickering
Copy link
Owner Author

Update on this: I've spent the last month or so poking around with a variety of solutions. I have something very primitive that's working, but I'm questioning whether this is the right solution, or potentially if I should replace YAML altogether with a different solution that can handle references (as well as other qol improvements) natively. So I haven't forgotten about this, it's just slow going.

@anussel5559
Copy link
Contributor

Nice! I'm still happy to help here however I can, just let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New behavior to enable things that were previously not possible
Projects
None yet
Development

No branches or pull requests

2 participants