Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ureq 3.x #762

Merged
merged 171 commits into from
Aug 13, 2024
Merged

ureq 3.x #762

merged 171 commits into from
Aug 13, 2024

Conversation

algesten
Copy link
Owner

@algesten algesten commented Jul 3, 2024

This PR/branch is a complete, from the ground up rewrite of ureq. For Rust there clearly is a role for a sync, minimal dependency HTTP/1.1 client. The ureq 2.x code has gradually been iterated to its current state, but some early design decisions means we are struggling to make some changes we would like to have (such as being transport agnostic). While ureq's surface API is largely okay (with some warts), internally there is a lack of good separation of concerns and clear contracts between the parts of the library.

This PR starts from a complete clean slate, but the intention is to gradually bring back code from the 2.x branch where it makes sense to do so. Especially I'd like to bring back the majority of the tests to ensure backwards compatibility. The intention is also to preserve as much of the API surface as possible, only making changes in areas where there currently are clear problems (such as making unnecessary allocations). The intention is to also have a 1-1 feature parity with 2.x and the MSRV will not move much.

This PR/effort might not work out. I might discover things that are irreconcilable with the current ureq 2.x, or I might run out of steam to complete the project. There is also no timeline for completion, might take months, might take years. Ideally there are more people wanting to engage in the project – please talk to me here or send an email!

Here are some things I been thinking about fixing:

  • Separation between HTTP/1.1 logic and transport – this effort has started in hoot – a project which aims to provide a Sans-IO HTTP/1.1 protocol underpinning.
  • Bring your own transport. The transport should be completely plugin based and possible for the user to write their own.
  • http crate. While I will try to retain a lot of ureq's API, adhering to a common Rust HTTP API is ultimately a better idea.
  • Less allocations – less Strings, Vec. Try to work more with borrows/lifetimes.
  • Potentially webasm. As a consequence of making the transport possible to swap out, we can maybe make ureq work in a webasm context (needs more thought and input!)
  • Improve middleware support. The current middleware chains are OK, but we want them to be even better.
  • Improve the Error type and error-handling.

Please engage in discussion below if there are things I have omitted or ideas of where ureq could go!

@algesten algesten force-pushed the feature/ureq3 branch 3 times, most recently from a4233f3 to 1b9cb3b Compare July 6, 2024 10:29
@algesten
Copy link
Owner Author

algesten commented Jul 6, 2024

For a while I was considering the main API entry point would be taking the request http::Request<Body> by reference, i.e. the most central signature would be something like:

pub fn run(request: &Request<impl IntoBody>) -> Result<Response<RecvBody>, Error> { ... }

However &Request precludes sending any body would require &mut, such as those implementing std::io::Read (File comes to mind). Therefore I'm changing to not taking by ref. The internals of hoot still runs by ref, which means we should be able to avoid ever needing to clone it.

pub fn run(request: Request<impl IntoBody>) -> Result<Response<RecvBody>, Error> { ... }

@amkillam
Copy link

amkillam commented Jul 6, 2024

Just wanted to comment to say that I am very excited about the potential to use ureq with WASM! I managed to make ureq 2.x compile for WASM with some conditional compilation of rustls things, but am currently facing implementing further conditional compilation of TCPStream and some other things to make requests execute successfully. This rewrite was a great idea and will be really useful for my projects.

@algesten
Copy link
Owner Author

algesten commented Jul 6, 2024

@amkillam I haven't done any WASM myself, so I don't know what do's and dont's there are. I'm not sure I can go full no_std, because living without Box<dyn X> could prove a challenge.

I just tried to compile this branch with cargo build --target wasm32-unknown-unknown – and that's fine. Does that mean it's ok?

If I add that to the CI, are there more lints/things to check for?

@amkillam
Copy link

amkillam commented Jul 6, 2024

If compilation works then it should be most of the way there. Could you provide an example of how ureq usage would change in 3.x? I can try to make a working example of ureq usage in WASM based on that, and share my results after.

@amkillam
Copy link

amkillam commented Jul 6, 2024

If compilation works then it should be most of the way there. Could you provide an example of how ureq usage would change in 3.x? I can try to make a working example of ureq usage in WASM based on that, and share my results after.

Also I'm not totally sure about tests specific to WASM yet, but I think once I've (hopefully) created a working example of usage in WASM, I'll be in a better position to comment or maybe contribute if you'd like!

@algesten
Copy link
Owner Author

algesten commented Jul 6, 2024

If compilation works then it should be most of the way there. Could you provide an example of how ureq usage would change in 3.x? I can try to make a working example of ureq usage in WASM based on that, and share my results after.

Also I'm not totally sure about tests specific to WASM yet, but I think once I've (hopefully) created a working example of usage in WASM, I'll be in a better position to comment or maybe contribute if you'd like!

I will pick you up on that, but it's a bit too early still. I can't even make ureq 3.x do a regular request just yet. Hopefully in a couple of days.

@rmg-x
Copy link

rmg-x commented Jul 8, 2024

and the MSRV will stay the same.

I really really think the MSRV policy should at least be relaxed and communicated clearly with a major version bump. My personal opinion is that users shouldn't be relying on this in the first place but such is life.

I'm not saying we should bump MSRV anytime a new Rust version is released but allow for some modernization when it makes sense.
E.g. #725

@algesten
Copy link
Owner Author

algesten commented Jul 8, 2024

and the MSRV will stay the same.

I really really think the MSRV policy should at least be relaxed and communicated clearly with a major version bump. My personal opinion is that users shouldn't be relying on this in the first place but such is life.

I'm not saying we should bump MSRV anytime a new Rust version is released but allow for some modernization when it makes sense. E.g. #725

The problem is developer environments installed via Linux (or similar) package managers instead of rustup. They tend to have a much slower upgrade cadence. I want ureq to be available to as many users as possible, and I personally don't think the code get that much worse by not having "modern" constructs like let-else.

You're right about once cell – there we can avoid a dependency. So far I've managed to avoid using it, we'll see how that goes when I get more into the tests. I've been forced to bump MSRV to 1.63 because rustls bumped, so right now I'm leaning towards 1.63.

But nothing is set in stone.

@rmg-x
Copy link

rmg-x commented Jul 8, 2024

installed via Linux (or similar) package managers instead of rustup. They tend to have a much slower upgrade cadence

That's valid. Just as an example, the creator of once_cell supports at least the previous eight Rust versions which is roughly ~1 year behind current. To me, that seems reasonable.

Also as an aside, compatibility for crates and Rust versions is tracked here if you are interested: https://lib.rs/stats#rustc

@algesten

This comment was marked as outdated.

@algesten
Copy link
Owner Author

algesten commented Jul 9, 2024

                         │                                                  
                                                                            
                         │                                                  
                                        ┌─────────┐                         
                         │              │         │                         
                                        │ CONNECT │                         
                         │              │         │                         
                                        └─────────┘                         
            ┌─────────┐  │  ┌─────────┐ ┌─────────┐             ┌─────────┐ 
            │         │     │         │ │         │             │         │ 
            │   TLS   │  │  │ CONNECT │ │   TLS   │             │   TLS   │ 
            │         │     │         │ │         │             │         │ 
            └─────────┘  │  └─────────┘ └─────────┘             └─────────┘ 
┌─────────┐ ┌─────────┐     ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ 
│         │ │         │  │  │         │ │         │ │         │ │         │ 
│TcpStream│ │TcpStream│     │TcpStream│ │TcpStream│ │  SOCKS  │ │  SOCKS  │ 
│         │ │         │  │  │         │ │         │ │         │ │         │ 
└─────────┘ └─────────┘     └─────────┘ └─────────┘ └─────────┘ └─────────┘ 
                         │     Proxy       Proxy       Proxy       Proxy    
    HTTP       HTTPS          CONNECT     CONNECT      SOCKS       SOCKS    
                         │      HTTP       HTTPS        HTTP       HTTPS    
                                                                            
                         │                                                  

Actually. I need TLS also for SOCKS.

@algesten
Copy link
Owner Author

algesten commented Jul 9, 2024

I've been debating in my head how to think about Send/Sync. ureq 2.x is very liberal, Agent, Request and Response are all thread safe.

In ureq 3.x, I gone from thinking "let's just stay on the thread" (i.e. nothing is Send/Sync), to thinking, that maybe some parts should be. One core idea is to avoid allocations and thus not require ownership of any request body – that can be orthogonal with being Send/Sync. However I think we can get around it.

This is how the types map from ureq 2.x -> 3.x.

  • ureq2::Agent -> ureq3::Agent
  • ureq2::Request -> ureq3::RequestBuilder
  • ureq2::Response -> http::Response<RecvBody> (naming TBD)

ureq 3.x deliberately drops the type names Request and Response, leaving them to the http-crate. Thus http::Request becomes a central type giving us a new method on Agent that is:

    pub fn run(&self, request: http::Request<impl AsBody>) -> Result<http::Response<RecvBody>, Error> {

This lives side by side with the existing ureq2 API. These are equivalent:

New http-crate centric API:

   let data = vec![0_u8, 1, 2, 3, 4];
   let request = http::Request::post("https://yaz").body(&data).unwrap();
   let response = ureq::run(request).unwrap();

Old ureq2 API. Notice that ureq::post(...) produces a RequestBuilder, and we will make sure this is Send/Sync since that makes it possible to partially constructs requests with both 2 and 3 API.

   let data = vec![0_u8, 1, 2, 3, 4];
   let response = ureq::post("https://yaz").send_bytes(&data).unwrap();

impl AsBody above is so far implemented for: &[u8], &str, String, Vec<u8>, &String, &Vec<u8>, &File, &TcpStream, &Stdin, File, TcpStream, Stdin, RecvBody and Response<RecvBody>. Those are not all Send/Sync. However ureq only needs to know about them when it's time to send a request, which means the type Body<'a> – which ISN'T Send/Sync – only exists while calling Agent::run or RequestBuilder::send_bytes. Once those functions return a response, the Body<'a> is not used anymore, meaning ureq doesn't actually enforce any constraints on the user wrt Send/Sync.

@algesten
Copy link
Owner Author

Considering building out a TLS API in ureq3.x to not be in the situation we have in #765

Instead of exposing/re-exporting rustls config (or native-tls), we would abstract over both. The initial abstraction would need support a reasonable amount of minimal config:

  • client certs
  • custom roots
  • sni enable/disable
  • TLS12/13
  • disable certificate verifier (maybe under a dangerous feature flag)

Even if we default to rustls, native-tls might be a better starting point for looking at which config options to include, since rustls can do more than native-tls.

https://docs.rs/native-tls/0.2.12/native_tls/struct.TlsConnectorBuilder.html
https://docs.rs/rustls/0.23.11/rustls/struct.ConfigBuilder.html

@algesten algesten force-pushed the feature/ureq3 branch 2 times, most recently from f365d90 to 965fdff Compare July 11, 2024 08:13
@algesten
Copy link
Owner Author

Got TLS working with both rustls and native-tls. For now I decided to not use the webpki-roots crate instead opting for using rustls-native-certs for both rustls and native-tls. webpki-roots is structured such that it's only compatible with rustls, thus it breaks symmetry to use it.

Question is how many ureq users want the webpki-roots instead?

@algesten
Copy link
Owner Author

@amkillam i think we're at a place where we can try webasm now.

What I'm not clear on is whether webasm can give me a TCP socket style interface and if TLS should be in-or-outside the scope ureq. Let me explain what we got and see if this is a good starting point for tinkering.

Agent::new constructor takes a config, a connector and a resolver. https://github.com/algesten/ureq/blob/feature/ureq3/src/agent.rs#L148-L161 . The config is for stuff like timeouts and TLS. The connector is a composable thing that provides a transport per request (which I get back to below), and the resolver is something that translates hostnames to SocketAddr.

The DefaultConnector has a chain of subconnectors that are run one after another. Each can chose to pass-through or build on top of the previous Transport. https://github.com/algesten/ureq/blob/feature/ureq3/src/transport/mod.rs#L72-L93
Simplified we get a chain of connectors: SOCKS -> TcpStream -> Rustls. If we don't use SOCKS, we get just TcpStream -> Rustls and the scheme, https decides whether there Rustls connector is used.

Connector and Transport are public traits that we can implement to provide some completely different transport. The Transport trait https://github.com/algesten/ureq/blob/feature/ureq3/src/transport/mod.rs#L55-L63 is quite different to io::Read/io::Write – rather working on shared buffers. This is because I have io_uring in mind for the future.

TcpTransport is maybe the cleanest example: https://github.com/algesten/ureq/blob/feature/ureq3/src/transport/tcp.rs#L78

Let's take a concrete example. Let's say webasm transport does not need an explicit resolving, instead you give it some combo (hostname, port, is_tls) – then we can write a dummy Resolver that does nothing and a corresponding Transport opening the connection.

If I remove all default features, it compiles on wasm32-unknown-unknown. If we need to bring in some feature like rustls, we can maybe make it compole through feature flags – needs investigating.

@cpu
Copy link
Contributor

cpu commented Jul 11, 2024

For now I decided to not use the webpki-roots crate instead opting for using rustls-native-certs for both rustls and native-tls. webpki-roots is structured such that it's only compatible with rustls, thus it breaks symmetry to use it.

For unrelated reasons we've been thinking about extending webpki-roots with an optional feature that includes the DER of the self-signed root certificates in addition to the more minimal (and webpki crate focused) trust anchor representation. I think if that were to happen you could use webpki-roots for both TLS backends similar to rustls-native-certs. The main downside (and why it would be opt-in) is the increase in binary size from all the extra (and for rustls, superfluous) data.

If you were open to breaking symmetry between backends the rustls-platform-verifier crate is often the best choice for rustls in particular. The README table breaks it down, but it works like a better rustls-native-certs on most platforms and degrades to rustls-native-certs and webpki-roots as appropriate for other platforms (like Linux/BSD/WASM).

What I'm not clear on is whether webasm can give me a TCP socket style interface and if TLS should be in-or-outside the scope ureq.

I'm very out of the loop with wasm, but have heard folks say they've gotten rustls working there. Certainly you need webpki-roots in this context because there is no platform trust store to load certificates from in that environment (e.g. rustls-native-certs would never be supported in WASM AIUI).

In case it helps colour your decision for support rustls is interested in adding first-class wasm support but it hasn't been done yet and isn't something tested in CI. I've heard anecdotally it works, but it could break :-)

@algesten
Copy link
Owner Author

For unrelated reasons we've been thinking about extending webpki-roots with an optional feature that includes the DER of the self-signed root certificates in addition to the more minimal (and webpki crate focused) trust anchor representation. I think if that were to happen you could use webpki-roots for both TLS backends similar to rustls-native-certs. The main downside (and why it would be opt-in) is the increase in binary size from all the extra (and for rustls, superfluous) data.

That would be a most welcome feature indeed. In a previous life I hugely preferred webpki-roots to the platform, since enterprise sec and others want to MITM by installing their own roots. That's why ureq2.x used the webpki-roots as primary. However I'm older and wiser now, and maybe it's more pragmatic to just go with the local root store as the default choice.

If you were open to breaking symmetry between backends the rustls-platform-verifier crate is often the best choice for rustls in particular. The README table breaks it down, but it works like a better rustls-native-certs on most platforms and degrades to rustls-native-certs and webpki-roots as appropriate for other platforms (like Linux/BSD/WASM).

Looks great! Revocations support is nice, and using the OS-default verifiers probably means even better protection in case there are 0-day updates etc. I think it is worth breaking symmetry for, yes.

I'm very out of the loop with wasm, but have heard folks say they've gotten rustls working there. Certainly you need webpki-roots in this context because there is no platform trust store to load certificates from in that environment (e.g. rustls-native-certs would never be supported in WASM AIUI).

In case it helps colour your decision for support rustls is interested in adding first-class wasm support but it hasn't been done yet and isn't something tested in CI. I've heard anecdotally it works, but it could break :-)

Interesting. I was hoping that by making the transport truly plug-in in ureq3.x this would open for these possibilities. But I'm not sure I'll tinker with it much myself until I have a real need.

@algesten algesten force-pushed the feature/ureq3 branch 2 times, most recently from b55ed8f to 6fc839c Compare July 12, 2024 17:14
@algesten
Copy link
Owner Author

With that last push, ureq 3.x is now feature complete in terms of feature flags.

There are a couple of TODOs I want to deal with and I need to glance through the 2.x code to make sure there aren't functionality areas that I've forgotten about.

Next up is porting all the tests over.

@cpu
Copy link
Contributor

cpu commented Aug 7, 2024

For unrelated reasons we've been thinking about extending webpki-roots with an optional feature that includes the DER of the self-signed root certificates in addition to the more minimal (and webpki crate focused) trust anchor representation.

That would be a most welcome feature indeed.

In case it ends up being helpful I wanted to mention I finished implementing this. It ended up being a separate crate instead of an opt-in feature: https://crates.io/crates/webpki-root-certs

@algesten
Copy link
Owner Author

Moving this to the main branch with individual PRs now.

@algesten algesten merged commit 0e3cbb9 into main Aug 13, 2024
48 checks passed
@algesten algesten deleted the feature/ureq3 branch August 13, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants