-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BespokeFit client #351
BespokeFit client #351
Conversation
Hi @j-wags , @mattwthompson , @Yoshanuikabundi I wasn't sure who to ask for a review here so feel free to ignore if I got the wrong person! The main part I need feedback on is if I should keep the old methods on the executor and add deprecation warnings to them or if I can get away with removing them and having this be a major version change, happy to do either though. Also, any feedback generally would be great! |
To be frank I'm not really sure where to start reviewing this or how to provide feedback; I skimmed some of the diff while nodding along, but the code alone doesn't do much to get me thinking about where I should really focus (and these sort of details kinda need to be spoonfed to me). Also you're the owner of this repo so it's your playground to do with what you please https://github.com/openforcefield/status?tab=readme-ov-file#openff-maintainer-dashboard |
I'll give this a review today, mostly with an eye on the API change/deprecation questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the docs update, it'd be good to ensure that they reflect best practices - so examples/docs pages should be updated to avoid things that are removed/emit deprecation warnings.
Thanks @mattwthompson and @j-wags! I think I have covered everything now but it would be great if you could give the docs a quick look over to make sure I didn't miss an old use of the executor! |
openff/bespokefit/executor/client.py
Outdated
self._session.headers.update({"bespokefit-token": settings.BEFLOW_API_TOKEN}) | ||
retry = Retry(connect=retries, backoff_factor=backoff_factor) | ||
adapter = HTTPAdapter(max_retries=retry) | ||
self._session.mount("http://", adapter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a bit more reading I think enabling https is more of a deployment detail in that users need to have a certificate to enable this with uvicorn but there are a few changes we need to make to enable this option:
- expose a setting which points to the certificate and key pair which is used when starting the app
- change the
BEFLOW_GATEWAY_ADDRESS
to expect the protocol (http/s) to be part of the address - mount the HTTPAdapter to
http
to cover both protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a bit more reading I think enabling https is more of a deployment detail in that users need to have a certificate to enable this with uvicorn but there are a few changes we need to make to enable this option:
- expose a setting which points to the certificate and key pair which is used when starting the app
I'm not sure about the first point - if it were the case that users had to provide a certificate/key pair, or do something special with uvicorn, I'd expect to see a mention of those in the QCFractal or Alchemiscale server deployment docs, since those communicate securely. But neither certificate nor key pair are mentioned in the QCF deployment docs. Somehow QCFractal does everything securely by default. Same with Alchemiscale. So I might need more convincing that enabling https is a deployment detail that requires additional work by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point I would have expected that as well, I checked Alchemiscale and it turns out you can use something like traefik to automatically do this for you without having to implement it in the package. You can see that this is part of the deployment docker file in alchemiscale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bennybp what are you using for handling TLS on your QCFractal deployments? We we traefik
to handle this for alchemiscale
, but curious what you've been using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. All the instances (and other projects) that are on that server run behind traefik, which handles all the routing and certificates. QCFractal doesn't have direct support for SSL itself (at least I don't see it as on option for waitress
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @jthorton. Everything looks good, except I'm still concerned about sending the tokens (or any other sensitive data) over raw http. I did some searching on my own and asked ChatGPT, and it seems like passing sensitive data between machines does require https. I don't have the knowledge to recognize whether this is happening in a context where we don't need https, so I've asked Dotson to weigh in.
I think it's generally wise to offload handling of TLS to a TLS termination proxy, such as Since Do we have docs on how to deploy |
That said, if a user deploys So, this comes down to decisions by an individual user/deployer as to how they plan to expose their If a user is deploying Does this help clarify paths forward? |
Thanks @bennybp and @dotsdl for the response this definitely helps and hopefully clears things up for @j-wags.
Yes this is a great idea and something I want to add and started in #342 , but as you point out most of the time users won't need this as they can run it on a local machine, but hopefully we can document how to do this on AWS as we have with the ASAP deployment. |
Yes, thank you both for the clarification. I'm fine with the client functionality being added, but I don't think the token field makes sense if the two situations where it's being used are:
So I'm happy to consider this PR without any use of tokens and making no security guarantees (and a documented warning that using this over the open internet is not secure). Or I'm happy to have tokens with https-by-default to enable secure communication. But I don't want to put this out with seemingly-secure tokens that aren't actually secure. |
Who owns this project now? @jthorton is currently listed as the contact https://github.com/openforcefield/status?tab=readme-ov-file#openff-maintainer-dashboard |
Josh is indeed currently the owner of Bespokefit and could unilaterally merge this if he wanted. However this (to me) is a serious enough issue that I'd bring it up with the lead team if this went ahead in its current state. The context around this PR is that ASAP is building software pipelines to discover new drugs, and they must implement data security to avoid massive IP/legal trouble if the data leaks. Using structurally insecure token authentication presents a very large risk for them, and has the possibility of catching us in the blast radius if something goes wrong. And this is before the issue of us pushing insecure token authentication to OUR partners as well. |
Yeah I could see users doing option 2 rather than option 3 which is to deploy behind treafik and thinking that the server is secure when its not, but I am not sure if its possible for us to automatically enforce/ enable https by default without adding a lot of complexity. I think the best thing we can do is to provide the docker file with treafik and detailed deployment docs outlining how this works and cases when it is secure similar to alchemiscale. We could also have warnings emitted if the user sets a authentication token and tries to connect to a http address rather than https and warn that its not secure? |
To recap our discussion in the Newcastle/bespoke meeting today - Josh's proposal to emit a warning when a client connects to anything except We didn't talk about steps forward for the dockerfile - I'd prefer that go into a separate PR given that the warning-on-http plan gets this to a mergeable state, and we'll need to do some thinking about the devops/testing/documentation of a docker deployment pathway. |
Thanks @j-wags I have added the warning along with a test please feel free to suggest a changing to the wording of the warning I wasn't quite sure of how urgent to make the warning! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic. Thanks @jthorton. This is good to merge, and I'm thinking it's a significant enough feature that it should be release 0.4.0. Any notes to add about reverse compatibility, or should previously working commands still work? Please update the releasenotes to list any gotchas here, but if that's not a concern this is good to merge.
I'll assume that I should cut the release once this is merged but let me know if it should wait for any other PRs.
Thanks @j-wags, there should be no changes to past commands or scripts as we export all of the old functions still but run them through the new client feature and yes a 0.4.0 version would be great to mark this feature! |
Description
This PR introduces a simple bespokefit client which splits out the logic of communicating with the executor. This should make it easier to submit and retrieve calculations from an executor on another machine using the python API similarly to Alchemiscale or QCArchive. To enable this we also expose the
BEFLOW_GATEWAY_ADDRESS
which can be used to provide the location of an executor on another machine which is currently not possible.We also introduce some basic authentication which checks that the value of the new setting
BEFLOW_API_TOKEN
matches between the server and the client value before responding to requests.These features should make it easier to provide a light weight bespokefit-client package in future for applications which only need to interface with a bespokefit executor.
Todos
Notable points that this PR has either accomplished or will accomplish.
Questions
should we keep the old API points and raise a deprecation warning if they are used?Status