Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remotes.http: Support dvc push for http remotes #3343

Merged
merged 8 commits into from
Feb 21, 2020

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Feb 17, 2020

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
    remote: Document HTTP remote push changesΒ dvc.org#1006

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

WIP PR for #3247

  • Adds support for POST uploads to http remotes
  • Adds support for HTTP Basic and Digest authentication methods (for both upload and download operations)
    • basic_auth and digest_auth boolean options added to the http(s) remote configurations (digest takes precedence if both are set to true)
    • When no auth method is specified (or both are set to false), existing behavior is used (no auth method, assumes all HTTP endpoints are public for both upload and download)
    • Username/password combinations can be configured in the same way as ssh remotes
    • Password can be set in config or prompted on command line via ask_password config option

Still TODO:

  • Add functional/remote tests
  • Docs PR

@pmrowla
Copy link
Contributor Author

pmrowla commented Feb 17, 2020

I have been testing locally against a simple flask app with /public /basic /digest endpoints for each possible auth method: https://gist.github.com/pmrowla/6ed79cc0c4d20e26ab8e86d25570c983

@pmrowla pmrowla changed the title WIP: Support dvc push for http remotes [WIP] remotes.http: Support dvc push for http remotes Feb 17, 2020
@pmrowla pmrowla requested a review from efiop February 17, 2020 06:52
@pared pared self-requested a review February 18, 2020 15:49
dvc/config.py Outdated Show resolved Hide resolved
dvc/remote/http.py Outdated Show resolved Hide resolved
dvc/remote/http.py Outdated Show resolved Hide resolved
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@efiop efiop requested a review from skshetry February 19, 2020 13:22
tests/utils/httpd.py Outdated Show resolved Hide resolved
dvc/remote/http.py Outdated Show resolved Hide resolved
@efiop efiop requested a review from skshetry February 20, 2020 00:42
@pmrowla pmrowla changed the title [WIP] remotes.http: Support dvc push for http remotes remotes.http: Support dvc push for http remotes Feb 20, 2020
@efiop
Copy link
Contributor

efiop commented Feb 20, 2020

For the record: test failures are unrelated and will be fixed in #3368

pmrowla and others added 8 commits February 20, 2020 10:47
- uploaded files are sent as chunked encoding POST data

Fixes iterative#3247
- username, password, ask_password options work the same way as for
  ssh remotes
- basic_auth (bool) and digest_auth (bool) options added for http(s)
  remotes
- digest_auth takes precedence over basic_auth if both are enabled
- HTTP remotes now tested locally using a SimpleHTTPServer instance that
  allows reading/writing to a temp directory
Co-Authored-By: Saugat Pachhai <suagatchhetri@outlook.com>
Co-Authored-By: Ruslan Kuprieiev <kupruser@gmail.com>
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one comment.

self.http_server = http_server
self.method_name = request.function.__name__

def get_url(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think DS has a valid point here, the original method is static, shouldn't we make all of them non-static now, that it is required by this particular change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I wrote it like this because it was done the same way in TestRemoteSSHMocked (which overrides static SSHMocked.get_url() from tests/remotes.py).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it was only a question, its not severe, because any potential problems should be detected on the build stage. For me, it can stay as is.

@efiop efiop requested review from skshetry and pared February 21, 2020 03:18
@efiop efiop merged commit 68e543f into iterative:master Feb 21, 2020
@@ -100,7 +163,13 @@ def _request(self, method, url, **kwargs):
kwargs.setdefault("timeout", self.REQUEST_TIMEOUT)

try:
res = self._session.request(method, url, **kwargs)
res = self._session.request(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla, I'm unable to make it work with Digest auth (I'm using the script that you provided on the gist).

Copy link
Member

@skshetry skshetry Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, when I change this to:

Suggested change
res = self._session.request(
res = requests.Session().request(

it works. Something's wrong in our cached session perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I figured it out. When I try to push, we send a HEAD request, after which the script sets a cookie. And, then all POST requests fails. So, I first tried clearing cookies with self._session.cookies.clear() and it worked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we need to set auth when on creating sessions?

Copy link
Contributor Author

@pmrowla pmrowla Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is a side effect of how my test app in the gist was configured?

By default, flask uses client-side cookies for session data, so flask-httpauth does as well. But for a web server running digest auth to be properly secured, it needs to be handled server-side: https://flask-httpauth.readthedocs.io/en/latest/#security-concerns-with-digest-authentication

If I modify the test app to actually use server-side sessions (https://gist.github.com/pmrowla/0615f162d1308cab4f429b6efafe276a) the existing remote code works without needing to clear any cached cookies

If I set session.auth in the http remote at the time we first create the session instead of setting it per request, I still see the same issue making requests against the original test app. So I'm not sure if it's just that requests.auth.HTTPDigestAuth won't work properly when talking to improperly configured flask apps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set session.auth in the http remote at the time we first create the session instead of setting it per request, I still see the same issue making requests against the original test app.

@pmrowla, I tried that too but, does not work. It's unclear what's the best thing to do here.

If it's probably only flask-httpauth, let's ignore for now then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants