Skip to content

gdrive: use the normal file backend for saved credentials #5554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 8, 2021

Conversation

isidentical
Copy link
Contributor

@isidentical isidentical commented Mar 5, 2021

Resolves #5551, awaiting iterative/PyDrive2#97

@isidentical isidentical force-pushed the drop-json-creds-save-backend branch from c9d6df9 to 9545908 Compare March 8, 2021 08:21
@isidentical isidentical marked this pull request as draft March 8, 2021 08:32
@isidentical isidentical force-pushed the drop-json-creds-save-backend branch from 9545908 to a9ee9ff Compare March 8, 2021 11:54
@isidentical isidentical marked this pull request as ready for review March 8, 2021 12:49
@isidentical isidentical requested a review from efiop March 8, 2021 12:49
Comment on lines +216 to +218
if self._use_service_account:
temporary_save_path = os.path.join(
tempfile.gettempdir(), "google-creds.json"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional logic is needed because the saved file (cached authorization info) is different than the actual service json. The one that we have secret on the CI is the exported raw credentials json file. But the one pydrive's newer versions expect from this is the cached version. So we have to designate the temporary file as the raw credentials instead of the serialized ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't realise that this is litterally /tmp/google-creds.json. Indeed, that's seems a bit odd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical Can we keep it in .dvc/tmp ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to depend on self.repo in this logic (which is how we get .dvc/tmp, self.repo.tmp_dir) since we intend to remove repo access. Do you know a better way to get this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above - I see where it comes from, but I feel we don't need these changes. We can keep everything the way it was before all these changes (cache on CI saved credentials instead of the raw JSON file). Otherwise, if we want to support an env var we should introduce another one, since it has a different semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical tmp_dir will need to be passed as an option to the fs, it is fine. We might also switch to using proper global directories for it (e.g. using appdirs) in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical Wdyt about @shcheklein 's suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache on CI saved credentials instead of the raw JSON file

Can do it. This would require us to switch the CI variable to a serialized version of this raw file @efiop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to a serialized version of this raw file

not raw but the one that GDrive saves into .dvc/tmp/....? right? just to be sure we are on the same page...

Otherwise, if we want to support an env var we should introduce another one, since it has a different semantics.

I still like the idea behind this PR (to make it work with the raw file), @isidentical I think it's fine to create a ticket? It just feels that we need to do it bit differently (var name, path?, change docs, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not raw but the one that GDrive saves into .dvc/tmp/....? right? just to be sure we are on the same page...

I guess we have to find proper names for these 2 different files since it might become very confusing (serialized file / raw file maybe? or cached file / raw file). Yes, that's what I meant by "serialized version of this raw file".

I still like the idea behind this PR (to make it work with the raw file), @isidentical I think it's fine to create a ticket? It just feels that we need to do it bit differently (var name, path?, change docs, etc)

My aim was make the the CI work, so I didn't see this as a feature though can create a ticket.

@efiop efiop merged commit f7d8761 into master Mar 8, 2021
@efiop efiop deleted the drop-json-creds-save-backend branch March 8, 2021 14:30
is_credentials_temp = os.getenv(
GDriveFileSystem.GDRIVE_CREDENTIALS_DATA
)
if self._use_service_account:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GDRIVE_CREDENTIALS_DATA this way has two difference meanings in different situations. I think this is confusing, complicates the code, etc. Can we just do the same as we had before? If we don't change anything and just revert the CI variable value, I think it should work? Or cache the saved credentials instead of the raw JSON file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that logic a bit weird, considering it would throw a random error when somebody tries to run GDRIVE_CREDENTIALS_DATA=... dvc status -c in case of ... is the raw JSON file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I suggest we do a separate env variable - GDRIVE_SERVICE_ACCOUNT_DARA? It's bad to mix two different meanings into the same variable.

considering it would throw a random error when somebody tries to run

handle and show a nice message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle and show a nice message?

It is an arbitrary key error :( Probably won't even worth the effort though if we want, I guess the best place is to that is wrapping around it in the pydrive2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

 Raises:
            ValueError, if the credential type is not :data:`SERVICE_ACCOUNT`.
            KeyError, if one of the expected keys is not present in
                the keyfile.

handle this in the auth.py ServiceAuth() (?) makes total sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gdrive: looking for user credentials when configured with service account
3 participants