Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Backup & Restore with GoogleDrive #1376

Closed

Conversation

thehanimo
Copy link
Contributor

Adds support to backup and restore to Google Drive.
Backup works the same way as "Download Specter backup files" except that it uploads the zip file to Google Drive.
Restore works the same way as "Load Specter backup" but without prompting the user to choose files.

Action Required:

  • .flaskenv should contain the following:
    GOOGLE_OAUTH_CLIENT_ID=
    GOOGLE_OAUTH_CLIENT_SECRET=
    GOOGLE_OAUTH_PROJECT_ID=
    OAUTHLIB_INSECURE_TRANSPORT=1
    
    This has to be generated on the GCP console. Instructions can be found in the prerequisites here.
    NOTE: Authorized JavaScript origins should contain http://localhost:25441 and Authorized redirect URIs should contain http://localhost:25441/settings/backup_to_google_drive/callback.

Things to discuss:

  • Currently, the scope we request from Google drive is https://www.googleapis.com/auth/drive.appdata as recommended here. This means we do not request access to the user's drive folders or files. A possible downside (or maybe an advantage) is that the user will not be able to access this "app data" that we're storing. The link above says:

    The application data folder is a special hidden folder that your app can use to store application-specific data, such as configuration files. The application data folder is automatically created when you attempt to create a file in it. Use this folder to store any files that the user shouldn't directly interact with. This folder is only accessible by your application and its contents are hidden from the user and from other Drive apps.

  • We have access to all versions of the backup file. Currently, we're only accessing the last saved one. We can maybe include a picker to let the users choose this based on timestamp. We might have to download and preview the contents here too.

  • Since we do not have SSL running on localhost, we're disabling OAuth2's requirement of https by setting OAUTHLIB_INSECURE_TRANSPORT=1. Not sure if this has consequences. From here,

    N.B: You should note that Oauth2 works through SSL layer. If your server is not parametrized to allow HTTPS, the fetch_token method will raise an oauthlib.oauth2.rfc6749.errors.InsecureTransportError. Most people don’t set SSL on their server while testing and that is fine.

  • In case the server isn't running on port 25441, Google OAuth won't work as this specific port number is provided in Authorized JavaScript origins and Authorized redirect URIs while creating credentials.

@k9ert
Copy link
Collaborator

k9ert commented Sep 1, 2021

Cool, looks very interesting. In the tests, i see:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/__main__.py", line 7, in <module>
    entry_point()
  File "/tmp/cirrus-ci-build/.env/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/cirrus-ci-build/.env/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmp/cirrus-ci-build/.env/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/cirrus-ci-build/.env/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/cirrus-ci-build/.env/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/cli/cli_server.py", line 138, in server
    init_app(app, hwibridge=hwibridge)
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/server.py", line 147, in init_app
    from cryptoadvance.specter.server_endpoints import controller
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/server_endpoints/controller.py", line 33, in <module>
    from .settings import settings_endpoint
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/server_endpoints/settings.py", line 34, in <module>
    from ..util.google_drive import backup, callback, restore
  File "/tmp/cirrus-ci-build/src/cryptoadvance/specter/util/google_drive.py", line 6, in <module>
    from googleapiclient.discovery import build
ModuleNotFoundError: No module named 'googleapiclient'

Did you forget to list the needed modules in requirements.txt ?

@thehanimo
Copy link
Contributor Author

thehanimo commented Sep 1, 2021

Oops, yes! How do I update the requirements? I manually added

google-api-python-client==2.17.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.5

to requirements.in and ran pip-compile --generate-hashes requirements.in but got this error:

Traceback (most recent call last):
  File "/Users/hani/specter-desktop/.env/bin/pip-compile", line 8, in <module>
    sys.exit(cli())
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/scripts/compile.py", line 458, in cli
    results = resolver.resolve(max_rounds=max_rounds)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/resolver.py", line 173, in resolve
    has_changed, best_matches = self._resolve_one_round()
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/resolver.py", line 278, in _resolve_one_round
    their_constraints.extend(self._iter_dependencies(best_match))
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/resolver.py", line 388, in _iter_dependencies
    dependencies = self.repository.get_dependencies(ireq)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/repositories/local.py", line 75, in get_dependencies
    return self.repository.get_dependencies(ireq)
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/repositories/pypi.py", line 231, in get_dependencies
    self._dependencies_cache[ireq] = self.resolve_reqs(
  File "/Users/hani/specter-desktop/.env/lib/python3.8/site-packages/piptools/repositories/pypi.py", line 155, in resolve_reqs
    preparer = self.command.make_requirement_preparer(
TypeError: make_requirement_preparer() got an unexpected keyword argument 'wheel_download_dir'

Is this related to jazzband/pip-tools#1228?

@thehanimo
Copy link
Contributor Author

Ran pip install --upgrade --pre pip wheel setuptools pip-tools and it worked!

Copy link
Collaborator

@k9ert k9ert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's enough for the code-review for now.

src/cryptoadvance/specter/util/google_drive.py Outdated Show resolved Hide resolved
)
.execute()
)
return {"success": True}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style of returning stuff is usually common for (rest-) endpoints but not for a function in a utils-class. Consider to let the exception get exposed to the calling context and deal with it there!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

.execute()
)
return {"success": True}
except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's usually not a good idea to catch "Exception". It usually means you're aware that your code is buggy but you don't want to dive into the details of why that is.
Your probably wrote it for a reason. Reproduce the issue and check which exception it is exactly. Probably some google-drive-specific ones. So catch exactly those exceptions. As a benefit, it's easier to react on those exceptions rather than not knowing what to do because of a generic exception.

The default behaviour of app.logger(...) and then proceeding is usually implemented anyway somewhere up the stacktrace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I wanted to account for any and every exception that google-drive's apis may throw. Even if I am able to manually try a few possibilities, wouldn't there be some exception that I couldn't find? Shouldn't except Exception as e: still be there just in case this happens?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is: Do you know how to deal with those exceptions? Maybe yes. But then figure out the exception hirarchy of google. Probably they have a common Exception they are all derived from. That way you would handle Google-Exceptions, not ALL exceptions. That's an important difference as you might cover e.g. an Out-of-memory-Error and assume it's some Google-Exception.

So in general, it's much better to NOT handle an exception and let it propagate then to suppress it and try to move on (if you don't have a rough understanding why that Exception is thrown).

The kind of generic handling is already built in in certain levels of the system:
https://github.com/cryptoadvance/specter-desktop/blob/master/src/cryptoadvance/specter/server_endpoints/controller.py#L49-L87
As you can see, we're handling 3 different errors slightly different. So in your case (and if you would remove your catching-code) if weird Google-exceptions might be thrown, you end up in Line 77 and a generic error-handler-page (and logging in the logs).
This is the best to figure out what's happening and a proper fix can get delivered based on that information which deals with the details of this issue in a much more informed way,

Rule of thumb: Don't deal with errors you don't know about.

That doesn't mean that you should look out for errors and think about how to deal with them best. But don't assume all Exceptions are Google-Exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That really makes so much sense. Thanks! I'll keep this in mind and make changes.

)
files = response.get("files", [])
if len(files) == 0:
return {"success": False, message: "No backups found."}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "Rest-style" consider to raise your own SpecterError or maybe your own derivation of it. An Exception system is a great thing! Use it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yea. Didn't realise that. Will do!

@settings_endpoint.route("/backup_to_google_drive", methods=["POST"])
@login_required
def backup_to_google_drive():
return backup(app.specter.specter_backup_file(), current_user)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These implementations are too "thin". You should put the logic of returning json in this layer and the heavy lifitng in the util-layer. See below!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

window.location = jsonResponse.redirect_url;
return
}
if (jsonResponse.success) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that enough? Could also be {"success": False} (or True) right?
But maybe you should overthink the whole approach here. Http-status-codes are great! Use them for your own purposes. So you can return them in the endpoints as you need.
Get inspired by https://http.cat/

But you don't even have to check for specific codes. You can just remove this part because you already check above for != 200.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into it.

@k9ert
Copy link
Collaborator

k9ert commented Sep 2, 2021

Things to discuss:
Very good questions.

The OAuth scope

I like the separation of the data and i dislike that the user can't access that data. Can we have a "leak" so that the user can download that data in raw-format from google-drive at any time (independent of whether that file is currently restored or not)? That way we can stick to the app-data approach but make it visible to the user as well.

versioning question

Yeah, that would be nice. But let's push that back until we have a reasonable minimum version.

#OAUTHLIB_INSECURE_TRANSPORT=1
I think it's ok as we get redirected from localhost to localhost. However, how would it behave if anyone is using a self-signed-cert? Maybe you can check that via starting it like:
python3 -m cryptoadvance.specter server --config DevelopmentConfig --debug --ssl

Port question

In case the server isn't running on port 25441, Google OAuth won't work as this specific
port number is provided in Authorized JavaScript origins and Authorized redirect URIs while creating credentials.

I'm not sure i understand that. Do you talk about this:

        "redirect_uris": ["http://localhost:25441/"],
        "javascript_origins": ["http://localhost:25441"],
    }
}
SCOPES = ["https://www.googleapis.com/auth/drive.appdata"]
REDIRECT_URI = "http://localhost:25441/settings/backup_to_google_drive/callback"

You could replace the port there with the port from the flask-config, no?

@thehanimo
Copy link
Contributor Author

The OAuth Scope

There is no way a user can access an application's data. So the "leak" is out of the question I think, at least as per what Google docs say. However there is another scope https://www.googleapis.com/auth/drive.file (list of scopes here) that lets us access only those files created by us/shared with us by the user. That seems like what we'd like but opens up the possibility to invalid zip files.
Regardless, another point I forgot to add was was to encrypt these files as @ben-kaufman had said with the initial idea.

OAUTHLIB_INSECURE_TRANSPORT=1 and Port

I tested out with --ssl and things work the same with OAUTHLIB_INSECURE_TRANSPORT=1. We have to set

REDIRECT_URI = "http://localhost:25441/settings/google_oauth/callback"

or

REDIRECT_URI = "https://localhost:25441/settings/google_oauth/callback"

based on whether the server is running with ssl or not.

Also, when creating API creds on the GCP console, we need to define Authorized JavaScript origins as:

http://localhost:25441
https://localhost:25441

and Authorized redirect URIs as:

http://localhost:25441/settings/google_oauth/callback
https://localhost:25441/settings/google_oauth/callback

to allow servers with and without ssl to complete oauth.

I did rename the callback url from http://localhost:25441/settings/backup_to_google_drive/callback to https://localhost:25441/settings/google_oauth/callback since it made more sense. I was thinking it would be better if we could have https://localhost:25441/google_oauth_callback instead but it's not fundamental to the feature.

Anyways, since we have to define these uris in the GCP console (with port numbers), we lose flexibility with Port numbers. If we're using x, it should be listed in Authorized JavaScript origins and Authorized redirect URIs. And since this is specific to the organisation's GCP credentials, we cannot allow users to add custom port numbers to this list. The best we can do is add as many common port numbers as we can.

@thehanimo
Copy link
Contributor Author

@k9ert, I've made changes based on your suggestions. There's one CORS error I tried solving but couldn't. Initially, I used to send a redirect_uri in the json response of /backup_to_google_drive and redirect using js on the client side. When I use HTTP 307 to do this from the server side, I get

TypeError: Origin http://localhost:25441 is not allowed by Access-Control-Allow-Origin.

Also, need a way to figure out if the current server is being run with --ssl so that I can set the global var REDIRECT_URI in util/google_drive.py to either http://x or https://x

@k9ert
Copy link
Collaborator

k9ert commented Sep 8, 2021

CORS errors are nasty. It was some time that i digged into all of that OAuth2/CORS stuff and i feel with you because it's time-consuming, difficult to troubleshoot and a lot of poking and hoping (and then backtracking in order to figure out what exactly made the difference). And i also didn't enjoyed it.
So i probably can't help you much, just keep trying and drink coffee in the longer breaks where you're contemplating about the world of OAuth2.

Whether you run on ssl should be checkable via: app.config["PORT"] --> 80/443

Thank you for your work so far. Looks really interesting. I'll definitely will give it a try today or tomorrow!

Avoids the CORS error that accounts.google.com has with Origin:http://localhost:25441/ by updating HTTP 307 Redirect with a HTTP 401. Redirection is done by the browser instead.
@thehanimo
Copy link
Contributor Author

@k9ert, an update:

The CORS error was inevitable and had nothing to do with the flask server. It was thrown by https://accounts.google.com/o/oauth2/auth/ since they don't allow localhost origins. This stackoverflow question helped. I've moved the redirection back to the browser with an HTTP 401 status code.

app.config["PORT"] was always returning 25441 regardless of --ssl. I ended up using flask.url_for with _external=True to build this redirect url dynamically. Ran it with and without --ssl and it worked.

Also, since you will be running this soon, you do need .flaskenv to contain these:

GOOGLE_OAUTH_CLIENT_ID=
GOOGLE_OAUTH_CLIENT_SECRET=
OAUTHLIB_INSECURE_TRANSPORT=1

I've been using my own creds to test. If you need any help with setting this up for the organisation, do let me know.

@k9ert
Copy link
Collaborator

k9ert commented Sep 9, 2021

Defintely i need some help. Are you on telegram? My handle there is the same!
A small tutorial would be good.

@k9ert
Copy link
Collaborator

k9ert commented Oct 20, 2022

I think this needs to be reimplemented as extension. I'm closing this PR for now. It would really be cool to maybe have that as extension but we can't no longer accept it in the core codebase.
Thank you nevertheless for your contribution.

@k9ert k9ert closed this Oct 20, 2022
@aphex3k
Copy link
Contributor

aphex3k commented Jan 12, 2023

Has this topic been picked back up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants