-
Notifications
You must be signed in to change notification settings - Fork 7
Setup DVC database tracking #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ebrahimebrahim before checking out this branch, you need to initialize dvc in the main branch. So you can run: Then you can try run |
ebrahimebrahim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super cool, tysm for putting it together!
A small thing:
- Can you add
"dvc[gdrive]"to the dev requirements list here in pyproject.toml? This way when someone doespip installof openlifu with the [dev] option it includes dvc as a developer's tool, with the needed google drive support.
Bigger thing, possibly my issue: I couldn't get it to work,after dvc pull I get a google drive login in a browser and then when I select my google acct (my work acct which should have access to our shared data folder) I get this in the browser:
Did you run into this by any chance or do you know what we might be doing differently? My dvc version is 3.55.2
Interesting, seems like this is a recent issue due to Google blocking the dvc app till further verifications are complete. See here I followed the solution in that link and created a custom app and authorization credentials following the instructions here. Could you test it out again and see if I have set it up correctly? I will share the credentials file with you. |
pyproject.toml
Outdated
| dev = [ | ||
| "pytest >=6", | ||
| "pytest-cov >=3", | ||
| dvc[gdrive] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this just needs to be in quotes. Also good to have a trailing comma.
If you want to test that the pyproject.toml works, you can create a fresh python environment, and then pip install from the directory:
pip install /path/to/openlifu-python-repo
Adding [dev] will install with these options:
pip install /path/to/openlifu-python-repo[dev]
|
One thing that will help you to avoid going back and forth to fix linting errors is to install the pre-commit hooks. They will auto-reformat to fix most of trivial linting issues every time you commit. To install pre-commit hooks: pip install pre-commit # installs the pre-commit program to your python environment
pre-commit install # uses pre-commit program to install the local pre-commit configuration in this repo as a hook that will be run whenever you commit something |
1dbcc49 to
fbfde81
Compare
|
Hmm it does seem that google has broken this way of using DVC with google drive 😐 Okay so in order to get this to work I had to do this after Then A few points and unanswered questions:
|
|
Discussed this with @jcfr and here are a few points to consider moving forward:
|
|
@ebrahimebrahim I can't figure out why the README isn't rendering correcly. I haven't used RST before and don't have the correct tooling setup for proper local rendering. Could you please check that during your review and let me know where it could be going wrong. I will share the updated gdrive_client_secret with you. |
ebrahimebrahim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't get it to work. Here is exactly what I did, with a fresh clone of the repo and a fresh python environment just to be sure:
python3.10 -m venv .venv
. .venv/bin/activate
git clone git@github.com:OpenwaterHealth/OpenLIFU-python.git
cd OpenLIFU-python/
git checkout dvc_setup
pip install .[dev]
dvc remote modify shared_gdrive gdrive_client_secret <the_secret_that_you_sent_me>
dvc pull
It opened a google login in the browser and had me authenticate with google 3 or 4 times, then it stopped and said in the terminal:
ERROR: failed to connect to gdrive (1rGAzqdikeTUzUQADjdP7-fTPdfF15eaO/files/md5) - Failed to authenticate GDrive: OAuth2 code exchange failed: invalid_clientUnauthorized
Fetching
ERROR: failed to pull data from the cloud - 1 files failed to download
Other comments:
- I'm not a fan of RST either, but I kept it initially because there was already some auto-documentation thing set up to use it. We should definitely switch to markdown at some point. This online tool I just found is how I debugged your readme
- Some of the commits reference #121 (the issue) while others reference #122 (this PR). Make them all reference #121. (One example of how to do this is: update your local main branch, check out this
dvc_setupbranch, dogit rebase -i main, then in the editor you can mark "reword" on the commits for which you want to reword the message) - There is still a secret key in the commit 938f2f0 -- we should not have this for the reasons discussed above, even if the secret is no longer valid. This can be eliminated from the history also with an interactive rebase as mentioned above, squashing the commit that adds the key and the commit that removes it.
README.rst
Outdated
| for user access authentication. | ||
|
|
||
| DVC usage | ||
| ^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently the hierarchy of heading markers is inferred from the document itself, and it doesn't like when you skip a heading level
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections
| ^^^^^^^^^ | |
| ~~~~~~~~~ |
README.rst
Outdated
| pip install -e .[dev] | ||
| Version control of database using DVC (Data Version Control) | ||
| ------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the underline heading marker has to go all the way to the end of the text.
(This heading is kind of long though.)
| ------------ | |
| ------------------------------------------------------------ |
| git pull | ||
| dvc pull # Requires access to remote storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there also be a command to add the client secret? Something like
dvc remote modify shared_gdrive gdrive_client_secret <client_secret_here>
(and we tell the reader that they need to contact the developers to get the client secret).
Really the readme should be up front about data access currently being restricted and requiring an access key to be shared by the developers
README.rst
Outdated
| ------------ | ||
|
|
||
| Data Version Control (DVC) is a data management tool that is meant to be run alongside Git. | ||
| In this project DVC is used to link changes in the code to specific versions of the database. DVC can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to the reader what is "the database".
Maybe we can call it "database folder of example data" or something like that.
or be more generic and just call it "project data such as an example database" so that it holds true no matter what kind of project-associated data we end up trackign with dvc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And thanks for sharing the RST tool!
I had previously just update the client_secret but seems like I should have created a whole new id/secret pair. I have now created a new client_id as well and updated the config. I tested it out with a clean install and it seems to work. Hopefully it works for you now as well! Will share the new secret. I have also now cleaned up the commit history. |
README.rst
Outdated
| .. code:: sh | ||
| git pull | ||
| dvc remote modify shared_gdrive gdrive_client_secret <client_secret_here> # Contact developers for grive_client_secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for a way to prevent the secret from affecting a file that is tracked by git, because it seemed to easy for a developer to accidentally commit it in the future (the secret would end up stored inside .dvc/config which would show up as a change to potentially be committed). I found that if you add --local like this the key will instead go into .dvc/config.local which is already in .dvc/.gitignore so the danger goes away
| dvc remote modify shared_gdrive gdrive_client_secret <client_secret_here> # Contact developers for grive_client_secret | |
| dvc remote modify --local shared_gdrive gdrive_client_secret <client_secret_here> # Contact developers for grive_client_secret |
ebrahimebrahim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything is working for me, yay!
Thanks so much for this it's such an improvement
I made a mistake while merging where I overwrote some previous DVC changes. Issue #122
I made a mistake while merging where I overwrote some previous DVC changes. Issue #122
I made a mistake while merging where I overwrote some previous DVC changes. Issue #122


No description provided.