
Issue running pip install . for mofid in Google Colab #30

Open
ngkayjay opened this issue Jun 22, 2023 · 3 comments

ngkayjay commented Jun 22, 2023

Hey, I got an error that I can't debug when running pip install. The make init step and the path setup went without issues.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing /content/gdrive/MyDrive/Project_MTF-C/mofid
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I was running this on Colab. All the required setup packages are up to date, as shown below.
Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (23.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (68.0.0)
Requirement already satisfied: ez_setup in /usr/local/lib/python3.10/dist-packages (0.9)

Contributor

bbucior commented Jun 22, 2023

Thanks for reporting and trying it out on another platform!

I'm not too familiar with Google Colab, but I may have found a workaround. The source of the crash appears to be parsing the install_requires option in setup.py, which sets up a dependency needed only for older Python 2.x configurations. Everything worked for me after commenting out that line (or adding a step like sed -i -e 's/install_requires/#install_requires/' setup.py to the install process).
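In case it helps to see the effect concretely, here is a hedged demo of that workaround on a throwaway copy; the file contents and /tmp path below are illustrative placeholders, not mofid's actual setup.py:

```shell
# Create a stand-in setup.py (placeholder dependency; not mofid's real file).
cat > /tmp/setup_demo.py <<'EOF'
from setuptools import setup
setup(name="mofid",
      install_requires=["example-dep"])
EOF
# Comment out the install_requires line, as in the workaround above.
sed -i -e 's/install_requires/#install_requires/' /tmp/setup_demo.py
# Confirm the line is now commented out.
grep 'install_requires' /tmp/setup_demo.py
```

In the real repo you would run the same sed command on mofid's setup.py and then rerun pip install . .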

Does it fix the error for you, too?

@ngkayjay ngkayjay changed the title Issue running pip install . Issue running pip install . for mofid in Google Colab Jun 27, 2023
@ngkayjay
Author

Yes it does, the fix works! Thanks!
Right now I'm on Google Colab, since my HPC resource allocation has been approved but not yet provisioned. I suspect other users who want to play around with ML on MOFs without institutional resources would appreciate your advice as well.

For other users on Colab: run !chmod -R 755 <YOUR_DIR> after pip install . to set proper permissions; otherwise you'll get an Errno 13 permission error.

One more question: how long does it take to construct a MOFid for a given .cif file on your end? The authors whose work I'm reproducing constructed MOFids for a dataset of 400k+ .cifs, but it takes me ~6 s per MOFid. I'm wondering where I should start optimizing.

Contributor

bbucior commented Jun 27, 2023

Awesome, glad everything's working now!

For the ML training set, unfortunately calculating the MOFids is going to take a while for a large folder of CIFs. Your timings are consistent with what I see on my laptop (make test runs through 28 CIFs in 1-2 minutes). If memory serves, I processed MOF databases by splitting the CIFs into a few folders and running them as parallel jobs on HPC resources (see Scripts/HPC/).
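For anyone who ends up in the same spot, the split-into-folders step can be sketched in plain shell. The paths, file names, and chunk count below are illustrative only; the real submission scripts live in Scripts/HPC/ of the mofid repo:

```shell
# Hedged demo: distribute CIFs round-robin into N chunk folders so each
# chunk can be run as a separate HPC job.
SRC=/tmp/cif_src; DEST=/tmp/cif_chunks; N=2
rm -rf "$SRC" "$DEST"; mkdir -p "$SRC"
# Stand-in CIF files for the demo; point SRC at your real CIF folder instead.
touch "$SRC"/mof1.cif "$SRC"/mof2.cif "$SRC"/mof3.cif "$SRC"/mof4.cif "$SRC"/mof5.cif
i=0
for f in "$SRC"/*.cif; do
  chunk="$DEST/chunk_$((i % N))"
  mkdir -p "$chunk"
  cp "$f" "$chunk/"
  i=$((i + 1))
done
ls "$DEST"
```

Each chunk_* folder can then be submitted as its own job, which is roughly the approach described above.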

To be honest, while you're waiting on HPC resources, your best bet to get started would probably be a precomputed MOFid.smi or similar structural information, if it's available in the SI of that paper or another compatible one. For example, our SmVAE paper includes a training set with RFcodes, which are slightly different from MOFids but similar in intent. Maybe something like that could help get things off the ground until you have the compute resources for reproducing the original 400k+ dataset?
