Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example projects not cross-OS #6957

Open
BramVanroy opened this issue Feb 6, 2021 · 11 comments
Open

Example projects not cross-OS #6957

BramVanroy opened this issue Feb 6, 2021 · 11 comments
Labels
compat Cross-platform and cross-Python compatibility enhancement Feature requests and improvements help wanted Contributions welcome! projects spaCy projects and project templates

Comments

@BramVanroy
Copy link
Contributor

BramVanroy commented Feb 6, 2021

How to reproduce the behaviour

I was planning to have a look at the example projects and quickly found that these are very Linux oriented. The commands are all Linux commands (mv, mkdir, etc.). Of course, there are ways around (WSL, other CLIs), but generally, these commands do not work well on Windows. This is not really a bug (it is expected) but I wasn't sure how to tag this differently.

Preferably, all commands in script are cross-platform so that they can be run on any platform. These should be able to be replaced by Python -c commands to ensure cross-compatibility. I am aware that these are intended as example projects, but especially for new users it would be great if these "just work" cross-platform. Something like the following should work I think (untested).

python -c "from pathlib import Path; Path('corpus/${vars.treebank}').mkdir(exist_ok=True)"
python -c "from pathlib import Path; Path('corpus/${vars.treebank}/${vars.train_name}.spacy').rename('corpus/${vars.treebank}/train.spacy)'"
python -c "from pathlib import Path; Path('corpus/${vars.treebank}/${vars.dev_name}.spacy').rename('corpus/${vars.treebank}/dev.spacy)'"
python -c "from pathlib import Path; Path('corpus/${vars.treebank}/${vars.test_name}.spacy').rename('corpus/${vars.treebank}/test.spacy)'"

Info about spaCy

  • spaCy version: 3.0.1
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.8.2
@svlandeg svlandeg added compat Cross-platform and cross-Python compatibility projects spaCy projects and project templates labels Feb 6, 2021
@svlandeg
Copy link
Member

svlandeg commented Feb 8, 2021

Hm, I see your point, though I think most Windows users will work in a CLI environment that does support these operations? At least this is why we haven't caught this before: I'm probably the only Windows-user on the team and the projects work just fine for me :p How does one live without mv and mkdir? ;p

@BramVanroy
Copy link
Contributor Author

Well, I switch. For programming, I use PowerShell 7 for GPU stuff, and WSL for other stuff. PowerShell has these commands, but for instance for rm it will throw an error when you use rm -rf.

I came across this error when I ran a command from within the terminal inside PyCharm. I guess I never swapped it out for PowerShell. 🤓

That being said, I still think that examples should speak to a general audience and especially new people or not-so-technical people will run into problems with this. And yes, I do think that not-so-technical people will try their hand at training their own model. Many of my colleagues or people in digital humanities want to use all these cool things, but they do not have a lot of experience or background with it. I'd argue that we should make examples as broad and accessible as possible, but I understand that there are many counter-arguments to this.

We can leave this open, and when I find the time I can look into doing a cross-platform pull request?

@svlandeg svlandeg added enhancement Feature requests and improvements help wanted Contributions welcome! labels Feb 8, 2021
@svlandeg
Copy link
Member

svlandeg commented Feb 8, 2021

Sure! PRs would definitely be appreciated :-)

@svlandeg
Copy link
Member

svlandeg commented Mar 1, 2021

The first PR to start resolving this, explosion/projects#42, fixes the textcat_goemotions project. More PRs are always welcome ;-)

@milan101
Copy link

milan101 commented Mar 2, 2021

I'm having the same issue with the textcat_geomotions project. Getting the following error:

NotFoundError: [E970] Can not execute command 'mkdir -p training/cnn'. Do you have 'mkdir' installed?

Other details:

  • windows 10, with VS code
  • anaconda3 environment where i created it with the navigator (installing spacy, cuda etc on there)
  • I also did pip install mkdir to try to fix it but didn't work
  • i have an nvidia GPU so was trying to use that

Happy to test something if required.

# Name Version Build Channel _pytorch_select 0.1 cpu_0 aiohttp 3.7.4 py38h294d835_0 conda-forge async-timeout 3.0.1 py_1000 conda-forge attrs 20.3.0 pyhd3deb0d_0 conda-forge blas 1.0 mkl conda-forge boto 2.49.0 py_0 conda-forge boto3 1.17.18 pyhd8ed1ab_0 conda-forge botocore 1.20.18 pyhd8ed1ab_0 conda-forge brotlipy 0.7.0 py38h294d835_1001 conda-forge bz2file 0.98 py_0 conda-forge ca-certificates 2020.12.5 h5b45459_0 conda-forge cachetools 4.2.1 pyhd8ed1ab_0 conda-forge catalogue 2.0.1 py38haa244fe_2 conda-forge certifi 2020.12.5 py38haa244fe_1 conda-forge cffi 1.14.5 py38hd8c33c5_0 conda-forge chardet 4.0.0 py38haa244fe_1 conda-forge click 7.1.2 pyh9f0ad1d_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge cryptography 3.4.4 py38hb7941b4_0 conda-forge cudatoolkit 11.2.1 h7a7aa70_8 conda-forge cymem 2.0.5 py38h885f38d_1 conda-forge cython-blis 0.7.4 py38h347fdf6_0 conda-forge dataclasses 0.7 pyhb2cacf7_7 conda-forge google-api-core 1.25.1 pyh44b312d_0 conda-forge google-auth 1.24.0 pyhd3deb0d_0 conda-forge google-cloud-core 1.5.0 pyhd3deb0d_0 conda-forge google-cloud-storage 1.19.0 py_0 conda-forge google-crc32c 1.1.2 py38h554a69a_0 conda-forge google-resumable-media 1.2.0 pyhd3deb0d_0 conda-forge googleapis-common-protos 1.52.0 py38h5b57dd5_1 conda-forge grpcio 1.36.0 py38he5377a8_0 conda-forge idna 2.10 pyh9f0ad1d_0 conda-forge intel-openmp 2020.3 h57928b3_311 conda-forge jinja2 2.11.3 pyh44b312d_0 conda-forge jmespath 0.10.0 pyh9f0ad1d_0 conda-forge libblas 3.9.0 8_mkl conda-forge libcblas 3.9.0 8_mkl conda-forge libcrc32c 1.1.1 h0e60522_2 conda-forge liblapack 3.9.0 8_mkl conda-forge libprotobuf 3.15.3 h7755175_0 conda-forge markupsafe 1.1.1 py38h294d835_3 conda-forge mkdir 2020.12.3 pypi_0 pypi mkl 2020.4 hb70f87d_311 conda-forge mkl-service 2.3.0 py38h1e8a9f7_2 conda-forge multidict 5.1.0 py38h294d835_1 conda-forge murmurhash 1.0.5 py38h885f38d_0 conda-forge ninja 1.10.2 h5362a0b_0 conda-forge numpy 1.20.1 py38h0cc643e_0 conda-forge openssl 1.1.1j h8ffe710_0 conda-forge packaging 20.9 pyh44b312d_0 conda-forge pathy 0.4.0 pyhd8ed1ab_0 conda-forge pip 21.0.1 pyhd8ed1ab_0 conda-forge preshed 3.0.5 py38h885f38d_0 conda-forge protobuf 3.15.3 py38h885f38d_0 conda-forge pyasn1 0.4.8 py_0 conda-forge pyasn1-modules 0.2.7 py_0 conda-forge pycparser 2.20 pyh9f0ad1d_2 conda-forge pydantic 1.7.3 py38h294d835_1 conda-forge pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge pysocks 1.7.1 py38haa244fe_3 conda-forge python 3.8.8 h7840368_0_cpython conda-forge python-dateutil 2.8.1 py_0 conda-forge python_abi 3.8 1_cp38 conda-forge pytorch 1.4.0 cpu_py38ha775e86_0 pytz 2021.1 pyhd8ed1ab_0 conda-forge requests 2.25.1 pyhd3deb0d_0 conda-forge rsa 4.7.2 pyh44b312d_0 conda-forge s3transfer 0.3.4 pyhd8ed1ab_0 conda-forge setuptools 49.6.0 py38haa244fe_3 conda-forge shellingham 1.4.0 pyh44b312d_0 conda-forge six 1.15.0 pyh9f0ad1d_0 conda-forge smart_open 2.2.1 pyh9f0ad1d_0 conda-forge spacy 3.0.3 py38h1266d08_0 conda-forge spacy-legacy 3.0.1 pyhd8ed1ab_0 conda-forge sqlite 3.34.0 h8ffe710_0 conda-forge srsly 2.4.0 py38h885f38d_2 conda-forge thinc 8.0.1 py38h1266d08_1 conda-forge tqdm 4.58.0 pyhd8ed1ab_0 conda-forge typer 0.3.2 pyhd8ed1ab_0 conda-forge typing-extensions 3.7.4.3 0 conda-forge typing_extensions 3.7.4.3 py_0 conda-forge urllib3 1.26.3 pyhd8ed1ab_0 conda-forge values 2020.12.3 pypi_0 pypi vc 14.2 hb210afc_3 conda-forge vs2015_runtime 14.28.29325 h5e1d092_3 conda-forge wasabi 0.8.2 pyh44b312d_0 conda-forge wheel 0.36.2 pyhd3deb0d_0 conda-forge win_inet_pton 1.1.0 py38haa244fe_2 conda-forge wincertstore 0.2 py38haa244fe_1006 conda-forge yarl 1.6.3 py38h294d835_1 conda-forge zlib 1.2.11 h62dcd97_1010 conda-forge

@svlandeg
Copy link
Member

svlandeg commented Mar 2, 2021

@milan101 : that specific case should be fixed by the PR I mentioned, explosion/projects#42 ;-)

@ctufts
Copy link

ctufts commented Jul 1, 2021

@svlandeg I realized my original PR explosion/projects#42 isn't working as originally thought. The python -c command I added in doesn't actually create the directory, but that line is not necessary as the spacy train command will create the specified output directory if it doesn't exist. I figured this out while trying to implement a similar fix for other projects that contain linux cli commands such as the ud_benchmark.

This is an issue with ud_benchmark because the convert command checks for the existence of the output directory and exits), where the train cli command does.

It seems running anything via python -c in the yml runs into very strange string parsing issues and fails silently. In the output below you can see the excessive number of quotations added by the yml parsing for python -c "import os; os.makedirs(os.path.join('training', '${vars.config}'))" :

Running command: 'C:\Users\ctufts\Anaconda3\envs\spacyProjectsBug\python.exe' -c '"import os; os.makedirs(os.path.join('"'"'training'"'"', '"'"'cnn'"'"'))"'
Running command: 'C:\Users\ctufts\Anaconda3\envs\spacyProjectsBug\python.exe' -m spacy train ./configs/cnn.cfg -o training/cnn --gpu-id -1
✔ Created output directory: training\cnn

This is why I was able to get the textcat_geomotions project running on windows after my previous PR: the line I added failed to create the directory silently, but was then created in the train cli step.

I can resolve this by either:

  1. adding helper scripts (python) for the os commands (mkdir, mv, etc) and replace the mkdir and mv references in the project.yml 's instead of running via python -c
  2. altering the cli functions to operate the same way spacy train does, where the output directory is created if it doesn't already exist.

I'm assuming solution 1 is probably less invasive, but figured I'd see if you had any thought on this before I proceed.

@svlandeg
Copy link
Member

svlandeg commented Jul 7, 2021

Hi @ctufts !

Yea, I think you're right that there's an issue here. I think option 1 would be the easiest for now - so just add some preprocessing script that takes care of setting up IO instead of using these Linux commands?

A PR would definitely be welcome!

@JBOE22175
Copy link

Hi,
I have the same problem on Windows with "tagger_parser_ud". I am not aware how to rewrite the Unix-scripts to valid windows scripts. Can someone help?
e.g.
- "mkdir -p corpus/${vars.treebank}"

I tried:
- "mkdir corpus\\${vars.treebank}"

which results in: Running command: mkdir 'corpus\UD_English-EWT'
This does not work.

@svlandeg
Copy link
Member

As discussed on explosion/projects#69: I think we should do an effort to try and avoid these platform-specific commands as much as possible, by either removing them when unnecessary (e.g. spacy train doesn't require an upfront mkdir command for the output as it'll create the dir if it doesn't exist), or by implementing some custom (pre)processing scripts that set up the working directory or delete intermediate files.

@polm
Copy link
Contributor

polm commented Aug 30, 2022

Related to this, some projects (the tagger_parser ones in pipelines) remove directories that were checked out git repos using rm -rf, but this fails in Windows because some of the files are marked read-only. There are a couple of workarounds for this, like using chmod on the whole directory before deleting it, but I'm not sure if there's a more idiomatic way to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat Cross-platform and cross-Python compatibility enhancement Feature requests and improvements help wanted Contributions welcome! projects spaCy projects and project templates
Projects
None yet
Development

No branches or pull requests

6 participants