Encoding change to ANSI_X3.4-1968 on import #11909

Closed
mdocekal opened this issue Dec 1, 2022 · 7 comments
Labels
gpu Using spaCy on GPU third-party Third-party packages and services

Comments

@mdocekal

mdocekal commented Dec 1, 2022

How to reproduce the behaviour

>>> import locale
>>> print(locale.getpreferredencoding())
UTF-8
>>> import spacy
>>> print(locale.getpreferredencoding())
ANSI_X3.4-1968

It started happening after I installed the CUDA extras:

python -m pip install 'spacy[cuda-autodetect]'

I tried to find the line of code that causes this switch, and the problem seems to be in the file thinc/backends/_custom_kernels.py:
clipped_linear_kernel_float = _get_kernel("clipped_linear<float>")

I used the following probe:

import locale
print(locale.getpreferredencoding()) # UTF-8
clipped_linear_kernel_float = _get_kernel("clipped_linear<float>")
print(locale.getpreferredencoding()) # ANSI_X3.4-1968
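A more general version of this probe can be sketched as a small helper that records the preferred encoding before and after importing any module. This is only an illustration: `preferred_encoding_around_import` is a hypothetical name, and `json` stands in for the module under suspicion (on the affected setup that would be `spacy`).

```python
import importlib
import locale


def preferred_encoding_around_import(module_name):
    """Return the preferred encoding before and after importing a module.

    Hypothetical helper for illustration. If the module is already
    imported, import_module returns the cached module and nothing runs,
    so call this in a fresh interpreter.
    """
    before = locale.getpreferredencoding()
    importlib.import_module(module_name)
    after = locale.getpreferredencoding()
    return before, after


# "json" is a harmless stand-in; replace it with the module to investigate.
before, after = preferred_encoding_around_import("json")
print(before, after)
```

On an unaffected system both values are the same; in the report above they were UTF-8 before and ANSI_X3.4-1968 after.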

Your Environment

I am not even able to run python -m spacy info --markdown, as it fails with:
RuntimeError: Click will abort further execution because Python was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/unicode-support/ for mitigation steps.
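The Click message mentions ASCII while the locale probe prints ANSI_X3.4-1968; these are two names for the same codec. A quick way to confirm this with the standard library (the helper name `canonical_codec_name` is made up for this sketch):

```python
import codecs


def canonical_codec_name(encoding_name):
    """Map an encoding alias to the canonical name of its Python codec."""
    return codecs.lookup(encoding_name).name


print(canonical_codec_name("ANSI_X3.4-1968"))  # ascii
print(canonical_codec_name("UTF-8"))  # utf-8
```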

  • Operating System: Ubuntu 22.04
  • Python Version Used: 3.10.4
  • spaCy Version Used: 3.4.3
  • Environment Information:
    absl-py==1.2.0
    aiohttp==3.8.3
    aiosignal==1.2.0
    async-timeout==4.0.2
    attrs==22.1.0
    blis==0.7.8
    brotlipy==0.7.0
    catalogue @ file:///opt/conda/conda-bld/catalogue_1651218742349/work
    certifi @ file:///croot/certifi_1665076670883/work/certifi
    cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
    charset-normalizer==2.1.1
    click @ file:///tmp/build/80754af9/click_1646056706450/work
    colorama @ file:///opt/conda/conda-bld/colorama_1657009087971/work
    commonmark==0.9.1
    confection==0.0.1
    conllu==4.5.2
    cryptography @ file:///tmp/build/80754af9/cryptography_1652101588599/work
    cupy-cuda11x==11.3.0
    cupy-wheel==11.3.0
    cymem @ file:///opt/conda/conda-bld/cymem_1651237256138/work
    datasets==2.5.1
    dill==0.3.5.1
    en-core-sci-sm @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz
    en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl
    evaluate==0.2.2
    fastrlock==0.8.1
    filelock==3.8.0
    frozenlist==1.3.1
    fsspec==2022.8.2
    huggingface-hub==0.9.1
    idna==3.4
    Jinja2 @ file:///opt/conda/conda-bld/jinja2_1647436528585/work
    joblib==1.2.0
    langcodes @ file:///opt/conda/conda-bld/langcodes_1643477751144/work
    MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work
    mkl-fft==1.3.1
    mkl-random @ file:///home/builder/ci_310/mkl_random_1641843545607/work
    mkl-service==2.4.0
    multidict==6.0.2
    multiprocess==0.70.13
    murmurhash @ file:///opt/conda/conda-bld/murmurhash_1651237169273/work
    nltk==3.7
    nmslib==2.1.1
    numpy==1.22.4
    packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
    pandas==1.5.0
    pathy @ file:///opt/conda/conda-bld/pathy_1651566172310/work
    preshed @ file:///opt/conda/conda-bld/preshed_1651240927559/work
    psutil==5.9.4
    pyarrow==9.0.0
    pybind11==2.6.1
    pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
    pydantic==1.8.2
    Pygments==2.13.0
    pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
    pyparsing @ file:///opt/conda/conda-bld/pyparsing_1661452539315/work
    pysbd==0.3.4
    PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
    python-dateutil==2.8.2
    pytorch-ignite==0.4.10
    pytz==2022.2.1
    PyYAML==6.0
    regex==2022.9.13
    requests @ file:///opt/conda/conda-bld/requests_1657734628632/work
    responses==0.18.0
    rich==12.6.0
    rouge-score==0.1.2
    ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
    ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
    scikit-learn==1.0.2
    scipy==1.7.3
    scispacy==0.5.1
    shellingham @ file:///Users/ktietz/demo/mc3/conda-bld/shellingham_1629144685686/work
    six @ file:///tmp/build/80754af9/six_1644875935023/work
    smart-open @ file:///opt/conda/conda-bld/smart_open_1651563547610/work
    spacy==3.4.3
    spacy-legacy==3.0.10
    spacy-loggers @ file:///opt/conda/conda-bld/spacy-loggers_1643478552797/work
    srsly @ file:///opt/conda/conda-bld/srsly_1651584738433/work
    summa==1.2.0
    textual==0.1.18
    textual-inputs==0.2.6
    thinc==8.1.3
    threadpoolctl==3.1.0
    tokenizers==0.12.1
    torch==1.12.1
    tqdm==4.64.1
    transformers==4.22.1
    typer @ file:///opt/conda/conda-bld/typer_1651237163820/work
    typing_extensions @ file:///tmp/abs_ben9emwtky/croots/recipe/typing_extensions_1659638822008/work
    urllib3==1.26.12
    wasabi @ file:///opt/conda/conda-bld/wasabi_1651237317563/work
    windpyutils==2.0.15
    xxhash==3.0.0
    yarl==1.8.1
@adrianeboyd
Contributor

That is really bizarre, but it seems to be a known issue: NVIDIA/cuda-python#29

What version of CUDA do you have?

@mdocekal
Author

mdocekal commented Dec 1, 2022

My CUDA version is 11.5.

@svlandeg added the install (Installation issues) and third-party (Third-party packages and services) labels on Dec 2, 2022
@adrianeboyd added the gpu (Using spaCy on GPU) label and removed the install (Installation issues) label on Dec 9, 2022
@adrianeboyd
Contributor

This seems to be an upstream issue that we can't work around. If anyone has a suggestion for how we can better handle this in spaCy, feel free to follow up here or in a new issue.

@github-actions
Contributor

This issue has been automatically closed because it was answered and there was no follow-up discussion.

@github-actions bot removed the resolved (The issue was addressed / answered) label on Feb 10, 2023
@thomazmoon

@mdocekal
I ran into this issue when I switched from CPU to GPU on Google Colab and found that I couldn't run any shell commands (e.g. !pwd to get the current directory).

I found a workaround by James Kent on Stack Overflow that seems to work on Colab, so hopefully it works for you as well.

>>> import locale
>>> locale.getpreferredencoding()
UTF-8
>>> !pwd
/content
>>> import spacy
>>> locale.getpreferredencoding()
ANSI_X3.4-1968
>>> !pwd
NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968
# ⭐️ James' Code from https://stackoverflow.com/a/31470394/15585278
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding
>>> print(locale.getpreferredencoding())
UTF-8
>>> !pwd
/content
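If replacing `locale.getpreferredencoding` for the whole session feels too broad, the same idea can be scoped to a block and undone afterwards. A minimal sketch, assuming a hypothetical helper named `forced_preferred_encoding`; like the workaround above, it only changes what Python code sees when it calls `locale.getpreferredencoding()`, not the underlying C locale:

```python
import contextlib
import locale


@contextlib.contextmanager
def forced_preferred_encoding(encoding="UTF-8"):
    """Temporarily make locale.getpreferredencoding() report `encoding`.

    Hypothetical helper for illustration; the original function is
    restored when the block exits, even if an exception is raised.
    """
    original = locale.getpreferredencoding
    locale.getpreferredencoding = lambda do_setlocale=True: encoding
    try:
        yield
    finally:
        locale.getpreferredencoding = original


original_function = locale.getpreferredencoding

with forced_preferred_encoding():
    # Code that insists on a UTF-8 preferred encoding would run here.
    inside = locale.getpreferredencoding()

print(inside)  # UTF-8
```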

@adrianeboyd
This might not help in finding the root cause, but hopefully it can serve as a workaround until it's fixed.

@adrianeboyd
Contributor

It's fine if you want to use this workaround on your end, but we wouldn't want to monkey-patch an important built-in library like locale from within spaCy.

It looks like this particular error for system commands might be a colab-specific issue, since the full traceback looks like this:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-5-8d5fad910f3e> in <module>
----> 1 get_ipython().system('pwd')

2 frames
/usr/local/lib/python3.8/dist-packages/google/colab/_system_commands.py in _run_command(cmd, clear_streamed_output)
    161   locale_encoding = locale.getpreferredencoding()
    162   if locale_encoding != _ENCODING:
--> 163     raise NotImplementedError(
    164         'A UTF-8 locale is required. Got {}'.format(locale_encoding))
    165 

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968

I'm not sure whether you would run into the exact same error in a script or notebook outside of colab.
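Outside of Colab, a script that shells out does not have to rely on the preferred encoding at all, because `subprocess.run` accepts an explicit `encoding` argument. The sketch below is an assumption about what such a script could look like, not something taken from spaCy or Colab; it runs a child Python process in place of `pwd` so that it works on any platform.

```python
import os
import subprocess
import sys

# Ask the child Python process to write UTF-8 regardless of its locale.
child_env = {**os.environ, "PYTHONIOENCODING": "utf-8"}

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    capture_output=True,
    encoding="utf-8",  # decode the output explicitly instead of using the locale
    env=child_env,
    check=True,
)
current_directory = result.stdout.strip()
print(current_directory)
```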

I hope that anyone who searches for this error related to spacy can find this bug report and can choose their preferred workaround from the linked issues and comments.

We tried to view the NVIDIA bug report mentioned in the issue above, but it isn't visible without the right kind of account. Hopefully they will fix it on their end soon, if they haven't already, and Colab will be updated.

@github-actions
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked this conversation as resolved and limited it to collaborators on Mar 16, 2023