BUG: unable to read sftp urls despite fsspec support #46765

Open
@maalgorium

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Placeholder host and credentials; storage_options is meant for fsspec's SFTP backend.
df = pd.read_csv(
    'sftp://sftp.company.com/some_path/file.csv',
    storage_options={'username': 'my_username', 'password': 'my_password', 'allow_agent': False},
)

Issue Description

read_csv raises a URLError instead of loading the file:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 670, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 339, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 239, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 547, in _open
    return self._call_chain(self.handle_open, 'unknown',
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1425, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: sftp>
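
The last frames of the traceback can be reproduced without pandas, and without reaching the server, because urllib has no handler registered for the sftp scheme and fails before any connection is attempted. A minimal sketch, reusing the placeholder URL from the example above:

```python
import urllib.error
import urllib.request

# Placeholder URL from the report; no connection is made because urllib
# rejects the scheme before trying to reach the host.
url = "sftp://sftp.company.com/some_path/file.csv"

try:
    urllib.request.urlopen(url)
    message = None
except urllib.error.URLError as exc:
    message = str(exc)

print(message)  # prints: <urlopen error unknown url type: sftp>
```

This suggests the failure is in how the URL is dispatched, not in the SFTP credentials or the server.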

Expected Behavior

The CSV file is loaded into a DataFrame.

The following change fixes the problem in my case. Currently, line 324 of pandas/io/common.py (inside _get_filepath_or_buffer) says:

if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer):

This sends every sftp URL to urllib, which does not support the sftp scheme (hence the URLError above), so the fsspec code path is never reached. Excluding sftp from the condition fixed the problem for me:

if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer) and parse_url(filepath_or_buffer).scheme != 'sftp':

I tried submitting a patch, but I could not get it to pass the automated checks, so someone who knows pandas better will need to look into it.
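
To illustrate why the original condition catches sftp URLs, here is a standalone sketch of the two checks. It assumes that is_url tests the scheme against the scheme lists exposed by urllib.parse; looks_like_url and should_use_urllib are hypothetical names written for this example, not pandas functions.

```python
from urllib.parse import urlparse, uses_netloc, uses_params, uses_relative

# Assumption: schemes that urllib.parse can *parse*, which is broader than
# the set of schemes urllib.request can actually *open*.
KNOWN_SCHEMES = set(uses_relative + uses_netloc + uses_params) - {""}


def looks_like_url(path: str) -> bool:
    """Stand-in for the original check: is the scheme one urllib.parse knows?"""
    return urlparse(path).scheme in KNOWN_SCHEMES


def should_use_urllib(path: str) -> bool:
    """Stand-in for the patched check: a known URL scheme, but not sftp."""
    return looks_like_url(path) and urlparse(path).scheme != "sftp"


sftp_url = "sftp://sftp.company.com/some_path/file.csv"
https_url = "https://example.com/file.csv"

print(looks_like_url(sftp_url))      # True: sftp is in urllib.parse's scheme lists
print(should_use_urllib(sftp_url))   # False: sftp is left for another code path
print(should_use_urllib(https_url))  # True: https still goes through urllib
```

Special-casing a single scheme is the narrowest possible change. Other schemes that urllib.parse recognises but urllib.request cannot open may be affected in the same way, so a more general check may be preferable.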

Installed Versions

pd.show_versions() crashes for me (traceback below), but the environment is pandas 1.4.2, Python 3.8, macOS 11.6.5.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/distutils/core.py
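
When pd.show_versions() crashes like this, the relevant versions can still be gathered with the standard library alone. A minimal sketch: collect_versions is a hypothetical helper, and the package list (pandas, fsspec, paramiko) is an assumption about what matters for an SFTP read.

```python
import platform
from importlib import metadata


def collect_versions(packages=("pandas", "fsspec", "paramiko")):
    """Return interpreter, OS and installed package versions without importing the packages."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


versions = collect_versions()
for key, value in versions.items():
    print(f"{key}: {value}")
```

Because importlib.metadata reads the installed distribution metadata rather than importing each module, it avoids the setuptools import that triggers the assertion above.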

Labels: Bug; IO Network: Local or Cloud (AWS, GCS, etc.) IO Issues
