Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.read_csv(
    'sftp://sftp.company.com/some_path/file.csv',
    storage_options={'username': 'my_username', 'password': 'my_password', 'allow_agent': False},
)
Issue Description
The read_csv call raises the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 670, in get_handle
ioargs = _get_filepath_or_buffer(
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 339, in _get_filepath_or_buffer
with urlopen(req_info) as req:
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 239, in urlopen
return urllib.request.urlopen(*args, **kwargs)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 547, in _open
return self._call_chain(self.handle_open, 'unknown',
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1425, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: sftp>
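The final frames of the traceback can be reproduced without pandas and without a reachable server, because urllib rejects the scheme before it attempts any connection. The sketch below is only an illustration of that behavior; the helper name urlopen_error is made up for this example.

```python
import urllib.error
import urllib.request


def urlopen_error(url):
    """Return the URLError message urllib raises for url, or None if it opens.

    Hypothetical helper for illustration; not part of pandas or urllib.
    """
    try:
        urllib.request.urlopen(url).close()
    except urllib.error.URLError as exc:
        return str(exc)
    return None


# urllib has no handler registered for the sftp scheme, so this fails
# immediately in unknown_open, before any network access.
message = urlopen_error("sftp://sftp.company.com/some_path/file.csv")
print(message)
```

The printed message should match the last line of the traceback above.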
Expected Behavior
read_csv loads the CSV file over SFTP.
The following change fixes the problem in my case.
Currently, line 324 of pandas/io/common.py says:
if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer):
This sends every sftp URL to urllib, which does not support the sftp scheme, hence the URLError above. Excluding sftp from this branch fixed the problem for me:
if isinstance(filepath_or_buffer, str) and is_url(filepath_or_buffer) and parse_url(filepath_or_buffer).scheme != 'sftp':
I tried submitting a patch, but I could not get it to pass the automated checks, so someone who knows pandas better will need to look into it.
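To illustrate why the original condition matches sftp URLs, here is a standalone sketch of the check. It assumes that is_url accepts any scheme listed in urllib.parse's uses_relative, uses_netloc, or uses_params; the is_url below is a stand-in written for this example, not the pandas source, and goes_to_urllib is a hypothetical name for the patched condition.

```python
from urllib.parse import urlparse, uses_netloc, uses_params, uses_relative

# Assumption: schemes accepted by a pandas-like is_url check.
_VALID_URLS = set(uses_relative + uses_netloc + uses_params)
_VALID_URLS.discard("")


def is_url(url):
    """Stand-in for the is_url check: True if the scheme is a known URL scheme."""
    return isinstance(url, str) and urlparse(url).scheme in _VALID_URLS


def goes_to_urllib(path):
    """Patched condition: treat as a urllib URL unless the scheme is sftp."""
    return is_url(path) and urlparse(path).scheme != "sftp"


sftp_path = "sftp://sftp.company.com/some_path/file.csv"
https_path = "https://example.com/file.csv"

print(is_url(sftp_path))           # sftp counts as a URL scheme for urllib.parse
print(goes_to_urllib(sftp_path))   # but the patched check skips the urllib branch
print(goes_to_urllib(https_path))  # other URL schemes are unaffected
```

Excluding a single scheme is the narrowest possible change; whether other schemes that urllib cannot open need the same treatment is left open here.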
Installed Versions
pd.show_versions() crashes for me (traceback below), but the versions are pandas 1.4.2, Python 3.8, macOS 11.6.5.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
deps = _get_dependency_info()
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
mod = import_optional_dependency(modname, errors="ignore")
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
module = importlib.import_module(name)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
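File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed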
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/setuptools/__init__.py", line 8, in <module>
import _distutils_hack.override # noqa: F401
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/override.py", line 1, in <module>
__import__('_distutils_hack').do_override()
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 72, in do_override
ensure_local_distutils()
File "/Users/dbrandon/Projects/import/.venv/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
assert '_distutils' in core.__file__, core.__file__
AssertionError: /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/distutils/core.py