BUG, ENH: Read Data From Password-Protected URL's and allow self signed SSL certs #16910

Closed
wants to merge 14 commits into from
Fixing whitespace to meet style guidelines
Sky NSS committed Jul 14, 2017
commit 437e0a2c4ec91cbb221441f96fc21be4c754c54d
28 changes: 14 additions & 14 deletions pandas/io/common.py
@@ -191,19 +191,19 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
filepath_or_buffer : a url, filepath (str, py.path.local or pathlib.Path),
or buffer
Contributor

This is misaligned; it needs to be part of the sentence above.

now supports 'https://<user>:<password>@<host>:<port>/<url-path>'

.. versionadded:: 0.21.0

encoding : the encoding to use to decode py3 bytes, default is 'utf-8'

compression : string, default None

.. versionadded:: 0.18.1
auth : tuple, default None
A tuple of string with (username, password) string for

Contributor

No blank lines in the docstrings. Indent the second and subsequent lines by 4 spaces.

auth : tuple, default None
A tuple of string with (username, password) string for
HTTP(s) basic auth: eg auth= ('roberto', 'panda$4life')

.. versionadded:: 0.21.0

verify_ssl : boolean, Default True
Contributor

Why is this default True? Shouldn't the onus be on the user to pass this?

Member

I think this is an attempt to mirror what requests does, which is to check SSL certificates by default. I, for one, second that default.
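
To make the default under discussion concrete, here is a minimal sketch of how a `verify_ssl` flag could map onto a standard-library SSL context. The helper name `make_ssl_context` is hypothetical and does not come from this PR; only the `ssl` module calls are real, and the PR's own handling may differ.

```python
import ssl


def make_ssl_context(verify_ssl=True):
    """Hypothetical helper (not from the PR): build an SSL context.

    With verify_ssl=True the context validates certificates and host
    names, which is also what requests does by default.
    """
    ctx = ssl.create_default_context()
    if not verify_ssl:
        # check_hostname must be switched off before verify_mode can be
        # set to CERT_NONE, otherwise the ssl module raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict_ctx = make_ssl_context()
lenient_ctx = make_ssl_context(verify_ssl=False)
```

A context like `lenient_ctx` could then be passed as `urllib.request.urlopen(url, context=lenient_ctx)` to accept a self-signed certificate. Under this design, skipping verification is an explicit opt-in by the caller.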

@@ -282,20 +282,20 @@ def split_auth_from_url(url_with_uname):
a url that may or may not contain username and password
Contributor

too much indentation

see section 3.1 RFC 1738 https://www.ietf.org/rfc/rfc1738.txt
//<user>:<password>@<host>:<port>/<url-path>

.. versionadded:: 0.21.0

Returns
-------
(username, password), url_no_usrpwd : tuple, string Default ('', '') url
A tuple with (username, pwd) pair and
url without username or password (if it contained it )

Raises
------
ValueError for empty url
"""
Contributor

Show what this raises.

Member

@gfyoung Jul 13, 2017

  1. See my comment here to patch the formatting for url_with_uname.

  2. The return format will need to be changed. The general format is this:

<var_name> : <data_type>
    <Description>

However, in this case, it would be preferable to describe the returned object without any naming, since this is a nested tuple object e.g.:

Returns
-------
A length-two tuple containing the following:
    - A length-two tuple of username and password. These will be empty strings if none were extracted.
    - The URL stripped of the username and password if provided in the URL.
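
For illustration only, the return shape described above could be produced by something like the following sketch. It uses `urllib.parse` from the Python 3 standard library rather than the `parse_url` call in the PR, and the name `split_auth_from_url_sketch` is hypothetical. The sample URLs are the ones from the PR's parametrized test.

```python
from urllib.parse import urlsplit, urlunsplit


def split_auth_from_url_sketch(url_with_uname):
    """Hypothetical stand-in for the PR's split_auth_from_url.

    Returns
    -------
    A length-two tuple containing the following:
        - A length-two tuple of username and password. These will be
          empty strings if none were extracted.
        - The URL stripped of the username and password.

    Raises
    ------
    ValueError
        If the URL is empty or None.
    """
    if not url_with_uname:
        msg = "Empty url: {_type}"
        raise ValueError(msg.format(_type=type(url_with_uname)))
    parts = urlsplit(url_with_uname)
    username = parts.username or ''
    password = parts.password or ''
    # Everything after the last '@' in the netloc is host[:port];
    # if there is no '@', rpartition returns the whole netloc.
    netloc = parts.netloc.rpartition('@')[2]
    url_no_usrpwd = urlunsplit(parts._replace(netloc=netloc))
    return (username, password), url_no_usrpwd


with_auth = split_auth_from_url_sketch('https://a1:b1@cc.com:101/f.csv')
without_auth = split_auth_from_url_sketch('https://ccc.com:1010/aaa.txt')
```

Note that this sketch does not percent-decode the credentials; whether the real function should do so is outside what the diff shows.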

if not url_with_uname:
msg = "Empty url: {_type}"
raise ValueError(msg.format(_type=type(url_with_uname)))
o = parse_url(url_with_uname)
@@ -320,13 +320,13 @@ def get_urlopen_args(url_with_uname, auth=None, verify_ssl=True):
a url that may or may not contain username and password
see section 3.1 RFC 1738 https://www.ietf.org/rfc/rfc1738.txt
//<user>:<password>@<host>:<port>/<url-path>

.. versionadded:: 0.21.0

auth : tuple, default None
A tuple of string with (username, password) string for
HTTP(s) basic auth: eg auth= ('roberto', 'panda$4life')

.. versionadded:: 0.21.0

verify_ssl : boolean, Default True
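
The body of `get_urlopen_args` is not visible in this hunk, so the following is only a sketch of how an `auth` tuple of the documented form could be turned into an HTTP basic auth header with the Python 3 standard library. The helper name `build_basic_auth_request` is an assumption made for illustration and is not taken from the PR.

```python
import base64
from urllib.request import Request


def build_basic_auth_request(url, auth=None):
    """Hypothetical helper: attach a basic auth header when auth is given."""
    req = Request(url)
    if auth is not None:
        username, password = auth
        # Basic auth is "username:password", base64-encoded.
        token = '{}:{}'.format(username, password).encode('utf-8')
        encoded = base64.b64encode(token).decode('ascii')
        req.add_header('Authorization', 'Basic ' + encoded)
    return req


# Constructing a Request does not open a network connection.
req = build_basic_auth_request('https://cc.com:101/f.csv',
                               auth=('roberto', 'panda$4life'))
```

The resulting `Request` object could then be passed to `urllib.request.urlopen`. Basic auth sends the credentials in an easily reversible encoding, which is one reason certificate verification matters for such URLs.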
12 changes: 6 additions & 6 deletions pandas/io/html.py
@@ -123,10 +123,10 @@ def _read(obj, auth=None, verify_ssl=None):
Parameters
----------
obj : str, unicode, or file-like
auth : tuple, default None
A tuple of string with (username, password) string for
HTTP(s) basic auth: eg auth= ('roberto', 'panda$4life')

.. versionadded:: 0.21.0

verify_ssl : boolean, Default True
@@ -874,10 +874,10 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None,

.. versionadded:: 0.19.0

auth : tuple, default None
A tuple of string with (username, password) string for
HTTP(s) basic auth: eg auth= ('roberto', 'panda$4life')

.. versionadded:: 0.21.0

verify_ssl : boolean, Default True
6 changes: 3 additions & 3 deletions pandas/io/json/json.py
@@ -263,10 +263,10 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,

.. versionadded:: 0.19.0

auth : tuple, default None
A tuple of string with (username, password) string for
HTTP(s) basic auth: eg auth= ('roberto', 'panda$4life')

.. versionadded:: 0.21.0

verify_ssl : boolean, Default True
5 changes: 2 additions & 3 deletions pandas/tests/io/test_common.py
@@ -190,18 +190,17 @@ def test_write_fspath_hdf5(self):

tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize('url, uname, pwd, nurl', [
('https://a1:b1@cc.com:101/f.csv',
'a1',
'b1',
'https://cc.com:101/f.csv'
),
('https://ccc.com:1010/aaa.txt',
'',
'',
'https://ccc.com:1010/aaa.txt'
),
])
def test_split_url_extract_uname_pwd(self, url, uname, pwd, nurl):
(un, pw), ur = common.split_auth_from_url(url)
2 changes: 1 addition & 1 deletion pandas/tests/test_common.py
@@ -220,4 +220,4 @@ def test_standardize_mapping():
assert (com.standardize_mapping({}) == dict)

dd = collections.defaultdict(list)
assert isinstance(com.standardize_mapping(dd), partial)