Skip to content

COMPAT: reading json with lines=True from s3, xref #17200 #17201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Nov 27, 2017
Merged
Changes from 1 commit
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Skip if imports don't exist. Create fixture for test setup.
  • Loading branch information
Kevin Kuhl committed Aug 29, 2017
commit ac9913363e85bb08f21aa68ca0bf5086baee3bd9
22 changes: 15 additions & 7 deletions pandas/tests/io/json/test_pandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,11 @@
from pandas.compat import (range, lrange, StringIO,
OrderedDict, is_platform_32bit)
import os
import boto3

import numpy as np
from pandas import (Series, DataFrame, DatetimeIndex, Timestamp,
read_json, compat)
from datetime import timedelta
from moto import mock_s3
moto = pytest.importorskip("moto")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will skip the entire module if moto isn't found. I think some of these tests can run without moto, so do this skip inside each test that needs moto.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed, moto is now always installed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added as a regular import

import pandas as pd

from pandas.util.testing import (assert_almost_equal, assert_frame_equal,
Expand Down Expand Up @@ -993,12 +991,22 @@ def test_read_inline_jsonl(self):
expected = DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
assert_frame_equal(result, expected)

@mock_s3
def test_read_s3_jsonl(self):
# GH17200
@pytest.yield_fixture(scope="function")
def testbucket_conn(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked, but can you use the fixture defined in pandas.tests.io.parser.test_network? I think if you use s3_resource. You may still nee the s3_resource.create_bucket, but you shouldn't have to worry about moto.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I was working on trying to reuse the code here. In order to reuse across moduled I'd probably need to move the s3_resource and its dependent fixtures to conftest.py so they'd be available to my test modules.

This may also entail changing how pointing to the data files that are mock-uploaded works since that code would no longer live in test_network. Do you agree with this? Or should I keep my implementation?

""" Fixture for test_read_s3_jsonl"""
boto3 = pytest.importorskip('boto3')
moto.mock_s3().start() # Start and stop mocking only once, surrounding the test run
# to ensure single context is kept.

conn = boto3.client('s3')
Copy link
Contributor

@TomAugspurger TomAugspurger Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setup stuff can be moved to a pytest fixture. Something like

@mock_s3
@pytest.fixture
def testbucket():
    """Fixture for ..."""
    conn = ...

Then your test_read_s3_jsonl will take a testbucket parameter. I don't know if it will have to be decorated by mock_s3, but I think it will.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also move the boto3 import here and use boto3 = pytest.importorskip('boto3') so that tests trying to use the fixture will skip if boto3 isn't installed.

conn.create_bucket(Bucket='testbucket')
conn.put_object(Body=b'{"a": 1, "b": 2}\n{"b":2, "a" :1}\n', Key='items.jsonl', Bucket='testbucket')
yield conn

moto.mock_s3().stop()

def test_read_s3_jsonl(self, testbucket_conn):
# GH17200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a pytest.importorskip() here I think.

testbucket_conn.put_object(Body=b'{"a": 1, "b": 2}\n{"b":2, "a" :1}\n', Key='items.jsonl', Bucket='testbucket')

result = read_json('s3://testbucket/items.jsonl', lines=True)
expected = DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
Expand Down