Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase 1 for storing schemas for later use. #7761

Merged
merged 11 commits into from
Apr 25, 2019
Merged

Conversation

lbristol88
Copy link
Contributor

Part of feature request: #3419

@tswast @engelke

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Apr 20, 2019
@tswast tswast self-requested a review April 22, 2019 16:03
file_obj = file_or_path
else:
try:
file_obj = open(file_or_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important difference with this path is that we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.

I recommend using a with statement when we are provided a path to make sure we always close it. You may want to refactor this to add a _schema_from_json_file_object() private helper method that does the loading part without the file closing part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added helper method per recommendation along with the statement to close the file.

try:
file_obj = open(file_or_path)
except OSError:
raise TypeError(_NEED_JSON_FILE_ARGUMENT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we use ValueError for unexpected input argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with correct exception.

try:
json_data = json.load(file_obj)
except JSONDecodeError:
raise TypeError(_NEED_JSON_FILE_ARGUMENT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we let this raise (don't catch it). That way the user doesn't lose context about what is wrong with the input file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took this piece out.

file_obj = destination
else:
try:
file_obj = open(destination, mode="w")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as in schema_from_json, we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with with statement to close when appropriate.

}
]"""
expected = list()
json_data = json.loads(file_content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're basically repeating the function definition here. That's not ideal. You've basically written a change-detector test. I'd prefer to see actual schema.SchemaField constructors in tests.

I do like that you've mocked out open and the file contents, so keep that. Just change how you construct expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes to include schema.SchemaField as requested.

@@ -5161,3 +5161,84 @@ def test__do_multipart_upload_wrong_size(self):

with pytest.raises(ValueError):
client._do_multipart_upload(file_obj, {}, file_obj_len + 1, None)

def test__schema_from_json(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since the function name is schema_from_json, the test name should be test_schema_from_json (just one underscore between test and schema.)

Ditto for schema_to_json test function name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated test names appropriately.

bigquery/tests/unit/test_client.py Show resolved Hide resolved
bigquery/tests/unit/test_client.py Show resolved Hide resolved
@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 23, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 23, 2019

def schema_from_json(self, file_or_path):
"""Takes a file object or file path that contains json that describes
a table schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This and the other docstrings (except for the first line) look a like they are indented 4 spaces too many.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentations.

@@ -1929,6 +1935,61 @@ def list_rows(
)
return row_iterator

def _schema_from_json_file_object(self, file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since file is a built-in, use file_ or file_obj as the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed name as directed.

schema_field_list = list()
json_data = json.load(file)

for field in json_data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This loop could be replaced with a Python "list comprehension".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to be a list comprehension.

return self._schema_from_json_file_object(file_or_path)
else:
try:
with open(file_or_path) as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: file should be file_obj since file is a built-in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated name accordingly.

"""
if isinstance(file_or_path, io.IOBase):
return self._schema_from_json_file_object(file_or_path)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since the above line returns, no need for else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the else as recommended.

json_schema_list.append(schema_field)

if isinstance(destination, io.IOBase):
destination.write(json.dumps(json_schema_list, indent=2, sort_keys=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use json.dump(schema_list, destination) instead. Then you can use BytesIO in the tests, too.

I'd prefer if json.dump wasn't repeated (here we do want to be DRY). Replace this with file_obj = destination and remove the else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored into a helper function because I couldn't get it to work with the context manager below just by removing the else.

]"""

expected = list()
expected.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for .append. Instead, construct the list inline.

expected = [
    SchemaField(...),
    SchemaField(...),
    ...
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to suggested method.

bigquery/tests/unit/test_client.py Show resolved Hide resolved
_mock_file().write.assert_called_once_with(file_content)
# This assert is to make sure __exit__ is called in the context
# manager that opens the file in the function
_mock_file().__exit__.assert_called_once_with(None, None, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about the actual arguments, so just assert_called_once will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the arguments from the assert.

open_patch = mock.patch("builtins.open", mock.mock_open())
with open_patch as _mock_file:
actual = client.schema_to_json(schema_list, mock_file_path)
_mock_file.assert_called_once_with(mock_file_path, mode="w")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need wb when using json.dump. (b for "binary" mode)

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
List of schema field objects.
"""
json_data = json.load(file_obj)
return [SchemaField.from_api_repr(f) for f in json_data]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful!

One nit-pick, though. Style-wise in our client libraries and samples, we avoid single-letter variable names, even in list comprehensions. Let's rename f to field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable name has been updated!

bigquery/tests/unit/test_client.py Show resolved Hide resolved

open_patch = mock.patch("builtins.open", mock.mock_open())
with open_patch as _mock_file:
with mock.patch("json.dump") as _mock_dump:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your thinking here. It's definitely nicer to compare lists than it is to compare JSON strings.

Nit: Let's write both with statement on one line. https://stackoverflow.com/a/1073814/101923

Nit: No need for leading underscore, that's usually to indicate a "private" variable, which isn't relevant inside a test.

with open_patch as mock_file, mock.patch("json.dump") as mock_dump:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace existing with statements with suggested line.

client = self._make_client()

client.schema_to_json(schema_list, fake_file)
assert file_content == fake_file.getvalue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call json.loads(fake_file.getvalue()) and compare to a list of dictionaries like you do in the test of the path version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test has been updated with the list of dictionaries and updated assert.

@lbristol88 lbristol88 requested a review from a team April 24, 2019 17:36
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
try:
with open(file_or_path) as file_obj:
return self._schema_from_json_file_object(file_obj)
except OSError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, coverage is failing on this line and the similar line in the other function. We'd need a test where open() fails.

Honestly, I'd be okay removing the try block and letting these errors just raise, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the try block as suggested for both functions.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
client = self._make_client()
mock_file_path = "/mocked/file.json"

open_patch = mock.patch("builtins.open", mock.mock_open())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the test logs, it looks like this is tripping up Python 2. https://stackoverflow.com/a/34677735/101923

=================================== FAILURES ===================================
____________ TestClientUpload.test_schema_from_json_with_file_path _____________
self = <tests.unit.test_client.TestClientUpload object at 0x7f47e2dce4d0>
    def test_schema_from_json_with_file_path(self):
        from google.cloud.bigquery.schema import SchemaField
        file_content = """[
          {
            "description": "quarter",
            "mode": "REQUIRED",
            "name": "qtr",
            "type": "STRING"
          },
          {
            "description": "sales representative",
            "mode": "NULLABLE",
            "name": "rep",
            "type": "STRING"
          },
          {
            "description": "total sales",
            "mode": "NULLABLE",
            "name": "sales",
            "type": "FLOAT"
          }
        ]"""
        expected = [
            SchemaField("qtr", "STRING", "REQUIRED", "quarter"),
            SchemaField("rep", "STRING", "NULLABLE", "sales representative"),
            SchemaField("sales", "FLOAT", "NULLABLE", "total sales"),
        ]
        client = self._make_client()
        mock_file_path = "/mocked/file.json"
        open_patch = mock.patch(
            "builtins.open", new=mock.mock_open(read_data=file_content)
        )
>       with open_patch as _mock_file:
tests/unit/test_client.py:5202:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1353: in __enter__
    self.target = self.getter()
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1523: in <lambda>
    getter = lambda: _importer(target)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
target = 'builtins'
    def _importer(target):
        components = target.split('.')
        import_path = components.pop(0)
>       thing = __import__(import_path)
E       ImportError: No module named builtins

Remember you can run against Python 2 by using nox, locally.

We could check the Python version with sys.version_info.major and change what object you are patching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually six.PY2 is probably a better way to detect. https://pythonhosted.org/six/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed all the issues and the code is compatible with python 2 now.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 24, 2019
@tswast tswast merged commit 55a8097 into googleapis:master Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants