
Phase 1 for storing schemas for later use. #7761

Merged · 11 commits · Apr 25, 2019
Updated functions to close file and made test changes per feedback.
lbristol88 committed Apr 23, 2019
commit 136cfcbd9cbc917cb26ec2e88d0032885b9673d1
57 changes: 29 additions & 28 deletions bigquery/google/cloud/bigquery/client.py
@@ -1935,59 +1935,60 @@ def list_rows(
)
return row_iterator

def _schema_from_json_file_object(self, file):
Contributor: Nit: since file is a built-in, use file_ or file_obj as the name.

Author: Changed name as directed.

"""Helper function for schema_from_json that takes a
file object that describes a table schema.

Returns:
List of schema field objects.
"""
schema_field_list = list()
json_data = json.load(file)

for field in json_data:
Contributor: FYI: This loop could be replaced with a Python "list comprehension".

Author: Changed to be a list comprehension.

schema_field = SchemaField.from_api_repr(field)
schema_field_list.append(schema_field)

return schema_field_list
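The reviewer's list-comprehension suggestion can be sketched as follows; from_api_repr here is a simplified stand-in for SchemaField.from_api_repr, used only for illustration:

```python
import io
import json

# Stand-in for SchemaField.from_api_repr (an assumption for illustration);
# it builds a schema field from one dict in the parsed JSON.
def from_api_repr(field):
    return (field["name"], field["type"])

file_obj = io.StringIO('[{"name": "qtr", "type": "STRING"}]')
json_data = json.load(file_obj)

# The explicit append loop collapses into a single comprehension:
schema_field_list = [from_api_repr(field) for field in json_data]
print(schema_field_list)  # [('qtr', 'STRING')]
```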

def schema_from_json(self, file_or_path):
"""Takes a file object or file path that contains json that describes
a table schema.
Contributor: Nit: This and the other docstrings (except for the first line) look like they are indented 4 spaces too many.

Author: Fixed indentations.

Returns:
List of schema field objects.
"""
file_obj = None
json_data = None
schema_field_list = list()

if isinstance(file_or_path, io.IOBase):
file_obj = file_or_path
return self._schema_from_json_file_object(file_or_path)
else:
Contributor: Nit: Since the above line returns, no need for else.

Author: Removed the else as recommended.

try:
file_obj = open(file_or_path)
with open(file_or_path) as file:
Contributor: Nit: file should be file_obj since file is a built-in.

Author: Updated name accordingly.

return self._schema_from_json_file_object(file)
except OSError:
raise TypeError(_NEED_JSON_FILE_ARGUMENT)

try:
json_data = json.load(file_obj)
except JSONDecodeError:
raise TypeError(_NEED_JSON_FILE_ARGUMENT)

for field in json_data:
schema_field = SchemaField.from_api_repr(field)
schema_field_list.append(schema_field)

return schema_field_list
raise ValueError(_NEED_JSON_FILE_ARGUMENT)
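Taken together, the feedback in this hunk (early return instead of else, a context manager so the file is always closed, ValueError on a bad path) suggests a shape roughly like this sketch; the helper body and the error message are simplified stand-ins, not the real ones from the diff:

```python
import io
import json

# Simplified stand-ins for the module-level message and helper in the diff.
_NEED_JSON_FILE_ARGUMENT = "Pass a file path or file object containing JSON."

def _schema_from_json_file_object(file_obj):
    # The real helper builds SchemaField objects; dicts stand in here.
    return [field for field in json.load(file_obj)]

def schema_from_json(file_or_path):
    # File objects are handled first; the early return removes the else.
    if isinstance(file_or_path, io.IOBase):
        return _schema_from_json_file_object(file_or_path)

    # Paths are opened in a context manager so the file is always closed.
    try:
        with open(file_or_path) as file_obj:
            return _schema_from_json_file_object(file_obj)
    except OSError:
        raise ValueError(_NEED_JSON_FILE_ARGUMENT)

fields = schema_from_json(io.StringIO('[{"name": "qtr"}]'))
print(fields)  # [{'name': 'qtr'}]
```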

def schema_to_json(self, schema_list, destination):
"""Takes a list of schema field objects.

Serializes the list of schema field objects as json to a file.

Destination is a file path or a file object.
Destination is a file path or a file object.
"""
file_obj = None
json_schema_list = list()

if isinstance(destination, io.IOBase):
file_obj = destination
else:
try:
file_obj = open(destination, mode="w")
except OSError:
raise TypeError(_NEED_JSON_FILE_ARGUMENT)

for field in schema_list:
schema_field = field.to_api_repr()
json_schema_list.append(schema_field)

file_obj.write(json.dumps(json_schema_list, indent=2, sort_keys=True))
if isinstance(destination, io.IOBase):
destination.write(json.dumps(json_schema_list, indent=2, sort_keys=True))
Contributor: Use json.dump(schema_list, destination) instead. Then you can use BytesIO in the tests, too. I'd prefer if json.dump wasn't repeated (here we do want to be DRY). Replace this with file_obj = destination and remove the else.

Author: Refactored into a helper function because I couldn't get it to work with the context manager below just by removing the else.

else:
try:
with open(destination, mode="w") as file:
file.write(json.dumps(json_schema_list, indent=2, sort_keys=True))
except OSError:
raise ValueError(_NEED_JSON_FILE_ARGUMENT)
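The helper-function refactor the author mentions isn't shown in this hunk; a minimal sketch of that shape follows. The helper name and error message are assumptions for illustration, and plain dicts stand in for the field.to_api_repr() results:

```python
import io
import json

_NEED_JSON_FILE_ARGUMENT = "Pass a file path or file object for the schema."

def _schema_to_json_file_object(json_schema_list, file_obj):
    # Single serialization point, keeping the json.dumps call DRY.
    file_obj.write(json.dumps(json_schema_list, indent=2, sort_keys=True))

def schema_to_json(schema_list, destination):
    # The real method calls field.to_api_repr(); dicts stand in here.
    json_schema_list = list(schema_list)

    if isinstance(destination, io.IOBase):
        _schema_to_json_file_object(json_schema_list, destination)
        return

    try:
        with open(destination, mode="w") as file_obj:
            _schema_to_json_file_object(json_schema_list, file_obj)
    except OSError:
        raise ValueError(_NEED_JSON_FILE_ARGUMENT)

buf = io.StringIO()
schema_to_json([{"name": "qtr", "type": "STRING"}], buf)
print(json.loads(buf.getvalue()))  # [{'name': 'qtr', 'type': 'STRING'}]
```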


# pylint: disable=unused-argument
106 changes: 94 additions & 12 deletions bigquery/tests/unit/test_client.py
@@ -5162,7 +5162,7 @@ def test__do_multipart_upload_wrong_size(self):
with pytest.raises(ValueError):
client._do_multipart_upload(file_obj, {}, file_obj_len + 1, None)

def test__schema_from_json(self):
def test_schema_from_json_with_file_path(self):
from google.cloud.bigquery.schema import SchemaField

file_content = """[
@@ -5185,26 +5185,31 @@ def test__schema_from_json(self):
"type": "FLOAT"
}
]"""
expected = list()
json_data = json.loads(file_content)

for field in json_data:
schema_field = SchemaField.from_api_repr(field)
expected.append(schema_field)
expected = list()
expected.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))
Contributor: No need for .append. Instead, construct the list inline.

expected = [
    SchemaField(...),
    SchemaField(...),
    ...
]

Author: Switched to suggested method.

expected.append(
SchemaField("rep", "STRING", "NULLABLE", "sales representative")
)
expected.append(SchemaField("sales", "FLOAT", "NULLABLE", "total sales"))

client = self._make_client()
mock_file_path = "/mocked/file.json"
tswast marked this conversation as resolved.

open_patch = mock.patch(
"builtins.open", new=mock.mock_open(read_data=file_content)
)

with open_patch as _mock_file:
actual = client.schema_from_json(mock_file_path)
_mock_file.assert_called_once_with(mock_file_path)
tswast marked this conversation as resolved.
# This assert is to make sure __exit__ is called in the context
# manager that opens the file in the function
_mock_file().__exit__.assert_called_once_with(None, None, None)

assert expected == actual

def test__schema_to_json(self):
def test_schema_from_json_with_file_object(self):
from google.cloud.bigquery.schema import SchemaField

file_content = """[
@@ -5227,12 +5232,50 @@ def test__schema_to_json(self):
"type": "FLOAT"
}
]"""
schema_list = list()
json_data = json.loads(file_content)

for field in json_data:
schema_field = SchemaField.from_api_repr(field)
schema_list.append(schema_field)
expected = list()
expected.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))
expected.append(
SchemaField("rep", "STRING", "NULLABLE", "sales representative")
)
expected.append(SchemaField("sales", "FLOAT", "NULLABLE", "total sales"))

client = self._make_client()

fake_file = io.StringIO(file_content)
actual = client.schema_from_json(fake_file)

assert expected == actual

def test_schema_to_json_with_file_path(self):
from google.cloud.bigquery.schema import SchemaField

file_content = """[
{
"description": "quarter",
"mode": "REQUIRED",
"name": "qtr",
"type": "STRING"
},
{
"description": "sales representative",
"mode": "NULLABLE",
"name": "rep",
"type": "STRING"
},
{
"description": "total sales",
"mode": "NULLABLE",
"name": "sales",
"type": "FLOAT"
}
]"""
schema_list = list()
schema_list.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))
schema_list.append(
SchemaField("rep", "STRING", "NULLABLE", "sales representative")
)
schema_list.append(SchemaField("sales", "FLOAT", "NULLABLE", "total sales"))

client = self._make_client()
mock_file_path = "/mocked/file.json"
@@ -5242,3 +5285,42 @@ def test__schema_to_json(self):
actual = client.schema_to_json(schema_list, mock_file_path)
_mock_file.assert_called_once_with(mock_file_path, mode="w")
Contributor: Might need wb when using json.dump. (b for "binary" mode)

_mock_file().write.assert_called_once_with(file_content)
# This assert is to make sure __exit__ is called in the context
# manager that opens the file in the function
_mock_file().__exit__.assert_called_once_with(None, None, None)
Contributor: We don't care about the actual arguments, so just assert_called_once will work.

Author: Removed the arguments from the assert.


def test_schema_to_json_with_file_object(self):
from google.cloud.bigquery.schema import SchemaField

file_content = """[
{
"description": "quarter",
"mode": "REQUIRED",
"name": "qtr",
"type": "STRING"
},
{
"description": "sales representative",
"mode": "NULLABLE",
"name": "rep",
"type": "STRING"
},
{
"description": "total sales",
"mode": "NULLABLE",
"name": "sales",
"type": "FLOAT"
}
]"""
schema_list = list()
schema_list.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))
schema_list.append(
SchemaField("rep", "STRING", "NULLABLE", "sales representative")
)
schema_list.append(SchemaField("sales", "FLOAT", "NULLABLE", "total sales"))

fake_file = io.StringIO()
client = self._make_client()

client.schema_to_json(schema_list, fake_file)
assert file_content == fake_file.getvalue()
Contributor: Let's call json.loads(fake_file.getvalue()) and compare to a list of dictionaries like you do in the test of the path version.

Author: The test has been updated with the list of dictionaries and updated assert.
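The structure-based assertion the reviewer asks for can be sketched like this; comparing parsed JSON rather than raw strings makes the test insensitive to whitespace and key ordering. The single shortened field dict is illustrative only:

```python
import io
import json

# Stand-in for the serialized output that schema_to_json writes.
fake_file = io.StringIO()
fake_file.write(
    json.dumps([{"name": "qtr", "type": "STRING"}], indent=2, sort_keys=True)
)

# Parse the buffer and compare against a plain list of dictionaries.
expected = [{"name": "qtr", "type": "STRING"}]
assert json.loads(fake_file.getvalue()) == expected
```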