-
Notifications
You must be signed in to change notification settings - Fork 185
Json Parsing Refactor #818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
taylorfturner
merged 8 commits into
capitalone:feature/profile-serialization
from
micdavis:categorical_decode_refactor
May 16, 2023
Merged
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
8853e6b
initial changes to categoricalColumn decoder
micdavis 7aa36a4
added parse to base_column_profiler
micdavis 9797e35
removed parse from child column_profilers
micdavis 82addbf
addressed PR comments
micdavis 982510c
pushing categorical changes
micdavis 950261a
removed float test diffs, renamed parse -> json_to_object
micdavis f9d01c6
small method rename fix
micdavis 168b711
updated method naming, testing, and utils function
micdavis File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,7 +8,7 @@ | |
| import pandas as pd | ||
|
|
||
| from dataprofiler.profilers import CategoricalColumn | ||
| from dataprofiler.profilers.json_decoder import decode_column_profiler | ||
| from dataprofiler.profilers.json_decoder import load_column_profile | ||
| from dataprofiler.profilers.json_encoder import ProfileEncoder | ||
| from dataprofiler.profilers.profile_builder import StructuredColProfiler | ||
| from dataprofiler.profilers.profiler_options import CategoricalOptions | ||
|
|
@@ -233,7 +233,6 @@ def test_mixed_categorical_col_integer_string(self): | |
| self.assertCountEqual(categories, profile.categories) | ||
|
|
||
| def test_categorical_mapping(self): | ||
|
|
||
| df1 = pd.Series( | ||
| [ | ||
| "abcd", | ||
|
|
@@ -747,7 +746,7 @@ def test_json_encode_after_update(self): | |
| ) | ||
| profile = CategoricalColumn(df_categorical.name) | ||
|
|
||
| with patch("time.time", side_effect=lambda: 0.0): | ||
| with test_utils.mock_timeit(): | ||
| profile.update(df_categorical) | ||
|
|
||
| serialized = json.dumps(profile, cls=ProfileEncoder) | ||
|
|
@@ -759,7 +758,7 @@ def test_json_encode_after_update(self): | |
| "col_index": np.nan, | ||
| "sample_size": 12, | ||
| "metadata": {}, | ||
| "times": {"categories": 0.0}, | ||
| "times": {"categories": 1.0}, | ||
| "thread_safe": True, | ||
| "_categories": {"c": 5, "b": 4, "a": 3}, | ||
| "_CategoricalColumn__calculations": {}, | ||
|
|
@@ -775,7 +774,7 @@ def test_json_decode(self): | |
| expected_profile = CategoricalColumn(fake_profile_name) | ||
|
|
||
| serialized = json.dumps(expected_profile, cls=ProfileEncoder) | ||
| deserialized = decode_column_profiler(serialized) | ||
| deserialized = load_column_profile(json.loads(serialized)) | ||
|
|
||
| test_utils.assert_profiles_equal(deserialized, expected_profile) | ||
|
|
||
|
|
@@ -802,14 +801,27 @@ def test_json_decode_after_update(self): | |
| ) | ||
| expected_profile = CategoricalColumn(fake_profile_name) | ||
|
|
||
| with patch("time.time", side_effect=lambda: 0.0): | ||
| with test_utils.mock_timeit(): | ||
| expected_profile.update(df_categorical) | ||
|
|
||
| serialized = json.dumps(expected_profile, cls=ProfileEncoder) | ||
| deserialized = decode_column_profiler(serialized) | ||
| deserialized = load_column_profile(json.loads(serialized)) | ||
|
|
||
| test_utils.assert_profiles_equal(deserialized, expected_profile) | ||
|
|
||
| df_categorical = pd.Series( | ||
| [ | ||
| "a", # add existing | ||
| "d", # add new | ||
|
Comment on lines
+814
to
+815
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. love this comment for superb clarity and readability |
||
| ] | ||
| ) | ||
|
|
||
| # validating update after deserialization | ||
| deserialized.update(df_categorical) | ||
|
|
||
| assert deserialized.sample_size == 14 | ||
| assert deserialized.categorical_counts == {"c": 5, "b": 4, "a": 4, "d": 1} | ||
|
|
||
|
|
||
| class TestCategoricalSentence(unittest.TestCase): | ||
| def setUp(self): | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long term I think will need to pull this out into a function bc i think it will be reused by other classes which don't have this as the base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can refactor then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I agree -- refactor into a function (away from a
@classmethod)