[WIP] Staging/dev/profile serialization #908
Closed
taylorfturner wants to merge 33 commits into dev from
* hot fixes for encode and decode of numeric stats mixin and intcol profiler * cleaned up type checking and updated numericstatsmixin readin helper to give type conversions to more attributes * Added docstring to the _load_stats_helper function * Update dataprofiler/profilers/numerical_column_stats.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * Update dataprofiler/profilers/numerical_column_stats.py * fix for nan values issue in pytesting * Implementation of float profiler encode and decode process --------- Co-authored-by: Taylor Turner <taylorfturner@gmail.com>
* more verbose error log with types for easy debug * add load_from_dict to handle timestamps * add json decode tests * include DateTimeColumn class
…erent ordering of values is introduced (#868)
…piler (#885) * feat: add test and compiler serialization * fix: move primitive tests to own class * feat: add primitive col compiler save tests * fix: float serializers asserts
… fixes numerical deserialization (#886) * feat: add test and compiler serialization * fix: move primitive tests to own class * feat: add primitive col compiler save tests * fix: float serializers asserts * feat: add tests and allow primitive compiler to deserialize * fix: bug in numeric stats deserial * fix: missing `)` after conflict resolution
…refactors for order Typing (#887) * fix: organize categorical and add get function * refactor: reorganize tests and add stats test * feat: order typing * feat: add serial and deserial for stats compiler * fix: bug when sample_size == 0
…n for datalabeler (#879)
* Added initial profiler decoding for datalabeler column (WIP) * Initial implementation for deserialization of datalabelercolumn * Fix LSP violations (#840) * Make profiler superclasses generic Makes the superclasses BaseColumnProfiler, NumericStatsMixin, and BaseCompiler generic, to avoid casting in subclass diff() methods and violating LSP in principle. * Add needed cast import --------- Co-authored-by: Junho Lee <53921230+junholee6a@users.noreply.github.com>
* encode testing * encode dataLabeler testing * encode structuredOptions testing * cleaned up datalabeler test * added text options
* formatting * update formatting * setting up full test suite for DataLabelerCompiler * update isort * updates to test -- still failing * update
* update * string in list * formatting
* refactored options encode testing * updated test name * updated class names * fixing test * initial base option decode * initial tests
* refactor: allow options to go through all * fix: bug
* refactor: allow options to go through all * fix: bug * update * update * update * updates * update * Fixes for Taylor's StructuredCol Issue * update * update * remove try/except --------- Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com> Co-authored-by: ksneab7 <ksneab7@gmail.com>
* fix: bug and add tests * fix: limit scipy requirements till problem understood and fixed
f927d52 to f33b3fa
Contributor
Author
rebased
taylorfturner commented Jun 27, 2023
Comment on lines +392 to +283
Contributor
Author
rebase addition
taylorfturner commented Jun 27, 2023
Comment on lines +15 to +26
Contributor
Author
rebase addition
taylorfturner commented Jun 27, 2023
Contributor
Author
Actually concerned there may be an issue with the rebase here. I thought there was, or should be, another small class here, but I don't see it in main, dev, or feature/profile-serialization. Just take your time reviewing.
Contributor
Author
disregard -- the thing I thought would be here is actually properly in order_column_profile.py
taylorfturner commented Jun 27, 2023
requirements.txt
Outdated
taylorfturner commented Jun 27, 2023
Comment on lines 249 to 96
Contributor
Author
rebase change to include feature/memory-optimization work
* refactor: allow options to go through all * fix: bug in loading options * update * update * Fixes for Taylor's StructuredCol Issue * Created load and save code from structuredprofiler * intermediate commit for fixing structured profile --------- Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com> Co-authored-by: taylorfturner <taylorfturner@gmail.com>
* refactor: allow options to go through all * fix: bug in loading options * update * update * Fixes for Taylor's StructuredCol Issue * Created load and save code from structuredprofiler * intermediate commit for fixing structured profile * test fix * mypy fixes for typing issues * fix for none case of the datalabeler in options * Added mock of datalabeler to structured profile test * Added tests for encoding of the Structured profiler * Update dataprofiler/profilers/json_decoder.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * Update dataprofiler/profilers/profile_builder.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * Update dataprofiler/profilers/profiler_options.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * PR fixes * Fixed typo in test * Update dataprofiler/profilers/json_decoder.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * Update dataprofiler/profilers/profile_builder.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * Update dataprofiler/tests/profilers/utils.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * Update dataprofiler/profilers/profile_builder.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * Fixes for unneeded callout for _profile check * small change --------- Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com> Co-authored-by: taylorfturner <taylorfturner@gmail.com> Co-authored-by: ksneab7 <ksneab7@gmail.com> Co-authored-by: ksneab7 <91956551+ksneab7@users.noreply.github.com>
Contributor
Author
* refactor: loading labeler for reuse and abstract loading * refactor: use for DataLabelerColumn as well * fix: don't error if doesn't exist * refactor: allow for config dict to be passed entire way * fix: compiler tests * fix: structCol tests * fix: test
* added save for top level and tests * small refactor * small fix
* refactor: use seed for sample for consistency * fix: formatting and variables
* added load_method * updated tests
* update example data profiler demo save/load * update notebook cells * Update examples/data_profiler_demo.ipynb * Update examples/data_profiler_demo.ipynb
aa4795f to 423bc0a
dev with feature/profile-serialization work in preparation for next release
The feature/profile-serialization branch contains encode / decode for all low-level profilers, compiler, Structured Col Profiler, Structured Profiler, and Unstructured Profiler (pending PR)
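For reviewers who want a mental model of the encode / decode round trip this PR is about, here is a minimal sketch of the general pattern. It is not the DataProfiler implementation: `ToyColumnProfile`, `ToyEncoder`, `load_profile`, and the `{"class": ..., "data": ...}` envelope are hypothetical names and assumptions chosen for illustration. Only `load_from_dict` echoes a name from the commit log, and its body here is made up.

```python
import json
from datetime import datetime


class ToyColumnProfile:
    """Hypothetical stand-in for a column profiler (not a DataProfiler class)."""

    def __init__(self, name=None):
        self.name = name
        self.sample_size = 0
        self.min = None
        self.max = None
        self.last_updated = None  # a datetime, which json cannot serialize natively

    def update(self, values, when):
        """Fold a batch of numeric values into the running statistics."""
        for v in values:
            self.min = v if self.min is None else min(self.min, v)
            self.max = v if self.max is None else max(self.max, v)
        self.sample_size += len(values)
        self.last_updated = when

    @classmethod
    def load_from_dict(cls, data):
        """Rebuild a profile from its decoded dict, restoring non-JSON types."""
        profile = cls(data["name"])
        profile.sample_size = data["sample_size"]
        profile.min = data["min"]
        profile.max = data["max"]
        ts = data["last_updated"]
        profile.last_updated = datetime.fromisoformat(ts) if ts is not None else None
        return profile


class ToyEncoder(json.JSONEncoder):
    """Encode profiles as a class-name envelope and datetimes as ISO strings."""

    def default(self, obj):
        if isinstance(obj, ToyColumnProfile):
            return {"class": type(obj).__name__, "data": vars(obj)}
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


# Registry mapping the serialized class name back to a loader (assumed design).
_CLASSES = {"ToyColumnProfile": ToyColumnProfile}


def load_profile(serialized):
    """Decode a JSON string produced with ToyEncoder back into a profile."""
    payload = json.loads(serialized)
    cls = _CLASSES.get(payload["class"])
    if cls is None:
        raise ValueError(f"Unknown profile class: {payload['class']}")
    return cls.load_from_dict(payload["data"])


original = ToyColumnProfile("age")
original.update([3, 1, 2], datetime(2023, 6, 27, 12, 0, 0))
restored = load_profile(json.dumps(original, cls=ToyEncoder))
```

In this sketch, type conversions on load (here the timestamp) live in `load_from_dict`, so the encoder stays a thin mapping from objects to JSON-safe values.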