[WIP] Staging/dev/profile serialization #908

Closed
taylorfturner wants to merge 33 commits into dev from staging/dev/profile-serialization

Conversation

@taylorfturner
Contributor

@taylorfturner taylorfturner commented Jun 26, 2023

  • Updates dev with the feature/profile-serialization work in preparation for the next release
  • The main updates from the feature/profile-serialization branch add encode / decode support for all low-level profilers, the compilers, the Structured Col Profiler, the Structured Profiler, and the Unstructured Profiler (pending PR)
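
For readers unfamiliar with the encode / decode pattern described above, here is a minimal sketch of a JSON round trip for a profiler-like object. Everything in it is hypothetical and for illustration only: `ToyColumnProfile`, `ToyProfileEncoder`, `KNOWN_CLASSES`, and `decode_profile` are made-up names, not the classes or functions touched by this PR. Only the `load_from_dict` name is borrowed from the commit messages, and its signature here is an assumption.

```python
import json


class ToyColumnProfile:
    """Hypothetical stand-in for a low-level column profiler."""

    def __init__(self, name=None):
        self.name = name
        self.sample_size = 0
        self.sum = 0.0

    def update(self, values):
        """Fold a batch of numeric values into the running statistics."""
        self.sample_size += len(values)
        self.sum += float(sum(values))

    @classmethod
    def load_from_dict(cls, data):
        """Rebuild a profile from its decoded attribute dictionary."""
        profile = cls(data["name"])
        profile.sample_size = data["sample_size"]
        profile.sum = data["sum"]
        return profile


class ToyProfileEncoder(json.JSONEncoder):
    """Encode a profile as its class name plus its attribute dictionary."""

    def default(self, obj):
        if isinstance(obj, ToyColumnProfile):
            return {"class": type(obj).__name__, "data": obj.__dict__}
        return super().default(obj)


# Assumed registry mapping serialized class names back to classes.
KNOWN_CLASSES = {"ToyColumnProfile": ToyColumnProfile}


def decode_profile(serialized):
    """Decode a JSON string produced with ToyProfileEncoder."""
    payload = json.loads(serialized)
    cls = KNOWN_CLASSES.get(payload["class"])
    if cls is None:
        raise ValueError(f"Unknown profile class: {payload['class']}")
    return cls.load_from_dict(payload["data"])


original = ToyColumnProfile("age")
original.update([1, 2, 3])
serialized = json.dumps(original, cls=ToyProfileEncoder)
restored = decode_profile(serialized)
```

Storing the class name next to the data lets the decoder dispatch to the right `load_from_dict`, which is one common way to let nested profilers and compilers restore their own children.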

micdavis and others added 22 commits May 16, 2023 09:36
* hot fixes for encode and decode of numeric stats mixin and intcol profiler

* cleaned up type checking and updated numericstatsmixin readin helper to give type conversions to more attributes

* Added docstring to the _load_stats_helper function

* Update dataprofiler/profilers/numerical_column_stats.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* Update dataprofiler/profilers/numerical_column_stats.py

* fix for nan values issue in pytesting

* Implementation of float profiler encode and decode process

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>
* more verbose error log with types for easy debug

* add load_from_dict to handle timestamps

* add json decode tests

* include DateTimeColumn class
…piler (#885)

* feat: add test and compiler serialization

* fix: move primitive tests to own class

* feat: add primitive col compiler save tests

* fix: float serializers asserts
… fixes numerical deserialization (#886)

* feat: add test and compiler serialization

* fix: move primitive tests to own class

* feat: add primitive col compiler save tests

* fix: float serializers asserts

* feat: add tests and allow primitive compiler to deserialize

* fix: bug in numeric stats deserial

* fix: missing `)` after conflict resolution
…refactors for order Typing (#887)

* fix: organize categorical and add get function

* refactor: reorganize tests and add stats test

* feat: order typing

* feat: add serial and deserial for stats compiler

* fix: bug when sample_size == 0
* Added initial profiler decoding for datalabeler column (WIP)

* Initial implementation for deserialization of datalabelercolumn

* Fix LSP violations (#840)

* Make profiler superclasses generic

Makes the superclasses BaseColumnProfiler, NumericStatsMixin, and
BaseCompiler generic, to avoid casting in subclass diff() methods and
violating LSP in principle.

* Add needed cast import

---------

Co-authored-by: Junho Lee <53921230+junholee6a@users.noreply.github.com>
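
The "Make profiler superclasses generic" commit above can be illustrated with a small sketch. This is an assumption-laden illustration of the general technique, not the code from #840: `BaseProfile`, `IntProfile`, and `ProfileT` are hypothetical names. Parameterizing the base class by the subclass type lets `diff()` accept the concrete type without a `cast` and without narrowing the parameter type in a way that violates LSP.

```python
from __future__ import annotations

from typing import Generic, TypeVar

# Hypothetical type variable: "the concrete profile type being compared".
ProfileT = TypeVar("ProfileT", bound="BaseProfile")


class BaseProfile(Generic[ProfileT]):
    """Hypothetical generic base class, parameterized by its subclass."""

    def __init__(self, sample_size: int = 0) -> None:
        self.sample_size = sample_size

    def diff(self, other: ProfileT) -> dict:
        # Only attributes shared by every profile are compared here.
        return {"sample_size": self.sample_size - other.sample_size}


class IntProfile(BaseProfile["IntProfile"]):
    """Hypothetical subclass that binds ProfileT to itself."""

    def __init__(self, sample_size: int = 0, max_value: int = 0) -> None:
        super().__init__(sample_size)
        self.max_value = max_value

    def diff(self, other: IntProfile) -> dict:
        # No cast needed: the base signature is already diff(other: IntProfile).
        result = super().diff(other)
        result["max"] = self.max_value - other.max_value
        return result


first = IntProfile(sample_size=10, max_value=7)
second = IntProfile(sample_size=4, max_value=5)
difference = first.diff(second)
```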
* encode testing

* encode dataLabeler testing

* encode structuredOptions testing

* cleaned up datalabeler test

* added text options
* formatting

* update formatting

* setting up full test suite for DataLabelerCompiler

* update isort

* updates to test -- still failing

* update
* update

* string in list

* formatting
* refactored options encode testing

* updated test name

* updated class names

* fixing test

* initial base option decode

* initial tests
* refactor: allow options to go through all

* fix: bug
* refactor: allow options to go through all

* fix: bug

* update

* update

* update

* updates

* update

* Fixes for taylors StructuredCol Issue

* update

* update

* remove try/except

---------

Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com>
Co-authored-by: ksneab7 <ksneab7@gmail.com>
* fix: bug and add tests

* fix: limit scipy requirements till problem understood and fixed
@taylorfturner taylorfturner self-assigned this Jun 26, 2023
@taylorfturner taylorfturner added the New Feature (A feature addition not currently in the library) label Jun 26, 2023
@taylorfturner taylorfturner changed the title from "WIP Staging/dev/profile serialization" to "[WIP] Staging/dev/profile serialization" Jun 26, 2023
@taylorfturner taylorfturner force-pushed the staging/dev/profile-serialization branch from f927d52 to f33b3fa Compare June 26, 2023 21:18
@taylorfturner
Contributor Author

rebased

Comment on lines +392 to +283
Contributor Author

rebase addition

Comment on lines +15 to +26
Contributor Author

rebase addition

Contributor Author

@taylorfturner taylorfturner Jun 27, 2023

Actually, I'm concerned there may be an issue with the rebase here. I thought there was, or should be, another small class here, but I don't see it in main, dev, or feature/profile-serialization. Please take your time reviewing.

Contributor Author

@taylorfturner taylorfturner Jun 27, 2023

disregard -- the thing I thought would be here is actually properly in order_column_profile.py

requirements.txt Outdated
Contributor Author

fixing scipy issue @JGSweets

Comment on lines 249 to 96
Contributor Author

rebase change to include feature/memory-optimization work

ksneab7 and others added 3 commits June 27, 2023 12:26
* refactor: allow options to go through all

* fix: bug in loading options

* update

* update

* Fixes for taylors StructuredCol Issue

* Created load and save code from structuredprofiler

* intermediate commit for fixing structured profile

---------

Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com>
Co-authored-by: taylorfturner <taylorfturner@gmail.com>
* refactor: allow options to go through all

* fix: bug in loading options

* update

* update

* Fixes for taylors StructuredCol Issue

* Created load and save code from structuredprofiler

* intermediate commit for fixing structured profile

* test fix

* mypy fixes for typing issues

* fix for none case of the datalabeler in options

* Added mock of datalabeler to structured profile test

* Added tests for encoding of the Structured profiler

* Update dataprofiler/profilers/json_decoder.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* Update dataprofiler/profilers/profile_builder.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* Update dataprofiler/profilers/profiler_options.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* Pr fixes

* Fixed typo in test

* Update dataprofiler/profilers/json_decoder.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* Update dataprofiler/profilers/profile_builder.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* Update dataprofiler/tests/profilers/utils.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* Update dataprofiler/profilers/profile_builder.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* Fixes for unneeded callout for _profile check

* small change

---------

Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com>
Co-authored-by: taylorfturner <taylorfturner@gmail.com>
Co-authored-by: ksneab7 <ksneab7@gmail.com>
Co-authored-by: ksneab7 <91956551+ksneab7@users.noreply.github.com>
@taylorfturner taylorfturner added the Work In Progress (Solution is being developed) label Jun 27, 2023
@taylorfturner
Contributor Author

Pending #923 and #924

JGSweets and others added 8 commits June 28, 2023 08:37
* refactor: loading labeler for reuse and abstract loading

* refactor: use for DataLabelerColumn as well

* fix: don't error if doesn't exist

* refactor: allow for config dict to be passed entire way

* fix: compiler tests

* fix: structCol tests

* fix: test
* added save for top level and tests

* small refactor

* small fix
* refactor: use seed for sample for consistency

* fix: formatting and variables
* added load_method

* updated tests
* update example data profiler demo save/load

* update notebook cells

* Update examples/data_profiler_demo.ipynb

* Update examples/data_profiler_demo.ipynb
@micdavis micdavis force-pushed the staging/dev/profile-serialization branch from aa4795f to 423bc0a Compare June 29, 2023 01:08
@taylorfturner taylorfturner deleted the staging/dev/profile-serialization branch June 29, 2023 01:12

Labels

New Feature: A feature addition not currently in the library
Work In Progress: Solution is being developed

5 participants