Fix loc len #45

Closed
wants to merge 5 commits into from

simon-mo
Collaborator

What do these changes do?

Fix iloc issue. The core fix is the line:

row_lengths_oid = ray.put(np.array(row_lengths))

and other similar lines. IndexMetadata accepts a numpy array instead of a list.
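
To illustrate why positional indexing depends on the per-partition row lengths, here is a minimal sketch of how a global row position could be mapped to a partition and an offset inside it. It is not the project's implementation: the helper name `locate_row` and the sample lengths are hypothetical, and it uses plain Python lists rather than the numpy array that the fix stores with `ray.put`.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(row_lengths, position):
    """Map a global row position to (partition index, offset in partition).

    Hypothetical helper, for illustration only.
    """
    total = sum(row_lengths)
    if not 0 <= position < total:
        raise IndexError(f"position {position} out of range for {total} rows")
    # Cumulative end offsets, e.g. [3, 2, 4] -> [3, 5, 9].
    ends = list(accumulate(row_lengths))
    # First partition whose end offset is strictly greater than position.
    part = bisect_right(ends, position)
    start = ends[part - 1] if part > 0 else 0
    return part, position - start


print(locate_row([3, 2, 4], 0))  # (0, 0)
print(locate_row([3, 2, 4], 4))  # (1, 1)
print(locate_row([3, 2, 4], 8))  # (2, 3)
```

If the lengths are wrong or of an unexpected type, a lookup like this either fails or points at the wrong partition, which is the kind of symptom an iloc bug would show.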

Related issue number

#43

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • Add to test suite

  row_metadata_view = _IndexMetadata(
-     coord_df_oid=row_lookup, lengths_oid=row_lengths)
+     coord_df_oid=row_lookup, lengths_oid=row_lengths_oid)
Collaborator

Can we modify the metadata class to take in an np.array here, as well as an OID, instead of using ray.put?

Collaborator Author

I think it should be just an OID here because the keyword arg ends in _oid. When we refactor IndexMetaData (which will be another PR, coming soon), we can make it take np.array. I don't want this to be a refactoring PR.

Collaborator

Can we file an issue or mark a todo in the code in that case?

Collaborator Author

Yep. I think @devin-petersohn will soon write up a design doc for the new index (using two B+ trees) that will basically rewrite _IndexMetaData.

Collaborator

pschafhalter Jul 19, 2018

Yes, refactoring the index is WIP. The exact data structure is still an open discussion 😉
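
The suggestion raised in this thread, letting the metadata class accept either raw lengths or an object ID, could look roughly like the sketch below. Everything in it is hypothetical: `FakeObjectID`, `fake_put`, and `fake_get` are stand-ins for Ray's object store so the example runs without Ray, and `LengthsMetadata` is not the real `_IndexMetadata`.

```python
class FakeObjectID:
    """Stand-in for a Ray object ID; hypothetical, for illustration only."""

    def __init__(self, key):
        self.key = key


_STORE = {}


def fake_put(value):
    """Stand-in for ray.put: store a value and return a handle to it."""
    oid = FakeObjectID(len(_STORE))
    _STORE[oid.key] = value
    return oid


def fake_get(oid):
    """Stand-in for ray.get: resolve a handle back to its value."""
    return _STORE[oid.key]


class LengthsMetadata:
    """Hypothetical metadata holder that accepts raw lengths or a handle."""

    def __init__(self, lengths):
        # Normalize at the boundary so callers need not call put themselves.
        if isinstance(lengths, FakeObjectID):
            self._lengths_oid = lengths
        else:
            self._lengths_oid = fake_put(list(lengths))

    @property
    def lengths(self):
        return fake_get(self._lengths_oid)


direct = LengthsMetadata([3, 2, 4])
via_handle = LengthsMetadata(fake_put([3, 2, 4]))
print(direct.lengths == via_handle.lengths)  # True
```

Normalizing inside the constructor keeps call sites short, at the cost of a constructor that does two jobs, which is one reason to defer it to a dedicated refactoring PR.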


col_item_index += col_len
row_item_index += row_len

if self.is_view:
    warn(_SETTING_WITHOUT_COPYING_WARING)
Collaborator

Can this warning be imported from pandas?

Collaborator Author

Collaborator

Ahh weird

@@ -123,7 +125,10 @@ def __init__(self, data=None, index=None, columns=None, dtype=None,
if block_partitions is not None:
    axis = 0
    # put in numpy array here to make accesses easier since it's 2D
    self._block_partitions = np.array(block_partitions)
    if not isinstance(block_partitions, np.ndarray):
Collaborator

Can this check be done in _fix_blocks_dimensions?

Collaborator Author

Yes!

simon-mo
Collaborator Author

Closed in favor of #55

simon-mo closed this Jul 20, 2018
dchigarev pushed a commit to dchigarev/modin that referenced this pull request Aug 25, 2020
…ze_count

Izamyati/groupby.size()/count()
mdatre added a commit to mdatre/modin that referenced this pull request Jan 21, 2023
vnlitvinov pushed a commit to vnlitvinov/modin that referenced this pull request Feb 13, 2023
RehanSD pushed a commit to RehanSD/modin that referenced this pull request Feb 17, 2023
…a service

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Fixes to pass CI + docs for io.py

Update implementation

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Fix some things

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Lint fixes

Fix put

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Clean up and add new details

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Use fsspec to get full path and allow URLs

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Add lazy loc

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

fixes for tests

porting more tests

more fixes

moar fixes

Raise exception

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Lint fixes

Return Python as the default modin engine

Handle indexing case for client qc

Call fast path for __getitem__ if not lazy

Remove user warning for Python-engine fall back

Add init

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Implement free as a no-op

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Add support for replace - client side

Fix a couple of issues with Client

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Throw errors on to_pandas

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Do not default to pandas for str_repeat

Add support for 18 datetime functions/properties

Fix columns caching when renaming columns

Fix test_query: put backticks back for col names

Add support for astype -- client side

hard coded changes for functions

Client support for str_(en/de)code, to_datetime

Add all missing query compiler methods.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix getitem_column_array and take_2d.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix getitem_column_array and take_2d.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix again.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix more bugs.

Signed-off-by: mvashishtha <mahesh@ponder.io>

More fixes.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix more bugs-- pushdown tests test_dates and test_pivot still broken due to service bugs.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix typo. Note drop() broken because service requires you to specify both argument and client QC at base of this PR uses default Nones.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Add query compiler class.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Testing a commit

Initial changes for adding support for Expanding

FEAT Support for rolling.sem

FEAT support for Expanding sum, min, max, mean, var, std, count, sem

Removing extraneous comment

REFACTOR: Remove defaults to pandas at API layer and add some corresponding client QC methods.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Add more methods.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix expanding.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Add ewm.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Revert whitespace.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix to_numpy by making it like to_pandas.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Remove extra to_numpy.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Pass kwargs

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix DataFrame import for isin.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix again.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Remove breakpoint

Signed-off-by: mvashishtha <mahesh@ponder.io>

Tell if series.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix client qc.

Signed-off-by: mvashishtha <mahesh@ponder.io>

Add self_is_series.

Signed-off-by: mvashishtha <mahesh@ponder.io>

FIX: Set numeric_only to True in groupby quantile

Add some comments

Fix str_cat/fullmatch/removeprefix/removesuffix/translate/wrap (modin-project#44)

* Fix str_cat/fullmatch/removeprefix/removesuffix/translate/wrap

* Update modin/core/storage_formats/base/query_compiler.py

Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>

* Update modin/pandas/series_utils.py

Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>

* Update modin/core/storage_formats/base/query_compiler.py

Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>

Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>

FEAT Support expanding.aggregate (modin-project#45)

Fix at_time and between_time. (modin-project#43)

Signed-off-by: mvashishtha <mahesh@ponder.io>

Signed-off-by: mvashishtha <mahesh@ponder.io>

Add QC method for groupby.sem (modin-project#47)

* FEAT: Add partial support for groupby.sem()

* Add sem changes to groupby

Fix nlargest and nsmallest Series support (modin-project#46)

* Fix nlargest and smallest support

Signed-off-by: Naren Krishna <naren@ponder.io>

Remove client query compiler's columnarize. (modin-project#48)

Signed-off-by: mvashishtha <mahesh@ponder.io>

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix info and set memory_usage=False. (modin-project#49)

Signed-off-by: mvashishtha <mahesh@ponder.io>

Signed-off-by: mvashishtha <mahesh@ponder.io>

POND-815 fixes for 21 column dataset (modin-project#50)

* POND-815 fixes for 21 column dataset

* Update modin/pandas/base.py

Co-authored-by: helmeleegy <40042062+helmeleegy@users.noreply.github.com>

---------

Co-authored-by: helmeleegy <40042062+helmeleegy@users.noreply.github.com>

Bring in upstream series binary operation fix 6d5545f… (modin-project#52)

* Bring in upstream series binary operation fix 6d5545f.

Signed-off-by: mvashishtha <mahesh@ponder.io>

* Update modin/pandas/series.py

Co-authored-by: Karthik Velayutham <karthik.velayutham@gmail.com>

---------

Signed-off-by: mvashishtha <mahesh@ponder.io>
Co-authored-by: Karthik Velayutham <karthik.velayutham@gmail.com>

Support groupby first/last (modin-project#53)

Signed-off-by: Naren Krishna <naren@ponder.io>

FEAT: Add initial partial support for groupby.cumcount() (modin-project#54)

* FEAT: Add partial support for cumcount

* Remove the set_index_name

* Squeeze the result

* Write cumcount name to None

* Can't set dtype to int64

Fix resample sum, prod, size (modin-project#56)

Signed-off-by: Naren Krishna <naren@ponder.io>

POND-184: fix describe and simplify query compiler interface (modin-project#55)

* Fix describe

Signed-off-by: mvashishtha <mahesh@ponder.io>

* Pass datetime_is_numeric.

Signed-off-by: mvashishtha <mahesh@ponder.io>

---------

Signed-off-by: mvashishtha <mahesh@ponder.io>

Fix dt_day_of_week/day_of_year, str_cat/extract/partition/replace/rpartition (modin-project#51)

* Fix dt_day_of_week/day_of_year, str_partition/replace/rpartition

* Fix str_extract

Revert "Fix dt_day_of_week/day_of_year, str_cat/extract/partition/replace/rpartition (modin-project#51)" (modin-project#58)

This reverts commit f7a31ab.

Revert "Revert "Fix dt_day_of_week/day_of_year, str_cat/extract/partition/replace/rpartition (modin-project#51)" (modin-project#58)" (modin-project#60)

This reverts commit ad9231d.

Add query compiler method for groupby.prod() (modin-project#57)

Signed-off-by: Naren Krishna <naren@ponder.io>

FEAT: Add support for groupby.head and groupby.tail (modin-project#61)

* FEAT: Add support for groupby.head and groupby.tail

* Change _change_index

FEAT: Add partial support for groupby.nth (modin-project#62)

FIX: Push first and last down to query compiler. (modin-project#64)

* FIX: Push first and last down to query compiler.

Signed-off-by: mvashishtha <mahesh@ponder.io>

* Fix last.

Signed-off-by: mvashishtha <mahesh@ponder.io>

---------

Signed-off-by: mvashishtha <mahesh@ponder.io>

FEAT: Add partial support for groupby.ngroup (modin-project#65)

* FEAT: Add partial support for groupby.ngroup

* Name of result should be none for now

Add client support for SeriesGroupby unique, nsmallest, nlargest (modin-project#63)

* Add client support for SeriesGroupby unique, nsmallest, nlargest

Signed-off-by: Naren Krishna <naren@ponder.io>

---------

Signed-off-by: Naren Krishna <naren@ponder.io>

Push memory_usage entirely to query compiler [change is not to be upstreamed to Modin] (modin-project#66)

* Fix dataframe memory usage.

Signed-off-by: mvashishtha <mahesh@ponder.io>

* Fix series memory_usage() the same way.

Signed-off-by: mvashishtha <mahesh@ponder.io>

---------

Signed-off-by: mvashishtha <mahesh@ponder.io>

FIX: allow updating backend query compilers in place. (modin-project#67)

* FIX: Mutate client query compiler columns and index in the service.

Motivation: Align axis update semantics across query compilers. In the base
query compiler and even our service's query compiler, you can update the index
and columns in place. However, the service gives no way to update axes of a
query compiler.

Right now, for inplace updates, the service exposes an extra method rename(), and
the client query compiler uses this to get the ID of a new compiler with updated
axis, and then updates its ID to that of the new query compiler.

This change might be the first to make the service present a mutable interface
for a backend query compiler. That seems safe to me, except I had to make
copy() get a new query compiler copied from the old query compiler, because we
can't let updates to the new query compiler change the original (or vice versa).

Signed-off-by: mvashishtha <mahesh@ponder.io>

* Add a comment.

Signed-off-by: mvashishtha <mahesh@ponder.io>

---------

Signed-off-by: mvashishtha <mahesh@ponder.io>

FEAT replace groupby.fillna with a simpler logic (modin-project#68)

* FEAT Support expanding.aggregate

* Replaced groupby.fillna logic with a simpler one

* Fix in groupby.fillna. Work object was causing problems.

* Only need to change _check_index_name to _check_index

* Removed commented out code.
vnlitvinov pushed a commit to vnlitvinov/modin that referenced this pull request Mar 16, 2023
vnlitvinov pushed a commit to vnlitvinov/modin that referenced this pull request Mar 16, 2023
vnlitvinov pushed a commit to vnlitvinov/modin that referenced this pull request Mar 16, 2023
noloerino pushed a commit to noloerino/modin that referenced this pull request May 2, 2023
noloerino pushed a commit to noloerino/modin that referenced this pull request May 3, 2023
3 participants