POC: ArrayManager -- array-based data manager for columnar store #36010
Merged: jorisvandenbossche merged 42 commits into `pandas-dev:master` from `jorisvandenbossche:array-manager` on Jan 13, 2021.
Commits (42):
- a51835b POC: ArrayManager -- array-based data manager for columnar store (jorisvandenbossche)
- 591579b Update with latest master + some fixes (jorisvandenbossche)
- 896080a add pd.options.mode.data_manager to switch (jorisvandenbossche)
- f9c4dda Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- d18082a add apply_with_block workaround (jorisvandenbossche)
- cf3c07a fix alignment in apply (jorisvandenbossche)
- b252c6d reorder methods to match BlockManager (jorisvandenbossche)
- 0fb645e skip json tests for now (jorisvandenbossche)
- eb55fef skip more json tests + to_csv with to_native_types (jorisvandenbossche)
- d241f31 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 47c3ee3 support both ndarrays and ExtensionArrays (jorisvandenbossche)
- 75f7de2 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- f36e395 add unstack (jorisvandenbossche)
- be20816 fix native types, skip quantile, hdf, stata tests (jorisvandenbossche)
- 8b7cc81 remove skip in the benchmarks (jorisvandenbossche)
- a239f50 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- a0ccf9a Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- dc1b190 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 55d38be remove manager keyword from DataFrame constructor, add _as_manager in… (jorisvandenbossche)
- 3dea0d7 move new ArrayManager code to separate file (jorisvandenbossche)
- 1a61333 Merge branch 'master' of https://github.com/pandas-dev/pandas into ar… (jbrockmendel)
- 9751d33 de-privatize (jbrockmendel)
- e45b645 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 3749c7d try fix up typing (jorisvandenbossche)
- af53040 add pytest option + add one github actions build to run them (jorisvandenbossche)
- cc45673 fix pytest marks for skipping when using array-manager (jorisvandenbossche)
- 27cf215 several fixes - get tests/frame/methods tests passing (jorisvandenbossche)
- f6a97df ci - only run the tests/frame/methods tests (jorisvandenbossche)
- 67c4c2b Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 670ed76 mypy fix (jorisvandenbossche)
- 5128ad1 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 5c73688 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- a9a8c2d move to internals/construction.py (jorisvandenbossche)
- c7898fb update for latest changes - fix tests/mypy (jorisvandenbossche)
- 3430307 fix todo (jorisvandenbossche)
- 1a30013 fix import in tests (jorisvandenbossche)
- ef86b1e Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- c5548d9 add union alias to typing (jorisvandenbossche)
- afe8f80 updates based on review (jorisvandenbossche)
- b88c757 skip json tests to avoid segfaults (jorisvandenbossche)
- ddc51d0 Merge remote-tracking branch 'upstream/master' into array-manager (jorisvandenbossche)
- 9dc5600 fix for Label -> Hashable change in master (jorisvandenbossche)
Viewing changes from 1 commit: 8b7cc8157a3a8959f48c007f808a6198927ea9b3 (remove skip in the benchmarks)
I think I'd be OK with `flags` being an added keyword to the constructor (and you could then make `manager` a flag).
If this is just for convenience (to easily switch back and forth), I'd prefer a private method on DataFrame that changes the manager (or returns a new DataFrame with a new manager).
The keyword currently makes it a bit more convenient to pick a specific manager in the tests, or to compare both versions side by side.
E.g. instead of […] you can do […] if you want, e.g., a certain test to run only with a specific manager, regardless of the global setting.
I fully agree we should be careful about making this a public keyword, but I think that, also for internal use (e.g. in tests), it would be good to have a convenient way to do this.
With a private method, are you thinking of a class method like `pd.DataFrame._construct_with_array_manager(...)` / `pd.DataFrame._construct_with_block_manager(...)`?
I was thinking `pd.DataFrame(...)._as_array_manager()`. But whatever is easiest for testing.
Ah, OK. For testing, that should also be fine (as long as it's not testing the constructor ;)). It will be less efficient, since it only converts to the other format after the initial construction (which I do now anyway, but the goal is of course to fix that at some point), but for testing purposes that doesn't matter.
Could also be a `pd.DataFrame(...)._as_manager("block"/"array")`? Because we want to have both: e.g. some test might specifically be testing aspects of a block-based DataFrame, so even when running the tests with the global option set to use the array manager, that test should still use the block manager.
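A minimal, self-contained sketch of what a `_as_manager("block"/"array")` conversion could look like. `ToyBlockManager`, `ToyArrayManager` and `ToyFrame` are hypothetical stand-ins, not pandas' real `BlockManager`/`ArrayManager`: the block layout consolidates same-dtype columns into one 2D block, the array layout keeps one 1D array (here a list) per column, and `_as_manager` returns a new frame by splitting the current layout into columns and rebuilding the requested one. That extra pass after construction is the inefficiency mentioned above.

```python
def _dtype_of(values):
    # Simplification: assumes each column is homogeneous and uses the
    # first element's Python type name as its "dtype".
    return type(values[0]).__name__ if values else "object"


class ToyBlockManager:
    """Block layout: same-dtype columns are consolidated into one 2D block."""

    def __init__(self, columns):
        self.order = list(columns)  # keep the original column order
        # dtype name -> (column names, 2D list with one inner list per column)
        self.blocks = {}
        for name, values in columns.items():
            names, block = self.blocks.setdefault(_dtype_of(values), ([], []))
            names.append(name)
            block.append(list(values))

    def to_columns(self):
        # Split every 2D block back into individual columns, in original order.
        out = {}
        for names, block in self.blocks.values():
            for name, values in zip(names, block):
                out[name] = list(values)
        return {name: out[name] for name in self.order}


class ToyArrayManager:
    """Columnar layout: one 1D array per column, no consolidation."""

    def __init__(self, columns):
        self.arrays = {name: list(values) for name, values in columns.items()}

    def to_columns(self):
        return {name: list(values) for name, values in self.arrays.items()}


_MANAGERS = {"block": ToyBlockManager, "array": ToyArrayManager}


class ToyFrame:
    def __init__(self, columns):
        # Construction always builds the block layout first; other layouts
        # are obtained by converting afterwards.
        self._mgr = ToyBlockManager(columns)

    def _as_manager(self, typ):
        """Return a new ToyFrame backed by the requested manager type."""
        if typ not in _MANAGERS:
            raise ValueError(f"typ must be 'block' or 'array', got {typ!r}")
        new = object.__new__(ToyFrame)
        # Extra conversion pass: decompose the current layout, rebuild the other.
        new._mgr = _MANAGERS[typ](self._mgr.to_columns())
        return new


df = ToyFrame({"a": [1, 2], "b": [3, 4], "c": ["x", "y"]})
print(df._mgr.blocks)
# {'int': (['a', 'b'], [[1, 2], [3, 4]]), 'str': (['c'], [['x', 'y']])}
arr = df._as_manager("array")
print(type(arr._mgr).__name__, arr._mgr.arrays)
# ToyArrayManager {'a': [1, 2], 'b': [3, 4], 'c': ['x', 'y']}
```

Because `_as_manager` returns a new object instead of mutating in place, a block-specific test can call `df._as_manager("block")` and keep the block layout even when the rest of the suite runs with the array manager.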