ENH: make pandas.DataFrame.info() method able to display memory usage of each column #59690
Comments
take
Thanks for the request. It seems to me there are lots of things one could add to
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
# Build one per-column table from three existing per-column summaries.
result = pd.concat(
    [
        df.memory_usage(index=False).rename("Memory Usage"),
        (~df.isnull()).sum().rename("Non-Null Count"),
        df.dtypes.rename("Dtype"),
    ],
    axis=1,
)
print(result)
#    Memory Usage  Non-Null Count  Dtype
# a            24               3  int64
# b            24               3  int64
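The snippet above reports shallow memory usage only. As a rough sketch, assuming one also wants the figures that the proposed 'by_column_deep' option refers to, the same table can carry a deep column via the existing `deep=True` flag of `DataFrame.memory_usage`. The example frame and the "Memory Usage (deep)" label are made up for illustration and are not part of pandas or of this proposal.

```python
import pandas as pd

# Illustrative frame: one numeric column and one object column with a missing value.
df = pd.DataFrame(
    {
        "a": [1, 1, 2],
        "s": pd.Series(["x", "yy", None], dtype=object),
    }
)

# Shallow usage counts only the array buffers; deep also inspects Python objects.
shallow = df.memory_usage(index=False)
deep = df.memory_usage(index=False, deep=True)

summary = pd.concat(
    [
        shallow.rename("Memory Usage"),
        deep.rename("Memory Usage (deep)"),  # hypothetical label
        df.count().rename("Non-Null Count"),
        df.dtypes.rename("Dtype"),
    ],
    axis=1,
)
print(summary)
```

For numeric columns the two memory figures should match; for object columns the deep figure is normally larger because it includes the Python objects themselves.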
@rhshadrach what are examples of "lots of things" one could add to
I have some suggestions for the same:
I suggest we add another parameter to the info function (maybe something like 'more_details'), a Boolean that only takes effect when verbose=True. When 'more_details' is enabled, info would calculate and show all the other values along with the existing ones. Please tell me what you think; I am ready to take up the task if we proceed.
@RaghavKhemka
For a description of data (i.e. points 2, 3, 5) there are other functions
Of those I think only "memory usage" belongs to
What is the definition of "technical", and where is it documented that
How are null counts not about the data content itself?
Why is it not the case that
Thank you, everyone, for your valuable input. After carefully considering the pros and cons, I think we should proceed with adding the memory usage feature to the
Justification:
Please, answer the question in response to your earlier concern that there might be "lots of other (technical) variables" to add. Maybe we are missing some other technical parameters suitable for
I am not an authority on defining the terminology, yet I'd say that "technical" information is about "how" data is stored rather than about "what" data is stored. In my opinion, "technicality" is implied by the existence of
So I fully expected to get by-column information about memory usage.
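For context, `DataFrame.info` already accepts a `memory_usage` argument, but its report ends with a single total rather than a per-column breakdown. A minimal sketch of that gap, using a made-up two-column frame and only existing pandas calls:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# Capture the info() report as text instead of printing it to stdout.
buf = io.StringIO()
df.info(buf=buf, memory_usage="deep")
report = buf.getvalue()
print(report)

# The report carries one summary line for the whole frame...
total_lines = [line for line in report.splitlines() if line.startswith("memory usage")]

# ...while per-column figures currently need a separate call.
per_column = df.memory_usage(index=False, deep=True)
print(per_column)
```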
I agree that it is about content, but it is already there. I can see only corner-case justification: if all values are
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
The .info() method describes a DataFrame by each column's dtype and count of non-null values but, IMO, misses an opportunity to be more valuable by also displaying the memory usage of each column.
Feature Description
I think thousands of hours of human time would be saved if this were a built-in feature with "memory_usage='by_column'" and "memory_usage='by_column_deep'" argument options.
Alternative Solutions
The alternative way to see all "technical" information in by-column form in one table is to create the following "Frankenstein":
Resulting in some output looking like this:
Additional Context
I searched for similar suggestions in repo issues and have not found a duplicate.