Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: na_rep='=na()' as default value #52258

Open
1 of 3 tasks
mhooreman opened this issue Mar 28, 2023 · 7 comments
Open
1 of 3 tasks

ENH: na_rep='=na()' as default value #52258

mhooreman opened this issue Mar 28, 2023 · 7 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement IO Excel read_excel, to_excel Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@mhooreman
Copy link

mhooreman commented Mar 28, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hello,

Excel has a formula for N/A: =na(). If is shown as #N/A.

Why not using this as default value for na_rep instead of ''?

So: df.to_excel('foobar.xlsx', na_rep='=na()')

Thanks a lot,
Michaël

Feature Description

def df.to_excel(..., na_rep='=na()', ...)

Alternative Solutions

N/A

Additional Context

No response

@mhooreman mhooreman added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 28, 2023
@rhshadrach rhshadrach added IO Excel read_excel, to_excel Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 29, 2023
@rhshadrach
Copy link
Member

rhshadrach commented Mar 31, 2023

Running

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5], 'c': ["=B2+B3", "=B2+C2", "=na()"]})
df.to_excel('test.xlsx')

in LibreOffice on Ubuntu, I get

image

When I upload the same sheet to Google Docs, the values appear as expected

image

Is this just an issue with LibreOffice? I'd be hesitant to have this as a default without it working in the major Excel programs.

@mhooreman
Copy link
Author

mhooreman commented Mar 31, 2023

Running

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5], 'c': ["=B2+B3", "=B2+C2", "=na()"]})
df.to_excel('test.xlsx')

...

Is this just an issue with LibreOffice? I'd be hesitant to have this as a default without it working in the major Excel programs.

Surprising, as =na() is a documented Excel fomula. It's most probably a missing feature in LibreOffice.

Have you tried using directly the =na() formula in LibreOffice?

But ... well ... the major Excel program is ... Excel. I don't want to debate about spreadsheets programs, but Excel file is the file format for Excel, which is supported by other programs ... who are natively using the open document fomat.

@DeaMariaLeon DeaMariaLeon removed the Needs Triage Issue that has not been reviewed by a pandas team member label Apr 4, 2023
@rhshadrach rhshadrach added the Needs Discussion Requires discussion from core team before further action label Apr 5, 2023
@rhshadrach
Copy link
Member

It's most probably a missing feature in LibreOffice.

Have you tried using directly the =na() formula in LibreOffice?

=na() works when writing it in LibreOffice

But ... well ... the major Excel program is ... Excel. I don't want to debate about spreadsheets programs, but Excel file is the file format for Excel, which is supported by other programs ... who are natively using the open document fomat.

If this change is known to degrade the behavior for some users, then I'm rather negative toward it. Perhaps it's a bug in LibreOffice that can be fixed and then we can implement.

@rhshadrach
Copy link
Member

Using Ctrl-Shift-F9 in LibreOffice resolves the values. Perhaps this is not done by default for security concerns?

A much larger issue is that this breaks round tripping. If you do not open the Excel file and evaluate the formulas, they resolve to 0 upon reading.

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5], 'c': ["=B2+B3", "=B2+C2", "=na()"]})
df.to_excel('test.xlsx')
print(pd.read_excel('test.xlsx'))
#    Unnamed: 0  a  b  c
# 0           0  1  3  0
# 1           1  1  4  0
# 2           2  2  5  0

If I instead open the written Excel file and evaluate formulas, I get the expected results.

#    Unnamed: 0  a  b    c
# 0           0  1  3  2.0
# 1           1  1  4  4.0
# 2           2  2  5  NaN

Unless there is a way to appropriately roundtrip the NaN here without having to open the file, I don't think =na() should be the default.

@mhooreman
Copy link
Author

mhooreman commented Sep 22, 2024

Finally, is it rejected ?
Sorry ... I'm sorting my pending issues...

@rhshadrach
Copy link
Member

That is how I would vote, the number one drawback in my mind is with round-tripping. However the idea does have some merit, and I do not like closing such issues without opinions from others. I'll mark it as a closing candidate for now.

@rhshadrach rhshadrach added the Closing Candidate May be closeable, needs more eyeballs label Sep 23, 2024
@mhooreman
Copy link
Author

mhooreman commented Sep 23, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement IO Excel read_excel, to_excel Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

3 participants