Skip to content

BUG: update dict of sheets before check #27730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

BUG: update dict of sheets before check #27730

wants to merge 7 commits into from

Conversation

Zoynels
Copy link

@Zoynels Zoynels commented Aug 3, 2019

import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "Feature1": ["A", "C", "E", "G", "I"],
        "Feature2": ["B", "D", "F", "H", "J"],
    },
    columns=["id", "Feature1", "Feature2"],
)
writer = pd.ExcelWriter(path="testOutput.xlsx", mode="a", engine="openpyxl")
df.to_excel(writer, sheet_name="Main")
writer.save()
writer.close()

Problem description

I have excel file with existing sheet "Main", and if I try to append some dataframe to this sheet, on first df.to_excel() pandas creates new sheet with name "Main1" and saves information to this new sheet. But if once to_excel(), than pandas add new sheet to dictionary and saves to existing sheet. So, on start appen pandas don't know nothing about existing sheets, and check in _openpyxl.py file don't realy work

        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.create_sheet()
            wks.title = sheet_name
            self.sheets[sheet_name] = wks

so I add new code to update sheetname-list before check.

  • without issue
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • remove the different (ambiguity) behavior of the function in the case of the existence of a sheet of the book. When the first time the DataFrame is written to the sheet, a new sheet is created (moreover, if we try to write to the sheet with the name "main" which alreary exist in the book, now sheet "main1" will be created. And we try to write another DataFrame inside the current function on the same sheet "main", function again wrote to the sheet "main1", although we asked it to write to the sheet "main". So now if sheet exist then it will write to existing sheet, if doesn't exist it will create sheet with name which we want.

Zoynels added 2 commits August 3, 2019 22:07
```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "Feature1": ["A", "C", "E", "G", "I"],
        "Feature2": ["B", "D", "F", "H", "J"],
    },
    columns=["id", "Feature1", "Feature2"],
)
writer = pd.ExcelWriter(path="testOutput.xlsx", mode="a", engine="openpyxl")
df.to_excel(writer, sheet_name="Main")
writer.save()
writer.close()
```
#### Problem description

I have excel file with existing sheet "Main", and if I try to append some dataframe to this sheet, on first df.to_excel() pandas creates new sheet with name "Main1" and saves information to this new sheet. But if once to_excel(), than pandas add new sheet to dictionary and saves to existing sheet. So, on start appen pandas don't know nothing about existing sheets, and check in **_openpyxl.py** file don't realy work
```python
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.create_sheet()
            wks.title = sheet_name
            self.sheets[sheet_name] = wks
```
so I add new code to update sheetname-list before check.
Add: update dict of sheets before check
@Zoynels Zoynels changed the title Add: update dict of sheets before check BUG: update dict of sheets before check Aug 3, 2019
@simonjayhawkins simonjayhawkins added the IO Excel read_excel, to_excel label Aug 4, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this close an open issue? this would need a test at the very least.

@WillAyd
Copy link
Member

WillAyd commented Aug 5, 2019

I might not totally understand the issue but append mode allows you to add new sheets to an existing workbook, not necessarily to append data to an existing sheet.

The latter is what you are trying to accomplish right?

@Zoynels
Copy link
Author

Zoynels commented Aug 5, 2019

I might not totally understand the issue but append mode allows you to add new sheets to an existing workbook, not necessarily to append data to an existing sheet.

The latter is what you are trying to accomplish right?

Yes, you are right.

doesn'y work if sheet exist and creates sheet "Main1", and write to this sheet

df1.to_excel(writer, sheet_name = "Main", startrow = 1)

And it work fine, when I add data to same sheet second to_excel
write to sheet "Main1", because on first step it create new sheet.

df2.to_excel(writer, sheet_name = "Main", startrow = 10)

on this change it will not create new sheet on 1-st to_sheet() and write DataFrame to existing sheet "Main", and on 2-st to_sheet() write another DataFrame to same sheet.

So now it have different behavior on 1-st and 2-nd to_excel() in one function

@Zoynels
Copy link
Author

Zoynels commented Aug 5, 2019

If I use function

import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "Feature1": ["A", "C", "E", "G", "I"],
        "Feature2": ["B", "D", "F", "H", "J"],
    },
    columns=["id", "Feature1", "Feature2"],
)


writer = pd.ExcelWriter(path="testOutput.xlsx", mode="a", engine="openpyxl")
df.to_excel(writer, sheet_name="Main", startrow=1)
writer.save()
df.to_excel(writer, sheet_name="Main", startrow=10)
writer.save()
writer.close()

and add to pandas into current version:

        print(self.sheets)
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.create_sheet()
            wks.title = sheet_name
            self.sheets[sheet_name] = wks
        print(self.sheets)

it will print

{}
{'Main': <Worksheet "Main1">}
{'Main': <Worksheet "Main1">}
{'Main': <Worksheet "Main1">}

but it must use (my change):

{'Sheet1': <Worksheet "Sheet1">, 'Main': <Worksheet "Main">}
{'Sheet1': <Worksheet "Sheet1">, 'Main': <Worksheet "Main">}
{'Sheet1': <Worksheet "Sheet1">, 'Main': <Worksheet "Main">}
{'Sheet1': <Worksheet "Sheet1">, 'Main': <Worksheet "Main">}

current behavior:
testOutput (bad).xlsx
new (good) behavior:
testOutput (good).xlsx

@WillAyd
Copy link
Member

WillAyd commented Aug 6, 2019

I'm indifferent to the change though as mentioned would need tests for review. Right now it's nice that there isn't ambiguity to what append does - it adds a sheet. If we wanted to allow for the updating of an existing sheet we would have to handle the ambiguity of what a user wants (do they want to append a new sheet or add to an existing one?) and probably want to ensure that all excel writers can handle consistently

@Zoynels
Copy link
Author

Zoynels commented Aug 6, 2019

I'm indifferent to the change though as mentioned would need tests for review. Right now it's nice that there isn't ambiguity to what append does - it adds a sheet. If we wanted to allow for the updating of an existing sheet we would have to handle the ambiguity of what a user wants (do they want to append a new sheet or add to an existing one?) and probably want to ensure that all excel writers can handle consistently

Now we have ambiguity to what append does:
If in one function we write 2 or more DataFrames to sheet with same name.

  1. First write -- adds new sheet (even if sheet with this name exist) and write to this new sheet
  2. Second and etc. -- writes to already created sheet.
    So now function work different.

If it always creates new sheet -- then there would not be any ambiguity to what append does - it adds a sheet always.

As I understand "append" now work only in openpyxl engine which allow write (append) information to existing workbook.

@pep8speaks
Copy link

pep8speaks commented Aug 6, 2019

Hello @Zoynels! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-27 14:05:54 UTC

@WillAyd
Copy link
Member

WillAyd commented Sep 13, 2019

Taking another look at this it appears this code is just copy / pasted across the openpyxl, xlwt and xlsxwriter modules, but self.sheets is never really assigned so it doesn't do anything. Can you move this assignment into the base class and remove the copy / paste done in these methods instead?

@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

@Zoynels can you merge master and update to @WillAyd comments.

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

Closing as stable but @Zoynels ping if you'd like to pick back up and can reopen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants