Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add XLSXToDocument converter #8522

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft

feat: Add XLSXToDocument converter #8522

wants to merge 8 commits into from

Conversation

sjrl
Copy link
Contributor

@sjrl sjrl commented Nov 8, 2024

Related Issues

Proposed Changes:

Draft of the Excel to Document converter

How did you test it?

Added tests

Notes for the reviewer

Checklist

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Nov 8, 2024
@sjrl sjrl requested a review from bglearning November 8, 2024 09:48
@sjrl
Copy link
Contributor Author

sjrl commented Nov 8, 2024

Hey @bglearning I'd like to discuss with you how got references to work (e.g. pointing to the correct row in the original Excel table) to make sure we accommodate that properly in this component.

@coveralls
Copy link
Collaborator

coveralls commented Nov 8, 2024

Pull Request Test Coverage Report for Build 11890084216

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 10 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.07%) to 90.275%

Files with Coverage Reduction New Missed Lines %
components/embedders/openai_document_embedder.py 10 85.14%
Totals Coverage Status
Change from base Build 11840168232: 0.07%
Covered Lines: 7918
Relevant Lines: 8771

💛 - Coveralls

@bglearning
Copy link
Contributor

btw @sjrl , looks like to_csv actually also supports list of strings.

header: bool or list of str, default True

So the out_header (or equivalent) can be made the same for both (List[str]) which would just be the columns by default (if excel_column_names is optional).

Also on the flip side, to_markdown is eventually calling the tabulate library which uses the headers kwarg

- `headers` can be an explicit list of column headers
- if `headers="firstrow"`, then the first row of data is used
- if `headers="keys"`, then dictionary keys or column indices are used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants