
Member

@afourney afourney commented Mar 5, 2025

A very large PR that updates DocumentConverters in a few major ways:

  • Responsibilities are split: DocumentConverter.accepts() decides whether a file can be converted, and DocumentConverter.convert() performs the conversion itself.
  • file_stream and stream_info inputs are used everywhere (rather than local_paths). Temporary files are no longer used.
  • Mimetypes (in addition to file extensions) are considered when deciding whether to convert a file.
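
For readers skimming the thread, here is a minimal sketch of what the accepts()/convert() split could look like. It is not taken from the PR diff: the `StreamInfo` fields, the method signatures, the `PlainTextConverter` class, and the `convert_stream` helper are all assumptions made for illustration, and the merged code may differ.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, List, Optional


@dataclass
class StreamInfo:
    """Hypothetical stream metadata; the field names are assumptions."""
    mimetype: Optional[str] = None
    extension: Optional[str] = None


class DocumentConverter:
    """Illustrative base class: accepts() decides, convert() does the work."""

    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo) -> bool:
        raise NotImplementedError

    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo) -> str:
        raise NotImplementedError


class PlainTextConverter(DocumentConverter):
    """Hypothetical converter for plain-text streams."""

    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo) -> bool:
        # Consider the mimetype as well as the file extension.
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()
        return mimetype.startswith("text/") or extension in {".txt", ".md"}

    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo) -> str:
        # Read straight from the stream; no temporary file is written.
        return file_stream.read().decode("utf-8")


def convert_stream(
    converters: List[DocumentConverter],
    file_stream: BinaryIO,
    stream_info: StreamInfo,
) -> str:
    """Hypothetical dispatcher: use the first converter that accepts the stream."""
    for converter in converters:
        if converter.accepts(file_stream, stream_info):
            return converter.convert(file_stream, stream_info)
    raise ValueError("no converter accepted the stream")


if __name__ == "__main__":
    stream = io.BytesIO(b"hello from a stream")
    info = StreamInfo(mimetype="text/plain")
    print(convert_stream([PlainTextConverter()], stream, info))
```

Keeping the decision in accepts() means a dispatcher can try converters in order without any of them touching the filesystem, and a stream whose extension is missing can still be matched by mimetype.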

Adapted from #1045 , Thanks @KennyZhang1

@afourney afourney requested a review from KennyZhang1 March 5, 2025 19:57
@afourney afourney marked this pull request as ready for review March 5, 2025 19:57
@afourney afourney changed the title [Draft] [Experimental] Update converter API, user streams rather than filepaths Update converter API, user streams rather than filepaths Mar 5, 2025
@afourney afourney requested a review from gagb March 5, 2025 19:58
@gagb gagb self-requested a review March 5, 2025 21:59
Contributor

@gagb gagb left a comment

Fantastic PR. Thank you!

@afourney afourney merged commit e921497 into main Mar 6, 2025
3 checks passed
@afourney afourney deleted the file_streams branch March 6, 2025 05:16