pinglin commented Sep 17, 2025

Because

  • Gemini's document vision only meaningfully understands PDFs, while other document types (TXT, HTML, Markdown, etc.) lose visual formatting and are better served through text extraction
  • Users were getting unclear error messages when trying to use non-PDF documents
  • The prompt ordering did not follow Gemini's best practices for document processing
  • Office documents (DOC, DOCX, PPT, etc.) contain visual elements that would be lost in text-only processing

This commit

  • Adds support for text-based document processing (HTML, Markdown, TXT, CSV, XML) by extracting content as plain text
  • Maintains full PDF document vision capabilities for visual understanding of charts, diagrams, and formatting
  • Provides clear guidance for Office documents suggesting PDF conversion to preserve visual elements
  • Optimizes prompt ordering by placing text prompts after document content for better results
  • Updates error messages to be more informative about document processing capabilities and conversion options
  • Adds comprehensive test coverage for all new document handling paths including text extraction, PDF processing, and error cases
  • Refactors helper functions with detailed documentation explaining the technical capabilities and limitations of each document type
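
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `route`, `classify`, `part`, and `buildParts`, the MIME type list, and the error wording are all assumptions made for the example. It shows PDFs going to document vision, text-based formats being passed as extracted text, other types being rejected with a PDF conversion hint, and the text prompt being placed after the document content.

```go
package main

import (
	"fmt"
	"strings"
)

// route is a hypothetical label for how a document is handled.
type route int

const (
	routeVision route = iota // pass the raw document to document vision
	routeText                // pass the content as plain text
	routeReject              // refuse and suggest converting to PDF
)

// classify maps a MIME type to a route. The type list is illustrative only.
func classify(mime string) route {
	m := strings.ToLower(strings.TrimSpace(mime))
	// Drop parameters such as "; charset=utf-8".
	if i := strings.Index(m, ";"); i >= 0 {
		m = strings.TrimSpace(m[:i])
	}
	switch m {
	case "application/pdf":
		return routeVision
	case "text/plain", "text/html", "text/markdown", "text/csv",
		"text/xml", "application/xml":
		return routeText
	default:
		return routeReject
	}
}

// part is a hypothetical stand-in for one piece of a model request.
type part struct {
	Kind string // "document" or "text"
	Body string
}

// buildParts places the document content first and the text prompt last.
func buildParts(prompt, mime string, data []byte) ([]part, error) {
	switch classify(mime) {
	case routeVision:
		return []part{
			{Kind: "document", Body: string(data)},
			{Kind: "text", Body: prompt},
		}, nil
	case routeText:
		// Text-based formats carry no layout worth preserving, so the
		// content is sent as plain text rather than as a document.
		return []part{
			{Kind: "text", Body: string(data)},
			{Kind: "text", Body: prompt},
		}, nil
	default:
		return nil, fmt.Errorf(
			"document type %q is not supported; convert it to PDF to preserve visual elements",
			mime)
	}
}

func main() {
	parts, err := buildParts("Summarize this file.", "text/markdown", []byte("# Title"))
	fmt.Println(len(parts), err)

	_, err = buildParts("Summarize this file.", "application/msword", nil)
	fmt.Println(err)
}
```

Keeping the classification in one small function means the supported formats and the conversion hint can be changed in a single place.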

pinglin merged commit 38639c6 into main Sep 17, 2025
4 checks passed
pinglin deleted the pinglin/chore-update-component-ai-gemini branch September 17, 2025 21:04
jvallesm pushed a commit that referenced this pull request Sep 18, 2025
🤖 I have created a release *beep* *boop*
---


## [0.60.0](v0.59.2...v0.60.0) (2025-09-18)


### Features

* **artifact:** expose chunk file reference in search task
([#1085](#1085))
([39bbe95](39bbe95))
* **component,ai:** add Gemini integration
([#1088](#1088))
([cea127a](cea127a))
* **component,cohere:** add rerank indexes in the response
([#1087](#1087))
([fe6366a](fe6366a))


### Bug Fixes

* **compogen:** remove redundant escape characters
([#1089](#1089))
([9d21061](9d21061))
* **component,ai,instillmodel:** fix outdated data struct
([#1095](#1095))
([c81f59c](c81f59c))
* **component,ai,instillmodel:** resolve panics and test failures
([#1100](#1100))
([34fc930](34fc930))
* **recipe:** support nil, null, undefined for condition field
([#1091](#1091))
([a249070](a249070))
* **usage:** treat input rendering error as fatal
([#1098](#1098))
([06c8025](06c8025))


### Miscellaneous

* **ce:** release v0.60.0
([#1099](#1099))
([09c5c0f](09c5c0f))
* **compogen:** update component document layout
([#1090](#1090))
([5613ee3](5613ee3))
* **component,ai:** remove unused files
([#1094](#1094))
([11b0f4a](11b0f4a))
* **component,gemini:** optimize the IO struct
([#1092](#1092))
([a0772d2](a0772d2))
* **data,component,gemini:** improve error msg
([#1093](#1093))
([c2ea248](c2ea248))
* **data:** improve unified Instill Type data presentation
([#1078](#1078))
([abcccd6](abcccd6))


### Documentation

* **component:** update description format
([#1084](#1084))
([faaaed0](faaaed0))


### Refactor

* **component,ai,gemini:** enhance document processing with text …
([#1097](#1097))
([38639c6](38639c6))


### Tests

* **integration:** tinker script
([#1083](#1083))
([1712bb6](1712bb6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
pinglin added a commit to instill-ai/instill-core that referenced this pull request Sep 18, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`19480ec`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `19480ec`.

## Changes in pipeline-backend
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)
- chore(main): release 0.60.0 (instill-ai/pipeline-backend#1086)
- chore(ce): release v0.60.0 (instill-ai/pipeline-backend#1099)
- fix(component,ai,instillmodel): resolve panics and test failures
(instill-ai/pipeline-backend#1100)
- fix(usage): treat input rendering error as fatal
(instill-ai/pipeline-backend#1098)
- refactor(component,ai,gemini): enhance document processing with text …
(instill-ai/pipeline-backend#1097)
- ci(gitignore): ignore .cursor folder
(instill-ai/pipeline-backend#1096)
- fix(component,ai,instillmodel): fix outdated data struct
(instill-ai/pipeline-backend#1095)
- chore(component,ai): remove unused files
(instill-ai/pipeline-backend#1094)
- chore(data,component,gemini): improve error msg
(instill-ai/pipeline-backend#1093)
- chore(component,gemini): optimize the IO struct
(instill-ai/pipeline-backend#1092)
- fix(recipe): support nil, null, undefined for condition field
(instill-ai/pipeline-backend#1091)

Co-authored-by: pinglin <628430+pinglin@users.noreply.github.com>