@pinglin pinglin commented Sep 18, 2025

Because

  • Unit tests were failing with LibreOffice exit status 77 errors when external dependencies were missing
  • Test execution was extremely slow due to repeated file I/O operations and lack of parallelization
  • Tests had poor reliability in CI/CD environments without external tools installed
  • Code duplication existed across multiple test files with no shared utilities

This commit

  • Fixes dependency failures: Added graceful skipping of tests requiring LibreOffice, Python, or other external tools when they're not available
  • Optimizes performance: Implemented thread-safe file content caching with 1,229x performance improvement (8,107 ns/op → 6.593 ns/op)
  • Enables parallel execution: Added c.Parallel() to all test functions and subtests for concurrent execution
  • Creates fast test suite: Added fast_test.go with dependency-free tests that complete in ~0.35 seconds for CI/CD pipelines
  • Adds performance benchmarks: Created benchmark_test.go to measure and track optimization improvements
  • Consolidates utilities: Extracted shared helper functions into test_utils.go to eliminate code duplication
  • Maintains backward compatibility: Preserves all existing test functionality while adding graceful degradation
  • Improves test organization: Separates dependency-heavy tests from lightweight ones for better maintainability
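The graceful-skipping idea above can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: the helper name `missingTools` and the binary names `libreoffice` and `python3` are assumptions chosen for the example, and the standard `t.Skipf` call is shown only in a comment.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the subset of names that cannot be found on PATH.
// The helper name is hypothetical; it is not taken from the PR.
func missingTools(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Inside a test, the same check would typically guard the test body:
	//
	//   if m := missingTools("libreoffice"); len(m) > 0 {
	//       t.Skipf("skipping: missing external tools %v", m)
	//   }
	//
	// The binary names below are assumptions for illustration.
	if m := missingTools("libreoffice", "python3"); len(m) > 0 {
		fmt.Printf("would skip: missing %v\n", m)
	} else {
		fmt.Println("all tools available")
	}
}
```

Skipping rather than failing keeps the run green on machines without the tools while still reporting the skipped tests in verbose output.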

@pinglin pinglin merged commit 19480ec into main Sep 18, 2025
3 of 4 checks passed
@pinglin pinglin deleted the pinglin/perf-document-optimize-unit-tests branch September 18, 2025 23:03
pinglin added a commit to instill-ai/instill-core that referenced this pull request Sep 18, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`19480ec`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `19480ec`.

## Changes in pipeline-backend
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)
- chore(main): release 0.60.0 (instill-ai/pipeline-backend#1086)
- chore(ce): release v0.60.0 (instill-ai/pipeline-backend#1099)
- fix(component,ai,instillmodel): resolve panics and test failures
(instill-ai/pipeline-backend#1100)
- fix(usage): treat input rendering error as fatal
(instill-ai/pipeline-backend#1098)
- refactor(component,ai,gemini): enhance document processing with text …
(instill-ai/pipeline-backend#1097)
- ci(gitignore): ignore .cursor folder
(instill-ai/pipeline-backend#1096)
- fix(component,ai,instillmodel): fix outdated data struct
(instill-ai/pipeline-backend#1095)
- chore(component,ai): remove unused files
(instill-ai/pipeline-backend#1094)
- chore(data,component,gemini): improve error msg
(instill-ai/pipeline-backend#1093)
- chore(component,gemini): optimize the IO struct
(instill-ai/pipeline-backend#1092)
- fix(recipe): support nil, null, undefined for condition field
(instill-ai/pipeline-backend#1091)

Co-authored-by: pinglin <628430+pinglin@users.noreply.github.com>
pinglin added a commit that referenced this pull request Sep 18, 2025
…ffice dependency failures (#1110)

Because

- Unit tests were failing with LibreOffice exit status 77 errors when
external dependencies were missing
- Test execution was extremely slow due to repeated file I/O operations
and lack of parallelization
- Tests had poor reliability in CI/CD environments without external
tools installed
- Code duplication existed across multiple test files with no shared
utilities

### This commit

- **Fixes dependency failures**: Added graceful skipping of tests
requiring LibreOffice, Python, or other external tools when they're not
available
- **Optimizes performance**: Implemented thread-safe file content
caching with 1,229x performance improvement (8,107 ns/op → 6.593 ns/op)
- **Enables parallel execution**: Added `c.Parallel()` to all test
functions and subtests for concurrent execution
- **Creates fast test suite**: Added `fast_test.go` with dependency-free
tests that complete in ~0.35 seconds for CI/CD pipelines
- **Adds performance benchmarks**: Created `benchmark_test.go` to
measure and track optimization improvements
- **Consolidates utilities**: Extracted shared helper functions into
`test_utils.go` to eliminate code duplication
- **Maintains backward compatibility**: Preserves all existing test
functionality while adding graceful degradation
- **Improves test organization**: Separates dependency-heavy tests from
lightweight ones for better maintainability
pinglin added a commit that referenced this pull request Sep 19, 2025
…ffice dependency failures (#1110)

pinglin added a commit that referenced this pull request Sep 19, 2025
…ffice dependency failures (#1110)

pinglin added a commit that referenced this pull request Sep 19, 2025
…ffice dependency failures (#1110)

jvallesm added a commit to instill-ai/instill-core that referenced this pull request Sep 23, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`1b4cd1f`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `1b4cd1f`.

## Changes in pipeline-backend
- fix(text): correct positions on duplicate markdown chunks
(instill-ai/pipeline-backend#1120)
- refactor(component,generic,http): replace env-based URL validation
with constructor injection (instill-ai/pipeline-backend#1121)
- fix(usage): add missing error filtering for users/admin
(instill-ai/pipeline-backend#1119)
- feat(component,ai,gemini): implement File API support for large files…
(instill-ai/pipeline-backend#1118)
- perf(data): optimize struct marshaling/unmarshaling with caching and …
(instill-ai/pipeline-backend#1117)
- feat(data): enhance unmarshaler with JSON string to struct conversion
(instill-ai/pipeline-backend#1116)
- feat(data): implement time types support with pattern validation
(instill-ai/pipeline-backend#1115)
- feat(component,ai,gemini): add multimedia support with unified format…
(instill-ai/pipeline-backend#1114)
- ci(workflows): adopt GitHub-hosted runner
(instill-ai/pipeline-backend#1113)
- perf(data): enhance comprehensive format coverage and optimize test
performance (instill-ai/pipeline-backend#1112)
- ci(workflows): adopt larger runner for coverage test
(instill-ai/pipeline-backend#1111)
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)

Co-authored-by: jvallesm <3977183+jvallesm@users.noreply.github.com>