pinglin commented Sep 19, 2025

Because

  • Unit tests for pkg/data were taking too long due to large test files and redundant test cases
  • Audio tests only covered 3 out of 8 supported formats (37.5% coverage)
  • Video tests only covered 2 out of 8 supported formats (25% coverage)
  • Image tests only covered 3 out of 6 supported formats (50% coverage)
  • Document tests lacked comprehensive format coverage and data URI handling
  • Large test files and unused test data were impacting CI/CD performance

This commit

  • Optimizes test performance by 92.8% - reduced execution time from 49.111s to 3.534s
  • Achieves 100% audio format coverage - adds tests for AAC, FLAC, M4A, WMA, AIFF formats (8/8 formats)
  • Achieves 87.5% video format coverage - adds tests for AVI, WebM, MKV, FLV, MPEG formats (7/8 formats, WMV excluded due to platform limitations)
  • Achieves 100% image format coverage - adds tests for GIF, BMP, WEBP formats (6/6 formats)
  • Enhances document format coverage - adds comprehensive tests for 13 document formats including data URI handling
  • Creates optimized small test files - generates lightweight media files (1-60KB) using ffmpeg and ImageMagick
  • Removes unused test data - cleans up 16.7% of testdata directory size by removing obsolete large files
  • Adds comprehensive test functions - introduces TestAllSupportedAudioFormats, TestAllSupportedVideoFormats, TestAllSupportedImageFormats, and TestAllSupportedDocumentFormats
  • Improves test reliability - adjusts tolerance values for duration/frame rate variations across different codecs and formats
  • Maintains network test compatibility - preserves URL-based tests using existing remote files while optimizing local file tests
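The tolerance adjustment mentioned above can be illustrated with a small table-driven check. This is only a sketch under stated assumptions: `formatCase`, `withinTolerance`, and `failures` are hypothetical names, and the durations and tolerances are made-up illustrative values, not taken from the actual test files in `pkg/data`.

```go
package main

import (
	"fmt"
	"math"
)

// formatCase is a hypothetical table entry: one measurement per media format.
type formatCase struct {
	format    string  // format label, e.g. "FLAC"
	want      float64 // nominal duration in seconds
	got       float64 // measured duration in seconds (illustrative values)
	tolerance float64 // allowed absolute deviation in seconds
}

// withinTolerance reports whether got deviates from want by at most tol.
func withinTolerance(got, want, tol float64) bool {
	return math.Abs(got-want) <= tol
}

// failures returns the formats whose measurement falls outside its tolerance.
func failures(cases []formatCase) []string {
	var failed []string
	for _, c := range cases {
		if !withinTolerance(c.got, c.want, c.tolerance) {
			failed = append(failed, c.format)
		}
	}
	return failed
}

func main() {
	// Lossy codecs often pad or trim frames, so they get a wider tolerance.
	cases := []formatCase{
		{"FLAC", 1.0, 1.00, 0.01},
		{"AAC", 1.0, 1.04, 0.10},
		{"WMA", 1.0, 1.30, 0.10},
	}
	fmt.Println(failures(cases)) // prints [WMA]
}
```

In a real `_test.go` file the loop body would typically live inside `t.Run(c.format, ...)` so that each format reports as its own subtest.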

Files Enhanced:

  • pkg/data/audio_test.go (221→308 lines): Complete 8-format coverage
  • pkg/data/video_test.go (249→335 lines): 7-format coverage with comprehensive conversion tests
  • pkg/data/image_test.go (Enhanced): 6-format coverage with all supported types
  • pkg/data/document_test.go (Enhanced): 13-format coverage with data URI support
  • pkg/data/testdata/ (Optimized): Added 12 new small test files, removed 7 unused large files
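For readers unfamiliar with data URIs, the sketch below shows the general shape of what a data URI test has to exercise: splitting `data:<mime>;base64,<payload>` into a MIME type and decoded bytes. `parseDataURI` is a hypothetical helper written for illustration. It is not the `pkg/data` API, and it handles only the base64 form.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// parseDataURI is a hypothetical helper that splits a base64 data URI
// into its MIME type and decoded payload.
func parseDataURI(uri string) (string, []byte, error) {
	if !strings.HasPrefix(uri, "data:") {
		return "", nil, errors.New("missing data: scheme")
	}
	rest := strings.TrimPrefix(uri, "data:")

	// Everything before the first comma is metadata; the rest is payload.
	idx := strings.Index(rest, ",")
	if idx < 0 {
		return "", nil, errors.New("missing comma separator")
	}
	meta, payload := rest[:idx], rest[idx+1:]

	if !strings.HasSuffix(meta, ";base64") {
		return "", nil, errors.New("only base64 data URIs are handled in this sketch")
	}
	mime := strings.TrimSuffix(meta, ";base64")

	raw, err := base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return "", nil, fmt.Errorf("decode payload: %w", err)
	}
	return mime, raw, nil
}

func main() {
	mime, raw, err := parseDataURI("data:text/plain;base64,aGVsbG8=")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mime, string(raw)) // prints: text/plain hello
}
```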

Performance Impact:

  • Test execution: 49.111s → 3.534s (92.8% improvement)
  • Format coverage: 37.5% → 100% (audio), 25% → 87.5% (video), 50% → 100% (image)
  • Testdata size: Reduced by 16.7% while adding comprehensive format support
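The percentages above follow directly from the reported timings and format counts. A minimal sketch of the arithmetic, with hypothetical helper names `improvementPercent` and `coveragePercent`:

```go
package main

import "fmt"

// improvementPercent returns the relative reduction from before to after, in percent.
func improvementPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

// coveragePercent returns the share of supported formats that have tests, in percent.
func coveragePercent(tested, supported int) float64 {
	if supported == 0 {
		return 0
	}
	return 100 * float64(tested) / float64(supported)
}

func main() {
	// Timings reported in this PR: 49.111s before, 3.534s after.
	fmt.Printf("time reduction: %.1f%%\n", improvementPercent(49.111, 3.534)) // 92.8%
	fmt.Printf("speedup: %.1fx\n", 49.111/3.534)                               // 13.9x

	// Format counts reported in this PR.
	fmt.Printf("audio: %.1f%% -> %.1f%%\n", coveragePercent(3, 8), coveragePercent(8, 8)) // 37.5% -> 100.0%
	fmt.Printf("video: %.1f%% -> %.1f%%\n", coveragePercent(2, 8), coveragePercent(7, 8)) // 25.0% -> 87.5%
	fmt.Printf("image: %.1f%% -> %.1f%%\n", coveragePercent(3, 6), coveragePercent(6, 6)) // 50.0% -> 100.0%
}
```

Expressed as a ratio, the same timings correspond to roughly a 13.9x speedup.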

pinglin merged commit a64989f into main Sep 19, 2025
5 checks passed
pinglin deleted the pinglin/perf-data-comprehensive-format-coverage branch September 19, 2025 02:34
pinglin added a commit that referenced this pull request Sep 19, 2025
…erformance (#1112)

jvallesm added a commit to instill-ai/instill-core that referenced this pull request Sep 23, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`1b4cd1f`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `1b4cd1f`.

## Changes in pipeline-backend
- fix(text): correct positions on duplicate markdown chunks
(instill-ai/pipeline-backend#1120)
- refactor(component,generic,http): replace env-based URL validation
with constructor injection (instill-ai/pipeline-backend#1121)
- fix(usage): add missing error filtering for users/admin
(instill-ai/pipeline-backend#1119)
- feat(component,ai,gemini): implement File API support for large files…
(instill-ai/pipeline-backend#1118)
- perf(data): optimize struct marshaling/unmarshaling with caching and …
(instill-ai/pipeline-backend#1117)
- feat(data): enhance unmarshaler with JSON string to struct conversion
(instill-ai/pipeline-backend#1116)
- feat(data): implement time types support with pattern validation
(instill-ai/pipeline-backend#1115)
- feat(component,ai,gemini): add multimedia support with unified format…
(instill-ai/pipeline-backend#1114)
- ci(workflows): adopt GitHub-hosted runner
(instill-ai/pipeline-backend#1113)
- perf(data): enhance comprehensive format coverage and optimize test
performance (instill-ai/pipeline-backend#1112)
- ci(workflows): adopt larger runner for coverage test
(instill-ai/pipeline-backend#1111)
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)

Co-authored-by: jvallesm <3977183+jvallesm@users.noreply.github.com>