Skip to content

Conversation

@pinglin
Copy link
Member

@pinglin pinglin commented Sep 18, 2025

Because

  • External packages like genai use camelCase json tags while Instill's schema uses kebab-case, causing unmarshaling incompatibility
  • Nested Go struct fields using different naming conventions can cause silent data loss - when top-level structs use instill tags but nested external types use json tags with different conventions, field mappings fail and data is silently dropped
  • Manual specification of naming conventions creates confusion and maintenance overhead for users
  • Repeated field name resolution operations impact performance in high-throughput scenarios
  • Code duplication existed between marshaling and unmarshaling logic for handling different naming conventions

This commit

  • Implements automatic naming convention detection that tries multiple formats (kebab-case, camelCase, snake_case, PascalCase) and uses the one that exists in input data
  • Prevents silent data loss in nested structures by recursively applying naming convention detection to all struct levels, ensuring external package types with camelCase/PascalCase/snake_case json tags are properly mapped even when the schema provides kebab-case field names
  • Adds LRU cache with 200-entry capacity for struct field name mappings to improve performance
  • Adds comprehensive benchmark tests showing 46% performance improvement for marshaling operations
  • Maintains full backward compatibility while eliminating user configuration complexity

Performance Impact:

  • Marshaling: 46% faster (801ns vs 1491ns), 53% less memory, 58% fewer allocations
  • Unmarshaling: Minimal overhead (~2%) with automatic detection
  • Memory bounded LRU cache prevents unlimited growth in long-running services

@pinglin pinglin force-pushed the pinglin/perf-data-automatic-naming-convention-detection branch from ccb0f63 to d53e001 Compare September 18, 2025 18:35
@pinglin pinglin force-pushed the pinglin/perf-data-automatic-naming-convention-detection branch from d53e001 to 301dea6 Compare September 18, 2025 20:41
@pinglin pinglin merged commit f1148f1 into main Sep 18, 2025
6 checks passed
@pinglin pinglin deleted the pinglin/perf-data-automatic-naming-convention-detection branch September 18, 2025 21:32
pinglin added a commit to instill-ai/instill-core that referenced this pull request Sep 18, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`19480ec`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `19480ec`.

## Changes in pipeline-backend
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)
- chore(main): release 0.60.0 (instill-ai/pipeline-backend#1086)
- chore(ce): release v0.60.0 (instill-ai/pipeline-backend#1099)
- fix(component,ai,instillmodel): resolve panics and test failures
(instill-ai/pipeline-backend#1100)
- fix(usage): treat input rendering error as fatal
(instill-ai/pipeline-backend#1098)
- refactor(component,ai,gemini): enhance document processing with text …
(instill-ai/pipeline-backend#1097)
- ci(gitignore): ignore .cursor folder
(instill-ai/pipeline-backend#1096)
- fix(component,ai,instillmodel): fix outdated data struct
(instill-ai/pipeline-backend#1095)
- chore(component,ai): remove unused files
(instill-ai/pipeline-backend#1094)
- chore(data,component,gemini): improve error msg
(instill-ai/pipeline-backend#1093)
- chore(component,gemini): optimize the IO struct
(instill-ai/pipeline-backend#1092)
- fix(recipe): support nil, null, undefined for condition field
(instill-ai/pipeline-backend#1091)

Co-authored-by: pinglin <628430+pinglin@users.noreply.github.com>
jvallesm added a commit to instill-ai/instill-core that referenced this pull request Sep 23, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`1b4cd1f`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `1b4cd1f`.

## Changes in pipeline-backend
- fix(text): correct positions on duplicate markdown chunks
(instill-ai/pipeline-backend#1120)
- refactor(component,generic,http): replace env-based URL validation
with constructor injection (instill-ai/pipeline-backend#1121)
- fix(usage): add missing error filtering for users/admin
(instill-ai/pipeline-backend#1119)
- feat(component,ai,gemini): implement File API support for large files…
(instill-ai/pipeline-backend#1118)
- perf(data): optimize struct marshaling/unmarshaling with caching and …
(instill-ai/pipeline-backend#1117)
- feat(data): enhance unmarshaler with JSON string to struct conversion
(instill-ai/pipeline-backend#1116)
- feat(data): implement time types support with pattern validation
(instill-ai/pipeline-backend#1115)
- feat(component,ai,gemini): add multimedia support with unified format…
(instill-ai/pipeline-backend#1114)
- ci(workflows): adopt GitHub-hosted runner
(instill-ai/pipeline-backend#1113)
- perf(data): enhance comprehensive format coverage and optimize test
performance (instill-ai/pipeline-backend#1112)
- ci(workflows): adopt loarger runner for coverage test
(instill-ai/pipeline-backend#1111)
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)

Co-authored-by: jvallesm <3977183+jvallesm@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants