refactor(component,generic,http): replace env-based URL validation with constructor injection #1121
…ation with constructor injection
Because

- The version of the pipeline-backend service is not updated in the instill-core repository.

This commit

- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to `1b4cd1f`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml file to `1b4cd1f`.

## Changes in pipeline-backend

- fix(text): correct positions on duplicate markdown chunks (instill-ai/pipeline-backend#1120)
- refactor(component,generic,http): replace env-based URL validation with constructor injection (instill-ai/pipeline-backend#1121)
- fix(usage): add missing error filtering for users/admin (instill-ai/pipeline-backend#1119)
- feat(component,ai,gemini): implement File API support for large files… (instill-ai/pipeline-backend#1118)
- perf(data): optimize struct marshaling/unmarshaling with caching and … (instill-ai/pipeline-backend#1117)
- feat(data): enhance unmarshaler with JSON string to struct conversion (instill-ai/pipeline-backend#1116)
- feat(data): implement time types support with pattern validation (instill-ai/pipeline-backend#1115)
- feat(component,ai,gemini): add multimedia support with unified format… (instill-ai/pipeline-backend#1114)
- ci(workflows): adopt GitHub-hosted runner (instill-ai/pipeline-backend#1113)
- perf(data): enhance comprehensive format coverage and optimize test performance (instill-ai/pipeline-backend#1112)
- ci(workflows): adopt loarger runner for coverage test (instill-ai/pipeline-backend#1111)
- perf(component,operator,document): optimize unit tests and fix LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by 59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster … (instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields (instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic (instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation (instill-ai/pipeline-backend#1102)

Co-authored-by: jvallesm <3977183+jvallesm@users.noreply.github.com>
    if tc.expectBlock {
        c.Assert(err, qt.IsNotNil, qt.Commentf("Should be blocked: %s (%s)", tc.url, tc.reason))
    } else {
        // These might fail due to DNS in test environment, but that's expected
        // The important thing is that the whitelist logic is in place
        if err != nil {
            // If it fails, it should be due to DNS, not whitelist logic
            c.Assert(err.Error(), qt.Contains, "lookup", qt.Commentf("If blocked, should be due to DNS lookup, not whitelist logic"))
        }
    }
nit: avoiding if/else blocks where possible tends to improve code legibility
    if tc.expectBlock {
        c.Assert(err, qt.IsNotNil, qt.Commentf("Should be blocked: %s (%s)", tc.url, tc.reason))
        return
    }
    // These might fail due to DNS in test environment, but that's expected
    // The important thing is that the whitelist logic is in place
    if err != nil {
        // If it fails, it should be due to DNS, not whitelist logic
        c.Assert(err.Error(), qt.Contains, "lookup", qt.Commentf("If blocked, should be due to DNS lookup, not whitelist logic"))
    }
    // NewTestURLValidator creates a validator for testing
    func NewTestURLValidator(whitelistedEndpoints []string, allowLocalhost bool) URLValidator {
        return &urlValidator{
            whitelistedEndpoints: whitelistedEndpoints,
            allowLocalhost:       allowLocalhost,
            allowPrivateIPs:      true, // Test mode allows external URLs by default
        }
    }
Should we define this in the non-production code (*_test.go files)?
    if !v.allowPrivateIPs {
        prodWhitelist := []string{
            // Pipeline's public port is exposed to call pipelines from pipelines.
            // When a `pipeline` component is implemented, this won't be necessary.
            fmt.Sprintf("%s:%d", config.Config.Server.InstanceID, config.Config.Server.PublicPort),
            // Model's public port is exposed until the model component allows
            // triggering models in the custom mode.
            fmt.Sprintf("%s:%d", config.Config.ModelBackend.Host, config.Config.ModelBackend.PublicPort),
        }
        // Certain pipelines used by artifact-backend need to trigger pipelines and
        // models via this component.
        // TODO jvallesm: Remove this after INS-8119 is completed.
        if slices.Contains(prodWhitelist, parsedURL.Host) {
            return nil
        }
    }
note: this validator gives us the possibility to move the model and pipeline whitelisting away from the CE repo, as it is used only in production.
I'd make the whitelist a constructor param for the validator; then, in the program's main.go, I'd read the whitelisted hosts from the config, create the URL validator with them and pass it to the component store.
However, I'm fine with having this here for the sake of simplicity and fairness to CE users (they might also want to trigger pipelines or models from other pipelines).
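The constructor-parameter idea in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the repository's code: `hostValidator` and `NewHostValidator` are hypothetical names, the hard-coded hosts stand in for values that would be read from the service config, and only literal IP addresses are checked (no DNS resolution) to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"slices"
)

// URLValidator decides whether a component is allowed to call a URL.
type URLValidator interface {
	Validate(rawURL string) error
}

// hostValidator is a hypothetical validator whose whitelist is injected by
// the caller instead of being built from global config inside Validate.
type hostValidator struct {
	whitelist []string // entries in "host:port" form
}

// NewHostValidator (hypothetical) receives the whitelisted hosts as a
// constructor parameter.
func NewHostValidator(whitelist []string) URLValidator {
	return &hostValidator{whitelist: slices.Clone(whitelist)}
}

func (v *hostValidator) Validate(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parsing URL: %w", err)
	}
	// Whitelisted hosts bypass the private-address check.
	if slices.Contains(v.whitelist, u.Host) {
		return nil
	}
	// Simplification: only literal IPs are inspected; hostnames are not resolved.
	if ip := net.ParseIP(u.Hostname()); ip != nil && (ip.IsPrivate() || ip.IsLoopback()) {
		return fmt.Errorf("host %q is a private or loopback address", u.Host)
	}
	return nil
}

func main() {
	// In a real main.go these values would come from the loaded config.
	whitelist := []string{
		fmt.Sprintf("%s:%d", "pipeline-backend", 8081),
		fmt.Sprintf("%s:%d", "model-backend", 8083),
	}
	v := NewHostValidator(whitelist)

	for _, target := range []string{
		"http://pipeline-backend:8081/some/path",
		"http://10.0.0.5:9000/internal",
	} {
		fmt.Printf("%s -> %v\n", target, v.Validate(target))
	}
}
```

With this shape, a deployment that needs no internal whitelisting would simply pass an empty list, and the validator itself would carry no knowledge of specific backends.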
    // InitForTest creates a component instance for testing with configurable validation
    // whitelist: URLs to allow (nil/empty = allow all external URLs)
    // allowLocalhost: whether to allow localhost/127.x.x.x URLs
    func InitForTest(bc base.Component, whitelist []string, allowLocalhost bool) *component {
nit: I'd put this away from the production code
…nd improve code legibility (#1131)

Because

- PR #1121 comments requested moving test-specific functions out of production code to improve code organization
- Nested if-else structures in test code reduced readability and could be simplified with early returns

This commit

- Moves `NewTestURLValidator` from `validator.go` to `validator_test.go` to keep test utilities in test files
- Moves `InitForTest` from `main.go` to `validator_test.go` to separate test-specific component initialization from production code
- Refactors nested if-else structure in validator test (lines 90-99) using early returns to improve code legibility
- Removes test functions from production code, reducing binary size and improving maintainability
🤖 I have created a release *beep* *boop*

---

## [0.61.0](v0.60.0...v0.61.0) (2025-10-06)

### Features

* **component,ai,gemini:** add image generation support ([#1122](#1122)) ([d986614](d986614))
* **component,ai,gemini:** add multimedia support with unified format… ([#1114](#1114)) ([291b379](291b379))
* **component,ai,gemini:** add text embeddings task support ([#1129](#1129)) ([d7ca6cf](d7ca6cf))
* **component,ai,gemini:** enhance streaming to output all fields ([#1106](#1106)) ([dfb6b24](dfb6b24))
* **component,ai,gemini:** implement automatic format conversion for unsupported media types ([#1128](#1128)) ([f767b8a](f767b8a))
* **component,ai,gemini:** implement File API support for large files… ([#1118](#1118)) ([b51c8f4](b51c8f4))
* **data:** add comprehensive AVIF image format support ([#1135](#1135)) ([76d6941](76d6941))
* **data:** add HEIC/HEIF image support and normalize MIME types ([#1127](#1127)) ([2dfa254](2dfa254))
* **data:** enhance unmarshaler with JSON string to struct conversion ([#1116](#1116)) ([9e06b7c](9e06b7c))
* **data:** implement time types support with pattern validation ([#1115](#1115)) ([79630c0](79630c0))

### Bug Fixes

* **compogen:** escape curly braces for readme.com compatibility ([#1124](#1124)) ([904992d](904992d))
* **component,ai,gemini:** add operation validation for cache task ([#1130](#1130)) ([9e19255](9e19255))
* **component,ai,gemini:** correct text-based documents logic ([#1103](#1103)) ([ed5a111](ed5a111))
* **component,ai,gemini:** unify InlineData processing and enable images in streaming responses ([#1125](#1125)) ([3117046](3117046))
* **data:** remove duplicate dot in generated filenames ([#1136](#1136)) ([0a74a00](0a74a00))
* **external:** fix Content-Disposition header parsing for filename extraction ([#1132](#1132)) ([869b081](869b081))
* **service:** handle null JSON metadata in pipeline conversion ([#1134](#1134)) ([b244784](b244784))
* **text:** correct positions on duplicate markdown chunks ([#1120](#1120)) ([1b4cd1f](1b4cd1f))
* **usage:** add missing error filtering for users/admin ([#1119](#1119)) ([cd1bd55](cd1bd55))

### Refactor

* **component,ai,gemini:** merge usage and usage-metadata fields into single usage field ([#1126](#1126)) ([a6046cd](a6046cd))
* **component,ai.gemini:** standardize file api timeout and use native embedding type ([#1133](#1133)) ([174f7d6](174f7d6))
* **component,generic,http:** move test functions to test files and improve code legibility ([#1131](#1131)) ([1153a09](1153a09))
* **component,generic,http:** replace env-based URL validation with constructor injection ([#1121](#1121)) ([f1f7d2f](f1f7d2f))

### Tests

* **component,generic,http:** replace external httpbin.org dependency with local test server ([#1101](#1101)) ([a82d155](a82d155))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.61.0](v0.60.0...v0.61.0) (2025-10-07)

### Features

* **component,ai,gemini:** add image generation support ([#1122](#1122)) ([d986614](d986614))
* **component,ai,gemini:** add multimedia support with unified format… ([#1114](#1114)) ([291b379](291b379))
* **component,ai,gemini:** add text embeddings task support ([#1129](#1129)) ([d7ca6cf](d7ca6cf))
* **component,ai,gemini:** enhance streaming to output all fields ([#1106](#1106)) ([dfb6b24](dfb6b24))
* **component,ai,gemini:** implement automatic format conversion for unsupported media types ([#1128](#1128)) ([f767b8a](f767b8a))
* **component,ai,gemini:** implement File API support for large files… ([#1118](#1118)) ([b51c8f4](b51c8f4))
* **data:** add comprehensive AVIF image format support ([#1135](#1135)) ([76d6941](76d6941))
* **data:** add HEIC/HEIF image support and normalize MIME types ([#1127](#1127)) ([2dfa254](2dfa254))
* **data:** enhance unmarshaler with JSON string to struct conversion ([#1116](#1116)) ([9e06b7c](9e06b7c))
* **data:** implement time types support with pattern validation ([#1115](#1115)) ([79630c0](79630c0))

### Bug Fixes

* **compogen:** escape curly braces for readme.com compatibility ([#1124](#1124)) ([904992d](904992d))
* **component,ai,gemini:** add operation validation for cache task ([#1130](#1130)) ([9e19255](9e19255))
* **component,ai,gemini:** correct text-based documents logic ([#1103](#1103)) ([ed5a111](ed5a111))
* **component,ai,gemini:** unify InlineData processing and enable images in streaming responses ([#1125](#1125)) ([3117046](3117046))
* **component,document:** fix incorrect expected value in the unit test ([#1138](#1138)) ([189dbd6](189dbd6))
* **data:** remove duplicate dot in generated filenames ([#1136](#1136)) ([0a74a00](0a74a00))
* **external:** fix Content-Disposition header parsing for filename extraction ([#1132](#1132)) ([869b081](869b081))
* **service:** handle null JSON metadata in pipeline conversion ([#1134](#1134)) ([b244784](b244784))
* **text:** correct positions on duplicate markdown chunks ([#1120](#1120)) ([1b4cd1f](1b4cd1f))
* **usage:** add missing error filtering for users/admin ([#1119](#1119)) ([cd1bd55](cd1bd55))

### Miscellaneous

* release v0.61.0 ([e1db93c](e1db93c))

### Refactor

* **component,ai,gemini:** merge usage and usage-metadata fields into single usage field ([#1126](#1126)) ([a6046cd](a6046cd))
* **component,ai.gemini:** standardize file api timeout and use native embedding type ([#1133](#1133)) ([174f7d6](174f7d6))
* **component,generic,http:** move test functions to test files and improve code legibility ([#1131](#1131)) ([1153a09](1153a09))
* **component,generic,http:** replace env-based URL validation with constructor injection ([#1121](#1121)) ([f1f7d2f](f1f7d2f))

### Tests

* **component,generic,http:** replace external httpbin.org dependency with local test server ([#1101](#1101)) ([a82d155](a82d155))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
To address the PR comment:
Because

- … `GO_TESTING`) to control URL validation behavior, making the component's security posture unclear and hard to test

This commit

- … `URLValidator` interface with constructor-based dependency injection, making validation behavior explicit and testable
- … `validator.go` file with a unified `urlValidator` implementation
- … (`NewURLValidator()` for production, `NewTestURLValidator()` for testing)
- … `pipeline-backend:8081`, `model-backend:8083`)
- … `InitForTest()` function