@pinglin pinglin commented Sep 22, 2025

To address the PR comment:

Because

  • The HTTP component relied on implicit environment variables (GO_TESTING) to control URL validation behavior, making the component's security posture unclear and hard to test
  • URL validation logic was scattered and difficult to maintain, with security-critical code mixed with component logic
  • The component allowed dangerous localhost access in test mode without explicit configuration
  • There was no comprehensive test coverage for the URL validation logic, especially for internal service whitelisting

This commit

  • Introduces a URLValidator interface with constructor-based dependency injection, making validation behavior explicit and testable
  • Consolidates URL validation logic into a dedicated validator.go file with a unified urlValidator implementation
  • Replaces environment variable checks with configurable validator instances (NewURLValidator() for production, NewTestURLValidator() for testing)
  • Fixes a critical security vulnerability where test mode implicitly allowed all localhost/127.x.x.x addresses without explicit opt-in
  • Adds comprehensive test coverage including production whitelist testing for internal services (pipeline-backend:8081, model-backend:8083)
  • Preserves existing security guarantees: production mode only allows publicly available endpoints, blocking private/internal IP addresses
  • Maintains backward compatibility while making the component's behavior more explicit and secure
  • Consolidates multiple test initialization functions into a single flexible InitForTest() function
  • Restores important future work comments about internal service access patterns
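The constructor-injection approach described in the list above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `Validate` method name and the checking logic are assumptions, while the struct fields and the two constructor names and signatures are taken from the snippets quoted in this review. The sketch only inspects literal IPs and the `localhost` name and does no DNS resolution.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"slices"
)

// URLValidator decides whether the HTTP component may call a URL.
// The method name Validate is an assumption made for this sketch.
type URLValidator interface {
	Validate(rawURL string) error
}

// urlValidator uses the field names quoted in the review; the logic is illustrative.
type urlValidator struct {
	whitelistedEndpoints []string
	allowLocalhost       bool
	allowPrivateIPs      bool
}

// NewURLValidator returns the strict validator: no localhost, no private IPs.
func NewURLValidator() URLValidator {
	return &urlValidator{}
}

// NewTestURLValidator makes the relaxed behavior an explicit opt-in.
func NewTestURLValidator(whitelistedEndpoints []string, allowLocalhost bool) URLValidator {
	return &urlValidator{
		whitelistedEndpoints: whitelistedEndpoints,
		allowLocalhost:       allowLocalhost,
		allowPrivateIPs:      true,
	}
}

func (v *urlValidator) Validate(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parsing URL: %w", err)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("URL has no host")
	}
	// Whitelisted host:port pairs are always allowed.
	if slices.Contains(v.whitelistedEndpoints, u.Host) {
		return nil
	}
	ip := net.ParseIP(host)
	if host == "localhost" || (ip != nil && ip.IsLoopback()) {
		if v.allowLocalhost {
			return nil
		}
		return errors.New("localhost access is not allowed")
	}
	if ip != nil && ip.IsPrivate() && !v.allowPrivateIPs {
		return errors.New("private IP access is not allowed")
	}
	return nil
}

func main() {
	prod := NewURLValidator()
	for _, u := range []string{"https://example.com/", "http://localhost:8080/", "http://10.0.0.5/"} {
		fmt.Printf("%s -> %v\n", u, prod.Validate(u))
	}
}
```

Because the behavior is fixed at construction time, a test can request localhost access explicitly instead of relying on an environment variable such as GO_TESTING.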

@pinglin pinglin changed the title refactor(component,generic,http): replace environment-based URL valid… refactor(component,generic,http): replace environment-based URL validation with constructor injection Sep 22, 2025
@pinglin pinglin changed the title refactor(component,generic,http): replace environment-based URL validation with constructor injection refactor(component,generic,http): replace env-based URL validation with constructor injection Sep 22, 2025
@pinglin pinglin self-assigned this Sep 22, 2025
@pinglin pinglin merged commit f1f7d2f into main Sep 22, 2025
8 checks passed
@pinglin pinglin deleted the pinglin/refactor-http-constructor-validation branch September 22, 2025 19:17
jvallesm added a commit to instill-ai/instill-core that referenced this pull request Sep 23, 2025
Because
- The version of the pipeline-backend service is not updated in the
instill-core repository.

This commit
- updates the `PIPELINE_BACKEND_VERSION` in the `.env` file to
`1b4cd1f`.
- updates the `pipelineBackend.image.tag` in the helm chart values.yaml
file to `1b4cd1f`.

## Changes in pipeline-backend
- fix(text): correct positions on duplicate markdown chunks
(instill-ai/pipeline-backend#1120)
- refactor(component,generic,http): replace env-based URL validation
with constructor injection (instill-ai/pipeline-backend#1121)
- fix(usage): add missing error filtering for users/admin
(instill-ai/pipeline-backend#1119)
- feat(component,ai,gemini): implement File API support for large files…
(instill-ai/pipeline-backend#1118)
- perf(data): optimize struct marshaling/unmarshaling with caching and …
(instill-ai/pipeline-backend#1117)
- feat(data): enhance unmarshaler with JSON string to struct conversion
(instill-ai/pipeline-backend#1116)
- feat(data): implement time types support with pattern validation
(instill-ai/pipeline-backend#1115)
- feat(component,ai,gemini): add multimedia support with unified format…
(instill-ai/pipeline-backend#1114)
- ci(workflows): adopt GitHub-hosted runner
(instill-ai/pipeline-backend#1113)
- perf(data): enhance comprehensive format coverage and optimize test
performance (instill-ai/pipeline-backend#1112)
- ci(workflows): adopt larger runner for coverage test
(instill-ai/pipeline-backend#1111)
- perf(component,operator,document): optimize unit tests and fix
LibreOffice dependency failures (instill-ai/pipeline-backend#1110)
- perf(component,operator,video): optimize unit test performance by
59.7% (instill-ai/pipeline-backend#1109)
- perf(component,operator,image): optimize unit tests for 98.5% faster …
(instill-ai/pipeline-backend#1107)
- ci(docker): optimize Dockerfiles with multi-stage builds for faster
build times (instill-ai/pipeline-backend#1108)
- perf(data): implement automatic field naming convention detection with
LRU caching (instill-ai/pipeline-backend#1105)
- feat(component,ai,gemini): enhance streaming to output all fields
(instill-ai/pipeline-backend#1106)
- fix(component,ai,gemini): correct text-based documents logic
(instill-ai/pipeline-backend#1103)
- test(component,generic,http): replace external httpbin.org dependency
with local test server (instill-ai/pipeline-backend#1101)
- ci(docker): add GitHub fallback for ffmpeg installation
(instill-ai/pipeline-backend#1102)

Co-authored-by: jvallesm <3977183+jvallesm@users.noreply.github.com>
Comment on lines +90 to +99
if tc.expectBlock {
	c.Assert(err, qt.IsNotNil, qt.Commentf("Should be blocked: %s (%s)", tc.url, tc.reason))
} else {
	// These might fail due to DNS in test environment, but that's expected
	// The important thing is that the whitelist logic is in place
	if err != nil {
		// If it fails, it should be due to DNS, not whitelist logic
		c.Assert(err.Error(), qt.Contains, "lookup", qt.Commentf("If blocked, should be due to DNS lookup, not whitelist logic"))
	}
}

nit: if elses can be avoided, it tends to improve the code legibility

Suggested change
if tc.expectBlock {
	c.Assert(err, qt.IsNotNil, qt.Commentf("Should be blocked: %s (%s)", tc.url, tc.reason))
	return
}
// These might fail due to DNS in test environment, but that's expected
// The important thing is that the whitelist logic is in place
if err != nil {
	// If it fails, it should be due to DNS, not whitelist logic
	c.Assert(err.Error(), qt.Contains, "lookup", qt.Commentf("If blocked, should be due to DNS lookup, not whitelist logic"))
}

Comment on lines +31 to +38
// NewTestURLValidator creates a validator for testing
func NewTestURLValidator(whitelistedEndpoints []string, allowLocalhost bool) URLValidator {
	return &urlValidator{
		whitelistedEndpoints: whitelistedEndpoints,
		allowLocalhost:       allowLocalhost,
		allowPrivateIPs:      true, // Test mode allows external URLs by default
	}
}

Should we define this in the non-production code (*_test.go files)?

Comment on lines +78 to +93
if !v.allowPrivateIPs {
	prodWhitelist := []string{
		// Pipeline's public port is exposed to call pipelines from pipelines.
		// When a `pipeline` component is implemented, this won't be necessary.
		fmt.Sprintf("%s:%d", config.Config.Server.InstanceID, config.Config.Server.PublicPort),
		// Model's public port is exposed until the model component allows
		// triggering models in the custom mode.
		fmt.Sprintf("%s:%d", config.Config.ModelBackend.Host, config.Config.ModelBackend.PublicPort),
	}
	// Certain pipelines used by artifact-backend need to trigger pipelines and
	// models via this component.
	// TODO jvallesm: Remove this after INS-8119 is completed.
	if slices.Contains(prodWhitelist, parsedURL.Host) {
		return nil
	}
}

note: this validator gives us the possibility to move the model and pipeline whitelisting away from the CE repo, as it is used only in production.
I'd make the whitelist a constructor param for the validator; then, in the program's main.go, I'd read the whitelisted hosts from the config, create the URL validator with them and pass it to the component store.

However, I'm fine with having this here for the sake of simplicity and fairness to CE users (they might also want to trigger pipelines or models from other pipelines).
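
The alternative raised in this comment, passing the whitelist in through the constructor and building it from configuration at program start, might look something like the sketch below. All names here (`serviceConfig`, `whitelistFromConfig`, `hostWhitelist`, `newHostWhitelist`, `Allows`) are hypothetical; only the `host:port` entry format and the example hosts `pipeline-backend:8081` and `model-backend:8083` come from this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"slices"
)

// serviceConfig is a hypothetical stand-in for the program's configuration.
type serviceConfig struct {
	PipelineHost string
	PipelinePort int
	ModelHost    string
	ModelPort    int
}

// whitelistFromConfig builds host:port entries, as main.go might do at startup.
func whitelistFromConfig(cfg serviceConfig) []string {
	return []string{
		fmt.Sprintf("%s:%d", cfg.PipelineHost, cfg.PipelinePort),
		fmt.Sprintf("%s:%d", cfg.ModelHost, cfg.ModelPort),
	}
}

// hostWhitelist receives its entries through the constructor instead of
// reading global configuration inside the validation path.
type hostWhitelist struct {
	hosts []string
}

func newHostWhitelist(hosts []string) *hostWhitelist {
	// Clone so later changes to the caller's slice do not affect the whitelist.
	return &hostWhitelist{hosts: slices.Clone(hosts)}
}

// Allows reports whether the URL's host:port is one of the whitelisted entries.
func (w *hostWhitelist) Allows(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return slices.Contains(w.hosts, u.Host)
}

func main() {
	cfg := serviceConfig{
		PipelineHost: "pipeline-backend", PipelinePort: 8081,
		ModelHost: "model-backend", ModelPort: 8083,
	}
	w := newHostWhitelist(whitelistFromConfig(cfg))
	fmt.Println(w.Allows("http://pipeline-backend:8081/"))
	fmt.Println(w.Allows("http://pipeline-backend:9999/"))
}
```

With this shape the validation code no longer depends on a global config package, at the cost of one more value to wire through at startup, which is the trade-off the comment weighs.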

// InitForTest creates a component instance for testing with configurable validation
// whitelist: URLs to allow (nil/empty = allow all external URLs)
// allowLocalhost: whether to allow localhost/127.x.x.x URLs
func InitForTest(bc base.Component, whitelist []string, allowLocalhost bool) *component {

nit: I'd put this away from the production code

pinglin added a commit that referenced this pull request Sep 27, 2025
…nd improve code legibility (#1131)

Because

- PR #1121 comments requested moving test-specific functions out of
production code to improve code organization
- Nested if-else structures in test code reduced readability and could
be simplified with early returns

This commit

- Moves `NewTestURLValidator` from `validator.go` to `validator_test.go`
to keep test utilities in test files
- Moves `InitForTest` from `main.go` to `validator_test.go` to separate
test-specific component initialization from production code
- Refactors nested if-else structure in validator test (lines 90-99)
using early returns to improve code legibility
- Removes test functions from production code, reducing binary size and
improving maintainability
jvallesm pushed a commit that referenced this pull request Oct 7, 2025
🤖 I have created a release *beep* *boop*
---


## [0.61.0](v0.60.0...v0.61.0) (2025-10-06)


### Features

* **component,ai,gemini:** add image generation support
([#1122](#1122))
([d986614](d986614))
* **component,ai,gemini:** add multimedia support with unified format…
([#1114](#1114))
([291b379](291b379))
* **component,ai,gemini:** add text embeddings task support
([#1129](#1129))
([d7ca6cf](d7ca6cf))
* **component,ai,gemini:** enhance streaming to output all fields
([#1106](#1106))
([dfb6b24](dfb6b24))
* **component,ai,gemini:** implement automatic format conversion for
unsupported media types
([#1128](#1128))
([f767b8a](f767b8a))
* **component,ai,gemini:** implement File API support for large files…
([#1118](#1118))
([b51c8f4](b51c8f4))
* **data:** add comprehensive AVIF image format support
([#1135](#1135))
([76d6941](76d6941))
* **data:** add HEIC/HEIF image support and normalize MIME types
([#1127](#1127))
([2dfa254](2dfa254))
* **data:** enhance unmarshaler with JSON string to struct conversion
([#1116](#1116))
([9e06b7c](9e06b7c))
* **data:** implement time types support with pattern validation
([#1115](#1115))
([79630c0](79630c0))


### Bug Fixes

* **compogen:** escape curly braces for readme.com compatibility
([#1124](#1124))
([904992d](904992d))
* **component,ai,gemini:** add operation validation for cache task
([#1130](#1130))
([9e19255](9e19255))
* **component,ai,gemini:** correct text-based documents logic
([#1103](#1103))
([ed5a111](ed5a111))
* **component,ai,gemini:** unify InlineData processing and enable images
in streaming responses
([#1125](#1125))
([3117046](3117046))
* **data:** remove duplicate dot in generated filenames
([#1136](#1136))
([0a74a00](0a74a00))
* **external:** fix Content-Disposition header parsing for filename
extraction
([#1132](#1132))
([869b081](869b081))
* **service:** handle null JSON metadata in pipeline conversion
([#1134](#1134))
([b244784](b244784))
* **text:** correct positions on duplicate markdown chunks
([#1120](#1120))
([1b4cd1f](1b4cd1f))
* **usage:** add missing error filtering for users/admin
([#1119](#1119))
([cd1bd55](cd1bd55))


### Refactor

* **component,ai,gemini:** merge usage and usage-metadata fields into
single usage field
([#1126](#1126))
([a6046cd](a6046cd))
* **component,ai.gemini:** standardize file api timeout and use native
embedding type
([#1133](#1133))
([174f7d6](174f7d6))
* **component,generic,http:** move test functions to test files and
improve code legibility
([#1131](#1131))
([1153a09](1153a09))
* **component,generic,http:** replace env-based URL validation with
constructor injection
([#1121](#1121))
([f1f7d2f](f1f7d2f))


### Tests

* **component,generic,http:** replace external httpbin.org dependency
with local test server
([#1101](#1101))
([a82d155](a82d155))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
donch1989 pushed a commit that referenced this pull request Oct 7, 2025
🤖 I have created a release *beep* *boop*
---


## [0.61.0](v0.60.0...v0.61.0) (2025-10-07)


### Features

* **component,ai,gemini:** add image generation support
([#1122](#1122))
([d986614](d986614))
* **component,ai,gemini:** add multimedia support with unified format…
([#1114](#1114))
([291b379](291b379))
* **component,ai,gemini:** add text embeddings task support
([#1129](#1129))
([d7ca6cf](d7ca6cf))
* **component,ai,gemini:** enhance streaming to output all fields
([#1106](#1106))
([dfb6b24](dfb6b24))
* **component,ai,gemini:** implement automatic format conversion for
unsupported media types
([#1128](#1128))
([f767b8a](f767b8a))
* **component,ai,gemini:** implement File API support for large files…
([#1118](#1118))
([b51c8f4](b51c8f4))
* **data:** add comprehensive AVIF image format support
([#1135](#1135))
([76d6941](76d6941))
* **data:** add HEIC/HEIF image support and normalize MIME types
([#1127](#1127))
([2dfa254](2dfa254))
* **data:** enhance unmarshaler with JSON string to struct conversion
([#1116](#1116))
([9e06b7c](9e06b7c))
* **data:** implement time types support with pattern validation
([#1115](#1115))
([79630c0](79630c0))


### Bug Fixes

* **compogen:** escape curly braces for readme.com compatibility
([#1124](#1124))
([904992d](904992d))
* **component,ai,gemini:** add operation validation for cache task
([#1130](#1130))
([9e19255](9e19255))
* **component,ai,gemini:** correct text-based documents logic
([#1103](#1103))
([ed5a111](ed5a111))
* **component,ai,gemini:** unify InlineData processing and enable images
in streaming responses
([#1125](#1125))
([3117046](3117046))
* **component,document:** fix incorrect expected value in the unit test
([#1138](#1138))
([189dbd6](189dbd6))
* **data:** remove duplicate dot in generated filenames
([#1136](#1136))
([0a74a00](0a74a00))
* **external:** fix Content-Disposition header parsing for filename
extraction
([#1132](#1132))
([869b081](869b081))
* **service:** handle null JSON metadata in pipeline conversion
([#1134](#1134))
([b244784](b244784))
* **text:** correct positions on duplicate markdown chunks
([#1120](#1120))
([1b4cd1f](1b4cd1f))
* **usage:** add missing error filtering for users/admin
([#1119](#1119))
([cd1bd55](cd1bd55))


### Miscellaneous

* release v0.61.0
([e1db93c](e1db93c))


### Refactor

* **component,ai,gemini:** merge usage and usage-metadata fields into
single usage field
([#1126](#1126))
([a6046cd](a6046cd))
* **component,ai.gemini:** standardize file api timeout and use native
embedding type
([#1133](#1133))
([174f7d6](174f7d6))
* **component,generic,http:** move test functions to test files and
improve code legibility
([#1131](#1131))
([1153a09](1153a09))
* **component,generic,http:** replace env-based URL validation with
constructor injection
([#1121](#1121))
([f1f7d2f](f1f7d2f))


### Tests

* **component,generic,http:** replace external httpbin.org dependency
with local test server
([#1101](#1101))
([a82d155](a82d155))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).