Validate Canonical field URLs match retrieval location #42

Closed
Copilot wants to merge 11 commits into main from copilot/validate-canonical-field

Copilot AI commented Dec 4, 2025

Implemented validation for the Canonical field as specified in RFC 9116 section 2.5.2.

If this field appears within a "security.txt" file and the URI used to retrieve that file is not listed within any canonical fields, then the contents of the file SHOULD NOT be trusted.

Changes Made

  • New violation class: SecurityTxtCanonicalUrlMismatch warns when the fetched URL doesn't match any canonical URLs
  • Validation logic: CanonicalUrlFieldValidator in Validator\Validators namespace validates that the retrieval URL is listed in the Canonical field
  • Implements FieldValidator: Validator implements the standard FieldValidator interface and is added to the $fieldValidators array
  • SecurityTxt stores fetched URL: Added withFetchedUrl() and getFetchedUrl() methods to store the retrieval URL context
  • Parser integration: parseString() sets fetched URL on SecurityTxt object before validation when fetch result is available
  • Dynamic format generation: Violation messages use individual placeholders for each URL (singular/plural aware)
  • Strict mode support: Files with mismatched canonical URLs are marked invalid in strict mode

Implementation Details

  1. CanonicalUrlFieldValidator - Implements FieldValidator interface, validates canonical URLs by checking getFetchedUrl() on SecurityTxt
  2. SecurityTxt - Added withFetchedUrl() and getFetchedUrl() methods to store fetch context
  3. SecurityTxtValidator - Includes CanonicalUrlFieldValidator in $fieldValidators array alongside other field validators
  4. SecurityTxtParser::parseString() - Sets fetched URL on SecurityTxt before validation when fetch result is available
  5. Uniform validation - All validators run through the same loop with no special handling
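
The points above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `*Sketch` classes are hypothetical, simplified stand-ins for `SecurityTxt`, `FieldValidator` and `CanonicalUrlFieldValidator`, the warning is returned as a plain string instead of a violation object, and the exact string comparison of URLs (with no normalization) is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for SecurityTxt: holds Canonical URLs and the fetch context.
final class SecurityTxtSketch
{
    /** @param list<string> $canonical */
    public function __construct(
        private array $canonical = [],
        private ?string $fetchedUrl = null,
    ) {
    }

    // Assumption: with*() returns a modified copy rather than mutating the object.
    public function withFetchedUrl(string $url): self
    {
        $clone = clone $this;
        $clone->fetchedUrl = $url;
        return $clone;
    }

    public function getFetchedUrl(): ?string
    {
        return $this->fetchedUrl;
    }

    /** @return list<string> */
    public function getCanonical(): array
    {
        return $this->canonical;
    }
}

// Hypothetical stand-in for the FieldValidator interface.
interface FieldValidatorSketch
{
    /** @return list<string> warning messages, empty when nothing is wrong */
    public function validate(SecurityTxtSketch $securityTxt): array;
}

final class CanonicalUrlFieldValidatorSketch implements FieldValidatorSketch
{
    public function validate(SecurityTxtSketch $securityTxt): array
    {
        $fetchedUrl = $securityTxt->getFetchedUrl();
        $canonical = $securityTxt->getCanonical();
        // Nothing to check without a fetch context or without a Canonical field.
        if ($fetchedUrl === null || $canonical === []) {
            return [];
        }
        // The check passes if any listed URL equals the retrieval URL.
        if (in_array($fetchedUrl, $canonical, true)) {
            return [];
        }
        return [sprintf(
            'The file was fetched from %s but the Canonical field does not list this URL',
            $fetchedUrl,
        )];
    }
}
```

Because the validator reads the retrieval URL from the object it validates, it can sit in the same loop as the other field validators, which is what point 5 describes.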

Behavior

When a security.txt file is fetched and contains a Canonical field that doesn't list the retrieval URL:

[Warning] The file was fetched from https://example.com/.well-known/security.txt 
          but the Canonical field does not list this URL
[Warning] (How to fix: Add the URL https://example.com/.well-known/security.txt 
          to the Canonical field, or ensure the file is fetched from the listed 
          canonical URL: https://www.example.com/.well-known/security.txt)

No validation occurs when:

  • The Canonical field is absent (per RFC, it's optional)
  • Validating without a fetch result (e.g., parseString() called directly without fetchResult parameter)

Multiple canonical URLs are supported; validation passes if any of them matches the retrieval URL.
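
As a rough illustration of the singular/plural aware "How to fix" text, the sketch below builds a format string with one `%s` placeholder per canonical URL and fills it with `vsprintf()`. The function name `howToFixSketch()` is hypothetical, and the plural wording is an assumption; only the singular wording appears in the example output above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the library.
 *
 * @param list<string> $canonicalUrls must not be empty
 */
function howToFixSketch(string $fetchedUrl, array $canonicalUrls): string
{
    if ($canonicalUrls === []) {
        throw new InvalidArgumentException('At least one canonical URL is required');
    }
    // One placeholder per canonical URL, so each URL stays a separate format argument.
    $placeholders = implode(', ', array_fill(0, count($canonicalUrls), '%s'));
    $format = count($canonicalUrls) === 1
        ? 'Add the URL %s to the Canonical field, or ensure the file is fetched from the listed canonical URL: '
        : 'Add the URL %s to the Canonical field, or ensure the file is fetched from one of the listed canonical URLs: ';
    // The URLs are passed as arguments, so a "%" inside a URL is not read as a format specifier.
    return vsprintf($format . $placeholders, [$fetchedUrl, ...$canonicalUrls]);
}
```

Keeping the URLs as separate arguments, rather than joining them into one string first, lets a caller format or escape each URL individually.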

Testing

  • ✅ Manual testing with example from issue (michalspacek.cz)
  • ✅ Unit tests for canonical URL mismatch, match, and absent scenarios
  • ✅ Code review and security checks completed
  • ✅ Functionality validated with and without fetch result context

Original prompt

This section details the original issue you should resolve

<issue_title>Validate the Canonical field</issue_title>
<issue_description>> If this field appears within a "security.txt" file and the URI used to retrieve that file is not listed within any canonical fields, then the contents of the file SHOULD NOT be trusted.

https://www.rfc-editor.org/rfc/rfc9116#section-2.5.2-3

Right now the result of the check of michalspacek.cz (no www) is fine:

$ bin/checksecuritytxt.php michalspacek.cz --no-ipv6
[Info] Parsing security.txt for michalspacek.cz
[Info] Loading security.txt from https://michalspacek.cz/.well-known/security.txt
[Info] Loading security.txt from https://michalspacek.cz/security.txt
[Info] Redirected from https://michalspacek.cz/security.txt to https://michalspacek.cz/.well-known/security.txt
[Info] Selecting security.txt located at https://michalspacek.cz/.well-known/security.txt for further tests
[Info] The file will expire in 271 days (2026-09-01T00:00:00+00:00)
[Info] Signature valid, key 4BD4C403AF2F9FCCB151FE61B64BDD6E464AB529, signed on 2025-09-23T15:02:54+00:00

But the Canonical field URL is with www:

spaze@over9k:~/libs/security-txt@main$ curl https://michalspacek.cz/.well-known/security.txt | grep Canonical
Canonical: https://www.michalspacek.cz/.well-known/security.txt
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>


- Created SecurityTxtCanonicalUrlMismatch violation class
- Added validateCanonicalUrls method to SecurityTxtParser
- Modified parseFetchResult to validate canonical URLs against fetched URL
- Added tests for canonical URL validation
- Per RFC 9116 section 2.5.2, if Canonical field is present, the retrieval URL must be listed

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Validate the Canonical field in security.txt" to "Validate Canonical field URLs match retrieval location" on Dec 4, 2025
Copilot AI requested a review from spaze December 4, 2025 00:28
Copilot AI and others added 2 commits December 4, 2025 14:42
…ral/singular

- Created CanonicalUrlValidator in Validator namespace
- Updated SecurityTxtCanonicalUrlMismatch to use singular/plural correctly
- Removed validateCanonicalUrls from SecurityTxtParser
- Updated all instantiations of SecurityTxtParser to pass the new validator

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 4, 2025 14:46
- Added withFetchedUrl() and getFetchedUrl() to SecurityTxt class
- Moved CanonicalUrlValidator to Validator\Validators namespace
- Made it implement FieldValidator interface
- Added to SecurityTxtValidator fieldValidators list
- Removed CanonicalUrlValidator from Parser constructor
- Parser now sets fetched URL on SecurityTxt object
- Updated violation to generate format dynamically with URL placeholders
- Removed all comments from validator as requested
- Changed $howToFixTemplate to $howToFixFormat
- No longer using implode for URLs in violation

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 4, 2025 18:28
Copilot AI and others added 2 commits December 4, 2025 18:44
- Removed fetchedUrl property and with/get methods from SecurityTxt
- Updated SecurityTxtValidator::validate() to accept optional SecurityTxtFetchResult parameter
- Updated CanonicalUrlValidator to accept fetch result parameter
- Parser now passes fetch result directly to validator
- Cleaner design - no fetch-specific state in SecurityTxt object

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
…FieldValidator

- CanonicalUrlValidator doesn't implement FieldValidator interface
- Stored as separate property in SecurityTxtValidator
- Called separately when fetch result is available
- Cleaner design following SOLID principles

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 4, 2025 18:48
Copilot AI and others added 2 commits December 4, 2025 19:10
- Added validateWithFetchResult() method to SecurityTxtValidator
- Removed optional parameter and if checks from validate()
- Parser only recreates result object when validation result changes
- Cleaner separation of concerns

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Compare arrays instead of objects to correctly determine when to recreate result

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 4, 2025 19:14
- Added optional fetchResult parameter to parseString()
- parseFetchResult() now simply passes fetchResult to parseString()
- Removed validateWithFetchResult() method
- Removed SecurityTxtParseStringResult recreation in parser
- Canonical validator called from validate() when fetchResult available

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 5, 2025 00:58
…erface

- Renamed CanonicalUrlValidator to CanonicalUrlFieldValidator
- Implements FieldValidator interface
- Added to $fieldValidators array in SecurityTxtValidator
- Removed separate canonicalUrlValidator property and if check
- Added fetchedUrl property to SecurityTxt with with/get methods
- Parser sets fetchedUrl on SecurityTxt before validation
- Validator accesses fetchedUrl from SecurityTxt object

Co-authored-by: spaze <1966648+spaze@users.noreply.github.com>
Copilot AI requested a review from spaze December 5, 2025 01:50
spaze (Owner) commented Dec 6, 2025

Superseded by #44

@spaze spaze closed this Dec 6, 2025
@spaze spaze deleted the copilot/validate-canonical-field branch December 6, 2025 14:40