Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content directory symlinks for #264; also permit some content errors, consider configurable suffixes #265

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

wti
Copy link

@wti wti commented Jan 16, 2025

For #264

Issue changes
This reates a ContentFinder to gather content resources, resolving symlinks and maintaining a deployment path.

Content initializer now takes an optional deployPath, defaulting to nil for the prior behavior.

Incidental changes
This also adds some code features beyond the direct scope of #264:

Content also publishes the resource keys it expects. That should simplify adding and removing value dependencies, since clients will get any updates. Changing the parameter to support the published (static) keys as a default parameter should be possible.

This supports printing some content errors instead of failing on the first one. That means a small content error can just print the error, instead of breaking the whole site build. Site gets a new Int indicating how many errors to tolerate, defaulting to 0 for current behavior of throwing the first error.

ContentFinder adds (uppercase) ".MD" support and can be easily configured with more suffixes. We could add a suffixes: [String] property to the SIte and make that user-configurable with the current default of ".md" only.

Standards and Limitations

This is already behind your recent commit (after I branched), but I thought it best to wait for feedback before final edits and rebasing.

I ran swift-lint and got 8 errors, but in other files.

Testing: This does not include a test shell script, and I haven't run it on the examples yet, only my own ~500+ files with multiple symlinks.

Apologies for the big ask and scope without prior discussion. Symlinks can be tricky, and I definitely wouldn't want to create new stream of support requests, so "no" is a fine answer :)

@wti wti changed the title Content directory symlinks for #246 Content directory symlinks for #246; also permit some content errors, consider configurable suffixes Jan 16, 2025
@wti wti changed the title Content directory symlinks for #246; also permit some content errors, consider configurable suffixes Content directory symlinks for #264; also permit some content errors, consider configurable suffixes Jan 16, 2025
@wti
Copy link
Author

wti commented Jan 18, 2025

Update: invalid implementation - see issue comment: #264 (comment)

/// - The name used for the file deploy path is exclusive of the suffix matched.
///
/// Limitations:
/// - Does not detect or avoid symbolic-link cycles
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be resolved with some sort of Set of URLs that have been processed already? It seems a little dangerous as-is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

I'll save the directories visited (normalized using standardizedFileURL), in a Set as you suggest for faster lookup.

I think we can ignore files, even if they have been processed already. I can imagine a user putting the same file at different locations via symlinks, so it's not clearly an error or a cycle to see it twice. One might argue the same is possibly true for symlinked directories, but we'll just say no to that for simplicity now.

Conservatively this should probably be an error even though we can just prune the cycle. It's likely not intended and indicates structural deploy issues. If/since we will allow users to tolerate some errors, they have a workaround if it is intended. But I'd also be happy to print warnings and continue.

That in turn raises the question whether the tolerance for content- and cycle-errors should be distinct settings. It would be easy to add another Int parameter or wrap all error configuration in a struct/enum. But we're already mixing error sources, and it would complicate things to have per-type tolerances, and the library aims to be simple, so I'm guessing no.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd also be happy to print warnings and continue.

Could we go with this approach and eliminate the error tolerance setting?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd also be happy to print warnings and continue.

Could we go with this approach and eliminate the error tolerance setting?

I think there are different classes of errors.

One we'd like to tolerate is when markdown content is malformed and the parser throws an error. Because (particularly with symlinks) the Site is ingesting content from elsewhere, it's likely to be unvalidated. It would be a real pain to have to rebuild for every missed delimiter (esp. when the markdown author is not the site builder). That's the behavior before this PR, but errors are likely to be more common after the PR, since you can just ingest entire trees with one symlink. So I think it helps to preserve the error tolerance for content parsing errors.

The error with a bad symlink (pointing nowhere, or duplicated or cycled) by contrast is a problem with the structure of the site, and it's less clear that skipping the symlink target won't have bad effects. This is a problem the site builder made and must fix. So I personally would opt to throw, but because I can't think of downstream problems caused by omitting the link, it would be ok to have a warning.

So my recommendation (and the current implementation) is to keep the content error tolerance but throw when seeing duplicate directories (more in line with the prior fail-fast behavior). But I'm happy to skip and print warnings for bad symlinks since pruning invalid targets should result in a safely-built (if incorrect) Site.

/// - contentMaker: closure taking ``DeployContent`` and returning false to halt
func find(
root: URL,
suffixes: [String] = [".md", ".MD"],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching uppercase and lowercase forms of this extension is very helpful, but I'm wondering whether a better approach might be to make the check itself case-insensitive? If users override this with (e.g.) .txt, I think they'd expect to match .TXT and other variants automatically.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can match case-insensitively.

Also, I considered using a Set, but like the order and convenience of the array, which permits users to specify longer suffixes before shorter ([".tmp.md", ".md"]), to enjoy the benefit of longer clipping. Let me know if you prefer swift-collection OrderedSet

(I haven't checked what happens when clipping or symlinks results in the same deploy path for two content URL's, but I assume that's something to manage properly.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use OrderedSet a lot throughout Ignite. Could we use it here too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use OrderedSet a lot throughout Ignite. Could we use it here too?

I'm happy to, if you like, but I don't think it buys us anything, and it might confuse users who are just trying to list suffixes.

All users understand arrays, and few users would specify duplicate elements in such a small, literal list (so they don't need a set). Also, it's harmless if they do specify a duplicate since the code returns on first match. So there's not much benefit to being a Set here.

OrderSet, by contrast, requires reading the documentation to see if it's using insertion order or the Comparable String order. Since users rely on that to swap out longer variants of suffixes (.tmp.md above), that's key to understand.

Also OrderedSet doesn't solve what's likely to be the most common mistake, of needlessly supplying upper- and lower-case variants of the same suffix. (But again, that's harmless.)

if let customPath = metadata["path"] as? String {
if let deployPath, !deployPath.isEmpty {
path = deployPath
} else if let customPath = metadata["path"] as? String {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the logic here: if users have set a path property in their content's metadata, I would have thought that should be honored regardless of what other logic we have. Why would if let customPath = metadata["path"] as? String { not come first here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes: I will respect and document the priority of the more-specific document metadata.

@twostraws
Copy link
Owner

Thank you for this change! Separating content detection from the publishing context really helps clean up the code, and I absolutely agree that supporting symlinks will help a lot of people. I've added a handful of small comments/questions; if we can get those resolved, then I'm happy to merge this in.

@@ -0,0 +1,124 @@
//
// File.swift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// File.swift
// PublishingContext-Finding.swift

@wti
Copy link
Author

wti commented Jan 28, 2025

PR updated per feedback:

  • metadata takes priority for deploy path
  • suffix extraction is now case-insensitive. (This is a behavior change.)
  • Errors are thrown when a directory is visited a second time. The error includes both paths to the directory. This handles duplicates and cycles (both induced by symbolic links). Errors are not thrown when a file is included twice by symbolic links.
  • File comments all properly named.
  • Error tolerance was not updated; it's just a counter defaulting to zero.

Also

  • I removed overlapping URL-prefix derivation of deploy path in Content.init(..)

Testing took a fair percentage of effort, though it only targets ContentFinder (not any integration). There are only 5 test cases, but we can create more by specification now as needed. I used a fairly condensed style for test declarations and utilities, and tried to limit the scope (I'm a bit uncertain of the Ignite's overall testing direction). It now tries to have the test driver and suite contain mainly high-level logic, with minutia in the helper, but that scatters the type members - hmm. I'd grade it usable but not super clear or robust, and recommend vetting a bit more after review, when adding cases.

Did I backtrack on hygiene?

  • swiftlint has ~40K warnings now; it was ~10 when I started ??
  • I thought there was a swift-format configuration as well.

Deferred

  • I realize we do no checking for uniqueness of the deploy path. Users might appreciate a warning for duplicate paths (if it doesn't already throw an error).

@wti
Copy link
Author

wti commented Jan 29, 2025

Sorry, swift-lint warnings now fixed

wti added 9 commits February 2, 2025 23:22
Default of 0 reflects current behavior (throw on first error).
Positive values print that many errors before throwing error to halt.
Permit caller to specify deploy path, to support symbolic links.
(Default of nil reverts to prior behavior)

Specify resource keys expected by Caller, for clarity and later updates.
DeployContent passed to client closure has url, path, and resources.
Client can return false to halt or throw errors (now configured in Site)
Prints up to Site.maxContentErrors (default 0) before throwing errors.
This is to tolerate broken content without breaking site build.
Prior behavior of throw-on-first-error supported by default of 0.

This includes ".MD" as well as ".md" as a valid suffix.
(Later: easy to support configuration of target suffixes)
This now avoids any URL de-prefixing and gets type from path
- Input errors are type ParameterError.
- Deploy-name de-suffixing is extracted to method for use in tests
- DeployPath gets a description that reads well in tests
Now no swift-lint errors:
- Suite permits _ in composite variable names for clarity

Also:
- Using Ignite.ContentFinder for case-insensitive suffix check
- MdHeaderList avoids [String] extension for markdown output
@wti
Copy link
Author

wti commented Feb 3, 2025

Rebased on recent changes in main...

@wti
Copy link
Author

wti commented Feb 9, 2025

Rebased on recent change in main...

@twostraws or @JPToroDev , would you like anything else in this PR, or related work in a follow-on PR? The comment from 2 weeks ago should summarize status. Thanks!

@JPToroDev
Copy link
Collaborator

Thank you for staying persistent with this PR—I'm really excited about it! I'll be reviewing it thoroughly in the coming days and will circle back here with any questions. Based on your revisions though, it should be a straightforward merge.

@JPToroDev
Copy link
Collaborator

I realize we do no checking for uniqueness of the deploy path. Users might appreciate a warning for duplicate paths (if it doesn't already throw an error)

Has that still been deferred or is that part of your recent updates?

@JPToroDev
Copy link
Collaborator

I've added some comments throughout the PR. Overall the code looks good, but a lot of the naming isn't very reader friendly. Once those notes are addressed, I'll be very happy to merge this in—let me know if you have any questions.

@wti
Copy link
Author

wti commented Feb 11, 2025

I realize we do no checking for uniqueness of the deploy path. Users might appreciate a warning for duplicate paths (if it doesn't already throw an error)

Has that still been deferred or is that part of your recent updates?

Yes, duplicate directories are now detected and result in an error listing both originating paths.

@wti
Copy link
Author

wti commented Feb 11, 2025

but a lot of the naming isn't very reader friendly.

Sorry, I'm happy to do renames as needed. I tried to make public API super clear. I also looked for unclear function-local names, and did find resrcSet which should be resourceSet. I'll try to be clearer next time, but again if you have some blocking re-names, I'll do them!

Once those notes are addressed,

Please let me know if I missed anything.

@JPToroDev
Copy link
Collaborator

Yes, the public API looks excellent. I'm very impressed with it. I'm really just talking about the test files, and it seems like they need extensive cleaning up. It's not even a stylistic thing—it's just taking a tremendous amount of effort to understand what's going on, and since this is a very significant addition, it's important that the tests are as crystal clear as the public API to the maintainers. Does that make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants