-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content directory symlinks for #264; also permit some content errors, consider configurable suffixes #265
base: main
Are you sure you want to change the base?
Conversation
Update: invalid implementation - see issue comment: #264 (comment) |
/// - The name used for the file deploy path is exclusive of the suffix matched. | ||
/// | ||
/// Limitations: | ||
/// - Does not detect or avoid symbolic-link cycles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be resolved with some sort of Set of URLs that have been processed already? It seems a little dangerous as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
I'll save the directories visited (normalized using standardizedFileURL
), in a Set as you suggest for faster lookup.
I think we can ignore files, even if they have been processed already. I can imagine a user putting the same file at different locations via symlinks, so it's not clearly an error or a cycle to see it twice. One might argue the same is possibly true for symlinked directories, but we'll just say no to that for simplicity now.
Conservatively this should probably be an error even though we can just prune the cycle. It's likely not intended and indicates structural deploy issues. If/since we will allow users to tolerate some errors, they have a workaround if it is intended. But I'd also be happy to print warnings and continue.
That in turn raises the question whether the tolerance for content- and cycle-errors should be distinct settings. It would be easy to add another Int parameter or wrap all error configuration in a struct/enum. But we're already mixing error sources, and it would complicate things to have per-type tolerances, and the library aims to be simple, so I'm guessing no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'd also be happy to print warnings and continue.
Could we go with this approach and eliminate the error tolerance setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'd also be happy to print warnings and continue.
Could we go with this approach and eliminate the error tolerance setting?
I think there are different classes of errors.
One we'd like to tolerate is when markdown content is malformed and the parser throws an error. Because (particularly with symlinks) the Site is ingesting content from elsewhere, it's likely to be unvalidated. It would be a real pain to have to rebuild for every missed delimiter (esp. when the markdown author is not the site builder). That's the behavior before this PR, but errors are likely to be more common after the PR, since you can just ingest entire trees with one symlink. So I think it helps to preserve the error tolerance for content parsing errors.
The error with a bad symlink (pointing nowhere, or duplicated or cycled) by contrast is a problem with the structure of the site, and it's less clear that skipping the symlink target won't have bad effects. This is a problem the site builder made and must fix. So I personally would opt to throw, but because I can't think of downstream problems caused by omitting the link, it would be ok to have a warning.
So my recommendation (and the current implementation) is to keep the content error tolerance but throw when seeing duplicate directories (more in line with the prior fail-fast behavior). But I'm happy to skip and print warnings for bad symlinks since pruning invalid targets should result in a safely-built (if incorrect) Site.
/// - contentMaker: closure taking ``DeployContent`` and returning false to halt | ||
func find( | ||
root: URL, | ||
suffixes: [String] = [".md", ".MD"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching uppercase and lowercase forms of this extension is very helpful, but I'm wondering whether a better approach might be to make the check itself case-insensitive? If users override this with (e.g.) .txt, I think they'd expect to match .TXT and other variants automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can match case-insensitively.
Also, I considered using a Set, but like the order and convenience of the array, which permits users to specify longer suffixes before shorter ([".tmp.md", ".md"]
), to enjoy the benefit of longer clipping. Let me know if you prefer swift-collection OrderedSet
(I haven't checked what happens when clipping or symlinks results in the same deploy path for two content URL's, but I assume that's something to manage properly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use OrderedSet
a lot throughout Ignite. Could we use it here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use OrderedSet a lot throughout Ignite. Could we use it here too?
I'm happy to, if you like, but I don't think it buys us anything, and it might confuse users who are just trying to list suffixes.
All users understand arrays, and few users would specify duplicate elements in such a small, literal list (so they don't need a set). Also, it's harmless if they do specify a duplicate since the code returns on first match. So there's not much benefit to being a Set here.
OrderSet
, by contrast, requires reading the documentation to see if it's using insertion order or the Comparable String order. Since users rely on that to swap out longer variants of suffixes (.tmp.md
above), that's key to understand.
Also OrderedSet
doesn't solve what's likely to be the most common mistake, of needlessly supplying upper- and lower-case variants of the same suffix. (But again, that's harmless.)
if let customPath = metadata["path"] as? String { | ||
if let deployPath, !deployPath.isEmpty { | ||
path = deployPath | ||
} else if let customPath = metadata["path"] as? String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the logic here: if users have set a path
property in their content's metadata, I would have thought that should be honored regardless of what other logic we have. Why would if let customPath = metadata["path"] as? String {
not come first here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes: I will respect and document the priority of the more-specific document metadata.
Thank you for this change! Separating content detection from the publishing context really helps clean up the code, and I absolutely agree that supporting symlinks will help a lot of people. I've added a handful of small comments/questions; if we can get those resolved, then I'm happy to merge this in. |
@@ -0,0 +1,124 @@ | |||
// | |||
// File.swift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// File.swift | |
// PublishingContext-Finding.swift |
PR updated per feedback:
Also
Testing took a fair percentage of effort, though it only targets Did I backtrack on hygiene?
Deferred
|
Sorry, swift-lint warnings now fixed |
Default of 0 reflects current behavior (throw on first error). Positive values print that many errors before throwing error to halt.
Permit caller to specify deploy path, to support symbolic links. (Default of nil reverts to prior behavior) Specify resource keys expected by Caller, for clarity and later updates.
DeployContent passed to client closure has url, path, and resources. Client can return false to halt or throw errors (now configured in Site)
Prints up to Site.maxContentErrors (default 0) before throwing errors. This is to tolerate broken content without breaking site build. Prior behavior of throw-on-first-error supported by default of 0. This includes ".MD" as well as ".md" as a valid suffix. (Later: easy to support configuration of target suffixes)
This now avoids any URL de-prefixing and gets type from path
- Input errors are type ParameterError. - Deploy-name de-suffixing is extracted to method for use in tests - DeployPath gets a description that reads well in tests
Now no swift-lint errors: - Suite permits _ in composite variable names for clarity Also: - Using Ignite.ContentFinder for case-insensitive suffix check - MdHeaderList avoids [String] extension for markdown output
Rebased on recent changes in main... |
Rebased on recent change in main... @twostraws or @JPToroDev , would you like anything else in this PR, or related work in a follow-on PR? The comment from 2 weeks ago should summarize status. Thanks! |
Thank you for staying persistent with this PR—I'm really excited about it! I'll be reviewing it thoroughly in the coming days and will circle back here with any questions. Based on your revisions though, it should be a straightforward merge. |
Has that still been deferred or is that part of your recent updates? |
I've added some comments throughout the PR. Overall the code looks good, but a lot of the naming isn't very reader friendly. Once those notes are addressed, I'll be very happy to merge this in—let me know if you have any questions. |
Yes, duplicate directories are now detected and result in an error listing both originating paths. |
Sorry, I'm happy to do renames as needed. I tried to make public API super clear. I also looked for unclear function-local names, and did find
Please let me know if I missed anything. |
Yes, the public API looks excellent. I'm very impressed with it. I'm really just talking about the test files, and it seems like they need extensive cleaning up. It's not even a stylistic thing—it's just taking a tremendous amount of effort to understand what's going on, and since this is a very significant addition, it's important that the tests are as crystal clear as the public API to the maintainers. Does that make sense? |
For #264
Issue changes
This reates a
ContentFinder
to gather content resources, resolving symlinks and maintaining a deployment path.Content
initializer now takes an optional deployPath, defaulting to nil for the prior behavior.Incidental changes
This also adds some code features beyond the direct scope of #264:
Content
also publishes the resource keys it expects. That should simplify adding and removing value dependencies, since clients will get any updates. Changing the parameter to support the published (static) keys as a default parameter should be possible.This supports printing some content errors instead of failing on the first one. That means a small content error can just print the error, instead of breaking the whole site build.
Site
gets a new Int indicating how many errors to tolerate, defaulting to 0 for current behavior of throwing the first error.ContentFinder
adds (uppercase) ".MD" support and can be easily configured with more suffixes. We could add asuffixes: [String]
property to the SIte and make that user-configurable with the current default of ".md" only.Standards and Limitations
This is already behind your recent commit (after I branched), but I thought it best to wait for feedback before final edits and rebasing.
I ran swift-lint and got 8 errors, but in other files.
Testing: This does not include a test shell script, and I haven't run it on the examples yet, only my own ~500+ files with multiple symlinks.
Apologies for the big ask and scope without prior discussion. Symlinks can be tricky, and I definitely wouldn't want to create new stream of support requests, so "no" is a fine answer :)