Introduces:
- wraith spider [config_name] command (#488: Added wraith spider [config_name] command)
- wraith info command (#484: Added 'wraith info' command)

Spidering and Imports
We have a lot of open 'Spider mode' issues. Many of them arise because Wraith's spidering logic is complex and difficult to test. For example, Wraith has to maintain a lot of internal state: it triggers spidering automatically when certain config properties are missing, but only if spidering hasn't already run within a configurable number of days.
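To make the kind of state involved concrete, here is a hypothetical Ruby sketch of the implicit trigger decision described above. The config keys paths and spider_days, the spider file argument and the day-based age check are assumptions for illustration, not Wraith's actual internals.

```ruby
# Hypothetical sketch of the implicit trigger the old spider mode needed.
# "paths", "spider_days" and the spider file are illustrative assumptions.
def spider_needed?(config, spider_file, now: Time.now)
  # Paths specified manually: nothing to spider.
  return false if config.key?("paths")
  # Never spidered before: spider now.
  return true unless File.exist?(spider_file)

  # Otherwise, re-spider only if the saved results are older than the limit.
  max_age_days = config.fetch("spider_days", 1)
  age_days = (now - File.mtime(spider_file)) / 86_400.0
  age_days > max_age_days
end
```

Even this simplified version depends on the filesystem and the clock, which is part of what makes the automatic behaviour hard to test; a dedicated command removes the decision entirely.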
We can make Wraith much simpler under the hood by giving spidering its own dedicated spider command, which the user runs manually, as often as they like. A new 'Imports' feature allows one config to import another.

Paths determined by the spider command are stored in a YAML file instead of a .txt file. This reduces complexity further, because spidered paths are then stored the same way as paths in non-spider use of Wraith. It also removes the Nokogiri dependency, which should speed up Wraith setup slightly.

The 'Imports' feature has lots of potential uses beyond supporting spidering. For example, you could have a common Wraith config defining the browser engine, the screen sizes to capture, the colour of the diff image, etc., and then a separate Wraith config for each of your sites, each of which imports the base config so you don't have to duplicate that information.
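A minimal Ruby sketch of how an import could be resolved, assuming a top-level imports key that names another YAML file relative to the importing config, and assuming the importing config's own values win on conflicting keys. The key name, path resolution and merge rule are assumptions, not a description of the final implementation.

```ruby
require "yaml"

# Minimal sketch of config imports. Assumptions (not confirmed by the PR):
# a top-level "imports" key names one YAML file, resolved relative to the
# importing config, and the importing config's keys override imported ones.
def load_config(path, seen = [])
  full_path = File.expand_path(path)
  # Guard against a config that (directly or indirectly) imports itself.
  raise ArgumentError, "circular import: #{full_path}" if seen.include?(full_path)

  config = YAML.safe_load(File.read(full_path)) || {}
  import = config.delete("imports")
  return config unless import

  # Relative import paths are resolved against the importing file's directory.
  import_path = File.expand_path(import, File.dirname(full_path))
  imported = load_config(import_path, seen + [full_path])
  # Shallow merge: a key present in both files takes the importing file's value.
  imported.merge(config)
end
```

For example, a base.yml holding illustrative keys such as browser and screen_widths could be imported by a site.yml that only adds its own domains and paths, and load_config("site.yml") would return the combined settings. The merge here is shallow, so a nested hash such as paths is replaced as a whole; a deep merge would be the alternative if partial overrides of nested settings were needed.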
Proposed spider workflow
Say we have a simplified config:
wraith spider test.yml
=> spiders the sites and saves the paths to spider_configs.yml.

wraith capture test.yml
=> the paths in spider_configs.yml are automatically imported into test.yml, and Wraith continues as if the paths had been specified manually.
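The spider step could write its output in the same shape as a manually written paths section, which is what lets the capture step treat it like any other imported config. This hypothetical Ruby sketch assumes a top-level paths key mapping a label to each path; the helper names, the labelling scheme and the de-duplication rule are illustrative choices, not Wraith's.

```ruby
require "yaml"

# Hypothetical sketch: turn spidered URL paths into a YAML file shaped like a
# manually written "paths:" section (label => path). The label scheme is an
# illustrative assumption, not Wraith's actual naming.
def path_label(path)
  # Collapse runs of non-alphanumerics to "_" and trim leading/trailing "_".
  label = path.gsub(/[^A-Za-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
  label.empty? ? "home" : label
end

def write_spider_paths(paths, file)
  labelled = {}
  paths.uniq.sort.each do |path|
    base = path_label(path)
    label = base
    n = 1
    # Different paths can map to the same label ("/a-b" and "/a_b"),
    # so append a counter until the label is unique.
    while labelled.key?(label)
      n += 1
      label = "#{base}_#{n}"
    end
    labelled[label] = path
  end
  File.write(file, { "paths" => labelled }.to_yaml)
  labelled
end
```

Calling write_spider_paths(["/", "/news", "/sport/football"], "spider_configs.yml") would write a paths hash labelled home, news and sport_football, which test.yml could then pull in through an import during wraith capture.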