Feat add puppeteer #54
Conversation
…sion with 'stashing-changes' branch
… Still errored, so added per docs --no-sandbox (though may not want to do in prod...., but now it works).
…so added global install of puppeteer which seems to be required. Will remove chrome install as not sure needed with puppeteer not global in future iterations.
…ok - need to reformat to fit schema
… reformatted results and installed graphql-request to use API in future.
…globally as node user, and added extra packages needed for debian to run Chrome. Will rebuild to confirm works.
…l together (on monday)
…304 responses in the scan
…n index - removed commented out lines
This is fantastic, great job!
I have only a few small comments, but I'd like to have a quick meeting before merging where we review the behaviour on a few concrete examples (e.g. nats pub "ClonedRepoEvent.>" '{"webEndpoints": ["https://www.canada.ca/en/public-health.html"]}').
In the spirit of 1 PR = 1 feature = 1 issue, I'd like to make some issues with future TODO items, some of which are blocked because the GraphQL API hasn't been merged to the main branch yet.
for await (const message of sub) {
  const clonedRepoEventPayload = await jc.decode(message.data)
  const { webEndpoints } = clonedRepoEventPayload
  const productName = message.subject.split('.').pop() // last NATS subject token
Can you explain the logic behind this line? When I run this locally, message.subject.split('.').pop() gives me > as a result, which I think is the same regardless of the URL we are analyzing.
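The > result can be reproduced without NATS at all. If the publisher addresses the literal subject from the nats pub example above, the received message's subject is that literal string, so the last dot-separated token is the wildcard character itself:

```javascript
// Reproducing the observation above: when the published subject is the
// literal string 'ClonedRepoEvent.>', the last token is '>' rather than
// a product name.
const subject = 'ClonedRepoEvent.>'
const lastToken = subject.split('.').pop()
console.log(lastToken) // '>'
```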
Yes - it was a way to pass the product name (as the last token of the subject from github-cloned-repo), to be used when saving to the database (https://docs.nats.io/nats-concepts/subjects#wildcards). When you subscribe to the NATS subject subject.>, the '>' wildcard matches everything that starts with 'subject'. The product name can just as easily be passed as part of the payload (as I imagine we'll do with the graph pieces). It was just a placeholder, since we're not yet saving to the database.
webEndpointResults[webEndpoint][pageToEvaluate] = axeReport
}

accessibilityResults.push(webEndpointResults)
My instinct is that this is roughly where we would publish to the GraphQL API (which doesn't exist on the main branch yet). My thought is that we would accumulate all slugs associated with a root web URL and attach that "list" of accessibility report results to a field in the Endpoint object.
Let's discuss this in more detail when we meet about this PR.
Ah - yes, it does make sense to split them out rather than combining the entire response for all the endpoints before saving - thanks for catching that!
@@ -0,0 +1,33 @@
async function getSlugs(url, page, browser) {
I'd like to discuss this logic in more detail when we meet. We don't necessarily need to change anything in this PR, but I'd like to take the time to review how this works along with an example.
Typically, web crawling like this requires some kind of recursive logic: "visit every page to check links, then for every link found visit that page to check its links...", where you "bottom out" once you've built the entire tree of pages starting at the root URL.
On the other hand, your current approach might get most of the possible links with only a single page request, which would involve significantly less network traffic.
Oh interesting - it seemed like a lot of requests, so I'm happy to refactor it if that makes sense. Although I might be missing sub-slugs without the recursion you describe...
In the spirit of minimizing complexity, I'm inclined to say leave it for now and we can refactor if we find a significant fraction of links are being missed without the recursive approach.
@@ -0,0 +1,17 @@
// fs docs: https://nodejs.org/docs/v0.3.4/api/fs.html
import * as fs from 'fs';
Similar to the comment on the README.md document, I'm wondering if logic for parsing product.yaml should live in the "graph updater" component rather than the "endpoint scanner" component. The idea being that the "graph updater" handles stuff related to the graph structure, and the "endpoint scanner" handles stuff related to attributes of given endpoints (e.g. accessibility scans).
Yes, absolutely - this was from before last week's conversation, when I was just getting this working.
…Ts message to be WebEvent
… up commented out stuff.
… parsing and publishing from cloned repo-checks
…_URL (though may be different once the API is implemented)
…e docs! only printing out message and html for nodes for incomplete (untested) and violations
Pushing to main - there are still some open issues to address here. Have added them into issue #56 to address in the near future.
Added accessibility checks (the web-endpoint-puppeteer-checks module - though I feel like we could use a better name for it).
Other changes
TODO
- tests