
Feat add puppeteer #54

Merged

LilaKelland merged 22 commits from feat-add-puppeteer into main on Nov 1, 2023

Conversation

@LilaKelland (Collaborator) commented Oct 27, 2023:

Added accessibility checks (the web-endpoint-puppeteer-checks module, though I feel we could use a better name for it).

  • Modified the git clone repo service to extract endpoints from .product.yaml and publish them to the 'ClonedRepoEvent' NATS subject, though this will obviously change with the new structure.
  • web-endpoint-puppeteer-checks subscribes to 'ClonedRepoEvent',
  • then cycles through webEndpoints and filters for web-type endpoints only (i.e. ones that serve HTML, but not GraphiQL).
  • Slugs are found for each endpoint and concatenated with the endpoint itself to form the page URL, which Puppeteer then scans for accessibility (see the sketch below).
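
A minimal sketch of that flow (not the module's actual code), assuming the nats, puppeteer, and @axe-core/puppeteer packages; isWebEndpoint is a stand-in for the HTML-vs-GraphiQL filter, getSlugs is the helper added in this PR, and the server URL is a placeholder:

import { connect, JSONCodec } from 'nats'
import puppeteer from 'puppeteer'
import { AxePuppeteer } from '@axe-core/puppeteer'

const nc = await connect({ servers: 'nats://nats:4222' }) // placeholder server URL
const jc = JSONCodec()
const sub = nc.subscribe('ClonedRepoEvent.>')

for await (const message of sub) {
  const { webEndpoints } = jc.decode(message.data)

  // isWebEndpoint is a hypothetical stand-in for "serves HTML, not GraphiQL"
  const htmlEndpoints = webEndpoints.filter(isWebEndpoint)

  const browser = await puppeteer.launch()
  for (const endpoint of htmlEndpoints) {
    const page = await browser.newPage()
    // Collect slugs linked from the root page, then run axe against each page.
    const slugs = await getSlugs(endpoint, page, browser)
    for (const slug of slugs) {
      const pageToEvaluate = new URL(slug, endpoint).href
      await page.goto(pageToEvaluate)
      const axeReport = await new AxePuppeteer(page).analyze()
      console.log(pageToEvaluate, axeReport.violations.length, 'violations')
    }
    await page.close()
  }
  await browser.close()
}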

Other changes

  • Dev container: added a global Puppeteer install as the node user, added a markdown linter, and installed the libraries Chrome needs on Debian. (Update: just realizing that I should probably be adding a Docker container for the module and doing the install there... should I do this before approval?)

TODO

  • Documentation on the checks and response.
  • Get the mutation to save to the database.
  • Tests.

… Still errored, so added --no-sandbox per the docs (though we may not want to do this in prod... but now it works).
…so added a global install of puppeteer, which seems to be required. Will remove the Chrome install in future iterations, as I'm not sure it's needed if puppeteer isn't global.
… reformatted results and installed graphql-request to use the API in future.
…globally as the node user, and added the extra packages Debian needs to run Chrome. Will rebuild to confirm it works.
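
For reference, a minimal sketch of the kind of launch call the --no-sandbox commit above refers to; the exact flags used in the module may differ:

import puppeteer from 'puppeteer'

// '--no-sandbox' lets Chrome start inside the dev container, but it disables
// Chrome's sandbox, so it's worth revisiting before running this in prod.
const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
})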
@Collinbrown95 (Contributor) left a comment:

This is fantastic, great job!

I have only a few small comments, but I'd like to have a quick meeting before merging where we review the behaviour on a few concrete examples (e.g. nats pub "ClonedRepoEvent.>" '{"webEndpoints": ["https://www.canada.ca/en/public-health.html"]}').

In the spirit of 1 PR = 1 feature = 1 issue, I'd like to open some issues for the future TODO items, some of which are blocked because the GraphQL API hasn't been merged to the main branch yet.

scanners/web-endpoint-puppeteer-checks/README.md (outdated review thread, resolved)
scanners/web-endpoint-puppeteer-checks/index.js (outdated review thread, resolved)
for await (const message of sub) {
const clonedRepoEventPayload = await jc.decode(message.data)
const { webEndpoints } = clonedRepoEventPayload
const productName = message.subject.split('.').pop() // last NATs subject token
Contributor:

Can you explain the logic behind this line?

When I run this locally, message.subject.split('.').pop() gives me > as a result, which I think is the same regardless of the URL we are analyzing.

Collaborator Author:

Yes - it was a way to pass the product name (as the last token of the subject from github-cloned-repo) that would be used when saving to the database (https://docs.nats.io/nats-concepts/subjects#wildcards). When you subscribe to a NATS subject ending in '.>', the '>' wildcard matches everything that starts with that subject prefix. The product name can just as easily be passed as part of the payload (as I imagine we'll do with the graph pieces). It was just a placeholder, since we're not yet saving to the database.
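
A small sketch of the two options (illustrative only; the connection setup, product name, and payload shape are placeholders):

import { connect, JSONCodec } from 'nats'

const nc = await connect({ servers: 'nats://nats:4222' }) // placeholder URL
const jc = JSONCodec()

// Option 1: the publisher puts the product name in the last subject token...
const productName = 'my-product' // placeholder
nc.publish(`ClonedRepoEvent.${productName}`, jc.encode({ webEndpoints: [] }))

// ...and the subscriber to 'ClonedRepoEvent.>' reads it back from the subject.
const sub = nc.subscribe('ClonedRepoEvent.>')
for await (const message of sub) {
  console.log(message.subject.split('.').pop()) // 'my-product'
}

// Note: publishing to the literal subject 'ClonedRepoEvent.>' (as in the
// nats pub test above) makes .pop() return '>', matching what was observed.
// Option 2: carry productName inside the payload so the subject isn't load-bearing.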

webEndpointResults[webEndpoint][pageToEvaluate] = axeReport
}

accessibilityResults.push(webEndpointResults)
Contributor:

My instinct is that this is roughly where we would publish to the GraphQL API (which doesn't exist on the main branch yet).

My thought is that we would accumulate all slugs associated with a root web URL and attach that "list" of accessibility report results to a field in the Endpoint object.

Let's discuss this in more detail when we meet about this PR.
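
A rough sketch of what that publish step might look like with the graphql-request package installed in this PR; the API URL, mutation name, fields, and JSON scalar are placeholders, since the GraphQL API hasn't been merged yet (webEndpoint and webEndpointResults refer to the variables in the snippet above):

import { GraphQLClient, gql } from 'graphql-request'

// Hypothetical API location and mutation; the real schema doesn't exist yet.
const client = new GraphQLClient(process.env.GRAPHQL_URL)

const ADD_ACCESSIBILITY_REPORTS = gql`
  mutation AddAccessibilityReports($endpoint: String!, $reports: JSON!) {
    addAccessibilityReports(endpoint: $endpoint, reports: $reports) {
      id
    }
  }
`

// One request per root web endpoint, attaching all of its per-slug axe reports.
await client.request(ADD_ACCESSIBILITY_REPORTS, {
  endpoint: webEndpoint,
  reports: webEndpointResults[webEndpoint],
})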

Collaborator Author:

Ah - yes that does make sense to split them out rather than having the entire response for all the endpoints combined before saving - thanks for catching!

@@ -0,0 +1,33 @@
async function getSlugs(url, page, browser) {
Contributor:

I'd like to discuss this logic in more detail when we meet.

We don't necessarily need to change anything in this PR, but I'd like to take the time to review how this works along with an example.

Typically, web crawling like this requires some kind of recursive logic like "visit every page to check links, then for every link found visit that page to check links...", where you "bottom out" once you've built the entire tree of pages starting at the root URL.

On the other hand, your current approach might get most of the possible links with only a single page request, which would involve significantly less network traffic.
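
For concreteness, a sketch of what the single-request approach could look like; the getSlugs signature is from this PR, but the body below is an assumption, not the actual implementation:

// Visit only the root URL and collect same-origin links from that single page.
// (browser is unused here; it's kept only to match the signature in the PR.)
async function getSlugs(url, page, browser) {
  await page.goto(url, { waitUntil: 'networkidle2' })
  const hrefs = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href))
  const origin = new URL(url).origin
  // Deduplicated same-origin paths; links that only appear on sub-pages are
  // missed, which is the trade-off described in the comment above.
  const slugs = hrefs
    .filter((href) => href.startsWith(origin))
    .map((href) => new URL(href).pathname)
  return [...new Set(slugs)]
}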

@LilaKelland (Collaborator Author) commented Oct 31, 2023:

Oh interesting - it seemed like a lot of requests, so happy to refactor it if that makes sense. Although I might be missing sub-slugs the way you describe...

Contributor:

In the spirit of minimizing complexity, I'm inclined to say leave it for now and we can refactor if we find a significant fraction of links are being missed without the recursive approach.

@@ -0,0 +1,17 @@
// fs docs node https://nodejs.org/docs/v0.3.4/api/fs.html
import * as fs from 'fs';
Contributor:

Similar to the comment on the README.md document, I'm wondering if logic for parsing product.yaml should live in the "graph updater" component rather than the "endpoint scanner" component.

The idea being that "graph updater" handles stuff related to the graph structure, and "endpoint scanner" handles stuff related to attributes of given endpoints (e.g. accessibility scans).
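
For context, a minimal sketch of the parsing being discussed, assuming a js-yaml dependency and a .product.yaml that carries a webEndpoints list (the real schema may differ):

import * as fs from 'fs'
import yaml from 'js-yaml'

// Read the cloned repo's .product.yaml and pull out its web endpoints.
function readWebEndpoints(repoPath) {
  const doc = yaml.load(fs.readFileSync(`${repoPath}/.product.yaml`, 'utf8'))
  return doc.webEndpoints ?? []
}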

Collaborator Author:

Yes, absolutely - this was from before last week's conversation, when I was just getting this working.

…_URL (though may be different once the API is implemented)
…e docs! only printing out message and html for nodes for incomplete (untested) and violations
@LilaKelland (Collaborator Author):

Pushing to main - there are still some open issues to address here. Have added them into issue #56 to address in the near future.

@LilaKelland LilaKelland merged commit a885625 into main Nov 1, 2023
@LilaKelland LilaKelland deleted the feat-add-puppeteer branch November 1, 2023 18:38