Commit

Merge branch 'master' into conda-support
qtomlinson authored May 2, 2024
2 parents ae5aeba + 63526a8 commit 9f29866
Showing 12 changed files with 296 additions and 148 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,29 @@
name: Run tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4.1.1

      - uses: actions/setup-node@v4.0.1
        with:
          node-version: 18
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test
22 changes: 17 additions & 5 deletions README.md
100644 → 100755
@@ -52,7 +52,7 @@ Here are a few example request objects.
}
```

- The request `type` describes the crawling activity being requested. For example, "do `package` crawling". It is typically the same as the `type` in the url (see below). There are some more advanced scenarios where the two values are different but for starters, treat them as the same. The general form of a request URL is (note: it is a URL because of the underlying crawling infrastructure, the `cd` scheme is not particularly relevant)
+ The request `type` describes the crawling activity being requested. For example, "do `package` crawling" (see [More on type](#more-on-type) for a description of valid type values). It is typically the same as the `type` in the url (see segments description below). There are some more advanced scenarios where the two values are different but for starters, treat them as the same. The general form of a request URL is (note: it is a URL because of the underlying crawling infrastructure, the `cd` scheme is not particularly relevant)

```
cd:/type/provider/namespace/name/revision
```
@@ -80,6 +80,18 @@ Process the source, if any:

The crawler's output is stored for use by the rest of the ClearlyDefined infrastructure -- it is not intended to be used directly by humans. Note that each tool's output is stored separately and the results of processing the component and the component source are also separated.

### <a id="more-on-type"></a>More on `type`
The `type` in the request object typically corresponds to an internal processor in CD.
1. `component` is the most generic type. Internally, it is converted to a `package` or `source` request by the component processor.
2. A `package` request is processed by the package processor and is further converted to a request with a specific type (`crate`, `deb`, `gem`, `go`, `maven`, `npm`, `nuget`, `composer`, `pod`, `pypi`). If the specific binary package type is already known, that type (e.g. `npm`) can be used instead of `package` in the harvest request, skipping the conversion step. For example,
```json
{
"type": "npm",
"url": "cd:/npm/npmjs/-/redie/0.3.0"
}
```
3. `source` requests are processed by the source processor, which subsequently dispatches a `clearlydefined` typed request for the supported source types, plus one request for each scanning tool. These are the more advanced scenarios where the request type and the coordinate type differ.
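For illustration only, a `source` request follows the same shape as the other request objects. The coordinates below are hypothetical (source components typically use a `git`-style provider, with the revision being a commit identifier):

```json
{
  "type": "source",
  "url": "cd:/git/github/someorg/somerepo/somecommitsha"
}
```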

# Configuration

The crawler is quite configurable. Out of the box it is set up for demo-level use directly on your computer. In its full glory it can run with arbitrarily many distributed clients using an array of different queuing, caching, and storage technologies.
@@ -121,7 +133,7 @@ If a CRAWLER_ID is specified, then each instance must have this setting globally
## Run Docker image from Docker Hub

You can run the image as is from Docker (this is without any port forwarding, which means the only way you can interact with the crawler locally is through the queue; see below for examples of how to run with ports exposed for curl-based testing).
- `docker run --env-file ../<env_name>.env.list clearlydefined/crawler`
+ `docker run --platform linux/amd64 --env-file ../<env_name>.env.list clearlydefined/crawler`

See the `local.env.list`, `dev.env.list` and `prod.env.list` template files.

@@ -133,13 +145,13 @@ See the `local.env.list`, `dev.env.list` and `prod.env.list` template files.

## Build and run Docker image locally

- `docker build -t cdcrawler:latest .`
+ `docker build --platform linux/amd64 -t cdcrawler:latest .`

- `docker run --rm --env-file ../dev.env.list -p 5000:5000 -p 9229:9229 cdcrawler:latest`
+ `docker run --platform linux/amd64 --rm --env-file ../dev.env.list -p 5000:5000 -p 9229:9229 cdcrawler:latest`

With a debugger:

- `docker run --rm -d --env-file ../dev.env.list -p 9229:9229 -p 5000:5000 --entrypoint node cdcrawler:latest --inspect-brk=0.0.0.0:9229 index.js`
+ `docker run --platform linux/amd64 --rm -d --env-file ../dev.env.list -p 9229:9229 -p 5000:5000 --entrypoint node cdcrawler:latest --inspect-brk=0.0.0.0:9229 index.js`

At this point you can attach VS Code with the built-in debugging profile (see `.vscode/launch.json`).
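The repository's `.vscode/launch.json` defines the actual profile; purely as an illustrative sketch (the field values, especially `remoteRoot`, are assumptions rather than the file's real contents), a Node.js attach configuration for the container above generally looks like:

```json
{
  "type": "node",
  "request": "attach",
  "name": "Attach to crawler container",
  "address": "localhost",
  "port": 9229,
  "localRoot": "${workspaceFolder}",
  "remoteRoot": "/opt/service"
}
```

The `port` matches the `--inspect-brk=0.0.0.0:9229` flag passed to node in the docker command, which is why `9229` must be among the published ports.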

21 changes: 0 additions & 21 deletions azure-pipelines.yml

This file was deleted.

9 changes: 8 additions & 1 deletion lib/utils.js
@@ -2,6 +2,7 @@
// SPDX-License-Identifier: MIT
const { DateTime } = require('luxon')
const { spawn } = require('child_process')
const { intersection } = require('lodash')

const dateTimeFormats = [
  'EEE MMM d HH:mm:ss \'GMT\'ZZ yyyy' //in pom properties
@@ -31,6 +32,12 @@ function trimAllParents(paths, parents) {
  return paths.map(path => trimParents(path, parents))
}

function isGitFile(file) {
  if (!file) return false
  const segments = file.split(/[\\/]/g)
  return intersection(segments, ['.git']).length > 0
}

function extractDate(dateAndTime, formats = dateTimeFormats) {
  if (!dateAndTime) return dateAndTime
  let luxonResult = DateTime.fromISO(dateAndTime)
@@ -75,4 +82,4 @@ function spawnPromisified(command, args, options) {
  })
}

- module.exports = { normalizePath, normalizePaths, trimParents, trimAllParents, extractDate, spawnPromisified }
+ module.exports = { normalizePath, normalizePaths, trimParents, trimAllParents, isGitFile, extractDate, spawnPromisified }
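The new `isGitFile` helper can be exercised standalone. The sketch below inlines an equivalent check, substituting `Array.prototype.includes` for lodash's `intersection` so it runs without dependencies (a one-element intersection with `['.git']` is non-empty exactly when a segment equals `'.git'`):

```javascript
// Equivalent standalone version of the new helper: a path refers to git
// metadata if any of its segments (split on / or \) is exactly '.git'.
function isGitFile(file) {
  if (!file) return false
  const segments = file.split(/[\\/]/g)
  return segments.includes('.git')
}

console.log(isGitFile('.git/config'))      // true
console.log(isGitFile('repo\\.git\\HEAD')) // true (Windows-style separators)
console.log(isGitFile('src/index.js'))     // false
console.log(isGitFile(''))                 // false (empty/missing input)
```

Splitting on both separator styles matters because harvested archives can contain Windows-style paths; a simple `startsWith('.git')` check would miss nested `.git` directories.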