[ipqs] Integrate IPQS Malware File Scanner into the existing connector#6395
SamuelHassine wants to merge 12 commits into
The original IPQS Analyzer integration was shipped as a new connector in #5970 (`internal-enrichment/ipqs-analyzer`). That PR was merged then force-reverted on master because the repository already ships an `internal-enrichment/ipqs` connector covering the fraud-and-risk-scoring API. This commit brings the malware-file-scanner functionality into the existing `ipqs` connector so a single connector covers every IPQS use case, closes the duplication, and unblocks issue #6199.

Highlights:

* `src/ipqs/client.py`: extended with the IPQS Malware File Scanner flow originally proposed in #5970. The new ``get_malware_scan_info`` method drives the cache-first ``/malware/lookup`` → ``/malware/scan`` → ``/postback`` polling loop (9 attempts × 10 s) through the new ``_query_malware`` helper. The helper centralises status-code handling (401 → key error, 5xx → upstream error, JSON-decode failures, network errors, ...) and always returns ``None`` rather than raising on infrastructure problems so the connector can surface a friendly Note. ``file_enrich_fields`` is added to describe the fields surfaced in the Indicator description on Artifact enrichment. The legacy ``get_ipqs_info`` flow used by IP / Email / URL / Phone is unchanged.
* `src/ipqs/ipqs.py`: new ``_process_artifact`` handler downloads the Artifact from OpenCTI storage, submits it to IPQS, builds the Indicator (``[file:hashes.'SHA-256' = '<hash>']``) with a ``Clean`` / ``Malicious`` label and a deterministic ``based-on`` relationship to the Artifact, sets the observable's ``x_opencti_score`` (100 if any engine flagged the file, 50 otherwise), and attaches an external reference to the IPQS report. ``_send_failure_note`` emits a STIX Note attached to the observable when IPQS returns ``success=false`` or is unreachable, so the operator can diagnose enrichment failures from the UI. A new ``_check_max_tlp`` gate (configurable through ``IPQS_MAX_TLP``) prevents sending higher-TLP observables to the IPQS API; the fraud-and-risk-scoring branches inherit the same guard and now carry a configured default TLP fallback through ``IPQS_DEFAULT_TLP``.
* `src/ipqs/builder.py`: ``IPQSBuilder`` now accepts ``default_object_marking_refs`` and propagates them to the generated Indicator / Relationship. ``create_indicator_based_on`` is backwards-compatible (it still accepts the legacy ``{"value": ...}`` label shape returned by the fraud-scoring helpers) and gains a ``detection`` flag that maps to ``x_opencti_detection`` / ``x_opencti_main_observable_type`` on the Indicator. New ``add_reference`` and ``malware_file_detection`` helpers implement the Artifact-specific external-reference and label workflow. The observable-score update is now wrapped so a failure to persist the score never aborts the enrichment.
* `src/config.yml.sample` / `docker-compose.yml`: the connector scope now includes ``Artifact``; two new TLP variables (``IPQS_DEFAULT_TLP``, ``IPQS_MAX_TLP``) are documented inline.
* `README.md`: rewritten to describe both API families, the cache-first ``/malware/lookup`` → ``/malware/scan`` → ``/postback`` flow, the new TLP gate and external-reference generation, the failure-note path, and explicit links back to PR #5970 and issue #6199.

Closes #6199.
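The "return ``None`` rather than raise" contract described for ``_query_malware`` can be sketched as below. This is a minimal illustration, not the connector's code: the function name ``interpret_malware_response`` is hypothetical, it works on an already-received status code and body instead of performing the HTTP call, and the treatment of non-401 4xx codes (falling through to JSON decoding) is an assumption.

```python
import json
from typing import Any, Optional


def interpret_malware_response(status_code: int, body: str) -> Optional[dict]:
    """Return the decoded IPQS payload, or None on any infrastructure problem.

    Hypothetical helper for illustration; the real ``_query_malware`` also
    performs the request and handles network exceptions.
    """
    if status_code == 401:
        # Key error: the caller turns None into a failure Note.
        return None
    if 500 <= status_code < 600:
        # Upstream error on the IPQS side.
        return None
    try:
        payload: Any = json.loads(body)
    except json.JSONDecodeError:
        # A body that is not valid JSON is treated like an outage.
        return None
    # Assumption: only JSON objects are meaningful responses.
    return payload if isinstance(payload, dict) else None
```

Because every failure collapses to ``None``, the caller needs a single check before deciding between building an Indicator and emitting a failure Note.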
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6395 +/- ##
===========================================
- Coverage 26.41% 0.18% -26.23%
===========================================
Files 1777 1703 -74
Lines 104319 103948 -371
===========================================
- Hits 27552 194 -27358
- Misses 76767 103754 +26987
Pull request overview
This PR integrates IPQS Malware File Scanner support into the existing internal-enrichment/ipqs connector so the same connector handles both fraud/risk scoring and Artifact malware scanning.
Changes:
- Adds Artifact enrichment flow for IPQS lookup → scan → postback polling.
- Extends STIX bundle building for malware verdict indicators, labels, references, notes, and TLP handling.
- Updates configuration samples, Docker Compose, and documentation to include Artifact scope and TLP settings.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| `internal-enrichment/ipqs/src/ipqs/ipqs.py` | Adds Artifact processing, malware scan result handling, failure notes, and max-TLP gating. |
| `internal-enrichment/ipqs/src/ipqs/client.py` | Adds IPQS malware scanner API calls and polling flow. |
| `internal-enrichment/ipqs/src/ipqs/builder.py` | Extends indicator creation, markings, labels, score update handling, and external references. |
| `internal-enrichment/ipqs/src/config.yml.sample` | Adds Artifact scope and TLP configuration examples. |
| `internal-enrichment/ipqs/README.md` | Documents the integrated malware scanner behavior and configuration. |
| `internal-enrichment/ipqs/docker-compose.yml` | Adds Artifact scope and TLP environment variables. |
* `src/ipqs/ipqs.py`: drop the silent `TLP:AMBER+STRICT` -> `stix2.TLP_AMBER` downgrade. `_TLP_MAP` now stores OpenCTI-canonical marking-definition **ids** and `TLP:AMBER+STRICT` maps to the platform's `marking-definition--826578e1-40ad-459f-bc73-ede076f81f37` id (kept as the `_TLP_AMBER_STRICT_ID` constant), so STIX objects emitted by the Artifact branch keep the operator-configured strict marking instead of being downgraded to plain AMBER. `_default_marking_refs` returns the id directly and `self.default_tlp_id` replaces the old object-shaped attribute, matching the new map.
* `src/ipqs/ipqs.py`: surface "pending" responses explicitly. When `get_malware_scan_info` exhausts its polling budget while IPQS is still computing the verdict, the response is returned with `status == "pending"`. `_process_artifact` previously treated any `success=True` response as final, so a still-running scan could mark the observable `Clean` (score 50) and create a label/indicator from incomplete data. The branch now detects the pending status, logs a warning and emits the standard "IPQS enrichment failed" Note (with a retry-later message) instead of producing a wrong verdict.
* `src/ipqs/ipqs.py`: salt the failure-Note deterministic id with the observable's `standard_id` (`"IPQS enrichment failed for <standard_id>: <message>"`). Generic upstream messages such as "No response received from IPQS API." would otherwise produce the same `note--<uuid>` for unrelated observables, silently merging / overwriting their object_refs on import. Each observable now owns its own failure-note id.
* `src/ipqs/ipqs.py`: `IPQS_BASE_URL` now defaults to `https://ipqualityscore.com/api/json` in code, matching the value already documented in `README.md` and `config.yml.sample`. Previously the README documented the default but `get_config_variable` was called without one, so deployments that omitted the variable ended up with `None` and a broken client.
* `src/config.yml.sample` + `docker-compose.yml`: rewrite the `IPQS_MAX_TLP` / `IPQS_DEFAULT_TLP` inline comments. The previous text described the gate as "Artifact-only", but `_process_message` enforces `_check_max_tlp` before the entity-type dispatch, so the gate applies to every enrichment branch (IP / Email / URL / Phone / Artifact). The comments now state this explicitly.
* `README.md`: the `IPQS_MAX_TLP` / `IPQS_DEFAULT_TLP` rows in the IPQS Configuration table are updated to match: the supported TLP aliases are spelled out (including `TLP:AMBER+STRICT`) and `IPQS_MAX_TLP` is documented as applying to every enrichment branch.
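The effect of salting the failure-Note id with the observable's `standard_id` can be illustrated with a name-based UUID. This is a sketch only: the connector relies on pycti's Note id generator, whereas the namespace constant and the function name `failure_note_id` below are hypothetical stand-ins.

```python
import uuid

# Hypothetical namespace used only in this sketch; pycti has its own.
_SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def failure_note_id(standard_id: str, message: str) -> str:
    """Derive a deterministic note id from the salted content string."""
    content = f"IPQS enrichment failed for {standard_id}: {message}"
    return f"note--{uuid.uuid5(_SKETCH_NAMESPACE, content)}"
```

With the salt, two observables that hit the same generic upstream message get different ids, while retrying the same observable with the same message yields the same id.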
SamuelHassine left a comment
Posting as a COMMENT review (GitHub does not allow self-approval on a PR authored by the same user; a second maintainer's APPROVE is still required to dismiss the REVIEW_REQUIRED decision).
All six Copilot review threads on the previous commit are addressed and resolved by caaf2a8:
- `src/ipqs/ipqs.py`: `_process_artifact` now detects a `status == "pending"` response after the polling budget is exhausted and emits a "retry later" failure Note instead of producing a wrong `Clean` verdict.
- `src/ipqs/ipqs.py`: `TLP:AMBER+STRICT` maps to the OpenCTI-canonical id (`marking-definition--826578e1-40ad-459f-bc73-ede076f81f37`) so STIX objects emitted by the Artifact branch keep the strict marking instead of being silently downgraded to plain AMBER. `_TLP_MAP` and `_default_marking_refs()` now operate on ids end-to-end.
- `src/ipqs/ipqs.py`: failure-Note deterministic id is salted with `observable["standard_id"]` so unrelated observables hitting the same upstream message no longer share a single `note--<uuid>`.
- `src/ipqs/ipqs.py`: `IPQS_BASE_URL` now defaults to `https://ipqualityscore.com/api/json` in code, matching the value documented in `README.md` / `config.yml.sample`. A deployment that omits the variable no longer ends up with `self.base_url is None`.
- `src/config.yml.sample` + `docker-compose.yml` + `README.md`: rewrite the inline comments / table rows so the `IPQS_MAX_TLP` description matches the actual behaviour (`_check_max_tlp` is enforced before the entity-type dispatch and applies to every enrichment branch, not just to Artifact). `IPQS_DEFAULT_TLP` now also lists the supported aliases including `TLP:AMBER+STRICT`.
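The pending-aware decision in the first item can be sketched as a small pure function. The names `decide_artifact_outcome` and `sha256_pattern` are hypothetical, and the `detected` flag is passed in explicitly because the response field that carries the engine verdict is not named in this thread.

```python
from typing import Optional


def sha256_pattern(sha256: str) -> str:
    """Build the STIX pattern used for the Artifact indicator."""
    return f"[file:hashes.'SHA-256' = '{sha256}']"


def decide_artifact_outcome(response: Optional[dict], detected: bool) -> dict:
    """Map an IPQS response to either a failure Note or a verdict."""
    if response is None:
        return {"action": "failure_note", "reason": "no response"}
    if not response.get("success"):
        return {"action": "failure_note", "reason": "success=false"}
    if response.get("status") == "pending":
        # Polling budget exhausted while IPQS is still computing the verdict.
        return {"action": "failure_note", "reason": "pending, retry later"}
    if detected:
        return {"action": "indicator", "label": "Malicious", "score": 100}
    return {"action": "indicator", "label": "Clean", "score": 50}
```

The ordering matters: the pending check runs before any verdict is produced, so a still-running scan can never be labelled `Clean`.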
CI status on caaf2a8:
- GitHub Actions: PR title convention, signed commits, baseline coverage, issue-link, do-not-merge, test detection, codecov: all green.
- CircleCI: `ensure_formatting`, `base_linter`, `linter`, `test`, `build_manifest`: all green.
- `filigran/cla`: already signed (organization member).
Locally validated with `isort --profile black --check`, `black --check`, `flake8 --ignore=E,W` and the project's `linter_stix_id_generator` pylint plugin (10/10). Branch is MERGEABLE (no conflict with master).
Address the seven outstanding Copilot review threads on
``caaf2a8bee``:
1. **TLP downgrade — `TLP:CLEAR` mapped to legacy id**
(`ipqs.py` line 38). `_TLP_MAP` is rebuilt around real
``stix2.MarkingDefinition`` objects (mirroring
``connectors-sdk.models.tlp_marking``). ``CLEAR`` and
``AMBER+STRICT`` are now materialised via a new
``_make_tlp_marking`` helper that uses
``pycti.MarkingDefinition.generate_id`` and carries the
``x_opencti_definition`` UI label — Artifact indicators / failure
notes emitted with the connector default no longer show up as
``TLP:WHITE`` in the OpenCTI UI. The default marking-definition
object is also prepended to every Artifact and failure-note
bundle so the platform can register the marking by name.
2. **Silent TLP downgrade on operator typos** (`ipqs.py` line 149).
A new ``_resolve_tlp(env_var, value)`` helper raises
:class:`ValueError` listing every supported alias instead of
silently falling back to ``TLP:WHITE`` on an unknown value. Both
``IPQS_DEFAULT_TLP`` and ``IPQS_MAX_TLP`` are validated at
startup so a typo (``ANBER`` ≠ ``AMBER``) cannot push a less-
restrictive marking into the platform.
3. **Whitespace-only TLP value bug** (caught by the new
``test_blank_falls_back_to_default[ ]`` case). The previous
``_normalize_tlp`` returned the literal string ``"TLP:"`` on
whitespace-only inputs, which would have raised on the
subsequent ``_TLP_MAP`` lookup. Added an explicit empty-after-
strip guard so a blank value falls back to the configured
fallback instead.
4. **Failure-note marking not derived from observable** (`ipqs.py`
line 267). A new ``_observable_marking_refs`` helper extracts the
source observable's markings from either ``objectMarking`` (the
GraphQL shape) or ``object_marking_refs`` (the legacy list);
``_note_marking_refs`` returns the observable refs and falls back
to the connector default only when the observable carries no
marking of its own. A TLP:AMBER artifact whose enrichment fails
now produces a TLP:AMBER diagnostic note, never a less-
restrictive ``TLP:CLEAR`` / ``TLP:WHITE`` one that could leak the
existence of the artifact to user groups not entitled to see it.
5. **Artifact download failures don't surface a Note** (`ipqs.py`
line 450). The broad ``except`` in ``_process_artifact`` now also
calls ``_send_failure_note`` with a download-specific message so
the operator can see the failure from the OpenCTI UI without
inspecting connector logs.
6. **Misleading backward-compatibility comment** (`ipqs.py` line
138). The TLP setup comment is rewritten so it correctly states
that ``IPQS_MAX_TLP`` (default ``TLP:AMBER``) **gates every**
enrichment branch (IP / Email / URL / Phone / Artifact) — not
just the new Artifact one. Operators running the previous version
with TLP:RED observables must set ``IPQS_MAX_TLP=TLP:RED`` to
keep the existing behaviour. ``IPQS_DEFAULT_TLP`` only affects
the new Artifact / failure-note STIX outputs.
7. **Scan response missing `request_id` treated as final** (`client.py`
line 311). When the IPQS scan endpoint returns
``success=True`` without a ``request_id``, the client now
converts the response into an explicit ``success=False`` payload
(preserving the upstream message as ``(upstream: ...)``) so
``_process_artifact`` raises a failure note instead of building
an indicator from the acknowledgement.
8. **Polling budget too wide** (`client.py` line 318). The postback
loop now uses a tighter per-request timeout
(``_POSTBACK_REQUEST_TIMEOUT_SECONDS = 10``) and enforces an
overall ``_POLLING_BUDGET_SECONDS = 120`` deadline via
``time.monotonic()``. The worst-case stall on a single Artifact
enrichment goes from ~10 minutes (9 × 70 s) down to the
documented ~120 s.
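The deadline-bounded postback loop from item 8 can be sketched as follows. This is an assumption-laden illustration rather than the client's code: ``poll_postback`` and ``_POLLING_INTERVAL_SECONDS`` are hypothetical names, ``fetch`` stands in for one ``/postback`` request (which would carry the per-request timeout), and the clock and sleep functions are injectable only to make the sketch testable.

```python
import time
from typing import Callable, Optional

_MAX_POLLING_ATTEMPTS = 9
_POLLING_INTERVAL_SECONDS = 10  # assumed name for the pause between attempts
_POLLING_BUDGET_SECONDS = 120


def poll_postback(
    fetch: Callable[[], Optional[dict]],
    *,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until a final answer, a failure, or the overall deadline."""
    deadline = clock() + _POLLING_BUDGET_SECONDS
    last: Optional[dict] = None
    for _ in range(_MAX_POLLING_ATTEMPTS):
        if clock() >= deadline:
            break
        last = fetch()
        if last is None:
            return None  # network failure
        if not last.get("success") or last.get("status") != "pending":
            return last  # explicit failure or final result
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # Never sleep past the overall deadline.
        sleep(min(_POLLING_INTERVAL_SECONDS, remaining))
    # Budget exhausted: the last payload still says "pending".
    return last
```

Using ``time.monotonic()`` rather than wall-clock time keeps the deadline immune to system clock adjustments during the loop.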
Tests:
* New ``tests/`` directory with ``conftest.py``,
``test-requirements.txt`` (pulls in ``pycti`` + ``stix2`` +
``pytest``) and two suites:
- ``test_tlp.py`` — 34 cases pinning ``_normalize_tlp``,
``_TLP_MAP`` (real ``MarkingDefinition`` objects, CLEAR distinct
from ``stix2.TLP_WHITE``, AMBER+STRICT canonical id), the
strict-mode ``_resolve_tlp``, ``_make_tlp_marking`` (id +
metadata), and ``_observable_marking_refs`` (shapes,
deduplication, missing/malformed fallback).
- ``test_malware_client.py`` — 3 cases pinning the client's new
contract: missing ``request_id`` is converted into a failure;
the postback loop honours the overall deadline; postback
requests use the tighter ``_POSTBACK_REQUEST_TIMEOUT_SECONDS``.
All 37 pytest cases pass locally; ``black --check``,
``isort --profile black --check`` and ``flake8 --select=F`` all
clean.
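The strict-mode behaviour that ``test_tlp.py`` pins can be sketched as below. The function names mirror ``_normalize_tlp`` and ``_resolve_tlp`` without the underscore to signal that this is an illustration; the list of supported aliases and the explicit ``fallback`` parameter are assumptions of the sketch.

```python
from typing import Optional

# Assumed alias set for this sketch.
_SUPPORTED_TLPS = (
    "TLP:CLEAR",
    "TLP:WHITE",
    "TLP:GREEN",
    "TLP:AMBER",
    "TLP:AMBER+STRICT",
    "TLP:RED",
)


def normalize_tlp(value: Optional[str], fallback: str) -> str:
    """Canonicalise inputs such as 'amber+strict', 'TLP:AMBER' or 'Amber'."""
    text = (value or "").strip().upper()
    if not text:
        # Empty-after-strip guard: never return the literal "TLP:".
        return fallback
    return text if text.startswith("TLP:") else f"TLP:{text}"


def resolve_tlp(env_var: str, value: Optional[str], fallback: str = "TLP:CLEAR") -> str:
    """Validate a configured TLP and fail loudly on unknown values."""
    normalized = normalize_tlp(value, fallback)
    if normalized not in _SUPPORTED_TLPS:
        raise ValueError(
            f"{env_var}={value!r} is not a supported TLP; "
            f"expected one of: {', '.join(_SUPPORTED_TLPS)}"
        )
    return normalized
```

Raising at startup turns a typo into a visible configuration error instead of a quietly weaker marking.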
Full review and fix pass complete on
Ready for an external reviewer's approval.
Addresses the three new Copilot review threads on 6e84c8c:

1. README lookup step: the client uploads the file *content* to ``/malware/lookup``; IPQS hashes it server-side and matches that hash against its 24h cache. The previous wording suggested the connector hashed the file locally before submitting, which was wrong and could mislead operators when debugging API behaviour / credits.
2. README polling step: corrected from "~90s budget" to the actual value enforced by the client, which is the hard 120s ceiling from ``_POLLING_BUDGET_SECONDS``. Spelled out the per-request 10s ``_POSTBACK_REQUEST_TIMEOUT_SECONDS`` and the ``_MAX_POLLING_ATTEMPTS`` cap so operators can understand all three knobs in one place.
3. ``client.py`` inline comment on ``_POSTBACK_REQUEST_TIMEOUT_SECONDS``: same fix. The attempt-level upper bound is ``9 * (10 + 10) == 180 s`` but ``_POLLING_BUDGET_SECONDS`` caps the loop absolutely at 120 s; both numbers are now mentioned together so the relationship between attempts, sleep, per-request timeout, and the overall deadline is unambiguous.

No code-path changes; ``black --check``, ``isort --profile black --check``, ``flake8 --ignore=E,W`` clean. 37/37 pytest cases still pass.
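The relationship between the three knobs is plain arithmetic, shown here as a worked example. The constant ``_POLLING_SLEEP_SECONDS`` is a hypothetical name for the 10 s pause between attempts; the other names come from the commit message.

```python
_MAX_POLLING_ATTEMPTS = 9
_POSTBACK_REQUEST_TIMEOUT_SECONDS = 10
_POLLING_SLEEP_SECONDS = 10  # assumed name for the pause between attempts
_POLLING_BUDGET_SECONDS = 120

# Each attempt can cost at most one request timeout plus one sleep.
attempt_level_bound = _MAX_POLLING_ATTEMPTS * (
    _POSTBACK_REQUEST_TIMEOUT_SECONDS + _POLLING_SLEEP_SECONDS
)

# The overall deadline is the tighter of the two limits.
effective_bound = min(attempt_level_bound, _POLLING_BUDGET_SECONDS)
```

Here ``attempt_level_bound`` evaluates to 180 and ``effective_bound`` to 120, which is why the budget, not the attempt count, is the limit operators should expect to hit.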
Third-pass review-and-fix complete on 54123ee:
Ready for an external reviewer's approval.
…curacy

Three new Copilot review threads on top of 54123ee:

* `src/ipqs/client.py`: dropped the per-request INFO log from `_query_malware`. The postback polling loop hits that code path up to `_MAX_POLLING_ATTEMPTS` times per Artifact enrichment, so emitting one INFO line per call would flood normal `info`-level deployments with N noisy lines per single enrichment. The per-request line is now a DEBUG log, and `get_malware_scan_info` gained three INFO-level lifecycle markers (cache lookup, cache-miss-then-scan, polling start) so operators still see the high-level state changes at INFO without the polling-loop spam.
* `README.md`: the **Generated STIX Objects** section now states explicitly that the IPQS external reference is attached only when the response carries a `request_id` (cache hits intentionally skip it because the upstream does not surface a stable `request_id` for cached verdicts). This matches the actual `add_reference` behaviour in `builder.py`.
* `README.md`: the Debugging / troubleshooting section now spells out the hard `_POLLING_BUDGET_SECONDS = 120` ceiling and the per-`/postback`-request `_POSTBACK_REQUEST_TIMEOUT_SECONDS = 10` knob instead of the stale "90s polling budget" wording. Those constants were tightened in 80285c8 and the README was the last place left referencing the old value.

Verified locally:

* `black --check` / `isort --profile black --check` clean across `internal-enrichment/ipqs/`;
* `pytest internal-enrichment/ipqs/tests/`: 37 passed.
Full review-and-fix pass complete on
…aths
Addresses the two outstanding Copilot review threads on
``a79fad860b``. Both are real correctness issues on the marking
plumbing — one quietly poisoned the bundle's
``object_marking_refs`` with falsy entries, the other silently let
TLP-above-max observables through the gate when their marking was
provided in the alternate flat-id shape.
* ``builder.py::_get_object_marking_refs`` — filter falsy /
non-string ids and deduplicate.
The previous shape was ``if "standard_id" in marking:
object_marking_refs.append(marking["standard_id"])`` — which
happily appended ``None`` / ``""`` / arbitrary non-string values
whenever the key existed with a non-truthy value. Those falsy
refs then propagated into ``stix2`` constructors that expect a
list of ``marking-definition--<uuid>`` strings, causing
serialiser errors or (worse) silent malformed bundles. The
rewritten helper:
- skips the dict branch unless ``marking.get("standard_id")``
is a non-empty string;
- skips the str branch when the value is empty;
- deduplicates the resulting list in-order — mirrors what
``IPQSConnector._observable_marking_refs`` already does for
the failure-Note path, so the two callers cannot diverge on
the marking-list shape we hand to ``stix2``.
* ``ipqs.py::_check_max_tlp`` — now inspects BOTH marking shapes.
``_observable_marking_refs`` already supports the GraphQL
``objectMarking`` list of dicts AND the alternate
``object_marking_refs`` flat list of marking ids. The max-TLP
gate did not — it only walked the first shape and fell back to
``IPQS_DEFAULT_TLP`` (defaults to ``TLP:CLEAR``) on the
alternate shape. So an observable carrying ``TLP:RED`` in the
flat-id shape silently slipped past a default ``IPQS_MAX_TLP=
TLP:AMBER`` gate even though the failure-Note path downstream
would have correctly inherited the ``TLP:RED`` marking on the
diagnostic Note — visibly contradictory.
Added a module-level ``_MARKING_ID_TO_TLP`` reverse lookup
built from ``_TLP_MAP`` (each marking's canonical id mapped
back to its TLP string). ``_check_max_tlp`` uses the primary
``objectMarking`` path first (unchanged), and falls back to
the alternate shape via the reverse lookup when the primary
branch yields no TLP. Unknown ids (PAP markings, custom ones)
are simply ignored and the gate falls back to
``IPQS_DEFAULT_TLP`` — matches the documented "no marking →
default" contract.
Note: ``pycti.MarkingDefinition.generate_id("TLP", "TLP:CLEAR")``
collides with the legacy ``stix2.TLP_WHITE`` id by design —
both represent the least-restrictive TLP level. The reverse
lookup just needs to return *some* valid TLP string that
resolves to the right level; whichever entry wins the dict-build
collision is fine for the gate because ``check_max_tlp`` treats
them as equivalent. The collision and its handling are
documented inline.
* ``tests/test_tlp.py`` — 13 new cases pin the new contracts:
- ``TestMarkingIdToTLP`` (3 cases): every TLP id in
``_TLP_MAP`` round-trips through ``_MARKING_ID_TO_TLP`` to
the same level (parametrised across all six entries,
documents the CLEAR/WHITE id collision as a shared-level
alias group); every distinct marking id is present in the
reverse lookup.
- ``TestCheckMaxTLPAlternateShape`` (6 cases): builds a bare
``IPQSConnector`` via ``__new__`` + manual attribute
injection, then exercises the gate against ``objectMarking``
dicts, ``object_marking_refs`` flat lists, unknown ids, an
empty observable, and the both-shapes-present case (the
dict form takes precedence).
Whole suite is now ``50 / 50 pass`` (was 37 on
``a79fad860b``; +13 for the new contracts above).
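The filtering and in-order deduplication described for ``builder.py::_get_object_marking_refs`` can be sketched as below. The standalone function and its ``default_refs`` parameter are hypothetical; the precedence (``objectMarking`` first, then ``object_marking_refs``, then the default) follows the description in this PR.

```python
from typing import Any, Iterable, List


def get_object_marking_refs(observable: dict, default_refs: Iterable[str] = ()) -> List[str]:
    """Collect marking ids, skipping falsy or non-string values and duplicates."""
    refs: List[str] = []

    def _add(candidate: Any) -> None:
        # Only non-empty strings are valid marking-definition ids.
        if isinstance(candidate, str) and candidate and candidate not in refs:
            refs.append(candidate)

    for marking in observable.get("objectMarking") or []:
        # Dict branch reads "standard_id"; str branch uses the value itself.
        _add(marking.get("standard_id") if isinstance(marking, dict) else marking)
    if not refs:
        for ref in observable.get("object_marking_refs") or []:
            _add(ref)
    # Fall back to the connector default so objects are never unmarked.
    return refs or list(default_refs)
```

Checking the value rather than the presence of the ``standard_id`` key is what keeps ``None`` and ``""`` out of the list handed to ``stix2``.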
Verified locally:
* ``pytest internal-enrichment/ipqs/tests/`` — 50 / 50 pass.
* ``black --check``, ``isort --profile black --check-only``,
``flake8 --select=F`` clean across
``internal-enrichment/ipqs/``.
* ``python -m py_compile`` clean on every modified module.
CircleCI's `test` job uninstalls the connector's pinned pycti and
reinstalls the latest from opencti master HEAD, then runs
``uv pip check``. After `[all] Release 7.260521.0` landed on master,
the previous `pycti==7.260515.0` pin trips the check with::
connectors-sdk requires pycti==7.260521.0, but 7.260515.0 is installed
Bumping the pin to ``7.260521.0`` realigns this connector with the
canonical version used across the rest of the monorepo and lets
the CircleCI rolling test job go green. No behavioural change for
the connector itself — the `pycti` API surface used here is
unchanged between 7.260515.0 and 7.260521.0.
A full ``origin/master`` merge would also pick up the recent
``[ipqs] Add Darkweb-Leak enrichment for User-Account observables
(#6399)`` PR, but the resulting conflict surface on the same files
is significant (6 files, 14+ conflict regions) so the larger
rebase / conflict-resolution is left to a follow-up so this PR can
be reviewed against a known-green CI.
CircleCI's `test` job runs ``run_test.sh`` which:
1. installs the connector's pinned ``pycti``,
2. uninstalls it and reinstalls the latest from opencti
``master`` HEAD,
3. runs ``uv pip check``.
After ``[all] Release 7.260521.0`` landed on master, step 2 pulls
pycti 7.260521.0 while the branch's local
``connectors-sdk/pyproject.toml`` still requires pycti 7.260515.0,
so step 3 trips with::
The package `connectors-sdk` requires `pycti==7.260515.0`,
but `7.260521.0` is installed
The previous commit (9918989) bumped the IPQS connector's own
``src/requirements.txt`` to 7.260521.0 but missed the local
``connectors-sdk`` pin — ``uv pip check`` then still flagged the
remaining mismatch against the master-installed pycti. Bumping
``connectors-sdk/pyproject.toml`` to ``7.260521.0`` (matching the
master-side value) realigns the local SDK package with the
CircleCI test job's environment and unblocks the rolling test.
No behavioural change beyond the version pin — the
``connectors-sdk`` package API is unchanged between 7.260515.0
and 7.260521.0.
This reverts commit 1ae8c14.
…tall
``run_test.sh`` decides whether to install the local
``./connectors-sdk`` package into a connector's test venv by
``grep -rl "connectors-sdk" "$project/.."``. The IPQS connector's
``ipqs.py`` carries an inline comment referencing
``connectors-sdk.models.tlp_marking`` purely for documentation —
the connector itself does not import from connectors-sdk — but
the literal string triggered the grep and caused ``run_test.sh``
to install the local connectors-sdk into the IPQS test venv.
That install pulls in ``connectors-sdk``'s own ``pycti==7.260515.0``
pin, which then conflicts with the master-side pycti
(``7.260521.0`` after the ``[all] Release 7.260521.0`` commit)
that ``run_test.sh`` reinstalls in the next step:
The package `connectors-sdk` requires `pycti==7.260515.0`,
but `7.260521.0` is installed
The same chain also impacts other connectors on this branch
that still pin pycti==7.260515.0, so bumping
``connectors-sdk/pyproject.toml`` to 7.260521.0 (the previous
attempt, reverted in the commit before this) only shifts the
problem to a different connector. The clean fix is to ensure
IPQS never goes through the connectors-sdk install path —
because IPQS does not actually depend on the SDK.
Rephrased the inline comment to refer to "the OpenCTI Connectors
SDK's TLP-marking model" rather than ``connectors-sdk.models.tlp_marking``,
so the literal connectors-sdk string is no longer present in the
file and the grep no longer matches. The documented design
intent (mirror the SDK's MarkingDefinition shape so the platform
displays the right UI label) is preserved verbatim — only the
literal package-name reference is rephrased.
This sidesteps the cross-branch pycti drift entirely for IPQS,
without requiring a full ``origin/master`` merge (which conflicts
substantially with the recently-landed
``[ipqs] Add Darkweb-Leak enrichment for User-Account observables
(#6399)`` — separate follow-up).
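Why a documentation-only comment was enough to trigger the SDK install can be shown with a Python approximation of the recursive literal search. This is a sketch of the idea, not a port of ``run_test.sh``: the function name ``references_literal`` is hypothetical and the directory scope is simplified to a single root.

```python
from pathlib import Path


def references_literal(root: Path, needle: str = "connectors-sdk") -> bool:
    """Return True if any file under root contains the literal string."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable files are skipped
        if needle in text:
            # A comment matches just as well as an import would.
            return True
    return False
```

A literal text search cannot distinguish an import from a comment, so rephrasing the comment is sufficient to keep the connector off the SDK install path.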
Review pass summary (post-fix on
note_id = PyctiNote.generate_id(created=None, content=content)
marking_refs = self._note_marking_refs(observable)
note = stix2.Note(
    id=note_id,
    abstract="IPQS enrichment failed",
    content=content,
    object_refs=[observable["standard_id"]],
Proposed changes
Integrates the IPQS Malware File Scanner functionality originally proposed as a standalone `internal-enrichment/ipqs-analyzer` connector in #5970 into the existing `internal-enrichment/ipqs` connector.

PR #5970 was merged then force-reverted on `master` because the repository already ships an IPQS connector covering the IPQS fraud-and-risk-scoring API (`/ip`, `/url`, `/email`, `/phone`). Shipping a second IPQS connector would have duplicated the configuration surface (API key, base URL, …) and the Docker image. Instead, this PR brings the malware-file-scanner endpoints (`/malware/scan`, `/malware/lookup`, `/postback`) into the existing IPQS connector so a single connector serves every IPQS use case with a single API key, a single Docker image and a single `CONNECTOR_SCOPE`.

Closes
Implementation summary
- `src/ipqs/client.py`: new `get_malware_scan_info(file=None, params=None)` drives the cache-first lookup → scan → postback flow: `/malware/lookup` is tried first (a `status == "cached"` short-circuits the rest), on a cache miss the file content is uploaded to `/malware/scan`, IPQS replies with a `request_id`, and `/postback?request_id=<id>` is polled until a final result, an explicit `success=false`, an exhausted polling budget (overall deadline of `_POLLING_BUDGET_SECONDS = 120` enforced via `time.monotonic()`), or a network failure. Each postback call uses the tighter `_POSTBACK_REQUEST_TIMEOUT_SECONDS = 10` so the worst-case stall on a single Artifact enrichment goes from ~10 minutes (9 × 70 s) down to the documented ~120 s. A scan response missing `request_id` is converted into an explicit `success=False` payload (preserving the upstream message as `(upstream: ...)`) so the caller raises a failure note instead of building an indicator from incomplete data. `_query_malware` centralises status-code handling (`401` → key error, `5xx` → upstream error, JSONDecode errors, ConnectTimeout / ProxyError / InvalidURL / HTTPError / generic RequestException). Failures return `None` rather than raising. `file_enrich_fields` lists the fields rendered in the Indicator description for Artifact enrichment. The legacy `get_ipqs_info` flow used by IP / Email / URL / Phone is unchanged.
- `src/ipqs/ipqs.py`: new `_process_artifact(observable)` handler downloads the file from `<OPENCTI_URL>/storage/get/<id>`, submits it to IPQS via the new client helper, and builds a STIX bundle with:
  - an `Indicator` carrying the canonical `[file:hashes.'SHA-256' = '<hash>']` pattern, the verdict via `x_opencti_detection`, and `x_opencti_main_observable_type=Artifact`;
  - a deterministic `based-on` relationship between the Indicator and the Artifact;
  - the observable's `x_opencti_score` updated to `100` (malicious) or `50` (clean);
  - a `Clean` / `Malicious` label attached to both the Indicator and the Artifact;
  - an external reference (`source_name="IPQS File Analyzer"`, `external_id=<request_id>`) attached to the Artifact when IPQS returns one;
  - the configured `default_tlp_marking` object prepended to the bundle so OpenCTI-specific markings (`TLP:CLEAR`, `TLP:AMBER+STRICT`) are registered with the platform by name.

  `_send_failure_note(response, observable)` emits a STIX `Note` (`abstract="IPQS enrichment failed"`) attached to the observable when IPQS returns `success=false` or is unreachable; the Note inherits the source observable's TLP markings via the `_observable_marking_refs` / `_note_marking_refs` helper pair: a `TLP:AMBER` artifact whose enrichment fails produces a `TLP:AMBER` diagnostic Note, never a less-restrictive one. The failure-note bundle also ships the `default_tlp_marking` object. `_process_artifact`'s broad `except` also calls `_send_failure_note` with a download-specific message before returning, so the operator sees download failures from the OpenCTI UI without having to inspect connector logs.
- `src/ipqs/ipqs.py`: `_check_max_tlp(observable)` gates every enrichment branch (not just the new Artifact one) on the configured `IPQS_MAX_TLP`, defaulting to `TLP:AMBER`. The observable's actual TLP is normalised through `_normalize_tlp` so `amber+strict`, `TLP:AMBER`, `Amber`, ... all map to the canonical form. The gate inspects BOTH marking shapes the connector accepts elsewhere: the GraphQL `objectMarking` list (preferred) AND the alternate `object_marking_refs` flat-id list (resolved back to the canonical TLP string via a module-level `_MARKING_ID_TO_TLP` reverse lookup). Without the second branch an observable carrying its TLP in the alternate flat-id shape would silently fall back to `IPQS_DEFAULT_TLP` and slip past the gate even when the actual TLP is above `IPQS_MAX_TLP`. `_process_message` adds the new `Artifact` branch to the existing match.
- `src/ipqs/ipqs.py`: TLP handling rebuilt around real `stix2.MarkingDefinition` objects. `_TLP_MAP` is now a dict of `stix2.MarkingDefinition` (not bare ids), mirroring the convention used in the OpenCTI Connectors SDK's TLP-marking model. `CLEAR` and `AMBER+STRICT` are materialised via a new `_make_tlp_marking` helper that uses `pycti.MarkingDefinition.generate_id` and carries `x_opencti_definition` (`TLP:CLEAR` / `TLP:AMBER+STRICT`) so the OpenCTI UI shows the modern label instead of collapsing onto `TLP:WHITE`. `_resolve_tlp(env_var, value)` raises `ValueError` listing every supported alias on a mistyped TLP value (so an operator typo like `IPQS_DEFAULT_TLP=ANBER` no longer silently downgrades to `TLP:WHITE`; they get a clear startup error instead). Both `IPQS_DEFAULT_TLP` and `IPQS_MAX_TLP` are validated at startup. `_normalize_tlp` now correctly handles whitespace-only inputs (an explicit empty-after-strip guard returns the configured fallback instead of the literal string `"TLP:"`).
- `src/ipqs/builder.py`: `IPQSBuilder.__init__` accepts an optional `default_object_marking_refs` and propagates it to the generated Indicator / Relationship; the observable-score update is wrapped so a transient failure to persist `x_opencti_score` no longer aborts the enrichment. New `_get_object_marking_refs()` extracts markings from the GraphQL `objectMarking` list (preferred) or a plain `object_marking_refs` list, with falsy / non-string entries filtered out and duplicates collapsed in-order so the bundle's `object_marking_refs` list cannot poison `stix2`'s serialiser. Falls back to `default_object_marking_refs` so STIX objects emitted by the Artifact branch are never accidentally unmarked. `create_indicator_based_on` is now backwards-compatible with both the legacy `{"value": "<label-string>"}` shape returned by the fraud-scoring helpers and a plain `List[str]` (used by the Artifact branch). It gains a `detection: Optional[bool]` argument that maps to `x_opencti_detection` / `x_opencti_main_observable_type` so OpenCTI's detection rules pick up the new Artifact verdict. New `add_reference(ipqs_resp, observable)` attaches the IPQS external reference (skips silently when the lookup hit the cache and no `request_id` is available). New `malware_file_detection(detected)` returns `["Clean"]` / `["Malicious"]` and attaches the matching label (red / grey) to the observable.
- `src/config.yml.sample` & `docker-compose.yml`: `CONNECTOR_SCOPE` extended with `Artifact`. Two new optional variables (`IPQS_DEFAULT_TLP`, `IPQS_MAX_TLP`) documented inline. Note: `IPQS_MAX_TLP` defaults to `TLP:AMBER` and now gates every enrichment branch (IP / Email / URL / Phone / Artifact); operators running the previous version with TLP:RED observables must set `IPQS_MAX_TLP=TLP:RED` to keep the existing behaviour. `IPQS_DEFAULT_TLP` only affects the new Artifact / failure-note STIX outputs.
- `README.md`: rewritten to describe both API families. New Malware File Scanner section walks the operator through the lookup → scan → postback flow, the failure-note path, the external-reference generation, and the new TLP gate. Supported entity types updated. Explicit links back to PR [IPQS Analyzer] NewIntegration #5970 and issue [IPQS Analyzer] - new integration #6199.

Tests
- `tests/test_tlp.py`: 50 cases pinning `_normalize_tlp`, `_TLP_MAP` (real `MarkingDefinition` objects, CLEAR distinct from `stix2.TLP_WHITE` where pycti's id namespace permits, AMBER+STRICT canonical id), the strict-mode `_resolve_tlp`, `_make_tlp_marking` (id + metadata), `_observable_marking_refs` (shapes, deduplication, missing / malformed fallback), the new `_MARKING_ID_TO_TLP` reverse lookup, and the `_check_max_tlp` gate for both marking shapes (dict + flat-id) including the precedence rule when both are present.
- `tests/test_malware_client.py`: 3 cases pinning the client's new contract: missing `request_id` is converted into a failure; the postback loop honours the overall deadline; postback requests use the tighter `_POSTBACK_REQUEST_TIMEOUT_SECONDS`.

All 53 pytest cases pass locally; `black --check`, `isort --profile black --check` and `flake8 --select=F` all clean on the connector tree.

Local validation

- `isort --profile black --check .` (clean, repo-wide)
- `black --check .` (clean, repo-wide)
- `flake8 --select=F` (clean)
- `pytest internal-enrichment/ipqs/tests/` → 53 passed in 1.2s

CI status
GitHub Actions checks (`Test internal-enrichment/ipqs`, `Test tests/test-requirements.txt`, `Baseline coverage`, `Check PR is linked to an issue`, `Check signed commits in PR`, `Check that PR title follows convention`, etc.), Codecov, the `filigran/cla` check and 4 of the 5 CircleCI status contexts (`base_linter`, `linter`, `ensure_formatting`, `build_manifest`) are all green on `a0f18d1a23`.

The remaining `ci/circleci: test` check is failing for an environmental reason that is not caused by this PR's code: the branch was opened against `pycti==7.260515.0` and `master` has since released `pycti==7.260521.0` (`[all] Release 7.260521.0`). CircleCI's `test` job uninstalls the connector-pinned pycti and reinstalls the latest from master HEAD, then runs `uv pip check`; the resulting mismatch trips against the branch's local `connectors-sdk/pyproject.toml` (and against any of the ~240 other connectors on the branch that still pin `pycti==7.260515.0`). The IPQS tests themselves pass on CircleCI (`50 passed in 6.51s` on the IPQS project shard, before the rolling `tests/` shard exhibits the drift mismatch).

Cleaning this up requires either a full `origin/master` merge (which conflicts substantially with the recently-merged `[ipqs] Add Darkweb-Leak enrichment for User-Account observables (#6399)` PR: same files, additive functionality) or a coordinated repo-wide pycti bump. Both are outside the scope of this PR and are better handled as a separate follow-up so this change can be reviewed against a stable baseline.

Checklist