
feat: scan history log and skipping already scanned targets #1640

Open
odecode wants to merge 7 commits into projectdiscovery:dev from odecode:skip-scanned

@odecode odecode commented Feb 4, 2026

Issue #1631 requests scan-history logging and an option to skip recently scanned targets. This PR adds that functionality.

Summary by CodeRabbit

  • New Features

    • Scan history tracking with persistent storage (JSON/TXT), TTL-based expiry, and configurable scope/format.
    • New CLI options to control scan history: scan-log, log-format, log-scope, ttl, skip-scanned, force-rescan.
    • Optionally skip previously scanned targets; JSON output now includes previously_seen and first_seen_at metadata.
  • Tests

    • Extensive tests covering persistence, formats, TTL expiry, skipping/force-rescan, concurrency, and round-trip integrity.

@auto-assign auto-assign bot requested a review from dwisiswant0 February 4, 2026 11:39
coderabbitai bot commented Feb 4, 2026

Walkthrough

Adds a scan-history feature: new CLI options, a thread-safe ScanHistory (TXT/JSON) with TTL and scope semantics, Runner integration to load/record/save history and optionally skip previously scanned targets, and output fields for previously seen results.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: `pkg/runner/options.go` | Added Options fields: ScanLog, SkipScanned, LogFormat, LogScope, ForceRescan, ScanLogTTL and CLI flag parsing under a scan-history group. |
| Output Structures: `pkg/runner/output.go` | Added PreviouslySeen bool and FirstSeenAt time.Time to Result and jsonResult with JSON tags (affects JSON output; no CSV tags added). |
| Scan History Implementation: `pkg/runner/scanhistory.go` | New thread-safe ScanHistory and ScanEntry types with NewScanHistory, IsScanned, GetScanCount, Record, Load, Save and format-specific load/save for JSON and TXT; TTL and scope handling included. |
| Runner Integration: `pkg/runner/runner.go` | Runner now has scanHistory *ScanHistory, loads history on NewRunner when configured, records host/IP pairs in handleOutput, and saves history on Close. |
| Target Processing: `pkg/runner/targets.go` | AddTarget pre-checks scanHistory and returns early when SkipScanned is true and ForceRescan is false; per-IP and normalized target checks added before emitting targets. |
| Tests: `pkg/runner/runner_test.go`, `pkg/runner/scanhistory_test.go` | Extensive tests added for ScanHistory behavior, formats (txt/json), TTL expiry, persistence round-trips, concurrency, Runner integration, dirty/save behavior, and numerous edge cases. |

Sequence Diagram

sequenceDiagram
    participant Client as Client/Main
    participant Runner as Runner
    participant ScanHistory as ScanHistory
    participant Disk as Disk/File

    Client->>Runner: NewRunner(options)
    activate Runner
    Runner->>ScanHistory: NewScanHistory(filePath, format, scope, ttl)
    activate ScanHistory
    ScanHistory->>Disk: Load()
    Disk-->>ScanHistory: existing entries
    ScanHistory-->>Runner: ScanHistory instance
    deactivate ScanHistory
    Runner-->>Client: Ready

    Client->>Runner: AddTarget(target)
    activate Runner
    Runner->>ScanHistory: IsScanned(target)
    alt previously scanned & SkipScanned
        ScanHistory-->>Runner: true
        Runner-->>Client: Skip target
    else not scanned or ForceRescan
        ScanHistory-->>Runner: false
        Runner->>Runner: Process target (resolve/scan)
        Runner-->>Client: Continue processing
    end
    deactivate Runner

    Client->>Runner: onReceive(result)
    activate Runner
    Runner->>Runner: Emit result (include PreviouslySeen/FirstSeenAt)
    Runner->>ScanHistory: Record(target, ip)
    activate ScanHistory
    ScanHistory->>ScanHistory: Update entry, mark dirty
    ScanHistory-->>Runner: Recorded
    deactivate ScanHistory
    Runner-->>Client: Result handled
    deactivate Runner

    Client->>Runner: Close()
    activate Runner
    Runner->>ScanHistory: Save()
    activate ScanHistory
    ScanHistory->>Disk: Write history (JSON/TXT)
    Disk-->>ScanHistory: Persisted
    ScanHistory-->>Runner: Saved
    deactivate ScanHistory
    Runner-->>Client: Closed
    deactivate Runner

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through logs in txt and json bright,
I note first-seen timestamps, skip what's in sight,
TTL keeps memories fresh and neat,
I record each host and every IP I meet,
A tiny carrot for each saved bite. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.39% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately summarizes the main changes: implementing scan history logging and the ability to skip already-scanned targets, which are the core features added across multiple files in this changeset. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/runner/output.go (1)

105-121: ⚠️ Potential issue | 🟠 Major

History metadata is never serialized to JSON.
Result.JSON doesn’t copy PreviouslySeen and FirstSeenAt into jsonResult, so they’re omitted even when set.

Proposed fix
 	data.ServiceFP = r.ServiceFP
 	data.Tunnel = r.Tunnel
 	data.Version = r.Version
 	data.Confidence = r.Confidence
+	data.PreviouslySeen = r.PreviouslySeen
+	data.FirstSeenAt = r.FirstSeenAt
pkg/runner/targets.go (1)

121-133: ⚠️ Potential issue | 🟠 Major

Skip check misses host:port inputs.
The pre-check uses the raw target, so example.com:443 won’t match history keyed by example.com. Normalize before calling IsScanned.

Proposed fix
 	target = strings.TrimSpace(target)
 	if target == "" {
 		return nil
 	}
 
-	if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil {
-		if r.scanHistory.IsScanned(target) {
+	lookupTarget := target
+	if host, _, hasPort := getPort(target); hasPort {
+		lookupTarget = host
+	}
+	if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil {
+		if r.scanHistory.IsScanned(lookupTarget) {
 			gologger.Debug().Msgf("Skipping previously scanned target: %s\n", target)
 			return nil
 		}
 	}
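
The normalization step can also be sketched with the standard library alone. `hostOnly` below is a hypothetical helper, not the PR's `getPort`; it assumes history keys are bare hosts and falls back to the input whenever `net.SplitHostPort` cannot find a port.

```go
package main

import (
	"fmt"
	"net"
)

// hostOnly is a hypothetical helper that strips a trailing ":port" so that
// "example.com:443" and "example.com" map to the same history key.
// Inputs without a port (or that SplitHostPort rejects, such as a bare
// IPv6 address or a CIDR) are returned unchanged.
func hostOnly(target string) string {
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		return target
	}
	return host
}

func main() {
	for _, t := range []string{"example.com:443", "example.com", "[::1]:80", "10.0.0.0/24"} {
		fmt.Printf("%s -> %s\n", t, hostOnly(t))
	}
}
```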
🤖 Fix all issues with AI agents
In `@pkg/runner/options.go`:
- Around line 265-271: The flag help currently claims "--log-format" supports
"db" but ScanHistory.Load/Save only support "txt" and "json", so update the flag
and add early validation: change the flagSet.StringVar call that sets
options.LogFormat to list only "txt,json" in the help text (remove "db") and/or
implement DB support, and add a validation step (e.g., in an options.Validate or
before using ScanHistory.Load/Save) that checks options.LogFormat is one of
"txt" or "json" and returns an error if not; reference the flag definition that
sets options.LogFormat and the ScanHistory.Load/Save callers to locate where to
change help text and add validation.
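
A minimal sketch of the early validation described above, assuming only "txt" and "json" are supported. `validateLogFormat` is a hypothetical name, and where it would be called from (for example an options validation step) is not specified by the source.

```go
package main

import (
	"fmt"
	"strings"
)

// validateLogFormat is a hypothetical check that rejects any scan-history
// format other than the two the review says Load/Save actually handle.
func validateLogFormat(format string) error {
	switch strings.ToLower(strings.TrimSpace(format)) {
	case "txt", "json":
		return nil
	default:
		return fmt.Errorf("unsupported log format %q: expected \"txt\" or \"json\"", format)
	}
}

func main() {
	for _, f := range []string{"txt", "JSON", "db"} {
		if err := validateLogFormat(f); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", f)
	}
}
```

Failing at option-parsing time surfaces the problem before any scan runs, rather than at save time when results could be lost.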

In `@pkg/runner/scanhistory.go`:
- Around line 230-255: The deferred writer.Flush() in saveTXT ignores errors;
change it to perform an explicit flush and check its error before returning
(i.e., remove the deferred call and call writer.Flush() at the end, returning
fmt.Errorf or the flush error if non-nil). Apply the same change in saveBinary
for its buffered writer/encoder so any write/flush failures (e.g., disk full)
are propagated instead of silently ignored; reference the saveTXT and saveBinary
functions and the writer variable so you update the correct places.
- Around line 18-90: ScanHistory stores a scope but never applies it, so
IsScanned/Record (and Load's lookup) use raw target keys and break scope-based
deduplication; add a helper function (e.g., normalizeKey or keyForScope) that
takes (target string) and returns the normalized key based on sh.scope: if scope
== "ip" resolve the IP (use net.LookupIP or equivalent) and return the IP
string, otherwise treat as domain/host and strip any port with net.SplitHostPort
(fall back to the original host when SplitHostPort fails); then call this helper
in ScanHistory.IsScanned, ScanHistory.Record, and the lookup logic inside
ScanHistory.Load so all lookups/updates use the normalized key consistently.
🧹 Nitpick comments (3)
pkg/runner/runner.go (1)

286-291: Consider recording history once per host-result to avoid inflated counts.
onReceive fires per open port, so Record is called multiple times per host in a single run, inflating ScanCount. Consider deduping per hostResult (or moving history writes to a post-scan stage) and, if you plan to emit previously_seen metadata, capture the prior entry before output.
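
One way to sketch the per-host deduplication suggested here: collapse the per-port results into unique host/IP pairs and record each pair once. The types `portResult` and `hostIP` and the function `uniqueHostIPs` are hypothetical and do not reflect the runner's actual result structures.

```go
package main

import "fmt"

// portResult is a hypothetical per-port result (one per open port).
type portResult struct {
	Host string
	IP   string
	Port int
}

// hostIP identifies one host/IP pair to record in history.
type hostIP struct {
	Host string
	IP   string
}

// uniqueHostIPs collapses per-port results into one pair per host/IP,
// preserving first-seen order, so each pair is recorded once per run.
func uniqueHostIPs(results []portResult) []hostIP {
	seen := make(map[hostIP]struct{}, len(results))
	var out []hostIP
	for _, r := range results {
		k := hostIP{Host: r.Host, IP: r.IP}
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, k)
	}
	return out
}

func main() {
	results := []portResult{
		{Host: "example.com", IP: "1.2.3.4", Port: 80},
		{Host: "example.com", IP: "1.2.3.4", Port: 443},
		{Host: "example.org", IP: "5.6.7.8", Port: 22},
	}
	for _, p := range uniqueHostIPs(results) {
		fmt.Printf("record %s %s\n", p.Host, p.IP)
	}
}
```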

pkg/runner/runner_test.go (1)

942-1049: Skip‑scanned integration test doesn’t validate the skip effect.
AddTarget returns nil in both paths, so expectedAdded only checks for errors. Consider asserting that the target wasn’t added to IPRanger (or that history state didn’t change) when skip is expected.

pkg/runner/scanhistory_test.go (1)

14-61: Prefer t.TempDir() over fixed /tmp paths.
Hard-coded /tmp paths can collide across parallel runs and break on Windows. Use t.TempDir() + filepath.Join for per-test files.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@pkg/runner/runner_test.go`:
- Around line 943-944: Replace hardcoded "/tmp/..." test paths with per-test
sandbox directories from t.TempDir(): create dir := t.TempDir() and set tmpFile
:= filepath.Join(dir, "test-integration-history.log") (importing path/filepath),
remove manual os.Remove defer since t.TempDir() cleans up, and apply the same
change for the other occurrences referenced around the tests (lines near
1062-1064, 1115-1117, 1163-1164) so all tmpFile usages use
filepath.Join(t.TempDir(), "<name>").
- Around line 1076-1094: The test expects history recording but runner.onReceive
only formats/output results while history is recorded in handleOutput; update
the test to simulate the real path by adding the hostResult into
runner.ScanResults (or the appropriate results collection) and then call
runner.handleOutput(...) instead of only runner.onReceive, ensuring you still
call runner.scanner.IPRanger.AddHostWithMetadata("1.2.3.4", "example.com")
beforehand and assert runner.scanHistory.IsScanned("example.com") afterwards;
alternatively, if you prefer changing behavior, move the scanHistory recording
logic from handleOutput into runner.onReceive (and remove/adjust duplication in
handleOutput) so onReceive itself updates scanHistory.

In `@pkg/runner/runner.go`:
- Around line 1228-1254: The current scan-history logic only iterates
scanResults.GetIPsPorts(), so IP-only discovery runs (scanResults.HasIPS()) are
not recorded; update the block guarded by r.scanHistory != nil to also handle
IP-only results by checking scanResults.HasIPS() and iterating
scanResults.GetIPs() (or the equivalent IP-only iterator) to add each IP as a
host->IP entry into the same recordedHosts map (use host=ip for entries where no
hostname exists), then continue to call r.scanHistory.Record(host, ip) and log
errors as before; keep deduplication logic and reuse r.scanner.IPRanger lookup
path for consistency where needed.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/runner/scanhistory.go`:
- Around line 55-68: IsScanned() is not using the IP when scope == "ip", causing
Record(host, ip) entries (which use ScanHistory.key) to be missed; update the
ScanHistory.IsScanned signature to accept an ip string (e.g., IsScanned(target,
ip string) bool), have it call ScanHistory.key(target, ip) just like Record
does, and then update the caller(s) that currently call IsScanned(target) (the
place that has local variables named target and ip) to pass the ip argument as
well so IP-scoped lookups match stored keys.
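
The mismatch described above disappears once recording and lookup derive their key through the same function. The sketch below is illustrative only: `scopedHistory` and its methods are hypothetical, it assumes the scope value "ip" means "key by IP address", and it omits the locking and TTL handling the real type has.

```go
package main

import "fmt"

// scopedHistory is a hypothetical, non-thread-safe stand-in for ScanHistory.
type scopedHistory struct {
	scope   string // assumption: "ip" keys entries by IP, anything else by target
	scanned map[string]bool
}

// key returns the map key for a target/IP pair under the configured scope.
func (h *scopedHistory) key(target, ip string) string {
	if h.scope == "ip" && ip != "" {
		return ip
	}
	return target
}

// record and isScanned both go through key, so lookups match stored entries.
func (h *scopedHistory) record(target, ip string) {
	h.scanned[h.key(target, ip)] = true
}

func (h *scopedHistory) isScanned(target, ip string) bool {
	return h.scanned[h.key(target, ip)]
}

func main() {
	h := &scopedHistory{scope: "ip", scanned: map[string]bool{}}
	h.record("example.com", "1.2.3.4")
	fmt.Println(h.isScanned("www.example.com", "1.2.3.4")) // same IP, different name
	fmt.Println(h.isScanned("example.com", "5.6.7.8"))     // same name, different IP
}
```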
