feat: scan history log and skipping already scanned targets#1640
feat: scan history log and skipping already scanned targets#1640odecode wants to merge 7 commits intoprojectdiscovery:devfrom
Conversation
WalkthroughAdds a scan-history feature: new CLI options, a thread-safe ScanHistory (TXT/JSON) with TTL and scope semantics, Runner integration to load/record/save history and optionally skip previously scanned targets, and output fields for previously seen results. Changes
Sequence DiagramsequenceDiagram
participant Client as Client/Main
participant Runner as Runner
participant ScanHistory as ScanHistory
participant Disk as Disk/File
Client->>Runner: NewRunner(options)
activate Runner
Runner->>ScanHistory: NewScanHistory(filePath, format, scope, ttl)
activate ScanHistory
ScanHistory->>Disk: Load()
Disk-->>ScanHistory: existing entries
ScanHistory-->>Runner: ScanHistory instance
deactivate ScanHistory
Runner-->>Client: Ready
Client->>Runner: AddTarget(target)
activate Runner
Runner->>ScanHistory: IsScanned(target)
alt previously scanned & SkipScanned
ScanHistory-->>Runner: true
Runner-->>Client: Skip target
else not scanned or ForceRescan
ScanHistory-->>Runner: false
Runner->>Runner: Process target (resolve/scan)
Runner-->>Client: Continue processing
end
deactivate Runner
Client->>Runner: onReceive(result)
activate Runner
Runner->>Runner: Emit result (include PreviouslySeen/FirstSeenAt)
Runner->>ScanHistory: Record(target, ip)
activate ScanHistory
ScanHistory->>ScanHistory: Update entry, mark dirty
ScanHistory-->>Runner: Recorded
deactivate ScanHistory
Runner-->>Client: Result handled
deactivate Runner
Client->>Runner: Close()
activate Runner
Runner->>ScanHistory: Save()
activate ScanHistory
ScanHistory->>Disk: Write history (JSON/TXT)
Disk-->>ScanHistory: Persisted
ScanHistory-->>Runner: Saved
deactivate ScanHistory
Runner-->>Client: Closed
deactivate Runner
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/runner/output.go (1)
105-121:⚠️ Potential issue | 🟠 MajorHistory metadata is never serialized to JSON.
Result.JSONdoesn’t copyPreviouslySeenandFirstSeenAtintojsonResult, so they’re omitted even when set.Proposed fix
data.ServiceFP = r.ServiceFP data.Tunnel = r.Tunnel data.Version = r.Version data.Confidence = r.Confidence + data.PreviouslySeen = r.PreviouslySeen + data.FirstSeenAt = r.FirstSeenAtpkg/runner/targets.go (1)
121-133:⚠️ Potential issue | 🟠 MajorSkip check misses
host:portinputs.
The pre-check uses the raw target, soexample.com:443won’t match history keyed byexample.com. Normalize before callingIsScanned.Proposed fix
target = strings.TrimSpace(target) if target == "" { return nil } - if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil { - if r.scanHistory.IsScanned(target) { + lookupTarget := target + if host, _, hasPort := getPort(target); hasPort { + lookupTarget = host + } + if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil { + if r.scanHistory.IsScanned(lookupTarget) { gologger.Debug().Msgf("Skipping previously scanned target: %s\n", target) return nil } }
🤖 Fix all issues with AI agents
In `@pkg/runner/options.go`:
- Around line 265-271: The flag help currently claims "--log-format" supports
"db" but ScanHistory.Load/Save only support "txt" and "json", so update the flag
and add early validation: change the flagSet.StringVar call that sets
options.LogFormat to list only "txt,json" in the help text (remove "db") and/or
implement DB support, and add a validation step (e.g., in an options.Validate or
before using ScanHistory.Load/Save) that checks options.LogFormat is one of
"txt" or "json" and returns an error if not; reference the flag definition that
sets options.LogFormat and the ScanHistory.Load/Save callers to locate where to
change help text and add validation.
In `@pkg/runner/scanhistory.go`:
- Around line 230-255: The deferred writer.Flush() in saveTXT ignores errors;
change it to perform an explicit flush and check its error before returning
(i.e., remove the deferred call and call writer.Flush() at the end, returning
fmt.Errorf or the flush error if non-nil). Apply the same change in saveBinary
for its buffered writer/encoder so any write/flush failures (e.g., disk full)
are propagated instead of silently ignored; reference the saveTXT and saveBinary
functions and the writer variable so you update the correct places.
- Around line 18-90: ScanHistory stores a scope but never applies it, so
IsScanned/Record (and Load's lookup) use raw target keys and break scope-based
deduplication; add a helper function (e.g., normalizeKey or keyForScope) that
takes (target string) and returns the normalized key based on sh.scope: if scope
== "ip" resolve the IP (use net.LookupIP or equivalent) and return the IP
string, otherwise treat as domain/host and strip any port with net.SplitHostPort
(fall back to the original host when SplitHostPort fails); then call this helper
in ScanHistory.IsScanned, ScanHistory.Record, and the lookup logic inside
ScanHistory.Load so all lookups/updates use the normalized key consistently.
🧹 Nitpick comments (3)
pkg/runner/runner.go (1)
286-291: Consider recording history once per host-result to avoid inflated counts.
onReceivefires per open port, soRecordis called multiple times per host in a single run, inflatingScanCount. Consider deduping per hostResult (or moving history writes to a post-scan stage) and, if you plan to emitpreviously_seenmetadata, capture the prior entry before output.pkg/runner/runner_test.go (1)
942-1049: Skip‑scanned integration test doesn’t validate the skip effect.
AddTargetreturns nil in both paths, soexpectedAddedonly checks for errors. Consider asserting that the target wasn’t added toIPRanger(or that history state didn’t change) when skip is expected.pkg/runner/scanhistory_test.go (1)
14-61: Prefert.TempDir()over fixed/tmppaths.
Hard-coded/tmppaths can collide across parallel runs and break on Windows. Uset.TempDir()+filepath.Joinfor per-test files.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@pkg/runner/runner_test.go`:
- Around line 943-944: Replace hardcoded "/tmp/..." test paths with per-test
sandbox directories from t.TempDir(): create dir := t.TempDir() and set tmpFile
:= filepath.Join(dir, "test-integration-history.log") (importing path/filepath),
remove manual os.Remove defer since t.TempDir() cleans up, and apply the same
change for the other occurrences referenced around the tests (lines near
1062-1064, 1115-1117, 1163-1164) so all tmpFile usages use
filepath.Join(t.TempDir(), "<name>").
- Around line 1076-1094: The test expects history recording but runner.onReceive
only formats/output results while history is recorded in handleOutput; update
the test to simulate the real path by adding the hostResult into
runner.ScanResults (or the appropriate results collection) and then call
runner.handleOutput(...) instead of only runner.onReceive, ensuring you still
call runner.scanner.IPRanger.AddHostWithMetadata("1.2.3.4", "example.com")
beforehand and assert runner.scanHistory.IsScanned("example.com") afterwards;
alternatively, if you prefer changing behavior, move the scanHistory recording
logic from handleOutput into runner.onReceive (and remove/adjust duplication in
handleOutput) so onReceive itself updates scanHistory.
In `@pkg/runner/runner.go`:
- Around line 1228-1254: The current scan-history logic only iterates
scanResults.GetIPsPorts(), so IP-only discovery runs (scanResults.HasIPS()) are
not recorded; update the block guarded by r.scanHistory != nil to also handle
IP-only results by checking scanResults.HasIPS() and iterating
scanResults.GetIPs() (or the equivalent IP-only iterator) to add each IP as a
host->IP entry into the same recordedHosts map (use host=ip for entries where no
hostname exists), then continue to call r.scanHistory.Record(host, ip) and log
errors as before; keep deduplication logic and reuse r.scanner.IPRanger lookup
path for consistency where needed.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/runner/scanhistory.go`:
- Around line 55-68: IsScanned() is not using the IP when scope == "ip", causing
Record(host, ip) entries (which use ScanHistory.key) to be missed; update the
ScanHistory.IsScanned signature to accept an ip string (e.g., IsScanned(target,
ip string) bool), have it call ScanHistory.key(target, ip) just like Record
does, and then update the caller(s) that currently call IsScanned(target) (the
place that has local variables named target and ip) to pass the ip argument as
well so IP-scoped lookups match stored keys.
issue #1631 requests a feature of logging scanning history and option to skip recently scanned targets. This PR adds such functionality.
Summary by CodeRabbit
New Features
Tests