feat: scan history log and skipping already scanned targets by odecode · Pull Request #1640 · projectdiscovery/naabu

odecode · 2026-02-04T11:39:52Z

Issue #1631 requests logging of scan history and an option to skip recently scanned targets. This PR adds that functionality.

Summary by CodeRabbit

New Features
- Scan history tracking with persistent storage (JSON/TXT), TTL-based expiry, and configurable scope/format.
- New CLI options to control scan history: scan-log, log-format, log-scope, ttl, skip-scanned, force-rescan.
- Optionally skip previously scanned targets; JSON output now includes previously_seen and first_seen_at metadata.
Tests
- Extensive tests covering persistence, formats, TTL expiry, skipping/force-rescan, concurrency, and round-trip integrity.

coderabbitai · 2026-02-04T11:40:21Z

Walkthrough

Adds a scan-history feature: new CLI options, a thread-safe ScanHistory (TXT/JSON) with TTL and scope semantics, Runner integration to load/record/save history and optionally skip previously scanned targets, and output fields for previously seen results.

Changes

Cohort / File(s)	Summary
Configuration `pkg/runner/options.go`	Added Options fields: `ScanLog`, `SkipScanned`, `LogFormat`, `LogScope`, `ForceRescan`, `ScanLogTTL` and CLI flag parsing under a `scan-history` group.
Output Structures `pkg/runner/output.go`	Added `PreviouslySeen bool` and `FirstSeenAt time.Time` to `Result` and `jsonResult` with JSON tags (affects JSON output; no CSV tags added).
Scan History Implementation `pkg/runner/scanhistory.go`	New thread-safe `ScanHistory` and `ScanEntry` types with `NewScanHistory`, `IsScanned`, `GetScanCount`, `Record`, `Load`, `Save` and format-specific load/save for JSON and TXT; TTL and scope handling included.
Runner Integration `pkg/runner/runner.go`	Runner now has `scanHistory *ScanHistory`, loads history on NewRunner when configured, records host/IP pairs in `handleOutput`, and saves history on Close.
Target Processing `pkg/runner/targets.go`	`AddTarget` pre-checks `scanHistory` and returns early when `SkipScanned` is true and `ForceRescan` is false; per-IP and normalized target checks added before emitting targets.
Tests `pkg/runner/runner_test.go`, `pkg/runner/scanhistory_test.go`	Extensive tests added for ScanHistory behavior, formats (txt/json), TTL expiry, persistence round-trips, concurrency, Runner integration, dirty/save behavior, and numerous edge cases.

Sequence Diagram

sequenceDiagram
    participant Client as Client/Main
    participant Runner as Runner
    participant ScanHistory as ScanHistory
    participant Disk as Disk/File

    Client->>Runner: NewRunner(options)
    activate Runner
    Runner->>ScanHistory: NewScanHistory(filePath, format, scope, ttl)
    activate ScanHistory
    ScanHistory->>Disk: Load()
    Disk-->>ScanHistory: existing entries
    ScanHistory-->>Runner: ScanHistory instance
    deactivate ScanHistory
    Runner-->>Client: Ready

    Client->>Runner: AddTarget(target)
    activate Runner
    Runner->>ScanHistory: IsScanned(target)
    alt previously scanned & SkipScanned
        ScanHistory-->>Runner: true
        Runner-->>Client: Skip target
    else not scanned or ForceRescan
        ScanHistory-->>Runner: false
        Runner->>Runner: Process target (resolve/scan)
        Runner-->>Client: Continue processing
    end
    deactivate Runner

    Client->>Runner: onReceive(result)
    activate Runner
    Runner->>Runner: Emit result (include PreviouslySeen/FirstSeenAt)
    Runner->>ScanHistory: Record(target, ip)
    activate ScanHistory
    ScanHistory->>ScanHistory: Update entry, mark dirty
    ScanHistory-->>Runner: Recorded
    deactivate ScanHistory
    Runner-->>Client: Result handled
    deactivate Runner

    Client->>Runner: Close()
    activate Runner
    Runner->>ScanHistory: Save()
    activate ScanHistory
    ScanHistory->>Disk: Write history (JSON/TXT)
    Disk-->>ScanHistory: Persisted
    ScanHistory-->>Runner: Saved
    deactivate ScanHistory
    Runner-->>Client: Closed
    deactivate Runner

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through logs in txt and json bright,
I note first-seen timestamps, skip what's in sight,
TTL keeps memories fresh and neat,
I record each host and every IP I meet,
A tiny carrot for each saved bite. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.39% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main changes: implementing scan history logging and the ability to skip already-scanned targets, which are the core features added across multiple files in this changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

pkg/runner/output.go (1)
105-121: ⚠️ Potential issue | 🟠 Major

History metadata is never serialized to JSON.
Result.JSON doesn’t copy PreviouslySeen and FirstSeenAt into jsonResult, so they’re omitted even when set.
Proposed fix
 	data.ServiceFP = r.ServiceFP
 	data.Tunnel = r.Tunnel
 	data.Version = r.Version
 	data.Confidence = r.Confidence
+	data.PreviouslySeen = r.PreviouslySeen
+	data.FirstSeenAt = r.FirstSeenAt
pkg/runner/targets.go (1)
121-133: ⚠️ Potential issue | 🟠 Major

Skip check misses host:port inputs.
The pre-check uses the raw target, so example.com:443 won’t match history keyed by example.com. Normalize before calling IsScanned.
Proposed fix
 	target = strings.TrimSpace(target)
 	if target == "" {
 		return nil
 	}
 
-	if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil {
-		if r.scanHistory.IsScanned(target) {
+	lookupTarget := target
+	if host, _, hasPort := getPort(target); hasPort {
+		lookupTarget = host
+	}
+	if r.options.SkipScanned && !r.options.ForceRescan && r.scanHistory != nil {
+		if r.scanHistory.IsScanned(lookupTarget) {
 			gologger.Debug().Msgf("Skipping previously scanned target: %s\n", target)
 			return nil
 		}
 	}
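The normalization step can also be sketched with the standard library alone. This is an illustration, not the PR's code: `lookupHost` is a hypothetical helper that uses `net.SplitHostPort` in place of the repository's `getPort`, and it falls back to the raw target whenever no port can be split off.

```go
package main

import "net"

// lookupHost returns the host part of "host:port" input, or the target
// unchanged when it carries no port (bare hostnames, bare IPv4/IPv6).
func lookupHost(target string) string {
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		// No port present (or unparsable input): use the target as the key.
		return target
	}
	return host
}
```

With this helper, `example.com:443` and `example.com` map to the same history key, and a bracketed IPv6 literal such as `[::1]:80` yields `::1`.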

🤖 Fix all issues with AI agents

In `@pkg/runner/options.go`:
- Around line 265-271: The flag help currently claims "--log-format" supports
"db" but ScanHistory.Load/Save only support "txt" and "json", so update the flag
and add early validation: change the flagSet.StringVar call that sets
options.LogFormat to list only "txt,json" in the help text (remove "db") and/or
implement DB support, and add a validation step (e.g., in an options.Validate or
before using ScanHistory.Load/Save) that checks options.LogFormat is one of
"txt" or "json" and returns an error if not; reference the flag definition that
sets options.LogFormat and the ScanHistory.Load/Save callers to locate where to
change help text and add validation.
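A minimal sketch of the early validation suggested above. The function name `validateLogFormat` and the case-insensitive comparison are assumptions; the accepted values `txt` and `json` are the two formats the review says `ScanHistory.Load/Save` support.

```go
package main

import (
	"fmt"
	"strings"
)

// validateLogFormat rejects any format the history loader cannot handle,
// so a typo or an unsupported value such as "db" fails before scanning starts.
func validateLogFormat(format string) error {
	switch strings.ToLower(format) {
	case "txt", "json":
		return nil
	default:
		return fmt.Errorf("unsupported log format %q: expected txt or json", format)
	}
}
```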

In `@pkg/runner/scanhistory.go`:
- Around line 230-255: The deferred writer.Flush() in saveTXT ignores errors;
change it to perform an explicit flush and check its error before returning
(i.e., remove the deferred call and call writer.Flush() at the end, returning
fmt.Errorf or the flush error if non-nil). Apply the same change in saveBinary
for its buffered writer/encoder so any write/flush failures (e.g., disk full)
are propagated instead of silently ignored; reference the saveTXT and saveBinary
functions and the writer variable so you update the correct places.
- Around line 18-90: ScanHistory stores a scope but never applies it, so
IsScanned/Record (and Load's lookup) use raw target keys and break scope-based
deduplication; add a helper function (e.g., normalizeKey or keyForScope) that
takes (target string) and returns the normalized key based on sh.scope: if scope
== "ip" resolve the IP (use net.LookupIP or equivalent) and return the IP
string, otherwise treat as domain/host and strip any port with net.SplitHostPort
(fall back to the original host when SplitHostPort fails); then call this helper
in ScanHistory.IsScanned, ScanHistory.Record, and the lookup logic inside
ScanHistory.Load so all lookups/updates use the normalized key consistently.

🧹 Nitpick comments (3)

pkg/runner/runner.go (1)

286-291: Consider recording history once per host-result to avoid inflated counts.
onReceive fires per open port, so Record is called multiple times per host in a single run, inflating ScanCount. Consider deduping per hostResult (or moving history writes to a post-scan stage) and, if you plan to emit previously_seen metadata, capture the prior entry before output.
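One way to avoid the inflated counts is to deduplicate (host, IP) pairs before writing to the history. The sketch below is an assumption-laden illustration: `hostIP` and `recordOnce` are hypothetical, and the `record` callback stands in for whatever `Record` call the runner makes.

```go
package main

// hostIP identifies one history entry candidate.
type hostIP struct{ host, ip string }

// recordOnce invokes record at most once per distinct (host, ip) pair,
// regardless of how many open ports produced a result for it.
// It returns the number of record calls made.
func recordOnce(results []hostIP, record func(host, ip string)) int {
	seen := make(map[hostIP]struct{}, len(results))
	calls := 0
	for _, r := range results {
		if _, dup := seen[r]; dup {
			continue
		}
		seen[r] = struct{}{}
		record(r.host, r.ip)
		calls++
	}
	return calls
}
```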

pkg/runner/runner_test.go (1)

942-1049: Skip‑scanned integration test doesn’t validate the skip effect.
AddTarget returns nil in both paths, so expectedAdded only checks for errors. Consider asserting that the target wasn’t added to IPRanger (or that history state didn’t change) when skip is expected.

pkg/runner/scanhistory_test.go (1)

14-61: Prefer t.TempDir() over fixed /tmp paths.
Hard-coded /tmp paths can collide across parallel runs and break on Windows. Use t.TempDir() + filepath.Join for per-test files.


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@pkg/runner/runner_test.go`:
- Around line 943-944: Replace hardcoded "/tmp/..." test paths with per-test
sandbox directories from t.TempDir(): create dir := t.TempDir() and set tmpFile
:= filepath.Join(dir, "test-integration-history.log") (importing path/filepath),
remove manual os.Remove defer since t.TempDir() cleans up, and apply the same
change for the other occurrences referenced around the tests (lines near
1062-1064, 1115-1117, 1163-1164) so all tmpFile usages use
filepath.Join(t.TempDir(), "<name>").
- Around line 1076-1094: The test expects history recording but runner.onReceive
only formats/output results while history is recorded in handleOutput; update
the test to simulate the real path by adding the hostResult into
runner.ScanResults (or the appropriate results collection) and then call
runner.handleOutput(...) instead of only runner.onReceive, ensuring you still
call runner.scanner.IPRanger.AddHostWithMetadata("1.2.3.4", "example.com")
beforehand and assert runner.scanHistory.IsScanned("example.com") afterwards;
alternatively, if you prefer changing behavior, move the scanHistory recording
logic from handleOutput into runner.onReceive (and remove/adjust duplication in
handleOutput) so onReceive itself updates scanHistory.

In `@pkg/runner/runner.go`:
- Around line 1228-1254: The current scan-history logic only iterates
scanResults.GetIPsPorts(), so IP-only discovery runs (scanResults.HasIPS()) are
not recorded; update the block guarded by r.scanHistory != nil to also handle
IP-only results by checking scanResults.HasIPS() and iterating
scanResults.GetIPs() (or the equivalent IP-only iterator) to add each IP as a
host->IP entry into the same recordedHosts map (use host=ip for entries where no
hostname exists), then continue to call r.scanHistory.Record(host, ip) and log
errors as before; keep deduplication logic and reuse r.scanner.IPRanger lookup
path for consistency where needed.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@pkg/runner/scanhistory.go`:
- Around line 55-68: IsScanned() is not using the IP when scope == "ip", causing
Record(host, ip) entries (which use ScanHistory.key) to be missed; update the
ScanHistory.IsScanned signature to accept an ip string (e.g., IsScanned(target,
ip string) bool), have it call ScanHistory.key(target, ip) just like Record
does, and then update the caller(s) that currently call IsScanned(target) (the
place that has local variables named target and ip) to pass the ip argument as
well so IP-scoped lookups match stored keys.
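The mismatch is easiest to see when `Record` and `IsScanned` share one key function. The sketch below is not the PR's implementation: `scopedHistory` is a hypothetical, non-thread-safe type, and the scope value `"ip"` versus anything else is an assumption based on the comment above.

```go
package main

// scopedHistory is a reduced stand-in for ScanHistory (no locking, no TTL).
type scopedHistory struct {
	scope   string // "ip" keys entries by IP; any other value keys by target
	entries map[string]bool
}

// key derives the storage key from both arguments, depending on scope.
func (h *scopedHistory) key(target, ip string) string {
	if h.scope == "ip" && ip != "" {
		return ip
	}
	return target
}

// Record and IsScanned both go through key, so a lookup can never
// use a different key shape than the one that was stored.
func (h *scopedHistory) Record(target, ip string) {
	h.entries[h.key(target, ip)] = true
}

func (h *scopedHistory) IsScanned(target, ip string) bool {
	return h.entries[h.key(target, ip)]
}
```

In IP scope, a second hostname that resolves to an already recorded IP is then reported as scanned, which is the behaviour the one-argument `IsScanned(target)` could not provide.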


Travel Planner Developer added 3 commits February 4, 2026 12:47

add scan log & skip already scanned feature

8c73b01

add unit tests

3cfa2b7

add integration tests

633170f

auto-assign bot requested a review from dwisiswant0 February 4, 2026 11:39

coderabbitai bot reviewed Feb 4, 2026


Travel Planner Developer added 2 commits February 4, 2026 13:57

coderabbit code review fixes

41c72c5

coderabbit code review fixes

aa27bf8

coderabbitai bot reviewed Feb 4, 2026


replace /tmp/ strings with TempDir(), fix erroneous integration test

4868979

coderabbitai bot reviewed Feb 4, 2026


fix issue ip scope scan history, add test coverage

ca5de75

odecode wants to merge 7 commits into projectdiscovery:dev from odecode:skip-scanned
