[BUG] HTML/JavaScript recursion #2

@jshlbrd

Description

Describe the bug
We've identified a bug in the HTML/JavaScript identification and extraction code. libmagic can incorrectly identify a file as "text/html" while YARA correctly identifies the same file as "javascript_file". When this happens, the ScanHtml scanner is applied to the JavaScript file and enters a recursive file extraction loop that only stops when the maximum depth is hit.
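
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Strelka's code: `taste`, `extract_children`, `distribute`, and `MAX_DEPTH` are hypothetical stand-ins, and the assumption that the HTML scanner re-emits the same payload when run on JavaScript is a simplification made only for this example.

```python
MAX_DEPTH = 4  # hypothetical limit, not Strelka's actual setting


def taste(data: bytes) -> set:
    """Hypothetical stand-in for tasting with libmagic and YARA."""
    flavors = set()
    if b"<html" in data.lower():
        flavors.add("text/html")  # mimics libmagic's incorrect guess
    if b"function" in data:
        flavors.add("javascript_file")  # mimics a YARA rule match
    return flavors


def extract_children(data: bytes, flavors: set) -> list:
    """Hypothetical stand-in for ScanHtml.

    Assumption: when applied to JavaScript, the scanner re-extracts
    the payload unchanged, so the child is identical to the parent.
    """
    if "text/html" in flavors:
        return [data]
    return []


def distribute(data: bytes, depth: int = 0, log: list = None) -> list:
    """Taste a file, scan it, and recurse into any extracted children."""
    log = [] if log is None else log
    if depth > MAX_DEPTH:
        log.append("exceeded maximum depth")
        return log
    flavors = taste(data)
    log.append((depth, sorted(flavors)))
    for child in extract_children(data, flavors):
        distribute(child, depth + 1, log)
    return log


# A JavaScript file that embeds an HTML string and is tasted as both flavors.
js = b'var page = "<html><body></body></html>"; function render() { return page; }'
log = distribute(js)
```

In this sketch every level of recursion sees the same two flavors, so the only thing that ends the loop is the depth check.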

Steps to reproduce
Steps to reproduce the behavior:

  1. Find an HTML file that contains embedded JavaScript that gets tasted as "text/html" by libmagic
  2. Run the file through Strelka
  3. Check the Python logs for "exceeded maximum depth" messages, or the scan results for the same HTML file being extracted repeatedly

Expected behavior
JavaScript should not be tasted as HTML.
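
One possible way to get this behavior is to let the YARA result override the libmagic result when the two conflict. The sketch below is only an illustration under that assumption; `reconcile_flavors` is a hypothetical helper, not an existing Strelka function, and the actual fix may look different.

```python
def reconcile_flavors(mime_flavors: set, yara_flavors: set) -> set:
    """Merge flavors, dropping "text/html" when YARA says JavaScript.

    Hypothetical helper: assumes the YARA match is the more reliable
    signal whenever the two tasting methods disagree.
    """
    flavors = set(mime_flavors) | set(yara_flavors)
    if "javascript_file" in yara_flavors:
        flavors.discard("text/html")
    return flavors


# Conflicting taste results: the HTML flavor is dropped.
conflicting = reconcile_flavors({"text/html"}, {"javascript_file"})

# No conflict: genuine HTML is left untouched.
plain_html = reconcile_flavors({"text/html"}, set())
```

With the HTML flavor removed, the HTML scanner would no longer be assigned to the JavaScript file, which avoids the repeated extraction.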

Screenshots
N/A

Server and project version

  • OS: Ubuntu Bionic
  • Commit Hash: N/A (first release)

Additional context
N/A

Labels: bug (Something isn't working)
