[BUG] HTML/JavaScript recursion #2

**Describe the bug**
We've identified a bug in the HTML/JavaScript identification and extraction code. libmagic can incorrectly identify a file as "text/html" while YARA correctly identifies the same file as "javascript_file". When this happens, the ScanHtml scanner is applied to the JavaScript file and enters a recursive file extraction loop that only stops when the maximum depth is hit.
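
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not Strelka's code: `taste`, `extract_children`, `scan`, and `MAX_DEPTH` are hypothetical stand-ins, and the assumption that the HTML scanner re-emits the same payload as a child file is a simplification based on the "same file repeatedly extracted" symptom.

```python
MAX_DEPTH = 5  # hypothetical limit chosen for this demo


def taste(data: bytes) -> set:
    """Return simulated flavors for a payload (stand-in for libmagic + YARA)."""
    flavors = set()
    if b"<html" in data.lower():
        flavors.add("text/html")        # stand-in for the libmagic result
    if b"function" in data or b"var " in data:
        flavors.add("javascript_file")  # stand-in for the YARA result
    return flavors


def extract_children(data: bytes) -> list:
    """Stand-in for the HTML scanner: hands the same payload back as a new file."""
    return [data]


def scan(data: bytes, depth: int = 0, events: list = None) -> list:
    """Recursively scan a payload, recording one event per step."""
    events = [] if events is None else events
    if depth > MAX_DEPTH:
        events.append("exceeded maximum depth")
        return events
    flavors = taste(data)
    events.append(f"depth={depth} flavors={sorted(flavors)}")
    if "text/html" in flavors:
        # The child tastes exactly like its parent, so recursion never ends early.
        for child in extract_children(data):
            scan(child, depth + 1, events)
    return events


# A JavaScript payload that happens to contain HTML markup in a string literal.
SAMPLE_JS = b'var page = "<html><body></body></html>"; function render() { document.write(page); }'
events = scan(SAMPLE_JS)
for line in events:
    print(line)
```

In this model every level receives both flavors, so the only thing that stops the loop is the depth check.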

**Steps to reproduce**
Steps to reproduce the behavior:
1. Find an HTML file with embedded JavaScript where the JavaScript is tasted as "text/html" by libmagic
2. Run the file through Strelka
3. Check the Python logs for "exceeded maximum depth" messages, or check the scan results for the same HTML file being repeatedly extracted

**Expected behavior** 
JavaScript should not be tasted as HTML.
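
One possible way to get this behavior, sketched under assumptions: when the two tasting sources disagree, let the YARA "javascript_file" match take precedence over libmagic's "text/html". The function name `resolve_flavors` and its arguments are hypothetical and do not correspond to anything in the Strelka codebase.

```python
def resolve_flavors(mime_flavors: set, yara_flavors: set) -> set:
    """Merge flavors from both sources, dropping 'text/html' when YARA
    has identified the file as JavaScript (hypothetical precedence rule)."""
    resolved = set(mime_flavors) | set(yara_flavors)
    if "javascript_file" in yara_flavors:
        resolved.discard("text/html")
    return resolved


# Conflicting results: the JavaScript match wins, so no HTML scanner is assigned.
conflict = resolve_flavors({"text/html"}, {"javascript_file"})

# No conflict: a plain HTML file keeps its flavor.
plain_html = resolve_flavors({"text/html"}, set())
```

A caveat with this rule: if the YARA rule also matches genuine HTML documents that embed JavaScript, those files would no longer be treated as HTML, so the precedence would need to be narrower in practice.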

**Screenshots**
N/A

**Server and project version**
 - OS: Ubuntu Bionic
 - Commit Hash: N/A (first release)

**Additional context**
N/A
