Extend file.pe Fieldset #1071

Merged
merged 57 commits into from
Feb 8, 2021
7080358
Merge pull request #1 from elastic/master
peasead Oct 20, 2020
314f9ab
Merge pull request #2 from elastic/master
peasead Nov 3, 2020
408816b
initial commit
peasead Nov 3, 2020
342cf00
update module
peasead Nov 3, 2020
4c5f266
further clarification
peasead Nov 3, 2020
2213d49
updates
peasead Nov 3, 2020
a8f1a80
'make' and 'make test'
peasead Nov 3, 2020
6582bde
changelog
peasead Nov 3, 2020
51fd471
added PR
peasead Nov 3, 2020
41a5bf0
added PR
peasead Nov 3, 2020
7744bd1
reorganized and fixed orig pe.yml
peasead Nov 5, 2020
205eeac
updatd SMEs
peasead Nov 5, 2020
7fa6ae5
reran make to reset files
peasead Nov 5, 2020
a80581e
removed changelog entry
peasead Nov 5, 2020
ec587e4
removed existing fields
peasead Nov 5, 2020
6e86d31
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 5, 2020
24baef4
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
c091452
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
c1ac596
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
70e68b0
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
b9b2686
Update rfcs/text/pe/pe.yml
peasead Nov 6, 2020
80641f0
remove stage headers
peasead Nov 6, 2020
a0f193f
add examples and references
peasead Nov 6, 2020
b3e72da
removed vt module blob for now
peasead Nov 6, 2020
b6f9dfc
adjustments to entry_point as keyword
peasead Nov 10, 2020
6f6cd39
move rich_headers into its own fields
peasead Nov 10, 2020
b12961b
extended compiler to include name and version
peasead Nov 10, 2020
0063d23
adjusted dhash description
peasead Nov 10, 2020
f0e7d61
Update icon fields
peasead Nov 10, 2020
0bab0ec
duplicate fields
peasead Nov 10, 2020
4d2d65b
removed unnecessary hashing algos
peasead Nov 10, 2020
7cc04f8
moving overlay to file.*
peasead Nov 11, 2020
f5a0533
removing resource_languages in favor of resource_details
peasead Nov 11, 2020
4356939
removed packers, not part of peinfo
peasead Nov 11, 2020
53467c6
moved debug to nested fields
peasead Nov 11, 2020
f7c1af7
moved sections to nested
peasead Nov 11, 2020
958c646
updated imports name and change type to flattened
peasead Nov 19, 2020
5bbc6f5
resources rename
peasead Nov 19, 2020
4fa79d5
added "s" to types
peasead Nov 19, 2020
13d264f
remove resources.types aggregation
peasead Dec 23, 2020
550b038
removed plurality of resources.type
peasead Dec 23, 2020
74628f9
add nested resources fields to table
peasead Dec 23, 2020
a8d954e
update entry_point desc.
peasead Dec 23, 2020
ccbed5d
Update rfcs/text/pe/pe.yml
peasead Dec 23, 2020
cb7631a
Merge branch 'file.pe-extend' of github.com:peasead/ecs into file.pe-…
peasead Dec 23, 2020
4a68601
update/add pe.packers
peasead Dec 23, 2020
1f88931
fixed compiler type
peasead Dec 23, 2020
1bd64de
added pe.icon to table
peasead Dec 23, 2020
ad156fd
removed file. from names
peasead Dec 23, 2020
58fca91
Update rfcs/text/0000-extend-file-pe.md
peasead Jan 13, 2021
d54ad8c
Update pe.yml
peasead Jan 13, 2021
a63a716
Update pe.yml
peasead Jan 13, 2021
49d069b
combined debug.type and debut.type_str
peasead Feb 1, 2021
9e2d59d
field definition housekeeping
ebeahan Feb 5, 2021
9464876
adjust markdown comments to align with updated proposal stages
ebeahan Feb 8, 2021
3f816cb
assigning rfc number and set advance date
ebeahan Feb 8, 2021
aa000c2
rename using assigned rfc number
ebeahan Feb 8, 2021
139 changes: 139 additions & 0 deletions rfcs/text/0014-extend-file-pe.md
@@ -0,0 +1,139 @@
# 0014: Extend the PE field set

- Stage: **1 (draft)**
- Date: **2021-02-08**

The Portable Executable (PE) field set, nested under the top-level `file` field set, can be extended with more file attributes to aid file analysis. The additional metadata is useful for malware research as well as for coding and other application development efforts.

## Fields

This RFC proposes 25 additional sub-fields within the `file.pe` fieldset.

| Name | Type | Description |
| ---- | ---- | ----------- |
| pe.authentihash | keyword | Authentihash of the PE file. |
| pe.compile_timestamp | date | Compile timestamp of the PE file. |
| pe.compiler | nested | Compiler information. |
| pe.compiler.version | keyword | Version of the compiler. |
| pe.compiler.name | keyword | Name of the compiler. |
| pe.creation_date | date | Extracted when possible from the file's metadata. Indicates when it was built or compiled. It can also be faked by malware creators. |
| pe.entry_point | keyword | Relative byte offset to the base of the PE file. |
| pe.exports | keyword | List of symbols exported by the PE. |
| pe.debug | nested | Debug information, if present. |
| pe.debug.offset | keyword | Debug offset information. |
| pe.debug.size | long | Size of the debug information. |
| pe.debug.type | keyword | Information type generated by the debug options. |
| pe.debug.timestamp | date | Timestamp of the debug information. |
| pe.imports | flattened | List of all imported functions. |
| pe.sections | nested | Data about sections of the compiled PE binary. |
| pe.sections.chi2 | long | Chi-square probability distribution. |
| pe.sections.virtual_address | long | Virtual address available to the file. |
| pe.sections.entropy | float | Measurement of entropy randomness in the file. |
| pe.sections.flags | keyword | Section flags of the file. |
| pe.sections.name | keyword | Section names of the file. |
| pe.sections.raw_size | long | Size of the section or the size of the initialized data on disk. |
| pe.resources | nested | Information about resources, if the PE contains any. |
| pe.resources.chi2 | long | Chi-square probability distribution. |
| pe.resources.filetype | keyword | File type of the resources section. |
| pe.resources.entropy | long | Measurement of entropy randomness in the resources section. |
| pe.resources.sha256 | keyword | SHA256 hash of resources section. |
| pe.resources.language | keyword | Language identification. |
| pe.resources.type | keyword | List of resource types. |
| pe.machine_type | keyword | Machine type of the PE file. |
| pe.packers | keyword | List of packers and tools used. |
| pe.rich_header.hash.md5 | keyword | Hash of the PE header. |
| pe.icon | nested | Information about the embedded program icon. |
| pe.icon.hash | nested | Hash information for the embedded program icon. |
| pe.icon.hash.dhash | keyword | Difference Hash (dhash) to find files with a visually similar icon or thumbnail. |


[New `pe.yml` fields](pe/pe.yml)
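
The `entropy` and `chi2` fields in the table above are statistics over a section's raw bytes. As a rough illustration only, the sketch below computes Shannon entropy (bits per byte) and a chi-square statistic against a uniform byte distribution. This is an assumption about how such values are commonly derived, not a statement of how any particular analysis platform calculates them. The helper names (`shannon_entropy`, `chi_square`, `describe_section`) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic of byte frequencies against a uniform distribution."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def describe_section(name: str, virtual_address: int, raw: bytes) -> dict:
    """Build one entry shaped like the proposed pe.sections fields (hypothetical helper)."""
    return {
        "name": name,
        "virtual_address": virtual_address,
        "raw_size": len(raw),
        "entropy": round(shannon_entropy(raw), 2),
        # chi2 is typed as long in the proposal, so the statistic is rounded here.
        "chi2": round(chi_square(raw)),
    }
```

Values near 8.0 bits per byte usually suggest compressed or encrypted content, which is one reason per-section entropy is of interest when looking for packed samples.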

<!--
Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting.
-->

## Usage

In file analysis, and in malware research in particular, file similarities can be used to link malware samples and families together, identify campaigns, and possibly support attribution. Knowing how malware components are re-used also helps interpret malware telemetry, especially when measuring the impact of newly introduced defensive countermeasures.
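
One concrete form of similarity matching is comparing `pe.icon.hash.dhash` values. The sketch below assumes the dhash is a hex string of fixed length, as in the field's example value, and treats two icons as visually similar when few bits differ. The `max_bits` threshold of 8 and the function names are illustrative assumptions, not part of this proposal.

```python
from typing import Dict, List


def hamming_distance(dhash_a: str, dhash_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes of equal length."""
    if len(dhash_a) != len(dhash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(dhash_a, 16) ^ int(dhash_b, 16)).count("1")


def similar_icons(target: str, candidates: Dict[str, str], max_bits: int = 8) -> List[str]:
    """Return the names of candidates whose dhash is within max_bits of the target.

    `candidates` maps a sample name to its dhash; both are hypothetical inputs.
    """
    return sorted(
        name
        for name, dhash in candidates.items()
        if hamming_distance(target, dhash) <= max_bits
    )
```

An exact-match query on the keyword field finds identical icons; a bit-distance comparison like this one would have to run outside the keyword field, for example in an enrichment step.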

As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we then observe changes in the headers, import tables, packers, etc. of that malware family, we can infer that the new model is having an impact on that family.

As another example, tracking file metadata for specific families helps predict new campaigns when similar metadata appears in new samples. [One example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/) is the Maze ransomware family shutting down and re-purposing as Egregor.
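
To illustrate how shared metadata can tie samples together, the sketch below groups documents by the value of one proposed field, such as `file.pe.rich_header.hash.md5`, and keeps only values seen in more than one sample. The document shape and the helper names are assumptions for illustration; in practice this would more likely be an aggregation in the datastore.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def get_path(doc: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Follow a dotted field name through nested dicts; None if any part is missing."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def group_by_field(docs: List[Dict[str, Any]], field: str) -> Dict[Any, List[str]]:
    """Map each field value shared by two or more documents to their file names."""
    groups: Dict[Any, List[str]] = defaultdict(list)
    for doc in docs:
        value = get_path(doc, field)
        if value is not None:
            groups[value].append(get_path(doc, "file.name"))
    return {value: names for value, names in groups.items() if len(names) > 1}
```

Two samples with different file hashes but the same rich header hash would land in the same group, which is the kind of overlap that can hint at a shared build environment.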

## Source data

This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
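
As a sketch of what ingesting such data could involve, the function below maps a source record onto a few of the proposed fields. The input shape (an epoch `timestamp`, a numeric `entry_point`, and a `sections` list) is a hypothetical simplification and does not reproduce the output format of any of the platforms listed above.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Section attributes copied through unchanged when present (names follow the proposal).
SECTION_KEYS = ("name", "entropy", "chi2", "virtual_address", "raw_size", "flags")


def to_ecs_pe(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical analysis record onto a subset of the proposed file.pe fields."""
    pe: Dict[str, Any] = {}
    if "timestamp" in record:
        compiled = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
        pe["compile_timestamp"] = compiled.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    if "entry_point" in record:
        # entry_point is proposed as a keyword, so the number is stored as a string.
        pe["entry_point"] = str(record["entry_point"])
    sections = [
        {key: section[key] for key in SECTION_KEYS if key in section}
        for section in record.get("sections", [])
    ]
    if sections:
        pe["sections"] = sections
    return {"file": {"pe": pe}}
```

Unknown keys in the source record are dropped rather than passed through, which keeps the resulting document limited to fields that have a definition.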

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

<!--
Stage 2: Include a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

There should be no breaking changes, deprecation strategies, or significant refactoring, as this extends the existing fieldset.

While likely not a large-scale ECS project, there would be documentation updates needed to explain the new fields.

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns

<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## People

The following are the people who consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter experts

## References

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 1: https://github.com/elastic/ecs/pull/1071

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
216 changes: 216 additions & 0 deletions rfcs/text/0014/pe.yml
@@ -0,0 +1,216 @@
---
- name: pe

  fields:

    - name: icon.hash.dhash
      level: extended
      type: keyword
      description: >
        Difference Hash (dhash) to find files with a visually similar icon or thumbnail.

      example: b806e17c8e330d82

    - name: debug
      level: extended
      type: nested
      description: >
        Debug information, if present

    - name: debug.offset
      level: extended
      type: keyword
      description: Debug offset information.
      example: 1296336

    - name: debug.size
      level: extended
      type: long
      format: bytes
      description: Size of the debug information.
      example: 816

    - name: debug.type
      level: extended
      type: keyword
      description: Information type generated by the debug options.
      example: IMAGE_DEBUG_TYPE_POGO

    - name: debug.timestamp
      level: extended
      type: date
      description: Timestamp of the debug information.
      example: "2020-11-05T17:25:47.000Z"

    - name: imports
      level: extended
      type: flattened
      description: List of all imported functions
      example: '{ "library_name" : "mscoree.dll", "imported_functions" : "GetFileVersionInfoSizeA" }'

    - name: sections
      level: extended
      description: >
        Data about sections of compiled binary PE
      type: nested

    - name: sections.chi2
      level: extended
      description: Chi-square probability distribution.
      type: long
      example: 3027194

    - name: sections.virtual_address
      level: extended
      description: Virtual address available to the file.
      type: long
      format: bytes
      example: 8192

    - name: sections.entropy
      level: extended
      description: Measurement of entropy randomness in the file.
      type: float
      example: 6.24

    - name: sections.flags
      level: extended
      description: Section flags of the file.
      type: keyword
      example: rx

    - name: sections.name
      level: extended
      description: Section names of the file.
      type: keyword
      example: .text, .data

    - name: sections.raw_size
      level: extended
      description: Size of the section or the size of the initialized data on disk.
      type: long
      format: bytes
      example: 198144

    - name: resources
      level: extended
      type: nested
      description: >
        If the PE contains resources, some info about them

    - name: resources.chi2
      level: extended
      description: Chi-square probability distribution.
      type: long
      example: -1

    - name: resources.filetype
      level: extended
      description: File type of the resources section.
      type: keyword
      example: Data

    - name: resources.entropy
      level: extended
      description: Measurement of entropy randomness in the resources section.
      type: long
      example: 0, 1

    - name: resources.sha256
      level: extended
      description: SHA256 hash of resources section.
      type: keyword
      example: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

    - name: resources.language
      level: extended
      description: Language identification.
      type: keyword
      example: "CHINESE SIMPLIFIED"

    - name: resources.type
      level: extended
      type: keyword
      short: List of resource types.
      description: >
        Digest of resource types.
      example: '["RT_VERSION", "RT_MANIFEST"]'
      normalize:
        - array

    - name: exports
      level: extended
      type: keyword
      description: >
        List of symbols exported by PE
      example: '["DllInstall", "DllRegisterServer", "DllUnregisterServer"]'
      normalize:
        - array

    - name: creation_date
      level: extended
      short: Build or compile date.
      description: >
        Extracted when possible from the file's metadata. Indicates when it was
        built or compiled. It can also be faked by malware creators.
      type: date
      example: "2020-11-05T17:25:47.000Z"

    - name: authentihash
      level: extended
      description: >
        Authentihash of the PE file.
      type: keyword
      example: ac9555d914bbb112ecc5f15bb9887ca8371f493ab0941344e976bb8410c8aa78

    - name: compile_timestamp
      level: extended
      description: >
        Compile timestamp of the PE file.
      type: date
      example: "2020-11-05T17:25:47.000Z"

    - name: compiler.name
      level: extended
      type: keyword
      description: >
        Name of the compiler
      example: Clang

    - name: compiler.version
      level: extended
      type: keyword
      description: >
        Version of the compiler.
      example: 11.0.0

    - name: rich_header.hash.md5
      level: extended
      type: keyword
      description: >
        MD5 hash of the header for the PE file.

      example: 5aa1aa0f2b4be70397a1e9e2b87627cd

    - name: entry_point
      level: extended
      description: >
        Relative byte offset to the base of the PE file.
      type: keyword
      example: 25856

    - name: machine_type
      level: extended
      description: >
        Machine type of the PE file.
      type: keyword
      example: "Intel 386 or later, and compatibles"

    - name: packers
      level: extended
      description: >
        List of packers and tools used.
      type: keyword
      example: '["ASPack v2.12", ".NET executable"]'
      normalize:
        - array
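
The dotted field names in the definitions above imply a nested structure under `file.pe`. As a rough sketch of that relationship, the function below turns a list of `name`/`type` pairs into an Elasticsearch-style `properties` tree. It is an illustration only and not the ECS tooling; `build_mapping` and its `root` parameter are hypothetical names, and the field list is assumed to be already loaded into Python dicts.

```python
from typing import Any, Dict, List


def build_mapping(fields: List[Dict[str, str]], root: str = "file.pe") -> Dict[str, Any]:
    """Expand dotted field names into a nested mapping of `properties` objects."""
    mapping: Dict[str, Any] = {"properties": {}}
    for field in fields:
        parts = (root + "." + field["name"]).split(".")
        node = mapping
        # Walk or create every intermediate object on the path to the leaf.
        for part in parts[:-1]:
            node = node["properties"].setdefault(part, {"properties": {}})
            node.setdefault("properties", {})
        leaf = node["properties"].setdefault(parts[-1], {})
        leaf["type"] = field["type"]
    return mapping
```

Parents declared as `nested`, such as `sections`, keep that type and gain a `properties` object for their children, while intermediate names without a definition of their own, such as `icon` and `hash`, become plain objects.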