Double reuse of a schema doesn't succeed #959

Closed
@jonathan-buttner

Description of the problem including expected versus actual behavior:

In ECS version 1.6, the process schema is reused on itself to create the process.parent section. If a custom schema then reuses process, the reused copy does not include the parent fields.
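The report does not identify the cause. One possible explanation is ordering: if process is copied into the custom schema before the self-reuse that creates parent has run, the copy is a snapshot without parent. The toy model below only illustrates that effect. It is not the ECS finalizer code, and `reuse`, `build`, and the step names are hypothetical.

```python
import copy


def reuse(schemas, source, dest, as_name):
    """Copy a snapshot of `source`'s current fields into `dest` under `as_name`."""
    schemas[dest][as_name] = copy.deepcopy(schemas[source])


def build(order):
    """Apply the two reuse steps in the given order and return the result."""
    # Hypothetical, heavily simplified schemas: field name -> type.
    schemas = {"process": {"pid": "long"}, "DoubleReuse": {}}
    steps = {
        # process reused on itself, producing process.parent
        "self": lambda: reuse(schemas, "process", "process", "parent"),
        # process reused into the custom schema, producing DoubleReuse.process
        "custom": lambda: reuse(schemas, "process", "DoubleReuse", "process"),
    }
    for step in order:
        steps[step]()
    return schemas


self_first = build(["self", "custom"])
custom_first = build(["custom", "self"])

print("parent" in self_first["DoubleReuse"]["process"])
print("parent" in custom_first["DoubleReuse"]["process"])
```

In this model the parent fields reach DoubleReuse.process only when the self-reuse step runs first. Whether the actual generator behaves this way would need to be confirmed against its source.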

Steps to reproduce:

Create these files in a directory called test_schema_reuse:

custom_process.yml

---
- name: process
  title: Process
  group: 2
  short: These fields contain information about a process.
  description: >
    These fields contain information about a process.

    These fields can help you correlate metrics information with a process id/name
    from a log message.  The `process.pid` often stays in the metric itself and is
    copied to the global field for correlation.
  reusable:
    top_level: true
    expected:
      - DoubleReuse
  type: group
  fields:
    - name: test_base
      level: custom
      type: keyword
      description: Object for all custom defined fields to live in.

custom_double_reuse.yml

---
- name: DoubleReuse
  title: DoubleReuse
  group: 2
  short: double reuse example.
  description: double reuse example
    
  type: group
  fields:
    - name: process
      level: custom
      type: object
      description: >
        Process.

To make the result easier to inspect, you can short-circuit the generator so that it dumps the finalized fields to ecs.yml and exits:

diff --git a/scripts/generator.py b/scripts/generator.py
index b7ae2a4..3b5140b 100644
--- a/scripts/generator.py
+++ b/scripts/generator.py
@@ -43,6 +43,9 @@ def main():
     fields = loader.load_schemas(ref=args.ref, included_files=args.include)
     cleaner.clean(fields)
     finalizer.finalize(fields)
+    ecs_helpers.yaml_dump('ecs.yml', fields)
+    import sys
+    sys.exit()
     fields = subset_filter.filter(fields, args.subset, out_dir)
     nested, flat = intermediate_files.generate(fields, os.path.join(out_dir, 'ecs'), default_dirs)

Run python scripts/generator.py --include <path to test_schema_reuse> --ref v1.6.0

Examine the output of ecs.yml:

DoubleReuse section of ecs.yml
DoubleReuse:
  field_details:
    dashed_name: DoubleReuse
    description: double reuse example
    flat_name: DoubleReuse
    name: DoubleReuse
    node_name: DoubleReuse
    short: double reuse example.
    type: group
  fields:
    process:
      field_details:
        dashed_name: DoubleReuse-process
        description: 'These fields contain information about a process.

          These fields can help you correlate metrics information with a process id/name
          from a log message.  The `process.pid` often stays in the metric itself
          and is copied to the global field for correlation.'
        flat_name: DoubleReuse.process
        intermediate: true
        name: process
        node_name: process
        original_fieldset: process
        short: These fields contain information about a process.
        type: group
      fields:
        args:
          field_details:
            dashed_name: DoubleReuse-process-args
            description: 'Array of process arguments, starting with the absolute path
              to the executable.

              May be filtered to protect sensitive information.'
            example:
            - /usr/bin/ssh
            - -l
            - user
            - 10.0.0.16
            flat_name: DoubleReuse.process.args
            ignore_above: 1024
            level: extended
            name: args
            node_name: args
            normalize:
            - array
            original_fieldset: process
            short: Array of process arguments.
            type: keyword
        args_count:
          field_details:
            dashed_name: DoubleReuse-process-args-count
            description: 'Length of the process.args array.

              This field can be useful for querying or performing bucket analysis
              on how many arguments were provided to start a process. More arguments
              may be an indication of suspicious activity.'
            example: 4
            flat_name: DoubleReuse.process.args_count
            level: extended
            name: args_count
            node_name: args_count
            normalize: []
            original_fieldset: process
            short: Length of the process.args array.
            type: long
        code_signature:
          field_details:
            dashed_name: DoubleReuse-process-code-signature
            description: These fields contain information about binary code signatures.
            flat_name: DoubleReuse.process.code_signature
            intermediate: true
            name: code_signature
            node_name: code_signature
            original_fieldset: code_signature
            short: These fields contain information about binary code signatures.
            type: group
          fields:
            exists:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-exists
                description: Boolean to capture if a signature is present.
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.exists
                level: core
                name: exists
                node_name: exists
                normalize: []
                original_fieldset: code_signature
                short: Boolean to capture if a signature is present.
                type: boolean
            status:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-status
                description: 'Additional information about the certificate status.

                  This is useful for logging cryptographic errors with the certificate
                  validity or trust status. Leave unpopulated if the validity or trust
                  of the certificate was unchecked.'
                example: ERROR_UNTRUSTED_ROOT
                flat_name: DoubleReuse.process.code_signature.status
                ignore_above: 1024
                level: extended
                name: status
                node_name: status
                normalize: []
                original_fieldset: code_signature
                short: Additional information about the certificate status.
                type: keyword
            subject_name:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-subject-name
                description: Subject name of the code signer
                example: Microsoft Corporation
                flat_name: DoubleReuse.process.code_signature.subject_name
                ignore_above: 1024
                level: core
                name: subject_name
                node_name: subject_name
                normalize: []
                original_fieldset: code_signature
                short: Subject name of the code signer
                type: keyword
            trusted:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-trusted
                description: 'Stores the trust status of the certificate chain.

                  Validating the trust of the certificate chain may be complicated,
                  and this field should only be populated by tools that actively check
                  the status.'
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.trusted
                level: extended
                name: trusted
                node_name: trusted
                normalize: []
                original_fieldset: code_signature
                short: Stores the trust status of the certificate chain.
                type: boolean
            valid:
              field_details:
                dashed_name: DoubleReuse-process-code-signature-valid
                description: 'Boolean to capture if the digital signature is verified
                  against the binary content.

                  Leave unpopulated if a certificate was unchecked.'
                example: 'true'
                flat_name: DoubleReuse.process.code_signature.valid
                level: extended
                name: valid
                node_name: valid
                normalize: []
                original_fieldset: code_signature
                short: Boolean to capture if the digital signature is verified against
                  the binary content.
                type: boolean
        command_line:
          field_details:
            dashed_name: DoubleReuse-process-command-line
            description: 'Full command line that started the process, including the
              absolute path to the executable, and all arguments.

              Some arguments may be filtered to protect sensitive information.'
            example: /usr/bin/ssh -l user 10.0.0.16
            flat_name: DoubleReuse.process.command_line
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.command_line.text
              name: text
              norms: false
              type: text
            name: command_line
            node_name: command_line
            normalize: []
            original_fieldset: process
            short: Full command line that started the process.
            type: keyword
        entity_id:
          field_details:
            dashed_name: DoubleReuse-process-entity-id
            description: 'Unique identifier for the process.

              The implementation of this is specified by the data source, but some
              examples of what could be used here are a process-generated UUID, Sysmon
              Process GUIDs, or a hash of some uniquely identifying components of
              a process.

              Constructing a globally unique identifier is a common practice to mitigate
              PID reuse as well as to identify a specific process over time, across
              multiple monitored hosts.'
            example: c2c455d9f99375d
            flat_name: DoubleReuse.process.entity_id
            ignore_above: 1024
            level: extended
            name: entity_id
            node_name: entity_id
            normalize: []
            original_fieldset: process
            short: Unique identifier for the process.
            type: keyword
        executable:
          field_details:
            dashed_name: DoubleReuse-process-executable
            description: Absolute path to the process executable.
            example: /usr/bin/ssh
            flat_name: DoubleReuse.process.executable
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.executable.text
              name: text
              norms: false
              type: text
            name: executable
            node_name: executable
            normalize: []
            original_fieldset: process
            short: Absolute path to the process executable.
            type: keyword
        exit_code:
          field_details:
            dashed_name: DoubleReuse-process-exit-code
            description: 'The exit code of the process, if this is a termination event.

              The field should be absent if there is no exit code for the event (e.g.
              process start).'
            example: 137
            flat_name: DoubleReuse.process.exit_code
            level: extended
            name: exit_code
            node_name: exit_code
            normalize: []
            original_fieldset: process
            short: The exit code of the process.
            type: long
        hash:
          field_details:
            dashed_name: DoubleReuse-process-hash
            description: 'The hash fields represent different hash algorithms and
              their values.

              Field names for common hashes (e.g. MD5, SHA1) are predefined. Add fields
              for other hashes by lowercasing the hash algorithm name and using underscore
              separators as appropriate (snake case, e.g. sha3_512).'
            flat_name: DoubleReuse.process.hash
            intermediate: true
            name: hash
            node_name: hash
            original_fieldset: hash
            short: Hashes, usually file hashes.
            type: group
          fields:
            md5:
              field_details:
                dashed_name: DoubleReuse-process-hash-md5
                description: MD5 hash.
                flat_name: DoubleReuse.process.hash.md5
                ignore_above: 1024
                level: extended
                name: md5
                node_name: md5
                normalize: []
                original_fieldset: hash
                short: MD5 hash.
                type: keyword
            sha1:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha1
                description: SHA1 hash.
                flat_name: DoubleReuse.process.hash.sha1
                ignore_above: 1024
                level: extended
                name: sha1
                node_name: sha1
                normalize: []
                original_fieldset: hash
                short: SHA1 hash.
                type: keyword
            sha256:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha256
                description: SHA256 hash.
                flat_name: DoubleReuse.process.hash.sha256
                ignore_above: 1024
                level: extended
                name: sha256
                node_name: sha256
                normalize: []
                original_fieldset: hash
                short: SHA256 hash.
                type: keyword
            sha512:
              field_details:
                dashed_name: DoubleReuse-process-hash-sha512
                description: SHA512 hash.
                flat_name: DoubleReuse.process.hash.sha512
                ignore_above: 1024
                level: extended
                name: sha512
                node_name: sha512
                normalize: []
                original_fieldset: hash
                short: SHA512 hash.
                type: keyword
        name:
          field_details:
            dashed_name: DoubleReuse-process-name
            description: 'Process name.

              Sometimes called program name or similar.'
            example: ssh
            flat_name: DoubleReuse.process.name
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.name.text
              name: text
              norms: false
              type: text
            name: name
            node_name: name
            normalize: []
            original_fieldset: process
            short: Process name.
            type: keyword
        pe:
          field_details:
            dashed_name: DoubleReuse-process-pe
            description: These fields contain Windows Portable Executable (PE) metadata.
            flat_name: DoubleReuse.process.pe
            intermediate: true
            name: pe
            node_name: pe
            original_fieldset: pe
            short: These fields contain Windows Portable Executable (PE) metadata.
            type: group
          fields:
            architecture:
              field_details:
                dashed_name: DoubleReuse-process-pe-architecture
                description: CPU architecture target for the file.
                example: x64
                flat_name: DoubleReuse.process.pe.architecture
                ignore_above: 1024
                level: extended
                name: architecture
                node_name: architecture
                normalize: []
                original_fieldset: pe
                short: CPU architecture target for the file.
                type: keyword
            company:
              field_details:
                dashed_name: DoubleReuse-process-pe-company
                description: Internal company name of the file, provided at compile-time.
                example: Microsoft Corporation
                flat_name: DoubleReuse.process.pe.company
                ignore_above: 1024
                level: extended
                name: company
                node_name: company
                normalize: []
                original_fieldset: pe
                short: Internal company name of the file, provided at compile-time.
                type: keyword
            description:
              field_details:
                dashed_name: DoubleReuse-process-pe-description
                description: Internal description of the file, provided at compile-time.
                example: Paint
                flat_name: DoubleReuse.process.pe.description
                ignore_above: 1024
                level: extended
                name: description
                node_name: description
                normalize: []
                original_fieldset: pe
                short: Internal description of the file, provided at compile-time.
                type: keyword
            file_version:
              field_details:
                dashed_name: DoubleReuse-process-pe-file-version
                description: Internal version of the file, provided at compile-time.
                example: 6.3.9600.17415
                flat_name: DoubleReuse.process.pe.file_version
                ignore_above: 1024
                level: extended
                name: file_version
                node_name: file_version
                normalize: []
                original_fieldset: pe
                short: Process name.
                type: keyword
            imphash:
              field_details:
                dashed_name: DoubleReuse-process-pe-imphash
                description: 'A hash of the imports in a PE file. An imphash -- or
                  import hash -- can be used to fingerprint binaries even after recompilation
                  or other code-level transformations have occurred, which would change
                  more traditional hash values.

                  Learn more at https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html.'
                example: 0c6803c4e922103c4dca5963aad36ddf
                flat_name: DoubleReuse.process.pe.imphash
                ignore_above: 1024
                level: extended
                name: imphash
                node_name: imphash
                normalize: []
                original_fieldset: pe
                short: A hash of the imports in a PE file.
                type: keyword
            original_file_name:
              field_details:
                dashed_name: DoubleReuse-process-pe-original-file-name
                description: Internal name of the file, provided at compile-time.
                example: MSPAINT.EXE
                flat_name: DoubleReuse.process.pe.original_file_name
                ignore_above: 1024
                level: extended
                name: original_file_name
                node_name: original_file_name
                normalize: []
                original_fieldset: pe
                short: Internal name of the file, provided at compile-time.
                type: keyword
            product:
              field_details:
                dashed_name: DoubleReuse-process-pe-product
                description: Internal product name of the file, provided at compile-time.
                example: "Microsoft\xAE Windows\xAE Operating System"
                flat_name: DoubleReuse.process.pe.product
                ignore_above: 1024
                level: extended
                name: product
                node_name: product
                normalize: []
                original_fieldset: pe
                short: Internal product name of the file, provided at compile-time.
                type: keyword
        pgid:
          field_details:
            dashed_name: DoubleReuse-process-pgid
            description: Identifier of the group of processes the process belongs
              to.
            flat_name: DoubleReuse.process.pgid
            format: string
            level: extended
            name: pgid
            node_name: pgid
            normalize: []
            original_fieldset: process
            short: Identifier of the group of processes the process belongs to.
            type: long
        pid:
          field_details:
            dashed_name: DoubleReuse-process-pid
            description: Process id.
            example: 4242
            flat_name: DoubleReuse.process.pid
            format: string
            level: core
            name: pid
            node_name: pid
            normalize: []
            original_fieldset: process
            short: Process id.
            type: long
        ppid:
          field_details:
            dashed_name: DoubleReuse-process-ppid
            description: Parent process' pid.
            example: 4241
            flat_name: DoubleReuse.process.ppid
            format: string
            level: extended
            name: ppid
            node_name: ppid
            normalize: []
            original_fieldset: process
            short: Parent process' pid.
            type: long
        start:
          field_details:
            dashed_name: DoubleReuse-process-start
            description: The time the process started.
            example: '2016-05-23T08:05:34.853Z'
            flat_name: DoubleReuse.process.start
            level: extended
            name: start
            node_name: start
            normalize: []
            original_fieldset: process
            short: The time the process started.
            type: date
        test_base:
          field_details:
            dashed_name: DoubleReuse-process-test-base
            description: Object for all custom defined fields to live in.
            flat_name: DoubleReuse.process.test_base
            ignore_above: 1024
            level: custom
            name: test_base
            node_name: test_base
            normalize: []
            original_fieldset: process
            short: Object for all custom defined fields to live in.
            type: keyword
        thread:
          field_details:
            dashed_name: DoubleReuse-process-thread
            flat_name: DoubleReuse.process.thread
            intermediate: true
            name: thread
            node_name: thread
            original_fieldset: process
            type: object
          fields:
            id:
              field_details:
                dashed_name: DoubleReuse-process-thread-id
                description: Thread ID.
                example: 4242
                flat_name: DoubleReuse.process.thread.id
                format: string
                level: extended
                name: thread.id
                node_name: id
                normalize: []
                original_fieldset: process
                short: Thread ID.
                type: long
            name:
              field_details:
                dashed_name: DoubleReuse-process-thread-name
                description: Thread name.
                example: thread-0
                flat_name: DoubleReuse.process.thread.name
                ignore_above: 1024
                level: extended
                name: thread.name
                node_name: name
                normalize: []
                original_fieldset: process
                short: Thread name.
                type: keyword
        title:
          field_details:
            dashed_name: DoubleReuse-process-title
            description: 'Process title.

              The proctitle, some times the same as process name. Can also be different:
              for example a browser setting its title to the web page currently opened.'
            flat_name: DoubleReuse.process.title
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.title.text
              name: text
              norms: false
              type: text
            name: title
            node_name: title
            normalize: []
            original_fieldset: process
            short: Process title.
            type: keyword
        uptime:
          field_details:
            dashed_name: DoubleReuse-process-uptime
            description: Seconds the process has been up.
            example: 1325
            flat_name: DoubleReuse.process.uptime
            level: extended
            name: uptime
            node_name: uptime
            normalize: []
            original_fieldset: process
            short: Seconds the process has been up.
            type: long
        working_directory:
          field_details:
            dashed_name: DoubleReuse-process-working-directory
            description: The working directory of the process.
            example: /home/alice
            flat_name: DoubleReuse.process.working_directory
            ignore_above: 1024
            level: extended
            multi_fields:
            - flat_name: DoubleReuse.process.working_directory.text
              name: text
              norms: false
              type: text
            name: working_directory
            node_name: working_directory
            normalize: []
            original_fieldset: process
            short: The working directory of the process.
            type: keyword
  schema_details:
    group: 2
    nestings:
    - DoubleReuse.process
    prefix: DoubleReuse.
    reused_here:
    - full: DoubleReuse.process
      schema_name: process
      short: These fields contain information about a process.
    root: false
    title: DoubleReuse

Notice that no DoubleReuse-process-parent entry exists in the ecs.yml file, although the custom field DoubleReuse-process-test-base is present.
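This check can be scripted rather than done by eye. The sketch below assumes the nested layout shown in the dump, where each node may carry a `fields` mapping of child nodes. `has_field` and the trimmed `sample` tree are hypothetical and are not part of the ECS tooling.

```python
def has_field(tree, dotted_path):
    """Return True if `dotted_path` resolves to a node in the nested dump."""
    parts = dotted_path.split(".")
    node = tree.get(parts[0])
    for part in parts[1:]:
        if not isinstance(node, dict):
            return False
        # Children live under the "fields" key of each node.
        node = node.get("fields", {}).get(part)
    return isinstance(node, dict)


# Trimmed stand-in for the DoubleReuse section of ecs.yml.
sample = {
    "DoubleReuse": {
        "field_details": {"name": "DoubleReuse", "type": "group"},
        "fields": {
            "process": {
                "field_details": {"name": "process", "type": "group"},
                "fields": {
                    "pid": {"field_details": {"type": "long"}},
                    "test_base": {"field_details": {"type": "keyword"}},
                },
            }
        },
    }
}

print(has_field(sample, "DoubleReuse.process.test_base"))
print(has_field(sample, "DoubleReuse.process.parent"))
```

To run it against the real output, the `sample` dictionary could be replaced with the parsed contents of ecs.yml, for example loaded with PyYAML's `yaml.safe_load`.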

The endpoint team relies on the ability to reuse process, including its parent fields, in a custom schema for malware: https://github.com/elastic/endpoint-package/blob/master/custom_schemas/custom_process.yml#L15

This works with ECS version 1.5 because there the parent fields were defined manually rather than created through reuse.

Labels: 1.6.0, bug