Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support validation ECS style dataset format #520

Merged
merged 3 commits into from
Jun 18, 2020

Conversation

jonathan-buttner
Copy link
Contributor

Once the dataset fields are included in ecs the yaml file will look like this:

---
- name: dataset
  title: dataset
  group: 2
  short: Fields describing the new indexing strategy
  description: >
    Fields describing the new indexing strategy <type>-<name>-<namespace>
  type: group
  fields:
    - name: type
      level: custom
      type: constant_keyword
      description: >
        Dataset type.

    - name: name
      level: custom
      type: constant_keyword
      description: >
        Dataset name.

    - name: namespace
      level: custom
      type: constant_keyword
      description: >
        Dataset namespace.

This results in the following fields.yml:

- name: dataset
  title: dataset
  group: 2
  description: Fields describing the new indexing strategy <type>-<name>-<namespace>
  type: group
  fields:
  - name: name
    level: custom
    type: constant_keyword
    description: Dataset name.
    default_field: false
  - name: namespace
    level: custom
    type: constant_keyword
    description: Dataset namespace.
    default_field: false
  - name: type
    level: custom
    type: constant_keyword
    description: Dataset type.
    default_field: false
- name: destination
  title: Destination
  group: 2
  description: 'Destination fields describe details about the destination of a packet/event.

    Destination fields are usually populated in conjunction with source fields.'
  type: group
  fields:
  - name: address
<more fields>

This will fail validation because we're searching for dataset.*.

This PR adds support for splitting the dataset.* string and looking inside the map. It finds name: dataset first and then pulls out the array fields and looks for the last part of the string (e.g. namespace or type).

This implementation is suupeeeer slow. So if you have better ideas that'd be great.

Basically I'd like to define the dataset.* fields in ecs style schema instead of using:

- name: dataset.type
  type: constant_keyword
  description: >
    Dataset type.
- name: dataset.name
  type: constant_keyword
  description: >
    Dataset name.
- name: dataset.namespace
  type: constant_keyword
  description: >
    Dataset namespace.

It doesn't look like this line: https://github.com/elastic/package-registry/blob/master/util/dataset.go#L282

Is flattening it like the way I hoped. It still results in:

{
  name: dataset
  fields: [
    {name: name, ...}, {name: type, ...}, {name: namespace, ...}
  ]
}

@jonathan-buttner jonathan-buttner changed the title Searching for dataset fields in ecs style definition Support validation ECS style dataset format Jun 18, 2020
@elasticmachine
Copy link

elasticmachine commented Jun 18, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #520 updated]

  • Start Time: 2020-06-18T14:18:36.777+0000

  • Duration: 9 min 25 sec

Test stats 🧪

Test Results
Failed 0
Passed 105
Skipped 0
Total 105

@mtojek
Copy link
Contributor

mtojek commented Jun 18, 2020

Thank you for your contribution. I think I managed to simplify your code. Indeed, the issue is around flattening data.

Please take a look and share your thoughts on this: #521

@mtojek
Copy link
Contributor

mtojek commented Jun 18, 2020

Resolving in favor of #521

@mtojek mtojek closed this Jun 18, 2020
@mtojek mtojek reopened this Jun 18, 2020
@mtojek
Copy link
Contributor

mtojek commented Jun 18, 2020

The PR #521 doesn't properly support nested arrays while flattenning, hence reverting: #527

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a solution that's really shorter than this. Feel free to merge this change if the CI goes green.

It would be good to add an entry to the CHANGELOG ;)

@ruflin
Copy link
Member

ruflin commented Jun 18, 2020

To test your changes, I assume you could update one of the testdata packages with the new format and if it still works, all should be good: https://github.com/elastic/package-registry/blob/master/testdata/package/default_pipeline/0.0.2/dataset/foo/fields/base-fields.yml

@jonathan-buttner jonathan-buttner merged commit 8a32f33 into elastic:master Jun 18, 2020
@jonathan-buttner
Copy link
Contributor Author

To test your changes, I assume you could update one of the testdata packages with the new format and if it still works, all should be good: https://github.com/elastic/package-registry/blob/master/testdata/package/default_pipeline/0.0.2/dataset/foo/fields/base-fields.yml

Oh sorry just saw this after I merged haha. I'll open up another PR to update the tests.

@jonathan-buttner jonathan-buttner deleted the ecs-style-dataset branch June 18, 2020 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants