Support input groups #137

mtojek · 2021-02-19T13:29:05Z

What does this PR do?

This PR introduces policy groups.

Package preview: https://github.com/mtojek/package-spec/tree/132-support-policy-groups/test/packages/input_groups
Don't focus on the policy_groups name, it's a working name for the test package, in reality it will be aws, nginx, etc.

Why is it important?

We need to ensure better granularity for policies, e.g. for the AWS integration.

Checklist

I have added test packages to test/packages that prove my change is effective.
I have added an entry in versions/N/changelog.yml.

Related issues

[Discuss] Proposal: support integration sets #132

elasticmachine · 2021-02-19T13:31:42Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Build Cause: Pull request #137 updated
Start Time: 2021-03-01T14:18:00.501+0000
Duration: 2 min 18 sec
Commit: 1d17ee3

Trends 🧪

test/packages/policy_groups/manifest.yml

sorantis · 2021-02-19T14:24:52Z

I think we're getting close! We also would like each policy set to be part of one or more categories, i.e. if the user browses the Database category, they should be able to see DynamoDb tile too, so it should be possible to define categories for each policy set. E.g.

policy_templates:
  - name: dynamodb
    title: AWS DynamoDB
    description: Collect logs and metrics from DynamoDB service
    categories: [AWS, Cloud, Database]

mtojek · 2021-02-19T14:34:03Z

I think we're getting close! We also would like each policy set to be part of one or more categories, i.e. if the user browses the Database category, they should be able to see DynamoDb tile too, so it should be possible to define categories for each policy set. E.g.

I think we can do this. It would require also a slight modification in the package-registry to cover this change. We have a predefined set of categories that are supported: https://github.com/elastic/package-spec/blob/master/versions/1/manifest.spec.yml#L61 (enum).

sorantis · 2021-02-19T14:35:34Z

@mtojek who's curating this list?

mtojek · 2021-02-19T14:41:38Z

Here is a thread in which we considered introducing another category: elastic/package-registry#671 (comment)

You can see all people taking part.

mtojek · 2021-02-19T15:33:55Z

I think I've covered the entire proposal with schema, marking it as ready for review.

jen-huang · 2021-02-19T16:48:11Z

Tagging myself to do some exploratory testing and requirements gathering on Kibana side. Thanks for putting the changes in PR, this makes it much easier.

test/packages/policy_groups/manifest.yml

kaiyan-sheng · 2021-02-19T19:47:30Z

@mtojek Will we still have a data_stream directory at the top level with docs, img and manifest.yml?

ycombinator · 2021-02-19T20:51:28Z

test/packages/policy_groups/manifest.yml

+  - name: secret_access_key
+    type: text
+    title: Secret Access Key
+policy_templates:


@mtojek is this the array from which Kibana only shows the first element today? Or is that something different?

Or maybe @jen-huang can weigh in on my question since she's already watching this PR. 🙂

yes, this is what Kibana uses the first element of today

ycombinator · 2021-02-19T20:57:54Z

Will we still have a data_stream directory at the top level with docs, img and manifest.yml?

I have the same question. I think we still need this otherwise where else would the ingest pipelines, field definitions, etc. for each data stream be stored? @mtojek if you agree, could you add a couple of data stream folders to this PR please so we can see the fuller picture of the package? Thanks!

ruflin · 2021-02-22T12:38:21Z

test/packages/input_groups/manifest.yml

+    title: Barracuda logs
+    description: Collect Barracuda logs from syslog or a file.
+    inputs:
+      - type: udp


Do we pull in here the additional vars from the data_stream manifest.yml? How do we know which one to select? What if there are multiple data_streams which configure udp? I remember I asked the question below but don't remember the answer :-(

Do we pull in here the additional vars from the data_stream manifest.yml?

I think so.

How do we know which one to select? What if there are multiple data_streams which configure udp?

I don't reckon if we came up with a solution. Do you think we should introduce a "policy_templates" field to data stream manifest or do you see other options? We can also introduce another "data streams" field to policy template.

BTW I understand that such problem already exists, it's just not harmful because all data streams are installed, right?

Yes, my understanding is also that such a problem (or is it a feature? 😎) already exists. I'm wary of changing this behavior, unless we can keep the default behavior the same as it is today.

What do you think about putting this is in a separate PR if this became an issue? We know that it can be solved with an additional field - policy_templates in data_streams.

I'm afraid it (specifying it in the policy template ) would violate the open-closed principle, but we can turn a blind eye here. Let me try to introduce it.

Just a proposal, happy to discuss other solutions :-)

I added the data_streams property. Actually I'm fine with both solutions, I wouldn't say it's a hard to understand concept :)

I'm good with the proposed solution with two comments:

Default behavior: not specifying data_streams in a policy template needs to be interpreted as if all data streams have been specified. This would guarantee backwards compatibility. I assume such default behavior logic will need to go in the Fleet UI code?

Referential integrity: We'll need a way to validate that data streams specified under data_streams actually exist in the package.

Neither comment above blocks this PR; just things to address in follow up PRs (EDIT: maybe make issues for these once this PR is approved).

@ycombinator do you have any preference where data_streams proporty should be placed, either in policy_template or input? For reference: ed04c0f#commitcomment-47498856

ycombinator · 2021-02-22T19:45:24Z

I think each entry in policy_templates should be a tile. I.e. if the user searches for "aws", the search results should include all supported AWS services, even the ones from a different package, like AWS Fargate.

So we will need to decide (most likely outside the scope of this PR) whether the package registry's search API response represents a list of packages (status quo) or a list of packages + policy templates.

Also, I'm not sure how to make this part work under the current proposal:

... even the ones from a different package, like AWS Fargate.

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

mtojek · 2021-02-23T09:19:05Z

I think we should move towards the final decision about this pull request. Apart from:

category vs subcategories for policy templates
hardcoded input groups

do you see any other concerns around this PR?

sorantis · 2021-02-23T10:18:31Z

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

I guess it's due to the different configuration options and API endpoints, but @kaiyan-sheng will know more.

I think we should move towards the final decision about this pull request. Apart from:

category vs subcategories for policy templates

hardcoded input groups

do you see any other concerns around this PR?

I would avoid hardcoding input groups, one of the use cases we had in mind is to distinguish between operational and security logs (for setting different retentions). If we hardcode the input groups now, it will be hard for us to have this flexibility.

mtojek · 2021-02-23T10:25:36Z

I would avoid hardcoding input groups, one of the use cases we had in mind is to distinguish between operational and security logs (for setting different retentions). If we hardcode the input groups now, it will be hard for us to have this flexibility.

I'm not sure if this PR assumes retention adjustments. It seems that input groups are more like UI tabs. Data retention is related rather to data stream configuration. Do we want to include retention settings in this PR or follow ups when necessary? I would vote for iterative approach, keep in mind that the first candidate is the AWS integration.

sorantis · 2021-02-23T10:32:00Z

@mtojek fair enough. In my mind input groups serve two purposes: 1) separate config options on tabs; 2) set different retentions at the input group level. The question is then how would the user set different retentions for operational vs. security logs, if it's not the input groups that define such separation? I think if don't hardcode input groups it will be easier for the UI to reuse them for setting retention policies.

mtojek · 2021-02-23T10:55:31Z

I think if don't hardcode input groups it will be easier for the UI to reuse them for setting retention policies.

I don't have a strong opinion on that, I suppose it's rather a UI design issue (@ruflin and @jen-huang can speak for Kibana). You brought here "operational logs" and "security logs", which means that it already violates (or extends?) the default split (logs-metrics-checks). If this is the approach we'd like to follow, let's stick to user defined input groups.

kaiyan-sheng · 2021-02-23T23:58:04Z

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

Yep @sorantis is right. In order to collect metrics from awsfargate, you have to deploy metricbeat/agent inside the same task as the ECS containers you want to monitor. It does not require AWS credentials (access key, secret key).

ycombinator · 2021-02-24T00:17:21Z

Yep @sorantis is right. In order to collect metrics from awsfargate, you have to deploy metricbeat/agent inside the same task as the ECS containers you want to monitor. It does not require AWS credentials (access key, secret key).

Thanks @kaiyan-sheng.

Thinking about it some more, I think the proposal in this PR is unrelated to the italicized part of the following use case:

I think each entry in policy_templates should be a tile. I.e. if the user searches for "aws", the search results should include all supported AWS services, even the ones from a different package, like AWS Fargate.

The proposal in this PR will cover the non-italicized part.

As a follow up we can figure out how to make the Package Registry's search API return the aws package, each of its policy_templates, and other related packages like AWS Fargate when the user searches for the aws package. The solution might involve relaxing the matching strictness in the Package Registry's search API code OR introducing additional search keywords at the package / policy_templates level OR something else. Either way, I'm good with deferring this discussion to a follow up issue/PR.

My main concern was that nothing being introduced in this PR would later get in the way of solving for the italicized part of the above use case, and I think we are good on that front.

mtojek · 2021-02-25T16:13:32Z

We had an online discussion about left points and I think we came to the agreement:

input groups stay as is
a policy template defines additional property data_streams to select particular data streams that are compatible with the policy template
top categories are common, categories on the policy template level extend them

@ycombinator @ruflin @kaiyan-sheng please approve/comment on the PR

@jen-huang please double check if the PR is fine from Kibana dev perspective and also approve/comment here

ycombinator

LGTM. Really nice work iterating on this proposal and driving consensus!

I know this proposal will impact kibana and package-registry code at minimum. I think it will also need a follow up PR in package-spec to enforce referential integrity between data_streams mentioned in policy template items and folders under <package root>/data_stream/. I don't think it affects elastic-package code (but I could be wrong about that). And, of course, eventually we'll want to introduce the enhancements enabled by this proposal to packages in the integrations repo.

So right before we merge this PR, could we create a checklist in #132 that will capture each of the above tasks in an ordered fashion (topological sort on the dependency graph)? This will ensure that we can safely rollout this change and have a single place to track progress on the rollout. Some examples of such rollout checklists can be found in #106 and #17.

sorantis

Great to see this moving closer to the finish line! Thank you all for your involvement.

Agree with @ycombinator on creating a list of changes that need to be covered by related components. From the priority perspective, I'd like the focus be on Phase 1 described in the doc.

mtojek · 2021-03-01T15:25:48Z

@jen-huang could you please double check if these spec changes are fine for Kibana?

jen-huang

The changes look good to me from a Kibana perspective.

I would like to do more thorough validation by testing it with a local Kibana instance but that would need package registry changes so that the new fields are served through the package info endpoint. I saw that the registry changes item calls for checking with AWS package changes, but if it would be faster to get a draft PR up with a test package (even this nice input_groups package done here 😉), that would be super helpful.

And just to double check: we expect Kibana to be backwards compatible for packages without input groups too, right?

mtojek · 2021-03-02T09:27:46Z

And just to double check: we expect Kibana to be backwards compatible for packages without input groups too, right?

Right, I would say this is an extension to current packages.

I saw that the registry changes item calls for checking with AWS package changes, but if it would be faster to get a draft PR up with a test package (even this nice

Once we merge this PR, I will push support for this change to the elastic/integrations.

Support policy groups

07e5d63

mtojek self-assigned this Feb 19, 2021

mtojek requested review from ycombinator, ruflin, sorantis and kaiyan-sheng February 19, 2021 13:31

mtojek mentioned this pull request Feb 19, 2021

[Discuss] Proposal: support integration sets #132

Closed

Temporary fix

3231fbd

sorantis reviewed Feb 19, 2021

View reviewed changes

test/packages/policy_groups/manifest.yml Outdated Show resolved Hide resolved

Just defines logs and metrics

0ca1336

mtojek force-pushed the 132-support-policy-groups branch from 6d627a1 to 0ca1336 Compare February 19, 2021 14:12

Define categories of policy templates

a51d6c1

mtojek added 3 commits February 19, 2021 16:23

Define spec for policy groups feature

8c774d2

Add changelog entry

c15189c

Define other docs template

5eec7b0

mtojek marked this pull request as ready for review February 19, 2021 15:32

Merge branch 'master' into 132-support-policy-groups

a77e479

jen-huang self-requested a review February 19, 2021 16:46

sorantis reviewed Feb 19, 2021

View reviewed changes

test/packages/policy_groups/manifest.yml Outdated Show resolved Hide resolved

kaiyan-sheng reviewed Feb 19, 2021

View reviewed changes

test/packages/policy_groups/manifest.yml Outdated Show resolved Hide resolved

ycombinator reviewed Feb 19, 2021

View reviewed changes

Use input groups

accac5a

ruflin reviewed Feb 22, 2021

View reviewed changes

Add var property to barracuda

e9e7337

Merge branch 'master' into 132-support-policy-groups

b1aff4e

Add data_streams property

ed04c0f

mtojek requested review from ruflin, kaiyan-sheng and sorantis February 25, 2021 16:10

ycombinator approved these changes Feb 26, 2021

View reviewed changes

Merge branch 'master' into 132-support-policy-groups

1d17ee3

sorantis approved these changes Mar 1, 2021

View reviewed changes

jen-huang approved these changes Mar 2, 2021

View reviewed changes

mtojek merged commit 28bc03f into elastic:master Mar 2, 2021

mtojek changed the title ~~Support policy groups~~ Support input groups Mar 2, 2021

mtojek mentioned this pull request Mar 2, 2021

Meta: enable input groups #144

Closed

8 tasks

ycombinator mentioned this pull request Mar 2, 2021

[Change Proposal] support specifiying alternate data_stream.dataset #145

Open

rw-access pushed a commit to rw-access/package-spec that referenced this pull request Mar 23, 2021

Move context close to headings (elastic#137)

b5ff5d3

jen-huang mentioned this pull request Mar 30, 2021

[discuss] Limited packages & multiple policy templates #155

Open

mtojek mentioned this pull request Feb 15, 2022

/package_policies: package vars should be on input level elastic/kibana#125625

Closed

Support input groups #137

Support input groups #137

Conversation

mtojek commented Feb 19, 2021 • edited Loading

What does this PR do?

Why is it important?

Checklist

Related issues

elasticmachine commented Feb 19, 2021 • edited Loading

💚 Build Succeeded

Build stats

Trends 🧪

sorantis commented Feb 19, 2021

mtojek commented Feb 19, 2021

sorantis commented Feb 19, 2021

mtojek commented Feb 19, 2021

mtojek commented Feb 19, 2021

jen-huang commented Feb 19, 2021

kaiyan-sheng commented Feb 19, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ycombinator commented Feb 19, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mtojek Feb 22, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mtojek Feb 24, 2021 • edited Loading

Choose a reason for hiding this comment

ycombinator Feb 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ycombinator commented Feb 22, 2021

mtojek commented Feb 23, 2021

sorantis commented Feb 23, 2021

mtojek commented Feb 23, 2021

sorantis commented Feb 23, 2021

mtojek commented Feb 23, 2021

kaiyan-sheng commented Feb 23, 2021

ycombinator commented Feb 24, 2021

mtojek commented Feb 25, 2021 • edited Loading

ycombinator left a comment • edited Loading

Choose a reason for hiding this comment

sorantis left a comment

Choose a reason for hiding this comment

mtojek commented Mar 1, 2021

jen-huang left a comment

Choose a reason for hiding this comment

mtojek commented Mar 2, 2021 • edited Loading

mtojek commented Feb 19, 2021 •

edited

Loading

elasticmachine commented Feb 19, 2021 •

edited

Loading

mtojek Feb 22, 2021 •

edited

Loading

mtojek Feb 24, 2021 •

edited

Loading

ycombinator Feb 24, 2021 •

edited

Loading

mtojek commented Feb 25, 2021 •

edited

Loading

ycombinator left a comment •

edited

Loading

mtojek commented Mar 2, 2021 •

edited

Loading