Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support input groups #137

Merged
merged 17 commits into from
Mar 2, 2021
Merged

Conversation

mtojek
Copy link
Contributor

@mtojek mtojek commented Feb 19, 2021

What does this PR do?

This PR introduces policy groups.

Package preview: https://github.com/mtojek/package-spec/tree/132-support-policy-groups/test/packages/input_groups
Don't focus on the policy_groups name, it's a working name for the test package, in reality it will be aws, nginx, etc.

Why is it important?

We need to ensure better granularity for policies, e.g. for the AWS integration.

Checklist

Related issues

@mtojek mtojek self-assigned this Feb 19, 2021
@elasticmachine
Copy link

elasticmachine commented Feb 19, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #137 updated

  • Start Time: 2021-03-01T14:18:00.501+0000

  • Duration: 2 min 18 sec

  • Commit: 1d17ee3

Trends 🧪

Image of Build Times

@sorantis
Copy link

I think we're getting close! We also would like each policy set to be part of one or more categories, i.e. if the user browses the Database category, they should be able to see DynamoDb tile too, so it should be possible to define categories for each policy set. E.g.

policy_templates:
  - name: dynamodb
    title: AWS DynamoDB
    description: Collect logs and metrics from DynamoDB service
    categories: [AWS, Cloud, Database]

@mtojek
Copy link
Contributor Author

mtojek commented Feb 19, 2021

I think we're getting close! We also would like each policy set to be part of one or more categories, i.e. if the user browses the Database category, they should be able to see DynamoDb tile too, so it should be possible to define categories for each policy set. E.g.

I think we can do this. It would require also a slight modification in the package-registry to cover this change. We have a predefined set of categories that are supported: https://github.com/elastic/package-spec/blob/master/versions/1/manifest.spec.yml#L61 (enum).

@sorantis
Copy link

@mtojek who's curating this list?

@mtojek
Copy link
Contributor Author

mtojek commented Feb 19, 2021

Here is a thread in which we considered introducing another category: elastic/package-registry#671 (comment)

You can see all people taking part.

@mtojek mtojek marked this pull request as ready for review February 19, 2021 15:32
@mtojek
Copy link
Contributor Author

mtojek commented Feb 19, 2021

I think I've covered the entire proposal with schema, marking it as ready for review.

@jen-huang jen-huang self-requested a review February 19, 2021 16:46
@jen-huang
Copy link
Contributor

Tagging myself to do some exploratory testing and requirements gathering on Kibana side. Thanks for putting the changes in PR, this makes it much easier.

@kaiyan-sheng
Copy link
Contributor

@mtojek Will we still have a data_stream directory at the top level with docs, img and manifest.yml?

- name: secret_access_key
type: text
title: Secret Access Key
policy_templates:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek is this the array from which Kibana only shows the first element today? Or is that something different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe @jen-huang can weigh in on my question since she's already watching this PR. 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is what Kibana uses the first element of today

@ycombinator
Copy link
Contributor

Will we still have a data_stream directory at the top level with docs, img and manifest.yml?

I have the same question. I think we still need this otherwise where else would the ingest pipelines, field definitions, etc. for each data stream be stored? @mtojek if you agree, could you add a couple of data stream folders to this PR please so we can see the fuller picture of the package? Thanks!

title: Barracuda logs
description: Collect Barracuda logs from syslog or a file.
inputs:
- type: udp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we pull in here the additional vars from the data_stream manifest.yml? How do we know which one to select? What if there are multiple data_streams which configure udp? I remember I asked the question below but don't remember the answer :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we pull in here the additional vars from the data_stream manifest.yml?

I think so.

Copy link
Contributor Author

@mtojek mtojek Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know which one to select? What if there are multiple data_streams which configure udp?

I don't reckon if we came up with a solution. Do you think we should introduce a "policy_templates" field to data stream manifest or do you see other options? We can also introduce another "data streams" field to policy template.

BTW I understand that such problem already exists, it's just not harmful because all data streams are installed, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my understanding is also that such a problem (or is it a feature? 😎) already exists. I'm wary of changing this behavior, unless we can keep the default behavior the same as it is today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about putting this is in a separate PR if this became an issue? We know that it can be solved with an additional field - policy_templates in data_streams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it (specifying it in the policy template ) would violate the open-closed principle, but we can turn a blind eye here. Let me try to introduce it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a proposal, happy to discuss other solutions :-)

Copy link
Contributor Author

@mtojek mtojek Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the data_streams property. Actually I'm fine with both solutions, I wouldn't say it's a hard to understand concept :)

Copy link
Contributor

@ycombinator ycombinator Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the proposed solution with two comments:

  1. Default behavior: not specifying data_streams in a policy template needs to be interpreted as if all data streams have been specified. This would guarantee backwards compatibility. I assume such default behavior logic will need to go in the Fleet UI code?

  2. Referential integrity: We'll need a way to validate that data streams specified under data_streams actually exist in the package.

Neither comment above blocks this PR; just things to address in follow up PRs (EDIT: maybe make issues for these once this PR is approved).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator do you have any preference where data_streams proporty should be placed, either in policy_template or input? For reference: ed04c0f#commitcomment-47498856

@ycombinator
Copy link
Contributor

I think each entry in policy_templates should be a tile. I.e. if the user searches for "aws", the search results should include all supported AWS services, even the ones from a different package, like AWS Fargate.

So we will need to decide (most likely outside the scope of this PR) whether the package registry's search API response represents a list of packages (status quo) or a list of packages + policy templates.

Also, I'm not sure how to make this part work under the current proposal:

... even the ones from a different package, like AWS Fargate.

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

@mtojek
Copy link
Contributor Author

mtojek commented Feb 23, 2021

I think we should move towards the final decision about this pull request. Apart from:

  • category vs subcategories for policy templates
  • hardcoded input groups

do you see any other concerns around this PR?

@sorantis
Copy link

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

I guess it's due to the different configuration options and API endpoints, but @kaiyan-sheng will know more.

I think we should move towards the final decision about this pull request. Apart from:

  • category vs subcategories for policy templates
  • hardcoded input groups

do you see any other concerns around this PR?

I would avoid hardcoding input groups, one of the use cases we had in mind is to distinguish between operational and security logs (for setting different retentions). If we hardcode the input groups now, it will be hard for us to have this flexibility.

@mtojek
Copy link
Contributor Author

mtojek commented Feb 23, 2021

I would avoid hardcoding input groups, one of the use cases we had in mind is to distinguish between operational and security logs (for setting different retentions). If we hardcode the input groups now, it will be hard for us to have this flexibility.

I'm not sure if this PR assumes retention adjustments. It seems that input groups are more like UI tabs. Data retention is related rather to data stream configuration. Do we want to include retention settings in this PR or follow ups when necessary? I would vote for iterative approach, keep in mind that the first candidate is the AWS integration.

@sorantis
Copy link

@mtojek fair enough. In my mind input groups serve two purposes: 1) separate config options on tabs; 2) set different retentions at the input group level. The question is then how would the user set different retentions for operational vs. security logs, if it's not the input groups that define such separation? I think if don't hardcode input groups it will be easier for the UI to reuse them for setting retention policies.

@mtojek
Copy link
Contributor Author

mtojek commented Feb 23, 2021

I think if don't hardcode input groups it will be easier for the UI to reuse them for setting retention policies.

I don't have a strong opinion on that, I suppose it's rather a UI design issue (@ruflin and @jen-huang can speak for Kibana). You brought here "operational logs" and "security logs", which means that it already violates (or extends?) the default split (logs-metrics-checks). If this is the approach we'd like to follow, let's stick to user defined input groups.

@kaiyan-sheng
Copy link
Contributor

Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the aws package?

Yep @sorantis is right. In order to collect metrics from awsfargate, you have to deploy metricbeat/agent inside the same task as the ECS containers you want to monitor. It does not require AWS credentials (access key, secret key).

@ycombinator
Copy link
Contributor

Yep @sorantis is right. In order to collect metrics from awsfargate, you have to deploy metricbeat/agent inside the same task as the ECS containers you want to monitor. It does not require AWS credentials (access key, secret key).

Thanks @kaiyan-sheng.

Thinking about it some more, I think the proposal in this PR is unrelated to the italicized part of the following use case:

I think each entry in policy_templates should be a tile. I.e. if the user searches for "aws", the search results should include all supported AWS services, even the ones from a different package, like AWS Fargate.

The proposal in this PR will cover the non-italicized part.

As a follow up we can figure out how to make the Package Registry's search API return the aws package, each of its policy_templates, and other related packages like AWS Fargate when the user searches for the aws package. The solution might involve relaxing the matching strictness in the Package Registry's search API code OR introducing additional search keywords at the package / policy_templates level OR something else. Either way, I'm good with deferring this discussion to a follow up issue/PR.

My main concern was that nothing being introduced in this PR would later get in the way of solving for the italicized part of the above use case, and I think we are good on that front.

@mtojek
Copy link
Contributor Author

mtojek commented Feb 25, 2021

We had an online discussion about left points and I think we came to the agreement:

  • input groups stay as is
  • a policy template defines additional property data_streams to select particular data streams that are compatible with the policy template
  • top categories are common, categories on the policy template level extend them

@ycombinator @ruflin @kaiyan-sheng please approve/comment on the PR

@jen-huang please double check if the PR is fine from Kibana dev perspective and also approve/comment here

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Really nice work iterating on this proposal and driving consensus!

I know this proposal will impact kibana and package-registry code at minimum. I think it will also need a follow up PR in package-spec to enforce referential integrity between data_streams mentioned in policy template items and folders under <package root>/data_stream/. I don't think it affects elastic-package code (but I could be wrong about that). And, of course, eventually we'll want to introduce the enhancements enabled by this proposal to packages in the integrations repo.

So right before we merge this PR, could we create a checklist in #132 that will capture each of the above tasks in an ordered fashion (topological sort on the dependency graph)? This will ensure that we can safely rollout this change and have a single place to track progress on the rollout. Some examples of such rollout checklists can be found in #106 and #17.

Copy link

@sorantis sorantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this moving closer to the finish line! Thank you all for your involvement.

Agree with @ycombinator on creating a list of changes that need to be covered by related components. From the priority perspective, I'd like the focus be on Phase 1 described in the doc.

@mtojek
Copy link
Contributor Author

mtojek commented Mar 1, 2021

@jen-huang could you please double check if these spec changes are fine for Kibana?

Copy link
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me from a Kibana perspective.

I would like to do more thorough validation by testing it with a local Kibana instance but that would need package registry changes so that the new fields are served through the package info endpoint. I saw that the registry changes item calls for checking with AWS package changes, but if it would be faster to get a draft PR up with a test package (even this nice input_groups package done here 😉), that would be super helpful.

And just to double check: we expect Kibana to be backwards compatible for packages without input groups too, right?

@mtojek
Copy link
Contributor Author

mtojek commented Mar 2, 2021

And just to double check: we expect Kibana to be backwards compatible for packages without input groups too, right?

Right, I would say this is an extension to current packages.

I saw that the registry changes item calls for checking with AWS package changes, but if it would be faster to get a draft PR up with a test package (even this nice

Once we merge this PR, I will push support for this change to the elastic/integrations.

@mtojek mtojek merged commit 28bc03f into elastic:master Mar 2, 2021
@mtojek mtojek changed the title Support policy groups Support input groups Mar 2, 2021
@mtojek mtojek mentioned this pull request Mar 2, 2021
8 tasks
rw-access pushed a commit to rw-access/package-spec that referenced this pull request Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants