-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support input groups #137
Support input groups #137
Conversation
💚 Build Succeeded
Expand to view the summary
Build stats
Trends 🧪 |
6d627a1
to
0ca1336
Compare
I think we're getting close! We also would like each policy set to be part of one or more categories, i.e. if the user browses the Database category, they should be able to see DynamoDb tile too, so it should be possible to define categories for each policy set. E.g. policy_templates:
- name: dynamodb
title: AWS DynamoDB
description: Collect logs and metrics from DynamoDB service
categories: [AWS, Cloud, Database] |
I think we can do this. It would require also a slight modification in the package-registry to cover this change. We have a predefined set of categories that are supported: https://github.com/elastic/package-spec/blob/master/versions/1/manifest.spec.yml#L61 (enum). |
@mtojek who's curating this list? |
Here is a thread in which we considered introducing another category: elastic/package-registry#671 (comment) You can see all people taking part. |
I think I've covered the entire proposal with schema, marking it as ready for review. |
Tagging myself to do some exploratory testing and requirements gathering on Kibana side. Thanks for putting the changes in PR, this makes it much easier. |
- name: secret_access_key | ||
type: text | ||
title: Secret Access Key | ||
policy_templates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtojek is this the array from which Kibana only shows the first element today? Or is that something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe @jen-huang can weigh in on my question since she's already watching this PR. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is what Kibana uses the first element of today
I have the same question. I think we still need this otherwise where else would the ingest pipelines, field definitions, etc. for each data stream be stored? @mtojek if you agree, could you add a couple of data stream folders to this PR please so we can see the fuller picture of the package? Thanks! |
title: Barracuda logs | ||
description: Collect Barracuda logs from syslog or a file. | ||
inputs: | ||
- type: udp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we pull in here the additional vars from the data_stream manifest.yml? How do we know which one to select? What if there are multiple data_streams which configure udp
? I remember I asked the question below but don't remember the answer :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we pull in here the additional vars from the data_stream manifest.yml?
I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know which one to select? What if there are multiple data_streams which configure udp?
I don't reckon if we came up with a solution. Do you think we should introduce a "policy_templates" field to data stream manifest or do you see other options? We can also introduce another "data streams" field to policy template.
BTW I understand that such problem already exists, it's just not harmful because all data streams are installed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my understanding is also that such a problem (or is it a feature? 😎) already exists. I'm wary of changing this behavior, unless we can keep the default behavior the same as it is today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about putting this is in a separate PR if this became an issue? We know that it can be solved with an additional field - policy_templates
in data_streams
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid it (specifying it in the policy template ) would violate the open-closed principle, but we can turn a blind eye here. Let me try to introduce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a proposal, happy to discuss other solutions :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the data_streams
property. Actually I'm fine with both solutions, I wouldn't say it's a hard to understand concept :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with the proposed solution with two comments:
-
Default behavior: not specifying
data_streams
in a policy template needs to be interpreted as if all data streams have been specified. This would guarantee backwards compatibility. I assume such default behavior logic will need to go in the Fleet UI code? -
Referential integrity: We'll need a way to validate that data streams specified under
data_streams
actually exist in the package.
Neither comment above blocks this PR; just things to address in follow up PRs (EDIT: maybe make issues for these once this PR is approved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ycombinator do you have any preference where data_streams
proporty should be placed, either in policy_template
or input
? For reference: ed04c0f#commitcomment-47498856
So we will need to decide (most likely outside the scope of this PR) whether the package registry's search API response represents a list of packages (status quo) or a list of packages + policy templates. Also, I'm not sure how to make this part work under the current proposal:
Just to get the obvious question out of the way first: is there a reason AWS Fargate is it's own package and not a data stream under the |
I think we should move towards the final decision about this pull request. Apart from:
do you see any other concerns around this PR? |
I guess it's due to the different configuration options and API endpoints, but @kaiyan-sheng will know more.
I would avoid hardcoding input groups, one of the use cases we had in mind is to distinguish between operational and security logs (for setting different retentions). If we hardcode the input groups now, it will be hard for us to have this flexibility. |
I'm not sure if this PR assumes retention adjustments. It seems that input groups are more like UI tabs. Data retention is related rather to data stream configuration. Do we want to include retention settings in this PR or follow ups when necessary? I would vote for iterative approach, keep in mind that the first candidate is the AWS integration. |
@mtojek fair enough. In my mind input groups serve two purposes: 1) separate config options on tabs; 2) set different retentions at the input group level. The question is then how would the user set different retentions for operational vs. security logs, if it's not the input groups that define such separation? I think if don't hardcode input groups it will be easier for the UI to reuse them for setting retention policies. |
I don't have a strong opinion on that, I suppose it's rather a UI design issue (@ruflin and @jen-huang can speak for Kibana). You brought here "operational logs" and "security logs", which means that it already violates (or extends?) the default split (logs-metrics-checks). If this is the approach we'd like to follow, let's stick to user defined input groups. |
Yep @sorantis is right. In order to collect metrics from |
Thanks @kaiyan-sheng. Thinking about it some more, I think the proposal in this PR is unrelated to the italicized part of the following use case:
The proposal in this PR will cover the non-italicized part. As a follow up we can figure out how to make the Package Registry's search API return the My main concern was that nothing being introduced in this PR would later get in the way of solving for the italicized part of the above use case, and I think we are good on that front. |
We had an online discussion about left points and I think we came to the agreement:
@ycombinator @ruflin @kaiyan-sheng please approve/comment on the PR @jen-huang please double check if the PR is fine from Kibana dev perspective and also approve/comment here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Really nice work iterating on this proposal and driving consensus!
I know this proposal will impact kibana
and package-registry
code at minimum. I think it will also need a follow up PR in package-spec
to enforce referential integrity between data_streams
mentioned in policy template items and folders under <package root>/data_stream/
. I don't think it affects elastic-package
code (but I could be wrong about that). And, of course, eventually we'll want to introduce the enhancements enabled by this proposal to packages in the integrations
repo.
So right before we merge this PR, could we create a checklist in #132 that will capture each of the above tasks in an ordered fashion (topological sort on the dependency graph)? This will ensure that we can safely rollout this change and have a single place to track progress on the rollout. Some examples of such rollout checklists can be found in #106 and #17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this moving closer to the finish line! Thank you all for your involvement.
Agree with @ycombinator on creating a list of changes that need to be covered by related components. From the priority perspective, I'd like the focus be on Phase 1 described in the doc.
@jen-huang could you please double check if these spec changes are fine for Kibana? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me from a Kibana perspective.
I would like to do more thorough validation by testing it with a local Kibana instance but that would need package registry changes so that the new fields are served through the package info endpoint. I saw that the registry changes item calls for checking with AWS package changes, but if it would be faster to get a draft PR up with a test package (even this nice input_groups
package done here 😉), that would be super helpful.
And just to double check: we expect Kibana to be backwards compatible for packages without input groups too, right?
Right, I would say this is an extension to current packages.
Once we merge this PR, I will push support for this change to the elastic/integrations. |
What does this PR do?
This PR introduces policy groups.
Package preview: https://github.com/mtojek/package-spec/tree/132-support-policy-groups/test/packages/input_groups
Don't focus on the
policy_groups
name, it's a working name for the test package, in reality it will beaws
,nginx
, etc.Why is it important?
We need to ensure better granularity for policies, e.g. for the AWS integration.
Checklist
test/packages
that prove my change is effective.versions/N/changelog.yml
.Related issues