Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Netflow integration 0.0.2 #338

Merged
merged 12 commits into from
May 19, 2020
Merged

Add Netflow integration 0.0.2 #338

merged 12 commits into from
May 19, 2020

Conversation

alakahakai
Copy link

@alakahakai alakahakai commented Apr 12, 2020

This PR adds the Netflow integration as a replacement of the Filebeat netflow module.

Here are some screenshots of the ingest manager UI.

image
image
image

@alakahakai alakahakai requested a review from ruflin April 12, 2020 02:42
@andrewkroh andrewkroh self-requested a review April 13, 2020 15:08
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start. To make this easier to review and faster to iterate I suggest the following: Open 2 separate PR's for cisco and netflow. And for each, lets get a very basic version in first, meaning for example only have a single dataset inside that only supports one input and then start iterating on it? This should allow us to move quicker.

@mtojek
Copy link
Contributor

mtojek commented Apr 14, 2020

Did you use the import-beats script?

@alakahakai
Copy link
Author

Did you use the import-beats script?

Yes, but I had to make changes according to the coredns package, which @ruflin asked me to follow. I will give it a try end-to-end once the fleet function is working in ingest manager. It was crashing Kibana when I tried it last time.

@mtojek
Copy link
Contributor

mtojek commented Apr 20, 2020

Did you use the import-beats script?

Yes, but I had to make changes according to the coredns package, which @ruflin asked me to follow. I will give it a try end-to-end once the fleet function is working in ingest manager. It was crashing Kibana when I tried it last time.

When you notice that it crashed or generated dashboards in a wrong way, please open an issue and bring samples. It will help improve the generating process.

@@ -0,0 +1,14 @@
{{#if input == "syslog"}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the templating language used in these files?

@ruflin I think it would be nice to use the appropriate file extension for the template type (e.g. .erb, .j2, .mustache) to make it clear to editors (both the people and the tools).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is handlebars (reminds me off https://github.com/elastic/package-registry/issues/179). And the above is actually not valid handlebars :-( Having support for 2 inputs is coming here: #346

I agree we should probably adjust the file extension to make it better for the tooling. If we do, I wonder if we should the same for ingest pipeline files which reference an other pipeline. The challenge I see here is that 99% of the pipelines don't use it but a very few do. Does this still mean all should have to have a handlebars suffix or do we just check for both files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier if files that use templating had an extension that indicates it. Then when writing the code that uses the modules you know when you need to preprocess the file, and then you don't need to guess or unnecessarily preprocess every file like we do today in Beats.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if supporting both types and not process would complicate the code? What is your concern about processing files that don't need processing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that you need to make sure you haven't used content that requires escaping in the templating language. So if you don't need those features you just call the file config.yml instead of config.yml.handlebars. Hopefully that isn't too much of issue in most files. Maybe we just process all files through the template engine for now, but put the template extension on them now.

I'd just be happy if my editor could easily recognize that these are template type X and recommend the appropriate plugin. 😄

Screen Shot 2020-04-21 at 9 43 50 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we go with config.yml.hbs or config.hbs? I skiped handlebars as it is very long and .hbs seems to work at least in my IDE too. I kind of like the two suffix so the end goal format is also in there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.yml.hbs SGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin if this is an update to the format of package, please open an issue in the repository for adjusting all packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced the .yml.hbs naming in this PR: #380

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek thanks for taking care of it. I assume this make the issue obsolete.

@alakahakai alakahakai changed the title Add example packages: netflow-1.0.0, cisco-1.0.0 Add example packages: netflow-1.0.0 Apr 20, 2020
@alakahakai
Copy link
Author

Made this for netflow only. Will open another one for cisco.

@ruflin
Copy link
Member

ruflin commented Apr 21, 2020

@alakahakai Let me know when it's ready for review.

@alakahakai alakahakai changed the title Add example packages: netflow-1.0.0 Add example packages: netflow Apr 22, 2020
@alakahakai alakahakai changed the title Add example packages: netflow Add alpha package: netflow Apr 22, 2020
@alakahakai alakahakai changed the title Add alpha package: netflow Add netflow integration Apr 23, 2020
@alakahakai alakahakai changed the title Add netflow integration Add Netflow integration Apr 23, 2020
@alakahakai alakahakai marked this pull request as draft April 23, 2020 20:27
@alakahakai alakahakai changed the title Add Netflow integration Add Netflow integration 0.0.2 Apr 23, 2020
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshots in this PR would be welcome :)

max_message_size: '{{this.max_message_size}}'
expiration_timeout: '{{this.expiration_timeout}}'
queue_size: {{this.queue_size}}
{{#if this.timeout}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm stumbling over the this here. @mtojek do you know if this is necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already fixed, even though both forms (with/without this) are correct.

The previous implementation of processing stream configuration left this for consistency. If you upgrade to the new implementation, you will see a better formed file: https://github.com/elastic/package-registry/blob/master/dev/packages/beats/netflow-0.0.1/dataset/log/agent/stream/netflow.yml.hbs

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as this package landed, that dashboards should be revisted. My assumption is that with the new indexing strategy a lot of queries can be made more efficient by targeting much fewer indices.

@@ -0,0 +1,3141 @@
- name: '@timestamp'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this only contain the fields that are used by this dataset or is this all of ECS? The goal should be to only have the fields in here that are actually used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So using the netflow module as an example the fields consist of:

  • some base filebeat fields that may not be in ECS (like input.type)
  • all filebeat netflow input fields (some 400+ fields)
  • a subset of ECS namespaces (like agent, source, destination, network, event)

Very few of these fields are something that the module actually creates. This module basically brings together Filebeat with an Ingest Node pipeline to add geo location and autonomous system numbers. So keeping this module updated will be a lot of work. And since it depends on Filebeat's fields, this template will vary based on the Filebeat version.

So I wonder if it would be possible to list some dependent templates in addition to custom fields created by a module to create a composite. Like saying that for this module it depends on

  • filebeat $automatic_version | include_namespaces(agent | netflow | input.type | source | destination)
  • ECS 1.4 | include_namespaces(source | destination)

Screen Shot 2020-04-28 at 10 35 55 AM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewkroh Nice diagram!

v2 Elasticsearch templates support components and one of the reasons we pushed for this is exactly the above problem. So long term, I'm pretty sure we will support this. Short term, I think we should get this into the development process. The current plan is to use the integrations repository for the development of the packages. There you could even use sysmlinks or whatever reference you need to in the end build the final package with all the files you need for the above. This keeps the final package simple for now.

I was not aware that netflow has 400+ fields (sounds like a lot) so in this case, the template will be pretty big in any case. Good news it is only the netflow template and does not affect anything else. Also it shouldn't be a problem if we have too many fields right now and later optimise as during update, the template will be overwritten.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think some symlinks could accomplish this. Does the current module more allow for more than one fields file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can have as many as you want as long as they end in .yml and are in the correct directory.

dev/packages/alpha/netflow-0.0.2/docs/README.md Outdated Show resolved Hide resolved
@elasticmachine
Copy link

elasticmachine commented May 8, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Branch indexing]

  • Start Time: 2020-05-19T12:18:21.151+0000

  • Duration: 4 min 16 sec (255933)

@alakahakai alakahakai requested review from ruflin and mtojek May 14, 2020 05:55
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some screenshots presenting the Kibana UI.

@alakahakai alakahakai requested a review from mtojek May 15, 2020 05:32
@mtojek
Copy link
Contributor

mtojek commented May 15, 2020

Here are some spottings on my side:

  • NetFlow log logs: maybe NetFlow standard logs?
  • netflow_host, netflow_port: the prefix netflow is redundant here
  • all configuration variables: UI labels should be human readable and not contain "_" (underscore)
  • not sure if we need to show to user all variables - maybe just host and port?
  • all configuration variables expect host and port: it would be feasible to leave a one sentence description
  • all configuration variables should contain information about their types (text, number, etc.)

@alakahakai alakahakai marked this pull request as ready for review May 18, 2020 22:35
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, let's iterate on top of this.

@alakahakai alakahakai merged commit 6fd650c into elastic:master May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants