Skip to content

Conversation

@joegallo
Copy link
Contributor

@joegallo joegallo commented Oct 29, 2025

Proposed commit message

Prefer the ignore_empty_value option of the set processor for the panos threat and traffic pipelines, and prefer the copy_from option of the set processor for the panos pipeline.

The logic applies elsewhere, too, but I'm limiting the scope of this change as a proof-of-concept.

ignore_failure operates by catching and ignoring whatever exception(s) might be thrown by the tagged processor, so the (somewhat unfortunate) default behavior of the set processor to throw an exception on an absent value ends up being surprisingly expensive. By way of contrast, ignore_empty_value does what it says, it just ignores the empty value and moves on. So the latter is faster and the former is slower.

The copy_from change was originally part of #15279, but I've pulled it into this PR in order to have all my panw changes in a single place. As with the ignore_failure change, this is a performance optimization: when the field to be copied is already just a string, and so the set processor with mustache isn't being used for the side effect of converting to a string, then it's quite a bit faster to use copy_from rather than value (with mustache templating).

This doesn't matter much for a low volume integration, or one that only uses a handful of set processors, but with many set processors being executed many times it can start to add up.

ignore_empty_value shortcuts within the processor itself, while
ignore_failure means that the processor throws an
exception (expensive) and the exception is caught.
I don't think these have value beyond the semantics of what
ignore_empty_value already does (but perhaps I'm wrong!).
@joegallo joegallo requested a review from a team as a code owner October 29, 2025 16:22
@joegallo joegallo marked this pull request as draft October 29, 2025 16:44
@taylor-swanson
Copy link
Contributor

taylor-swanson commented Oct 29, 2025

@joegallo, this looks great!

I've actually been working on tool for helping make bulk changes to integrations (eventually will make its way into elastic-package, but working on some yaml AST modification stuff in the mean time), and something like this would be an easy and quick addition.

I'll take a closer look at this PR and let you know if anything needs adjusting.

Edit: Also worth mentioning that we are also going to take a look at our use of ignore_failure as pretty much every use of this is wrong as far as I can tell (prefer using something better like ignore_empty_value or conditions where it makes sense).

Copy link
Contributor

@taylor-swanson taylor-swanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes seem fine, we just need to add an entry in the changelog and bump the version in the manifest. If you have elastic-package installed, you can use the changelog subcommand:

elastic-package changelog add --next patch --type enhancement --description 'Prefer set with ignore_empty_value.' --link 'https://github.com/elastic/integrations/pull/15800'

@taylor-swanson taylor-swanson added enhancement New feature or request Integration:panw Palo Alto Next-Gen Firewall Team:Integration-Experience Security Integrations Integration Experience [elastic/integration-experience] and removed >enhancement labels Oct 29, 2025
@taylor-swanson taylor-swanson changed the title Prefer set with ignore_empty_value [panw] Prefer set with ignore_empty_value Oct 29, 2025
@taylor-swanson
Copy link
Contributor

While we are on this topic... There are also usages like this:

set:
  field: user.name
  value: "{{{destination.user.name}}}"
  ignore_failure: true
  if: ctx.user?.name == null

Any reasons why we'd use templates over copy_from? Same question applies to override vs. field == null.
I can't think of any.

set:
  field: user.name
  copy_from: destination.user.name
  ignore_empty_value: true
  override: false

@joegallo
Copy link
Contributor Author

Any reasons why we'd use templates over copy_from?

Not for the ordinary case! See #15279 for some of my thoughts on that.

@joegallo
Copy link
Contributor Author

Same question applies to override vs. field == null.

Ooh, yeah, I don't specifically have a pre-built opinion on that one yet! Naively I'd guess that in the current implementation of things, the if: condition would be faster ('for reasons') but I also think a sufficiently motivated Elasticsearch pull request could close that gap.

Is there an integration that you're aware of that does that a whole lot?

@taylor-swanson
Copy link
Contributor

Ooh, yeah, I don't specifically have a pre-built opinion on that one yet! Naively I'd guess that in the current implementation of things, the if: condition would be faster ('for reasons') but I also think a sufficiently motivated Elasticsearch pull request could close that gap.

Is there an integration that you're aware of that does that a whole lot?

I think we use it quite extensively in our integrations. Probably not for any particular reason other than someone did it in the past and everyone else followed suit. I could see why if would be faster, but from a readability/maintainability standpoint, I like override as it eliminates the possibility of getting the field path wrong (especially when copy/paste is used). I think it makes more sense when you have to do more complicated checks, but most of the time, our use of conditionals is making sure that either the source or destination field isn't or is null (which ignore_empty_value and override seem to handle well).

I also think a sufficiently motivated Elasticsearch pull request could close that gap.

Putting this on my wish list 😄

tag: set__temp__external_zones_20b326b7
field: _temp_.external_zones
copy_from: _conf.external_zones
if: ctx._conf?.external_zones != null
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get extremely pedantic, though -- my change here (and in the next stanza) could actually be a change in behavior.

The prior logic with the if: script would not copy_from a null value, but it would copy_from an empty string. The new logic with ignore_empty_value will ignore nulls and it will also ignore empty strings.

I don't imagine that that difference in behavior actually matters, but I don't want to let it sneak by without comment/consideration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that should matter here, thankfully. _conf.external_zones and _conf.internal_zones is set by the integration config to always be either null or an array of strings.

To be honest, I'm not really sure why these processors even exist (since we could just use the conf fields directly), but that's a topic for a different day 😄.

@joegallo
Copy link
Contributor Author

/test benchmark fullreport

@elastic-vault-github-plugin-prod
Copy link

elastic-vault-github-plugin-prod bot commented Oct 30, 2025

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@joegallo joegallo changed the title [panw] Prefer set with ignore_empty_value [panw] Prefer set with copy_from and ignore_empty_value Oct 30, 2025
because copy_from is faster than value (when the value is a mustache
template that merely does field access).
@elasticmachine
Copy link

💚 Build Succeeded

History

@joegallo joegallo marked this pull request as ready for review October 30, 2025 19:32
@elasticmachine
Copy link

Pinging @elastic/integration-experience (Team:Integration-Experience)

@joegallo
Copy link
Contributor Author

The current benchmark report #15800 (comment) is showing as blank (🤷!) but prior to my most recent commit here (4adb496) that should make things even faster, it was showing this:

Screenshot 2025-10-30 at 3 33 28 PM

Copy link
Contributor

@taylor-swanson taylor-swanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

tag: set__temp__external_zones_20b326b7
field: _temp_.external_zones
copy_from: _conf.external_zones
if: ctx._conf?.external_zones != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that should matter here, thankfully. _conf.external_zones and _conf.internal_zones is set by the integration config to always be either null or an array of strings.

To be honest, I'm not really sure why these processors even exist (since we could just use the conf fields directly), but that's a topic for a different day 😄.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Integration:panw Palo Alto Next-Gen Firewall Team:Integration-Experience Security Integrations Integration Experience [elastic/integration-experience]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants