Handle multiline processors #2063

haetamoudi · 2024-09-02T10:19:33Z

The LastLine attribute is used to determine the end of a processor block in YAML files.

Currently, for YAML files such as:

---
processors:
  - script:
      source: |
        def a = 1;
        def b = 2;

The script processor is incorrectly identified with:

FirstLine:3
LastLine:4

When it should be:

FirstLine:3
LastLine:6

LastLine is used to identify the end of a processor when calculating code coverage.
Right now, it’s not accurately reflecting multiline processors, which can make code coverage look worse than it is. This can cause issues with PRs in the integrations repo and might block merges.

To fix this, we can consider that a processor ends where the next one starts.
To identify the end of the last processor:

Find the next node that is not a processor node (e.g. an on_failure node).
If there is none, find the end of the pipeline.
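
The rule above can be sketched in Go as follows. This is a minimal sketch, not the code merged in this PR: setLastLines, nextNodeLine and pipelineLastLine are hypothetical names, and the start lines are assumed to have been read from the parsed YAML nodes already.

```go
package main

import "fmt"

// Processor mirrors the fields shown in this PR's test output.
type Processor struct {
	Type      string
	FirstLine int
	LastLine  int
}

// setLastLines is a hypothetical helper (an assumption for illustration).
// Each processor ends on the line before the next one starts. The last
// processor ends on the line before nextNodeLine (for example on_failure),
// or on pipelineLastLine when nextNodeLine is 0 (no such node).
func setLastLines(procs []Processor, nextNodeLine, pipelineLastLine int) {
	for i := range procs {
		switch {
		case i+1 < len(procs):
			procs[i].LastLine = procs[i+1].FirstLine - 1
		case nextNodeLine > 0:
			procs[i].LastLine = nextNodeLine - 1
		default:
			procs[i].LastLine = pipelineLastLine
		}
	}
}

func main() {
	procs := []Processor{
		{Type: "script", FirstLine: 3},
		{Type: "script", FirstLine: 8},
		{Type: "set", FirstLine: 11},
	}
	setLastLines(procs, 0, 13)
	fmt.Printf("%+v\n", procs)
}
```

For processors starting at lines 3, 8 and 11 in a 13-line pipeline, this yields the ranges 3-7, 8-10 and 11-13. Comments and empty lines between two processors fall inside the earlier processor's range.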

Code coverage test results

Before the changes:

Code coverage link

the content of the script node is not considered as part of the processor
comments and empty lines inside the processors node are marked as uncovered

After the changes:

Code coverage link

the content of the script node is considered part of the processor

comments and empty lines inside the processors node are considered part of the processors

Fixes #2111

haetamoudi · 2024-09-02T12:51:31Z

test integrations

elastic-vault-github-plugin-prod · 2024-09-02T13:00:11Z

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10964

jsoriano

Thanks for this contribution!

In the build in integrations there is a failure that could be related: https://buildkite.com/elastic/integrations/builds/15324#0191b3b5-8906-4c82-ae84-994684452522/1582-1875

ERROR: Caused by: Line 149 is out of range in the file packages/teleport/data_stream/audit/elasticsearch/ingest_pipeline/default.yml (lines: 148)

Also, is this change intended to fix #2045? If it is, I think we also need to ignore empty and commented lines, as mentioned in #2045 (comment).

internal/elasticsearch/ingest/processors.go

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>

jsoriano · 2024-09-04T12:19:28Z

test integrations

elastic-vault-github-plugin-prod · 2024-09-04T12:26:25Z

Created or updated PR in integrations repository to test this version. Check elastic/integrations#11003

haetamoudi · 2024-09-04T14:43:01Z

One XML file is generated with an additional LineToCover, and that is what is causing the integration tests to fail.

jsoriano · 2024-09-04T15:03:12Z

One XML file is generated with an additional LineToCover, and that is what is causing the integration tests to fail.

This could be related to this change, right?

haetamoudi · 2024-09-04T15:27:08Z

One XML file is generated with an additional LineToCover, and that is what is causing the integration tests to fail.

This could be related to this change, right?

Probably, as the failing XML is the one created by the pipeline (the one I touched). I am running the integration tests on a fresh branch from main to be sure.

haetamoudi · 2024-09-05T07:18:09Z

I think I found the issue. The function is counting multiple lines here because of the \n. I'll fix that and add some tests.

mrodm · 2024-09-05T12:59:50Z

internal/elasticsearch/ingest/processors_test.go

+processors:
+  - script:
+      description: Drops null/empty values recursively.
+      tag: script_drop_null_empty_values
+      lang: painless
+      source: "def a = b \n; def b = 2; \n"
+`),
+			expected: []Processor{
+				{Type: "script", FirstLine: 3, LastLine: 7},
+			},


If this test example is updated to include more processors, I think it does not report the expected lines in the coverage.

For instance, if this test is modified to include two processors whose definitions are multiline, like this:

content: []byte(`---
processors:
  - script:
      description: Drops null/empty values recursively.
      tag: script_drop_null_empty_values
      lang: painless
      source: "def a = b \n; def b = 2; \n"
  - script:
      lang: painless
      source: "def a = b \n; def b = 2; \n def c = 4; \n def d = 5; \n"
  - set:
      field: "foo"
      value: "bar"
`),

This gives this result:

[]ingest.Processor{ingest.Processor{Type:"script", FirstLine:3, LastLine:9}, ingest.Processor{Type:"script", FirstLine:8, LastLine:13}, ingest.Processor{Type:"set", FirstLine:11, LastLine:13}}

But according to that:

the second processor starts (line 8) before the first processor finishes (line 9)

the second processor finishes on the same line as the third processor.

Shouldn't the result be this?

[]ingest.Processor{ingest.Processor{Type:"script", FirstLine:3, LastLine:7}, ingest.Processor{Type:"script", FirstLine:8, LastLine:10}, ingest.Processor{Type:"set", FirstLine:11, LastLine:13}}

To keep the coverage reports pointing at the right lines, I think that when this scenario happens (a processor field is defined on one line with some `\n` in it), it should be reported as just one line, since it occupies a single line in the YAML file too. Otherwise, the coverage reports would probably mark incorrect lines as covered or not covered.
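
One way to catch this kind of result in a test is to check that the reported ranges never overlap. A minimal sketch, assuming a Processor struct with the fields shown above; checkRanges is a hypothetical helper, not part of elastic-package:

```go
package main

import "fmt"

// Processor mirrors the fields printed in the test output above.
type Processor struct {
	Type      string
	FirstLine int
	LastLine  int
}

// checkRanges is a hypothetical validation helper. It returns an error if a
// processor ends before it starts, or if it starts on or before the line
// where the previous processor ends.
func checkRanges(procs []Processor) error {
	for i, p := range procs {
		if p.LastLine < p.FirstLine {
			return fmt.Errorf("%s processor at index %d ends (line %d) before it starts (line %d)",
				p.Type, i, p.LastLine, p.FirstLine)
		}
		if i > 0 && p.FirstLine <= procs[i-1].LastLine {
			return fmt.Errorf("%s processor at index %d starts at line %d, but the previous one ends at line %d",
				p.Type, i, p.FirstLine, procs[i-1].LastLine)
		}
	}
	return nil
}

func main() {
	// The ranges reported in the comment above: the second processor
	// starts at line 8 while the first one ends at line 9.
	reported := []Processor{
		{Type: "script", FirstLine: 3, LastLine: 9},
		{Type: "script", FirstLine: 8, LastLine: 13},
		{Type: "set", FirstLine: 11, LastLine: 13},
	}
	fmt.Println(checkRanges(reported))
}
```

The check only looks at neighbouring processors, which is enough as long as the slice is ordered by FirstLine.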

I need to revise the logic for determining the end of a processor. Simply counting line breaks doesn't work well, as in cases like:

  - grok:
      tag: Extract header
      field: message
      trim_value: " \n\n"

will count, for example, 5 lines instead of 4.

One other option is to consider that a processor ends the line before the next one starts. However, I am running into issues when setting the LastLine number for the last processor. I can't just set it to the end of the file because there could be other elements afterward, like:

  - rename:
      field: source.as.organization_name
      target_field: source.as.organization.name
      ignore_missing: true
on_failure:
  - set:
      field: error.message
      value: '{{ _ingest.on_failure_message }}'

The problem is that, when using yaml.Unmarshal, the YAML parser will interpret
"a = 1\nb = 2"
as

a = 1
b = 2

So I can't really base the logic on line breaks.
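
The mismatch can be reproduced without a YAML parser. In the sketch below, strconv.Unquote only stands in for the escape handling that a YAML parser applies to double-quoted scalars (an assumption made to keep the example free of dependencies), and newlinesAfterDecoding is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// newlinesAfterDecoding unquotes a double-quoted scalar and counts the line
// breaks in the decoded value. strconv.Unquote is a stand-in for the YAML
// parser here, not what elastic-package uses.
func newlinesAfterDecoding(quoted string) (int, error) {
	decoded, err := strconv.Unquote(quoted)
	if err != nil {
		return 0, err
	}
	return strings.Count(decoded, "\n"), nil
}

func main() {
	// The scalar exactly as written on a single line of the pipeline file.
	quoted := `"def a = b \n; def b = 2; \n"`

	inSource := strings.Count(quoted, "\n") // real line breaks in the file text
	afterDecoding, err := newlinesAfterDecoding(quoted)
	if err != nil {
		fmt.Println("unquote failed:", err)
		return
	}
	fmt.Printf("line breaks in source: %d, after decoding: %d\n", inSource, afterDecoding)
}
```

The source text contains no real line break, while the decoded value contains two, so counting line breaks in the decoded value overstates how many lines the processor occupies in the file.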

haetamoudi · 2024-09-06T18:19:38Z

test integrations

elastic-vault-github-plugin-prod · 2024-09-06T18:24:59Z

Created or updated PR in integrations repository to test this version. Check elastic/integrations#11032

haetamoudi · 2024-09-09T15:10:48Z

test integrations

elastic-vault-github-plugin-prod · 2024-09-09T15:18:54Z

Created or updated PR in integrations repository to test this version. Check elastic/integrations#11032

haetamoudi · 2024-09-10T10:37:54Z

Finally got a successful build on the integrations repo: https://buildkite.com/elastic/integrations/builds/15698 ✅

jsoriano

Looking at the changes here, I wonder if we should do #2045 first, so that empty and commented lines are not included in coverage.

internal/elasticsearch/ingest/processors_test.go

internal/elasticsearch/ingest/processors.go

Co-authored-by: Mario Rodriguez Molins <marrodmo@gmail.com>

elasticmachine · 2024-09-20T06:43:38Z

💚 Build Succeeded

Buildkite Build
Commit: 03cd431

History

💔 Build #4018 failed 2c439bb
💚 Build #3981 succeeded ff6f176

cc @haetamoudi

jsoriano

LGTM, @mrodm please review if your last comments have been addressed.

mrodm

LGTM !
Thanks @haetamoudi !

handle multiline processors when getting last line

e7f8f90

haetamoudi added the bug and Team:Ecosystem labels Sep 2, 2024

add copyright

5b437cf

elastic-vault-github-plugin-prod bot mentioned this pull request Sep 2, 2024

Test elastic-package#2063 - DO NOT MERGE elastic/integrations#10964

Closed

haetamoudi added 2 commits September 2, 2024 17:54

add test cobertura for single pipeline

d159d30

fix typo in test

06d6ac1

haetamoudi marked this pull request as ready for review September 2, 2024 18:28

haetamoudi requested a review from a team September 2, 2024 18:28

haetamoudi self-assigned this Sep 2, 2024

jsoriano reviewed Sep 3, 2024

internal/elasticsearch/ingest/processors.go (outdated)

internal/elasticsearch/ingest/processors.go (outdated)

haetamoudi and others added 2 commits September 3, 2024 10:21

Update internal/elasticsearch/ingest/processors.go

09a1040

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>

Update internal/elasticsearch/ingest/processors.go

6d8c1a8

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>

elastic-vault-github-plugin-prod bot mentioned this pull request Sep 4, 2024

Test elastic-package#2063 - DO NOT MERGE elastic/integrations#11003

Closed

mrodm reviewed Sep 5, 2024

haetamoudi marked this pull request as draft September 5, 2024 13:44

ebeahan mentioned this pull request Sep 5, 2024

add missing fields gcp audit logs elastic/integrations#10886

Merged

processor ends when next one start

9d2902f

haetamoudi force-pushed the handle-multiline-processor branch from 548c938 to 9d2902f Compare September 6, 2024 13:58

elastic deleted a comment from elasticmachine Sep 6, 2024

elastic deleted a comment from elastic-vault-github-plugin-prod bot Sep 6, 2024

elastic-vault-github-plugin-prod bot mentioned this pull request Sep 6, 2024

Test elastic-package#2063 - DO NOT MERGE elastic/integrations#11032

Closed

Merge branch 'main' into handle-multiline-processor

ff6f176

haetamoudi marked this pull request as ready for review September 10, 2024 10:41

haetamoudi requested review from jsoriano and mrodm September 10, 2024 10:41

jsoriano reviewed Sep 10, 2024

internal/elasticsearch/ingest/processors_test.go

internal/elasticsearch/ingest/processors_test.go

mrodm reviewed Sep 13, 2024

internal/elasticsearch/ingest/processors.go (outdated)

internal/elasticsearch/ingest/processors.go

internal/elasticsearch/ingest/processors.go

Update internal/elasticsearch/ingest/processors.go

2c439bb

Co-authored-by: Mario Rodriguez Molins <marrodmo@gmail.com>

jsoriano mentioned this pull request Sep 19, 2024

[Test Coverage] Multiline YAML strings not accounted for in pipelines #2111

Closed

Merge branch 'main' into handle-multiline-processor

03cd431

haetamoudi requested review from jsoriano and mrodm September 20, 2024 06:44

jsoriano reviewed Sep 20, 2024

mrodm approved these changes Sep 20, 2024

haetamoudi merged commit ee40f69 into elastic:main Sep 20, 2024
3 checks passed

haetamoudi deleted the handle-multiline-processor branch September 20, 2024 11:14
