Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fileconsumer] Generlize emit function #24036

Merged

Conversation

djaglowski
Copy link
Member

@djaglowski djaglowski commented Jul 7, 2023

This is part of an effort to make fileconsumer.Reader more testable. I'd eventually like to move that struct into its own internal package. However, there is a dependency on the EmitFunc which is part of the fileconsumer package. Therefore, the EmitFunc will need to move into a separate package from fileconsumer.

Since this will be a breaking change (Go API only), I would like to take the opportunity to replace the emit function altogether with something more generic. The current EmitFunc includes a FileAttributes struct, the purpose of which is to pass along attributes describing the file from which a log is read. The FileAttributes struct may in turn contain a HeaderAttributes struct describing elements parsed from the header of a file.

If the emit function instead passes along all these attributes as a map[string]any, the Reader struct will be simpler and more testable, and dependencies between packages will be simpler as well. This PR is a proposal to make this change.

There are a few complicating factors here.

  1. The package checkpoints files by encoding and storing Reader's. This in turn means that the FileAttributes struct is encoded as well. However, only the HeaderAttributes struct is included in this encoded representation. The idea here was that these attributes need to be attached to any log read from the file, even if we are resuming after a restart or upgrade. Conversely, the other fields (file name/path, etc) are omitted from the encoding because the file may have moved. Since this PR would encode these values, we need to be sure to overwrite them when we reconstruct the Reader. This is accomplished in the reader factory by resolving the fields after copying the previously known values. A new test verified that these fields are updated appropriately.
  2. Since existing users may have encoded representations of the Reader from before this change, we need to handle this during decoding. This is handled in a single block of code which can be removed in a future version. I've tested this manually.
  3. The file_input operator used the FileAttributes struct in a specific way, but this is made redundant by the approach described in 1.

Benchmarks for this change show no notable impact:

--- main ---
BenchmarkFileInput/Single-10         	 1280559	       953.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/Glob-10           	  342447	      3574 ns/op	       2 B/op	       0 allocs/op
BenchmarkFileInput/MultiGlob-10      	  347842	      3513 ns/op	       2 B/op	       0 allocs/op
BenchmarkFileInput/MaxConcurrent-10  	  339127	      3615 ns/op	       4 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntLarge-10  	 1266826	       955.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntSmall-10  	 1260673	       964.4 ns/op	       0 B/op	       0 allocs/op

--- pr ---
BenchmarkFileInput/Single-10         	 1298505	       943.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/Glob-10           	  351753	      3495 ns/op	       2 B/op	       0 allocs/op
BenchmarkFileInput/MultiGlob-10      	  348549	      3511 ns/op	       2 B/op	       0 allocs/op
BenchmarkFileInput/MaxConcurrent-10  	  353061	      3470 ns/op	       4 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntLarge-10  	 1270884	       948.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntSmall-10  	 1279246	       941.2 ns/op	       0 B/op	       0 allocs/op

@github-actions github-actions bot requested a review from atoulme July 7, 2023 17:00
@djaglowski djaglowski force-pushed the fileconsumer-emitfunc-generalize branch 2 times, most recently from f1ed6c7 to 354683a Compare July 10, 2023 15:33
@djaglowski djaglowski force-pushed the fileconsumer-emitfunc-generalize branch 5 times, most recently from b6238a8 to a946d59 Compare July 12, 2023 19:45
@djaglowski djaglowski added the Run Windows Enable running windows test on a PR label Jul 12, 2023
@djaglowski djaglowski force-pushed the fileconsumer-emitfunc-generalize branch from a946d59 to 6de04b3 Compare July 12, 2023 20:15
@djaglowski djaglowski marked this pull request as ready for review July 12, 2023 21:36
@djaglowski djaglowski requested a review from a team July 12, 2023 21:36
@djaglowski djaglowski force-pushed the fileconsumer-emitfunc-generalize branch from 6de04b3 to 89dda18 Compare July 13, 2023 20:29
@djaglowski djaglowski force-pushed the fileconsumer-emitfunc-generalize branch from 89dda18 to 58234b6 Compare July 13, 2023 20:39
Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way this cleans up the API. I'm fairly unfamiliar with this package, but overall everything checks out to me. Just one non-blocking question.

pkg/stanza/fileconsumer/reader_factory.go Show resolved Hide resolved
@djaglowski djaglowski merged commit 6816bc1 into open-telemetry:main Jul 14, 2023
@djaglowski djaglowski deleted the fileconsumer-emitfunc-generalize branch July 14, 2023 22:15
@github-actions github-actions bot added this to the next release milestone Jul 14, 2023
djaglowski added a commit that referenced this pull request Jul 25, 2023
This continues from #24036 by moving additional complexity into an
internal package.

Some files contain headers that are different than the rest of the file.
In such cases, we may need to extract file attributes that will be
applied to logs from that file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg/stanza receiver/otlpjsonfile Run Windows Enable running windows test on a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants