fix(file source) Fix a data corruption bug with multi-char delimiters #24028

Merged
thomasqueirozb merged 10 commits into vectordotdev:master from lfrancke:fix/multi-char-delimiter
Dec 1, 2025

Conversation

@lfrancke
Contributor

@lfrancke lfrancke commented Oct 20, 2025

Summary

Fix a problem where multi-character delimiters are not detected when they fall right at a buffer boundary.
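
To make the failure mode concrete, here is a minimal sketch in Rust of one way to detect a multi-byte delimiter that straddles two reads. It is not the code from this PR: `ChunkSplitter`, `push`, and the field names are hypothetical, and it assumes the reader hands over one buffer at a time. The idea is to keep unfinished bytes between reads and to resume scanning early enough to catch a delimiter whose first bytes arrived in the previous buffer.

```rust
/// Splits a stream of buffers into records on a multi-byte delimiter.
/// Hypothetical helper for illustration only; not Vector's implementation.
struct ChunkSplitter {
    delim: Vec<u8>,
    /// Bytes received so far that do not yet form a complete record.
    pending: Vec<u8>,
}

impl ChunkSplitter {
    fn new(delim: &[u8]) -> Self {
        assert!(!delim.is_empty(), "delimiter must not be empty");
        Self { delim: delim.to_vec(), pending: Vec::new() }
    }

    /// Feed one buffer and return every record that it completes.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        let dlen = self.delim.len();
        // A delimiter finishing in this chunk can start at most `dlen - 1`
        // bytes before the chunk, so the scan resumes from there instead of
        // from the start of the new chunk.
        let scan_from = self.pending.len().saturating_sub(dlen - 1);
        self.pending.extend_from_slice(chunk);

        let mut records = Vec::new();
        let mut start = 0; // start of the current record inside `pending`
        let mut i = scan_from;
        while i + dlen <= self.pending.len() {
            if self.pending[i..i + dlen] == self.delim[..] {
                records.push(self.pending[start..i].to_vec());
                i += dlen;
                start = i;
            } else {
                i += 1;
            }
        }
        // Keep only the unfinished tail for the next call.
        self.pending.drain(..start);
        records
    }
}
```

For example, with the delimiter `\r\n`, feeding `a\r` and then `\nb\r\n` returns no record from the first call and the records `a` and `b` from the second. A scanner that only looked inside each buffer on its own would miss the first delimiter and merge the two records.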

Vector configuration

See https://github.com/lfrancke/vector-repro-24027 for a reproduction repository

How did you test this PR?

The repro repo contains a test case, which I used.
In addition, I added unit tests for delimiters of 1 to 5 characters.
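
A boundary test of the kind described here could look roughly like the following sketch. It is an illustration, not the unit tests added in the PR: all function names are hypothetical, and the reader is simulated by cutting one input into two pieces at every possible byte offset. For each delimiter length from 1 to 5, the records produced piecewise must match the expected records.

```rust
/// Split `data` on `delim`. The last element is the unterminated remainder
/// (possibly empty), similar to `str::split`.
fn split_segments(data: &[u8], delim: &[u8]) -> Vec<Vec<u8>> {
    assert!(!delim.is_empty(), "delimiter must not be empty");
    let mut segments = Vec::new();
    let (mut start, mut i) = (0, 0);
    while i + delim.len() <= data.len() {
        if &data[i..i + delim.len()] == delim {
            segments.push(data[start..i].to_vec());
            i += delim.len();
            start = i;
        } else {
            i += 1;
        }
    }
    segments.push(data[start..].to_vec());
    segments
}

/// Simulate two reads by cutting `data` at `cut`, carrying the unfinished
/// tail of the first read over into the second.
fn split_in_two(data: &[u8], delim: &[u8], cut: usize) -> Vec<Vec<u8>> {
    let mut carry: Vec<u8> = Vec::new();
    let mut records = Vec::new();
    for piece in [&data[..cut], &data[cut..]] {
        carry.extend_from_slice(piece);
        let mut segments = split_segments(&carry, delim);
        // The final segment has no delimiter yet, so it stays in `carry`.
        carry = segments.pop().unwrap_or_default();
        records.extend(segments);
    }
    records
}

/// Check every cut position, including those inside a delimiter.
fn check_all_boundaries(delim: &[u8]) {
    let inputs: [&[u8]; 2] = [b"first", b"second"];
    let mut data = Vec::new();
    for record in inputs {
        data.extend_from_slice(record);
        data.extend_from_slice(delim);
    }
    data.extend_from_slice(b"tail"); // unterminated, must not be emitted

    let expected = vec![b"first".to_vec(), b"second".to_vec()];
    for cut in 0..=data.len() {
        assert_eq!(split_in_two(&data, delim, cut), expected, "cut at byte {}", cut);
    }
}

/// Run the check for delimiters of 1 to 5 bytes.
fn run_boundary_checks() {
    let delims: [&[u8]; 5] = [b"|", b"\r\n", b"<->", b"\r\n\r\n", b"-END-"];
    for delim in delims {
        check_all_boundaries(delim);
    }
}
```

Trying every cut position is cheap for small inputs and guarantees that each byte of the delimiter is exercised as the last byte of a buffer.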

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@thomasqueirozb
Contributor

Hi @lfrancke, thanks for your contribution! Since this is something that alters Vector behavior, it is considered a user-facing change (I edited the PR description already). Could you please add a changelog? Thanks!

Also, your changes seem sound to me, but I still need to review them more thoroughly. I will take a closer look soon.

@lfrancke
Contributor Author

Will do! Thanks.
I see that I left two of my debugging statements in the test as well. I'll remove those too.

@lfrancke
Contributor Author

I pushed the changelog and removed the debug statements. It's ready for review I believe.

NickLarsenNZ added a commit to stackabletech/docker-images that referenced this pull request Oct 30, 2025
NOTE: I removed async/await parts from the original patch as that comes after 0.49.0

```sh
pushd $(cargo patchable checkout vector 0.49.0)

git remote add lfrancke https://github.com/lfrancke/vector

git fetch lfrancke

git cherry-pick 3ce729073f23631dd7b5525be640b5fa15af0223
# and, if the cherry-pick stops on conflicts, resolve them and run:
git cherry-pick --continue
git commit --amend

popd
cargo patchable export vector 0.49.0
```
github-merge-queue bot pushed a commit to stackabletech/docker-images that referenced this pull request Oct 30, 2025
* chore(vector): Init patchable

* chore(stackable-devel): Make a special variant for Vector so that a different rust toolchain can be selected

* chore(stackable-devel): Add note about moving the version to
boil-config.toml once renovate can check there (for consistency)

* chore(nix): Add rust and cargo dependencies

Otherwise cargo can't be found

```
error: the 'cargo' binary, normally provided by the 'cargo' component, is not applicable to the '1.89.0-x86_64-unknown-linux-gnu' toolchain
```

* chore(vector): Build from source (based on ubi9-rust-builder)

NOTE: The ubi9-rust-builder could not be used as it contains `ONBUILD`
steps which we need to run after patchable does its thing. Also it is
specifically designed for operators and their layout (under `rust/` and
using workspaces).

* chore(nix): Remove unused image-tools

* chore(issue_template/vector): Update instructions for version bumps

* fix(vector): Cherry pick unmerged patch from vectordotdev/vector#24028

NOTE: I removed async/await parts from the original patch as that comes after 0.49.0

```sh
pushd $(cargo patchable checkout vector 0.49.0)

git remote add lfrancke https://github.com/lfrancke/vector

git fetch lfrancke

git cherry-pick 3ce729073f23631dd7b5525be640b5fa15af0223
# and, if the cherry-pick stops on conflicts, resolve them and run:
git cherry-pick --continue
git commit --amend

popd
cargo patchable export vector 0.49.0
```

* chore(vector): Add maintainer label

This seems to be added to other images, so I'm just copying that.

* chore: Update changelog

* Apply suggestions from code review

Co-authored-by: Techassi <sascha.lautenschlaeger@stackable.tech>

* chore(vector): Remove unused upload script

* chore(vector): Remove old comments, add new todo

---------

Co-authored-by: Techassi <sascha.lautenschlaeger@stackable.tech>
@lfrancke
Contributor Author

lfrancke commented Nov 3, 2025

@thomasqueirozb a quick ping. Considering that it corrupts data I hope a ping is fine here.

Contributor

@thomasqueirozb thomasqueirozb left a comment

Thanks for the contribution, this fix is very welcome! Sorry for the delay

@thomasqueirozb thomasqueirozb added the meta: awaiting author label Nov 17, 2025
Co-authored-by: Thomas <thomasqueirozb@gmail.com>
@github-actions github-actions bot removed the meta: awaiting author label Nov 20, 2025
@github-actions github-actions bot added the domain: sources label Nov 20, 2025
Contributor

@thomasqueirozb thomasqueirozb left a comment

Thanks for your contribution! I just verified the added test doesn't work on master, this is a very nice test to have 😃

@thomasqueirozb thomasqueirozb added this pull request to the merge queue Dec 1, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 1, 2025
@thomasqueirozb thomasqueirozb added this pull request to the merge queue Dec 1, 2025
Merged via the queue into vectordotdev:master with commit f4f6620 Dec 1, 2025
78 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2025
@lfrancke lfrancke deleted the fix/multi-char-delimiter branch December 1, 2025 21:11
Labels

domain: sources Anything related to the Vector's sources

Development

Successfully merging this pull request may close these issues.

Multi-byte line delimiters split across buffer boundaries cause log event merging

2 participants