
Error_class=URI::InvalidURIError error=“bad URI(is not URI) ..” #4646

Open
raulgupto opened this issue Sep 25, 2024 · 13 comments
Labels
waiting-for-user Similar to "moreinfo", but especially need feedback from user

Comments

@raulgupto commented Sep 25, 2024

Describe the bug

I’m getting this error continuously when using the @http plugin.
I haven’t been able to find the root cause, but I’ve noticed it appears, coincidentally, when my external endpoint is down for restarts.
I have buffering enabled, which writes to my local disk, and I do not drop any log chunks, i.e. I have retry_forever set to true. But when the service is back up, this one chunk goes into periodic retries forever, because the dynamic tag in the HTTP endpoint is not resolved during retries.

The whole error looks like this:
Error_class=URI::InvalidURIError error="bad URI(is not URI) \"https://myexternalendpoint.com/v0/${tag}\""

fluentd version: 1.16.5

To Reproduce

Use the http output plugin with an endpoint containing ${tag}, with retry_forever set to true.

Expected behavior

The buffer chunk should be sent. It should not complain about an invalid URI.

Your Environment

- Fluentd version: 1.16.5
- Package version:
- Operating system: Red Hat Enterprise Linux Server 7.9 (Maipo)
- Kernel version: 2024 x86_64 GNU/Linux

Your Configuration

<match **>
  @type http
  endpoint http://externalxx.com/v0/${tag}
  content_type application/json
  <format>
    @type json
  </format>
  json_array true
  <buffer tag>
    @type file
    path /local/data/fluentd
    flush_interval 12s
    flush_thread_count 1
    overflow_action block
    chunk_limit_size 4MB
    retry_type periodic
    retry_wait 60s
    total_limit_size 6GB
    retry_forever true
  </buffer>
</match>

Your Error Log

Error_class=URI::InvalidURIError error="bad URI(is not URI) \"https://myexternalendpoint.com/v0/${tag}\""

Additional context

No response

@daipom (Contributor) commented Sep 26, 2024

@raulgupto Thanks for your report.
However, I can't reproduce this.

The placeholder is replaced when retrying if the chunk has tag key info.
If you set tag to the chunk keys, ${tag} should be replaced when retrying.

<match test>
  @type http 
  endpoint http://localhost:9880/${tag}
  <format>
    @type json
  </format>
  <buffer tag>
    @type file
    path ...
    flush_mode immediate
  </buffer>
</match>

daipom added the waiting-for-user label on Sep 26, 2024
@raulgupto (Author)

Can you try killing the fluentd process? I can’t figure out the exact scenario to reproduce this issue. What I’ve noticed is that, in a normal scenario, we have two buffer files for a chunk of messages. But in this case, I’ve also noticed that only one is present most of the time.

@daipom (Contributor) commented Sep 27, 2024

Can you try killing the fluentd process?

I have tried.
When Fluentd restarts, Fluentd loads the existing chunks and sends them correctly.

But in this case, I’ve also noticed that only one is present most of the time.

This should be the cause.
I can reproduce this issue as follows.

  1. Make some buffer files.
  2. Stop Fluentd with some buffer files remaining.
  3. Delete some .meta buffer files manually.
  4. Restart Fluentd.
  5. This error happens.

The file buffer (buf_file) needs a .meta file to process the placeholders.
If it is removed, Fluentd can't process the placeholders.
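
For reference, buf_file keeps two files per chunk: the chunk data itself and a companion .meta file holding the metadata (tag, timekey, variables) from which placeholders are resolved. The layout below is illustrative only; the exact names depend on the configured path:

/local/data/fluentd/buffer.b5c3f0...abc.log        <- chunk data
/local/data/fluentd/buffer.b5c3f0...abc.log.meta   <- chunk metadata used to resolve ${tag}

If the .log file survives a restart but its .log.meta partner is gone, ${tag} in the endpoint cannot be resolved and the URI error above is raised on every retry.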

@daipom (Contributor) commented Sep 27, 2024

If the .meta file is removed accidentally, the information about the tag is lost.
So it is very difficult for Fluentd to recover such data.

@raulgupto (Author)

I understand that without a location you don’t know where to send it. But since retry_forever is true, Fluentd keeps retrying this chunk. What I’ve noticed is that instead of just waiting for this one chunk to be flushed, the Fluentd process gets stuck waiting on it: it does not go down, but it consumes the whole buffer space and remains stuck forever. There is a workaround of manually clearing that buffer, but that requires manual intervention to delete the buffer in a production environment, which is not sustainable.
Either we should drop the chunk that is corrupted, i.e. without an endpoint address, or we should flush it to the current address. The latter does not seem correct, because ${tag} and fields like it were supposed to be dynamically resolved. Also, if someone had changed the config to a new HTTP address, a chunk meant for the old address would go to the new one.
I’d go with dropping the ill-configured buffers.

@raulgupto (Author)

Another approach is to find a way to prevent this problem from appearing in the first place. I’ve seen it appear frequently: around 3-5 out of 160 hosts hit it on a monthly basis. Is there any existing config change that would fix this issue?

@daipom (Contributor) commented Sep 27, 2024

To address the root cause, please investigate why some buffer files are disappearing.
Is it a bug in Fluentd or an external factor?

If this is a bug in Fluentd, we need to find out how to reproduce the phenomenon in order to fix it.
(I can reproduce the error by manually removing some buffer files. On the other hand, some buffer files must have been lost for some reason in your environment. We need to find out the cause.)

But since retry_forever is true, Fluentd keeps retrying this chunk. What I’ve noticed is that instead of just waiting for this one chunk to be flushed, the Fluentd process gets stuck waiting on it: it does not go down, but it consumes the whole buffer space and remains stuck forever.

Some errors are considered non-retriable, and Fluentd gives up retrying.

As for the error in this issue, Fluentd does retry: it is considered retriable in the current implementation.
So, if retry_forever is used, Fluentd retries flushing the chunk forever.

The situation may be improved if this can be changed so that the error is treated as non-retriable.

There is a workaround of manually clearing that buffer, but that requires manual intervention to delete the buffer in a production environment, which is not sustainable.

You can stop using retry_forever and add a <secondary> section.
This allows Fluentd to automatically save unexpected data to a file or another location without manual intervention.
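
A minimal sketch of that setup, assuming the http output from this issue; the retry values and the secondary directory are illustrative placeholders, not recommendations:

<match **>
  @type http
  endpoint http://externalxx.com/v0/${tag}
  content_type application/json
  <format>
    @type json
  </format>
  <buffer tag>
    @type file
    path /local/data/fluentd
    retry_type periodic
    retry_wait 60s
    # stop retrying a chunk after roughly one day instead of retry_forever
    retry_max_times 1440
  </buffer>
  <secondary>
    # chunks that exhaust their retries are dumped here instead of being lost
    @type secondary_file
    directory /local/data/fluentd-error
    basename dump.${chunk_id}
  </secondary>
</match>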

Either we should drop the chunk that is corrupted, i.e. without an endpoint address, or we should flush it to the current address.

Certainly, we should improve the handling of buffers on this point.
If there is no corresponding .meta buffer file, it might be better for Fluentd to drop or back up the chunk.

@raulgupto (Author) commented Sep 27, 2024

I’ll definitely add secondary_file. One question: if I use retry_timeout / retry_max_times, how will my retries work in this case?

  1. If one buffer chunk has exhausted the retry parameter, Fluentd stops sending all buffer chunks.
    or
  2. If one buffer chunk has exhausted the retry parameter, only that chunk won’t be retried, while the others are still retried the same number of times.

I don’t want to stop after n tries or n duration. I want to keep retrying, assuming my endpoint will come back after recovering from failures / releases.
Edit: I tried secondary_file. It doesn’t resolve ${tag}. I have <match **> as my match condition.
I wanted to separate out in the dump which log chunks have failed so that I could manually send them to the endpoint.

@daipom (Contributor) commented Oct 2, 2024

@raulgupto Sorry for my late response.

If I use retry_timeout / retry_max_times, how will my retries work in this case?

1. If one buffer chunk has exhausted the retry parameter, Fluentd stops sending all buffer chunks.
   or

2. If one buffer chunk has exhausted the retry parameter, only that chunk won’t be retried, while the others are still retried the same number of times.

2 is correct.
Fluentd handles retries for each chunk.

@daipom (Contributor) commented Oct 2, 2024

Edit: I tried secondary_file. It doesn’t resolve ${tag}. I have <match **> as my match condition.
I wanted to separate out in the dump which log chunks have failed so that I could manually send them to the endpoint.

Chunks that cannot resolve placeholders due to missing metafiles fail to be transferred.
The secondary_file handles such chunks, so it can't resolve ${tag}.
If the metafile is lost, the tag information cannot be recovered.
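
As a small, hedged addition to this point: even though the tag itself is unrecoverable, secondary_file’s default basename, dump.${chunk_id}, keeps one dump file per chunk, and the same chunk id appears in Fluentd’s flush-failure warnings (chunk="..."), which can help map a dump back to the failure that produced it. The directory below is a placeholder:

<secondary>
  @type secondary_file
  # placeholder path for the dumps
  directory /local/data/fluentd-error
  # ${chunk_id} still resolves even when ${tag} cannot
  basename dump.${chunk_id}
</secondary>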

@raulgupto (Author)

Thank you for the secondary_file workaround. It will help to manually recover and send logs in case of failures. It would, however, be great to have a retry mechanism or other solution that can recover buffers when the .meta file is lost.

@daipom (Contributor) commented Oct 4, 2024

If the .meta file is removed accidentally, the information about the tag is lost.
So it is very difficult for Fluentd to recover such data.

So, it would be better to avoid the disappearance of buffer files.

Do you have any idea as to why the buffer file disappears?

Is more than one Fluentd process running at the same time?

@raulgupto (Author)

I’ve added graceful kill commands to stop the running process and around 10 seconds of sleep before restarts.
However, we have a process monitor that checks whether Fluentd is running and restarts it if not. So even during host maintenance or clean restarts, I don’t think there will be process duplication. But there are chances of process kills and restarts, which ideally should not leave partial metadata behind. Is there any flag I can use that prevents metadata corruption during restarts?
