kubelet not using full timestamp precision in its log records #414
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/sig node
/sig instrumentation
This isn't really kubelet specific; it's more of a k8s.io/klog feature request.
IMHO: I'd recommend disabling log de-duping on your logging provider for kubelet logs. If kubelet starts generating spurious duplicate logs, that is a bug. But logging with the log format we've used for a decade now is expected. Attempting to dedupe kubelet logs is probably a waste of resources, aside from potentially hiding recurring events / errors.
Hi @BenTheElder.
Thank you for the correction. I would like to kindly ask you to move the issue to a better location; I will follow it there.
Thank you for the feedback. Unfortunately, we do not have enough timestamp precision to be sure whether these are really duplicated log entries or just identical log messages emitted by kubelet in very quick succession.
I fully agree, but with the current timestamp precision this is actually impossible to tell.
https://github.com/kubernetes/klog
Er ... no, I'm saying you should simply disable any attempt to deduplicate kubelet logs in the log collector. Any time kubelet produced pointless duplicate logs, that would be considered a bug to fix; in any other case you're just spending compute time on hiding intentionally repeated messages. This would still be true with even more precise timestamps, since there's no guarantee about how slowly something is legitimately logged twice. It would be far less likely that the entire line is identical with nanosecond precision, but in that case it would still make sense to simply disable the log deduplication behavior in your pipeline outside of kubelet, because you are now expecting it will never detect a duplicate anyhow...
Thank you. Unfortunately, I do not have the rights to do this. I would like to kindly ask you to move the issue there.
Yes, this is our default state, so there is no deduplication. If the same log record exists many times, the log collector attempts to deliver every occurrence. However, the log collector (FluentBit) calculates a unique ID from the log record and attempts to store it in the DB (OpenSearch). And here is our case: if FluentBit attempts to store the "same" log record, OpenSearch rejects it, and after several attempts FluentBit drops it. In the meantime FluentBit retries the delivery, but it keeps failing, because the same log record already exists in OpenSearch and OpenSearch keeps rejecting it. As a consequence, the affected FluentBit instance starts raising its retry metric and Prometheus starts firing an alert, because FluentBit has a problem delivering log records. And here is another problem: there can be many different reasons why FluentBit retries a delivery, so the operations team must check every occurrence in the FluentBit logs to find the reason for the retry. The first step is simple: ask the Kubernetes community to improve the log mechanism in kubelet. The second step will be:
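(For illustration only, a minimal hypothetical sketch in Go, not FluentBit's or OpenSearch's actual ID algorithm: if a dedup key is derived only from the microsecond-precision timestamp plus the message text, two distinct events that land in the same microsecond end up with the same key.)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// dedupKey builds a key from a microsecond-rendered timestamp plus the
// message, mimicking a consumer that only sees microsecond precision.
func dedupKey(ts time.Time, msg string) string {
	line := ts.Format("2006-01-02T15:04:05.000000") + " " + msg
	sum := sha256.Sum256([]byte(line))
	return fmt.Sprintf("%x", sum[:8])
}

func main() {
	t := time.Date(2024, 1, 2, 15, 4, 5, 123456100, time.UTC)
	a := dedupKey(t, "Watch close")                          // first stream closes
	b := dedupKey(t.Add(300*time.Nanosecond), "Watch close") // second stream, 300ns later
	fmt.Println(a == b) // true: two distinct events share one dedup key
}
```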
/transfer klog
Like @BenTheElder I also don't think that the time stamp should be used to make log entries unique. There is no guarantee that "time stamp" + "message" + "key/value" is unique. A log consumer which makes that assumption should add its own unique ID (for example, a running counter) instead.
That all sounds like bugs in the log pipeline. Attempting to circumvent them by pushing new complexity into the logging used for all Kubernetes components for the better part of a decade doesn't seem like the best way, on balance, to handle this.
Right, this can be handled by inserting a monotonic counter based on the line number in the log pipeline without modifying kubelet / klog. Without more demand for this functionality, I don't know that it's a good tradeoff to implement this, and we certainly wouldn't change the default lightly because there is a lot of tooling out there parsing these logs. The consistency of logs emitted by core Kubernetes components across providers/installers/distros and the predictable ability to parse them correctly is a feature of its own currently.
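(A minimal sketch of that suggestion, assuming a simple line-oriented stage somewhere in the pipeline; the seq field name is an arbitrary choice for this example. It appends a monotonically increasing counter to every log line, making each record unique without touching kubelet or klog.)

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines
	var seq uint64
	for sc.Scan() {
		seq++
		// Append a per-instance sequence number to the original line.
		fmt.Printf("%s seq=%d\n", sc.Text(), seq)
	}
}
```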
Thank you for your feedback. We have been using that timestamp precision so far without any problems, because it is very unlikely (yep, there is nothing "for sure") that two log records will be emitted at the same time, down to the same nanosecond. I mean, a collision at nanosecond precision is far less likely than at microsecond precision. So I am nearly sure that if this situation does happen at nanosecond precision, there is a very high chance it is a real duplication coming from kubelet.
I agree the default timestamp should stay as is, but if anybody else needs full timestamp precision, this feature would be available.
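(A small illustration, using only the Go standard library, of the precision difference under discussion: two events a few hundred nanoseconds apart render identically at microsecond precision but differ with time.RFC3339Nano.)

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Two events 300ns apart, e.g. two watch streams closing back to back.
	t1 := time.Date(2024, 1, 2, 15, 4, 5, 123456100, time.UTC)
	t2 := t1.Add(300 * time.Nanosecond)

	micro := "2006-01-02T15:04:05.000000Z07:00" // microsecond-precision layout
	fmt.Println(t1.Format(micro) == t2.Format(micro))                       // true: indistinguishable
	fmt.Println(t1.Format(time.RFC3339Nano) == t2.Format(time.RFC3339Nano)) // false: distinguishable
}
```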
👎 from me for adding more code and thus complexity to klog for something that I consider a workaround for a deficiency in the log consumer.
I agree @pohly
Hi guys. It seems that you have a better idea of where the problem may be. So, let's jump back to the beginning and skip our logging stack; I used it only to locate the log records which it marked as duplicated. I was on the node where the records were generated and I used something like this:
Are the marked log records really duplicated by kubelet? Thank you!
Probably these are two different records. We are working on contextual logging, which could be used to add more information about which stream was being watched. As it stands, the kubelet is watching several and they all go down at roughly the same time, thus the high chance that they get the same time stamp. |
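(A rough sketch of how contextual logging with k8s.io/klog/v2 could attach per-stream key/value pairs, so repeated "watch closed" style messages from different streams remain distinguishable; the "resource" key and the message text here are illustrative, not the kubelet's actual output.)

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// watch stands in for one of several watch loops; the "resource" key is
// made up for this sketch.
func watch(ctx context.Context, resource string) {
	logger := klog.FromContext(ctx).WithValues("resource", resource)
	// ... watch loop ...
	logger.Info("Watch closed") // each stream logs with its own key/value pair
}

func main() {
	ctx := klog.NewContext(context.Background(), klog.Background())
	watch(ctx, "pods")
	watch(ctx, "nodes")
	klog.Flush()
}
```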
OK, thank you. Could you please share the issue where these activities are tracked? It seems that I missed it.
OK, thank you for the explanation. If timestamp precision will not be helpful in these situations, I am afraid that without adding another attribute to the log record this will not be possible to solve. I mean adding something like an "event counter" or some kind of ID to differentiate each log record, no matter how quickly they are emitted.
Just to be clear, we are not going to add anything like that to our log output. Each log entry as written is a new log entry.
Thank you.
I am curious: how can I, as an end user, be sure that no duplication is occurring? I would like not to be paranoid, but blindly trusting the output from klog isn't a good idea for me.
Why should duplication occur? As a user, you have to trust the code that you are using and the people who wrote it. As a developer, you can review the source code.
This is the perfect question, to which I am actually trying to get an answer in this issue. As you can see from my screenshots, as an end user I see identical log records and I do not have bullet-proof evidence whether they are or aren't duplicated log records. As I mentioned previously, they can be log records from events which are generated too quickly, but again I do not have bullet-proof evidence either way. Unfortunately, my Golang knowledge and experience are very limited. This is the reason why I opened the issue and am trying to get answers.
In the current age, when projects can be compromised, I consider my paranoia a good starting point for being curious about how things work inside. But this is off topic, only an illustration that a healthy amount of mistrust is required. OK, let's get back to my case: are these records duplicated or not? If not, I would like to kindly ask you to provide bullet-proof evidence which I can confirm from my side.
@dims: Closing this issue. In response to this:
This seems excessive. The point is this library is not intended to be used by code that is generating pointless duplicates. If you have evidence that kubelet is logging multiple times for the same underlying event, please file a bug with details and we will work with the relevant owners to fix that. But asking us to provide "bullet proof evidence" that this doesn't exist is an unreasonable request.
IMHO: There's no reason to deduplicate kubelet logs. Every log line from kubelet is meaningful, and on the off chance that it isn't, that is because it's evidence of a bug and you should report the bug to the project so we can fix it. As explained above, the "identical" logs are likely telling you that the same thing is happening repeatedly to different streams; deduplicating those is losing info, while adding nanoseconds isn't meaningfully adding any. You can reasonably assume that every kubelet log is a unique entry/event, and if something is logged repeatedly then it did in fact repeat.
I provided proof that duplication is occurring. As a user, I really don't know what the source of the duplication is (kubelet itself? another library in the project? the environment? journald? ... or something totally different).
This is it! If I had it, I would definitely provide it. Again, as a user, I really don't know what the source of the duplication is.
If that is what you think, then I am sure that any investigation of this problem is only a waste of time.
Again, as a user, I really don't know what the source of the duplication is.
I agree they "are likely" not duplicates, but that is not certain. I know that there can be parsers in other log collectors which expect a certain log format. This is the reason why I asked you to add a more precise timestamp. Fine, you are right, this again isn't bullet proof, because events can happen even within nanoseconds or picoseconds. Instead of a counter-suggestion, you repeatedly only claim that this is "unlikely".
I know something is wrong somewhere. Where and what?
This was my assumption too, until I discovered duplicated log records. OK, let's summarize it:
Fine, this is actually only an annoyance for us, and we can wait for somebody else who has the same problems and merge our efforts to find the source of the problem. Of course, only if we are still using Kubernetes at that time, because:
is only a longer way of writing "take it or leave it". So, it is more convenient to wait for somebody else to solve the problem than to waste time on it. This was my first issue for this project, but after this experience you can be sure it is the last! I am willing to put in some effort only if the issue still exists, the problem is critical for us and, of course, I am still using Kubernetes. Have a nice day!
What happened?
We can see many times that our log processor detects duplicated entries. The entries aren't really duplicated, but are generated too fast. As a result, from the log processor's perspective, the log entries look duplicated (e.g.):
even though in reality the events can occur within a very short time period (e.g.):
As you can see, in the first example (current state) kubelet has only microsecond precision. We would like to configure kubelet so that the log entries have nanosecond precision (second example, goal state).
What did you expect to happen?
It should be possible to configure kubelet to use a full precision timestamp (the RFC3339Nano standard).
How can we reproduce it (as minimally and precisely as possible)?
This situation happens randomly, but often on bigger Kubernetes nodes with a huge amount of small pods during reconciliation (e.g. upgrading many pods in different namespaces at the same time).
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)