Support remote log level #4413
The build is gonna break of course, because the dependencies are missing in pillar.
pkg/newlog/cmd/newlogd.go
// create the necessary directories upfront
if _, err := os.Stat(collectDir); os.IsNotExist(err) {
	if err := os.MkdirAll(collectDir, 0755); err != nil {
		log.Fatal(err)
	}
}

if _, err := os.Stat(uploadDevDir); os.IsNotExist(err) {
	if err := os.Mkdir(uploadDevDir, 0755); err != nil {
		log.Fatal(err)
	}
}

if _, err := os.Stat(uploadAppDir); os.IsNotExist(err) {
	if err := os.Mkdir(uploadAppDir, 0755); err != nil {
		log.Fatal(err)
	}
}

if _, err := os.Stat(keepSentDir); os.IsNotExist(err) {
	if err := os.MkdirAll(keepSentDir, 0755); err != nil {
		log.Fatal(err)
	}
}

if _, err := os.Stat(panicFileDir); os.IsNotExist(err) {
	if err := os.MkdirAll(panicFileDir, 0755); err != nil {
		log.Error(err)
		return
	}
}
for loop?
// app.log...gz
// dev.log.keep...gz
// dev.log.upload....gz
// TODO: declare a clear order of removal by directory
What is the format of the saved file names? Alphabetical order is not always chronological order: if we don't zero-pad the day/month, it won't work, e.g. '9' > '11', since we compare character by character.
the files are named like this:
app.6656f860-7563-4bbf-8bba-051f5942982b.log.1730464687367.gz
dev.log.keep.1730404601953.gz
dev.log.upload.1730404601953.gz
So we use timestamps instead of dates, and the alphabetical order should also represent the time order.
However, I understand that alphabetical order is not ideal, and that's why I put that TODO for myself :)
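A minimal sketch of ordering by the embedded timestamp rather than by name is shown below. The helper names `fileTimestamp` and `sortOldestFirst` are hypothetical, and the sketch assumes the naming shown above: a numeric timestamp as the last dot-separated field before `.gz`. It makes the ordering independent of the prefix (`app.<uuid>` vs `dev.log.keep` vs `dev.log.upload`) and of the number of digits in the timestamp.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// fileTimestamp extracts the numeric field that precedes the ".gz" suffix,
// e.g. 1730404601953 from "dev.log.keep.1730404601953.gz".
func fileTimestamp(name string) (int64, error) {
	parts := strings.Split(name, ".")
	if len(parts) < 3 || parts[len(parts)-1] != "gz" {
		return 0, fmt.Errorf("unexpected log file name %q", name)
	}
	return strconv.ParseInt(parts[len(parts)-2], 10, 64)
}

// sortOldestFirst orders names by their embedded timestamp.
// Names without a parsable timestamp are placed last.
func sortOldestFirst(names []string) {
	sort.SliceStable(names, func(i, j int) bool {
		ti, errI := fileTimestamp(names[i])
		tj, errJ := fileTimestamp(names[j])
		if errI != nil || errJ != nil {
			return errI == nil && errJ != nil
		}
		return ti < tj
	})
}

func main() {
	names := []string{
		"app.6656f860-7563-4bbf-8bba-051f5942982b.log.1730464687367.gz",
		"dev.log.upload.1730404601953.gz",
		"dev.log.keep.1730404601953.gz",
	}
	sortOldestFirst(names)
	for _, n := range names {
		fmt.Println(n)
	}
}
```

With these three names, a plain alphabetical sort would put the `app.` file first because of its prefix, even though it carries the newest timestamp; the numeric sort puts it last.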
Put the correct name for the per-agent log level setting in the CONFIG-PROPERTIES.md file. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
As opposed to the default log level, which was setting which logs are produced by EVE's microservices, the remote log level will set which device logs are uploaded to the cloud. So it's assumed that the remote log level is always higher than the default log level.

There are no changes in how newlogd collects the logs, only in how it handles the log files.

For the logs that are uploaded:
- we create a separate file with prefix dev.log.upload in collect
- that file is gzipped when it reaches a certain size or by timer - the same as before
- once gzipped the file is moved to devUpload - the same as before
- once the file is uploaded successfully, it's deleted instead of being moved to keepSentQueue

For the logs that stay on the device:
- we create a separate file with prefix dev.log.keep in collect
- that file is gzipped when it reaches a certain size - no timer
- once gzipped the file is moved to keepSentQueue - the name of the folder is preserved

The commit also contains some structural changes to the newlogd code:
- init fileinfo inside trigMoveToGzip instead of passing it as parameter
- add initNewLogfile function

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
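The file selection described in this commit message might be pictured roughly as follows. This is a sketch under stated assumptions, not newlogd's actual code: the function `targetPrefixes` and the priority constants are hypothetical, priorities are assumed to be syslog-style numbers (a smaller number is more severe), and the keep file is assumed to receive every entry, which is one reading of the review remark that files in devUpload duplicate those in the keep directory.

```go
package main

import "fmt"

// Assumed syslog-style priorities: a smaller number means a more severe entry.
const (
	prioError   = 3
	prioWarning = 4
	prioInfo    = 6
	prioDebug   = 7
)

// File prefixes named in the commit message.
const (
	keepPrefix   = "dev.log.keep"
	uploadPrefix = "dev.log.upload"
)

// targetPrefixes returns the collect-file prefixes an entry is written to.
// Assumption: every entry goes to the keep file, and entries at least as
// severe as the remote level additionally go to the upload file.
func targetPrefixes(entryPrio, remotePrio int) []string {
	targets := []string{keepPrefix}
	if entryPrio <= remotePrio {
		targets = append(targets, uploadPrefix)
	}
	return targets
}

func main() {
	remote := prioWarning
	fmt.Println("error entry:", targetPrefixes(prioError, remote))
	fmt.Println("info entry: ", targetPrefixes(prioInfo, remote))
	fmt.Println("debug entry:", targetPrefixes(prioDebug, remote))
}
```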
Add new metrics for newlogd:
- LatestAvailableLog - the timestamp of the latest log available on the device
- TotalSizeLogs - the total size of all logs on the device

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
defer devStatsKeep.file.Close()
devStatsKeep.notUpload = true

logmetrics.DevMetrics.LatestAvailableLog = time.Now()
This one will change on every reboot of the device, but older log entries can still stay on the disk, so it won't reflect the earliest one. Also, 'Latest' can be read as the most recent timestamp; maybe name it something like 'oldest_saved'.
that's true, very nice catch!!
Something else to watch out for are devices (like RPi4) with no battery-backed up clock, which always boot on Jan 1st 1970 UTC. ntpd/chronos sets the clock, but if the network is out the agents including newlogd start before the time is set.
@eriknordmark I'm not sure how to address that, because the logs will then also be dated Jan 1st 1970... We cannot wait until the time is set correctly and only then start logging.
@@ -65,6 +70,7 @@ type NewlogMetrics struct {
	NumKmessages          uint64            // total input kmessages
	NumSyslogMessages     uint64            // total input syslog message
	DevTop10InputBytesPCT map[string]uint32 // top 10 sources device log input in percentage
	TotalSizeLogs         uint64            // total size of logs on device
We already have 'NumSkipUploadAppFile'; maybe we can also record the number of uploaded files.
I think it's already recorded in NumGZipFilesSent
This patch in general looks good to me. I have some comments:
- in collect-info.sh, for the 'newlog' directory, we should skip the 'devUpload' directory, since the not-yet-uploaded files there will be duplicates of those in the keep directory
- similarly, in the edgeview walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates
- in loguploader.go, there is 'failSendDir' and logic to move log files that failed to send to the controller and keep them there. With this change, I don't think it is needed anymore, since everything will be in the keep directory. This can simplify some logic
- when doing a Kibana search, I search for the 'Alpine' string to locate when the device has rebooted; now we may not have this. Maybe always log at least the first 5 device log entries?
@@ -108,8 +109,9 @@ var (
	// from different goroutines, so in order to push the changes out of the
	// goroutines local caches and correctly observe changed values in another
	// goroutine sync/atomic synchronization is used. You've been warned.
	syslogPrio    = types.SyslogKernelLogLevelNum[types.SyslogKernelDefaultLogLevel]
	kernelPrio    = types.SyslogKernelLogLevelNum[types.SyslogKernelDefaultLogLevel]
	remotelogPrio = types.SyslogKernelLogLevelNum[types.SyslogKernelDefaultLogLevel]
When eve-api gets a new config for device remote logging settings, it would be good to handle a setting for 'do not upload any device log', which is not part of the log-level strings.
that's a good point!
tbh I was even thinking that we might need yet another log level - the local log level, because the user might not wanna store any logs on the device and only send them to the controller, and that's not supported at the moment.
Right now we store everything on the device because our default.log.level == local.log.level. We also assume that the remote log level is going to be higher than default.log.level, but we don't have any checks for it.
Agreed. We should also include some settings through eve-api for defining local logging (at the project level, but overridable by a device config item). The remote logging level cannot be higher than the local setting.
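The missing check discussed in this thread could be as small as the sketch below. It is an illustration only: the function `effectiveRemotePrio` is hypothetical and assumes syslog-style numeric priorities, where a larger number means a more verbose level (7 is debug). Under that numbering, "the remote level cannot be higher than the local setting" means the remote number must not exceed the local one, since a log that is never produced cannot be uploaded.

```go
package main

import "fmt"

// effectiveRemotePrio clamps the configured remote priority so that it never
// asks for more verbose logs than are produced locally. The boolean reports
// whether the configured value had to be lowered.
// Assumption: syslog-style numbering, a larger number is more verbose.
func effectiveRemotePrio(localPrio, remotePrio int) (int, bool) {
	if remotePrio > localPrio {
		return localPrio, true
	}
	return remotePrio, false
}

func main() {
	const info, debug, warning = 6, 7, 4

	prio, clamped := effectiveRemotePrio(info, debug)
	fmt.Println(prio, clamped) // remote debug with local info is clamped to info

	prio, clamped = effectiveRemotePrio(info, warning)
	fmt.Println(prio, clamped) // remote warning with local info is kept
}
```

A caller could log a warning whenever the boolean is true, so that a misconfigured pair of levels is visible instead of being silently ignored.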
And from an end-to-end feature point of view, the 'local' config should be per device, while the remote one can be project scope. The controller can also add an optional timeout for the local settings, e.g. turn on 'debug' for deviceA for 3 hours; the controller side will remove this config after it expires.
@europaul I think we always need to store/queue the logs locally since the device might not be connected to the controller when the log is generated. But a notion of which log levels to keep stored on the device (after sent to the controller) might be what makes sense here.
@eriknordmark the logs that are meant to be sent to the controller are queued until they either
- are sent to the controller - then they are removed
- fail too many times - then we put them in failedUpload
Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?
Add a chapter explaining how different log levels are handled in the system. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
580384e to c894bec
@eriknordmark I made the changes to the documentation, please have a look. I'm not sure if it's an overkill with the |
This PR is a work in progress, but I'd appreciate it if somebody could take a look at the code and the general approach and give me feedback.
Please see the commit messages (especially the second commit) for the overview of the changes.
Before merging this I need to update the pillar first to include the dependency in newlogd. I also need to update eve-api to include the new metrics.
I'm also planning to add more metrics as well as proper tests for newlogd to at least ensure that there is no regression.