Closed
Labels
Meta, Team:Fleet (Team label for Observability Data Collection Fleet team)
Description
Currently, the various APIs provided by Ingest Manager implement error handling and logging to varying degrees of completeness.
Overall, we should do the following when an error happens:
- In the place where the error happens, use the Kibana Logger to log a descriptive error message. The logger is available since #66017 ([Ingest] Use Kibana logger for proper server-side logging).
- In the place where the error happens, throw a `Boom` error with a suitable status code that is not `500`, and the same error message that was logged.
- In the place where the error happens, throw an `IngestManagerError` with a suitable `IngestManagerErrorType` and a helpful and descriptive error message.
- In the request handler, if the caught error is a `Boom` error, use its status code in `res.customError()`.
- In the request handler, if the caught error is `instanceof IngestManagerError`, use `res.customError()` to return the error to the caller. Pass the message, and get a suitable HTTP response code from the `getHTTPResponseCode()` helper.
- In the request handler, also use the Kibana Logger to log the error message to the Kibana log.
- In the request handler, if the error is not `IngestManagerError`, use status code `500`. In that case, log an error to the console with the full error message, and also log the stack trace of the error.
For an example of how this looks in implementation, see #66541.
Implementation example: #67278
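The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the linked PRs: only `IngestManagerError`, `IngestManagerErrorType`, `getHTTPResponseCode()` and `res.customError()` are named in this issue. The error type members (`REGISTRY_ERROR`, `NOT_FOUND`), the status codes they map to, the `Logger` and `ResponseFactory` interfaces, and the `handleError` and `getPackage` helpers are all assumptions made for the example. It also routes the message and stack trace of unexpected errors through the logger rather than writing to the console directly.

```typescript
// Hypothetical members; the real IngestManagerErrorType may differ.
type IngestManagerErrorType = 'REGISTRY_ERROR' | 'NOT_FOUND';

class IngestManagerError extends Error {
  readonly type: IngestManagerErrorType;

  constructor(type: IngestManagerErrorType, message: string) {
    super(message);
    this.type = type;
    this.name = 'IngestManagerError';
    // Keeps `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, IngestManagerError.prototype);
  }
}

// Assumed mapping; falls back to 400 rather than 500.
function getHTTPResponseCode(error: IngestManagerError): number {
  switch (error.type) {
    case 'REGISTRY_ERROR':
      return 502;
    case 'NOT_FOUND':
      return 404;
    default:
      return 400;
  }
}

// Minimal stand-ins for the Kibana logger and response factory.
interface Logger {
  error(message: string): void;
}
interface CustomErrorResponse {
  statusCode: number;
  body: { message: string };
}
interface ResponseFactory {
  customError(options: CustomErrorResponse): CustomErrorResponse;
}

// Hypothetical helper a request handler could call from its catch block.
function handleError(error: unknown, res: ResponseFactory, logger: Logger): CustomErrorResponse {
  if (error instanceof IngestManagerError) {
    // Known error: log the message and return an informative status code.
    logger.error(error.message);
    return res.customError({
      statusCode: getHTTPResponseCode(error),
      body: { message: error.message },
    });
  }
  // Unknown error: status 500, log the full message and the stack trace.
  const message = error instanceof Error ? error.message : String(error);
  logger.error(message);
  if (error instanceof Error && error.stack) {
    logger.error(error.stack);
  }
  return res.customError({ statusCode: 500, body: { message } });
}

// Hypothetical service function that throws where the error happens.
function getPackage(name: string): string {
  if (name !== 'known-package') {
    throw new IngestManagerError('NOT_FOUND', `Package ${name} not found`);
  }
  return name;
}

// Example wiring with in-memory stand-ins.
const logged: string[] = [];
const logger: Logger = { error: (m) => { logged.push(m); } };
const res: ResponseFactory = { customError: (options) => options };

function getPackageHandler(name: string): CustomErrorResponse | { statusCode: number; body: string } {
  try {
    return { statusCode: 200, body: getPackage(name) };
  } catch (e) {
    return handleError(e, res, logger);
  }
}
```

Keeping the type-to-status mapping in one helper means each handler's catch block stays identical, and the `500` path is reserved for errors nobody anticipated.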
Reasoning
- Kibana platform code logs a stack trace whenever a request handler returns a status code `500`. This stack trace comes from within platform code and is not helpful in debugging the error. I also find it confusing, because it implies that the error hasn't been caught and handled correctly, which is not entirely true in our code. (#65291, "Add error logs for HTTP 500 error details", is open for that.)
- We should therefore use informative HTTP response codes whenever possible, or at least `400 bad request` instead.
- In cloud, customers can't inspect the Kibana log, so as much information as is practical should be provided through API error messages. This way, we can ask users, e.g. on https://discuss.elastic.co/, to inspect their browser dev tools network tab and find out what happened. (Once we're in production, customers can ask support, and support agents will be able to inspect the Kibana logs.)
- The UI can still decide not to show too much error information if that is not desired.
Tracking
This task is to go through each of these APIs and ensure it handles and reports errors properly:
- setup
- agent_config
- enrollment_api_key
- agent
- epm
- datasource
- data_streams
- install_script
- output
- settings
- app