
[Ingest Manager] Error handling in server APIs #66688

@skh

Description

Currently, the various APIs provided by Ingest Manager implement error handling and logging to different degrees of completeness.

Overall, we should do the following when an error happens:

  • In the place where the error happens, use the Kibana Logger to log a descriptive error message. The logger has been available since #66017 ([Ingest] Use Kibana logger for proper server-side logging).
  • In the place where the error happens, throw a Boom error with a suitable status code other than 500 and the same error message that was logged.
  • In the place where the error happens, throw an IngestManagerError with a suitable IngestManagerErrorType and a helpful, descriptive error message.
  • In the request handler, if the caught error is a Boom error, use its status code in res.customError().
  • In the request handler, if the caught error is an instance of IngestManagerError, use res.customError() to return the error to the caller. Pass the message, and get a suitable HTTP response code from the getHTTPResponseCode() helper.
  • In the request handler, also use the Kibana Logger to write the error message to the Kibana log.
  • In the request handler, if the error is not an IngestManagerError, use status code 500. In that case, log the full error message to the console, along with the stack trace of the error.
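The IngestManagerError steps above can be sketched in TypeScript. This is a minimal, self-contained sketch, not the actual Ingest Manager code: the `Logger` interface and the `res` object are simplified stand-ins for the Kibana Logger and response factory, and the `IngestManagerErrorType` members, the status codes chosen in `getHTTPResponseCode()`, and the `getPackageVersion()`, `handleError()`, and `getPackageVersionHandler()` functions are hypothetical. The Boom variant is not shown.

```typescript
// Simplified stand-in for the Kibana Logger; only error() is used here.
interface Logger {
  error(message: string): void;
}

// Assumed response shape, loosely modeled on what res.customError() receives.
interface HandlerResponse {
  statusCode: number;
  body: { message: string } | { version: string };
}

// Hypothetical error types; the real IngestManagerErrorType members may differ.
enum IngestManagerErrorType {
  BadRequest = 'BadRequest',
  NotFound = 'NotFound',
  RegistryError = 'RegistryError',
}

class IngestManagerError extends Error {
  constructor(public readonly type: IngestManagerErrorType, message: string) {
    super(message);
    this.name = 'IngestManagerError';
    // Keep instanceof working when compiling to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical mapping from error type to a non-500 HTTP status code.
function getHTTPResponseCode(error: IngestManagerError): number {
  switch (error.type) {
    case IngestManagerErrorType.NotFound:
      return 404;
    case IngestManagerErrorType.RegistryError:
      return 502; // an upstream service (the package registry) failed
    default:
      return 400; // at least 400 Bad Request, never 500
  }
}

// Stand-in for the Kibana response factory: it just builds plain objects.
const res = {
  ok: (options: { body: { version: string } }): HandlerResponse => ({
    statusCode: 200,
    body: options.body,
  }),
  customError: (options: HandlerResponse): HandlerResponse => options,
};

// Throw site: log a descriptive message, then throw an IngestManagerError.
const registry = new Map<string, string>([['nginx', '0.1.0']]); // fake data
function getPackageVersion(logger: Logger, pkgName: string): string {
  const version = registry.get(pkgName);
  if (version === undefined) {
    const message = `Package "${pkgName}" was not found in the registry`;
    logger.error(message);
    throw new IngestManagerError(IngestManagerErrorType.NotFound, message);
  }
  return version;
}

// Shared error handling for request handlers.
function handleError(logger: Logger, error: unknown): HandlerResponse {
  if (error instanceof IngestManagerError) {
    logger.error(error.message);
    return res.customError({
      statusCode: getHTTPResponseCode(error),
      body: { message: error.message },
    });
  }
  // Unexpected error: fall back to 500 and log the full message and stack trace.
  const err = error instanceof Error ? error : new Error(String(error));
  logger.error(err.message);
  if (err.stack) logger.error(err.stack);
  return res.customError({ statusCode: 500, body: { message: err.message } });
}

// Request handler: wraps the service call in try/catch.
function getPackageVersionHandler(logger: Logger, pkgName: string): HandlerResponse {
  try {
    return res.ok({ body: { version: getPackageVersion(logger, pkgName) } });
  } catch (error) {
    return handleError(logger, error);
  }
}
```

In this sketch a known error is logged twice, once at the throw site and once in the handler, as the list asks; only the 500 fallback also logs the stack trace, because that is the case where the error was not anticipated.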

For an example of how this looks in practice, see #66541.

Implementation example: #67278

Reasoning

  • Kibana platform code logs a stack trace whenever a request handler returns a status code 500. This stack trace comes from within platform code and is not helpful in debugging the error. I also find it confusing, because it implies that the error hasn't been caught and handled correctly, which is not entirely true in our code. (Issue #65291, Add error logs for HTTP 500 error details, is open for that.)
  • We should therefore use informative HTTP response codes whenever possible, or at least 400 Bad Request instead of 500.
  • In cloud, customers can't inspect the Kibana log, so as much information as is practical should be provided through API error messages. This way, we can ask users, e.g. on https://discuss.elastic.co/, to inspect the network tab of their browser dev tools and find out what happened. (Once we're in production, customers can ask support, and support agents will be able to inspect the Kibana logs.)
  • The UI can still decide not to show too much error information if that is not desired.

Tracking

This task is to go through each of these APIs and ensure it handles and reports errors properly:

  • setup
  • agent_config
  • enrollment_api_key
  • agent
  • epm
  • datasource
  • data_streams
  • install_script
  • output
  • settings
  • app

Metadata

Labels

MetaTeam:FleetTeam (label for the Observability Data Collection Fleet team)