Log fiber #364


Merged

merged 5 commits into tarantool:master from the fiber_log branch on Jan 17, 2023

Conversation

@Serpentian (Collaborator)

Closes #107

@Serpentian Serpentian changed the title Fiber log Fiber logger Aug 5, 2022
@Serpentian Serpentian force-pushed the fiber_log branch 8 times, most recently from 3931f9e to 47ba263 Compare August 10, 2022 21:21
@Serpentian Serpentian changed the title Fiber logger Log fiber Aug 10, 2022
@Gerold103 (Collaborator)

I didn't review the patch yet, but if you are implementing the option from Alexey's big comment, then most importantly don't forget to expose 'status' for the background workers. The log history isn't too important, but the status is.

@Serpentian (Collaborator, Author)

> I didn't review the patch yet, but if you are implementing the option from Alexey's big comment, then most importantly don't forget to expose 'status' for the background workers. The log history isn't too important, but the status is.

The question here is how we'll work with these fiber states. Are they supposed to be focused on errors or not?
E.g. if an error occurs during fiber execution, should we set the fiber's state to error and wait until the error is gone?

If we do it the way Alexey described in the second solution, this error will be dropped pretty soon, e.g. by the sleeping state, so there isn't much sense in saving it. In that case we can grep the error message from the logs, which are saved in the same module.

@Gerold103 (Collaborator)

> E.g. if an error occurs during fiber execution, should we set the fiber's state to error and wait until the error is gone?

We should keep the error state until everything becomes fine.

> If we do it the way Alexey described in the second solution, this error will be dropped pretty soon, e.g. by the sleeping state

The sleeping state shouldn't exist; Alexey probably gave it just as an example. Any fiber except yours is sleeping anyway, it is not multithreaded (call fiber:status() on any fiber except self, and it will be suspended). One could argue that this means "state machine sleeping", but still I think the error state should be the strongest one - IOW it can only be overridden by a good state. It doesn't have to be a single field, though.

Another option is that we could have a separate state and activity. State would be error/balanced/good/full/etc. Activity would be backoff/idle/discovering/sending buckets/etc. Essentially, state would be like vshard.storage.info().status - a simple string which tells whether all is good or bad. It is very useful for monitoring too, not just for logs. For example, the cartridge GUI could translate the states to colors, like it does for whole instances depending on whether they are alive.

The exact names for each background service's states could be decided during the review. Or you could try to write an RFC with the whole design, at least in terms of the API, in the "Discussions" tab of this repo. It would also allow us to collect some feedback from potential users (usually there are 2 people who give feedback on almost everything).
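
A hypothetical sketch of the state/activity split described above. The service names, field names and values here are purely illustrative; the actual fields were settled later in the RFC and the commits below.

```Lua
-- Purely illustrative: one possible shape of per-service reporting.
local services = {
    rebalancer = {
        -- 'state' answers "is everything fine?" - a single string that
        -- is easy to translate into a color in a monitoring GUI.
        state = 'error',
        -- 'activity' answers "what is the fiber doing right now?".
        activity = 'sending buckets',
        -- The error is kept until the service reports a good state again.
        error = 'Timed out while sending a bucket',
    },
    recovery = {
        state = 'good',
        activity = 'idle',
    },
}
```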

@Serpentian (Collaborator, Author)

> Or you could try to write an RFC with the whole design, at least in terms of the API.

Sounds like a good idea, I'll write the RFC ASAP. The current implementation doesn't work the way you described, so it's better to discuss everything more precisely before making further changes.

@Serpentian (Collaborator, Author)

The RFC is up: #365. It would be great to discuss how all of this should work when you have some time, so that I can continue working on it.

@Serpentian Serpentian force-pushed the fiber_log branch 4 times, most recently from 15247ca to a0c10af Compare September 19, 2022 10:43
@Serpentian Serpentian marked this pull request as ready for review September 19, 2022 10:45
@Serpentian Serpentian force-pushed the fiber_log branch 3 times, most recently from 4dae872 to 9f6c094 Compare September 19, 2022 20:06
@Gerold103 (Collaborator) left a comment

Hi! Thanks for the patch! Really nice work, we are doing something cool here! See some comments/ideas below. I didn't review the last 2 commits yet for lack of time, sorry.

@Serpentian Serpentian marked this pull request as draft November 7, 2022 12:23
@Serpentian Serpentian force-pushed the fiber_log branch 5 times, most recently from 4187b57 to b3cbfc6 Compare November 11, 2022 23:09
@Serpentian Serpentian marked this pull request as ready for review November 11, 2022 23:39
@Serpentian Serpentian requested a review from Gerold103 November 11, 2022 23:40
@Gerold103 (Collaborator) left a comment

Thanks for the changes, cool stuff! We are getting close to something finished already, I can see it.

@Serpentian (Collaborator, Author)

The main problem with writing tests for such services is test hanging. I suppose we should log what we are doing with the services.

@Serpentian Serpentian force-pushed the fiber_log branch 3 times, most recently from a2c3ab1 to 4223d4b Compare December 8, 2022 01:50
The problem is that a reloadable fiber starts before its name is set.
So until the first fiber yield its name is 'lua', which is the default
value for `fiber.create` or `fiber.new`.

If the user greps the logs to get info about some specific fiber, they
will probably miss the beginning of these logs, as those lines are
indistinguishable.

Let's use `fiber.new` instead of `fiber.create` to initialize the
fiber. It won't start before a context switch is performed, so we have
time to set the fiber's name.

We should also wake this fiber up and yield the current one in order
to prepare and start the fiber which was created with `fiber.new`.
This is done so as not to change the current behavior of the
reloadable_fiber_create function.

NO_DOC=bugfix
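
A minimal sketch of the pattern this commit describes, assuming Tarantool's `fiber` module; `worker_func` and the name 'my_worker' are placeholders, not vshard code.

```Lua
local fiber = require('fiber')

local function worker_func()
    -- By the time this body runs, the name set below is already applied,
    -- so even the very first log lines belong to 'my_worker' rather than
    -- to the default name 'lua'.
    while true do
        fiber.sleep(1)
    end
end

-- fiber.new() only creates the fiber: it does not run until the current
-- fiber yields, which leaves time to set the name.
local f = fiber.new(worker_func)
f:name('my_worker')
-- Mimic fiber.create(): wake the new fiber up and yield so that it
-- starts right away, preserving the old behavior.
f:wakeup()
fiber.yield()
```
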
@Gerold103 (Collaborator) left a comment

Thanks for the fixes! Beautiful, only nits are left I think.

Currently 'rebalancer_apply_routes' says in the log that routes were
applied even when they weren't applied completely. If any of the UUIDs
was skipped, then some serious error occurred on the receiver.

We should tell the user that rebalancer_apply_routes failed to apply
some routes in such a case.

Needed for tarantool#107

NO_DOC=bugfix
NO_TEST=<will be added in the subsequent commits>
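
A hedged sketch of the logging change this commit describes; the function below and its arguments are illustrative, not the actual vshard code.

```Lua
local log = require('log')

-- Illustrative only: report partial application instead of
-- unconditionally claiming success.
local function report_route_apply_result(total, skipped)
    if skipped == 0 then
        log.info('Rebalancer routes are applied')
    else
        log.error('Rebalancer failed to apply %d of %d routes', skipped, total)
    end
end

report_route_apply_result(10, 0) -- all routes applied
report_route_apply_result(10, 2) -- 2 routes were skipped, log an error
```
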
This patch introduces a new module whose purpose is to save statuses,
activities and errors of any background service.

It's mostly done in order to ease luatest testing of the background
fibers, so this commit also adds helper functions for working with the
service_info module.

It can also be used for monitoring of vshard's services: it shows
whether any errors occurred and what each service is doing right now.

Part of tarantool#107

NO_DOC=internal
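
A minimal sketch of what such a service-info object might look like; the constructor and method names below are assumptions for illustration, not the real module's API.

```Lua
-- Hypothetical, not the real vshard service_info module.
local function service_new(name)
    return {
        name = name,
        status = 'ok',
        activity = 'idle',
        error = '',
        -- Bumped on every status change, so a test can wait for a new
        -- value instead of sleeping and risking a hang.
        status_idx = 0,

        set_activity = function(self, activity)
            self.activity = activity
        end,
        set_status_error = function(self, err)
            self.status = 'error'
            self.error = tostring(err)
            self.status_idx = self.status_idx + 1
        end,
        set_status_ok = function(self)
            -- Keep the last error for diagnostics, only flip the status.
            self.status = 'ok'
            self.status_idx = self.status_idx + 1
        end,
        info = function(self)
            return {status = self.status, activity = self.activity,
                    error = self.error, status_idx = self.status_idx}
        end,
    }
end

-- A background service would report its progress roughly like this:
local gc = service_new('gc')
gc:set_activity('collecting garbage')
gc:set_status_error('Error during garbage collection step')
gc:set_activity('idling')
gc:set_status_ok()
```
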
@Serpentian Serpentian force-pushed the fiber_log branch 2 times, most recently from 21fdcca to 19fcfff Compare January 16, 2023 09:34
@Serpentian Serpentian requested a review from Gerold103 January 16, 2023 14:16
This patch introduces the ability for the user to access occurred
errors and states of the background services via 'vshard.storage.info()'.

Part of tarantool#107

@TarantoolBot document
Title: vshard.storage.info() new option

Since 0.1.22 `vshard.storage.info()` accepts a table of options which
can be used to control which additional info to return. Currently,
only one option is available: `with_services`:

```Lua
> vshard.storage.info({with_services = true})
---
...
  services:
    gc:
      status: ok
      error: Error during garbage collection step
      activity: idling
      status_idx: 2
    recovery:
      status: ok
      activity: idling
      status_idx: 5
```
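
A possible monitoring check built on the option documented above; the function name and the alerting logic are illustrative.

```Lua
local vshard = require('vshard')

-- Return true if every background storage service is healthy,
-- otherwise false and a human-readable reason.
local function check_storage_services()
    local info = vshard.storage.info({with_services = true})
    for name, service in pairs(info.services or {}) do
        if service.status ~= 'ok' then
            return false, string.format('service %q: %s (activity: %s)',
                                        name, service.error, service.activity)
        end
    end
    return true
end
```
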
This patch introduces the ability for the user to access occurred
errors and states of the background services via 'vshard.router.info()'.

Closes tarantool#107

@TarantoolBot document
Title: vshard.router.info() new option

Since 0.1.22 `vshard.router.info()` accepts a table of options which
can be used to control which additional info to return. Currently,
only one option is available: `with_services`:

```Lua
> vshard.router.info({with_services = true})
---
...
  services:
    failover:
      status: ok
      error: Error during failovering
      activity: idling
      status_idx: 2
    discovery:
      status: ok
      activity: idling
      status_idx: 3
```
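
And a small sketch of the kind of monitoring mentioned earlier in the discussion: collapsing router service statuses into a single traffic-light color. The color mapping is an assumption for illustration.

```Lua
local vshard = require('vshard')

-- Illustrative only: 'red' if any router service is in trouble,
-- 'green' otherwise - the kind of mapping a GUI could display.
local function router_health_color()
    local info = vshard.router.info({with_services = true})
    for _, service in pairs(info.services or {}) do
        if service.status ~= 'ok' then
            return 'red'
        end
    end
    return 'green'
end
```
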
@Gerold103 (Collaborator) left a comment

Thanks for the patch!

@Gerold103 Gerold103 merged commit f5f386a into tarantool:master Jan 17, 2023

Successfully merging this pull request may close these issues.

Show error from rebalancer or failover or any background service