
Local unittest error - panic: leveldb: closed #653

Closed
2 of 4 tasks
zemyblue opened this issue Aug 29, 2022 · 1 comment
Labels
A: bug (Something isn't working), lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)

Comments

@zemyblue (Member)

Summary of Bug

When I run the unit tests locally with make test, the following error often occurs.

panic: leveldb: closed

goroutine 519 [running]:
github.com/line/ostracon/store.(*BlockStore).LoadSeenCommit(0x14000132100, 0x104caf198?)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/store/store.go:230 +0x188
github.com/line/ostracon/consensus.(*State).LoadCommit(0x14000027500, 0x19)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/state.go:343 +0xe8
github.com/line/ostracon/consensus.(*Reactor).queryMaj23Routine(0x1400181a140, {0x106404440, 0x14002bf0480}, 0x0?)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/reactor.go:862 +0x510
created by github.com/line/ostracon/consensus.(*Reactor).AddPeer
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/reactor.go:204 +0x21c
FAIL    github.com/line/lbm-sdk/x/feegrant/client/testutil      62.049s

Version

version: lbm-sdk v0.46.0-rc7
system: Macbook Pro, Apple M1 Pro
OS: Monterey(12.3)

Steps to Reproduce

  1. Compile lbm-sdk: make clean build
  2. Run the unit tests: make test

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@zemyblue added the A: bug (Something isn't working) label on Aug 29, 2022
@torao (Contributor)

torao commented Sep 26, 2022

What's causing this?

A Node starts several threads (goroutines) between Start() and Stop(). At Stop(), it waits for important threads such as Reactor and Switch to stop before closing BlockDB and StateDB (leveldb in the lower layer). However, Reactor internally launches a number of worker threads on an ad-hoc basis, and their termination is not properly managed.

The cause of this problem is that even after the test case has finished and Node.Stop() has completed successfully, some of these worker threads started by the Node are still running and accessing leveldb, which has already been closed.

In this case, it appears that an Ostracon node connected to another node, and the worker thread started by AddPeer() raced with Node.Stop(). This problem is therefore most likely to occur when a node stops at the same time as a similar asynchronous event.

What's the problem?

There are two underlying problems:

  1. The goroutines launched as worker threads aren't managed in a way that allows waiting for them to exit. This might not be a problem if a thread shared no state and simply computed something, but in this case leveldb is shared.
  2. The API of BlockStore isn't designed to surface errors from the underlying DB; it always panics instead of returning an error. Beyond this issue, that causes the process to terminate abnormally whenever there is a temporary storage problem.

Fixing either of these would solve the "panic during testing" problem, but in my opinion the second, the API design problem, is the more serious one from a system perspective.

What does this affect?

This problem can occur whenever Node.Stop() is called in an environment where an Ostracon node interacts with Ostracon on other nodes or in other processes. In other words, it can happen whenever you attempt to gracefully shut down an Ostracon node participating in a network. This is also true for the current version of Tendermint.

When executing Node.Stop(), however, the user intends to terminate the service, so this may not be a major practical problem.

@torao removed their assignment on Apr 13, 2023
@tkxkd0159 added and then removed the labels lifecycle/stale, priority/backlog, lifecycle/frozen, and priority/undecided on Oct 31, 2023
@tkxkd0159 closed this as not planned (won't fix, can't repro, duplicate, or stale) on Nov 2, 2023