
Local unittest error - panic: leveldb: closed #653

Closed
2 of 4 tasks
zemyblue opened this issue Aug 29, 2022 · 1 comment
Labels
A: bug (Something isn't working), lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)

Comments

@zemyblue (Member)

Summary of Bug

When I run the unit tests locally with make test, the following error often occurs.

panic: leveldb: closed

goroutine 519 [running]:
github.com/line/ostracon/store.(*BlockStore).LoadSeenCommit(0x14000132100, 0x104caf198?)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/store/store.go:230 +0x188
github.com/line/ostracon/consensus.(*State).LoadCommit(0x14000027500, 0x19)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/state.go:343 +0xe8
github.com/line/ostracon/consensus.(*Reactor).queryMaj23Routine(0x1400181a140, {0x106404440, 0x14002bf0480}, 0x0?)
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/reactor.go:862 +0x510
created by github.com/line/ostracon/consensus.(*Reactor).AddPeer
        /Users/zemyblue/go/pkg/mod/github.com/line/ostracon@v1.0.7-0.20220729051742-2231684789c6/consensus/reactor.go:204 +0x21c
FAIL    github.com/line/lbm-sdk/x/feegrant/client/testutil      62.049s

Version

version: lbm-sdk v0.46.0-rc7
system: Macbook Pro, Apple M1 Pro
OS: Monterey(12.3)

Steps to Reproduce

  1. Compile lbm-sdk: make clean build
  2. Run the unit tests: make test

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@zemyblue added the A: bug (Something isn't working) label on Aug 29, 2022
@torao (Contributor)

torao commented Sep 26, 2022

What's causing this?

A Node starts several threads (goroutines) between Start() and Stop(). At Stop(), it waits for important threads such as Reactor and Switch to stop before closing BlockDB and StateDB (leveldb in the lower layer). However, Reactor internally launches a number of worker threads on an ad-hoc basis, and their termination is not properly managed.

The cause of this problem is that even after the test case has finished and Node.Stop() has completed successfully, some of these worker threads started by the Node are still running and accessing leveldb, which has already been closed.

In this case, it appears that an Ostracon node connected to another node, and the worker thread started by AddPeer() raced with Node.Stop(). This problem is therefore most likely to occur when a node stops at the same time as a similar asynchronous event.

What's the problem?

There are two underlying problems:

  1. The goroutines launched as worker threads aren't managed in a way that allows waiting for them to exit. This might not be a problem if a thread shared no state and simply computed something, but in this case leveldb is shared.
  2. The API of BlockStore isn't designed to surface errors from the underlying DB; it always panics instead of returning an error. Beyond this issue, that causes the process to terminate abnormally whenever there is a temporary storage problem.

Fixing either of these would solve the "panic during testing" problem, but in my opinion the second, the API design problem, is the more serious one from a system perspective.

What does this affect?

This problem can occur whenever Node.Stop() is called in an environment where an Ostracon node interacts with Ostracon on other nodes or in other processes. In other words, it can happen whenever you attempt to gracefully shut down an Ostracon node participating in a network. This is also true for the current version of Tendermint.

When executing Node.Stop(), however, the user intends to terminate the service, so this may not be a major practical problem.

@torao removed their assignment on Apr 13, 2023
@tkxkd0159 added and then removed the labels lifecycle/stale, priority/backlog, lifecycle/frozen, and priority/undecided on Oct 31, 2023
@tkxkd0159 closed this as not planned (won't fix, can't repro, duplicate, or stale) on Nov 2, 2023