Enable multi node testing #98

Open
@Totktonada

I tried to run multi-node testing (the jepsen-cluster and jepsen-cluster-txm workflows in tarantool) and found that it does not work.

There are a lot of warnings of this kind:

WARN [2021-11-14 14:45:49,060] jepsen node 146.185.243.54 - jepsen.control Encountered error with conn [:control "146.185.243.54"]; reopening
java.lang.InterruptedException: sleep interrupted

That finally ends with:

CMake Error at cmake/atomic.cmake:46 (message):
  C atomics not supported

This points me to tarantool/tarantool#2088 and, it seems, means that those retries somehow cause the git submodule update --init --recursive command to be skipped and/or the cmake <...> commands to run incompletely.

The code that builds tarantool is the same for single-node and multi-node testing, so my guess is that it is a synchronization problem in the ssh connector implementation. There were relevant fixes in recent Jepsen versions, so we can try updating it and see whether the problem goes away. See #30.
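To test the skipped-submodule hypothesis on a node before cmake runs, something like the following could help. It is only a sketch and not part of the existing test code: the function names and the example paths are hypothetical, and it relies only on the documented behaviour that git submodule status prefixes an uninitialized submodule with "-".

```python
import subprocess
from typing import List


def uninitialized_submodules(status_output: str) -> List[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    missing = []
    for line in status_output.splitlines():
        # An uninitialized submodule is printed as "-<sha1> <path>".
        if line.startswith("-"):
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_source_tree(repo_dir: str) -> List[str]:
    """Run `git submodule status --recursive` in repo_dir (hypothetical helper)."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

If check_source_tree returns a non-empty list on a node that later fails in cmake/atomic.cmake, that would support the guess that the submodule step was interrupted. An empty list would point at the cmake step itself.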

Full logs and artifacts:

Full logs from successful (single node) testing:

The Tarantool commit on which I ran CI and got those logs.


As far as I can see from tarantool/tarantool#5736, multi-node testing was not enabled in order to save machine resources. I think we should enable it anyway, perhaps running it only rarely. Otherwise we'll keep hitting surprises like this one without understanding what is actually going on.

Labels: bug
