-
Notifications
You must be signed in to change notification settings - Fork 999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: switch to SHUTTING_DOWN state unconditionally #4408
Conversation
7daf643
to
402cc7b
Compare
@adiholden I also run the full regtest suit https://github.com/dragonflydb/dragonfly/actions/runs/12629900466 |
During the shutdown sequence always switch to SHUTTING_DOWN. Make sure that the rest of the code does not break if it can not switch to the desired global state + some clean ups around state transitions. Finally, reduce the amount of data in test_replicaof_reject_on_load Signed-off-by: Roman Gershman <roman@dragonflydb.io>
} | ||
if (switch_state) { | ||
SwitchState(GlobalState::ACTIVE, GlobalState::LOADING); | ||
loading_state_counter_++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how what possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have 2 loading processes at the same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
replica.stop() | ||
replica.start() | ||
c_replica = replica.client() | ||
|
||
@assert_eventually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you use the assert_eventually here? shouldnt we see the loading state at the first time we run the info persistence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ignore the implementation details and assume there can be a short period of time where the server has not started loading yet or ServerState::g_state_
has not been updated yet. I embrace the eventual consistency nature of state transitions
tests/dragonfly/replication_test.py
Outdated
persistence = await c_replica.info("PERSISTENCE") | ||
assert persistence["loading"] == 1 | ||
|
||
# If this fails adjust `keys` and the `assert dbsize >= 30000` above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can update the comment there is not assert on dbsize
tests/dragonfly/replication_test.py
Outdated
@@ -1971,23 +1971,27 @@ async def test_replicaof_reject_on_load(df_factory, df_seeder_factory): | |||
df_factory.start_all([master, replica]) | |||
|
|||
c_replica = replica.client() | |||
await c_replica.execute_command(f"DEBUG POPULATE 8000000") | |||
await c_replica.execute_command(f"DEBUG POPULATE 4000000") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason we use large amount of keys here is to generate lots of data which will be later take time to load.
If you will use
seeder = StaticSeeder(
key_target=800,
data_size=100000,
collection_size=10000,
types=["SET"])
The test will be extremely fast and this will be same data size.
2 reasons for this:
- when populating we yeild every 32 keys added I think
- we add multiple elements in one command when using collections while for string we each time add one key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case the variation of "debug populate ... type SET ... " could also work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i just confirmed debug populate 800 key 1000 RAND type set elements 2000
creates 800 sets with 2000 elements of size 1000 each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StaticSeeder uses debug poulate its similar to your fix, anyway the change is good
if (schedule_done_.WaitFor(100ms)) { | ||
return; | ||
} | ||
} while (ss->gstate() == GlobalState::LOADING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason you move several places in the code to get the global state from ServerState?
I believe we used the service_.GetGlobalState() in this places intentionaly.
We assume that the ServerState::state is eventually consistent with the global state, therefor in places where we allow/recect running some flow based on the state we use the global state as it could be that there are 2 commands running at the same time one is changing the state and the other one checks the state to determine whether we can run the flow and might also change the state based on this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that's exactly why I removed it. It's misleading. There are no transactional guarantees when you test an atomic variable like this. The global state can change a millisecond later, so this check is not "better". I do not think that in all these places it's super important but I wanted to remove this false sense of correctness on purpose.
Fixes this one #4423 right ? |
we will see |
During the shutdown sequence always switch to SHUTTING_DOWN. Make sure that the rest of the code does not break if it can not switch to the desired global state + some clean ups around state transitions.