This project is a commandline repair tool for Akka.NET users who:
- Are using Akka.Cluster.Sharding;
- Are using sharding with
akka.cluster.sharding.state-store-mode=persistence
, currently the default as of Akka.NET v1.4+; and - Have ran into situations where the entire cluster goes offline ungracefully and are thus left with "artifacts" of the previous cluster's state in their Akka.Persistence data, which prevents Akka.Cluster.Sharding from starting up correctly and placing all of the
ShardRegion
actors.
N.B. You can avoid having to use this tool at all by changing your Akka.Cluster.Sharding settings to
akka.cluster.sharding.state-store-mode=ddata
- which uses Akka.Cluster.DistributedData's in-memory replication to track this data instead.
This tool:
- Will delete all data that belongs to the built-in Akka.Cluster.Sharding actors, i.e. those with persistence ids
/system/sharding
; - Will not delete any of your entity data;
- SHOULD NEVER BE RUN AGAINST A LIVE, RUNNING CLUSTER; and
- You should really, really read the source code and the instructions before you attempt to use this tool.
You only need this tool when your ShardRegionCoordinator
actors die suddenly before they have a chance to cleanup their data. This is a relatively rare occurrence in practice but it does happen.
This tool should not be a part of your standard CI/CD processes. It is to be used in the case of disaster recovery / cluster corruption only.
Once more, for repetition - this tool should never be used in automated deployments. You only need it in rare cases where the sharding system terminated abruptly, i.e. a process or hardware failure.
Akin to what many users do with Lighthouse, this project is designed to be consumed by users by cloning this repository as the first step.
This is because, due to the nature of how Akka.Persistence plugins are highly extensible, configuration-driven, and store end-user data we thought it safest for the end-user to be in control of those dependencies. That isn't going to change - don't submit an issue asking us to.
To get started with RepairTool
, do the following:
- Clone this repository;
- Install your specific Akka.Persistence plugins that you use with Akka.Cluster.Sharding into the
RepairTool
project; - Add your connection string and Akka.Persistence configuration data to
app.conf
- follow the instructions of your specific Akka.Persistence plugin on how to do this; - As a final step, you need to replace
Func<ActorSystem, ICurrentPersistenceIdsQuery>
block of code inProgram.cs
with your own Akka.Persistence plugin's code for retreiving theIReadJournal
. You only really need to do this step if you are interested in being able to preview what data is going to be removed before you remove it. The actual repair commands don't depend on it.
With the notable exception of Akka.Persistence.Redis we are planning on fixing that and Akka.Persistence.Azure, which we are also planning on fixing, all actively maintained Akka.Persistence plugins support Akka.Persistence.Query and the ICurrentPersistenceIdsQuery
method specifically.
Here are some examples of how to set it up:
For SQL packages, you have to install a second NuGet package to get Akka.Persistence.Query support:
PS> Install-Package Akka.Persistence.Query.Sql
Func<ActorSystem, ICurrentPersistenceIdsQuery> queryMapper = actorSystem =>
{
var pq = PersistenceQuery.Get(actorSystem);
var readJournal = pq.ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);
// works because `SqlReadJournal` implements `ICurrentPersistenceIdsQuery`, among other
// Akka.Persistence.Query interfaces
return readJournal;
};
Func<ActorSystem, ICurrentPersistenceIdsQuery> queryMapper = actorSystem =>
actorSystem.ReadJournalFor<MongoDbReadJournal>(MongoDbReadJournal.Identifier);
Runs, but doesn't work properly yet and is being fixed: petabridge/Akka.Persistence.Azure#130
Func<ActorSystem, ICurrentPersistenceIdsQuery> queryMapper = actorSystem =>
actorSystem.ReadJournalFor<AzureTableStorageReadJournal>(AzureTableStorageReadJournal.Identifier);
Please feel free to send along additional PRs to update this list.
After you have all of your code and configuration setup, it's time to produce your binaries.
RepairTool
requires .NET 5 to build and run.
In the root of the folder where you cloned this repository, run:
PS> build.cmd Docker
This will:
- Create a local Docker image called
repairtool:latest
andrepairtool:{currentVersion}
and - Create a binary deployable version of
RepairTool.dll
in/src/RepairTool/bin/Release/net5/publish/
.
You can use either of these to run RepairTool
.
Once you have compiled your application and configured everything correctly, it's time to run RepairTool
.
RepairTool
works using a custom Petabridge.Cmd palette that is included with the source code in this repository.
This palette has the following commands (which you can discover via tab-autocompletion if using the pbm
CLI)
cluster-sharding-repair print-sharding-regions
- lists the names of all of theShardRegion
s found inside the current Akka.Persistence connection.cluster-sharding-repair print-sharding-data
- lists all of the raw persistenceIds that belong to the Akka.Cluster.Sharding system.cluster-sharding-repair delete-sharding-data -t {regionName1} -t {regionName2}
- the actual repair command; deletes the data from both the Akka.Persistence journal and snapshot store for theseShardRegion
s.
By default ReparTool
hosts its Petabridge.Cmd
TCP port on port 9777, so you can do the following to run it:
docker run --name shardrepair -p 9777:9777 repairtool
or if you want to run it directly on your developer machine without Docker:
cd src/RepairTool
dotnet run -c Release
Next, we just connect to it via pbm
and run these commands:
pbm localhost:9777 cluster-sharding-repair print-sharding-regions
pbm localhost:9777 cluster-sharding-repair delete-sharding-data -t {regionName1} -t {regionName2}
And that should purge all of the entity data that belongs to the /system/sharding
actors only. You'll see the output on the CLI as it's written.
If you need support or help using this tool in practice, please purchase an Akka.NET Support Plan.
© 2015-2021 Petabridge