https://www.instaclustr.com/update-released-instaclustr-sstable-analysis-tools-apache-cassandra/ https://www.instaclustr.com/instaclustr-open-sources-cassandra-sstable-analysis-tools/
$ git clone git@github.com:instaclustr/cassandra-sstable-tools.git
$ cd cassandra-sstable-tools
# Select the correct branch for major version (default is cassandra-4.1)
$ git checkout cassandra-4.1
$ mvn clean install
Copy ic-sstable-tools.jar
to Cassandra JAR folder, eg. /usr/share/cassandra/lib
Copy the bin/ic-sstable-tools
script into your $PATH
We also offer RPM and DEB packages. In case you install them, you do not need to execute steps above, obviously.
$ ./bin/ic-sstable-tools
Missing required sub-command.
Usage: <main class> [-hV] [COMMAND]
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Commands:
cfstats Detailed statistics about cells in a column family
pstats Partition size statistics for a column family
purge Statistics about reclaimable data for a column family
sstables Print out metadata for sstables that belong to a column family
summary Summary information about all column families including how much of the data is repaired
If you want invoke help command for each subcommand, do it like:
$ ./bin/ic-sstable-tools help cfstats
Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.
ic-sstable-tools summary
Column | Description |
---|---|
Keyspace | Keyspace the column family belongs to |
Column Family | Name of column family |
SSTables | Number of sstables on this node for the column family |
Disk Size | Compressed size on disk for this node |
Data Size | Uncompressed size of the data for this node |
Last Repaired | Maximum repair timestamp on sstables |
Repair % | Percentage of data marked as repaired |
Print out sstable metadata for a column family. Useful in helping to tune compaction settings.
ic-sstable-tools sstables <keyspace> <column-family>
Column | Description |
---|---|
SSTable | Data.db filename of sstable |
Disk Size | Size of sstable on disk |
Total Size | Uncompressed size of data contained in the sstable |
Min Timestamp | Minimum cell timestamp contained in the sstable |
Max Timestamp | Maximum cell timestamp contained in the sstable |
Duration | The time span between minimum and maximum cell timestamps |
Min Deletion Time | The minimum deletion time |
Max Deletion Time | The maximum deletion time |
Level | Leveled Tiered Compaction sstable level |
Keys | Number of partition keys |
Avg Partition Size | Average partition size |
Max Partition Size | Maximum partition size |
Avg Column Count | Average number of columns in a partition |
Max Column Count | Maximum number of columns in a partition |
Droppable | Estimated droppable tombstones |
Repaired At | Time when marked as repaired by incremental repair |
Tool for finding largest partitions.
ic-sstable-tools pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help |
---|---|
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-n | Number of partitions to display |
-t | Snapshot to analyse. Snapshot is created if none is specified. |
-f | Comma separated list of Data.db sstables to filter on |
Summary: Summary statistics about partitions
Column | Description |
---|---|
Count (Size) | Number of partition keys on this node |
Total (Size) | Total uncompressed size of all partitions on this node |
Total (SSTable) | Number of sstables on this node |
Minimum (Size) | Minimum uncompressed partition size |
Minimum (SSTable) | Minimum number of sstables a partition belongs to |
Average (Size) | Average (mean) uncompressed partition size |
Average (SSTable) | Average (mean) number of sstables a partition belongs to |
std dev. (Size) | Standard deviation of partition sizes |
std dev. (SSTable) | Standard deviation of number of sstables for a partition |
50% (Size) | Estimated 50th percentile of partition sizes |
50% (SSTable) | Estimated 50th percentile of sstables for a partition |
75% (Size) | Estimated 75th percentile of partition sizes |
75% (SSTable) | Estimated 75th percentile of sstables for a partition |
90% (Size) | Estimated 90th percentile of partition sizes |
90% (SSTable) | Estimated 90th percentile of sstables for a partition |
95% (Size) | Estimated 95th percentile of partition sizes |
95% (SSTable) | Estimated 95th percentile of sstables for a partition |
99% (Size) | Estimated 99th percentile of partition sizes |
99% (SSTable) | Estimated 99th percentile of sstables for a partition |
99.9% (Size) | Estimated 99.9th percentile of partition sizes |
99.9% (SSTable) | Estimated 99.9th percentile of sstables for a partition |
Maximum (Size) | Maximum uncompressed partition size |
Maximum (SSTable) | Maximum number of sstables a partition belongs to |
Largest partitions: The top N largest partitions
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
SSTable Count | Number of sstables that contain the partition |
SSTable Leaders: The top N partitions that belong to the most sstables
Column | Description |
---|---|
Key | The partition key |
SSTable Count | Number of sstables that contain the partition |
Size | Total uncompressed size of the partition |
SSTables: Metadata about sstables as it relates to partitions.
Column | Description |
---|---|
SSTable | Data.db filename of SSTable |
Size | Uncompressed size |
Min Timestamp | Minimum cell timestamp in the sstable |
Max Timestamp | Maximum cell timestamp in the sstable |
Level | Leveled Tiered Compaction level of sstable |
Partitions | Number of partition keys in the sstable |
Avg Partition Size | Average uncompressed partition size in sstable |
Max Partition Size | Maximum uncompressed partition size in sstable |
Tool for getting detailed cell statistics that can help identify issues with data model.
ic-sstable-tools cfstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help |
---|---|
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-n | Number of partitions to display |
-t | Snapshot to analyse. Snapshot is created if none is specified. |
-f | Comma separated list of Data.db sstables to filter on |
Summary: Summary statistics about partitions
Column | Description |
---|---|
Count (Size) | Number of partition keys on this node |
Rows (Size) | Number of clustering rows |
(deleted) | Number of clustering row deletions |
Total (Size) | Total uncompressed size of all partitions on this node |
Total (SSTable) | Number of sstables on this node |
Minimum (Size) | Minimum uncompressed partition size |
Minimum (SSTable) | Minimum number of sstables a partition belongs to |
Average (Size) | Average (mean) uncompressed partition size |
Average (SSTable) | Average (mean) number of sstables a partition belongs to |
std dev. (Size) | Standard deviation of partition sizes |
std dev. (SSTable) | Standard deviation of number of sstables for a partition |
50% (Size) | Estimated 50th percentile of partition sizes |
50% (SSTable) | Estimated 50th percentile of sstables for a partition |
75% (Size) | Estimated 75th percentile of partition sizes |
75% (SSTable) | Estimated 75th percentile of sstables for a partition |
90% (Size) | Estimated 90th percentile of partition sizes |
90% (SSTable) | Estimated 90th percentile of sstables for a partition |
95% (Size) | Estimated 95th percentile of partition sizes |
95% (SSTable) | Estimated 95th percentile of sstables for a partition |
99% (Size) | Estimated 99th percentile of partition sizes |
99% (SSTable) | Estimated 99th percentile of sstables for a partition |
99.9% (Size) | Estimated 99.9th percentile of partition sizes |
99.9% (SSTable) | Estimated 99.9th percentile of sstables for a partition |
Maximum (Size) | Maximum uncompressed partition size |
Maximum (SSTable) | Maximum number of sstables a partition belongs to |
Row Histogram: Histogram of number of rows per partition
Column | Description |
---|---|
Percentile | Minimum, average, standard deviation (std dev.), percentile, maximum |
Count | Estimated number of rows per partition for the given percentile |
Largest partitions: Partitions with largest uncompressed size
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
Rows | Total number of clustering rows in the partition |
(deleted) | Number of row deletions in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Cells | Number of cells in the partition |
SSTable Count | Number of sstables that contain the partition |
Widest partitions: Partitions with the most cells
Column | Description |
---|---|
Key | The partition key |
Rows | Total number of clustering rows in the partition |
(deleted) | Number of row deletions in the partition |
Cells | Number of cells in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Size | Total uncompressed size of the partition |
SSTable Count | Number of sstables that contain the partition |
Most Deleted Rows: Partitions with the most row deletions
Column | Description |
---|---|
Key | The partition key |
Rows | Total number of clustering rows in the partition |
(deleted) | Number of row deletions in the partition |
Size | Total uncompressed size of the partition |
SSTable Count | Number of sstables that contain the partition |
Tombstone Leaders: Partitions with the most tombstones
Column | Description |
---|---|
Key | The partition key |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Rows | Total number of clustering rows in the partition |
Cells | Number of cells in the partition |
Size | Total uncompressed size of the partition |
SSTable Count | Number of sstables that contain the partition |
SSTable Leaders: Partitions that are in the most sstables
Column | Description |
---|---|
Key | The partition key |
SSTable Count | Number of sstables that contain the partition |
Size | Total uncompressed size of the partition |
Rows | Total number of clustering rows in the partition |
Cells | Number of cells in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
SSTables: Metadata about sstables as it relates to partitions.
Column | Description |
---|---|
SSTable | Data.db filename of SSTable |
Size | Uncompressed size |
Min Timestamp | Minimum cell timestamp in the sstable |
Max Timestamp | Maximum cell timestamp in the sstable |
Partitions | Number of partitions |
(deleted) | Number of row level partition deletions |
(avg size) | Average uncompressed partition size in sstable |
(max size) | Maximum uncompressed partition size in sstable |
Rows | Total number of clustering rows in sstable |
(deleted) | Number of row deletions in sstable |
Cells | Number of cells in the SSTable |
Tombstones | Number of cell or range tombstones in the SSTable |
(droppable) | Number of tombstones that are droppable according to gc_grace_seconds |
(range) | Number of range tombstones |
Cell Liveness | Percentage of live cells. Does not consider tombstones or cell updates shadowing cells. That is it is percentage of non-tombstoned cells to total number of cells. |
Finds the largest reclaimable partitions (GCable). Intensive process, effectively does "fake" compactions to calculate metrics.
ic-sstable-tools purge [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help |
---|---|
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-n | Number of partitions to display |
-t | Snapshot to analyse. Snapshot is created if none is specified. |
Largest reclaimable partitions: Partitions with the largest amount of reclaimable data
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
Reclaim | Reclaimable uncompressed size |
Generations | SSTable generations the partition belongs to |
You can test this tool with the CCM tool (as it will save some time over needing to install and configure cassandra), simply do the following
Locate the ccm installation directory on your machine, usually this is ~/.ccm
Identify the version of cassandra you wish to test on. Run take the binary (target/ic-sstable-tools.jar) and copy it into the lib directory located in
~/.ccm/repository/\<version to test on>/lib/
For example
~/.ccm/repository/5.0.0/lib/
Now run
export CASSANDRA_INCLUDE=~/.ccm/<ccm cluster name>/<node name in cluster>/bin/cassandra.in.sh
For example
export CASSANDRA_INCLUDE=~/.ccm/test/node1/bin/cassandra.in.sh
and you should be able to run commands using the script located in the bin directory!
ic-sstable-tools pstats keyspace table
Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project