feat(tsdb): Add block exporter. #14233

Merged: 1 commit, Jul 1, 2019

benbjohnson (Contributor) commented Jun 28, 2019

Adds export tooling to influxd inspect export-blocks so that we
can dump out block data in SQL format for better analysis during
the debugging process.

Useful SQL commands

Top 10 keys (by block count)

SELECT org_id, bucket_id, key, COUNT(*), SUM("count")
FROM blocks
GROUP BY org_id, bucket_id, key
ORDER BY COUNT(*) DESC
LIMIT 10;

Top 10 keys spread across the most files

SELECT org_id, bucket_id, key, COUNT(DISTINCT filename)
FROM blocks
GROUP BY org_id, bucket_id, key
ORDER BY COUNT(DISTINCT filename) DESC
LIMIT 10;
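
To try the two queries above without a real export, here is a minimal sketch using Python's standard-library sqlite3 module. The `blocks` schema is an assumption inferred only from the columns these queries reference; the table actually emitted by export-blocks may have other columns and types, and the sample rows are made up.

```python
import sqlite3

# Hypothetical schema: only the columns the queries above reference.
# The real export-blocks output may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE blocks (
        filename  TEXT,
        org_id    TEXT,
        bucket_id TEXT,
        key       TEXT,
        "count"   INTEGER
    )
""")

# Made-up sample rows: (filename, org_id, bucket_id, key, count)
rows = [
    ("a.tsm", "o1", "b1", "cpu", 1000),
    ("a.tsm", "o1", "b1", "cpu", 500),
    ("b.tsm", "o1", "b1", "cpu", 250),
    ("a.tsm", "o1", "b1", "mem", 1000),
]
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?, ?)", rows)

# Top keys by block count, with the total number of values per key.
top_by_blocks = conn.execute("""
    SELECT org_id, bucket_id, key, COUNT(*), SUM("count")
    FROM blocks
    GROUP BY org_id, bucket_id, key
    ORDER BY COUNT(*) DESC
    LIMIT 10
""").fetchall()

# Top keys by the number of distinct files they appear in.
top_by_files = conn.execute("""
    SELECT org_id, bucket_id, key, COUNT(DISTINCT filename)
    FROM blocks
    GROUP BY org_id, bucket_id, key
    ORDER BY COUNT(DISTINCT filename) DESC
    LIMIT 10
""").fetchall()

print(top_by_blocks)
print(top_by_files)
```

With these sample rows, the "cpu" key ranks first in both results: it has three blocks spread across two files, while "mem" has one block in one file.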

Overlapping block count

Note: This requires sqlite3 version 3.25.0 or later for window functions. Version 3.28.0 is advised since it includes additional windowing features.

SELECT
  SUM(tbl.overlapping) AS overlapping_blocks, 
  COUNT(*) AS total_blocks,
  CAST(SUM(tbl.overlapping) AS FLOAT)/COUNT(*) AS pct_overlapping
FROM (
  SELECT min_time > LAG(max_time, 1, 9223372036854775807) OVER (PARTITION BY org_id, bucket_id, key ORDER BY min_time) AS overlapping
  FROM blocks
) tbl;

Performance

For a 2GB level 4 TSM file, export takes approximately 40s and generates a 350MB file.

$ time influxd inspect export-blocks xyz.tsm > xyz.sql
real	0m40.120s
user	0m20.282s
sys	0m27.060s

Importing into a sqlite3 database then takes approximately 20s:

$ time sqlite3 xyz.db < xyz.sql 
real	0m19.812s
user	0m15.331s
sys	0m3.373s

@benbjohnson benbjohnson requested a review from e-dard June 28, 2019 16:21
@benbjohnson benbjohnson self-assigned this Jun 28, 2019
@benbjohnson benbjohnson requested a review from jacobmarble June 28, 2019 16:27
@benbjohnson benbjohnson force-pushed the feat/tsm1-block-exporter branch from ef8554b to cf3d02b Compare July 1, 2019 15:13
@benbjohnson benbjohnson requested a review from stuartcarnie July 1, 2019 16:11
@benbjohnson benbjohnson force-pushed the feat/tsm1-block-exporter branch from cf3d02b to 08e24fa Compare July 1, 2019 16:13
@benbjohnson benbjohnson merged commit 90a529e into master Jul 1, 2019
stuartcarnie (Contributor) commented Jul 2, 2019

@benbjohnson I had to flip the boolean expression in the windowing query:

SELECT
  SUM(tbl.overlapping) AS overlapping_blocks, 
  COUNT(*) AS total_blocks,
  CAST(SUM(tbl.overlapping) AS FLOAT)/COUNT(*) AS pct_overlapping
FROM (
  SELECT min_time < LAG(max_time, 1, 0) OVER (PARTITION BY org_id, bucket_id, key ORDER BY min_time) AS overlapping
  FROM blocks
) tbl;

Specifically, the change is in the windowed subquery, which orders each partition by min_time:

SELECT min_time < LAG(max_time, 1, 0) OVER (PARTITION BY org_id, bucket_id, key ORDER BY min_time) AS overlapping
  FROM blocks

We want to know whether each later row (block) has a minimum time that is less than the maximum time of the previous block in the same partition.

😞

[  block 0  ]
          [  block 1  ]

🙂

[  block 0  ]
              [  block 1  ]
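
The flipped expression can be checked against a handful of made-up blocks. This is only a sketch using Python's standard-library sqlite3 module: the `blocks` schema is an assumption limited to the columns the query references, and the time ranges are invented for illustration.

```python
import sqlite3

# Window functions need SQLite 3.25.0 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Hypothetical schema: only the columns the overlap query references.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE blocks (
        org_id    TEXT,
        bucket_id TEXT,
        key       TEXT,
        min_time  INTEGER,
        max_time  INTEGER
    )
""")

# Made-up sample rows: (org_id, bucket_id, key, min_time, max_time)
rows = [
    ("o1", "b1", "cpu", 0, 10),   # first block for "cpu"
    ("o1", "b1", "cpu", 5, 15),   # starts before the previous block ends
    ("o1", "b1", "cpu", 20, 30),  # starts after the previous block ends
    ("o1", "b1", "mem", 0, 10),   # first block for "mem"
    ("o1", "b1", "mem", 11, 20),  # starts after the previous block ends
]
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?, ?)", rows)

# A block is flagged when its min_time is less than the max_time of the
# previous block (ordered by min_time) for the same org, bucket, and key.
overlapping, total, pct = conn.execute("""
    SELECT
      SUM(tbl.overlapping) AS overlapping_blocks,
      COUNT(*) AS total_blocks,
      CAST(SUM(tbl.overlapping) AS FLOAT)/COUNT(*) AS pct_overlapping
    FROM (
      SELECT min_time < LAG(max_time, 1, 0) OVER (
               PARTITION BY org_id, bucket_id, key ORDER BY min_time
             ) AS overlapping
      FROM blocks
    ) tbl
""").fetchone()

print(overlapping, total, pct)
```

Only the second "cpu" block is flagged here, so one of the five blocks counts as overlapping. The first block of each key is compared against the LAG default of 0, so it is never flagged as long as timestamps are non-negative.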

@mark-rushakoff mark-rushakoff deleted the feat/tsm1-block-exporter branch April 16, 2020 20:26