Do not do `truncate table` operation by default #149

vponomaryov · 2024-10-17T10:49:31Z

If you need to truncate a DB table, use the
following new parameter:

  -truncate-table

With this change, multiple concurrent scylla-bench commands
can run without truncating each other's data.

Closes: #30
Closes: #130
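
For illustration, an opt-in boolean flag of this kind could be wired up with Go's standard `flag` package roughly as follows. This is a minimal sketch, not the code from this PR: only the flag name `-truncate-table` comes from the description above, while `parseTruncate`, the flag-set name, and the help text are assumptions made for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// parseTruncate parses args and reports whether truncation was requested.
// Only the flag name is taken from the PR description; the function name,
// flag-set name, and help text are hypothetical.
func parseTruncate(args []string) (bool, error) {
	fs := flag.NewFlagSet("bench-sketch", flag.ContinueOnError)
	// Default is false, so the table is left untouched unless the user opts in.
	truncate := fs.Bool("truncate-table", false, "truncate the table before running the workload")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *truncate, nil
}

func main() {
	for _, args := range [][]string{{}, {"-truncate-table"}} {
		truncate, err := parseTruncate(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("args=%v truncate=%v\n", args, truncate)
	}
}
```

With a default of `false`, a second command started against the same table leaves existing rows in place unless the user passes the flag explicitly.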

dkropachev · 2024-10-17T12:11:28Z

@vponomaryov, if you want to run them in validation mode, it can work only by chance. In regular mode, the table won't be truncated.

To make it work in parallel on the same table, we need to somehow split the targeted primary key values between runners.

vponomaryov · 2024-10-17T12:27:05Z

> @vponomaryov, if you want to run them in validation mode, it can work only by chance. In regular mode, the table won't be truncated.
>
> To make it work in parallel on the same table, we need to somehow split the targeted primary key values between runners.

Different concurrent commands use different index offsets,
so their data does not overlap. The chance is 100% in this case.

dkropachev · 2024-10-17T13:21:32Z

> > @vponomaryov, if you want to run them in validation mode, it can work only by chance. In regular mode, the table won't be truncated.
> > To make it work in parallel on the same table, we need to somehow split the targeted primary key values between runners.
>
> Different concurrent commands use different index offsets, so their data does not overlap. The chance is 100% in this case.

`partition-offset` works only for the sequential workload.

vponomaryov · 2024-10-17T16:19:29Z

> > > @vponomaryov, if you want to run them in validation mode, it can work only by chance. In regular mode, the table won't be truncated.
> > > To make it work in parallel on the same table, we need to somehow split the targeted primary key values between runners.
> >
> > Different concurrent commands use different index offsets, so their data does not overlap. The chance is 100% in this case.
>
> `partition-offset` works only for the sequential workload.

Changed the logic to consider the workload type.

dkropachev · 2024-10-17T16:51:28Z

> > > > @vponomaryov, if you want to run them in validation mode, it can work only by chance. In regular mode, the table won't be truncated.
> > > > To make it work in parallel on the same table, we need to somehow split the targeted primary key values between runners.
> > >
> > > Different concurrent commands use different index offsets, so their data does not overlap. The chance is 100% in this case.
> >
> > `partition-offset` works only for the sequential workload.
>
> Changed the logic to consider the workload type.

What I meant is that if your purpose is to run a couple of s-b instances in parallel targeting the same table, it should be solved in a way that covers all the cases, not only sequential.

At first glance it should be pretty easy: there is already the concept of threadId, which makes each internal thread target particular partitions only. We could somehow turn it into an offset for every s-b instance.
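
The idea of combining a per-thread stride with a per-instance offset can be sketched as below. This is an illustrative sketch under stated assumptions, not scylla-bench's actual workload code: `partitionsFor` and `basePkOffset` are hypothetical names, and the loop assumes a simple sequential walk over partition keys.

```go
package main

import "fmt"

// partitionsFor lists the partition keys one thread of one instance would visit.
// threadID and threadCount interleave threads inside a single instance;
// basePkOffset is a hypothetical per-instance offset that shifts the whole range.
func partitionsFor(threadID, threadCount int, pkCount, basePkOffset int64) []int64 {
	stride := int64(threadCount)
	var keys []int64
	for pk := int64(threadID); pk < pkCount; pk += stride {
		keys = append(keys, pk+basePkOffset)
	}
	return keys
}

func main() {
	// Two instances, two threads each, four partitions per instance.
	for _, offset := range []int64{0, 4} {
		for thread := 0; thread < 2; thread++ {
			fmt.Printf("offset=%d thread=%d keys=%v\n",
				offset, thread, partitionsFor(thread, 2, 4, offset))
		}
	}
}
```

In this sketch, two instances with a partition count of 4 and offsets 0 and 4 together cover keys 0 through 7 with no key visited twice, which is the non-overlap property the discussion relies on.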

vponomaryov · 2024-10-18T15:37:30Z

@dkropachev done

fruch

LGTM

dkropachev

You also forgot about RangeScan.
It should go into RangeOffset and RangeCount.

I would also prefer a different CLI, something like:

   --worker-id N

where `partitionOffset := (workerID - 1) * partitionCount` and `truncateTable := workerID != 0`.

dkropachev · 2024-10-20T18:47:05Z

pkg/workloads/workloads.go

 	period := time.Duration(int64(time.Second.Nanoseconds()) * (pkCount / int64(threadCount)) / rate)
 	pkStride := int64(threadCount)
-	pkOffset := int64(threadId)
+	pkOffset := int64(threadId) + basicPkOffset


Suggested change

-	pkOffset := int64(threadId) + basicPkOffset
+	pkOffset := int64(threadId) + basicPkOffset
+	pkCount += basicPkOffset

vponomaryov · 2024-10-21

Why do you propose to add the partition "offset" to the partition "count"?
They serve different goals.

fruch · 2024-10-21T07:14:04Z

> You also forgot about RangeScan. It should go into RangeOffset and RangeCount.
>
> I would also prefer a different CLI, something like:
>    --worker-id N
> where partitionOffset := (workerID - 1) * partitionCount and truncateTable := workerID != 0

How is the worker ID related to the question? The user should be able to say whether the stress tool should truncate or not, same as in c-s. I fail to see how this is related to the title of this PR.

vponomaryov · 2024-10-21T08:17:39Z

> You also forgot about RangeScan. It should go into RangeOffset and RangeCount.

I didn't forget about "scan". It is explicitly stated that it does not support partition offsets.
Why? I am not aware of any use cases for it.

The main use case here is the ability to disable truncation of a table, with a clear purpose: running multiple scylla-bench commands against a single DB table to get a cumulative "population" effect rather than a cumulative "breaking" one.

> I would also prefer a different CLI, something like:
>    --worker-id N
> where partitionOffset := (workerID - 1) * partitionCount and truncateTable := workerID != 0

I have 2 concerns with this proposal:

1. Why should the user count the number of workers if they can avoid it? I don't see the sense in it.
2. Truncation of a table should not depend on worker numbers at all. The user decides whether they need it or not, regardless of the number of workers.

fruch · 2024-10-21T15:19:39Z

Let's split this one into its two commits and test them separately.

vponomaryov · 2024-10-21T17:00:58Z

> Let's split this one into its two commits and test them separately.

Split the second commit out into another PR here:

Make all write-capable workloads support partition offset #150

dkropachev

@vponomaryov, my assumption was that data validation won't work if s-b instances are writing to the same table. That is not the case, which changes a lot, so now I am OK with your CLI.

roydahan · 2024-10-22T23:24:34Z

seems like we can merge.

vponomaryov requested review from fruch, juliayakovlev and dkropachev October 17, 2024 10:49

vponomaryov force-pushed the make-truncate-be-optional branch from ca8a3fe to 4070454 Compare October 17, 2024 16:17

vponomaryov changed the title ~~Do not do truncate table operation if not requested explicitly~~ Allow to skip 'truncate table' operation using 'sequential' workload Oct 17, 2024

vponomaryov force-pushed the make-truncate-be-optional branch from 4070454 to 1607a7b Compare October 18, 2024 15:33

vponomaryov changed the title ~~Allow to skip 'truncate table' operation using 'sequential' workload~~ Do not do 'truncate table' operation by default Oct 18, 2024

vponomaryov changed the title ~~Do not do 'truncate table' operation by default~~ Do not do truncate table operation by default Oct 18, 2024

vponomaryov requested a review from roydahan October 18, 2024 15:43

fruch approved these changes Oct 20, 2024

dkropachev requested changes Oct 20, 2024

vponomaryov force-pushed the make-truncate-be-optional branch from 1607a7b to 7aff1aa Compare October 21, 2024 17:00

vponomaryov requested review from dkropachev and fruch October 21, 2024 17:07

dkropachev approved these changes Oct 21, 2024

vponomaryov merged commit 7dd9989 into scylladb:master Oct 23, 2024

vponomaryov mentioned this pull request Oct 29, 2024

improvement(scylla-bench): bump version to v0.1.23 scylladb/scylla-cluster-tests#9084

Merged

2 tasks

mergify bot mentioned this pull request Oct 31, 2024

[Backport 6.1] improvement(scylla-bench): bump version to v0.1.23 scylladb/scylla-cluster-tests#9095

Merged

2 tasks
