Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not do truncate table operation by default #149

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

vponomaryov
Copy link
Contributor

@vponomaryov vponomaryov commented Oct 17, 2024

If it is needed to truncate a DB table, then just use
following new parameter:

  -truncate-table

With this change it will be possible to use multiple scylla-bench
concurrent commands which won't truncate each other's data.

Closes: #30
Closes: #130

@dkropachev
Copy link
Contributor

dkropachev commented Oct 17, 2024

@vponomaryov , if you want to run them in validation mode it can work only by chance. If in regular mode, table won't be truncated.

To make it work in parallel on the same table we need somehow split targeted primary key values between runners.

@vponomaryov
Copy link
Contributor Author

@vponomaryov , if you want to run them in validation mode it can work only by chance. If in regular mode, table won't be truncated.

To make it work in parallel on the same table we need somehow split targeted primary key values between runners.

Different concurrent commands use index offsets.
So, data is not overlapped. And chance is 100% in this case.

@dkropachev
Copy link
Contributor

@vponomaryov , if you want to run them in validation mode it can work only by chance. If in regular mode, table won't be truncated.
To make it work in parallel on the same table we need somehow split targeted primary key values between runners.

Different concurrent commands use index offsets. So, data is not overlapped. And chance is 100% in this case.

partition-offset - works only for sequential workload

@vponomaryov vponomaryov force-pushed the make-truncate-be-optional branch from ca8a3fe to 4070454 Compare October 17, 2024 16:17
@vponomaryov vponomaryov changed the title Do not do truncate table operation if not requested explicitly Allow to skip 'truncate table' operation using 'sequential' workload Oct 17, 2024
@vponomaryov
Copy link
Contributor Author

@vponomaryov , if you want to run them in validation mode it can work only by chance. If in regular mode, table won't be truncated.
To make it work in parallel on the same table we need somehow split targeted primary key values between runners.

Different concurrent commands use index offsets. So, data is not overlapped. And chance is 100% in this case.

partition-offset - works only for sequential workload

Changed the logic to consider the workload type.

@dkropachev
Copy link
Contributor

dkropachev commented Oct 17, 2024

@vponomaryov , if you want to run them in validation mode it can work only by chance. If in regular mode, table won't be truncated.
To make it work in parallel on the same table we need somehow split targeted primary key values between runners.

Different concurrent commands use index offsets. So, data is not overlapped. And chance is 100% in this case.

partition-offset - works only for sequential workload

Changed the logic to consider the workload type.

What I meant is that if your purpose to run couple of s-b in parallel targeting same table, it should be solved in a such a way that would cover all the cases, not only sequential.

From first look it should be pretty easy, there is alreayd concept of threadId that makes internal thread to target particular partitions only, if we somehow make it offset for every s-b instance

@vponomaryov vponomaryov force-pushed the make-truncate-be-optional branch from 4070454 to 1607a7b Compare October 18, 2024 15:33
@vponomaryov vponomaryov changed the title Allow to skip 'truncate table' operation using 'sequential' workload Do not do 'truncate table' operation by default Oct 18, 2024
@vponomaryov vponomaryov changed the title Do not do 'truncate table' operation by default Do not do truncate table operation by default Oct 18, 2024
@vponomaryov
Copy link
Contributor Author

@dkropachev done

@vponomaryov vponomaryov requested a review from roydahan October 18, 2024 15:43
Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also forgot about RangeScan,
It should go into RangeOffset and RangeCount

I also would prefere different CLI, something like:

   --worker-id N

where partitionOffset := (workerID - 1) * partitiionCount and truncateTable := workerID != 0

period := time.Duration(int64(time.Second.Nanoseconds()) * (pkCount / int64(threadCount)) / rate)
pkStride := int64(threadCount)
pkOffset := int64(threadId)
pkOffset := int64(threadId) + basicPkOffset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pkOffset := int64(threadId) + basicPkOffset
pkOffset := int64(threadId) + basicPkOffset
pkCount += basicPkOffset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you propose to summarize the partition "offset" with the partition "count"?
Both serve different goals.

@fruch
Copy link
Contributor

fruch commented Oct 21, 2024

You also forgot about RangeScan, It should go into RangeOffset and RangeCount

I also would prefere different CLI, something like:

   --worker-id N

where partitionOffset := (workerID - 1) * partitiionCount and truncateTable := workerID != 0

how worker ID is related to the question ?, user should be able to say, should the stress tool truncate or not, same as in c-s. I fail to see how this is related to the title of this PR.

@vponomaryov
Copy link
Contributor Author

vponomaryov commented Oct 21, 2024

You also forget about RangeScan, It should go into RangeOffset and RangeCount

I didn't forgot about "scan". It is explicitly mentioned to not support partition offsets.
Why? I am not aware about the use cases for it.

The main use case here is ability to disable truncation of a table with clear purpose - running multiple scylla-bench commands against single DB table for having cumulative "population" effect and not cumulative "breaking" one.

I also would prefere different CLI, something like:

   --worker-id N

where partitionOffset := (workerID - 1) * partitiionCount and truncateTable := workerID != 0

I have 2 concerns with this proposal:

  1. Why should user count number of workers if he may avoid it? I don't see sense in it.
  2. Truncation of a table should not be dependent on worker numbers at all. User decides whether he needs it or not, without relation to workers number.

@fruch
Copy link
Contributor

fruch commented Oct 21, 2024

  • lets split this one to it's two commit, and them test separately

If it is needed to truncate a DB table, then just use
following new parameter:

  -truncate-table

With this change it will be possible to use multiple scylla-bench
concurrent commands which won't truncate each other's data.

Closes: scylladb#30
Closes: scylladb#130
@vponomaryov vponomaryov force-pushed the make-truncate-be-optional branch from 1607a7b to 7aff1aa Compare October 21, 2024 17:00
@vponomaryov
Copy link
Contributor Author

  • lets split this one to it's two commit, and them test separately

Splitted the second commit to another PR here:

Copy link
Contributor

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vponomaryov , my assumption was that data validation won't work if s-b instances are writing to the same table, which is not the case, which changes a lot and now I am ok with your cli

@roydahan
Copy link
Collaborator

seems like we can merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants