Implement schema squashing #3253
@aleksejuber The PR looks pretty good to me. I just left a minor comment. Besides that, have you considered automating your md5sum test as an integration test (possibly in |
This is a very good point; an integration test would certainly help here to ensure initial data/schema consistency. I believe the check can be done without modifying the Go code, using modified schema initialisation bash scripts, as in
This way we would run the same commands as the schema init scripts and avoid missing something in the test setup. In your opinion, should the integration tests be included in this PR, or should I create a separate one for this issue?
I'm OK either way. Regarding how we implement the tests, I'd prefer having those in |
* Add schema application timing log
In order to evaluate the efficiency of schema updates, log the time it takes to apply every incremental schema change, as well as the overall schema update time.
* Add support for squashed versions
The directory filtering routine should accept directories of the form 'sx.x-y.y', which contain batched statements for version shortcuts from version x.x to version y.y. If such directories are found and are within range, use the shortcuts to reduce the number of statements executed on setup.
* Split directory reads and filtering
Refactor the tooling to separate the filesystem access from the filtering of versioned directory names.
* Implement shortest version upgrade path search
Use the "gonum.org/v1/gonum/graph" graph library and its implementation of the Dijkstra algorithm to find the shortest path from version X to Y when version shortcuts are present in the schema directory.
* Use the path search to apply schema changes
Enable the path search algorithm in the schema update directory search routine.
* Add squashed 0.23 version
Add a version shortcut from version 0.0 to 0.23 (the active schema version for Cadence version 0.11).
* Separate schema statements from inserts
As suggested, split the shortcut statements in two groups: first with schema statements only, second with INSERT statements for seed data.
* Lint schema test base
- Use `.ElementsMatch` to compare two sets of tables
- Use `.NoError` instead of `.Nil` for error asserts
- Use `.Empty` instead of `Equal(0, len(...))`
* Add integration test for schema shortcuts
As shortcuts for schema changes were introduced, a new integration test is required to ensure schema integrity.
Test workflow is:
- Enumerate all shortcuts and select target versions
- Using incremental schema changes, apply each target version in order and export the schema using `cqlsh`
- For each shortcut directory, copy it over, apply the target version, call `cqlsh` to export the schema and compare it to the schema generated from incremental changes. On success, remove the directory.
Testing:
Tests pass with the current schema.
After modifying the shortcut schema manually, the test fails with
```
Diff:
--- Expected
+++ Actual
@@ -518,3 +518,3 @@
 data blob,
- data_encoding text,
+ data_encoding blob,
 data_version int,
```
as expected.
Add cqlsh to the testing docker
Since the tests use cqlsh to export the schema, add cqlsh to the testing container.
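For readers who want a picture of the timing log mentioned in the commit messages, here is a minimal sketch of the idea: time each applied directory and the update as a whole. The names `applyFunc` and `applyWithTiming` are hypothetical and are not taken from the PR; the real tool may structure this differently.

```go
package main

import (
	"log"
	"time"
)

// applyFunc is a hypothetical stand-in for "execute all statements
// found in one schema directory". It is not an API from the PR.
type applyFunc func(dir string) error

// applyWithTiming applies the directories in order, logging how long
// each one took and how long the whole update took.
func applyWithTiming(dirs []string, apply applyFunc) (time.Duration, error) {
	start := time.Now()
	for _, dir := range dirs {
		stepStart := time.Now()
		if err := apply(dir); err != nil {
			return time.Since(start), err
		}
		log.Printf("applied %s in %v", dir, time.Since(stepStart))
	}
	total := time.Since(start)
	log.Printf("schema update finished in %v", total)
	return total, nil
}

func main() {
	// Illustrative directory names only; the apply function does no real work.
	dirs := []string{"v_0.1", "v_0.2"}
	noop := func(dir string) error { return nil }
	if _, err := applyWithTiming(dirs, noop); err != nil {
		log.Fatal(err)
	}
}
```

Logging per step as well as in total makes it possible to compare an incremental run with a shortcut run directory by directory.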
What changed?
For incremental schema, it is now possible to take a shortcut from one version to another without applying incremental changes one by one.
The directory name should be in the format `s<from_ver>-<to_ver>`, where:
- `<to_ver>` must be greater than `<from_ver>`
- `<to_ver>` must exist in incremental schema steps, i.e. a directory with a valid manifest named `v_<to_ver>` must exist
- `<from_ver>` must either exist in incremental schema steps, or be equal to the initial version `0.0`
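The naming rules above can be sketched as a small parser and validator. This is an illustration only: the `version` type and the functions `parseVersion`, `parseShortcutDir` and `validateShortcut` are hypothetical, and the sketch assumes versions are `major.minor` pairs compared numerically (so `0.23` is greater than `0.3`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical major.minor pair used only in this sketch.
type version struct{ major, minor int }

// less reports whether v is strictly lower than o, comparing numerically.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// parseVersion parses a string of the form "x.y".
func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return version{}, fmt.Errorf("version %q is not in x.y form", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, err
	}
	return version{major, minor}, nil
}

// parseShortcutDir splits "s<from_ver>-<to_ver>" and checks that
// <to_ver> is greater than <from_ver>.
func parseShortcutDir(name string) (from, to version, err error) {
	if !strings.HasPrefix(name, "s") {
		return from, to, fmt.Errorf("%q does not start with 's'", name)
	}
	parts := strings.SplitN(name[1:], "-", 2)
	if len(parts) != 2 {
		return from, to, fmt.Errorf("%q is not in s<from>-<to> form", name)
	}
	if from, err = parseVersion(parts[0]); err != nil {
		return from, to, err
	}
	if to, err = parseVersion(parts[1]); err != nil {
		return from, to, err
	}
	if !from.less(to) {
		return from, to, fmt.Errorf("%q: target must be greater than source", name)
	}
	return from, to, nil
}

// validateShortcut checks the shortcut against the set of versions that
// exist as incremental steps; 0.0 is always accepted as a source.
func validateShortcut(from, to version, incremental map[version]bool) error {
	if !incremental[to] {
		return fmt.Errorf("target %v has no incremental step", to)
	}
	if from != (version{}) && !incremental[from] {
		return fmt.Errorf("source %v has no incremental step", from)
	}
	return nil
}

func main() {
	from, to, err := parseShortcutDir("s0.0-0.23")
	fmt.Println(from, to, err)
}
```

Keeping parsing separate from the existence check mirrors the split between directory reads and filtering described in the commit messages, since only the second function needs to know which incremental versions are present.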
The schema tool automatically selects the path with the fewest steps.
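To illustrate what "fewest steps" means, here is a minimal sketch that treats every schema directory as an edge between two versions and searches for the shortest chain. The PR itself uses the Dijkstra implementation from `gonum.org/v1/gonum/graph`; this sketch swaps in a plain breadth-first search from the standard library, which gives the same answer when every directory counts as one step. The `step` type, the `shortestPath` function and the version numbers in `main` are hypothetical.

```go
package main

import "fmt"

// step is one schema directory that moves the schema from one version
// to another. Incremental directories and shortcuts are both steps.
type step struct {
	from, to string
	dir      string
}

// shortestPath returns the directories to apply, in order, to get from
// start to target using as few directories as possible.
func shortestPath(steps []step, start, target string) ([]string, bool) {
	if start == target {
		return nil, true
	}
	outgoing := map[string][]step{}
	for _, s := range steps {
		outgoing[s.from] = append(outgoing[s.from], s)
	}
	reachedBy := map[string]step{} // version -> step that first reached it
	visited := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, s := range outgoing[cur] {
			if visited[s.to] {
				continue
			}
			visited[s.to] = true
			reachedBy[s.to] = s
			if s.to == target {
				// Walk back from the target to rebuild the path.
				var dirs []string
				for v := target; v != start; v = reachedBy[v].from {
					dirs = append([]string{reachedBy[v].dir}, dirs...)
				}
				return dirs, true
			}
			queue = append(queue, s.to)
		}
	}
	return nil, false
}

func main() {
	// Illustrative versions only, not the real Cadence schema history.
	steps := []step{
		{"0.0", "0.1", "v_0.1"},
		{"0.1", "0.2", "v_0.2"},
		{"0.2", "0.3", "v_0.3"},
		{"0.3", "0.4", "v_0.4"},
		{"0.0", "0.3", "s0.0-0.3"},
	}
	dirs, ok := shortestPath(steps, "0.0", "0.4")
	fmt.Println(dirs, ok) // prints "[s0.0-0.3 v_0.4] true"
}
```

In this example the shortcut replaces three incremental directories, so the update from `0.0` to `0.4` applies two directories instead of four.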
A squashed shortcut schema from version `0.0` to `0.23` is included. This particular schema version was selected because it is at the tip of the latest `0.11` Cadence release.
Why?
For Cassandra instances with large replication factors and high latency, schema statements might take up to a second for a trivial operation such as column creation. Given that the current version on the tip of master is `0.27`, the bootstrap time adds up and will grow with each schema update.
Applying the goal-state schema would only create 30+ entities and would run up to 4x faster on the Cassandra setups described above.
Applying `schema.cql` is an option for a new installation, but it prevents further incremental upgrades.
How did you test it?
To measure speed improvements, application times for both the whole schema and every incremental version are now logged. Tested by playing back incremental updates up to version `0.27` on a Docker instance.
Time improvements
Before the change
After the change
Data validity
Before the change
After the change
Potential risks
As with any other schema change, there are risks involved. One of the concerns is a mismatch between the schema in `schema.cql` for the `0.11` Cadence version (`v0.23` Cassandra schema) and the schema produced by playing back the incremental changes. Some differences are minor, such as column ordering, but there is also a missing events table and a mismatching data type for a column.
However, the tests above show that the shortcut schema is identical to the one produced by playing the incremental changes back.