-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove SASI indices #1328
Remove SASI indices #1328
Conversation
query := s.session.Query(depsSelectStmt, endTs.Add(-1*lookback), endTs) | ||
startTs := endTs.Add(-1 * lookback) | ||
var query cassandra.Query | ||
switch s.indexMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would GetDependencies
be simplified by omitting this switch? Instead we might read from SASIDisabled first, and if that fails we can read from SASIEnabled?
When migrating, this would mean that users don't need to redeploy query after moving over the dependency table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of adding the extra roundtrip time. Additionally, if the user has opted in to migrating to the new table, redeploying the query service will not be the most strenuous thing.
pkg/cassandra/config/config.go
Outdated
Port int `yaml:"port"` | ||
Authenticator Authenticator `yaml:"authenticator"` | ||
DisableAutoDiscovery bool `yaml:"disable_auto_discovery"` | ||
DependencySASIDisabled bool `yaml:"dependency_sasi_disabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified to SASI disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd rather it be more explicit but I can simplify
Parent: d.Parent, | ||
Child: d.Child, | ||
CallCount: int64(d.CallCount), | ||
} | ||
if s.indexMode == SASIEnabled { | ||
dep.Source = string(d.Source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assignment need to be conditional? (Also - shouldn't this be set for SASIDisabled?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if you attempt to add the source field to a schema that doesn't support it, c* will error. And good catch, it should be for disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - but that doesn't answer my question; the query string for SASIDisabled doesn't include this field anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll test it and let you know
plugin/storage/cassandra/options.go
Outdated
suffixCA = ".tls.ca" | ||
suffixServerName = ".tls.server-name" | ||
suffixVerifyHost = ".tls.verify-host" | ||
suffixDependencySASIDisabled = ".dependency-sasi-disabled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use sasi-disabled
instead of dependency-sasi-disabled
?
} | ||
|
||
// NewDependencyStore returns a DependencyStore | ||
func NewDependencyStore( | ||
session cassandra.Session, | ||
metricsFactory metrics.Factory, | ||
logger *zap.Logger, | ||
indexMode IndexMode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything preventing someone from passing in an undefined IndexMode
, like IndexMode(10) for e.g.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ill validate here
plugin/storage/cassandra/factory.go
Outdated
@@ -106,7 +106,14 @@ func (f *Factory) CreateSpanWriter() (spanstore.Writer, error) { | |||
|
|||
// CreateDependencyReader implements storage.Factory | |||
func (f *Factory) CreateDependencyReader() (dependencystore.Reader, error) { | |||
return cDepStore.NewDependencyStore(f.primarySession, f.primaryMetricsFactory, f.logger), nil | |||
var ( | |||
sasiDisabled = f.Options.GetPrimary().DependencySASIDisabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following reads simpler:
indexMode := cDepStore.SASIEnabled
if f.Options.GetPrimary().DependencySASIDisabled {
indexMode = cDepStore.SASIDisabled
}
Parent: "goo", | ||
Child: "gle", | ||
Parent: "bi", | ||
Child: "ng", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quit raising entropy
SASIDisabled | ||
|
||
depsInsertStmtSASI = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)" | ||
depsInsertStmt = "INSERT INTO dependenciesv2(ts, date_bucket, dependencies) VALUES (?, ?, ?)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do dependencies_v2
|
||
-- compaction strategy is intentionally different as compared to other tables due to the size of dependencies data | ||
CREATE TABLE IF NOT EXISTS jaeger.dependenciesv2 ( | ||
date_bucket bigint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we define it as ts_bucket timestamp
field. The code can truncate the timestamp to a Day precision, or we can even make it configurable if someone wants finer control (e.g maybe they are going to save deps every minute).
Codecov Report
@@ Coverage Diff @@
## master #1328 +/- ##
======================================
Coverage 100% 100%
======================================
Files 162 163 +1
Lines 7326 7367 +41
======================================
+ Hits 7326 7367 +41
Continue to review full report at Codecov.
|
CHANGELOG.md
Outdated
@@ -8,6 +8,17 @@ Changes by Version | |||
|
|||
##### Breaking Changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing this is technically a non-breaking change. It will become breaking once we start defaulting to _v2
depsInsertStmt = "INSERT INTO dependencies(ts, ts_index, dependencies) VALUES (?, ?, ?)" | ||
depsSelectStmt = "SELECT ts, dependencies FROM dependencies WHERE ts_index >= ? AND ts_index < ?" | ||
// SASIEnabled is used when the dependency table is SASI indexed. | ||
SASIEnabled IndexMode = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it not make sense to simply refer to this as v2
throughout, including constants, variables, the CLI flags? SASI then becomes a side effect that the user doesn't really need to know about.
Parent: d.Parent, | ||
Child: d.Child, | ||
CallCount: int64(d.CallCount), | ||
} | ||
if s.indexMode == SASIDisabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another argument to refer to this change as "dependencies v2" rather than tying to SASI.
23bd58e
to
efeaa92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just need to confirm the code works with the old schema
CHANGELOG.md
Outdated
Migration Path: | ||
|
||
1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table. | ||
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cassandra flag
enable-dependencies-v2=true
please specify the exact flag name
which will update jaeger to write
which will instruct jaeger to write
|
||
1. Run `plugin/storage/cassandra/schema/migration/v001tov002part1.sh` which will copy dependencies into a csv, update the `dependency UDT`, create a new `dependencies_v2` table, and write dependencies from the csv into the `dependencies_v2` table. | ||
2. Run the collector and query services with the cassandra flag `enable-dependencies-v2=true` which will update jaeger to write and read to and from the new `dependencies_v2` table. | ||
3. Update [spark job](https://github.com/jaegertracing/spark-dependencies) to write to the new `dependencies_v2` table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should rather say "update Spark job to version N"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll punt on this until I've actually made the change in spark.
@@ -36,6 +37,8 @@ func (d *Dependency) MarshalUDT(name string, info gocql.TypeInfo) ([]byte, error | |||
return gocql.Marshal(info, d.Child) | |||
case "call_count": | |||
return gocql.Marshal(info, d.CallCount) | |||
case "source": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if you run this code against the schema where UDT was not upgraded? Will gocql simply not invoke this function for "source"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
model/dependencies.go
Outdated
} | ||
|
||
// Sanitize sanitizes the DependencyLink. | ||
func (d DependencyLink) Sanitize() DependencyLink { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ApplyDefaults
fi | ||
|
||
|
||
while IFS="," read ts dependency; do bucket=`date +"%Y%m%d" -d "$ts"`; echo "$bucket,$ts,$dependency"; done < dependencies.csv > dependencies_datebucket.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write it in normal multi-line syntax?
echo "About to delete $row_count rows." | ||
confirm | ||
|
||
cqlsh -e "DROP INDEX IF EXISTS $keyspace.ts_index" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$keyspace.ts_index
naming 🤦♂️
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
77cd266
to
0dbea68
Compare
|
||
// IsValid returns true if the Version is a valid one. | ||
func (i Version) IsValid() bool { | ||
return i < end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a less confusing name than 'end', e.g. 'versionEnumEnd'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, Version(-1)
is valid according to this function
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang <wjang@uber.com>
Signed-off-by: Won Jun Jang wjang@uber.com
A lot of Copy Pasta from #793. Resolves #790.
Migration Path:
v001tov002part1.sh
which will copy dependencies into a csv, update thedependency UDT
, create a newdependenciesv2
table, and write dependencies from the csv into thedependenciesv2
table.dependency-sasi-disabled=true
which will update jaeger to read from the newdependenciesv2
table.dependenciesv2
table (this is a TODO).v001tov002part2.sh
which will delete the old dependency table and the SASI index.Users who wish to continue to use the SASI indices don't have to do anything as the cassandra flag
dependency-sasi-disabled
will default to false. Ideally, users will migrate on their own timeline and we can remove support for the old table in a major version release.