Open
Feature Description
We would like to add a Snowflake ID generation feature to Vitess.
Here is a working example on the v11 code base.
It can integrate nicely, just like Vitess Sequences.
Snowflake ID
- Constructed as a 64-bit BIGINT
- Sign 1 bit - always 0
- Timestamp 41 bit - in milliseconds since the chosen epoch
- Machine ID 10 bit - 1024 machines possible
- Sequence 12 bit - 4096 values per millisecond; a local counter on each machine
- Normally requires dedicated Snowflake generator servers plus ZooKeeper for machine ID coordination (extra moving parts)
- Good enough for ~69 years after the chosen epoch
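As a sanity check on the layout, here is a minimal Go sketch that packs the fields into a BIGINT. It assumes 41 timestamp bits, which is what 1 + 41 + 10 + 12 = 64 and the ~69-year range imply, and a timestamp already expressed as milliseconds since the chosen epoch. The function and constant names are hypothetical, not taken from Vitess.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed layout: 1 sign bit (always 0), 41 timestamp bits,
// 10 machine ID bits and 12 sequence bits, 64 bits in total.
const (
	timestampBits = 41
	machineIDBits = 10
	sequenceBits  = 12

	maxTimestamp = 1<<timestampBits - 1 // ~69.7 years of milliseconds
	maxMachineID = 1<<machineIDBits - 1 // 1023
	maxSequence  = 1<<sequenceBits - 1  // 4095

	machineIDShift = sequenceBits
	timestampShift = sequenceBits + machineIDBits
)

// composeID packs the three fields into a positive int64 (a MySQL BIGINT).
// msSinceEpoch is the number of milliseconds elapsed since the chosen epoch.
func composeID(msSinceEpoch, machineID, sequence int64) (int64, error) {
	switch {
	case msSinceEpoch < 0 || msSinceEpoch > maxTimestamp:
		return 0, errors.New("timestamp out of range for the chosen epoch")
	case machineID < 0 || machineID > maxMachineID:
		return 0, errors.New("machine ID out of range")
	case sequence < 0 || sequence > maxSequence:
		return 0, errors.New("sequence out of range")
	}
	// Timestamp in the high bits, then machine ID, then sequence.
	return msSinceEpoch<<timestampShift | machineID<<machineIDShift | sequence, nil
}

func main() {
	id, err := composeID(1000, 5, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

With 41 bits the timestamp field overflows after 2^41 ms (about 2.2 × 10^12 ms), roughly 69.7 years after the chosen epoch, which is where the ~69-year figure comes from. Even the largest possible value, (2^41 - 1) << 22 | 1023 << 12 | 4095, equals 2^63 - 1, so the sign bit stays 0.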
How to implement
- Use the Sequences code path and the same primitive
- Store the Snowflake configuration (machine_id + chosen epoch) in a table, just like Sequences do.
- It can be sharded up to 1024 shards
- It can reload its configuration at a configured interval
- During the first load, it will auto-initialise its machine_id using a cross-shard transaction. That is acceptable because the table is practically never written: thanks to the Snowflake algorithm, we don't need to store the last generated ID. This also covers the Reshard operation.
- VTTablets will be responsible for advancing sequence and timestamp, and for returning the Snowflake ID
- VTGates:
  - send the query select next N values from snow_table to a random shard
  - receive timestamp + sequence + chosen epoch
  - generate the N required IDs
I would like some comments on this before working on a contribution. Can I use the same Sequence primitive but return three values instead of one (timestamp + sequence + chosen epoch)? This would make the chosen epoch configurable.
Use Case(s)
Benefits:
- Generated values are time-sortable, which is good for indexes
- Harder to guess - the next value is much less predictable than a plain auto-increment counter, which is good for business
- Nice feature - the value embeds its created_at timestamp!
- Using a public_id with UUID/NanoID/ULID requires complex application code changes and complex communication between services, and takes at least 2x more space - a BIGINT is better here
Activity
harshit-gangal commented on Dec 13, 2024
We can definitely extend sequences to support other types of sequence generators.
A plugin-based approach would be a nice way to extend it.
I do not understand the need for sharding sequences.
I think we can continue to use the existing sequence to return the first ID value and generate the rest of the ID part at VTGate.
DeathBorn commented on Dec 13, 2024
Well, sharding is for:
mattlord commented on Dec 16, 2024
@DeathBorn I think on main (which would be the relevant branch for a new feature) you would want to look at the code using the vtgate engine.Generate type:
vitess/go/vt/vtgate/engine/insert_common.go, lines 87 to 99 in 998433c
It also sounds like you might want to use Snowflake as a new vindex type as well, so that it provides a keyspace_id value (64 bits) and we effectively route each row to a random shard (with that shard generating the Snowflake ID / keyspace ID)? And when doing so you no longer need a "global" keyspace to house a sequence table, as the keyspace ID value (not really a global sequence, as each machine ID will have its own sequence value) is generated using the target shard primary's machine ID (which also has to be stored and managed somewhere like the topo server). Is that correct? If so, IMO we should talk about this as a new vindex type rather than a new sequence implementation, as:
- auto_increment feature for sharded tables
Am I missing or misunderstanding things?
DeathBorn commented on Dec 22, 2024
Yeah, Generate is one of the places to work on, but the NextVal query plan should also be changed in VTTablets.
Certainly, a Vindex could be created too, since machine_id can be extracted, but we still need to store the state of the last generated Snowflake ID somewhere, and from my perspective VTTablets are the best place to store it. And then there is the chosen epoch configuration too.
If we could "pluginize" the Vitess sequence/auto_increment + NextVal feature, I believe many Vitess users would be thrilled: it would be a way to bring any ID generation algorithm.
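Extracting machine_id, which a Snowflake vindex would rely on, is a shift and a mask. A minimal Go sketch, assuming the 1/41/10/12 layout; machineID and shardForID are hypothetical names, and the machine-to-shard map stands in for state that would have to be stored and managed somewhere like the topo server.

```go
package main

import "fmt"

const (
	sequenceBits   = 12
	machineIDBits  = 10
	machineIDMask  = 1<<machineIDBits - 1
	timestampShift = sequenceBits + machineIDBits
)

// machineID extracts the 10 machine ID bits that sit above the sequence bits.
func machineID(id int64) int64 {
	return (id >> sequenceBits) & machineIDMask
}

// shardForID routes an ID to the shard whose primary owns its machine_id.
// machineToShard is a hypothetical stand-in for topo-managed state.
func shardForID(id int64, machineToShard map[int64]string) (string, error) {
	m := machineID(id)
	shard, ok := machineToShard[m]
	if !ok {
		return "", fmt.Errorf("no shard owns machine_id %d", m)
	}
	return shard, nil
}

func main() {
	owners := map[int64]string{0: "-80", 1: "80-"} // hypothetical assignment
	id := int64(123456)<<timestampShift | 1<<sequenceBits | 17
	shard, err := shardForID(id, owners)
	if err != nil {
		panic(err)
	}
	fmt.Println(machineID(id), shard)
}
```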