
feat(hash): add per-field expiration support #3346

Open
dashjay wants to merge 10 commits into apache:unstable from dashjay:feat/support-hash-fields-ttl


@dashjay dashjay commented Jan 22, 2026

In Short

I introduce a way to store the TTL in an extra key. This makes upgrades easy: there is no breaking change and no new option for kvrocks. The disadvantages are:

  • read/write amplification (not good for rocksdb)
  • storage space amplification (any key with a TTL needs an extra key/value)
  • hlen becomes O(N)
  • a concurrency problem: atomicity is not guaranteed for a single key
  • SCAN becomes slower

Advantages:

  • no need to escape the user value anymore (redis vset + bucket)
  • very easy to implement

TL;DR;

This PR aims to close the original issue: #2269

After reading all the code in these PRs, I came up with an approach that differs from the three of them. I learned a lot from #3269, which encodes the expire time into the value, but its compatibility is limited. My comment at #3269 (comment) asked:

If a user's value happened to be set to [0xFF][0xFE][1-byte flags][8-byte timestamp if flag set]... before the upgrade, will this key be treated as expired at once?
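
To make the concern concrete, here is a minimal sketch of the ambiguity. It assumes a hypothetical in-value layout of `[0xFF][0xFE][flags][8-byte timestamp]`, where bit 0 of the flags byte means "timestamp present". `DecodeExpireFromValue` and `LegacyUserValue` are illustrative names, not the code from #3269.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical layout: [0xFF][0xFE][1-byte flags][8-byte expire timestamp][payload].
// Returns the embedded timestamp if the raw value *looks like* it carries one.
std::optional<uint64_t> DecodeExpireFromValue(const std::string &raw) {
  if (raw.size() < 3) return std::nullopt;
  if (static_cast<uint8_t>(raw[0]) != 0xFF || static_cast<uint8_t>(raw[1]) != 0xFE) return std::nullopt;
  const bool has_ttl = (static_cast<uint8_t>(raw[2]) & 0x01) != 0;
  if (!has_ttl || raw.size() < 3 + sizeof(uint64_t)) return std::nullopt;
  uint64_t expire_ms = 0;
  std::memcpy(&expire_ms, raw.data() + 3, sizeof(expire_ms));  // host byte order, illustration only
  return expire_ms;
}

// A value written before the upgrade that happens to start with the same bytes.
std::string LegacyUserValue() {
  std::string v;
  v.push_back(static_cast<char>(0xFF));
  v.push_back(static_cast<char>(0xFE));
  v.push_back(static_cast<char>(0x01));
  v.append(8, '\0');  // eight zero bytes are read back as timestamp 0
  v += "user payload";
  return v;
}
```

Under these assumptions the decoder cannot tell the legacy value from an encoded one: it reads a timestamp of 0, which lies in the past, so the field would look expired immediately after the upgrade.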

Having read all these PRs, I found the key problems of this new feature:

  • some operations go from O(1) to O(N) in time or space
  • how the expire time is encoded in the value
  • garbage-collecting expired keys in the background is costly

So the key problem in adding hash expiration commands to kvrocks is: let some (or every) field in a hash carry a TTL while keeping the read/write hot path O(1) and keeping every existing command correct and fast.

Sorry, I can't agree that "hlen" or any other Redis command may return an incorrect result at any time; kvrocks should be 100% compatible with Redis. I can only agree that if a command such as "hrangebylex" does not originally exist in Redis, we may modify its API (though better not to).

The proposal and its trade-offs follow:

Detail

Redis commands that need to be implemented

Old commands (need compatibility)
  • hget: return not found when expired
  • hincrby: increment from 0 when expired
  • hincrbyfloat: increment from 0 when expired
  • hset: how should the expire time work?
  • hsetexpire: expire the entire hash
  • hsetnx: ""
  • hdel: return 0 if expired
  • hstrlen: check expiration
  • hexists: check expiration
  • hlen: big problem, the size should exclude the expired key count
  • hmget: check expiration
  • hmset: ""
  • hkeys: filter out the expired keys
  • hvals: filter out the expired values
  • hgetall: filter out the expired key-value pairs
  • hscan: a lot of expired keys will exhaust the CPU
  • hrangebylex: ""
  • hrandfield: a lot of expired keys will exhaust the CPU
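
The compatibility rules above can be illustrated with a small in-memory model. This is only a sketch of the intended semantics, not kvrocks code: `Field`, `Hash`, `kNoExpire`, and the three functions are hypothetical, and clearing the TTL when `hincrby` recreates an expired field is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

constexpr uint64_t kNoExpire = 0;  // hypothetical sentinel: 0 means "no TTL"

struct Field {
  std::string value;
  uint64_t expire_at_ms = kNoExpire;
};

using Hash = std::map<std::string, Field>;

bool IsExpired(const Field &f, uint64_t now_ms) {
  return f.expire_at_ms != kNoExpire && f.expire_at_ms < now_ms;
}

// hget: an expired field is reported as not found.
std::optional<std::string> HGet(const Hash &h, const std::string &field, uint64_t now_ms) {
  auto it = h.find(field);
  if (it == h.end() || IsExpired(it->second, now_ms)) return std::nullopt;
  return it->second.value;
}

// hincrby: an expired (or missing) field restarts from 0.
int64_t HIncrBy(Hash &h, const std::string &field, int64_t delta, uint64_t now_ms) {
  Field &f = h[field];  // creates an empty field if it is missing
  int64_t base = 0;
  if (!f.value.empty() && !IsExpired(f, now_ms)) {
    base = std::stoll(f.value);
  } else {
    f.expire_at_ms = kNoExpire;  // assumption: the recreated field carries no TTL
  }
  f.value = std::to_string(base + delta);
  return base + delta;
}

// hdel: removing an already-expired field counts as 0 deleted fields.
int HDel(Hash &h, const std::string &field, uint64_t now_ms) {
  auto it = h.find(field);
  if (it == h.end()) return 0;
  const bool alive = !IsExpired(it->second, now_ms);
  h.erase(it);
  return alive ? 1 : 0;
}
```

The point of the model is that every command has to consult the expire time before trusting the stored value, which is where the extra read cost comes from.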

Where to store the timestamp?

| implementation | pros | cons |
|---|---|---|
| store all timestamp metadata in metadata_cf, including the expire time of every field | backward compatibility | costs too much on update (write amp) |
| encode the expire time in every value | less cost | compatibility problem |
| encode the expire time in the key | less cost | compatibility problem |
| store the expire time in another encoded key | less cost, good compatibility | |

There must be some count N at which the different implementations trade places in performance; a lot of trade-off work needs to be done.

If I choose "store the expire time in another encoded key", the original commands will be affected as follows:

All read operations will execute twice

Each read goes from one rocksdb get to two, so p99 may rise from about 25us to about 50us.

The following commands will be affected:

  • hget, hmget, hkeys, hvals, hgetall, hscan, hrangebylex, hrandfield
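
The doubled read can be sketched as follows. `FakeDB` stands in for rocksdb and counts point lookups; the `"#ttl"` suffix for the side key and the decimal-string timestamp are hypothetical choices made only for this illustration.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Stand-in for the storage engine that counts point lookups.
struct FakeDB {
  std::map<std::string, std::string> kv;
  mutable int get_calls = 0;

  std::optional<std::string> Get(const std::string &key) const {
    ++get_calls;
    auto it = kv.find(key);
    if (it == kv.end()) return std::nullopt;
    return it->second;
  }
};

// Read one hash field whose TTL, if any, lives in a separate key.
std::optional<std::string> HGetWithTTL(const FakeDB &db, const std::string &field_key, uint64_t now_ms) {
  auto value = db.Get(field_key);  // first read: the user value
  if (!value) return std::nullopt;
  auto ttl = db.Get(field_key + "#ttl");  // second read: the expire key (hypothetical naming)
  if (ttl && std::stoull(*ttl) < now_ms) return std::nullopt;
  return value;
}
```

Every hit on an existing field pays for two lookups, whether or not the field has a TTL, which is the source of the latency estimate above.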

How the key is encoded is also a problem. I came up with two ways:

  • store them in another cf (this adds one more cf for only a small feature)
  • use metadata.version + magic_number for storing the TTLs (I think this is better)
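
A rough sketch of the second option, assuming a simplified sub-key layout of `[8-byte key length][user key][8-byte version][field]`. This is not kvrocks's real internal key encoding, and `kExpireMagic` is a made-up constant; the sketch only shows how shifting the version moves the TTL entries into a separate key range.

```cpp
#include <cstdint>
#include <string>

// Hypothetical magic number added to the version for TTL entries.
constexpr uint64_t kExpireMagic = 0x8000000000000000ULL;

// Append a 64-bit integer in big-endian order so keys sort by numeric value.
void PutFixed64BE(std::string *dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

// Simplified layout: [8-byte key length][user key][8-byte version][field].
std::string EncodeSubKey(const std::string &user_key, uint64_t version, const std::string &field) {
  std::string out;
  PutFixed64BE(&out, user_key.size());
  out += user_key;
  PutFixed64BE(&out, version);
  out += field;
  return out;
}

// Key that holds the user value of a field.
std::string EncodeFieldKey(const std::string &user_key, uint64_t version, const std::string &field) {
  return EncodeSubKey(user_key, version, field);
}

// Key that holds the expire time of the same field: same shape, shifted version.
std::string EncodeExpireKey(const std::string &user_key, uint64_t version, const std::string &field) {
  return EncodeSubKey(user_key, version + kExpireMagic, field);  // unsigned wrap-around is well defined
}
```

This only works if a real version of the same hash can never equal another version plus the magic number; that is an assumption of the sketch and would need to be guaranteed by how versions are generated.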

HLen problem

The final problem is this:

The "hlen" command was originally O(1) but now needs to be O(N).

I came up with two ways to solve this problem:

  • metadata.size is still the element count, and hlen takes O(N) to count the live keys
  • provide a new command "hlenrelax", a non-Redis-compatible hlen that returns an approximate count which is corrected by the compaction_filter
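
The two options can be compared on a toy model. `HashModel`, `HLenExact`, and `HLenRelax` are hypothetical names for this sketch; `metadata_size` plays the role of metadata.size, and `0` is assumed to mean "no TTL".

```cpp
#include <cstdint>
#include <map>
#include <string>

constexpr uint64_t kNoExpire = 0;  // hypothetical sentinel: 0 means "no TTL"

struct HashModel {
  uint64_t metadata_size = 0;                    // element count kept in the metadata
  std::map<std::string, uint64_t> expire_at_ms;  // field -> expire time
};

// Option 1: Redis-compatible hlen. Walks every field, so it is O(N).
uint64_t HLenExact(const HashModel &h, uint64_t now_ms) {
  uint64_t alive = 0;
  for (const auto &kv : h.expire_at_ms) {
    if (kv.second == kNoExpire || kv.second >= now_ms) ++alive;
  }
  return alive;
}

// Option 2: "hlenrelax". O(1), but it still counts expired fields
// until a background pass removes them and fixes the stored size.
uint64_t HLenRelax(const HashModel &h) { return h.metadata_size; }
```

With three fields of which one has expired, the exact variant returns 2 while the relaxed variant still returns 3 until cleanup runs.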

GC CompactionFilter

if (IsMetadataExpired(metadata)) {
    return true;  // the whole hash has expired, so the entry can be dropped
} else {
    // If the filter is on a normal user field: get its expire key and check whether it has expired.
    // If the filter is on an expire key: get the user field; if it does not exist, the expire key can be deleted.
}
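
The branches above can be written as a pure decision function, which makes them easy to test in isolation. This sketch does not use the rocksdb CompactionFilter API; `FilterInput`, `EntryKind`, and `ShouldDrop` are hypothetical, and the lookups the real filter would perform are assumed to have been done already and passed in as plain values.

```cpp
#include <cstdint>
#include <optional>

enum class EntryKind { kUserField, kExpireKey };

// Everything the filter would have looked up before deciding (hypothetical).
struct FilterInput {
  bool metadata_expired;                 // the whole hash is expired
  EntryKind kind;                        // which kind of entry is being visited
  std::optional<uint64_t> expire_at_ms;  // expire time of the field, if one is stored
  bool user_field_exists;                // only meaningful when visiting an expire key
};

// Returns true when the visited entry can be dropped during compaction.
bool ShouldDrop(const FilterInput &in, uint64_t now_ms) {
  if (in.metadata_expired) return true;
  if (in.kind == EntryKind::kUserField) {
    // A user field is dropped once its expire time has passed.
    return in.expire_at_ms.has_value() && *in.expire_at_ms < now_ms;
  }
  // An expire key is dropped once the field it describes is gone.
  return !in.user_field_exists;
}
```

The function deliberately ignores ordering between the two entries, so it does not address a field being rewritten while compaction is running.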

There is a problem: if the hash metadata version does not change and both {key} and {key_expire} exist, we delete {key} because we checked {key_expire} and found it expired, but the user may have set a new value for that key in the db in the meantime. When this happens, we have to do the deletion on {key_expire} if both {key} and {key_expire} exist.

@dashjay dashjay force-pushed the feat/support-hash-fields-ttl branch 3 times, most recently from bad48a7 to 88be89e Compare January 23, 2026 09:00
@dashjay dashjay changed the title WIP: feat(hash): add per-field expiration support feat(hash): add per-field expiration support Jan 23, 2026
@dashjay dashjay marked this pull request as draft January 27, 2026 02:20
@dashjay dashjay force-pushed the feat/support-hash-fields-ttl branch from 6775b84 to 471b3da Compare January 27, 2026 03:53
if (!s.ok() && !s.IsNotFound()) {
return s;
}
if (expire_at != NoExpireTime && expire_at < util::GetTimeStampMS()) {

@dashjay dashjay Jan 27, 2026


A lot of expired keys in the hash may block the scanner here.

[[nodiscard]] rocksdb::Status GetSubKeyExpireTimestampMS(engine::Context &ctx, const Slice &user_key,
const Slice &hash_field, uint64_t metadata_version,
uint64_t *expired_at);
void MGetSubKeyExpireTimestampMS(engine::Context &ctx, const Slice &user_key, const std::vector<Slice> &hash_fields,

These three functions are defined in SubKeyScanner but are only used by hash for now.

if (!s.ok()) {
return s;
}
*size = fields.size();

O(N) problem

@dashjay dashjay force-pushed the feat/support-hash-fields-ttl branch 4 times, most recently from acfc017 to 701c404 Compare January 27, 2026 06:57
kevin and others added 9 commits February 2, 2026 17:05
Signed-off-by: kevin <kevin@kevin-desktop.lightspeed.mssnks.sbcglobal.net>
Signed-off-by: kevin <kevin@kevin-desktop.lightspeed.mssnks.sbcglobal.net>
Signed-off-by: kevin <kevin@kevin-desktop.lightspeed.mssnks.sbcglobal.net>
Signed-off-by: kevin <kevin@kevin-desktop.lightspeed.mssnks.sbcglobal.net>
Signed-off-by: kevin <kevin@kevin-desktop.lightspeed.mssnks.sbcglobal.net>
Signed-off-by: dashjay <dashjwz@gmail.com>
Signed-off-by: dashjay <dashjwz@gmail.com>
Signed-off-by: dashjay <dashjwz@gmail.com>
@dashjay dashjay force-pushed the feat/support-hash-fields-ttl branch from 701c404 to 5180789 Compare February 2, 2026 09:07
@dashjay dashjay marked this pull request as ready for review February 3, 2026 02:51
Signed-off-by: dashjay <dashjwz@gmail.com>
@dashjay dashjay force-pushed the feat/support-hash-fields-ttl branch from 707a8bc to 692386b Compare February 3, 2026 02:55