Consider adding LRU cache to RocksDB impl of Database #5203

Open
MaksymZavershynskyi opened this issue Nov 11, 2021 · 8 comments · Fixed by #5212
Labels: A-storage (Area: storage and databases), Node (Node team), T-node (Team: issues relevant to the node experience team), T-public-interfaces (Team: issues relevant to the public interfaces team)

Comments

@MaksymZavershynskyi (Contributor) commented Nov 11, 2021

We haven't had an opportunity to optimize the performance of the RPC server and the ViewClient inside nearcore. As a result, individual RPC nodes cannot sustain the needed throughput, which in turn leads to high cloud expenses for running multiple nodes and the need to use throttling.

Anecdotally, some people were able to optimize the RPC server to achieve higher QPS by using Redis in nearcore: https://twitter.com/vgrichina/status/1458289037525934084 . We might be able to achieve a QPS increase by simpler means as a temporary measure until we revisit the RPC server and ViewClient implementations. I suggest we consider adding a simple LRU cache from https://docs.rs/cached/0.26.2/cached/ directly into the get and write methods of impl Database for RocksDB here:

impl Database for RocksDB {
    fn get(&self, col: DBCol, key: &[u8]) -> Result<Option<Vec<u8>>, DBError> {
        let read_options = rocksdb_read_options();
        let result = self.db.get_cf_opt(unsafe { &*self.cfs[col as usize] }, key, &read_options)?;
        Ok(RocksDB::get_with_rc_logic(col, result))
    }

    fn iter_without_rc_logic<'a>(
        &'a self,
        col: DBCol,
    ) -> Box<dyn Iterator<Item = (Box<[u8]>, Box<[u8]>)> + 'a> {
        let read_options = rocksdb_read_options();
        unsafe {
            let cf_handle = &*self.cfs[col as usize];
            let iterator = self.db.iterator_cf_opt(cf_handle, read_options, IteratorMode::Start);
            Box::new(iterator)
        }
    }

    fn iter<'a>(&'a self, col: DBCol) -> Box<dyn Iterator<Item = (Box<[u8]>, Box<[u8]>)> + 'a> {
        let read_options = rocksdb_read_options();
        unsafe {
            let cf_handle = &*self.cfs[col as usize];
            let iterator = self.db.iterator_cf_opt(cf_handle, read_options, IteratorMode::Start);
            RocksDB::iter_with_rc_logic(col, iterator)
        }
    }

    fn iter_prefix<'a>(
        &'a self,
        col: DBCol,
        key_prefix: &'a [u8],
    ) -> Box<dyn Iterator<Item = (Box<[u8]>, Box<[u8]>)> + 'a> {
        // NOTE: There is no Clone implementation for ReadOptions, so we cannot really reuse
        // `self.read_options` here.
        let mut read_options = rocksdb_read_options();
        read_options.set_prefix_same_as_start(true);
        unsafe {
            let cf_handle = &*self.cfs[col as usize];
            // This implementation is copied from the RocksDB implementation of `prefix_iterator_cf`
            // since there is no `prefix_iterator_cf_opt` method.
            let iterator = self
                .db
                .iterator_cf_opt(
                    cf_handle,
                    read_options,
                    IteratorMode::From(key_prefix, Direction::Forward),
                )
                .take_while(move |(key, _value)| key.starts_with(key_prefix));
            RocksDB::iter_with_rc_logic(col, iterator)
        }
    }

    fn write(&self, transaction: DBTransaction) -> Result<(), DBError> {
        if let Err(check) = self.pre_write_check() {
            if check.is_io() {
                warn!("unable to verify remaining disk space: {}, continuing write without verifying (this may result in unrecoverable data loss if disk space is exceeded)", check)
            } else {
                panic!("{}", check)
            }
        }
        let mut batch = WriteBatch::default();
        for op in transaction.ops {
            match op {
                DBOp::Insert { col, key, value } => unsafe {
                    batch.put_cf(&*self.cfs[col as usize], key, value);
                },
                DBOp::UpdateRefcount { col, key, value } => unsafe {
                    assert!(col.is_rc());
                    batch.merge_cf(&*self.cfs[col as usize], key, value);
                },
                DBOp::Delete { col, key } => unsafe {
                    batch.delete_cf(&*self.cfs[col as usize], key);
                },
                DBOp::DeleteAll { col } => {
                    let cf_handle = unsafe { &*self.cfs[col as usize] };
                    let opt_first = self.db.iterator_cf(cf_handle, IteratorMode::Start).next();
                    let opt_last = self.db.iterator_cf(cf_handle, IteratorMode::End).next();
                    assert_eq!(opt_first.is_some(), opt_last.is_some());
                    if let (Some((min_key, _)), Some((max_key, _))) = (opt_first, opt_last) {
                        batch.delete_range_cf(cf_handle, &min_key, &max_key);
                        // delete_range_cf deletes ["begin_key", "end_key"), so need one more delete
                        batch.delete_cf(cf_handle, max_key)
                    }
                }
            }
        }
        Ok(self.db.write(batch)?)
    }

    fn as_rocksdb(&self) -> Option<&RocksDB> {
        Some(self)
    }
}

The write method would be responsible for invalidating LRU cache entries.
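A minimal sketch of what this read-through cache could look like, assuming the `cached` crate suggested above (`SizedCache` plus the `Cached` trait). The `CachedDb` wrapper, the `lookup` closure standing in for the actual RocksDB read, and the 10_000-entry capacity are all hypothetical; they only illustrate the get/invalidate shape, not the actual nearcore change:

use std::sync::Mutex;

use cached::{Cached, SizedCache};

// Cache key: (column index, raw key bytes). Values are Option<Vec<u8>> so
// that "key not present" results can be cached as well.
type CacheKey = (u8, Vec<u8>);
type CacheValue = Option<Vec<u8>>;

struct CachedDb {
    // SizedCache is a fixed-capacity LRU; the Mutex is needed because
    // Database::get only takes &self while the cache needs &mut access.
    cache: Mutex<SizedCache<CacheKey, CacheValue>>,
}

impl CachedDb {
    fn new(capacity: usize) -> Self {
        Self { cache: Mutex::new(SizedCache::with_size(capacity)) }
    }

    // Read-through get: serve from the LRU if possible, otherwise call the
    // backing store (the closure stands in for the RocksDB read) and cache it.
    fn get(&self, col: u8, key: &[u8], lookup: impl FnOnce() -> CacheValue) -> CacheValue {
        let cache_key = (col, key.to_vec());
        if let Some(hit) = self.cache.lock().unwrap().cache_get(&cache_key) {
            return hit.clone();
        }
        let value = lookup();
        self.cache.lock().unwrap().cache_set(cache_key, value.clone());
        value
    }

    // The write path must evict (or overwrite) affected entries so that
    // readers never observe stale data, as noted above.
    fn invalidate(&self, col: u8, key: &[u8]) {
        self.cache.lock().unwrap().cache_remove(&(col, key.to_vec()));
    }
}

fn main() {
    let db = CachedDb::new(10_000);
    // First read misses and consults the backing store.
    assert_eq!(db.get(0, b"alice", || Some(b"v1".to_vec())), Some(b"v1".to_vec()));
    // Second read is served from the cache; the closure is never invoked.
    assert_eq!(db.get(0, b"alice", || -> CacheValue { unreachable!() }), Some(b"v1".to_vec()));
    // A write to the same key invalidates the entry.
    db.invalidate(0, b"alice");
    assert_eq!(db.get(0, b"alice", || Some(b"v2".to_vec())), Some(b"v2".to_vec()));
}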

Unfortunately, AFAIR there is a suboptimality in near-api-js that forces it to re-request all access keys for an account before signing a transaction. This will likely touch the iter_* methods (most likely iter_prefix) of impl Database for RocksDB, which we might also want to wrap in the cache. CC @MaximusHaximus , @frol

We would most likely want to enable this cache only for RPC nodes, and we should let the Contract Runtime team know about its existence, since it might affect the measurements of the param estimator.

@MaksymZavershynskyi added the A-storage, T-public-interfaces, and T-node labels Nov 11, 2021
@bowenwang1996 (Collaborator) commented:

RocksDB has an in-memory cache internally, and we have it enabled.

@MaksymZavershynskyi (Contributor, Author) commented:

> RocksDB has an in-memory cache internally, and we have it enabled.

Given the complexity of RocksDB and the fact that we have revisited its configuration only once, I don't think we can be sure that we configured it correctly or that it is optimized for view calls.

@bowenwang1996 (Collaborator) commented:

Indeed, we currently set the cache size for all columns to 32 MB:

// We create block_cache for each of 47 columns, so the total cache size is 32 * 47 = 1504mb

but not all columns are equal. I think we can and should allocate more for ColState, for example 256 MB or even 512 MB. @mina86 @pmnoxx, given that you have worked with RocksDB before, I wonder whether this is something you could help with.
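For a sense of the memory budget, a quick back-of-the-envelope sketch using the numbers from the comment above (47 columns at 32 MB each) and a hypothetical larger allocation only for ColState:

// Rough total block-cache footprint if every column keeps the 32 MB default
// except ColState, which gets a larger dedicated cache.
fn total_cache_mb(num_columns: u64, default_mb: u64, col_state_mb: u64) -> u64 {
    (num_columns - 1) * default_mb + col_state_mb
}

fn main() {
    println!("current (32 MB everywhere): {} MB", total_cache_mb(47, 32, 32));  // 1504 MB
    println!("ColState at 256 MB:         {} MB", total_cache_mb(47, 32, 256)); // 1728 MB
    println!("ColState at 512 MB:         {} MB", total_cache_mb(47, 32, 512)); // 1984 MB
}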

@pmnoxx (Contributor) commented Nov 12, 2021

@bowenwang1996 I can write a PR like this: #5212

near-bulldozer bot pushed a commit that referenced this issue Nov 13, 2021
We recently found out that our rocksdb is not efficient enough.
Therefore we would like to increase the cache size of specific columns.
In this PR we increase the cache size for `DBCol::ColState` to 128mb.

We can adjust that cache size later, but we are trying to be conservative with memory usage.
Some testing should be done to determine optimal cache sizes.


Closes #5203
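A hypothetical sketch of the shape of such a change, not the actual nearcore code: the `DBCol` enum is stubbed with a few variants and the helper `block_cache_size` is made up for illustration, keeping the 32mb default for every column and bumping only `DBCol::ColState` to 128mb.

const MB: usize = 1024 * 1024;

// Stub of a few columns; the real DBCol enum has around 47 variants.
#[allow(dead_code)]
enum DBCol {
    ColBlock,
    ColState,
    ColTransactions,
}

// Per-column block-cache size: conservative default everywhere, with a
// larger cache only for the hot State column.
fn block_cache_size(col: &DBCol) -> usize {
    match col {
        DBCol::ColState => 128 * MB,
        _ => 32 * MB,
    }
}

fn main() {
    assert_eq!(block_cache_size(&DBCol::ColState), 128 * MB);
    assert_eq!(block_cache_size(&DBCol::ColBlock), 32 * MB);
}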
@pmnoxx reopened this Nov 13, 2021
@ilblackdragon (Member) commented:

I would suggest making this parameter a node configuration option.

This way, RPC nodes with more RAM can configure a larger cache than validator nodes or other nodes that don't need to constantly respond to view calls.

@pmnoxx (Contributor) commented Nov 14, 2021

> I would suggest making this parameter a node configuration option.
>
> This way, RPC nodes with more RAM can configure a larger cache than validator nodes or other nodes that don't need to constantly respond to view calls.

@ilblackdragon I can add it to config.json, is that what you meant?
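A hypothetical sketch of what such a config.json knob could look like, assuming a serde-deserialized store section; the field name `col_state_cache_size_mb` and the 128mb default are made up for illustration and are not an existing nearcore option:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct StoreConfig {
    // Block-cache size for the State column, in megabytes. RPC operators with
    // spare RAM can raise this in config.json; validators can keep the default.
    #[serde(default = "default_col_state_cache_size_mb")]
    col_state_cache_size_mb: u64,
}

fn default_col_state_cache_size_mb() -> u64 {
    128
}

fn main() {
    // e.g. an RPC node overriding the default in its config.json:
    let json = r#"{ "col_state_cache_size_mb": 512 }"#;
    let config: StoreConfig = serde_json::from_str(json).unwrap();
    println!("{:?}", config);
}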

@stale bot commented Feb 12, 2022

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@andrei-near (Contributor) commented May 30, 2024

The State column cache size parameter has already been implemented in #6584.
