IO Tracer and Parser
On POSIX systems, existing tracing tools (strace, blktrace, etc.) can be used to understand system calls and IO requests. For storage systems where those tools cannot be used, RocksDB provides its own mechanism to trace IO operations, which helps in understanding the IO behavior of RocksDB while it accesses data on the storage.
An IO trace record contains the following information:
Required for all records:
Column Name | Values | Comment |
---|---|---|
Access timestamp in microseconds | unsigned long | |
File Operation | string | Type of operation (Append, Read, ...). |
Latency | unsigned long | |
IO Status | | IO status returned by the file operation. |
File Name | string | Only the file name is recorded, not the full file path. |
Depending on the file operation:
Column Name | Values | Comment |
---|---|---|
Length | unsigned long | |
Offset | unsigned long | |
File Size | unsigned long | |
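To make the split between required and operation-dependent columns concrete, here is a minimal sketch of a record type that mirrors the two tables above. `IoTraceRow`, `OptionalField`, and `FormatRow` are hypothetical names used only for illustration and are not RocksDB types. Storing the IO status as a string is an assumption as well.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical field that may be absent, depending on the file operation.
struct OptionalField {
  bool present = false;
  uint64_t value = 0;
  void Set(uint64_t v) {
    present = true;
    value = v;
  }
};

// Hypothetical mirror of the columns listed above; not the RocksDB type.
struct IoTraceRow {
  // Required for all records.
  uint64_t access_timestamp_us = 0;
  std::string file_operation;
  uint64_t latency = 0;
  std::string io_status;  // assumption: status kept as text
  std::string file_name;  // file name only, not the full path
  // Recorded only for the file operations that need them.
  OptionalField length;
  OptionalField offset;
  OptionalField file_size;
};

// Formats a row as comma-separated text, skipping absent optional fields.
std::string FormatRow(const IoTraceRow& row) {
  // The required string columns must always be filled in.
  assert(!row.file_operation.empty() && !row.file_name.empty());
  std::ostringstream out;
  out << row.access_timestamp_us << ',' << row.file_operation << ','
      << row.latency << ',' << row.io_status << ',' << row.file_name;
  if (row.length.present) out << ",len=" << row.length.value;
  if (row.offset.present) out << ",offset=" << row.offset.value;
  if (row.file_size.present) out << ",file_size=" << row.file_size.value;
  return out.str();
}
```

In this sketch a Read row would set `length` and `offset`, while an operation that takes no range would leave all three optional fields unset.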
An example of how to start IO tracing:
Env* env = rocksdb::Env::Default();
EnvOptions env_options;
std::string trace_path = "/tmp/binary_trace_test_example";
std::unique_ptr<TraceWriter> trace_writer;
DB* db = nullptr;
std::string db_name = "/tmp/rocksdb";
/* Options used to open the DB and to configure tracing */
Options options;
options.create_if_missing = true;
TraceOptions trace_opt;
/* Create the trace file writer */
NewFileTraceWriter(env, env_options, trace_path, &trace_writer);
DB::Open(options, db_name, &db);
/* Start IO tracing */
db->StartIOTrace(env, trace_opt, std::move(trace_writer));
/* Your calls to RocksDB APIs */
db->Put(WriteOptions(), "key", "value");
/* End IO tracing */
db->EndIOTrace();
For example, if you call DB::Put while tracing is active, the IO tracer records all the FileSystem APIs called during that DB::Put.
- Added tracing wrappers such as `FileSystemTracingWrapper` (extends `FileSystemWrapper`), `FSRandomRWFileTracingWrapper` (extends `FSRandomRWFileWrapper`), etc. that call the underlying FileSystem APIs and log the trace.
- In the FileSystemTracingWrapper APIs (e.g. `FileSystemTracingWrapper::Close()`):
  - Call the underlying `FileSystem::Close()`,
  - Create an IOTraceRecord,
  - Call `IOTracer::WriteIOOp` to dump the trace to the trace file.
- Added new classes `FileSystemPtr`, etc. that overload the -> operator. It returns the appropriate file system pointer depending on whether tracing is enabled or disabled, to avoid tracing overhead.
- Details can be found in:
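The wrapper steps and the -> overload described in this list can be sketched in plain C++ as follows. This is a simplified illustration under stated assumptions, not the RocksDB implementation: `File`, `PlainFile`, `TracingFile`, `Tracer`, `TraceRecord`, `FilePtr`, and `NowMicros` are hypothetical stand-ins, statuses are plain strings, and records are collected in memory instead of being written to a trace file.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record with only the required columns.
struct TraceRecord {
  uint64_t timestamp_us;
  std::string operation;
  uint64_t latency_us;
  std::string status;
  std::string file_name;
};

// Hypothetical tracer that keeps records in memory while enabled.
class Tracer {
 public:
  void Start() { enabled_ = true; }
  void Stop() { enabled_ = false; }
  bool enabled() const { return enabled_; }
  void WriteIOOp(const TraceRecord& record) {
    if (enabled_) records_.push_back(record);
  }
  const std::vector<TraceRecord>& records() const { return records_; }

 private:
  bool enabled_ = false;
  std::vector<TraceRecord> records_;
};

// Minimal file interface with a single operation.
class File {
 public:
  virtual ~File() = default;
  virtual std::string Close() = 0;  // returns a status string
};

class PlainFile : public File {
 public:
  std::string Close() override {
    ++close_calls;
    return "OK";
  }
  int close_calls = 0;
};

uint64_t NowMicros() {
  using namespace std::chrono;
  return static_cast<uint64_t>(
      duration_cast<microseconds>(steady_clock::now().time_since_epoch())
          .count());
}

// Tracing wrapper: forward the call, build a record, hand it to the tracer.
class TracingFile : public File {
 public:
  TracingFile(File* target, Tracer* tracer, std::string name)
      : target_(target), tracer_(tracer), name_(std::move(name)) {}

  std::string Close() override {
    const uint64_t start = NowMicros();
    std::string status = target_->Close();  // 1. call the underlying API
    const uint64_t elapsed = NowMicros() - start;
    TraceRecord record{start, "Close", elapsed, status, name_};  // 2. record
    tracer_->WriteIOOp(record);  // 3. pass the record to the tracer
    return status;
  }

 private:
  File* target_;
  Tracer* tracer_;
  std::string name_;
};

// Holder whose operator-> returns the traced or the plain object, so the
// wrapper is bypassed entirely while tracing is disabled.
class FilePtr {
 public:
  FilePtr(File* plain, Tracer* tracer, const std::string& name)
      : plain_(plain), tracer_(tracer), traced_(plain, tracer, name) {
    assert(plain != nullptr && tracer != nullptr);
  }
  File* operator->() {
    return tracer_->enabled() ? static_cast<File*>(&traced_) : plain_;
  }

 private:
  File* plain_;
  Tracer* tracer_;
  TracingFile traced_;
};
```

With a `FilePtr file(&plain, &tracer, "000012.sst")`, a call to `file->Close()` reaches `PlainFile::Close()` directly until `tracer.Start()` is called, and goes through `TracingFile::Close()` afterwards. Choosing the target inside `operator->` keeps call sites unchanged whether tracing is on or off.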
The trace file generated by IO tracing is in binary format, so a parser is provided to read that binary trace file:
./io_tracer_parser -io_trace_file trace_file
Implementation details can be found in https://github.com/facebook/rocksdb/tree/main/tools/io_tracer_parser_tool.h
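To illustrate why a binary trace needs a parser before it can be read, the sketch below encodes a record into bytes and decodes it back into text. The byte layout (three little-endian 64-bit integers followed by the operation name) is an assumption made up for this example and is not the actual RocksDB trace format. `DemoRecord`, `Encode`, `Decode`, `ToText`, `PutFixed64`, and `GetFixed64` are hypothetical helpers defined here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 64-bit value in little-endian byte order.
void PutFixed64(std::string* dst, uint64_t value) {
  for (int i = 0; i < 8; ++i) {
    dst->push_back(static_cast<char>((value >> (8 * i)) & 0xff));
  }
}

// Reads a little-endian 64-bit value at *pos and advances *pos past it.
uint64_t GetFixed64(const std::string& src, size_t* pos) {
  assert(*pos + 8 <= src.size());
  uint64_t value = 0;
  for (int i = 0; i < 8; ++i) {
    const unsigned char byte = static_cast<unsigned char>(src[*pos + i]);
    value |= static_cast<uint64_t>(byte) << (8 * i);
  }
  *pos += 8;
  return value;
}

// Hypothetical record with a made-up layout, for illustration only.
struct DemoRecord {
  uint64_t timestamp_us;
  std::string operation;
  uint64_t latency_us;
};

// Layout: timestamp, latency, operation length, operation bytes.
std::string Encode(const DemoRecord& record) {
  std::string out;
  PutFixed64(&out, record.timestamp_us);
  PutFixed64(&out, record.latency_us);
  PutFixed64(&out, record.operation.size());
  out += record.operation;
  return out;
}

// Reverses Encode by reading the fields in the same order.
DemoRecord Decode(const std::string& bytes) {
  size_t pos = 0;
  DemoRecord record;
  record.timestamp_us = GetFixed64(bytes, &pos);
  record.latency_us = GetFixed64(bytes, &pos);
  const size_t length = static_cast<size_t>(GetFixed64(bytes, &pos));
  assert(pos + length <= bytes.size());
  record.operation = bytes.substr(pos, length);
  return record;
}

// Renders a decoded record as human-readable, comma-separated text.
std::string ToText(const DemoRecord& record) {
  return std::to_string(record.timestamp_us) + "," + record.operation + "," +
         std::to_string(record.latency_us);
}
```

The encoded bytes are compact but unreadable in a text editor. A parser that knows the field order and widths is what turns them back into lines of text.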
Future work:
- Trace DB::Open
- Include more information in the trace format