MyRocks: A Comprehensive Analysis of Its Architecture, Performance, and Applications in Hyperscale Environments

Abstract

MyRocks is a MySQL storage engine that integrates RocksDB, a high-performance, embeddable key-value store developed by Meta (formerly Facebook), into the MySQL database ecosystem. Designed to improve write efficiency and storage utilization, MyRocks leverages the Log-Structured Merge-tree (LSM-tree) architecture to address limitations inherent in traditional B-tree-based storage engines such as InnoDB. This paper examines MyRocks in depth: it dissects the engine's architectural design, evaluates its performance characteristics across diverse workloads, outlines operational considerations for deployment and management, and assesses its suitability for the demanding requirements of hyperscale, data-intensive environments. Together, these perspectives aim to furnish a holistic understanding of MyRocks and its role within contemporary enterprise database systems and cloud infrastructures.

1. Introduction

The evolution of modern database systems has been driven by the pursuit of performance, a pressure that is particularly acute in scenarios characterized by write-intensive operations and stringent storage constraints. For decades, traditional storage engines, exemplified by MySQL's InnoDB, have stood as the de facto standard for relational databases, underpinning countless mission-critical applications. InnoDB, a robust and ACID-compliant engine, excels in online transaction processing (OLTP) workloads thanks to its mature concurrency control mechanisms, reliable recovery features, and efficient B-tree indexing. However, as data volumes, concurrent transaction rates, and the adoption of flash-based storage have grown, the architectural paradigms of B-tree systems have begun to show discernible limitations: elevated write amplification, suboptimal storage efficiency for append-heavy workloads, and difficulty scaling write throughput linearly under extreme load.

In response to these challenges, the database community has actively explored alternative data structures and storage methodologies. The Log-Structured Merge-tree (LSM-tree) emerged as a compelling paradigm, intrinsically optimized for environments demanding high write throughput and substantial data ingestion rates. Recognizing this potential, Meta (Facebook) developed RocksDB, a highly optimized, embeddable key-value store built on LSM-tree principles and engineered for its immense internal data storage and processing needs. MyRocks integrates RocksDB directly into the MySQL framework, offering a storage engine that combines the established relational model and extensive ecosystem of MySQL with the write efficiency and storage optimization inherent in RocksDB's design. This fusion provides a potent solution for applications where high-volume writes and efficient storage are paramount, extending MySQL's applicability to hyperscale use cases that were previously challenging for B-tree engines alone.

2. MyRocks Architecture

At its foundational core, MyRocks is not merely a re-implementation of MySQL’s internal components but rather a sophisticated integration that allows MySQL to leverage RocksDB as its underlying data persistence layer. This architectural choice bestows MyRocks with distinctive advantages, fundamentally differentiating it from other MySQL storage engines.

2.1 Integration of RocksDB with MySQL

MyRocks functions as a pluggable storage engine for MySQL, adhering to the MySQL Storage Engine API. This design philosophy is crucial as it permits MyRocks to seamlessly integrate with the MySQL server, utilizing its parser, optimizer, query cache (where present; the query cache was removed in MySQL 8.0), replication mechanisms, and other higher-level functionalities, while replacing the traditional data storage and indexing layer. Essentially, MySQL perceives MyRocks as another option alongside InnoDB, MyISAM, etc., for managing table data.

When a table is created with ENGINE=MyRocks, MySQL directs all data manipulation language (DML) operations (INSERT, UPDATE, DELETE) and data definition language (DDL) operations (ALTER TABLE) to the MyRocks engine. MyRocks then translates these relational operations into key-value operations that RocksDB can understand and execute. For instance, a row in a MySQL table becomes one or more key-value pairs in RocksDB, where the primary key of the MySQL table forms the basis of the RocksDB key, and the row data constitutes the value. Secondary indexes are implemented as additional key-value pairs where the index key forms the RocksDB key and the primary key (or a pointer to it) forms the value, enabling efficient lookups. This granular control over the underlying key-value store allows MyRocks to exploit RocksDB’s advanced features, such as efficient data compression and highly optimized write paths, while preserving the familiarity and compatibility with existing MySQL applications, drivers, and administrative tools (MariaDB.com, n.d. [1]).
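
As an illustrative sketch of this mapping (the table is hypothetical and the encoding shown in the comments is simplified; note that in MariaDB and Percona Server builds the engine identifier in SQL is ROCKSDB):

    -- A MyRocks table; the engine identifier may be ROCKSDB depending on the build
    CREATE TABLE users (
      id    BIGINT PRIMARY KEY,
      email VARCHAR(255),
      name  VARCHAR(100),
      KEY idx_email (email)
    ) ENGINE=ROCKSDB;

    -- Conceptually, each row becomes key-value pairs in RocksDB:
    --   primary key space:     key = (pk_index_id, id)         -> value = (email, name)
    --   secondary index space: key = (sec_index_id, email, id) -> value = (empty or unpack info)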

The integration involves a thin translation layer that maps MySQL’s relational constructs (tables, columns, indexes, transactions) to RocksDB’s key-value primitives. This abstraction ensures that MyRocks can benefit from future enhancements in RocksDB without requiring substantial changes to the MySQL server itself. It also means that MyRocks inherits RocksDB’s strengths in handling large datasets and high write throughput, which are critical for Meta’s scale.

2.2 Log-Structured Merge-Tree (LSM-Tree) Structure

The quintessential characteristic underpinning MyRocks’ superior write efficiency is its adoption of the Log-Structured Merge-tree (LSM-tree) as its core data structure. In stark contrast to B-tree systems that perform in-place updates, LSM-trees append all writes, making them inherently optimized for write-heavy workloads and sequential I/O patterns that are highly favorable for flash storage devices. The fundamental principle of an LSM-tree is to defer and batch random writes into sequential writes on disk.

The LSM-tree operates on a multi-layered architecture:

  1. MemTables (In-Memory Buffer): All incoming writes (inserts, updates, deletes) are initially appended to a mutable in-memory data structure, typically a SkipList or a similar sorted structure, known as the MemTable. Writes to the MemTable are exceptionally fast as they are performed in RAM. To ensure durability and recoverability in the event of a crash, these writes are simultaneously appended to a Write-Ahead Log (WAL) on disk. The WAL is a purely append-only sequential log, providing crash-recovery semantics without incurring the random I/O penalties associated with B-tree writes. When a MemTable reaches a predefined size (e.g., 64MB or 256MB), it becomes immutable, at which point it is commonly called an 'immutable MemTable' awaiting flush (AWS.Amazon.com, 2021 [2]). A new, empty MemTable is then created to handle subsequent writes.

  2. Sorted String Tables (SSTables) / Data Files: Once an immutable MemTable is full, its contents are asynchronously flushed to disk as a new, immutable Sorted String Table (SST) file. These files contain sorted key-value pairs. Since each MemTable is sorted before flushing, each SST file is also sorted internally. This write-once, append-only nature of SST files is a cornerstone of LSM-tree’s efficiency, as it minimizes random writes to disk.

  3. Compaction: As more and more MemTables are flushed, a growing number of SST files accumulate on disk. This proliferation of files, some containing redundant or stale data (due to updates or deletes), would negatively impact read performance and storage efficiency. To mitigate this, LSM-trees employ a background process called compaction. Compaction involves reading multiple SST files (potentially from different levels), merging their sorted contents, resolving conflicting or overwritten keys (keeping only the latest version), deleting data marked for deletion, and then writing a new, larger, sorted SST file to a lower level (or different compaction output). The old SST files are then asynchronously deleted. Compaction is critical for:

    • Garbage Collection: Removing obsolete versions of data.
    • Data Aggregation: Combining smaller files into larger ones.
    • Storage Efficiency: Applying further compression to larger blocks of data.
    • Read Performance: Reducing the number of SST files that need to be scanned during a read operation.

    RocksDB, and thus MyRocks, supports various compaction strategies, primarily Leveled Compaction and Universal/Tiered Compaction. Leveled compaction arranges SST files into distinct levels (L0, L1, L2, … Ln), where files in higher levels contain increasingly older and more consolidated data. Compaction typically merges files from one level into the next. Tiered compaction, on the other hand, collects SST files into a few tiers, compacting them only when a tier exceeds a certain size threshold. Leveled compaction generally offers better read performance and storage efficiency but can incur higher write amplification (more data rewritten during compaction), whereas tiered compaction tends to have lower write amplification but potentially higher read amplification (more files to check during reads). MyRocks often uses a variant of leveled compaction as its default (Slideshare.net, n.d. [3]).

  4. Bloom Filters: To accelerate read operations, especially for keys that do not exist, MyRocks utilizes Bloom filters. A Bloom filter is a probabilistic data structure that efficiently tests whether an element is a member of a set. For each SST file, a Bloom filter is created that represents all the keys contained within that file. Before accessing an SST file on disk, MyRocks can query its associated Bloom filter to quickly determine if the requested key might be in that file. If the Bloom filter returns a ‘no’, MyRocks can confidently skip reading that SST file, thereby significantly reducing disk I/O for non-existent keys or keys located in other files (AWS.Amazon.com, 2021 [2]). While Bloom filters can produce false positives (indicating a key might be present when it is not), they never produce false negatives.
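
For intuition on the memory/accuracy trade-off, standard Bloom-filter math (not MyRocks-specific figures) gives, for b bits per key and the optimal number of hash functions k ≈ b · ln 2:

    false-positive rate ≈ 2^(−k) ≈ 0.6185^b
    at b = 10 bits per key: ≈ 0.62^10 ≈ 0.008, i.e. roughly a 1% false-positive rate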

This LSM-tree architecture inherently reduces write amplification compared to B-trees, which suffer from page splits and cascading page rewrites during updates. The sequential nature of LSM-tree writes is particularly well-suited for SSDs, which favor sequential writes and have a finite number of program-erase cycles.

2.3 Data Compression Mechanisms

MyRocks places a strong emphasis on data compression, which is a key factor in its superior storage efficiency and, consequently, its I/O performance. By reducing the physical size of data on disk, MyRocks not only conserves valuable storage resources but also minimizes the amount of data that needs to be read from and written to storage devices. This reduction directly translates to improved I/O throughput and lower latency, as less data needs to be moved across the storage bus.

MyRocks leverages several industry-standard compression algorithms, allowing users to select the most appropriate one based on their specific workload characteristics and trade-offs between compression ratio and CPU overhead:

  • Snappy: Developed by Google, Snappy is known for its high compression and decompression speeds, making it suitable for scenarios where CPU overhead must be minimized, even if it means a slightly lower compression ratio. It’s often a good default for general-purpose use.
  • Zlib (DEFLATE): A more robust compression algorithm that typically achieves better compression ratios than Snappy but at the cost of higher CPU utilization during compression and decompression. Zlib is a good choice for scenarios where storage savings are prioritized and CPU resources are relatively abundant.
  • Zstandard (ZSTD): Developed by Meta (Facebook), Zstandard offers an excellent balance between compression ratio and speed. It often outperforms both Snappy (in compression ratio) and Zlib (in speed), providing a highly optimized solution for modern workloads. Zstandard includes multiple compression levels, allowing users to fine-tune the balance between speed and ratio (Slideshare.net, n.d. [3]).
  • LZ4: An extremely fast compression algorithm, primarily optimized for speed over compression ratio. It’s ideal for use cases where data must be processed with minimal latency and even a small reduction in data size can be beneficial.

MyRocks applies compression at the block level within SST files. When an SST file is written, its data is divided into blocks (e.g., 4KB, 8KB, 16KB blocks), and each block is compressed individually. This block-level compression allows for efficient random access to data within the file, as only the required block needs to be decompressed during a read operation, rather than the entire file. Furthermore, MyRocks can utilize dictionary compression, where a common dictionary of frequently occurring phrases or patterns is built and used to compress data more effectively, particularly for highly repetitive data. The choice of compression algorithm is configurable per table or globally, allowing administrators to optimize for different data types or access patterns. This multi-faceted approach to compression is a significant differentiator for MyRocks, enabling it to achieve storage savings often cited as 2x better than compressed InnoDB and 3-4x better than uncompressed InnoDB (MyRocks.io, n.d. [4]).
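
As a hedged configuration sketch, the default compression (and a heavier algorithm for the coldest, bottommost level) can be set in my.cnf via RocksDB option strings; exact option names can vary across MyRocks/RocksDB versions:

    # my.cnf: fast LZ4 for upper (hotter) levels, ZSTD for the bottommost (coldest) level
    rocksdb_default_cf_options = compression=kLZ4Compression;bottommost_compression=kZSTD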

2.4 Column Families

RocksDB, and by extension MyRocks, introduces the concept of Column Families. While not directly analogous to relational database columns, Column Families in RocksDB represent isolated key-value spaces within a single RocksDB instance. Each Column Family has its own set of MemTables, SST files, and independent compaction processes. This allows for fine-grained control over how different sets of data are stored and managed.

In MyRocks, Column Families are primarily utilized to segregate primary table data from secondary index data. For example, the primary key and the corresponding row data for a MySQL table might reside in one Column Family, while each secondary index for that table could reside in its own distinct Column Family. This architectural separation offers several benefits:

  • Independent Tuning: Each Column Family can be configured with its own compression algorithm, block size, compaction strategy, and other RocksDB-specific parameters. This allows for optimized storage and performance based on the specific characteristics of primary data versus index data. For instance, index data (often smaller, more uniform) might benefit from different compression or cache settings than large, variable-length row data.
  • Reduced Interference: Compaction operations for one Column Family do not directly interfere with the compaction operations of another. This isolation helps maintain consistent performance, especially in highly concurrent environments where primary table writes might be competing with index updates.
  • Resource Allocation: While MyRocks internally manages Column Families, this feature provides a robust underlying structure for managing different logical data groups efficiently within the same database instance.
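
A brief sketch of how MyRocks expresses this in SQL: column families are assigned through index comments (the family names below are hypothetical), after which each family can be tuned individually, for example via rocksdb_override_cf_options:

    CREATE TABLE metrics (
      device_id BIGINT,
      ts        DATETIME,
      payload   BLOB,
      PRIMARY KEY (device_id, ts) COMMENT 'cfname=cf_metrics_pk',
      KEY idx_ts (ts)             COMMENT 'cfname=cf_metrics_idx'
    ) ENGINE=ROCKSDB;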

2.5 Concurrency Control and Transactions

MyRocks, like InnoDB, supports ACID (Atomicity, Consistency, Isolation, Durability) transactions. However, its approach to concurrency control differs due to the underlying LSM-tree structure. MyRocks relies on Multi-Version Concurrency Control (MVCC) with snapshot-based reads, similar in spirit to InnoDB's MVCC, but adapted for the engine's append-only nature.

  • MVCC: When a row is updated or deleted, MyRocks does not physically overwrite or remove the data immediately. Instead, it writes a new version of the row (for updates) or a tombstone (for deletes) to the MemTable. Older versions of the data or the actual deleted rows are eventually removed during the background compaction process. This multi-versioning allows readers to access a consistent snapshot of the data without being blocked by writers, and vice-versa, significantly reducing contention.
  • Transaction Management: MyRocks manages transactions by coordinating writes to the MemTable and WAL. For committed transactions, the changes are durable once written to the WAL. Rollbacks are handled by discarding buffered changes that have not been applied. For write-write conflicts, MyRocks acquires locks on individual keys within RocksDB, which corresponds to row-level locking; unlike InnoDB, it does not implement gap locks, so the exact locking semantics and their implications for contention differ.
  • Atomic Writes: A single write to MyRocks (e.g., an INSERT statement) involves writing the key-value pair to the MemTable and the WAL atomically. For multi-statement transactions, MyRocks ensures atomicity across multiple key-value operations within RocksDB, allowing for all-or-nothing changes.

While MyRocks provides transactional guarantees, highly contentious OLTP workloads with frequent in-place updates and strong isolation levels (like REPEATABLE READ in InnoDB) might experience different performance characteristics compared to append-only workloads. The LSM-tree’s design often shines brightest where new data is predominantly added or existing data is frequently updated in a way that generates new versions rather than requiring immediate in-place modification.
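
A minimal illustration of the snapshot behavior described above, assuming two concurrent sessions operating on a hypothetical MyRocks table:

    -- Session A
    SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION;
    SELECT balance FROM accounts WHERE id = 1;  -- reads from a consistent snapshot

    -- Session B, concurrently
    UPDATE accounts SET balance = balance - 50 WHERE id = 1;  -- appends a new version
    COMMIT;

    -- Session A again
    SELECT balance FROM accounts WHERE id = 1;  -- still sees the snapshot value, unblocked
    COMMIT;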

3. Performance Characteristics

MyRocks’ architectural choices, particularly the LSM-tree and advanced compression, fundamentally shape its performance profile, making it exceptionally strong in certain areas while presenting trade-offs in others.

3.1 Write Efficiency and Write Amplification

MyRocks’ most celebrated performance attribute is its superior write efficiency, especially for flash-based storage systems. This efficiency directly stems from the LSM-tree’s design, which minimizes write amplification. Write amplification (WA) is a critical metric, defined as the ratio of data physically written to the storage medium to the logical amount of data written by the application. High write amplification accelerates wear on flash devices (SSDs) and consumes significant I/O bandwidth, leading to performance bottlenecks and reduced SSD lifespan.

In B-tree based engines like InnoDB, updates and deletes often involve:
1. Reading the B-tree page from disk into memory.
2. Modifying the page in memory.
3. Writing the modified page back to disk (often requiring a full page write, even for small changes).
4. Writing to the doublewrite buffer for crash safety.
5. Writing to the redo log.

These operations can result in significant random I/O and high write amplification (e.g., 10x-30x or more for certain workloads), as small logical changes can trigger multiple physical writes to different locations on disk, potentially involving page splits and reorganizations.

Conversely, MyRocks’ LSM-tree approach inherently reduces write amplification because:
1. Append-only Writes: All writes are first buffered in memory (MemTable) and then flushed sequentially to disk as immutable SST files. This sequential writing pattern is highly efficient for SSDs, which prefer large, sequential writes.
2. Batching and Merging: Updates and deletes are not applied in-place. Instead, new versions or tombstones are appended. During compaction, multiple versions of a key are merged, and only the latest version (or the absence of a key if deleted) is written to the next level. This process effectively ‘compresses’ multiple logical writes into fewer physical writes, significantly reducing the total volume of data written to disk (MyRocks.io, n.d. [4]; SmallDatum.blogspot.com, 2016 [5]).
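
As a purely illustrative calculation (the figures are hypothetical, not benchmark results): if an application logically writes 100 GB and the engine physically writes 500 GB (MemTable flushes plus compaction rewrites), write amplification is 500 / 100 = 5. A B-tree engine that rewrites full 16 KB pages, a doublewrite buffer, and a redo log for the same logical changes can easily multiply that figure, which is precisely the gap the LSM-tree's batched, sequential writes aim to close.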

This reduction in write amplification extends the lifespan of flash storage, as SSDs have a finite number of program-erase cycles. For write-intensive workloads such as social media feeds, IoT data ingestion, and extensive logging systems, where millions of writes per second are common, MyRocks can sustain significantly higher write throughput with less wear on the underlying hardware compared to InnoDB.

3.2 Read Performance

While MyRocks is lauded for its write performance, its read performance is also robust, though subject to different considerations than B-trees. The primary challenge for LSM-trees regarding reads is that a key-value pair might exist in multiple SST files across different levels, or it might not exist at all, requiring checks across multiple files. MyRocks employs several strategies to mitigate potential read amplification and ensure efficient read operations:

  • Block Cache: Similar to InnoDB’s buffer pool, MyRocks utilizes a block cache (often residing in RAM) to store recently accessed data blocks (compressed or decompressed) and index blocks from SST files. A large and well-managed block cache is crucial for MyRocks’ read performance, as it reduces the need to access slower disk storage. Configuring an adequate block cache size is one of the most important tuning parameters.
  • Bloom Filters: As discussed, Bloom filters associated with each SST file provide a probabilistic way to quickly determine if a key is not present in a file, allowing the engine to skip unnecessary disk I/O. This is particularly effective for read-misses (queries for non-existent keys).
  • Compaction Strategies: Effective compaction strategies are vital for read performance. By continually merging and consolidating SST files, compaction reduces the number of files that need to be scanned during a read operation. A poorly tuned or overwhelmed compaction process can lead to read amplification, where a read query might need to check many smaller, fragmented SST files across various levels to find the latest version of a key.
  • Leveling: In leveled compaction, data is organized into levels, with each level containing a subset of the data. Reads typically start from L0 (most recent data) and proceed downwards. If a key is not found in an upper level, the search continues to lower levels. While this can involve checking multiple files, the sorted nature of SST files and the use of block caches and Bloom filters optimize this process (AWS.Amazon.com, 2021 [2]).

Despite these optimizations, highly random read workloads that frequently access a small, non-cached working set or workloads that involve many updates to the same keys (which can create many versions across levels) might experience higher read latencies in MyRocks compared to InnoDB. This is because a read might need to traverse multiple SST files and levels to find the latest version of a key, potentially incurring multiple disk seeks. However, for workloads with a good cache hit ratio or predominantly sequential reads, MyRocks can offer competitive or even superior read performance.

3.3 Storage Efficiency

MyRocks’ storage efficiency is a significant advantage, directly contributing to reduced infrastructure costs and improved I/O performance. This efficiency is achieved through a synergy of its LSM-tree architecture and sophisticated data compression techniques:

  • LSM-Tree’s Append-Only Nature: Unlike B-trees, which suffer from internal fragmentation due to in-place updates, page splits, and deleted rows occupying space until defragmentation, LSM-trees append data. Obsolete versions and deleted data are naturally purged during the compaction process, ensuring that only the latest, relevant data persists on disk. This garbage collection mechanism is highly efficient.
  • Advanced Compression Algorithms: As detailed in Section 2.3, MyRocks supports a range of powerful compression algorithms (Snappy, Zlib, Zstandard, LZ4) applied at the block level. The sorted nature of data within SST files further enhances compression ratios, as similar data values tend to be co-located within blocks, leading to better compressibility. MyRocks often achieves higher compression ratios than InnoDB, especially with Zstandard, leading to substantial disk space savings (MyRocks.io, n.d. [4]). Anecdotal and reported benchmarks frequently show MyRocks achieving up to 2x better compression than compressed InnoDB and 3-4x better compression than uncompressed InnoDB, making it an attractive choice for environments where storage costs are a primary concern (MyRocks.io, n.d. [4]). This efficiency also extends to backup storage, as less data needs to be archived.

3.4 CPU and Memory Utilization

MyRocks’ resource consumption profile differs from InnoDB’s due to its LSM-tree architecture:

  • CPU Utilization: While MyRocks reduces I/O, it can be more CPU-intensive, particularly due to:
    • Compaction: The background compaction process continuously merges and rewrites data, which involves CPU-intensive operations like sorting, merging, and compression/decompression. For very high write throughput, compaction can consume significant CPU cycles. However, this is largely a background process, and RocksDB is designed to throttle compaction to avoid impacting foreground operations excessively.
    • Compression/Decompression: While reducing I/O, the act of compressing data before writing and decompressing it during reads consumes CPU. The choice of compression algorithm directly impacts this (e.g., Zlib is more CPU-intensive than Snappy).
  • Memory Utilization: MyRocks utilizes memory for:
    • MemTables: In-memory buffers for active writes. Multiple MemTables can exist (active and immutable ones awaiting flush).
    • Block Cache: Crucial for read performance, similar to InnoDB’s buffer pool. A larger block cache reduces disk I/O but consumes more RAM. MyRocks can use two types of block caches: a compressed block cache and an uncompressed block cache. The uncompressed block cache stores decompressed data blocks, while the compressed block cache stores compressed blocks, which saves memory but incurs CPU overhead for decompression during access.
    • Index and Filter Blocks: Memory is also used to store index blocks and Bloom filters, which are essential for efficient lookups.

Compared to InnoDB, MyRocks might demonstrate higher CPU utilization for background tasks but potentially lower CPU for foreground I/O operations if the workload is write-heavy and I/O-bound in InnoDB. Memory requirements can be comparable or even higher for optimal performance, especially if a large block cache is configured to mitigate read amplification. The trade-off is often between CPU (for compaction/compression) and I/O (for disk writes/reads).

3.5 Transaction Performance

MyRocks provides full ACID transactional semantics, including support for rollbacks and atomicity. Its MVCC implementation ensures that readers do not block writers and vice-versa, allowing for high concurrency. However, the performance characteristics of transactions can vary compared to InnoDB:

  • Write Throughput: For workloads dominated by inserts or appends, MyRocks can achieve significantly higher transaction throughput than InnoDB due to its sequential write pattern and reduced write amplification. Transactions involving many small, sequential writes will benefit immensely.
  • Updates and Deletes: While MyRocks handles updates and deletes by writing new versions or tombstones, frequent updates to the same hot rows can lead to increased space amplification (many versions of the same key accumulating before compaction) and potentially higher read amplification (more files to scan for the latest version). However, for workloads where updates are less frequent per key, or where keys are often new (e.g., time-series data), MyRocks performs exceptionally well.
  • Long-Running Transactions: Long-running read transactions in MVCC systems can prevent the garbage collection of older data versions, leading to increased disk space usage and potentially impacting compaction efficiency. MyRocks manages this by tracking active snapshots and only deleting data versions older than the oldest active snapshot. This is a common consideration for all MVCC databases.

Overall, MyRocks’ transactional performance is optimized for high concurrency and throughput in append-heavy, write-intensive environments, making it a strong contender for applications that generate large volumes of data.

4. Operational Considerations

Deploying and managing MyRocks effectively requires understanding its unique characteristics and adjusting operational practices accordingly. While it integrates with MySQL’s standard tools, certain aspects demand specific attention.

4.1 Backup Strategies

Implementing robust backup strategies is critical for any database, and MyRocks, with its LSM-tree architecture, offers interesting opportunities and specific considerations:

  • Physical Backups: MyRocks supports physical backups, which involve copying the underlying SST files and the Write-Ahead Log (WAL). Due to the immutable nature of SST files, physical backups can be efficient. A consistent snapshot can be taken by freezing writes briefly (or using file system snapshots) and then copying the files. The WAL ensures point-in-time recovery capabilities. This method is fast for large databases.
  • Differential and Incremental Backups: The immutability of SST files makes differential backups particularly attractive. Once an SST file is flushed to disk, it does not change. Therefore, differential backups can simply identify and copy only the newly created SST files and relevant WAL segments since the last full or differential backup. This significantly reduces the amount of data transferred and stored for subsequent backups, offering faster backup times and lower storage costs. Uber, for example, has leveraged this aspect for significant storage savings (InfoQ.com, 2024 [6]).
  • Logical Backups: Standard MySQL logical backup tools like mysqldump or mysqlpump can be used with MyRocks. These tools extract data row by row as SQL statements. While versatile and compatible, logical backups are generally slower and produce larger backup files compared to physical backups, making them less ideal for very large datasets.
  • Point-in-Time Recovery (PITR): MyRocks fully integrates with MySQL’s binary logging (binlog) mechanism. By combining a full physical backup with the binary logs, it’s possible to recover the database to any specific point in time, even up to the moment of failure. This is standard MySQL functionality that MyRocks inherits.
  • Hot Backups: MyRocks supports ‘hot’ backups where the database remains fully operational during the backup process, often achieved by leveraging RocksDB’s internal snapshotting capabilities and coordinating with MySQL’s FLUSH TABLES WITH READ LOCK (or similar for consistent snapshots).

Effective backup strategies for MyRocks often combine physical, incremental backups with binary logs for PITR, providing a balance of speed, efficiency, and recovery granularity.
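
A sketch of the standard binlog-replay step in PITR, applied after restoring the most recent physical backup (file names and the cutoff time are placeholders):

    # Replay binary logs up to just before the failure; names/times are hypothetical
    mysqlbinlog --start-datetime="2024-05-01 00:00:00" \
                --stop-datetime="2024-05-01 11:59:00" \
                binlog.000042 binlog.000043 | mysql -u root -p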

4.2 Migration from InnoDB

Migrating an existing MySQL database from InnoDB to MyRocks involves several steps, balancing data integrity with minimal downtime:

  1. Assessment: Evaluate the workload. MyRocks is not a universal replacement for InnoDB. It excels in write-intensive, storage-constrained scenarios. For read-heavy OLTP or applications with complex transactions and high update-in-place contention, InnoDB might remain superior. Identify tables that would benefit most from MyRocks.
  2. Data Migration Methods:
    • ALTER TABLE ... ENGINE=MyRocks;: For smaller tables, this DDL statement is the simplest approach (in MariaDB and Percona Server builds the engine identifier is ROCKSDB). MySQL internally copies data from the InnoDB table into a new MyRocks table and then atomically swaps them; writes to the table are blocked for the duration of the copy.
    • mysqldump and mysqlpump: Export the entire database (or specific tables) using mysqldump or mysqlpump, either dumping data only (--no-create-info) into tables pre-created with the MyRocks engine, or editing the dump file to change ENGINE=InnoDB to the MyRocks engine. Then import into a MyRocks-configured instance. This method requires significant downtime for large databases.
    • Online Schema Change Tools: Tools like Percona Toolkit’s pt-online-schema-change or GitHub’s gh-ost can perform the ALTER TABLE operation with minimal or no downtime by creating a new table, copying data, and using triggers (or binlog tailing, in gh-ost’s case) to keep it in sync before a final cutover. These tools are often preferred for production migrations.
    • Logical Replication: A highly recommended approach for large databases. Set up MySQL replication where the primary is InnoDB and the replica is MyRocks. Once the replica catches up and is verified, perform a planned failover (Meta’s approach, Engineering.FB.com, 2017 [7]). This allows for zero-downtime migration and a rollback path.
  3. Consistency Verification: Post-migration, thoroughly verify data consistency between the source and target. Use checksums or record counts.
  4. Performance Tuning and Testing: After migration, significant performance tuning and load testing are crucial. MyRocks has different performance characteristics and tuning parameters than InnoDB.

Migration should always be performed in a staged manner, starting with development and testing environments, then staging, before production, with thorough monitoring and rollback plans.
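
A minimal sketch of the per-table conversion and verification path (the table name is hypothetical; in MariaDB and Percona Server builds the engine identifier is ROCKSDB):

    -- Convert a table in place; writes to it are blocked while the copy runs
    ALTER TABLE events ENGINE=ROCKSDB;

    -- Cheap post-migration sanity checks; for thorough verification,
    -- tools such as pt-table-checksum compare data across servers
    CHECKSUM TABLE events;
    SELECT COUNT(*) FROM events;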

4.3 Configuration and Tuning

Optimal performance in MyRocks is achieved through meticulous configuration and continuous tuning, as its LSM-tree nature exposes different knobs compared to B-trees. Key parameters include:

  • rocksdb_block_cache_size: This is analogous to InnoDB’s innodb_buffer_pool_size. It defines the size of the uncompressed data block cache. A larger cache reduces disk I/O and improves read performance. This is often the most critical parameter to tune. Setting it too small will lead to high read amplification from disk.
  • write_buffer_size and max_write_buffer_number (column-family options, typically set through rocksdb_default_cf_options): These control the size and number of MemTables. write_buffer_size determines when a MemTable is flushed to disk, and max_write_buffer_number defines how many immutable MemTables can accumulate before MyRocks throttles incoming writes. Proper tuning ensures efficient flushing without overwhelming memory or causing write stalls.
  • rocksdb_compaction_readahead_size: This parameter determines the size of the readahead buffer used during compaction, impacting how quickly data can be read from disk for merging. Tuning this can improve compaction throughput.
  • level_compaction_dynamic_level_bytes and num_levels (also column-family options): These parameters influence how MyRocks distributes SST files across different levels and triggers compaction. Dynamic level bytes helps ensure that lower levels (older data) are larger, promoting efficient space usage and read performance, while num_levels sets the depth of the LSM-tree.
  • rocksdb_bytes_per_sync: Controls how often data is synced to disk during sequential writes (e.g., during MemTable flush or compaction). A higher value can improve write throughput but increases the amount of data at risk in case of a crash.
  • Compression Algorithms (rocksdb_default_cf_options='compression=...'): As discussed, selecting the appropriate compression algorithm (Snappy, Zlib, Zstandard, LZ4) and potentially its compression level directly impacts storage savings, CPU utilization, and I/O efficiency (Slideshare.net, n.d. [3]).
  • Bloom Filters: The bits-per-key setting of the Bloom filter (configured via the filter_policy entry of the block-based table factory in the column-family options) controls its size and accuracy. More bits per key reduce false positives (improving read efficiency for non-existent keys) but consume more memory.
  • rocksdb_max_background_jobs: Controls the number of threads available for background tasks like flushing MemTables and performing compactions. Tuning this ensures that MyRocks can keep up with the write load and compaction pressure.

Monitoring key MyRocks metrics (e.g., compaction progress, block cache hit ratio, write stalls, I/O rates) is essential for identifying bottlenecks and adjusting these parameters effectively. The SHOW ENGINE ROCKSDB STATUS statement provides valuable insights.
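
A hedged my.cnf fragment pulling these knobs together (the values are illustrative starting points, not recommendations, and option-string syntax can vary across MyRocks/RocksDB versions):

    [mysqld]
    # Uncompressed block cache, analogous to innodb_buffer_pool_size
    rocksdb_block_cache_size = 32G

    # Parallelism for background flushes and compactions
    rocksdb_max_background_jobs = 8

    # Per-column-family defaults: MemTable sizing, compression,
    # dynamic level sizing, and a 10-bits-per-key Bloom filter
    rocksdb_default_cf_options = write_buffer_size=128m;max_write_buffer_number=4;compression=kLZ4Compression;bottommost_compression=kZSTD;level_compaction_dynamic_level_bytes=true;block_based_table_factory={filter_policy=bloomfilter:10:false}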

4.4 Monitoring and Troubleshooting

Effective monitoring is paramount for MyRocks, as its internal LSM-tree dynamics can present different performance bottlenecks than InnoDB. Key metrics to observe include:

  • RocksDB-specific Status Variables: MySQL’s SHOW ENGINE ROCKSDB STATUS provides a wealth of information, including block cache usage, compaction statistics (bytes read/written, number of compactions), MemTable flushes, write stalls, and error rates. These are crucial for understanding MyRocks’ internal health.
  • Write Amplification: While not directly exposed as a single metric, it can be inferred by comparing logical writes (e.g., the Rocksdb_rows_inserted and Rocksdb_rows_updated status counters) with physical writes (bytes written to disk by MyRocks, visible in OS-level disk metrics or MyRocks’ internal write statistics). High WA might indicate aggressive compaction or suboptimal configuration.
  • Read Amplification: Also inferred. High logical reads (e.g., Select_scan, Select_range) accompanied by high physical reads from disk (MyRocks’ read statistics, or a low block cache hit ratio derived from Rocksdb_block_cache_hit and Rocksdb_block_cache_miss) suggest read amplification. Optimizing Bloom filters, the block cache, and compaction is key here.
  • Compaction Backlog/Stalls: If the number of pending compactions grows or MyRocks enters a write stall state (where writes are temporarily blocked because compaction cannot keep up), it’s a critical sign of a bottleneck. Adjusting rocksdb_max_background_jobs or rocksdb_max_write_buffer_number might be necessary.
  • Memory Usage: Monitor MyRocks’ memory footprint, especially the block cache and MemTables, to ensure it aligns with configured limits and doesn’t lead to OOM issues.
  • CPU Utilization: Keep an eye on CPU usage, especially for background threads involved in compaction and compression, to ensure it doesn’t saturate the system.

Troubleshooting MyRocks often involves analyzing these metrics to pinpoint whether performance issues stem from write amplification, read amplification, insufficient cache, or overwhelmed compaction processes. Logs (MySQL error log, RocksDB LOG file) are also invaluable for diagnostic purposes.
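
For instance, the basic monitoring queries look like this (status-variable names follow the MyRocks convention and may differ slightly between builds):

    -- Detailed internal statistics: compactions, stalls, cache, per-level data
    SHOW ENGINE ROCKSDB STATUS;

    -- Counters for logical row operations and block cache behavior
    SHOW GLOBAL STATUS LIKE 'Rocksdb_rows_%';
    SHOW GLOBAL STATUS LIKE 'Rocksdb_block_cache_%';
    -- hit ratio ~ Rocksdb_block_cache_hit / (Rocksdb_block_cache_hit + Rocksdb_block_cache_miss)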

4.5 Replication

MyRocks integrates seamlessly with MySQL’s native replication capabilities, utilizing the binary log (binlog) for logical replication. This means a MyRocks instance can act as a primary or a replica in a standard MySQL replication topology. The row-based binary logging format (RBR) is generally preferred for MyRocks, as it accurately records logical changes to rows, which the replica can then apply.

  • Primary MyRocks to Replica MyRocks: This is a straightforward setup. All changes on the primary MyRocks instance are logged to the binlog, and the replica applies these changes to its own MyRocks tables.
  • Primary InnoDB to Replica MyRocks: This configuration is particularly useful for migrations (as mentioned in Section 4.2) or for specific use cases where an InnoDB primary handles mixed workloads and a MyRocks replica serves as a highly efficient, write-optimized copy, possibly for analytical queries or specific high-write applications. The logical RBR ensures compatibility.
  • Primary MyRocks to Replica InnoDB: While less common, this setup is also possible, with the MyRocks primary logging changes to the binlog, and the InnoDB replica applying them. This might be considered for hybrid environments or specific application needs.

MyRocks ensures that transactions are written to the binlog after they are committed and durable, maintaining standard replication consistency models. For high-volume write environments, the efficiency of MyRocks on the primary can significantly reduce the write load on the underlying storage, making replication more robust.
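
A minimal sketch of attaching a MyRocks replica to an existing primary, as used in the migration pattern of Section 4.2 (host and credentials are placeholders; MySQL 8.0.23+ spells this CHANGE REPLICATION SOURCE TO / START REPLICA):

    -- On the MyRocks replica; assumes binlog_format=ROW and GTIDs on the primary
    CHANGE MASTER TO
      MASTER_HOST = 'primary.example.internal',
      MASTER_USER = 'repl',
      MASTER_PASSWORD = '...',
      MASTER_AUTO_POSITION = 1;
    START SLAVE;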

5. Suitability for Hyperscale Workloads

MyRocks was explicitly designed by Meta to address the extreme demands of hyperscale environments, where data volumes are immense, write throughput is astronomical, and storage efficiency is a non-negotiable requirement. Its architectural strengths make it highly suitable for specific types of large-scale applications.

5.1 Scalability

MyRocks significantly enhances MySQL’s horizontal scalability potential by optimizing the underlying storage engine for high-volume data ingestion. While MySQL itself can be scaled horizontally through sharding, MyRocks makes each shard (or node) more efficient at handling its portion of the data:

  • Efficient Resource Utilization per Node: By reducing write amplification and maximizing storage compression, MyRocks enables a single MySQL server to handle more write throughput and store more data than an equivalent InnoDB instance. This means fewer physical servers (or virtual machines) are needed to handle a given data volume or write rate, leading to cost savings and simplified management of the sharded clusters.
  • Flash-Optimized: The design of LSM-trees is intrinsically suited for the characteristics of modern flash storage (SSDs). MyRocks leverages SSDs’ high sequential write speeds and can withstand high I/O operations per second (IOPS) without prematurely wearing out the drives, which is crucial for large-scale deployments that rely heavily on fast, durable storage.
  • Concurrency: MyRocks’ MVCC implementation and efficient handling of concurrent writes allow it to support a high number of parallel connections and transactions, which is a prerequisite for any hyperscale application.

In essence, MyRocks empowers individual MySQL nodes to be more robust and performant, which in turn facilitates building larger, more efficient sharded database architectures capable of handling petabytes of data and millions of transactions per second (MyRocks.io, n.d. [4]).

5.2 Use Cases

MyRocks is particularly well-suited for workloads characterized by:

  • High Write Throughput: Applications that generate continuous streams of data, such as real-time analytics, logging systems, and monitoring platforms.
  • Large Datasets: Environments where storage costs are a significant concern, and maximizing data density per server is critical.
  • Append-Only or Append-Mostly Workloads: Data that is primarily inserted and rarely updated in place, or updates that largely append new versions rather than modify existing ones. Examples include:
    • Social Media Feeds: Storing vast quantities of user posts, likes, comments, and interactions, where new data is constantly being generated.
    • Internet of Things (IoT) Data Ingestion: Collecting sensor data, device telemetry, and event streams from millions of devices. This data is typically time-series and append-only.
    • Real-time Bidding (RTB) Systems: Recording bid requests, responses, and outcomes in online advertising, which are high-volume, ephemeral, and append-heavy.
    • Extensive Logging Systems: Storing application logs, audit trails, and security events, where data is primarily written sequentially and occasionally queried.
    • Messaging Queues/Event Stores: Persistent storage for high-throughput messaging systems or event streams.
    • Archival Data: Storing historical data that is primarily written once and then read occasionally for reporting or analysis.

MariaDB, for example, highlights MyRocks’ suitability for high write load scenarios and data that benefits from high compression (MariaDB.com, n.d. [1]).

5.3 Trade-offs and Limitations

While MyRocks excels in its niche, it is not a silver bullet for all MySQL workloads. Understanding its trade-offs is crucial for appropriate adoption:

  • Read-Heavy OLTP with Random Small Reads: For traditional OLTP applications characterized by frequent, small, random reads and updates to a relatively small, hot working set, InnoDB often performs better. The potential for read amplification in LSM-trees (requiring checks across multiple SST files) can lead to higher latency for these types of queries, especially if the block cache is insufficient or compaction lags.
  • High Update-in-Place Workloads: While MyRocks handles updates, frequent updates to the same keys can generate many versions of a row across different SST files. This increases space amplification (more versions stored) and compaction load, potentially leading to performance degradation if not managed properly. InnoDB’s in-place update mechanism can be more efficient for these specific patterns.
  • Complex Query Workloads: While MyRocks benefits from MySQL’s query optimizer, complex analytical queries that involve large joins or aggregate functions across massive datasets might still benefit from columnar stores or specialized OLAP databases. MyRocks’ strength lies in efficient data persistence and retrieval for high-volume transactional data.
  • CPU Consumption: As noted, MyRocks can be more CPU-intensive than InnoDB due to continuous background compaction and compression/decompression. This requires careful CPU provisioning, especially for very high write throughput scenarios.
  • Complexity of Tuning: MyRocks has a different set of tuning parameters than InnoDB, and optimizing its performance often requires a deeper understanding of LSM-tree dynamics, compaction strategies, and cache management. This can present a learning curve for database administrators accustomed to InnoDB.

In summary, MyRocks is a specialized tool. It provides a compelling solution for the most demanding write-intensive and storage-constrained applications, particularly those utilizing flash storage. However, for traditional OLTP or primarily read-intensive workloads, InnoDB remains a highly competitive and often more straightforward choice.

6. Conclusion

MyRocks represents a significant evolutionary stride in MySQL storage engine technology, offering a highly compelling and specialized solution for modern applications grappling with the challenges of vast data volumes and intense write-heavy workloads. Its strategic integration of RocksDB, an industry-leading high-performance key-value store, into the established MySQL ecosystem provides a robust bridge between the familiar relational model and the cutting-edge efficiencies of the Log-Structured Merge-tree (LSM-tree) architecture. This symbiotic fusion uniquely positions MyRocks to address the limitations inherent in traditional B-tree-based engines like InnoDB, particularly concerning write amplification and storage footprint.

The core strength of MyRocks lies in its LSM-tree design, which prioritizes sequential writes and batching, leading to demonstrably superior write efficiency and significantly reduced wear on flash storage devices. This is complemented by its sophisticated data compression mechanisms, which achieve unparalleled storage savings, directly translating into lower infrastructure costs and improved I/O throughput. Operational considerations, while different from InnoDB, are well-supported through robust backup strategies, streamlined migration paths, and a comprehensive suite of configuration parameters that allow for granular tuning to specific workload requirements.

For hyperscale environments, MyRocks offers exceptional scalability by enabling individual MySQL nodes to manage higher write volumes and larger datasets more efficiently, thus facilitating the construction of robust sharded architectures. Its proven applicability in scenarios such as social media feeds, IoT data ingestion, and extensive logging systems underscores its value proposition for append-heavy, high-throughput applications.

However, it is crucial to recognize that MyRocks is not a universal panacea. Its inherent trade-offs, such as potentially higher read latency for highly random, non-cached reads or increased CPU consumption due to continuous background compaction, mean that InnoDB often remains the superior choice for classic, read-heavy Online Transaction Processing (OLTP) workloads or applications demanding frequent in-place updates to hot rows. The complexity of its tuning parameters also necessitates a deeper understanding of LSM-tree mechanics from database administrators.

In essence, MyRocks stands as a powerful, purpose-built tool within the diverse database landscape. By understanding its architectural underpinnings, performance characteristics, and operational nuances, organizations can strategically leverage MyRocks to unlock new levels of performance and cost efficiency for their most demanding, write-intensive, and storage-constrained workloads, solidifying its pivotal role in the evolving ecosystem of modern database solutions.

References

[1] MyRocks Storage Engine | MariaDB Documentation. (n.d.). Retrieved from https://mariadb.com/docs/platform/mariadb-faqs/storage-engines/myrocks-storage-engine

[2] Increase write throughput on Amazon RDS for MariaDB using the MyRocks storage engine | AWS Database Blog. (2021, November 9). Retrieved from https://aws.amazon.com/blogs/database/increase-write-throughput-on-amazon-rds-for-mariadb-using-the-myrocks-storage-engine/

[3] MyRocks Deep Dive | PPT. (n.d.). Retrieved from https://www.slideshare.net/slideshow/myrocks-deep-dive/61103198

[4] MyRocks: A space- and write-optimized MySQL database – Engineering at Meta. (2016, August 31). Retrieved from https://engineering.fb.com/2016/08/31/core-infra/myrocks-a-space-and-write-optimized-mysql-database/

[5] Small Datum: Why is MyRocks more write-efficient than InnoDB? (2016, November 22). Retrieved from https://smalldatum.blogspot.com/2016/11/why-is-myrocks-more-write-efficient_22.html

[6] Uber Achieves Significant Storage Savings with MyRocks Differential Backups – InfoQ. (2024, November 10). Retrieved from https://www.infoq.com/news/2024/11/uber-myrocks-backups/

[7] Migrating a database from InnoDB to MyRocks – Engineering at Meta. (2017, September 25). Retrieved from https://engineering.fb.com/2017/09/25/core-infra/migrating-a-database-from-innodb-to-myrocks/
