Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow

By Eugene Yemelyanau, Jake Grice

Introduction

Tudum.com is Netflix’s official fan destination, enabling fans to dive deeper into their favorite Netflix shows and movies. Tudum offers exclusive first-looks, behind-the-scenes content, talent interviews, live events, guides, and interactive experiences. “Tudum” is named after the sonic ID you hear when pressing play on a Netflix show or movie. Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix.

Initial architecture

At the end of 2021, when we envisioned Tudum’s implementation, we considered architectural patterns that would be maintainable, extensible, and well-understood by engineers. With the goal of building a flexible, configuration-driven system, we looked to server-driven UI (SDUI) as an appealing solution. SDUI is a design approach where the server dictates the structure and content of the UI, allowing for dynamic updates and customization without requiring changes to the client application. Client applications like web, mobile, and TV devices act as rendering engines for SDUI data. After our teams weighed and vetted all the details, the dust settled and we landed on an approach similar to Command Query Responsibility Segregation (CQRS). At Tudum, we have two main use cases that CQRS is perfectly capable of solving:

  • Tudum’s editorial team brings exclusive interviews, first-look photos, behind-the-scenes videos, and many more forms of fan-forward content, and compiles it all into pages on the Tudum.com website. This content comes onto Tudum in the form of individually published pages and the content elements within those pages. In support of this, Tudum’s architecture includes a write path to store all of this data, including internal comments, revisions, version history, asset metadata, and scheduling settings.
  • Tudum visitors consume published pages. In this case, Tudum needs to serve personalized experiences to our beloved fans, accessing only the latest version of our content.
Initial Tudum data architecture

The high-level diagram above focuses on storage & distribution, illustrating how we leveraged Kafka to separate the write and read databases. The write database would store internal page content and metadata from our CMS. The read database would store read-optimized page content, for example: CDN image URLs rather than internal asset IDs, and movie titles, synopses, and actor names instead of placeholders. This content ingestion pipeline allowed us to regenerate all consumer-facing content on demand, applying new structure and data, such as global navigation or branding changes. The Tudum Ingestion Service converted internal CMS data into a read-optimized format by applying page templates, running validations, performing data transformations, and producing the individual content elements into a Kafka topic. The Data Service Consumer received the content elements from Kafka, stored them in a high-availability database (Cassandra), and acted as an API layer for the Page Construction service and other internal Tudum services to retrieve content.
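As a rough sketch of that write side (class, topic, and payload names are illustrative, not Tudum's actual code), the ingestion service could publish each read-optimized content element to a Kafka topic keyed by page, for the Data Service Consumer to pick up and store in Cassandra:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ContentElementPublisher {

    private final KafkaProducer<String, String> producer;

    public ContentElementPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for full replication before considering an element published.
        props.put("acks", "all");
        this.producer = new KafkaProducer<>(props);
    }

    /** Publishes one read-optimized content element, keyed by page so a page's elements stay ordered. */
    public void publish(String pageId, String readOptimizedElementJson) {
        producer.send(new ProducerRecord<>("tudum-content-elements", pageId, readOptimizedElementJson));
    }
}
```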

A key advantage of decoupling the read and write paths is the ability to scale them independently. Connecting the write and read databases through an event-driven architecture is a well-known approach. As a result, content edits would eventually appear on tudum.com.

Challenges with eventual consistency

Did you notice the emphasis on “eventually?” A major downside of this architecture was the delay between making an edit and observing that edit reflected on the website. For instance, when the team publishes an update, the following steps must occur:

  1. Call the REST endpoint on the 3rd party CMS to save the data.
  2. Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
  3. Wait for the Tudum Ingestion layer to query all necessary sections via API, validate data and assets, process the page, and produce the modified content to Kafka.
  4. Wait for the Data Service Consumer to consume this message from Kafka and store it in the database.
  5. Finally, after some cache refresh delay, this data would eventually become available to the Page Construction service. Great!

By introducing a highly scalable, eventually consistent architecture, we lost the ability to quickly render changes after writing them, an important capability for internal previews.

In our performance profiling, we found that the source of delay was our Page Data Service, which acted as a facade for an underlying Key Value Data Abstraction (KVDAL) database. The Page Data Service utilized a near cache to accelerate page building and reduce read latencies from the database.

This cache was implemented to optimize the N+1 key lookups necessary for page construction by having a complete data set in memory. When engineers hear “slow reads,” the immediate answer is often “cache,” which is exactly what our team adopted. The KVDAL near cache can refresh in the background on every app node. Regardless of which system modifies the data, the cache is updated with each refresh cycle. If you have 60 keys and a refresh interval of 60 seconds, the near cache will update one key per second. This was problematic for previewing recent modifications, as these changes were only reflected with each cache refresh. As Tudum’s content grew, cache refresh times increased, further extending the delay.
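To make the refresh-rate math concrete, here is a minimal, illustrative sketch (not the actual KVDAL implementation) of a background near cache that spreads refreshes evenly across the interval: with 60 keys and a 60-second interval, roughly one key is refreshed per second, so a fresh edit can sit unseen for up to a full refresh cycle.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Illustrative background near cache that refreshes one key at a time, round-robin. */
public class NearCache<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final List<K> keys;
    private final Function<K, V> loader; // reads through to the backing key-value store
    private int cursor = 0;

    public NearCache(List<K> keys, Function<K, V> loader, long refreshIntervalSeconds) {
        this.keys = keys;
        this.loader = loader;
        // Spread the work across the interval: 60 keys over 60 seconds is one key per second,
        // so a freshly written key may wait up to a full interval before readers see the new value.
        long perKeyDelayMillis = Math.max(1, (refreshIntervalSeconds * 1000) / keys.size());
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refreshNextKey, 0, perKeyDelayMillis, TimeUnit.MILLISECONDS);
    }

    private void refreshNextKey() {
        K key = keys.get(cursor);
        cache.put(key, loader.apply(key));
        cursor = (cursor + 1) % keys.size();
    }

    /** Readers always hit local memory; staleness is bounded by the refresh cycle above. */
    public V get(K key) {
        return cache.get(key);
    }
}
```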

RAW Hollow

As this pain point grew, a new technology was being developed that would act as our silver bullet. RAW Hollow is an innovative in-memory, co-located, compressed object database developed by Netflix, designed to handle small to medium datasets with support for strong read-after-write consistency. It addresses the challenges of achieving consistent performance with low latency and high availability in applications that deal with less frequently changing datasets. Unlike traditional SQL databases or fully in-memory solutions, RAW Hollow offers a unique approach where the entire dataset is distributed across the application cluster and resides in the memory of each application process.

This design leverages compression techniques to scale datasets up to 100 million records per entity, ensuring extremely low latencies and high availability. RAW Hollow provides eventual consistency by default, with the option for strong consistency at the individual request level, allowing users to balance between high availability and data consistency. It simplifies the development of highly available and scalable stateful applications by eliminating the complexities of cache synchronization and external dependencies. This makes RAW Hollow a robust solution for efficiently managing datasets in environments like Netflix’s streaming services, where high performance and reliability are paramount.

Revised architecture

Tudum was a perfect fit to battle-test RAW Hollow while it was pre-GA internally. Hollow’s high-density near cache significantly reduces I/O. Having our primary dataset in memory enables Tudum’s various microservices (page construction, search, personalization) to access data synchronously in O(1) time, simplifying architecture, reducing code complexity, and increasing fault tolerance.

Updated Tudum data architecture

In our simplified architecture, we eliminated the Page Data Service, Key Value store, and Kafka infrastructure in favor of RAW Hollow. By embedding the in-memory client directly into our read-path services, we avoid per-request I/O and reduce round-trip time.
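RAW Hollow is a Netflix-internal library, so the snippet below is only a hypothetical illustration of the pattern rather than its real API: read-path code resolves content elements from a co-located in-memory dataset (stood in here by a plain map) instead of making a network call per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: RAW Hollow is Netflix-internal, so a plain in-process map
 * stands in for the co-located, compressed in-memory dataset. The point is that page
 * construction becomes a series of synchronous O(1) lookups with no per-request network I/O.
 */
public class PageConstructionService {

    private final Map<String, String> contentElementsById; // kept up to date by the in-memory client

    public PageConstructionService(Map<String, String> contentElementsById) {
        this.contentElementsById = contentElementsById;
    }

    public List<String> buildPage(List<String> elementIds) {
        List<String> renderedElements = new ArrayList<>(elementIds.size());
        for (String id : elementIds) {
            // Previously: a round trip to the Page Data Service and its backing store.
            // Now: a local memory lookup per element.
            renderedElements.add(contentElementsById.get(id));
        }
        return renderedElements;
    }
}
```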

Migration results

The updated architecture yielded a monumental reduction in data propagation times, and the reduced I/O led to faster request times as an added bonus. Hollow’s compression alleviated our concerns about our data being “too big” to fit in memory. Storing three years of unhydrated data requires only a 130 MB memory footprint, just 25% of its uncompressed size in an Iceberg table!

Writers and editors can preview changes in seconds instead of minutes, while still maintaining high-availability and in-memory caching for Tudum visitors — the best of both worlds.

But what about the faster request times? The diagram below illustrates the before & after timing to fulfill a request for Tudum’s home page. All of Tudum’s read-path services leverage Hollow in-memory state, significantly speeding up page construction and personalization algorithms. Controlling for factors like TLS, authentication, request logging, and WAF filtering, homepage construction time decreased from ~1.4 seconds to ~0.4 seconds!

Home page construction time

An attentive reader might notice that we have now tightly coupled our Page Construction Service with the Hollow In-Memory State. This tight coupling exists only in Tudum-specific applications. However, caution is needed if sharing the Hollow In-Memory Client with other engineering teams, as it could limit your ability to make schema changes or deprecations.

Key Learnings

  1. CQRS is a powerful design paradigm for scale, if you can tolerate some eventual consistency.
  2. Minimizing the number of sequential operations can significantly reduce response times. I/O is often the main enemy of performance.
  3. Caching is complicated. Cache invalidation is a hard problem. By holding an entire dataset in memory, you can eliminate an entire class of problems.

In the next episode, we’ll share how Tudum.com leverages Server Driven UI to rapidly build and deploy new experiences for Netflix fans. Stay tuned!

Credits

Thanks to Drew Koszewnik, Govind Venkatraman Krishnan, Nick Mooney, George Carlucci



Compaction Strategies, Performance, and Their Impact on Cassandra Node Density

This is the third post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. In the first post, I examined how streaming operations impact node density and laid out the groundwork for understanding why higher node density leads to significant cost savings. In the second post, I discussed how compaction throughput is critical to node density and introduced the optimizations we implemented in CASSANDRA-15452 to improve throughput on disaggregated storage like EBS.

The Developer’s Data Modeling Cheat Guide

In Cassandra 5.0, storage-attached indexes offer a new way to interact with your data, giving developers the flexibility to query multiple columns with filtering and range conditions, and with better performance.
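As a brief illustration of that capability (keyspace, table, and column names are made up, and the snippet assumes the Cassandra 5.0 SAI syntax together with the Java driver), indexes are declared per column and can then be combined in a single filtered query:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class SaiQueryExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // A storage-attached index is declared per column with USING 'sai'.
            session.execute("CREATE INDEX IF NOT EXISTS movies_genre_idx ON demo.movies (genre) USING 'sai'");
            session.execute("CREATE INDEX IF NOT EXISTS movies_year_idx ON demo.movies (year) USING 'sai'");

            // Indexed columns can then be combined with equality and range conditions in one query.
            ResultSet rs = session.execute(
                    "SELECT title FROM demo.movies WHERE genre = 'sci-fi' AND year >= 2020");
            for (Row row : rs) {
                System.out.println(row.getString("title"));
            }
        }
    }
}
```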

ScyllaDB’s Engineering Summit in Sofia, Bulgaria

From hacking to hiking: what happens when engineering sea monsters get together.

ScyllaDB is a remote-first company, with team members spread across the globe. We’re masters of virtual connection, but every year, we look forward to the chance to step away from our screens and come together for our Engineering Summit. It’s a time to reconnect, exchange ideas, and share spontaneous moments of joy that make working together so special. This year, we gathered in Sofia, Bulgaria — a city rich in history and culture, set against the stunning backdrop of the Balkan mountains.

Where Monsters Meet

This year’s summit brought together a record-breaking number of participants from across the globe — over 150! As ScyllaDB continues to grow, the turnout reflects our momentum and our team’s expanding global reach. We, ScyllaDB Monsters from all corners of the world, came together to share knowledge, build connections, and collaborate on shaping the future of our company and product. The summit brought together not just the engineering teams but also our Customer Experience (CX) colleagues. With their insights into the real-world experiences of our customers, the CX team helped us see the bigger picture and better understand how our work impacts those who use our product.

Looking Inward, Moving Forward

The summit was packed with really insightful talks, giving us a chance to reflect on where we are and where we’re heading next. It was all about looking back at the wins we’ve had so far, getting excited about the cool new features we’re working on now, and diving deep into what’s coming down the pipeline. Our CEO and co-founder, Dor Laor, kicked off the summit, and the sessions sparked fruitful discussions about how we can keep pushing forward and build on the strong foundation we’ve already laid. The speakers touched on a variety of topics, including: ScyllaDB X Cloud, consistent topology, data distribution with tablets, object storage, tombstone garbage collection, customer stories, improving customer experience, and many more.

Collaboration at Its Best: The Hackathon

This year, we took the summit energy to the next level with a hackathon that brought out the best in creativity, collaboration, and problem-solving. Participants were divided into small teams, each tackling a unique problem. The projects were chosen ahead of time so that we had a chance to work on real challenges that could make a tangible impact on our product and processes. The range of projects was diverse. Some teams focused on adding new features, like implementing a notification API to enhance the user experience. Others took on documentation-related challenges, improving the way we share knowledge. But across the board, every team managed to create a functioning solution or prototype. The hackathon brought people from different teams together to tackle complex issues, pushing everyone a bit outside their comfort zone. Beyond the technical achievements, it was a powerful team-building experience, reinforcing our culture of collaboration and shared purpose. It reminded us that solving real-life challenges — and doing it together — makes our work even more rewarding. The hackathon will undoubtedly be a highlight of future summits to come!

From Development to Dance

And then, of course, came the party. The atmosphere shifted from work to celebration with live music from a band playing all-time hits, followed by a DJ spinning tracks that kept everyone on their feet. Almost everyone hit the dance floor — even those who usually prefer to sit it out couldn’t resist the rhythm. It was the perfect way to unwind and celebrate the success of the summit!

Exploring Sofia and Beyond

During our time in Sofia, we had the chance to immerse ourselves in the city’s rich history and culture. Framed by the dramatic Balkan mountains, Sofia blends the old with the new, offering a mix of history, culture, and modern vibe. We wandered through the ancient ruins of the Roman Theater and visited the iconic Alexander Nevsky Cathedral, marveling at their beauty and historical significance. To recharge our batteries, we enjoyed delicious meals in modern Bulgarian restaurants. But the adventure didn’t stop in the city. We took a day trip to the Rila Mountains, where the breathtaking landscapes and serene atmosphere left us in awe. One of the standout sights was the Rila Monastery, a UNESCO World Heritage site known for its stunning architecture and spiritual significance. After soaking in the peaceful vibes of the monastery, we hiked the trail leading to the Stob Earth Pyramids, a natural wonder that looked almost otherworldly. The hike was rewarding, offering stunning views of the mountains and the unique rock formations below. It was the perfect way to experience Bulgaria’s natural beauty while winding down from the summit excitement.

Looking Ahead to the Future

As we wrapped up this year’s summit, we left feeling energized by the connections made, ideas shared, and challenges overcome. From brainstorming ideas to clinking glasses around the dinner table, this summit was a reminder of why in-person gatherings are so valuable — connecting not just as colleagues but as a team united by a common purpose. As ScyllaDB continues to expand, we’re excited for what lies ahead, and we can’t wait to meet again next year. Until then, we’ll carry the lessons, memories, and new friendships with us as we keep moving forward. Чао! We’re hiring – join our team!

How ScyllaDB Simulates Real-World Production Workloads with the Rust-Based “latte” Benchmarking Tool

Learn why we use a little-known benchmarking tool for testing Before using a tech product, it’s always nice to know its capabilities and limits. In the world of databases, there are a lot of different benchmarking tools that help us assess that… If you’re ok with some standard benchmarking scenarios, you’re set – one of the existing tools will probably serve you well. But what if not? Rigorously assessing ScyllaDB, a high-performance distributed database, requires testing some rather specific scenarios, ones with real-world production workloads. Fortunately, there is a tool to help with that. It is latte: a Rust-based lightweight benchmarking tool for Apache Cassandra and ScyllaDB.
Special thanks to Piotr Kołaczkowski for implementing the latte benchmarking tool.
We (the ScyllaDB testing team) forked it and enhanced it. In this blog post, I’ll share why and how we adopted it for our specialized testing needs.

About latte

Our team really values latte’s “flexibility.” Want to create a schema using a user defined type (UDT), Map, Set, List, or any other data type? Ok. Want to create a materialized view and query it? Ok. Want to change custom function behavior based on elapsed time? Ok. Want to run multiple custom functions in parallel? Ok. Want to use small, medium, and large partitions? Ok. Basically, latte lets us define any schema and workload functions. We can do this thanks to its implementation design. The latte tool is a type of engine/kernel, and rune scripts are essentially the “business logic” that’s written separately. Rune scripts are an enhanced, more powerful analog of what cassandra-stress calls user profiles. The rune scripting language is dynamically typed and native to the Rust programming language ecosystem.

In a simple rune script, we define 2 required functions (schema and prepare) and one custom function to be used as our workload, myinsert. First, we create the schema; then, we use the latte run command to call our custom myinsert function. The replication_factor parameter in that script is a custom parameter. If we do not specify it, then latte will use its default value, 3. We can define any number of custom parameters.

How is latte different from other benchmarking tools?

Based on our team’s experiences, here’s how latte compares to the 2 main competitors, cassandra-stress and ycsb.

How is our fork of latte different from the original latte project?

At ScyllaDB, our main use case for latte is testing complex and realistic customer scenarios with controlled disruptions. But (from what we understand), the project was designed to perform general latency measurements in healthy DB clusters. Given these different goals, we changed some features (“overlapping features”) and added other new ones (“unique to our fork”). Here’s an overview.

Overlapping feature differences:

  • Latency measurement. Fork-latte accounts for coordinated omission in latencies. The original project doesn’t consider the “coordinated omission” phenomenon.
  • Saturated DB impact. When a system under test cannot keep up with the load/stress, fork-latte tries to satisfy the “rate,” compensating for missed scheduler ticks ASAP. Source-latte pulls back on violating the rate requirement and doesn’t later compensate for missed scheduler ticks. This isn’t a “bug”; it is a design decision, which also violates the idea of proper latency calculation related to the “coordinated omission” phenomenon.
  • Retries. We enabled retries by default; there, it is disabled by default.
  • Prepared statements. Fork-latte supports all the CQL data types available in ScyllaDB Rust Driver. The source project has limited support of CQL data types.
  • ScyllaDB Rust Driver. Our fork uses the latest version, “1.2.0”. The source project sticks to the old version, “0.13.2”.
  • Stress execution reporting. The report is disabled by default in fork-latte. It’s enabled in source-latte.

Features unique to our fork:

  • Preferred datacenter support. Useful for testing multi-DC DB setups.
  • Preferred rack support. Useful for testing multi-rack DB setups.
  • Possibility to get a list of actual datacenter values from the DB nodes that the driver connected to. Useful for creating schemas with dc-based keyspaces.
  • Sine-wave rate limiting. Useful for SLA/Workload Prioritization demos and OLTP testing with peaks and lows.
  • Batch query type support.
  • Multi-row partitions. Our fork can create multi-row partitions of different sizes.
  • Page size support for select queries. Useful when using the multi-row partitions feature.
  • HDR histograms support. The source project has only one way to get the HDR histogram data: it stores the data in RAM till the end of a stress command execution and only releases it at the end as part of a report, which leaks RAM. Forked latte supports the above inherited approach and one more: real-time streaming of HDR histogram data without storing it in RAM. No RAM leaks.
  • Row count validation for select queries. Useful for testing data resurrection.

Example: Testing multi-row partitions of different sizes

Let’s look at one specific user scenario where we applied our fork of latte to test ScyllaDB. For background, one of the user’s ScyllaDB production clusters was using large partitions which could be grouped by size into 3 groups: 2000, 4000, and 8000 rows per partition. 95% of the partitions had 2000 rows, 4% of partitions had 4000 rows, and the last 1% of partitions had 8000 rows. The target table had 20+ different columns of different types. Also, ScyllaDB’s Secondary Indexes (SI) feature was enabled for the target table. One day, on one of the cloud providers, latencies spiked and throughput dropped. The source of the problem was not immediately clear. To learn more, we needed to have a quick way to reproduce the customer’s workload in a test environment.

Using the latte tool and its great flexibility, we created a rune script covering all the above specifics. Assume we have a ScyllaDB cluster where one of the nodes has a 172.17.0.2 IP address. We first run the command to create the schema we need, then the command to populate the just-created table. To read from the main table and from the MV, we use a similar command, just replacing the function name with get and get_from_mv respectively. The usage of these commands allowed us to get a stable issue reproducer and work on its solution.

Working with ScyllaDB’s Workload Prioritization feature

In other cases, we needed to create a Workload Prioritization (WLP) demo and test an OLTP setup with continuous peaks and lows to showcase giving priority to different workloads. For these scenarios, we used a special latte feature called sine wave rate. This is an extension to the common rate-limiting feature. It allows us to specify how many operations per second we want to produce, and the resulting peaks and lows show up clearly on the operations per second graph in the monitoring.

Internal testing of tombstones (validation)

As of June 2025, forked latte supports row count validation. It is useful for testing data resurrection. As before, we create the schema first. Then, we populate the table with 100k rows. To check that all rows are in place, we use a command similar to the one above, change the function to get, and define the validation strategy to be fail-fast. The supported validation strategies are retry, fail-fast, and ignore. Then, we run 2 different commands in parallel: the first one deletes part of the rows, and the second one knows when we expect 1 row and when we expect none. That’s a simple example of how we can check whether data got deleted or not. In long-running testing scenarios, we might run more parallel commands, make them depend on the elapsed time, and use many more other flexibilities.

Conclusions

Yes, to take advantage of latte, you first need to study a bit of rune scripting. But once you’ve done that to some extent, especially with examples available, it becomes a powerful tool that is capable of covering various scenarios of different types.

ScyllaDB Tablets: Answering Your Top Questions

What does your team need to know about tablets, at a purely pragmatic level? Here are answers to the top user questions.

The latest ScyllaDB releases feature some significant architectural shifts. Tablets build upon a multi-year project to re-architect our legacy ring architecture. And our metadata is now fully consistent, thanks to the assistance of Raft. Together, these changes can help teams with elasticity, speed, and operational simplicity. Avi Kivity, our CTO and co-founder, provided a detailed look at why and how we made this shift in a series of blogs (Why ScyllaDB Moved to “Tablets” Data Distribution and How We Implemented ScyllaDB’s “Tablets” Data Distribution). Join Avi for a technical deep dive at our upcoming livestream. We also recently created a quick demo to show you what this looks like in action, from the point of view of a database user/operator. But what does your team need to know – at a purely pragmatic level? Here are some of the questions we’ve heard from interested users, and a short summary of how we answer them.

What’s the TL;DR on tablets?

Tablets are the smallest replication unit in ScyllaDB. Data gets distributed by splitting tables into smaller logical pieces called tablets, and this allows ScyllaDB to shift from a static to a dynamic topology. Tablets are dynamically balanced across the cluster using the Raft consensus protocol. This was introduced as part of a project to bring more elasticity to ScyllaDB, enabling faster topology changes and seamless scaling. Tablets acknowledge that most workloads do not follow a static traffic pattern. In fact, most follow a cyclical curve with different baselines and peaks over a period of time. By decoupling topology changes from the actual streaming of data, tablets present significant cost-saving opportunities for users adopting ScyllaDB by allowing infrastructure to be scaled on demand, fast. Previously, adding or removing nodes required a sequential, one-at-a-time, serializable process with data streaming and rebalancing. Now, you can add or remove multiple nodes in parallel. This significantly speeds up the scaling process and makes ScyllaDB much more elastic. Tablets are distributed on a per-table basis, with each table having its own set of tablets. The tablets are then further distributed across the shards in the ScyllaDB cluster. The distribution is handled automatically by ScyllaDB, with tablets being dynamically migrated across replicas as needed. Data within a table is split across tablets based on the average geometric size of a token range boundary.

How do I configure tablets?

Tablets are enabled by default in ScyllaDB 2025.1 and are also available with ScyllaDB Cloud. When creating a new keyspace, you can specify whether to enable tablets or not. There are also three key configuration options for tablets: 1) the enable_tablets boolean setting, 2) the target_tablet_size_in_bytes setting (default is 5GB), and 3) the tablets property during a CREATE KEYSPACE statement. Here are a few tips for configuring these settings (a minimal CREATE KEYSPACE sketch appears at the end of this post):

  • enable_tablets indicates whether newly created keyspaces should rely on tablets for data distribution. Note that tablets are currently not yet enabled for workloads requiring the use of Counters Tables, Secondary Indexes, Materialized Views, CDC, or LWT.
  • target_tablet_size_in_bytes indicates the average geometric size of a tablet, and is particularly useful during tablet split and merge operations. The default indicates that splits are done when a tablet reaches 10GB and merges at 2.5GB. A higher value means tablet migration throughput can be reduced (due to larger tablets), whereas a lower value may significantly increase the number of tablets.
  • The tablets property allows you to opt for tablets on a per-keyspace basis via the ‘enabled’ boolean sub-option. This is particularly important if some of your workloads rely on the currently unsupported features mentioned earlier: you may opt out for these tables and fall back to the still-supported vNode-based replication strategy. Still under the tablets property, the ‘initial’ sub-option determines how many tablets are created upfront on a per-table basis. We recommend that you target 100 tablets/shard. In future releases, we’ll introduce per-table tablet options to extend and simplify this process while deprecating the keyspace sub-option.

How/why should I monitor tablet distribution?

Starting with ScyllaDB Monitoring 4.7, we introduced two additional panels for the observability and distribution of tablets within a ScyllaDB cluster. These metrics are present within the Detailed dashboard under the Tablets section.

The Tablets over time panel is a heatmap showing the tablet distribution over time. As the data size of a tablet-enabled table grows, you should observe the number of tablets increasing (tablet split) and being automatically distributed by ScyllaDB. Likewise, as the table size shrinks, the number of tablets should be reduced (tablet merge, but to no less than your initially configured ‘initial’ value within the keyspace tablets property). Similarly, as you perform topology changes (e.g., adding nodes), you can monitor the tablet distribution progress. You’ll notice that existing replicas will have their tablet count reduced while new replicas will increase.

The Tablets per DC/Instance/Shard panel shows the absolute count of tablets within the cluster. Under homogeneous setups running on top of instances of the same size, this metric should be evenly balanced. However, the situation changes for heterogeneous setups with different shard counts. In this situation, it is expected that larger instances will hold more tablets given their additional processing power. This is, in fact, yet another benefit of tablets: the ability to run heterogeneous setups and leave it up to the database to determine how to internally maximize each instance’s performance capabilities.

What are the impacts of tablets on maintenance tasks like node cleanup?

The primary benefit of tablets is elasticity. Tablets allow you to easily and quickly scale your database infrastructure out and in without hassle. This not only translates to infrastructure savings (like avoiding being overprovisioned for the peak all the time); it also allows you to reach a higher percentage of storage utilization before rushing to add more nodes – so you can better utilize the underlying infrastructure you pay for. Another key benefit of tablets is that they eliminate the need for maintenance tasks like node cleanup. Previously, after scaling out the cluster, operators would need to run node cleanup to ensure data was properly evicted from nodes that no longer owned certain token ranges. With tablets, this is no longer necessary. The compaction process automatically handles the migration of data as tablets are dynamically balanced across the cluster. This is a significant operational improvement that reduces the maintenance burden for ScyllaDB users. The ability to run heterogeneous deployments without going through cryptic and hours-long tuning cycles is also a plus. ScyllaDB’s tablet load balancer is smart enough to figure out how to distribute and place your data. It considers the amount of compute resources available, reducing the risk of traffic hotspots or data imbalances that may affect your clusters’ performance. In the future, ScyllaDB will bring transparent repairs on a per-tablet basis, further eliminating the need for users to worry about repairing their clusters, and also provide “temperature-based balancing” so that hot partitions get split and other shards cooperate with the incoming load.

Do I need to change drivers?

ScyllaDB’s latest drivers are tablet-aware, meaning they understand the tablets concept and can route queries to the correct nodes and shards. However, the drivers do not directly query the internal system.tablets table; that could become unwieldy as the number of tablets grows. Furthermore, tablets are transient, meaning a replica owning a tablet may no longer be a natural endpoint for it as time goes by. Instead, the drivers use a dynamic routing process: when a query is sent to the wrong node/shard, the coordinator will respond with the correct routing information, allowing the driver to update its routing cache. This ensures efficient query routing as tablets are migrated across the cluster. When using ScyllaDB tablets, it’s more important than ever to use ScyllaDB shard-aware – and now also tablet-aware – drivers instead of Cassandra drivers. The existing drivers will still work, but they won’t work as efficiently because they lack the necessary logic to understand the coordinator-provided tablet metadata. Using the latest ScyllaDB drivers should provide a nice throughput and latency boost. Read more in How We Updated ScyllaDB Drivers for Tablets Elasticity.

More questions?

If you’re interested in tablets and we didn’t answer your question here, please reach out to us! Our Contact Us page offers a number of ways to interact, including a community forum and Slack.
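To make the keyspace-level configuration discussed above concrete, here is a minimal sketch using the Java driver. The keyspace names and replication settings are placeholders, and the 'enabled'/'initial' sub-options simply follow the description above, so check the ScyllaDB documentation for the exact syntax supported by your version.

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class TabletsKeyspaceExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Opt in to tablets for this keyspace and pre-create an initial number of tablets.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS app_data "
                + "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3} "
                + "AND tablets = {'enabled': true, 'initial': 128}");

            // A keyspace whose workloads still need currently unsupported features (LWT, CDC, etc.)
            // can opt out and fall back to vNode-based replication.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS legacy_data "
                + "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3} "
                + "AND tablets = {'enabled': false}");
        }
    }
}
```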

Introducing ScyllaDB X Cloud: A (Mostly) Technical Overview

ScyllaDB X Cloud just landed! It’s a truly elastic database that supports variable/unpredictable workloads with consistent low latency, plus low costs.

The ScyllaDB team is excited to announce ScyllaDB X Cloud, the next generation of our fully-managed database-as-a-service. It features architectural enhancements for greater flexibility and lower cost. ScyllaDB X Cloud is a truly elastic database designed to support variable/unpredictable workloads with consistent low latency as well as low costs. A few spoilers before we get into the details:

  • You can now scale out and scale in almost instantly to match actual usage, hour by hour. For example, you can scale all the way from 100K OPS to 2M OPS in just minutes, with consistent single-digit millisecond P99 latency. This means you don’t need to overprovision for the worst-case scenario or suffer latency hits while waiting for autoscaling to fully kick in.
  • You can now safely run at 90% storage utilization, compared to the standard 70% utilization. This means you need fewer underlying servers and have substantially less infrastructure to pay for.
  • Optimizations like file-based streaming and dictionary-based compression also speed up scaling and reduce network costs.
  • Beyond the technical changes, there’s also an important pricing update. To go along with all this database flexibility, we’re now offering a “Flex Credit” pricing model. Basically, this gives you the flexibility of on-demand pricing with the cost advantage that comes from an annual commitment.

Access ScyllaDB X Cloud Now

If you want to get started right away, just go to ScyllaDB Cloud and choose the X Cloud cluster type when you create a cluster. This is our code name for the new type of cluster that enables greater elasticity, higher storage utilization, and automatic scaling. Note that X Cloud clusters are available from the ScyllaDB Cloud application and API. They’re available on AWS and GCP, running on a ScyllaDB account or your company’s account with the Bring Your Own Account (BYOA) model. Sneak peek: in the next release, you won’t need to choose instance size or number of servers if you select the X Cloud option. Instead, you will be able to define a serverless scaling policy and let X Cloud scale the cluster as required. If you want to learn more, keep reading. In this blog post, we’ll cover what’s behind the technical changes and also talk a little about the new pricing option. But first, let’s start with the why.

Backstory

Why did we do this? Consider this example from a marketing/AdTech platform that provides event-based targeting. Such a pattern, with predictable/cyclical daily peaks and low baseline off-hours, is quite common across retail platforms, food delivery services, and other applications aligned with customer work hours. In this case, the peak loads are 3x the base and require 2-3x the resources. With ScyllaDB X Cloud, they can provision for the baseline and quickly scale in/out as needed to serve the peaks. They get the steady low latency they need without having to overprovision – paying for peak capacity 24/7 when it’s really only needed for 4 hours a day.

Tablets + just-in-time autoscaling

If you follow ScyllaDB, you know that tablets aren’t new. We introduced them last year for ScyllaDB Enterprise (self-managed on the cloud or on-prem). Avi Kivity, our CTO, already provided a look at why and how we implemented tablets, and you can see tablets in action in our demo. With tablets, data gets distributed by splitting tables into smaller logical pieces (“tablets”), which are dynamically balanced across the cluster using the Raft consensus protocol. This enables you to scale your databases as rapidly as you can scale your infrastructure. In a self-managed ScyllaDB deployment, tablets make it much faster and simpler to expand and reduce your database capacity. However, you still need to plan ahead for expansion and initiate the operations yourself. ScyllaDB X Cloud lets you take full advantage of tablets’ elasticity. Scaling can be triggered automatically based on storage capacity (more on this below) or based on your knowledge of expected usage patterns. Moreover, as capacity expands and contracts, we’ll automatically optimize both node count and utilization. You don’t even have to choose node size; ScyllaDB X Cloud’s storage-utilization target does that for you. This should simplify admin and also save costs.

90% storage utilization

ScyllaDB has always handled running at 100% compute utilization well by having automated internal schedulers manage compactions, repairs, and lower-priority tasks in a way that prioritizes performance. Now, it also does two things that let you increase the maximum storage utilization to 90%:

  • Since tablets can move data to new nodes so much faster, ScyllaDB X Cloud can defer scaling until the very last minute.
  • Support for mixed instance sizes allows ScyllaDB X Cloud to allocate minimal additional resources to keep the usage close to 90%.

Previously, we recommended adding nodes at 70% capacity. This was because node additions were unpredictable and slow — sometimes taking hours or days — and you risked running out of space. We’d send a soft alert at 50% and automatically add nodes at 70%. However, those big nodes often sat underutilized. With ScyllaDB X Cloud’s tablets architecture, we can safely target 90% utilization. That’s particularly helpful for teams with storage-bound workloads.

Support for mixed size clusters

A little more on the “mixed instance size” support mentioned earlier. Basically, this means that ScyllaDB X Cloud can now add the exact mix of nodes you need to meet the exact capacity you need at any given time. Previous versions of ScyllaDB used a single instance size across all nodes in the cluster. For example, if you had a cluster with 3 i4i.16xlarge instances, increasing the capacity meant adding another i4i.16xlarge. That works, but it’s wasteful: you’re paying for a big node that you might not immediately need. Now with ScyllaDB X Cloud (thanks to tablets and support for mixed-instance sizes), we can scale in much smaller increments. You can add tiny instances first, then replace them with larger ones if needed. That means you rarely pay for unused capacity. For example, before, if you started with an i4i.16xlarge node that had 15 TB of storage and you hit 70% utilization, you had to launch another i4i.16xlarge — adding 15 TB at once. With ScyllaDB X Cloud, you might add two xlarge nodes (2 TB each) first. Then, if you need more storage, you add more small nodes, then eventually replace them with larger nodes. And by the way, i7i instances are now available too, and they are even more powerful. The key is granular, just-in-time scaling: you only add what you need, when you need it. This applies in reverse, too. Before, you had to decommission a large node all at once. Now, ScyllaDB X Cloud can remove smaller nodes gradually based on the policies you set, saving compute and storage costs.

Network-focused engineering optimizations

Every gigabyte leaving a node, crossing an Availability Zone (AZ) boundary, or replicating to another region shows up on your AWS, GCP, or Azure bill. That’s why we’ve done some engineering work at different layers of ScyllaDB to shrink those bytes — and the dollars tied to them.

File-based streaming

We anticipated that mutation-based streaming would hold us back once we moved to tablets. So we shifted to a new approach: stream the entire SSTable files without deserializing them into mutation fragments and re-serializing them back into SSTables on receiving nodes. As a result, less data is streamed over the network and less CPU is consumed, especially for data models that contain small cells. Think of it as Cassandra’s zero-copy streaming, except that we keep ownership metadata with each replica. You can read more about this (and see the measured results) in the blog Why We Changed ScyllaDB’s Data Streaming Approach.

Dictionary-based compression

We also introduced dictionary-trained Zstandard (Zstd), which is pipeline-aware. This involved building a custom RPC compressor with external dictionary support, and a mechanism that trains new dictionaries on RPC traffic, distributes them over the cluster, and performs a live switch of connections to the new dictionaries. This is done in 4 key steps:

  1. Sample: continuously sample RPC traffic for some time.
  2. Train: train a 100 kiB dictionary on a 16 MiB sample.
  3. Distribute: distribute the new dictionary via a system distributed table.
  4. Switch: negotiate the switch separately within each connection.

In our measurements, LZ4 (Cassandra’s default) leaves you at 72% of the original size. Generic Zstd cuts that to 50%. Our per-cluster Zstd dictionary takes it down to 30%, which is a 3X improvement over the default Cassandra compression.

Flex Credit

To close, let’s shift from the technical changes to a major pricing change: Flex Credit. Flex Credit is a new way to consume a ScyllaDB Cloud subscription. It can be applied to ScyllaDB Cloud as well as ScyllaDB Enterprise. Flex Credit provides the flexibility of on-demand pricing at a lower cost via an annual commitment. In combination with X Cloud, Flex Credit can be a great tool to reduce cost. You can use Reserved pricing for a load that’s known in advance and use Flex for less predictable bursts. This saves you from paying the higher on-demand pricing for anything above the reserved part. How might this play out in your day-to-day work? Imagine your baseline workload handles 100K OPS, but sometimes it spikes to 400K OPS. Previously, you’d have to provision (and pay for) enough capacity to sustain 400K OPS at all times. That’s inefficient and costly. With ScyllaDB X Cloud, you reserve 100K OPS upfront. When a spike hits, we automatically call the API to spin up “flex capacity” – instantly scaling you to 400K OPS – and then tear it down when traffic subsides. You only pay for the extra capacity during the peak. Not sure what to choose? We can help advise based on your workload specifics (contact your representative or ping us here), but here’s some quick guidance in the meantime:

  • Reserved Capacity: The most cost-effective option across all plans. Commit to a set number of cluster nodes or machines for a year. You lock in lower rates and guarantee capacity availability. This is ideal if your cluster size is relatively stable.
  • Hybrid Model: Reserved + On-Demand: Commit to a baseline reserved capacity to lock in lower rates, but if you exceed that baseline (e.g., because you have a traffic spike), you can scale with on-demand capacity at an hourly rate. This is good if your usage is mostly stable but occasionally spikes.
  • Hybrid Model: Reserved + Flex Credit: Commit to baseline reserved capacity for the lowest rates. For peak usage, use pre-purchased flex credit (which is discounted) instead of paying on-demand prices. Flex credit also applies to network and backup usage at standard provider rates. This is ideal if you have predictable peak periods (e.g., seasonal spikes, event-driven workload surges, etc.). You get the best of both worlds: low baseline costs and cost-efficient peak capacity.

Recap

In summary, ScyllaDB X Cloud uses tablets to enable faster, more granular scaling with mixed-instance sizes. This lets you avoid overprovisioning and safely run at 90% storage utilization. All of this will help you respond to volatile/unpredictable demand with low latencies and low costs. Moreover, flexible pricing (on-demand, flex credit, reserved) will help you pay only for what you need, especially when you have tablets scaling your capacity up and down in response to traffic spikes. There are also some network cost optimizations through file-based streaming and improved compression. Want to learn more? Our Co-Founder/CTO Avi Kivity will be discussing the design decisions behind ScyllaDB X Cloud’s elasticity and efficiency. Join us for the engineering deep dive on July 10: ScyllaDB X Cloud: An Inside Look with Avi Kivity.

A New Way to Estimate DynamoDB Costs

We built a new DynamoDB cost analyzer that helps developers understand what their workloads will really cost.

DynamoDB costs can blindside you. Teams regularly face “bill shock”: that sinking feeling when you look at a shockingly high bill and realize that you haven’t paid enough attention to your usage, especially with on-demand pricing. Provisioned capacity brings a different risk: performance. If you can’t accurately predict capacity or your math is off, requests get throttled. It’s a delicate balancing act. Although AWS offers a DynamoDB pricing calculator, it often misses the nuances of real-world workloads (e.g., bursty traffic, uneven access patterns, or the use of global tables or caching). We wanted something better. In full transparency, we wanted something better to help the teams considering ScyllaDB as a DynamoDB alternative. So we built a new DynamoDB cost calculator that helps developers understand what their workloads will really cost. Although we designed it for teams comparing DynamoDB with ScyllaDB, we believe it’s useful for anyone looking to more accurately estimate their DynamoDB costs, for any reason. You can see the live version at calculator.scylladb.com.

How We Built It

We wanted to build something that would work client side, without the need for any server components. It’s a simple JavaScript single-page application that we currently host on GitHub Pages. If you want to check out the source code, feel free to take a look at https://github.com/scylladb/calculator

To be honest, working with the examples at https://calculator.aws/ was a bit of a nightmare; when you “show calculations,” you get walls of text. I was tempted to take a shorter approach, like:

Monthly WCU Cost = WCUs × Price_per_WCU_per_hour × 730 hours/month

But every time I simplified this, I found it harder to get parity between what I calculated and the final price in AWS’s calculation. Sometimes the difference was due to rounding, other times it was due to the mixture of reserved + provisioned capacity, and so on. So to make it easier (for me) to debug, I faithfully followed their calculations line by line and tried to replicate this in my own rather ugly function: https://github.com/scylladb/calculator/blob/main/src/calculator.js

I may still refactor this into smaller functions. But for now, I wanted to get parity between theirs and ours. You’ll see that there are also some end-to-end tests for these calculations — I use those to test for a bunch of different configurations. I will probably expand on these in time as well. So that gets the job done for On Demand, Provisioned (and Reserved) capacity models. If you’ve used AWS’s calculator, you know that you can’t specify things like a peak (or peak width) in On Demand. I’m not sure about their reasoning. I decided it would be easier for users to specify both the baseline and peak for reads and writes (respectively) in On Demand, much like Provisioned capacity. Another design decision was to represent the traffic using a chart. I do better with visuals, so seeing the peaks and troughs makes it easier for me to understand – and I hope it does for you as well. You’ll also notice that as you change the inputs, the URL query parameters change to reflect those inputs. That’s designed to make it easier to share and reference specific variations of costs. There’s some other math in there, like figuring out the true cost of Global Tables and understanding derived costs of things like network transfer or DynamoDB Accelerator (DAX). However, explaining all that is a bit too dense for this format. We’ll talk more about that in an upcoming webinar (see the next section). The good news is that you can estimate these costs in addition to your workload, as they can be big cost multipliers when planning out your usage of DynamoDB.

Analyzing Costs in Real-World Scenarios

The ultimate goal of all this tinkering and tuning is to help you explore various “what-if” scenarios for your own workloads from a DynamoDB cost perspective. To get started, we’re sharing the cost impacts of some of the more interesting DynamoDB user scenarios we’ve come across at ScyllaDB. My colleague Gui and I just got together for a deep dive into how factors like traffic surges, multi-datacenter expansion, and the introduction of caching (e.g., DAX) impact DynamoDB costs. We explored how a few (anonymized) teams we work with ended up blindsided by their DynamoDB bills and the various options they considered for getting costs back under control. Watch the DynamoDB costs chat now.
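For a rough sense of the simplified formula shown above (which, as noted, does not reach exact parity with AWS's line-by-line math), here is a small back-of-the-envelope sketch; the unit prices are placeholders to be replaced with current regional pricing.

```java
public class DynamoDbRoughEstimate {

    // Placeholder unit prices; substitute current pricing for your region.
    static final double PRICE_PER_WCU_PER_HOUR = 0.00065;
    static final double PRICE_PER_RCU_PER_HOUR = 0.00013;
    static final double HOURS_PER_MONTH = 730;

    /** Rough monthly provisioned-capacity cost: units x price per unit-hour x 730 hours. */
    static double monthlyProvisionedCost(double wcus, double rcus) {
        return wcus * PRICE_PER_WCU_PER_HOUR * HOURS_PER_MONTH
                + rcus * PRICE_PER_RCU_PER_HOUR * HOURS_PER_MONTH;
    }

    public static void main(String[] args) {
        // Example: a steady baseline of 2,000 WCUs and 5,000 RCUs.
        System.out.printf("~$%.2f/month%n", monthlyProvisionedCost(2000, 5000));
    }
}
```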