Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow
By Eugene Yemelyanau, Jake Grice

Introduction
Tudum.com is Netflix’s official fan destination, enabling fans to dive deeper into their favorite Netflix shows and movies. Tudum offers exclusive first-looks, behind-the-scenes content, talent interviews, live events, guides, and interactive experiences. “Tudum” is named after the sonic ID you hear when pressing play on a Netflix show or movie. Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix.
Initial architecture
At the end of 2021, when we envisioned Tudum’s implementation, we considered architectural patterns that would be maintainable, extensible, and well-understood by engineers. With the goal of building a flexible, configuration-driven system, we looked to server-driven UI (SDUI) as an appealing solution. SDUI is a design approach where the server dictates the structure and content of the UI, allowing for dynamic updates and customization without requiring changes to the client application. Client applications, such as web, mobile, and TV devices, act as rendering engines for SDUI data. After our teams weighed and vetted all the details, the dust settled and we landed on an approach similar to Command Query Responsibility Segregation (CQRS). At Tudum, we have two main use cases that CQRS is perfectly capable of solving:
- Tudum’s editorial team brings exclusive interviews, first-look photos, behind-the-scenes videos, and many more forms of fan-forward content, and compiles it all into pages on the Tudum.com website. This content comes onto Tudum in the form of individually published pages and content elements within the pages. In support of this, Tudum’s architecture includes a write path to store all of this data, including internal comments, revisions, version history, asset metadata, and scheduling settings.
- Tudum visitors consume published pages. In this case, Tudum needs to serve personalized experiences for our beloved fans, and accesses only the latest version of our content.

The high-level diagram above focuses on storage & distribution, illustrating how we leveraged Kafka to separate the write and read databases. The write database would store internal page content and metadata from our CMS. The read database would store read-optimized page content, for example: CDN image URLs rather than internal asset IDs, and movie titles, synopses, and actor names instead of placeholders. This content ingestion pipeline allowed us to regenerate all consumer-facing content on demand, applying new structure and data, such as global navigation or branding changes. The Tudum Ingestion Service converted internal CMS data into a read-optimized format by applying page templates, running validations, performing data transformations, and producing the individual content elements into a Kafka topic. The Data Service Consumer received the content elements from Kafka, stored them in a high-availability database (Cassandra), and acted as an API layer for the Page Construction service and other internal Tudum services to retrieve content.
A key advantage of decoupling read and write paths is the ability to scale them independently. Connecting the write and read databases using an event-driven architecture is a well-known approach. As a result, content edits would eventually appear on tudum.com.
Challenges with eventual consistency
Did you notice the emphasis on “eventually”? A major downside of this architecture was the delay between making an edit and observing that edit reflected on the website. For instance, when the team publishes an update, the following steps must occur:
- Call the REST endpoint on the 3rd party CMS to save the data.
- Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
- Wait for the Tudum Ingestion layer to query all necessary sections via API, validate data and assets, process the page, and produce the modified content to Kafka.
- Wait for the Data Service Consumer to consume this message from Kafka and store it in the database.
- Finally, after some cache refresh delay, this data would eventually become available to the Page Construction service. Great!
By introducing a highly scalable, eventually consistent architecture, we lost the ability to quickly render changes after writing them — an important capability for internal previews.
In our performance profiling, we found the source of delay was our Page Data Service which acted as a facade for an underlying Key Value Data Abstraction database. Page Data Service utilized a near cache to accelerate page building and reduce read latencies from the database.
This cache was implemented to optimize the N+1 key lookups necessary for page construction by having a complete data set in memory. When engineers hear “slow reads,” the immediate answer is often “cache,” which is exactly what our team adopted. The KVDAL near cache can refresh in the background on every app node. Regardless of which system modifies the data, the cache is updated with each refresh cycle. If you have 60 keys and a refresh interval of 60 seconds, the near cache will update one key per second. This was problematic for previewing recent modifications, as these changes were only reflected with each cache refresh. As Tudum’s content grew, cache refresh times increased, further extending the delay.
RAW Hollow
As this pain point grew, a new technology was being developed that would act as our silver bullet. RAW Hollow is an innovative in-memory, co-located, compressed object database developed by Netflix, designed to handle small to medium datasets with support for strong read-after-write consistency. It addresses the challenges of achieving consistent performance with low latency and high availability in applications that deal with less frequently changing datasets. Unlike traditional SQL databases or fully in-memory solutions, RAW Hollow offers a unique approach where the entire dataset is distributed across the application cluster and resides in the memory of each application process.
This design leverages compression techniques to scale datasets up to 100 million records per entity, ensuring extremely low latencies and high availability. RAW Hollow provides eventual consistency by default, with the option for strong consistency at the individual request level, allowing users to balance between high availability and data consistency. It simplifies the development of highly available and scalable stateful applications by eliminating the complexities of cache synchronization and external dependencies. This makes RAW Hollow a robust solution for efficiently managing datasets in environments like Netflix’s streaming services, where high performance and reliability are paramount.
Revised architecture
Tudum was a perfect fit to battle-test RAW Hollow while it was pre-GA internally. Hollow’s high-density near cache significantly reduces I/O. Having our primary dataset in memory enables Tudum’s various microservices (page construction, search, personalization) to access data synchronously in O(1) time, simplifying architecture, reducing code complexity, and increasing fault tolerance.

In our simplified architecture, we eliminated the Page Data Service, Key Value store, and Kafka infrastructure, in favor of RAW Hollow. By embedding the in-memory client directly into our read-path services, we avoid per-request I/O and reduce roundtrip time.
Migration results
The updated architecture yielded a monumental reduction in data propagation times, and the reduced I/O led to faster request times as an added bonus. Hollow’s compression alleviated our concerns about our data being “too big” to fit in memory. Storing three years of unhydrated data requires only a 130 MB memory footprint — 25% of its uncompressed size in an Iceberg table!
Writers and editors can preview changes in seconds instead of minutes, while still maintaining high-availability and in-memory caching for Tudum visitors — the best of both worlds.
But what about the faster request times? The diagram below illustrates the before & after timing to fulfill a request for Tudum’s home page. All of Tudum’s read-path services leverage Hollow in-memory state, leading to significantly faster page construction and personalization. Controlling for factors like TLS, authentication, request logging, and WAF filtering, homepage construction time decreased from ~1.4 seconds to ~0.4 seconds!

An attentive reader might notice that we have now tightly coupled our Page Construction Service with the Hollow In-Memory State. This tight coupling is limited to Tudum-specific applications. However, caution is needed when sharing the Hollow In-Memory Client with other engineering teams, as it could limit your ability to make schema changes or deprecations.
Key Learnings
- CQRS is a powerful design paradigm for scale, if you can tolerate some eventual consistency.
- Minimizing the number of sequential operations can significantly reduce response times. I/O is often the main enemy of performance.
- Caching is complicated. Cache invalidation is a hard problem. By holding an entire dataset in memory, you can eliminate an entire class of problems.
In the next episode, we’ll share how Tudum.com leverages Server Driven UI to rapidly build and deploy new experiences for Netflix fans. Stay tuned!
Credits
Thanks to Drew Koszewnik, Govind Venkatraman Krishnan, Nick Mooney, George Carlucci
Compaction Strategies, Performance, and Their Impact on Cassandra Node Density
This is the third post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. In the first post, I examined how streaming operations impact node density and laid out the groundwork for understanding why higher node density leads to significant cost savings. In the second post, I discussed how compaction throughput is critical to node density and introduced the optimizations we implemented in CASSANDRA-15452 to improve throughput on disaggregated storage like EBS.
The Developer’s Data Modeling Cheat Guide
In Cassandra 5.0, storage-attached indexes offer a new way to interact with your data, giving developers the flexibility to query multiple columns with filtering, range conditions, and better performance.
ScyllaDB’s Engineering Summit in Sofia, Bulgaria
From hacking to hiking: what happens when engineering sea monsters get together
ScyllaDB is a remote-first company, with team members spread across the globe. We’re masters of virtual connection, but every year, we look forward to the chance to step away from our screens and come together for our Engineering Summit. It’s a time to reconnect, exchange ideas, and share spontaneous moments of joy that make working together so special. This year, we gathered in Sofia, Bulgaria — a city rich in history and culture, set against the stunning backdrop of the Balkan mountains.
Where Monsters Meet
This year’s summit brought together a record-breaking number of participants from across the globe—over 150! As ScyllaDB continues to grow, the turnout reflects our momentum and our team’s expanding global reach. We, ScyllaDB Monsters from all corners of the world, came together to share knowledge, build connections, and collaborate on shaping the future of our company and product.
[Photo: A team-building activity]
[Photo: An elevator ride to one of the sessions]
The summit brought together not just the engineering teams but also our Customer Experience (CX) colleagues. With their insights into the real-world experiences of our customers, the CX team helped us see the bigger picture and better understand how our work impacts those who use our product.
[Photo: The CX team]
Looking Inward, Moving Forward
The summit was packed with really insightful talks, giving us a chance to reflect on where we are and where we’re heading next. It was all about looking back at the wins we’ve had so far, getting excited about the cool new features we’re working on now, and diving deep into what’s coming down the pipeline.
[Photo: CEO and Co-founder, Dor Laor, kicking off the summit]
The sessions sparked fruitful discussions about how we can keep pushing forward and build on the strong foundation we’ve already laid. The speakers touched on a variety of topics, including:
- ScyllaDB X Cloud
- Consistent topology
- Data distribution with tablets
- Object storage
- Tombstone garbage collection
- Customer stories
- Improving customer experience
- And many more
[Photo: Notes and focus doodles]
Collaboration at Its Best: The Hackathon
This year, we took the summit energy to the next level with a hackathon that brought out the best in creativity, collaboration, and problem-solving. Participants were divided into small teams, each tackling a unique problem. The projects were chosen ahead of time so that we had a chance to work on real challenges that could make a tangible impact on our product and processes. The range of projects was diverse. Some teams focused on adding new features, like implementing a notification API to enhance the user experience. Others took on documentation-related challenges, improving the way we share knowledge. But across the board, every team managed to create a functioning solution or prototype.
[Photo: At the hackathon]
The hackathon brought people from different teams together to tackle complex issues, pushing everyone a bit outside their comfort zone. Beyond the technical achievements, it was a powerful team-building experience, reinforcing our culture of collaboration and shared purpose. It reminded us that solving real-life challenges—and doing it together—makes our work even more rewarding. The hackathon will undoubtedly be a highlight of future summits!
From Development to Dance
And then, of course, came the party. The atmosphere shifted from work to celebration with live music from a band playing all-time hits, followed by a DJ spinning tracks that kept everyone on their feet.
[Photo: Live music at the party]
Almost everyone hit the dance floor—even those who usually prefer to sit it out couldn’t resist the rhythm. It was the perfect way to unwind and celebrate the success of the summit!
[Photo: Sea monsters swaying]
Exploring Sofia and Beyond
During our time in Sofia, we had the chance to immerse ourselves in the city’s rich history and culture. Framed by the dramatic Balkan mountains, Sofia blends the old with the new, offering a mix of history, culture, and modern vibe. We wandered through the ancient ruins of the Roman Theater and visited the iconic Alexander Nevsky Cathedral, marveling at their beauty and historical significance. To recharge our batteries, we enjoyed delicious meals in modern Bulgarian restaurants.
[Photo: In front of Alexander Nevsky Cathedral]
But the adventure didn’t stop in the city. We took a day trip to the Rila Mountains, where the breathtaking landscapes and serene atmosphere left us in awe. One of the standout sights was the Rila Monastery, a UNESCO World Heritage site known for its stunning architecture and spiritual significance.
[Photo: The Rila Monastery]
After soaking in the peaceful vibes of the monastery, we hiked the trail leading to the Stob Earth Pyramids, a natural wonder that looked almost otherworldly.
[Photo: The Stob Pyramids]
The hike was rewarding, offering stunning views of the mountains and the unique rock formations below. It was the perfect way to experience Bulgaria’s natural beauty while winding down from the summit excitement.
[Photo: Happy hiking]
Looking Ahead to the Future
As we wrapped up this year’s summit, we left feeling energized by the connections made, ideas shared, and challenges overcome. From brainstorming ideas to clinking glasses around the dinner table, this summit was a reminder of why in-person gatherings are so valuable—connecting not just as colleagues but as a team united by a common purpose. As ScyllaDB continues to expand, we’re excited for what lies ahead, and we can’t wait to meet again next year. Until then, we’ll carry the lessons, memories, and new friendships with us as we keep moving forward. Чао! (Bye!) We’re hiring – join our team!
[Photo: Our team]
How ScyllaDB Simulates Real-World Production Workloads with the Rust-Based “latte” Benchmarking Tool
Learn why we use a little-known benchmarking tool for testing
Before using a tech product, it’s always nice to know its capabilities and limits. In the world of databases, there are a lot of different benchmarking tools that help us assess that… If you’re ok with some standard benchmarking scenarios, you’re set – one of the existing tools will probably serve you well. But what if not? Rigorously assessing ScyllaDB, a high-performance distributed database, requires testing some rather specific scenarios, ones with real-world production workloads. Fortunately, there is a tool to help with that. It is latte: a Rust-based lightweight benchmarking tool for Apache Cassandra and ScyllaDB. Special thanks to Piotr Kołaczkowski for implementing the latte benchmarking tool. We (the ScyllaDB testing team) forked it and enhanced it. In this blog post, I’ll share why and how we adopted it for our specialized testing needs.
About latte
Our team really values latte’s “flexibility.” Want to create a schema using a user defined type (UDT), Map, Set, List, or any other data type? Ok. Want to create a materialized view and query it? Ok. Want to change custom function behavior based on elapsed time? Ok. Want to run multiple custom functions in parallel? Ok. Want to use small, medium, and large partitions? Ok. Basically, latte lets us define any schema and workload functions. We can do this thanks to its implementation design. The latte tool is a type of engine/kernel, and rune scripts are essentially the “business logic” that’s written separately. Rune scripts are an enhanced, more powerful analog of what cassandra-stress calls user profiles. The rune scripting language is dynamically typed and native to the Rust programming language ecosystem. Here’s a simple example of a rune script:
In the above example of a rune script, we defined two required functions (schema and prepare) and one custom function to be used as our workload, myinsert. First, we create the schema, and then we use the latte run command to call our custom myinsert function:
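(A rough sketch of those invocations; the schema and run subcommands come from latte’s CLI, but the function-selection and parameter flags shown here, -f and -P, are assumptions that may differ between latte versions.)

```shell
# Create the keyspace and table defined in schema().
latte schema myworkload.rn <node_address>

# Run the custom workload function, overriding the replication_factor parameter.
latte run myworkload.rn -f myinsert -P replication_factor=5 <node_address>
```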
The replication_factor parameter above is a custom parameter. If we do not specify it, latte will use its default value of 3. We can define any number of custom parameters.
How is latte different from other benchmarking tools?
Based on our team’s experiences, we compared latte against the two main competitors, cassandra-stress and ycsb.
How is our fork of latte different from the original latte project?
At ScyllaDB, our main use case for latte is testing complex and realistic customer scenarios with controlled disruptions. But (from what we understand), the project was designed to perform general latency measurements in healthy DB clusters. Given these different goals, we changed some features (“overlapping features”) and added other new ones (“unique to our fork”). Here’s an overview.
Overlapping feature differences
- Latency measurement. Fork-latte accounts for coordinated omission in latencies; the original project doesn’t consider the “coordinated omission” phenomenon.
- Saturated DB impact. When a system under test cannot keep up with the load/stress, fork-latte tries to satisfy the “rate”, compensating for missed scheduler ticks as soon as possible. Source-latte pulls back on violating the rate requirement and doesn’t later compensate for missed scheduler ticks. This isn’t a “bug”; it is a design decision, but one that also conflicts with proper latency calculation in the presence of coordinated omission.
- Retries. We enabled retries by default; in the original project they are disabled by default.
- Prepared statements. Fork-latte supports all the CQL data types available in the ScyllaDB Rust Driver; the source project has limited support for CQL data types.
- ScyllaDB Rust Driver. Our fork uses the latest version, 1.2.0; the source project sticks to the old version, 0.13.2.
- Stress execution reporting. The report is disabled by default in fork-latte and enabled by default in source-latte.
Features unique to our fork
- Preferred datacenter support. Useful for testing multi-DC DB setups.
- Preferred rack support. Useful for testing multi-rack DB setups.
- The ability to get a list of actual datacenter values from the DB nodes the driver connected to. Useful for creating schemas with DC-based keyspaces.
- Sine-wave rate limiting. Useful for SLA/Workload Prioritization demos and OLTP testing with peaks and lows.
- Batch query type support.
- Multi-row partitions. Our fork can create multi-row partitions of different sizes.
- Page size support for select queries. Useful when using the multi-row partitions feature.
- HDR histograms support. The source project has only one way to get the HDR histogram data: it stores the data in RAM until the end of a stress command execution and only releases it at the end as part of a report, which leaks RAM. Forked latte supports this inherited approach and one more: real-time streaming of HDR histogram data without storing it in RAM, with no RAM leaks.
- Row count validation for select queries. Useful for testing data resurrection.
Example: Testing multi-row partitions of different sizes
Let’s look at one specific user scenario where we applied our fork of latte to test ScyllaDB. For background, one of the user’s ScyllaDB production clusters was using large partitions which could be grouped by size into three groups: 2000, 4000, and 8000 rows per partition. 95% of the partitions had 2000 rows, 4% of partitions had 4000 rows, and the last 1% of partitions had 8000 rows. The target table had 20+ different columns of different types. Also, ScyllaDB’s Secondary Indexes (SI) feature was enabled for the target table. One day, on one of the cloud providers, latencies spiked and throughput dropped. The source of the problem was not immediately clear. To learn more, we needed a quick way to reproduce the customer’s workload in a test environment. Using the latte tool and its great flexibility, we created a rune script covering all the above specifics. The simplified rune script looks like the following:
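(The fork has dedicated support for multi-row partitions; this plain-rune sketch only approximates the idea, with a hypothetical rows_in_partition() mapping for the 95/4/1 size distribution, a much narrower table than the customer’s 20+ columns, and a materialized view backing the get_from_mv reads mentioned below.)

```rune
const KEYSPACE = "ks";
const TABLE = "big_partitions";
const MV = "big_partitions_by_col1";

// Hypothetical mapping: 95% of partitions hold 2000 rows, 4% hold 4000, 1% hold 8000.
fn rows_in_partition(pk) {
    let bucket = pk % 100;
    if bucket < 95 { 2000 } else if bucket < 99 { 4000 } else { 8000 }
}

pub async fn schema(db) {
    db.execute(`CREATE KEYSPACE IF NOT EXISTS ${KEYSPACE} WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}`).await?;
    db.execute(`CREATE TABLE IF NOT EXISTS ${KEYSPACE}.${TABLE} (pk bigint, ck bigint, col1 text, col2 text, PRIMARY KEY (pk, ck))`).await?;
    db.execute(`CREATE MATERIALIZED VIEW IF NOT EXISTS ${KEYSPACE}.${MV} AS SELECT * FROM ${KEYSPACE}.${TABLE} WHERE pk IS NOT NULL AND ck IS NOT NULL AND col1 IS NOT NULL PRIMARY KEY (col1, pk, ck)`).await?;
}

pub async fn prepare(db) {
    db.prepare("insert", `INSERT INTO ${KEYSPACE}.${TABLE} (pk, ck, col1, col2) VALUES (:pk, :ck, :col1, :col2)`).await?;
    db.prepare("get", `SELECT * FROM ${KEYSPACE}.${TABLE} WHERE pk = :pk`).await?;
    db.prepare("get_from_mv", `SELECT * FROM ${KEYSPACE}.${MV} WHERE col1 = :col1`).await?;
}

// Populate: spread the i-th operation over partitions of different sizes.
pub async fn insert(db, i) {
    let pk = i / 8000;                      // coarse partition id, illustrative only
    let ck = i % rows_in_partition(pk);     // row within the partition
    db.execute_prepared("insert", [pk, ck, `c1-${pk}`, `c2-${ck}`]).await?;
}

// Read a partition from the base table.
pub async fn get(db, i) {
    let pk = i / 8000;
    db.execute_prepared("get", [pk]).await?;
}

// Read the same data through the materialized view.
pub async fn get_from_mv(db, i) {
    let pk = i / 8000;
    db.execute_prepared("get_from_mv", [`c1-${pk}`]).await?;
}
```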
Assume we have a ScyllaDB cluster where one of the nodes has the IP address 172.17.0.2. Here are the commands to create the schema and to populate the just-created table:
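(Again a rough sketch; the -f flag for selecting the workload function is an assumption that may vary by latte version.)

```shell
# Create the keyspace, table, and materialized view.
latte schema multirow.rn 172.17.0.2

# Populate the table by running the insert workload function.
latte run multirow.rn -f insert 172.17.0.2
```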
To read from the main table and from the MV, use a similar command, just replacing the function name with get or get_from_mv respectively. Using these commands, we got a stable reproducer of the issue and could work on its solution.
Working with ScyllaDB’s Workload Prioritization feature
In other cases, we needed to:
- Create a Workload Prioritization (WLP) demo.
- Test an OLTP setup with continuous peaks and lows to showcase giving priority to different workloads.
For these scenarios, we used a special latte feature called sine wave rate. It is an extension of the common rate-limiting feature and lets us specify, via additional command parameters, how many operations per second to produce following a sine wave. Looking at the monitoring, the operations-per-second graph shows the expected pattern of peaks and lows.
Internal testing of tombstones (validation)
As of June 2025, forked latte supports row count validation, which is useful for testing data resurrection. Here is a rune script demonstrating these capabilities:
demonstrate these capabilities: As before, we create the schema
first: Then, we populate the table with 100k rows using the
following command: To check that all rows are in place, we use
command similar to the one above, change the function to be
get
, and define the validation strategy to be
fail-fast
: The supported validation strategies are
retry
, fail-fast
, ignore
.
Then, we run 2 different commands in parallel. Here is the first
one, which deletes part of the rows: Here is the second one, which
knows when we expect 1 row and when we expect none: And here is the
timing of actions that take place during these 2 commands’ runtime:
That’s a simple example of how we can check whether data got
deleted or not. In long-running testing scenarios, we might run
more parallel commands, make them depend on the elapsed time, and
many more other flexibilities. Conclusions Yes, to take advantage
of latte, you first need to study a bit of rune
scripting. But
once you’ve done that to some extent, especially having available
examples, it becomes a powerful tool that is capable of covering
various scenarios of different types.