Planet Cassandra

Big ScyllaDB Performance Gains on Google Cloud’s New Smaller Z3 Instances

16 July 2025, 2:00 pm by ScyllaDB

Benchmarks of ScyllaDB on Google Cloud’s new Z3 small instances achieved higher throughput and lower latency than N2 equivalents, especially under heavy load ScyllaDB recently had the privilege of examining Google Cloud’s shiny new small shape Z3 GCE instances in an early preview. The Z3 series is optimized for workloads that require low latency and high performance access to large data sets. Likewise, ScyllaDB is engineered to deliver predictable low latency, even with workloads exceeding millions of OPS per machine. Naturally, both ScyllaDB and Google Cloud were curious to see how these innovations translated to performance gains with data-intensive use cases. So, we partnered with Google Cloud to test ScyllaDB on the new instances. TL;DR When we tested ScyllaDB on these new Z3 small shape instances vs. the previous generation of N2 instances, we found significant throughput improvements as well as reduced latencies…particularly at high load scenarios. Why the New Z3 Instances Matter Z3 is Google Cloud’s first generation of Storage Optimized VMs, specifically designed to combine the latest CPU, memory, network, and high-density local SSD advancements. It introduces 36 TB of local SSD with up to 100 Gbps network throughput in its largest shape and brings in significant software-level improvements like partitioned placement policies, enhanced maintenance configurations, and optimized Hyperdisk support. The Z3 series has been available for over a year now. Previously, Z3 was only available in large configurations (88 and 176 vCPUs). With this new addition to the Z3 family, users can now choose from a broader range of high-performance instances, including shapes with 8, 16, 22, 32, and 44 vCPUs – all built on 4th Gen Intel Xeon Scalable (Sapphire Rapids), DDR5 memory, and local SSDs configured for maximum density and throughput. The new instance types — especially those in the 8 to 44 vCPU range — allow ScyllaDB to extend Z3 performance advantages to a broader set of workloads and customer profiles. And now that ScyllaDB X Cloud just introduced support for mixed-instance clusters, it’s the perfect timing for these new instances. Our customers can use them to expand and contract capacity with high precision. Or they can start small, then seamlessly shift to larger instances as their traffic grows. Test Methodology We evaluated the new Z3 instances against our current N2-based configurations using our standard weekly regression testing suite. These tests focus on measuring latency across a range of throughput levels, including an unthrottled phase to identify maximum operations per second. For all tests, each cluster consisted of 3 ScyllaDB nodes. The Z3 clusters used z3-highmem-16-highlssd instances, while the N2 clusters used n2-highmem-16 instances with attached 6 TB high-performance SSDs to match the Z3 clusters’ storage. Both instance families come with 16 vCPUs and 128 GB RAM. The replication factor was set to 3 to reflect our typical production setup. Four workloads were tested on ScyllaDB version 2025.1.2 with vnode-based keyspaces: Read (100% cache hit) Read (100% cache miss) Write Mixed (50% reads, 50% writes) For load generation, we used cassandra-stress with 1kb row size (one column). Each workload was progressively throttled to multiple fixed throughput levels, followed by an unthrottled phase. For throttled scenarios, we aimed for sub-millisecond to ~10ms latencies. For unthrottled loads, latency was disregarded to maximize throughput measurements. Benchmark Results First off, here’s an overview of the throughput results, combined: Now for the details… 1. Read Workload (100% Cache Hit) Latency results Load N2 P99 [ms] Z3 P99 [ms] 150k 0.64 0.5 300k 1.37 0.86 450k 7.23 6.23 600k Couldn’t meet op/s 10.02 700k Couldn’t meet op/s 13.1 The Z3 cluster consistently delivered better tail latencies across all load levels. For higher loads, the N2 based cluster couldn’t keep up, so we presented only results for the Z3 cluster. Maximum throughput results Load N2 Throughput Z3 Throughput Diff % Max 569,566 1,151,739 102 Due to superb performance gains from the CPU family upgrade, the Z3 cluster achieved a staggering 102% higher throughput than the N2 did at the unthrottled level. 2. Read Workload (100% Cache Miss) Latency results Load N2 P99 [ms] Z3 P99 [ms] 80k 2.53 2.02 165k 3.99 3.11 250k Couldn’t meet op/s 4.7 Again, the Z3 cluster achieved better latency results across all tested loads and could serve higher throughput while keeping latencies low. Maximum throughput results Load N2 Throughput Z3 Throughput Diff % Max 236,528 310,880 31 With a 100% cache read workload that’s bounded by a mix of disk and CPU performance, the Z3 cluster achieved a significant 31% gain in maximum throughput. 3. Write Workload Latency results Load N2 P99 [ms] Z3 P99 [ms] 200k 3.27 3.21 300k >100 ms 4.19 Although latencies remained relatively similar under moderate load, the N2 instances couldn’t sustain them under higher loads. Maximum throughput results Load N2 Throughput Z3 Throughput Diff % Max 349,995 407,951 17 Due to heavy compactions and intensive disk utilization, the write workload also takes advantage of Z3’s advancements. Here, it achieved 17% higher throughput. 4. Mixed Workload (50% Read / 50% Write) Latency results Load N2 P99 Write [ms] Z3 P99 Write [ms] N2 P99 Read [ms] Z3 P99 Read [ms] 50k 2.07 2.04 2.08 2.11 150k 2.27 2.65 2.65 2.93 300k 4.71 3.88 5.12 4.15 450k >100 ms 15.49 >100 ms 16.13 The Z3 cluster maintained similar latency characteristics to the N2 one in lower throughput ranges. In higher ones, it kept a consistent edge since it was able to serve data reliably at a wider range. Maximum throughput results Load N2 Throughput Z3 Throughput Diff % Max 519,154 578,380 11 With a 50% read:write ratio, the Z3 instances achieved 11% higher throughput for both read and write operations. Our Verdict on the New Z3 Instances The addition of Z3 smaller shapes brings new flexibility to ScyllaDB Cloud users. Whether you’re looking to scale down while retaining high SSD performance or ramp up throughput in cost-sensitive environments, Z3 offers a compelling alternative to N2. We’re excited to support the smaller Z3 instance types in ScyllaDB Cloud. These VMs will complement the existing N2 options and enable more flexible deployment profiles for workloads that demand high storage IOPS and network bandwidth without committing to extremely large core counts. What’s Next This first round of testing found that performance improvements on Z3 become significantly more pronounced as the load scales. We believe that stems from ScyllaDB’s ability to fully utilize the underlying hardware. Moving forward, we’ll continue validating Z3 under other scenarios (e.g., higher disk utilization, large partitions, compaction pressure, heterogeneous cluster mixing) and uplift our internal tuning recommendations accordingly.

Real-Time Machine Learning with ScyllaDB as a Feature Store

15 July 2025, 2:56 pm by ScyllaDB

What ML feature stores require and how ScyllaDB fits in as fast, scalable online feature store In this blog post, we’ll explore the role of feature stores in real-time machine learning (ML) applications and why ScyllaDB is a strong choice for online feature serving. We’ll cover the basics of features, how feature stores work, their benefits, the different workload requirements, and how latency plays a critical role in ML applications. We’ll wrap up by looking at popular feature store frameworks like Feast and how to get started with ScyllaDB as your online feature store. What is a feature in machine learning? A feature is a measurable property used to train or serve a machine learning model. Features can be raw data points or engineered values derived from the raw data. For instance, in a social media app like ShareChat, features might include: Number of likes in the last 10 minutes Number of shares over the past 7 days Topic of the post Image credit: Ivan Burmistrov and Andrei Manakov (ShareChat) These data points help predict outcomes such as user engagement or content recommendation. A feature vector is simply a collection of features related to a specific prediction task. For example, this is what a feature vector could look like for a credit scoring application. zipcode person_age person_income loan_amount loan_int_rate (%) 94109 25 120000 10000 12 Selecting relevant data points and transforming them into features takes up a significant portion of the work in machine learning projects. It is also an ongoing process to refine and optimize features so the model being trained becomes more accurate over time. Feature store architectures In order to efficiently work with features, you can create a central place to manage the features that are available within your organization. A central feature store enables: A standard process to create new features Storage of features for simplified access Discovery and reuse of features across teams Serving features for both model training and inference Most architectures distinguish between two stores/databases: Offline store for model training (bulk writes/reads) Online store for inference (real-time, low-latency writes/reads) A typical feature store pipeline starts with ingesting raw data (from data lakes or streams), performing feature engineering, saving features in both stores, and then serving them through two separate pipelines: one for training and one for inference. Benefits of a centralized feature store Centralized feature stores offer several advantages: Avoid duplication: teams can reuse existing features Self-serve access: data scientists can generate and query features independently Unified pipelines: even though training and inference workloads are vastly different, they can still be queried using the same abstraction layer This results in faster iteration, more consistency, and better collaboration across ML workflows. Different workloads in feature stores Let’s break down the two very distinct workload requirements that exist within a feature store: model training and real-time inference. 1. Model training (offline store) In order to make predictions you need to train a machine learning model first. Training requires a large and high-quality dataset. You can store this dataset in an offline feature store. Here’s a run down of what characteristics matter most for model training workloads: Latency: Not a priority Volume: High (millions to billions of records) Frequency: Infrequent, scheduled jobs Purpose: Retrieve a large chunk of historical data Basically, offline stores need to efficiently store huge datasets. 2. Real-time inference (online store) Once you have a model ready, you can run real-time inference. Real-time inference takes the input provided by the user and turns it into a prediction. Here’s a look at what characteristics matter most for real-time inference: Latency: High priority Volume: Low per request but high throughput (up to millions of operations/second) Frequency: Constant, triggered by user actions (e.g. ordering food) Purpose: Serve up-to-date features for making predictions quickly For example, consider a food delivery app. The user’s recent cart contents, age, and location might be turned into features and used instantly to recommend other items to purchase. This would require real-time inference – and latency makes or breaks the user experience. Why latency matters Latency (in the context of this article) refers to the time between sending a query and receiving the response from the feature store. For real-time ML applications – especially user-facing ones– low latency is critical for success. Imagine a user at checkout being shown related food items. If this suggestion takes too long to load due to a slow online store, the opportunity is lost. The end-to-end flow from Ingesting the latest data Querying relevant features Running inference Returning a prediction must happen in milliseconds. Choosing a feature store solution Once you decide to build a feature store, you’ll quickly find that there are dozens of frameworks and providers, both open source and commercial, to choose from: Feast (open source): Provides flexible database support (e.g., Postgres, Redis, Cassandra, ScyllaDB) Hopsworks: Tightly coupled with its own ecosystem AWS SageMaker: Tied to the AWS stack (e.g., S3, DynamoDB) And lots of others Which one is best? Factors like your team’s technical expertise, latency requirements, and required integrations with your existing stack all play a role. There’s no one-size-fits-all solution. If you are worried about the scalability and performance of your online feature store, then database flexibility should be a key consideration. There are feature stores (e.g. AWS SageMaker, GCP Vertex, Hopsworks etc.) that provide their own database technology as the online store. On one hand, this might be convenient to get started because everything is handled by one provider. But this can also become a problem later on. Imagine choosing a vendor like this with a strict P99 latency requirement (e.g., <15ms P99). The requirement is successfully met during the proof of concept (POC). But later you experience latency spikes – maybe because your requirements change or there’s a surge of new users in your app or some other unpredictable reason. You want to switch to a different online store database backend to save costs. The problem is you cannot… at least not easily. You are stuck with the built-in solution. It’s unfeasible to migrate off just the online store part of your architecture because everything is locked in. If you want to avoid these situations, you can look into tools that are flexible regarding the offline and online store backend. Tools like Feast or FeatureForm allow you to bring your own database backend, both for the online and offline stores. This is a great way to avoid vendor lock-in and make future database migrations less painful in case latency spikes occur or costs rise. ScyllaDB as an online feature store ScyllaDB is a high-performance NoSQL database that’s API compatible with Apache Cassandra and DynamoDB API. It’s implemented in C++, uses a shard-per-core architecture, and includes an embedded cache system, making it ideal for low-latency, high-throughput feature store applications. Why ScyllaDB? Low latency (single-digit millisecond P99 performance) High availability and resilience High throughput at scale (petabyte-scale deployments) No vendor lock-in (runs on-prem or in any cloud) Drop-in replacement for existing Cassandra/DynamoDB setups Easy migration from other NoSQL databases (Cassandra, DynamoDB, MongoDB, etc) Integration with the feature store framework Feast ScyllaDB shines in online feature store use cases where real-time performance, availability, and latency predictability are critical. ScyllaDB + Feast integration Feast is a popular open-source feature store framework that supports both online and offline stores. One of its strengths is the ability to plug in your own database sources, including ScyllaDB. Read more about the ScyllaDB + Feast integration in the docs. Get started with a feature store tutorial Want to try using ScyllaDB as your online feature store? Check out our tutorials that walk you through the process of creating a ScyllaDB cluster and building a real-time inference application. Tutorial: Price prediction inference app with ScyllaDB Tutorial: Real-time app with Feast & ScyllaDB Feast + ScyllaDB integration GitHub: ScyllaDB as a feature store code examples Have questions or want help setting it up? Submit a post in the forum!

Integrating support for AWS PrivateLink with Apache Cassandra® on the NetApp Instaclustr Managed Platform

15 July 2025, 12:00 pm by Apache Cassandra - Instaclustr

Discover how NetApp Instaclustr leverages AWS PrivateLink for secure and seamless connectivity with Apache Cassandra®. This post explores the technical implementation, challenges faced, and the innovative solutions we developed to provide a robust, scalable platform for your data needs.

Last year, NetApp achieved a significant milestone by fully integrating AWS PrivateLink support for Apache Cassandra® into the NetApp Instaclustr Managed Platform. Read our AWS PrivateLink support for Apache Cassandra General Availability announcement here. Our Product Engineering team made remarkable progress in incorporating this feature into various NetApp Instaclustr application offerings. NetApp now offers AWS PrivateLink support as an Enterprise Feature add-on for the Instaclustr Managed Platform for Cassandra, Kafka®, OpenSearch®, Cadence®, and Valkey™.

The journey to support AWS PrivateLink for Cassandra involved considerable engineering effort and numerous development cycles to create a solution tailored to the unique interaction between the Cassandra application and its client driver. After extensive development and testing, our product engineering team successfully implemented an enterprise ready solution. Read on for detailed insights into the technical implementation of our solution.

What is AWS PrivateLink?

PrivateLink is a networking solution from AWS that provides private connectivity between Virtual Private Clouds (VPCs) without exposing any traffic to the public internet. This solution is ideal for customers who require a unidirectional network connection (often due to compliance concerns), ensuring that connections can only be initiated from the source VPC to the destination VPC. Additionally, PrivateLink simplifies network management by eliminating the need to manage overlapping CIDRs between VPCs. The one-way connection allows connections to be initiated only from the source VPC to the managed cluster hosted in our platform (target VPC)—and not the other way around.

To get an idea of what major building blocks are involved in making up an end-to-end AWS PrivateLink solution for Cassandra, take a look at the following diagram—it’s a simplified representation of the infrastructure used to support a PrivateLink cluster:

simplified representation of the infrastructure used to support a PrivateLink cluster

In this example, we have a 3-node Cassandra cluster at the far right with one Cassandra node per Availability Zone (or AZ). Next, we have the VPC Endpoint Service and a Network Load Balancer (NLB). The Endpoint Service is essentially the AWS PrivateLink, and by design AWS needs it to be backed by an NLB–that’s pretty much what we have to manage on our side.

On the customer side, they must create a VPC Endpoint that enables them to privately connect to the AWS PrivateLink on our end; naturally, customers will also have to use a Cassandra client(s) to connect to the cluster.

AWS PrivateLink support with Instaclustr for Apache Cassandra

To incorporate AWS PrivateLink support with Instaclustr for Apache Cassandra on our platform, we came across a few technical challenges. First and foremost, the primary challenge was relatively straightforward: Cassandra clients need to talk to each individual node in a cluster.

However, the problem is that nodes in an AWS PrivateLink cluster are only assigned private IPs; that is what the nodes would announce by default when Cassandra clients attempt to discover the topology of the cluster. Cassandra clients cannot do much with the received private IPs as they cannot be used to connect to the nodes directly in an AWS PrivateLink setup.

We devised a plan of attack to get around this problem:

Make each individual Cassandra node listen for CQL queries on unique ports.
Configure the NLB so it can route traffic to the appropriate node based on the relevant unique port.
Let clients implement the AddressTranslator interface from the Cassandra driver. The custom address translator will need to translate the received private IPs to one of the VPC Endpoint Elastic Network Interface (or ENI) IPs without altering the corresponding unique ports.

To understand this approach better, consider the following example:

Suppose we have a 3-node Cassandra cluster. According to the proposed approach we will need to do the followings:

Let the nodes listen on ports 172.16.0.1:6001 (in AZ1), 172.16.0.2: 6002 (in AZ2) and 172.16.0.3: 6003 (in AZ3)
Configure the NLB to listen on the same set of ports
Define and associate target groups based on the port. For instance, the listener on port 6002 will be associated with a target group containing only the node that is listening on port 6002.
As for how the custom address translator is expected to work, let’s assume the VPC Endpoint ENI IPs are 192.168.0.1 (in AZ1), 192.168.0.2 (in AZ2) and 192.168.0.3 (in AZ3). The address translator should translate received addresses like so:
```
- 172.16.0.1:6001 --> 192.168.0.1:6001
- 172.16.0.2:6002 --> 192.168.0.2:6002
- 172.16.0.3:6003 --> 192.168.0.3:6003
```

The proposed approach not only solves the connectivity problem but also allows for connecting to appropriate nodes based on query plans generated by load balancing policies.

Around the same time, we came up with a slightly modified approach as well: we realized the need for address translation can be mostly mitigated if we make the Cassandra nodes return the VPC Endpoint ENI IPs in the first place.

But the excitement did not last for long! Why? Because we quickly discovered a key problem: there is a limit to the number of listeners that can be added to any given AWS NLB of just 50.

While 50 is certainly a decent limit, the way we designed our solution meant we wouldn’t be able to provision a cluster with more than 50 nodes. This was quickly deemed to be an unacceptable limitation as it is not uncommon for a cluster to have more than 50 nodes; many Cassandra clusters in our fleet have hundreds of nodes. We had to abandon the idea of address translation and started thinking about alternative solution approaches.

Introducing Shotover Proxy

We were disappointed but did not lose hope. Soon after, we devised a practical solution centred around using one of our open source products: Shotover Proxy.

Shotover Proxy is used with Cassandra clusters to support AWS PrivateLink on the Instaclustr Managed Platform. What is Shotover Proxy, you ask? Shotover is a layer 7 database proxy built to allow developers, admins, DBAs, and operators to modify in-flight database requests. By managing database requests in transit, Shotover gives NetApp Instaclustr customers AWS PrivateLink’s simple and secure network setup with the many benefits of Cassandra.

Below is an updated version of the previous diagram that introduces some Shotover nodes in the mix:

simplified representation of the infrastructure used to support a PrivateLink cluster with Shotover nodes included

As you can see, each AZ now has a dedicated Shotover proxy node.

In the above diagram, we have a 6-node Cassandra cluster. The Cassandra cluster sitting behind the Shotover nodes is an ordinary Private Network Cluster. The role of the Shotover nodes is to manage client requests to the Cassandra nodes while masking the real Cassandra nodes behind them. To the Cassandra client, the Shotover nodes appear to be Cassandra nodes, and it is only them that make up the entire cluster! This is the secret recipe for AWS PrivateLink for Instaclustr for Apache Cassandra that enabled us to get past the challenges discussed earlier.

So how is this model made to work?

Shotover can alter certain requests from—and responses to—the client. It can examine the tokens allocated to the Cassandra nodes in its own AZ (aka rack) and claim to be the owner of all those tokens. This essentially makes them appear to be an aggregation of the nodes in its own rack.

Given the purposely crafted topology and token allocation metadata, while the client directs queries to the Shotover node, the Shotover node in turn can pass them on to the appropriate Cassandra node and then transparently send responses back. It is worth noting that the Shotover nodes themselves do not store any data.

Because we only have 1 Shotover node per AZ in this design and there may be at most about 5 AZs per region, we only need that many listeners in the NLB to make this mechanism work. As such, the 50-listener limit on the NLB was no longer a problem.

The use of Shotover to manage client driver and cluster interoperability may sound straight forward to implement, but developing it was a year-long undertaking. As described above, the initial months of development were devoted to engineering CQL queries on unique ports and the AddressTranslator interface from the Cassandra driver to gracefully manage client connections to the Cassandra cluster. While this solution did successfully provide support for AWS PrivateLink with a Cassandra cluster, we knew that the 50-listener limit on the NLB was a barrier for use and wanted to provide our customers with a solution that could be used for any Cassandra cluster, regardless of node count.

The next few months of engineering were then devoted to the Proof of Concept of an alternative solution with the goal to investigate how Shotover could manage client requests for a Cassandra cluster with any number of nodes. And so, after a solution to support a cluster with any number of nodes was successfully proved, subsequent effort was then devoted to work through stability testing the new solution, the results of that engineering being the stable solution described above.

We have also conducted performance testing to evaluate the relative performance of a PrivateLink-enabled Cassandra cluster compared to its non-PrivateLink counterpart. Multiple iterations of performance testing were executed as some adjustments to Shotover were identified from test cases and resulted in the PrivateLink-enabled Cassandra cluster throughput and latency measuring near to a standard Cassandra cluster throughput and latency.

The following was our experimental setup for identifying the max throughput in terms of Operations per second of a Cassandra PrivateLink cluster in comparison to a non-Cassandra PrivateLink cluster

Baseline node size: i3en.xlarge
Shotover Proxy node size on Cassandra Cluster: CSO-PRD-c6gd.medium-54
Cassandra version: 4.1.3
Shotover Proxy version: 0.2.0
Other configuration: Repair and backup disabled, Client Encryption disabled

Throughput results

Operation	Operation rate with PrivateLink and Shotover	Operation rate without PrivateLink
Mixed-small (3 Nodes)	16608	16206
Mixed-small (6 Nodes)	33585	33598
Mixed-small (9 Nodes)	51792	51798

Across different cluster sizes, we observed no significant difference in operation throughput between PrivateLink and non-PrivateLink configurations.

Latency results

Latency benchmarks were conducted at ~70% of the observed peak throughput (as above) to simulate realistic production traffic.

Operation	Ops/second	Setup	Mean Latency (ms)	Median Latency (ms)	P95 Latency (ms)	P99 Latency (ms)
Mixed-small (3 Nodes)	11630	Non-PrivateLink	9.90	3.2	53.7	119.4
		PrivateLink	9.50	3.6	48.4	118.8
Mixed-small (6 Nodes)	23510	Non-PrivateLink	6	2.3	27.2	79.4
		PrivateLink	9.10	3.4	45.4	104.9
Mixed-small (9 Nodes)	36255	Non-PrivateLink	5.5	2.4	21.8	67.6
		PrivateLink	11.9	2.7	77.1	141.2

Results indicate that for lower to mid-tier throughput levels, AWS PrivateLink introduced minimal to negligible overhead. However, at higher operation rates, we observed increased latency, most notably at the p99 mark—likely due to network level factors or Shotover.

The increase in latency is expected as AWS PrivateLink introduces an additional hop to route traffic securely, which can impact latencies, particularly under heavy load. For the vast majority of applications, the observed latencies remain within acceptable ranges. However, for latency-sensitive workloads, we recommend adding more nodes (for high load cases) to help mitigate the impact of the additional network hop introduced by PrivateLink.

As with any generic benchmarking results, performance may vary depending on specific data model, workload characteristics, and environment. The results presented here are based on specific experimental setup using standard configurations and should primarily be used to compare the relative performance of PrivateLink vs. Non-PrivateLink networking under similar conditions.

Why choose AWS PrivateLink with NetApp Instaclustr?

NetApp’s commitment to innovation means you benefit from cutting-edge technology combined with ease of use. With AWS PrivateLink support on our platform, customers gain:

Enhanced security: All traffic stays private, never touching the internet.
Simplified networking: No need to manage complex CIDR overlaps.
Enterprise scalability: Handles sizable clusters effortlessly.

By addressing challenges, such as the NLB listener cap and private-to-VPC IP translation, we’ve created a solution that balances efficiency, security, and scalability.

Experience PrivateLink today

The integration of AWS PrivateLink with Apache Cassandra® is now generally available with production-ready SLAs for our customers. Log in to the Console to create a Cassandra cluster with support for AWS PrivateLink with just a few clicks today. Whether you’re managing sensitive workloads or demanding performance at scale, this feature delivers unmatched value.

Want to see it in action? Book a free demo today and experience the Shotover-powered magic of AWS PrivateLink firsthand.

Resources

Getting started: Visit the documentation to learn how to create an AWS PrivateLink-enabled Apache Cassandra cluster on the Instaclustr Managed Platform.
Connecting clients: Already created a Cassandra cluster with AWS PrivateLink? Click here to read about how to connect Cassandra clients in one VPC to an AWS PrivateLink-enabled Cassandra cluster on the Instaclustr Platform.
General availability announcement: For more details, read our General Availability announcement on AWS PrivateLink support for Cassandra.

The post Integrating support for AWS PrivateLink with Apache Cassandra® on the NetApp Instaclustr Managed Platform appeared first on Instaclustr.

Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow

10 July 2025, 7:31 pm by Netflix TechBlog - Medium

By Eugene Yemelyanau, Jake Grice

Introduction

Tudum.com is Netflix’s official fan destination, enabling fans to dive deeper into their favorite Netflix shows and movies. Tudum offers exclusive first-looks, behind-the-scenes content, talent interviews, live events, guides, and interactive experiences. “Tudum” is named after the sonic ID you hear when pressing play on a Netflix show or movie. Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix.

Initial architecture

At the end of 2021, when we envisioned Tudum’s implementation, we considered architectural patterns that would be maintainable, extensible, and well-understood by engineers. With the goal of building a flexible, configuration-driven system, we looked to server-driven UI (SDUI) as an appealing solution. SDUI is a design approach where the server dictates the structure and content of the UI, allowing for dynamic updates and customization without requiring changes to the client application. Client applications like web, mobile, and TV devices, act as rendering engines for SDUI data. After our teams weighed and vetted all the details, the dust settled and we landed on an approach similar to Command Query Responsibility Segregation (CQRS). At Tudum, we have two main use cases that CQRS is perfectly capable of solving:

Tudum’s editorial team brings exclusive interviews, first-look photos, behind the scenes videos, and many more forms of fan-forward content, and compiles it all into pages on the Tudum.com website. This content comes onto Tudum in the form of individually published pages, and content elements within the pages. In support of this, Tudum’s architecture includes a write path to store all of this data, including internal comments, revisions, version history, asset metadata, and scheduling settings.
Tudum visitors consume published pages. In this case, Tudum needs to serve personalized experiences for our beloved fans, and accesses only the latest version of our content.

The high-level diagram above focuses on storage & distribution, illustrating how we leveraged Kafka to separate the write and read databases. The write database would store internal page content and metadata from our CMS. The read database would store read-optimized page content, for example: CDN image URLs rather than internal asset IDs, and movie titles, synopses, and actor names instead of placeholders. This content ingestion pipeline allowed us to regenerate all consumer-facing content on demand, applying new structure and data, such as global navigation or branding changes. The Tudum Ingestion Service converted internal CMS data into a read-optimized format by applying page templates, running validations, performing data transformations, and producing the individual content elements into a Kafka topic. The Data Service Consumer, received the content elements from Kafka, stored them in a high-availability database (Cassandra), and acted as an API layer for the Page Construction service and other internal Tudum services to retrieve content.

A key advantage of decoupling read and write paths is the ability to scale them independently. It is a well-known architectural approach to connect both write and read databases using an event driven architecture. As a result, content edits would eventually appear on tudum.com.

Challenges with eventual consistency

Did you notice the emphasis on “eventually?” A major downside of this architecture was the delay between making an edit and observing that edit reflected on the website. For instance, when the team publishes an update, the following steps must occur:

Call the REST endpoint on the 3rd party CMS to save the data.
Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
Wait for the Tudum Ingestion layer to query all necessary sections via API, validate data and assets, process the page, and produce the modified content to Kafka.
Wait for the Data Service Consumer to consume this message from Kafka and store it in the database.
Finally, after some cache refresh delay, this data would eventually become available to the Page Construction service. Great!

By introducing a highly-scalable eventually-consistent architecture we were missing the ability to quickly render changes after writing them — an important capability for internal previews.

In our performance profiling, we found the source of delay was our Page Data Service which acted as a facade for an underlying Key Value Data Abstraction database. Page Data Service utilized a near cache to accelerate page building and reduce read latencies from the database.

This cache was implemented to optimize the N+1 key lookups necessary for page construction by having a complete data set in memory. When engineers hear “slow reads,” the immediate answer is often “cache,” which is exactly what our team adopted. The KVDAL near cache can refresh in the background on every app node. Regardless of which system modifies the data, the cache is updated with each refresh cycle. If you have 60 keys and a refresh interval of 60 seconds, the near cache will update one key per second. This was problematic for previewing recent modifications, as these changes were only reflected with each cache refresh. As Tudum’s content grew, cache refresh times increased, further extending the delay.

RAW Hollow

As this pain point grew, a new technology was being developed that would act as our silver bullet. RAW Hollow is an innovative in-memory, co-located, compressed object database developed by Netflix, designed to handle small to medium datasets with support for strong read-after-write consistency. It addresses the challenges of achieving consistent performance with low latency and high availability in applications that deal with less frequently changing datasets. Unlike traditional SQL databases or fully in-memory solutions, RAW Hollow offers a unique approach where the entire dataset is distributed across the application cluster and resides in the memory of each application process.

This design leverages compression techniques to scale datasets up to 100 million records per entity, ensuring extremely low latencies and high availability. RAW Hollow provides eventual consistency by default, with the option for strong consistency at the individual request level, allowing users to balance between high availability and data consistency. It simplifies the development of highly available and scalable stateful applications by eliminating the complexities of cache synchronization and external dependencies. This makes RAW Hollow a robust solution for efficiently managing datasets in environments like Netflix’s streaming services, where high performance and reliability are paramount.

Revised architecture

Tudum was a perfect fit to battle-test RAW Hollow while it was pre-GA internally. Hollow’s high-density near cache significantly reduces I/O. Having our primary dataset in memory enables Tudum’s various microservices (page construction, search, personalization) to access data synchronously in O(1) time, simplifying architecture, reducing code complexity, and increasing fault tolerance.

In our simplified architecture, we eliminated the Page Data Service, Key Value store, and Kafka infrastructure, in favor of RAW Hollow. By embedding the in-memory client directly into our read-path services, we avoid per-request I/O and reduce roundtrip time.

Migration results

The updated architecture yielded a monumental reduction in data propagation times, and the reduced I/O led to faster request times as an added bonus. Hollow’s compression alleviated our concerns about our data being “too big” to fit in memory. Storing three years’ of unhydrated data requires only a 130MB memory footprint — 25% of its uncompressed size in an Iceberg table!

Writers and editors can preview changes in seconds instead of minutes, while still maintaining high-availability and in-memory caching for Tudum visitors — the best of both worlds.

But what about the faster request times? The diagram below illustrates the before & after timing to fulfil a request for Tudum’s home page. All of Tudum’s read-path services leverage Hollow in-memory state, leading to a significant increase in page construction speed and personalization algorithms. Controlling for factors like TLS, authentication, request logging, and WAF filtering, homepage construction time decreased from ~1.4 seconds to ~0.4 seconds!

An attentive reader might notice that we have now tightly-coupled our Page Construction Service with the Hollow In-Memory State. This tight-coupling is used only in Tudum-specific applications. However, caution is needed if sharing the Hollow In-Memory Client with other engineering teams, as it could limit your ability to make schema changes or deprecations.

Key Learnings

CQRS is a powerful design paradigm for scale, if you can tolerate some eventual consistency.
Minimizing the number of sequential operations can significantly reduce response times. I/O is often the main enemy of performance.
Caching is complicated. Cache invalidation is a hard problem. By holding an entire dataset in memory, you can eliminate an entire class of problems.

In the next episode, we’ll share how Tudum.com leverages Server Driven UI to rapidly build and deploy new experiences for Netflix fans. Stay tuned!

Credits

Thanks to Drew Koszewnik, Govind Venkatraman Krishnan, Nick Mooney, George Carlucci

Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Compaction Strategies, Performance, and Their Impact on Cassandra Node Density

10 July 2025, 12:00 am by Posts on RustyRazorblade Consulting

This is the third post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. In the first post, I examined how streaming operations impact node density and laid out the groundwork for understanding why higher node density leads to significant cost savings. In the second post, I discussed how compaction throughput is critical to node density and introduced the optimizations we implemented in CASSANDRA-15452 to improve throughput on disaggregated storage like EBS.

The Developer’s Data Modeling Cheat Guide‌‍‍‍‌‍‌‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍‍‍‍‍‍‍‌‌‍‌‌‍‍‌‍‍‌‌‌‌‍‌‍‍‌‍‍‌‌‍‍‍‍‍‍‌‍‍‌‍‌‍‌‌‌‍‌‍‍‍‍‍‍‍‌‍‍‌‌‌‌‌‌‍‍‍‍‌‍‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‍‍‍‌‌‍‌‌‍‍‌‍‍‌‌‌‌‍‌‍‍‌‌‌‌‌‌‌‍‌‌‍‍‌‌‍‍‌‍‍‌‌‍‍‌‌‌‍‌‌‌‍‍‌‌‍‌‍‌‌‌‍‌‌‍‍‌‌‌‍‌‍‌‌‍‌‍‌‌‍‌‌‌‌‌‍‌‍‌‌‌‌‍‌‌‌‍‍‌‌‌‍‌‌‌‌‍‍‌‌‍‌‍‍‍‌‍‍‌‌‍‌‌‌‍‌‍‌‌‌‌‍‌‍‌‌‍‍‌‌‌‌‍‌‍‌‍‍‌‌‌‌‌‌‍‌‍‌‌‌‍‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‍‍‍‌‌‌‍‌‌‌‍‌‌‌‌‍‍‌‍‌‍‌‍‌‌‌‌‍‌‌‌‍‌‌‍‌‌‌‌‍‍‌‌‌‌‍‍‌‌‌‌‍‌‍‌‌‌‍‍‌‍‌‌‌‍‌‌‌‌‌‌‌‍‌‍‌‌‍‍‌‌‌‌‌‌‍‌‌‌‌‍‌‌‍‌‌‍‍‌‌‍‌‌‍‌‍‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‍‍‍‌‌‍‌‌‍‍‌‍‍‌‌‌‌‍‌‍‍‌‌‌‌‌‌‌‍‌‌‍‍‌‌‍‍‌‍‌‍‍‌‌‍‌‌‌‍‌‍‌‌‌‌‍‌‍‌‌‍‍‌‌‌‌‍‌‍‌‍‍‌‌‌‌‌‌‍‌‍‌‌‌‍‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‍‍‍‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‍‌‍‌‍‌‍‌‌‌‌‍‌‌‌‍‌‍‌‌‍‌‌‌‌‍‍‌‌‌‌‍‍‌‌‌‌‍‌‍‌‌‍‌‍‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‍‌‌‌‍‌‍‌‌‌‌‌‌‌‌‍‍‌‍‌‍‍‌‌‌‍‍‌‍‌‌‌‍‌‍‍‌‌

9 July 2025, 6:43 am by Datastax Technical HowTo's

In Cassandra 5.0, storage-attached indexes offer a new way to interact with your data, giving developers the flexibility to query multiple columns with filtering, range conditions, and better performance.

ScyllaDB’s Engineering Summit in Sofia, Bulgaria

2 July 2025, 1:01 pm by ScyllaDB

From hacking to hiking: what happens when engineering sea monsters get together ScyllaDB is a remote-first company, with team members spread across the globe. We’re masters of virtual connection, but every year, we look forward to the chance to step away from our screens and come together for our Engineering Summit. It’s a time to reconnect, exchange ideas, and share spontaneous moments of joy that make working together so special. This year, we gathered in Sofia, Bulgaria — a city rich in history and culture, set against the stunning backdrop of the Balkan mountains. Where Monsters Meet This year’s summit brought together a record-breaking number of participants from across the globe—over 150! As ScyllaDB continues its continued growth, the turnout well reflects our momentum and our team’s expanding global reach. We, ScyllaDB Monsters from all corners of the world, came together to share knowledge, build connections, and collaborate on shaping the future of our company and product. A team-building activity An elevator ride to one of the sessions The summit brought together not just the engineering teams but also our Customer Experience (CX) colleagues. With their insights into the real-world experiences of our customers, the CX team helped us see the bigger picture and better understand how our work impacts those who use our product. The CX team Looking Inward, Moving Forward The summit was packed with really insightful talks, giving us a chance to reflect on where we are and where we’re heading next. It was all about looking back at the wins we’ve had so far, getting excited about the cool new features we’re working on now, and diving deep into what’s coming down the pipeline. CEO and Co-founder, Dor Laor, kicking off the summit The sessions sparked fruitful discussions about how we can keep pushing forward and build on the strong foundation we’ve already laid. The speakers touched on a variety of topics, including: ScyllaDB X Cloud Consistent topology Data distribution with tablets Object storage Tombstone garbage collection Customer stories Improving customer experience And many more Notes and focus doodles Collaboration at Its Best: The Hackathon This year, we took the summit energy to the next level with a hackathon that brought out the best in creativity, collaboration, and problem-solving. Participants were divided into small teams, each tackling a unique problem. The projects were chosen ahead of time so that we had a chance to work on real challenges that could make a tangible impact on our product and processes. The range of projects was diverse. Some teams focused on adding new features, like implementing a notification API to enhance the user experience. Others took on documentation-related challenges, improving the way we share knowledge. But across the board, every team managed to create a functioning solution or prototype. At the hackathon The hackathon brought people from different teams together to tackle complex issues, pushing everyone a bit outside their comfort zone. Beyond the technical achievements, it was a powerful team-building experience, reinforcing our culture of collaboration and shared purpose. It reminded us that solving real-life challenges—and doing it together—makes our work even more rewarding. The hackathon will undoubtedly be a highlight of future summits to come! From Development to Dance And then, of course, came the party. The atmosphere shifted from work to celebration with live music from a band playing all-time hits, followed by a DJ spinning tracks that kept everyone on their feet. Live music at the party Almost everyone hit the dance floor—even those who usually prefer to sit it out couldn’t resist the rhythm. It was the perfect way to unwind and celebrate the success of the summit! Sea monsters swaying Exploring Sofia and Beyond During our time in Sofia, we had the chance to immerse ourselves in the city’s rich history and culture. Framed by the dramatic Balkan mountains, Sofia blends the old with the new, offering a mix of history, culture, and modern vibe. We wandered through the ancient ruins of the Roman Theater and visited the iconic Alexander Nevsky Cathedral, marveling at their beauty and historical significance. To recharge our batteries, we enjoyed delicious meals in modern Bulgarian restaurants. In front of Alexander Nevsky Cathedral But the adventure didn’t stop in the city. We took a day trip to the Rila Mountains, where the breathtaking landscapes and serene atmosphere left us in awe. One of the standout sights was the Rila Monastery, a UNESCO World Heritage site known for its stunning architecture and spiritual significance. The Rila Monastery After soaking in the peaceful vibes of the monastery, we hiked the trail leading to the Stob Earth Pyramids, a natural wonder that looked almost otherworldly. The Stob Pyramids The hike was rewarding, offering stunning views of the mountains and the unique rock formations below. It was the perfect way to experience Bulgaria’s natural beauty while winding down from the summit excitement. Happy hiking Looking Ahead to the Future As we wrapped up this year’s summit, we left feeling energized by the connections made, ideas shared, and challenges overcome. From brainstorming ideas to clinking glasses around the dinner table, this summit was a reminder of why in-person gatherings are so valuable—connecting not just as colleagues but as a team united by a common purpose. As ScyllaDB continues to expand, we’re excited for what lies ahead, and we can’t wait to meet again next year. Until then, we’ll carry the lessons, memories, and new friendships with us as we keep moving forward. Чао! We’re hiring – join our team! Our team

How ScyllaDB Simulates Real-World Production Workloads with the Rust-Based “latte” Benchmarking Tool

1 July 2025, 12:21 pm by ScyllaDB

Learn why we use a little-known benchmarking tool for testing Before using a tech product, it’s always nice to know its capabilities and limits. In the world of databases, there are a lot of different benchmarking tools that help us assess that… If you’re ok with some standard benchmarking scenarios, you’re set – one of the existing tools will probably serve you well. But what if not? Rigorously assessing ScyllaDB, a high-performance distributed database, requires testing some rather specific scenarios, ones with real-world production workloads. Fortunately, there is a tool to help with that. It is latte: a Rust-based lightweight benchmarking tool for Apache Cassandra and ScyllaDB.

Special thanks to Piotr Kołaczkowski for implementing the latte benchmarking tool.

We (the ScyllaDB testing team) forked it and enhanced it. In this blog post, I’ll share why and how we adopted it for our specialized testing needs. About latte Our team really values latte’s “flexibility.” Want to create a schema using a user defined type (UDT), Map, Set, List, or any other data type? Ok. Want to create a materialized views and query it? Ok. Want to change custom function behavior based on elapsed time? Ok. Want to run multiple custom functions in parallel? Ok. Want to use small, medium, and large partitions? Ok. Basically, latte lets us define any schema and workload functions. We can do this thanks to its implementation design. The latte tool is a type of engine/kernel and rune scripts are essentially the “business logic” that’s written separately. Rune scripts are an enhanced, more powerful, analog of what cassandra-stress calls user profiles. The rune scripting language is dynamically-typed and native to the Rust programming language ecosystem. Here’s a simple example of a rune script: In the above example of a rune script, we defined 2 required functions (schema and prepare) and one custom to be used as our workload –myinsert. First, we create a schema: Then, we use the latte run command to call our custom myinsert function: The replication_factor parameter above is a custom parameter. If we do not specify it, then latte will use its default value, 3. We can define any number of custom parameters. How is latte different from other benchmarking Tools? Based on our team’s experiences, here’s how latte compares to the 2 main competitors: cassandra-stress and ycsb: How is our fork of latte different from the original latte project? At ScyllaDB, our main use case for latte is testing complex and realistic customer scenarios with controlled disruptions. But (from what we understand), the project was designed to perform general latency measurements in healthy DB clusters. Given these different goals, we changed some features (“overlapping features”) – and added other new ones (“unique to our fork”). Here’s an overview: Overlapping features differences Latency measurement. Fork-latte accounts for coordinated omission in latencies The original project doesn’t consider the “coordinated omission” phenomenon. Saturated DB impact. When a system under test cannot keep up with the load/stress, fork-latte tries to satisfy the “rate”, compensating for missed scheduler ticks ASAP. Source-latte pulls back on violating the rate requirement and doesn’t later compensate for missed scheduler ticks. This isn’t a “bug”; it is a design decision which also violates the idea of proper latency calculation related to the “coordinated omission” phenomenon. Retries. We enabled retries by default; there, it is disabled by default. Prepared statements. Fork-latte supports all the CQL data types available in ScyllaDB Rust Driver. The source project has limited support of CQL data types. ScyllaDB Rust Driver. Our fork uses the latest version – “1.2.0” The source project sticks to the old version “0.13.2” Stress execution reporting. Report is disabled by default in fork-latte. It’s enabled in source-latte. Features unique to our fork Preferred datacenter support. Useful for testing multi-DC DB setups Preferred rack support. Useful for testing multi-rack DB setups Possibility to get a list of actual datacenter values from the DB nodes that the driver connected to. Useful for creating schema with dc-based keyspaces Sine-wave rate limiting. Useful for SLA/Workload Prioritization demo and OLTP testing with peaks and lows. Batch query type support. Multi-row partitions. Our fork can create multi-row partitions of different sizes. Page size support for select queries. Useful using multi-row partitions feature. HDR histograms support. The source project has only 1 way to get the HDR histograms data It stores HDR histograms data in RAM till the end of a stress command execution and only in the end releases it as part of a report. Leaks RAM. Forked latte supports the above inherited approach and one more: Real-time streaming of HDR histogram data not storing in RAM. No RAM leaks. Rows count validation for select queries. Useful for testing data resurrection. Example: Testing multi-row partitions of different sizes Let’s look at one specific user scenario where we applied our fork of latte to test ScyllaDB. For background, one of the user’s ScyllaDB production clusters was using large partitions which could be grouped by size in 3 groups: 2000, 4000 and 8000 rows per partition. 95% of the partitions had 2000 rows, 4% of partitions had 4000 rows, and the last 1% of partitions had 8000 rows. The target table had 20+ different columns of different types. Also, ScyllaDB’s Secondary Indexes (SI) feature was enabled for the target table. One day, on one of the cloud providers, latencies spiked and throughput dropped. The source of the problem was not immediately clear. To learn more, we needed to have a quick way to reproduce the customer’s workload in a test environment. Using the latte tool and its great flexibility, we created a rune script covering all the above specifics. The simplified rune script looks like the following: Assume we have a ScyllaDB cluster where one of the nodes has a 172.17.0.2 IP address. Here is the command to create the schema we need: And here is the command to populate the just-created table: To read from the main table and from the MV, use a similar command – just replacing the function name to get and get_from_mv respectively. So, the usage of the above commands allowed us to get a stable issue reproducer and work on its solution. Working with ScyllaDB’s Workload Prioritization feature In other cases, we needed to: Create a Workload Prioritization (WLP) demo. Test an OLTP setup with continuous peaks and lows to showcase giving priority to different workloads. And for these scenarios, we used a special latte feature called sine wave rate. This is an extension to the common rate-limiting feature. It allows us to specify how many operations per second we want to produce. It can be used with following command parameters: And looking at the monitoring, we can see the following picture of the operations per second graph: Internal testing of tombstones (validation) As of June 2025, forked latte supports row count validation. It is useful for testing data resurrection. Here is the rune script for latte to demonstrate these capabilities: As before, we create the schema first: Then, we populate the table with 100k rows using the following command: To check that all rows are in place, we use command similar to the one above, change the function to be get, and define the validation strategy to be fail-fast: The supported validation strategies are retry, fail-fast, ignore. Then, we run 2 different commands in parallel. Here is the first one, which deletes part of the rows: Here is the second one, which knows when we expect 1 row and when we expect none: And here is the timing of actions that take place during these 2 commands’ runtime: That’s a simple example of how we can check whether data got deleted or not. In long-running testing scenarios, we might run more parallel commands, make them depend on the elapsed time, and many more other flexibilities. Conclusions Yes, to take advantage of latte, you first need to study a bit of rune scripting. But once you’ve done that to some extent, especially having available examples, it becomes a powerful tool that is capable of covering various scenarios of different types.