Be Part of Something Big – Speak at Monster Scale Summit
Share your “extreme scale engineering” expertise with ~20K like-minded engineers. Whether you’re designing, implementing, or optimizing systems that are pushed to their limits, we’d love to hear about your most impressive achievements and lessons learned – at Monster Scale Summit 2026. Become a Monster Scale Summit Speaker

What’s Monster Scale Summit?

Monster Scale Summit is a technical conference that connects the community of people working on performance-sensitive, data-intensive applications. Engineers, architects, and SREs from gamechangers around the globe will be gathering virtually to explore “monster scale” challenges with respect to extreme levels of throughput, data, and global distribution. It’s a lot like P99 CONF (also hosted by ScyllaDB) – a two-day event that’s free, fully virtual, and highly interactive. The core difference is that it’s focused on extreme scale engineering vs. all things performance. Last time, we hosted industry giants like Kelsey Hightower, Martin Kleppmann, Discord, Slack, Canva… Browse past sessions

Details please!

When: March 11 + 12
Where: Wherever you’d like! It’s intentionally virtual, so you can present and interact with attendees from anywhere around the world.
Topics: Core topics include distributed databases, streaming and real-time processing, intriguing system designs, methods for balancing latency/concurrency/throughput, SRE techniques proven at scale, and infrastructure built for unprecedented demands.
What we’re looking for: We welcome a broad range of talks about tackling the challenges that arise in the most massive, demanding environments. The conference prioritizes technical talks sharing first-hand experiences. Sessions are just 18-20 minutes – so consider this your TED Talk debut! Share your ideas

Beyond Apache Cassandra
ScyllaDB is no longer “just” a faster Cassandra. In 2008, Apache Cassandra set a new standard for database scalability. Born to support Facebook’s Inbox Search, it has since been adopted by tech giants like Uber, Netflix, and Apple – where it’s run by experts who also serve as Cassandra contributors (alongside DataStax/IBM). And as its adoption scaled, Cassandra remained true to its core mission of scaling on commodity hardware with high availability. But what about performance? Simplicity? Efficiency? Elasticity? In 2015, ScyllaDB was born to go beyond Cassandra’s suboptimal resource utilization. Fresh from creating KVM and hacking the Linux kernel, the founders believed that their low-level engineering approach could squeeze considerably more power from the underlying infrastructure. The timing was ideal: just a year earlier, Netflix had published numbers showing how to push Apache Cassandra to 1 million write RPS. This was an impressive feat, but one that required significant infrastructure investments and tuning efforts. The idea was quite simple (in theory, at least): take Apache Cassandra’s scalable architecture and reimplement it close to the metal while keeping wire protocol compatibility. Not relying on Java meant less latency variability (plus no stop-the-world pauses), while a unique shard-per-core architecture maximized servers’ throughput even under heavy system load. To prevent contention, everything was made asynchronous, and all these optimizations were paired with autonomous internal schedulers for minimal operational overhead. That was 10 years ago. While I can’t speak to Cassandra’s current direction, ScyllaDB has evolved quite significantly since then – shifting from “just” a faster Cassandra alternative to a database with its own identity and unique feature set. Spoiler: In this video, I walk you through some key differences between ScyllaDB and Apache Cassandra. I discuss the differences in performance, elasticity, and capabilities such as workload prioritization. You can see how ScyllaDB maps data per CPU core, scales in parallel, and de-risks topology changes – allowing it to handle millions of OPS with predictable low latencies (and without constant tuning and babysitting).

ScyllaDB’s Evolution

The first generation of ScyllaDB was all about raw performance. That’s when we introduced the shard-per-core asynchronous architecture, row-based cache, and advanced schedulers that achieve predictable low latencies. ScyllaDB’s second generation aimed for feature parity with Cassandra, but we actually went beyond that. For example, we introduced materialized views and production-ready Global Secondary Indexes (something that Cassandra still flags as experimental). ScyllaDB also introduced support for local secondary indexes that same year; those were just introduced in Cassandra 5 (after at least three different indexing implementations). Moreover, our Paxos implementation for lightweight transactions eliminated much of the overhead and limitations in Cassandra’s implementation. The third generation marked our shift to the cloud, along with continued innovation. This is when ScyllaDB Alternator – our DynamoDB-compatible API – was introduced. We added support for ZSTD compression in 2020 (Cassandra only adopted it late in 2021). During this period, we dramatically improved repair speeds with row-level repair and introduced workload prioritization (more on this in the next section).
The fourth generation of ScyllaDB emerged around the time AWS announced their i3en instance family, with high-density nodes holding up to 60TB of data (something Cassandra still struggles to handle effectively). During this period, we introduced the Incremental Compaction Strategy (ICS), allowing users to utilize up to 70% of their storage before scaling out. This later evolved into a hybrid compaction strategy (and we now support 90% storage utilization). We also introduced Change Data Capture (CDC) with a fundamentally different approach from Cassandra’s. And we significantly extended the CQL protocol with concepts such as shard-awareness, BYPASS CACHE, per-query configurable TIMEOUTs, and much more. Finally, we arrive at the fifth generation of ScyllaDB, which is still unfolding. This phase represents our path toward strong consistency and elasticity with Raft and Tablets. For more about the significance of this, read on…

Capabilities That Set ScyllaDB Apart

Our engineers have introduced lots of interesting features over the past decade. Based on my interactions with former Cassandra users, I think these are the most interesting to discuss here.

Tablets Data Distribution

Each ScyllaDB table is split into smaller fragments (“tablets”) to evenly distribute data and load across the system. Tablets bring elasticity to ScyllaDB, allowing you to instantly double, triple, or even 10x your cluster size to accommodate unpredictable traffic surges. They also enable more efficient use of storage, reaching up to 90% utilization. Since teams can quickly scale out in response to traffic spikes, they can satisfy latency SLAs without needing to overprovision “just in case.”

Raft-based Strong Consistency for Metadata

Raft introduces strong consistency to ScyllaDB’s metadata. Gone are the days when a schema change could push your cluster into disagreement, or when you’d lose access because you forgot to update the replication factor of your authentication keyspace (issues that still plague Cassandra).

Workload Prioritization

Workload prioritization allows you to consolidate multiple workloads under a single cluster, each with its own SLA. Basically, it controls how different workloads compete for system resources. Teams use it to prioritize urgent application requests that require immediate response times over others that can tolerate slight delays (e.g., large scans). Common use cases include balancing real-time vs batch processing, splitting writes from reads, and workload/infrastructure consolidation.

Repair-based Operations

Repair-based operations ensure your cluster data stays in sync, even during topology changes. This addresses a long-standing data consistency flaw in Apache Cassandra, where operations like replacing failed nodes can result in data loss. ScyllaDB also fully eliminates the problem of data resurrection, thanks to repair-based tombstone garbage collection.

Incremental Compaction

Incremental compaction (ICS) has been the default compaction strategy in ScyllaDB for over five years. ICS greatly reduces temporary space amplification, resulting in more disk space being available for storing user data – and that eliminates the typical requirement of keeping 50% of your drive free. There is no comparable Cassandra feature. Cassandra just recently introduced Unified Compaction, which has yet to prove itself.

Row-based Cache

ScyllaDB’s row-based cache is also unique. It is enabled by default and requires no manual tuning.
With the BYPASS CACHE extension, you can prevent cache pollution by keeping important items from being invalidated. Additionally, SSTable index caching significantly reduces I/O access time when fetching data from disk.

Per-shard Concurrency Limits and Rate Limiters

ScyllaDB includes per-shard concurrency limits and per-partition rate limiters to protect against unexpected spikes. Whether dealing with a misbehaving client or a flood of requests to a specific key, ScyllaDB ensures resilience where Cassandra often falls short.

DynamoDB Compatibility

ScyllaDB also offers a DynamoDB-compatible layer, further distancing itself from its Apache Cassandra origins. This lets teams run their DynamoDB workloads on any cloud or on-prem – without code changes, and with 50% lower cost. This has helped quite a few teams consolidate multiple workloads on ScyllaDB.

What’s Next?

At the recent Monster SCALE Summit, CEO/co-founder Dor Laor shared a peek at what’s next for ScyllaDB. A few highlights…

Ready now (see this blog post and product page for details):
The ability to safely run at 90% storage utilization
Support for clusters with mixed instance type nodes
Dynamic provisioning and flex credit

Short-term:
Vector search
Strongly consistent tables
Fault injection service
Transparent repairs
Object and tiered storage
Raft for strongly consistent tables

Longer-term:
Multi-key transactions
Analytics and transformations with UDFs
Automated large partition balancing
Immutable infrastructure for greater stability and reliability
A replication mode for more flexible and efficient infrastructure changes

For details, watch the complete talk here:

To close, ScyllaDB is faster than Cassandra (I’ll share our latest benchmark results here soon). But both ScyllaDB and Cassandra have evolved to the point that ScyllaDB is no longer “just” a faster Cassandra. We’ve evolved beyond Cassandra. If your project needs more predictable performance – and/or could benefit from the elasticity, efficiency, and simplicity optimizations we’ve been focusing on for years now – you might also want to consider evolving beyond Cassandra.

We Built a Tool to Diagnose ScyllaDB Kubernetes Issues
Introducing Scylla Operator Analyze, a tool to help platform engineers and administrators deploy ScyllaDB clusters running on Kubernetes

Imagine it’s a Friday afternoon. Your company is migrating all the data to ScyllaDB and you’re in the middle of setting up the cluster on Kubernetes. Then, something goes wrong. Your time today is limited, but the sheer volume of ScyllaDB configuration feels endless. To help you detect problems in ScyllaDB deployments, we built Scylla Operator Analyze, a command-line tool designed to automatically analyze Kubernetes-based ScyllaDB clusters, identify potential misconfigurations, and offer actionable diagnostics. In modern infrastructure management, Kubernetes has revolutionized how we orchestrate containers and manage distributed systems. However, debugging complex Kubernetes deployments remains a significant challenge, especially in production-grade, high-performance environments like those powered by ScyllaDB. In this blog post, we’ll explain what Scylla Operator Analyze is, how it works, and how it may help platform engineers and administrators deploy ScyllaDB clusters running on Kubernetes. The repo we’ve been working on is available here. It’s a fork of Scylla Operator, but the project hasn’t been merged upstream (it’s highly experimental).

What is Scylla Operator Analyze?

Scylla Operator Analyze is a Go-based command-line utility that extends Scylla Operator by introducing a diagnostic command. Its goal is straightforward: automatically inspect a given Kubernetes deployment and report the problems it identifies in the deployment configuration. We designed the tool to help ScyllaDB’s technical support staff quickly diagnose known issues reported by our clients, both by providing solutions for simple problems and by offering helpful insights in more complex cases. However, it’s also freely available as a subcommand of the Scylla Operator binary. The next few sections share how we implemented the tool. If you want to go straight to example usage, skip to the Making a diagnosis section.

Capturing the cluster state

Kubernetes deployments consist of many components with various functions. Collectively, they are called resources. The Kubernetes API presents them to the client as objects containing fields with information about their configuration and current state.

Two modes of operation

Scylla Operator Analyze supports two ways of collecting this data:

Live Cluster Connection: The tool can connect directly to a Kubernetes cluster using the client-go API. Once connected, it retrieves data from Kubernetes resources and compiles it into an internal representation.

Archive-Based Analysis (Must-Gather): Alternatively, the tool can analyze archived cluster states created using a utility called must-gather. These archives contain YAML descriptions of resources, allowing offline analysis.
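The live-connection mode builds on the standard client-go API. The tool’s actual internal representation is richer, but conceptually the capture step boils down to something like this minimal sketch (the namespace and the set of resource kinds listed here are illustrative, not the tool’s real code):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Capture a slice of the cluster state: Pods and Services in the "scylla" namespace.
	ctx := context.Background()
	pods, err := client.CoreV1().Pods("scylla").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	svcs, err := client.CoreV1().Services("scylla").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d pods and %d services\n", len(pods.Items), len(svcs.Items))
}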
Diagnosis by analyzing symptoms

Symptoms are high-level objects representing certain issues that could occur while deploying a ScyllaDB cluster. A symptom contains the diagnosis of the problem and a suggestion on how to fix it, as well as a method for checking if the problem occurs in a given deployment (we cover this in the section about selectors). In order to create objects representing more complex problems, symptoms can be used to create tree-like structures. For example, a problem that could manifest itself in a few different ways could be represented by many symptoms checking for all the different spots the problem could affect. Those symptoms would be connected to one root symptom, describing the cause of the problem. This way, if any of the sub-symptoms report that their condition is met, the tool can display the root cause instead of one specific manifestation of that problem.

Example of a symptom and the workflow used to detect it.

In this example, let’s assume that the driver is unable to provide storage, but NodeConfig does not report a nonexistent device. When checking if the symptom occurs, the tool will perform the following steps:

Check if the NodeConfig reports a nonexistent device – no.
Check if the driver is unable to provide storage – yes. At this point we know the symptom occurs, so we don’t need to check for any more subsymptoms.

Since one of the subsymptoms occurs, the main symptom (NodeConfig configured with nonexistent volume) is reported to the user.
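As a rough illustration of that tree structure, here is a simplified sketch; the type and function names are hypothetical and are not the actual package API:

package symptoms

import "context"

// ClusterState is a placeholder for the captured deployment snapshot.
type ClusterState struct{ /* resources indexed by kind, name, etc. */ }

// Symptom describes one potential problem and knows how to check for it.
type Symptom struct {
	Name       string
	Diagnosis  string
	Suggestion string
	// Matches reports whether this symptom's condition holds for the captured state.
	Matches     func(ctx context.Context, state *ClusterState) (bool, error)
	SubSymptoms []*Symptom
}

// Evaluate walks the tree: if any sub-symptom matches, the root symptom is
// returned, so the user sees the root cause rather than one manifestation.
func Evaluate(ctx context.Context, root *Symptom, state *ClusterState) (*Symptom, error) {
	for _, sub := range root.SubSymptoms {
		matched, err := sub.Matches(ctx, state)
		if err != nil {
			return nil, err
		}
		if matched {
			return root, nil // report the root cause; no need to check further sub-symptoms
		}
	}
	if root.Matches != nil {
		matched, err := root.Matches(ctx, state)
		if err != nil {
			return nil, err
		}
		if matched {
			return root, nil
		}
	}
	return nil, nil // symptom not present in this deployment
}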
Deployment condition description

Resources

As described earlier, Kubernetes deployments can be considered collections of many interconnected resources. All resources are described using so-called fields. Fields contain information identifying resources, deployment configuration, and descriptions of past and current states. Together, these data give the controllers all the information they need to supervise the deployment. Because of that, they are very useful for debugging issues and are the main source of information for our tool. Resources’ fields contain a special kind field, which describes what the resource is and indicates what other fields are available. Some fundamental Kubernetes resource kinds include Pods, Services, etc. Those can also be extended with custom ones, such as the ScyllaCluster resource kind defined by the Scylla Operator. This provides the most basic kind of grouping of resources in Kubernetes. Other fields are grouped in sections: Metadata, which provides identifying information; Spec, which contains configuration; and Status, which contains the current state. Such a description in YAML format may look something like this:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-12-03T17:47:06Z"
  labels:
    scylla/cluster: scylla
    scylla/datacenter: us-east-1
    scylla/scylla-version: 6.2.0
  name: scylla-us-east-1-us-east-1a-0
  namespace: scylla
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-scylla-us-east-1-us-east-1a-0
status:
  conditions:
  - lastTransitionTime: "2024-12-03T17:47:06Z"
    message: '0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims.
      preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
Selectors

An accurate description of symptoms (presented in the previous section) requires a method for describing conditions in the deployment using information contained in the resources’ fields. Moreover, because of the distributed nature of both Kubernetes deployments and ScyllaDB, these descriptions must also specify how the resources are related to one another. Our tool comes with a package providing selectors. They offer a simple, yet powerful, way to describe deployment conditions using Kubernetes objects in a way that’s flexible and allows for automatic processing using the provided selection engine. A selector can be thought of as a query because it specifies the kinds of resources to select and criteria which they should satisfy.

Selectors are constructed using four main methods of the selector structure builder. First, the developer specifies resources to be selected with the Select method by specifying their kind and a predicate which should be true for the selected resources. The predicate is provided as a standard Go closure to allow for complex conditions if needed. Next, the developer may call the Relate method to define a relationship between two kinds of resources. This is again defined using a Go closure as a predicate, which must hold for the two objects to be considered in the same result set. This can establish a context within which an issue should be inspected (for example: connecting a Pod to relevant Storage resources). Finally, constraints for individual resources in the result set can be specified with the Where method, similarly to how it is done in the Select method. This method is mainly meant to be used with the SelectWithNil method. The SelectWithNil method is the same as the Select method; the only difference is that it allows returning a special nil value instead of a resource instance. This nil value signifies that no resources of a given kind match all the other resources in the resulting set. Thanks to this, selectors can also be used to detect a scenario where a resource is missing just by examining the context of related resources. An example selector — shortened for brevity — may look something like this:

selector.
	New().
	Select("scylla-pod", selector.Type[*v1.Pod](), func(p *v1.Pod) (bool, error) { /* ... */ }).
	SelectWithNil("storage-class", selector.Type[*storagev1.StorageClass](), nil).
	Select("pod-pvc", selector.Type[*v1.PersistentVolumeClaim](), nil).
	Relate("scylla-pod", "pod-pvc", func(p *v1.Pod, pvc *v1.PersistentVolumeClaim) (bool, error) {
		for _, volume := range p.Spec.Volumes {
			vPvc := volume.PersistentVolumeClaim
			if vPvc != nil && (*vPvc).ClaimName == pvc.Name {
				return true, nil
			}
		}
		return false, nil
	}).
	Relate("pod-pvc", "storage-class", /* ... */).
	Where("storage-class", func(sc *storagev1.StorageClass) (bool, error) {
		return sc == nil, nil
	})
In symptom definitions, selectors for a corresponding condition are used and are usually constructed alongside them. Such a selector provides a description of a faulty condition. This means that if there is a matching set of resources, it can be inferred that the symptom occurs. The selector can then be used, given all the deployment’s resources, to construct an iterator-like object that provides a list of all the sets of resources that match the selector. Symptoms can then use those results to detect issues and generate diagnoses containing useful debugging information.

Making a diagnosis

When a symptom relating to a problematic condition is detected, a diagnosis for a user is generated. Diagnoses are automatically generated report objects summarizing the problem and providing additional information. A diagnosis consists of an issue description, identifiers of resources related to the fault, and hints for the user (when available). Hints may contain, for example, a description of steps to remedy the issue or a reference to a bug tracker. In the final stage of analysis, those diagnoses are presented to the user, and the output may look something like this:

Diagnoses:
  scylladb-local-xfs StorageClass used by a ScyllaCluster is missing
  Suggestions: deploy scylladb-local-xfs StorageClass (or change StorageClass)
  Resources:
    GVK: /v1.PersistentVolumeClaim, scylla/data-scylla-us-east-1-us-east-1a-0 (4…)
    scylla.scylladb.com/v1.ScyllaCluster, scylla/scylla (b6343b79-4887-497b…)
    /v1.Pod, scylla/scylla-us-east-1-us-east-1a-0 (0e716c3f-6432-4eeb-b5ff-…)

Learn more

As we suggested, Kubernetes deployments of ScyllaDB involve many interacting components, each of which has its own quirks. Here are a few strategies to help in diagnosing the problems you encounter:

Run Scylla Doctor
Check our troubleshooting guide
Look for open issues on our GitHub
Check our forum
Ask us on Slack
Learn more about ScyllaDB at ScyllaDB University

Good luck, fellow troubleshooter!

Building easy-cass-mcp: An MCP Server for Cassandra Operations
I’ve started working on a new project that I’d like to share, easy-cass-mcp, an MCP (Model Context Protocol) server specifically designed to assist Apache Cassandra operators.
After spending over a decade optimizing Cassandra clusters in production environments, I’ve seen teams consistently struggle to interpret system metrics, configuration settings, and schema design, and most importantly, to understand how they all impact each other. While many teams have solid monitoring through JMX-based collectors, extracting and contextualizing specific operational metrics for troubleshooting or optimization can still be cumbersome. The good news is that we now have the infrastructure to make all this operational knowledge accessible through conversational AI.
How GE Healthcare Took DynamoDB on Prem for Its AI Platform
How GE Healthcare moved a DynamoDB‑powered AI platform to hospital data centers, without rewriting the app

How do you move a DynamoDB‑powered AI platform from AWS to hospital data centers without rewriting the app? That’s the challenge that Sandeep Lakshmipathy (Director of Engineering at GE Healthcare) decided to share with the ScyllaDB community a few years back. We noticed an uptick in people viewing this video recently, so we thought we’d share it here, in blog form. Watch or read, your choice.

Intro

Hi, I’m Sandeep Lakshmipathy, the Director of Engineering for the Edison AI group at GE Healthcare. I have about 20 years of experience in the software industry, working predominantly in product and platform development. For the last seven years I’ve been in the healthcare domain at GE, rolling out solutions for our products. Let me start by setting some context with respect to the healthcare challenges that we face today. Roughly 130M babies are born every year; about 350K every single day. There’s a 40% shortage of healthcare workers to help bring these babies into the world. Ultrasound scans help ensure the babies are healthy, but those scans are user‑dependent, repetitive, and manual. Plus, clinical training is often neglected. Why am I talking about this? Because AI solutions can really help in this specific use case and make a big difference. Now, consider this matrix of opportunities that AI presents. Every single tiny dot within each cell is an opportunity in itself. The newborn‑baby challenge I just highlighted is one tiny speck in this giant matrix. It shows what an infinite space this is, and how AI can address each challenge in a unique way. GE Healthcare is tackling these opportunities through a platform approach.

Edison AI Workbench (cloud)

We ingest data from many devices and customers: scanners, research networks, and more. Data is then annotated and used to train models. Once the models are trained, we deploy them onto devices. The Edison AI Workbench helps data scientists view and annotate data, train models, and package them for deployment. The whole Edison AI Workbench runs in AWS and uses AWS resources to provide a seamless experience to the data scientists and annotators who are building AI solutions for our customers.

Bringing Edison AI Workbench on‑prem

When we showed this solution to our research customers, they said, “Great, we really like the features and the tools….but can we have Edison AI Workbench on‑prem?” So, we started thinking: How do we take something that lives in the AWS cloud, uses all those resources, and relies heavily on AWS services – and move it onto an on‑prem server while still giving our research customers the same experience? That’s when we began exploring different options. Since DynamoDB was one of the main things tying us to the AWS cloud, we started looking for a way to replace it in the on‑prem world. After some research, we saw that ScyllaDB was a good DynamoDB replacement because it provides API compatibility with DynamoDB. Without changing much code and keeping all our interfaces the same, we migrated the Workbench to on‑prem and quickly delivered what our research customers asked for.
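To give a feel for what “without changing much code” means in practice, here is a minimal sketch of pointing an existing AWS SDK DynamoDB client at a ScyllaDB Alternator endpoint instead of AWS. The endpoint URL, credentials, and region below are placeholders, not GE Healthcare’s actual configuration:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	// The only change from a stock DynamoDB client is the custom endpoint.
	sess := session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-east-1"),                           // required by the SDK; not used for routing here
		Endpoint:    aws.String("http://scylla-alternator.local:8000"), // hypothetical on-prem Alternator URL
		Credentials: credentials.NewStaticCredentials("alternator", "secret", ""), // placeholder credentials
	}))
	db := dynamodb.New(sess)

	// Existing DynamoDB calls keep working unchanged, e.g. listing tables:
	out, err := db.ListTables(&dynamodb.ListTablesInput{})
	if err != nil {
		panic(err)
	}
	fmt.Println("tables:", aws.StringValueSlice(out.TableNames))
}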
Why ScyllaDB Alternator (DynamoDB-Compatible API)?

Moving cloud assets on‑prem is not trivial; expertise, time‑to‑market, service parity, and scalability all matter. We also wanted to keep our release cycles short: in the cloud we can push features every sprint; on‑prem, we still need regular updates. Keeping the database layer similar across cloud and on‑prem minimized rework. Quick proofs of concept confirmed that ScyllaDB + Alternator met our needs, and using Kubernetes on‑prem let us port microservices comfortably. The ScyllaDB team has always been available with respect to developer‑level interactions, quick fixes in nightly builds, and constant touch‑points with technical and marketing teams. All of this helped us move fast. For example, DynamoDB Streams wasn’t yet in ScyllaDB when we adopted it (back in 2020), but the team provided work‑arounds until the feature became available. They also worked with us on licensing to match our needs. This partnership was crucial to the solution’s evolution. By partnering with the ScyllaDB team, we could take a cloud‑native Workbench to our on‑prem research customers in healthcare.

Final thoughts

Any AI solution rollout depends on having the right data volume and balance. It’s all the annotations that drive model quality. Otherwise, the model will be brittle, and it won’t have the necessary diversity. Supporting all these on‑prem Workbench use cases helps because it takes the tools to where the data is. The cloud workbench handles data in the cloud data lake. But at the same time, our research customers who are partnering with us can use this on-prem, taking the tools to where the data is: in their hospital network.

Real-Time Database Read Heavy Workloads: Considerations and Best Practices
Explore the challenges associated with real-time read-heavy database workloads and get tips for addressing them

Reading and writing are distinctly different beasts. This is true with reading/writing words, reading/writing code, and also when we’re talking about reading/writing data to a database. So, when it comes to optimizing database performance, your read:write ratio really does matter. We recently wrote about performance considerations that are important for write-heavy workloads – covering factors like LSM tree vs B-tree engines, payload size, compression, compaction, and batching. But read-heavy database workloads bring a different set of challenges; for example:

Scaling a cache: Many teams try to speed up reads by adding a cache in front of their database, but the cost and complexity can become prohibitive as the workload grows.
Competing workloads: Things might work well initially, but as new use cases are added, a single workload can end up bottlenecking all the others.
Constant change: As your dataset grows or user behaviors shift, hotspots might surface.

In this article, we explore high-level considerations to keep in mind when you have a latency-sensitive read-heavy workload. Then, we’ll introduce a few ScyllaDB capabilities and best practices that are particularly helpful for read-heavy workloads.

What Do We Mean by “a Real-Time Read Heavy Workload”?

First, let’s clarify what we mean by a “real-time read-heavy” workload. We’re talking about workloads that:

Involve a large amount of sustained traffic (e.g., over 50K OPS)
Involve more reads than writes
Are bound by strict latency SLAs (e.g., single digit millisecond P99 latency)

Here are a few examples of how they manifest themselves in the wild:

Betting: Everyone betting on a given event is constantly checking individual player, team, and game stats as the match progresses.
Social networks: A small subset of people are actually posting new content, while the vast majority of users are typically just browsing through their feeds and timelines.
Product Catalogs: As with social media, there’s a lot more browsing than actual updating.

Considerations

Next, let’s look at key considerations that impact read performance in real-time database systems.

The Database’s Read Path

To understand how databases like ScyllaDB process read operations, let’s recap the read path. When you submit a read (a SELECT statement), the database first checks for the requested data in memtables, which are in-memory data structures that temporarily hold your recent writes. Additionally, the database checks whether the data is present in the cache. Why is this extra step necessary? Because the memtable may not always hold the latest data. Sometimes data is written out of order, especially if applications consume data from unordered sources. Since the protocol allows clients to manipulate record timestamps (which can break ordering), checking both the memtable and the cache is necessary to ensure that the latest write gets returned. Then, the database takes one of two actions: If the data is stored on disk, the database populates the cache to speed up subsequent reads. If the data doesn’t exist on disk, the database notes this absence in the cache – avoiding further unnecessary lookups there. As memtables flush to disk, the data also gets merged with the cache. That way, the cache ends up reflecting the latest on-disk data.

Hot vs. Cold Reads

Reading from cache is always faster than reading from disk. The more data your database can serve directly from cache, the better its performance (since reading data from memory has a practically unlimited fetch ceiling). But how can you tell whether your reads are going to cache or disk? Monitoring. You can use tools such as the ScyllaDB Monitoring stack to learn all about your cache hits and misses. The fewer cache misses, the better your read latencies. ScyllaDB uses a Least Recently Used (LRU) caching strategy, similar to Redis and Memcached. When the cache gets full, the least-accessed data is evicted to make room for new entries. With this LRU approach, you need to be mindful about your reads. You want to avoid situations where a few “expensive” reads end up evicting important items from your cache. If you don’t optimize cache usage, you might encounter a phenomenon called “cache thrashing.” That’s what happens when you’re continuously evicting and replacing items in your cache, essentially rendering the cache ineffective. For instance, full table scans can create significant cache pressure, particularly when your working set size is larger than your available caching space. During a scan, if a competing workload relies on reading frequently cached data, its read latency will momentarily increase because those items were evicted. To prevent this situation, expensive reads should specify options like ScyllaDB’s BYPASS CACHE to prevent their results from evicting important items.
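For example, here is a minimal sketch of issuing such a scan with the Go gocql driver; the cluster address, keyspace, and table are hypothetical, and the BYPASS CACHE clause is the ScyllaDB CQL extension mentioned above for keeping a one-off scan from polluting the row cache:

package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Hypothetical cluster, keyspace, and table names, for illustration only.
	cluster := gocql.NewCluster("scylla-node1")
	cluster.Keyspace = "catalog"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// A full scan used by an analytics job: BYPASS CACHE tells ScyllaDB not to
	// populate the row cache with this result set, so latency-sensitive point
	// reads keep their hot working set in memory.
	iter := session.Query(`SELECT id, price FROM products BYPASS CACHE`).Iter()
	var id string
	var price float64
	for iter.Scan(&id, &price) {
		_ = price // process the row
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("scan complete")
}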
Paging

Paging is another important factor to consider. It’s designed to prevent the database from running out of memory when scanning through large results. Basically, rows get split into pages as defined by your page size, and selecting an appropriate page size is essential for minimizing end-to-end latency. For example, assume you have a quorum read request in a 3-node cluster. Two replicas must respond for the request to be successful. Each replica computes a single page, which then gets reconciled by the coordinator before returning data back to the client. Note that:

ScyllaDB latencies are reported per page. If your application latencies are high, but latencies are low on the database side, it is an indication that your clients may often be paging.
Smaller page sizes increase the number of client-server round trips. For example, retrieving 1,000 rows with a page size of 10 requires 100 client-server round trips, impacting latency.

Testing various page sizes helps find the optimal balance. Most drivers default to 5,000 rows per page, which works well in most cases, but you may want to increase from the defaults when scanning through wide rows, or during full scans – at the expense of letting the database work more before receiving a response. Sometimes trial and error is needed to get the page size nicely tuned for your application.
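As a small illustration, this is how an explicit page size might be set with the gocql driver; the table, columns, and page size value are illustrative, and the session is assumed to be created as in the earlier sketch:

package examples

import (
	"time"

	"github.com/gocql/gocql"
)

// scanUserEvents reads one user's events with an explicit page size.
func scanUserEvents(session *gocql.Session, userID gocql.UUID) error {
	iter := session.Query(`SELECT event, ts FROM events WHERE user_id = ?`, userID).
		PageSize(2000). // override the driver default; each page is one client-server round trip
		Iter()

	var event string
	var ts time.Time
	for iter.Scan(&event, &ts) {
		// Process one row; the driver transparently fetches the next page
		// as the loop drains the current one.
		_ = event
	}
	return iter.Close()
}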
Tombstones

In Log-Structured Merge-tree (LSM-tree) databases like ScyllaDB, handling tombstones (markers for deleted data) is also important for read performance. Tombstones ensure that deletions are properly propagated across replicas to avoid deleted data from being “resurrected.” They’re critical for maintaining correctness. However, read-heavy workloads with frequent deletions may have to process lots of tombstones to return a single page of live data. This can really impact latency. Consider an extreme example: tracing data shows that a simple SELECT query took a whopping 6 seconds to process a single row because it had to go through 10 million tombstones. There are a couple of ways to avoid this: tuning compaction strategies, such as the more aggressive LeveledCompactionStrategy or the ICS Space Amplification Goal, or optimizing your access patterns to scan through fewer dead rows on every point query.

Optimizing Read-Heavy Workloads with ScyllaDB

While ScyllaDB’s LSM tree storage engine makes it quite well-suited for write-heavy workloads, our engineers have introduced a variety of features that optimize it for ultra-low latency reads as well.

ScyllaDB Cache

One of ScyllaDB’s key components for achieving low latency is its unique caching mechanism. Many databases rely on the operating system’s page cache, which can be inefficient and doesn’t provide the level of control needed for predictable low latency. The OS cache lacks workload-specific context, making it difficult to prioritize which items should remain in memory and which can be safely evicted. At ScyllaDB, our engineering team addressed this by implementing our own unified internal cache. When ScyllaDB starts, it locks most of the server’s memory and directly manages it, bypassing the OS cache. Additionally, ScyllaDB’s cache uses a shared-nothing approach, giving each shard/vCPU its own cache, memtable, and SSTable. This eliminates the need for concurrency locks and reduces context switching, further maximizing performance. You can read more about that unified cache in this engineering blog post.

SSTable Index Caching

Another performance-focused feature of ScyllaDB is its ability to cache SSTable indexes in memory. Since working sets often exceed the memory available, reads sometimes go to disk. However, disk access is costly. By caching SSTable indexes, ScyllaDB reduces disk IO costs by up to 3x. This significantly improves read performance – particularly during cache misses. ScyllaDB’s index caching is demand-driven: entries are cached upon access and evicted on demand. If your workload reads heavily from disk, it’s often helpful to increase the size of this index cache.

Workload Prioritization

Competing workloads can lead to latency issues, as we mentioned at the beginning of this article. ScyllaDB provides a solution for this: its Workload Prioritization feature, which allows you to assign priority levels to different workloads. This is particularly useful if you have workloads with varying latency requirements, as it lets you prioritize latency-sensitive queries over others. You assign service levels to each workload, then ScyllaDB’s internal scheduler handles query prioritization according to those predefined levels. To learn more, see my recent talk from ScyllaDB Summit.

Heat-Weighted Load Balancing (HWLB)

Heat-Weighted Load Balancing (HWLB) is a powerful ScyllaDB feature that’s commonly overlooked. HWLB mitigates performance issues that can arise when a replica node restarts with a cold cache, like after a rolling restart for a configuration change or an upgrade. In such cases, other nodes notice that the replica’s cache is cold and gradually start directing requests to the restarted node until its cache eventually warms up. The HWLB algorithm controls how requests are routed to a cold replica, pacing the requests sent to a node as it warms up. HWLB ensures that nodes with a cold cache do not immediately receive full traffic, in turn preventing abrupt latency spikes.
When restarting ScyllaDB replicas, pay attention to the Reciprocal Miss Rate (HWLB) panel within the ScyllaDB Monitoring stack. Nodes with a higher ratio will serve more reads compared to other nodes.

Prepared statements with ScyllaDB’s shard-aware drivers

On the client side, using prepared statements is a critical best practice. A prepared statement is a query parsed by ScyllaDB and then saved for later use. Prepared statements allow ScyllaDB to route queries directly to replica nodes and shards that hold the requested data. Without prepared statements, a query may be routed to a node without the required data – resulting in extra round trips. With prepared statements, queries are always routed efficiently, minimizing network overhead and improving response times. Try it out: This ScyllaDB University lesson walks you through prepared statements.

High concurrency

Perhaps the most important tip here is to remember that ScyllaDB loves concurrency… but only up to a certain point. If you send too few requests to the database, you won’t be able to fully maximize its potential. However, if you have unbounded concurrency – you send too many requests to the database – that excessive concurrency can cause performance degradation. To find the sweet spot, apply this formula: Concurrency = Throughput × Latency. For example, if you want to run 200K operations per second with an average latency of 1ms, you would aim for a concurrency level of 200. Using this calculation, adjust your driver settings – setting the number of connections and maximum in-flight requests per connection to meet your target concurrency. If your driver settings yield a concurrency higher than needed, reduce them. If it’s lower, increase them accordingly.
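To make the sizing math concrete, here is a small Go sketch that derives a concurrency target from the formula above and uses a buffered channel as a semaphore to keep in-flight requests bounded. The numbers and the doQuery helper are illustrative, not a recommendation for any specific workload:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Concurrency = Throughput × Latency
	targetThroughput := 200_000.0           // ops/sec we want to sustain
	expectedLatency := 1 * time.Millisecond // average per-operation latency
	concurrency := int(targetThroughput * expectedLatency.Seconds()) // = 200

	fmt.Println("target concurrency:", concurrency)

	// Bound in-flight requests with a semaphore so concurrency stays at the
	// target instead of growing without limit under load.
	sem := make(chan struct{}, concurrency)
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		sem <- struct{}{} // acquire a slot
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			doQuery(n)
		}(i)
	}
	wg.Wait()
}

// doQuery is a stand-in for issuing one request to the database.
func doQuery(n int) {
	time.Sleep(time.Millisecond)
}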
Wrapping Up

As we’ve discussed, there are a lot of ways you can keep latencies low with read-heavy workloads – even on databases such as ScyllaDB that are also optimized for write-heavy workloads. In fact, ScyllaDB performance is comparable to dedicated caching solutions like Memcached for certain workloads. If you want to learn more, here are some firsthand perspectives from teams who tackled some interesting read-heavy challenges:

Discord: With millions of users actively reading and searching chat history, Discord needs ultra-low-latency reads and high throughput to maintain real-time interactions at scale.
Epic Games: To support Unreal Engine Cloud, Epic Games needed a high-speed, scalable metadata store that could handle rapid cache invalidation and support metadata storage for game assets.
ZeroFlucs: To power their sports betting application, ZeroFlucs had to process requests in near real-time, constantly, and in a region local to both the customer and the data.

Also, take a look at the following video, where we go into even greater depth on these read-heavy challenges and also walk you through what these workloads look like on ScyllaDB.

easy-cass-stress Joins the Apache Cassandra Project

I’m taking a quick break from my series on Cassandra node density to share some news with the Cassandra community: easy-cass-stress has officially been donated to the Apache Software Foundation and is now part of the Apache Cassandra project ecosystem as cassandra-easy-stress.
Why This Matters
Over the past decade, I’ve worked with countless teams struggling with Cassandra performance testing and benchmarking. The reality is that stress testing distributed systems requires tools that can accurately simulate real-world workloads. Many tools make this difficult by requiring the end user to learn complex configurations and nuances. While consulting at The Last Pickle, I set out to create an easy-to-use tool that lets people get up and running in just a few minutes.
Alan Shimel and Dor Laor on Database Elasticity, ScyllaDB X Cloud
Alan and Dor chat about elasticity, 90% storage utilization, powering feature stores and other AI use cases, ScyllaDB’s upcoming vector search release, and much more

ScyllaDB recently announced ScyllaDB X Cloud: a truly elastic database that supports variable/unpredictable workloads with consistent low latency, plus low costs. To explore what’s new and how X Cloud impacts development teams, DevOps luminary Alan Shimel recently connected with ScyllaDB Co-founder and CEO Dor Laor for a TechStrongTV interview. Alan and Dor chatted about database elasticity, 90% storage utilization, powering ML feature stores and other AI use cases, ScyllaDB’s upcoming vector search release, and much more. If you prefer to read rather than watch, here’s a transcript (lightly edited for brevity and clarity) of the core conversation.

What is ScyllaDB X Cloud

Alan: Dor, you guys recently announced ScyllaDB X Cloud. Tell us about it.

Dor: ScyllaDB is available as a self-managed database (ScyllaDB Enterprise) and also as a fully managed database, a service in the cloud (ScyllaDB Cloud). Recently, we released a new version called ScyllaDB X Cloud. What’s special about it? It allows us to be the most elastic database on the market. Why would you need elasticity? At first, teams look for a database that offers performance and the high availability needed for mission-critical use cases. Afterwards, when they start using it, they might scale to a very high extent, and that often costs quite a bit of money. Sometimes your peak level varies and the usage varies. There are cases where a Black Friday or similar event occurs, and then you need to scale your deployments. This is predictable scale. There’s also unpredictable scale: sometimes you have a really good success, traffic ends up surging, and you need to scale the database to service it. In many cases, elasticity is needed daily. Some sites have predictable traffic patterns: it increases when people wake up, traffic subsides for a while, then there’s another peak later. If you provision 100% for the peak load, you’re wasting resources through a lot of the day. However, with traditional databases, keeping your capacity aligned with your actual traffic requires constantly meddling with the number of servers and infrastructure. Many users have petabyte-sized deployments, and moving around petabytes within several hours is an extremely difficult task. X Cloud lets you move data fast. Teams can double or quadruple throughput within minutes. That’s the value of X Cloud. See ScyllaDB scaling in this short demo:

Elastic Scaling

Alan: You know, in many ways, this whole idea of elasticity feels like déjà vu. I was told this about the cloud in general, right? That was one of the things about being in the cloud: burstable. But in the cloud, it proved to be like a balloon. If you blow up a balloon it stretches out – but then you let the air out, and the balloon is still stretched out. And for so many people, cloud elasticity is just one way. You can always make it bigger, but how many people really do make it smaller? Is this something that’s also true in ScyllaDB? Is it truly that elastic?

Dor: Everything you described is absolutely true. Many times, people just keep on adding more and more data. If you just add and add but you don’t delete it, we have different capabilities to help with that. There are normally two reasons why you’d want to scale out and why you’d like to scale in. Number one is storage. So if your storage grows, you will scale out and add resources.
If you delete data, it will automatically scale back in again. And our current release has two unique things to address that. First, it has autoscale with storage. The storage automatically scales, and our compute is bundled together with the storage. I’ll tell you a secret: at the end of the day, we run servers, and each server has memory and disk and networking and compute all together. That’s one of the reasons why we have really good performance. We bundle compute and storage. So if the amount of data people store decreases, we automatically decrease the number of servers, and we do it with 90% utilization. So the server, the disk capacity, can go up to 90% of the shared cluster infrastructure. It’s a very high percentage. Previously, we went from 50% to 70%. Now, because we’re very elastic, and we can move data very fast, we can go up to 90%. The scaling is all automated. When you go to 90%, we automatically provision more servers. If you go below 90% (there is some threshold like 85%), then we automatically decrease the number of servers. And we also do it in a very precise way. For example, assume you have three gigantic servers – we support servers up to 256 CPUs. If you run out of space, you could add another gigantic server, but then your utilization will suddenly be very low. So it’s not ideal to add a big server if you just need another 1% or 2% of utilization. X Cloud automatically selects the server size for you, and we mix server sizes. So if you only need an extra 5% along with these very big servers, we’re going to add a small, tiny server with two vCPUs next to those other big servers – and we will keep replacing it automatically for you, without you having to worry about it. That translates to value. The total cost of ownership will be low because we are targeting 90% utilization of this capacity. The other reason why you’d want to scale out is sometimes throughput and CPU consumption. This is even easier because we need to move less storage. We let our customers scale in and out multiple times an hour.

ScyllaDB and AI, Vector Search

Alan: Got it, that’s great, very in-depth. Thank you very much for that. You know, Dor, all the news today is “AI agentic, AI generative, AI LLMs…” Underlying all of this, though, is data. They’re training on data, storing data, using data…How has this affected ScyllaDB’s business?

Dor: It definitely drives our business. We have three main pillars related to AI. Number one, there’s traditional machine learning. ScyllaDB is used for machine learning feature stores, for real-time personalization and customization. For example, Tripadvisor is a customer of ours. They use ScyllaDB for a feature store that helps find the best recommendations, deals, and advice for their users. Number two is for AI use cases that need large, scalable storage underneath. For example, a major vehicle vendor is using ScyllaDB to train their AI model for autonomous self-driving cars. Lots of AI use cases need to store and access massive amounts of data, fast. Third is vector search, which is a core component of RAG and many agentic AI pipelines. ScyllaDB will release a vector search product by the end of the year – right now, it’s in closed beta.

Extreme Automation

Alan: You heard it here! I just want to make sure we hit the main points of the X Cloud offering. With all of these things that you’ve mentioned already, really what are the results? You’ve improved compression, improved streaming, helping to reduce storage and cloud costs…and network too, right?
That’s an important piece of the equation as well. The other thing I want to emphasize for our audience is that all this is offered as a “database as a service,” so you don’t have to worry about your infrastructure.

Dor: Absolutely. We have a high amount of automation that acts on behalf of the end user. This isn’t about us doing manual operations on your behalf. The elasticity is all automated and natural, like the cluster is breathing. Users just set the scaling policies and the cluster will take it from that point onward. Everything runs under the hood, including backups. For example, imagine a customer who runs at 89% utilization and a backup brings them temporarily beyond 90% capacity. The second it crosses the 90% trigger, we will automatically provision more servers to the cluster. Once the backup is complete and the image backup snapshot is loaded to S3, then the image will be deleted, the cluster will go back below 90% utilization, and we can automatically decrease the number of servers. Everything is automated. It’s really fascinating to see all of those use cases run under the hood.

Engineering Deep Dive: ScyllaDB X Cloud Interview with CTO Avi Kivity

Want to learn more? ScyllaDB Co-Founder/CTO Avi Kivity recently discussed the design decisions behind ScyllaDB X Cloud’s elasticity and efficiency. Watch Now

Azure fault domains vs availability zones: Achieving zero downtime migrations
Among the challenges of operating production-ready enterprise systems in the cloud is ensuring that applications remain up to date, secure, and able to benefit from the latest features. This can include operating system or application version upgrades, but it also extends to advancements in cloud provider offerings and the retirement of older ones. Recently, NetApp Instaclustr undertook a migration activity moving (almost) all our Azure fault domain customers to availability zones, and from Basic SKU to Standard SKU IP addresses.
Understanding Azure fault domains vs availability zones
The distinction between Azure fault domains and availability zones is critical for ensuring high availability and fault tolerance. Fault domains offer physical separation within a data center, while availability zones expand on this by distributing workloads across data centers within a region. This enhances resiliency against failures, making availability zones a clear step forward.
The need for migrating from fault domains to availability zones
NetApp Instaclustr has supported Azure as a cloud provider for our Managed open source offerings since 2016. Originally this offering was distributed across fault domains to ensure high availability, using “Basic SKU public IP addresses”; however, this solution had some drawbacks when performing particular types of maintenance. Once Azure released availability zones in several regions, we extended our Azure support to them. Availability zones have a number of benefits, including more explicit placement of additional resources, and we leveraged “Standard SKU public IPs” as part of this deployment.
When we introduced availability zones, we encouraged customers to provision new workloads in them. We also supported migrating existing workloads to availability zones, but we had not pushed existing deployments to migrate. This was initially due to the limited number of regions that supported availability zones.
In early 2024, we were notified that Azure would be retiring support for Basic SKU public IP addresses in September 2025. Notably, no new Basic SKU public IPs would be created after March 1, 2025. For us and our customers, this had the potential to impact cluster availability and stability – as we would be unable to add nodes, and some replacement operations would fail.
Very quickly we identified that we needed to migrate all customer deployments from Basic SKU to Standard SKU public IPs. Unfortunately, this operation involves node-level downtime, as we need to stop each individual virtual machine, detach the IP address, upgrade the IP address to the new SKU, and then reattach it and start the instance. For customers who are operating their applications in line with our recommendations, node-level downtime does not have an impact on overall application availability; however, it can increase strain on the remaining nodes.
Given that we needed to perform this potentially disruptive maintenance by a specific date, we decided to evaluate the migration of existing customers to Azure availability zones.
Key migration considerations for Cassandra clusters
As with any migration, we wanted to perform this with zero application downtime, minimal additional infrastructure cost, and as safely as possible. For some customers, we also needed to ensure that we did not change the contact IP addresses of the deployment, as this may require application updates on their side. We quickly worked out several ways to achieve this migration, each with its own set of pros and cons.
For our Cassandra customers, our go-to method for changing cluster topology is a data center migration. This is our zero-downtime migration method that we have completed hundreds of times and have vast experience in executing. The benefit here is that we can be extremely confident of application uptime through the entire operation, and confident in the ability to pause and reverse the migration if issues are encountered. The major drawback to a data center migration is the increased infrastructure cost during the migration period – you effectively need to have both your source and destination data centers running simultaneously throughout the operation. The other item of note is that you will need to update your cluster contact points to the new data center.
For clusters running other applications, or customers who are more cost conscious, we evaluated doing a “node by node” migration from Basic SKU IP addresses in fault domains to Standard SKU IP addresses in availability zones. This does not have any short-term increased infrastructure cost; however, the upgrade from Basic SKU public IP to Standard SKU is irreversible, and different types of public IPs cannot coexist within the same fault domain. Additionally, this method comes with reduced rollback abilities. Therefore, we needed to devise a plan to minimize risks for our customers and ensure a seamless migration.
Developing a zero-downtime node-by-node migration strategy
To achieve a zero-downtime “node by node” migration, we explored several options, one of which involved building tooling to migrate the instances within the cloud provider while preserving all existing configurations. The tooling automates the migration process as follows:
- Begin by stopping the first VM in the cluster. For cluster availability, ensure that only one VM is stopped at any time.
- Create an OS disk snapshot and verify its success, then do the same for the data disks.
- Ensure all snapshots are created, then generate new disks from the snapshots.
- Create a new network interface card (NIC) and confirm its status is green.
- Create a new VM and attach the disks, confirming that the new VM is up and running.
- Update the private IP address and verify the change.
- Upgrade the public IP SKU, making sure this operation is successful.
- Reattach the public IP to the VM.
- Start the VM.
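Stitched together, the orchestration for one node might look roughly like the sketch below. The helper functions are hypothetical stand-ins for the underlying Azure operations, not our actual tooling; the point is the strict ordering and the one-node-at-a-time constraint.

package main

import (
	"fmt"
	"log"
)

// Hypothetical placeholders for the underlying Azure operations.
func stopVM(node string) error                            { fmt.Println("stopping", node); return nil }
func snapshotDisks(node string) ([]string, error)         { return []string{node + "-os", node + "-data"}, nil }
func disksFromSnapshots(snaps []string) ([]string, error) { return snaps, nil }
func createNIC(node string) error                         { return nil }
func createVMWithDisks(node string, disks []string) error { return nil }
func updatePrivateIP(node string) error                   { return nil }
func upgradePublicIPToStandardSKU(node string) error      { return nil }
func reattachPublicIPAndStartVM(node string) error        { return nil }

// migrateNode runs the per-node steps in order. The caller processes nodes
// sequentially so that only one VM in the cluster is down at any time.
func migrateNode(node string) error {
	if err := stopVM(node); err != nil {
		return err
	}
	snaps, err := snapshotDisks(node)
	if err != nil {
		return err
	}
	disks, err := disksFromSnapshots(snaps)
	if err != nil {
		return err
	}
	if err := createNIC(node); err != nil {
		return err
	}
	if err := createVMWithDisks(node, disks); err != nil {
		return err
	}
	if err := updatePrivateIP(node); err != nil {
		return err
	}
	if err := upgradePublicIPToStandardSKU(node); err != nil {
		return err
	}
	return reattachPublicIPAndStartVM(node)
}

func main() {
	for _, node := range []string{"node-1", "node-2", "node-3"} {
		if err := migrateNode(node); err != nil {
			log.Fatalf("migration halted at %s: %v", node, err)
		}
	}
}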
Even though the disks are created from snapshots of the original disks, our testing surfaced several discrepancies between the settings of the original VM and the new VM. For instance, certain configurations, such as caching policies, did not automatically carry over, requiring manual adjustments to align with our managed standards.
Recognizing these challenges, we decided to extend our existing node replacement mechanism to streamline the migration process. With this approach, a new instance is provisioned with a fresh OS disk while keeping the same IP and application data. The new node is configured by the Instaclustr Managed Platform to be the same as the original node.
The next challenge: our existing solution was built so that the replacement node was provisioned to be exactly the same as the original. For this operation, however, we needed the new node to be placed in an availability zone rather than the original fault domain. This required us to extend the replacement operation so that, when triggered, the new node was placed in the desired availability zone. Once this work was complete, we had a replacement tool that ensured the new instance was correctly provisioned in the availability zone, with a Standard SKU, and without data loss.
Now that we had two very viable options, we went back to our existing Azure customers to outline the problem space and the operations that needed to be completed. We worked with all impacted customers to determine the best migration path for their specific use case or application and to agree on the best time to complete the migration. Where possible, we first performed the migration on any test or QA environments before moving on to production environments.
Collaborative customer migration success
Some of our Cassandra customers opted to perform the migration using our data center migration path, but most opted for the node-by-node method. We successfully migrated the existing Azure fault domain clusters over to the targeted availability zones, with only a very small number of clusters remaining. Those clusters operate in Azure regions that do not yet support availability zones, but we were able to upgrade their public IPs from the Basic SKUs slated for retirement to Standard SKUs.
No matter which provider you use, the pace of development in cloud computing can require significant effort to keep up with ongoing maintenance and to adopt new features. For business-critical applications, being able to migrate to new infrastructure and leverage these opportunities, while understanding the limitations and the impact they have on other services, is essential.
NetApp Instaclustr has deep experience supporting business-critical applications in the cloud. You can read more about another large-scale migration we completed, The World’s Largest Apache Kafka and Apache Cassandra Migration, or head over to our console for a free trial of the Instaclustr Managed Platform.
Blowing Up Your DynamoDB Bill
Why real-world DynamoDB usage scenarios often lead to unexpected expenses
In my last post on DynamoDB costs, I covered how unpredictable workloads lead to unpredictable costs in DynamoDB. Now let’s go deeper. Once you’ve understood the basics, like the nutty 7.5x inflation of on-demand compared to reserved, or the excessive costs around item size, replication and caching … you’ll realize that DynamoDB costs aren’t just about read/write volume – it’s a lot more nuanced in the real world.
Round ‘em up!
A million writes per second at 100 bytes isn’t in the same galaxy as a million writes at 5KB. Why? Because DynamoDB meters out the costs in rounded-up 1KB chunks for writes (and 4KB for reads). Writing a 1.2KB item? You’re billed for 2KB. Reading a 4.5KB item with strong consistency? You get charged for 8KB. You’re not just paying for what you use, you’re paying for rounding up. Remember the character in Superman III who skims ½ a cent from each paycheck? It’s the same deal (and yes, $85,789.90 was a lot of money in 1983) … Wasted capacity is unavoidable at scale, but it becomes very real, very fast when you cross that boundary on every single operation. And don’t forget that hard cap of 400KB per item. That’s not a pricing issue directly, but it’s something that has motivated DynamoDB customers to look at alternatives.
Our DynamoDB cost calculator lets you model all of this. What it doesn’t account for are some of the real-world landmines – like the fact that a conflict-resolved write (such as concurrent updates in multiple regions) still costs you for each attempt, even if only the last write wins. Or when you build your own TTL expiration logic, maybe pulling a bunch of items in a scan, checking timestamps in app code, or issuing deletes. All that data transfer and (replicated) write/delete activity adds up fast … even though you’re trying to “clean up.” We discussed these tricky situations in detail in a recent DynamoDB costs webinar, which you can now watch on-demand.
Global tables are a global pain
So you want low latency for users worldwide? Global tables are the easiest way to do that. Some might even say that it’s “batteries-included.” But those batteries come with a huge price tag. Every write gets duplicated across additional regions. Write a 3.5KB item and replicate it to 4 regions? Now you’re paying for 4 x 4KB (rounded up, of course). Don’t forget to tack on inter-region network transfer. That’s another hit at premium pricing. And sorry, you cannot reserve those replicated writes either. You’re paying for that speed, several times over, and the bill scales linearly with your regional growth.
It gets worse when multiple regions write to the same item concurrently. DynamoDB resolves the conflict (last write wins) but you still pay for every attempt. Losing writes? Still charged. Our cost calculator lets you model all this. We use conservative prices for US-East, but the more exotic the region, the more likely the costs will be higher. As an Australian, I feel your pain. So have a think about that batteries-included global tables replication cost, and please remember, it’s per table!
DAX caching with a catch
Now do you want even tighter read latency, especially for your latency-sensitive P99? DynamoDB Accelerator (DAX) helps, but it adds overhead, both operational and financial. Clusters need to be sized right, hit ratios tuned, and failover cases handled in your application. Miss the cache, pay for the read. Fail to update the cache, risk stale data.
Even after you have tuned it, it’s not free. DAX instances are billed by the hour, at a flat rate, and once again, without reserved instance options like you might be accustomed to. Our DynamoDB cost calculator lets you simulate cache hit ratios, data set sizes, instance types and nodes. It won’t predict cache efficiency, but it will help you catch those cache gotchas.
Multi-million-dollar recommendation engine
A large streaming service built a global recommendation engine with DynamoDB. Daily batch jobs generate fresh recommendations and write them to a 1PB single table, replicated across 6 regions. They optimized for latency and local writes. The cost? Every write to the base table plus 5 replicated writes. Every user interaction triggered a write (watch history, feedback, preferences). And thanks to that daily refresh cycle, they were rewriting the table – whether or not anything changed. They used provisioned capacity, scaling up for anticipated traffic spikes, but still struggled with latency. Cache hit rates were too low to make Redis or DAX cost-effective. The result? The base workload alone cost tens of millions per year, and the total doubled after accommodating traffic spikes and batch load processes. For many teams, that’s more than the revenue of the product itself!
So, they turned to ScyllaDB. After they switched to our pricing model based on provisioned capacity (not per-operation billing), ScyllaDB was able to significantly compress their stored data, while also improving network compression between AZs and regions. They had the freedom to do this on any cloud (or even on-premises). They slashed their costs, improved performance, and removed the need to overprovision for spikes. Daily batch jobs run faster, and their business continues to scale without their database bill doing the same.
Another case of caching to survive
An adtech company using DynamoDB ran into cache complexity the hard way. They deployed 48 DAX nodes across 4 regions to hit their P99 latency targets. Each node is tailored to that region’s workload (after a lot of trial and error). Their writes (246 bytes/item) were wasting 75% of the write unit billed. Their analytics workload tanked live traffic during spikes. And perhaps worst of all, auto-scaling triggers just weren’t fast enough, resulting in request throttling and application failures. The total DynamoDB and DAX cost was hundreds of thousands per year.
ScyllaDB offered a much simpler solution. Built-in row caching used instance memory at no extra cost, with no external caching layer to maintain. They also ran their analytics and OLTP workloads side by side using workload prioritization, with no hit to performance. Even better, their TTL-based session expiration was handled automatically, without extra read/delete logic. Cost and complexity dropped, and they’re now a happy customer.
Watch the DynamoDB costs video
If you missed the webinar, be sure to check out the DynamoDB costs video – especially where Guilherme covers all these real-world workloads in detail. Key takeaways:
- DynamoDB costs are non-linear and shaped by usage patterns, not just throughput.
- Global tables, item size, conflict resolution, cache warmup and more can turn “reasonable” usage into a 7-figure nightmare.
- DAX and auto-scaling aren’t magic; they need tuning and still cost significant money to get right.
- Our DynamoDB cost calculator helps model these hidden costs and compare different setups, even if you’re not using ScyllaDB.
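To make the global tables math above concrete, here is a rough back-of-the-envelope model. It assumes the on-demand write price quoted in this post, that every region bills at the same per-unit rate, and it ignores inter-region transfer; real replicated-write pricing varies by region and may use a separate replicated write unit rate.

```python
import math

WRITE_PRICE_PER_UNIT = 0.625 / 1_000_000   # on-demand $ per write unit, as quoted above

def replicated_write_cost(item_kb: float, regions: int, writes: int) -> float:
    """Each write is billed in every region it lands in, rounded up to 1KB chunks."""
    units_per_write = math.ceil(item_kb)        # a 3.5KB item costs 4 write units
    return writes * regions * units_per_write * WRITE_PRICE_PER_UNIT

# The 3.5KB item replicated to 4 regions from the example above, at one million writes:
# 4 regions x 4 units per write x 1M writes (before any inter-region transfer charges).
print(f"${replicated_write_cost(3.5, regions=4, writes=1_000_000):,.2f}")
```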
And finally, if your team is struggling with unpredictable DynamoDB costs and performance, make the switch to ScyllaDB and enjoy the benefits of predictable pricing, built-in efficiency and more control over your database architecture. If you want to discuss the nuances of your specific use case and get your technical questions answered, chat with us here.
How Yieldmo Cut Database Costs and Cloud Dependencies
Rethinking latency-sensitive DynamoDB apps for multicloud, multiregion deployment
“The entire process of delivering an ad occurs within 200 to 300 milliseconds. Our database lookups must complete in single-digit milliseconds. With billions of transactions daily, the database has to be fast, scalable, and reliable. If it goes down, our ad-serving infrastructure ceases to function.” – Todd Coleman, technical co-founder and chief architect at Yieldmo
Yieldmo’s online advertising business depends on processing hundreds of billions of daily ad requests with subsecond latency responses. The company’s services initially depended on DynamoDB, which the team valued for simplicity and stability. However, DynamoDB costs were becoming unsustainable at scale, and the team needed multicloud flexibility as Yieldmo expanded to new regions. An infrastructure choice was threatening to become a business constraint.
In a recent talk at Monster SCALE Summit, Todd Coleman, Yieldmo’s technical co-founder and chief architect, shared the technical challenges the company faced and why the team ultimately moved forward with ScyllaDB’s DynamoDB-compatible API. You can watch his complete talk below or keep reading for a recap.
Lag = Lost Business
Yieldmo is an online advertising platform that connects publishers and advertisers in real time as a page loads. Nearly every ad request triggers a database query that retrieves machine learning insights and device-identity information. These queries enable its ad servers to:
- Run effective auctions
- Help partners decide whether to bid
- Track which ads they’ve already shown to a device so advertisers can manage frequency caps and optimize ad delivery
The entire ad pipeline completes in a mere 200 to 300 milliseconds, with most of that time consumed by partners evaluating and placing bids. More specifically: when a user visits a website, an ad request is sent to Yieldmo; Yieldmo’s platform analyzes the request, solicits potential ads from its partners, and conducts an auction to determine the winning bid. The database lookup must happen before any calls to partners, and these lookups must complete with single-digit millisecond latencies. Coleman explained, “With billions of transactions daily, the database has to be fast, scalable and reliable. If it goes down, our ad-serving infrastructure ceases to function.”
DynamoDB Growing Pains
Yieldmo’s production infrastructure runs on AWS, so DynamoDB was a logical choice as the team built their app. DynamoDB proved simple and reliable, but two significant challenges emerged. First, DynamoDB was becoming increasingly expensive as the business scaled. Second, the company wanted the option to run ad servers on cloud providers beyond AWS.
Coleman shared, “In some regions, for example, the US East Coast, AWS and GCP [Google Cloud Platform] data centers are close enough that latency is minimal. There, it’s no problem to hit our DynamoDB database from an ad server running in GCP. However, when we attempted to launch a GCP-based ad-serving cluster in Amsterdam while accessing DynamoDB in Dublin, the latency was far too high. We quickly realized that if we wanted true multicloud flexibility, we needed a database that could be deployed anywhere.”
DynamoDB Alternatives
Yieldmo’s team started exploring DynamoDB alternatives that would suit their extremely read-heavy database workloads.
Their write operations fall into two categories:
- A continuous stream of real-time data from their partners, essential for matching Yieldmo’s data with theirs
- Batch updates driven by machine learning insights derived from their historical data
Given this balance of high-frequency reads and structured writes, they were looking for a database that could handle large-scale, low-latency access while efficiently managing concurrent updates without degradation in performance.
The team first considered staying with DynamoDB and adding a caching layer. However, they found that caching couldn’t fix the geographic latency issue, and cache misses would be even slower with this option. They also explored Aerospike, which offered speed and cross-cloud support. However, they learned that Aerospike’s in-memory indexing would have required a prohibitively large and expensive cluster to handle Yieldmo’s large number of small data objects. Additionally, migrating to Aerospike would have required extensive and time-consuming code changes.
Then they discovered ScyllaDB, which also provided speed and cross-cloud support, but with a DynamoDB-compatible API (Alternator) and lower costs. Coleman shared, “ScyllaDB supported cross-cloud deployments, required a manageable number of servers and offered competitive costs. Best of all, its API was DynamoDB-compatible, meaning we could migrate with minimal code changes. In fact, a single engineer implemented the necessary modifications in just a few days.”
ScyllaDB evaluation, migration and results
To start evaluating how ScyllaDB worked in their environment, the team migrated a subset of ad servers in a single region. This involved migrating multiple terabytes while keeping real-time updates flowing. Process-wise, they had ScyllaDB’s Spark-based migration tool copy historical data, paused ML batch jobs, and leveraged their Kafka architecture to replay recent writes into ScyllaDB. Moving a single DynamoDB table with ~28 billion objects (~3.3 TB) took about 10 hours. The next step was to migrate all data across five AWS regions. This phase took about two weeks. After evaluating the performance, Yieldmo promoted ScyllaDB to primary status and eventually stopped writing to DynamoDB in most regions.
Reflecting on the migration almost a year later, Coleman summed up, “The biggest benefit is multicloud flexibility, but even without that, the migration was worthwhile. Database costs were cut roughly in half compared with DynamoDB, even with reserved-capacity pricing, and we saw modest latency improvements. ScyllaDB has proven reliable: Their team monitors our clusters, alerts us to issues and advises on scaling. Ongoing maintenance overhead is comparable to DynamoDB, but with greater independence and substantial cost savings.”
How ScyllaDB compares to DynamoDB
ScyllaDB Cloud: Fully-Managed in Your Own Google Cloud Account
You can now run ScyllaDB’s monstrously fast and scalable NoSQL database within your own Google Cloud (GCP) account
We’re pleased to share that ScyllaDB Cloud is now available with the Bring Your Own (Cloud) Account model on Google Cloud. This means:
- ScyllaDB runs inside your private Google Cloud account.
- Your data remains fully under your control and never leaves your Google Cloud account.
- Your database operations, updates, monitoring, and maintenance are all managed by ScyllaDB Cloud.
- Existing cloud contracts and loyalty programs can be applied to your ScyllaDB Cloud spend.
This is the same deployment model that we’ve offered on AWS for nearly 4 years. The BYOA model is frequently requested by teams who want both:
- The fully managed ScyllaDB Cloud service with near-zero operations and maintenance
- The regionality, governance, and billing benefits that come from running in your private cloud account
It’s especially well-suited for highly regulated industries like healthcare and finance that need data privacy, compliance, and data sovereignty guarantees. With BYOA, all ScyllaDB servers, storage, networking, and IP addresses are created in your cloud account. Data never leaves your VPC environment; all database resources remain under your ownership and governance policies. For additional security, ScyllaDB Cloud offers Bring Your Own Key (BYOK), our transparent database-level encryption, encrypting all the data with a customer-managed key (CMK). If you are the target of a cyberattack, or you have a security breach, you can protect the data immediately by revoking the database key.
Under the BYOA model, the infrastructure costs are paid directly to the cloud provider. That means your organization can apply its existing GCP commitments and take advantage of any available discounts, credits, or enterprise agreements (e.g., Committed Use, Sustained Use, Enterprise Agreements (EA)). ScyllaDB Cloud costs are reduced to license and support fees.
NOTE: The Bring Your Own (Cloud) Account feature is often addressed as BYOC, spotlighting the “Cloud” aspect. We prefer the term “account” as it more accurately represents our offering, though both concepts are closely related.
How ScyllaDB BYOA Works on Google Cloud
Once the BYOA service is enabled for your GCP project, the ScyllaDB Cloud control plane can use the Google Cloud API to create the necessary resources in your designated GCP project. After the network is configured, ScyllaDB Cloud securely connects to your cluster’s VPC to provision and manage ScyllaDB database clusters. You can configure a VPC peering connection between your application VPC and your ScyllaDB dedicated cluster VPC (as shown on the right side of the diagram). Our wizard will guide you through the configuration process for your GCP project. Using the wizard, you will configure one IAM role with policies to provision the required resources within the GCP project. ScyllaDB Cloud will operate using this role.
Configuration
To use the Bring Your Own Account feature, you will need to choose one project in your GCP account. This project will be used as the destination to provision your clusters. The specific policies required can be found here. Make sure your Cloud quotas are as per the recommendation. Here’s a short guide on how you can configure your GCP account to work with ScyllaDB Cloud. You will need permissions to a GCP account and a very basic understanding of Terraform. Once you complete the setup, you can use your GCP project as any other deployment target.
In the “Create New Cluster” screen, you will be able to select this project alongside the ScyllaDB Cloud hosted option. From there, you can choose a geographical region, specify the type of access (public or private), and select the appropriate instance type based on your expected traffic volume. ScyllaDB Cloud will then provision and configure a cluster for you accordingly.
Next steps
ScyllaDB Cloud BYOA is currently live on Google Cloud Platform. If you’re ready to set up your account, you can go to http://cloud.scylladb.com to use our onboarding wizard and our step-by-step documentation. Our team is available to support you, from setup to production. Just ping your existing representative or reach out via forums, Slack, chat, etc.
Why DynamoDB Costs Catch Teams Off Guard
From inevitable overprovisioning to the “on-demand” tax: why DynamoDB is bloody hard to cost-control
I recently built a DynamoDB cost calculator with the specific goal of helping potential ScyllaDB customers understand the true cost of running DynamoDB. Now, if you step back and look at my goal, it doesn’t make much sense, right? If somebody is already using DynamoDB, wouldn’t they already know how much it costs to run the technology at scale? Naively, this is what I thought too, at first. But then I started to peel back the inner workings of DynamoDB cost calculations. At that point, I realized that there are many reasons why teams end up paying hundreds of thousands (if not millions) of dollars to run DynamoDB at scale. The main thing I found: DynamoDB is easy to adopt, but bloody hard to cost-control. My workmate Guilherme and I delivered a webinar along these lines, but if you don’t have time to watch, read on to discover the key findings.
The first common misunderstanding is precisely what DynamoDB charges you for. You’ve probably already heard terms like Read Capacity Units and Write Capacity Units, and get the gist of “you pay for what you use” in terms of the number of reads and writes. But let’s start with the basics.
DynamoDB writes are expensive…
If you look at pricing for on-demand capacity, you’ll see that a read request unit (RRU) costs $0.125 per million units, and a write request unit (WRU) costs $0.625 per million units. So, writes are 5 times more expensive than reads. I don’t know the exact technical reason, but it’s no doubt something to do with the write path being heavier (durability, consistency, indexing etc.) and perhaps some headroom. 5x does seem a bit on the steep side for databases, and it’s one of the first traps from a cost perspective. You can easily find yourself spending an order of magnitude more if your workload is write-heavy, especially in on-demand mode.
Speaking of which… there’s the other mode: provisioned capacity. As the name suggests, this means you can specify how much you’re going to use (even if you don’t use it), and hopefully pay a bit less. Let’s check the ratio though. A Read Capacity Unit (RCU) costs $0.00013 per hour and a Write Capacity Unit (WCU) costs $0.00065 per hour, so writes are unsurprisingly 5 times more expensive than reads. So even in provisioned mode, you’re still paying a 5x penalty on writes. This is significant, especially for high-volume write workloads. No provisioned discount on writes for you!
You’re not provisioning requests, you’re provisioning rates…
Here’s the catch: provisioned capacity units are measured per second, not per million requests like in on-demand. That tripped me up initially. Why not just provision the total number of requests? But from AWS’s perspective, it makes perfect business sense. You’re paying for the ability to handle N operations per second, whether you use that capacity or not. So if your traffic is bursty, or you’re overprovisioning to avoid request throttling (more on that in a bit), you’re essentially paying for idle capacity. Put simply, you’re buying sustained capacity, even if you only need it occasionally. Just like my gym membership 😉
Reserved capacity…
So here’s the deal: if you reserve capacity, you’re betting big upfront to hopefully save a bit later. If you’re confident in your baseline usage, AWS gives you the option to reserve DynamoDB capacity, just like with EC2 or RDS. It’s a prepaid 1- or 3-year commitment, where you lock in a fixed rate of reads and writes per second.
And yes, it’s still a rate, not a total number of requests. One gotcha: there’s no partial upfront option; it’s pay in full or walk away.
Let’s look at a simple use case to compare the pricing models. Say your workload averages 10,000 reads/sec and 10,000 writes/sec over an hour.
On-demand pricing:
- Writes: $22.50/hr … 10,000 * 3600 * 0.625 / 1M
- Reads: $4.50/hr … 10,000 * 3600 * 0.125 / 1M (5x cheaper than writes, as usual)
Provisioned pricing (non-reserved):
- Writes: $6.50/hr … 10,000 * $0.00065
- Reads: $1.30/hr … 10,000 * $0.00013
Provisioned with 1-year reserved:
- Writes: ~$2.99/hr
- Reads: ~$0.59/hr
“Hey, where’s the reserved math?” I hear you. Let’s just say: you take the reserved pricing for 100 WCUs ($0.0128/hr) and RCUs ($0.0025/hr), divide by 730 hours in a month, divide by 12 months in a year, divide again by 100 units, multiply by your needed rate… then round it, cry a little, and paste in the “math lady” meme. Or better yet, use our calculator. My point is:
- Provisioned is ~3.4x cheaper than on-demand
- Reserved is ~7.5x cheaper than on-demand
- On-demand is for people who love overpaying, or loathe predicting
Btw, AWS recommends on-demand for:
- Traffic patterns that evolve over time
- Spiky or batchy workloads
- Low utilization (drops to zero or below 30% of peak)
Which is basically every real-life workload – at least for the customers of ScyllaDB. So yes, expect to pay a premium for that flexibility unless your traffic looks like a textbook sine wave and you have a crystal ball.
It’s not the size of the item, but it is…
Here’s another trap. It’s one that you might not hit until you use real application data… at which point you’ll immediately regret overlooking it. In DynamoDB, you don’t just pay per operation; you pay per chunk of data transferred. And the chunk sizes differ between reads and writes:
- Writes are billed per 1KB (Write Request Units or WRUs)
- Reads are billed per 4KB (Read Request Units or RRUs)
So if you write a 1.1KB item, that’s 2 WRUs. Write a 3KB item? Still 3 WRUs; every 1KB (or part thereof) gets counted. Reads work the same way, just at 4KB boundaries. Read a 1KB item? 1 RRU. Read a 4.1KB item? That’s 2 RRUs. Isn’t rounding up fun? I’m sure there are strong technical reasons for these boundaries.
You can see the trap here. Combine this with the 5x cost of a write compared to a read, and things can get nasty quickly, especially if your item size straddles those thresholds without you realizing. It’s probably ok if you have a fixed item size in your schema, but definitely not ok with the types of use cases we see at ScyllaDB. For example, customers might have nested JSON or blob fields which can shrink or grow with usage. And remember, it’s actual item size, not just logical schema size.
Overprovisioning, because you have to…
Another pain point, and a devious omission from AWS’s own calculator, is the need to overprovision when using provisioned capacity. It sounds counterintuitive, but you’re forced to overprovision – not because you want to, but because DynamoDB punishes you if you don’t. In provisioned mode, every request is subject to strict throughput limits because, if you recall earlier, a fixed rate is what you’re paying for. If you slide past the provisioned capacity, you’ll hit ProvisionedThroughputExceededException. I love the clarity of this type of exception message. I don’t love what it actually does, though: request throttling. There’s a small 300-second window of burst capacity that retains unused read and write capacity. But beyond that, your app just fails.
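Putting the rounding rules and the worked example above into code makes the gap between the modes easy to poke at. This is a simplified sketch, not a complete cost model: it uses the US-East-style prices quoted in this post, ignores storage, reserved upfront fees and burst behaviour, and the `headroom` knob simply previews the overprovisioning discussed next.

```python
import math

# Prices as quoted above.
ON_DEMAND_WRITE = 0.625 / 1_000_000    # $ per write request unit
ON_DEMAND_READ = 0.125 / 1_000_000     # $ per read request unit
PROVISIONED_WCU = 0.00065              # $ per WCU-hour
PROVISIONED_RCU = 0.00013              # $ per RCU-hour

def write_units(item_kb: float) -> int:
    """Writes are billed in rounded-up 1KB chunks: a 1.1KB item costs 2 units."""
    return math.ceil(item_kb)

def read_units(item_kb: float) -> int:
    """Strongly consistent reads are billed in rounded-up 4KB chunks."""
    return math.ceil(item_kb / 4)

def on_demand_hourly(writes_s: int, reads_s: int, item_kb: float) -> float:
    return (writes_s * 3600 * write_units(item_kb) * ON_DEMAND_WRITE
            + reads_s * 3600 * read_units(item_kb) * ON_DEMAND_READ)

def provisioned_hourly(writes_s: int, reads_s: int, item_kb: float,
                       headroom: float = 0.0) -> float:
    # You pay for the provisioned *rate*; headroom models deliberate overprovisioning.
    return ((1 + headroom) * writes_s * write_units(item_kb) * PROVISIONED_WCU
            + (1 + headroom) * reads_s * read_units(item_kb) * PROVISIONED_RCU)

# The 10,000 reads/sec + 10,000 writes/sec example above, with 1KB items:
print(on_demand_hourly(10_000, 10_000, 1.0))          # ~27.0  ($22.50 + $4.50)
print(provisioned_hourly(10_000, 10_000, 1.0))        # ~7.8   ($6.50 + $1.30)
print(provisioned_hourly(10_000, 10_000, 1.0, 0.3))   # ~10.1  with 30% headroom
```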
So, the best way to counter this throttling is to overprovision. By how much? That warrants an “it depends” answer – it depends largely on your workload type. We added this functionality to our calculator so you can dynamically overprovision by a percentage, just to factor the additional costs into your workload. Obviously, these costs can add up quickly because, in practice, you’re paying for the peak even if you operate in the trough. If you don’t provision high enough capacity, your peaks risk being throttled, giving you customer-facing failures at the worst possible time.
Before we move on…
If there’s a recurring theme here, it’s this: DynamoDB’s pricing isn’t inherently wrong. You do pay for what you use. However, it’s wildly unforgiving for any workload that doesn’t look like a perfect, predictable sine wave. Whether it’s:
- The 5x write cost multiplier
- The 7.5x on-demand cost multiplier
- Opaque per-second provisioned rates
- Punitive rounding and artificial boundaries of item sizes
- Or just the need to overprovision to avoid face-planting during peak load
…you’re constantly having to second-guess your architecture just to stay ahead of cost blowouts. The irony? DynamoDB is branded as “serverless” and “fully managed,” yet you end up managing capacity math, throttling errors, arcane pricing tiers, and endless throughput gymnastics. Having observed many of our customers’ spreadsheet forecasts (and AWS Cost Explorer exports) for DynamoDB, even mature teams running large-scale systems have no idea what the cost is… until it’s too late. That’s why we built a calculator that models real workloads, not just averages. Because the first step to fixing costs is to understand where they’re coming from.
In my next blog post, I walk through some real-world examples of customers that switched from DynamoDB to ScyllaDB to show the true impact of traffic patterns, item sizes, caches and multi-region topologies. Stay tuned or skip ahead and model your own workloads at calculator.scylladb.com. Model your own DynamoDB workloads on our new cost calculator
Big ScyllaDB Performance Gains on Google Cloud’s New Smaller Z3 Instances
Benchmarks of ScyllaDB on Google Cloud’s new Z3 small instances achieved higher throughput and lower latency than N2 equivalents, especially under heavy load
ScyllaDB recently had the privilege of examining Google Cloud’s shiny new small shape Z3 GCE instances in an early preview. The Z3 series is optimized for workloads that require low-latency, high-performance access to large data sets. Likewise, ScyllaDB is engineered to deliver predictable low latency, even with workloads exceeding millions of OPS per machine. Naturally, both ScyllaDB and Google Cloud were curious to see how these innovations translated to performance gains with data-intensive use cases. So, we partnered with Google Cloud to test ScyllaDB on the new instances.
TL;DR
When we tested ScyllaDB on these new Z3 small shape instances vs. the previous generation of N2 instances, we found significant throughput improvements as well as reduced latencies… particularly in high-load scenarios.
Why the New Z3 Instances Matter
Z3 is Google Cloud’s first generation of Storage Optimized VMs, specifically designed to combine the latest CPU, memory, network, and high-density local SSD advancements. It introduces 36 TB of local SSD with up to 100 Gbps network throughput in its largest shape and brings in significant software-level improvements like partitioned placement policies, enhanced maintenance configurations, and optimized Hyperdisk support. The Z3 series has been available for over a year now. Previously, Z3 was only available in large configurations (88 and 176 vCPUs). With this new addition to the Z3 family, users can now choose from a broader range of high-performance instances, including shapes with 8, 16, 22, 32, and 44 vCPUs – all built on 4th Gen Intel Xeon Scalable (Sapphire Rapids) processors, DDR5 memory, and local SSDs configured for maximum density and throughput.
The new instance types – especially those in the 8 to 44 vCPU range – allow ScyllaDB to extend Z3 performance advantages to a broader set of workloads and customer profiles. And now that ScyllaDB X Cloud has just introduced support for mixed-instance clusters, the timing for these new instances is perfect. Our customers can use them to expand and contract capacity with high precision. Or they can start small, then seamlessly shift to larger instances as their traffic grows.
Test Methodology
We evaluated the new Z3 instances against our current N2-based configurations using our standard weekly regression testing suite. These tests focus on measuring latency across a range of throughput levels, including an unthrottled phase to identify maximum operations per second. For all tests, each cluster consisted of 3 ScyllaDB nodes. The Z3 clusters used z3-highmem-16-highlssd instances, while the N2 clusters used n2-highmem-16 instances with attached 6 TB high-performance SSDs to match the Z3 clusters’ storage. Both instance families come with 16 vCPUs and 128 GB RAM. The replication factor was set to 3 to reflect our typical production setup. Four workloads were tested on ScyllaDB version 2025.1.2 with vnode-based keyspaces:
- Read (100% cache hit)
- Read (100% cache miss)
- Write
- Mixed (50% reads, 50% writes)
For load generation, we used cassandra-stress with a 1 KB row size (one column). Each workload was progressively throttled to multiple fixed throughput levels, followed by an unthrottled phase. For throttled scenarios, we aimed for sub-millisecond to ~10 ms latencies. For unthrottled loads, latency was disregarded to maximize throughput measurements.
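For reference, one throttled mixed-workload run has roughly the shape of the invocation below, shown here launched from Python. The flags are illustrative of typical cassandra-stress usage rather than a copy of our regression harness: node addresses are placeholders, and the exact rate, column, and schema options we use internally differ.

```python
import subprocess

# Roughly the shape of one throttled mixed-workload run (illustrative flags only;
# node IPs are placeholders and our internal regression profiles differ).
cmd = [
    "cassandra-stress", "mixed", "ratio(write=1,read=1)",
    "duration=10m", "cl=QUORUM",
    "-col", "n=FIXED(1)", "size=FIXED(1024)",      # one 1 KB column per row
    "-rate", "threads=400", "throttle=150000/s",   # one fixed throughput level
    "-node", "10.0.0.1,10.0.0.2,10.0.0.3",
]
subprocess.run(cmd, check=True)
```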
Benchmark Results
First off, here’s an overview of the throughput results, combined. Now for the details…
1. Read Workload (100% Cache Hit)
Latency results
Load | N2 P99 [ms] | Z3 P99 [ms] |
150k | 0.64 | 0.5 |
300k | 1.37 | 0.86 |
450k | 7.23 | 6.23 |
600k | Couldn’t meet op/s | 10.02 |
700k | Couldn’t meet op/s | 13.1 |
The Z3 cluster consistently delivered better tail latencies across all load levels. For higher loads, the N2-based cluster couldn’t keep up, so we present only results for the Z3 cluster.
Maximum throughput results
Load | N2 Throughput | Z3 Throughput | Diff % |
Max | 569,566 | 1,151,739 | 102 |
Due to superb performance gains from the CPU family upgrade, the Z3 cluster achieved a staggering 102% higher throughput than the N2 did at the unthrottled level.
2. Read Workload (100% Cache Miss)
Latency results
Load | N2 P99 [ms] | Z3 P99 [ms] |
80k | 2.53 | 2.02 |
165k | 3.99 | 3.11 |
250k | Couldn’t meet op/s | 4.7 |
Again, the Z3 cluster achieved better latency results across all tested loads and could serve higher throughput while keeping latencies low.
Maximum throughput results
Load | N2 Throughput | Z3 Throughput | Diff % |
Max | 236,528 | 310,880 | 31 |
With a 100% cache-miss read workload that’s bounded by a mix of disk and CPU performance, the Z3 cluster achieved a significant 31% gain in maximum throughput.
3. Write Workload
Latency results
Load | N2 P99 [ms] | Z3 P99 [ms] |
200k | 3.27 | 3.21 |
300k | >100 ms | 4.19 |
Although latencies remained relatively similar under moderate load, the N2 instances couldn’t sustain them under higher loads.
Maximum throughput results
Load | N2 Throughput | Z3 Throughput | Diff % |
Max | 349,995 | 407,951 | 17 |
Due to heavy compactions and intensive disk utilization, the write workload also takes advantage of Z3’s advancements. Here, it achieved 17% higher throughput.
4. Mixed Workload (50% Read / 50% Write)
Latency results
Load | N2 P99 Write [ms] | Z3 P99 Write [ms] | N2 P99 Read [ms] | Z3 P99 Read [ms] |
50k | 2.07 | 2.04 | 2.08 | 2.11 |
150k | 2.27 | 2.65 | 2.65 | 2.93 |
300k | 4.71 | 3.88 | 5.12 | 4.15 |
450k | >100 ms | 15.49 | >100 ms | 16.13 |
The Z3 cluster maintained similar latency characteristics to the N2 one in lower throughput ranges. In higher ones, it kept a consistent edge since it was able to serve data reliably across a wider range.
Maximum throughput results
Load | N2 Throughput | Z3 Throughput | Diff % |
Max | 519,154 | 578,380 | 11 |
With a 50% read:write ratio, the Z3 instances achieved 11% higher throughput for both read and write operations.
Our Verdict on the New Z3 Instances
The addition of Z3 smaller shapes brings new flexibility to ScyllaDB Cloud users. Whether you’re looking to scale down while retaining high SSD performance or ramp up throughput in cost-sensitive environments, Z3 offers a compelling alternative to N2. We’re excited to support the smaller Z3 instance types in ScyllaDB Cloud. These VMs will complement the existing N2 options and enable more flexible deployment profiles for workloads that demand high storage IOPS and network bandwidth without committing to extremely large core counts.
What’s Next
This first round of testing found that performance improvements on Z3 become significantly more pronounced as the load scales. We believe that stems from ScyllaDB’s ability to fully utilize the underlying hardware. Moving forward, we’ll continue validating Z3 under other scenarios (e.g., higher disk utilization, large partitions, compaction pressure, heterogeneous cluster mixing) and uplift our internal tuning recommendations accordingly.
What ML feature stores require and how ScyllaDB fits in as a fast, scalable online feature store
In this blog post, we’ll explore the role of feature stores in real-time machine learning (ML) applications and why ScyllaDB is a strong choice for online feature serving. We’ll cover the basics of features, how feature stores work, their benefits, the different workload requirements, and how latency plays a critical role in ML applications. We’ll wrap up by looking at popular feature store frameworks like Feast and how to get started with ScyllaDB as your online feature store.
What is a feature in machine learning?
A feature is a measurable property used to train or serve a machine learning model. Features can be raw data points or engineered values derived from the raw data. For instance, in a social media app like ShareChat, features might include:
- Number of likes in the last 10 minutes
- Number of shares over the past 7 days
- Topic of the post
Image credit: Ivan Burmistrov and Andrei Manakov (ShareChat)
These data points help predict outcomes such as user engagement or content recommendation. A feature vector is simply a collection of features related to a specific prediction task. For example, this is what a feature vector could look like for a credit scoring application:
zipcode | person_age | person_income | loan_amount | loan_int_rate (%) |
94109 | 25 | 120000 | 10000 | 12 |
Selecting relevant data points and transforming them into features takes up a significant portion of the work in machine learning projects. It is also an ongoing process to refine and optimize features so the model being trained becomes more accurate over time.
Feature store architectures
In order to work with features efficiently, you can create a central place to manage the features that are available within your organization. A central feature store enables:
- A standard process to create new features
- Storage of features for simplified access
- Discovery and reuse of features across teams
- Serving features for both model training and inference
Most architectures distinguish between two stores/databases:
- Offline store for model training (bulk writes/reads)
- Online store for inference (real-time, low-latency writes/reads)
A typical feature store pipeline starts with ingesting raw data (from data lakes or streams), performing feature engineering, saving features in both stores, and then serving them through two separate pipelines: one for training and one for inference.
Benefits of a centralized feature store
Centralized feature stores offer several advantages:
- Avoid duplication: teams can reuse existing features
- Self-serve access: data scientists can generate and query features independently
- Unified pipelines: even though training and inference workloads are vastly different, they can still be queried using the same abstraction layer
This results in faster iteration, more consistency, and better collaboration across ML workflows.
Different workloads in feature stores
Let’s break down the two very distinct workload requirements that exist within a feature store: model training and real-time inference.
1. Model training (offline store)
In order to make predictions, you need to train a machine learning model first. Training requires a large and high-quality dataset. You can store this dataset in an offline feature store.
Here’s a rundown of the characteristics that matter most for model training workloads:
- Latency: not a priority
- Volume: high (millions to billions of records)
- Frequency: infrequent, scheduled jobs
- Purpose: retrieve a large chunk of historical data
Basically, offline stores need to efficiently store huge datasets.
2. Real-time inference (online store)
Once you have a model ready, you can run real-time inference. Real-time inference takes the input provided by the user and turns it into a prediction. Here’s a look at the characteristics that matter most for real-time inference:
- Latency: high priority
- Volume: low per request but high throughput (up to millions of operations/second)
- Frequency: constant, triggered by user actions (e.g. ordering food)
- Purpose: serve up-to-date features for making predictions quickly
For example, consider a food delivery app. The user’s recent cart contents, age, and location might be turned into features and used instantly to recommend other items to purchase. This would require real-time inference – and latency makes or breaks the user experience.
Why latency matters
Latency (in the context of this article) refers to the time between sending a query and receiving the response from the feature store. For real-time ML applications – especially user-facing ones – low latency is critical for success. Imagine a user at checkout being shown related food items. If this suggestion takes too long to load due to a slow online store, the opportunity is lost. The end-to-end flow – ingesting the latest data, querying relevant features, running inference, and returning a prediction – must happen in milliseconds.
Choosing a feature store solution
Once you decide to build a feature store, you’ll quickly find that there are dozens of frameworks and providers, both open source and commercial, to choose from:
- Feast (open source): provides flexible database support (e.g., Postgres, Redis, Cassandra, ScyllaDB)
- Hopsworks: tightly coupled with its own ecosystem
- AWS SageMaker: tied to the AWS stack (e.g., S3, DynamoDB)
- And lots of others
Which one is best? Factors like your team’s technical expertise, latency requirements, and required integrations with your existing stack all play a role. There’s no one-size-fits-all solution. If you are worried about the scalability and performance of your online feature store, then database flexibility should be a key consideration. There are feature stores (e.g. AWS SageMaker, GCP Vertex, Hopsworks, etc.) that provide their own database technology as the online store. On one hand, this might be convenient to get started because everything is handled by one provider. But this can also become a problem later on.
Imagine choosing a vendor like this with a strict P99 latency requirement (e.g., <15 ms P99). The requirement is successfully met during the proof of concept (POC). But later you experience latency spikes – maybe because your requirements change, or there’s a surge of new users in your app, or some other unpredictable reason. You want to switch to a different online store database backend to save costs. The problem is you cannot… at least not easily. You are stuck with the built-in solution. It’s unfeasible to migrate off just the online store part of your architecture because everything is locked in. If you want to avoid these situations, you can look into tools that are flexible regarding the offline and online store backend. Tools like Feast or FeatureForm allow you to bring your own database backend, both for the online and offline stores.
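If you do bring your own online store, the serving path boils down to a low-latency point lookup per request. Here is a minimal sketch of such a lookup against a ScyllaDB (or Cassandra) backed online store using the Python driver; the keyspace, table, and column names are hypothetical, loosely modeled on the credit-scoring feature vector shown earlier.

```python
from cassandra.cluster import Cluster

# Hypothetical online store: one row of precomputed features per entity.
cluster = Cluster(["scylla-node-1.example.com"])
session = cluster.connect("feature_store")

lookup = session.prepare(
    "SELECT person_age, person_income, loan_amount, loan_int_rate "
    "FROM credit_features WHERE entity_id = ?"
)

row = session.execute(lookup, ["user-123"]).one()
feature_vector = [row.person_age, row.person_income, row.loan_amount, row.loan_int_rate]
# feature_vector is then handed to the model for real-time inference.
```

In practice you would typically go through a framework such as Feast rather than querying the table directly, which is where the backend flexibility discussed above comes in.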
Bringing your own database backend like this is a great way to avoid vendor lock-in and makes future database migrations less painful if latency spikes occur or costs rise.
ScyllaDB as an online feature store
ScyllaDB is a high-performance NoSQL database that’s API-compatible with both Apache Cassandra and DynamoDB. It’s implemented in C++, uses a shard-per-core architecture, and includes an embedded cache system, making it ideal for low-latency, high-throughput feature store applications.
Why ScyllaDB?
- Low latency (single-digit millisecond P99 performance)
- High availability and resilience
- High throughput at scale (petabyte-scale deployments)
- No vendor lock-in (runs on-prem or in any cloud)
- Drop-in replacement for existing Cassandra/DynamoDB setups
- Easy migration from other NoSQL databases (Cassandra, DynamoDB, MongoDB, etc.)
- Integration with the feature store framework Feast
ScyllaDB shines in online feature store use cases where real-time performance, availability, and latency predictability are critical.
ScyllaDB + Feast integration
Feast is a popular open-source feature store framework that supports both online and offline stores. One of its strengths is the ability to plug in your own database sources, including ScyllaDB. Read more about the ScyllaDB + Feast integration in the docs.
Get started with a feature store tutorial
Want to try using ScyllaDB as your online feature store? Check out our tutorials that walk you through the process of creating a ScyllaDB cluster and building a real-time inference application:
- Tutorial: Price prediction inference app with ScyllaDB
- Tutorial: Real-time app with Feast & ScyllaDB
- Feast + ScyllaDB integration
- GitHub: ScyllaDB as a feature store code examples
Have questions or want help setting it up? Submit a post in the forum!
Update: I just completed a developer workshop with Feast maintainer Francisco Javier Arceo: Build Real-Time ML Apps with Python, Feast & NoSQL. You can watch it on demand now.
Integrating support for AWS PrivateLink with Apache Cassandra® on the NetApp Instaclustr Managed Platform
Discover how NetApp Instaclustr leverages AWS PrivateLink for secure and seamless connectivity with Apache Cassandra®. This post explores the technical implementation, challenges faced, and the innovative solutions we developed to provide a robust, scalable platform for your data needs.
Last year, NetApp achieved a significant milestone by fully integrating AWS PrivateLink support for Apache Cassandra® into the NetApp Instaclustr Managed Platform. Read our AWS PrivateLink support for Apache Cassandra General Availability announcement here. Our Product Engineering team made remarkable progress in incorporating this feature into various NetApp Instaclustr application offerings. NetApp now offers AWS PrivateLink support as an Enterprise Feature add-on for the Instaclustr Managed Platform for Cassandra, Kafka®, OpenSearch®, Cadence®, and Valkey™.
The journey to support AWS PrivateLink for Cassandra involved considerable engineering effort and numerous development cycles to create a solution tailored to the unique interaction between the Cassandra application and its client driver. After extensive development and testing, our product engineering team successfully implemented an enterprise-ready solution. Read on for detailed insights into the technical implementation of our solution.
What is AWS PrivateLink?
PrivateLink is a networking solution from AWS that provides private connectivity between Virtual Private Clouds (VPCs) without exposing any traffic to the public internet. This solution is ideal for customers who require a unidirectional network connection (often due to compliance concerns), ensuring that connections can only be initiated from the source VPC to the destination VPC. Additionally, PrivateLink simplifies network management by eliminating the need to manage overlapping CIDRs between VPCs. The one-way connection allows connections to be initiated only from the source VPC to the managed cluster hosted in our platform (target VPC)—and not the other way around.
To get an idea of what major building blocks are involved in making up an end-to-end AWS PrivateLink solution for Cassandra, take a look at the following diagram—it’s a simplified representation of the infrastructure used to support a PrivateLink cluster:
In this example, we have a 3-node Cassandra cluster at the far right, with one Cassandra node per Availability Zone (AZ). Next, we have the VPC Endpoint Service and a Network Load Balancer (NLB). The Endpoint Service is essentially the AWS PrivateLink, and by design AWS requires it to be backed by an NLB – that is pretty much what we have to manage on our side.
On the customer side, a VPC Endpoint must be created to privately connect to the AWS PrivateLink endpoint service on our end; naturally, customers will also need one or more Cassandra clients to connect to the cluster.
AWS PrivateLink support with Instaclustr for Apache Cassandra
While incorporating AWS PrivateLink support into Instaclustr for Apache Cassandra on our platform, we came across a few technical challenges. The primary challenge was relatively straightforward to state: Cassandra clients need to talk to each individual node in a cluster.
However, the problem is that nodes in an AWS PrivateLink cluster are only assigned private IPs; that is what the nodes would announce by default when Cassandra clients attempt to discover the topology of the cluster. Cassandra clients cannot do much with the received private IPs as they cannot be used to connect to the nodes directly in an AWS PrivateLink setup.
We devised a plan of attack to get around this problem:
- Make each individual Cassandra node listen for CQL queries on unique ports.
- Configure the NLB so it can route traffic to the appropriate node based on the relevant unique port.
- Let clients implement the AddressTranslator interface from the Cassandra driver. The custom address translator will need to translate the received private IPs to one of the VPC Endpoint Elastic Network Interface (or ENI) IPs without altering the corresponding unique ports.
To understand this approach better, consider the following example:
Suppose we have a 3-node Cassandra cluster. According to the proposed approach, we will need to do the following:
- Let the nodes listen on 172.16.0.1:6001 (in AZ1), 172.16.0.2:6002 (in AZ2) and 172.16.0.3:6003 (in AZ3)
- Configure the NLB to listen on the same set of ports
- Define and associate target groups based on the port. For instance, the listener on port 6002 will be associated with a target group containing only the node that is listening on port 6002.
- As for how the custom address translator is expected to work, let’s assume the VPC Endpoint ENI IPs are 192.168.0.1 (in AZ1), 192.168.0.2 (in AZ2) and 192.168.0.3 (in AZ3). The address translator should translate received addresses like so:
- 172.16.0.1:6001 --> 192.168.0.1:6001
- 172.16.0.2:6002 --> 192.168.0.2:6002
- 172.16.0.3:6003 --> 192.168.0.3:6003
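As a rough illustration only, the mapping performed by such a translator can be sketched as below. This is not tied to any particular driver API; in practice, this logic lives inside an implementation of the Cassandra driver’s AddressTranslator interface, and the IPs are the example values from the text above.

```python
# Node private IPs mapped to the VPC Endpoint ENI IP in the same AZ.
# Ports are left untouched, since each node's CQL port is unique.
ENI_IP_BY_NODE_IP = {
    "172.16.0.1": "192.168.0.1",  # AZ1
    "172.16.0.2": "192.168.0.2",  # AZ2
    "172.16.0.3": "192.168.0.3",  # AZ3
}

def translate(private_ip: str, port: int) -> tuple[str, int]:
    """Swap a node's announced private IP for the ENI IP, preserving its unique port."""
    return ENI_IP_BY_NODE_IP.get(private_ip, private_ip), port

assert translate("172.16.0.2", 6002) == ("192.168.0.2", 6002)
```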
The proposed approach not only solves the connectivity problem but also allows for connecting to appropriate nodes based on query plans generated by load balancing policies.
Around the same time, we came up with a slightly modified approach as well: we realized the need for address translation can be mostly mitigated if we make the Cassandra nodes return the VPC Endpoint ENI IPs in the first place.
But the excitement did not last for long! Why? Because we quickly discovered a key problem: an AWS NLB supports a maximum of just 50 listeners.
While 50 is certainly a decent limit, the way we designed our solution meant we wouldn’t be able to provision a cluster with more than 50 nodes. This was quickly deemed to be an unacceptable limitation as it is not uncommon for a cluster to have more than 50 nodes; many Cassandra clusters in our fleet have hundreds of nodes. We had to abandon the idea of address translation and started thinking about alternative solution approaches.
Introducing Shotover Proxy
We were disappointed but did not lose hope. Soon after, we devised a practical solution centred around using one of our open source products: Shotover Proxy.
Shotover Proxy is used with Cassandra clusters to support AWS PrivateLink on the Instaclustr Managed Platform. What is Shotover Proxy, you ask? Shotover is a layer 7 database proxy built to allow developers, admins, DBAs, and operators to modify in-flight database requests. By managing database requests in transit, Shotover gives NetApp Instaclustr customers AWS PrivateLink’s simple and secure network setup with the many benefits of Cassandra.
Below is an updated version of the previous diagram that introduces some Shotover nodes in the mix:
As you can see, each AZ now has a dedicated Shotover proxy node.
In the above diagram, we have a 6-node Cassandra cluster. The Cassandra cluster sitting behind the Shotover nodes is an ordinary Private Network Cluster. The role of the Shotover nodes is to manage client requests to the Cassandra nodes while masking the real Cassandra nodes behind them. To the Cassandra client, the Shotover nodes appear to be Cassandra nodes – and, as far as the client is concerned, they alone make up the entire cluster! This is the secret recipe for AWS PrivateLink for Instaclustr for Apache Cassandra that enabled us to get past the challenges discussed earlier.
So how is this model made to work?
Shotover can alter certain requests from – and responses to – the client. It can examine the tokens allocated to the Cassandra nodes in its own AZ (aka rack) and claim to be the owner of all those tokens. This essentially makes each Shotover node appear to be an aggregation of the nodes in its own rack.
Given the purposely crafted topology and token allocation metadata, while the client directs queries to the Shotover node, the Shotover node in turn can pass them on to the appropriate Cassandra node and then transparently send responses back. It is worth noting that the Shotover nodes themselves do not store any data.
Because we only have 1 Shotover node per AZ in this design and there may be at most about 5 AZs per region, we only need that many listeners in the NLB to make this mechanism work. As such, the 50-listener limit on the NLB was no longer a problem.
The use of Shotover to manage client driver and cluster interoperability may sound straightforward to implement, but developing it was a year-long undertaking. As described above, the initial months of development were devoted to engineering CQL queries on unique ports and the AddressTranslator interface from the Cassandra driver to gracefully manage client connections to the Cassandra cluster. While this solution did successfully provide support for AWS PrivateLink with a Cassandra cluster, we knew that the 50-listener limit on the NLB was a barrier to adoption, and we wanted to provide our customers with a solution that could be used for any Cassandra cluster, regardless of node count.
The next few months of engineering were devoted to a proof of concept for an alternative solution, with the goal of investigating how Shotover could manage client requests for a Cassandra cluster with any number of nodes. Once that approach was successfully proven, subsequent effort went into stability testing, and the result is the stable solution described above.
We also conducted performance testing to evaluate the relative performance of a PrivateLink-enabled Cassandra cluster compared to its non-PrivateLink counterpart. Multiple iterations of performance testing were executed, as the test cases identified some adjustments to Shotover; by the end, the PrivateLink-enabled cluster’s throughput and latency measured close to those of a standard Cassandra cluster.
Related content: Read more about creating an AWS PrivateLink-enabled Cassandra cluster on the Instaclustr Managed Platform
The following was our experimental setup for identifying the maximum throughput, in operations per second, of a PrivateLink-enabled Cassandra cluster in comparison to a non-PrivateLink Cassandra cluster:
- Baseline node size: i3en.xlarge
- Shotover Proxy node size on Cassandra cluster: CSO-PRD-c6gd.medium-54
- Cassandra version: 4.1.3
- Shotover Proxy version: 0.2.0
- Other configuration: Repair and backup disabled, Client Encryption disabled
Throughput results
Operation | Operation rate with PrivateLink and Shotover (ops/sec) | Operation rate without PrivateLink (ops/sec) |
Mixed-small (3 Nodes) | 16608 | 16206 |
Mixed-small (6 Nodes) | 33585 | 33598 |
Mixed-small (9 Nodes) | 51792 | 51798 |
Across different cluster sizes, we observed no significant difference in operation throughput between PrivateLink and non-PrivateLink configurations.
Latency results
Latency benchmarks were conducted at ~70% of the observed peak throughput (as above) to simulate realistic production traffic.
Operation | Ops/second | Setup | Mean Latency (ms) | Median Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
Mixed-small (3 Nodes) | 11630 | Non-PrivateLink | 9.90 | 3.2 | 53.7 | 119.4 |
Mixed-small (3 Nodes) | 11630 | PrivateLink | 9.50 | 3.6 | 48.4 | 118.8 |
Mixed-small (6 Nodes) | 23510 | Non-PrivateLink | 6 | 2.3 | 27.2 | 79.4 |
Mixed-small (6 Nodes) | 23510 | PrivateLink | 9.10 | 3.4 | 45.4 | 104.9 |
Mixed-small (9 Nodes) | 36255 | Non-PrivateLink | 5.5 | 2.4 | 21.8 | 67.6 |
Mixed-small (9 Nodes) | 36255 | PrivateLink | 11.9 | 2.7 | 77.1 | 141.2 |
Results indicate that for lower to mid-tier throughput levels, AWS PrivateLink introduced minimal to negligible overhead. However, at higher operation rates, we observed increased latency, most notably at the P99 mark – likely due to network-level factors or Shotover overhead.
The increase in latency is expected as AWS PrivateLink introduces an additional hop to route traffic securely, which can impact latencies, particularly under heavy load. For the vast majority of applications, the observed latencies remain within acceptable ranges. However, for latency-sensitive workloads, we recommend adding more nodes (for high load cases) to help mitigate the impact of the additional network hop introduced by PrivateLink.
As with any generic benchmarking results, performance may vary depending on your specific data model, workload characteristics, and environment. The results presented here are based on a specific experimental setup using standard configurations and should primarily be used to compare the relative performance of PrivateLink vs. non-PrivateLink networking under similar conditions.
Why choose AWS PrivateLink with NetApp Instaclustr?
NetApp’s commitment to innovation means you benefit from cutting-edge technology combined with ease of use. With AWS PrivateLink support on our platform, customers gain:
- Enhanced security: All traffic stays private, never touching the internet.
- Simplified networking: No need to manage complex CIDR overlaps.
- Enterprise scalability: Handles sizable clusters effortlessly.
By addressing challenges such as the NLB listener cap and private IP-to-VPC endpoint translation, we’ve created a solution that balances efficiency, security, and scalability.
Experience PrivateLink today
The integration of AWS PrivateLink with Apache Cassandra® is now generally available with production-ready SLAs for our customers. Log in to the Console to create a Cassandra cluster with support for AWS PrivateLink with just a few clicks today. Whether you’re managing sensitive workloads or demanding performance at scale, this feature delivers unmatched value.
Want to see it in action? Book a free demo today and experience the Shotover-powered magic of AWS PrivateLink firsthand.
Resources
- Getting started: Visit the documentation to learn how to create an AWS PrivateLink-enabled Apache Cassandra cluster on the Instaclustr Managed Platform.
- Connecting clients: Already created a Cassandra cluster with AWS PrivateLink? Click here to read about how to connect Cassandra clients in one VPC to an AWS PrivateLink-enabled Cassandra cluster on the Instaclustr Platform.
- General availability announcement: For more details, read our General Availability announcement on AWS PrivateLink support for Cassandra.