Apache Cassandra Performance Tuning: What We Learned
This blog post (tries to) consolidate what we’ve learned from years of tuning Apache Cassandra for performance Here at ScyllaDB, we often run internal and external performance comparisons. Internal testing helps ensure ScyllaDB’s performance advantage, track performance regressions, and maintain compatibility, including catching subtle API semantic-layer changes early. External comparisons are our way to aggregate the performance results for the general public every once in a while. Performance tuning can be a double-edged sword. Overlook one aspect, and you may end up under- or overestimating one’s performance numbers – and that may introduce deep ramifications down the road. While ScyllaDB and Cassandra both share a common API layer and feature set, both systems have fundamentally different architectures. This naturally adds to differences in how each system is tested and tuned. This blog post (tries to) consolidate what we’ve learned from years of tuning Apache Cassandra for performance. We spent a good amount of time hunting down the information we needed. Hopefully, the details described here help others improve their existing Cassandra cluster performance, as well as conduct more meaningful performance comparisons. Side-note: ScyllaDB shares how to reproduce our tests, including references on which settings and parameters we tuned. Check out our Cassandra 4 vs Cassandra 3.11 comparison, my recent talk on how ScyllaDB compares to Cassandra 5, and the comparison between Cassandra vNodes and ScyllaDB tablets as some concrete examples. Overview Perhaps the most relevant Apache Cassandra tuning source publicly available is Amy’s Cassandra 2.1 tuning guide. Despite its 2.1 reference (released in 2014), we find that most of the guidance (or, at least, the high-level concepts) provided there survived the ashes of time, including the array of settings that administrators need to configure by hand. Despite the over-a-decade-long difference, one of Amy’s particular thoughts stands out, and should guide you whenever you’re working with Apache Cassandra tuning:“The inaccuracy of some comments in Cassandra configs is an old tradition, dating back to 2010 or 2011. (…) What you need to know is that a lot of the advice in the config commentary is misleading. Whenever it says “number of cores” or “number of disks” is a good time to be suspicious. (…)” – Excerpt from Amy’s Cassandra 2.1 tuning guide, cassandra.yaml sectionApache Cassandra was originally conceived to run on commodity hardware. It is shipped under the assumption that the end user will configure and tune it for their specific environment. And it also assumes users know what they’re doing. What’s counterintuitive about Apache Cassandra tuning is how small settings can have an outsized impact on performance. Figure 1 perfectly demonstrates this aspect. It shows how both throughput and latencies vary significantly under different GC, compaction, and disk read-ahead settings. Figure 1 – Apache Cassandra 5 performance under different settings One last note before we dive right into tuning specifics: our goal is not to replace Amy’s well-covered, exhaustive guide. Instead, take our words as a complementary reference. We also don’t claim to be experts in the art of Cassandra performance tuning or troubleshooting; rather, we’re practitioners who learned some things (the hard way). Cassandra-Specific Tuning At a minimum, focus your efforts on the following files:
cassandra.yaml
jvm[NN]-server.options
jvm-server.options cassandra.yaml To help
users get started, a stock Apache Cassandra installation ships with
two config files. The first file –
cassandra.yaml – is oriented for users
upgrading from a previous Cassandra release and comes with
backward-compatible settings. The second –
cassandra_latest.yaml – “contains
configuration defaults that enable the latest features of
Cassandra, including improved functionality as well as higher
performance. This version is provided for new users of Cassandra
who want to get the most out of their cluster, and for users
evaluating the technology.” Source:
the Cassandra project. If you spin a fresh
cassandra:5 container or simply initiate your
tuning journey without taking this into consideration, you’ll end
up running your deployment under compatibility mode. The following
command demonstrates how a freshly spun Cassandra 5 container
starts under compatibility mode, rather than enabling its latest
features: root@container:/etc/cassandra# diff cassandra.yaml
cassandra_latest.yaml | sed
's/^>/[cassandra_latest.yaml]/g;s/^</[cassandra.yaml]/g' |
egrep 'compatibility|memtable' | sort [cassandra.yaml]
memtable_allocation_type: heap_buffers [cassandra.yaml]
storage_compatibility_mode: CASSANDRA_4 [cassandra_latest.yaml]
memtable_allocation_type: offheap_objects [cassandra_latest.yaml]
storage_compatibility_mode: NONE It’s beyond the scope of
this write-up to provide an exhaustive list of settings you should
pay attention to when setting up Cassandra. The stock
cassandra.yaml is often irrelevant, and we ended
up simply replacing it with the
cassandra\_latest.yaml instead. If you are
starting a fresh new cluster, we highly recommend you do the same.
However, you probably stream_throughput_outbound_megabits_per_sec option
Both
Cassandra 4.1 and
Cassandra 5.0 docs referenced the
stream_throughput_outbound option Only reading
this
Instaclustr article (or carefully interpreting
cassandra\_latest.yaml) eventually shed some light on the
correct option:
entire_sstable_stream_throughput_outbound. In other
words, 3 distinct settings exist for tuning the previous 3 major
releases of Apache Cassandra – and one of them was incorrectly
documented under the official project’s page. This raises concerns
about the feasibility of upgrading from older releases. Given these
constraints, we highly encourage organizations to conduct a careful
review and full round of testing on their own. This is not an edge
case; others noted similar upgrade problems on the Apache
Cassandra Mailing List. With that in mind, here are some
examples of misleading Cassandra config comments and why upgrades
deserve some extra diligence: CASSANDRA-16315
– Covers the concurrent_compactors setting CASSANDRA-7139
– Describes how that same concurrent_compactors
setting default was production unsafe when introduced CASSANDRA-20692
– Describes how a commitlog correctness issue slipped through to
Cassandra 5 JVM settings Test Kind Garbage Collector Read-ahead
Compaction Throughput P99 Latency Throughput Cassandra RA4
Compaction256 ZGC 4KB 256MB/s 6.662ms 120K/s Cassandra RA4
Compaction0 ZGC 4KB Unthrottled 8.159ms 120K/s Cassandra RA8
Compaction256 ZGC 8KB 256MB/s 4.657ms 100K/s Cassandra RA8
Compaction0 ZGC 8KB Unthrottled 4.903ms 100K/s Cassandra G1GC G1GC
4KB 256MB/s 5.521ms 40K/s Tuning the JVM is the least fun part of
operating a Cassandra cluster. It can be a journey on its own,
really. The good news is that Cassandra 5 includes support for
JDK17, and users may now opt-in for using ZGC rather
than the decades-long G1 garbage collector. Unless you
are a Java expert and know exactly what you are doing, this
theLastPickle article is perhaps your best resource for tuning
Cassandra’s JVM. You could read that and call it a day. Still, here
are some details on what we’ve discovered along the way, since the
DataStax (now IBM)
Tuning Java resources page only advises under a remark
of adjusting “settings gradually and test each incremental
change”: We’ve consistently measured lower latencies and
higher throughput using ZGC under a handful of
different scenarios. Although we’ve seen some users reporting good
G1 performance results, this doesn’t align with what
we’ve experimented with in practice. Remember that Cassandra relies
on both off-heap as well as on-heap memory. The heap size will
depend on how much RAM your setup has. Since we primarily test on
128GB RAM machines, we found that allocating beyond 32G would be
wasteful.
theLastPickle‘s article mentioned earlier makes a good point
about compressed OOPs, though we believe this should be relevant
for RAM constrained systems. We didn’t observe any noticeable
benefits/disadvantages from having 31G/32G in our
results. Most of the JVM settings will sit under the
jvm17-server.options file (if you’re using
JDK17). However, there is yet another file
(jvm-server.options, note there’s no Java
version) that you should also edit. Apparently Cassandra has some
built-in scriptology in cassandra.in.sh that
looks up the latter and inherits options from it. Then, if your
heap settings (-Xmx & -Xms) are unset, it
will automatically define it for you: ################# #
HEAP SETTINGS # ################# # Heap size is automatically
calculated by cassandra-env based on this # formula: max(min(1/2
ram, 1024MB), min(1/4 ram, 8GB)) # That is: # - calculate 1/2 ram
and cap to 1024MB # - calculate 1/4 ram and cap to 8192MB # - pick
the max # # For production use you may wish to adjust this for your
environment. # If that's the case, uncomment the -Xmx and Xms
options below to override the # automatic calculation of JVM heap
memory. # # It is recommended to set min (-Xms) and max (-Xmx) heap
sizes to # the same value to avoid stop-the-world GC pauses during
resize, and # so that we can lock the heap in memory on startup to
prevent any # of it from being swapped out. #-Xms4G #-Xmx4G
Therefore, uncomment and override the two lines above for your
environment. After you are done, you may want to circle back to the
cassandra.yaml file because there are some
settings that influence your heap allocation. For example:
networking_cache_size file_cache_size
memtable_offheap_space
repair_session_space among others… If you feel like
Cassandra is choking and the system is not under heap pressure,
then playing with these settings is probably your next step. Sadly,
this is where things become trial-and-error, and even more time
consuming. (Though, in Cassandra’s defense, tuning most of these
parameters is workload specific). About Cassandra Caching
Apache Cassandra ships two caching-related settings:
row_cache_size and key_cache_size You
should almost never enable either of these settings
(0GiB means these are disabled). The only exception is
when your workload has a (VERY) high cache hit ratio and is
relatively static. The table below shows how both Row & Key caches
have a negative performance impact in Cassandra during a scale-out:
Kind Step
Throughput Retries
Cassandra 5.0 – Page Cache 3 > 6 nodes 56K ops/sec 2010
Cassandra 5.0 – Page Cache 6 > 9 nodes 112K ops/sec 0
Cassandra 5.0 – Row & Key Cache 3 > 6 nodes 56K ops/sec 5004
Cassandra 5.0 – Row & Key Cache 6 > 9 nodes 112K ops/sec
8779 Likewise, Figure 2 shows how throughput varies significantly
under a fully cached workload:
Figure 2 – Cassandra Row Cache vs OS Page Cache
performance (speedup falls between 1.14x to 1.5x)
Figure 2 – Cassandra Row Cache vs OS Page Cache performance
(speedup falls between 1.14x to 1.5x) An old
DataStax (IBM) documentation page strongly discourages its use,
noting that users should prefer using the OS page cache instead:
Note: Utilizing the appropriate OS
page cache will result in better performance than using row
caching. Counterintuitively, DataStax (IBM) later recommends
enabling the Row Cache when the number of reads dominate compared
to writes: Tip: Enable a row cache
only when the number of reads is much bigger (rule of thumb is 95%)
than the number of writes. Consider using the operating system page
cache instead of the row cache, because writes to a partition
invalidate the whole partition in the cache. OS Tuning
Operating system tuning for Cassandra shares many similarities with
other databases. Preventing swapping, tuning the kernel via
sysctl, setting disk read_ahead_kb
settings, configuring user limits and enabling Transparent
HugePages are the primary settings we touch when deploying
Cassandra. This is (undoubtedly) a non-exhaustive list, although it
should cover the strategies seen across most production Cassandra
deployments in practice. Depending on your setup, you may want to
further check: your
clocksource – particularly under Xen hypervisors; whether
cpupower supports setting the CPU scaling governor to
“performance” mode; experimenting with jemalloc;
configuring SMP IRQ
Affinity; and pinning Cassandra to specific CPUs via taskset(1).
Disks We primarily store Cassandra related files (including its
related logs) on locally-attached NVMe disks, as commonly found
within cloud hyperscalers. If there’s more than one attached disk
to the VM, we combine them into a RAID-0 array using
mdadm. In addition, we use XFS as the
backing filesystem, particularly as it’s the same we use for
ScyllaDB. We also set only one-hit merges, limit
read_ahead_kb to just 4kB, and disable
the IO scheduler (if any): MD_NAME=nvme1n1 sudo sh -c "echo 1
> /sys/block/$MD_NAME/queue/nomerges" sudo sh -c "echo 4 >
/sys/block/$MD_NAME/queue/read_ahead_kb" sudo sh -c "echo none >
/sys/block/$MD_NAME/queue/scheduler" Some important remarks:
the scheduler command may “fail” in modern Cloud
instances (and that’s fine); when using mdadm, tune
each block device individually backing the RAID device;
read_ahead_kb is a workload dependent setting. We
often test small partition lookups, but workloads with larger
wide-rows may benefit from increasing that setting. Memory We don’t
configure swapping at all to keep matters simple. The rationale is
that Cassandra already benefits from the OS page cache, and we
leave over half of the server’s RAM just for it. During our tests,
we also observed that enabling Transparent Huge pages,
especially with ZGC, contributed positively to
Cassandra’s performance. Although the improvement wasn’t
remarkable, we observed positive results similar to what both
Amy and
Netflix reported. The provided links already go in-depth on how
to enable THP, as well as how to configure Cassandra
to benefit from it. Keep in mind, however, that we recommend you
set the -XX:+AlwaysPreTouch JVM option regardless of
whether THP is enabled or not. That’s because it’s
known to improve overall JVM runtime performance at the expense of
increased JVM startup times. Kernel and User limits Put simply, you
don’t want Cassandra to be limited on either networking, memory
allocation, or the number of files it can open. We set
sysctl.conf.d/99-cassandra.conf to the following
values: net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10
net.core.rmem_default=16777216 net.core.wmem_default=16777216
net.core.optmem_max=40960 vm.max_map_count = 1048575
net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096
65536 16777216 net.core.rmem_max = 16777216 net.core.wmem_max =
16777216 net.core.netdev_max_backlog = 2500 net.core.somaxconn =
65000 net.ipv4.tcp_ecn = 0 net.ipv4.tcp_window_scaling = 1
net.ipv4.ip_local_port_range = 10000 65535 net.ipv4.tcp_syncookies
= 0 net.ipv4.tcp_timestamps = 0 net.ipv4.tcp_sack = 0
net.ipv4.tcp_fack = 1 net.ipv4.tcp_dsack = 1
net.ipv4.tcp_orphan_retries = 1 vm.dirty_background_bytes =
10485760 vm.dirty_bytes = 1073741824 vm.zone_reclaim_mode = 0
fs.file-max = 1073741824 vm.max_map_count = 1073741824
Lastly, the user running Cassandra must be allowed to allocate
enough resources for the process to run. As our VMs are
short-lived, we enable unlimited limits.conf
consumption to all users: * - nofile 1000000 * - memlock
unlimited * - fsize unlimited * - data unlimited * - rss unlimited
* - stack unlimited * - cpu unlimited * - nproc unlimited * - as
unlimited * - locks unlimited * - sigpending unlimited * - msgqueue
unlimited Parting Thoughts As demonstrated, Apache Cassandra
performance tuning is far from a one-size-fits-all solution. The
settings described throughout this article represent what worked
for our specific hardware setups and workload profiles. If your
deployment spans different hardware, many of the values presented
here will likely need to be revisited. This brings us to (perhaps)
the most underappreciated cost in Cassandra operations: dependency.
That is, every tuning decision is implicitly a contract with the
underlying hardware. Adding more disks, increasing CPU/RAM,
changing workloads are some overlooked aspects that will require
entirely new tuning cycles and re-evaluating your previous
decisions. ScyllaDB was designed with this problem in mind. Its
shard-per-core architecture and self-tuning capabilities
automatically adapt to the underlying hardware, eliminating much of
the manual iteration and tuning described here. There’s no JVM at
all, and most of the OS heavy lifting is carried out for you via an
automated script shipped alongside the core database. If Cassandra
performance has been a bottleneck, you’re concerned about the
recent IBM acquisition, or you’ve simply spent too much time
fighting tuning instead of building – give ScyllaDB a try. And if
you want to have a technical
discussion about your use case, let us know. “Key-Value” is Misleading. Access Patterns are Key.
Access patterns determine your data model, your I/O costs, and which database is the best fit for your workload I’ve been part of enough key-value database evaluations to recognize the pattern. When the conversation starts with benchmarks, the evaluation inevitably ends with regret. The benchmark answers “which is faster?” It doesn’t tell you which model fits how your application actually reads and writes data – and that’s what matters. Every data modeling decision should begin with access patterns, regardless of the technology on the table. What does your application read? At what granularity? What does it write? How often? How large? Let those answers drive the data model, then pick the technology. Flip that order and you pay for it. A fast database like ScyllaDB amplifies schema decisions: good models perform well, bad ones break faster. Edgar Codd invented First Normal Form (1NF) in 1970 to save disk space, but a terabyte of NVMe now costs about the same as lunch. So, even though the rule outlasted the constraint that justified it, we are still teaching it. That’s partly why so many teams expect to normalize their data with ScyllaDB the way they would a relational schema. But if they don’t get the order right (access patterns> data model> technology), they won’t get the performance that the engine was built to deliver. A lot of the confusion comes down to terminology. “Key-value” is one of the most overloaded labels in the database industry. We use it to describe both: A system that maps a string to an opaque blob A system that maps a partition key plus a clustering key to typed, individually addressable columns with partial-update semantics. Lumping these together hides the architectural decisions that determine your I/O patterns and your infrastructure costs. “Key-value” is often used to describe three very different data models. They differ in capability and in how deeply you can address your data. Pick the wrong one for your access patterns and you pay for it in I/O overhead, infrastructure cost, and write throughput. ScyllaDB can operate across multiple levels of this hierarchy. The one you select influences your I/O patterns, your update costs, and your infrastructure spend. Key-Value vs Wide-Column: Four Levels of Access Pattern Depth Instead of looking at feature lists, it’s better to compare these models by access pattern depth: at what level can you address, read, and write your data? Level 1: Key level. One key maps to one value. The value is opaque. The database has no knowledge of what is inside it. You get it and you put it in. This is K-V, the model behind most caching layers and session stores. Redis is the canonical example. The ceiling is the value boundary – you can replace it, you cannot address inside it. Level 2: Row level. A primary key maps to a set of named bins. Each bin holds a schemaless value. You can address individual bins by name, you can project specific bins in a read, and you can also update bins independently. This is K-V Wide Table, one key, multiple named fields, no schema enforcement on values. This model adds meaningful structure over K-V without requiring upfront schema design. Aerospike is the canonical example here. The ceiling is the bin boundary – you can update a bin, but you cannot address inside one. Level 3: Column level. A partition key combined with a clustering key addresses a row. Each column in that row is individually typed. The database understands the type of every value it stores. This is KKV Wide Table, the two-key model is what puts the second K in KKV. Typed columns enable the database to make smarter decisions about storage layout, compression, and update semantics. Cassandra reaches this level. The ceiling is the column boundary – typed and addressable, but complex values inside a column must be declared frozen. In other words, the entire value is serialized as a single blob that the engine cannot see into. Level 4: Within-column level. This is a key differentiator for KKV Wide Table. The engine starts working at a granularity that the other models can’t reach. A KKV Wide Table column can hold a collection: a map, a set, a list, a user-defined type, or nested combinations of these. Whether the database can address what’s inside that collection determines your actual access pattern depth. A frozen collection is serialized as a single blob. The engine stores it, retrieves it, and replaces it, but cannot see inside it. An unfrozen collection is stored element by element. Each entry is individually addressable. That distinction is the central architectural argument at this level. Cassandra touches this level but can’t reliably live here. Unfrozen collections exist in Cassandra, but tombstone accumulation makes them a liability in production. In ScyllaDB, Level 4 becomes practical. With an unfrozen collection, ScyllaDB stores each element individually. Whether you add an entry to a map, append to a list, or remove an element from a set – no read is required first and the database operates at element level. With a frozen collection, ScyllaDB serializes the entire value as a single cell. The engine can’t address inside it. For whole-value access patterns, that’s not a limitation, it’s an optimization. With this: There’s no per-element metadata. Reads pull one contiguous cell. Writes replace one contiguous cell. ScyllaDB’s UDT performance benchmarks show frozen collections outperforming unfrozen ones by up to 228% on write throughput and 162% on read throughput for 50-field UDTs. For the right access pattern, frozen is the faster choice. Don’t focus on frozen vs unfrozen; look at access pattern first and the right tool should follow from there. Figure: Frozen vs. unfrozen UDT, 50-field profile accessed as a whole. Frozen write throughput 228% higher, read throughput 162% higher. One cell write vs. 50-element writes plus 50 metadata records. The problem isn’t that it’s frozen; the access pattern mismatch is what’s causing the performance difference. An engineer who needs element-level updates and chooses frozen UDTs has, for those columns, given back Level 4 access. The operation degrades to read-modify-write: read the entire value, apply the change in memory, write it back as a whole. That is the same pattern a K-V Wide Table bin requires. The technology supports Level 4, but the schema choice has opted out of it. Figure: Four levels of access pattern depth. K-V gives key-level access. K-V Wide Table adds bin projection. KKV Wide Table adds typed columns and, with unfrozen collections, element-level access. Frozen collections are a performance optimization for whole-value access patterns, not a fallback. The opposite mistake is also a problem. An engineer who uses large unfrozen collections for values they always access as a whole pays per-element TTL and timestamp metadata on every element in the collection – at compaction time, continuously. A map with 10K entries carries 10K individual metadata records. That overhead snowballs over time. Choose frozen collections when you access the value as a whole. Choose small unfrozen collections when you need element-level updates. Large unfrozen collections are their own design smell, regardless of access pattern. Figure: Read granularity, requesting one field from a 30-field record. K-V reads the entire blob. K-V Wide Table reads the entire record and returns one bin. KKV Wide Table reads only the requested column, leaving 29 columns untouched on disk. How Access Pattern Depth Meets Memory: Three Scenarios The relationship between your dataset size and available memory determines which architecture is working with its strengths and which one is working against them. Figure: Data model behavior across memory scenarios, relative I/O and cost overhead for K-V, K-V Wide Table, and KKV Wide Table as dataset size moves from fits-in-RAM through keys-only-in-RAM to neither-fits-in-RAM. Scenario 1: Everything Fits in Memory When the entire dataset lives in RAM, a memory-resident hash index is fast. Point lookups are a hash computation and a pointer dereference. This is where K-V and K-V Wide Table architectures shine for read latency. But “what’s fast?” and “what’s cost-effective?” are different questions. If your dataset is 2 TB, you are paying for 2 TB of RAM across your cluster. An architecture designed around SSDs with efficient memory-resident metadata can deliver reads in the low hundreds of microseconds while your data lives on storage that costs a fraction of RAM per gigabyte. Although the access pattern performance difference on reads may be negligible, the infrastructure cost difference is not. Figure: Storage cost at scale, all-RAM vs NVMe SSD across dataset sizes from 0.5 TB to 32 TB. DDR5 ECC at ~$8/GB vs NVMe SSD at ~$0.10/GB. The gap compounds quickly past 1 TB. This is also the scenario where honesty matters. If your access pattern is truly “put blob, get blob” on ephemeral data with simple lookups, a K-V store is the right tool. The operational simplicity is a genuine advantage. There are fewer moving parts and fewer things to misconfigure. If your values are small and your queries never need to reach inside them, a K-V store will serve you well and be easy to operate. Scenario 2: Keys Fit in Memory, Values Do Not This is what K-V Wide Table architectures market as their sweet spot. Here, you have a primary index in memory, records on SSD, and fast key lookups that pull values from disk. For simple reads, bin projection works well here. Request three specific bins, get three bins back. You are not forced to read the entire record on every read. The problem surfaces at Level 4. Assume one bin holds a serialized map of user preferences and you need to update a single entry in that map. In this case, the system must: Read the entire bin from disk Deserialize the collection structure in memory Apply the modification Serialize the updated structure Write the entire bin back. That is a read-modify-write cycle on every collection update, regardless of how small the change is. The K-V Wide Table model has no path to Level 4 access. The bin is the floor. A KKV Wide Table model with unfrozen collections handles the same update without a read. The new map entry goes directly to the write-ahead log and the in-memory table. There’s no deserialization or full-bin read. The merge with existing data happens during compaction, as a background operation that does not block the write path. Compression: typed columns vs. schemaless bins. K-V Wide Table bins are schemaless. Within an SSTable block, different records interleave bin data without type information. That limits what a compressor can do across records. A KKV Wide Table stores typed column data within the same partition contiguously in SSTable blocks. For example, ScyllaDB writes all values for the event_ts column across rows in a partition together. Because those values share the same type, a dictionary-based compressor like zstd has much more to work with. This is not columnar storage in the analytics sense. ScyllaDB is an LSM-tree row-based engine at the partition level, not Parquet. The compression benefit comes from typed column homogeneity within SSTable blocks rather than a columnar storage layout. Frozen vs. unfrozen compression tradeoffs. Frozen UDTs compress well for a specific reason. A frozen UDT is a single cell with a consistent serialized layout. The same 50-field structure appears as the same byte sequence across records, which dictionary compression handles efficiently. Unfrozen collections are a different story. Each element carries its own TTL and timestamp metadata. ScyllaDB groups column values within SSTable blocks, which helps the element values themselves compress, but the metadata overhead scales with collection cardinality. For small unfrozen collections, it’s negligible. For large unfrozen collections, it can negate a meaningful portion of the compression gain. The compression advantage of typed columns applies most cleanly to simple typed columns and small unfrozen collections. Figure: K-V Wide Table SSTable blocks mix types across schemaless bins, limiting compression. KKV Wide Table SSTable blocks group typed column data within partitions. Frozen UDTs compress well as consistent serialized blobs. Unfrozen collections carry per-element metadata that can offset compression gains at high cardinality. Data locality. In a shard-per-core architecture (e.g., ScyllaDB’s), all columns within a partition live on the same CPU core. A read that touches three columns in a single partition involves zero cross-core coordination. This avoids locking and message passing between threads. This data locality might not be significant at low throughput. However, it matters a lot at hundreds of thousands of operations per second. Scenario 3: Neither Keys Nor Values Fit in Memory This is where memory-dependent index architectures hit a wall. If your architecture puts the primary index in RAM and your keyspace outgrows available memory, you are either: Adding nodes to hold the index, or Paging index entries to disk, which adds a disk read in front of every data read An architecture built for disk-resident data from the start does not have this problem. ScyllaDB (and to a degree Cassandra) uses Bloom filters to determine probabilistically whether a partition exists in a given SSTable without loading a full index into memory. Partition index summaries provide efficient lookup with a small, fixed memory footprint regardless of key count. And compaction strategies manage on-disk data organization to keep read amplification bounded. This is all strategic design for an architecture that assumes data will not fit in memory. Don’t just think about whether a system can handle disk-resident data; consider whether it was designed for it. The Update Path: Where Access Depth Becomes I/O Pattern Most evaluations obsess over reads. However, the update path is where access pattern depth differences tend to surface at scale. Consider updating a single element in a collection, one value in a map with 500 entries. In a K-V Wide Table architecture, collection updates require a full read-modify-write cycle: read the entire bin from disk, deserialize the collection structure in memory, apply the modification, serialize the updated structure, then write the entire bin back. Under concurrent updates to the same record, this becomes a serialization bottleneck. Under write-heavy workloads, write throughput is gated by read throughput. Figure: K-V Wide Table collection update path. A single-element update requires reading, deserializing, modifying, serializing, and rewriting the entire bin. In a KKV Wide Table architecture with unfrozen collections, the same update works like this: write the new value for that map entry directly to the memtable. This avoids the read, the deserialization, and the serialization. The entry lands in the write-ahead log and the in-memory table. The merge with existing data happens during compaction, as a background operation. Figure: KKV Wide Table update path with unfrozen collection. The write goes directly to WAL and memtable. No read required. Compaction merges data in the background. This is where access pattern honesty matters most. The append-only unfrozen update is fast for element-level changes to bounded collections. When your access pattern is whole-value, you write the entire UDT atomically and read it back as a unit. Here, frozen is the right choice. There is no read penalty and no per-element overhead. The ScyllaDB UDT benchmark shows 228% write throughput improvement for frozen UDTs in exactly this scenario: a 50-field UDT accessed and written as a whole. The frozen cell is one write operation. The equivalent unfrozen collection is 50 element writes plus 50 metadata records. The difference at 1,000 operations per second is negligible. But at 100,000 operations per second, with large collections and concurrent writes, the wrong frozen/unfrozen choice becomes the bottleneck in either direction. Figure: Write latency vs. collection size for a single-entry update. K-V Wide Table read-modify-write latency grows linearly with the number of entries in the collection. KKV Wide Table unfrozen update latency stays flat, the write goes to the WAL and memtable regardless of collection size. Figure: Single-element update latency vs. collection size, illustrating how wasted I/O grows with collection size for read-modify-write architectures, while direct-write latency remains constant. Choosing Honestly: Key-Value, K-V Wide Table, or KKV Wide Table These three models exist because different access patterns have different requirements. K-V is the right model for caching, session storage, and any workload where the access pattern is “put blob, get blob.” Its simplicity is a real advantage because you end up with fewer moving parts and fewer things to misconfigure. If your values are small and your queries never need to reach inside them, a K-V store will serve you well and be easy to operate. K-V Wide Table adds meaningful capability for workloads that need to address individual fields without upfront schema design. It’s a pragmatic choice for moderate-scale applications where operational simplicity matters, bin-level read projection is sufficient, and collection updates are infrequent or small. It sits at Level 2–3 access depth and does that job well. KKV Wide Table earns its complexity when your access patterns require Level 3 or 4 depth: frequent updates to large collections, datasets that will outgrow available memory, workloads where typed column compression meaningfully reduces storage cost, or write-heavy workloads that cannot afford read-modify-write on every collection update. The richer data model requires upfront schema design and demands that you get frozen versus unfrozen semantics right. Don’t rely on your intuition; choose strategically, based on your actual access pattern: Use frozen when you always read or write the whole value. A 50-field profile UDT that you always write and read back as a unit is a frozen candidate. The performance data supports it. Use small unfrozen collections when you need element-level updates. Append to a list. Update one key in a map. This is what unfrozen exists for. Use large unfrozen collections only if your access pattern is genuinely element-granular and your collection cardinality stays bounded. Per-element metadata overhead compounds. It affects both compaction cost and compression ratios. Figure: Decision flow for choosing a data model based on required access pattern depth. Don’t focus on which model is “best.” Think about which model best matches the access patterns your workload will experience in production. Start with the access patterns. Let the data model follow. Then pick the technology that supports that model at the depth you need. Get that order right and the database works with you. Get it wrong, and you spend your time working around it. *** If your use case requires low latencies at scale, and you’re frustrated with fighting your current database, ScyllaDB Cloud might be worth a look. Find me on LinkedIn – I’m always happy to talk data models.What’s new in Cassandra® 6? A roundup of features for users and operators
Apache Cassandra 6 is shaping up to be significant release as some of its biggest changes affect the core behavior of the database:
- How metadata is coordinated
- How Cassandra is moving toward broader transaction support via Accord protocol
- How repair is scheduled, and
- How operators inspect and manage the system.
Let’s focus on a few changes that stand out:
- Accord transactions
- Transactional Cluster Metadata (TCM)
- Automated repair
- Constraints framework
- Zstandard dictionary compression, and
- Cursor-based compaction improvements.
Taken together, these changes point to a version of Cassandra that is becoming more structured internally and easier to operate.
Accord transactions for ACID guaranteesAccord is a general-purpose transaction framework that uses a leaderless consensus protocol to have highly available transactions and is used in Cassandra 6. The goal is broader transactional support across multiple keys, with strict serializable isolation and without a central bottleneck.
This matters because multi-key consistency is hard to handle cleanly in application code. Once a workflow spans more than one partition, the application often ends up doing coordination work that really belongs in the database.
Accord enables ACID behavior on transactional tables, which lets developers coordinate multi-step, multi-partition changes with stronger correctness guarantees, reducing the amount of custom consistency logic they have to build in the application.
Including multi-partition, conditional work has historically been difficult to express cleanly in Cassandra. For operators, it signals that transactions are becoming a more important part of the platform and something to watch closely as Cassandra continues to mature.
Read our deep dive on Accord transactions here.
Transactional Cluster Metadata (TCM)TCM changes how Cassandra coordinates cluster-wide metadata. TCM introduces a Cluster Metadata Service that keeps an ordered log of metadata changes and makes those changes visible in a more consistent, coordinated way. That includes things like membership, token ownership, and schema state.
This was introduced because Cassandra’s older model depended heavily on eventual consistency and the Gossip Protocol to spread metadata changes across the cluster. TCM is meant to make those changes more explicit, more ordered, and easier to reason about.
For operators, this is one of the biggest architectural shifts in Cassandra 6. It does not mean Gossip Protocol disappears everywhere, but it does mean Cassandra is moving away from Gossip as the primary way cluster membership, schema, and data placement changes are coordinated and made visible. For users, the result should be more predictable schema and topology operations.
Automated repair orchestrationAutomated repair brings repair orchestration into Cassandra itself. Repair is the mechanism Cassandra uses to reconcile replicas over time so they stay consistent, and the goal is to make repair scheduling and coordination a built-in database service rather than something operators must orchestrate with external tools.
This was introduced because repair is essential, but historically it has placed a real burden on operators. Teams have had to build their own schedules, decide how to run repair safely, and keep it consistent over time.
For operators, automated repair could be one of the most practical changes in the release. It reduces manual coordination, supports full and incremental repair, adds useful safeguards, and makes repair easier to treat as a normal part of cluster maintenance—just like it has happened with major compactions with Unified Compaction Strategy in Cassandra 5. For users, it means a better chance that maintenance happens regularly and with fewer gaps.
At NetApp Instaclustr, our expert TechOps team already orchestrates laborious tasks like repair for our Apache Cassandra customers, ensuring their clusters stay online. Our platform handles the complexity so you can get up and running fast.
Constraints framework for data validationThe constraints framework lets Cassandra enforce more targeted
validation rules as part of the table schema. It enforces them at
write time instead of relying entirely on application code to
reject invalid data. Some examples of constraints include: Scalar
(>, <, >=,
<=), LENGTH(),
OCTET_LENGTH(), NOT NULL,
JSON(), REGEXP().
A simple example of an in-line constraint:
CREATE TABLE users ( username text PRIMARY KEY, age int CHECK age >= 0 and age < 120 );
This was introduced because Cassandra already had some broad limits, but they were not very granular or expressive. The constraints framework gives teams a more precise way to protect the shape of their data and guard against bad writes from misconfigured clients.
Operators gain more control and better predictability around what gets written into the cluster. For developers, it means some validation can move closer to the schema instead of being duplicated across every service.
Zstd dictionary compressionZstandard, or Zstd, dictionary compression extends SSTable compression by letting Cassandra use trained Zstd dictionaries for repetitive data patterns. Instead of relying only on generic compression, it can use a dictionary built from representative data to improve results.
This was introduced to primarily improve compression ratio while keeping the design manageable in production. It is recommended to use minimal dictionaries and only adopt new ones when they’re noticeably better.
This makes compression more configurable and more visible for operators. It adds training workflows, dictionary lifecycle management, and observability into dictionary size and cached dictionary memory usage. For users, the main benefit is better storage efficiency, because data with strong repeating patterns can compress better, leading to potential performance gains.
You can read more about the constraints framework and Zstd dictionary compression in our article detailing recent CEPs.
Cursor-based compaction improvementsCursor-based compaction is a new low-allocation compaction path in Cassandra 6 that processes SSTable data in a more streaming-oriented way, using reusable cursor-like readers and writers instead of constantly creating large numbers of temporary in-memory objects. In practical terms, it is designed to reduce heap allocation and garbage collection overhead during compaction.
Compaction is one of Cassandra’s most important background processes, and when it becomes cheaper and more efficient, nodes can spend less time fighting garbage collection and less heap on temporary work. For operators, that can mean smoother performance and better efficiency on large datasets. For developers, it is mostly an under-the-hood improvement, but one that can help clusters behave more consistently under load.
Conclusion: A more manageable databaseWhat stands out about Cassandra 6 is that many of its biggest changes are not isolated features. They reshape core parts of how Cassandra behaves and how it is operated.
Accord introduces a broader transactional model. TCM changes how metadata is coordinated. Automated repair brings a core maintenance task into the database. Constraints make schemas more defensive. Zstd dictionary compression improves how Cassandra approaches storage efficiency, and cursor-based compaction makes the system easier to run.
Taken together, Cassandra 6 focused on making the database more deliberate internally and more manageable operationally.
Stay tuned for a preview release of Cassandra 6 on the Instaclustr Platform!
Ready to get started?If you want to experience the power of Apache Cassandra without the operational headache, we have you covered. If you are an existing customer and would like to try Cassandra 5 before 6.0 is released, you can spin up a cluster today. If you don’t have an account yet, sign up for a free trial and experience the latest generation of Apache Cassandra on the Instaclustr Managed Platform.
Read all our technical documentation here.
Discover the 10 rules you need to know when managing Apache Cassandra.
If you are using a relational database and are interested in vector search, check out this blog on support for pgvector, which is available as an add-on for Instaclustr for PostgreSQL services.
The post What’s new in Cassandra® 6? A roundup of features for users and operators appeared first on Instaclustr.
Introducing ScyllaDB Agent Skills
A new set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters Today we’re releasing a curated set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters. If you just want to grab the skills and go build, here you go:npx skills add
scylladb/agent-skills If you want to understand why these
skills are useful and what problems they solve, read on. ** You may
have noticed a short warning at the bottom of many AI applications:
“AI can make mistakes. Double-check the output.” Or something along
those lines. This is also true when it comes to working with
databases. We’ve seen agents reach for the wrong driver, fail to
connect to ScyllaDB Cloud, generate schemas that fit a relational
database but not NoSQL, and produce queries that technically
execute but perform poorly at scale.
For more on agents getting things wrong, see this video…These problems can all be minimized by using agent skills. What are Agent Skills? Agent Skills are markdown files that give your AI agent best practices and domain-specific knowledge. They follow the standard format and help your agent reduce hallucinations. They are also essential to give the agent up-to-date information. Since LLM training data doesn’t include real-time updates by default, these skills help bridge that gap. A specialized skill helps make the agent’s behavior more consistent and predictable. Available ScyllaDB Skills The ScyllaDB Agent Skills cover three distinct areas: scylladb-cloud-setup: Guides agents through the full connection flow: retrieving cluster credentials from the Cloud Console, selecting the correct shard-aware driver for the user’s language, configuring DC-aware load balancing with the right datacenter name, and verifying the connection. scylladb-data-modeling: Encodes query-first design methodology, partition key and clustering column patterns, anti-patterns (
ALLOW FILTERING, hot partitions,
unbounded partition growth), time-series bucketing, and guidance on
when to use secondary indexes versus denormalized tables. The goal
is to create schemas and queries that hold up under production load
(just returning correct results in development is not sufficient).
scylladb-vector-search: Covers vector index
creation, ANN queries, filtering strategies (global vs. local
indexes and when each applies), quantization, and driver
configuration. You can install all three at once, or pick only what
your project needs. Each skill loads on demand when a relevant task
comes up, they don’t interfere with each other. Let’s look at the
main areas where AI systems get ScyllaDB wrong. Shard-aware drivers
ScyllaDB has its own family of shard-aware drivers for Python,
Java, Go, Rust, C++, and
more. Agents sometimes decide to download the wrong driver.
While it may appear to work, unofficial drivers bypass ScyllaDB’s
shard-aware routing and degrade performance. In other cases, agents
may hallucinate non-existent drivers. Besides making it impossible
to connect to the ScyllaDB cluster, this also introduces a security
risk: you may install a fake package designed to trick the AI (this
is called slopsquatting).
Connecting to ScyllaDB Cloud Connecting to ScyllaDB Cloud requires
DC-aware load balancing configured with the exact datacenter name
(e.g. AWS_US_EAST_1) from your cluster. If your agent
gets that wrong, the driver will fail to connect. Data modeling
ScyllaDB’s data model requires you to have a query-first approach.
You design tables around your access patterns, not your entities.
Agents tend to be trained more heavily on SQL and relational
databases than on NoSQL systems such as ScyllaDB. That means they
are more likely to generate an entity-first schema, then use
ALLOW FILTERING to force queries. This can result in
suboptimal performance when using ScyllaDB. Vector Search Vector
search on ScyllaDB is powerful but specific. There are global and
local vector indexes with different filtering semantics and
performance considerations. There’s an ANN OF
operator, and quantization options that matter at scale. Choosing
the wrong index type for a filtered query can hurt performance.
Getting started Install all skills using the Vercel Skills CLI
(requires Node.js): npx skills add
scylladb/agent-skills Or install a specific skill: npx
skills add scylladb/agent-skills --skill
scylladb-data-modeling You can also install manually by
cloning the GitHub repository
and copying the skill folders into your agent’s skills directory:
Agent Skills directory Claude Code ~/.claude/skills/
Cursor ~/.cursor/skills/ OpenAI Codex
~/.codex/skills/ OpenCode
~/.config/opencode/skills/ The skills follow the
Agent Skills open standard
and work with any agent that supports it, including Claude Code,
Cursor, Codex, and GitHub Copilot. Native Claude Code and Cursor
plugins are coming soon. We recommend installing all three skills
in any project that uses ScyllaDB. You get full coverage of the
areas where agents most commonly go wrong, with no overhead when
those skills aren’t relevant to the current task. As of now, the
skills cover the CQL interface; Alternator (DynamoDB
API) is not yet included. Feedback is welcome. Create an
issue on GitHub! New Research on Cloud Database Trends: Technical Risks, Cost Pressures, and Migration Triggers
Good enough until it isn’t: the database complacency trap A database is like a water heater. When all is well, it just does its job in the background. You don’t fantasize about replacing it or envy the one your friend just got. Really, you don’t even think about it — until something goes awry. But new research reveals a key difference: With databases, the problems don’t blindside you. Some 38% of technology leaders worry that their current database won’t meet their needs in the near future. However, they aren’t acting on it. They wait until some compelling event (e.g., a production incident, usage spike, budget cut, or cloud strategy pivot) pushes the database to the top of the priority list. That’s just one of the interesting findings from the Futurum Group’s latest research study, commissioned by ScyllaDB, which explores the latest trends in cloud database cost pressures, performance risks, and migration motivations. Respondents include technical decision-makers who shape cloud database strategy as well as team members directly responsible for the database. Guy Currier, Futurum Group Chief Analyst, summarizes the findings this way: “Those technology leaders expressed complacency with their cloud databases at the same time as concern and caution. This combination suggests that although they would prefer not to take immediate action, they know they will have to move when compelling events force a change.” The full report, Is Cloud Database Complacency Affecting Your Business Objectives?, is available now. Here are some key takeaways. Comfort masks concern A third of the leaders surveyed report satisfaction with the performance of their current cloud databases. Yet, 38% worry that their database isn’t fit to support future AI/ML workloads and the resulting explosion in data volume. The prime characteristic of these workloads is their unpredictability; past database performance is a poor indicator of future behavior as the technology evolves and as volumes increase. “Organizations experience what we might call ‘good enough for now’ syndrome,” Currier noted. “Their databases handle today’s workloads adequately, but leaders doubt these solutions will scale to meet tomorrow’s demands.” Cloud database costs are also a major concern. The research found that 35% of leaders want to improve performance but feel constrained by budget. Another 35% are concerned about rising costs despite being satisfied with performance. The top cloud database cost drivers include: Unexpected loads (40%) New or strict technical requirements (38%) Networking bandwidth growth (38%) Storage growth (38%) The 10% cost-savings tipping point Nearly 40% of organizations are meeting their cloud database budgets, but just as many consider their predictable costs too high. As Currier explains, “Organizations might tolerate high costs when they can plan for them. However, this tolerance creates an opening for solutions that can deliver similar predictability at lower price points.” That opening is quite specific: A 10% cost reduction is all it would take for many tech leaders to consider migrating their cloud database. Why so low? Likely, the answer lies in scale. When database costs climb into the millions annually – which is not unusual for platforms like DynamoDB, according to the research – even a modest 10% translates to substantial savings. Event-driven database migration triggers Still, technical leaders don’t proactively seek alternatives that are more cost-efficient or better prepared for the technical needs of current/future AI/ML workloads. They wait for trigger events that force them into a crisis-driven decision. Leadership changes (36%) and major production incidents (32%) emerged as the primary catalysts. Other significant triggers include: Load spikes (32%) Cost reductions of 10% or more (31%) Maintenance burdens (31%) Performance issues (29%) Volatile costs (28%) Most of these triggers highlight the reactive nature of these migrations, rather than proactive, strategic changes. Note that volatile database costs drive 28% of switching decisions, suggesting that sheer unpredictability can be nearly as disruptive as high costs. “Database decisions are rarely made in a vacuum,” the research report notes. “Even when teams identify performance or cost inefficiencies, acting on them competes with feature delivery, roadmap commitments, limited operational bandwidth, and against their familiar tech stack.” Early warning signs While water heater issues tend to surface without warning, database issues can usually be anticipated. There are several early warning signs that a database is starting to become a constraint: Cost is growing faster than throughput. When database spend rises faster than the throughput it’s handling, the system may not be as scalable as it appears. Teams patch their way forward (e.g., with caches) to sustain performance. But the cost per query keeps climbing. Rising tail latency. When P95 or P99 latency starts to climb during peak periods or background operations, it indicates the system is nearing its breaking point. These changes might be dismissed if they don’t immediately violate SLAs, but they’re canaries in the coal mine. Increasing operational friction. More manual tuning, more frequent capacity adjustments, more time spent managing the database to maintain the same level of performance…all these signal diminishing returns from the current architecture. Disproportionate complexity for organic growth. When routine scaling or new workload support requires outsized engineering effort, it’s a sign that the database has become a constraint rather than an enabler. From reactive to strategic Recognizing these signals is one thing, but actually acting on them before a crisis forces your hand is another. Some due diligence now will help you stay ahead of it. Get a general sense of what options are available for your use cases Define vendor-neutral evaluation criteria Stress test your existing database to understand its breaking point – before production traffic exposes it for you Set clear decision triggers (e.g., specific performance thresholds, cost targets, and capability gaps) Map your database capabilities against your 12–24 month strategic roadmap, not just your current workloads As Currier concludes: “Your database might be ‘good enough for now,’ but if that isn’t aligned with where your business needs to go, complacency is already costing you.” Download the full report here; you’ll also get access to an expert panel discussing the research findings.Native Vector Search for the DynamoDB API
Developers building on the DynamoDB API can run vector similarity search without the complexity of bolted-on “Zero ETL” For users in the DynamoDB environment, implementing vector search has been overly complicated. Amazon’s “Zero ETL” forces a dual-service approach (managing both DynamoDB and OpenSearch) and requires using two separate APIs just for Vector Semantic Search queries. ScyllaDB believes this is unnecessary complexity. We’re eliminating the heavy lifting by integrating vector search capabilities into Alternator, our DynamoDB-compatible API. This gives DynamoDB users high-performance similarity search within their familiar API, without the need for extra clusters or constant API context-switching. Architectural Differences: Unified vs. Fragmented Amazon’s approach to vector search exports data to S3 and then syncs it to OpenSearch via DynamoDB Streams. While “Zero ETL” sounds hands-off, you’re still responsible for the cost and complexity of a separate search cluster. The AWS cost is composed of DynamoDB, DynamoDB Streams, S3, OpenSearch, and the OSIS pipeline. Each of these elements’ pricing is complex on its own. Amazon Vector Search (using Open Search) for DynamoDB architecture. Source: AWS Blog. ScyllaDB Alternator simplifies this by integrating the vector store engine directly into the backend. Simple module: The ScyllaDB database hosts both the data and the vector index. Native API: You perform vector searches using DynamoDB Query operations. Performance: 10 Million Vectors on a Budget In our latest benchmark using a 10-million-vector dataset (768-dimensional Cohere embeddings), a modest five-node ScyllaDB cluster delivered over 12K QPS with single-digit millisecond latency.Setup: 10M vectors; 768 dimensions; K: 10 (retrieve top K values); No QuantizationResults Recall: ~90% Throughput: 12,763 QPS P99 Latency: 7.8 ms Cost: $1,643 / Month for 1Y full up front Estimating the AWS cost for this case is not trivial. The write-path includes DynamoDB (storage+ops), DynamoDB streams, S3 (storage, API), OpenSearch (data nodes, master nodes, EBS), and the OSIS pipeline. To read more on the pricing of Amazon Zero ETL, see Implementing search on Amazon DynamoDB data using zero-ETL integration with Amazon OpenSearch service. Code Examples Note: The exact JSON format might change in the next few months. 1. Enabling a Vector Index You can enable vector indexing during
CreateTable or via
UpdateTable. Note the new
VectorSecondaryIndexUpdates parameter. // Adding
a vector index to an existing table { "TableName":
"ProductCatalog", "AttributeDefinitions": [ {"AttributeName":
"ProductEmbedding", "AttributeType": "V"} ],
"VectorSecondaryIndexUpdates": [ { "Create": { "IndexName":
"VectorIdx", "VectorAttribute": { "AttributeName":
"ProductEmbedding", "Dimensions": 768 }, "IndexOptions": {
"SimilarityFunction": "COSINE", "M": 32, "ef_construction": 256 } }
} ] } Pro Tip: You will get the best
results with ScyllaDB’s optimized “V” (Vector)
type. Although you can use standard DynamoDB Lists, the
“V” type will store data as a tight array of 32-bit floats – and
that saves storage while boosting performance. 2. Performing a
Vector Search To search, use the Query operation with the ScyllaDB
VectorSearch parameter. { "TableName":
"ProductCatalog", "IndexName": "VectorIdx", "VectorSearch": {
"QueryVector": [0.12, 0.05, ..., 0.88], "Oversampling": 1.5 },
"Limit": 10, "ReturnVectorSearchSimilarity": "SIMILARITY" }
Example Use Cases Semantic Product Search Instead of relying on
exact keyword matches, users can find products based on intent. For
example, a search for “waterproof rugged hiking gear” can surface
relevant items even if those exact words aren’t in the title. RAG
(Retrieval-Augmented Generation) For knowledge bases, precision is
non-negotiable. Using the High Recall
configuration, ScyllaDB delivers 99.2% recall. That way, the LLM
receives the most accurate context possible for generating
responses. Semantic Deduplication At the Max
Throughput end of the spectrum, ScyllaDB can quickly scan
millions of incoming vectors to find near-duplicates. That prevents
redundant data from cluttering your system – reducing costs and
improving performance. Conclusion With ScyllaDB, DynamoDB users now
have a “fast track” to AI-ready infrastructure. By unifying storage
and vector search into a single API, you eliminate the operational
tax of “Zero ETL” without sacrificing the sub-millisecond
performance ScyllaDB is known for. ScyllaDB Vector Search Benchmark: 10M Vectors on a Compact Cluster
Even a small, compact setup achieved up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms Our 1-billion-vector benchmark demonstrated that ScyllaDB Vector Search can sustain 252,000 QPS with 2 ms P99 latency across a large-scale deployment. But not every workload starts at a billion vectors. Many production use cases (e.g., product catalogs, knowledge bases for RAG, and semantic caches) live comfortably in the 10–100 million range. This post presents a smaller benchmark: a 10-million-vector dataset of 768-dimensional Cohere embeddings on a compact five-node cluster. It used three modest storage nodes and two memory-optimized search nodes, all running on AWS Graviton. We explore four index configurations that span the recall-throughput spectrum, from near-perfect recall to maximum throughput. The results show that even this small setup can deliver up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms — without any quantization. Architecture at a Glance First, some background. ScyllaDB Vector Search separates storage and indexing responsibilities while keeping the system unified from the user’s perspective. The ScyllaDB storage nodes hold both the structured attributes and the vector embeddings in the same distributed table. Meanwhile, a dedicated Vector Store service — implemented in Rust and powered by the USearch engine — consumes updates from ScyllaDB via CDC and builds approximate nearest neighbor (ANN) indexes in memory. Queries are issued through standard CQL:SELECT … ORDER BY vector_column ANN OF
? LIMIT k; The queries are internally routed to the Vector
Store service, which performs the HNSW similarity search and
returns the candidate rows. This design allows each layer to scale
independently, optimizing for its own workload characteristics and
eliminating resource interference. For a detailed architectural
deep-dive, see the
1-billion-vector benchmark and the technical blog
Building a Low-Latency Vector Search Engine for ScyllaDB.
Benchmark Setup Here’s a look at the dataset and hardware used for
the benchmark. Dataset Property
Value Vectors 10,000,000
Dimensions 768 Embedding model
Cohere Similarity function COSINE
Quantization None (f32) Hardware
Role Instance
vCPUs RAM Count
Storage nodes i8g.large 2 16 GB 3 Search
nodes r7g.2xlarge 8 64 GB 2 With 768-dimensional f32
vectors and M values up to 64, the in-memory index size can be
estimated as: Memory ≈ N × (D × 4 + M × 16) × 1.2 For the largest
configuration (M=64): 10M × (768 × 4 + 64 × 16) × 1.2 ≈ 49
GB, which fits comfortably in the 64 GB of a single
r7g.2xlarge search node. No quantization is needed at this
scale. Experiments We tested four HNSW index
configurations, progressively lowering graph connectivity (M) and
search effort (ef_search) to shift the balance from
recall toward throughput. Experiment
M ef_construction
ef_search k tested
#1 (high quality) 64 384 192 100, 10
#2 (balanced) 32 256 128 100, 10
#3 (high throughput) 24 256 64 100, 10
#4 (max throughput) 20 256 48 10 The three HNSW
parameters control different aspects of the index:
M
(maximum_node_connections): Maximum edges per node in
the HNSW graph. Higher values create a richer, better-connected
graph that improves recall at the cost of more memory and slower
inserts and queries. ef_construction
(construction_beam_width): Controls how thoroughly the
algorithm searches for the best neighbors when inserting a new
vector. Higher values produce a higher-quality graph but slow down
index building. This is a one-time cost.
ef_search
(search_beam_width): The main tuning knob for query
performance. Controls the size of the candidate beam during search.
Higher values evaluate more candidates, which improves recall but
increases query latency. Since vector index options cannot be
changed after creation, each experiment required dropping and
recreating the index. Here are the CQL statements used: --
Experiment #1: M=64, ef_construction=384, ef_search=192 CREATE
CUSTOM INDEX vdb_bench_collection_vector_idx ON
vdb_bench.vdb_bench_collection (vector) USING 'vector_index' WITH
OPTIONS = { 'search_beam_width': '192', 'construction_beam_width':
'384', 'maximum_node_connections': '64', 'similarity_function':
'COSINE' }; -- Experiment #2: M=32, ef_construction=256,
ef_search=128 CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = { 'search_beam_width': '128',
'construction_beam_width': '256', 'maximum_node_connections': '32',
'similarity_function': 'COSINE' }; -- Experiment #3: M=24,
ef_construction=256, ef_search=64 CREATE CUSTOM INDEX
vdb_bench_collection_vector_idx ON vdb_bench.vdb_bench_collection
(vector) USING 'vector_index' WITH OPTIONS = { 'search_beam_width':
'64', 'construction_beam_width': '256', 'maximum_node_connections':
'24', 'similarity_function': 'COSINE' }; -- Experiment #4: M=20,
ef_construction=256, ef_search=48 CREATE CUSTOM INDEX
vdb_bench_collection_vector_idx ON vdb_bench.vdb_bench_collection
(vector) USING 'vector_index' WITH OPTIONS = { 'search_beam_width':
'48', 'construction_beam_width': '256', 'maximum_node_connections':
'20', 'similarity_function': 'COSINE' }; The benchmark was
run using VectorDBBench with
the upcoming ScyllaDB Python driver built on a Rust core (a dev
version is available at
python-rs-driver). VectorDBBench ramps concurrency from 1 to
150 concurrent search clients and measures QPS, P99 and average
latency at each level. A separate serial run of 1,000 queries
measures recall and nDCG against brute-force ground truth. Results
Peak QPS Comparison To start our analysis, let’s examine the
maximum throughput that each index configuration can sustain under
peak concurrency. When strictly looking at the highest throughput
achieved:
The bar chart highlights the dramatic impact of index parameters at
k=10: throughput rises sharply as the index becomes lighter. At
k=100, the differences are much smaller; all configurations cluster
between 2,300 and 3,000 QPS. QPS vs Concurrency The chart below
shows how each index configuration scales as concurrency ramps from
1 to 150 clients.
At k=10, the lighter configurations (Experiments
#3 and #4) scale nearly linearly up to 60–80 concurrent clients
before saturating. Experiment #4 demonstrates the benefit of a
leaner graph: it achieves 5.5X higher peak QPS
than Experiment #1 at k=10. At k=100, all
configurations converge to a narrower throughput band (2,300–3,025
QPS). This shows that retrieving 100 neighbors dominates the
per-query cost regardless of index parameters. P99 and Average
Latency vs Concurrency As expected, increasing throughput adds
queuing delay, and that leads to higher tail latencies.
<!-- Note: The original document had 6 images. The source note
lists the order as 4-1-2-3-5-6. The text contains 7 [image]
placeholders. Based on the document's structure, I will assume the
sixth placeholder corresponds to the last chart (Average Latency)
and omit the extra placeholder, as the source note only accounts
for six images. I will adjust the numbering below. The original
list was 4-1-2-3-5-6. I will use the final placeholder (Image 6
from source) here. The next section has another chart, so I will
add a seventh placeholder and mark it as 'Source Unknown'.
Lighter configurations start at dramatically lower baseline
latencies. Experiment #4 maintains sub-6 ms P99 latency up to 30
concurrent clients, while Experiment #1 starts above 13 ms, even at
concurrency 1. All configurations show latency rising
proportionally once throughput saturates. This is the expected
queuing behavior when the system is at capacity. QPS vs P99 Latency
(Pareto View) Plotting throughput directly against tail latency
provides a Pareto frontier of our benchmark configurations:
This view makes the operational trade-off easier to read than the
concurrency charts alone. At k=10, Experiments #3 and #4 push the
frontier outward, with much higher QPS at the same or lower tail
latency. At k=100, the frontier is tighter, which again shows that
returning more neighbors dominates the total cost per query. Recall
vs Peak QPS Finally, plotting recall helps select the optimal index
strategy based on business requirements:
This chart summarizes the core choice in a single picture: should
you spend compute on accuracy or throughput? Experiment #1 sits at
the high-recall end, Experiment #4 at the high-throughput end, and
Experiment #2 emerges as the practical middle ground for workloads
that need both. Scenario Analysis With the charts above as a visual
reference, let’s examine the three main usage scenarios that emerge
from the data. Scenario 1: Maximum Throughput Experiments #3 (M=24,
ef_search=64) and #4 (M=20, ef_search=48) target workloads where
throughput is the primary objective and moderate recall is
acceptable — for example, coarse candidate retrieval stages in
recommendation pipelines or semantic deduplication. At
k=10, Experiment #4 reached a peak of
12,840 QPS at concurrency 100, with a serial P99
latency of just 5.5 ms and recall of
92.0%. Experiment #3 achieved 9,719
QPS with marginally better recall at
95.0% and a serial P99 of 6.0 ms.
Even at k=100, these lightweight configurations
delivered competitive throughput: Experiment #3 peaked at
3,025 QPS (87.9% recall), which is comparable to
the heavier configurations. Retrieval of 100 neighbors per query
inherently requires more work, which limits the throughput range
across all configurations. Scenario 2: High Recall Experiment #1
(M=64, ef_search=192) prioritizes accuracy for applications that
cannot tolerate missed results (e.g., high-fidelity semantic
search, retrieval-augmented generation [RAG] pipelines, or
compliance-sensitive retrieval). At k=10, the
system delivered 99.2% recall and 99.1%
nDCG — essentially indistinguishable from exact
brute-force search. Peak QPS reached 2,324 with a
serial P99 latency of 14.6 ms. At
k=100, recall was 96.8% with
2,345 QPS and a serial P99 of 15.2
ms. The higher latency and lower throughput are a direct
consequence of the richer graph (64 connections per node) and wider
search beam (192 candidates), which evaluate substantially more
distance computations per query. Scenario 3: Balanced Experiment #2
(M=32, ef_search=128) takes the middle ground, offering strong
recall with significantly better throughput than the high-recall
configuration. At k=10, it achieved 97.7%
recall with 4,897 QPS — roughly double
the throughput of Experiment #1, with only a 1.5 percentage-point
recall reduction. The serial P99 was 8.7 ms. At
k=100, recall was 92.0% with
2,975 QPS and a serial P99 of 9.6
ms. This configuration represents a practical sweet spot
for many production deployments where both recall and throughput
matter. Summary Tables k=100 Metric #1
M=64 ef_s=192 #2 M=32 ef_s=128 #3
M=24 ef_s=64 Peak QPS 2,345 (c=150) 2,975
(c=40) 3,025 (c=40) QPS @ c=10 947 1,314 1,489
Serial P99 Latency 15.2 ms 9.6 ms 7.8 ms
P99 Latency @ c=1 15.5 ms 9.9 ms 8.1 ms
P99 Latency @ c=100 81.2 ms 49.9 ms 49.6 ms
Recall 96.8% 92.0% 87.9% nDCG
97.3% 93.1% 89.7% k=10 Metric #1 M=64
ef_s=192 #2 M=32 ef_s=128 #3 M=24
ef_s=64 #4 M=20 ef_s=48 Peak
QPS 2,324 (c=100) 4,897 (c=80) 9,719 (c=80) 12,840 (c=100)
QPS @ c=10 1,054 1,602 2,046 2,311 Serial
P99 Latency 14.6 ms 8.7 ms 6.0 ms 5.5 ms P99
Latency @ c=1 14.0 ms 8.5 ms 6.2 ms 5.5 ms P99
Latency @ c=100 81.0 ms 38.1 ms 18.0 ms 12.3 ms
Recall 99.2% 97.7% 95.0% 92.0%
nDCG 99.1% 97.6% 94.9% 92.0% Key Takeaways
k=10 vs k=100: At k=10, lighter index parameters
yield massive throughput gains (up to 5.5X) with modest recall
loss. At k=100, all configurations converge to a narrow QPS band
(~1.3X range) because retrieving more neighbors dominates per-query
cost. Recall trade-offs are favorable: At k=10,
recall drops only 7.2 pp (99.2% to 92.0%) for a 5.5X QPS increase.
At k=100, the trade-off is steeper: 8.9 pp for just 1.3X gain.
Latency tracks index weight: Serial P99 drops from
14.6 ms to 5.5 ms at k=10, and from 15.2 ms to 7.8 ms at k=100, as
lighter graphs require fewer distance computations.
Saturation points differ: Experiments #1–#3
plateau around c=40–80; Experiment #4 scales further to c=100
before saturating, reflecting its lower per-query compute cost.
Conclusion These results show that ScyllaDB Vector Search delivers
strong performance even on a compact, five-node cluster with 10
million 768-dimensional vectors. A pair of r7g.2xlarge search nodes
provides enough memory to hold the full HNSW index at f32 precision
– without requiring any quantization. The three storage nodes with
replication factor 3, combined with vector search nodes distributed
across availability zones, also provide high availability. The
system is designed to tolerate node failures without data loss or
service interruption. Depending on the index configuration, the
system can prioritize near-perfect recall (99.2% at k=10) or
maximize throughput (12,840 QPS at k=10 with 92% recall), with
practical balanced options in between. This 10M scenario represents
the accessible end of the scale. For workloads that push into
hundreds of millions or billions of vectors, quantization,
additional search nodes and larger instances extend the same
architecture. See the ScyllaDB
1-billion-vector benchmark for results at extreme scale, and
look for our upcoming 100-million-vector benchmark
post. At K=10, the performance bottleneck resides within the vector
index nodes, leaving ScyllaDB with significant headroom. This means
you can likely add a Vector Search index to your cluster and
continue running a similar workload on your existing ScyllaDB
infrastructure – without needing to scale your database
nodes. The full Jupyter notebook with interactive charts and all
data is available
in this repository. Ready to try it yourself? Follow the
ScyllaDB Vector Search Quick Start Guide to get started. ScyllaDB X Cloud: Your Questions Answered
A technical FAQ on ScyllaDB X Cloud: architecture, autoscaling, compression, use cases, and more It’s been a few months since ScyllaDB X Cloud landed. In case you missed the news, here’s a quick recap… ScyllaDB X Cloud is the next generation of ScyllaDB’s fully-managed database-as-a-service. It’s a truly elastic database designed to support variable/unpredictable workloads with consistent low latency as well as low costs. Users can scale out and scale in almost instantly to match actual usage. For example, you can scale all the way from 100K OPS to 2M OPS in just minutes, with consistent single-digit millisecond P99 latency. This means you don’t need to overprovision for the worst-case scenario or suffer the lag traditionally associated with ramping up capacity in response to a sudden surge. Some key features (all covered in Introducing ScyllaDB X Cloud: A (Mostly) Technical Overview): Tablets + just-in-time autoscaling Up to 90% storage utilization Support for mixed size clusters File-based streaming Dictionary-based compression Flex credit Here’s a look at ScyllaDB X Cloud in action: Not surprisingly, users have been quite curious about all these changes and new options. So we thought we’d collect some of the most common questions here, along with our answers. In no particular order… What are the key differences between a “standard” ScyllaDB Cloud database and “ScyllaDB X Cloud”? Compared to a standard ScyllaDB Cloud database, ScyllaDB X Cloud provides two major advantages: Faster scaling in and out. Higher storage utilization (90% vs. 70%). The above advantages are the result of two technical updates: X Cloud always uses Tablets, while standard databases can use a mix of vNode and Tablets keyspaces. X Cloud enables mixed sized clusters, so you can define more granular cluster and storage sizes. In which cases should you choose a “standard” ScyllaDB Cloud Database vs X Cloud? None! We’ve reached full parity now. Materialized views, CDC, Alternator (DynamoDB API), even counters – it’s all supported. Can I migrate from one type of ScyllaDB Cloud database to the other? Yes. If you are using a standard database with Tablets only, you can migrate this database to X Cloud. If you are using vNode keyspaces, you cannot (yet). How does X Cloud achieve higher storage utilization? Two factors enable higher storage utilization: Faster scaling removes the need to over-reserve storage space (or “sandbag”) while waiting for the cluster to expand Support for mixed instance sizes allows for more granular cluster size How can I start an X Cloud cluster? Simply choose the “X Cloud” Cluster Type on ScyllaDB Cloud’s Launch Cluster page. How can I set the scaling policy? Can I change it later, while the database is in production? (UI/API) The scaling policy is part of the X Cloud cluster properties. You can either set it when launching the cluster or update it later. The policy is optional. It defines the minimum required resources for your database in terms of vCPU and Storage. If you’re not sure how to set it, you can keep the default minimum values (zero) as is. The cluster will scale automatically if and when storage is approaching the threshold, and you can scale the vCPU as required by your workload. Note that the parameters affect each other since more storage may require more compute power. How are X Cloud and Tablets related? X Cloud takes advantage of (and depends on) Tablets to achieve faster scale and higher storage utilization. That means all Keyspaces in X Cloud must use Tablets, which is already the default for ScyllaDB Cloud. How can X Cloud help reduce database costs? There are a few ways that X Cloud reduces cost. The primary factor is the extreme elasticity. You can scale the cluster in and out, even multiple times per day, to meet the demand. If you cannot reliably plan the cluster usage, you can reserve a minimal deployment and pay for bursts using Flex Credit. The higher storage utilization means you use less cloud resources. Improved compression, both on the wire and at rest, reduces cost further. What’s a good use case for ScyllaDB X Cloud? Am I a good candidate for ScyllaDB X Cloud? New (greenfield) workloads should use X Cloud. Workloads that require frequent scaling out/in will benefit the most. For example: A workload with significant fluctuation throughout the day (e.g., peak hours during the evening). A workload with expected high demand on specific days of the year (e.g., Super Bowl, IPL games, or Black Friday). With X Cloud, scaling can be done days in advance. You don’t need to do it one or more weeks ahead. Difficult-to-predict workloads, with common (but volatile) bursts. How many times per day can X Cloud scale? As often as required. Although new nodes start serving requests very fast, it still takes time for the data balancing to be complete if you’re working with rather large nodes. Does X Cloud support multi-DC (region) deployment? Does each region scale independently? X Cloud does not yet support multi-datacenter deployment. Multi-DC support is coming with the 2026.2 release. Scaling Policy: I asked for storage of Y TB and got a bigger cluster with storage of W TB…why? Same for vCPU? vCPU, RAM, and Storage are not independent variables. ScyllaDB will allocate each of these 3 variables to support the required value of the other two. For example, higher storage requires more RAM – which requires more vCPU. The policy UI reflects the expected deployment per each resource selection. Can I suspend / resume the dynamic scaling? Currently: no. Can I restore a backup from X Cloud database to a standard database and vice versa? Yes, you can. Is X Cloud production ready? Absolutely, customers are already using it in production. Why should I care about advanced compression? What is the advantage of having it? ScyllaDB already supported compression before X Cloud – including at-rest and in-transit. However, dictionary-based compression is much more effective in reducing data overhead. By compressing data further, you save on disk space utilization (combined with up to 90% disk space utilization) as well as inter-AZ networking for data replication and high availability. X Cloud claims faster scaling. How fast is it really? The legacy vNode-based architecture imposed some limitations: Nodes could only be added one at a time, even across DCs. Data was replicated in rows – that is, rows were being transferred over the wire. A node only started serving requests after its streaming was fully completed. This process could easily take hours, if not days, to complete on large clusters. Now, X Cloud leverages tablets to remove those limits: Nodes can be added in parallel, multiple nodes at a time, including across DCs. Nodes join the cluster instantly, then start streaming data later. Streaming under Tablets relies on file-based streaming, transferring gigabytes of data per second in a very efficient process. As Tablet transfers complete, nodes start to serve requests immediately; this increases as more transfers complete, until the cluster rebalancing is completed. This allows X Cloud to scale to an unlimited number of nodes at a single step – and streaming data is made super efficient by file-based streaming. A cluster can go from 100K ops per second to 2M ops per second in a matter of a few minutes, not hours or days. Can I use Vector Search with X Cloud? Yes, you can! Enable the Vector Search option at the bottom of the Launch Cluster page and choose the Vector Search instances. Note that Vector Search index nodes scale independently from ScyllaDB nodes. You can learn more about Vector Search here.6 Reasons ScyllaDB Costs a Fraction of DynamoDB
Why teams typically experience 50% (or greater) cost reductions when moving from DynamoDB to ScyllaDB DynamoDB is expensive at scale. Some of that cost is fundamental to the managed service model. But much of it is the pricing model, the way DynamoDB charges per read, per write, per byte, and per region. ScyllaDB rethinks pricing from first principles. The result: teams typically see more than 50% cost reductions on equivalent workloads. In this post, I’ll share a few reasons why. Cheap writes DynamoDB charges 5x more for writes than reads. Write a 1 KB item and it costs 5 write capacity units. Read the same 1 KB item and it costs 0.25 read capacity units. ScyllaDB pricing is based on provisioned cluster capacity (nodes), not per operation. Whether you do 10K writes/sec or 100K writes/sec on a 3-node cluster, the ScyllaDB cost remains the same. Write-heavy workloads for AI, real-time analytics, logging, time-series data and IoT sensors often see the biggest savings. Take a look at our AI Feature Store example. A batch workload scenario with overnight peaks approximately 3x the daytime average on DynamoDB will cost $2.2M/year. The same workload on ScyllaDB would cost $145K/year. In other words, that’s at least 15x savings just switching to ScyllaDB. No need for a separate cache DynamoDB’s baseline latency is in the 10-20ms range. For many applications, that’s unacceptable. In those cases, teams commonly deploy DAX, Redis, or Memcached on top. That adds cost, complexity, and another service to operate and monitor. ScyllaDB was built for low latency. Internal caching and a shard-per-core architecture deliver sub-millisecond latencies on reads. For most workloads, an external cache is unnecessary. Let’s look at a retail example with a read-heavy workload that is cached and running on demand. On DynamoDB running with DAX, that workload would cost $1.6M/year. The same workload on ScyllaDB would cost $271K/year (and even less if you switch to a hybrid plan). That’s at least 6x cheaper using ScyllaDB. Plus: there are fewer moving parts, simpler operations, and no cache coherency headaches. Affordable multi-region data centers DynamoDB Global Tables charge replicated writes (rWCUs) at a premium: roughly 2x the cost of normal writes. Moreover, cross-region data transfer incurs AWS’s standard rates: $0.02-0.09/GB. For a workload doing 10K writes/sec with 5 KB payloads across 2 regions, data transfer alone can add $10K+/month. A social media scenario modeled across 3 regions on DynamoDB would cost $11.0M/year. The huge cost is partly because the write capacity cannot be reserved, and you effectively pay twice for the writes. The same workload on ScyllaDB would cost $591K/year. That’s a monstrous +$10M/year saving by switching to ScyllaDB. ScyllaDB handles multi-DC replication natively. You provision nodes in each data center, and replication is built into the protocol along with shard-aware and rack-aware drivers. This helps minimize network overhead and avoids the per-operation premium. You pay for the cluster nodes; replication comes with the territory. Large items don’t cost more In DynamoDB, a 1 KB write costs 1 WCU, and a 10KB write costs 10 WCUs. Item size directly drives billing. This incentivizes shrinking payloads, compressing data, and splitting tables. Architectural decisions are driven by cost, not design. A simple on-demand scenario with DynamoDB using 3 KB item sizes would cost $633K/year. ScyllaDB would cost $39K/year. Along with multi-region, item size remains one of the biggest cost levers to pull when looking for savings on DynamoDB. ScyllaDB billing is independent of item size. Store 1 KB items or 100 KB items and the cluster cost is unchanged. You architect around performance and correctness, not billing thresholds. Making multi-tenancy work for you DynamoDB is multi-tenant infrastructure. That’s how AWS achieves efficiency. But it also means: You pay for provisioned capacity AWS oversubscribes hardware Idle capacity benefits AWS, not you You pay for the full machine, but AWS shares it with everyone else. Multi-tenant infrastructure reduces cost for AWS but increases risk for users. Large DynamoDB outages (like us-east-1) impact thousands of customers simultaneously. When shared infrastructure fails, the blast radius is enormous. ScyllaDB flips that model. You get a dedicated cluster, which gives you: Isolation by design The ability to run multiple workloads The option to share idle capacity internally This is especially powerful for: Multi-tenant SaaS Microservices Multiple environments (dev/staging/prod) Instead of provisioning 100 tables separately, you provision one cluster and use it fully. You control your infrastructure. AWS monetizes multi-tenancy. ScyllaDB lets you monetize it. Flexible and predictable pricing DynamoDB is excellent for certain use cases: serverless applications with unpredictable spikes, multi-tenant services that need table-level isolation, and teams that prioritize operational simplicity over cost. But if you’re running a predictable, scale-intensive workload – especially one that’s write-heavy, multi-region, or stores large items – then DynamoDB’s per-operation pricing model becomes a massive cost driver. ScyllaDB’s node-based, cluster-centric model is fundamentally more cost-efficient for these scenarios. Combined with its performance and operational features, it’s why teams see more than 50% cost reductions. Want to see the actual numbers for your workload? Use the ScyllaDB Cost Calculator at calculator.scylladb.com to model a comparison between your current DynamoDB spend and equivalent ScyllaDB infrastructure.Apache Cassandra® 6 Accord transactions: What you need to know
There have always been architectural trade-offs when considering a distributed database like Apache Cassandra versus a relational database. Cassandra excels at linear horizontal scalability, multi-region replication, and fault-tolerant uptime that relational systems couldn’t match. This comes at the expense of general-purpose ACID (Atomicity, Consistency, Isolation, Durability) transactions which allows the ability to express complex, multi-row operations with guaranteed consistency.
With Cassandra 6 on its way to general availability status (and an alpha already released), we’re approaching a turning point where we can revisit whether these trade-offs will still exist. The latest version delivers general-purpose ACID transactions through a new protocol called Accord. With Cassandra 6, those transactional guarantees will be native, without compromising Cassandra’s operational model or availability.
TransactionsIn database parlance, a transaction says, “These operations belong together. They must all be applied, or none of them.” The classic example is a bank transfer. When you move money from one account to another, two things must happen: a debit and a credit. If the debit succeeds but the credit fails, money has disappeared. A transaction prevents this issue by guaranteeing the two operations are atomic, meaning they succeed or fail as a unit; combined with isolation, no other process can see an immediate or half-finished state.
Experiences like these depend on transactional guarantees at the data layer, which rely on ACID semantics, particularly atomicity and isolation, to prevent inconsistent intermediate states.
For most developers who have worked with relational databases, transactions are so fundamental they’re almost invisible. For Cassandra users, comparable guarantees across multiple partitions or tables historically required significant application-level coordination or weren’t natively supported.
Coordination at scale is fundamentally hardBecause Cassandra is designed to deal with data replication and scaling, coordinating atomic changes across multiple nodes is inherently challenging (e.g., decrement a balance here, increment one there). All participating replicas must agree on an order of operations. Distributed consensus protocols exist to solve exactly this, but prior approaches came with trade-offs.
Raft and Zab are examples of protocols that use leaders, which is not suitable for Cassandra since nodes are treated equally.
More information about prior solutions can be found in more details in CEP-15, but generally, leader-based approaches pose issues at scale.
The Accord protocolThe Accord protocol, proposed in CEP-15, is built to achieve fast, general-purpose distributed transactions that remain stable under the same failure conditions Cassandra already tolerates— with no elected leaders.
How it orders transactionsAccord is leaderless so any node can coordinate any transaction. Transactions are assigned unique timestamps using hybrid logical clocks, where each node appends its own unique ID to its clock value to ensure global uniqueness across the cluster. Conflicting transactions execute in timestamp order across all replicas. Under normal conditions, a transaction reaches consensus in a single round trip.
The reorder bufferThe challenge with timestamp-based ordering in a geo-distributed system is that two transactions started concurrently from different regions might arrive at replicas in different orders, breaking fast-path consensus. Accord solves this by having replicas buffer incoming transactions. The wait time is precisely bounded to be just long enough to account for clock differences between nodes and network latency, and no longer. This guarantees that replicas always process transactions in the correct order without needing extra message rounds.
Fast-path electoratesWhen replicas fail, other leaderless protocols fall back to slower, more expensive message patterns. Accord avoids this by dynamically adjusting which replicas participate in fast-path decisions as failures occur. The result is that Accord maintains fast-path availability under failure, avoiding the degradation to slower message patterns that other leaderless protocols experience.
The net effect: strict serializable isolation across multiple partitions and tables, in a single round trip, with no leaders, and preserving performance characteristics under the same minority‑failure conditions that Cassandra is designed to tolerate.
New CQL syntax to support transactionsThe most visible change for developers is new CQL syntax.
Transactions in Cassandra 6 are wrapped in BEGIN
TRANSACTION and COMMIT TRANSACTION blocks,
similar to SQL syntax.
Let’s examine a flight booking transaction that must simultaneously reserve a seat and deduct loyalty miles from two separate tables. Note: Cassandra 6 is pre-release. Syntax shown reflects the current alpha and may evolve before general availability.
BEGIN TRANSACTION LET seat = (SELECT available FROM flight_seats WHERE flight_id = 'ZZ101' AND seat_number = '14C'); LET miles = (SELECT balance FROM loyalty_accounts WHERE member_id = 'M-7823'); IF seat.available = true AND miles.balance >= 25000 THEN UPDATE flight_seats SET available = false, booked_by = 'M-7823' WHERE flight_id = 'ZZ101' AND seat_number = '14C'; UPDATE loyalty_accounts SET balance = miles.balance - 25000 WHERE member_id = 'M-7823'; END IF COMMIT TRANSACTION ;
Everything between BEGIN TRANSACTION and
COMMIT TRANSACTION executes atomically with strict
serializable isolation from the perspective of all other concurrent
transactions. The LET clause reads current values from
the database and binds them to variables. The IF block uses those
values to guard the writes. If the seat is already taken or the
member doesn’t have enough miles, nothing happens. Both updates
either apply together or not at all, across two different tables
and two different partition keys.
This is logic that previously had to live in the application, complete with retry handling, race condition guards, and compensating operations if something failed halfway through. Now it lives in the database.
Enabling Accord in Cassandra 6: The CMS dependencyWe can’t talk about Accord without discussing Cluster Metadata Service (CMS). Before Accord transactions are functional, Cluster Metadata Service (CMS), introduced alongside Accord as CEP-21, must be enabled. For teams upgrading from Cassandra 5, this is the most significant operational change in the release.
CMS is required. Accord needs every replica to have the same authoritative view of cluster topology showing which nodes own which data, and which replicas participate in a given transaction. Before Cassandra 6, this information was propagated via the eventually consistent Gossip Protocol. This is suitable for normal reads and writes, but Accord’s correctness depends on knowing precisely who the transaction participants are before committing. CMS replaces Gossip-based metadata propagation with a distributed, linearized transaction log, giving all nodes a consistent view of cluster state. Without it, Accord’s guarantees don’t hold.
Upgrading from Cassandra 5 to 6—plan carefullyThe upgrade cannot begin until every node in the cluster is running Cassandra 6. CMS initialization requires full cluster agreement; no mixed-version clusters are supported. Before upgrading, disable any automation that could trigger schema changes, node bootstrapping, decommissions, or replacements. These operations are blocked during the upgrade window, and if they fire on an older node before CMS is initialized, the migration can fail in ways that require manual intervention to recover.
Once all nodes are upgraded, run nodetool cms
initialize on one node to activate CMS. This creates the
service with a single member, which is enough to unblock metadata
operations but is not suitable for production. Follow up
immediately with nodetool cms reconfigure to add more
members. CMS uses Paxos internally and requires a minimum of three
nodes for a viable quorum, with more recommended for production
depending on cluster size.
Important: CMS initialization is not easily reversible. Plan the upgrade window accordingly and treat it as a one-way operational step.
On a fresh Cassandra 6 cluster that wasn’t migrated from a previous version, CMS is automatically enabled. First, one node is designated as the initial CMS member. From there, CMS membership scales automatically based on cluster size, with the service adding members as the cluster grows without requiring manual intervention.
Of course, for Instaclustr users, our platform and techops team will take care of most of this for you and walk you through any requirements on your side when the time comes to upgrade.
Coexistence with Lightweight Transactions (LWT)Existing LWT syntax (IF NOT EXISTS, IF
EXISTS, conditional UPDATE/INSERT statements)
continues to work and fundamentally differs from Accord
transactions as LWT is scoped to a single partition and is
extremely limited. Accord doesn’t replace or break existing
applications. Using BEGIN TRANSACTION/END TRANSACTION
is how developers opt into the broader cross-partition
guarantees.
Every prior approach to distributed transactions required accepting one of three constraints: a global leader (single point of failure, WAN latency penalty), limited to single-partition scope (LWT), or degraded performance under failure (prior leaderless protocols). The Accord paper’s central claim is that these constraints are not fundamental. They are artifacts of specific protocol design choices.
By combining flexible fast-path electorates with a timestamp reorder buffer on top of a leaderless execution model, Accord achieves:
- True cross-partition atomicity across multiple tables and partition keys
- Strict serializable isolation with formally proven correctness
- Single round-trip latency under normal operating conditions
- Failure‑tolerant steady‑state performance, avoiding the systematic degradation seen in earlier leaderless protocols
- No elected leaders, consistent with Cassandra’s existing operational model
This opens workloads that were previously natively incompatible with Cassandra: financial transaction processing, distributed inventory reservation, multi-step workflow coordination, and any application where ‘commit these changes together or not at all’ is a strict correctness requirement.
Looking aheadThough the Accord protocol is still maturing, the fundamental capability is finally here. We now have general-purpose, leaderless, multi-partition ACID transactions natively in Apache Cassandra.
The historically difficult problem of achieving strict serializable isolation in a geo-distributed system without compromising fault tolerance now has a proven, working answer.
For Cassandra users, this raises an exciting question: which workloads have you been routing to relational databases specifically because they needed transactional guarantees? It is time to reevaluate.
Stay tuned for a preview release of Cassandra 6 on the Instaclustr Platform and get ready to experience the power of ACID transactions on Cassandra for yourself!
The post Apache Cassandra® 6 Accord transactions: What you need to know appeared first on Instaclustr.
4 DynamoDB Configuration Changes for Significant Cost Savings
Learn about ways to cut DynamoDB costs with minimal code changes, zero migration, and no architectural upheaval If you’re running DynamoDB at scale, your bill might be tens of thousands of dollars higher than it needs to be. However, most teams don’t need a complete migration or architecture overhaul to save significantly. These configuration changes, all easily implemented, can reduce your costs by 50-80%. This guide covers the biggest wins for DynamoDB cost optimization, with the real math behind each recommendation. We will be sharing links to the ScyllaDB Cost Calculator at calculator.scylladb.com, which lets you model different workload scenarios with customized parameters and compare ScyllaDB pricing to DynamoDB pricing at the click of a button. Switch from on-demand to provisioned + reserved capacity This is the single biggest DynamoDB cost lever for most teams. On-demand capacity is convenient at first, with no planning required and just pay-as-you-go. But it’s also expensive. After AWS’s recent price reduction, on-demand costs 7.5x more than provisioned capacity. Before the drop, it was roughly 15x. Either way, the math is brutal. Let’s look at a simple example: a mid-sized workload running 10,000 reads/sec and 10,000 writes/sec, 24/7. On-Demand: ~$239K/year Provisioned: ~$71K/year Reserved: ~$34K/year That’s a 7x difference between on-demand and reserved. Even if your workload isn’t perfectly predictable, reserved capacity often pays for itself within months. The trade-off here is that you need a predictable load and the financial flexibility to commit. If your traffic varies wildly (or you’re short-term focused) provisioned mode without reservation is the middle ground. Still, it’s 3.3x cheaper than on-demand. Optimize item sizes DynamoDB’s billing is granular: writes are charged per 1KB of item size, and reads per 4KB. This means a 1.1KB item costs the same as a 2KB item on writes. If your items are consistently over these thresholds by a small margin, you’re paying 2-3x more than necessary. Let’s look at the same simple example, but with increasing item size for comparison. On-Demand with 1KB items: ~$239K/year On-Demand with 10KB items: ~$2M/year On-Demand with 100KB items: ~$20M/year Common culprits for higher DynamoDB costs here: Nested JSON with whitespace or redundant fields Variable-length strings with no trimming Metadata or audit fields added to every item Base64-encoded payloads What should you do? Compress JSON payloads before storage, remove redundant attributes, move infrequently accessed data to a separate table, or use a columnar storage strategy. Trimming just 200 bytes per item – across millions of items and thousands of writes/sec – adds up to thousands per month. Deploy DAX (DynamoDB Accelerator) for read-heavy workloads If your workload skews heavily toward reads and you’re not using an in-memory cache layer yet, DAX is one of the highest ROI moves you can make. DAX sits in front of DynamoDB and caches frequently accessed items in memory. Cache hits bypass DynamoDB entirely, meaning you avoid the RCU charge. For hot items queried thousands of times per minute, a single DAX cluster can reduce DynamoDB read capacity needs. Let’s look at another simple example: a read-heavy workload running 80,000 reads/sec and 1,000 writes/sec, 24/7. On-Demand: ~$335K/year On-Demand with DAX: ~$158K/year The cost math: a medium sized DAX cluster (3 nodes, cache.r5g.8xlarge) costs roughly $9K/month. A high hit rate on your cache will proportionally reduce your more expensive read costs. That can lead to potentially hundreds of thousands of dollars saved with DynamoDB. Bonus: DAX also improves latency dramatically. Cache hits respond in microseconds rather than milliseconds. Use the DynamoDB Infrequent Access (IA) table class Not all tables are created equal. If you have tables where data is accessed rarely but storage is high (think audit logs, historical records, compliance archives, or cold lookup tables), then the Standard-IA table class can save you substantially on storage. The pricing difference: Standard class: $0.25/GB Standard-IA class: $0.10/GB (up to 60% savings) The catch is that IA has a minimum item size of 100 bytes and a minimum billing duration. It’s designed for cold data. So, if you’re frequently scanning or querying these tables, IA isn’t the right fit (read costs are identical, but you lose the write discount). However, for true archive tables accessed only occasionally, it’s a no-brainer. The bottom line These four DynamoDB changes require minimal code changes, zero migration, and no architectural upheaval. They’re configuration changes, caching tweaks, and data optimization. Combined, they typically deliver massive cost reductions. Start with switching to provisioned + reserved (highest impact), then layer in the others based on your workload shape. Ready to model your savings? Use the ScyllaDB Cost Calculator at calculator.scylladb.com to compare your current DynamoDB costs against these optimizations. And to save even more, see how ScyllaDB compares.Shrinking the Search: Introducing ScyllaDB Vector Quantization
Learn how ScyllaDB Vector Quantization shrinks your vector index memory by up to 30x for cost-efficient, real-time AI applications Earlier this year, ScyllaDB launched integrated Vector Search, delivering sub-2ms P99 latencies for billion-vector datasets. However, high-dimensional vectors are notoriously memory-hungry. To help with memory efficiency, ScyllaDB recently introduced Vector Quantization. This allows you to shrink the index memory footprint for storing vectors by up to 30x (excluding index structure) without sacrificing the real-time performance ScyllaDB is known for. What is Quantization? To understand how we compress massive AI datasets, let’s look to the fundamentals of computer science. As Sam Rose explains in the ngrok blog on quantization, computers store numbers in bits, and representing high-precision decimal numbers (floating point) requires a significant number of them. Standard vectors use 32-bit floating point (f32) precision, where each dimension takes 4 bytes. Quantization is the process of compromising on this “floating point precision” to save space. By sacrificing some significant figures of accuracy, we can represent vectors as smaller 16-bit floats or even 8-bit or 1-bit integers. As Sam notes, while this results in a “precision compromise,” modern AI models are remarkably robust to this loss of information. They often maintain high quality even when compressed significantly. The Trade-off: Memory vs. Accuracy In ScyllaDB 2026.1, quantization is an index-only feature. The original source data remains at full precision in storage, while the in-memory HNSW index is compressed. This allows you to choose the level of “information loss” you are willing to accept for a given memory budget: Level Bytes/Dim Memory Savings Best For f32 (default) 4 1x (None) Small datasets, highest possible recall. f16 / bf16 2 ~2x Good balance of accuracy and memory. i8 1 ~4x Large datasets with moderate recall loss. b1 0.125 ~32x Maximum savings for massive datasets. CRITICAL NOTE: Quantization only compresses the vector data itself. The HNSW graph structure (the “neighbor lists” that make search fast) remains uncompressed to ensure query performance. Because of this fixed graph overhead, an i8 index typically provides a total memory reduction of ~3x rather than a raw 4x. Calculating Your Memory Needs To size your ScyllaDB Vector Search cluster effectively, be sure to consider both vector data and graph overhead. The total memory required for a vector index can be estimated with this formula: Memory ≈ N * (D * B + m * 16) * 1.2 N: Total number of vectors. D: Dimensions (e.g., 768 or 1536). B: Bytes per dimension based on quantization level (f32=4, i8=1, b1=0.125). m: Maximum connections per node (default 16). 1.2: 20% operational headroom for system processes and query handling. Example: 10 Million OpenAI Embeddings (768 Dimensions) Using this formula, let’s see how quantization affects your choice of AWS EC2 instances on ScyllaDB Cloud (which primarily utilizes the r7g Graviton and r7i Intel families): f32 (No Quantization): Requires ~40 GB RAM. You would need an r7g.2xlarge (64 GB) to ensure headroom. i8 Quantization: Requires ~12 GB RAM. You can comfortably drop to an r7g.xlarge (32 GB). b1 (1-bit): Requires ~4 GB RAM. This fits on a tiny r7g.medium (8 GB). By moving from f32 to i8, you can drop 2-3 instance tiers. This gets you significant cost savings. Improving Accuracy with Oversampling and Rescoring To mitigate the accuracy loss from quantization, ScyllaDB provides two complementary mechanisms. Oversampling retrieves a larger candidate set during the initial index search, increasing the chance that the true nearest neighbors are included. When a client requests the top K vectors, the algorithm retrieves ceiling(K * oversampling) candidates, sorts them by distance, and returns only the top K. A larger candidate pool means better recall without any extra round-trips to the application. Even without quantization, setting oversampling above 1.0 can improve recall on high-dimensionality datasets. Rescoring is a second-pass operation that recalculates distances using the original full-precision vectors stored in ScyllaDB, then re-ranks candidates before returning results. Because it must fetch and recompute exact distances for every candidate, rescoring can reduce search throughput by roughly 2x – so enable it only when high recall is critical. Note that rescoring is only beneficial when quantization is enabled; for unquantized indexes (default f32), the index already contains full-precision data, making the rescoring pass redundant. Both features are configured as index options when creating a vector index:CREATE CUSTOM INDEX ON myapp.comments(comment_vector)
USING 'vector_index' WITH OPTIONS = { 'similarity_function':
'COSINE', 'quantization': 'i8', 'oversampling': '5.0', 'rescoring':
'true' }; When (and When Not) to Use Quantization
Use quantization when: You are managing millions
or billions of vectors and need to control costs. You are
memory-constrained but can tolerate a small drop in recall. You are
using high-dimensional vectors (≥ 768), where the savings are most
pronounced. Avoid quantization when: You have a
small dataset where memory is not a bottleneck. Highest possible
recall is your only priority. Your application cannot afford the
~2x throughput reduction that comes with
rescoring—the process of recalculating exact
distances using the original f32 data to improve accuracy. Choosing
the Right Configuration for Your Scenario Here are some guidelines
to help you select the right configuration:
Scenario Recommendation Small
dataset, high recall required Use default f32 — no quantization
needed. Large dataset, memory-constrained Use i8 or f16 with
oversampling of 3.0–10.0. Add rescoring: true only if very high
recall is required. Very large dataset, approximate results
acceptable Use b1 for maximum memory savings. Enable oversampling
to compensate for accuracy loss. High-dimensionality vectors (≥
768) Consider oversampling > 1.0 even with f32 to improve
recall. Try ScyllaDB Vector Search Now Quantization is just one
part of the
ScyllaDB 2026.1 release, which also includes
Filtering,
Similarity Values, and
Real-Time Ingestion. With these tools, you can build
production-grade RAG applications that are both blazingly fast and
cost-efficient. Vector Search is available in ScyllaDB Cloud.
Get Started: Check out the
Quick Start Guide to Vector Search in ScyllaDB Cloud.
Deep Dive: Read our past posts on
building a Movie Recommendation App or our
1-billion vector benchmark. Documentation:
View the full ScyllaDB
Cloud Vector Search Documentation. Try ScyllaDB Cloud for free
today and see how quantization can supercharge your AI
infrastructure. The Great Stream Fix: Interleaving Writes in Seastar with AI-Powered Invariants Tracing
How we used AI-assisted invariant-based testing to locate and resolve tricky hidden bugs with complex state transitions Seastar is a high-performanceC++ framework for writing asynchronous server
applications. It powers projects like ScyllaDB and Redpanda. One of its core rules is
simple but strict: no blocking allowed. Every operation that could
take time (e.g., reading from disk, writing to a socket, waiting
for a lock) must be expressed asynchronously by returning a future
that resolves when the work is completed. This makes Seastar
applications extremely efficient on modern hardware. However, it
also means that even seemingly mundane things, like writing data to
a stream, require careful thought about ownership, lifetimes, and
buffering. Moreover Seastar’s output stream has always
experienced a limitation: the inability to freely mix small,
buffered writes with large, zero-copy chunks. It was something that
developers avoided and tolerated – but we always considered it
something worth improving … someday. Fixing this requires a deep
dive into complex state transitions, which inherently creates a
high risk for introducing sequencing bugs. A standard coding
approach won’t work; the task requires a way to trace the system’s
state across millions of test cases. This post describes the
process of using AI-assisted invariant-based testing to try to
locate and resolve these tricky hidden bugs. TL;DR What could have
been an extremely complicated fix ultimately was actually
surprisingly smooth and effective. Output streams Output
stream is Seastar’s output byte flow abstraction. It’s used
wherever data needs to go out of an application. For example, it’s
used for disk files, network connections, and stackable virtual
streams that transform data on the fly (such as compression or
encryption layers sitting on top of another stream). Whatever the
underlying sink is, the output stream presents a
uniform interface to the caller. It gives callers two ways to push
data through: Buffered writes: Copy bytes into an
internal buffer; flush when the buffer fills up or
when explicitly requested. Zero-copy writes: Hand
over memory buffers directly; the stream passes it to the sink
without copying a single byte of the buffer data.
Zero-copy is important for large blobs since we want
to avoid copying megabytes of data. Buffered writes
are important for building up small pieces efficiently. In a real
application, it’s natural to interleave both: write a small header
into the buffer, then attach a large payload as a zero-copy
buffer, then write a small trailer. There is also a
trim_to_size stream option. When enabled, the stream
guarantees that no chunk delivered to the underlying sink exceeds
the configured stream buffer size. This matters for
sinks that have an upper limit on how much data they can accept in
a single call – certain network APIs, for instance, or aligned disk
I/O. Without it, a larger buffer can pass through as-is. The
Problem Until recently, mixing the two write modes was not
supported. Internally, buffered and zero-copy writes used two
different storages: internal buffer for the former
data, and dedicated container for the latter. There
was no clean way to append buffered bytes onto the tail of pending
zero-copy data while preserving ordering. The code simply asserted
that the zero-copy container was empty whenever a
buffered write arrived and vice-versa. The nearby code
comment, however, stated that mixing writes was not supported
yet – so the intention to fix it had always been there.
The goal of the work described here was to make it happen. Start
with the Tests We figured we should build a solid test foundation
before touching the implementation. We had some pre-existing tests
for output streams, but they were really just a
collection of ad-hoc cases (specific input sequences with hardcoded
expected outputs). This was fine for catching regressions but not
great for systematically exploring the large space of possible
inputs against drastic code changes. The new approach was
invariant-based testing. Rather than checking exact output
sequences, the tests need to verify that certain properties always
hold, regardless of input. Specifically, we wanted to check that:
All written bytes arrive at the sink, in order, with no corruption.
Every chunk delivered to the sink (except the last) must be at
least stream_size bytes with no undersized non-last
chunks. With the trimming option enabled, all outgoing chunks must
be exactly stream_size bytes. With these invariants
defined, the test iterates over all combinations of chunk sizes (1
byte through 3x times the stream_size bytes) and all
assignments of write type (buffered or zero-copy) to
each chunk. For n chunks ,that’s 2^n type
patterns plus trimming option giving about 1.6 million combinations
in total. The ad-hoc tests were then removed – the invariant test
subsumed them. One practical issue: 1.6 million cases ran fast in a
regular build (~5 seconds), but under sanitizers
(ASan, UBSan) it ballooned to over two
minutes. Given the whole seastar test suite runs for
several minutes, this new timing had to be improved somehow. The
fix was to turn an exhaustive test into a fuzzy one: in debug
builds, shuffle all 2^n masks, always keep the
all-buffered and all-zero-copy patterns, and sample ~10% of the
rest. That brought sanitizer runs down to less than twenty seconds.
Implementing the Fix With tests in place, the implementation work
began. The key challenge was making the internal
buffer and zero-copy container interoperate
cleanly. Two transitions required handling: Buffered → zero-copy
Zero-copy → buffered Buffered → zero-copy When a
zero-copy write arrives and there’s buffered data.
That data needs to be folded into the zero-copy
container so that ordering is preserved. The naive approach
– trim buffer to its filled length and move it into container –
works, but it wastes the rest of the buffer
allocation. Instead, the filled buffer prefix is shared into
the container as a view or sub-span, and the buffer itself is
advanced past it, thus sharing the underlying memory. This way, the
tail of the original allocation is still available for
future buffered writes after the zero-copy
sequence. No reallocation is needed on the mode switch. This
tail – trimmer buffer, pointing at unused capacity within the
original allocation – is what we call the
remnant. It is a new concept introduced by
this change. Before mixed-mode writes were supported, the buffer
was always either full (and flushed) or freshly allocated. The
remnant is an optimization. But (as will become clear
shortly), its existence also introduced several subtle failure
modes that took time to track down. Zero-copy →
buffered When a buffered write arrives and
the zero-copy container is non-empty, the new data can
just be appended to the internal buffer regularly. A
new zero-copy arrival, if it happens, would keep the
buffer ordering as described above. The flushing path was also
updated to handle the case where buffered and zero-copy
data are both available at that time. The buffered
tail is folded into the zero-copy vector before
forwarding both downstream. Bugs Found Along the Way Once the
implementation was written, the tests were run. They failed – many
times, in different ways. To facilitate debugging, each test
failure logged the exact stream configuration and the write
sequence that triggered the problem, for example:
stream_size=5 trim_to_size=0 input_chunks=[3, 2]
write_types=[buffered, zero_copy] This made the diagnosis
unusually direct. Given a specific failing case, we could trace the
execution step by step – mentally simulating the stream, tracking
the bytes accumulated in the buffer and zero-copy container
state at each point, and comparing that against what the
state should be at that moment. The stream’s logic is not
huge, but it has enough branches and interacting state variables to
make it excessively time- and effort-consuming for a human. So, we
turned to AI to hopefully save time and effort. Each time a test
failed, instead of walking the input_stream state
step-by-step, we prompted the AI with the exact test-case
description string, the code itself, and the request to find where
the problem could be. The AI then started “thinking” and its
mind-flow looked as if it was simulating the
input_stream code execution step-by-step, like a
developer would do, just much faster. This turned out to be an
incredibly fast way to find where the invariant breaks. In many
cases, it could correctly analyze the desired state the stream had
to happen in. Sometimes, it even suggested excellent ways to fix
the problem that was found. Identifying the root cause is often
only half the work; understanding which fix is correct, safe, and
doesn’t introduce new assumptions that will break later is the
other half. AI found the problems for us and was quite helpful in
helping us fix them. For example, one of the first debugging
sessions with AI looked like this… Developer:
The test fails like that fatal error: in
"test_splitting_invariants": data integrity check failed:
stream_size=5 trim_to_size=0 input_chunks=[1, 1, 1, 4]
write_types=[zero_copy, buffered, buffered, buffered]
Please, analyze
AI: Let me trace through this case carefully.
stream_size=5, trim_to_size=false, writes:
[zc(1), buf(1), buf(1), buf(4)] After
zc(1): _zc_bufs = [tb(1)], _zc_len = 1, _end =
0 _zc_len < _size, nothing flushed
After buf(1): Goes to write(const char*,
1) fast path — but !_buf, so falls to
slow_write _end=0, n=1 < _size=5, so
skips the !_end && n >= _size branch Allocates
_buf, copies 1 byte, _end = 1
After buf(1): Fast path: fits in _buf, _end
= 2 After buf(4): _end=2, n=4, n >
_size - _end = 3, so goes to slow_write
_end=2, so skips the !_end && n >=
_size branch now = min(4, 5-2) = 3, copies 3
bytes, _end = 5 now != n (3
!= 4), so allocates next with
max(4-3, 5) = 5 bytes, copies 1 byte into it
_zc_bufs is non-empty, so enters the new branch: seals
_buf.share(0, 5) into _zc_bufs
_zc_len += 5 → _zc_len = 6 _zc_len += n - now =
1 → _zc_len = 7 pushes next into
_zc_bufs trim_to_size=false → calls
zero_copy_put(_zc_bufs) So zero_copy_put
sends [tb(1), tb(5), tb(5)] — but next
only has 1 byte of valid data in it, yet it’s a 5-byte allocation!
We’re pushing the full _size-sized next
buffer but only n - now = 1 bytes were written into
it. The buffer isn’t trimmed before being pushed. The fix: trim
next to n - now bytes before pushing it
into _zc_bufs when _zc_bufs is non-empty.
Shall I go ahead and fix it? The root-case was obvious and the
suggested fix was simple and correct. So, we accepted the
suggestion and the debugging session went on. We went through
several rounds like that, with each step producing more lengthy
analysis and more sophisticated fix proposals. At some point, a
test with a small buffered write followed by a
zero-copy write showed data corruption downstream.
Tracing the execution revealed that after trimming the
internal buffer past the filled region, the
trimmed buffer still appeared non-null
even though it had zero usable capacity remaining. The
fast-path check for buffer availability wasn’t
prepared for that and didn’t trigger reallocation on
the next write. As a result, bytes were written into unowned
memory. Another case involved a buffered write code
branch which handles writes larger than
stream_size by chopping them into chunks. After a mode
switch, the internal buffer might become a small
remnant – smaller than the tail chunk the
code in question was trying to store there. Tracing through the
code, we saw that it was the place where the capacity
check wasn’t prepared to meet with the remnant
buffer. It silently assumed that the internal
buffer always had room for a full-sized tail. The result was
an ASan-detected heap overflow. A more
subtle issue arose around the same remnant buffer in a
different scenario. When buffered write chopping code
encounters a tail chunk that is smaller than the
stream_size, but larger than the
remnant's remaining capacity, it has to make a choice.
It could either fill the remnant partially and
asynchronously put it before allocating a fresh buffer for the
rest, or simply abandon the remnant and allocate a
fresh full buffer. The first option is more space-efficient, but
would require an async flushing inside what is
otherwise a synchronous setup step, significantly complicating the
code. The second option wastes the unused bytes of the
remnant's allocation – but crucially, it doesn’t leak
them. The remnant shares its underlying allocation
with the sealed buffer already in the zero-copy
container, so the memory is freed once that buffer is
flushed and all references to the allocation are dropped. The
deliberate trade-off – wasted but not leaked – was worth making,
and a comment in the code explains the reasoning for whoever reads
it next. Each bug effectively had the same shape: a subtle
assumption about stream state that held in the
original single-mode code silently broke in mixed-mode scenarios.
The invariant test exposed the bugs by providing a
minimal reproducible case and a clear description of which
invariant was violated. Plus, it also made each one straightforward
to reason about and fix. The Result The work touches tests and
implementation in roughly equal measure, which feels about right
for a change like this. The test suite grew from a handful of
hand-crafted cases into an exhaustive invariant-based
framework that covers all combinations of chunk sizes and
write types – something that would have been impractical to write
by hand. On the implementation side, the long-standing restriction
on mixed-mode writes is gone. Buffered and
zero-copy writes can now be freely interleaved in any
order, with the stream handling the transitions internally. This
preserves ordering and the chunk-size invariants that
sinks depend on. In general, writing a test that covers as many
possible situations as possible and then making sure that the code
passes those tests is a very good approach. It makes sure the end
code is correct. In rare cases when the test covers all
possible situations the code may have to deal with, we can say that
“the code is officially bug free.” Making AI facilitate testing
turned out to be the best decision made in this work. Given the
amount of test cases and the number of possible combinations of
input_stream inner states, debugging each failing test
case would be a nightmare for the developer. The Hidden Insanity of DynamoDB Pricing
Learn how to navigate some of the sneakiest aspects of DynamoDB pricing DynamoDB’s pricing model has some head-scratching quirks that slyly inflate bills by hundreds of thousands of dollars per year. Most of these aren’t malicious; they’re just design decisions from 2012 that made sense at the time, but became increasingly absurd at scale. This post walks through four of the most egregious examples and the real cost impact on teams running large workloads. Cost per item size is punitive DynamoDB charges you for writes per 1KB chunk and reads per 4KB chunk. This means: 1KB write = 1 WCU 1.1KB write = 2 WCUs (you’re charged for 2KB, but only used 1.1KB) 1.5KB write = 2 WCUs 2.1KB write = 3 WCUs Every byte over a threshold doubles your cost for that operation. It’s a tax on items that don’t fit neatly into the billing boundary. And almost nothing fits neatly: JSON payloads with nested objects, variable-length strings, metadata, timestamps… Most real-world items end up hitting those boundaries, so you risk paying 2x or more for the overage. Consider a team logging 100M events per day, averaging 1.2KB each. That’s ~120M writes, almost all hitting the 2KB billing threshold. They’re paying for 200M KB instead of 120M KB. That’s a 67% surcharge baked into every bill. If their write cost is $10,000/month, that surcharge alone is ~$6,700/month in wasted capacity. On demand comes at a premium On-demand pricing was introduced as a convenience layer for unpredictable workloads. It saves teams the pain of provisioning and forecasting (“just pay for what you use”). The trade-off is that pricing is steep. Even after AWS’s recent price cut (it used to be ~15x!), on-demand is 7.5x more expensive than provisioned capacity. For a team that starts on on-demand and never switches, the cost difference is catastrophic. For example, say a SaaS company launches a new product on DynamoDB; they start with on-demand for convenience and quickly scale to 20K reads/sec and 20K writes/sec. On-demand now costs $39K/month. Switching to provisioned would drop that down to $11K/month. And teams often don’t switch because ‘it works’ or ‘the bill surprise hasn’t happened yet.’ The convenience tax on DynamoDB is insane. Even if you wanted to retain that flexibility, ScyllaDB would cost $3K/month for on-demand or just $1K/month with a hybrid subscription + flex component. Multi-region network costs are deceptive Global Tables already charge replicated writes (rWCUs) at a premium. But there’s a second hidden cost too: data transfer. AWS charges for cross-region data transfer at standard EC2 rates: $0.02/GB to adjacent regions, up to $0.09/GB to distant regions. As a result, Global Tables end up costing 2-3x more than expected. These hidden network costs often don’t appear as a line item on your DynamoDB bill. They’re rolled up into ‘Data Transfer’ charges. Many teams don’t notice or attribute it correctly. ScyllaDB can’t escape the variable costs of cross-region data transfer that AWS enforces. However, we have a number of cost reduction mechanisms that assist with these costs. ScyllaDB handles multi-DC replication natively. You provision nodes in each data center, and replication is built into the protocol. There are also shard-aware and rack-aware drivers, which help minimize network overhead. Add network compression, and your cross-region data costs get even lower. Reserved capacity requires you to predict capacity Reserved capacity offers massive discounts, up to 70% off. But there’s a catch: you must commit for 1 or 3 years upfront, and you must predict your read and write throughput independently. This is absurdly difficult. Your workload changes: new features launch, old features get deprecated, customer behavior shifts, and traffic patterns evolve. Predicting the exact read/write ratio years out is impossible. Teams either over-commit (wasting money on unused capacity) or under-commit (paying on-demand rates for the overage). Example: You commit to 200K reads/sec and 500K writes/sec for 1 year. On DynamoDB, that is going to cost $1.4M/year for the upfront and annual commitment. But six months into the year, growth exceeds your capacity estimates and your application starts having requests throttled. You revert to autoscaling a mixture of reserved plus on-demand. Now, you’re paying the 7.5x markup – and that costly misjudgment is locked in for the remainder of the year. The solution? Over-commit to hedge your bets. This guarantees you’re wasting money on overprovisioning, just to avoid even higher on-demand charges. It’s a no-win scenario. Compare this to ScyllaDB with a hybrid subscription + flex component that automatically scales to your requirements throughout the year, which might cost $133K/year to start with. Radically less expensive and more flexible (on both compute and storage requirements) thanks to true elastic scaling with X Cloud. Why does this matter? These four pricing quirks aren’t hypothetical. Combined, they add tens of thousands to six figures per year to bills across the industry. They’re especially brutal for write-heavy workloads, multi-region systems, and large items. And because they’re partially hidden, buried in separate line items, masked by the per-operation model, or justified by architectural constraints… Teams often don’t realize how much they’re paying. Some of this is inevitable with a fully managed service. But databases built on different cost models can deliver the same durability, consistency, flexibility and scale at a fraction of the price. For example, this is the case with ScyllaDB, which charges by the node and includes replication and large items at no extra cost. Curious what your workload actually costs? Use the ScyllaDB DynamoDB Cost Calculator at calculator.scylladb.com to model your real costs, including all the hidden charges, and see how ScyllaDB pricing stacks up.Powering a Billion Dreams: Scaling Meesho’s E-commerce Platform
How ScyllaDB plays a critical role in handling Meesho’s millions of transactions – optimizing our catalog rankings and ensuring ultra-low-latency operations With over a billion Indians set to shop online, Meesho is redefining e-commerce by making it accessible, affordable, and inclusive at an unprecedented scale. But scaling for Bharat isn’t just about growth—it’s about building a tech backbone that can handle massive traffic surges, dynamic pricing, real-time recommendations, and seamless user experiences. Let me take you behind the scenes of Meesho’s journey to democratize e-commerce while operating at monster scale. We’ll cover how ScyllaDB plays a critical role in handling Meesho’s millions of transactions – optimizing our catalog rankings and ensuring ultra-low-latency operations. Note: Adarsha Das from Meesho will be presenting a keynote at the upcoming Monster Scale Summit India/APAC. That talk is on BharatMLStack, an open-source, end-to-end machine learning infrastructure stack built at Meesho to support real-time and batch ML workloads at Bharat scale. Join Monster Scale Summit India/APAC — it’s free and virtual About Meesho In case you’re not familiar with Meesho, we’re an Indian e-commerce platform. The company was founded in 2015 to connect small and medium enterprises in India. Meesho helps consumers from these areas access products from all over India, beyond their local markets. Meesho focuses on bringing affordable product selections to Tier 2 cities and smaller markets. The company operates with a zero-commission model that reduces barriers for sellers. We function as an asset-light marketplace that connects sellers, logistics partners, and consumers. We make the listing process quite simple. Sellers just need to take a picture of the product, upload it, set the price, and start selling. Why Personalization is Essential for Meesho Meesho’s architecture aims to support people who are new to e-commerce. Tech-savvy users from Tier 1 cities likely know how to use search, tweak keywords, and find what they want. But someone from a Tier 2 city, new to e-commerce, needs discovery to be simpler. That’s why we invested in a lot of tech to build personalized experiences on the app. Specifically, we invested significantly in AI and personalization technologies to create intuitive app experiences. We personalize all the way from the moment the app opens to order completion. For example, different users see different homepages and product selections based on their preferences and purchase history. We also personalize for sellers, helping them create product descriptions that make sense to their buyers. Real-Time Feed-First Personalization Meesho meets these needs with a fundamentally feed-first app. We create a tailored product feed, ranking products based on preferences and actions (searches, clicks, etc). To do this, we built a CTR (click-through rate) prediction model to decide what product tiles to show each user, and in what order. Two people logging in will see different selections based on their behavior. Given all this, Meesho had to move from traditional recommendation systems to real-time, highly personalized experiences. Batch processing wasn’t sufficient; our personalization must respond instantly to recent user actions. That requires low-latency databases and systems at scale, with the ability to support millions of sellers and users on the app simultaneously. Why ScyllaDB We experimented with a few different databases and data stores: SQL, NoSQL, columnar, and non-columnar. Some worked at certain scales. But as we kept growing, we had to reinvent our storage strategy. Then we discovered ScyllaDB, which met our needs and proved itself at Meesho scale. More specifically, ScyllaDB provided… Horizontal Scaling Given the ever-increasing scale of Meesho – where user transactions kept increasing and users kept growing over years – horizontal scalability was very important to us. Today, I might be running with X nodes. If that becomes 2X tomorrow, how do you scale in a live manner? Being a low-cost e-commerce platform, we are conscious about server spend, so we try to emulate traffic patterns by dynamically scaling up and down based on demand. For example, not all 24 hours have the same number of orders; there are peaks and lows. We want to provision for baseline load and auto-scale for demand without downtime, since the cost of downtime for a business like ours is very high. Downtime can result in user churn and loss of trust, so we prioritize reliability and availability above all. Moreover, we expect that adding new nodes will linearly increase throughput. For example, if I run an X-node cluster and add nodes, I should get a proportionate throughput increase. This is critical as we scale up or down. We observed that in distributed systems with a primary-secondary configuration, the primary can become a bottleneck. So, we wanted a peer-to-peer architecture like ScyllaDB’s, where each node can service writes as well as reads. ScyllaDB gives us linear scalability. Low-Level Optimizations for Efficiency The database’s efficiency is also a factor for us. A major challenge we saw in JVM- or Java-based systems was garbage collection and related overheads. These impact performance, interrupt scaling, and limit hardware utilization. That’s why we prefer C++-based or other low-level language implementations, with minimal JVM or garbage collection issues, and minimal memory overhead. Most of our use cases require low-latency, real-time personalization, where every bit of memory is used for application logic and data, not overhead. Smart Architecture and Fault Tolerance Having a smart, fault-tolerant architecture was another consideration. Much of our user base is in Tier 3 and 4 cities, where network connectivity is sometimes flaky. We want to provide a Tier 1 user experience to Tier 4 users, so low latency is critical. We prioritize keeping latency within a few milliseconds. One of ScyllaDB’s key features is token-aware routing. When a query comes, it goes directly to the node with relevant data – reducing network hops since each node acts as its own master. This is the kind of distributed architecture we were looking for, and the token-level routing helps with horizontal scalability. Reliability and fault tolerance are also major requirements. When running on a public cloud, a big pain point is a particular zone going down. We’ve seen cloud regions and zones go down before. To minimize impact, we look for automatic data replication across zones and seamless failover in case of failures, so that user impact is minimal. Building trust with first-time e-commerce users is hard. If we lose it, getting them back is even harder. That’s why this capability is critical. Operational Simplicity Another thing we wanted is operational simplicity—having a system where adding or removing nodes is as simple as running a script or clicking a button. We like having an engine where we don’t need to tune everything ourselves. Results So Far We’ve been using ScyllaDB to power very low-latency systems at high throughput, for both reads and writes. We started with small workflows, scaled to platform workflows like ML platform and experimentation performance, and continue to scale. It’s been a good journey so far, and we’re looking forward to using it for more use cases.Instaclustr product update: March 2026
Here’s a roundup of the latest features and updates that we’ve recently released.
If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.
Major announcements Introducing AI Cluster Health: Smarter monitoring made simpleTurn complex metrics into clear, actionable insights with AI-powered health indicators—now available in the NetApp Instaclustr console. The new AI Cluster Health page simplifies cluster monitoring, making it easy to understand your cluster’s state at a glance without requiring deep technical expertise. This AI-driven analysis reviews recent metrics, highlights key indicators, explains their impact, and assigns an easy traffic-light health score for a quick status overview.
NetApp introduces Apache Kafka® Tiered Storage support on GCP in public previewTiered Storage is now available in public review for Instaclustr for Apache Kafka on Google Cloud Platform. This feature enables customers to optimize storage costs and improve scalability by offloading older Kafka log segments from local disk to Google Cloud Storage (GCS), while keeping active data local for fast access. Kafka clients continue to consume data seamlessly with no changes required, allowing teams to reduce infrastructure costs, simplify cluster scaling, and extend retention periods for analytics or compliance.
Other significant changes Apache Cassandra®- Released Apache Cassandra 4.0.19 and 5.0.6 into general availability on the NetApp Instaclustr Managed Platform, giving customers access to the latest stability, performance, and security improvements.
- Multi–data center Apache Cassandra clusters can now be provisioned across public and private networks via the Instaclustr Console, API, and Terraform provider, enabling customers to provision multi-DC clusters from day one.
- Single CA feature is now available for Apache Cassandra clusters. See Single CA for more details.
- GCP n4 machine types are now supported for Apache Cassandra in the available regions.
- Apache Kafka and Kafka Connect 4.1.1 are now generally available.
- Added Client Telemetry feature support for Kafka in private preview.
- Single CA feature is now available for Apache Kafka clusters. See Single CA for more details.
- ClickHouse 25.8.11 has been added to our managed platform in General Availability.
- Enabled ClickHouse
system.session_logtable that plays a key role in tracking session lifecycle and auditing user activities for enhanced session monitoring. This helps you with troubleshooting client-side connectivity issues and provides insights into failed connections.
- OpenSearch 2.19.4 and 3.3.2 have been released to general availability.
- Added support for the OpenSearch Assistant feature in OpenSearch Dashboards for clusters with Dashboards and AI Search enabled.
- PostgreSQL version 18.1 has now been released to general availability, alongside PostgreSQL version 17.7.
- PgBouncer version 1.25.0 has now been released to general availability.
- Added self-service Tags Management feature—allowing users to add, edit, or delete tags for their clusters directly through the Instaclustr console, APIs, or Terraform provider for RIYOA deployments
- Added new region Germany West Central for Azure
- Following the private preview release, Kafka’s Client Telemetry feature is progressing toward general availability soon. Read more here.
- We plan to extend the current ClickHouse integration with FSxN data sources by adding support for deployments across different VPCs, enabling broader enterprise lakehouse architectures.
- Apache Iceberg and Delta Lake integration are planned to soon be available for ClickHouse on the NetApp Instaclustr Platform, giving you a practical way to run analytics on open table formats while keeping control of your existing data platforms.
- We plan to soon introduce fully integrated AWS PrivateLink as a ClickHouse Add-On for secure and seamless connectivity with ClickHouse.
- We’re aiming to launch PostgreSQL integrated with FSx for NetApp ONTAP (FSxN) along with NVMe support into general availability soon. This enhancement is designed to combine enterprise-grade PostgreSQL with FSxN’s scalable, cost-efficient storage, enabling customers to optimize infrastructure costs while improving performance and flexibility. NVMe support is designed to deliver up to 20% greater throughput vs NFS.
- An AI Search plugin for OpenSearch is being released to GA (currently in public preview) to enhance search experiences using AI‑powered techniques such as semantic, hybrid, and conversational search, enabling more relevant, context‑aware results and unlocking new use cases including retrieval‑augmented generation (RAG) and AI‑driven chatbots.
- Following the public preview release, Zero Inbound Access is progressing to General Availability, designed to deliver the most secure management connectivity by eliminating inbound internet exposure and removing the need for any routable public IP addresses, including bastion or gateway instances.
- Explore how to freeze your streaming data for long-term
analytical queries in the future with our two-part blog series:
Freezing streaming data into Apache Iceberg
—Part 1: Using Apache Kafka®Connect Iceberg Sink Connector
introduces Apache Iceberg and demonstrates streaming Kafka data
using the Apache Kafka Connect Iceberg Sink Connector and
Freezing streaming data into Apache Iceberg
—Part 2: Using Iceberg Topics examines the experimental
approach of using Kafka Tiered Storage and Iceberg Topics, where
non‑active Kafka segments are copied to remote storage while
remaining transparently readable by Kafka clients. - Modern search applications go beyond simple keyword matching, requiring a deep understanding of user intent and context to deliver relevant, meaningful results. From keywords to concepts: How OpenSearch® AI search outperforms traditional search explores how semantic and hybrid search methods in OpenSearch AI search compare to traditional keyword search, and how you can use these capabilities for more relevant results.
- Generative AI and Large Language Models (LLMs) are booming, and they’ve put a spotlight on a crucial technology: vector search. Many applications today demand high throughput, low latency, and constant availability for retrieving information. Slow vector search can become a significant bottleneck, delaying responses and degrading the user experience. Our two-part series blogs Vector search benchmarking: Setting up embeddings, insertion, and retrieval with PostgreSQL® and Vector search benchmarking: Embeddings, insertion, and searching documents with ClickHouse® and Apache Cassandra® explore hands-on findings from our benchmarking projects, the role of databases in vector search, how to set up vector search for embeddings, insertion, and retrieval, and practical strategies for building faster, more efficient semantic search systems.
If you have any questions or need further assistance with these enhancements to the Instaclustr Managed Platform, please contact us.
SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available.
The post Instaclustr product update: March 2026 appeared first on Instaclustr.