6 Reasons ScyllaDB Costs a Fraction of DynamoDB
Why teams typically experience 50% (or greater) cost reductions when moving from DynamoDB to ScyllaDB DynamoDB is expensive at scale. Some of that cost is fundamental to the managed service model. But much of it is the pricing model, the way DynamoDB charges per read, per write, per byte, and per region. ScyllaDB rethinks pricing from first principles. The result: teams typically see more than 50% cost reductions on equivalent workloads. In this post, I’ll share a few reasons why. Cheap writes DynamoDB charges 5x more for writes than reads. Write a 1 KB item and it costs 5 write capacity units. Read the same 1 KB item and it costs 0.25 read capacity units. ScyllaDB pricing is based on provisioned cluster capacity (nodes), not per operation. Whether you do 10K writes/sec or 100K writes/sec on a 3-node cluster, the ScyllaDB cost remains the same. Write-heavy workloads for AI, real-time analytics, logging, time-series data and IoT sensors often see the biggest savings. Take a look at our AI Feature Store example. A batch workload scenario with overnight peaks approximately 3x the daytime average on DynamoDB will cost $2.2M/year. The same workload on ScyllaDB would cost $145K/year. In other words, that’s at least 15x savings just switching to ScyllaDB. No need for a separate cache DynamoDB’s baseline latency is in the 10-20ms range. For many applications, that’s unacceptable. In those cases, teams commonly deploy DAX, Redis, or Memcached on top. That adds cost, complexity, and another service to operate and monitor. ScyllaDB was built for low latency. Internal caching and a shard-per-core architecture deliver sub-millisecond latencies on reads. For most workloads, an external cache is unnecessary. Let’s look at a retail example with a read-heavy workload that is cached and running on demand. On DynamoDB running with DAX, that workload would cost $1.6M/year. The same workload on ScyllaDB would cost $271K/year (and even less if you switch to a hybrid plan). That’s at least 6x cheaper using ScyllaDB. Plus: there are fewer moving parts, simpler operations, and no cache coherency headaches. Affordable multi-region data centers DynamoDB Global Tables charge replicated writes (rWCUs) at a premium: roughly 2x the cost of normal writes. Moreover, cross-region data transfer incurs AWS’s standard rates: $0.02-0.09/GB. For a workload doing 10K writes/sec with 5 KB payloads across 2 regions, data transfer alone can add $10K+/month. A social media scenario modeled across 3 regions on DynamoDB would cost $11.0M/year. The huge cost is partly because the write capacity cannot be reserved, and you effectively pay twice for the writes. The same workload on ScyllaDB would cost $591K/year. That’s a monstrous +$10M/year saving by switching to ScyllaDB. ScyllaDB handles multi-DC replication natively. You provision nodes in each data center, and replication is built into the protocol along with shard-aware and rack-aware drivers. This helps minimize network overhead and avoids the per-operation premium. You pay for the cluster nodes; replication comes with the territory. Large items don’t cost more In DynamoDB, a 1 KB write costs 1 WCU, and a 10KB write costs 10 WCUs. Item size directly drives billing. This incentivizes shrinking payloads, compressing data, and splitting tables. Architectural decisions are driven by cost, not design. A simple on-demand scenario with DynamoDB using 3 KB item sizes would cost $633K/year. ScyllaDB would cost $39K/year. Along with multi-region, item size remains one of the biggest cost levers to pull when looking for savings on DynamoDB. ScyllaDB billing is independent of item size. Store 1 KB items or 100 KB items and the cluster cost is unchanged. You architect around performance and correctness, not billing thresholds. Making multi-tenancy work for you DynamoDB is multi-tenant infrastructure. That’s how AWS achieves efficiency. But it also means: You pay for provisioned capacity AWS oversubscribes hardware Idle capacity benefits AWS, not you You pay for the full machine, but AWS shares it with everyone else. Multi-tenant infrastructure reduces cost for AWS but increases risk for users. Large DynamoDB outages (like us-east-1) impact thousands of customers simultaneously. When shared infrastructure fails, the blast radius is enormous. ScyllaDB flips that model. You get a dedicated cluster, which gives you: Isolation by design The ability to run multiple workloads The option to share idle capacity internally This is especially powerful for: Multi-tenant SaaS Microservices Multiple environments (dev/staging/prod) Instead of provisioning 100 tables separately, you provision one cluster and use it fully. You control your infrastructure. AWS monetizes multi-tenancy. ScyllaDB lets you monetize it. Flexible and predictable pricing DynamoDB is excellent for certain use cases: serverless applications with unpredictable spikes, multi-tenant services that need table-level isolation, and teams that prioritize operational simplicity over cost. But if you’re running a predictable, scale-intensive workload – especially one that’s write-heavy, multi-region, or stores large items – then DynamoDB’s per-operation pricing model becomes a massive cost driver. ScyllaDB’s node-based, cluster-centric model is fundamentally more cost-efficient for these scenarios. Combined with its performance and operational features, it’s why teams see more than 50% cost reductions. Want to see the actual numbers for your workload? Use the ScyllaDB Cost Calculator at calculator.scylladb.com to model a comparison between your current DynamoDB spend and equivalent ScyllaDB infrastructure.Apache Cassandra® 6 Accord transactions: What you need to know
There have always been architectural trade-offs when considering a distributed database like Apache Cassandra versus a relational database. Cassandra excels at linear horizontal scalability, multi-region replication, and fault-tolerant uptime that relational systems couldn’t match. This comes at the expense of general-purpose ACID (Atomicity, Consistency, Isolation, Durability) transactions which allows the ability to express complex, multi-row operations with guaranteed consistency.
With Cassandra 6 on its way to general availability status (and an alpha already released), we’re approaching a turning point where we can revisit whether these trade-offs will still exist. The latest version delivers general-purpose ACID transactions through a new protocol called Accord. With Cassandra 6, those transactional guarantees will be native, without compromising Cassandra’s operational model or availability.
TransactionsIn database parlance, a transaction says, “These operations belong together. They must all be applied, or none of them.” The classic example is a bank transfer. When you move money from one account to another, two things must happen: a debit and a credit. If the debit succeeds but the credit fails, money has disappeared. A transaction prevents this issue by guaranteeing the two operations are atomic, meaning they succeed or fail as a unit; combined with isolation, no other process can see an immediate or half-finished state.
Experiences like these depend on transactional guarantees at the data layer, which rely on ACID semantics, particularly atomicity and isolation, to prevent inconsistent intermediate states.
For most developers who have worked with relational databases, transactions are so fundamental they’re almost invisible. For Cassandra users, comparable guarantees across multiple partitions or tables historically required significant application-level coordination or weren’t natively supported.
Coordination at scale is fundamentally hardBecause Cassandra is designed to deal with data replication and scaling, coordinating atomic changes across multiple nodes is inherently challenging (e.g., decrement a balance here, increment one there). All participating replicas must agree on an order of operations. Distributed consensus protocols exist to solve exactly this, but prior approaches came with trade-offs.
Raft and Zab are examples of protocols that use leaders, which is not suitable for Cassandra since nodes are treated equally.
More information about prior solutions can be found in more details in CEP-15, but generally, leader-based approaches pose issues at scale.
The Accord protocolThe Accord protocol, proposed in CEP-15, is built to achieve fast, general-purpose distributed transactions that remain stable under the same failure conditions Cassandra already tolerates— with no elected leaders.
How it orders transactionsAccord is leaderless so any node can coordinate any transaction. Transactions are assigned unique timestamps using hybrid logical clocks, where each node appends its own unique ID to its clock value to ensure global uniqueness across the cluster. Conflicting transactions execute in timestamp order across all replicas. Under normal conditions, a transaction reaches consensus in a single round trip.
The reorder bufferThe challenge with timestamp-based ordering in a geo-distributed system is that two transactions started concurrently from different regions might arrive at replicas in different orders, breaking fast-path consensus. Accord solves this by having replicas buffer incoming transactions. The wait time is precisely bounded to be just long enough to account for clock differences between nodes and network latency, and no longer. This guarantees that replicas always process transactions in the correct order without needing extra message rounds.
Fast-path electoratesWhen replicas fail, other leaderless protocols fall back to slower, more expensive message patterns. Accord avoids this by dynamically adjusting which replicas participate in fast-path decisions as failures occur. The result is that Accord maintains fast-path availability under failure, avoiding the degradation to slower message patterns that other leaderless protocols experience.
The net effect: strict serializable isolation across multiple partitions and tables, in a single round trip, with no leaders, and preserving performance characteristics under the same minority‑failure conditions that Cassandra is designed to tolerate.
New CQL syntax to support transactionsThe most visible change for developers is new CQL syntax.
Transactions in Cassandra 6 are wrapped in BEGIN
TRANSACTION and COMMIT TRANSACTION blocks,
similar to SQL syntax.
Let’s examine a flight booking transaction that must simultaneously reserve a seat and deduct loyalty miles from two separate tables. Note: Cassandra 6 is pre-release. Syntax shown reflects the current alpha and may evolve before general availability.
BEGIN TRANSACTION LET seat = (SELECT available FROM flight_seats WHERE flight_id = 'ZZ101' AND seat_number = '14C'); LET miles = (SELECT balance FROM loyalty_accounts WHERE member_id = 'M-7823'); IF seat.available = true AND miles.balance >= 25000 THEN UPDATE flight_seats SET available = false, booked_by = 'M-7823' WHERE flight_id = 'ZZ101' AND seat_number = '14C'; UPDATE loyalty_accounts SET balance = miles.balance - 25000 WHERE member_id = 'M-7823'; END IF COMMIT TRANSACTION ;
Everything between BEGIN TRANSACTION and
COMMIT TRANSACTION executes atomically with strict
serializable isolation from the perspective of all other concurrent
transactions. The LET clause reads current values from
the database and binds them to variables. The IF block uses those
values to guard the writes. If the seat is already taken or the
member doesn’t have enough miles, nothing happens. Both updates
either apply together or not at all, across two different tables
and two different partition keys.
This is logic that previously had to live in the application, complete with retry handling, race condition guards, and compensating operations if something failed halfway through. Now it lives in the database.
Enabling Accord in Cassandra 6: The CMS dependencyWe can’t talk about Accord without discussing Cluster Metadata Service (CMS). Before Accord transactions are functional, Cluster Metadata Service (CMS), introduced alongside Accord as CEP-21, must be enabled. For teams upgrading from Cassandra 5, this is the most significant operational change in the release.
CMS is required. Accord needs every replica to have the same authoritative view of cluster topology showing which nodes own which data, and which replicas participate in a given transaction. Before Cassandra 6, this information was propagated via the eventually consistent Gossip Protocol. This is suitable for normal reads and writes, but Accord’s correctness depends on knowing precisely who the transaction participants are before committing. CMS replaces Gossip-based metadata propagation with a distributed, linearized transaction log, giving all nodes a consistent view of cluster state. Without it, Accord’s guarantees don’t hold.
Upgrading from Cassandra 5 to 6—plan carefullyThe upgrade cannot begin until every node in the cluster is running Cassandra 6. CMS initialization requires full cluster agreement; no mixed-version clusters are supported. Before upgrading, disable any automation that could trigger schema changes, node bootstrapping, decommissions, or replacements. These operations are blocked during the upgrade window, and if they fire on an older node before CMS is initialized, the migration can fail in ways that require manual intervention to recover.
Once all nodes are upgraded, run nodetool cms
initialize on one node to activate CMS. This creates the
service with a single member, which is enough to unblock metadata
operations but is not suitable for production. Follow up
immediately with nodetool cms reconfigure to add more
members. CMS uses Paxos internally and requires a minimum of three
nodes for a viable quorum, with more recommended for production
depending on cluster size.
Important: CMS initialization is not easily reversible. Plan the upgrade window accordingly and treat it as a one-way operational step.
On a fresh Cassandra 6 cluster that wasn’t migrated from a previous version, CMS is automatically enabled. First, one node is designated as the initial CMS member. From there, CMS membership scales automatically based on cluster size, with the service adding members as the cluster grows without requiring manual intervention.
Of course, for Instaclustr users, our platform and techops team will take care of most of this for you and walk you through any requirements on your side when the time comes to upgrade.
Coexistence with Lightweight Transactions (LWT)Existing LWT syntax (IF NOT EXISTS, IF
EXISTS, conditional UPDATE/INSERT statements)
continues to work and fundamentally differs from Accord
transactions as LWT is scoped to a single partition and is
extremely limited. Accord doesn’t replace or break existing
applications. Using BEGIN TRANSACTION/END TRANSACTION
is how developers opt into the broader cross-partition
guarantees.
Every prior approach to distributed transactions required accepting one of three constraints: a global leader (single point of failure, WAN latency penalty), limited to single-partition scope (LWT), or degraded performance under failure (prior leaderless protocols). The Accord paper’s central claim is that these constraints are not fundamental. They are artifacts of specific protocol design choices.
By combining flexible fast-path electorates with a timestamp reorder buffer on top of a leaderless execution model, Accord achieves:
- True cross-partition atomicity across multiple tables and partition keys
- Strict serializable isolation with formally proven correctness
- Single round-trip latency under normal operating conditions
- Failure‑tolerant steady‑state performance, avoiding the systematic degradation seen in earlier leaderless protocols
- No elected leaders, consistent with Cassandra’s existing operational model
This opens workloads that were previously natively incompatible with Cassandra: financial transaction processing, distributed inventory reservation, multi-step workflow coordination, and any application where ‘commit these changes together or not at all’ is a strict correctness requirement.
Looking aheadThough the Accord protocol is still maturing, the fundamental capability is finally here. We now have general-purpose, leaderless, multi-partition ACID transactions natively in Apache Cassandra.
The historically difficult problem of achieving strict serializable isolation in a geo-distributed system without compromising fault tolerance now has a proven, working answer.
For Cassandra users, this raises an exciting question: which workloads have you been routing to relational databases specifically because they needed transactional guarantees? It is time to reevaluate.
Stay tuned for a preview release of Cassandra 6 on the Instaclustr Platform and get ready to experience the power of ACID transactions on Cassandra for yourself!
The post Apache Cassandra® 6 Accord transactions: What you need to know appeared first on Instaclustr.
4 DynamoDB Configuration Changes for Significant Cost Savings
Learn about ways to cut DynamoDB costs with minimal code changes, zero migration, and no architectural upheaval If you’re running DynamoDB at scale, your bill might be tens of thousands of dollars higher than it needs to be. However, most teams don’t need a complete migration or architecture overhaul to save significantly. These configuration changes, all easily implemented, can reduce your costs by 50-80%. This guide covers the biggest wins for DynamoDB cost optimization, with the real math behind each recommendation. We will be sharing links to the ScyllaDB Cost Calculator at calculator.scylladb.com, which lets you model different workload scenarios with customized parameters and compare ScyllaDB pricing to DynamoDB pricing at the click of a button. Switch from on-demand to provisioned + reserved capacity This is the single biggest DynamoDB cost lever for most teams. On-demand capacity is convenient at first, with no planning required and just pay-as-you-go. But it’s also expensive. After AWS’s recent price reduction, on-demand costs 7.5x more than provisioned capacity. Before the drop, it was roughly 15x. Either way, the math is brutal. Let’s look at a simple example: a mid-sized workload running 10,000 reads/sec and 10,000 writes/sec, 24/7. On-Demand: ~$239K/year Provisioned: ~$71K/year Reserved: ~$34K/year That’s a 7x difference between on-demand and reserved. Even if your workload isn’t perfectly predictable, reserved capacity often pays for itself within months. The trade-off here is that you need a predictable load and the financial flexibility to commit. If your traffic varies wildly (or you’re short-term focused) provisioned mode without reservation is the middle ground. Still, it’s 3.3x cheaper than on-demand. Optimize item sizes DynamoDB’s billing is granular: writes are charged per 1KB of item size, and reads per 4KB. This means a 1.1KB item costs the same as a 2KB item on writes. If your items are consistently over these thresholds by a small margin, you’re paying 2-3x more than necessary. Let’s look at the same simple example, but with increasing item size for comparison. On-Demand with 1KB items: ~$239K/year On-Demand with 10KB items: ~$2M/year On-Demand with 100KB items: ~$20M/year Common culprits for higher DynamoDB costs here: Nested JSON with whitespace or redundant fields Variable-length strings with no trimming Metadata or audit fields added to every item Base64-encoded payloads What should you do? Compress JSON payloads before storage, remove redundant attributes, move infrequently accessed data to a separate table, or use a columnar storage strategy. Trimming just 200 bytes per item – across millions of items and thousands of writes/sec – adds up to thousands per month. Deploy DAX (DynamoDB Accelerator) for read-heavy workloads If your workload skews heavily toward reads and you’re not using an in-memory cache layer yet, DAX is one of the highest ROI moves you can make. DAX sits in front of DynamoDB and caches frequently accessed items in memory. Cache hits bypass DynamoDB entirely, meaning you avoid the RCU charge. For hot items queried thousands of times per minute, a single DAX cluster can reduce DynamoDB read capacity needs. Let’s look at another simple example: a read-heavy workload running 80,000 reads/sec and 1,000 writes/sec, 24/7. On-Demand: ~$335K/year On-Demand with DAX: ~$158K/year The cost math: a medium sized DAX cluster (3 nodes, cache.r5g.8xlarge) costs roughly $9K/month. A high hit rate on your cache will proportionally reduce your more expensive read costs. That can lead to potentially hundreds of thousands of dollars saved with DynamoDB. Bonus: DAX also improves latency dramatically. Cache hits respond in microseconds rather than milliseconds. Use the DynamoDB Infrequent Access (IA) table class Not all tables are created equal. If you have tables where data is accessed rarely but storage is high (think audit logs, historical records, compliance archives, or cold lookup tables), then the Standard-IA table class can save you substantially on storage. The pricing difference: Standard class: $0.25/GB Standard-IA class: $0.10/GB (up to 60% savings) The catch is that IA has a minimum item size of 100 bytes and a minimum billing duration. It’s designed for cold data. So, if you’re frequently scanning or querying these tables, IA isn’t the right fit (read costs are identical, but you lose the write discount). However, for true archive tables accessed only occasionally, it’s a no-brainer. The bottom line These four DynamoDB changes require minimal code changes, zero migration, and no architectural upheaval. They’re configuration changes, caching tweaks, and data optimization. Combined, they typically deliver massive cost reductions. Start with switching to provisioned + reserved (highest impact), then layer in the others based on your workload shape. Ready to model your savings? Use the ScyllaDB Cost Calculator at calculator.scylladb.com to compare your current DynamoDB costs against these optimizations. And to save even more, see how ScyllaDB compares.Shrinking the Search: Introducing ScyllaDB Vector Quantization
Learn how ScyllaDB Vector Quantization shrinks your vector index memory by up to 30x for cost-efficient, real-time AI applications Earlier this year, ScyllaDB launched integrated Vector Search, delivering sub-2ms P99 latencies for billion-vector datasets. However, high-dimensional vectors are notoriously memory-hungry. To help with memory efficiency, ScyllaDB recently introduced Vector Quantization. This allows you to shrink the index memory footprint for storing vectors by up to 30x (excluding index structure) without sacrificing the real-time performance ScyllaDB is known for. What is Quantization? To understand how we compress massive AI datasets, let’s look to the fundamentals of computer science. As Sam Rose explains in the ngrok blog on quantization, computers store numbers in bits, and representing high-precision decimal numbers (floating point) requires a significant number of them. Standard vectors use 32-bit floating point (f32) precision, where each dimension takes 4 bytes. Quantization is the process of compromising on this “floating point precision” to save space. By sacrificing some significant figures of accuracy, we can represent vectors as smaller 16-bit floats or even 8-bit or 1-bit integers. As Sam notes, while this results in a “precision compromise,” modern AI models are remarkably robust to this loss of information. They often maintain high quality even when compressed significantly. The Trade-off: Memory vs. Accuracy In ScyllaDB 2026.1, quantization is an index-only feature. The original source data remains at full precision in storage, while the in-memory HNSW index is compressed. This allows you to choose the level of “information loss” you are willing to accept for a given memory budget: Level Bytes/Dim Memory Savings Best For f32 (default) 4 1x (None) Small datasets, highest possible recall. f16 / bf16 2 ~2x Good balance of accuracy and memory. i8 1 ~4x Large datasets with moderate recall loss. b1 0.125 ~32x Maximum savings for massive datasets. CRITICAL NOTE: Quantization only compresses the vector data itself. The HNSW graph structure (the “neighbor lists” that make search fast) remains uncompressed to ensure query performance. Because of this fixed graph overhead, an i8 index typically provides a total memory reduction of ~3x rather than a raw 4x. Calculating Your Memory Needs To size your ScyllaDB Vector Search cluster effectively, be sure to consider both vector data and graph overhead. The total memory required for a vector index can be estimated with this formula: Memory ≈ N * (D * B + m * 16) * 1.2 N: Total number of vectors. D: Dimensions (e.g., 768 or 1536). B: Bytes per dimension based on quantization level (f32=4, i8=1, b1=0.125). m: Maximum connections per node (default 16). 1.2: 20% operational headroom for system processes and query handling. Example: 10 Million OpenAI Embeddings (768 Dimensions) Using this formula, let’s see how quantization affects your choice of AWS EC2 instances on ScyllaDB Cloud (which primarily utilizes the r7g Graviton and r7i Intel families): f32 (No Quantization): Requires ~40 GB RAM. You would need an r7g.2xlarge (64 GB) to ensure headroom. i8 Quantization: Requires ~12 GB RAM. You can comfortably drop to an r7g.xlarge (32 GB). b1 (1-bit): Requires ~4 GB RAM. This fits on a tiny r7g.medium (8 GB). By moving from f32 to i8, you can drop 2-3 instance tiers. This gets you significant cost savings. Improving Accuracy with Oversampling and Rescoring To mitigate the accuracy loss from quantization, ScyllaDB provides two complementary mechanisms. Oversampling retrieves a larger candidate set during the initial index search, increasing the chance that the true nearest neighbors are included. When a client requests the top K vectors, the algorithm retrieves ceiling(K * oversampling) candidates, sorts them by distance, and returns only the top K. A larger candidate pool means better recall without any extra round-trips to the application. Even without quantization, setting oversampling above 1.0 can improve recall on high-dimensionality datasets. Rescoring is a second-pass operation that recalculates distances using the original full-precision vectors stored in ScyllaDB, then re-ranks candidates before returning results. Because it must fetch and recompute exact distances for every candidate, rescoring can reduce search throughput by roughly 2x – so enable it only when high recall is critical. Note that rescoring is only beneficial when quantization is enabled; for unquantized indexes (default f32), the index already contains full-precision data, making the rescoring pass redundant. Both features are configured as index options when creating a vector index:CREATE CUSTOM INDEX ON myapp.comments(comment_vector)
USING 'vector_index' WITH OPTIONS = { 'similarity_function':
'COSINE', 'quantization': 'i8', 'oversampling': '5.0', 'rescoring':
'true' }; When (and When Not) to Use Quantization
Use quantization when: You are managing millions
or billions of vectors and need to control costs. You are
memory-constrained but can tolerate a small drop in recall. You are
using high-dimensional vectors (≥ 768), where the savings are most
pronounced. Avoid quantization when: You have a
small dataset where memory is not a bottleneck. Highest possible
recall is your only priority. Your application cannot afford the
~2x throughput reduction that comes with
rescoring—the process of recalculating exact
distances using the original f32 data to improve accuracy. Choosing
the Right Configuration for Your Scenario Here are some guidelines
to help you select the right configuration:
Scenario Recommendation Small
dataset, high recall required Use default f32 — no quantization
needed. Large dataset, memory-constrained Use i8 or f16 with
oversampling of 3.0–10.0. Add rescoring: true only if very high
recall is required. Very large dataset, approximate results
acceptable Use b1 for maximum memory savings. Enable oversampling
to compensate for accuracy loss. High-dimensionality vectors (≥
768) Consider oversampling > 1.0 even with f32 to improve
recall. Try ScyllaDB Vector Search Now Quantization is just one
part of the
ScyllaDB 2026.1 release, which also includes
Filtering,
Similarity Values, and
Real-Time Ingestion. With these tools, you can build
production-grade RAG applications that are both blazingly fast and
cost-efficient. Vector Search is available in ScyllaDB Cloud.
Get Started: Check out the
Quick Start Guide to Vector Search in ScyllaDB Cloud.
Deep Dive: Read our past posts on
building a Movie Recommendation App or our
1-billion vector benchmark. Documentation:
View the full ScyllaDB
Cloud Vector Search Documentation. Try ScyllaDB Cloud for free
today and see how quantization can supercharge your AI
infrastructure. The Great Stream Fix: Interleaving Writes in Seastar with AI-Powered Invariants Tracing
How we used AI-assisted invariant-based testing to locate and resolve tricky hidden bugs with complex state transitions Seastar is a high-performanceC++ framework for writing asynchronous server
applications. It powers projects like ScyllaDB and Redpanda. One of its core rules is
simple but strict: no blocking allowed. Every operation that could
take time (e.g., reading from disk, writing to a socket, waiting
for a lock) must be expressed asynchronously by returning a future
that resolves when the work is completed. This makes Seastar
applications extremely efficient on modern hardware. However, it
also means that even seemingly mundane things, like writing data to
a stream, require careful thought about ownership, lifetimes, and
buffering. Moreover Seastar’s output stream has always
experienced a limitation: the inability to freely mix small,
buffered writes with large, zero-copy chunks. It was something that
developers avoided and tolerated – but we always considered it
something worth improving … someday. Fixing this requires a deep
dive into complex state transitions, which inherently creates a
high risk for introducing sequencing bugs. A standard coding
approach won’t work; the task requires a way to trace the system’s
state across millions of test cases. This post describes the
process of using AI-assisted invariant-based testing to try to
locate and resolve these tricky hidden bugs. TL;DR What could have
been an extremely complicated fix ultimately was actually
surprisingly smooth and effective. Output streams Output
stream is Seastar’s output byte flow abstraction. It’s used
wherever data needs to go out of an application. For example, it’s
used for disk files, network connections, and stackable virtual
streams that transform data on the fly (such as compression or
encryption layers sitting on top of another stream). Whatever the
underlying sink is, the output stream presents a
uniform interface to the caller. It gives callers two ways to push
data through: Buffered writes: Copy bytes into an
internal buffer; flush when the buffer fills up or
when explicitly requested. Zero-copy writes: Hand
over memory buffers directly; the stream passes it to the sink
without copying a single byte of the buffer data.
Zero-copy is important for large blobs since we want
to avoid copying megabytes of data. Buffered writes
are important for building up small pieces efficiently. In a real
application, it’s natural to interleave both: write a small header
into the buffer, then attach a large payload as a zero-copy
buffer, then write a small trailer. There is also a
trim_to_size stream option. When enabled, the stream
guarantees that no chunk delivered to the underlying sink exceeds
the configured stream buffer size. This matters for
sinks that have an upper limit on how much data they can accept in
a single call – certain network APIs, for instance, or aligned disk
I/O. Without it, a larger buffer can pass through as-is. The
Problem Until recently, mixing the two write modes was not
supported. Internally, buffered and zero-copy writes used two
different storages: internal buffer for the former
data, and dedicated container for the latter. There
was no clean way to append buffered bytes onto the tail of pending
zero-copy data while preserving ordering. The code simply asserted
that the zero-copy container was empty whenever a
buffered write arrived and vice-versa. The nearby code
comment, however, stated that mixing writes was not supported
yet – so the intention to fix it had always been there.
The goal of the work described here was to make it happen. Start
with the Tests We figured we should build a solid test foundation
before touching the implementation. We had some pre-existing tests
for output streams, but they were really just a
collection of ad-hoc cases (specific input sequences with hardcoded
expected outputs). This was fine for catching regressions but not
great for systematically exploring the large space of possible
inputs against drastic code changes. The new approach was
invariant-based testing. Rather than checking exact output
sequences, the tests need to verify that certain properties always
hold, regardless of input. Specifically, we wanted to check that:
All written bytes arrive at the sink, in order, with no corruption.
Every chunk delivered to the sink (except the last) must be at
least stream_size bytes with no undersized non-last
chunks. With the trimming option enabled, all outgoing chunks must
be exactly stream_size bytes. With these invariants
defined, the test iterates over all combinations of chunk sizes (1
byte through 3x times the stream_size bytes) and all
assignments of write type (buffered or zero-copy) to
each chunk. For n chunks ,that’s 2^n type
patterns plus trimming option giving about 1.6 million combinations
in total. The ad-hoc tests were then removed – the invariant test
subsumed them. One practical issue: 1.6 million cases ran fast in a
regular build (~5 seconds), but under sanitizers
(ASan, UBSan) it ballooned to over two
minutes. Given the whole seastar test suite runs for
several minutes, this new timing had to be improved somehow. The
fix was to turn an exhaustive test into a fuzzy one: in debug
builds, shuffle all 2^n masks, always keep the
all-buffered and all-zero-copy patterns, and sample ~10% of the
rest. That brought sanitizer runs down to less than twenty seconds.
Implementing the Fix With tests in place, the implementation work
began. The key challenge was making the internal
buffer and zero-copy container interoperate
cleanly. Two transitions required handling: Buffered → zero-copy
Zero-copy → buffered Buffered → zero-copy When a
zero-copy write arrives and there’s buffered data.
That data needs to be folded into the zero-copy
container so that ordering is preserved. The naive approach
– trim buffer to its filled length and move it into container –
works, but it wastes the rest of the buffer
allocation. Instead, the filled buffer prefix is shared into
the container as a view or sub-span, and the buffer itself is
advanced past it, thus sharing the underlying memory. This way, the
tail of the original allocation is still available for
future buffered writes after the zero-copy
sequence. No reallocation is needed on the mode switch. This
tail – trimmer buffer, pointing at unused capacity within the
original allocation – is what we call the
remnant. It is a new concept introduced by
this change. Before mixed-mode writes were supported, the buffer
was always either full (and flushed) or freshly allocated. The
remnant is an optimization. But (as will become clear
shortly), its existence also introduced several subtle failure
modes that took time to track down. Zero-copy →
buffered When a buffered write arrives and
the zero-copy container is non-empty, the new data can
just be appended to the internal buffer regularly. A
new zero-copy arrival, if it happens, would keep the
buffer ordering as described above. The flushing path was also
updated to handle the case where buffered and zero-copy
data are both available at that time. The buffered
tail is folded into the zero-copy vector before
forwarding both downstream. Bugs Found Along the Way Once the
implementation was written, the tests were run. They failed – many
times, in different ways. To facilitate debugging, each test
failure logged the exact stream configuration and the write
sequence that triggered the problem, for example:
stream_size=5 trim_to_size=0 input_chunks=[3, 2]
write_types=[buffered, zero_copy] This made the diagnosis
unusually direct. Given a specific failing case, we could trace the
execution step by step – mentally simulating the stream, tracking
the bytes accumulated in the buffer and zero-copy container
state at each point, and comparing that against what the
state should be at that moment. The stream’s logic is not
huge, but it has enough branches and interacting state variables to
make it excessively time- and effort-consuming for a human. So, we
turned to AI to hopefully save time and effort. Each time a test
failed, instead of walking the input_stream state
step-by-step, we prompted the AI with the exact test-case
description string, the code itself, and the request to find where
the problem could be. The AI then started “thinking” and its
mind-flow looked as if it was simulating the
input_stream code execution step-by-step, like a
developer would do, just much faster. This turned out to be an
incredibly fast way to find where the invariant breaks. In many
cases, it could correctly analyze the desired state the stream had
to happen in. Sometimes, it even suggested excellent ways to fix
the problem that was found. Identifying the root cause is often
only half the work; understanding which fix is correct, safe, and
doesn’t introduce new assumptions that will break later is the
other half. AI found the problems for us and was quite helpful in
helping us fix them. For example, one of the first debugging
sessions with AI looked like this… Developer:
The test fails like that fatal error: in
"test_splitting_invariants": data integrity check failed:
stream_size=5 trim_to_size=0 input_chunks=[1, 1, 1, 4]
write_types=[zero_copy, buffered, buffered, buffered]
Please, analyze
AI: Let me trace through this case carefully.
stream_size=5, trim_to_size=false, writes:
[zc(1), buf(1), buf(1), buf(4)] After
zc(1): _zc_bufs = [tb(1)], _zc_len = 1, _end =
0 _zc_len < _size, nothing flushed
After buf(1): Goes to write(const char*,
1) fast path — but !_buf, so falls to
slow_write _end=0, n=1 < _size=5, so
skips the !_end && n >= _size branch Allocates
_buf, copies 1 byte, _end = 1
After buf(1): Fast path: fits in _buf, _end
= 2 After buf(4): _end=2, n=4, n >
_size - _end = 3, so goes to slow_write
_end=2, so skips the !_end && n >=
_size branch now = min(4, 5-2) = 3, copies 3
bytes, _end = 5 now != n (3
!= 4), so allocates next with
max(4-3, 5) = 5 bytes, copies 1 byte into it
_zc_bufs is non-empty, so enters the new branch: seals
_buf.share(0, 5) into _zc_bufs
_zc_len += 5 → _zc_len = 6 _zc_len += n - now =
1 → _zc_len = 7 pushes next into
_zc_bufs trim_to_size=false → calls
zero_copy_put(_zc_bufs) So zero_copy_put
sends [tb(1), tb(5), tb(5)] — but next
only has 1 byte of valid data in it, yet it’s a 5-byte allocation!
We’re pushing the full _size-sized next
buffer but only n - now = 1 bytes were written into
it. The buffer isn’t trimmed before being pushed. The fix: trim
next to n - now bytes before pushing it
into _zc_bufs when _zc_bufs is non-empty.
Shall I go ahead and fix it? The root-case was obvious and the
suggested fix was simple and correct. So, we accepted the
suggestion and the debugging session went on. We went through
several rounds like that, with each step producing more lengthy
analysis and more sophisticated fix proposals. At some point, a
test with a small buffered write followed by a
zero-copy write showed data corruption downstream.
Tracing the execution revealed that after trimming the
internal buffer past the filled region, the
trimmed buffer still appeared non-null
even though it had zero usable capacity remaining. The
fast-path check for buffer availability wasn’t
prepared for that and didn’t trigger reallocation on
the next write. As a result, bytes were written into unowned
memory. Another case involved a buffered write code
branch which handles writes larger than
stream_size by chopping them into chunks. After a mode
switch, the internal buffer might become a small
remnant – smaller than the tail chunk the
code in question was trying to store there. Tracing through the
code, we saw that it was the place where the capacity
check wasn’t prepared to meet with the remnant
buffer. It silently assumed that the internal
buffer always had room for a full-sized tail. The result was
an ASan-detected heap overflow. A more
subtle issue arose around the same remnant buffer in a
different scenario. When buffered write chopping code
encounters a tail chunk that is smaller than the
stream_size, but larger than the
remnant's remaining capacity, it has to make a choice.
It could either fill the remnant partially and
asynchronously put it before allocating a fresh buffer for the
rest, or simply abandon the remnant and allocate a
fresh full buffer. The first option is more space-efficient, but
would require an async flushing inside what is
otherwise a synchronous setup step, significantly complicating the
code. The second option wastes the unused bytes of the
remnant's allocation – but crucially, it doesn’t leak
them. The remnant shares its underlying allocation
with the sealed buffer already in the zero-copy
container, so the memory is freed once that buffer is
flushed and all references to the allocation are dropped. The
deliberate trade-off – wasted but not leaked – was worth making,
and a comment in the code explains the reasoning for whoever reads
it next. Each bug effectively had the same shape: a subtle
assumption about stream state that held in the
original single-mode code silently broke in mixed-mode scenarios.
The invariant test exposed the bugs by providing a
minimal reproducible case and a clear description of which
invariant was violated. Plus, it also made each one straightforward
to reason about and fix. The Result The work touches tests and
implementation in roughly equal measure, which feels about right
for a change like this. The test suite grew from a handful of
hand-crafted cases into an exhaustive invariant-based
framework that covers all combinations of chunk sizes and
write types – something that would have been impractical to write
by hand. On the implementation side, the long-standing restriction
on mixed-mode writes is gone. Buffered and
zero-copy writes can now be freely interleaved in any
order, with the stream handling the transitions internally. This
preserves ordering and the chunk-size invariants that
sinks depend on. In general, writing a test that covers as many
possible situations as possible and then making sure that the code
passes those tests is a very good approach. It makes sure the end
code is correct. In rare cases when the test covers all
possible situations the code may have to deal with, we can say that
“the code is officially bug free.” Making AI facilitate testing
turned out to be the best decision made in this work. Given the
amount of test cases and the number of possible combinations of
input_stream inner states, debugging each failing test
case would be a nightmare for the developer. The Hidden Insanity of DynamoDB Pricing
Learn how to navigate some of the sneakiest aspects of DynamoDB pricing DynamoDB’s pricing model has some head-scratching quirks that slyly inflate bills by hundreds of thousands of dollars per year. Most of these aren’t malicious; they’re just design decisions from 2012 that made sense at the time, but became increasingly absurd at scale. This post walks through four of the most egregious examples and the real cost impact on teams running large workloads. Cost per item size is punitive DynamoDB charges you for writes per 1KB chunk and reads per 4KB chunk. This means: 1KB write = 1 WCU 1.1KB write = 2 WCUs (you’re charged for 2KB, but only used 1.1KB) 1.5KB write = 2 WCUs 2.1KB write = 3 WCUs Every byte over a threshold doubles your cost for that operation. It’s a tax on items that don’t fit neatly into the billing boundary. And almost nothing fits neatly: JSON payloads with nested objects, variable-length strings, metadata, timestamps… Most real-world items end up hitting those boundaries, so you risk paying 2x or more for the overage. Consider a team logging 100M events per day, averaging 1.2KB each. That’s ~120M writes, almost all hitting the 2KB billing threshold. They’re paying for 200M KB instead of 120M KB. That’s a 67% surcharge baked into every bill. If their write cost is $10,000/month, that surcharge alone is ~$6,700/month in wasted capacity. On demand comes at a premium On-demand pricing was introduced as a convenience layer for unpredictable workloads. It saves teams the pain of provisioning and forecasting (“just pay for what you use”). The trade-off is that pricing is steep. Even after AWS’s recent price cut (it used to be ~15x!), on-demand is 7.5x more expensive than provisioned capacity. For a team that starts on on-demand and never switches, the cost difference is catastrophic. For example, say a SaaS company launches a new product on DynamoDB; they start with on-demand for convenience and quickly scale to 20K reads/sec and 20K writes/sec. On-demand now costs $39K/month. Switching to provisioned would drop that down to $11K/month. And teams often don’t switch because ‘it works’ or ‘the bill surprise hasn’t happened yet.’ The convenience tax on DynamoDB is insane. Even if you wanted to retain that flexibility, ScyllaDB would cost $3K/month for on-demand or just $1K/month with a hybrid subscription + flex component. Multi-region network costs are deceptive Global Tables already charge replicated writes (rWCUs) at a premium. But there’s a second hidden cost too: data transfer. AWS charges for cross-region data transfer at standard EC2 rates: $0.02/GB to adjacent regions, up to $0.09/GB to distant regions. As a result, Global Tables end up costing 2-3x more than expected. These hidden network costs often don’t appear as a line item on your DynamoDB bill. They’re rolled up into ‘Data Transfer’ charges. Many teams don’t notice or attribute it correctly. ScyllaDB can’t escape the variable costs of cross-region data transfer that AWS enforces. However, we have a number of cost reduction mechanisms that assist with these costs. ScyllaDB handles multi-DC replication natively. You provision nodes in each data center, and replication is built into the protocol. There are also shard-aware and rack-aware drivers, which help minimize network overhead. Add network compression, and your cross-region data costs get even lower. Reserved capacity requires you to predict capacity Reserved capacity offers massive discounts, up to 70% off. But there’s a catch: you must commit for 1 or 3 years upfront, and you must predict your read and write throughput independently. This is absurdly difficult. Your workload changes: new features launch, old features get deprecated, customer behavior shifts, and traffic patterns evolve. Predicting the exact read/write ratio years out is impossible. Teams either over-commit (wasting money on unused capacity) or under-commit (paying on-demand rates for the overage). Example: You commit to 200K reads/sec and 500K writes/sec for 1 year. On DynamoDB, that is going to cost $1.4M/year for the upfront and annual commitment. But six months into the year, growth exceeds your capacity estimates and your application starts having requests throttled. You revert to autoscaling a mixture of reserved plus on-demand. Now, you’re paying the 7.5x markup – and that costly misjudgment is locked in for the remainder of the year. The solution? Over-commit to hedge your bets. This guarantees you’re wasting money on overprovisioning, just to avoid even higher on-demand charges. It’s a no-win scenario. Compare this to ScyllaDB with a hybrid subscription + flex component that automatically scales to your requirements throughout the year, which might cost $133K/year to start with. Radically less expensive and more flexible (on both compute and storage requirements) thanks to true elastic scaling with X Cloud. Why does this matter? These four pricing quirks aren’t hypothetical. Combined, they add tens of thousands to six figures per year to bills across the industry. They’re especially brutal for write-heavy workloads, multi-region systems, and large items. And because they’re partially hidden, buried in separate line items, masked by the per-operation model, or justified by architectural constraints… Teams often don’t realize how much they’re paying. Some of this is inevitable with a fully managed service. But databases built on different cost models can deliver the same durability, consistency, flexibility and scale at a fraction of the price. For example, this is the case with ScyllaDB, which charges by the node and includes replication and large items at no extra cost. Curious what your workload actually costs? Use the ScyllaDB DynamoDB Cost Calculator at calculator.scylladb.com to model your real costs, including all the hidden charges, and see how ScyllaDB pricing stacks up.Powering a Billion Dreams: Scaling Meesho’s E-commerce Platform
How ScyllaDB plays a critical role in handling Meesho’s millions of transactions – optimizing our catalog rankings and ensuring ultra-low-latency operations With over a billion Indians set to shop online, Meesho is redefining e-commerce by making it accessible, affordable, and inclusive at an unprecedented scale. But scaling for Bharat isn’t just about growth—it’s about building a tech backbone that can handle massive traffic surges, dynamic pricing, real-time recommendations, and seamless user experiences. Let me take you behind the scenes of Meesho’s journey to democratize e-commerce while operating at monster scale. We’ll cover how ScyllaDB plays a critical role in handling Meesho’s millions of transactions – optimizing our catalog rankings and ensuring ultra-low-latency operations. Note: Adarsha Das from Meesho will be presenting a keynote at the upcoming Monster Scale Summit India/APAC. That talk is on BharatMLStack, an open-source, end-to-end machine learning infrastructure stack built at Meesho to support real-time and batch ML workloads at Bharat scale. Join Monster Scale Summit India/APAC — it’s free and virtual About Meesho In case you’re not familiar with Meesho, we’re an Indian e-commerce platform. The company was founded in 2015 to connect small and medium enterprises in India. Meesho helps consumers from these areas access products from all over India, beyond their local markets. Meesho focuses on bringing affordable product selections to Tier 2 cities and smaller markets. The company operates with a zero-commission model that reduces barriers for sellers. We function as an asset-light marketplace that connects sellers, logistics partners, and consumers. We make the listing process quite simple. Sellers just need to take a picture of the product, upload it, set the price, and start selling. Why Personalization is Essential for Meesho Meesho’s architecture aims to support people who are new to e-commerce. Tech-savvy users from Tier 1 cities likely know how to use search, tweak keywords, and find what they want. But someone from a Tier 2 city, new to e-commerce, needs discovery to be simpler. That’s why we invested in a lot of tech to build personalized experiences on the app. Specifically, we invested significantly in AI and personalization technologies to create intuitive app experiences. We personalize all the way from the moment the app opens to order completion. For example, different users see different homepages and product selections based on their preferences and purchase history. We also personalize for sellers, helping them create product descriptions that make sense to their buyers. Real-Time Feed-First Personalization Meesho meets these needs with a fundamentally feed-first app. We create a tailored product feed, ranking products based on preferences and actions (searches, clicks, etc). To do this, we built a CTR (click-through rate) prediction model to decide what product tiles to show each user, and in what order. Two people logging in will see different selections based on their behavior. Given all this, Meesho had to move from traditional recommendation systems to real-time, highly personalized experiences. Batch processing wasn’t sufficient; our personalization must respond instantly to recent user actions. That requires low-latency databases and systems at scale, with the ability to support millions of sellers and users on the app simultaneously. Why ScyllaDB We experimented with a few different databases and data stores: SQL, NoSQL, columnar, and non-columnar. Some worked at certain scales. But as we kept growing, we had to reinvent our storage strategy. Then we discovered ScyllaDB, which met our needs and proved itself at Meesho scale. More specifically, ScyllaDB provided… Horizontal Scaling Given the ever-increasing scale of Meesho – where user transactions kept increasing and users kept growing over years – horizontal scalability was very important to us. Today, I might be running with X nodes. If that becomes 2X tomorrow, how do you scale in a live manner? Being a low-cost e-commerce platform, we are conscious about server spend, so we try to emulate traffic patterns by dynamically scaling up and down based on demand. For example, not all 24 hours have the same number of orders; there are peaks and lows. We want to provision for baseline load and auto-scale for demand without downtime, since the cost of downtime for a business like ours is very high. Downtime can result in user churn and loss of trust, so we prioritize reliability and availability above all. Moreover, we expect that adding new nodes will linearly increase throughput. For example, if I run an X-node cluster and add nodes, I should get a proportionate throughput increase. This is critical as we scale up or down. We observed that in distributed systems with a primary-secondary configuration, the primary can become a bottleneck. So, we wanted a peer-to-peer architecture like ScyllaDB’s, where each node can service writes as well as reads. ScyllaDB gives us linear scalability. Low-Level Optimizations for Efficiency The database’s efficiency is also a factor for us. A major challenge we saw in JVM- or Java-based systems was garbage collection and related overheads. These impact performance, interrupt scaling, and limit hardware utilization. That’s why we prefer C++-based or other low-level language implementations, with minimal JVM or garbage collection issues, and minimal memory overhead. Most of our use cases require low-latency, real-time personalization, where every bit of memory is used for application logic and data, not overhead. Smart Architecture and Fault Tolerance Having a smart, fault-tolerant architecture was another consideration. Much of our user base is in Tier 3 and 4 cities, where network connectivity is sometimes flaky. We want to provide a Tier 1 user experience to Tier 4 users, so low latency is critical. We prioritize keeping latency within a few milliseconds. One of ScyllaDB’s key features is token-aware routing. When a query comes, it goes directly to the node with relevant data – reducing network hops since each node acts as its own master. This is the kind of distributed architecture we were looking for, and the token-level routing helps with horizontal scalability. Reliability and fault tolerance are also major requirements. When running on a public cloud, a big pain point is a particular zone going down. We’ve seen cloud regions and zones go down before. To minimize impact, we look for automatic data replication across zones and seamless failover in case of failures, so that user impact is minimal. Building trust with first-time e-commerce users is hard. If we lose it, getting them back is even harder. That’s why this capability is critical. Operational Simplicity Another thing we wanted is operational simplicity—having a system where adding or removing nodes is as simple as running a script or clicking a button. We like having an engine where we don’t need to tune everything ourselves. Results So Far We’ve been using ScyllaDB to power very low-latency systems at high throughput, for both reads and writes. We started with small workflows, scaled to platform workflows like ML platform and experimentation performance, and continue to scale. It’s been a good journey so far, and we’re looking forward to using it for more use cases.Instaclustr product update: March 2026
Here’s a roundup of the latest features and updates that we’ve recently released.
If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.
Major announcements Introducing AI Cluster Health: Smarter monitoring made simpleTurn complex metrics into clear, actionable insights with AI-powered health indicators—now available in the NetApp Instaclustr console. The new AI Cluster Health page simplifies cluster monitoring, making it easy to understand your cluster’s state at a glance without requiring deep technical expertise. This AI-driven analysis reviews recent metrics, highlights key indicators, explains their impact, and assigns an easy traffic-light health score for a quick status overview.
NetApp introduces Apache Kafka® Tiered Storage support on GCP in public previewTiered Storage is now available in public review for Instaclustr for Apache Kafka on Google Cloud Platform. This feature enables customers to optimize storage costs and improve scalability by offloading older Kafka log segments from local disk to Google Cloud Storage (GCS), while keeping active data local for fast access. Kafka clients continue to consume data seamlessly with no changes required, allowing teams to reduce infrastructure costs, simplify cluster scaling, and extend retention periods for analytics or compliance.
Other significant changes Apache Cassandra®- Released Apache Cassandra 4.0.19 and 5.0.6 into general availability on the NetApp Instaclustr Managed Platform, giving customers access to the latest stability, performance, and security improvements.
- Multi–data center Apache Cassandra clusters can now be provisioned across public and private networks via the Instaclustr Console, API, and Terraform provider, enabling customers to provision multi-DC clusters from day one.
- Single CA feature is now available for Apache Cassandra clusters. See Single CA for more details.
- GCP n4 machine types are now supported for Apache Cassandra in the available regions.
- Apache Kafka and Kafka Connect 4.1.1 are now generally available.
- Added Client Telemetry feature support for Kafka in private preview.
- Single CA feature is now available for Apache Kafka clusters. See Single CA for more details.
- ClickHouse 25.8.11 has been added to our managed platform in General Availability.
- Enabled ClickHouse
system.session_logtable that plays a key role in tracking session lifecycle and auditing user activities for enhanced session monitoring. This helps you with troubleshooting client-side connectivity issues and provides insights into failed connections.
- OpenSearch 2.19.4 and 3.3.2 have been released to general availability.
- Added support for the OpenSearch Assistant feature in OpenSearch Dashboards for clusters with Dashboards and AI Search enabled.
- PostgreSQL version 18.1 has now been released to general availability, alongside PostgreSQL version 17.7.
- PgBouncer version 1.25.0 has now been released to general availability.
- Added self-service Tags Management feature—allowing users to add, edit, or delete tags for their clusters directly through the Instaclustr console, APIs, or Terraform provider for RIYOA deployments
- Added new region Germany West Central for Azure
- Following the private preview release, Kafka’s Client Telemetry feature is progressing toward general availability soon. Read more here.
- We plan to extend the current ClickHouse integration with FSxN data sources by adding support for deployments across different VPCs, enabling broader enterprise lakehouse architectures.
- Apache Iceberg and Delta Lake integration are planned to soon be available for ClickHouse on the NetApp Instaclustr Platform, giving you a practical way to run analytics on open table formats while keeping control of your existing data platforms.
- We plan to soon introduce fully integrated AWS PrivateLink as a ClickHouse Add-On for secure and seamless connectivity with ClickHouse.
- We’re aiming to launch PostgreSQL integrated with FSx for NetApp ONTAP (FSxN) along with NVMe support into general availability soon. This enhancement is designed to combine enterprise-grade PostgreSQL with FSxN’s scalable, cost-efficient storage, enabling customers to optimize infrastructure costs while improving performance and flexibility. NVMe support is designed to deliver up to 20% greater throughput vs NFS.
- An AI Search plugin for OpenSearch is being released to GA (currently in public preview) to enhance search experiences using AI‑powered techniques such as semantic, hybrid, and conversational search, enabling more relevant, context‑aware results and unlocking new use cases including retrieval‑augmented generation (RAG) and AI‑driven chatbots.
- Following the public preview release, Zero Inbound Access is progressing to General Availability, designed to deliver the most secure management connectivity by eliminating inbound internet exposure and removing the need for any routable public IP addresses, including bastion or gateway instances.
- Explore how to freeze your streaming data for long-term
analytical queries in the future with our two-part blog series:
Freezing streaming data into Apache Iceberg
—Part 1: Using Apache Kafka®Connect Iceberg Sink Connector
introduces Apache Iceberg and demonstrates streaming Kafka data
using the Apache Kafka Connect Iceberg Sink Connector and
Freezing streaming data into Apache Iceberg
—Part 2: Using Iceberg Topics examines the experimental
approach of using Kafka Tiered Storage and Iceberg Topics, where
non‑active Kafka segments are copied to remote storage while
remaining transparently readable by Kafka clients. - Modern search applications go beyond simple keyword matching, requiring a deep understanding of user intent and context to deliver relevant, meaningful results. From keywords to concepts: How OpenSearch® AI search outperforms traditional search explores how semantic and hybrid search methods in OpenSearch AI search compare to traditional keyword search, and how you can use these capabilities for more relevant results.
- Generative AI and Large Language Models (LLMs) are booming, and they’ve put a spotlight on a crucial technology: vector search. Many applications today demand high throughput, low latency, and constant availability for retrieving information. Slow vector search can become a significant bottleneck, delaying responses and degrading the user experience. Our two-part series blogs Vector search benchmarking: Setting up embeddings, insertion, and retrieval with PostgreSQL® and Vector search benchmarking: Embeddings, insertion, and searching documents with ClickHouse® and Apache Cassandra® explore hands-on findings from our benchmarking projects, the role of databases in vector search, how to set up vector search for embeddings, insertion, and retrieval, and practical strategies for building faster, more efficient semantic search systems.
If you have any questions or need further assistance with these enhancements to the Instaclustr Managed Platform, please contact us.
SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available.
The post Instaclustr product update: March 2026 appeared first on Instaclustr.
Claude Code Marketplace Now Available
Claude Code has become an indispensable part of my daily workflow. I use it for everything from writing code to debugging production issues. But while Claude is incredibly capable out of the box, there are areas where injecting specialized domain knowledge makes it dramatically more useful.
That’s why I built a plugin marketplace. Yesterday I released rustyrazorblade/skills, a collection of Claude Code plugins that extend Claude with expert-level knowledge in specific domains. The first plugin is something I’ve been talking about doing for a while: a Cassandra expert.
Apache Cassandra® 5.0: Improving performance with Unified Compaction Strategy
IntroductionUnified Compaction Strategy (UCS), introduced in Apache Cassandra 5.0, is a versatile compaction framework that not only unifies the benefits of Size-Tiered (STCS) and Leveled (LCS) Compaction Strategies, but also introduces new capabilities like shard parallelism, density-aware SSTable organization, and safer incremental compaction, all of which deliver more predictable performance at scale. By utilizing a flexible scaling model, UCS allows operators to tune compaction behavior to match evolving workloads, spanning from write-heavy to read-heavy, without requiring disruptive strategy migrations in most cases.
In the past, operators had to choose between rigid strategies and accept significant trade-offs. UCS changes this paradigm, allowing the system to efficiently adapt to changing workloads with tuneable configurations that can be altered mid-flight and even applied differently across different compaction levels based on data density.
Why compaction mattersCompaction is the critical process that determines a cluster’s long-term health and cost-efficiency. When executed correctly, it produces denser nodes with highly organized SSTables, allowing each server to store more data without sacrificing speed. This efficiency translates to a smaller infrastructure footprint, which can lower cloud costs and resource usage.
Conversely, inefficient compaction is a primary driver of performance degradation. Poorly managed SSTables lead to fragmented data, forcing the system to work harder for every request. This overhead consumes excessive CPU and I/O, often forcing teams to try adding more nodes (horizontal scale) just to keep up with background maintenance noise.
Key concepts and terminologyTo understand how UCS optimizes a cluster, it is necessary to understand the fundamental trade-offs it balances:
- Read amplification: Occurs when the database must consult multiple SSTables to answer a single query. High read amplification acts as a “latency tax,” forcing extra I/O to reconcile data fragments.
- Write amplification: A metric that quantifies the overhead of background processes (such as compactions). It represents the ratio between total data written to disk and the amount of data originally sent by an application. High write amplification wears out SSDs and steals throughput.
- Space amplification: The ratio of disk space used to the actual size of the “live” data. It tracks data such as tombstones or overwritten rows that haven’t been purged yet.
- Fan factor: The “growth dial” for the cluster data hierarchy. It defines how many files of a similar size must accumulate before they are merged into a larger tier.
- Sharding: UCS splits data into smaller, independent token ranges (shards), allowing the system to run multiple compactions in parallel across CPU cores.
UCS provides baseline architectural improvements that were not available in older strategies:
Improved compaction parallelismOlder strategies often got stuck on a single thread during large merges. UCS sharding allows a server to use its full processing power. This significantly reduces the likelihood of compaction storms and keeps tail latencies (p99) predictable.
Reduced disk space amplificationBecause UCS operates on smaller shards, it doesn’t need to double the entire disk space of a node to perform a major merge. This greatly reduces the risk of nodes from running out of space during heavy maintenance cycles.
Density-based SSTable organizationUCS measures SSTables by density (token range coverage). This mitigates the huge SSTable problem where a single massive file becomes too large to compact, hindering read performance indefinitely.
Scaling parameterThe scaling parameter (denoted as W) is a configurable setting that determines the size ratio between compaction tiers. It helps balance write amplification and read performance by controlling how much data is rewritten during compaction operations. A lower scaling parameter value results in more frequent, smaller compactions, whereas a higher value leads to larger compaction groups.
The strategy engine: tuning and parametersUCS acts as a strategy engine by adjusting the scaling parameter (W), allowing UCS to mimic, or outperform, its predecessors STCS and LCS.
At a high level, the scaling parameter influences the effective fan-out behavior at each compaction level. Tiered-style settings such as T4 allow more SSTables to accumulate before merging, favoring write efficiency, while leveled-style settings such as L10 keep SSTables more tightly organized, reducing read amplification at the cost of additional background work.
The numbers below are illustrative and not prescriptive:
UCS configuration guide Workload type Strategy target Scaling (W) Primary benefit Heavy writes / IoT STCS (Tiered) Negative (e.g., -4) Lowest read amplification Heavy reads LCS (Leveled) Positive (e.g., 10) Lowest write amplification Balanced Hybrid Zero (0) Balanced performance for general apps Practical exampleUCS allows operators to mix behaviors across the data lifecycle.
'scaling_parameters': 'T4, T4, L10'
Note that scaling_parameters takes a string format that can accommodate parameters for per-level tuning.
This example instructs a cluster: “Use tiered compaction for the first two levels to keep up with the high write volume, but once data reaches the third level, reorganize it into a leveled structure so reads stay fast.”
Here’s a fuller, illustrative example of how one might structure their CQL to change the compaction strategy.
ALTER TABLE keyspace_name.table_name WITH compaction = { 'class': 'UnifiedCompactionStrategy', 'scaling_parameters': 'T4,T4,L10' };
Operational evolution: moving beyond major compactions
In older strategies and in Apache Cassandra versions prior to 5.0, operators often felt forced to run a major compaction to reclaim disk space or fix performance. This was a critical event that could impact a node’s I/O for extended periods of time and required substantial free disk space to complete.
Because UCS is density-aware and sharded, it effectively performs compactions constantly and granularly so major compactions are rarely needed. It identifies overlapping data within specific token ranges (shards) and cleans them up incrementally. Operators no longer must choose between a fragmented disk and a risky, resource-heavy manual compaction; UCS keeps data density more uniform across the cluster over time.
The migration advantage: “in-place” adoptionOne of the key performance features of a UCS migration is in-place adoption, meaning that when a table is switched to UCS, it does not immediately force a massive data rewrite. Instead, it looks at the existing SSTables, calculates their density, and maps them into its new sharding structure.
This allows for moving from STCS or LCS to UCS with significantly less I/O overhead than any other strategy change.
ConclusionUCS is an operational shift toward simplicity and predictability. By removing the need to choose between compaction trade-offs, UCS allows organizations to scale with confidence. Whether handling a massive influx of IoT data or serving high-speed user profiles, UCS helps clusters remain performant, cost-effective, and ready for the future.
On a newly deployed NetApp Instaclustr Apache Cassandra 5 cluster, UCS is already the default strategy (while Apache Cassandra 5.0 has STCS set as the default).
Ready to experience this new level of Cassandra performance for yourself? Try it with a free 30-day trial today!
The post Apache Cassandra® 5.0: Improving performance with Unified Compaction Strategy appeared first on Instaclustr.
Exploring the key features of Cassandra® 5.0
Apache Cassandra has become one of the most broadly adopted distributed databases for large-scale, highly available applications since its launch as an open source project in 2008. The 5.0 release in September 2024 represents the most substantial advancement to the project since 4.0 released in July 2021. Multiple customers (and our own internal Cassandra use case) have now been happily running on Cassandra 5 for up to 12 months so we thought the time was right to explore the key features they are leveraging to power their modern applications.
An overview of new features in Apache Cassandra 5.0Apache Cassandra 5.0 introduces core capabilities aimed at AI-driven systems, low-latency analytical workloads, and environments that blend operational and analytical processing.
Highlights include:
- The new vector data type and an Approximate Nearest Neighbor (ANN) index based on Hierarchical Navigable Small World (HNSW), which is integrated into the Storage-Attached Index (SAI) architecture
- Trie-based memtables and the Big Trie-Index (BTI) SSTable format, delivering better memory efficiency and more consistent write performance
- The Unified Compaction Strategy, a tunable density-based approach that can align with leveled or tiered compaction patterns.
Additional enhancements include expanded mathematical CQL functions, dynamic data masking, and experimental support for Java 17.
At NetApp, Apache Cassandra 5.0 is fully supported, and we are actively assisting customers as they transition from 4.x.
A deeper look at Cassandra 5.0’s key features Storage–Attached Indexes (SAI)Storage–Attached Indexes bring a modern, storage-integrated approach to secondary indexing in Apache Cassandra, resolving many of the scalability and maintenance challenges associated with earlier index implementations. Legacy Secondary Indexes (2i) and SASI remain available, but SAI offers a more robust and predictable indexing model for a broad range of production workloads.
SAI operates per-SSTable, allowing queries to be indexed locally versus the cluster-wide coordination required of other strategies. This model supports diverse CQL data types, enables efficient numeric and text range filters, and provides more consistent performance characteristics than 2i or SASI. The same storage-attached foundation is also used for Cassandra 5’s vector indexing mechanism, allowing ANN search to operate within the same storage and query framework.
SAI supports combining filters across multiple indexed columns and works seamlessly with token-aware routing to reduce unnecessary coordinator work. Public evaluations and community testing have shown faster index builds, more predictable read paths, and improved disk utilization compared with previous index formats.
Operationally, SAI functions as part of the storage engine itself: indexes are defined using standard CQL statements and are maintained automatically during flush and compaction, with no cluster-wide rebuilds required. This provides more flexible query options and can simplify application designs that previously relied on manual denormalization or external indexing systems.
Native Vector Search capabilitiesApache Cassandra 5.0 introduces native support for high-dimensional vector embeddings through the new vector data type. Embeddings represent semantic information in numerical form, enabling similarity search to be performed directly within the database. The vector type is integrated with the database’s storage-attached index architecture, which uses HNSW graphs to efficiently support ANN search across cosine, Euclidean, and dot-product similarity metrics.
With vector search implemented at the storage layer, applications involving semantic matching, content discovery, and retrieval-oriented workflows while maintaining the system’s established scalability and fault-tolerance characteristics are supported.
After upgrading to 5.0, existing schemas can add vector columns and store embeddings through standard write operations. For example:
UPDATE products SET embedding = [0.1, 0.2, 0.3, 0.4, 0.5] WHERE id = <id>;
To create a new table with a vector type column:
CREATE TABLE items ( product_id UUID PRIMARY KEY, embedding VECTOR<FLOAT, 768> // 768 denotes dimensionality );
Because vector indexes are attached to SSTables, they participate automatically in the compaction and repair processes and do not require an external indexing system. ANN queries can be combined with regular CQL filters, allowing similarity searches and metadata conditions to be evaluated within a unified distributed query workflow. This brings vector retrieval into Apache Cassandra’s native consistency, replication, and storage model.
Unified Compaction Strategy (UCS)Unified Compaction Strategy in Apache Cassandra 5 included a density-aware approach to organizing SSTables that blends the strengths of Leveled Compaction Strategy (LCS) and Size Tiered Compaction Strategy (STCS). UCS aims to provide the predictable read amplification associated with LCS and the write efficiency of STCS, without many of the workload-specific drawbacks that previously made compaction selection difficult. Choosing an unsuitable compaction strategy in earlier releases could lead to operational complexity and long-term performance issues, which UCS is designed to mitigate.
UCS exposes a set of tunable parameters like density thresholds and per-level scaling that let operators adjust compaction behavior toward read-heavy, write-heavy, or time-series patterns. This flexibility also helps smooth the transition from existing strategies, as UCS can adopt and improve the current SSTable layout without requiring a full rewrite in most cases. The introduction of compaction shards further increases parallelism and reduces the impact of large compactions on cluster performance.
Although LCS and STCS remain available (and while STCS remains the default strategy in 5.0, UCS is the default strategy on newly deployed NetApp Instaclustr’s managed Apache Cassandra 5 clusters), UCS supports a broader range of workloads, reduces the operational burden of compaction tuning, and aligns well with other storage engine improvements in Apache Cassandra 5 such as trie-based SSTables and Storage-Attached Indexes.
Trie Memtables and Trie-Indexed SSTablesTrie Memtables and Trie-indexed SSTables (Big Trie-Index, BTI) are significant storage engine enhancements released in Apache Cassandra 5. They are designed to reduce memory overhead, improve lookup performance, and increase flush efficiency. A trie data structure stores keys by shared prefixes instead of repeatedly storing full keys, which lowers object count and improves CPU cache locality compared with the legacy skip-list memtable structure. These benefits are particularly visible in high-ingestion, IoT, and time-series workloads.
Skip-list memtables store full keys for every entry, which can lead to large heap usage and increased garbage collection activity under heavy write loads. Trie Memtables substantially reduce this overhead by compacting key storage and avoiding pointer-heavy layouts. On disk, the BTI SSTable format replaces the older BIG index with a trie-based partition index that removes redundant key material and reduces the number of key comparisons needed during partition lookups.
Using Trie memtables requires enabling both the trie-based memtable implementation and the BTI SSTable format. Existing BIG SSTables are converted to BTI through normal compaction or by rebuilding data. On NetApp Instaclustr’s managed Apache Cassandra clusters Trie Memtables and BTI are enabled by default, but when upgrading major versions to 5.0, data must be converted from BIG to BTI first to utilize Trie structures.
Other new features Mathematical CQL functionsApache Cassandra 5.0 added a rich set of math functions allowing developers to perform computations directly within queries. This reduces data transfer overhead and reduces client-side post-processing, among many other benefits. From fundamental functions like ABS(), ROUND(), or SQRT() to more complex operations like SIN(), COS(), TAN(), these math functions are extensible to a multitude of domains from financial data, scientific measurements or spatial data.
Dynamic Data MaskingDynamic Data Masking (DDM) is a new feature to obscure sensitive
column-level data at query time or permanently attach the
functionality to a column so that the data always returns
obfuscated. Stored data values are not altered in this process, and
administrators can control access through role-based access control
(RBAC) to ensure only those with access can see the data while also
tuning the visibility of the obscured data. This feature helps with
adherence to data privacy regulations such as GDPR, HIPAA, and PCI
DSS without needing external redaction systems.
Apache Cassandra 5.0 packs a punch with game changing features that meet the needs of modern workloads and applications. Features like vector search capabilities and Storage Attached Indexes stand out as they will inevitably shape how data can be leveraged within the same database while maintaining speed, scale, and resilience.
When you deploy a managed cluster on NetApp Instaclustr’s Managed Platform, you get the benefits of all these amazing features without worrying about configuration and maintenance.
Ready to experience the power of Apache Cassandra 5.0 for yourself? Try it free for 30 days today!
The post Exploring the key features of Cassandra® 5.0 appeared first on Instaclustr.
Instaclustr product update: December 2025
Instaclustr product update: December 2025Here’s a roundup of the latest features and updates that we’ve recently released.
If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.
Major announcements OpenSearch®AI Search for OpenSearch®: Unlocking next-generation search
AI Search for OpenSearch, which is now available in Public Preview on the NetApp Instaclustr Managed Platform, is designed to bring semantic, hybrid, and multimodal search capabilities to OpenSearch deployments—turning them into an end-to-end AI-powered search solution within minutes. With built-in ML models, vector indexing, and streamlined ingestion pipelines, next-generation search can be enabled in minutes without adding operational complexity. This feature powers smarter, more relevant discovery experiences backed by AI—securely deployed across any cloud or on-premises environment.
ClickHouse®
FSx for NetApp ONTAP and Managed ClickHouse® integration is now
available
We’re excited to announce that NetApp has introduced seamless
integration between Amazon FSx for NetApp ONTAP and Instaclustr
Managed ClickHouse, to enable customers to build a truly hybrid
lakehouse architecture on AWS. This integration is designed to
deliver lightning-fast analytics without the need for complex data
movement, while leveraging FSx for ONTAP’s unified file and object
storage, tiered performance, and cost optimization. Customers can
now run zero-copy lakehouse analytics with ClickHouse directly on
FSx for ONTAP data—to simplify operations, accelerate
time-to-insight, and reduce total cost of ownership.
Instaclustr for PostgreSQL® on Amazon FSx for ONTAP: A new
era
We’re excited to announce the public preview of Instaclustr Managed
PostgreSQL integrated with Amazon FSx for NetApp ONTAP—combining
enterprise-grade storage with world-class open source database
management. This integration is designed to deliver higher IOPS,
lower latency, and advanced data management without increasing
instance size or adding costly hardware. Customers can now run
PostgreSQL clusters backed by FSx for ONTAP storage, leveraging
on-disk compression for cost savings and paving the way for
ONTAP-powered features, such as instant snapshot backups, instant
restores, and fast forking. These ONTAP-enabled features are
planned to unlock huge operational benefits and will be launched
with our GA release.
- Released Apache Cassandra 5.0.5 into general availability on the NetApp Instaclustr Managed Platform.
- Transitioned Apache Cassandra v4.1.8 to CLOSED lifecycle state; scheduled to reach End of Life (EOL) on December 20, 2025.
- Kafka on Azure now supports v5 generation nodes, available in General Availability.
- Instaclustr Managed Apache ZooKeeper has moved from General Availability to closed status.
- Kafka and Kafka Connect 3.1.2 and 3.5.1 are retired; 3.6.2, 3.7.1, 3.8.1 are in legacy support. Next set of lifecycle state changes for Kafka and Kafka Connect in end March 2026 will see all supported versions 3.8.1 and below marked End of Life.
- Karapace Rest Proxy and Schema Registry 3.15.0 are closed. Customers are advised to move to version 5.x.
- Kafka Rest Proxy 5.0.0 and Kafka Schema Registry 5.0.0, 5.0.4 have been moved to end of life. Affected customers have been contacted by Support to schedule a migration to a supported version as soon as possible.
- ClickHouse 25.3.6 has been added to our managed platform in General Availability.
- Kafka Table Engine integration with ClickHouse has added support to enable real-time data ingestion, streamline streaming analytics, and accelerate insights.
- New ClickHouse node sizes, powered by AWS m7g, r7i, and r7g instances, are now in Limited Availability for cluster creation.
- Cadence is now available to be provisioned with Cassandra 5.x, designed to deliver improved performance, enhanced scalability, and stronger security for mission-critical workflows.
- OpenSearch 2.19.3 and 3.2.0 have been released to General Availability.
- PostgreSQL AWS PrivateLink support has been added, enabling connectivity between VPCs using AWS PrivateLink.
- PostgreSQL version 18.0 has now been released to General Availability, alongside PostgreSQL version 16.10, 17.6.
- Added new PostgreSQL metrics for connect states and wait event types.
- PostgreSQL Load Balancer add-on is now available, providing a unified endpoint for cluster access, simplifying failover handling, and ensuring node health through regular checks.
- We’re working on enabling multi-datacenter (multi-DC) cluster provisioning via API and console, designed to make it easier to deploy clusters across regions with secure networking and reduced manual steps.
- We’re working on adding Kafka Tiered Storage for clusters running in GCP— designed to bring affordable, scalable retention, and instant access to historical data, to ensure flexibility and performance across clouds for enterprise Kafka users.
- We’re planning to extend our Managed ClickHouse to allow it to work with on-prem deployments.
- Following the success of our public preview, we’re preparing to launch PostgreSQL integrated with FSx for NetApp ONTAP (FSxN) into General Availability. This enhancement is designed to combine enterprise-grade PostgreSQL with FSxN’s scalable, cost-efficient storage, enabling customers to optimize infrastructure costs while improving performance and flexibility.
- As part of our ongoing advancements in AI for OpenSearch, we are planning to enable adding GPU nodes into OpenSearch clusters, aiming to enhance the performance and efficiency of machine learning and AI workloads.
- Self-service Tags Management feature—allowing users to add, edit, or delete tags for their clusters directly through the Instaclustr console, APIs, or Terraform provider for RIYOA deployments.
- Cadence Workflow, the open source orchestration engine created by Uber, has officially joined the Cloud Native Computing Foundation (CNCF) as a Sandbox project. This milestone ensures transparent governance, community-driven innovation, and a sustainable future for one of the most trusted workflow technologies in modern microservices and agentic AI architectures. Uber donates Cadence Workflow to CNCF: The next big leap for the open source project—read the full story and discover what’s next for Cadence.
- Upgrading ClickHouse® isn’t just about new features—it’s essential for security, performance, and long-term stability. In ClickHouse upgrade: Why staying updated matters, you’ll learn why skipping upgrades can lead to technical debt, missed optimizations, and security risks. Then, explore A guide to ClickHouse® upgrades and best practices for practical strategies, including when to choose LTS releases for mission-critical workloads and when stable releases make sense for fast-moving environments.
- Our latest blog, AI Search for OpenSearch®: Unlocking next-generation search, explains how this new solution enables smarter discovery experiences using built-in ML models, vector embeddings, and advanced search techniques—all fully managed on the NetApp Instaclustr Platform. Ready to explore the future of search? Read the full article and see how AI can transform your OpenSearch deployments.
If you have any questions or need further assistance with these enhancements to the Instaclustr Managed Platform, please contact us.
SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available.
The post Instaclustr product update: December 2025 appeared first on Instaclustr.
Freezing streaming data into Apache Iceberg™—Part 1: Using Apache Kafka®Connect Iceberg Sink Connector
IntroductionEver since the first distributed system—i.e. 2 or more computers networked together (in 1969)—there has been the problem of distributed data consistency: How can you ensure that data from one computer is available and consistent with the second (and more) computers? This problem can be uni-directional (one computer is considered the source of truth, others are just copies), or bi-directional (data must be synchronized in both directions across multiple computers).
Some approaches to this problem I’ve come across in the last 8 years include Kafka Connect (for elegantly solving the heterogeneous many-to-many integration problem by streaming data from source systems to Kafka and from Kafka to sink systems, some earlier blogs on Apache Camel Kafka Connectors and a blog series on zero-code data pipelines), MirrorMaker2 (MM2, for replicating Kafka clusters, a 2 part blog series), and Debezium (Change Data Capture/CDC, for capturing changes from databases as streams and making them available in downstream systems, e.g. for Apache Cassandra and PostgreSQL)—MM2 and Debezium are actually both built on Kafka Connect.
Recently, some “sink” systems have been taking over responsibility for streaming data from Kafka into themselves, e.g. OpenSearch pull-based ingestion (c.f. OpenSearch Sink Connector), and the ClickHouse Kafka Table Engine (c.f. ClickHouse Sink Connector). These “pull-based” approaches are potentially easier to configure and don’t require running a separate Kafka Connect cluster and sink connectors, but some downsides may be that they are not as reliable or independently scalable, and you will need to carefully monitor and scale them to ensure they perform adequately.
And then there’s “zero-copy” approaches—these rely on the well-known computer science trick of sharing a single copy of data using references (or pointers), rather than duplicating the data. This idea has been around for almost as long as computers, and is still widely applicable, as we’ll see in part 2 of the blog.
The distributed data use case we’re going to explore in this 2-part blog series is streaming Apache Kafka data into Apache Iceberg, or “Freezing streaming Apache Kafka data into an (Apache) Iceberg”! In part 1 we’ll introduce Apache Iceberg and look at the first approach for “freezing” streaming data using the Kafka Connect Iceberg Sink Connector.
What is Apache Iceberg?Apache Iceberg is an open source specification open table format optimized for column-oriented workloads, supporting huge analytic datasets. It supports multiple different concurrent engines that can insert and query table data using SQL—and Iceberg is organized like, well, an iceberg!
The tip of the Iceberg is the Catalog. An Iceberg Catalog acts as a central metadata repository, tracking the current state of Iceberg tables, including their names, schemas, and metadata file locations. It serves as the “single source of truth” for a data Lakehouse, enabling query engines to find the correct metadata file for a table to ensure consistent and atomic read/write operations.
Just under the water, the next layer is the metadata layer. The Iceberg metadata layer tracks the structure and content of data tables in a data lake, enabling features like efficient query planning, versioning, and schema evolution. It does this by maintaining a layered structure of metadata files, manifest lists, and manifest files that store information about table schemas, partitions, and data files, allowing query engines to prune unnecessary files and perform operations atomically.
The data layer is at the bottom. The Iceberg data layer is the storage component where the actual data files are stored. It supports different storage backends, including cloud-based object storage like Amazon S3 or Google Cloud Storage, or HDFS. It uses file formats like Parquet or Avro. Its main purpose is to work in conjunction with Iceberg’s metadata layer to manage table snapshots and provide a more reliable and performant table format for data lakes, bringing data warehouse features to large datasets.

As shown in the above diagram, Iceberg supports multiple different engines, including Apache Spark and ClickHouse. Engines provide the “database” features you would expect, including:
- Data Management
- ACID Transactions
- Query Planning and Optimization
- Schema Evolution
- And more!
I’ve recently been reading an excellent book on Apache Iceberg (“Apache Iceberg: The Definitive Guide”), which explains the philosophy, architecture and design, including operation, of Iceberg. For example, it says that it’s best practice to treat data lake storage as immutable—data should only be added to a Data Lake, not deleted. So, in theory at least, writing infinite, immutable Kafka streams to Iceberg should be straightforward!
But because it’s a complex distributed system (which looks like a database from above water but is really a bunch of files below water!), there is some operational complexity. For example, it handles change and consistency by creating new snapshots for every modification, enabling time travel, isolating readers from writes, and supporting optimistic concurrency control for multiple writers. But you need to manage snapshots (e.g. expiring old snapshots). And chapter 4 (performance optimisation) explains that you may need to worry about compaction (reducing too many small files), partitioning approaches (which can impact read performance), and handling row-level updates. The first two issues may be relevant for Kafka, but probably not the last one. So, it looks like it’s good fit for the streaming Kafka use cases, but we may need to watch out for Iceberg management issues.
“Freezing” streaming data with the Kafka Iceberg Sink ConnectorBut Apache Iceberg is “frozen”—what’s the connection to fast-moving streaming data? You certainly don’t want to collide with an iceberg from your speedy streaming “ship”—but you may want to freeze your streaming data for long-term analytical queries in the future. How can you do that without sinking? Actually, a “sink” is the first answer: A Kafka Connect Iceberg Sink Connector is the most common way of “freezing” your streaming data in Iceberg!
Kafka Connect is the standard framework provided by Apache Kafka to move data from multiple heterogeneous source systems to multiple heterogeneous sink systems, using:
- A Kafka cluster
- A Kafka Connect cluster (running connectors)
- Kafka Connect source connectors
- Kafka topics and
- Kafka Connect Sink Connectors
That is, a highly decoupled approach. It provides real-time data movement with high scalability, reliability, error handling and simple transformations.
Here’s the Kafka Connect Iceberg Sink Connector official documentation.
It appears to be reasonably complicated to configure this sink connector; you will need to know something about Iceberg. For example, what is a “control topic”? It’s apparently used to coordinate commits for exactly-once semantics (EOS).
The connector supports fan-out (writing to multiple Iceberg tables from one topic), fan-in (writing to one Iceberg table from multiple topics), static and dynamic routing, and filtering.
In common with many technologies that you may want to use as Kafka Connect sinks, they may not all have good support for Kafka metadata. The KafkaMetadata Transform (which injects topic, partition, offset and timestamp properties) is only experimental at present.
How are Iceberg tables created with the correct metadata? If you have JSON record values, then schemas are inferred by default (but may not be correct or optimal). Alternatively, explicit schemas can be included in-line or referenced from a Kafka Schema Registry (e.g. Karapace), and, as an added bonus, schema evolution is supported. Also note that Iceberg tables may have to be manually created prior to use if your Catalog doesn’t support table auto-creation.
From what I understood about Iceberg, to use it (e.g. for writes), you need support from an engine (e.g. to add raw data to the Iceberg warehouse, create the metadata files, and update the catalog). How does this work for Kafka Connect? From this blog I discovered that the Kafka Connect Iceberg Sink connector is functioning as an Iceberg engine for writes, so there really is an engine, but it’s built into the connector.
As is the case with all Kafka Connect Sink Connectors, records are available immediately they are written to Kafka topics by Kafka producers and Kafka Connect Source Connectors, i.e. records in active segments can be copied immediately to sink systems. But is the Iceberg Sink Connector real-time? Not really! The default time to write to Iceberg is every 5 minutes (iceberg.control.commit.interval-ms) to prevent multiplication of small files—something that Iceberg(s) doesn’t/don’t like (“melting”?). In practice, it’s because every data file must be tracked in the metadata layer, which impacts performance in many ways—proliferation of small files is typically addressed by optimization and compaction (e.g. Apache Spark supports Iceberg management, including these operations).
So, unlike most Kafka Connect sink connectors, which write as quickly as possible, there will be lag before records appear in Iceberg tables (“time to freeze” perhaps)!
The systems are separate (Kafka and Iceberg are independent), records are copied to Iceberg, and that’s it! This is a clean separation of concerns and ownership. Kafka owns the source data (with Kafka controlling data lifecycles, including record expiry), Kafka Connect Iceberg Sink Connector performs the reading from Kafka and writing to Iceberg, and is independently scalable to Kafka. Kafka doesn’t handle any of the Iceberg management. Once the data has landed in Iceberg, Kafka has no further visibility or interest in it. And the pipeline is purely one way, write only – reads or deletes are not supported.
Here’s a summary of this approach to freezing streams:
- Kafka Connect Iceberg Sink Connector shares all the benefits of the Kafka Connect framework, including scalability, reliability, error handling, routing, and transformations.
- At least, JSON values are required, ideally full schemas and referenced in Karapace—but not all schemas are guaranteed to work.
- Kafka Connect doesn’t “manage” Iceberg (e.g. automatically aggregate small files, remove snapshots, etc.)
- You may have to tune the commit interval – 5 minutes is the default.
- But it does have a built-in engine that supports writing to Iceberg.
- You may need to use an external tool (e.g. Apache Spark) for Iceberg management procedures.
- It’s write-only to Iceberg. Reads or deletes are not supported

But what’s the best thing about the Kafka Connect Iceberg Sink Connector? It’s available now (as part of the Apache Iceberg build) and works on the NetApp Instaclustr Kafka Connect platform as a “bring your own connector” (instructions here).
In part 2, we’ll look at Kafka Tiered Storage and Iceberg Topics!
The post Freezing streaming data into Apache Iceberg™—Part 1: Using Apache Kafka®Connect Iceberg Sink Connector appeared first on Instaclustr.
Stay ahead with Apache Cassandra®: 2025 CEP highlights
Apache Cassandra® committers are working hard, building new features to help you more seamlessly ease operational challenges of a distributed database. Let’s dive into some recently approved CEPs and explain how these upcoming features will improve your workflow and efficiency.
What is a CEP?CEP stands for Cassandra Enhancement Proposal. They are the process for outlining, discussing, and gathering endorsements for a new feature in Cassandra. They’re more than a feature request; those who put forth a CEP have intent to build the feature, and the proposal encourages a high amount of collaboration with the Cassandra contributors.
The CEPs discussed here were recently approved for implementation or have had significant progress in their implementation. As with all open-source development, inclusion in a future release is contingent upon successful implementation, community consensus, testing, and approval by project committers.
CEP-42: Constraints frameworkWith collaboration from NetApp Instaclustr, CEP-42, and subsequent iterations, delivers schema level constraints giving Cassandra users and operators more control over their data. Adding constraints on the schema level means that data can be validated at write time and send the appropriate error when data is invalid.
Constraints are defined in-line or as a separate definition. The inline style allows for only one constraint while a definition allows users to define multiple constraints with different expressions.
The scope of this CEP-42 initially supported a few constraints that covered the majority of cases, but in follow up efforts the expanded list of support includes scalar (>, <, >=, <=), LENGTH(), OCTET_LENGTH(), NOT NULL, JSON(), REGEX(). A user is also able to define their own constraints if they implement it and put them on Cassandra’s class path.
A simple example of an in-line constraint:
CREATE TABLE users (
username text PRIMARY KEY,
age int CHECK age >= 0 and age < 120
);
Constraints are not supported for UDTs (User-Defined Types) nor collections (except for using NOT NULL for frozen collections).
Enabling constraints closer to the data is a subtle but mighty way for operators to ensure that data goes into the database correctly. By defining rules just once, application code is simplified, more robust, and prevents validation from being bypassed. Those who have worked with MySQL, Postgres, or MongoDB will enjoy this addition to Cassandra.
CEP-51: Support “Include” Semantics for cassandra.yamlThe cassandra.yaml file holds important settings for storage,
memory, replication, compaction, and more. It’s no surprise that
the average size of the file around 1,000 lines (though, yes—most
are comments). CEP-51 enables splitting the cassandra.yaml
configuration into multiple files using includes
semantics. From the outside, this feels like a small change, but
the implications are huge if a user chooses to opt-in.
In general, the size of the configuration file makes it difficult to manage and coordinate changes. It’s often the case that multiple teams manage various aspects of the single file. In addition, cassandra.yaml permissions are readable for those with access to this file, meaning private information like credentials are comingled with all other settings. There is risk from an operational and security standpoint.
Enabling the new semantics and therefore modularity for the configuration file eases management, deployment, complexity around environment-specific settings, and security in one shot. The configuration file follows the principle of least privilege once the cassandra.yaml is broken up into smaller, well-defined files; sensitive configuration settings are separated out from general settings with fine-grained access for the individual files. With the feature enabled, different development teams are better equipped to deploy safely and independently.
If you’ve deployed your Cassandra cluster on the NetApp Instaclustr platform, the cassandra.yaml file is already configured and managed for you. We pride ourselves on making it easy for you to get up and running fast.
CEP-52: Schema annotations for Apache CassandraWith extensive review by the NetApp Instaclustr team and Stefan Miklosovic, CEP-52 introduces schema annotations in CQL allowing in-line comments and labels of schema elements such as keyspaces, tables, columns, and User Defined Types (UDT). Users can easily define and alter comments and labels on these elements. They can be copied over when desired using CREATE TABLE LIKE syntax. Comments are stored as plain text while labels are stored as structured metadata.
Comments and labels serve different annotation purposes: Comments document what a schema object is for, whereas labels describe how sensitive or controlled it is meant to be. For example, labels can be used to identify columns as “PII” or “confidential”, while the comment on that column explains usage, e.g. “Last login timestamp.”
Users can query these annotations. CEP-52 defines two new read-only tables (system_views.schema_comments and system_views.schema_security_labels) to store comments and security labels so objects with comments can be returned as a list or a user/machine process can query for specific labels, beneficial for auditing and classification. Note that adding security labels are descriptive metadata and do not enforce access control to the data.
CEP-53: Cassandra rolling restarts via SidecarSidecar is an auxiliary component in the Cassandra ecosystem that exposes cluster management and streaming capabilities through APIs. Introducing rolling restarts through Sidecar, this feature is designed to provide operators with more efficient and safer restarts without cluster-wide downtime. More specifically, operators can monitor, pause, resume, and abort restarts all through an API with configurable options if restarts fail.
Rolling restarts brings operators a step closer to cluster-wide operations and lifecycle management via Sidecar. Operators will be able to configure the number of nodes to restart concurrently with minimal risk as this CEP unleashes clear states as a node progresses through a restart. Accounting for a variety of edge cases, an operator can feel assured that, for example, a non-functioning sidecar won’t derail operations.
The current process for restarting a node is a multi-step, manual process, which does not scale for large cluster sizes (and is also tedious for small clusters). Restarting clusters previously lacked a streamlined approach as each node needed to be restarted one at a time, making the process time-intensive and error-prone.
Though Sidecar is still considered WIP, it’s got big plans to improve operating large clusters!
The NetApp Instaclustr Platform, in conjunction with our expert TechOps team already orchestrates these laborious tasks for our Cassandra customers with a high level of care to ensure their cluster stays online. Restarting or upgrading your Cassandra nodes is a huge pain-point for operators, but it doesn’t have to be when using our managed platform (with round-the-clock support!)
CEP-54: Zstd with dictionary SSTable compressionCEP-54, with NetApp Instaclustr’s collaboration, aims to add support Zstd with dictionary compression for SSTables. Zstd, or Zstandard, is a fast, lossless data compression algorithm that boasts impressive ratio and speed and has been supported in Cassandra since 4.0. Certain workloads can benefit from significantly faster read/write performance, reduced storage footprint, and increased storage device lifetime when using dictionary compression.
At a high level, operators choose a table they want to compress with a dictionary. A dictionary must be trained first on a small amount of already present data (recommended no more than 10MiB). The result of a training is a dictionary, which is stored cluster-wide for all other nodes to use, and this dictionary is used for all subsequent writes of SSTables to a disk.
Workloads with structured data of similar rows benefit most from Zstd with dictionary compression. Some examples of ideal workloads include event logs, telemetry data, metadata tables with templated messages. Think: repeated row data. If the table data is too unstructured or random, this feature likely won’t be optimal for dictionary compression, however plain Zstd will still be an excellent option.
New SSTables with dictionaries are readable across nodes and can stream, repair, and backup. Existing tables are unaffected if dictionary compression is not enabled. Too many unique dictionaries hurt decompression; use minimal dictionaries (recommended dictionary size is about 100KiB and one dictionary per table) and only adopt new ones when they’re noticeably better.
CEP-55: Generated role namesCEP-55 adds support to create users/roles without supplying a name, simplifying
user management, especially when generating users and roles in bulk. With an example syntax, CREATE GENERATED ROLE WITH GENERATED PASSWORD, new keys are placed in a newly introduced configuration section in cassandra.yaml under “role_name_policy.”
Stefan Miklosovic, our Cassandra engineer at NetApp Instaclustr, created this CEP as a logical follow up to CEP-24 (password validation/generation), which he authored as well. These quality-of-life improvements let operators spend less time doing trivial tasks with high-risk potential and more time on truly complex matters.
Manual name selection seems trivial until a hundred role names need to be generated; now there is a security risk if the new usernames—or worse passwords—are easily guessable. With CEP-55, the generated role name will be UUID-like, with optional prefix/suffix and size hints, however a pluggable policy is available to generate and validate names as well. This is an opt-in feature with no effect to the existing method of generating role names.
The future of Apache Cassandra is brightThese Cassandra Enhancement Proposals demonstrate a strong commitment to making Apache Cassandra more powerful, secure, and easier to operate. By staying on top of these updates, we ensure our managed platform seamlessly supports future releases that accelerate your business needs.
At NetApp Instaclustr, our expert TechOps team already orchestrates laborious tasks like restarts and upgrades for our Apache Cassandra customers, ensuring their clusters stay online. Our platform handles the complexity so you can get up and running fast.
Learn more about our fully managed and hosted Apache Cassandra offering and try it for free today!
The post Stay ahead with Apache Cassandra®: 2025 CEP highlights appeared first on Instaclustr.
Optimizing Cassandra Repair for Higher Node Density
This is the fourth post in my series on improving the cost efficiency of Apache Cassandra through increased node density. In the last post, we explored compaction strategies, specifically the new UnifiedCompactionStrategy (UCS) which appeared in Cassandra 5.
- Streaming Throughput
- Compaction Throughput and Strategies
- Repair (you are here)
- Query Throughput
- Garbage Collection and Memory Management
- Efficient Disk Access
- Compression Performance and Ratio
- Linearly Scaling Subsystems with CPU Core Count and Memory
Now, we’ll tackle another aspect of Cassandra operations that directly impacts how much data you can efficiently store per node: repair. Having worked with repairs across hundreds of clusters since 2012, I’ve developed strong opinions on what works and what doesn’t when you’re pushing the limits of node density.
Building easy-cass-mcp: An MCP Server for Cassandra Operations
I’ve started working on a new project that I’d like to share, easy-cass-mcp, an MCP (Model Context Protocol) server specifically designed to assist Apache Cassandra operators.
After spending over a decade optimizing Cassandra clusters in production environments, I’ve seen teams consistently struggle with how to interpret system metrics, configuration settings, schema design, and system configuration, and most importantly, how to understand how they all impact each other. While many teams have solid monitoring through JMX-based collectors, extracting and contextualizing specific operational metrics for troubleshooting or optimization can still be cumbersome. The good news is that we now have the infrastructure to make all this operational knowledge accessible through conversational AI.
easy-cass-stress Joins the Apache Cassandra Project
I’m taking a quick break from my series on Cassandra node density to share some news with the Cassandra community: easy-cass-stress has officially been donated to the Apache Software Foundation and is now part of the Apache Cassandra project ecosystem as cassandra-easy-stress.
Why This Matters
Over the past decade, I’ve worked with countless teams struggling with Cassandra performance testing and benchmarking. The reality is that stress testing distributed systems requires tools that can accurately simulate real-world workloads. Many tools make this difficult by requiring the end user to learn complex configurations and nuance. While consulting at The Last Pickle, I set out to create an easy to use tool that lets people get up and running in just a few minutes
Compaction Strategies, Performance, and Their Impact on Cassandra Node Density
This is the third post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. In the first post, I examined how streaming operations impact node density and laid out the groundwork for understanding why higher node density leads to significant cost savings. In the second post, I discussed how compaction throughput is critical to node density and introduced the optimizations we implemented in CASSANDRA-15452 to improve throughput on disaggregated storage like EBS.
Cassandra Compaction Throughput Performance Explained
This is the second post in my series on improving node density and lowering costs with Apache Cassandra. In the previous post, I examined how streaming performance impacts node density and operational costs. In this post, I’ll focus on compaction throughput, and a recent optimization in Cassandra 5.0.4 that significantly improves it, CASSANDRA-15452.
This post assumes some familiarity with Apache Cassandra storage engine fundamentals. The documentation has a nice section covering the storage engine if you’d like to brush up before reading this post.
How Cassandra Streaming, Performance, Node Density, and Cost are All related
This is the first post of several I have planned on optimizing Apache Cassandra for maximum cost efficiency. I’ve spent over a decade working with Cassandra and have spent tens of thousands of hours data modeling, fixing issues, writing tools for it, and analyzing it’s performance. I’ve always been fascinated by database performance tuning, even before Cassandra.
A decade ago I filed one of my first issues with the project, where I laid out my target goal of 20TB of data per node. This wasn’t possible for most workloads at the time, but I’ve kept this target in my sights.
Cassandra 5 Released! What's New and How to Try it
Apache Cassandra 5.0 has officially landed! This highly anticipated release brings a range of new features and performance improvements to one of the most popular NoSQL databases in the world. Having recently hosted a webinar covering the major features of Cassandra 5.0, I’m excited to give a brief overview of the key updates and show you how to easily get hands-on with the latest release using easy-cass-lab.
You can grab the latest release on the Cassandra download page.
easy-cass-lab v5 released
I’ve got some fun news to start the week off for users of easy-cass-lab: I’ve just released version 5. There are a number of nice improvements and bug fixes in here that should make it more enjoyable, more useful, and lay groundwork for some future enhancements.
- When the cluster starts, we wait for the storage service to
reach NORMAL state, then move to the next node. This is in contrast
to the previous behavior where we waited for 2 minutes after
starting a node. This queries JMX directly using Swiss Java Knife
and is more reliable than the 2-minute method. Please see
packer/bin-cassandra/wait-for-up-normalto read through the implementation. - Trunk now works correctly. Unfortunately, AxonOps doesn’t support trunk (5.1) yet, and using the agent was causing a startup error. You can test trunk out, but for now the AxonOps integration is disabled.
- Added a new repl mode. This saves keystrokes and provides some
auto-complete functionality and keeps SSH connections open. If
you’re going to do a lot of work with ECL this will help you be a
little more efficient. You can try this out with
ecl repl. - Power user feature: Initial support for profiles in AWS regions
other than
us-west-2. We only provide AMIs forus-west-2, but you can now set up a profile in an alternate region, and build the required AMIs usingeasy-cass-lab build-image. This feature is still under development and requires using aneasy-cass-labbuild from source. Credit to Jordan West for contributing this work. - Power user feature: Support for multiple profiles. Setting the
EASY_CASS_LAB_PROFILEenvironment variable allows you to configure alternate profiles. This is handy if you want to use multiple regions or have multiple organizations. - The project now uses Kotlin instead of Groovy for Gradle configuration.
- Updated Gradle to 8.9.
- When using the list command, don’t show the alias “current”.
- Project cleanup, remove old unused pssh, cassandra build, and async profiler subprojects.
The release has been released to the project’s GitHub page and to homebrew. The project is largely driven by my own consulting needs and for my training. If you’re looking to have some features prioritized please reach out, and we can discuss a consulting engagement.
easy-cass-lab updated with Cassandra 5.0 RC-1 Support
I’m excited to announce that the latest version of easy-cass-lab now supports Cassandra 5.0 RC-1, which was just made available last week! This update marks a significant milestone, providing users with the ability to test and experiment with the newest Cassandra 5.0 features in a simplified manner. This post will walk you through how to set up a cluster, SSH in, and run your first stress test.
For those new to easy-cass-lab, it’s a tool designed to streamline the setup and management of Cassandra clusters in AWS, making it accessible for both new and experienced users. Whether you’re running tests, developing new features, or just exploring Cassandra, easy-cass-lab is your go-to tool.
easy-cass-lab now available in Homebrew
I’m happy to share some exciting news for all Cassandra enthusiasts! My open source project, easy-cass-lab, is now installable via a homebrew tap. This powerful tool is designed to make testing any major version of Cassandra (or even builds that haven’t been released yet) a breeze, using AWS. A big thank-you to Jordan West who took the time to make this happen!
What is easy-cass-lab?
easy-cass-lab is a versatile testing tool for Apache Cassandra. Whether you’re dealing with the latest stable releases or experimenting with unreleased builds, easy-cass-lab provides a seamless way to test and validate your applications. With easy-cass-lab, you can ensure compatibility and performance across different Cassandra versions, making it an essential tool for developers and system administrators. easy-cass-lab is used extensively for my consulting engagements, my training program, and to evaluate performance patches destined for open source Cassandra. Here are a few examples: