ScyllaDB Operator 1.19.0 Release: Multi-Tenant Monitoring with Prometheus and OpenShift Support
Multi-tenant monitoring with Prometheus/OpenShift, improved sysctl config, and a new opt-in synchronization for safer topology changes
The ScyllaDB team is pleased to announce the release of ScyllaDB Operator 1.19.0. ScyllaDB Operator is an open-source project that helps you run ScyllaDB on Kubernetes. It manages ScyllaDB clusters deployed to Kubernetes and automates tasks related to operating a ScyllaDB cluster, like installation, vertical and horizontal scaling, as well as rolling upgrades. The latest release introduces the “External mode,” which enables multi-tenant monitoring with Prometheus and OpenShift support. It also adds a new guardrail in the must-gather debugging tool preventing accidental inclusion of sensitive information, optimizes kernel parameter (sysctl) configuration, and introduces an opt-in synchronization feature for safer topology changes – plus several other updates.
Multi-tenant monitoring with Prometheus and OpenShift support
ScyllaDB Operator monitoring uses Prometheus (an industry-standard cloud-native monitoring system) for metric collection and aggregation. Up until now, you had to run a fresh, clean instance of Prometheus for every ScyllaDB cluster. We coined the term “Managed mode” for this architecture (because, in that case, ScyllaDB Operator would manage the Prometheus deployment):
ScyllaDB Operator 1.19 introduces the
“External mode” – an option to connect (one or
more) ScyllaDB clusters with a shared Prometheus deployment that
may already be present in your production environment:
The External mode provides a very important
capability for users who run ScyllaDB on Red Hat OpenShift. The
User Workload Monitoring (UWM) capability of OpenShift becomes
available as a backend for ScyllaDB Monitoring:
Under the hood, ScyllaDB Operator 1.19 implements the new
monitoring architectures by extending the
ScyllaDBMonitoring CRD with a new field
.spec.components.prometheus.mode that can now be set
to Managed or External.
Managed is the preexisting behavior (to deploy a
clean Prometheus instance), while
External deploys just the Grafana dashboard using
your existing Prometheus as a data source instead, and puts
ServiceMonitors and PrometheusRules
in place to get all the ScyllaDB metrics there. See the new
ScyllaDB Monitoring overview and
Setting up ScyllaDB Monitoring documents to learn more about
the new mode and how to set up ScyllaDB Monitoring
with an existing Prometheus instance. The
Setting up ScyllaDB Monitoring on OpenShift guide offers
guidance on how to set up
User Workload Monitoring (UWM) for ScyllaDB in OpenShift. That
being said, our experience shows that cluster administrators prefer
closer control over the monitoring stack than what the
Managed mode offered. For this reason, we intend to
standardize on using External in the long run.
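In practice, switching to the new mode boils down to setting the single field described above. A minimal sketch, assuming the usual ScyllaDBMonitoring manifest layout (only .spec.components.prometheus.mode comes from this release; the remaining fields are abbreviated, so consult the linked docs for the full schema):

# Minimal sketch, not a complete manifest.
apiVersion: scylla.scylladb.com/v1alpha1
kind: ScyllaDBMonitoring
metadata:
  name: example-monitoring
spec:
  components:
    prometheus:
      mode: External   # Managed (the preexisting default) deploys a clean Prometheus instead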
So, we’re still supporting the Managed mode, but it’s
being deprecated and will be removed in a future Operator version.
If you are an existing user, please consider deploying your own
Prometheus using the Prometheus
Operator platform guide and switching from Managed
to External.
Sensitive information excluded from must-gather
ScyllaDB Operator comes with an embedded tool (called
must-gather) that helps preserve the configuration
(Kubernetes objects) and runtime state (ScyllaDB node logs, gossip
information, nodetool status, etc.) in a convenient
archive. This allows comparative analysis and troubleshooting with
a holistic, reproducible view. As of ScyllaDB Operator 1.19,
must-gather comes with a new setting
--exclude-resource that serves as an additional
guardrail preventing accidental inclusion of sensitive information
– covering Secrets and SealedSecrets by default. Users can specify
additional types to be excluded from the capture, or override the
defaults by setting the --include-sensitive-resources
flag. See the
Gathering data with must-gather guide for more information.
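As an illustration, the flags can be combined with your usual way of invoking the tool (the two flags are the ones described above; the binary invocation and the resource value shown are assumptions, so adapt them to how you run must-gather in your environment):

# Secrets and SealedSecrets are excluded from the archive by default.
# Exclude an additional resource type (placeholder value):
scylla-operator must-gather --exclude-resource=configmaps
# Or deliberately override the guardrail and capture sensitive resources:
scylla-operator must-gather --include-sensitive-resources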
Configuration of kernel parameters (sysctl)
ScyllaDB nodes require
kernel parameter (sysctl) configuration for optimal performance and
stability – ScyllaDB Operator 1.19 improves the API to do that.
Before 1.19, it was possible to configure these parameters through
v1.ScyllaCluster's .spec.sysctls.
However, we learned that this wasn’t the optimal place in the API
for a setting that affects entire Kubernetes nodes. So, ScyllaDB
Operator 1.19 lets you configure sysctls through
v1alpha1.NodeConfig for a range of Kubernetes nodes at
once by matching the specified placement rules using a label-based
selector. See the
Configuring kernel parameters (sysctls) section of the
documentation to learn how to configure the sysctl values
recommended for production-grade ScyllaDB deployments. With the
introduction of sysctl to NodeConfig, the legacy way of configuring
sysctl values through v1.ScyllaCluster's .spec.sysctls is now deprecated.
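For illustration, a NodeConfig that applies sysctls to a labelled set of Kubernetes nodes might look roughly like this (the placement selector follows the existing NodeConfig API, while the shape of the sysctls field is an assumption; see the linked documentation for the exact schema):

# Illustrative sketch only; check the Configuring kernel parameters (sysctls) docs for the exact schema.
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: scylla-pool
spec:
  placement:
    nodeSelector:
      scylla.scylladb.com/node-type: scylla   # label-based selector matching the target Kubernetes nodes
  sysctls:                                    # field shape assumed, mirroring the legacy ScyllaCluster setting
    - "fs.aio-max-nr=30000000"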
Topology change operations synchronisation
ScyllaDB
requires that no existing nodes are down when a new node is added
to a cluster. ScyllaDB Operator 1.19 addresses this by
extending ScyllaDB Pods for newly joining nodes with a barrier
blocking the ScyllaDB container from starting until the
preconditions for bootstrapping a new node are met. This feature is
opt-in in ScyllaDB Operator 1.19. You can enable it by setting the
--feature-gates=BootstrapSynchronisation=true
command-line argument to ScyllaDB Operator.
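For example, if you manage the operator Deployment yourself, the flag goes into the scylla-operator container arguments (a minimal sketch; the surrounding Deployment fields are omitted):

# Sketch: enabling the opt-in feature gate on the scylla-operator Deployment.
containers:
  - name: scylla-operator
    args:
      # ...existing arguments...
      - --feature-gates=BootstrapSynchronisation=true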
This feature supports ScyllaDB 2025.2 and newer. If you are running a multi-datacenter
ScyllaDB cluster (multiple ScyllaCluster objects bound
together with external seeds), you are still required to verify the
preconditions yourself before initiating any topology changes. This
is because the synchronisation only occurs on the level of an
individual ScyllaCluster. See
Synchronising bootstrap operations in ScyllaDB for more
information.
Other notable changes
Deprecation of ScyllaDBMonitoring components’ exposeOptions
By adding support for
external Prometheus instances, ScyllaDB Operator 1.19 takes a step towards reducing ScyllaDBMonitoring's complexity by deprecating exposeOptions in both ScyllaDBMonitoring's Prometheus and Grafana components. The use of exposeOptions is limited because it provides no way to configure an Ingress that will terminate TLS, which is likely the most common approach in production. As an alternative, this release introduces a more pragmatic and flexible approach: we simply document how the components' corresponding Services can be exposed. This gives you
the flexibility to do exactly what your use case requires. See the
Exposing Grafana documentation to learn how to expose Grafana
deployed by ScyllaDBMonitoring using a
self-managed Ingress resource. ScyllaDBMonitoring's deprecated exposeOptions will be removed in a future Operator version.
Dependency updates
This
release also includes regular updates of ScyllaDB Monitoring and
the packaged dashboards to support the latest ScyllaDB releases
(4.11.1->4.12.1, #3031),
as well as its dependencies: Grafana (12.0.2->12.2.0) and
Prometheus (v3.5.0->v3.6.0). For more changes and details, check
out the GitHub
release notes.
Upgrade instructions
For instructions on
upgrading ScyllaDB Operator to 1.19, see the
Upgrading Scylla Operator documentation.
Supported versions
- ScyllaDB 2024.1, 2025.1 – 2025.3
- Kubernetes 1.31 – 1.34
- Container Runtime Interface API v1
- ScyllaDB Manager 3.5, 3.7
Getting started with ScyllaDB Operator
- ScyllaDB Operator Documentation
- Learn how to deploy ScyllaDB on Google Kubernetes Engine (GKE)
- Learn how to deploy ScyllaDB on Amazon Elastic Kubernetes Engine (EKS)
- Learn how to deploy ScyllaDB on a Kubernetes Cluster
Related links
- ScyllaDB Operator source (on GitHub)
- ScyllaDB Operator image on DockerHub
- ScyllaDB Operator Helm Chart repository
- ScyllaDB Operator documentation
- ScyllaDB Operator for Kubernetes lesson in ScyllaDB University
Report a problem
Your feedback is always welcome! Feel free to open an
issue or reach out on the #scylla-operator channel in ScyllaDB User Slack.
Instaclustr product update: December 2025
Here’s a roundup of the latest features and updates that we’ve recently released.
If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.
Major announcements
OpenSearch®
AI Search for OpenSearch®: Unlocking next-generation search
AI Search for OpenSearch, which is now available in Public Preview on the NetApp Instaclustr Managed Platform, is designed to bring semantic, hybrid, and multimodal search capabilities to OpenSearch deployments—turning them into an end-to-end AI-powered search solution. With built-in ML models, vector indexing, and streamlined ingestion pipelines, next-generation search can be enabled in minutes without adding operational complexity. This feature powers smarter, more relevant discovery experiences backed by AI—securely deployed across any cloud or on-premises environment.
ClickHouse®
FSx for NetApp ONTAP and Managed ClickHouse® integration is now available
We’re excited to announce that NetApp has introduced seamless
integration between Amazon FSx for NetApp ONTAP and Instaclustr
Managed ClickHouse, to enable customers to build a truly hybrid
lakehouse architecture on AWS. This integration is designed to
deliver lightning-fast analytics without the need for complex data
movement, while leveraging FSx for ONTAP’s unified file and object
storage, tiered performance, and cost optimization. Customers can
now run zero-copy lakehouse analytics with ClickHouse directly on
FSx for ONTAP data—to simplify operations, accelerate
time-to-insight, and reduce total cost of ownership.
PostgreSQL®
Instaclustr for PostgreSQL® on Amazon FSx for ONTAP: A new era
We’re excited to announce the public preview of Instaclustr Managed
PostgreSQL integrated with Amazon FSx for NetApp ONTAP—combining
enterprise-grade storage with world-class open source database
management. This integration is designed to deliver higher IOPS,
lower latency, and advanced data management without increasing
instance size or adding costly hardware. Customers can now run
PostgreSQL clusters backed by FSx for ONTAP storage, leveraging
on-disk compression for cost savings and paving the way for
ONTAP-powered features, such as instant snapshot backups, instant
restores, and fast forking. These ONTAP-enabled features are
planned to unlock huge operational benefits and will be launched
with our GA release.
Other significant changes
Apache Cassandra®
- Released Apache Cassandra 5.0.5 into general availability on the NetApp Instaclustr Managed Platform.
- Transitioned Apache Cassandra v4.1.8 to CLOSED lifecycle state; scheduled to reach End of Life (EOL) on December 20, 2025.
Apache Kafka®
- Kafka on Azure now supports v5 generation nodes, available in General Availability.
- Instaclustr Managed Apache ZooKeeper has moved from General Availability to closed status.
- Kafka and Kafka Connect 3.1.2 and 3.5.1 are retired; 3.6.2, 3.7.1, and 3.8.1 are in legacy support. The next set of lifecycle state changes for Kafka and Kafka Connect, at the end of March 2026, will see all supported versions 3.8.1 and below marked End of Life.
- Karapace Rest Proxy and Schema Registry 3.15.0 are closed. Customers are advised to move to version 5.x.
- Kafka Rest Proxy 5.0.0 and Kafka Schema Registry 5.0.0, 5.0.4 have been moved to end of life. Affected customers have been contacted by Support to schedule a migration to a supported version as soon as possible.
ClickHouse
- ClickHouse 25.3.6 has been added to our managed platform in General Availability.
- Kafka Table Engine integration with ClickHouse has added support to enable real-time data ingestion, streamline streaming analytics, and accelerate insights.
- New ClickHouse node sizes, powered by AWS m7g, r7i, and r7g instances, are now in Limited Availability for cluster creation.
Cadence®
- Cadence is now available to be provisioned with Cassandra 5.x, designed to deliver improved performance, enhanced scalability, and stronger security for mission-critical workflows.
OpenSearch
- OpenSearch 2.19.3 and 3.2.0 have been released to General Availability.
PostgreSQL
- PostgreSQL AWS PrivateLink support has been added, enabling connectivity between VPCs using AWS PrivateLink.
- PostgreSQL version 18.0 has now been released to General Availability, alongside PostgreSQL versions 16.10 and 17.6.
- Added new PostgreSQL metrics for connection states and wait event types.
- PostgreSQL Load Balancer add-on is now available, providing a unified endpoint for cluster access, simplifying failover handling, and ensuring node health through regular checks.
Upcoming releases
Apache Cassandra
- We’re working on enabling multi-datacenter (multi-DC) cluster provisioning via API and console, designed to make it easier to deploy clusters across regions with secure networking and reduced manual steps.
Apache Kafka
- We’re working on adding Kafka Tiered Storage for clusters running in GCP— designed to bring affordable, scalable retention, and instant access to historical data, to ensure flexibility and performance across clouds for enterprise Kafka users.
ClickHouse
- We’re planning to extend our Managed ClickHouse to allow it to work with on-prem deployments.
PostgreSQL
- Following the success of our public preview, we’re preparing to launch PostgreSQL integrated with FSx for NetApp ONTAP (FSxN) into General Availability. This enhancement is designed to combine enterprise-grade PostgreSQL with FSxN’s scalable, cost-efficient storage, enabling customers to optimize infrastructure costs while improving performance and flexibility.
OpenSearch®
- As part of our ongoing advancements in AI for OpenSearch, we are planning to enable adding GPU nodes into OpenSearch clusters, aiming to enhance the performance and efficiency of machine learning and AI workloads.
Instaclustr Managed Platform
- Self-service Tags Management feature—allowing users to add, edit, or delete tags for their clusters directly through the Instaclustr console, APIs, or Terraform provider for RIYOA deployments.
Did you know?
- Cadence Workflow, the open source orchestration engine created by Uber, has officially joined the Cloud Native Computing Foundation (CNCF) as a Sandbox project. This milestone ensures transparent governance, community-driven innovation, and a sustainable future for one of the most trusted workflow technologies in modern microservices and agentic AI architectures. Uber donates Cadence Workflow to CNCF: The next big leap for the open source project—read the full story and discover what’s next for Cadence.
- Upgrading ClickHouse® isn’t just about new features—it’s essential for security, performance, and long-term stability. In ClickHouse upgrade: Why staying updated matters, you’ll learn why skipping upgrades can lead to technical debt, missed optimizations, and security risks. Then, explore A guide to ClickHouse® upgrades and best practices for practical strategies, including when to choose LTS releases for mission-critical workloads and when stable releases make sense for fast-moving environments.
- Our latest blog, AI Search for OpenSearch®: Unlocking next-generation search, explains how this new solution enables smarter discovery experiences using built-in ML models, vector embeddings, and advanced search techniques—all fully managed on the NetApp Instaclustr Platform. Ready to explore the future of search? Read the full article and see how AI can transform your OpenSearch deployments.
If you have any questions or need further assistance with these enhancements to the Instaclustr Managed Platform, please contact us.
SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available.
The post Instaclustr product update: December 2025 appeared first on Instaclustr.
Consuming CDC with Java, Go… and Rust!
A quick look at how to use the ScyllaDB CDC with the Rust connector
In 2021, we published a guide for using Java and Go with ScyllaDB CDC. Today, we are happy to share a new version of that post, including how to use ScyllaDB CDC with the Rust connector! Note: We will skip some of the sections in the original post, like “Why Use a Library?” and challenges in using CDC. If you are planning to use CDC in production, you should absolutely go back and read them. But if you’re just looking to get a demo up and running, this post will get you there.
Getting Started with Rust
scylla-cdc-rust is a library for consuming the ScyllaDB CDC Log in Rust applications. It automatically and transparently handles errors and topology changes of the underlying ScyllaDB cluster. As a result, the API allows the user to read the CDC log without having to deeply understand the internal structure of CDC. The library was written in pure Rust, using ScyllaDB Rust Driver and Tokio. Let’s see how to use the Rust library. We will build an application that prints changes happening to a table in real-time. You can see the final code here.
Installing the library
The scylla-cdc library is available on crates.io.
Setting up the CDC consumer
The most important part of using the library is to define a callback that will be executed after reading a CDC log from the database. Such a callback is defined by implementing the Consumer trait located in scylla-cdc::consumer. For now, we will define a struct with no member variables for this purpose. Since the callback will be executed asynchronously, we have to use the async-trait crate to implement the Consumer trait. We also use the anyhow crate for error handling. The library is going to create one instance of TutorialConsumer per CDC stream, so we also need to define a ConsumerFactory for them.
Adding shared state to the consumer
Different instances of Consumer are used in separate Tokio tasks. Due to that, the runtime might schedule them on separate threads. Consequently, a struct implementing the Consumer trait should also implement the Send trait, and a struct implementing the ConsumerFactory trait should implement the Send and Sync traits. Luckily, Rust implements these traits by default if all member variables of a struct implement them. If the consumers need to share some state, like a reference to an object, they can be wrapped in an Arc. An example of that might be a Consumer that counts rows read by all its instances. Note: In general, keeping shared mutable state in the Consumer is not recommended. That’s because it requires synchronization (i.e. a mutex or an atomic like AtomicUsize), which reduces the speedup granted by Tokio by running the Consumer logic on multiple cores. Fortunately, keeping exclusive (not shared) mutable state in the Consumer comes with no additional overhead.
Starting the application
Now we’re ready to create our main function. To start the log reader, we have to configure a few things (a combined sketch of the consumer, the factory, and the main function follows below):
- We create a connection to the database, using the Session struct from ScyllaDB Rust Driver.
- We specify the keyspace and the table name.
- We create time bounds for our reader. This step is not compulsory – by default, the reader will start reading from now and will continue reading forever. In our case, we are going to read all logs added during the last 6 minutes.
- We create the factory.
- We can build the log reader.
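Since the original post’s inline code listings are not reproduced here, the following is a single combined sketch of the consumer, the factory, and the main function described above, modeled on the scylla-cdc printer example. The exact type, method, and builder names are assumptions and may differ between crate versions; see the full example linked below for the real code.

// Illustrative sketch only: API names follow the scylla-cdc printer example and may not
// match the crate version you use.
use std::sync::Arc;

use async_trait::async_trait;
use scylla::SessionBuilder;
use scylla_cdc::consumer::{CDCRow, Consumer, ConsumerFactory};
use scylla_cdc::log_reader::CDCLogReaderBuilder;

struct TutorialConsumer;

#[async_trait]
impl Consumer for TutorialConsumer {
    // Callback executed for every CDC log row read from the database.
    async fn consume_cdc(&mut self, _data: CDCRow<'_>) -> anyhow::Result<()> {
        println!("Hello, scylla-cdc!");
        Ok(())
    }
}

struct TutorialConsumerFactory;

#[async_trait]
impl ConsumerFactory for TutorialConsumerFactory {
    // The library creates one consumer per CDC stream through this factory.
    async fn new_consumer(&self) -> Box<dyn Consumer> {
        Box::new(TutorialConsumer)
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Connect to the cluster with the ScyllaDB Rust Driver.
    let session = Arc::new(
        SessionBuilder::new()
            .known_node("127.0.0.1:9042")
            .build()
            .await?,
    );

    // 2./3. Keyspace, table name, and (optional) time bounds; here we read logs added
    //       during the last 6 minutes. The timestamp type expected by the builder comes
    //       from the crate, so treat this part as pseudocode.
    let start = chrono::Duration::milliseconds(chrono::Local::now().timestamp_millis())
        - chrono::Duration::minutes(6);

    // 4./5. Create the factory and build the log reader.
    let (_reader, handle) = CDCLogReaderBuilder::new()
        .session(session)
        .keyspace("ks")
        .table_name("t")
        .start_timestamp(start)
        .consumer_factory(Arc::new(TutorialConsumerFactory))
        .build()
        .await?;

    // Await the handle so the application terminates as soon as the reader finishes.
    handle.await?;
    Ok(())
}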
After creating the log reader, we can await the handle it returns so that our application will terminate as soon as the reader finishes. Now, let’s insert some rows into the table. After inserting 3 rows and running the application, you should see the output:
Hello, scylla-cdc!
Hello, scylla-cdc!
Hello, scylla-cdc!
The application printed one line for each CDC log consumed. To see how to use CDCRow and save progress, see the full example below.
Full Example
Follow this detailed cdc-rust tutorial or:
git clone https://github.com/scylladb/scylla-cdc-rust
cd scylla-cdc-rust
cargo run --release --bin scylla-cdc-printer -- --keyspace KEYSPACE --table TABLE --hostname HOSTNAME
Where HOSTNAME is the IP
address of the cluster.
Getting Started with Java and Go
For a
detailed walkthrough with Java and Go examples, see our
previous blog,
Consuming CDC with Java and Go.
Further reading
In this blog,
we have explained what problems the scylla-cdc-rust,
scylla-cdc-java, and scylla-cdc-go libraries solve and how to write
a simple application with each. If you would like to learn more,
check out the links below:
- Replicator example application in the scylla-cdc-java repository. It is an advanced application that replicates a table from one Scylla cluster to another one using the CDC log and the scylla-cdc-java library.
- Example applications in the scylla-cdc-go repository. The repository currently contains three examples: “simple-printer”, which prints changes from a particular schema, “printer”, which is the same as the example presented in the blog, and “replicator”, which is a relatively complex application which replicates changes from one cluster to another.
- API reference for scylla-cdc-go. Includes slightly more sophisticated examples which, unlike the example in this blog, cover saving progress.
- CDC documentation. Knowledge about the design of Scylla’s CDC can be helpful in understanding the concepts in the documentation for both the Java and Go libraries. The parts about the CDC log schema and the representation of data in the log are especially useful.
- ScyllaDB Users Slack. We will be happy to answer your questions about the CDC on the #cdc channel.
We hope all that talk about consuming data has managed to
whet your appetite for CDC! Happy and fruitful coding!
Freezing streaming data into Apache Iceberg™—Part 1: Using Apache Kafka® Connect Iceberg Sink Connector
Introduction
Ever since the first distributed system—i.e. 2 or more computers networked together (in 1969)—there has been the problem of distributed data consistency: How can you ensure that data from one computer is available and consistent with the second (and more) computers? This problem can be uni-directional (one computer is considered the source of truth, others are just copies), or bi-directional (data must be synchronized in both directions across multiple computers).
Some approaches to this problem I’ve come across in the last 8 years include Kafka Connect (for elegantly solving the heterogeneous many-to-many integration problem by streaming data from source systems to Kafka and from Kafka to sink systems, some earlier blogs on Apache Camel Kafka Connectors and a blog series on zero-code data pipelines), MirrorMaker2 (MM2, for replicating Kafka clusters, a 2 part blog series), and Debezium (Change Data Capture/CDC, for capturing changes from databases as streams and making them available in downstream systems, e.g. for Apache Cassandra and PostgreSQL)—MM2 and Debezium are actually both built on Kafka Connect.
Recently, some “sink” systems have been taking over responsibility for streaming data from Kafka into themselves, e.g. OpenSearch pull-based ingestion (c.f. OpenSearch Sink Connector), and the ClickHouse Kafka Table Engine (c.f. ClickHouse Sink Connector). These “pull-based” approaches are potentially easier to configure and don’t require running a separate Kafka Connect cluster and sink connectors, but some downsides may be that they are not as reliable or independently scalable, and you will need to carefully monitor and scale them to ensure they perform adequately.
And then there’s “zero-copy” approaches—these rely on the well-known computer science trick of sharing a single copy of data using references (or pointers), rather than duplicating the data. This idea has been around for almost as long as computers, and is still widely applicable, as we’ll see in part 2 of the blog.
The distributed data use case we’re going to explore in this 2-part blog series is streaming Apache Kafka data into Apache Iceberg, or “Freezing streaming Apache Kafka data into an (Apache) Iceberg”! In part 1 we’ll introduce Apache Iceberg and look at the first approach for “freezing” streaming data using the Kafka Connect Iceberg Sink Connector.
What is Apache Iceberg?
Apache Iceberg is an open source, open table format specification optimized for column-oriented workloads, supporting huge analytic datasets. It supports multiple different concurrent engines that can insert and query table data using SQL—and Iceberg is organized like, well, an iceberg!
The tip of the Iceberg is the Catalog. An Iceberg Catalog acts as a central metadata repository, tracking the current state of Iceberg tables, including their names, schemas, and metadata file locations. It serves as the “single source of truth” for a data Lakehouse, enabling query engines to find the correct metadata file for a table to ensure consistent and atomic read/write operations.
Just under the water, the next layer is the metadata layer. The Iceberg metadata layer tracks the structure and content of data tables in a data lake, enabling features like efficient query planning, versioning, and schema evolution. It does this by maintaining a layered structure of metadata files, manifest lists, and manifest files that store information about table schemas, partitions, and data files, allowing query engines to prune unnecessary files and perform operations atomically.
The data layer is at the bottom. The Iceberg data layer is the storage component where the actual data files are stored. It supports different storage backends, including cloud-based object storage like Amazon S3 or Google Cloud Storage, or HDFS. It uses file formats like Parquet or Avro. Its main purpose is to work in conjunction with Iceberg’s metadata layer to manage table snapshots and provide a more reliable and performant table format for data lakes, bringing data warehouse features to large datasets.

As shown in the above diagram, Iceberg supports multiple different engines, including Apache Spark and ClickHouse. Engines provide the “database” features you would expect, including:
- Data Management
- ACID Transactions
- Query Planning and Optimization
- Schema Evolution
- And more!
I’ve recently been reading an excellent book on Apache Iceberg (“Apache Iceberg: The Definitive Guide”), which explains the philosophy, architecture and design, including operation, of Iceberg. For example, it says that it’s best practice to treat data lake storage as immutable—data should only be added to a Data Lake, not deleted. So, in theory at least, writing infinite, immutable Kafka streams to Iceberg should be straightforward!
But because it’s a complex distributed system (which looks like a database from above water but is really a bunch of files below water!), there is some operational complexity. For example, it handles change and consistency by creating new snapshots for every modification, enabling time travel, isolating readers from writes, and supporting optimistic concurrency control for multiple writers. But you need to manage snapshots (e.g. expiring old snapshots). And chapter 4 (performance optimisation) explains that you may need to worry about compaction (reducing too many small files), partitioning approaches (which can impact read performance), and handling row-level updates. The first two issues may be relevant for Kafka, but probably not the last one. So, it looks like it’s a good fit for the streaming Kafka use cases, but we may need to watch out for Iceberg management issues.
“Freezing” streaming data with the Kafka Iceberg Sink Connector
But Apache Iceberg is “frozen”—what’s the connection to fast-moving streaming data? You certainly don’t want to collide with an iceberg from your speedy streaming “ship”—but you may want to freeze your streaming data for long-term analytical queries in the future. How can you do that without sinking? Actually, a “sink” is the first answer: A Kafka Connect Iceberg Sink Connector is the most common way of “freezing” your streaming data in Iceberg!
Kafka Connect is the standard framework provided by Apache Kafka to move data from multiple heterogeneous source systems to multiple heterogeneous sink systems, using:
- A Kafka cluster
- A Kafka Connect cluster (running connectors)
- Kafka Connect source connectors
- Kafka topics and
- Kafka Connect Sink Connectors
That is, a highly decoupled approach. It provides real-time data movement with high scalability, reliability, error handling and simple transformations.
Here’s the Kafka Connect Iceberg Sink Connector official documentation.
It appears to be reasonably complicated to configure this sink connector; you will need to know something about Iceberg. For example, what is a “control topic”? It’s apparently used to coordinate commits for exactly-once semantics (EOS).
The connector supports fan-out (writing to multiple Iceberg tables from one topic), fan-in (writing to one Iceberg table from multiple topics), static and dynamic routing, and filtering.
In common with many technologies that you may want to use as Kafka Connect sinks, support for Kafka metadata is incomplete. The KafkaMetadata Transform (which injects topic, partition, offset and timestamp properties) is only experimental at present.
How are Iceberg tables created with the correct metadata? If you have JSON record values, then schemas are inferred by default (but may not be correct or optimal). Alternatively, explicit schemas can be included in-line or referenced from a Kafka Schema Registry (e.g. Karapace), and, as an added bonus, schema evolution is supported. Also note that Iceberg tables may have to be manually created prior to use if your Catalog doesn’t support table auto-creation.
From what I understood about Iceberg, to use it (e.g. for writes), you need support from an engine (e.g. to add raw data to the Iceberg warehouse, create the metadata files, and update the catalog). How does this work for Kafka Connect? From this blog I discovered that the Kafka Connect Iceberg Sink connector is functioning as an Iceberg engine for writes, so there really is an engine, but it’s built into the connector.
As is the case with all Kafka Connect Sink Connectors, records are available as soon as they are written to Kafka topics by Kafka producers and Kafka Connect Source Connectors, i.e. records in active segments can be copied immediately to sink systems. But is the Iceberg Sink Connector real-time? Not really! The default time to write to Iceberg is every 5 minutes (iceberg.control.commit.interval-ms) to prevent multiplication of small files—something that Iceberg(s) doesn’t/don’t like (“melting”?). In practice, it’s because every data file must be tracked in the metadata layer, which impacts performance in many ways—proliferation of small files is typically addressed by optimization and compaction (e.g. Apache Spark supports Iceberg management, including these operations).
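To make this concrete, here is a rough sketch of a sink connector configuration in Kafka Connect properties form (the control topic and commit interval are the settings discussed above; the connector class, topic, table, and catalog values are illustrative placeholders, so check the official documentation for your Iceberg version):

# Illustrative sketch only; values are placeholders.
name=iceberg-sink-sketch
connector.class=org.apache.iceberg.connect.IcebergSinkConnector
topics=events
iceberg.tables=analytics.events
iceberg.catalog.type=rest
iceberg.catalog.uri=http://rest-catalog:8181
iceberg.control.topic=control-iceberg
# Default is 5 minutes; lowering it reduces lag but produces more small files.
iceberg.control.commit.interval-ms=300000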
So, unlike most Kafka Connect sink connectors, which write as quickly as possible, there will be lag before records appear in Iceberg tables (“time to freeze” perhaps)!
The systems are separate (Kafka and Iceberg are independent), records are copied to Iceberg, and that’s it! This is a clean separation of concerns and ownership. Kafka owns the source data (with Kafka controlling data lifecycles, including record expiry), Kafka Connect Iceberg Sink Connector performs the reading from Kafka and writing to Iceberg, and is independently scalable to Kafka. Kafka doesn’t handle any of the Iceberg management. Once the data has landed in Iceberg, Kafka has no further visibility or interest in it. And the pipeline is purely one way, write only – reads or deletes are not supported.
Here’s a summary of this approach to freezing streams:
- Kafka Connect Iceberg Sink Connector shares all the benefits of the Kafka Connect framework, including scalability, reliability, error handling, routing, and transformations.
- At a minimum, JSON values are required; ideally, use full schemas referenced in Karapace—but not all schemas are guaranteed to work.
- Kafka Connect doesn’t “manage” Iceberg (e.g. automatically aggregate small files, remove snapshots, etc.)
- You may have to tune the commit interval – 5 minutes is the default.
- But it does have a built-in engine that supports writing to Iceberg.
- You may need to use an external tool (e.g. Apache Spark) for Iceberg management procedures.
- It’s write-only to Iceberg; reads and deletes are not supported.

But what’s the best thing about the Kafka Connect Iceberg Sink Connector? It’s available now (as part of the Apache Iceberg build) and works on the NetApp Instaclustr Kafka Connect platform as a “bring your own connector” (instructions here).
In part 2, we’ll look at Kafka Tiered Storage and Iceberg Topics!
The post Freezing streaming data into Apache Iceberg™—Part 1: Using Apache Kafka® Connect Iceberg Sink Connector appeared first on Instaclustr.
Netflix Live Origin
Xiaomei Liu, Joseph Lynch, Chris Newton
Introduction
Behind the Streams: Building a Reliable Cloud Live Streaming Pipeline for Netflix introduced the architecture of the streaming pipeline. This blog post looks at the custom Origin Server we built for Live — the Netflix Live Origin. It sits at the demarcation point between the cloud live streaming pipelines on its upstream side and the distribution system, Open Connect, Netflix’s in-house Content Delivery Network (CDN), on its downstream side, and acts as a broker managing what content makes it out to Open Connect and ultimately to the client devices.
Netflix Live Origin is a multi-tenant microservice operating on EC2 instances within the AWS cloud. We lean on standard HTTP protocol features to communicate with the Live Origin. The Packager pushes segments to it using PUT requests, which place a file into storage at the particular location named in the URL. The storage location corresponds to the URL that is used when the Open Connect side issues the corresponding GET request.
Live Origin architecture is influenced by key technical decisions of the live streaming architecture. First, resilience is achieved through redundant regional live streaming pipelines, with failover orchestrated at the server-side to reduce client complexity. The implementation of epoch locking at the cloud encoder enables the origin to select a segment from either encoding pipeline. Second, Netflix adopted a manifest design with segment templates and constant segment duration to avoid frequent manifest refresh. The constant duration templates enable Origin to predict the segment publishing schedule.
Multi-pipeline and multi-region aware origin
Live streams inevitably contain defects due to the non-deterministic nature of live contribution feeds and strict real-time segment publishing timelines. Common defects include:
- Short segments: Missing video frames and audio samples.
- Missing segments: Entire segments are absent.
- Segment timing discontinuity: Issues with the Track Fragment Decode Time.
Communicating segment discontinuity from the server to the client via a segment template-based manifest is impractical, and these defective segments can disrupt client streaming.
The redundant cloud streaming pipelines operate independently, encompassing distinct cloud regions, contribution feeds, encoder, and packager deployments. This independence substantially mitigates the probability of simultaneous defective segments across the dual pipelines. Owing to its strategic placement within the distribution path, the live origin naturally emerges as a component capable of intelligent candidate selection.
The Netflix Live Origin features multi-pipeline and multi-region awareness. When a segment is requested, the live origin checks candidates from each pipeline in a deterministic order, selecting the first valid one. Segment defects are detected via lightweight media inspection at the packager. This defect information is provided as metadata when the segment is published to the live origin. In the rare case of concurrent defects at the dual pipeline, the segment defects can be communicated downstream for intelligent client-side error concealment.
Open Connect streaming optimization
When the Live project started, Open Connect had become highly optimised for VOD content delivery — nginx had been chosen many years ago as the Web Server since it is highly capable in this role, and a number of enhancements had been added to it and to the underlying operating system (BSD). Unlike traditional CDNs, Open Connect is more of a distributed origin server — VOD assets are pre-positioned onto carefully selected server machines (OCAs, or Open Connect Appliances) rather than being filled on demand.
Alongside the VOD delivery, an on-demand fill system has been used for non-VOD assets — this includes artwork and the downloadable portions of the clients, etc. These are also served out of the same nginx workers, albeit under a distinct server block, using a distinct set of hostnames.
Live didn’t fit neatly into this ‘small object delivery’ model, so we extended the proxy-caching functionality of nginx to address Live-specific needs. We will touch on some of these here related to optimized interactions with the Origin Server. Look for a future blog post that will go into more details on the Open Connect side.
The segment templates provided to clients are also provided to the OCAs as part of the Live Event Configuration data. Using the Availability Start Time and Initial Segment number, the OCA is able to determine the legitimate range of segments for each event at any point in time — requests for objects outside this range can be rejected, preventing unnecessary requests going up through the fill hierarchy to the origin. If a request makes it through to the origin, and the segment isn’t available yet, the origin server will return a 404 Status Code (indicating File Not Found) with the expiration policy of that error so that it can be cached within Open Connect until just before that segment is expected to be published.
If the Live Origin knows when segments are being pushed to it, and knows what the live edge is — when a request is received for the immediately next object, rather than handing back another 404 error (which would go all the way back through Open Connect to the client), the Live Origin can ‘hold open’ the request, and service it once the segment has been published to it. By doing this, the degree of chatter within the network handling requests that arrive early has been significantly reduced. As part of this, millisecond grain caching was added to nginx to enhance the standard HTTP Cache Control, which only works at second granularity, a long time when segments are generated every 2 seconds.
Streaming metadata enhancement
The HTTP standard allows for the addition of request and response headers that can be used to provide additional information as files move between clients and servers. The HTTP headers provide notifications of events within the stream in a highly scalable way that is independently conveyed to client devices, regardless of their playback position within the stream.
These notifications are provided to the origin by the live streaming pipeline and are inserted by the origin in the form of headers, appearing on the segments generated at that point in time (and persist to future segments — they are cumulative). Whenever a segment is received at an OCA, this notification information is extracted from the response headers and used to update an in-memory data structure, keyed by event ID; and whenever a segment is served from the OCA, the latest such notification data is attached to the response. This means that, given any flow of segments into an OCA, it will always have the most recent notification data, even if all clients requesting it are behind the live edge. In fact, the notification information can be conveyed on any response, not just those supplying new segments.
Cache invalidation and origin mask
An invalidation system has been available since the early days of the project. It can be used to “flush” all content associated with an event by altering the key used when looking up objects in cache — this is done by incorporating a version number into the cache key that can then be bumped on demand. This is used during pre-event testing so that the network can be returned to a pristine state for the test with minimal fuss.
Each segment published by the Live Origin conveys the encoding pipeline it was generated by, as well as the region it was requested from. Any issues that are found after segments make their way into the network can be remedied by an enhanced invalidation system that takes such variants into account. It is possible to invalidate (that is, cause to be considered expired) segments in a range of segment numbers restricted to those sourced from encoder A, or restricted further to those sourced from encoder A and retrieved from region X.
In combination with Open Connect’s enhanced cache invalidation, the Netflix Live Origin allows selective encoding pipeline masking to exclude a range of segments from a particular pipeline when serving segments to Open Connect. The enhanced cache invalidation and origin masking enable live streaming operations to hide known problematic segments (e.g., segments causing client playback errors) from streaming clients once the bad segments are detected, protecting millions of streaming clients during the DVR playback window.
Origin storage architecture
Our original storage architecture for the Live Origin was simple: just use AWS S3 like we do for SVOD. This served us well initially for our low-traffic events, but as we scaled up we discovered that Live streaming has unique latency and workload requirements that differ significantly from on-demand, where we have significant lead time to pre-position content. While S3 met its stated uptime guarantees, our strict 2-second retry budget inherent to Live events (where every write is critical) led us to explore optimizations specifically tailored for real-time delivery at scale. AWS S3 is an amazing object store, but our Live streaming requirements were closer to those of a global low-latency highly-available database. So, we went back to the drawing board and started from the requirements. The Origin required:
- [HA Writes] Extremely high write availability, ideally as close to full write availability within a single AWS region, with low second replication delay to other regions. Any failed write operation within 500ms is considered a bug that must be triaged and prevented from re-occurring.
- [Throughput] High write throughput, with hundreds of MiB replicating across regions
- [Large Partitions] Efficiently support O(MiB) writes that accumulate to O(10k) keys per partition with O(GiB) total size per event.
- [Strong Consistency] Within the same region, we needed read-your-write semantics to hit our <1s read delay requirements (must be able to read published segments)
- [Origin Storm] During worst-case load involving Open Connect edge cases, we may need to handle O(GiB) of read throughput without affecting writes.
Fortunately, Netflix had previously invested in building a KeyValue Storage Abstraction that cleverly leveraged Apache Cassandra to provide chunked storage of MiB or even GiB values. This abstraction was initially built to support cloud saves of Game state. The Live use case would push the boundaries of this solution, however, in terms of availability for writes (#1), cumulative partition size (#3), and read throughput during Origin Storm (#5).
High Availability for Writes of Large Payloads
The KeyValue Payload Chunking and Compression Algorithm breaks O(MiB) work down so each part can be idempotently retried and hedged to maintain strict latency service level objectives, as well as spreading the data across the full cluster. When we combine this algorithm with Apache Cassandra’s local-quorum consistency model, which allows write availability even with an entire Availability Zone outage, plus a write-optimized Log-Structured Merge Tree (LSM) storage engine, we could meet the first four requirements. After iterating on the performance and availability of this solution, we were not only able to achieve the write availability required, but did so with a P99 tail latency that was similar to the status quo’s P50 average latency while also handling cross-region replication behind the scenes for the Origin. This new solution was significantly more expensive (as expected, databases backed by SSD cost more), but minimizing cost was not a key objective and low latency with high availability was:
High Availability Reads at Gbps Throughputs
Now that we solved the write reliability problem, we had to handle the Origin Storm failure case, where potentially dozens of Open Connect top-tier caches could be requesting multiple O(MiB) video segments at once. Our back-of-the-envelope calculations showed worst-case read throughput in the O(100Gbps) range, which would normally be extremely expensive for a strongly-consistent storage engine like Apache Cassandra. With careful tuning of chunk access, we were able to respond to reads at network line rate (100Gbps) from Apache Cassandra, but we observed unacceptable performance and availability degradation on concurrent writes. To resolve this issue, we introduced write-through caching of chunks using our distributed caching system EVCache, which is based on Memcached. This allows almost all reads to be served from a highly scalable cache, allowing us to easily hit 200Gbps and beyond without affecting the write path, achieving read-write separation.
Final Storage Architecture
In the final storage architecture, the Live Origin writes and reads to KeyValue, which manages a write-through cache to EVCache (memcached) and implements a safe chunking protocol that spreads large values and partitions them out across the storage cluster (Apache Cassandra). This allows almost all read load to be handled from cache, with only misses hitting the storage. This combination of cache and highly available storage has met the demanding needs of our Live Origin for over a year now.
Delivering this consistent low latency for large writes with cross-region replication and consistent write-through caching to a distributed cache required solving numerous hard problems with novel techniques, which we plan to share in detail during a future post.
Scalability and scalable architecture
Netflix’s live streaming platform must handle a high volume of diverse stream renditions for each live event. This complexity stems from supporting various video encoding formats (each with multiple encoder ladders), numerous audio options (across languages, formats, and bitrates), and different content versions (e.g., with or without advertisements). The combination of these elements, alongside concurrent event support, leads to a significant number of unique stream renditions per live event. This, in turn, necessitates a high Requests Per Second (RPS) capacity from the multi-tenant live origin service to ensure publishing-side scalability.
In addition, Netflix’s global reach presents distinct challenges to the live origin on the retrieval side. During the Tyson vs. Paul fight event in 2024, a historic peak of 65 million concurrent streams was observed. Consequently, a scalable architecture for live origin is essential for the success of large-scale live streaming.
Scaling architecture
We chose to build a highly scalable origin instead of relying on the traditional origin shields approach for better end-to-end cache consistency control and simpler system architecture. The live origin in this architecture directly connects with top-tier Open Connect nodes, which are geographically distributed across several sites. To minimize the load on the origin, only designated nodes per stream rendition at each site are permitted to directly fill from the origin.
While the origin service can autoscale horizontally using EC2 instances, there are other system resources that are not autoscalable, such as storage platform capacity and AWS to Open Connect backbone bandwidth capacity. Since in live streaming, not all requests to the live origin are of the same importance, the origin is designed to prioritize more critical requests over less critical requests when system resources are limited. The table below outlines the request categories, their identification, and protection methods.

Publishing isolation
Publishing traffic, unlike potentially surging CDN retrieval traffic, is predictable, making path isolation a highly effective solution. As shown in the scalability architecture diagram, the origin utilizes separate EC2 publishing and CDN stacks to protect the latency and failure-sensitive origin writes. In addition, the storage abstraction layer features distinct clusters for key-value (KV) read and KV write operations. Finally, the storage layer itself separates read (EVCache) and write (Cassandra) paths. This comprehensive path isolation facilitates independent cloud scaling of publishing and retrieval, and also prevents CDN-facing traffic surges from impacting the performance and reliability of origin publishing.
Priority rate limiting
Given Netflix’s scale, managing incoming requests during a traffic storm is challenging, especially considering non-autoscalable system resources. The Netflix Live Origin implemented priority-based rate limiting when the underlying system is under stress. This approach ensures that requests with greater user impact are prioritized to succeed, while requests with lower user impact are allowed to fail during times of stress in order to protect the streaming infrastructure and are permitted to retry later to succeed.
Leveraging Netflix’s microservice platform priority rate limiting feature, the origin prioritizes live edge traffic over DVR traffic during periods of high load on the storage platform. The live edge vs. DVR traffic detection is based on the predictable segment template. The template is further cached in memory on the origin node to enable priority rate limiting without access to the datastore, which is valuable especially during periods of high datastore stress.
To mitigate traffic surges, TTL cache control is used alongside priority rate limiting. When the low-priority traffic is impacted, the origin instructs Open Connect to slow down and cache identical requests for 5 seconds by setting a max-age = 5s and returns an HTTP 503 error code. This strategy effectively dampens traffic surges by preventing repeated requests to the origin within that 5-second window.
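In plain HTTP terms, the throttling response described above has roughly this shape (illustrative; only the 503 status code and the 5-second max-age are stated here, and any other headers are omitted):

HTTP/1.1 503 Service Unavailable
Cache-Control: max-age=5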
The following diagrams illustrate origin priority rate limiting with simulated traffic. The nliveorigin_mp41 traffic is the low-priority traffic and is mixed with other high-priority traffic. In the first row: the 1st diagram shows the request RPS, the 2nd diagram shows the percentage of request failure. In the second row, the 1st diagram shows datastore resource utilization, and the 2nd diagram shows the origin retrieval P99 latency. The results clearly show that only the low-priority traffic (nliveorigin_mp41) is impacted at datastore high utilization, and the origin request latency is under control.
404 storm and cache optimization
Publishing isolation and priority rate limiting successfully protect the live origin from DVR traffic storms. However, the traffic storm generated by requests for non-existent segments presents further challenges and opportunities for optimization.
The live origin structures metadata hierarchically as event > stream rendition > segment, and the segment publishing template is maintained at the stream rendition level. This hierarchical organization allows the origin to preemptively reject requests with an HTTP 404(not found)/410(Gone) error, leveraging highly cacheable event and stream rendition level metadata, avoiding unnecessary queries to the segment level metadata:
- If the event is unknown, reject the request with 404
- If the event is known, but the segment request timing does not match the expected publishing timing, reject the request with 404 and cache control TTL matching the expected publishing time
- If the event is known, the requested segment is never generated or misses the retry deadline, reject the request with a 410 error, preventing the client from repeatedly requesting
At the storage layer, metadata is stored separately from media data in the control plane datastore. Unlike the media datastore, the control plane datastore does not use a distributed cache to avoid cache inconsistency. Event and rendition level metadata benefits from a high cache hit ratio when in-memory caching is utilized at the live origin instance. During traffic storms involving non-existent segments, the cache hit ratio for control plane access easily exceeds 90%.
The use of in-memory caching for metadata effectively handles 404 storms at the live origin without causing datastore stress. This metadata caching complements the storage system’s distributed media cache, providing a complete solution for traffic surge protection.
Summary
The Netflix Live Origin, built upon an optimized storage platform, is specifically designed for live streaming. It incorporates advanced media and segment publishing scheduling awareness and leverages enhanced intelligence to improve streaming quality, optimize scalability, and improve Open Connect live streaming operations.
Acknowledgement
Many teams and stunning colleagues contributed to the Netflix live origin. Special thanks to Flavio Ribeiro for advocacy and sponsorship of the live origin project; to Raj Ummadisetty, Prudhviraj Karumanchi for the storage platform; to Rosanna Lee, Hunter Ford, and Thiago Pontes for storage lifecycle management; to Ameya Vasani for e2e test framework; Thomas Symborski for orchestrator integration; to James Schek for Open Connect integration; to Kevin Wang for platform priority rate limit; to Di Li, Nathan Hubbard for origin scalability testing.
Netflix Live Origin was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
re:Invent Recap
It’s been a while since I last attended re:Invent… long enough that I’d almost forgotten how expensive a bottle of water can be in a Vegas hotel room. This time was different. Instead of just attending, I wore many hats: audio-visual tech, salesperson, technical support, friendly ear, and booth rep. re:Invent is an experience that’s hard to explain to anyone outside tech. Picture 65,000 people converging in Las Vegas: DJ booths thumping beside deep-dive technical sessions, competitions running nonstop, and enough swag to fill a million Christmas stockings. Only then do you start to grasp what it’s really like. Needless to say, having the privilege to fly halfway across the globe, stay in a bougie hotel, and help host the impressive ScyllaDB booth was a fitting way to finish the year on a high.
This year was ScyllaDB’s biggest re:Invent presence yet… a full-scale booth designed to show what predictable performance at extreme scale really looks like. The booth was buzzing from open to close, packed with data engineers, developers, and decision-makers exploring how ScyllaDB handles millions of operations per second with single-digit-millisecond P99 latency. Some of the standout moments for me included:
- Customer sessions at the Content Hub featuring Freshworks and SAS, both showcasing how ScyllaDB powers their mission-critical AI workloads.
- In-booth talks from Freshworks, SAS, Sprig, and Revinate. Real users sharing real production stories. There’s nothing better than hearing how our customers are conquering performance challenges at scale.
- Technical deep dives exploring everything from linear scalability to real-time AI pipelines.
- ScyllaDB X Cloud linear-scale demonstration, a live visualization of throughput scaling predictably with every additional node. Watching tablets rebalance automatically and linearly never gets old.
- High-impact in-booth videos driving home ScyllaDB’s key differentiators. I’m particularly proud of our DB Guy parody along with the ScyllaDB monster on the Vegas Sphere (yes, we fooled many with that one).
For many visitors, this was their first time seeing ScyllaDB X Cloud and Vector Search in action. Our demos made it clear what we mean by performance at scale: serving billions of vectors or millions of events per second, all while keeping tail latency comfortably under 5 ms and cost behavior entirely predictable. Developers that I chatted to loved that ScyllaDB drops neatly into existing Cassandra or DynamoDB environments while delivering much better performance and a lower TCO. Architects zeroed in on our flexibility across EC2 instance families (especially i8g) and hybrid deployment models. The ability to bring your own AWS (or GCP) account sparked plenty of conversations around performance, security, and data sovereignty.
What really stood out this year was the shift in mindset. re:Invent 2025 confirmed that the future of extreme scale database engineering belongs to real-time systems … from AI inference to IoT telemetry, where low latency and linear scale are essential for success. ScyllaDB sits right at that intersection: a database built to scale fearlessly, delivering the control of bare metal with the simplicity of managed cloud. If you missed us in Vegas, don’t worry … you can still catch the highlights. Watch our customer sessions and the full X Cloud demo, and see why predictable performance at extreme scale isn’t just our tagline. It’s what we do every day. Catch the re:Invent videos
Stay ahead with Apache Cassandra®: 2025 CEP highlights
Apache Cassandra® committers are working hard, building new features to help you ease the operational challenges of running a distributed database. Let’s dive into some recently approved CEPs and explain how these upcoming features will improve your workflow and efficiency.
What is a CEP?
CEP stands for Cassandra Enhancement Proposal. A CEP is the process for outlining, discussing, and gathering endorsements for a new feature in Cassandra. It’s more than a feature request: those who put forth a CEP intend to build the feature, and the proposal process encourages close collaboration with the Cassandra contributors.
The CEPs discussed here were either recently approved for implementation or have seen significant implementation progress. As with all open-source development, inclusion in a future release is contingent upon successful implementation, community consensus, testing, and approval by project committers.
CEP-42: Constraints framework
With collaboration from NetApp Instaclustr, CEP-42 (and its subsequent iterations) delivers schema-level constraints, giving Cassandra users and operators more control over their data. Adding constraints at the schema level means data can be validated at write time, with an appropriate error returned when the data is invalid.
Constraints can be defined inline or as a separate definition. The inline style allows only one constraint, while a separate definition lets users combine multiple constraints with different expressions.
CEP-42 initially supported a few constraints that covered the majority of cases, but follow-up efforts expanded the supported list to include scalar comparisons (>, <, >=, <=), LENGTH(), OCTET_LENGTH(), NOT NULL, JSON(), and REGEX(). Users can also define their own constraints by implementing them and placing them on Cassandra’s classpath.
A simple example of an in-line constraint:
CREATE TABLE users (
    username text PRIMARY KEY,
    age int CHECK age >= 0 AND age < 120
);
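The expanded constraint functions from the follow-up work plug into the same CHECK clause; below is a hedged sketch using LENGTH() (the table and column names are hypothetical, and the exact grammar may differ in the final release):
-- Hypothetical sketch: reject overly long values at write time.
CREATE TABLE user_profiles (
    username text PRIMARY KEY,
    bio text CHECK LENGTH(bio) < 500
);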
Constraints are not supported for UDTs (User-Defined Types) or collections (except for NOT NULL on frozen collections).
Enforcing constraints closer to the data is a subtle but mighty way for operators to ensure that data goes into the database correctly. Defining the rules once simplifies application code, makes it more robust, and prevents validation from being bypassed. Those who have worked with MySQL, Postgres, or MongoDB will enjoy this addition to Cassandra.
CEP-51: Support “Include” Semantics for cassandra.yaml
The cassandra.yaml file holds important settings for storage, memory, replication, compaction, and more. It’s no surprise that the average size of the file is around 1,000 lines (though, yes, most are comments). CEP-51 enables splitting the cassandra.yaml configuration into multiple files using include semantics. From the outside this feels like a small change, but the implications are huge for users who choose to opt in.
In general, the size of the configuration file makes it difficult to manage and to coordinate changes, and multiple teams often manage different aspects of the single file. In addition, anyone with read access to cassandra.yaml sees the entire file, meaning private information like credentials is commingled with all other settings. That is a risk from both an operational and a security standpoint.
Enabling the new semantics, and the modularity they bring, eases management, deployment, handling of environment-specific settings, and security in one shot. Once cassandra.yaml is broken up into smaller, well-defined files, the configuration can follow the principle of least privilege: sensitive settings are separated from general ones, with fine-grained access control on the individual files. With the feature enabled, different development teams are better equipped to deploy safely and independently.
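To make the idea concrete, here is a purely illustrative sketch of how a split configuration might look. The include directive name and the file layout below are assumptions made for illustration only; the actual syntax is defined by CEP-51 and may differ:
# cassandra.yaml (hypothetical layout -- the "include" key name is an assumption)
cluster_name: 'Prod Cluster'
num_tokens: 16
include:
  - conf/storage.yaml       # storage and compaction settings, owned by one team
  - conf/credentials.yaml   # sensitive settings, kept under restrictive file permissions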
If you’ve deployed your Cassandra cluster on the NetApp Instaclustr platform, the cassandra.yaml file is already configured and managed for you. We pride ourselves on making it easy for you to get up and running fast.
CEP-52: Schema annotations for Apache Cassandra
With extensive review by the NetApp Instaclustr team and Stefan Miklosovic, CEP-52 introduces schema annotations in CQL, allowing inline comments and labels on schema elements such as keyspaces, tables, columns, and User-Defined Types (UDTs). Users can easily define and alter comments and labels on these elements, and annotations can be copied over when desired using the CREATE TABLE LIKE syntax. Comments are stored as plain text, while labels are stored as structured metadata.
Comments and labels serve different annotation purposes: Comments document what a schema object is for, whereas labels describe how sensitive or controlled it is meant to be. For example, labels can be used to identify columns as “PII” or “confidential”, while the comment on that column explains usage, e.g. “Last login timestamp.”
Users can query these annotations. CEP-52 defines two new read-only tables (system_views.schema_comments and system_views.schema_security_labels) to store comments and security labels, so objects with comments can be returned as a list, or a user or machine process can query for specific labels, which is beneficial for auditing and classification. Note that security labels are descriptive metadata and do not enforce access control on the data.
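As a hedged illustration of the workflow (the ALTER syntax below for per-column annotations is an assumption, since the final CQL grammar may differ; the view names come from the CEP itself):
-- Hypothetical syntax (assumption): attach a comment and a security label to a column.
ALTER TABLE users ALTER last_login
    WITH comment = 'Last login timestamp'
    AND security_label = 'PII';

-- Query the new read-only views, e.g. when auditing which objects carry labels:
SELECT * FROM system_views.schema_comments;
SELECT * FROM system_views.schema_security_labels;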
CEP-53: Cassandra rolling restarts via Sidecar
Sidecar is an auxiliary component in the Cassandra ecosystem that exposes cluster management and streaming capabilities through APIs. CEP-53 introduces rolling restarts through Sidecar, designed to give operators more efficient and safer restarts without cluster-wide downtime. More specifically, operators can monitor, pause, resume, and abort restarts through an API, with configurable behavior if a restart fails.
Rolling restarts bring operators a step closer to cluster-wide operations and lifecycle management via Sidecar. Operators will be able to configure the number of nodes to restart concurrently with minimal risk, as this CEP defines clear states a node moves through during a restart. The design accounts for a variety of edge cases, so an operator can be assured that, for example, a non-functioning Sidecar won’t derail operations.
Today, restarting a node is a multi-step, manual process that does not scale to large clusters (and is tedious even for small ones). Restarting a cluster previously lacked a streamlined approach, as each node had to be restarted one at a time, making the process time-intensive and error-prone.
Though Sidecar is still considered a work in progress, it has big plans for improving the operation of large clusters!
The NetApp Instaclustr Platform, in conjunction with our expert TechOps team, already orchestrates these laborious tasks for our Cassandra customers with a high level of care to ensure their clusters stay online. Restarting or upgrading your Cassandra nodes is a huge pain point for operators, but it doesn’t have to be when you use our managed platform (with round-the-clock support!).
CEP-54: Zstd with dictionary SSTable compression
CEP-54, with NetApp Instaclustr’s collaboration, aims to add support for Zstd dictionary compression for SSTables. Zstd, or Zstandard, is a fast, lossless data compression algorithm that boasts impressive ratio and speed and has been supported in Cassandra since 4.0. Certain workloads can benefit from significantly faster read/write performance, a reduced storage footprint, and increased storage device lifetime when using dictionary compression.
At a high level, operators choose a table they want to compress with a dictionary. A dictionary must first be trained on a small amount of existing data (no more than 10 MiB is recommended). The result of training is a dictionary, which is stored cluster-wide for all other nodes to use and is then applied to all subsequent SSTable writes to disk.
Workloads with structured data of similar rows benefit most from Zstd with dictionary compression. Some examples of ideal workloads include event logs, telemetry data, and metadata tables with templated messages. Think: repeated row data. If the table data is too unstructured or random, dictionary compression likely won’t be optimal; however, plain Zstd will still be an excellent option.
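For reference, plain Zstd compression can already be enabled per table today; a minimal sketch follows (the keyspace and table names are hypothetical), with the dictionary-specific options from CEP-54 omitted since their exact names aren’t finalized:
-- Plain Zstd compression, supported since Cassandra 4.0.
ALTER TABLE telemetry.events
    WITH compression = {'class': 'ZstdCompressor', 'compression_level': 3};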
New SSTables with dictionaries are readable across nodes and can be streamed, repaired, and backed up. Existing tables are unaffected if dictionary compression is not enabled. Too many unique dictionaries hurt decompression, so keep the count minimal (the recommended dictionary size is about 100 KiB, with one dictionary per table) and only adopt new ones when they’re noticeably better.
CEP-55: Generated role names
CEP-55 adds support for creating users and roles without supplying a name, simplifying user management, especially when generating users and roles in bulk. An example of the syntax is CREATE GENERATED ROLE WITH GENERATED PASSWORD; the feature is configured through new keys in a newly introduced cassandra.yaml section, “role_name_policy.”
Stefan Miklosovic, our Cassandra engineer at NetApp Instaclustr, created this CEP as a logical follow-up to CEP-24 (password validation/generation), which he also authored. These quality-of-life improvements let operators spend less time on trivial, high-risk tasks and more time on truly complex matters.
Manual name selection seems trivial until a hundred role names need to be generated; at that point there is a security risk if the new usernames (or, worse, passwords) are easily guessable. With CEP-55, the generated role name will be UUID-like, with optional prefix/suffix and size hints; a pluggable policy is also available to generate and validate names. This is an opt-in feature with no effect on the existing method of creating role names.
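Using the example syntax quoted above, a minimal sketch looks like this (the columns returned to the caller aren’t shown, and the specific keys under role_name_policy are an implementation detail that may change):
-- The cluster generates both the role name and the password and returns them.
-- Prefix/suffix and size hints for generated names are configured in the
-- role_name_policy section of cassandra.yaml.
CREATE GENERATED ROLE WITH GENERATED PASSWORD;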
The future of Apache Cassandra is bright
These Cassandra Enhancement Proposals demonstrate a strong commitment to making Apache Cassandra more powerful, secure, and easier to operate. By staying on top of these updates, we ensure our managed platform seamlessly supports future releases that accelerate your business needs.
At NetApp Instaclustr, our expert TechOps team already orchestrates laborious tasks like restarts and upgrades for our Apache Cassandra customers, ensuring their clusters stay online. Our platform handles the complexity so you can get up and running fast.
Learn more about our fully managed and hosted Apache Cassandra offering and try it for free today!
The post Stay ahead with Apache Cassandra®: 2025 CEP highlights appeared first on Instaclustr.