re:Invent Recap
It’s been a while since I last attended re:Invent… long enough that I’d almost forgotten how expensive a bottle of water can be in a Vegas hotel room. This time was different. Instead of just attending, I wore many hats: audio-visual tech, salesperson, technical support, friendly ear, and booth rep. re:Invent is an experience that’s hard to explain to anyone outside tech. Picture 65,000 people converging in Las Vegas: DJ booths thumping beside deep-dive technical sessions, competitions running nonstop, and enough swag to fill a million Christmas stockings. Only then do you start to grasp what it’s really like. Needless to say, having the privilege to fly halfway across the globe, stay in a bougie hotel, and help host the impressive ScyllaDB booth was a fitting way to finish the year on a high. This year was ScyllaDB’s biggest re:Invent presence yet… a full-scale booth designed to show what predictable performance at extreme scale really looks like. The booth was buzzing from open to close, packed with data engineers, developers, and decision-makers exploring how ScyllaDB handles millions of operations per second with single-digit P99 millisecond latency. Some of the standout moments for me included: Customer sessions at the Content Hub featuring Freshworks and SAS, both showcasing how ScyllaDB powers their mission-critical AI workloads. In-booth talks from Freshworks, SAS, Sprig, and Revinate. Real users sharing real production stories. There’s nothing better than hearing how our customers are conquering performance challenges at scale. Technical deep dives exploring everything from linear scalability to real-time AI pipelines. ScyllaDB X Cloud linear-scale demonstration, a live visualization of throughput scaling predictably with every additional node. Watching tablets rebalance automatically linearly never gets old. High-impact in-booth videos driving home ScyllaDB’s key differentiators. I’m particularly proud of our DB Guy parody along with the ScyllaDB monster on the Vegas Sphere (yes, we fooled many with that one) For many visitors, this was their first time seeing ScyllaDB X Cloud and Vector Search in action. Our demos made it clear what we mean by performance at scale: serving billions of vectors or millions of events per second, all while keeping tail latency comfortably under 5 ms and cost behavior entirely predictable. Developers that I chatted to loved that ScyllaDB drops neatly into existing Cassandra or DynamoDB environments while delivering much better performance and a lower TCO. Architects zeroed in on our flexibility across EC2 instance families (especially i8g) and hybrid deployment models. The ability to bring your own AWS (or GCP) account sparked plenty of conversations around performance, security, and data sovereignty. What really stood out this year was the shift in mindset. re:Invent 2025 confirmed that the future of extreme scale database engineering belongs to real-time systems … from AI inference to IoT telemetry, where low latency and linear scale are essential for success. ScyllaDB sits right at that intersection: a database built to scale fearlessly, delivering the control of bare metal with the simplicity of managed cloud. If you missed us in Vegas, don’t worry … you can still catch the highlights. Watch our customer sessions and the full X Cloud demo, and see why predictable performance at extreme scale isn’t just our tagline. It’s what we do every day. Catch the re:Invent videosStay ahead with Apache Cassandra®: 2025 CEP highlights
Apache Cassandra® committers are working hard, building new features to help you more seamlessly ease operational challenges of a distributed database. Let’s dive into some recently approved CEPs and explain how these upcoming features will improve your workflow and efficiency.
What is a CEP?
CEP stands for Cassandra Enhancement Proposal. They are the process for outlining, discussing, and gathering endorsements for a new feature in Cassandra. They’re more than a feature request; those who put forth a CEP have intent to build the feature, and the proposal encourages a high amount of collaboration with the Cassandra contributors.
The CEPs discussed here were recently approved for implementation or have had significant progress in their implementation. As with all open-source development, inclusion in a future release is contingent upon successful implementation, community consensus, testing, and approval by project committers.
CEP-42: Constraints framework
With collaboration from NetApp Instaclustr, CEP-42, and subsequent iterations, delivers schema level constraints giving Cassandra users and operators more control over their data. Adding constraints on the schema level means that data can be validated at write time and send the appropriate error when data is invalid.
Constraints are defined in-line or as a separate definition. The inline style allows for only one constraint while a definition allows users to define multiple constraints with different expressions.
The scope of this CEP-42 initially supported a few constraints that covered the majority of cases, but in follow up efforts the expanded list of support includes scalar (>, <, >=, <=), LENGTH(), OCTET_LENGTH(), NOT NULL, JSON(), REGEX(). A user is also able to define their own constraints if they implement it and put them on Cassandra’s class path.
A simple example of an in-line constraint:
CREATE TABLE users (
username text PRIMARY KEY,
age int CHECK age >= 0 and age < 120
);
Constraints are not supported for UDTs (User-Defined Types) nor collections (except for using NOT NULL for frozen collections).
Enabling constraints closer to the data is a subtle but mighty way for operators to ensure that data goes into the database correctly. By defining rules just once, application code is simplified, more robust, and prevents validation from being bypassed. Those who have worked with MySQL, Postgres, or MongoDB will enjoy this addition to Cassandra.
CEP-51: Support “Include” Semantics for cassandra.yaml
The cassandra.yaml file holds important settings for storage,
memory, replication, compaction, and more. It’s no surprise that
the average size of the file around 1,000 lines (though, yes—most
are comments). CEP-51 enables splitting the cassandra.yaml
configuration into multiple files using includes
semantics. From the outside, this feels like a small change, but
the implications are huge if a user chooses to opt-in.
In general, the size of the configuration file makes it difficult to manage and coordinate changes. It’s often the case that multiple teams manage various aspects of the single file. In addition, cassandra.yaml permissions are readable for those with access to this file, meaning private information like credentials are comingled with all other settings. There is risk from an operational and security standpoint.
Enabling the new semantics and therefore modularity for the configuration file eases management, deployment, complexity around environment-specific settings, and security in one shot. The configuration file follows the principle of least privilege once the cassandra.yaml is broken up into smaller, well-defined files; sensitive configuration settings are separated out from general settings with fine-grained access for the individual files. With the feature enabled, different development teams are better equipped to deploy safely and independently.
If you’ve deployed your Cassandra cluster on the NetApp Instaclustr platform, the cassandra.yaml file is already configured and managed for you. We pride ourselves on making it easy for you to get up and running fast.
CEP-52: Schema annotations for Apache Cassandra
With extensive review by the NetApp Instaclustr team and Stefan Miklosovic, CEP-52 introduces schema annotations in CQL allowing in-line comments and labels of schema elements such as keyspaces, tables, columns, and User Defined Types (UDT). Users can easily define and alter comments and labels on these elements. They can be copied over when desired using CREATE TABLE LIKE syntax. Comments are stored as plain text while labels are stored as structured metadata.
Comments and labels serve different annotation purposes: Comments document what a schema object is for, whereas labels describe how sensitive or controlled it is meant to be. For example, labels can be used to identify columns as “PII” or “confidential”, while the comment on that column explains usage, e.g. “Last login timestamp.”
Users can query these annotations. CEP-52 defines two new read-only tables (system_views.schema_comments and system_views.schema_security_labels) to store comments and security labels so objects with comments can be returned as a list or a user/machine process can query for specific labels, beneficial for auditing and classification. Note that adding security labels are descriptive metadata and do not enforce access control to the data.
CEP-53: Cassandra rolling restarts via Sidecar
Sidecar is an auxiliary component in the Cassandra ecosystem that exposes cluster management and streaming capabilities through APIs. Introducing rolling restarts through Sidecar, this feature is designed to provide operators with more efficient and safer restarts without cluster-wide downtime. More specifically, operators can monitor, pause, resume, and abort restarts all through an API with configurable options if restarts fail.
Rolling restarts brings operators a step closer to cluster-wide operations and lifecycle management via Sidecar. Operators will be able to configure the number of nodes to restart concurrently with minimal risk as this CEP unleashes clear states as a node progresses through a restart. Accounting for a variety of edge cases, an operator can feel assured that, for example, a non-functioning sidecar won’t derail operations.
The current process for restarting a node is a multi-step, manual process, which does not scale for large cluster sizes (and is also tedious for small clusters). Restarting clusters previously lacked a streamlined approach as each node needed to be restarted one at a time, making the process time-intensive and error-prone.
Though Sidecar is still considered WIP, it’s got big plans to improve operating large clusters!
The NetApp Instaclustr Platform, in conjunction with our expert TechOps team already orchestrates these laborious tasks for our Cassandra customers with a high level of care to ensure their cluster stays online. Restarting or upgrading your Cassandra nodes is a huge pain-point for operators, but it doesn’t have to be when using our managed platform (with round-the-clock support!)
CEP-54: Zstd with dictionary SSTable compression
CEP-54, with NetApp Instaclustr’s collaboration, aims to add support Zstd with dictionary compression for SSTables. Zstd, or Zstandard, is a fast, lossless data compression algorithm that boasts impressive ratio and speed and has been supported in Cassandra since 4.0. Certain workloads can benefit from significantly faster read/write performance, reduced storage footprint, and increased storage device lifetime when using dictionary compression.
At a high level, operators choose a table they want to compress with a dictionary. A dictionary must be trained first on a small amount of already present data (recommended no more than 10MiB). The result of a training is a dictionary, which is stored cluster-wide for all other nodes to use, and this dictionary is used for all subsequent writes of SSTables to a disk.
Workloads with structured data of similar rows benefit most from Zstd with dictionary compression. Some examples of ideal workloads include event logs, telemetry data, metadata tables with templated messages. Think: repeated row data. If the table data is too unstructured or random, this feature likely won’t be optimal for dictionary compression, however plain Zstd will still be an excellent option.
New SSTables with dictionaries are readable across nodes and can stream, repair, and backup. Existing tables are unaffected if dictionary compression is not enabled. Too many unique dictionaries hurt decompression; use minimal dictionaries (recommended dictionary size is about 100KiB and one dictionary per table) and only adopt new ones when they’re noticeably better.
CEP-55: Generated role names
CEP-55 adds support to create users/roles without supplying a name, simplifying
user management, especially when generating users and roles in bulk. With an example syntax, CREATE GENERATED ROLE WITH GENERATED PASSWORD, new keys are placed in a newly introduced configuration section in cassandra.yaml under “role_name_policy.”
Stefan Miklosovic, our Cassandra engineer at NetApp Instaclustr, created this CEP as a logical follow up to CEP-24 (password validation/generation), which he authored as well. These quality-of-life improvements let operators spend less time doing trivial tasks with high-risk potential and more time on truly complex matters.
Manual name selection seems trivial until a hundred role names need to be generated; now there is a security risk if the new usernames—or worse passwords—are easily guessable. With CEP-55, the generated role name will be UUID-like, with optional prefix/suffix and size hints, however a pluggable policy is available to generate and validate names as well. This is an opt-in feature with no effect to the existing method of generating role names.
The future of Apache Cassandra is bright
These Cassandra Enhancement Proposals demonstrate a strong commitment to making Apache Cassandra more powerful, secure, and easier to operate. By staying on top of these updates, we ensure our managed platform seamlessly supports future releases that accelerate your business needs.
At NetApp Instaclustr, our expert TechOps team already orchestrates laborious tasks like restarts and upgrades for our Apache Cassandra customers, ensuring their clusters stay online. Our platform handles the complexity so you can get up and running fast.
Learn more about our fully managed and hosted Apache Cassandra offering and try it for free today!
The post Stay ahead with Apache Cassandra®: 2025 CEP highlights appeared first on Instaclustr.
ScyllaDB Vector Search: 1B Vectors with 2ms P99s and 250K QPS Throughput
Results from a benchmark using the yandex-deep_1b dataset, which contains 1 billion vectors of 96 dimensions January 20, 2026 Update: ScyllaDB Vector Search is now GA and production-ready. See the Quick Start Guide and give it try! As AI-driven applications move from experimentation into real-time production systems, the expectations placed on vector similarity search continue to rise dramatically. Teams now need to support billion-scale datasets, high concurrency, strict p99 latency budgets, and a level of operational simplicity that reduces architectural overhead rather than adding to it. ScyllaDB Vector Search was built with these constraints in mind. It offers a unified engine for storing structured data alongside unstructured embeddings, and it achieves performance that pushes the boundaries of what a managed database system can deliver at scale. The results of our recent high scale 1-billion-vector benchmark show that ScyllaDB demonstrates both ultra-low latency and highly predictable behaviour under load. Architecture at a Glance To achieve low-single-millisecond performance across massive vector sets, ScyllaDB adopts an architecture that separates the storage and indexing responsibilities while keeping the system unified from the user’s perspective. The ScyllaDB nodes store both the structured attributes and the vector embeddings in the same distributed table. Meanwhile, a dedicated Vector Store service – implemented in Rust and powered by the USearch engine optimized to support ScyllaDB’s predictable single-digit millisecond latencies – consumes updates from ScyllaDB via CDC and builds approximate-nearest-neighbour (ANN) indexes in memory. Queries are issued to the database using a familiar CQL expression such as: SELECT … ORDER BY vector_column ANN_OF ? LIMIT k; They are then internally routed to the Vector Store, which performs the similarity search and returns the candidate rows. This design allows each layer to scale independently, optimising for its own workload characteristics and eliminating resource interference. Benchmarking 1 Billion Vectors To evaluate real-world performance, ScyllaDB ran a rigorous benchmark using the publicly available yandex-deep_1b dataset, which contains 1 billion vectors of 96 dimensions. The setup consisted of six nodes: three ScyllaDB nodes running on i4i.16xlarge instances, each equipped with 64 vCPUs, and three Vector Store nodes running on r7i.48xlarge instances, each with 192 vCPUs. This hardware configuration reflects realistic production deployments where the database and vector indexing tiers are provisioned with different resource profiles. The results focus on two usage scenarios with distinct accuracy and latency goals (detailed in the following sections). A full architectural deep-dive, including diagrams, performance trade-offs, and extended benchmark results for higher-dimension datasets, can be found in the technical blog post Building a Low-Latency Vector Search Engine for ScyllaDB. These additional results follow the same pattern seen in the 96-dimensional tests: exceptionally low latency, high throughput, and stability across a wide range of concurrent load profiles. Scenario #1 — Ultra-Low Latency with Moderate Recall The first scenario was designed for workloads such as recommendation engines and real-time personalisation systems, where the primary objective is extremely low latency and the recall can be moderately relaxed. We used index parameters m = 16, ef-construction = 128, ef-search = 64 and Euclidean distance. At approximately 70% recall and with 30 concurrent searches, the system maintained a p99 latency of only 1.7 milliseconds and a p50 of just 1.2 milliseconds, while sustaining 25,000 queries per second. When expanding the throughput window (still keeping p99 latency below 10 milliseconds), the cluster reached 60,000 QPS for k = 100 with a p50 latency of 4.5 milliseconds, and 252,000 QPS for k = 10 with a p50 latency of 2.2 milliseconds. Importantly, utilizing ScyllaDB’s predictable performance, this throughput scales linearly: adding more Vector Store nodes directly increases the achievable QPS without compromising latency or recall. Latency and throughput depending on the concurrency level for recall equal to 70%. Scenario #2 — High Recall with Slightly Higher Latency The second scenario targets systems that require near-perfect recall, including high-fidelity semantic search and retrieval-augmented generation pipelines. Here, the index parameters were significantly increased to m = 64, ef-construction = 512, and ef-search = 512. This configuration raises compute requirements but dramatically improves recall. With 50 concurrent searches and recall approaching 98%, ScyllaDB kept p99 latency below 12 milliseconds and p50 around 8 milliseconds while delivering 6,500 QPS. When shifting the focus to maximum sustained throughput while keeping p99 latency under 20 milliseconds and p50 under 10 milliseconds, the system achieved 16,600 QPS. Even under these settings, latency remained notably stable across values of k from 10 to 100, demonstrating predictable behaviour in environments where query limits vary dynamically. Latency and throughput depending on the concurrency level for recall equal to 98%. Detailed results The table below presents the summary of the results for some representative concurrency levels. Run 1 Run 2 Run 3 m 16 16 64 efconstruct 128 128 512 efsearch 64 64 512 metric Euclidean Euclidean Euclidean upload 3.5 h 3.5 h 3.5 h build 4.5 h 4.5 h 24.4 h p50 [ms] 2.2 1.7 8.2 p99 [ms] 9.9 4 12.3 qps 252,799 225,891 10,206 Unified Vector Search Without the Complexity A big advantage of integrating Vector Search with ScyllaDB is that it delivers substantial performance and networking cost advantages. The vector store resides close to the data with just a single network hop between metadata and embedding storage in the same availability zone. This locality, combined with ScyllaDB’s shard-per-core execution model, allows the system to provide real-time latency and massive throughput even under heavy load. The result is that teams can accomplish more with fewer resources compared to specialised vector-search systems. In addition to being fast at scale, ScyllaDB’s Vector Search is also simpler to operate. Its key advantage is its ability to unify structured and unstructured retrieval within a single dataset. This means you can store traditional attributes and vector embeddings side-by-side and express queries that combine semantic search with conventional search. For example, you can ask the database to “find the top five most similar documents, but only those belonging to this specific customer and created within the past 30 days.” This approach eliminates the common pain of maintaining separate systems for transactional data and vector search, and it removes the operational fragility associated with syncing between two sources of truth. This also means there is no ETL drift and no dual-write risk. Instead of shipping embeddings to a separate vector database while keeping metadata in a transactional store, ScyllaDB consolidates everything into a single system. The only pipeline you need is the computational step that generates embeddings using your preferred LLM or ML model. Once written, the data remains consistent without extra coordination, backfills, or complex streaming jobs. Operationally, ScyllaDB simplifies the entire retrieval stack. Because it is built on ScyllaDB’s proven distributed architecture, the system is highly available, horizontally scalable, and resilient across availability zones and regions. Instead of operating two or three different technologies – each with its own monitoring, security configurations, and failure modes – you only manage one. This consolidation drastically reduces operational complexity while simultaneously improving performance. Roadmap The product is now in General Availability. This includes Cloud Portal provisioning, on-demand billing, a full range of instance types, and additional performance optimisations. Self-service scaling is planned for Q1. By the end of Q1 we will introduce native filtering capabilities, enabling vector search queries to combine ANN results with traditional predicates for more precise hybrid retrieval. Looking further ahead, the roadmap includes support for scalar and binary quantisation to reduce memory usage, TTL functionality for lifecycle automation of vector data, and integrated hybrid search combining ANN with BM25 for unified lexical and semantic relevance. Conclusion ScyllaDB has demonstrated that it is capable of delivering industry-leading performance for vector search at massive scale, handling a dataset of 1 billion vectors with p99 latency as low as 1.7 milliseconds and throughput up to 252,000 QPS. These results validate ScyllaDB Vector Search as a unified, high-performance solution that simplifies the operational complexity of real-time AI applications by co-locating structured data and unstructured embeddings. The current benchmarks showcase the current state of ScyllaDB’s scalability. With planned enhancements in the upcoming roadmap, including scalar quantization and sharding, these performance limits are set to increase in the next year. Nevertheless, even now, the feature is ready for running latency critical workloads such as fraud detection or recommendation systems.Cut LLM Costs and Latency with ScyllaDB Semantic Caching
How semantic caching can help with costs and latency as you scale up your AI workload Developers building large-scale LLM solutions often rely on powerful APIs such as OpenAI’s. This approach outsources model hosting and inference, allowing teams to focus on application logic rather than infrastructure. However, there are two main challenges you might face as you scale up your AI workload: high costs and high latency. This blog post introduces semantic caching as a possible solution to these problems. Along the way, we cover how ScyllaDB can help implement semantic caching. What is semantic caching? Semantic caching follows the same principle as traditional caching: storing data in a system that allows faster access than your primary source. In conventional caching solutions, that source is a database. In AI systems, the source is an LLM. Here’s a simplified semantic caching workflow: User sends a question (“What is ScyllaDB?”) Check if this type of question has been asked before (for example “whats scylladb” or “Tell me about ScyllaDB”) If yes, deliver the response from cache If no a)Send the request to LLM and deliver the response from there b) Save the response to cache Semantic caching stores the meaning of user queries as vector embeddings and uses vector search to find similar ones. If there’s a close enough match, it returns the cached result instead of calling the LLM. The more queries you can serve from the cache, the more you save on cost and latency over time. Invalidating data is just as important for semantic caching as it is for traditional caching. For instance, if you are working with RAGs (where the underlying base information can change over time), then you need to invalidate the cache periodically so it returns accurate information. For example, if the user query is “What’s the most recent version of ScyllaDB Enterprise,” the answer depends on when you ask this question. The cached response to this answer must be refreshed accordingly (assuming the only context your LLM works with is the one provided by the cache). Why use a semantic cache? Simply put, semantic caching saves you money and time. You save money by making fewer LLM calls, and you save time from faster responses. When a use case involves repeated or semantically similar queries, and identical responses are acceptable, semantic caching offers a practical way to reduce both inference costs and latency. Heavy LLM usage might put you on OpenAI’s top spenders list. That’s great for OpenAI. But is it great for you? Sure, you’re using cutting-edge AI and delivering value to users, but the real question is: can you optimize those costs? Cost isn’t the only concern. Latency matters too. LLMs inherently cannot achieve sub-millisecond response times. But users still expect instant responses. So how do you bridge that gap? You can combine LLM APIs with a low-latency database like ScyllaDB to speed things up. Combining AI models with traditional optimization techniques is key to meeting strict latency requirements. Semantic caching helps mitigate these issues by caching LLM responses associated with the input embeddings. When a new input is received, its embedding is compared to those stored in the cache. If a similar-enough embedding is found (based on a defined similarity threshold), the saved response is returned from the cache. This way, you can skip the round trip to the LLM provider. This leads to two major benefits: Lower latency: No need to wait for the LLM to generate a new response. Your low-latency database will always return responses faster than an LLM. Lower cost: Cached responses are “free” – no LLM API fees. Unlike LLM calls, database queries don’t charge you per request or per token. Why use ScyllaDB for semantic caching? From day one, ScyllaDB has focused on three things: cutting latency, cost, and operational overhead. All three of those things matter just as much for LLM apps and semantic caching as they do for “traditional” applications. Furthermore, ScyllaDB is more than an in-memory cache. It’s a full-fledged high-performance database with a built-in caching layer. It offers high availability and strong P99 latency guarantees, making it ideal for real-time AI applications. ScyllaDB has recently added Vector Search offering, which is essential for building a semantic cache, and it’s also used for a wide range of AI and LLM-based applications. For example, it’s quite commonly used as a feature store. In short, you can consolidate all your AI workloads into a single high-performance, low-latency database. Now let’s see how you can implement semantic caching with ScyllaDB. How to implement semantic caching with ScyllaDB > If you just want to dive in, clone the repo, and try it yourself, check out the GitHub repository here. Here’s a simplified, general guide on how to implement semantic caching with ScyllaDB (using Python examples): 1. Create a semantic caching schema First, we create a keyspace, then a table called prompts, which will act as our cache table. It includes the following columns: prompt_id: The partition key for the table. Inserted_at: Stores the timestamp when the row was originally inserted (the response first cached) prompt_text: The actual input provided by the user, such as a question or query. prompt_embedding: The vector embedding representation of the user input. llm_response: The LLM’s response for that prompt, returned from the cache when a similar prompt appears again. updated_at: Timestamp of when the row was last updated, useful if the underlying data changes and the cached response needs to be refreshed. Finally, we create an ANN (Approximate Nearest Neighbor) index on the prompt_embedding column to enable fast and efficient vector searches. Now that ScyllaDB is ready to receive and return responses, let’s implement semantic caching in our application code. 2. Convert user input to vector embedding Take the user’s text input (which is usually a question or some kind of query) and convert it into an embedding using your chosen embedding model. It’s important that the same embedding model is used consistently for both cached data and new queries. In this example, we’re using a local embedding model from sentence transformers. In your application, you might use OpenAI or some other embedding provider platform. 3. Calculate similarity score Use ScyllaDB Vector Search syntax: `ANN OF` to find semantically similar entries in the cache. There are two key components in this part of the application. Similarity score: You need to calculate the similarity between the user’s new query and the most similar item returned by vector search. Cosine similarity, which is the most frequently used similarity function in LLM-based applications, ranges from 0 to 1. A similarity of 1 means the embeddings are identical. A similarity of 0 means they are completely dissimilar. Threshold: Determines whether the response can be provided from cache. If the similarity score is above that threshold, it means the new query is similar enough to one already stored in the cache, so the cached response can be returned. If it falls below the threshold, the system should fetch a fresh response from the LLM. The exact threshold should be tuned experimentally based on your use case. 4. Implement cache logic Finally, putting it all together, you need a function that decides whether to serve a response from the cache or make a request to the LLM. If the user query matches something similar in the cache, follow the earlier steps and return the cached response. If it’s not in the cache, make a request to your LLM provider, such as OpenAI, return that response to the user, and then store it in the cache. This way, the next time a similar query comes in, the response can be served instantly from the cache. Get started! Get started building with ScyllaDB; check out our examples on GitHub: `git clone https://github.com/scylladb/vector-search-examples.git ` Vector Search Semantic Cache RAG See the Quick Start Guide and give it try Contact us with your questions, or for a personalized tourManaging ScyllaDB Background Operations with Task Manager
Learn about Task Manager, which provides a unified way to observe and control ScyllaDB’s background maintenance work In each ScyllaDB cluster, there are a lot of background processes that help maintain data consistency, durability, and performance in a distributed environment. For instance, such operations include compaction (which cleans up on-disk data files) and repair (which ensures data consistency in a cluster). These operations are critical for preserving cluster health and integrity. However, some processes can be long-running and resource-intensive. Given that ScyllaDB is used for latency-sensitive database workloads, it’s important to monitor and track these operations. That’s where ScyllaDB’s Task Manager comes in. Task Manager allows administrators of self-managed ScyllaDB to see all running operations, manage them, or get detailed information about a specific operation. And beyond being a monitoring tool, Task Manager also provides a unified way to manage asynchronous operations. How Task Manager Organizes and Tracks Operations Task Manager adds structure and visibility into ScyllaDB’s background work. It groups related maintenance activities into modules, represents them as hierarchical task trees, and tracks their lifecycle from creation through completion. The following sections explain how operations are organized, retained, and monitored at both node and cluster levels. Supported Operations Task Manager supports the following operations: Local: Compaction; Repair; Streaming; Backup; Restore. Global: Tablet repair; Tablet migration; Tablet split and merge; Node operations: bootstrap, replace, rebuild, remove node, decommission. Reviewing Active/Completed Tasks Task Manager is divided into modules: the entities that gather information about operations of similar functionality. Task Manager captures and exposes this data using tasks. Each task covers an operation or its part (e.g., a task can represent the part of the repair operation running on a specific shard). Each operation is represented by a tree of tasks. The tree root covers the whole operation. The root may have children, which give more fine-grained control over the operation. The children may have their own children, etc. Let’s consider the example of a global major compaction task tree: The root covers the compaction of all keyspaces in a node; The children of the root task cover a single keyspace; The second-degree descendants of the root task cover a single keyspace on a single shard; The third-degree descendants of the root task cover a single table on a single shard; etc. You can inspect a task from each depth to see details on the operation’s progress. Determining How Long Tasks Are Shown Task Manager can show completed tasks as well as running ones. The completed tasks are removed from Task Manager after some time. To customize how long a task’s status is preserved, modify task_ttl_in_seconds (aka task_ttl) and user_task_ttl_in_seconds (aka user_task_ttl) configuration parameters. Task_ttl applies to operations that are started internally, while user_task_ttl refers to those initiated by the user. When the user starts an operation, the root of the task tree is a user-task. Descendant tasks are internal and such tasks are unregistered after they finish, propagating their status to their parents. Node Tasks vs Cluster Tasks Task Manager tracks operations local to a node as well as global cluster-wide operations. A local task is created on a node that the respective operation runs on. Its status may be requested only from a node on which the task was created. A global task always covers the whole operation. It is the root of a task tree and it may have local children. A global task is reachable from each node in a cluster. Task_ttl and user_task_ttl are not relevant for global tasks. Per-Task Details When you list all tasks in a Task Manager module, it shows brief information about them with task_stats. Each task has a unique task_id and sequence_number that’s unique within its module. All tasks in a task tree share the same sequence_number. Task stats also include several descriptive attributes: kind: either “node” (a local operation) or “cluster” (a global one). type: what specific operation this task involves (e.g., “major compaction” or “intranode migration”). scope: the level of granularity (e.g., “keyspace” or “tablet”). Additional attributes such as shard, keyspace, table, and entity can further specify the scope. Status fields summarize the task’s state and timing: state: indicate if the task was created, running, done, failed, or suspended. start_time and end_time: indicate when the task began and finished. If a task is still running, its end_time is set to epoch. When you request a specific task’s status, you’ll see more detailed metrics: progress_total and progress_completed show how much work is done, measured in progress_units. parent_id and children_ids place the task within its tree hierarchy. is_abortable indicates whether the task can be stopped before completion. If the task failed, you will also see the exact error message. Interacting with Task Manager Task Manager provides a REST API for listing, monitoring, and controlling ScyllaDB’s background operations. You can also use it to manage the execution of long-running maintenance tasks started with the asynchronous API instead of blocking a client call. If you prefer command-line tools, the same functionality is available through nodetool tasks. Using the Task Management API Task Manager exposes a REST API that lets you manage tasks: GET /task_manager/list_modules – lists all supported Task Manager modules. GET /task_manager/list_module_tasks/{module} – lists all tasks in a specified module. GET /task_manager/task_status/{task_id} – shows the detailed status of a specified task. GET /task_manager/wait_task/{task_id} – waits for a specified task and shows its status. POST /task_manager/abort_task/{task_id} – aborts a specified task. GET /task_manager/task_status_recursive/{task_id} – gets statuses of a specified task and all its descendants. GET/POST /task_manager/ttl – gets/sets task_ttl. GET/POST /task_manager/user_ttl – gets/sets user_task_ttl. POST /task_manager/drain/{module} – drains the finished tasks in a specified module. Running Maintenance Tasks Asynchronously Some ScyllaDB maintenance operations can take a while to complete, especially at scale. Waiting for them to finish through a synchronous API call isn’t always practical. Thanks to Task Manager, existing synchronous APIs are easily and consistently converted into asynchronous ones. Instead of waiting for an operation to finish, a new API can immediately return the ID of the root task representing the started operation. Using this task_id, you can check the operation’s progress, wait for completion, or abort it if needed. This gives you a unified and consistent way to manage all those long-running tasks. Nodetool A task can be managed using nodetool’s tasks command. For details, see the related nodetool docs page. Example: Tracking and Managing Tasks Preparation To start, we locally set up a cluster of three nodes with the IP addresses 127.43.0.1, 127.43.0.2, and 127.43.0.3. Next, we create two keyspaces: keyspace1 with replication factor 3 and keyspace2 with replication factor 2. In each keyspace, we create 2 tables: table1 and table2 in keyspace1, and table3 and table4 in keyspace2. We populate them with data. Exploring Task Manager Let’s start by listing the modules supported by Task Manager: nodetool tasks modules -h 127.43.0.1 Starting and Tracking a Repair Task We request a tablet repair on all tokens of table keyspace2.table3. curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://127.43.0.3:10000/storage_service/tablets/repair?ks=keyspace2&table=table3&tokens=all' {"tablet_task_id":"2f06bff0-ab45-11f0-94c2-60ca5d6b2927"} In response, we get the task id of the respective tablet repair task. We can use it to track the progress of the repair. Let’s check whether the task with id 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 will be listed in a tablets module. nodetool tasks list tablets -h 127.43.0.1 Apart from the repair task, we can see that there are two intranode migrations running. All the tasks are of type “cluster”, which means that they cover the global operations. All these tasks would be visible regardless of which node we request them from. We can also see the scope of the operations. We always migrate one tablet at a time, so the migration tasks’ scope is “tablet”. For repair, the scope is “table” because we previously started the operation on a whole table. Entity, sequence_number, and shard are irrelevant for global tasks. Since all tasks are running, their end_time is set to a default value (epoch). Examining Task Status Let’s examine the status of the tablet repair using its task_id. Global tasks are available on the whole cluster, so we change the requested node… just because we can. 😉 nodetool tasks status 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.3 The task status contains detailed information about the tablet repair task. We can see whether the task is abortable (via task_manager API). There could also be some additional information that’s not applicable for this particular task : error, which would be set if the task failed; parent_id, which would be set if it had a parent (impossible for a global task); progress_unit, progress_total, progress_comwepleted, which would indicate task progress (not yet supported for tablet repair tasks). There’s also a list of tasks that were created as a part of the global task. The list above has been shortened to improve readability. The key point is that children of a global task may be created on all nodes in a cluster. Those children are local tasks (because global tasks cannot have a parent). Thus, they are reachable only from the nodes where they were created. For example, the status of a task 1eb69569-c19d-481e-a5e6-0c433a5745ae should be requested from node 127.43.0.2. nodetool tasks status 1eb69569-c19d-481e-a5e6-0c433a5745ae -h 127.43.0.2 As expected, the child’s kind is “node”. Its parent_id references the tablet repair task’s task_id. The task has completed successfully, as indicated by the state. The end_time of a task is set. Its sequence_number is 15, which means it is the 15th task in its module. The task’s scope is wider than the parent’s. It could encompass the whole keyspace, but – in this case – it is limited to the parent’s scope. The task’s progress is measured in ranges, and we can see that exactly one range was repaired. This task has one child that is created on the same node as its parent. That’s always true for local tasks. nodetool tasks status 70d098c4-df79-4ea2-8a5e-6d7386d8d941 -h 127.43.0.3 We may examine other children of the global tablet repair task too. However, we may only check each one on the node where it was created. Let’s wait until the global task is completed. nodetool tasks wait 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.2 We can see that its state is “done” and its end_time is set. Working with Compaction Tasks Let’s start some compactions and have a look at the compaction module. nodetool tasks list compaction -h 127.43.0.2 We can see that one of the major compaction tasks is still running. Let’s abort it and check its task tree. nodetool tasks abort 16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2 nodetool tasks tree 16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2 We can see that the abort request propagated to one of the task’s children and aborted it. That task now has a failed state and its error field contains abort_requested_exception. Managing Asynchronous Operations Beyond examining the running operations, Task Manager can manage asynchronous operations started with the REST API. For example, we may start a major compaction of a keyspace synchronously with /storage_service/keyspace_compaction/{keyspace} or use an asynchronous version of this API: curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://127.43.0.1:10000/tasks/compaction/keyspace_compaction/keyspace2' "4c6f3dd4-56dc-4242-ad6a-8be032593a02" The response includes the task_id of the operation we just started. This id may be used in Task Manager to track the progress, wait for the operation, or abort it. Key Takeaways The Task Manager provides a clear, unified way to observe and control background maintenance work in ScyllaDB. Visibility: It shows detailed, hierarchical information about ongoing and completed operations, from cluster-level tasks down to individual shards. Consistency: You can use the same mechanisms for listing, tracking, and managing all asynchronous operations. Control: You can check progress, wait for completion, or abort tasks directly, without guessing what’s running. Extensibility: It also provides a framework for turning synchronous APIs into asynchronous ones by returning task IDs that can be monitored or managed through the Task Manager. Together, these capabilities make it easier to see what ScyllaDB is doing, keep the system stable, and convert long-running operations to asynchronous workflows.The Cost of Multitenancy
DynamoDB and ScyllaDB share many similarities, but DynamoDB is a multi-tenant database, while ScyllaDB is single-tenant The recent DynamoDB outage is a stark reminder that even the most reliable and mature cloud services can experience downtime. Amazon DynamoDB remains a strong and proven choice for many workloads, and many teams are satisfied with its latency and cost. However, incidents like this highlight the importance of architecture, control, and flexibility when building for resilience. DynamoDB and ScyllaDB share many similarities: Both are distributed NoSQL databases with the same “ancestor”: the Dynamo paper (although both databases have significantly evolved from the original concept). A compatible API: The DynamoDB API is one of two supported APIs in ScyllaDB Cloud. Both use multi-zone deployment for higher HA. Both support multi-region deployment. DynamoDB uses Global Tablets (See this analysis for more). ScyllaDB can go beyond and allow multi-cloud deployments, or on-prem / hybrid deployments. But they also have a major difference: DynamoDB is a multi-tenant database, while ScyllaDB is single-tenant. Source: https://blog.bytebytego.com/p/a-deep-dive-into-amazon-dynamodb Multi-tenancy has notable advantages for the vendor: Lower infrastructure cost: Since tenants’ peaks don’t align, the vendor can provision for the aggregate average rather than the sum of all peaks, and even safely over-subscribe resources. Shared burst capacity: Extra capacity for traffic spikes is pooled across all users. Multi-tenancy also comes with significant technical challenges and is never perfect. All users still share the same underlying resources (CPU, storage, and network) while the service works hard to preserve the illusion of a dedicated environment for each tenant (e.g., using various isolation mechanisms). However, sometimes the isolation breaks and the real architecture behind the curtain is revealed. One example is the Noisy Neighbor issue. Another is that when a shared resource breaks, like the DNS endpoint in the latest DynamoDB outage, MANY users are affected. In this case, all DynamoDB users in a region suffer. ScyllaDB Cloud takes a different approach: all database resources are completely separated from each other. Each ScyllaDB database is running: On dedicated VMs On a dedicated VPC In a dedicated Security Group Using a dedicated endpoint and (an optional) dedicated Private Link Isolated authorization and authentication (per database) Dedicated Monitoring and Administration (ScyllaDB Manager) servers When using ScyllaDB Cloud Bring Your Own Account (BYOA), the entire deployment is running on the *user* account, often on a dedicated sub-account. This provides additional isolation. The ScyllaDB Cloud control plane is loosely coupled to the managed databases. Even in the case of a disconnect, the database clusters will continue to serve requests. This design greatly reduces the blast radius of any one issue. While the single-tenant architecture is more resilient, it does come with a few challenges: Scaling: To scale, ScyllaDB needs to allocate new resources (nodes) from EC2, and depend on the EC2 API to allocate them. Tablets and X Cloud have made a great improvement in reducing scaling time. Workload Isolation: ScyllaDB allows users to control the resource bandwidth per workload with Workload Prioritization (docs | tech talk | demo) Pricing: Using numerous optimization techniques, like shard-per-core, ScyllaDB achieves extreme performance per node, which allows us to provide lower prices than DynamoDB for most use cases. To conclude: DynamoDB optimizes for multi-tenancy, whereas ScyllaDB favors stronger tenant isolation and a smaller blast radius.Cache vs. Database: How Architecture Impacts Performance
Lessons learned comparing Memcached with ScyllaDB Although caches and databases are different animals, databases have always cached data and caches started to use disks, extending beyond RAM. If an in-memory cache can rely on flash storage, can a persistent database also function as a cache? And how far can you reasonably push each beyond its original intent, given the power and constraints of its underlying architecture? A little while ago, I joined forces with Memcached maintainer Alan Kasindorf (a.k.a. dormando) to explore these questions. The collaboration began with the goal of an “apples to oranges” benchmark comparing ScyllaDB with Memcached, which is covered in the article “We Compared ScyllaDB and Memcached and… We Lost?” A few months later, we were pleasantly surprised that the stars aligned for P99 CONF. At the last minute, Kasindorf was able to join us to chat about the project – specifically, what it all means for developers with performance-sensitive use cases. Note: P99 CONF is a highly technical conference on performance and low-latency engineering. We just wrapped P99 CONF 2025, and you can watch the core sessions on-demand. Watch on demand Cache Efficiency Which data store uses memory more efficiently? To test it, we ran a simple key-value workload on both systems. The results: Memcached cached 101 million items before evictions began ScyllaDB cached only 61 million items before evictions Cache efficiency comparison What’s behind the difference? ScyllaDB also has its own LRU (Least Recently Used) cache, bypassing the Linux cache. But unlike Memcached, ScyllaDB supports a wide-column data representation: A single key may contain many rows. This, along with additional protocol overhead, causes a single write in ScyllaDB to consume more space than a write in Memcached. Drilling down into the differences, Memcached has very little per-item overhead. In the example from the image above, each stored item consumes either 48 or 56 bytes, depending on whether compare and swap (CAS) is enabled. In contrast, ScyllaDB has to handle a lot more (it’s a persistent database after all!). It needs to allocate space for its memtables, Bloom filters and SSTable summaries so it can efficiently retrieve data from disk when a cache miss occurs. On top of that, ScyllaDB supports a much richer data model(wide column). Another notable architectural difference stands out in the performance front: Memcached is optimized for pipelined requests (think batching, as in DynamoDB’s BatchGetItem), considerably reducing the number of roundtrips over the network to retrieve several keys. ScyllaDB is optimized for single (and contiguous) key retrievals under a wide-column representation. Read-only in-memory efficiency comparison Following each system’s ideal data model, both ScyllaDB and Memcached managed to saturate the available network throughput, servicing around 3 million rows/s while sustaining below single-digit millisecond P99 latencies. Disks and IO Efficiency Next, the focus shifted to disks. We measured performance under different payload sizes, as well as how efficiently each of the systems could maximize the underlying storage. With Extstore and small (1K) payloads, Memcached stored about 11 times more items (compared to its in-memory workload) before evictions started to kick in, leaving a significant portion of free available disk space. This happens because, in addition to the regular per-key overhead, Memcached stores an additional 12 bytes per item in RAM as a pointer to storage. As RAM gets depleted, Extstore is no longer effective and users will no longer observe savings beyond that point. Disk performance with small payloads comparison For the actual performance tests, we stressed Extstore against item sizes of 1KB and 8KB. The table below summarizes the results: Test Type Payload Size I/O Threads GET Rate P99 Latency perfrun_metaget_pipe 1KB 32 188K/s 4~5 ms perfrun_metaget 1KB 32 182K/s <1ms perfrun_metaget_pipe 1KB 64 261K/s 5~6 ms perfrun_metaget 1KB 64 256K/s 1~2ms perfrun_metaget_pipe 8KB 16 92K/s 5~6 ms perfrun_metaget 8KB 16 90K/s <1ms perfrun_metaget_pipe 8KB 32 110K/s 3~4 ms perfrun_metaget 8KB 32 105K/s <1ms We populated ScyllaDB with the same number of items as we used for Memcached. ScyllaDB actually achieved higher throughput – and just slightly higher latency – than Extstore. I’m pretty sure that if the throughput had been reduced, the latency would have been lower. But even with no tuning, the performance is quite comparable. This is summarized below: Test Type Payload Size GET Rate Server-Side P99 Client-Side P99 1KB Read 1KB 268.8K/s 2ms 2.4ms 8KB Read 8KB 156.8K/s 1.54ms 1.9ms A few notable points from these tests: Extstore required considerable tuning to fully saturate flash storage I/O. Due to Memcached’s architecture, smaller payloads are unable to fully use the available disk space, providing smaller gains compared to ScyllaDB. ScyllaDB rates were overall higher than Memcached in a key-value orientation, especially under higher payload sizes. Latencies were better than pipelined requests, but slightly higher than individual GETs in Memcached. I/O Access Methods Discussion These disk-focused tests unsurprisingly sparked a discussion about the different I/O access methods used by ScyllaDB vs. Memcached/Extstore. I explained that ScyllaDB uses asynchronous direct I/O. For an extensive discussion of this, read this blog post by ScyllaDB CTO and cofounder Avi Kivity. Here’s the short version: ScyllaDB is a persistent database. When people adopt a database, they rightfully expect that it will persist their data. So, direct I/O is a deliberate choice. It bypasses the kernel page cache, giving ScyllaDB full control over disk operations. This is critical for things like compactions, write-ahead logs and efficiently reading data off disk. A user-space I/O scheduler is also involved. It lives in the middle and decides which operation gets how much I/O bandwidth. That could be an internal compaction task or a user-facing query. It arbitrates between them. That’s what enables ScyllaDB to balance persistence work with latency-sensitive operations. Extstore takes a rather very different approach: keep things as simple as possible and avoid touching the disk unless it’s absolutely necessary. As Kasindorf put it: “We do almost nothing.” That’s fully intentional. Most operations — like deletes, TTL updates, or overwrites — can happen entirely in memory. No disk access needed. So Extstore doesn’t bother with a scheduler.” Without a scheduler, Extstore performance tuning is manual. You can change the number of Extstore I/O threads to get better utilization. If you roll it out and notice that your disk doesn’t look fully utilized – and you still have a lot of spare CPU – you can bump up the thread count. Kasindorf mentioned that it will likely become self-tuning at some point. But for now, it’s a knob that users can tweak. Another important piece is how Extstore layers itself on top of Memcached’s existing RAM cache. It’s not a replacement; it’s additive. You still have your in-memory cache and Extstore just handles the overflow. Here’s how Kasindorf explained it: “If you have, say, five gigs of RAM and one gig of that is dedicated to these small pointers that point from memory into disk, we still have a couple extra gigs left over for RAM cache.” That means if a user is actively clicking around, their data may never even go to disk. The only time Extstore might need to read from disk is when the cache has gone cold (for instance, a user returning the next day). Then the entries get pulled back in. Basically, while ScyllaDB builds around persistent, high-performance disk I/O (with scheduling, direct control and durable storage), Extstore is almost the opposite. It’s light, minimal and tries to avoid disk entirely unless it really has to. Conclusion and Takeaways Across these and the other tests that we performed in the full benchmark, Memcached and ScyllaDB both managed to maximize the underlying hardware utilization and keep latencies predictably low. So which one should you pick? The real answer: It depends. If your existing workload can accommodate a simple key-value model and it benefits from pipelining, then Memcached should be more suitable to your needs. On the other hand, if the workload requires support for complex data models, then ScyllaDB is likely a better fit. Another reason for sticking with Memcached: It easily delivers traffic far beyond what a network interface card can sustain. In fact, in this Hacker News thread, dormando mentioned that he could scale it up past 55 million read ops/sec for a considerably larger server. Given that, you could make use of smaller and/or cheaper instance types to sustain a similar workload, provided the available memory and disk footprint meet your workload needs. A different angle to consider is the data set size. Even though Extstore provides great cost savings by allowing you to store items beyond RAM, there’s a limit to how many keys can fit per gigabyte of memory. Workloads with very small items should observe smaller gains compared to those with larger items. That’s not the case with ScyllaDB, which allows you to store billions of items irrespective of their sizes. It’s also important to consider whether data persistence is required. If it is, then running ScyllaDB as a replicated distributed cache provides you greater resilience and non-stop operations, with the tradeoff being (and as Memcached correctly states) that replication halves your effective cache size. Unfortunately, Extstore doesn’t support warm restarts and thus the failure or maintenance of a single node is prone to elevating your cache miss ratios. Whether this is acceptable depends on your application semantics: If a cache miss corresponds to a round-trip to the database, then the end-to-end latency will be momentarily higher. Regardless of whether you choose a cache like Memcached or a database like ScyllaDB, I hope this work inspires you to think differently about performance testing. As we’ve seen, databases and caches are fundamentally different. And at the end of the day, just comparing performance numbers isn’t enough. Moreover, recognize that it’s hard to fully represent your system’s reality with simple benchmarks, and every optimization comes with some trade-offs. For example, pipelining is great, but as we saw with Extstore, it can easily introduce I/O contention. ScyllaDB’s shard-per-core model and support for complex data models are also powerful, but they come with costs too, like losing some pipelining flexibility and adding memory overhead.11X Faster ScyllaDB Backup
Learn about ScyllaDB’s new native backup, which improves backup speed up to 11X by using Seastar’s CPU and IO scheduling ScyllaDB’s 2025.3 release introduces native backup functionality. Previously, an external process managed backups independently, without visibility into ScyllaDB’s internal workload. Now, Seastar’s CPU and I/O schedulers handle backups internally, which gives ScyllaDB full control over prioritization and resource usage. In this blog post, we explain why we changed our approach to backup, share what users need to know, and provide a preview of what to expect next. What We Changed and Why Previously, SSTable backups to S3 were managed entirely by ScyllaDB Manager and the Scylla Manager Agent running on each node. You would schedule the backup, and Manager would coordinate the required operations (taking snapshots, collecting metadata, and orchestrating uploads). Scylla Manager Agent handled all the actual data movement. The problem with this approach was that it was often too slow for our users’ liking, especially at the massive scale that’s common across our user base. Since uploads ran through an external process, they competed with ScyllaDB for resources (CPU, disk I/O, and network bandwidth). The rclone process read from \disk at the same time that ScyllaDB did – so two processes on the same node were performing heavy disk I/O simultaneously. This contention on the disk could impact query latencies when user requests were being processed during a backup. To mitigate the effect on real-time database requests, we use Systemd slice to control Scylla Manager Agent resources. This solution successfully reduced backup bandwidth, but failed to increase the bandwidth when the pressure from online requests was low. To optimize this process, ScyllaDB now provides a native backup capability. Rather than relying on an external agent (ScyllaDB Manager) to copy files, ScyllaDB uploads files directly to S3. The new approach is faster and more efficient because ScyllaDB uses its internal IO and CPU scheduling to control the backup operations. Backup operations are assigned a lower priority than user queries. In the event of resource contention, ScyllaDB will deprioritize them so they don’t interfere with the latency of the actual workload. Note that this new native backup capability is currently available for AWS. It is coming soon for other backup targets (such as GCP Cloud Storage and Azure Storage). To enable native backup, configure the S3 connectivity on each node’s scylla.yaml and set the desired strategy (Native, Auto, or Rclone) in ScyllaDB Manager. Note that the rclone agent is always used to upload backup metadata, so you should still configure the Manager Agent even if you are using native backup and restore. Performance Improvements So how much faster is the new backup approach? We recently ran some tests to find out. We ran two tests which are the same in all aspects except for the tool being used for backup: rclone in one and native scylla in the other. Test Setup The test uses 6 nodes i4i.2xlarge with total injected data of 2TB with RF=3. That means that the 2TB injected data becomes 6TB (RF=3) and these 6TB are spread across 6 nodes, resulting in each node holding 1TB of data. The backup benchmark then measures how long it takes to backup the entire cluster, indicating the data size of one node Native Backup Here are the results of the native backup tests: Name Size Time [s] native_backup_1016_2234 1.057 TiB 00:19:18 Data was uploaded at a rate of approximately 900 MB/s. OS Tx Bytes during backup The slightly higher values for the OS metrics are due to for example tcp-retransmit, size of HTTP headers that is not part of the data but part of the transmitted bytes, and more alike. rclone Backup The same exact test with rclone produced the following results: Name Size Time [s] rclone_backup_1017_2334 1.057 TiB 03:48:57 Here, data was uploaded at a rate of approximately 80MB/s Next Up: Faster Restore Next, we’re optimizing restore, which is the more complex part of the backup/restore process. Backups are relatively straightforward: you just upload the data to object storage. But restoring that data is harder, especially if you need to bring a cluster back online quickly or restore it onto a topology that’s different from the original one. The original cluster’s nodes, token ranges, and data distribution might look quite different from the new setup – but during restore, ScyllaDB must somehow map between what was backed up and what the new topology expects. Replication adds even more complexity. ScyllaDB replicates data according to the specified replication factor (RF), so the backup has multiple copies of the same data. During the restore process, we don’t want to redundantly download or process those copies; we need a way to handle them efficiently. And one more complicating factor: the restore process must understand whether the cluster uses virtual nodes or tablets because that affects how data is distributed. Wrapping Up ScyllaDB’s move to native integration with object storage is a big step forward for the faster backup/restore operations that many of our large-scale users have been asking for. We’ve already sped up backups by eliminating the extra rclone layer. Now, our focus is on making restores equally efficient while handling complex topologies, replication, and data distribution. This will make it faster and easier to restore large clusters. Looking ahead, we’re working on using object storage not only for backup and restore, but also for tiering: letting ScyllaDB read data directly from object storage as if it were on local disk. For a more detailed look at ScyllaDB’s plans for backup, restore, and object storage as native storage, see this video:Vector search benchmarking: Embeddings, insertion, and searching documents with ClickHouse® and Apache Cassandra®
Welcome back to our series on vector search benchmarking. In part 1, we dove into setting up a benchmarking project and explored how to implement vector search in PostgreSQL from the example code in GitHub. We saw how a hands-on project with students from Northeastern University provided a real-world testing ground for Retrieval-Augmented Generation (RAG) pipelines.
Now, we’re continuing our journey by exploring two more powerful open source technologies: ClickHouse and Apache Cassandra. Both handle vector data differently and understanding their methods is key to effective vector search benchmarking. Using the same student project as our guide, this post will examine the code for embedding, inserting, and retrieving data to see how these technologies stack up.
Let’s get started.
Vector search benchmarking with ClickHouse
ClickHouse is a column-oriented database management system known for its incredible speed in analytical queries. It’s no surprise that it has also embraced vector search. Let’s see how the student project team implemented and benchmarked the core components.
Step 1: Embedding and inserting data
scripts/vectorize_and_upload.py
This is the file that handles Step 1 of the pipeline for
ClickHouse. Embeddings in this file
(scripts/vectorize_and_upload.py) are used as vector
representations of Guardian news articles for the purpose of
storing them in a database and performing semantic search. Here’s
how embeddings are handled step-by-step (the steps look similar to
PostgreSQL).
First up, is the generation of embeddings. The same
SentenceTransformer model used in part 1
(all-MiniLM-L6-v2) is loaded in the class constructor.
In the method generate_embeddings(self, articles), for
each article:
- The article’s title and body are concatenated into a text string.
- The model generates an embedding vector
(
self.model.encode(text_for_embedding)), which is a numerical representation of the article’s semantic content. - The embedding is added to the article’s dictionary under the
key
embedding.
Then the embeddings are stored in ClickHouse as follows.
- The database table
guardian_articlesis created with an embeddingArray(Float64) NOT NULLcolumn specifically to store these vectors. - In
upload_to_clickhouse_debug(self, articles_with_embeddings), the script inserts articles into ClickHouse, including the embedding vector as part of each row.
Step 2: Vector search and retrieval
services/clickhouse/clickhouse_dao.py
The steps to search are the same as for PostgreSQL in part 1.
Here’s part of the related_articles method for
ClickHouse:
def related_articles(self, query: str, limit: int =
5):
"""Search for similar articles using vector similarity""" ... query_embedding = self.model.encode(query).tolist() search_query = f""" SELECT url, title, body, publication_date, cosineDistance(embedding, {query_embedding}) as distance FROM guardian_articles ORDER BY distance ASC LIMIT {limit} """ ...
When searching for related articles, it encodes the query into an embedding, then performs a vector similarity search in ClickHouse using cosineDistance between stored embeddings and the query embedding, and results are ordered by similarity, returning the most relevant articles.
Vector search benchmarking with Apache Cassandra
Next, let’s turn our attention to Apache Cassandra. As a distributed NoSQL database, Cassandra is designed for high availability and scalability, making it an intriguing option for large-scale RAG applications.
Step 1: Embedding and inserting data
scripts/pull_docs_cassandra.py
As in the above examples, embeddings in this file are used to
convert article text (body) into numerical vector
representations for storage and later retrieval in Cassandra.
For each article, the code extracts the body and
computes the embeddings:
embedding = model.encode(body) embedding_list = [float(x) for x in embedding]
model.encode(body)converts the text to aNumPyarray of 384 floats.- The array is converted to a standard Python list of floats for Cassandra storage.
Next, the embedding is stored in the vector column of the
articles table using a CQL INSERT:
insert_cql = SimpleStatement(""" INSERT INTO articles (url, title, body, publication_date, vector) VALUES (%s, %s, %s, %s, %s) IF NOT EXISTS; """) result = session.execute(insert_cql, (url, title, body, publication_date, embedding_list))
The schema for the table specifies: vector
vector<float, 384>, meaning each article has a
corresponding 384-dimensional embedding. The code also creates a
custom index for the vector column:
session.execute(""" CREATE CUSTOM INDEX IF NOT EXISTS ann_index ON articles(vector) USING 'StorageAttachedIndex'; """)
This enables efficient vector (ANN: Approximate Nearest Neighbor) search capabilities, allowing similarity queries on stored embeddings.
A key part of the setup is the schema and indexing. The
Cassandra schema in
services/cassandra/init/01-schema.cql defines the
vector column.
Being a NoSQL database, Cassandra schemas are a bit different to normal SQL databases, so it’s worth taking a closer look. This Cassandra schema is designed to support Retrieval-Augmented Generation (RAG) architectures, which combine information retrieval with generative models to answer queries using both stored data and generative AI. Here’s how the schema supports RAG:
- Keyspace and table structure
- Keyspace (
vectorembeds): Analogous to a database, this isolates all RAG-related tables and data. - Table (
articles): Stores retrievable knowledge sources (e.g., articles) for use in generation.
- Keyspace (
- Table columns
url TEXT PRIMARY KEY: Uniquely identifies each article/document, useful for referencing and deduplication.title TEXTandbody TEXT: Store the actual content and metadata, which may be retrieved and passed to the generative model during RAG.publication_date TIMESTAMP: Enables filtering or ranking based on recency.vector VECTOR<FLOAT, 384>: Stores the embedding representation of the article. The new Cassandra vector data type is documented here.
- Indexing
- Sets up an Approximate Nearest Neighbor (ANN) index using Cassandra’s Storage Attached Index.
More information about Cassandra vector support is in the documentation.
Step 2: Vector search and retrieval
The retrieval logic in
services/cassandra/cassandra_dao.py showcases the
elegance of Cassandra’s vector search capabilities.
The code to create the query embeddings and perform the query is similar to the previous examples, but the CQL query to retrieve similar documents looks like this:
query_cql = """ SELECT url, title, body, publication_date FROM articles ORDER BY vector ANN OF ? LIMIT ? """ prepared = self.client.prepare(query_cql) rows = self.client.execute(prepared, (emb, limit))
What have we learned?
By exploring the code from this RAG benchmarking project we’ve seen distinct approaches to vector search. Here’s a summary of key takeaways:
- Critical steps in the process:
- Step 1: Embedding articles and inserting them into the vector databases.
- Step 2: Embedding queries and retrieving relevant articles from the database.
- Key design pattern:
- The DAO (Data Access Object) design pattern provides a clean, scalable way to support multiple databases.
- This approach could extend to other databases, such as OpenSearch, in the future.
- Additional insights:
- It’s possible to perform vector searches over the latest documents, pre-empting queries, and potentially speeding up the pipeline.
What’s next?
So far, we have only scratched the surface. The students built a complete benchmarking application with a GUI (using Steamlit), used multiple other interesting components (e.g. LangChain, LangGraph, FastAPI and uvicorn), Grafana and LangSmith for metrics, and Claude to use the retrieved articles to answer questions, and Docker support for the components. They also revealed some preliminary performance results! Here’s what the final system looked like (this and the previous blog focused on the bottom boxes only).

In a future article, we will examine the rest of the application code, look at the preliminary performance results the students uncovered, and discuss what they tell us about the trade-offs between these different databases.
Ready to learn more right now? We have a wealth of resources on vector search. You can explore our blogs on ClickHouse vector search and Apache Cassandra Vector Search (here, here, and here) to deepen your understanding.
The post Vector search benchmarking: Embeddings, insertion, and searching documents with ClickHouse® and Apache Cassandra® appeared first on Instaclustr.
P99 CONF 2025 Recap: Latency to LLMs
Another year has flown by — in the blink of an eye, I found myself back in the US hosting P99 CONF again. This makes it my third time on stage and the fifth in this incredible series of talks from engineers around the world, all sharing their stories about chasing that elusive P99 latency. Beyond raw speed and tail latency, we explored modern systems programming, kernel innovation, databases and storage at scale, observability, testing, performance insights… and of course, this year’s big wave: artificial intelligence, with vector search and LLMs taking center stage. In this blog post, I’ll help you chart a course through it all (though honestly, every session is worth your time). Watch 60+ P99 CONF Talks On Demand Starting with low latency “Taming tail latency” has been a running theme at P99 for years, and this time PayPal and TigerBeetle both took the stage to show how they deal with unpredictable outliers that wreck user experience. PayPal showed how tiny inefficiencies multiply under load. And (as we’ve come to expect from the past optimization talks), TigerBeetle engineered determinism, single-threaded scheduling, predictable I/O, and batching everywhere. ScyllaDB joined the party with their low-latency vector search engine, proving that AI queries don’t have to experience high latency. If your database architecture already handles the long tail at the storage layer, your vector workloads inherit that speed for free. And since I can’t resist some speculative trading (Aussies love a bet), I enjoyed the talks from Maven Securities on lock-free queues for trading systems and Bloomberg on building scalable, end-to-end latency metrics from distributed traces. If you’re chasing nanoseconds instead of milliseconds, check out Steve Heller’s Design Considerations for P99-Optimized Hash Tables. Rust was everywhere (again) Turso is rewriting SQLite in Rust. ClickHouse tried converting 1.5 million lines of C++ … or at least part of it. Neon rebuilt its I/O stack with tokio and io_uring, Datadog squeezed out extra juice from Lambda extensions in Rust, and Trigo visualized async abstractions (also Rust). Maybe we’re the unofficial Rust conf? But Go and C++ held their ground too. Miguel Young de la Sota presented a faster protobuf implementation in Go, and Manya Bansal (MIT PhD student) gave a highly detailed talk on how not to program GPUs with some C++ insights there. On the database and storage side Avi Kivity kicked off the conference by sharing how ScyllaDB’s Seastar CPU scheduler prioritizes complex workloads while keeping latency predictable. Nadav Har’El showed how ScyllaDB manages client ingestion to prevent memory blowouts, Dor Laor connected theory to real-world tail behavior, and Felipe explained how tablet replication delivers true elasticity. And Andy Pavlo took us through the very real challenge of both humans and autonomous systems tuning databases. Other databases showed up strong too. Turso is pushing SQLite into new dimensions, DragonflyDB nailed sorted sets with B+ trees, and Qian Li from DBOS shared how they merged app logic and state right into the database. These are all good reminders that performance starts at the data layer. Performance engineering, testing, and observability My old mate Ashutosh Agrawal shared how he built and tested systems for 32 million concurrent cricket fans. Also on testing, I loved the double feature on deterministic simulation testing: Resonate’s approach to catching Heisenbugs and Antithesis running fuzzing workloads at scale in near real time. eBPF continues to fascinate. Cosmic elaborated on reliability versus memory trade-offs, and Tanel Poder showed thread-level observability with some seriously impressive tooling. AWS’s Geoff Blake introduced aperf for profiling the nitty-gritty, Arm unlocked new insights with the PMUv3 plugin, and Raphael Carvalho took us on a wild hunt through a Linux kernel bug triggered by io_uring. Proper detective work, that one. And then there’s AI and ML, the new frontier Chip Huyen delivered a standout keynote on LLM inference optimization, tying together hardware, software, and architecture choices. Eshcar Hillel went deep into KV cache offloading, Microsoft’s Magdalen Manohar showed how to make vector search cost-effective and fast, and ScyllaDB’s Pawel Pery explained how we decoupled a Rust-based vector engine for high-performance ANN queries. We wrapped it up with a cracking conversation between Rachel Stephens from RedMonk and Adrian Cockcroft, exploring AI-assisted analytics and that eternal hunt for predictability at the tail. Hosting this conference is always a privilege. Big thanks to Natalie Estrada, who keeps the whole thing running (and somehow curates killer playlists). Cynthia Dunlop, who finds the best speakers and replies to messages faster than anyone I know, and Felipe and the lounge crew, daring the demo deities and answering audience questions live. Plus the many ScyllaDB engineers who present, coordinate, and quietly make it all happen behind the scenes. And to all 30,000 registrants and 100,000+ participants so far… You’re what makes this community so special. From all of us at ScyllaDB: a huge thank you. See you at the next one. Join is for P99 CONF’s sister conference, Monster SCALE SummitOptimizing Cassandra Repair for Higher Node Density
This is the fourth post in my series on improving the cost efficiency of Apache Cassandra through increased node density. In the last post, we explored compaction strategies, specifically the new UnifiedCompactionStrategy (UCS) which appeared in Cassandra 5.
- Streaming Throughput
- Compaction Throughput and Strategies
- Repair (you are here)
- Query Throughput
- Garbage Collection and Memory Management
- Efficient Disk Access
- Compression Performance and Ratio
- Linearly Scaling Subsystems with CPU Core Count and Memory
Now, we’ll tackle another aspect of Cassandra operations that directly impacts how much data you can efficiently store per node: repair. Having worked with repairs across hundreds of clusters since 2012, I’ve developed strong opinions on what works and what doesn’t when you’re pushing the limits of node density.
Building a Resilient Data Platform with Write-Ahead Log at Netflix
By Prudhviraj Karumanchi, Samuel Fu, Sriram Rangarajan, Vidhya Arvind, Yun Wang, John Lu
Introduction
Netflix operates at a massive scale, serving hundreds of millions of users with diverse content and features. Behind the scenes, ensuring data consistency, reliability, and efficient operations across various services presents a continuous challenge. At the heart of many critical functions lies the concept of a Write-Ahead Log (WAL) abstraction. At Netflix scale, every challenge gets amplified. Some of the key challenges we encountered include:
- Accidental data loss and data corruption in databases
- System entropy across different datastores (e.g., writing to Cassandra and Elasticsearch)
- Handling updates to multiple partitions (e.g., building secondary indices on top of a NoSQL database)
- Data replication (in-region and across regions)
- Reliable retry mechanisms for real time data pipeline at scale
- Bulk deletes to database causing OOM on the Key-Value nodes
All the above challenges either resulted in production incidents or outages, consumed significant engineering resources, or led to bespoke solutions and technical debt. During one particular incident, a developer issued an ALTER TABLE command that led to data corruption. Fortunately, the data was fronted by a cache, so the ability to extend cache TTL quickly together with the app writing the mutations to Kafka allowed us to recover. Absent the resilience features on the application, there would have been permanent data loss. As the data platform team, we needed to provide resilience and guarantees to protect not just this application, but all the critical applications we have at Netflix.
Regarding the retry mechanisms for real time data pipelines, Netflix operates at a massive scale where failures (network errors, downstream service outages, etc.) are inevitable. We needed a reliable and scalable way to retry failed messages, without sacrificing throughput.
With these problems in mind, we decided to build a system that would solve all the aforementioned issues and continue to serve the future needs of Netflix in the online data platform space. Our Write-Ahead Log (WAL) is a distributed system that captures data changes, provides strong durability guarantees, and reliably delivers these changes to downstream consumers. This blog post dives into how Netflix is building a generic WAL solution to address common data challenges, enhance developer efficiency, and power high-leverage capabilities like secondary indices, enable cross-region replication for non-replicated storage engines, and support widely used patterns like delayed queues.
API
Our API is intentionally simple, exposing just the essential parameters. WAL has one main API endpoint, WriteToLog, abstracting away the internal implementation and ensuring that users can onboard easily.
rpc WriteToLog (WriteToLogRequest) returns (WriteToLogResponse) {...}
/**
* WAL request message
* namespace: Identifier for a particular WAL
* lifecycle: How much delay to set and original write time
* payload: Payload of the message
* target: Details of where to send the payload
*/
message WriteToLogRequest {
string namespace = 1;
Lifecycle lifecycle = 2;
bytes payload = 3;
Target target = 4;
}
/**
* WAL response message
* durable: Whether the request succeeded, failed, or unknown
* message: Reason for failure
*/
message WriteToLogResponse {
Trilean durable = 1;
string message = 2;
}
A namespace defines where and how data is stored, providing logical separation while abstracting the underlying storage systems. Each namespace can be configured to use different queues: Kafka, SQS, or combinations of multiple. Namespace also serves as a central configuration of settings, such as backoff multiplier or maximum number of retry attempts, and more. This flexibility allows our Data Platform to route different use cases to the most suitable storage system based on performance, durability, and consistency needs.
WAL can assume different personas depending on the namespace configuration.
Persona #1 (Delayed Queues)
In the example configuration below, the Product Data Systems (PDS) namespace uses SQS as the underlying message queue, enabling delayed messages. PDS uses Kafka extensively, and failures (network errors, downstream service outages, etc.) are inevitable. We needed a reliable and scalable way to retry failed messages, without sacrificing throughput. That’s when PDS started leveraging WAL for delayed messages.
"persistenceConfigurations": {
"persistenceConfiguration": [
{
"physicalStorage": {
"type": "SQS",
},
"config": {
"wal-queue": [
"dgwwal-dq-pds"
],
"wal-dlq-queue": [
"dgwwal-dlq-pds"
],
"queue.poll-interval.secs": 10,
"queue.max-messages-per-poll": 100
}
}
]
}
Persona #2 (Generic Cross-Region Replication)
Below is the namespace configuration for cross-region replication of EVCache using WAL, which replicates messages from a source region to multiple destinations. It uses Kafka under the hood.
"persistence_configurations": {
"persistence_configuration": [
{
"physical_storage": {
"type": "KAFKA"
},
"config": {
"consumer_stack": "consumer",
"context": "This is for cross region replication for evcache_foobar",
"target": {
"euwest1": "dgwwal.foobar.cluster.eu-west-1.netflix.net",
"type": "evc-replication",
"useast1": "dgwwal.foobar.cluster.us-east-1.netflix.net",
"useast2": "dgwwal.foobar.cluster.us-east-2.netflix.net",
"uswest2": "dgwwal.foobar.cluster.us-west-2.netflix.net"
},
"wal-kafka-dlq-topics": [],
"wal-kafka-topics": [
"evcache_foobar"
],
"wal.kafka.bootstrap.servers.prefix": "kafka-foobar"
}
}
]
}
Persona #3 (Handling multi-partition mutations)
Below is the namespace configuration for supporting mutateItems API in Key-Value, where multiple write requests can go to different partitions and have to be eventually consistent. A key detail in the below configuration is the presence of Kafka and durable_storage. These data stores are required to facilitate two phase commit semantics, which we will discuss in detail below.
"persistence_configurations": {
"persistence_configuration": [
{
"physical_storage": {
"type": "KAFKA"
},
"config": {
"consumer_stack": "consumer",
"contacts": "unknown",
"context": "WAL to support multi-id/namespace mutations for dgwkv.foobar",
"durable_storage": {
"namespace": "foobar_wal_type",
"shard": "walfoobar",
"type": "kv"
},
"target": {},
"wal-kafka-dlq-topics": [
"foobar_kv_multi_id-dlq"
],
"wal-kafka-topics": [
"foobar_kv_multi_id"
],
"wal.kafka.bootstrap.servers.prefix": "kaas_kafka-dgwwal_foobar7102"
}
}
]
}
An important note is that requests to WAL support at-least once semantics due to the underlying implementation.
Under the Hood
The core architecture consists of several key components working together.
Message Producer and Message Consumer separation: The message producer receives incoming messages from client applications and adds them into the queue, while the message consumer processes messages from the queue and sends them to the targets. Because of this separation, other systems can bring their own pluggable producers or consumers, depending on their use cases. WAL’s control plane allows for a pluggable model, which, depending on the use-case, allows us to switch between different message queues.
SQS and Kafka with a dead letter queue by default: Every WAL namespace has its own message queue and gets a dead letter queue (DLQ) by default, because there can be transient errors and hard errors. Application teams using Key-Value abstraction simply need to toggle a flag to enable WAL and get all this functionality without needing to understand the underlying complexity.
- Kafka-backed namespaces: handle standard message processing
- SQS-backed namespaces: support delayed queue semantics (we added custom logic to go beyond the standard defaults enforced in terms of delay, size limits, etc)
- Complex multi-partition scenarios: use queues and durable storage
Target Flexibility: The messages added to WAL are pushed to the target datastores. Targets can be Cassandra databases, Memcached caches, Kafka queues, or upstream applications. Users can specify the target via namespace configuration and in the API itself.
Deployment Model
WAL is deployed using the Data Gateway infrastructure. This means that WAL deployments automatically come with mTLS, connection management, authentication, runtime and deployment configurations out of the box.
Each data gateway abstraction (including WAL) is deployed as a shard. A shard is a physical concept describing a group of hardware instances. Each use case of WAL is usually deployed as a separate shard. For example, the Ads Events service will send requests to WAL shard A, while the Gaming Catalog service will send requests to WAL shard B, allowing for separation of concerns and avoiding noisy neighbour problems.
Each shard of WAL can have multiple namespaces. A namespace is a logical concept describing a configuration. Each request to WAL has to specify its namespace so that WAL can apply the correct configuration to the request. Each namespace has its own configuration of queues to ensure isolation per use case. If the underlying queue of a WAL namespace becomes the bottleneck of throughput, the operators can choose to add more queues on the fly by modifying the namespace configurations. The concept of shards and namespaces is shared across all Data Gateway Abstractions, including Key-Value, Counter, Timeseries, etc. The namespace configurations are stored in a globally replicated Relational SQL database to ensure availability and consistency.
Based on certain CPU and network thresholds, the Producer group and the Consumer group of each shard will (separately) automatically scale up the number of instances to ensure the service has low latency, high throughput and high availability. WAL, along with other abstractions, also uses the Netflix adaptive load shedding libraries and Envoy to automatically shed requests beyond a certain limit. WAL can be deployed to multiple regions, so each region will deploy its own group of instances.
Solving different flavors of problems with no change to the core architecture
The WAL addresses multiple data reliability challenges with no changes to the core architecture:
Data Loss Prevention: In case of database downtime, WAL can continue to hold the incoming mutations. When the database becomes available again, replay mutations back to the database. The tradeoff is eventual consistency rather than immediate consistency, and no data loss.
Generic Data Replication: For systems like EVCache (using Memcached) and RocksDB that do not support replication by default, WAL provides systematic replication (both in-region and across-region). The target can be another application, another WAL, or another queue — it’s completely pluggable through configuration.
System Entropy and Multi-Partition Solutions: Whether dealing with writes across two databases (like Cassandra and Elasticsearch) or mutations across multiple partitions in one database, the solution is the same — write to WAL first, then let the WAL consumer handle the mutations. No more asynchronous repairs needed; WAL handles retries and backoff automatically.
Data Corruption Recovery: In case of DB corruptions, restore to the last known good backup, then replay mutations from WAL omitting the offending write/mutation.
There are some major differences between using WAL and directly using Kafka/SQS. WAL is an abstraction on the underlying queues, so the underlying technology can be swapped out depending on use cases with no code changes. WAL emphasizes an easy yet effective API that saves users from complicated setups and configurations. We leverage the control plane to pivot technologies behind WAL when needed without app or client intervention.
WAL usage at Netflix
Delay Queue
The most common use case for WAL is as a Delay Queue. If an application is interested in sending a request at a certain time in the future, it can offload its requests to WAL, which guarantees that their requests will land after the specified delay.
Netflix’s Live Origin processes and delivers Netflix live stream video chunks, storing its video data in a Key-Value abstraction backed by Cassandra and EVCache. When Live Origin decides to delete certain video data after an event is completed, it issues delete requests to the Key-Value abstraction. However, the large amount of delete requests in a short burst interfere with the more important real-time read/write requests, causing performance issues in Cassandra and timeouts for the incoming live traffic. To get around this, Key-Value issues the delete requests to WAL first, with a random delay and jitter set for each delete request. WAL, after the delay, sends the delete requests back to Key-Value. Since the deletes are now a flatter curve of requests over time, Key-Value is then able to send the requests to the datastore with no issues.
Additionally, WAL is used by many services that utilize Kafka to stream events, including Ads, Gaming, Product Data Systems, etc. Whenever Kafka requests fail for any reason, the client apps will send WAL a request to retry the kafka request with a delay. This abstracts away the backoff and retry layer of Kafka for many teams, increasing developer efficiency.
Cross-Region Replication
WAL is also used for global cross-region replication. The architecture of WAL is generic and allows any datastore/applications to onboard for cross-region replication. Currently, the largest use case is EVCache, and we are working to onboard other storage engines.
EVCache is deployed by clusters of Memcached instances across multiple regions, where each cluster in each region shares the same data. Each region’s client apps will write, read, or delete data from the EVCache cluster of the same region. To ensure global consistency, the EVCache client of one region will replicate write and delete requests to all other regions. To implement this, the EVCache client that originated the request will send the request to a WAL corresponding to the EVCache cluster and region.
Since the EVCache client acts as the message producer group in this case, WAL only needs to deploy the message consumer groups. From there, the multiple message consumers are set up to each target region. They will read from the Kafka topic, and send the replicated write or delete requests to a Writer group in their target region. The Writer group will then go ahead and replicate the request to the EVCache server in the same region.
The biggest benefits of this approach, compared to our legacy architecture, is being able to migrate from multi-tenant architecture to single tenant architecture for the most latency sensitive applications. For example, Live Origin will have its own dedicated Message Consumer and Writer groups, while a less latency sensitive service can be multi-tenant. This helps us reduce the blast radius of the issues and also prevents noisy neighbor issues.
Multi-Table Mutations
WAL is used by Key-Value service to build the MutateItems API. WAL enables the API’s multi-table and multi-id mutations by implementing 2-phase commit semantics under the hood. For this discussion, we can assume that Key-Value service is backed by Cassandra, and each of its namespaces represents a certain table in a Cassandra DB.
When a Key-Value client issues a MutateItems request to Key-Value server, the request can contain multiple PutItems or DeleteItems requests. Each of those requests can go to different ids and namespaces, or Cassandra tables.
message MutateItemsRequest {
repeated MutationRequest mutations = 1;
message MutationRequest {
oneof mutation {
PutItemsRequest put = 1;
DeleteItemsRequest delete = 2;
}
}
}
The MutateItems request operates on an eventually consistent model. When the Key-Value server returns a success response, it guarantees that every operation within the MutateItemsRequest will eventually complete successfully. Individual put or delete operations may be partitioned into smaller chunks based on request size, meaning a single operation could spawn multiple chunk requests that must be processed in a specific sequence.
Two approaches exist to ensure Key-Value client requests achieve success. The synchronous approach involves client-side retries until all mutations complete. However, this method introduces significant challenges; datastores might not natively support transactions and provide no guarantees about the entire request succeeding. Additionally, when more than one replica set is involved in a request, latency occurs in unexpected ways, and the entire request chain must be retried. Also, partial failures in synchronous processing can leave the database in an inconsistent state if some mutations succeed while others fail, requiring complex rollback mechanisms or leaving data integrity compromised. The asynchronous approach was ultimately adopted to address these performance and consistency concerns.
Given Key-Value’s stateless architecture, the service cannot maintain the mutation success state or guarantee order internally. Instead, it leverages a Write-Ahead Log (WAL) to guarantee mutation completion. For each MutateItems request, Key-Value forwards individual put or delete operations to WAL as they arrive, with each operation tagged with a sequence number to preserve ordering. After transmitting all mutations, Key-Value sends a completion marker indicating the full request has been submitted.
The WAL producer receives these messages and persists the content, state, and ordering information to a durable storage. The message producer then forwards only the completion marker to the message queue. The message consumer retrieves these markers from the queue and reconstructs the complete mutation set by reading the stored state and content data, ordering operations according to their designated sequence. Failed mutations trigger re-queuing of the completion marker for subsequent retry attempts.
Closing Thoughts
Building Netflix’s generic Write-Ahead Log system has taught us several key lessons that guided our design decisions:
Pluggable Architecture is Core: The ability to support different targets, whether databases, caches, queues, or upstream applications, through configuration rather than code changes has been fundamental to WAL’s success across diverse use cases.
Leverage Existing Building Blocks: We had control plane infrastructure, Key-Value abstractions, and other components already in place. Building on top of these existing abstractions allowed us to focus on the unique challenges WAL needed to solve.
Separation of Concerns Enables Scale: By separating message processing from consumption and allowing independent scaling of each component, we can handle traffic surges and failures more gracefully.
Systems Fail — Consider Tradeoffs Carefully: WAL itself has failure modes, including traffic surges, slow consumers, and non-transient errors. We use abstractions and operational strategies like data partitioning and backpressure signals to handle these, but the tradeoffs must be understood.
Future work
- We are planning to add secondary indices in Key-Value service leveraging WAL.
- WAL can also be used by a service to guarantee sending requests to multiple datastores. For example, a database and a backup, or a database and a queue at the same time etc.
Acknowledgements
Launching WAL was a collaborative effort involving multiple teams at Netflix, and we are grateful to everyone who contributed to making this idea a reality. We would like to thank the following teams for their roles in this launch.
- Caching team — Additional thanks to Shih-Hao Yeh, Akashdeep Goel for contributing to cross region replication for KV, EVCache etc. and owning this service.
- Product Data System team — Carlos Matias Herrero, Brandon Bremen for contributing to the delay queue design and being early adopters of WAL giving valuable feedback.
- KeyValue and Composite abstractions team — Raj Ummadisetty for feedback on API design and mutateItems design discussions. Rajiv Shringi for feedback on API design.
- Kafka and Real Time Data Infrastructure teams — Nick Mahilani for feedback and inputs on integrating the WAL client into Kafka client. Sundaram Ananthanarayan for design discussions around the possibility of leveraging Flink for some of the WAL use cases.
- Joseph Lynch for providing strategic direction and organizational support for this project.
Building a Resilient Data Platform with Write-Ahead Log at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Building easy-cass-mcp: An MCP Server for Cassandra Operations
I’ve started working on a new project that I’d like to share, easy-cass-mcp, an MCP (Model Context Protocol) server specifically designed to assist Apache Cassandra operators.
After spending over a decade optimizing Cassandra clusters in production environments, I’ve seen teams consistently struggle with how to interpret system metrics, configuration settings, schema design, and system configuration, and most importantly, how to understand how they all impact each other. While many teams have solid monitoring through JMX-based collectors, extracting and contextualizing specific operational metrics for troubleshooting or optimization can still be cumbersome. The good news is that we now have the infrastructure to make all this operational knowledge accessible through conversational AI.
easy-cass-stress Joins the Apache Cassandra Project
I’m taking a quick break from my series on Cassandra node density to share some news with the Cassandra community: easy-cass-stress has officially been donated to the Apache Software Foundation and is now part of the Apache Cassandra project ecosystem as cassandra-easy-stress.
Why This Matters
Over the past decade, I’ve worked with countless teams struggling with Cassandra performance testing and benchmarking. The reality is that stress testing distributed systems requires tools that can accurately simulate real-world workloads. Many tools make this difficult by requiring the end user to learn complex configurations and nuance. While consulting at The Last Pickle, I set out to create an easy to use tool that lets people get up and running in just a few minutes
Azure fault domains vs availability zones: Achieving zero downtime migrations
The challenges of operating production-ready enterprise systems in the cloud are ensuring applications remain up to date, secure and benefit from the latest features. This can include operating system or application version upgrades, but it is not limited to advancements in cloud provider offerings or the retirement of older ones. Recently, NetApp Instaclustr undertook a migration activity for (almost) all our Azure fault domain customers to availability zones and Basic SKU IP addresses.
Understanding Azure fault domains vs availability zones
“Azure fault domain vs availability zone” reflects a critical distinction in ensuring high availability and fault tolerance. Fault domains offer physical separation within a data center, while availability zones expand on this by distributing workloads across data centers within a region. This enhances resiliency against failures, making availability zones a clear step forward.
The need for migrating from fault domains to availability zones
NetApp Instaclustr has supported Azure as a cloud provider for our Managed open source offerings since 2016. Originally this offering was distributed across fault domains to ensure high availability using “Basic SKU public IP Addresses”, but this solution had some drawbacks when performing particular types of maintenance. Once released by Azure in several regions we extended our Azure support to availability zones which have a number of benefits including more explicit placement of additional resources, and we leveraged “Standard SKU Public IP’s” as part of this deployment.
When we introduced availability zones, we encouraged customers to provision new workloads in them. We also supported migrating workloads to availability zones, but we had not pushed existing deployments to do the migration. This was initially due to the reduced number of regions that supported availability zones.
In early 2024, we were notified that Azure would be retiring support for Basic SKU public IP addresses in September 2025. Notably, no new Basic SKU public IPs would be created after March 1, 2025. For us and our customers, this had the potential to impact cluster availability and stability – as we would be unable to add nodes, and some replacement operations would fail.
Very quickly we identified that we needed to migrate all customer deployments from Basic SKU to Standard SKU public IPs. Unfortunately, this operation involves node-level downtime as we needed to stop each individual virtual machine, detach the IP address, upgrade the IP address to the new SKU, and then reattach and start the instance. For customers who are operating their applications in line with our recommendations, node-level downtime does not have an impact on overall application availability, however it can increase strain on the remaining nodes.
Given that we needed to perform this potentially disruptive maintenance by a specific date, we decided to evaluate the migration of existing customers to Azure availability zones.
Key migration consideration for Cassandra clusters
As with any migration, we were looking at performing this with zero application downtime, minimal additional infrastructure costs, and as safe as possible. For some customers, we also needed to ensure that we do not change the contact IP addresses of the deployment, as this may require application updates from their side. We quickly worked out several ways to achieve this migration, each with its own set of pros and cons.
For our Cassandra customers, our go to method for changing cluster topology is through a data center migration. This is our zero-downtime migration method that we have completed hundreds of times, and have vast experience in executing. The benefit here is that we can be extremely confident of application uptime through the entire operation and be confident in the ability to pause and reverse the migration if issues are encountered. The major drawback to a data center migration is the increased infrastructure cost during the migration period – as you effectively need to have both your source and destination data centers running simultaneously throughout the operation. The other item of note, is that you will need to update your cluster contact points to the new data center.
For clusters running other applications, or customers who are more cost conscious, we evaluated doing a “node by node” migration from Basic SKU IP addresses in fault domains, to Standard SKU IP addresses in availability zones. This does not have any short-term increased infrastructure cost, however the upgrade from Basic SKU public IP to Standard SKU is irreversible, and different types of public IPs cannot coexist within the same fault domain. Additionally, this method comes with reduced rollback abilities. Therefore, we needed to devise a plan to minimize risks for our customers and ensure a seamless migration.
Developing a zero-downtime node-by-node migration strategy
To achieve a zero-downtime “node by node” migration, we explored several options, one of which involved building tooling to migrate the instances in the cloud provider but preserve all existing configurations. The tooling automates the migration process as follows:
- Begin with stopping the first VM in the cluster. For cluster availability, ensure that only 1 VM is stopped at any time.
- Create an OS disk snapshot and verify its success, then do the same for data disks
- Ensure all snapshots are created and generate new disks from snapshots
- Create a new network interface card (NIC) and confirm its status is green
- Create a new VM and attach the disks, confirming that the new VM is up and running
- Update the private IP address and verify the change
- The public IP SKU will then be upgraded, making sure this operation is successful
- The public IP will then be reattached to the VM
- Start the VM
Even though the disks are created from snapshots of the original disks, we encountered several discrepancies in our testing, with settings between the original VM and the new VM. For instance, certain configurations, such as caching policies, did not automatically carry over, requiring manual adjustments to align with our managed standards.
Recognizing these challenges, we decided to extend our existing node replacement mechanism to streamline our migration process. This is done so that a new instance is provisioned with a new OS disk with the same IP and application data. The new node is configured by the Instaclustr Managed Platform to be the same as the original node.
The next challenge: our existing solution is built so that the replaced node was provisioned to be the exact same as the original. However, for this operation we needed the new node to be placed in an availability zone instead of the same fault domain. This required us to extend the replacement operation so that when we triggered the replacement, the new node was placed in the desired availability zone. Once this operation completed, we had a replacement tool that ensured that the new instance was correctly provisioned in the availability zone, with a Standard SKU, and without data loss.
Now that we had two very viable options, we went back to our existing Azure customers to outline the problem space, and the operations that needed to be completed. We worked with all impacted customers on the best migration path for their specific use case or application and worked out the best time to complete the migration. Where possible, we first performed the migration on any test or QA environments before moving onto production environments.
Collaborative customer migration success
Some of our Cassandra customers opted to perform the migration using our data center migration path, however most customers opted for the node-by-node method. We successfully migrated the existing Azure fault domain clusters over to the Availability Zone that we were targeting, with only a very small number of clusters remaining. These clusters are operating in Azure regions which do not yet support availability zones, but we were able to successfully upgrade their public IP from Basic SKUs that are set for retirement to Standard SKUs.
No matter what provider you use, the pace of development in cloud computing can require significant effort to support ongoing maintenance and feature adoption to take advantage of new opportunities. For business-critical applications, being able to migrate to new infrastructure and leverage these opportunities while understanding the limitations and impact they have on other services is essential.
NetApp Instaclustr has a depth of experience in supporting business critical applications in the cloud. You can read more about another large-scale migration we completed The worlds Largest Apache Kafka and Apache Cassandra Migration or head over to our console for a free trial of the Instaclustr Managed Platform.
The post Azure fault domains vs availability zones: Achieving zero downtime migrations appeared first on Instaclustr.
Integrating support for AWS PrivateLink with Apache Cassandra® on the NetApp Instaclustr Managed Platform
Discover how NetApp Instaclustr leverages AWS PrivateLink for secure and seamless connectivity with Apache Cassandra®. This post explores the technical implementation, challenges faced, and the innovative solutions we developed to provide a robust, scalable platform for your data needs.
Last year, NetApp achieved a significant milestone by fully
integrating AWS PrivateLink support for Apache Cassandra® into the
NetApp Instaclustr Managed Platform. Read our AWS PrivateLink
support for Apache Cassandra General Availability announcement
here. Our Product Engineering team made remarkable progress in
incorporating this feature into various NetApp Instaclustr
application offerings. NetApp now offers AWS PrivateLink support as
an Enterprise Feature add-on for the Instaclustr Managed Platform
for
Cassandra,
Kafka®,
OpenSearch®,
Cadence®, and
Valkey
.
The journey to support AWS PrivateLink for Cassandra involved considerable engineering effort and numerous development cycles to create a solution tailored to the unique interaction between the Cassandra application and its client driver. After extensive development and testing, our product engineering team successfully implemented an enterprise ready solution. Read on for detailed insights into the technical implementation of our solution.
What is AWS PrivateLink?
PrivateLink is a networking solution from AWS that provides private connectivity between Virtual Private Clouds (VPCs) without exposing any traffic to the public internet. This solution is ideal for customers who require a unidirectional network connection (often due to compliance concerns), ensuring that connections can only be initiated from the source VPC to the destination VPC. Additionally, PrivateLink simplifies network management by eliminating the need to manage overlapping CIDRs between VPCs. The one-way connection allows connections to be initiated only from the source VPC to the managed cluster hosted in our platform (target VPC)—and not the other way around.
To get an idea of what major building blocks are involved in making up an end-to-end AWS PrivateLink solution for Cassandra, take a look at the following diagram—it’s a simplified representation of the infrastructure used to support a PrivateLink cluster:

In this example, we have a 3-node Cassandra cluster at the far right with one Cassandra node per Availability Zone (or AZ). Next, we have the VPC Endpoint Service and a Network Load Balancer (NLB). The Endpoint Service is essentially the AWS PrivateLink, and by design AWS needs it to be backed by an NLB–that’s pretty much what we have to manage on our side.
On the customer side, they must create a VPC Endpoint that enables them to privately connect to the AWS PrivateLink on our end; naturally, customers will also have to use a Cassandra client(s) to connect to the cluster.
AWS PrivateLink support with Instaclustr for Apache Cassandra
To incorporate AWS PrivateLink support with Instaclustr for Apache Cassandra on our platform, we came across a few technical challenges. First and foremost, the primary challenge was relatively straightforward: Cassandra clients need to talk to each individual node in a cluster.
However, the problem is that nodes in an AWS PrivateLink cluster are only assigned private IPs; that is what the nodes would announce by default when Cassandra clients attempt to discover the topology of the cluster. Cassandra clients cannot do much with the received private IPs as they cannot be used to connect to the nodes directly in an AWS PrivateLink setup.
We devised a plan of attack to get around this problem:
- Make each individual Cassandra node listen for CQL queries on unique ports.
- Configure the NLB so it can route traffic to the appropriate node based on the relevant unique port.
- Let clients implement the AddressTranslator interface from the Cassandra driver. The custom address translator will need to translate the received private IPs to one of the VPC Endpoint Elastic Network Interface (or ENI) IPs without altering the corresponding unique ports.
To understand this approach better, consider the following example:
Suppose we have a 3-node Cassandra cluster. According to the proposed approach we will need to do the followings:
- Let the nodes listen on ports 172.16.0.1:6001 (in AZ1), 172.16.0.2: 6002 (in AZ2) and 172.16.0.3: 6003 (in AZ3)
- Configure the NLB to listen on the same set of ports
- Define and associate target groups based on the port. For instance, the listener on port 6002 will be associated with a target group containing only the node that is listening on port 6002.
- As for how the custom address translator is expected to work,
let’s assume the VPC Endpoint ENI IPs are 192.168.0.1 (in AZ1),
192.168.0.2 (in AZ2) and 192.168.0.3 (in AZ3). The address
translator should translate received addresses like so:
- 172.16.0.1:6001 --> 192.168.0.1:6001 - 172.16.0.2:6002 --> 192.168.0.2:6002 - 172.16.0.3:6003 --> 192.168.0.3:6003
The proposed approach not only solves the connectivity problem but also allows for connecting to appropriate nodes based on query plans generated by load balancing policies.
Around the same time, we came up with a slightly modified approach as well: we realized the need for address translation can be mostly mitigated if we make the Cassandra nodes return the VPC Endpoint ENI IPs in the first place.
But the excitement did not last for long! Why? Because we quickly discovered a key problem: there is a limit to the number of listeners that can be added to any given AWS NLB of just 50.
While 50 is certainly a decent limit, the way we designed our solution meant we wouldn’t be able to provision a cluster with more than 50 nodes. This was quickly deemed to be an unacceptable limitation as it is not uncommon for a cluster to have more than 50 nodes; many Cassandra clusters in our fleet have hundreds of nodes. We had to abandon the idea of address translation and started thinking about alternative solution approaches.
Introducing Shotover Proxy
We were disappointed but did not lose hope. Soon after, we devised a practical solution centred around using one of our open source products: Shotover Proxy.
Shotover Proxy is used with Cassandra clusters to support AWS PrivateLink on the Instaclustr Managed Platform. What is Shotover Proxy, you ask? Shotover is a layer 7 database proxy built to allow developers, admins, DBAs, and operators to modify in-flight database requests. By managing database requests in transit, Shotover gives NetApp Instaclustr customers AWS PrivateLink’s simple and secure network setup with the many benefits of Cassandra.
Below is an updated version of the previous diagram that introduces some Shotover nodes in the mix:

As you can see, each AZ now has a dedicated Shotover proxy node.
In the above diagram, we have a 6-node Cassandra cluster. The Cassandra cluster sitting behind the Shotover nodes is an ordinary Private Network Cluster. The role of the Shotover nodes is to manage client requests to the Cassandra nodes while masking the real Cassandra nodes behind them. To the Cassandra client, the Shotover nodes appear to be Cassandra nodes, and it is only them that make up the entire cluster! This is the secret recipe for AWS PrivateLink for Instaclustr for Apache Cassandra that enabled us to get past the challenges discussed earlier.
So how is this model made to work?
Shotover can alter certain requests from—and responses to—the client. It can examine the tokens allocated to the Cassandra nodes in its own AZ (aka rack) and claim to be the owner of all those tokens. This essentially makes them appear to be an aggregation of the nodes in its own rack.
Given the purposely crafted topology and token allocation metadata, while the client directs queries to the Shotover node, the Shotover node in turn can pass them on to the appropriate Cassandra node and then transparently send responses back. It is worth noting that the Shotover nodes themselves do not store any data.
Because we only have 1 Shotover node per AZ in this design and there may be at most about 5 AZs per region, we only need that many listeners in the NLB to make this mechanism work. As such, the 50-listener limit on the NLB was no longer a problem.
The use of Shotover to manage client driver and cluster interoperability may sound straight forward to implement, but developing it was a year-long undertaking. As described above, the initial months of development were devoted to engineering CQL queries on unique ports and the AddressTranslator interface from the Cassandra driver to gracefully manage client connections to the Cassandra cluster. While this solution did successfully provide support for AWS PrivateLink with a Cassandra cluster, we knew that the 50-listener limit on the NLB was a barrier for use and wanted to provide our customers with a solution that could be used for any Cassandra cluster, regardless of node count.
The next few months of engineering were then devoted to the Proof of Concept of an alternative solution with the goal to investigate how Shotover could manage client requests for a Cassandra cluster with any number of nodes. And so, after a solution to support a cluster with any number of nodes was successfully proved, subsequent effort was then devoted to work through stability testing the new solution, the results of that engineering being the stable solution described above.
We have also conducted performance testing to evaluate the relative performance of a PrivateLink-enabled Cassandra cluster compared to its non-PrivateLink counterpart. Multiple iterations of performance testing were executed as some adjustments to Shotover were identified from test cases and resulted in the PrivateLink-enabled Cassandra cluster throughput and latency measuring near to a standard Cassandra cluster throughput and latency.
Related content: Read more about creating an AWS PrivateLink-enabled Cassandra cluster on the Instaclustr Managed Platform
The following was our experimental setup for identifying the max throughput in terms of Operations per second of a Cassandra PrivateLink cluster in comparison to a non-Cassandra PrivateLink cluster
- Baseline node size:
i3en.xlarge - Shotover Proxy node size on Cassandra Cluster:
CSO-PRD-c6gd.medium-54 - Cassandra version:
4.1.3 - Shotover Proxy version:
0.2.0 - Other configuration: Repair and backup disabled, Client Encryption disabled
Throughput results
| Operation | Operation rate with PrivateLink and Shotover | Operation rate without PrivateLink |
| Mixed-small (3 Nodes) | 16608 | 16206 |
| Mixed-small (6 Nodes) | 33585 | 33598 |
| Mixed-small (9 Nodes) | 51792 | 51798 |
Across different cluster sizes, we observed no significant difference in operation throughput between PrivateLink and non-PrivateLink configurations.
Latency results
Latency benchmarks were conducted at ~70% of the observed peak throughput (as above) to simulate realistic production traffic.
| Operation | Ops/second | Setup | Mean Latency (ms) | Median Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| Mixed-small (3 Nodes) | 11630 | Non-PrivateLink | 9.90 | 3.2 | 53.7 | 119.4 |
| PrivateLink | 9.50 | 3.6 | 48.4 | 118.8 | ||
| Mixed-small (6 Nodes) | 23510 | Non-PrivateLink | 6 | 2.3 | 27.2 | 79.4 |
| PrivateLink | 9.10 | 3.4 | 45.4 | 104.9 | ||
| Mixed-small (9 Nodes) | 36255 | Non-PrivateLink | 5.5 | 2.4 | 21.8 | 67.6 |
| PrivateLink | 11.9 | 2.7 | 77.1 | 141.2 |
Results indicate that for lower to mid-tier throughput levels, AWS PrivateLink introduced minimal to negligible overhead. However, at higher operation rates, we observed increased latency, most notably at the p99 mark—likely due to network level factors or Shotover.
The increase in latency is expected as AWS PrivateLink introduces an additional hop to route traffic securely, which can impact latencies, particularly under heavy load. For the vast majority of applications, the observed latencies remain within acceptable ranges. However, for latency-sensitive workloads, we recommend adding more nodes (for high load cases) to help mitigate the impact of the additional network hop introduced by PrivateLink.
As with any generic benchmarking results, performance may vary depending on specific data model, workload characteristics, and environment. The results presented here are based on specific experimental setup using standard configurations and should primarily be used to compare the relative performance of PrivateLink vs. Non-PrivateLink networking under similar conditions.
Why choose AWS PrivateLink with NetApp Instaclustr?
NetApp’s commitment to innovation means you benefit from cutting-edge technology combined with ease of use. With AWS PrivateLink support on our platform, customers gain:
- Enhanced security: All traffic stays private, never touching the internet.
- Simplified networking: No need to manage complex CIDR overlaps.
- Enterprise scalability: Handles sizable clusters effortlessly.
By addressing challenges, such as the NLB listener cap and private-to-VPC IP translation, we’ve created a solution that balances efficiency, security, and scalability.
Experience PrivateLink today
The integration of AWS PrivateLink with Apache Cassandra® is now generally available with production-ready SLAs for our customers. Log in to the Console to create a Cassandra cluster with support for AWS PrivateLink with just a few clicks today. Whether you’re managing sensitive workloads or demanding performance at scale, this feature delivers unmatched value.
Want to see it in action? Book a free demo today and experience the Shotover-powered magic of AWS PrivateLink firsthand.
Resources
- Getting started: Visit the documentation to learn how to create an AWS PrivateLink-enabled Apache Cassandra cluster on the Instaclustr Managed Platform.
- Connecting clients: Already created a Cassandra cluster with AWS PrivateLink? Click here to read about how to connect Cassandra clients in one VPC to an AWS PrivateLink-enabled Cassandra cluster on the Instaclustr Platform.
- General availability announcement: For more details, read our General Availability announcement on AWS PrivateLink support for Cassandra.
The post Integrating support for AWS PrivateLink with Apache Cassandra® on the NetApp Instaclustr Managed Platform appeared first on Instaclustr.
Compaction Strategies, Performance, and Their Impact on Cassandra Node Density
This is the third post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. In the first post, I examined how streaming operations impact node density and laid out the groundwork for understanding why higher node density leads to significant cost savings. In the second post, I discussed how compaction throughput is critical to node density and introduced the optimizations we implemented in CASSANDRA-15452 to improve throughput on disaggregated storage like EBS.
Cassandra Compaction Throughput Performance Explained
This is the second post in my series on improving node density and lowering costs with Apache Cassandra. In the previous post, I examined how streaming performance impacts node density and operational costs. In this post, I’ll focus on compaction throughput, and a recent optimization in Cassandra 5.0.4 that significantly improves it, CASSANDRA-15452.
This post assumes some familiarity with Apache Cassandra storage engine fundamentals. The documentation has a nice section covering the storage engine if you’d like to brush up before reading this post.
CEP-24 Behind the scenes: Developing Apache Cassandra®’s password validator and generator
Introduction: The need for an Apache Cassandra® password validator and generator
Here’s the problem: while users have always had the ability to create whatever password they wanted in Cassandra–from straightforward to incredibly complex and everything in between–this ultimately created a noticeable security vulnerability.
While organizations might have internal processes for generating secure passwords that adhere to their own security policies, Cassandra itself did not have the means to enforce these standards. To make the security vulnerability worse, if a password initially met internal security guidelines, users could later downgrade their password to a less secure option simply by using “ALTER ROLE” statements.
When internal password requirements are enforced for an individual, users face the additional burden of creating compliant passwords. This inevitably involved lots of trial-and-error in attempting to create a compliant password that satisfied complex security roles.
But what if there was a way to have Cassandra automatically create passwords that meet all bespoke security requirements–but without requiring manual effort from users or system operators?
That’s why we developed CEP-24: Password validation/generation. We recognized that the complexity of secure password management could be significantly reduced (or eliminated entirely) with the right approach–and improving both security and user experience at the same time.
The Goals of CEP-24
A Cassandra Enhancement Proposal (or CEP) is a structured process for proposing, creating, and ultimately implementing new features for the Cassandra project. All CEPs are thoroughly vetted among the Cassandra community before they are officially integrated into the project.
These were the key goals we established for CEP-24:
- Introduce a way to enforce password strength upon role creation or role alteration.
- Implement a reference implementation of a password validator which adheres to a recommended password strength policy, to be used for Cassandra users out of the box.
- Emit a warning (and proceed) or just reject “create role” and “alter role” statements when the provided password does not meet a certain security level, based on user configuration of Cassandra.
- To be able to implement a custom password validator with its own policy, whatever it might be, and provide a modular/pluggable mechanism to do so.
- Provide a way for Cassandra to generate a password which would pass the subsequent validation for use by the user.
The Cassandra Password Validator and Generator builds upon an established framework in Cassandra called Guardrails, which was originally implemented under CEP-3 (more details here).
The password validator implements a custom guardrail introduced
as part of
CEP-24. A custom guardrail can validate and generate values of
arbitrary types when properly implemented. In the CEP-24 context,
the password guardrail provides
CassandraPasswordValidator by extending
ValueValidator, while passwords are generated by
CassandraPasswordGenerator by extending
ValueGenerator. Both components work with passwords as
String type values.
Password validation and generation are configured in the
cassandra.yaml file under the
password_validator section. Let’s explore the key
configuration properties available. First, the
class_name and generator_class_name
parameters specify which validator and generator classes will be
used to validate and generate passwords respectively.
Cassandra
ships CassandraPasswordValidator and CassandraPasswordGenerator out
of the box. However, if a particular enterprise decides that they
need something very custom, they are free to implement their own
validators, put it on Cassandra’s class path and reference it in
the configuration behind class_name parameter. Same for the
validator.
CEP-24 provides implementations of the validator and generator that the Cassandra team believes will satisfy the requirements of most users. These default implementations address common password security needs. However, the framework is designed with flexibility in mind, allowing organizations to implement custom validation and generation rules that align with their specific security policies and business requirements.
password_validator: # Implementation class of a validator. When not in form of FQCN, the # package name org.apache.cassandra.db.guardrails.validators is prepended. # By default, there is no validator. class_name: CassandraPasswordValidator # Implementation class of related generator which generates values which are valid when # tested against this validator. When not in form of FQCN, the # package name org.apache.cassandra.db.guardrails.generators is prepended. # By default, there is no generator. generator_class_name: CassandraPasswordGenerator
Password quality might be looked at as the number of characteristics a password satisfies. There are two levels for any password to be evaluated – warning level and failure level. Warning and failure levels nicely fit into how Guardrails act. Every guardrail has warning and failure thresholds. Based on what value a specific guardrail evaluates, it will either emit a warning to a user that its usage is discouraged (but ultimately allowed) or it will fail to be set altogether.
This same principle applies to password evaluation – each password is assessed against both warning and failure thresholds. These thresholds are determined by counting the characteristics present in the password. The system evaluates five key characteristics: the password’s overall length, the number of uppercase characters, the number of lowercase characters, the number of special characters, and the number of digits. A comprehensive password security policy can be enforced by configuring minimum requirements for each of these characteristics.
# There are four characteristics: # upper-case, lower-case, special character and digit. # If this value is set e.g. to 3, a password has to # consist of 3 out of 4 characteristics. # For example, it has to contain at least 2 upper-case characters, # 2 lower-case, and 2 digits to pass, # but it does not have to contain any special characters. # If the number of characteristics found in the password is # less than or equal to this number, it will emit a warning. characteristic_warn: 3 # If the number of characteristics found in the password is #less than or equal to this number, it will emit a failure. characteristic_fail: 2
Next, there are configuration parameters for each characteristic which count towards warning or failure:
# If the password is shorter than this value, # the validator will emit a warning. length_warn: 12 # If a password is shorter than this value, # the validator will emit a failure. length_fail: 8 # If a password does not contain at least n # upper-case characters, the validator will emit a warning. upper_case_warn: 2 # If a password does not contain at least # n upper-case characters, the validator will emit a failure. upper_case_fail: 1 # If a password does not contain at least # n lower-case characters, the validator will emit a warning. lower_case_warn: 2 # If a password does not contain at least # n lower-case characters, the validator will emit a failure. lower_case_fail: 1 # If a password does not contain at least # n digits, the validator will emit a warning. digit_warn: 2 # If a password does not contain at least # n digits, the validator will emit a failure. digit_fail: 1 # If a password does not contain at least # n special characters, the validator will emit a warning. special_warn: 2 # If a password does not contain at least # n special characters, the validator will emit a failure. special_fail: 1
It is also possible to say that illegal sequences of certain length found in a password will be forbidden:
# If a password contains illegal sequences that are at least this long, it is invalid. # Illegal sequences might be either alphabetical (form 'abcde'), # numerical (form '34567'), or US qwerty (form 'asdfg') as well # as sequences from supported character sets. # The minimum value for this property is 3, # by default it is set to 5. illegal_sequence_length: 5
Lastly, it is also possible to configure a dictionary of passwords to check against. That way, we will be checking against password dictionary attacks. It is up to the operator of a cluster to configure the password dictionary:
# Dictionary to check the passwords against. Defaults to no dictionary. # Whole dictionary is cached into memory. Use with caution with relatively big dictionaries. # Entries in a dictionary, one per line, have to be sorted per String's compareTo contract. dictionary: /path/to/dictionary/file
Now that we have gone over all the configuration parameters, let’s take a look at an example of how password validation and generation look in practice.
Consider a scenario where a Cassandra super-user (such as the default ‘cassandra’ role) attempts to create a new role named ‘alice’.
cassandra@cqlsh> CREATE ROLE alice WITH PASSWORD = 'cassandraisadatabase' AND LOGIN = true; InvalidRequest: Error from server: code=2200 [Invalid query] message="Password was not set as it violated configured password strength policy. To fix this error, the following has to be resolved: Password contains the dictionary word 'cassandraisadatabase'. You may also use 'GENERATED PASSWORD' upon role creation or alteration."
The password is not found in the dictionary, but it is not long enough. When an operator sees this, they will try to fix it by making the password longer:
cassandra@cqlsh> CREATE ROLE alice WITH PASSWORD = 'T8aum3?' AND LOGIN = true; InvalidRequest: Error from server: code=2200 [Invalid query] message="Password was not set as it violated configured password strength policy. To fix this error, the following has to be resolved: Password must be 8 or more characters in length. You may also use 'GENERATED PASSWORD' upon role creation or alteration."
The password is finally set, but it is not completely secure. It satisfies the minimum requirements but our validator identified that not all characteristics were met.
cassandra@cqlsh> CREATE ROLE alice WITH PASSWORD = 'mYAtt3mp' AND LOGIN = true; Warnings: Guardrail password violated: Password was set, however it might not be strong enough according to the configured password strength policy. To fix this warning, the following has to be resolved: Password must be 12 or more characters in length. Passwords must contain 2 or more digit characters. Password must contain 2 or more special characters. Password matches 2 of 4 character rules, but 4 are required. You may also use 'GENERATED PASSWORD' upon role creation or alteration.
The password is finally set, but it is not completely secure. It satisfies the minimum requirements but our validator identified that not all characteristics were met.
When an operator saw this, they noticed the note about the ‘GENERATED PASSWORD’ clause which will generate a password automatically without an operator needing to invent it on their own. This is a lot of times, as shown, a cumbersome process better to be left on a machine. Making it also more efficient and reliable.
cassandra@cqlsh> ALTER ROLE alice WITH GENERATED PASSWORD; generated_password ------------------ R7tb33?.mcAX
The generated password shown above will satisfy all the rules we have configured in the cassandra.yaml automatically. Every generated password will satisfy all of the rules. This is clearly an advantage over manual password generation.
When the CQL statement is executed, it will be visible in the CQLSH history (HISTORY command or in cqlsh_history file) but the password will not be logged, hence it cannot leak. It will also not appear in any auditing logs. Previously, Cassandra had to obfuscate such statements. This is not necessary anymore.
We can create a role with generated password like this:
cassandra@cqlsh> CREATE ROLE alice WITH GENERATED PASSWORD AND LOGIN = true; or by CREATE USER: cassandra@cqlsh> CREATE USER alice WITH GENERATED PASSWORD;
When a password is generated for alice (out of scope of this documentation), she can log in:
$ cqlsh -u alice -p R7tb33?.mcAX ... alice@cqlsh>
Note: It is recommended to save password to ~/.cassandra/credentials, for example:
[PlainTextAuthProvider] username = cassandra password = R7tb33?.mcAX
and by setting auth_provider in ~/.cassandra/cqlshrc
[auth_provider] module = cassandra.auth classname = PlainTextAuthProvider
It is also possible to configure password validators in such a way that a user does not see why a password failed. This is driven by configuration property for password_validator called detailed_messages. When set to false, the violations will be very brief:
alice@cqlsh> ALTER ROLE alice WITH PASSWORD = 'myattempt'; InvalidRequest: Error from server: code=2200 [Invalid query] message="Password was not set as it violated configured password strength policy. You may also use 'GENERATED PASSWORD' upon role creation or alteration."
The following command will automatically generate a new password that meets all configured security requirements.
alice@cqlsh> ALTER ROLE alice WITH GENERATED PASSWORD;
Several potential enhancements to password generation and validation could be implemented in future releases. One promising extension would be validating new passwords against previous values. This would prevent users from reusing passwords until after they’ve created a specified number of different passwords. A related enhancement could include restricting how frequently users can change their passwords, preventing rapid cycling through passwords to circumvent history-based restrictions.
These features, while valuable for comprehensive password security, were considered beyond the scope of the initial implementation and may be addressed in future updates.
Final thoughts and next steps
The Cassandra Password Validator and Generator implemented under CEP-24 represents a significant improvement in Cassandra’s security posture.
By providing robust, configurable password policies with built-in enforcement mechanisms and convenient password generation capabilities, organizations can now ensure compliance with their security standards directly at the database level. This not only strengthens overall system security but also improves the user experience by eliminating guesswork around password requirements.
As Cassandra continues to evolve as an enterprise-ready database solution, these security enhancements demonstrate a commitment to meeting the demanding security requirements of modern applications while maintaining the flexibility that makes Cassandra so powerful.
Ready to experience CEP-24 yourself? Try it out on the Instaclustr Managed Platform and spin up your first Cassandra cluster for free.
CEP-24 is just our latest contribution to open source. Check out everything else we’re working on here.
The post CEP-24 Behind the scenes: Developing Apache Cassandra®’s password validator and generator appeared first on Instaclustr.
Introduction to similarity search: Part 2–Simplifying with Apache Cassandra® 5’s new vector data type
In Part 1 of this series, we explored how you can combine Cassandra 4 and OpenSearch to perform similarity searches with word embeddings. While that approach is powerful, it requires managing two different systems.
But with the release of Cassandra 5, things become much simpler.
Cassandra 5 introduces a native VECTOR data type and built-in Vector Search capabilities, simplifying the architecture by enabling Cassandra 5 to handle storage, indexing, and querying seamlessly within a single system.
Now in Part 2, we’ll dive into how Cassandra 5 streamlines the process of working with word embeddings for similarity search. We’ll walk through how the new vector data type works, how to store and query embeddings, and how the Storage-Attached Indexing (SAI) feature enhances your ability to efficiently search through large datasets.
The power of vector search in Cassandra 5
Vector search is a game-changing feature added in Cassandra 5 that enables you to perform similarity searches directly within the database. This is especially useful for AI applications, where embeddings are used to represent data like text or images as high-dimensional vectors. The goal of vector search is to find the closest matches to these vectors, which is critical for tasks like product recommendations or image recognition.
The key to this functionality lies in embeddings: arrays of floating-point numbers that represent the similarity of objects. By storing these embeddings as vectors in Cassandra, you can use Vector Search to find connections in your data that may not be obvious through traditional queries.
How vectors work
Vectors are fixed-size sequences of non-null values, much like lists. However, in Cassandra 5, you cannot modify individual elements of a vector — you must replace the entire vector if you need to update it. This makes vectors ideal for storing embeddings, where you need to work with the whole data structure at once.
When working with embeddings, you’ll typically store them as vectors of floating-point numbers to represent the semantic meaning.
Storage-Attached Indexing (SAI): The engine behind vector search
Vector Search in Cassandra 5 is powered by Storage-Attached Indexing, which enables high-performance indexing and querying of vector data. SAI is essential for Vector Search, providing the ability to create column-level indexes on vector data types. This ensures that your vector queries are both fast and scalable, even with large datasets.
SAI isn’t just limited to vectors—it also indexes other types of data, making it a versatile tool for boosting the performance of your queries across the board.
Example: Performing similarity search with Cassandra 5’s vector data type
Now that we’ve introduced the new vector data type and the power of Vector Search in Cassandra 5, let’s dive into a practical example. In this section, we’ll show how to set up a table to store embeddings, insert data, and perform similarity searches directly within Cassandra.
Step 1: Setting up the embeddings table
To get started with this example, you’ll need access to a Cassandra 5 cluster. Cassandra 5 introduces native support for vector data types and Vector Search, available on Instaclustr’s managed platform. Once you have your cluster up and running, the first step is to create a table to store the embeddings. We’ll also create an index on the vector column to optimize similarity searches using SAI.
CREATE KEYSPACE aisearch WITH REPLICATION = {{'class': 'SimpleStrategy', ' replication_factor': 1}}; CREATE TABLE IF NOT EXISTS embeddings ( id UUID, paragraph_uuid UUID, filename TEXT, embeddings vector<float, 300>, text TEXT, last_updated timestamp, PRIMARY KEY (id, paragraph_uuid) ); CREATE INDEX IF NOT EXISTS ann_index ON embeddings(embeddings) USING 'sai';
This setup allows us to store the embeddings as 300-dimensional vectors, along with metadata like file names and text. The SAI index will be used to speed up similarity searches on the embedding’s column.
You can also fine-tune the index by specifying the similarity function to be used for vector comparisons. Cassandra 5 supports three types of similarity functions: DOT_PRODUCT, COSINE, and EUCLIDEAN. By default, the similarity function is set to COSINE, but you can specify your preferred method when creating the index:
CREATE INDEX IF NOT EXISTS ann_index ON embeddings(embeddings) USING 'sai' WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
Each similarity function has its own advantages depending on your use case. DOT_PRODUCT is often used when you need to measure the direction and magnitude of vectors, COSINE is ideal for comparing the angle between vectors, and EUCLIDEAN calculates the straight-line distance between vectors. By selecting the appropriate function, you can optimize your search results to better match the needs of your application.
Step 2: Inserting embeddings into Cassandra 5
To insert embeddings into Cassandra 5, we can use the same code from the first part of this series to extract text from files, load the FastText model, and generate the embeddings. Once the embeddings are generated, the following function will insert them into Cassandra:
import time from uuid import uuid4, UUID from cassandra.cluster import Cluster from cassandra.query import SimpleStatement from cassandra.policies import DCAwareRoundRobinPolicy from cassandra.auth import PlainTextAuthProvider from google.colab import userdata # Connect to the single-node cluster cluster = Cluster( # Replace with your IP list ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx ", " xxx.xxx.xxx.xxx "], # Single-node cluster address load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='AWS_VPC_US_EAST_1'), # Update the local data centre if needed port=9042, auth_provider=PlainTextAuthProvider ( username='iccassandra', password='replace_with_your_password' ) ) session = cluster.connect() print('Connected to cluster %s' % cluster.metadata.cluster_name) def insert_embedding_to_cassandra(session, embedding, id=None, paragraph_uuid=None, filename=None, text=None, keyspace_name=None): try: embeddings = list(map(float, embedding)) # Generate UUIDs if not provided if id is None: id = uuid4() if paragraph_uuid is None: paragraph_uuid = uuid4() # Ensure id and paragraph_uuid are UUID objects if isinstance(id, str): id = UUID(id) if isinstance(paragraph_uuid, str): paragraph_uuid = UUID(paragraph_uuid) # Create the query string with placeholders insert_query = f""" INSERT INTO {keyspace_name}.embeddings (id, paragraph_uuid, filename, embeddings, text, last_updated) VALUES (?, ?, ?, ?, ?, toTimestamp(now())) """ # Create a prepared statement with the query prepared = session.prepare(insert_query) # Execute the query session.execute(prepared.bind((id, paragraph_uuid, filename, embeddings, text))) return None # Successful insertion except Exception as e: error_message = f"Failed to execute query:\nError: {str(e)}" return error_message # Return error message on failure def insert_with_retry(session, embedding, id=None, paragraph_uuid=None, filename=None, text=None, keyspace_name=None, max_retries=3, retry_delay_seconds=1): retry_count = 0 while retry_count < max_retries: result = insert_embedding_to_cassandra(session, embedding, id, paragraph_uuid, filename, text, keyspace_name) if result is None: return True # Successful insertion else: retry_count += 1 print(f"Insertion failed on attempt {retry_count} with error: {result}") if retry_count < max_retries: time.sleep(retry_delay_seconds) # Delay before the next retry return False # Failed after max_retries # Replace the file path pointing to the desired file file_path = "/path/to/Cassandra-Best-Practices.pdf" paragraphs_with_embeddings = extract_text_with_page_number_and_embeddings(file_path) from tqdm import tqdm for paragraph in tqdm(paragraphs_with_embeddings, desc="Inserting paragraphs"): if not insert_with_retry( session=session, embedding=paragraph['embedding'], id=paragraph['uuid'], paragraph_uuid=paragraph['paragraph_uuid'], text=paragraph['text'], filename=paragraph['filename'], keyspace_name=keyspace_name, max_retries=3, retry_delay_seconds=1 ): # Display an error message if insertion fails tqdm.write(f"Insertion failed after maximum retries for UUID {paragraph['uuid']}: {paragraph['text'][:50]}...")
This function handles inserting embeddings and metadata into Cassandra, ensuring that UUIDs are correctly generated for each entry.
Step 3: Performing similarity searches in Cassandra 5
Once the embeddings are stored, we can perform similarity searches directly within Cassandra using the following function:
import numpy as np # ------------------ Embedding Functions ------------------ def text_to_vector(text): """Convert a text chunk into a vector using the FastText model.""" words = text.split() vectors = [fasttext_model[word] for word in words if word in fasttext_model.key_to_index] return np.mean(vectors, axis=0) if vectors else np.zeros(fasttext_model.vector_size) def find_similar_texts_cassandra(session, input_text, keyspace_name=None, top_k=5): # Convert the input text to an embedding input_embedding = text_to_vector(input_text) input_embedding_str = ', '.join(map(str, input_embedding.tolist())) # Adjusted query without the ORDER BY clause and correct comment syntax query = f""" SELECT text, filename, similarity_cosine(embeddings, ?) AS similarity FROM {keyspace_name}.embeddings ORDER BY embeddings ANN OF [{input_embedding_str}] LIMIT {top_k}; """ prepared = session.prepare(query) bound = prepared.bind((input_embedding,)) rows = session.execute(bound) # Sort the results by similarity in Python similar_texts = sorted([(row.similarity, row.filename, row.text) for row in rows], key=lambda x: x[0], reverse=True) return similar_texts[:top_k] from IPython.display import display, HTML # The word you want to find similarities for input_text = "place" # Call the function to find similar texts in the Cassandra database similar_texts = find_similar_texts_cassandra(session, input_text, keyspace_name="aisearch", top_k=10)
This function searches for similar embeddings in Cassandra and retrieves the top results based on cosine similarity. Under the hood, Cassandra’s vector search uses Hierarchical Navigable Small Worlds (HNSW). HNSW organizes data points in a multi-layer graph structure, making queries significantly faster by narrowing down the search space efficiently—particularly important when handling large datasets.
Step 4: Displaying the results
To display the results in a readable format, we can loop through the similar texts and present them along with their similarity scores:
# Print the similar texts along with their similarity scores for similarity, filename, text in similar_texts: html_content = f""" <div style="margin-bottom: 10px;"> <p><b>Similarity:</b> {similarity:.4f}</p> <p><b>Text:</b> {text}</p> <p><b>File:</b> {filename}</p> </div> <hr/> """ display(HTML(html_content))
This code will display the top similar texts, along with their similarity scores and associated file names.
Cassandra 5 vs. Cassandra 4 + OpenSearch®
Cassandra 4 relies on an integration with OpenSearch to handle word embeddings and similarity searches. This approach works well for applications that are already using or comfortable with OpenSearch, but it does introduce additional complexity with the need to maintain two systems.
Cassandra 5, on the other hand, brings vector support directly into the database. With its native VECTOR data type and similarity search functions, it simplifies your architecture and improves performance, making it an ideal solution for applications that require embedding-based searches at scale.
| Feature | Cassandra 4 + OpenSearch | Cassandra 5 (Preview) |
| Embedding Storage | OpenSearch | Native VECTOR Data Type |
| Similarity Search | KNN Plugin in OpenSearch | COSINE, EUCLIDEAN, DOT_PRODUCT |
| Search Method | Exact K-Nearest Neighbor | Approximate Nearest Neighbor (ANN) |
| System Complexity | Requires two systems | All-in-one Cassandra solution |
Conclusion: A simpler path to similarity search with Cassandra 5
With Cassandra 5, the complexity of setting up and managing a separate search system for word embeddings is gone. The new vector data type and Vector Search capabilities allow you to perform similarity searches directly within Cassandra, simplifying your architecture and making it easier to build AI-powered applications.
Coming up: more in-depth examples and use cases that demonstrate how to take full advantage of these new features in Cassandra 5 in future blogs!
Ready to experience vector search with Cassandra 5? Spin up your first cluster for free on the Instaclustr Managed Platform and try it out!
The post Introduction to similarity search: Part 2–Simplifying with Apache Cassandra® 5’s new vector data type appeared first on Instaclustr.
How Cassandra Streaming, Performance, Node Density, and Cost are All related
This is the first post of several I have planned on optimizing Apache Cassandra for maximum cost efficiency. I’ve spent over a decade working with Cassandra and have spent tens of thousands of hours data modeling, fixing issues, writing tools for it, and analyzing it’s performance. I’ve always been fascinated by database performance tuning, even before Cassandra.
A decade ago I filed one of my first issues with the project, where I laid out my target goal of 20TB of data per node. This wasn’t possible for most workloads at the time, but I’ve kept this target in my sights.
Cassandra 5 Released! What's New and How to Try it
Apache Cassandra 5.0 has officially landed! This highly anticipated release brings a range of new features and performance improvements to one of the most popular NoSQL databases in the world. Having recently hosted a webinar covering the major features of Cassandra 5.0, I’m excited to give a brief overview of the key updates and show you how to easily get hands-on with the latest release using easy-cass-lab.
You can grab the latest release on the Cassandra download page.
easy-cass-lab v5 released
I’ve got some fun news to start the week off for users of easy-cass-lab: I’ve just released version 5. There are a number of nice improvements and bug fixes in here that should make it more enjoyable, more useful, and lay groundwork for some future enhancements.
- When the cluster starts, we wait for the storage service to
reach NORMAL state, then move to the next node. This is in contrast
to the previous behavior where we waited for 2 minutes after
starting a node. This queries JMX directly using Swiss Java Knife
and is more reliable than the 2-minute method. Please see
packer/bin-cassandra/wait-for-up-normalto read through the implementation. - Trunk now works correctly. Unfortunately, AxonOps doesn’t support trunk (5.1) yet, and using the agent was causing a startup error. You can test trunk out, but for now the AxonOps integration is disabled.
- Added a new repl mode. This saves keystrokes and provides some
auto-complete functionality and keeps SSH connections open. If
you’re going to do a lot of work with ECL this will help you be a
little more efficient. You can try this out with
ecl repl. - Power user feature: Initial support for profiles in AWS regions
other than
us-west-2. We only provide AMIs forus-west-2, but you can now set up a profile in an alternate region, and build the required AMIs usingeasy-cass-lab build-image. This feature is still under development and requires using aneasy-cass-labbuild from source. Credit to Jordan West for contributing this work. - Power user feature: Support for multiple profiles. Setting the
EASY_CASS_LAB_PROFILEenvironment variable allows you to configure alternate profiles. This is handy if you want to use multiple regions or have multiple organizations. - The project now uses Kotlin instead of Groovy for Gradle configuration.
- Updated Gradle to 8.9.
- When using the list command, don’t show the alias “current”.
- Project cleanup, remove old unused pssh, cassandra build, and async profiler subprojects.
The release has been released to the project’s GitHub page and to homebrew. The project is largely driven by my own consulting needs and for my training. If you’re looking to have some features prioritized please reach out, and we can discuss a consulting engagement.
easy-cass-lab updated with Cassandra 5.0 RC-1 Support
I’m excited to announce that the latest version of easy-cass-lab now supports Cassandra 5.0 RC-1, which was just made available last week! This update marks a significant milestone, providing users with the ability to test and experiment with the newest Cassandra 5.0 features in a simplified manner. This post will walk you through how to set up a cluster, SSH in, and run your first stress test.
For those new to easy-cass-lab, it’s a tool designed to streamline the setup and management of Cassandra clusters in AWS, making it accessible for both new and experienced users. Whether you’re running tests, developing new features, or just exploring Cassandra, easy-cass-lab is your go-to tool.