10 December 2025, 2:23 pm by
ScyllaDB
It’s been a while since I last attended re:Invent… long enough that
I’d almost forgotten how expensive a bottle of water can be in a
Vegas hotel room. This time was different. Instead of just
attending, I wore many hats: audio-visual tech, salesperson,
technical support, friendly ear, and booth rep. re:Invent is an
experience that’s hard to explain to anyone outside tech. Picture
65,000 people converging in Las Vegas: DJ booths thumping beside
deep-dive technical sessions, competitions running nonstop, and
enough swag to fill a million Christmas stockings. Only then do you
start to grasp what it’s really like. Needless to say, having the
privilege to fly halfway across the globe, stay in a bougie hotel,
and help host the impressive ScyllaDB booth was a fitting way to
finish the year on a high. This year was ScyllaDB’s biggest
re:Invent presence yet… a full-scale booth designed to show what
predictable performance at extreme scale really looks like. The
booth was buzzing from open to close, packed with data engineers,
developers, and decision-makers exploring how ScyllaDB handles
millions of operations per second with single-digit P99 millisecond
latency. Some of the standout moments for me included: Customer
sessions at the Content Hub featuring
Freshworks and SAS, both showcasing how ScyllaDB powers their
mission-critical AI workloads. In-booth talks from Freshworks, SAS,
Sprig, and
Revinate. Real users sharing real production stories. There’s
nothing better than hearing
how our customers are conquering
performance challenges at scale. Technical deep dives exploring
everything from linear scalability to
real-time AI pipelines.
ScyllaDB X Cloud linear-scale demonstration, a live
visualization of throughput scaling predictably with every
additional node. Watching tablets rebalance automatically as the
cluster scales linearly never gets old. High-impact in-booth videos driving home
ScyllaDB’s key differentiators. I’m particularly proud of our
DB Guy parody along with the ScyllaDB monster on the Vegas Sphere
(yes, we fooled many with that one). For many visitors, this was
their first time seeing ScyllaDB X Cloud and Vector Search in
action. Our demos made it clear what we mean by performance at
scale: serving billions of vectors or millions of events per
second, all while keeping tail latency comfortably under 5 ms and
cost behavior entirely predictable. Developers I chatted with
loved that ScyllaDB drops neatly into existing
Cassandra or
DynamoDB environments while delivering much better performance
and a lower TCO. Architects zeroed in on our flexibility across EC2
instance families (especially i8g) and hybrid deployment models.
The ability to bring your own AWS (or GCP) account sparked plenty
of conversations around performance, security, and data
sovereignty. What really stood out this year was the shift in
mindset. re:Invent 2025 confirmed that the future of extreme scale
database engineering belongs to real-time systems … from AI
inference to IoT telemetry, where low latency and linear scale are
essential for success. ScyllaDB sits right at that intersection: a
database built to scale fearlessly, delivering the control of bare
metal with the simplicity of managed cloud. If you missed us in
Vegas, don’t worry … you can still catch the highlights. Watch our
customer sessions and the full X Cloud demo, and see why
predictable performance at extreme scale isn’t just our
tagline. It’s what we do every day.
Catch the
re:Invent videos
8 December 2025, 12:00 pm by
Apache Cassandra - Instaclustr
Apache Cassandra® committers are working hard,
building new features to help you more seamlessly ease operational
challenges of a distributed database. Let’s dive into some recently
approved CEPs and explain how these upcoming features will improve
your workflow and efficiency.
What is a CEP?
CEP stands for Cassandra Enhancement Proposal. They are the
process for outlining, discussing, and gathering endorsements for a
new feature in Cassandra. They’re more than a feature request;
those who put forth a CEP have intent to build the feature, and the
proposal encourages a high amount of collaboration with the
Cassandra contributors.
The CEPs discussed here were recently approved for
implementation or have had significant progress in their
implementation. As with all open-source development,
inclusion in a future release is contingent upon successful
implementation, community consensus, testing, and approval by
project committers.
CEP-42: Constraints framework
Developed with collaboration from NetApp Instaclustr, CEP-42 and its
subsequent iterations deliver schema-level constraints, giving
Cassandra users and operators more control over their data. Adding
constraints at the schema level means that data can be validated at
write time, with the appropriate error returned when data is invalid.
Constraints are defined in-line or as a separate definition. The
inline style allows for only one constraint while a definition
allows users to define multiple constraints with different
expressions.
CEP-42 initially supported a few constraints that covered the
majority of cases, but follow-up efforts expanded the list of
supported constraints to include scalar comparisons (>, <, >=,
<=), LENGTH(), OCTET_LENGTH(), NOT NULL, JSON(), and REGEX(). Users
can also define their own constraints by implementing them and
putting them on Cassandra's class path.
A simple example of an in-line constraint:
CREATE TABLE users (
username text PRIMARY KEY,
age int CHECK age >= 0 and age < 120
);
Constraints are not supported for UDTs (User-Defined Types) or
collections (except for using NOT NULL for frozen collections).
Enabling constraints closer to the data is a subtle but mighty
way for operators to ensure that data goes into the database
correctly. By defining rules just once, application code becomes
simpler and more robust, and validation can no longer be
bypassed. Those who have worked with MySQL, Postgres, or MongoDB
will enjoy this addition to Cassandra.
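To illustrate what this looks like from the application side, here is a minimal sketch of mine (not part of the CEP) using the DataStax Python driver against the users table above; the assumption is that a rejected write surfaces as an InvalidRequest-style error once the feature ships:

from cassandra.cluster import Cluster
from cassandra import InvalidRequest  # raised when the server rejects a request

# Hypothetical local cluster; adjust contact points for your environment.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace holding the users table

insert = session.prepare("INSERT INTO users (username, age) VALUES (?, ?)")

try:
    # Violates the CHECK constraint (age must be >= 0 and < 120),
    # so the node rejects the write at write time.
    session.execute(insert, ("alice", 150))
except InvalidRequest as exc:
    # The validation rule lives once, in the schema; the application
    # only needs to handle the resulting error.
    print(f"Write rejected by constraint: {exc}")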
CEP-51: Support “Include” Semantics for cassandra.yaml
The cassandra.yaml file holds important settings for storage,
memory, replication, compaction, and more. It’s no surprise that
the average size of the file is around 1,000 lines (though, yes—most
are comments). CEP-51 enables splitting the cassandra.yaml
configuration into multiple files using include
semantics. From the outside, this feels like a small change, but
the implications are huge if a user chooses to opt-in.
In general, the size of the configuration file makes it
difficult to manage and coordinate changes. It’s often the case
that multiple teams manage various aspects of the single file. In
addition, anyone with read access to cassandra.yaml can see
everything in it, meaning private information like credentials
is comingled with all other settings. There is risk from an
operational and security standpoint.
Enabling the new semantics and therefore modularity for the
configuration file improves management, deployment, handling of
environment-specific settings, and security in one shot. The
configuration file follows the principle of least privilege once
the cassandra.yaml is broken up into smaller, well-defined files;
sensitive configuration settings are separated out from general
settings with fine-grained access for the individual files. With
the feature enabled, different development teams are better
equipped to deploy safely and independently.
If you’ve deployed your Cassandra cluster on the NetApp
Instaclustr platform, the cassandra.yaml file is already configured
and managed for you. We pride ourselves on making it easy for you
to get up and running fast.
CEP-52: Schema annotations for Apache Cassandra
With extensive review by the NetApp Instaclustr team and Stefan
Miklosovic, CEP-52 introduces schema annotations in CQL, allowing
in-line comments and labels on schema elements such as keyspaces,
tables, columns, and User Defined Types (UDT). Users can easily
define and alter comments and labels on these elements. They can be
copied over when desired using CREATE TABLE LIKE syntax. Comments
are stored as plain text while labels are stored as structured
metadata.
Comments and labels serve different annotation purposes:
Comments document what a schema object is for, whereas labels
describe how sensitive or controlled it is meant to be. For
example, labels can be used to identify columns as “PII” or
“confidential”, while the comment on that column explains usage,
e.g. “Last login timestamp.”
Users can query these annotations. CEP-52 defines two new
read-only tables (system_views.schema_comments and
system_views.schema_security_labels) to store comments and security
labels so objects with comments can be returned as a list or a
user/machine process can query for specific labels, beneficial for
auditing and classification. Note that security labels are
descriptive metadata and do not enforce access control to the
data.
CEP-53: Cassandra rolling restarts via Sidecar
Sidecar is an auxiliary component in the Cassandra ecosystem
that exposes cluster management and streaming capabilities through
APIs. Introducing rolling restarts through Sidecar, this feature is
designed to provide operators with more efficient and safer
restarts without cluster-wide downtime. More specifically,
operators can monitor, pause, resume, and abort restarts all
through an API with configurable options if restarts fail.
Rolling restarts bring operators a step closer to cluster-wide
operations and lifecycle management via Sidecar. Operators will be
able to configure the number of nodes to restart concurrently with
minimal risk, as this CEP introduces clear states as a node
progresses through a restart. Accounting for a variety of edge
cases, an operator can feel assured that, for example, a
non-functioning sidecar won’t derail operations.
The current process for restarting a node is a multi-step,
manual process, which does not scale for large cluster sizes (and
is also tedious for small clusters). Restarting clusters previously
lacked a streamlined approach as each node needed to be restarted
one at a time, making the process time-intensive and
error-prone.
Though Sidecar is still considered WIP, it’s got big plans to
improve operating large clusters!
The NetApp Instaclustr Platform, in conjunction with our expert
TechOps team, already orchestrates these laborious tasks for our
Cassandra customers with a high level of care to ensure their
cluster stays online. Restarting or upgrading your Cassandra nodes
is a huge pain-point for operators, but it doesn’t have to be when
using our managed platform (with round-the-clock support!).
CEP-54: Zstd with dictionary SSTable compression
CEP-54, with NetApp Instaclustr’s collaboration, aims to add
support for Zstd with dictionary compression for SSTables. Zstd, or
Zstandard, is a fast, lossless data compression algorithm that
boasts impressive ratio and speed and has been supported in
Cassandra since 4.0. Certain workloads can benefit from
significantly faster read/write performance, reduced storage
footprint, and increased storage device lifetime when using
dictionary compression.
At a high level, operators choose a table they want to compress
with a dictionary. A dictionary must be trained first on a small
amount of already present data (recommended no more than 10MiB).
The resulting dictionary is stored cluster-wide for all other nodes
to use and is applied to all subsequent SSTable writes to disk.
Workloads with structured data of similar rows benefit most from
Zstd with dictionary compression. Some examples of ideal workloads
include event logs, telemetry data, and metadata tables with templated
messages. Think: repeated row data. If the table data is too
unstructured or random, dictionary compression likely won't be
optimal; however, plain Zstd will still be an
excellent option.
New SSTables with dictionaries are readable across nodes and can be
streamed, repaired, and backed up. Existing tables are unaffected if
dictionary compression is not enabled. Too many unique dictionaries
hurt decompression; use minimal dictionaries (recommended
dictionary size is about 100KiB and one dictionary per table) and
only adopt new ones when they’re noticeably better.
CEP-55: Generated role names
CEP-55 adds support for creating users/roles
without supplying a name, simplifying
user management, especially when generating users and roles in
bulk. The example syntax is CREATE GENERATED ROLE WITH GENERATED
PASSWORD; the supporting configuration keys live in a newly
introduced cassandra.yaml section, "role_name_policy."
Stefan Miklosovic, our Cassandra engineer at NetApp Instaclustr,
created this CEP as a logical follow up to CEP-24
(password validation/generation), which he authored as well. These
quality-of-life improvements let operators spend less time doing
trivial tasks with high-risk potential and more time on truly
complex matters.
Manual name selection seems trivial until a hundred role names
need to be generated; now there is a security risk if the new
usernames—or worse, passwords—are easily guessable. With CEP-55, the
generated role name will be UUID-like, with optional prefix/suffix
and size hints; however, a pluggable policy is available to generate
and validate names as well. This is an opt-in feature with no
effect to the existing method of generating role names.
The future of Apache Cassandra is bright
These Cassandra Enhancement Proposals
demonstrate a strong commitment to making Apache Cassandra more
powerful, secure, and easier to operate. By staying on top of these
updates, we ensure our managed platform seamlessly supports future
releases that accelerate your business needs.
At NetApp Instaclustr, our expert TechOps team already
orchestrates laborious tasks like restarts and upgrades for our
Apache Cassandra customers, ensuring their clusters stay online.
Our platform handles the complexity so you can get up and running
fast.
Learn more about our fully managed and hosted
Apache Cassandra offering and try it for free today!
1 December 2025, 1:15 pm by
ScyllaDB
Results from a benchmark using the yandex-deep_1b dataset,
which contains 1 billion vectors of 96 dimensions.

As
AI-driven applications move from experimentation into real-time
production systems, the expectations placed on vector similarity
search continue to rise dramatically. Teams now need to support
billion-scale datasets, high concurrency, strict p99 latency
budgets, and a level of operational simplicity that reduces
architectural overhead rather than adding to it. ScyllaDB Vector
Search was built with these constraints in mind. It offers a
unified engine for storing structured data alongside unstructured
embeddings, and it achieves performance that pushes the boundaries
of what a managed database system can deliver at scale. The results
of our recent high scale 1-billion-vector benchmark show that
ScyllaDB demonstrates both ultra-low latency and highly predictable
behaviour under load.

Architecture at a Glance

To achieve
low-single-millisecond performance across massive vector sets,
ScyllaDB adopts an architecture that separates the storage and
indexing responsibilities while keeping the system unified from the
user’s perspective. The ScyllaDB nodes store both the structured
attributes and the vector embeddings in the same distributed table.
Meanwhile, a dedicated Vector Store service – implemented in Rust
and powered by the USearch engine optimized to support ScyllaDB’s
predictable single-digit millisecond latencies – consumes updates
from ScyllaDB via CDC and builds approximate-nearest-neighbour
(ANN) indexes in memory. Queries are issued to the database using a
familiar CQL expression such as: SELECT … ORDER BY vector_column
ANN OF ? LIMIT k; They are then internally routed to the Vector
Store, which performs the similarity search and returns the
candidate rows. This design allows each layer to scale
independently, optimising for its own workload characteristics and
eliminating resource interference.
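To make the flow concrete, here is a minimal sketch of mine (not part of the benchmark) showing how such a query might be issued through the Python driver; the keyspace, table, and column names are hypothetical, and the exact ANN ordering clause should be checked against the Vector Search preview documentation:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # hypothetical contact point
session = cluster.connect("vector_demo")  # hypothetical keyspace

# The ANN ordering clause is routed to the Vector Store, which runs the
# similarity search against its in-memory index and returns the k closest rows.
stmt = session.prepare(
    "SELECT id, embedding FROM items ORDER BY embedding ANN OF ? LIMIT 10"
)

query_vector = [0.1] * 96  # 96 dimensions, matching the yandex-deep_1b dataset
for row in session.execute(stmt, (query_vector,)):
    print(row.id)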
Benchmarking 1 Billion Vectors

To evaluate real-world performance, ScyllaDB ran a
rigorous benchmark using the publicly available yandex-deep_1b
dataset, which contains 1 billion vectors of 96 dimensions. The
setup consisted of six nodes: three ScyllaDB nodes running on
i4i.16xlarge instances, each equipped with 64 vCPUs, and three
Vector Store nodes running on r7i.48xlarge instances, each with 192
vCPUs. This hardware configuration reflects realistic production
deployments where the database and vector indexing tiers are
provisioned with different resource profiles. The results focus on
two usage scenarios with distinct accuracy and latency goals
(detailed in the following sections). A full architectural
deep-dive, including diagrams, performance trade-offs, and extended
benchmark results for higher-dimension datasets, can be found in
the technical blog post
Building a Low-Latency Vector Search Engine for
ScyllaDB. These additional results follow the same pattern
seen in the 96-dimensional tests: exceptionally low latency, high
throughput, and stability across a wide range of concurrent load
profiles.

Scenario #1 — Ultra-Low Latency with Moderate Recall

The
first scenario was designed for workloads such as recommendation
engines and real-time personalisation systems, where the primary
objective is extremely low latency and the recall can be moderately
relaxed. We used index parameters m = 16, ef-construction = 128,
ef-search = 64 and Euclidean distance. At approximately 70% recall
and with 30 concurrent searches, the system maintained a p99
latency of only 1.7 milliseconds and a p50 of just 1.2
milliseconds, while sustaining 25,000 queries per second. When
expanding the throughput window (still keeping p99 latency below 10
milliseconds), the cluster reached 60,000 QPS for k = 100 with a
p50 latency of 4.5 milliseconds, and 252,000 QPS for k = 10 with a
p50 latency of 2.2 milliseconds. Importantly, utilizing ScyllaDB’s
predictable performance, this throughput scales linearly: adding
more Vector Store nodes directly increases the achievable QPS
without compromising latency or recall.
Latency and throughput depending on the concurrency level
for recall equal to 70%.

Scenario #2 — High Recall with Slightly Higher Latency

The second scenario targets systems that
require near-perfect recall, including high-fidelity semantic
search and retrieval-augmented generation pipelines. Here, the
index parameters were significantly increased to m = 64,
ef-construction = 512, and ef-search = 512. This configuration
raises compute requirements but dramatically improves recall. With
50 concurrent searches and recall approaching 98%, ScyllaDB kept
p99 latency below 12 milliseconds and p50 around 8 milliseconds
while delivering 6,500 QPS. When shifting the focus to maximum
sustained throughput while keeping p99 latency under 20
milliseconds and p50 under 10 milliseconds, the system achieved
16,600 QPS. Even under these settings, latency remained notably
stable across values of k from 10 to 100, demonstrating predictable
behaviour in environments where query limits vary dynamically.
Latency and throughput depending on the concurrency level
for recall equal to 98%.

Detailed results

The table below presents the summary of the results for some representative concurrency levels.

              Run 1      Run 2      Run 3
m             16         16         64
efconstruct   128        128        512
efsearch      64         64         512
metric        Euclidean  Euclidean  Euclidean
upload        3.5 h      3.5 h      3.5 h
build         4.5 h      4.5 h      24.4 h
p50 [ms]      2.2        1.7        8.2
p99 [ms]      9.9        4          12.3
qps           252,799    225,891    10,206

Unified Vector Search Without the Complexity

A big advantage of integrating
Vector Search with ScyllaDB is that it delivers substantial
performance and networking cost advantages. The vector store
resides close to the data with just a single network hop between
metadata and embedding storage in the same availability zone. This
locality, combined with ScyllaDB’s shard-per-core execution model,
allows the system to provide real-time latency and massive
throughput even under heavy load. The result is that teams can
accomplish more with fewer resources compared to specialised
vector-search systems. In addition to being fast at scale,
ScyllaDB’s Vector Search is also simpler to operate. Its key
advantage is its ability to unify structured and unstructured
retrieval within a single dataset. This means you can store
traditional attributes and vector embeddings side-by-side and
express hybrid queries that combine semantic search with
conventional filters. For example, you can ask the database to
“find the top five most similar documents, but only those belonging
to this specific customer and created within the past 30 days.”
This approach eliminates the common pain of maintaining separate
systems for transactional data and vector search, and it removes
the operational fragility associated with syncing between two
sources of truth. This also means there is no ETL drift and no
dual-write risk. Instead of shipping embeddings to a separate
vector database while keeping metadata in a transactional store,
ScyllaDB consolidates everything into a single system. The only
pipeline you need is the computational step that generates
embeddings using your preferred LLM or ML model. Once written, the
data remains consistent without extra coordination, backfills, or
complex streaming jobs. Operationally, ScyllaDB simplifies the
entire retrieval stack. Because it is built on ScyllaDB’s proven
distributed architecture, the system is highly available,
horizontally scalable, and resilient across availability zones and
regions. Instead of operating two or three different technologies –
each with its own monitoring, security configurations, and failure
modes – you only manage one. This consolidation drastically reduces
operational complexity while simultaneously improving performance.
Public Preview and Roadmap

The vector search feature is currently
offered in public preview, with a clear path toward general
availability and a set of enhancements focused on performance,
flexibility, and developer experience. The GA release is planned at
the beginning of Q1’26 and will include Cloud Portal provisioning,
on-demand billing, a full range of instance types, self-service
scaling and additional performance optimisations. By the end of Q1
we will introduce native filtering capabilities, enabling vector
search queries to combine ANN results with traditional predicates
for more precise hybrid retrieval. Looking further ahead, the
roadmap includes support for scalar and binary quantisation to
reduce memory usage, TTL functionality for lifecycle automation of
vector data, and integrated hybrid search combining ANN with BM25
for unified lexical and semantic relevance.

Conclusion

ScyllaDB has
demonstrated that it is capable of delivering industry-leading
performance for vector search at massive scale, handling a dataset
of 1 billion vectors with p99 latency as low as 1.7 milliseconds
and throughput up to 252,000 QPS. These results validate ScyllaDB
Vector Search as a unified, high-performance solution that
simplifies the operational complexity of real-time AI applications
by co-locating structured data and unstructured embeddings. The
benchmarks showcase the current state of ScyllaDB's
scalability. With planned enhancements in the upcoming roadmap,
including scalar quantization and sharding, these performance
limits are set to increase in the next year. Nevertheless, even
now, the feature is ready for running latency-critical workloads
such as fraud detection or recommendation systems.
24 November 2025, 5:33 pm by
ScyllaDB
How semantic caching can help with costs and latency as you
scale up your AI workload Developers building large-scale
LLM solutions often rely on powerful APIs such as OpenAI’s. This
approach outsources model hosting and inference, allowing teams to
focus on application logic rather than infrastructure. However,
there are two main challenges you might face as you scale up your
AI workload: high costs and high latency. This blog post introduces
semantic caching as a possible solution to these problems. Along
the way, we cover how ScyllaDB can help implement semantic caching.
What is semantic caching?

Semantic caching follows the same
principle as traditional caching: storing data in a system that
allows faster access than your primary source. In conventional
caching solutions, that source is a database. In AI systems, the
source is an LLM.
Here’s a simplified semantic caching workflow: User sends a
question (“What is ScyllaDB?”) Check if this type of question has
been asked before (for example “whats scylladb” or “Tell me about
ScyllaDB”) If yes, deliver the response from cache If no a)Send the
request to LLM and deliver the response from there b) Save the
response to cache Semantic caching stores the meaning of user
queries as vector embeddings and uses vector search to find similar
ones. If there’s a close enough match, it returns the cached result
instead of calling the LLM. The more queries you can serve from the
cache, the more you save on cost and latency over time.
Invalidating data is just as important for semantic caching as it
is for traditional caching. For instance, if you are working with
RAGs (where the underlying base information can change over time),
then you need to invalidate the cache periodically so it returns
accurate information. For example, if the user query is “What’s the
most recent version of ScyllaDB Enterprise,” the answer depends on
when you ask it. The cached response to this question must
be refreshed accordingly (assuming the only context your LLM works
with is the one provided by the cache).

Why use a semantic cache?
Simply put,
semantic caching saves you money and
time. You save money by making fewer LLM calls, and you
save time from faster responses. When a use case involves repeated
or semantically similar queries, and identical responses are
acceptable, semantic caching offers a practical way to reduce both
inference costs and latency. Heavy LLM usage might put you on
OpenAI’s top spenders list. That’s great for OpenAI. But is it
great for you? Sure, you’re using cutting-edge AI and delivering
value to users, but the real question is: can you optimize those
costs? Cost isn’t the only concern. Latency matters too. LLMs
inherently
cannot achieve sub-millisecond response times. But users still
expect instant responses. So how do you bridge that gap? You can
combine LLM APIs with a low-latency database like ScyllaDB to speed
things up. Combining AI models with traditional optimization
techniques is key to meeting strict latency requirements. Semantic
caching helps mitigate these issues by caching LLM responses
associated with the input embeddings. When a new input is received,
its embedding is compared to those stored in the cache. If a
similar-enough embedding is found (based on a defined similarity
threshold), the saved response is returned from the cache. This
way, you can skip the round trip to the LLM provider. This leads to
two major benefits:
Lower latency: No need to wait
for the LLM to generate a new response. Your low-latency database
will always return responses faster than an LLM.
Lower
cost: Cached responses are “free” – no LLM API fees.
Unlike LLM calls, database queries don’t charge you per request or
per token.

Why use ScyllaDB for semantic caching?

From day one,
ScyllaDB has focused on three things: cutting latency, cost, and
operational overhead. All three of those things matter just as much
for LLM apps and semantic caching as they do for “traditional”
applications. Furthermore, ScyllaDB is more than an in-memory
cache. It’s a full-fledged high-performance database with a
built-in caching layer. It offers high availability and strong P99
latency guarantees, making it ideal for
real-time AI
applications. ScyllaDB has recently added a Vector Search
offering, which is essential for building a semantic cache, and
it’s also used for a wide range of AI and LLM-based applications.
For example, it’s quite commonly used as a feature store. In short,
you can consolidate all your AI workloads into a single
high-performance, low-latency database. Now let’s see how you can
implement semantic caching with ScyllaDB.

How to implement semantic caching with ScyllaDB

> If you just want to dive in, clone the repo, and try it yourself, check out the GitHub repository here.

Here's a simplified, general guide on how to implement semantic caching with ScyllaDB (using Python examples):

1. Create a semantic caching schema

First, we create a keyspace, then a table called prompts, which will act as our cache table. It includes the
following columns:
prompt_id: The partition key for the table.
inserted_at: Stores the timestamp when the row was originally inserted (when the response was first cached).
prompt_text: The actual input provided by the user, such as a question or query.
prompt_embedding: The vector embedding representation of the user input.
llm_response: The LLM's response for that prompt, returned from the cache when a similar prompt appears again.
updated_at: Timestamp of when the row was last updated, useful if the underlying data changes and the cached response needs to be refreshed.
Finally, we create an ANN (Approximate Nearest Neighbor) index on the prompt_embedding column to enable fast and efficient vector searches.
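As a rough sketch (assuming a hypothetical semantic_cache keyspace, 384-dimension embeddings to match the example model used below, and the ScyllaDB/Cassandra Python driver), the schema could be created like this; the exact DDL for the ANN index is an assumption here, so check the linked repository for the syntax that matches your ScyllaDB version:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # hypothetical contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS semantic_cache
    WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}
""")
session.set_keyspace("semantic_cache")

session.execute("""
    CREATE TABLE IF NOT EXISTS prompts (
        prompt_id        uuid PRIMARY KEY,
        inserted_at      timestamp,
        updated_at       timestamp,
        prompt_text      text,
        prompt_embedding vector<float, 384>,
        llm_response     text
    )
""")

# ANN index on the embedding column. The DDL below is an assumption;
# see the vector-search-examples repo for the exact syntax.
session.execute("""
    CREATE INDEX IF NOT EXISTS prompts_embedding_idx
    ON prompts (prompt_embedding) USING 'vector_index'
""")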
responses, let’s implement semantic caching in our application
code. 2. Convert user input to vector embedding Take the user’s
text input (which is usually a question or some kind of query) and
convert it into an embedding using your chosen embedding model.
It’s important that the same embedding model is used consistently
for both cached data and new queries. In this example, we’re using
a local embedding model from sentence transformers. In your
application, you might use OpenAI or some other embedding provider
platform.
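A minimal sketch of this step, assuming the sentence-transformers package and an example model (any model works as long as it is used consistently for cached data and new queries):

from sentence_transformers import SentenceTransformer

# Example local model; its 384-dimensional output matches the schema sketch above.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str) -> list[float]:
    # encode() returns a numpy array; convert it to a plain list so it
    # can be bound to the vector column through the driver.
    return model.encode(text).tolist()

query_embedding = embed("What is ScyllaDB?")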
3. Calculate similarity score

Use ScyllaDB Vector Search syntax, `ANN OF`, to find semantically similar entries in the cache.
There are two key components in this part of the application.
Similarity score: You need to calculate the
similarity between the user’s new query and the most similar item
returned by vector search. Cosine similarity, which is the most
frequently used similarity function in LLM-based applications,
ranges from 0 to 1. A similarity of 1 means the embeddings are
identical. A similarity of 0 means they are completely dissimilar.
Threshold: Determines whether the response can be
provided from cache. If the similarity score is above that
threshold, it means the new query is similar enough to one already
stored in the cache, so the cached response can be returned. If it
falls below the threshold, the system should fetch a fresh response
from the LLM. The exact threshold should be tuned experimentally
based on your use case. 4. Implement cache logic Finally, putting
it all together, you need a function that decides whether to serve
a response from the cache or make a request to the LLM. If the user
query matches something similar in the cache, follow the earlier
steps and return the cached response. If it’s not in the cache,
make a request to your LLM provider, such as OpenAI, return that
response to the user, and then store it in the cache. This way, the
next time a similar query comes in, the response can be served
instantly from the cache.
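A sketch of that decision function, building on the earlier snippets (call_llm is a placeholder for your OpenAI or other provider client):

import uuid
from datetime import datetime, timezone

insert_stmt = session.prepare(
    "INSERT INTO prompts "
    "(prompt_id, inserted_at, updated_at, prompt_text, prompt_embedding, llm_response) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)

def answer(user_query: str) -> str:
    embedding = embed(user_query)

    # 1. Try the semantic cache first.
    cached = lookup_cache(embedding)
    if cached is not None:
        return cached

    # 2. Cache miss: call the LLM provider (placeholder function).
    response = call_llm(user_query)

    # 3. Store the response so similar queries are served from cache next time.
    now = datetime.now(timezone.utc)
    session.execute(insert_stmt, (uuid.uuid4(), now, now, user_query, embedding, response))
    return response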
Get started!

Get started building with ScyllaDB; check out our examples on GitHub:

git clone https://github.com/scylladb/vector-search-examples.git
18 November 2025, 3:41 pm by
ScyllaDB
Learn about Task Manager, which provides a unified way to
observe and control ScyllaDB’s background maintenance work
In each ScyllaDB cluster, there are a lot of background processes
that help maintain data consistency, durability, and performance in
a distributed environment. For instance, such operations include
compaction (which cleans up on-disk data files) and repair (which
ensures data consistency in a cluster). These operations are
critical for preserving cluster health and integrity. However, some
processes can be long-running and resource-intensive. Given that
ScyllaDB is used for latency-sensitive database workloads, it’s
important to monitor and track these operations. That’s where
ScyllaDB’s Task Manager comes in. Task Manager allows
administrators of self-managed ScyllaDB to see all running
operations, manage them, or get detailed information about a
specific operation. And beyond being a monitoring tool, Task
Manager also provides a unified way to manage asynchronous
operations. How Task Manager Organizes and Tracks Operations Task
Manager adds structure and visibility into ScyllaDB’s background
work. It groups related maintenance activities into modules,
represents them as hierarchical task trees, and tracks their
lifecycle from creation through completion. The following sections
explain how operations are organized, retained, and monitored at
both node and cluster levels. Supported Operations Task Manager
supports the following operations: Local: Compaction; Repair;
Streaming; Backup; Restore. Global: Tablet repair; Tablet
migration; Tablet split and merge; Node operations: bootstrap,
replace, rebuild, remove node, decommission. Reviewing
Active/Completed Tasks Task Manager is divided into
modules: the entities that gather information
about operations of similar functionality. Task Manager captures
and exposes this data using
tasks. Each task
covers an operation or its part (e.g., a task can represent the
part of the repair operation running on a specific shard).
Each operation is represented by a tree of tasks. The tree root
covers the whole operation. The root may have children, which give
more fine-grained control over the operation. The children may have
their own children, etc. Let’s consider the example of a global
major compaction task tree: The root covers the compaction of all
keyspaces in a node; The children of the root task cover a single
keyspace; The second-degree descendants of the root task cover a
single keyspace on a single shard; The third-degree descendants of
the root task cover a single table on a single shard; etc. You can
inspect a task from each depth to see details on the operation’s
progress. Determining How Long Tasks Are Shown Task Manager
can show completed tasks as well as running ones. The completed
tasks are removed from Task Manager after some time. To customize
how long a task’s status is preserved, modify task_ttl_in_seconds
(aka task_ttl) and user_task_ttl_in_seconds (aka user_task_ttl)
configuration parameters. Task_ttl applies to operations that are
started internally, while user_task_ttl refers to those initiated
by the user. When the user starts an operation, the root of the
task tree is a user-task. Descendant tasks are internal and such
tasks are unregistered after they finish, propagating their status
to their parents. Node Tasks vs Cluster Tasks Task Manager tracks
operations local to a node as well as global cluster-wide
operations.
A local task is created on a node that
the respective operation runs on. Its status may be requested only
from a node on which the task was created.
A global
task always covers the whole operation. It is the root of
a task tree and it may have local children. A global task is
reachable from each node in a cluster. Task_ttl and user_task_ttl
are not relevant for global tasks. Per-Task Details When you list
all tasks in a Task Manager module, it shows brief information
about them with task_stats. Each task has a unique task_id and
sequence_number that’s unique within its module. All tasks in a
task tree share the same sequence_number. Task stats also include
several descriptive attributes: kind: either “node” (a local
operation) or “cluster” (a global one). type: what specific
operation this task involves (e.g., “major compaction” or
“intranode migration”). scope: the level of granularity (e.g.,
“keyspace” or “tablet”). Additional attributes such as shard,
keyspace, table, and entity can further specify the scope. Status
fields summarize the task's state and timing: state: indicates whether
the task was created, running, done, failed, or suspended.
start_time and end_time: indicate when the task began and finished.
If a task is still running, its end_time is set to epoch. When you
request a specific task’s status, you’ll see more detailed metrics:
progress_total and progress_completed show how much work is done,
measured in progress_units. parent_id and children_ids place the
task within its tree hierarchy. is_abortable indicates whether the
task can be stopped before completion. If the task failed, you will
also see the exact error message. Interacting with Task Manager
Task Manager provides a REST API for listing, monitoring, and
controlling ScyllaDB’s background operations. You can also use it
to manage the execution of long-running maintenance tasks started
with the asynchronous API instead of blocking a client call. If you
prefer command-line tools, the same functionality is available
through nodetool tasks.
Using the Task Management API
Task Manager exposes a REST API that lets you manage tasks:
GET /task_manager/list_modules – lists all supported Task Manager modules.
GET /task_manager/list_module_tasks/{module} – lists all tasks in a specified module.
GET /task_manager/task_status/{task_id} – shows the detailed status of a specified task.
GET /task_manager/wait_task/{task_id} – waits for a specified task and shows its status.
POST /task_manager/abort_task/{task_id} – aborts a specified task.
GET /task_manager/task_status_recursive/{task_id} – gets statuses of a specified task and all its descendants.
GET/POST /task_manager/ttl – gets/sets task_ttl.
GET/POST /task_manager/user_ttl – gets/sets user_task_ttl.
POST /task_manager/drain/{module} – drains the finished tasks in a specified module.
Running Maintenance Tasks Asynchronously
Some ScyllaDB maintenance operations can take a
while to complete, especially at scale. Waiting for them to finish
through a synchronous API call isn’t always practical. Thanks to
Task Manager, existing synchronous APIs are easily and consistently
converted into asynchronous ones. Instead of waiting for an
operation to finish, a new API can immediately return the ID of the
root task representing the started operation. Using this task_id,
you can check the operation’s progress, wait for completion, or
abort it if needed. This gives you a unified and consistent way to
manage all those long-running tasks. Nodetool A task can be managed
using nodetool’s
tasks command. For details, see
the related
nodetool docs page. Example: Tracking and Managing Tasks
Preparation To start, we locally set up a cluster of three nodes
with the IP addresses 127.43.0.1, 127.43.0.2, and 127.43.0.3. Next,
we create two keyspaces: keyspace1 with replication factor 3 and
keyspace2 with replication factor 2. In each keyspace, we create 2
tables: table1 and table2 in keyspace1, and table3 and table4 in
keyspace2. We populate them with data. Exploring Task Manager Let’s
start by listing the modules supported by Task Manager: nodetool
tasks modules -h 127.43.0.1
["sstables_loader","node_ops","tablets","repair","snapshot","compaction"]
Starting and Tracking a Repair Task We request a tablet repair on
all tokens of table keyspace2.table3. curl -X POST --header
'Content-Type: application/json' --header 'Accept:
application/json'
'http://127.43.0.3:10000/storage_service/tablets/repair?ks=keyspace2&table=table3&tokens=all'
{"tablet_task_id":"2f06bff0-ab45-11f0-94c2-60ca5d6b2927"} In
response, we get the task id of the respective tablet repair task.
We can use it to track the progress of the repair. Let’s check
whether the task with id 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 will
be listed in a tablets module. nodetool tasks list tablets -h
127.43.0.1
[{"task_id":"88a7ceb0-ab44-11f0-9016-68b61792a9a7","state":"running","type":"intranode_migration","kind":"cluster","scope":"tablet","keyspace":"keyspace1","table":"table1","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:32:08Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"2f06bff0-ab45-11f0-94c2-60ca5d6b2927","state":"running","type":"user_repair","kind":"cluster","scope":"table","keyspace":"keyspace2","table":"table3","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:36:47Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"88ac6290-ab44-11f0-9016-68b61792a9a7","state":"running","type":"intranode_migration","kind":"cluster","scope":"tablet","keyspace":"keyspace2","table":"table4","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:32:08Z","end_time":"1970-01-01T00:00:00Z"}]
Apart from the repair task, we can see that there are two intranode
migrations running. All the tasks are of type “cluster”, which
means that they cover the global operations. All these tasks would
be visible regardless of which node we request them from. We can
also see the scope of the operations. We always migrate one tablet
at a time, so the migration tasks’ scope is “tablet”. For repair,
the scope is “table” because we previously started the operation on
a whole table. Entity, sequence_number, and shard are irrelevant
for global tasks. Since all tasks are running, their end_time is
set to a default value (epoch). Examining Task Status Let’s examine
the status of the tablet repair using its task_id. Global tasks are
available on the whole cluster, so we change the requested node…
just because we can. 😉 nodetool tasks status
2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.3 {"id":
"2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "type": "user_repair",
"kind": "cluster", "scope": "table", "state": "running",
"is_abortable": true, "start_time": "2025-10-17T10:36:47Z",
"end_time": "1970-01-01T00:00:00Z", "error": "", "parent_id":
"none", "sequence_number": 0, "shard": 0, "keyspace": "keyspace2",
"table": "table3", "entity": "", "progress_units": "",
"progress_total": 0, "progress_completed": 0, "children_ids":
[{"task_id": "52b5bff5-467f-4f4c-a280-95e99adde2b6", "node":
"127.43.0.1"},{"task_id": "1eb69569-c19d-481e-a5e6-0c433a5745ae",
"node": "127.43.0.2"},{"task_id":
"70d098c4-df79-4ea2-8a5e-6d7386d8d941", "node": "127.43.0.3"},...]}
The task status contains detailed information about the tablet
repair task. We can see whether the task is abortable (via
task_manager API). There could also be some additional information
that’s not applicable for this particular task : error, which would
be set if the task failed; parent_id, which would be set if it had
a parent (impossible for a global task); progress_unit,
progress_total, progress_completed, which would indicate task
progress (not yet supported for tablet repair tasks). There’s also
a list of tasks that were created as a part of the global task. The
list above has been shortened to improve readability. The key point
is that children of a global task may be created on all nodes in a
cluster. Those children are local tasks (because global tasks
cannot have a parent). Thus, they are reachable only from the nodes
where they were created. For example, the status of a task
1eb69569-c19d-481e-a5e6-0c433a5745ae should be requested from node
127.43.0.2. nodetool tasks status
1eb69569-c19d-481e-a5e6-0c433a5745ae -h 127.43.0.2 {"id":
"1eb69569-c19d-481e-a5e6-0c433a5745ae", "type": "repair", "kind":
"node", "scope": "keyspace", "state": "done", "is_abortable": true,
"start_time": "2025-10-17T10:36:48Z", "end_time":
"2025-10-17T10:36:48Z", "error": "", "parent_id":
"2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "sequence_number": 15,
"shard": 0, "keyspace": "keyspace2", "table": "", "entity": "",
"progress_units": "ranges", "progress_total": 1,
"progress_completed": 1, "children_ids": [{"task_id":
"52dedd00-7960-482c-85a1-9114131348c3", "node": "127.43.0.2"}]} As
expected, the child’s kind is “node”. Its parent_id references the
tablet repair task’s task_id. The task has completed successfully,
as indicated by the state. The end_time of a task is set. Its
sequence_number is 15, which means it is the 15th task in its
module. The task’s scope is wider than the parent’s. It could
encompass the whole keyspace, but – in this case – it is limited to
the parent’s scope. The task’s progress is measured in ranges, and
we can see that exactly one range was repaired. This task has one
child that is created on the same node as its parent. That’s always
true for local tasks. nodetool tasks status
70d098c4-df79-4ea2-8a5e-6d7386d8d941 -h 127.43.0.3 {"id":
"70d098c4-df79-4ea2-8a5e-6d7386d8d941", "type": "repair", "kind":
"node", "scope": "keyspace", "state": "done", "is_abortable": true,
"start_time": "2025-10-17T10:37:49Z", "end_time":
"2025-10-17T10:37:49Z", "error": "", "parent_id":
"2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "sequence_number": 25,
"shard": 0, "keyspace": "keyspace2", "table": "", "entity": "",
"progress_units": "ranges", "progress_total": 1,
"progress_completed": 1, "children_ids": [{"task_id":
"20e95420-9f03-4cca-b069-6f16bd23dd14", "node": "127.43.0.3"}]} We
may examine other children of the global tablet repair task too.
However, we may only check each one on the node where it was
created. Let’s wait until the global task is completed. nodetool
tasks wait 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.2
{"id": "2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "type":
"user_repair", "kind": "cluster", "scope": "table", "state":
"done", "is_abortable": true, "start_time": "2025-10-17T10:36:47Z",
"end_time": "2025-10-17T10:47:30Z", "error": "", "parent_id":
"none", "sequence_number": 0, "shard": 0, "keyspace": "keyspace2",
"table": "table3", "entity": "", "progress_units": "",
"progress_total": 0, "progress_completed": 0, "children_ids":
[{"task_id": "52b5bff5-467f-4f4c-a280-95e99adde2b6", "node":
"127.43.0.1"},{"task_id": "1eb69569-c19d-481e-a5e6-0c433a5745ae",
"node": "127.43.0.2"},{"task_id":
"70d098c4-df79-4ea2-8a5e-6d7386d8d941", "node": "127.43.0.3"},...]}
We can see that its state is “done” and its end_time is set.
Working with Compaction Tasks Let’s start some compactions and have
a look at the compaction module. nodetool tasks list compaction -h
127.43.0.2
[{"task_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","state":"running","type":"major
compaction","kind":"node","scope":"keyspace","keyspace":"keyspace1","table":"","entity":"","sequence_number":685,"shard":1,"start_time":"2025-10-17T11:00:01Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"0861e058-349e-41e1-9f4f-f9c3d90fcd8c","state":"done","type":"major
compaction","kind":"node","scope":"keyspace","keyspace":"keyspace1","table":"","entity":"","sequence_number":671,"shard":1,"start_time":"2025-10-17T10:50:58Z","end_time":"2025-10-17T10:50:58Z"}]
We can see that one of the major compaction tasks is still running.
Let’s abort it and check its task tree. nodetool tasks abort
16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2 nodetool tasks
tree 16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2
[{"id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","type":"major
compaction","kind":"node","scope":"keyspace","state":"failed","is_abortable":true,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:01:14Z","error":"
seastar::abort_requested_exception (abort
requested)","parent_id":"none","sequence_number":685,"shard":1,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":208,"progress_completed":206,"children_ids":[{"task_id":"9764694a-cb44-4405-b653-95a6c8cebf45","node":"127.43.0.2"},{"task_id":"b6949bc8-0489-48e0-9325-16c6411d0fcc","node":"127.43.0.2"}]},
{"id":"9764694a-cb44-4405-b653-95a6c8cebf45","type":"major
compaction","kind":"node","scope":"shard","state":"done","is_abortable":false,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:00:01Z","error":"","parent_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","sequence_number":685,"shard":1,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":0,"progress_completed":0},
{"id":"b6949bc8-0489-48e0-9325-16c6411d0fcc","type":"major
compaction","kind":"node","scope":"shard","state":"failed","is_abortable":false,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:01:14Z","error":"seastar::abort_requested_exception
(abort
requested)","parent_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","sequence_number":685,"shard":0,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":208,"progress_completed":206}]
We can see that the abort request propagated to one of the task’s
children and aborted it. That task now has a failed state and its
error field contains abort_requested_exception. Managing
Asynchronous Operations Beyond examining the running operations,
Task Manager can manage asynchronous operations started with the
REST API. For example, we may start a major compaction of a
keyspace synchronously with
/storage_service/keyspace_compaction/{keyspace} or use an
asynchronous version of this API: curl -X POST --header
'Content-Type: application/json' --header 'Accept:
application/json'
'http://127.43.0.1:10000/tasks/compaction/keyspace_compaction/keyspace2'
"4c6f3dd4-56dc-4242-ad6a-8be032593a02" The response includes the
task_id of the operation we just started. This id may be used in
Task Manager to track the progress, wait for the operation, or
abort it.
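As a small illustration (my own sketch using Python's requests library, not from the post), the same endpoints can drive the whole start-track-wait cycle programmatically:

import requests

NODE = "http://127.43.0.1:10000"  # node address from the examples above

# Start the asynchronous keyspace compaction; the response body is the root task id.
task_id = requests.post(f"{NODE}/tasks/compaction/keyspace_compaction/keyspace2").json()

# Check progress without blocking...
status = requests.get(f"{NODE}/task_manager/task_status/{task_id}").json()
print(status["state"], status.get("progress_completed"), "/", status.get("progress_total"))

# ...or block until the task finishes (wait_task returns the final status).
final = requests.get(f"{NODE}/task_manager/wait_task/{task_id}").json()
print(final["state"], final["end_time"])

# Aborting is also possible while the task is still running:
# requests.post(f"{NODE}/task_manager/abort_task/{task_id}")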
Key Takeaways

The Task Manager provides a clear, unified way to observe and control background maintenance work in ScyllaDB.
Visibility: It shows detailed, hierarchical
information about ongoing and completed operations, from
cluster-level tasks down to individual shards.
Consistency: You can use the same mechanisms for
listing, tracking, and managing all asynchronous operations.
Control: You can check progress, wait for
completion, or abort tasks directly, without guessing what’s
running.
Extensibility: It also provides a
framework for turning synchronous APIs into asynchronous ones by
returning task IDs that can be monitored or managed through the
Task Manager. Together, these capabilities make it easier to see
what ScyllaDB is doing, keep the system stable, and convert
long-running operations to asynchronous workflows.
13 November 2025, 10:28 pm by
ScyllaDB
DynamoDB and ScyllaDB share many similarities, but DynamoDB
is a multi-tenant database, while ScyllaDB is
single-tenant The recent
DynamoDB outage is a
stark reminder that even the most reliable and mature cloud
services can experience downtime. Amazon DynamoDB remains a strong
and proven choice for many workloads, and many teams are satisfied
with its latency and cost. However, incidents like this highlight
the importance of architecture, control, and flexibility when
building for resilience. DynamoDB and ScyllaDB share many
similarities: Both are distributed NoSQL databases with the same
“ancestor”: the
Dynamo paper (although both databases have significantly
evolved from the original concept). A compatible API: The DynamoDB
API is one of two supported APIs in ScyllaDB Cloud. Both use
multi-zone deployment for higher HA. Both support multi-region
deployment. DynamoDB uses Global Tablets (See
this analysis for more). ScyllaDB can go beyond and allow
multi-cloud deployments, or on-prem / hybrid deployments. But they
also have a major difference: DynamoDB is a multi-tenant database,
while ScyllaDB is single-tenant.
Source:
https://blog.bytebytego.com/p/a-deep-dive-into-amazon-dynamodb
Multi-tenancy has notable advantages for the vendor:
Lower
infrastructure cost: Since tenants’ peaks don’t align, the
vendor can provision for the aggregate average rather than the sum
of all peaks, and even safely over-subscribe resources.
Shared burst capacity: Extra capacity for traffic
spikes is pooled across all users. Multi-tenancy also comes with
significant technical challenges and is never perfect. All users
still share the same underlying resources (CPU, storage, and
network) while the service works hard to preserve the illusion of a
dedicated environment for each tenant (e.g., using various
isolation mechanisms). However, sometimes the isolation breaks and
the real architecture behind the curtain is revealed. One example
is the
Noisy Neighbor issue. Another is that when a shared resource
breaks, like the DNS endpoint in the latest DynamoDB outage, MANY
users are affected. In this case, all DynamoDB users in a region
suffer. ScyllaDB Cloud takes a different approach: all database
resources are completely separated from each other. Each ScyllaDB database is running:
On dedicated VMs
On a dedicated VPC
In a dedicated Security Group
Using a dedicated endpoint and (an optional) dedicated Private Link
With isolated authorization and authentication (per database)
With dedicated Monitoring and Administration (ScyllaDB Manager) servers
When using ScyllaDB Cloud Bring Your Own Account (
BYOA),
the entire deployment is running on the *user* account, often on a
dedicated sub-account. This provides additional isolation. The
ScyllaDB Cloud control plane is loosely coupled to the managed
databases. Even in the case of a disconnect, the database clusters
will continue to serve requests. This design greatly reduces the
blast radius of any one issue.
While the single-tenant architecture is more resilient, it
does come with a few challenges:
Scaling: To
scale, ScyllaDB needs to allocate new resources (nodes) from EC2,
and depend on the EC2 API to allocate them. Tablets and
X Cloud
have made a great improvement in reducing scaling time.
Workload Isolation: ScyllaDB allows users to
control the resource bandwidth per workload with Workload
Prioritization (
docs
|
tech talk |
demo)
Pricing: Using numerous optimization
techniques, like
shard-per-core, ScyllaDB achieves extreme performance per node,
which allows us to provide lower prices than DynamoDB for most use
cases. To conclude: DynamoDB optimizes for multi-tenancy, whereas
ScyllaDB favors stronger tenant isolation and a smaller blast
radius.
12 November 2025, 1:22 pm by
ScyllaDB
Lessons learned comparing Memcached with ScyllaDB
Although caches and databases are different animals, databases have
always cached data and caches started to use disks, extending
beyond RAM. If an in-memory cache can rely on flash storage, can a
persistent database also function as a cache? And how far can you
reasonably push each beyond its original intent, given the power
and constraints of its underlying architecture? A little while ago,
I joined forces with Memcached maintainer Alan Kasindorf (a.k.a.
dormando) to explore these questions. The collaboration began with
the goal of an “apples to oranges” benchmark comparing ScyllaDB
with Memcached, which is covered in the article “
We
Compared ScyllaDB and Memcached and… We Lost?” A few months
later, we were pleasantly surprised that the stars aligned for
P99 CONF. At the last minute,
Kasindorf was able to join us to chat about the project –
specifically, what it all means for developers with
performance-sensitive use cases. Note: P99 CONF is a highly
technical conference on performance and low-latency engineering. We
just wrapped P99 CONF 2025, and you can watch the core sessions
on-demand.
Watch on demand
Cache Efficiency

Which data store uses memory more efficiently? To
test it, we ran a simple key-value workload on both systems. The
results:
Memcached cached 101 million items before evictions began
ScyllaDB cached only 61 million items before evictions

Cache efficiency comparison

What's behind the difference?
ScyllaDB also has its own LRU (Least Recently Used) cache,
bypassing the Linux cache. But unlike Memcached, ScyllaDB supports
a wide-column data representation: A single key may contain many
rows. This, along with additional protocol overhead, causes a
single write in ScyllaDB to consume more space than a write in
Memcached. Drilling down into the differences, Memcached has very
little per-item overhead. In the example from the image above, each
stored item consumes either 48 or 56 bytes, depending on whether
compare and swap (CAS) is enabled. In contrast, ScyllaDB has to
handle a lot more (it’s a persistent database after all!). It needs
to allocate space for its memtables, Bloom filters and SSTable
summaries so it can efficiently retrieve data from disk when a
cache miss occurs. On top of that, ScyllaDB supports a much richer
data model (wide-column). Another notable architectural difference
stands out on the performance front: Memcached is optimized for
pipelined requests (think batching, as in DynamoDB’s BatchGetItem),
considerably reducing the number of roundtrips over the network to
retrieve several keys. ScyllaDB is optimized for single (and
contiguous) key retrievals under a wide-column representation.
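To make that contrast concrete, here is a small sketch of the two access patterns using the pymemcache and cassandra-driver Python clients. The hosts, key names, and the events table are hypothetical; the point is simply that Memcached fetches many unrelated keys in one pipelined round trip, while ScyllaDB fetches many rows stored under a single partition key.

```python
# Illustrative access patterns (hypothetical hosts, keys, and schema).
from pymemcache.client.base import Client
from cassandra.cluster import Cluster

# Memcached: one pipelined request retrieves many unrelated keys at once.
mc = Client(("memcached-host", 11211))
values = mc.get_many([f"user:{i}" for i in range(100)])   # single round trip

# ScyllaDB: a wide-column read returns many rows under one partition key.
# Assumes a table like:
#   CREATE TABLE ks.events (user_id int, ts timestamp, payload blob,
#                           PRIMARY KEY (user_id, ts));
session = Cluster(["scylla-host"]).connect("ks")
rows = session.execute(
    "SELECT ts, payload FROM events WHERE user_id = %s LIMIT 100", (42,)
)
for row in rows:
    print(row.ts, len(row.payload))
```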
Read-only in-memory efficiency comparison
Following each system’s ideal data model, both ScyllaDB and Memcached managed to saturate the available network throughput, servicing around 3 million rows/s while sustaining P99 latencies below single-digit milliseconds.
Disks and I/O Efficiency
Next, the focus shifted to
disks. We measured performance under different payload sizes, as
well as how efficiently each of the systems could maximize the
underlying storage. With Extstore and small (1 KB) payloads,
Memcached stored about 11 times more items (compared to its
in-memory workload) before evictions started to kick in, leaving a
significant portion of disk space unused. This happens
because, in addition to the regular per-key overhead, Memcached
stores an additional 12 bytes per item in RAM as a pointer to
storage. As RAM gets depleted, Extstore is no longer effective and
users will no longer observe savings beyond that point.
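As a rough back-of-the-envelope check (assuming the 48-56 bytes of per-item overhead mentioned earlier plus the 12-byte Extstore pointer, and ignoring the key bytes themselves), you can estimate how many Extstore-backed items fit per gigabyte of RAM:

```python
# Back-of-the-envelope estimate of Extstore's RAM cost per item.
# Assumes ~56 bytes of per-item overhead (CAS enabled) plus the 12-byte pointer
# into flash; real deployments also spend RAM on the key bytes themselves.
ITEM_OVERHEAD = 56       # bytes kept in RAM per item (with CAS)
EXTSTORE_POINTER = 12    # bytes kept in RAM per item, pointing to flash
GIB = 1024 ** 3

items_per_gib = GIB // (ITEM_OVERHEAD + EXTSTORE_POINTER)
print(f"~{items_per_gib / 1e6:.1f} million items per GiB of RAM")  # ~15.8 million
# So with very small values, RAM (not disk) becomes the limiting factor,
# which is why the savings taper off once RAM is depleted.
```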
Disk performance with small payloads comparison
For the actual performance tests, we stressed Extstore against item sizes of 1KB and 8KB. The table below summarizes the results:
Test Type             Payload Size  I/O Threads  GET Rate  P99 Latency
perfrun_metaget_pipe  1KB           32           188K/s    4~5 ms
perfrun_metaget       1KB           32           182K/s    <1 ms
perfrun_metaget_pipe  1KB           64           261K/s    5~6 ms
perfrun_metaget       1KB           64           256K/s    1~2 ms
perfrun_metaget_pipe  8KB           16           92K/s     5~6 ms
perfrun_metaget       8KB           16           90K/s     <1 ms
perfrun_metaget_pipe  8KB           32           110K/s    3~4 ms
perfrun_metaget       8KB           32           105K/s    <1 ms
We populated ScyllaDB
with the same number of items as we used for Memcached. ScyllaDB
actually achieved higher throughput – and just slightly higher
latency – than Extstore. I’m pretty sure that if the throughput had
been reduced, the latency would have been lower. But even with no
tuning, the performance is quite comparable. This is summarized
below:
Test Type  Payload Size  GET Rate   Server-Side P99  Client-Side P99
1KB Read   1KB           268.8K/s   2 ms             2.4 ms
8KB Read   8KB           156.8K/s   1.54 ms          1.9 ms
A few notable points from
these tests:
Extstore required considerable tuning to fully saturate flash storage I/O.
Due to Memcached’s architecture, smaller payloads cannot fully use the available disk space, providing smaller gains compared to ScyllaDB.
ScyllaDB’s rates were overall higher than Memcached’s in a key-value orientation, especially at larger payload sizes.
ScyllaDB’s latencies were lower than Memcached’s pipelined requests, but slightly higher than individual GETs.
I/O Access Methods Discussion
These disk-focused tests
unsurprisingly sparked a discussion about the different I/O access
methods used by ScyllaDB vs. Memcached/Extstore. I explained that
ScyllaDB uses asynchronous direct I/O. For an extensive discussion
of this, read
this
blog post by ScyllaDB CTO and cofounder Avi Kivity. Here’s the
short version: ScyllaDB is a persistent database. When people adopt
a database, they rightfully expect that it will persist their data.
So, direct I/O is a deliberate choice. It bypasses the kernel page
cache, giving ScyllaDB full control over disk operations. This is
critical for things like compactions, write-ahead logs and
efficiently reading data off disk. A user-space I/O scheduler is
also involved. It lives in the middle and decides which operation
gets how much I/O bandwidth. That could be an internal compaction
task or a user-facing query. It arbitrates between them. That’s
what enables ScyllaDB to balance persistence work with
latency-sensitive operations. Extstore takes a very different approach: keep things as simple as possible and avoid touching the disk unless it’s absolutely necessary. As Kasindorf put it: “We do almost nothing.” That’s fully intentional. Most operations (deletes, TTL updates, overwrites) can happen entirely in memory, with no disk access needed, so Extstore doesn’t bother with a scheduler. Without a scheduler, Extstore performance
tuning is manual. You can change the number of Extstore I/O threads
to get better utilization. If you roll it out and notice that your
disk doesn’t look fully utilized – and you still have a lot of
spare CPU – you can bump up the thread count. Kasindorf mentioned
that it will likely become self-tuning at some point. But for now,
it’s a knob that users can tweak. Another important piece is how
Extstore layers itself on top of Memcached’s existing RAM cache.
It’s not a replacement; it’s additive. You still have your
in-memory cache and Extstore just handles the overflow. Here’s how
Kasindorf explained it: “If you have, say, five gigs of RAM and one
gig of that is dedicated to these small pointers that point from
memory into disk, we still have a couple extra gigs left over for
RAM cache.” That means if a user is actively clicking around, their
data may never even go to disk. The only time Extstore might need
to read from disk is when the cache has gone cold (for instance, a
user returning the next day). Then the entries get pulled back in.
Basically, while ScyllaDB builds around persistent,
high-performance disk I/O (with scheduling, direct control and
durable storage), Extstore is almost the opposite. It’s light,
minimal and tries to avoid disk entirely unless it really has to.
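To make the direct I/O point above a bit more tangible, here is a minimal, Linux-only sketch of a read that bypasses the kernel page cache with O_DIRECT. ScyllaDB itself does this asynchronously through Seastar, with its own I/O scheduler on top, so this is only the simplest possible illustration of the idea; the file path is hypothetical.

```python
# Minimal sketch of a direct (page-cache-bypassing) read on Linux.
# O_DIRECT requires block-aligned offsets, lengths, and buffers, so we use an
# anonymous mmap (which is page-aligned) as the read buffer.
import os
import mmap

PATH = "/var/lib/scylla/data/example-file"   # hypothetical path
BLOCK = 4096

fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
buf = mmap.mmap(-1, BLOCK)                    # page-aligned buffer
nread = os.preadv(fd, [buf], 0)               # read the first block straight from disk
os.close(fd)
print(f"read {nread} bytes, bypassing the page cache")
```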
Conclusion and Takeaways
Across these and the other tests that we
performed in
the
full benchmark, Memcached and ScyllaDB both managed to maximize
the underlying hardware utilization and keep latencies predictably
low. So which one should you pick? The real answer: It depends. If
your existing workload can accommodate a simple key-value model and
it benefits from pipelining, then Memcached should be more suitable
for your needs. On the other hand, if the workload requires support
for complex data models, then ScyllaDB is likely a better fit.
Another reason for sticking with Memcached: It easily delivers
traffic far beyond what a network interface card can sustain. In
fact, in this
Hacker News
thread, dormando mentioned that he could scale it up past 55
million read ops/sec on a considerably larger server. Given that,
you could make use of smaller and/or cheaper instance types to
sustain a similar workload, provided the available memory and disk
footprint meet your workload needs. A different angle to consider
is the data set size. Even though Extstore provides great cost
savings by allowing you to store items beyond RAM, there’s a limit
to how many keys can fit per gigabyte of memory. Workloads with
very small items should observe smaller gains compared to those
with larger items. That’s not the case with ScyllaDB, which allows
you to store billions of items irrespective of their sizes. It’s
also important to consider whether data persistence is required. If it is, then running ScyllaDB as a replicated distributed cache provides greater resilience and non-stop operations, with the tradeoff being (and as Memcached correctly states) that replication halves your effective cache size. Unfortunately, Extstore doesn’t support warm restarts, so the failure or maintenance of a single node is likely to elevate your cache miss ratio. Whether this is acceptable depends on your application semantics: if a cache miss corresponds to a round-trip to the database, then the end-to-end latency will be momentarily higher (the cache-aside sketch at the end of this post illustrates that fallback path). Regardless of
whether you choose a cache like Memcached or a database like
ScyllaDB, I hope this work inspires you to think differently about
performance testing. As we’ve seen, databases and caches are
fundamentally different. And at the end of the day, just comparing
performance numbers isn’t enough. Moreover, recognize that it’s
hard to fully represent your system’s reality with simple
benchmarks, and every optimization comes with some trade-offs. For
example, pipelining is great, but as we saw with Extstore, it can
easily introduce I/O contention. ScyllaDB’s shard-per-core model
and support for complex data models are also powerful, but they
come with costs too, like losing some pipelining flexibility and
adding memory overhead.
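Finally, since several of the takeaways above hinge on what happens when a cache miss falls through to the database, here is a minimal cache-aside sketch combining the two systems. The hosts, keyspace, table, and TTL are hypothetical; the shape of the pattern is the point.

```python
# Illustrative cache-aside pattern: serve from Memcached when possible, fall back
# to ScyllaDB on a miss, then repopulate the cache. All names are hypothetical.
from typing import Optional
from pymemcache.client.base import Client
from cassandra.cluster import Cluster

cache = Client(("memcached-host", 11211))
session = Cluster(["scylla-host"]).connect("ks")

def get_profile(user_id: int) -> Optional[bytes]:
    key = f"profile:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value                            # cache hit: no database round trip
    # Cache miss (e.g., a cold cache after a node restart): go to the database.
    row = session.execute(
        "SELECT payload FROM profiles WHERE user_id = %s", (user_id,)
    ).one()
    if row is None:
        return None
    cache.set(key, row.payload, expire=3600)    # repopulate for the next reader
    return row.payload
```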
4 November 2025, 1:49 pm by
ScyllaDB
Learn about ScyllaDB’s new native backup, which improves backup speed up to 11X by using Seastar’s CPU and I/O scheduling.
ScyllaDB’s 2025.3 release introduces native
backup functionality. Previously, an external process managed
backups independently, without visibility into ScyllaDB’s internal
workload. Now,
Seastar’s CPU and I/O
schedulers handle backups internally, which gives ScyllaDB full
control over prioritization and resource usage. In this blog post,
we explain why we changed our approach to backup, share what users
need to know, and provide a preview of what to expect next.
What We Changed and Why
Previously, SSTable backups to S3 were managed
entirely by ScyllaDB Manager and the
Scylla Manager Agent running on each node. You would schedule
the backup, and Manager would coordinate the required operations
(taking snapshots, collecting metadata, and orchestrating uploads).
Scylla Manager Agent handled all the actual data movement. The
problem with this approach was that it was often too slow for our
users’ liking, especially at the massive scale that’s common across
our user base. Since uploads ran through an external process, they
competed with ScyllaDB for resources (CPU, disk I/O, and network
bandwidth). The rclone process read from disk at the same time that ScyllaDB did – so two processes on the same node were performing heavy disk I/O simultaneously. This disk contention could impact query latencies when user requests were being processed during a backup. To mitigate the effect on real-time database requests, we used a systemd slice to control Scylla Manager Agent resources. That successfully capped backup bandwidth, but it could not raise the bandwidth when pressure from online requests was low. To optimize this process, ScyllaDB now provides a native
backup capability. Rather than relying on an external agent
(ScyllaDB Manager) to copy files, ScyllaDB uploads files directly
to S3. The new approach is faster and more efficient because
ScyllaDB uses its internal I/O and CPU scheduling to control the
backup operations. Backup operations are assigned a lower priority
than user queries. In the event of resource contention, ScyllaDB
will deprioritize them so they don’t interfere with the latency of
the actual workload. Note that this new native backup capability is
currently available for AWS. It is coming soon for other backup
targets (such as GCP Cloud Storage and Azure Storage). To enable
native backup, configure the
S3 connectivity in each node’s scylla.yaml and set the desired
strategy (Native, Auto, or Rclone) in ScyllaDB Manager. Note that
the rclone agent is always used to upload backup metadata, so you
should still configure the Manager Agent even if you are using
native backup and restore.
Performance Improvements
So how much faster is the new backup approach? We recently ran some tests to find out. We ran two tests that were identical in all aspects except for the tool used for backup: rclone in one and native ScyllaDB backup in the other.
Test Setup
The test uses six i4i.2xlarge nodes with 2TB of total injected data and RF=3. That means the 2TB of injected data becomes 6TB (RF=3), and these 6TB are spread across 6 nodes, so each node holds 1TB of data. The backup benchmark then measures how long it takes to back up the entire cluster; the size reported below is the data held by a single node.
Native Backup
Here are the results of the native backup test:
Name                      Size       Time [hh:mm:ss]
native_backup_1016_2234   1.057 TiB  00:19:18
Data was uploaded at a rate of approximately 900 MB/s.
OS Tx Bytes during backup
The slightly higher values in the OS metrics come from things like TCP retransmits and HTTP headers, which count toward the transmitted bytes but are not part of the backup data itself.
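As a quick sanity check, the upload rate falls straight out of the size and duration in the table above; whether the reported size is binary TiB or decimal TB is an assumption on my part, so both are shown:

```python
# Sanity-check the native backup rate from the size and duration reported above.
size_tib = 1.057                    # reported backup size (one node's data)
seconds = 19 * 60 + 18              # 00:19:18

rate_binary = size_tib * 1024**4 / seconds / 1e6    # if the size is binary TiB
rate_decimal = size_tib * 1000**4 / seconds / 1e6   # if it is decimal TB

print(f"{rate_binary:,.0f} MB/s")   # ~1,004 MB/s
print(f"{rate_decimal:,.0f} MB/s")  # ~913 MB/s
# Either way, this lands around the ~900 MB/s reported above. The rclone run
# shown next moves the same data in 03:48:57, roughly 11-12x slower.
```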
rclone Backup
The exact same test with rclone produced the following results:
Name                      Size       Time [hh:mm:ss]
rclone_backup_1017_2334   1.057 TiB  03:48:57
Here, data was uploaded at a rate of approximately 80 MB/s.
Next Up: Faster Restore
Next, we’re
optimizing restore, which is the more complex part of the
backup/restore process. Backups are relatively straightforward: you
just upload the data to object storage. But restoring that data is
harder, especially if you need to bring a cluster back online
quickly or restore it onto a topology that’s different from the
original one. The original cluster’s nodes, token ranges, and data
distribution might look quite different from the new setup – but
during restore, ScyllaDB must somehow map between what was backed
up and what the new topology expects. Replication adds even more
complexity. ScyllaDB replicates data according to the specified
replication factor (RF), so the backup has multiple copies of the
same data. During the restore process, we don’t want to redundantly
download or process those copies; we need a way to handle them
efficiently. And one more complicating factor: the restore process
must understand whether the cluster uses virtual nodes or tablets
because that affects how data is distributed.
Wrapping Up
ScyllaDB’s move to native integration with object storage is a big
step forward for the faster backup/restore operations that many of
our large-scale users have been asking for. We’ve already sped up
backups by eliminating the extra rclone layer. Now, our focus is on
making restores equally efficient while handling complex
topologies, replication, and data distribution. This will make it
faster and easier to restore large clusters. Looking ahead, we’re
working on using object storage not only for backup and restore,
but also for tiering: letting ScyllaDB read data directly from
object storage as if it were on local disk. For a more detailed
look at ScyllaDB’s plans for backup, restore, and object storage as
native storage, see this video: