Introducing ScyllaDB Agent Skills
A new set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters

Today we’re releasing a curated set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters. If you just want to grab the skills and go build, here you go:

npx skills add scylladb/agent-skills

If you want to understand why these skills are useful and what problems they solve, read on.

You may
have noticed a short warning at the bottom of many AI applications:
“AI can make mistakes. Double-check the output.” Or something along
those lines. This is also true when it comes to working with
databases. We’ve seen agents reach for the wrong driver, fail to
connect to ScyllaDB Cloud, generate schemas that fit a relational
database but not NoSQL, and produce queries that technically
execute but perform poorly at scale.
For more on agents getting things wrong, see this video… These problems can all be minimized by using agent skills.

What are Agent Skills?

Agent Skills are markdown files that give your AI agent best practices and domain-specific knowledge. They follow the standard format and help your agent reduce hallucinations. They are also essential for giving the agent up-to-date information: since LLM training data doesn’t include real-time updates by default, these skills help bridge that gap. A specialized skill also makes the agent’s behavior more consistent and predictable.

Available ScyllaDB Skills

The ScyllaDB Agent Skills cover three distinct areas:

scylladb-cloud-setup: Guides agents through the full connection flow: retrieving cluster credentials from the Cloud Console, selecting the correct shard-aware driver for the user’s language, configuring DC-aware load balancing with the right datacenter name, and verifying the connection.

scylladb-data-modeling: Encodes query-first design methodology, partition key and clustering column patterns, anti-patterns (ALLOW FILTERING, hot partitions, unbounded partition growth), time-series bucketing, and guidance on when to use secondary indexes versus denormalized tables. The goal is to create schemas and queries that hold up under production load (just returning correct results in development is not sufficient).

scylladb-vector-search: Covers vector index creation, ANN queries, filtering strategies (global vs. local indexes and when each applies), quantization, and driver configuration.

You can install all three at once, or pick only what your project needs. Each skill loads on demand when a relevant task comes up; the skills don’t interfere with each other.

Let’s look at the main areas where AI systems get ScyllaDB wrong.

Shard-aware drivers
ScyllaDB has its own family of shard-aware drivers for Python,
Java, Go, Rust, C++, and
more. Agents sometimes decide to download the wrong driver.
While it may appear to work, unofficial drivers bypass ScyllaDB’s
shard-aware routing and degrade performance. In other cases, agents
may hallucinate non-existent drivers. Besides making it impossible
to connect to the ScyllaDB cluster, this also introduces a security
risk: you may install a fake package designed to trick the AI (this
is called slopsquatting).
Connecting to ScyllaDB Cloud

Connecting to ScyllaDB Cloud requires DC-aware load balancing configured with the exact datacenter name (e.g. AWS_US_EAST_1) from your cluster. If your agent gets that wrong, the driver will fail to connect.
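For reference, here is a minimal sketch of what a correct connection looks like with the official shard-aware Python driver. The contact point, credentials, and datacenter name are placeholders; use the values shown in your Cloud Console.

# pip install scylla-driver  (ScyllaDB's shard-aware fork; it imports as "cassandra")
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# The local_dc value must match your cluster's datacenter name exactly.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="AWS_US_EAST_1")
    )
)

cluster = Cluster(
    contact_points=["node-0.your-cluster.scylla.cloud"],  # placeholder hostname
    auth_provider=PlainTextAuthProvider(username="scylla", password="your-password"),
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect()

# Verify the connection before handing the session to the application.
print(session.execute("SELECT release_version FROM system.local").one())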
Data modeling

ScyllaDB’s data model requires a query-first approach: you design tables around your access patterns, not your entities. Agents tend to be trained more heavily on SQL and relational databases than on NoSQL systems such as ScyllaDB. That means they are more likely to generate an entity-first schema, then use ALLOW FILTERING to force queries. This can result in suboptimal performance when using ScyllaDB.
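As a rough illustration of the query-first style the data-modeling skill steers agents toward, here is a sketch of a time-bucketed time-series table. The keyspace, table, and columns are hypothetical, and the contact point is a placeholder.

from cassandra.cluster import Cluster
import uuid
import datetime

session = Cluster(["127.0.0.1"]).connect()  # or reuse the Cloud session from above

# Query-first, time-bucketed schema: the partition key matches the access pattern
# ("readings for one device on one day"), which keeps partitions bounded.
# Assumes a hypothetical "iot" keyspace already exists.
session.execute("""
    CREATE TABLE IF NOT EXISTS iot.readings_by_device_day (
        device_id   uuid,
        day         date,
        ts          timestamp,
        temperature double,
        PRIMARY KEY ((device_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# Served from a single partition: no ALLOW FILTERING, no full scans.
read_day = session.prepare("""
    SELECT ts, temperature FROM iot.readings_by_device_day
    WHERE device_id = ? AND day = ? LIMIT 100
""")
rows = session.execute(read_day, (uuid.uuid4(), datetime.date.today()))

# The anti-pattern an agent is more likely to produce: an entity-first table keyed
# only by device_id (unbounded partition growth), queried with ALLOW FILTERING.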
Vector Search

Vector search on ScyllaDB is powerful but specific. There are global and local vector indexes with different filtering semantics and performance considerations. There’s an ANN OF operator, and quantization options that matter at scale. Choosing the wrong index type for a filtered query can hurt performance.
Getting started

Install all skills using the Vercel Skills CLI (requires Node.js):

npx skills add scylladb/agent-skills

Or install a specific skill:

npx skills add scylladb/agent-skills --skill scylladb-data-modeling

You can also install manually by cloning the GitHub repository and copying the skill folders into your agent’s skills directory:

Agent | Skills directory
Claude Code | ~/.claude/skills/
Cursor | ~/.cursor/skills/
OpenAI Codex | ~/.codex/skills/
OpenCode | ~/.config/opencode/skills/

The skills follow the
Agent Skills open standard
and work with any agent that supports it, including Claude Code,
Cursor, Codex, and GitHub Copilot. Native Claude Code and Cursor
plugins are coming soon. We recommend installing all three skills
in any project that uses ScyllaDB. You get full coverage of the
areas where agents most commonly go wrong, with no overhead when
those skills aren’t relevant to the current task. As of now, the
skills cover the CQL interface; Alternator (DynamoDB
API) is not yet included. Feedback is welcome. Create an
issue on GitHub! New Research on Cloud Database Trends: Technical Risks, Cost Pressures, and Migration Triggers
Good enough until it isn’t: the database complacency trap

A database is like a water heater. When all is well, it just does its job in the background. You don’t fantasize about replacing it or envy the one your friend just got. Really, you don’t even think about it — until something goes awry.

But new research reveals a key difference: With databases, the problems don’t blindside you. Some 38% of technology leaders worry that their current database won’t meet their needs in the near future. However, they aren’t acting on it. They wait until some compelling event (e.g., a production incident, usage spike, budget cut, or cloud strategy pivot) pushes the database to the top of the priority list.

That’s just one of the interesting findings from the Futurum Group’s latest research study, commissioned by ScyllaDB, which explores the latest trends in cloud database cost pressures, performance risks, and migration motivations. Respondents include technical decision-makers who shape cloud database strategy as well as team members directly responsible for the database.

Guy Currier, Futurum Group Chief Analyst, summarizes the findings this way: “Those technology leaders expressed complacency with their cloud databases at the same time as concern and caution. This combination suggests that although they would prefer not to take immediate action, they know they will have to move when compelling events force a change.”

The full report, Is Cloud Database Complacency Affecting Your Business Objectives?, is available now. Here are some key takeaways.

Comfort masks concern

A third of the leaders surveyed report satisfaction with the performance of their current cloud databases. Yet, 38% worry that their database isn’t fit to support future AI/ML workloads and the resulting explosion in data volume. The prime characteristic of these workloads is their unpredictability; past database performance is a poor indicator of future behavior as the technology evolves and as volumes increase.

“Organizations experience what we might call ‘good enough for now’ syndrome,” Currier noted. “Their databases handle today’s workloads adequately, but leaders doubt these solutions will scale to meet tomorrow’s demands.”

Cloud database costs are also a major concern. The research found that 35% of leaders want to improve performance but feel constrained by budget. Another 35% are concerned about rising costs despite being satisfied with performance. The top cloud database cost drivers include:

Unexpected loads (40%)
New or strict technical requirements (38%)
Networking bandwidth growth (38%)
Storage growth (38%)

The 10% cost-savings tipping point

Nearly 40% of organizations are meeting their cloud database budgets, but just as many consider their predictable costs too high. As Currier explains, “Organizations might tolerate high costs when they can plan for them. However, this tolerance creates an opening for solutions that can deliver similar predictability at lower price points.”

That opening is quite specific: A 10% cost reduction is all it would take for many tech leaders to consider migrating their cloud database. Why so low? Likely, the answer lies in scale. When database costs climb into the millions annually – which is not unusual for platforms like DynamoDB, according to the research – even a modest 10% translates to substantial savings.
Event-driven database migration triggers

Still, technical leaders don’t proactively seek alternatives that are more cost-efficient or better prepared for the technical needs of current/future AI/ML workloads. They wait for trigger events that force them into a crisis-driven decision. Leadership changes (36%) and major production incidents (32%) emerged as the primary catalysts. Other significant triggers include:

Load spikes (32%)
Cost reductions of 10% or more (31%)
Maintenance burdens (31%)
Performance issues (29%)
Volatile costs (28%)

Most of these triggers highlight the reactive nature of these migrations, rather than proactive, strategic changes. Note that volatile database costs drive 28% of switching decisions, suggesting that sheer unpredictability can be nearly as disruptive as high costs.

“Database decisions are rarely made in a vacuum,” the research report notes. “Even when teams identify performance or cost inefficiencies, acting on them competes with feature delivery, roadmap commitments, limited operational bandwidth, and against their familiar tech stack.”

Early warning signs

While water heater issues tend to surface without warning, database issues can usually be anticipated. There are several early warning signs that a database is starting to become a constraint:

Cost is growing faster than throughput. When database spend rises faster than the throughput it’s handling, the system may not be as scalable as it appears. Teams patch their way forward (e.g., with caches) to sustain performance. But the cost per query keeps climbing.

Rising tail latency. When P95 or P99 latency starts to climb during peak periods or background operations, it indicates the system is nearing its breaking point. These changes might be dismissed if they don’t immediately violate SLAs, but they’re canaries in the coal mine.

Increasing operational friction. More manual tuning, more frequent capacity adjustments, more time spent managing the database to maintain the same level of performance…all these signal diminishing returns from the current architecture.

Disproportionate complexity for organic growth. When routine scaling or new workload support requires outsized engineering effort, it’s a sign that the database has become a constraint rather than an enabler.

From reactive to strategic

Recognizing these signals is one thing, but actually acting on them before a crisis forces your hand is another. Some due diligence now will help you stay ahead of it.

Get a general sense of what options are available for your use cases
Define vendor-neutral evaluation criteria
Stress test your existing database to understand its breaking point – before production traffic exposes it for you
Set clear decision triggers (e.g., specific performance thresholds, cost targets, and capability gaps)
Map your database capabilities against your 12–24 month strategic roadmap, not just your current workloads

As Currier concludes: “Your database might be ‘good enough for now,’ but if that isn’t aligned with where your business needs to go, complacency is already costing you.”

Download the full report here; you’ll also get access to an expert panel discussing the research findings.

Native Vector Search for the DynamoDB API
Developers building on the DynamoDB API can run vector similarity search without the complexity of bolted-on “Zero ETL”

For users in the DynamoDB environment, implementing vector search has been overly complicated. Amazon’s “Zero ETL” forces a dual-service approach (managing both DynamoDB and OpenSearch) and requires using two separate APIs just for vector semantic search queries. ScyllaDB believes this is unnecessary complexity. We’re eliminating the heavy lifting by integrating vector search capabilities into Alternator, our DynamoDB-compatible API. This gives DynamoDB users high-performance similarity search within their familiar API, without the need for extra clusters or constant API context-switching.

Architectural Differences: Unified vs. Fragmented

Amazon’s approach to vector search exports data to S3 and then syncs it to OpenSearch via DynamoDB Streams. While “Zero ETL” sounds hands-off, you’re still responsible for the cost and complexity of a separate search cluster. The AWS cost is composed of DynamoDB, DynamoDB Streams, S3, OpenSearch, and the OSIS pipeline. Each of these elements’ pricing is complex on its own.

[Figure: Amazon Vector Search (using OpenSearch) for DynamoDB architecture. Source: AWS Blog.]

ScyllaDB Alternator simplifies this by integrating the vector store engine directly into the backend.

Simple module: The ScyllaDB database hosts both the data and the vector index.
Native API: You perform vector searches using DynamoDB Query operations.

Performance: 10 Million Vectors on a Budget

In our latest benchmark using a 10-million-vector dataset (768-dimensional Cohere embeddings), a modest five-node ScyllaDB cluster delivered over 12K QPS with single-digit millisecond latency.

Setup: 10M vectors; 768 dimensions; K: 10 (retrieve top K values); no quantization

Results:
Recall: ~90%
Throughput: 12,763 QPS
P99 Latency: 7.8 ms
Cost: $1,643 / month for 1 year, full upfront

Estimating the AWS cost for this case is not trivial. The write path includes DynamoDB (storage + ops), DynamoDB Streams, S3 (storage, API), OpenSearch (data nodes, master nodes, EBS), and the OSIS pipeline. To read more on the pricing of Amazon Zero ETL, see Implementing search on Amazon DynamoDB data using zero-ETL integration with Amazon OpenSearch Service.

Code Examples

Note: The exact JSON format might change in the next few months.

1. Enabling a Vector Index

You can enable vector indexing during CreateTable or via UpdateTable. Note the new VectorSecondaryIndexUpdates parameter.

// Adding a vector index to an existing table
{
  "TableName": "ProductCatalog",
  "AttributeDefinitions": [
    {"AttributeName": "ProductEmbedding", "AttributeType": "V"}
  ],
  "VectorSecondaryIndexUpdates": [
    {
      "Create": {
        "IndexName": "VectorIdx",
        "VectorAttribute": {
          "AttributeName": "ProductEmbedding",
          "Dimensions": 768
        },
        "IndexOptions": {
          "SimilarityFunction": "COSINE",
          "M": 32,
          "ef_construction": 256
        }
      }
    }
  ]
}

Pro Tip: You will get the best
results with ScyllaDB’s optimized “V” (Vector)
type. Although you can use standard DynamoDB Lists, the
“V” type will store data as a tight array of 32-bit floats – and
that saves storage while boosting performance.

2. Performing a Vector Search

To search, use the Query operation with the ScyllaDB VectorSearch parameter.

{
  "TableName": "ProductCatalog",
  "IndexName": "VectorIdx",
  "VectorSearch": {
    "QueryVector": [0.12, 0.05, ..., 0.88],
    "Oversampling": 1.5
  },
  "Limit": 10,
  "ReturnVectorSearchSimilarity": "SIMILARITY"
}
Example Use Cases

Semantic Product Search
Instead of relying on exact keyword matches, users can find products based on intent. For example, a search for “waterproof rugged hiking gear” can surface relevant items even if those exact words aren’t in the title.

RAG (Retrieval-Augmented Generation)
For knowledge bases, precision is non-negotiable. Using the High Recall configuration, ScyllaDB delivers 99.2% recall. That way, the LLM receives the most accurate context possible for generating responses.

Semantic Deduplication
At the Max Throughput end of the spectrum, ScyllaDB can quickly scan millions of incoming vectors to find near-duplicates. That prevents redundant data from cluttering your system – reducing costs and improving performance.

Conclusion

With ScyllaDB, DynamoDB users now have a “fast track” to AI-ready infrastructure. By unifying storage and vector search into a single API, you eliminate the operational tax of “Zero ETL” without sacrificing the sub-millisecond performance ScyllaDB is known for.

ScyllaDB Vector Search Benchmark: 10M Vectors on a Compact Cluster
Even a small, compact setup achieved up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms

Our 1-billion-vector benchmark demonstrated that ScyllaDB Vector Search can sustain 252,000 QPS with 2 ms P99 latency across a large-scale deployment. But not every workload starts at a billion vectors. Many production use cases (e.g., product catalogs, knowledge bases for RAG, and semantic caches) live comfortably in the 10–100 million range.

This post presents a smaller benchmark: a 10-million-vector dataset of 768-dimensional Cohere embeddings on a compact five-node cluster. It used three modest storage nodes and two memory-optimized search nodes, all running on AWS Graviton. We explore four index configurations that span the recall-throughput spectrum, from near-perfect recall to maximum throughput. The results show that even this small setup can deliver up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms — without any quantization.

Architecture at a Glance

First, some background. ScyllaDB Vector Search separates storage and indexing responsibilities while keeping the system unified from the user’s perspective. The ScyllaDB storage nodes hold both the structured attributes and the vector embeddings in the same distributed table. Meanwhile, a dedicated Vector Store service — implemented in Rust and powered by the USearch engine — consumes updates from ScyllaDB via CDC and builds approximate nearest neighbor (ANN) indexes in memory. Queries are issued through standard CQL:

SELECT … ORDER BY vector_column ANN OF ? LIMIT k;

The queries are internally routed to the Vector
Store service, which performs the HNSW similarity search and
returns the candidate rows. This design allows each layer to scale
independently, optimizing for its own workload characteristics and
eliminating resource interference. For a detailed architectural
deep-dive, see the
1-billion-vector benchmark and the technical blog
Building a Low-Latency Vector Search Engine for ScyllaDB.
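For a feel of what that query looks like from an application, here is a minimal sketch using the Python driver against the benchmark’s keyspace and table names. The contact point and the zero-filled embedding are placeholders, and a driver version with CQL vector-type support is assumed.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("vdb_bench")    # keyspace used in this benchmark

# Prepared ANN query with a bind marker for the query embedding.
ann = session.prepare(
    "SELECT * FROM vdb_bench_collection ORDER BY vector ANN OF ? LIMIT 10"
)

query_embedding = [0.0] * 768  # placeholder 768-dimensional vector
for row in session.execute(ann, (query_embedding,)):
    print(row)

cluster.shutdown()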
Benchmark Setup

Here’s a look at the dataset and hardware used for the benchmark.

Dataset

Property | Value
Vectors | 10,000,000
Dimensions | 768
Embedding model | Cohere
Similarity function | COSINE
Quantization | None (f32)

Hardware

Role | Instance | vCPUs | RAM | Count
Storage nodes | i8g.large | 2 | 16 GB | 3
Search nodes | r7g.2xlarge | 8 | 64 GB | 2

With 768-dimensional f32 vectors and M values up to 64, the in-memory index size can be estimated as:

Memory ≈ N × (D × 4 + M × 16) × 1.2

For the largest configuration (M=64): 10M × (768 × 4 + 64 × 16) × 1.2 ≈ 49 GB, which fits comfortably in the 64 GB of a single r7g.2xlarge search node. No quantization is needed at this scale.
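As a quick sanity check, a few lines of Python reproduce that estimate (the 1.2 factor is the ~20% overhead used in the formula above):

def hnsw_index_bytes(n_vectors: int, dims: int, m: int, overhead: float = 1.2) -> float:
    # N x (D x 4 bytes of f32 components + M x 16 bytes of graph links), plus ~20% overhead
    return n_vectors * (dims * 4 + m * 16) * overhead

print(f"{hnsw_index_bytes(10_000_000, 768, 64) / 1e9:.1f} GB")  # ~49.2 GB for M=64
print(f"{hnsw_index_bytes(10_000_000, 768, 20) / 1e9:.1f} GB")  # ~40.7 GB for M=20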
Experiments

We tested four HNSW index configurations, progressively lowering graph connectivity (M) and search effort (ef_search) to shift the balance from recall toward throughput.

Experiment | M | ef_construction | ef_search | k tested
#1 (high quality) | 64 | 384 | 192 | 100, 10
#2 (balanced) | 32 | 256 | 128 | 100, 10
#3 (high throughput) | 24 | 256 | 64 | 100, 10
#4 (max throughput) | 20 | 256 | 48 | 10

The three HNSW
parameters control different aspects of the index:

M (maximum_node_connections): Maximum edges per node in the HNSW graph. Higher values create a richer, better-connected graph that improves recall at the cost of more memory and slower inserts and queries.

ef_construction (construction_beam_width): Controls how thoroughly the algorithm searches for the best neighbors when inserting a new vector. Higher values produce a higher-quality graph but slow down index building. This is a one-time cost.

ef_search (search_beam_width): The main tuning knob for query performance. Controls the size of the candidate beam during search. Higher values evaluate more candidates, which improves recall but increases query latency.

Since vector index options cannot be changed after creation, each experiment required dropping and recreating the index. Here are the CQL statements used:

-- Experiment #1: M=64, ef_construction=384, ef_search=192
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
  'search_beam_width': '192',
  'construction_beam_width': '384',
  'maximum_node_connections': '64',
  'similarity_function': 'COSINE'
};

-- Experiment #2: M=32, ef_construction=256, ef_search=128
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
  'search_beam_width': '128',
  'construction_beam_width': '256',
  'maximum_node_connections': '32',
  'similarity_function': 'COSINE'
};

-- Experiment #3: M=24, ef_construction=256, ef_search=64
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
  'search_beam_width': '64',
  'construction_beam_width': '256',
  'maximum_node_connections': '24',
  'similarity_function': 'COSINE'
};

-- Experiment #4: M=20, ef_construction=256, ef_search=48
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
  'search_beam_width': '48',
  'construction_beam_width': '256',
  'maximum_node_connections': '20',
  'similarity_function': 'COSINE'
};

The benchmark was
run using VectorDBBench with
the upcoming ScyllaDB Python driver built on a Rust core (a dev
version is available at
python-rs-driver). VectorDBBench ramps concurrency from 1 to
150 concurrent search clients and measures QPS, P99 and average
latency at each level. A separate serial run of 1,000 queries
measures recall and nDCG against brute-force ground truth.

Results

Peak QPS Comparison

To start our analysis, let’s examine the maximum throughput that each index configuration can sustain under peak concurrency. When strictly looking at the highest throughput achieved:
The bar chart highlights the dramatic impact of index parameters at
k=10: throughput rises sharply as the index becomes lighter. At
k=100, the differences are much smaller; all configurations cluster
between 2,300 and 3,000 QPS.

QPS vs Concurrency

The chart below
shows how each index configuration scales as concurrency ramps from
1 to 150 clients.
At k=10, the lighter configurations (Experiments
#3 and #4) scale nearly linearly up to 60–80 concurrent clients
before saturating. Experiment #4 demonstrates the benefit of a
leaner graph: it achieves 5.5X higher peak QPS
than Experiment #1 at k=10. At k=100, all
configurations converge to a narrower throughput band (2,300–3,025
QPS). This shows that retrieving 100 neighbors dominates the
per-query cost regardless of index parameters.

P99 and Average Latency vs Concurrency

As expected, increasing throughput adds
queuing delay, and that leads to higher tail latencies.
Lighter configurations start at dramatically lower baseline
latencies. Experiment #4 maintains sub-6 ms P99 latency up to 30
concurrent clients, while Experiment #1 starts above 13 ms, even at
concurrency 1. All configurations show latency rising
proportionally once throughput saturates. This is the expected
queuing behavior when the system is at capacity.

QPS vs P99 Latency (Pareto View)

Plotting throughput directly against tail latency
provides a Pareto frontier of our benchmark configurations:
This view makes the operational trade-off easier to read than the
concurrency charts alone. At k=10, Experiments #3 and #4 push the
frontier outward, with much higher QPS at the same or lower tail
latency. At k=100, the frontier is tighter, which again shows that
returning more neighbors dominates the total cost per query.

Recall vs Peak QPS

Finally, plotting recall helps select the optimal index
strategy based on business requirements:
This chart summarizes the core choice in a single picture: should
you spend compute on accuracy or throughput? Experiment #1 sits at
the high-recall end, Experiment #4 at the high-throughput end, and
Experiment #2 emerges as the practical middle ground for workloads
that need both.

Scenario Analysis

With the charts above as a visual reference, let’s examine the three main usage scenarios that emerge from the data.

Scenario 1: Maximum Throughput

Experiments #3 (M=24,
ef_search=64) and #4 (M=20, ef_search=48) target workloads where
throughput is the primary objective and moderate recall is
acceptable — for example, coarse candidate retrieval stages in
recommendation pipelines or semantic deduplication. At
k=10, Experiment #4 reached a peak of
12,840 QPS at concurrency 100, with a serial P99
latency of just 5.5 ms and recall of
92.0%. Experiment #3 achieved 9,719
QPS with marginally better recall at
95.0% and a serial P99 of 6.0 ms.
Even at k=100, these lightweight configurations
delivered competitive throughput: Experiment #3 peaked at
3,025 QPS (87.9% recall), which is comparable to
the heavier configurations. Retrieval of 100 neighbors per query
inherently requires more work, which limits the throughput range
across all configurations.

Scenario 2: High Recall

Experiment #1 (M=64, ef_search=192) prioritizes accuracy for applications that
cannot tolerate missed results (e.g., high-fidelity semantic
search, retrieval-augmented generation [RAG] pipelines, or
compliance-sensitive retrieval). At k=10, the
system delivered 99.2% recall and 99.1%
nDCG — essentially indistinguishable from exact
brute-force search. Peak QPS reached 2,324 with a
serial P99 latency of 14.6 ms. At
k=100, recall was 96.8% with
2,345 QPS and a serial P99 of 15.2
ms. The higher latency and lower throughput are a direct
consequence of the richer graph (64 connections per node) and wider
search beam (192 candidates), which evaluate substantially more
distance computations per query.

Scenario 3: Balanced

Experiment #2 (M=32, ef_search=128) takes the middle ground, offering strong
recall with significantly better throughput than the high-recall
configuration. At k=10, it achieved 97.7%
recall with 4,897 QPS — roughly double
the throughput of Experiment #1, with only a 1.5 percentage-point
recall reduction. The serial P99 was 8.7 ms. At
k=100, recall was 92.0% with
2,975 QPS and a serial P99 of 9.6
ms. This configuration represents a practical sweet spot
for many production deployments where both recall and throughput
matter.

Summary Tables

k=100

Metric | #1 (M=64, ef_s=192) | #2 (M=32, ef_s=128) | #3 (M=24, ef_s=64)
Peak QPS | 2,345 (c=150) | 2,975 (c=40) | 3,025 (c=40)
QPS @ c=10 | 947 | 1,314 | 1,489
Serial P99 Latency | 15.2 ms | 9.6 ms | 7.8 ms
P99 Latency @ c=1 | 15.5 ms | 9.9 ms | 8.1 ms
P99 Latency @ c=100 | 81.2 ms | 49.9 ms | 49.6 ms
Recall | 96.8% | 92.0% | 87.9%
nDCG | 97.3% | 93.1% | 89.7%

k=10

Metric | #1 (M=64, ef_s=192) | #2 (M=32, ef_s=128) | #3 (M=24, ef_s=64) | #4 (M=20, ef_s=48)
Peak QPS | 2,324 (c=100) | 4,897 (c=80) | 9,719 (c=80) | 12,840 (c=100)
QPS @ c=10 | 1,054 | 1,602 | 2,046 | 2,311
Serial P99 Latency | 14.6 ms | 8.7 ms | 6.0 ms | 5.5 ms
P99 Latency @ c=1 | 14.0 ms | 8.5 ms | 6.2 ms | 5.5 ms
P99 Latency @ c=100 | 81.0 ms | 38.1 ms | 18.0 ms | 12.3 ms
Recall | 99.2% | 97.7% | 95.0% | 92.0%
nDCG | 99.1% | 97.6% | 94.9% | 92.0%

Key Takeaways
k=10 vs k=100: At k=10, lighter index parameters yield massive throughput gains (up to 5.5X) with modest recall loss. At k=100, all configurations converge to a narrow QPS band (~1.3X range) because retrieving more neighbors dominates per-query cost.

Recall trade-offs are favorable: At k=10, recall drops only 7.2 pp (99.2% to 92.0%) for a 5.5X QPS increase. At k=100, the trade-off is steeper: 8.9 pp for just 1.3X gain.

Latency tracks index weight: Serial P99 drops from 14.6 ms to 5.5 ms at k=10, and from 15.2 ms to 7.8 ms at k=100, as lighter graphs require fewer distance computations.

Saturation points differ: Experiments #1–#3 plateau around c=40–80; Experiment #4 scales further to c=100 before saturating, reflecting its lower per-query compute cost.
Conclusion

These results show that ScyllaDB Vector Search delivers
strong performance even on a compact, five-node cluster with 10
million 768-dimensional vectors. A pair of r7g.2xlarge search nodes
provides enough memory to hold the full HNSW index at f32 precision
– without requiring any quantization. The three storage nodes with
replication factor 3, combined with vector search nodes distributed
across availability zones, also provide high availability. The
system is designed to tolerate node failures without data loss or
service interruption. Depending on the index configuration, the
system can prioritize near-perfect recall (99.2% at k=10) or
maximize throughput (12,840 QPS at k=10 with 92% recall), with
practical balanced options in between. This 10M scenario represents
the accessible end of the scale. For workloads that push into
hundreds of millions or billions of vectors, quantization,
additional search nodes and larger instances extend the same
architecture. See the ScyllaDB
1-billion-vector benchmark for results at extreme scale, and
look for our upcoming 100-million-vector benchmark
post. At K=10, the performance bottleneck resides within the vector
index nodes, leaving ScyllaDB with significant headroom. This means
you can likely add a Vector Search index to your cluster and
continue running a similar workload on your existing ScyllaDB
infrastructure – without needing to scale your database
nodes. The full Jupyter notebook with interactive charts and all
data is available
in this repository. Ready to try it yourself? Follow the
ScyllaDB Vector Search Quick Start Guide to get started.

ScyllaDB X Cloud: Your Questions Answered
A technical FAQ on ScyllaDB X Cloud: architecture, autoscaling, compression, use cases, and more

It’s been a few months since ScyllaDB X Cloud landed. In case you missed the news, here’s a quick recap… ScyllaDB X Cloud is the next generation of ScyllaDB’s fully-managed database-as-a-service. It’s a truly elastic database designed to support variable/unpredictable workloads with consistent low latency as well as low costs. Users can scale out and scale in almost instantly to match actual usage. For example, you can scale all the way from 100K OPS to 2M OPS in just minutes, with consistent single-digit millisecond P99 latency. This means you don’t need to overprovision for the worst-case scenario or suffer the lag traditionally associated with ramping up capacity in response to a sudden surge.

Some key features (all covered in Introducing ScyllaDB X Cloud: A (Mostly) Technical Overview):

Tablets + just-in-time autoscaling
Up to 90% storage utilization
Support for mixed-size clusters
File-based streaming
Dictionary-based compression
Flex credit

Here’s a look at ScyllaDB X Cloud in action:

Not surprisingly, users have been quite curious about all these changes and new options. So we thought we’d collect some of the most common questions here, along with our answers. In no particular order…

What are the key differences between a “standard” ScyllaDB Cloud database and “ScyllaDB X Cloud”?

Compared to a standard ScyllaDB Cloud database, ScyllaDB X Cloud provides two major advantages:

Faster scaling in and out.
Higher storage utilization (90% vs. 70%).

The above advantages are the result of two technical updates:

X Cloud always uses Tablets, while standard databases can use a mix of vNode and Tablets keyspaces.
X Cloud enables mixed-size clusters, so you can define more granular cluster and storage sizes.

In which cases should you choose a “standard” ScyllaDB Cloud Database vs X Cloud?

None! We’ve reached full parity now. Materialized views, CDC, Alternator (DynamoDB API), even counters – it’s all supported.

Can I migrate from one type of ScyllaDB Cloud database to the other?

Yes. If you are using a standard database with Tablets only, you can migrate this database to X Cloud. If you are using vNode keyspaces, you cannot (yet).

How does X Cloud achieve higher storage utilization?

Two factors enable higher storage utilization:

Faster scaling removes the need to over-reserve storage space (or “sandbag”) while waiting for the cluster to expand
Support for mixed instance sizes allows for more granular cluster sizing

How can I start an X Cloud cluster?

Simply choose the “X Cloud” Cluster Type on ScyllaDB Cloud’s Launch Cluster page.

How can I set the scaling policy? Can I change it later, while the database is in production? (UI/API)

The scaling policy is part of the X Cloud cluster properties. You can either set it when launching the cluster or update it later. The policy is optional. It defines the minimum required resources for your database in terms of vCPU and Storage. If you’re not sure how to set it, you can keep the default minimum values (zero) as is. The cluster will scale automatically if and when storage is approaching the threshold, and you can scale the vCPU as required by your workload. Note that the parameters affect each other since more storage may require more compute power.

How are X Cloud and Tablets related?

X Cloud takes advantage of (and depends on) Tablets to achieve faster scale and higher storage utilization.
That means all Keyspaces in X Cloud must use Tablets, which is already the default for ScyllaDB Cloud.

How can X Cloud help reduce database costs?

There are a few ways that X Cloud reduces cost. The primary factor is the extreme elasticity. You can scale the cluster in and out, even multiple times per day, to meet the demand. If you cannot reliably plan the cluster usage, you can reserve a minimal deployment and pay for bursts using Flex Credit. The higher storage utilization means you use fewer cloud resources. Improved compression, both on the wire and at rest, reduces cost further.

What’s a good use case for ScyllaDB X Cloud? Am I a good candidate for ScyllaDB X Cloud?

New (greenfield) workloads should use X Cloud. Workloads that require frequent scaling out/in will benefit the most. For example:

A workload with significant fluctuation throughout the day (e.g., peak hours during the evening).
A workload with expected high demand on specific days of the year (e.g., Super Bowl, IPL games, or Black Friday). With X Cloud, scaling can be done days in advance. You don’t need to do it one or more weeks ahead.
Difficult-to-predict workloads, with common (but volatile) bursts.

How many times per day can X Cloud scale?

As often as required. Although new nodes start serving requests very fast, it still takes time for the data balancing to be complete if you’re working with rather large nodes.

Does X Cloud support multi-DC (region) deployment? Does each region scale independently?

X Cloud does not yet support multi-datacenter deployment. Multi-DC support is coming with the 2026.2 release.

Scaling Policy: I asked for storage of Y TB and got a bigger cluster with storage of W TB…why? Same for vCPU?

vCPU, RAM, and Storage are not independent variables. ScyllaDB will allocate each of these 3 variables to support the required value of the other two. For example, higher storage requires more RAM – which requires more vCPU. The policy UI reflects the expected deployment per each resource selection.

Can I suspend / resume the dynamic scaling?

Currently: no.

Can I restore a backup from an X Cloud database to a standard database and vice versa?

Yes, you can.

Is X Cloud production ready?

Absolutely, customers are already using it in production.

Why should I care about advanced compression? What is the advantage of having it?

ScyllaDB already supported compression before X Cloud – including at-rest and in-transit. However, dictionary-based compression is much more effective in reducing data overhead. By compressing data further, you save on disk space utilization (combined with up to 90% disk space utilization) as well as inter-AZ networking for data replication and high availability.

X Cloud claims faster scaling. How fast is it really?

The legacy vNode-based architecture imposed some limitations:

Nodes could only be added one at a time, even across DCs.
Data was replicated in rows – that is, rows were being transferred over the wire.
A node only started serving requests after its streaming was fully completed.

This process could easily take hours, if not days, to complete on large clusters. Now, X Cloud leverages tablets to remove those limits:

Nodes can be added in parallel, multiple nodes at a time, including across DCs.
Nodes join the cluster instantly, then start streaming data later.
Streaming under Tablets relies on file-based streaming, transferring gigabytes of data per second in a very efficient process.
As Tablet transfers complete, nodes start to serve requests immediately; this increases as more transfers complete, until the cluster rebalancing is completed. This allows X Cloud to scale to an unlimited number of nodes in a single step – and streaming data is made super efficient by file-based streaming. A cluster can go from 100K ops per second to 2M ops per second in a matter of a few minutes, not hours or days.

Can I use Vector Search with X Cloud?

Yes, you can! Enable the Vector Search option at the bottom of the Launch Cluster page and choose the Vector Search instances. Note that Vector Search index nodes scale independently from ScyllaDB nodes. You can learn more about Vector Search here.

6 Reasons ScyllaDB Costs a Fraction of DynamoDB
Why teams typically experience 50% (or greater) cost reductions when moving from DynamoDB to ScyllaDB

DynamoDB is expensive at scale. Some of that cost is fundamental to the managed service model. But much of it is the pricing model, the way DynamoDB charges per read, per write, per byte, and per region. ScyllaDB rethinks pricing from first principles. The result: teams typically see more than 50% cost reductions on equivalent workloads. In this post, I’ll share a few reasons why.

Cheap writes

DynamoDB charges roughly 5x more for writes than reads: a 1 KB write consumes one write capacity unit, a strongly consistent 1 KB read consumes one read capacity unit, and each write unit is priced at about five times the rate of a read unit. ScyllaDB pricing is based on provisioned cluster capacity (nodes), not per operation. Whether you do 10K writes/sec or 100K writes/sec on a 3-node cluster, the ScyllaDB cost remains the same. Write-heavy workloads for AI, real-time analytics, logging, time-series data and IoT sensors often see the biggest savings.

Take a look at our AI Feature Store example. A batch workload scenario with overnight peaks approximately 3x the daytime average on DynamoDB will cost $2.2M/year. The same workload on ScyllaDB would cost $145K/year. In other words, that’s at least 15x savings just switching to ScyllaDB.

No need for a separate cache

DynamoDB’s baseline latency is in the 10-20ms range. For many applications, that’s unacceptable. In those cases, teams commonly deploy DAX, Redis, or Memcached on top. That adds cost, complexity, and another service to operate and monitor. ScyllaDB was built for low latency. Internal caching and a shard-per-core architecture deliver sub-millisecond latencies on reads. For most workloads, an external cache is unnecessary.

Let’s look at a retail example with a read-heavy workload that is cached and running on demand. On DynamoDB running with DAX, that workload would cost $1.6M/year. The same workload on ScyllaDB would cost $271K/year (and even less if you switch to a hybrid plan). That’s at least 6x cheaper using ScyllaDB. Plus: there are fewer moving parts, simpler operations, and no cache coherency headaches.

Affordable multi-region data centers

DynamoDB Global Tables charge replicated writes (rWCUs) at a premium: roughly 2x the cost of normal writes. Moreover, cross-region data transfer incurs AWS’s standard rates: $0.02-0.09/GB. For a workload doing 10K writes/sec with 5 KB payloads across 2 regions, data transfer alone can add $10K+/month.

A social media scenario modeled across 3 regions on DynamoDB would cost $11.0M/year. The huge cost is partly because the write capacity cannot be reserved, and you effectively pay twice for the writes. The same workload on ScyllaDB would cost $591K/year. That’s a monstrous $10M+/year saving by switching to ScyllaDB.

ScyllaDB handles multi-DC replication natively. You provision nodes in each data center, and replication is built into the protocol along with shard-aware and rack-aware drivers. This helps minimize network overhead and avoids the per-operation premium. You pay for the cluster nodes; replication comes with the territory.

Large items don’t cost more

In DynamoDB, a 1 KB write costs 1 WCU, and a 10 KB write costs 10 WCUs. Item size directly drives billing. This incentivizes shrinking payloads, compressing data, and splitting tables. Architectural decisions are driven by cost, not design. A simple on-demand scenario with DynamoDB using 3 KB item sizes would cost $633K/year. ScyllaDB would cost $39K/year.
Along with multi-region, item size remains one of the biggest cost levers to pull when looking for savings on DynamoDB. ScyllaDB billing is independent of item size. Store 1 KB items or 100 KB items and the cluster cost is unchanged. You architect around performance and correctness, not billing thresholds.

Making multi-tenancy work for you

DynamoDB is multi-tenant infrastructure. That’s how AWS achieves efficiency. But it also means:

You pay for provisioned capacity
AWS oversubscribes hardware
Idle capacity benefits AWS, not you

You pay for the full machine, but AWS shares it with everyone else. Multi-tenant infrastructure reduces cost for AWS but increases risk for users. Large DynamoDB outages (like us-east-1) impact thousands of customers simultaneously. When shared infrastructure fails, the blast radius is enormous.

ScyllaDB flips that model. You get a dedicated cluster, which gives you:

Isolation by design
The ability to run multiple workloads
The option to share idle capacity internally

This is especially powerful for:

Multi-tenant SaaS
Microservices
Multiple environments (dev/staging/prod)

Instead of provisioning 100 tables separately, you provision one cluster and use it fully. You control your infrastructure. AWS monetizes multi-tenancy. ScyllaDB lets you monetize it.

Flexible and predictable pricing

DynamoDB is excellent for certain use cases: serverless applications with unpredictable spikes, multi-tenant services that need table-level isolation, and teams that prioritize operational simplicity over cost. But if you’re running a predictable, scale-intensive workload – especially one that’s write-heavy, multi-region, or stores large items – then DynamoDB’s per-operation pricing model becomes a massive cost driver. ScyllaDB’s node-based, cluster-centric model is fundamentally more cost-efficient for these scenarios. Combined with its performance and operational features, it’s why teams see more than 50% cost reductions.

Want to see the actual numbers for your workload? Use the ScyllaDB Cost Calculator at calculator.scylladb.com to model a comparison between your current DynamoDB spend and equivalent ScyllaDB infrastructure.

Apache Cassandra® 6 Accord transactions: What you need to know
There have always been architectural trade-offs when considering a distributed database like Apache Cassandra versus a relational database. Cassandra excels at linear horizontal scalability, multi-region replication, and fault-tolerant uptime that relational systems couldn’t match. This comes at the expense of general-purpose ACID (Atomicity, Consistency, Isolation, Durability) transactions, which provide the ability to express complex, multi-row operations with guaranteed consistency.
With Cassandra 6 on its way to general availability status (and an alpha already released), we’re approaching a turning point where we can revisit whether these trade-offs will still exist. The latest version delivers general-purpose ACID transactions through a new protocol called Accord. With Cassandra 6, those transactional guarantees will be native, without compromising Cassandra’s operational model or availability.
Transactions

In database parlance, a transaction says, “These operations belong together. They must all be applied, or none of them.” The classic example is a bank transfer. When you move money from one account to another, two things must happen: a debit and a credit. If the debit succeeds but the credit fails, money has disappeared. A transaction prevents this issue by guaranteeing the two operations are atomic, meaning they succeed or fail as a unit; combined with isolation, no other process can see an intermediate or half-finished state.
Experiences like these depend on transactional guarantees at the data layer, which rely on ACID semantics, particularly atomicity and isolation, to prevent inconsistent intermediate states.
For most developers who have worked with relational databases, transactions are so fundamental they’re almost invisible. For Cassandra users, comparable guarantees across multiple partitions or tables historically required significant application-level coordination or weren’t natively supported.
Coordination at scale is fundamentally hard

Because Cassandra is designed to deal with data replication and scaling, coordinating atomic changes across multiple nodes is inherently challenging (e.g., decrement a balance here, increment one there). All participating replicas must agree on an order of operations. Distributed consensus protocols exist to solve exactly this, but prior approaches came with trade-offs.
Raft and Zab are examples of leader-based protocols; a designated leader is a poor fit for Cassandra, where all nodes are treated as equals.
More information about prior solutions can be found in CEP-15, but generally, leader-based approaches pose issues at scale.
The Accord protocol

The Accord protocol, proposed in CEP-15, is built to achieve fast, general-purpose distributed transactions that remain stable under the same failure conditions Cassandra already tolerates, with no elected leaders.
How it orders transactions

Accord is leaderless, so any node can coordinate any transaction. Transactions are assigned unique timestamps using hybrid logical clocks, where each node appends its own unique ID to its clock value to ensure global uniqueness across the cluster. Conflicting transactions execute in timestamp order across all replicas. Under normal conditions, a transaction reaches consensus in a single round trip.
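To build intuition for that timestamping scheme, here is a deliberately simplified Python sketch of the hybrid-logical-clock idea: it illustrates how appending a per-node ID makes timestamps globally unique and totally ordered. It is a toy, not Accord’s actual implementation.

import time
from dataclasses import dataclass

@dataclass(order=True)
class HlcTimestamp:
    # Compared as (logical, node_id): ties between nodes break deterministically,
    # so every timestamp in the cluster is globally unique and totally ordered.
    logical: int
    node_id: int

@dataclass
class HybridLogicalClock:
    node_id: int
    last: int = 0

    def now(self) -> HlcTimestamp:
        wall = time.time_ns() // 1_000_000  # wall-clock milliseconds
        self.last = max(self.last + 1, wall)  # never moves backwards
        return HlcTimestamp(self.last, self.node_id)

    def observe(self, remote: HlcTimestamp) -> None:
        # On receiving a message, advance the local clock past the remote one.
        self.last = max(self.last, remote.logical)

# Two nodes assigning transaction timestamps: replicas that sort conflicting
# transactions by these timestamps all agree on a single execution order.
a, b = HybridLogicalClock(node_id=1), HybridLogicalClock(node_id=2)
t1 = a.now()
b.observe(t1)
t2 = b.now()
assert t1 < t2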
The reorder buffer

The challenge with timestamp-based ordering in a geo-distributed system is that two transactions started concurrently from different regions might arrive at replicas in different orders, breaking fast-path consensus. Accord solves this by having replicas buffer incoming transactions. The wait time is precisely bounded to be just long enough to account for clock differences between nodes and network latency, and no longer. This guarantees that replicas always process transactions in the correct order without needing extra message rounds.
Fast-path electorates

When replicas fail, other leaderless protocols fall back to slower, more expensive message patterns. Accord avoids this by dynamically adjusting which replicas participate in fast-path decisions as failures occur. The result is that Accord maintains fast-path availability under failure, avoiding the degradation to slower message patterns that other leaderless protocols experience.
The net effect: strict serializable isolation across multiple partitions and tables, in a single round trip, with no leaders, and preserving performance characteristics under the same minority‑failure conditions that Cassandra is designed to tolerate.
New CQL syntax to support transactions

The most visible change for developers is new CQL syntax.
Transactions in Cassandra 6 are wrapped in BEGIN
TRANSACTION and COMMIT TRANSACTION blocks,
similar to SQL syntax.
Let’s examine a flight booking transaction that must simultaneously reserve a seat and deduct loyalty miles from two separate tables. Note: Cassandra 6 is pre-release. Syntax shown reflects the current alpha and may evolve before general availability.
BEGIN TRANSACTION
  LET seat = (SELECT available FROM flight_seats
              WHERE flight_id = 'ZZ101' AND seat_number = '14C');
  LET miles = (SELECT balance FROM loyalty_accounts
               WHERE member_id = 'M-7823');
  IF seat.available = true AND miles.balance >= 25000 THEN
    UPDATE flight_seats
       SET available = false, booked_by = 'M-7823'
     WHERE flight_id = 'ZZ101' AND seat_number = '14C';
    UPDATE loyalty_accounts
       SET balance = miles.balance - 25000
     WHERE member_id = 'M-7823';
  END IF
COMMIT TRANSACTION;
Everything between BEGIN TRANSACTION and
COMMIT TRANSACTION executes atomically with strict
serializable isolation from the perspective of all other concurrent
transactions. The LET clause reads current values from
the database and binds them to variables. The IF block uses those
values to guard the writes. If the seat is already taken or the
member doesn’t have enough miles, nothing happens. Both updates
either apply together or not at all, across two different tables
and two different partition keys.
This is logic that previously had to live in the application, complete with retry handling, race condition guards, and compensating operations if something failed halfway through. Now it lives in the database.
Enabling Accord in Cassandra 6: The CMS dependency

We can’t talk about Accord without discussing the Cluster Metadata Service (CMS). Before Accord transactions are functional, CMS, introduced alongside Accord as CEP-21, must be enabled. For teams upgrading from Cassandra 5, this is the most significant operational change in the release.
CMS is required. Accord needs every replica to have the same authoritative view of cluster topology showing which nodes own which data, and which replicas participate in a given transaction. Before Cassandra 6, this information was propagated via the eventually consistent Gossip Protocol. This is suitable for normal reads and writes, but Accord’s correctness depends on knowing precisely who the transaction participants are before committing. CMS replaces Gossip-based metadata propagation with a distributed, linearized transaction log, giving all nodes a consistent view of cluster state. Without it, Accord’s guarantees don’t hold.
Upgrading from Cassandra 5 to 6: plan carefully

The CMS migration cannot begin until every node in the cluster is running Cassandra 6. CMS initialization requires full cluster agreement; no mixed-version clusters are supported. Before upgrading, disable any automation that could trigger schema changes, node bootstrapping, decommissions, or replacements. These operations are blocked during the upgrade window, and if they fire on an older node before CMS is initialized, the migration can fail in ways that require manual intervention to recover.
Once all nodes are upgraded, run nodetool cms
initialize on one node to activate CMS. This creates the
service with a single member, which is enough to unblock metadata
operations but is not suitable for production. Follow up
immediately with nodetool cms reconfigure to add more
members. CMS uses Paxos internally and requires a minimum of three
nodes for a viable quorum, with more recommended for production
depending on cluster size.
Important: CMS initialization is not easily reversible. Plan the upgrade window accordingly and treat it as a one-way operational step.
On a fresh Cassandra 6 cluster that wasn’t migrated from a previous version, CMS is automatically enabled. First, one node is designated as the initial CMS member. From there, CMS membership scales automatically based on cluster size, with the service adding members as the cluster grows without requiring manual intervention.
Of course, for Instaclustr users, our platform and techops team will take care of most of this for you and walk you through any requirements on your side when the time comes to upgrade.
Coexistence with Lightweight Transactions (LWT)

Existing LWT syntax (IF NOT EXISTS, IF EXISTS, conditional UPDATE/INSERT statements) continues to work. It fundamentally differs from Accord transactions: LWT is scoped to a single partition and is far more limited. Accord doesn’t replace or break existing applications. Using BEGIN TRANSACTION/COMMIT TRANSACTION blocks is how developers opt into the broader cross-partition guarantees.
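For contrast, here is roughly what the single-partition LWT path looks like from the Python driver; the keyspace, table, and contact point are illustrative.

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()  # placeholder contact point

# Single-partition LWT: a conditional write, checked and applied within one partition.
result = session.execute(
    "INSERT INTO app.users (username, email) VALUES (%s, %s) IF NOT EXISTS",
    ("avi", "avi@example.com"),
)
if not result.one().applied:
    print("username already taken; the insert was not applied")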
Every prior approach to distributed transactions required accepting one of three constraints: a global leader (single point of failure, WAN latency penalty), limited to single-partition scope (LWT), or degraded performance under failure (prior leaderless protocols). The Accord paper’s central claim is that these constraints are not fundamental. They are artifacts of specific protocol design choices.
By combining flexible fast-path electorates with a timestamp reorder buffer on top of a leaderless execution model, Accord achieves:
- True cross-partition atomicity across multiple tables and partition keys
- Strict serializable isolation with formally proven correctness
- Single round-trip latency under normal operating conditions
- Failure‑tolerant steady‑state performance, avoiding the systematic degradation seen in earlier leaderless protocols
- No elected leaders, consistent with Cassandra’s existing operational model
This opens up workloads that previously couldn’t run natively on Cassandra: financial transaction processing, distributed inventory reservation, multi-step workflow coordination, and any application where ‘commit these changes together or not at all’ is a strict correctness requirement.
Looking ahead

Though the Accord protocol is still maturing, the fundamental capability is finally here. We now have general-purpose, leaderless, multi-partition ACID transactions natively in Apache Cassandra.
The historically difficult problem of achieving strict serializable isolation in a geo-distributed system without compromising fault tolerance now has a proven, working answer.
For Cassandra users, this raises an exciting question: which workloads have you been routing to relational databases specifically because they needed transactional guarantees? It is time to reevaluate.
Stay tuned for a preview release of Cassandra 6 on the Instaclustr Platform and get ready to experience the power of ACID transactions on Cassandra for yourself!