“Key-Value” is Misleading. Access Patterns are Key.
Access patterns determine your data model, your I/O costs, and which database is the best fit for your workload.

I’ve been part of enough key-value database evaluations to recognize the pattern: when the conversation starts with benchmarks, the evaluation inevitably ends with regret. A benchmark answers “which is faster?” It doesn’t tell you which model fits how your application actually reads and writes data, and that’s what matters.

Every data modeling decision should begin with access patterns, regardless of the technology on the table. What does your application read? At what granularity? What does it write? How often? How large? Let those answers drive the data model, then pick the technology. Flip that order and you pay for it. A fast database like ScyllaDB amplifies schema decisions: good models perform well, bad ones break faster.

Edgar Codd introduced First Normal Form (1NF) in 1970, when it saved scarce disk space, but a terabyte of NVMe now costs about the same as lunch. The rule outlasted the constraint that justified it, yet we still teach it. That’s partly why so many teams expect to normalize their data with ScyllaDB the way they would a relational schema. But if they don’t get the order right (access patterns → data model → technology), they won’t get the performance the engine was built to deliver.

A lot of the confusion comes down to terminology. “Key-value” is one of the most overloaded labels in the database industry. We use it to describe both:

- A system that maps a string to an opaque blob
- A system that maps a partition key plus a clustering key to typed, individually addressable columns with partial-update semantics

Lumping these together hides the architectural decisions that determine your I/O patterns and your infrastructure costs. In practice, “key-value” is used to describe three very different data models. They differ in capability and in how deeply you can address your data. Pick the wrong one for your access patterns and you pay for it in I/O overhead, infrastructure cost, and write throughput. ScyllaDB can operate across multiple levels of this hierarchy, and the level you select influences your I/O patterns, your update costs, and your infrastructure spend.

Key-Value vs Wide-Column: Four Levels of Access Pattern Depth

Instead of comparing feature lists, it’s better to compare these models by access pattern depth: at what level can you address, read, and write your data?

Level 1: Key level. One key maps to one value. The value is opaque; the database has no knowledge of what is inside it. You get it and you put it, nothing more. This is K-V, the model behind most caching layers and session stores. Redis is the canonical example. The ceiling is the value boundary: you can replace the value, but you cannot address inside it.

Level 2: Row level. A primary key maps to a set of named bins, each holding a schemaless value. You can address individual bins by name, project specific bins in a read, and update bins independently. This is K-V Wide Table: one key, multiple named fields, no schema enforcement on values. It adds meaningful structure over K-V without requiring upfront schema design. Aerospike is the canonical example here. The ceiling is the bin boundary: you can update a bin, but you cannot address inside one.
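In schema terms, the ceiling of Levels 1 and 2 is the same shape: an opaque value that is stored and returned whole. A minimal CQL sketch of that shape, with illustrative names (a dedicated K-V store would express this as plain get/put):

```sql
-- The Level 1/2 ceiling expressed as a table: one key, one opaque value
CREATE TABLE session_cache (
    session_id text PRIMARY KEY,
    payload blob   -- replaced or read as a whole; the engine cannot see inside it
);
```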
Level 3: Column level. A partition key combined with a clustering key addresses a row. Each column in that row is individually typed; the database understands the type of every value it stores. This is KKV Wide Table, and the two-key model is what puts the second K in KKV. Typed columns let the database make smarter decisions about storage layout, compression, and update semantics. Cassandra reaches this level. The ceiling is the column boundary: typed and addressable, but complex values inside a column must be declared frozen. In other words, the entire value is serialized as a single blob that the engine cannot see into.

Level 4: Within-column level. This is a key differentiator for KKV Wide Table: the engine starts working at a granularity that the other models can’t reach. A KKV Wide Table column can hold a collection: a map, a set, a list, a user-defined type (UDT), or nested combinations of these. Whether the database can address what’s inside that collection determines your actual access pattern depth. A frozen collection is serialized as a single blob; the engine stores it, retrieves it, and replaces it, but cannot see inside it. An unfrozen collection is stored element by element, and each entry is individually addressable. That distinction is the central architectural argument at this level.

Cassandra touches this level but can’t reliably live here: unfrozen collections exist in Cassandra, but tombstone accumulation makes them a liability in production. In ScyllaDB, Level 4 becomes practical. With an unfrozen collection, ScyllaDB stores each element individually. Whether you add an entry to a map, append to a list, or remove an element from a set, no read is required first; the database operates at element level.

With a frozen collection, ScyllaDB serializes the entire value as a single cell. The engine can’t address inside it. For whole-value access patterns, that’s not a limitation, it’s an optimization: there is no per-element metadata, reads pull one contiguous cell, and writes replace one contiguous cell. ScyllaDB’s UDT performance benchmarks show frozen collections outperforming unfrozen ones by up to 228% on write throughput and 162% on read throughput for 50-field UDTs. For the right access pattern, frozen is the faster choice. Don’t start from the frozen-vs-unfrozen question; start from the access pattern, and the right tool follows.

Figure: Frozen vs. unfrozen UDT, 50-field profile accessed as a whole. Frozen write throughput 228% higher, read throughput 162% higher. One cell write vs. 50 element writes plus 50 metadata records.

The problem isn’t that the value is frozen; the access pattern mismatch is what causes the performance difference. An engineer who needs element-level updates and chooses frozen UDTs has, for those columns, given back Level 4 access. The operation degrades to read-modify-write: read the entire value, apply the change in memory, write it back as a whole. That is the same pattern a K-V Wide Table bin requires. The technology supports Level 4, but the schema choice has opted out of it.

Figure: Four levels of access pattern depth. K-V gives key-level access. K-V Wide Table adds bin projection. KKV Wide Table adds typed columns and, with unfrozen collections, element-level access. Frozen collections are a performance optimization for whole-value access patterns, not a fallback.

The opposite mistake is also a problem. An engineer who uses large unfrozen collections for values they always access as a whole pays per-element TTL and timestamp metadata on every element in the collection, continuously, at compaction time. A map with 10K entries carries 10K individual metadata records. That overhead snowballs over time.
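In CQL, the frozen/unfrozen distinction is a single schema decision. A minimal sketch, with illustrative table and column names:

```sql
CREATE TABLE user_prefs (
    user_id uuid PRIMARY KEY,
    prefs map<text, text>,               -- unfrozen: each entry individually addressable
    recent_events list<text>,            -- unfrozen: appendable element by element
    snapshot frozen<map<text, text>>     -- frozen: one serialized cell, replaced as a whole
);
```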
Choose frozen collections when you access the value as a whole. Choose small unfrozen collections when you need element-level updates. Large unfrozen collections are their own design smell, regardless of access pattern.

Figure: Read granularity, requesting one field from a 30-field record. K-V reads the entire blob. K-V Wide Table reads the entire record and returns one bin. KKV Wide Table reads only the requested column, leaving 29 columns untouched on disk.

How Access Pattern Depth Meets Memory: Three Scenarios

The relationship between your dataset size and available memory determines which architecture is working with its strengths and which is working against them.

Figure: Data model behavior across memory scenarios: relative I/O and cost overhead for K-V, K-V Wide Table, and KKV Wide Table as dataset size moves from fits-in-RAM through keys-only-in-RAM to neither-fits-in-RAM.

Scenario 1: Everything Fits in Memory

When the entire dataset lives in RAM, a memory-resident hash index is fast: point lookups are a hash computation and a pointer dereference. This is where K-V and K-V Wide Table architectures shine for read latency. But “what’s fast?” and “what’s cost-effective?” are different questions. If your dataset is 2 TB, you are paying for 2 TB of RAM across your cluster. An architecture designed around SSDs with efficient memory-resident metadata can deliver reads in the low hundreds of microseconds while your data lives on storage that costs a fraction of RAM per gigabyte. The read performance difference may be negligible; the infrastructure cost difference is not.

Figure: Storage cost at scale, all-RAM vs. NVMe SSD across dataset sizes from 0.5 TB to 32 TB. DDR5 ECC at ~$8/GB vs. NVMe SSD at ~$0.10/GB. The gap compounds quickly past 1 TB.

This is also the scenario where honesty matters. If your access pattern is truly “put blob, get blob” on ephemeral data with simple lookups, a K-V store is the right tool, and its operational simplicity is a genuine advantage.

Scenario 2: Keys Fit in Memory, Values Do Not

This is what K-V Wide Table architectures market as their sweet spot: a primary index in memory, records on SSD, and fast key lookups that pull values from disk. For simple reads, bin projection works well here. Request three specific bins, get three bins back; you are not forced to read the entire record on every read.

The problem surfaces at Level 4. Assume one bin holds a serialized map of user preferences and you need to update a single entry in that map. The system must:

- Read the entire bin from disk
- Deserialize the collection structure in memory
- Apply the modification
- Serialize the updated structure
- Write the entire bin back

That is a read-modify-write cycle on every collection update, regardless of how small the change is. The K-V Wide Table model has no path to Level 4 access; the bin is the floor.

A KKV Wide Table model with unfrozen collections handles the same update without a read, as the sketch below shows. The new map entry goes directly to the write-ahead log and the in-memory table. There’s no deserialization or full-bin read. The merge with existing data happens during compaction, as a background operation that does not block the write path.
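Continuing the user_prefs sketch from earlier, these element-level writes require no prior read (statements illustrative):

```sql
UPDATE user_prefs SET prefs['theme'] = 'dark' WHERE user_id = ?;                    -- put one map entry
UPDATE user_prefs SET recent_events = recent_events + ['login'] WHERE user_id = ?;  -- append to a list
DELETE prefs['legacy_flag'] FROM user_prefs WHERE user_id = ?;                      -- remove one map entry
```

Each statement lands in the commit log and memtable as-is; no bin-sized read-modify-write is involved.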
Compression: typed columns vs. schemaless bins. K-V Wide Table bins are schemaless. Within an SSTable block, different records interleave bin data without type information, which limits what a compressor can do across records. A KKV Wide Table stores typed column data within the same partition contiguously in SSTable blocks. For example, ScyllaDB writes all values for the event_ts column across rows in a partition together. Because those values share the same type, a dictionary-based compressor like zstd has much more to work with.

This is not columnar storage in the analytics sense. ScyllaDB is an LSM-tree, row-based engine at the partition level, not Parquet. The compression benefit comes from typed column homogeneity within SSTable blocks rather than from a columnar storage layout.

Frozen vs. unfrozen compression tradeoffs. Frozen UDTs compress well for a specific reason: a frozen UDT is a single cell with a consistent serialized layout. The same 50-field structure appears as the same byte sequence across records, which dictionary compression handles efficiently. Unfrozen collections are a different story. Each element carries its own TTL and timestamp metadata. ScyllaDB groups column values within SSTable blocks, which helps the element values themselves compress, but the metadata overhead scales with collection cardinality. For small unfrozen collections it’s negligible; for large unfrozen collections it can negate a meaningful portion of the compression gain. The compression advantage of typed columns applies most cleanly to simple typed columns and small unfrozen collections.

Figure: K-V Wide Table SSTable blocks mix types across schemaless bins, limiting compression. KKV Wide Table SSTable blocks group typed column data within partitions. Frozen UDTs compress well as consistent serialized blobs. Unfrozen collections carry per-element metadata that can offset compression gains at high cardinality.
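Compression itself is configured per table. A sketch of opting into Zstd on a ScyllaDB table (option names assumed from ScyllaDB’s documentation; verify against your version):

```sql
-- Sketch only: per-table Zstd SSTable compression
CREATE TABLE events (
    device_id uuid,
    event_ts  timestamp,
    payload   text,
    PRIMARY KEY (device_id, event_ts)
) WITH compression = {'sstable_compression': 'ZstdCompressor'};
```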
Data locality. In a shard-per-core architecture such as ScyllaDB’s, all columns within a partition live on the same CPU core. A read that touches three columns in a single partition involves zero cross-core coordination: no locking, no message passing between threads. This locality might not be noticeable at low throughput, but it matters a great deal at hundreds of thousands of operations per second.

Scenario 3: Neither Keys Nor Values Fit in Memory

This is where memory-dependent index architectures hit a wall. If your architecture puts the primary index in RAM and your keyspace outgrows available memory, you are either:

- Adding nodes to hold the index, or
- Paging index entries to disk, which adds a disk read in front of every data read

An architecture built for disk-resident data from the start does not have this problem. ScyllaDB (and to a degree Cassandra) uses Bloom filters to determine probabilistically whether a partition exists in a given SSTable without loading a full index into memory. Partition index summaries provide efficient lookup with a small, fixed memory footprint regardless of key count. And compaction strategies manage on-disk data organization to keep read amplification bounded. This is deliberate design for an architecture that assumes data will not fit in memory. Don’t just ask whether a system can handle disk-resident data; ask whether it was designed for it.

The Update Path: Where Access Depth Becomes I/O Pattern

Most evaluations obsess over reads, but the update path is where access pattern depth differences tend to surface at scale.

Consider updating a single element in a collection: one value in a map with 500 entries. In a K-V Wide Table architecture, collection updates require a full read-modify-write cycle: read the entire bin from disk, deserialize the collection structure in memory, apply the modification, serialize the updated structure, then write the entire bin back. Under concurrent updates to the same record, this becomes a serialization bottleneck. Under write-heavy workloads, write throughput is gated by read throughput.

Figure: K-V Wide Table collection update path. A single-element update requires reading, deserializing, modifying, serializing, and rewriting the entire bin.

In a KKV Wide Table architecture with unfrozen collections, the same update is a single write: the new value for that map entry goes directly to the memtable. This avoids the read, the deserialization, and the serialization. The entry lands in the write-ahead log and the in-memory table, and the merge with existing data happens during compaction, as a background operation.

Figure: KKV Wide Table update path with unfrozen collection. The write goes directly to WAL and memtable. No read required. Compaction merges data in the background.

This is where access pattern honesty matters most. The append-only unfrozen update is fast for element-level changes to bounded collections. When your access pattern is whole-value, you write the entire UDT atomically and read it back as a unit; here, frozen is the right choice. There is no read penalty and no per-element overhead. The ScyllaDB UDT benchmark shows a 228% write throughput improvement for frozen UDTs in exactly this scenario: a 50-field UDT accessed and written as a whole. The frozen cell is one write operation; the equivalent unfrozen collection is 50 element writes plus 50 metadata records. At 1,000 operations per second the difference is negligible. But at 100,000 operations per second, with large collections and concurrent writes, the wrong frozen/unfrozen choice becomes the bottleneck in either direction.
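In schema terms, the whole-value frozen pattern from the benchmark looks like this. A sketch with illustrative names (the benchmarked UDT had 50 fields):

```sql
CREATE TYPE profile_v1 (
    display_name text,
    locale       text,
    theme        text
    -- the benchmarked scenario used 50 fields
);

CREATE TABLE profiles (
    user_id uuid PRIMARY KEY,
    profile frozen<profile_v1>
);

-- One contiguous cell written atomically, one contiguous cell read back
UPDATE profiles SET profile = ? WHERE user_id = ?;
SELECT profile FROM profiles WHERE user_id = ?;
```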
Figure: Write latency vs. collection size for a single-entry update. K-V Wide Table read-modify-write latency grows linearly with the number of entries in the collection. KKV Wide Table unfrozen update latency stays flat: the write goes to the WAL and memtable regardless of collection size.

Figure: Single-element update latency vs. collection size, illustrating how wasted I/O grows with collection size for read-modify-write architectures, while direct-write latency remains constant.

Choosing Honestly: Key-Value, K-V Wide Table, or KKV Wide Table

These three models exist because different access patterns have different requirements.

K-V is the right model for caching, session storage, and any workload where the access pattern is “put blob, get blob.” Its simplicity is a real advantage: fewer moving parts, fewer things to misconfigure. If your values are small and your queries never need to reach inside them, a K-V store will serve you well and be easy to operate.

K-V Wide Table adds meaningful capability for workloads that need to address individual fields without upfront schema design. It’s a pragmatic choice for moderate-scale applications where operational simplicity matters, bin-level read projection is sufficient, and collection updates are infrequent or small. It sits at Level 2–3 access depth and does that job well.

KKV Wide Table earns its complexity when your access patterns require Level 3 or 4 depth: frequent updates to large collections, datasets that will outgrow available memory, workloads where typed column compression meaningfully reduces storage cost, or write-heavy workloads that cannot afford read-modify-write on every collection update. The richer data model requires upfront schema design and demands that you get frozen versus unfrozen semantics right. Don’t rely on intuition; choose based on your actual access pattern:

- Use frozen when you always read or write the whole value. A 50-field profile UDT that you always write and read back as a unit is a frozen candidate. The performance data supports it.
- Use small unfrozen collections when you need element-level updates: append to a list, update one key in a map. This is what unfrozen exists for.
- Use large unfrozen collections only if your access pattern is genuinely element-granular and your collection cardinality stays bounded. Per-element metadata overhead compounds, affecting both compaction cost and compression ratios.

Figure: Decision flow for choosing a data model based on required access pattern depth.

Don’t ask which model is “best.” Ask which model best matches the access patterns your workload will experience in production. Start with the access patterns. Let the data model follow. Then pick the technology that supports that model at the depth you need. Get that order right and the database works with you. Get it wrong and you spend your time working around it.

***

If your use case requires low latencies at scale, and you’re frustrated with fighting your current database, ScyllaDB Cloud might be worth a look. Find me on LinkedIn – I’m always happy to talk data models.

What’s new in Cassandra® 6? A roundup of features for users and operators
Apache Cassandra 6 is shaping up to be a significant release, as some of its biggest changes affect the core behavior of the database:
- How metadata is coordinated
- How Cassandra is moving toward broader transaction support via Accord protocol
- How repair is scheduled, and
- How operators inspect and manage the system.
Let’s focus on a few changes that stand out:
- Accord transactions
- Transactional Cluster Metadata (TCM)
- Automated repair
- Constraints framework
- Zstandard dictionary compression, and
- Cursor-based compaction improvements.
Taken together, these changes point to a version of Cassandra that is becoming more structured internally and easier to operate.
Accord transactions for ACID guarantees

Accord, included in Cassandra 6, is a general-purpose transaction framework that uses a leaderless consensus protocol to provide highly available transactions. The goal is broader transactional support across multiple keys, with strict serializable isolation and without a central bottleneck.
This matters because multi-key consistency is hard to handle cleanly in application code. Once a workflow spans more than one partition, the application often ends up doing coordination work that really belongs in the database.
Accord enables ACID behavior on transactional tables, which lets developers coordinate multi-step, multi-partition changes with stronger correctness guarantees, reducing the amount of custom consistency logic they have to build in the application.
Multi-partition, conditional work has historically been difficult to express cleanly in Cassandra. For operators, Accord signals that transactions are becoming a more important part of the platform and something to watch closely as Cassandra continues to mature.
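As an illustration, here is a hedged sketch of what a multi-partition transfer could look like, based on the CQL transaction syntax proposed in CEP-15; the exact grammar in the final release may differ:

```sql
-- Illustrative only: Accord-style CQL transaction per the CEP-15 proposal
BEGIN TRANSACTION
  LET src = (SELECT balance FROM accounts WHERE id = 1);
  IF src.balance >= 100 THEN
    UPDATE accounts SET balance -= 100 WHERE id = 1;
    UPDATE accounts SET balance += 100 WHERE id = 2;
  END IF
COMMIT TRANSACTION;
```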
Read our deep dive on Accord transactions here.
Transactional Cluster Metadata (TCM)

TCM changes how Cassandra coordinates cluster-wide metadata. It introduces a Cluster Metadata Service that keeps an ordered log of metadata changes and makes those changes visible in a more consistent, coordinated way. That includes things like membership, token ownership, and schema state.
This was introduced because Cassandra’s older model depended heavily on eventual consistency and the Gossip Protocol to spread metadata changes across the cluster. TCM is meant to make those changes more explicit, more ordered, and easier to reason about.
For operators, this is one of the biggest architectural shifts in Cassandra 6. It does not mean Gossip Protocol disappears everywhere, but it does mean Cassandra is moving away from Gossip as the primary way cluster membership, schema, and data placement changes are coordinated and made visible. For users, the result should be more predictable schema and topology operations.
Automated repair orchestration

Automated repair brings repair orchestration into Cassandra itself. Repair is the mechanism Cassandra uses to reconcile replicas over time so they stay consistent; the goal is to make repair scheduling and coordination a built-in database service rather than something operators must orchestrate with external tools.
This was introduced because repair is essential, but historically it has placed a real burden on operators. Teams have had to build their own schedules, decide how to run repair safely, and keep it consistent over time.
For operators, automated repair could be one of the most practical changes in the release. It reduces manual coordination, supports full and incremental repair, adds useful safeguards, and makes repair easier to treat as a normal part of cluster maintenance, just as happened with major compactions under the Unified Compaction Strategy in Cassandra 5. For users, it means a better chance that maintenance happens regularly and with fewer gaps.
At NetApp Instaclustr, our expert TechOps team already orchestrates laborious tasks like repair for our Apache Cassandra customers, ensuring their clusters stay online. Our platform handles the complexity so you can get up and running fast.
Constraints framework for data validation

The constraints framework lets Cassandra enforce more targeted validation rules as part of the table schema. It enforces them at write time instead of relying entirely on application code to reject invalid data. Examples of constraints include scalar comparisons (>, <, >=, <=), LENGTH(), OCTET_LENGTH(), NOT NULL, JSON(), and REGEXP().
A simple example of an in-line constraint:
```sql
CREATE TABLE users (
    username text PRIMARY KEY,
    age int CHECK age >= 0 AND age < 120
);
```
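The other constraint kinds follow the same pattern. A hypothetical sketch combining a length check and JSON validation (column names invented here; the exact grammar may differ in the final release):

```sql
-- Hypothetical sketch only; verify against the Cassandra 6 documentation
CREATE TABLE products (
    sku  text PRIMARY KEY,
    name text CHECK LENGTH(name) <= 128,  -- reject overly long names at write time
    spec text CHECK JSON(spec)            -- reject values that are not valid JSON
);
```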
This was introduced because Cassandra already had some broad limits, but they were not very granular or expressive. The constraints framework gives teams a more precise way to protect the shape of their data and guard against bad writes from misconfigured clients.
Operators gain more control and better predictability around what gets written into the cluster. For developers, it means some validation can move closer to the schema instead of being duplicated across every service.
Zstd dictionary compression

Zstandard (Zstd) dictionary compression extends SSTable compression by letting Cassandra use trained Zstd dictionaries for repetitive data patterns. Instead of relying only on generic compression, it can use a dictionary built from representative data to improve results.
This was introduced primarily to improve compression ratios while keeping the design manageable in production. The recommendation is to use minimal dictionaries and adopt new ones only when they are noticeably better.
This makes compression more configurable and more visible for operators. It adds training workflows, dictionary lifecycle management, and observability into dictionary size and cached dictionary memory usage. For users, the main benefit is better storage efficiency, because data with strong repeating patterns can compress better, leading to potential performance gains.
You can read more about the constraints framework and Zstd dictionary compression in our article detailing recent CEPs.
Cursor-based compaction improvements

Cursor-based compaction is a new low-allocation compaction path in Cassandra 6 that processes SSTable data in a more streaming-oriented way, using reusable cursor-like readers and writers instead of constantly creating large numbers of temporary in-memory objects. In practical terms, it is designed to reduce heap allocation and garbage collection overhead during compaction.
Compaction is one of Cassandra’s most important background processes, and when it becomes cheaper and more efficient, nodes can spend less time fighting garbage collection and less heap on temporary work. For operators, that can mean smoother performance and better efficiency on large datasets. For developers, it is mostly an under-the-hood improvement, but one that can help clusters behave more consistently under load.
Conclusion: A more manageable database

What stands out about Cassandra 6 is that many of its biggest changes are not isolated features. They reshape core parts of how Cassandra behaves and how it is operated.
Accord introduces a broader transactional model. TCM changes how metadata is coordinated. Automated repair brings a core maintenance task into the database. Constraints make schemas more defensive. Zstd dictionary compression improves how Cassandra approaches storage efficiency, and cursor-based compaction makes the system easier to run.
Taken together, these changes make Cassandra 6 more deliberate internally and more manageable operationally.
Stay tuned for a preview release of Cassandra 6 on the Instaclustr Platform!
Ready to get started?

If you want to experience the power of Apache Cassandra without the operational headache, we have you covered. If you are an existing customer and would like to try Cassandra 5 before 6.0 is released, you can spin up a cluster today. If you don’t have an account yet, sign up for a free trial and experience the latest generation of Apache Cassandra on the Instaclustr Managed Platform.
Read all our technical documentation here.
Discover the 10 rules you need to know when managing Apache Cassandra.
If you are using a relational database and are interested in vector search, check out this blog on support for pgvector, which is available as an add-on for Instaclustr for PostgreSQL services.
Introducing ScyllaDB Agent Skills
A new set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters

Today we’re releasing a curated set of best practices and usage patterns for AI agents working with ScyllaDB Cloud clusters. If you just want to grab the skills and go build, here you go:

npx skills add scylladb/agent-skills

If you want to understand why these skills are useful and what problems they solve, read on.

You may have noticed a short warning at the bottom of many AI applications: “AI can make mistakes. Double-check the output.” Or something along those lines. This is also true when it comes to working with databases. We’ve seen agents reach for the wrong driver, fail to connect to ScyllaDB Cloud, generate schemas that fit a relational database but not NoSQL, and produce queries that technically execute but perform poorly at scale.
For more on agents getting things wrong, see this video. These problems can all be minimized by using agent skills.

What are Agent Skills?

Agent Skills are markdown files that give your AI agent best practices and domain-specific knowledge. They follow a standard format and help your agent reduce hallucinations. They are also essential for giving the agent up-to-date information: since LLM training data doesn’t include real-time updates by default, skills help bridge that gap. A specialized skill makes the agent’s behavior more consistent and predictable.
Available ScyllaDB Skills

The ScyllaDB Agent Skills cover three distinct areas:

- scylladb-cloud-setup: Guides agents through the full connection flow: retrieving cluster credentials from the Cloud Console, selecting the correct shard-aware driver for the user’s language, configuring DC-aware load balancing with the right datacenter name, and verifying the connection.
- scylladb-data-modeling: Encodes query-first design methodology, partition key and clustering column patterns, anti-patterns (ALLOW FILTERING, hot partitions, unbounded partition growth), time-series bucketing, and guidance on when to use secondary indexes versus denormalized tables. The goal is to create schemas and queries that hold up under production load; just returning correct results in development is not sufficient.
- scylladb-vector-search: Covers vector index creation, ANN queries, filtering strategies (global vs. local indexes and when each applies), quantization, and driver configuration.

You can install all three at once, or pick only what your project needs. Each skill loads on demand when a relevant task comes up; they don’t interfere with each other. Let’s look at the main areas where AI systems get ScyllaDB wrong.

Shard-aware drivers
ScyllaDB has its own family of shard-aware drivers for Python, Java, Go, Rust, C++, and more. Agents sometimes download the wrong driver. While it may appear to work, an unofficial driver bypasses ScyllaDB’s shard-aware routing and degrades performance. In other cases, agents may hallucinate non-existent drivers. Besides making it impossible to connect to the ScyllaDB cluster, this also introduces a security risk: you may install a fake package designed to trick the AI (this is called slopsquatting).
Connecting to ScyllaDB Cloud

Connecting to ScyllaDB Cloud requires DC-aware load balancing configured with the exact datacenter name (e.g., AWS_US_EAST_1) from your cluster. If your agent gets that wrong, the driver will fail to connect.

Data modeling
ScyllaDB’s data model requires a query-first approach: you design tables around your access patterns, not your entities. Agents tend to be trained more heavily on SQL and relational databases than on NoSQL systems such as ScyllaDB. That means they are more likely to generate an entity-first schema and then use ALLOW FILTERING to force queries through it, which can result in suboptimal performance on ScyllaDB.
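The difference is easy to see in CQL. A sketch with hypothetical names:

```sql
-- Entity-first (anti-pattern): querying a non-key column forces a scan
SELECT * FROM orders WHERE customer_id = 42 ALLOW FILTERING;

-- Query-first: a table shaped by the access pattern
-- "orders for a customer, newest first"
CREATE TABLE orders_by_customer (
    customer_id bigint,
    order_ts    timestamp,
    order_id    uuid,
    total       decimal,
    PRIMARY KEY (customer_id, order_ts, order_id)
) WITH CLUSTERING ORDER BY (order_ts DESC, order_id ASC);

SELECT * FROM orders_by_customer WHERE customer_id = 42 LIMIT 20;
```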
Vector Search

Vector search on ScyllaDB is powerful but specific. There are global and local vector indexes with different filtering semantics and performance considerations, an ANN OF operator, and quantization options that matter at scale. Choosing the wrong index type for a filtered query can hurt performance.
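For reference, the index and query shapes involved look like this; a sketch reusing the syntax from the benchmark post later in this roundup, with illustrative names:

```sql
CREATE CUSTOM INDEX docs_embedding_idx ON docs (embedding)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'COSINE' };

SELECT id, title FROM docs ORDER BY embedding ANN OF ? LIMIT 10;
```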
Getting started

Install all skills using the Vercel Skills CLI (requires Node.js):

npx skills add scylladb/agent-skills

Or install a specific skill:

npx skills add scylladb/agent-skills --skill scylladb-data-modeling

You can also install manually by cloning the GitHub repository and copying the skill folders into your agent’s skills directory:

- Claude Code: ~/.claude/skills/
- Cursor: ~/.cursor/skills/
- OpenAI Codex: ~/.codex/skills/
- OpenCode: ~/.config/opencode/skills/

The skills follow the Agent Skills open standard and work with any agent that supports it, including Claude Code, Cursor, Codex, and GitHub Copilot. Native Claude Code and Cursor plugins are coming soon.

We recommend installing all three skills in any project that uses ScyllaDB. You get full coverage of the areas where agents most commonly go wrong, with no overhead when those skills aren’t relevant to the current task. As of now, the skills cover the CQL interface; Alternator (DynamoDB API) is not yet included.

Feedback is welcome. Create an issue on GitHub!

New Research on Cloud Database Trends: Technical Risks, Cost Pressures, and Migration Triggers
Good enough until it isn’t: the database complacency trap

A database is like a water heater. When all is well, it just does its job in the background. You don’t fantasize about replacing it or envy the one your friend just got. Really, you don’t even think about it — until something goes awry.

But new research reveals a key difference: with databases, the problems don’t blindside you. Some 38% of technology leaders worry that their current database won’t meet their needs in the near future. However, they aren’t acting on it. They wait until some compelling event (e.g., a production incident, usage spike, budget cut, or cloud strategy pivot) pushes the database to the top of the priority list.

That’s just one of the interesting findings from the Futurum Group’s latest research study, commissioned by ScyllaDB, which explores the latest trends in cloud database cost pressures, performance risks, and migration motivations. Respondents include technical decision-makers who shape cloud database strategy as well as team members directly responsible for the database.

Guy Currier, Futurum Group Chief Analyst, summarizes the findings this way: “Those technology leaders expressed complacency with their cloud databases at the same time as concern and caution. This combination suggests that although they would prefer not to take immediate action, they know they will have to move when compelling events force a change.”

The full report, Is Cloud Database Complacency Affecting Your Business Objectives?, is available now. Here are some key takeaways.

Comfort masks concern

A third of the leaders surveyed report satisfaction with the performance of their current cloud databases. Yet 38% worry that their database isn’t fit to support future AI/ML workloads and the resulting explosion in data volume. The prime characteristic of these workloads is their unpredictability; past database performance is a poor indicator of future behavior as the technology evolves and as volumes increase.

“Organizations experience what we might call ‘good enough for now’ syndrome,” Currier noted. “Their databases handle today’s workloads adequately, but leaders doubt these solutions will scale to meet tomorrow’s demands.”

Cloud database costs are also a major concern. The research found that 35% of leaders want to improve performance but feel constrained by budget. Another 35% are concerned about rising costs despite being satisfied with performance. The top cloud database cost drivers include:

- Unexpected loads (40%)
- New or strict technical requirements (38%)
- Networking bandwidth growth (38%)
- Storage growth (38%)

The 10% cost-savings tipping point

Nearly 40% of organizations are meeting their cloud database budgets, but just as many consider their predictable costs too high. As Currier explains, “Organizations might tolerate high costs when they can plan for them. However, this tolerance creates an opening for solutions that can deliver similar predictability at lower price points.”

That opening is quite specific: a 10% cost reduction is all it would take for many tech leaders to consider migrating their cloud database. Why so low? Likely, the answer lies in scale. When database costs climb into the millions annually – which is not unusual for platforms like DynamoDB, according to the research – even a modest 10% translates to substantial savings.
Event-driven database migration triggers

Still, technical leaders don’t proactively seek alternatives that are more cost-efficient or better prepared for the technical needs of current and future AI/ML workloads. They wait for trigger events that force them into a crisis-driven decision. Leadership changes (36%) and major production incidents (32%) emerged as the primary catalysts. Other significant triggers include:

- Load spikes (32%)
- Cost reductions of 10% or more (31%)
- Maintenance burdens (31%)
- Performance issues (29%)
- Volatile costs (28%)

Most of these triggers highlight the reactive nature of these migrations, rather than proactive, strategic changes. Note that volatile database costs drive 28% of switching decisions, suggesting that sheer unpredictability can be nearly as disruptive as high costs.

“Database decisions are rarely made in a vacuum,” the research report notes. “Even when teams identify performance or cost inefficiencies, acting on them competes with feature delivery, roadmap commitments, limited operational bandwidth, and against their familiar tech stack.”

Early warning signs

While water heater issues tend to surface without warning, database issues can usually be anticipated. There are several early warning signs that a database is starting to become a constraint:

- Cost is growing faster than throughput. When database spend rises faster than the throughput it’s handling, the system may not be as scalable as it appears. Teams patch their way forward (e.g., with caches) to sustain performance, but the cost per query keeps climbing.
- Rising tail latency. When P95 or P99 latency starts to climb during peak periods or background operations, it indicates the system is nearing its breaking point. These changes might be dismissed if they don’t immediately violate SLAs, but they’re canaries in the coal mine.
- Increasing operational friction. More manual tuning, more frequent capacity adjustments, more time spent managing the database to maintain the same level of performance: all of these signal diminishing returns from the current architecture.
- Disproportionate complexity for organic growth. When routine scaling or new workload support requires outsized engineering effort, it’s a sign that the database has become a constraint rather than an enabler.

From reactive to strategic

Recognizing these signals is one thing; acting on them before a crisis forces your hand is another. Some due diligence now will help you stay ahead of it:

- Get a general sense of what options are available for your use cases
- Define vendor-neutral evaluation criteria
- Stress test your existing database to understand its breaking point – before production traffic exposes it for you
- Set clear decision triggers (e.g., specific performance thresholds, cost targets, and capability gaps)
- Map your database capabilities against your 12–24 month strategic roadmap, not just your current workloads

As Currier concludes: “Your database might be ‘good enough for now,’ but if that isn’t aligned with where your business needs to go, complacency is already costing you.”

Download the full report here; you’ll also get access to an expert panel discussing the research findings.

Native Vector Search for the DynamoDB API
Developers building on the DynamoDB API can run vector similarity search without the complexity of bolted-on “Zero ETL”

For users in the DynamoDB environment, implementing vector search has been overly complicated. Amazon’s “Zero ETL” forces a dual-service approach (managing both DynamoDB and OpenSearch) and requires using two separate APIs just for vector semantic search queries. ScyllaDB believes this is unnecessary complexity. We’re eliminating the heavy lifting by integrating vector search capabilities into Alternator, our DynamoDB-compatible API. This gives DynamoDB users high-performance similarity search within their familiar API, without extra clusters or constant API context-switching.

Architectural Differences: Unified vs. Fragmented

Amazon’s approach to vector search exports data to S3 and then syncs it to OpenSearch via DynamoDB Streams. While “Zero ETL” sounds hands-off, you’re still responsible for the cost and complexity of a separate search cluster. The AWS cost is composed of DynamoDB, DynamoDB Streams, S3, OpenSearch, and the OSIS pipeline, and each of these elements’ pricing is complex on its own.

Figure: Amazon vector search (using OpenSearch) for DynamoDB architecture. Source: AWS Blog.

ScyllaDB Alternator simplifies this by integrating the vector store engine directly into the backend.

- Simple module: The ScyllaDB database hosts both the data and the vector index.
- Native API: You perform vector searches using DynamoDB Query operations.

Performance: 10 Million Vectors on a Budget

In our latest benchmark using a 10-million-vector dataset (768-dimensional Cohere embeddings), a modest five-node ScyllaDB cluster delivered over 12K QPS with single-digit millisecond latency.

Setup: 10M vectors; 768 dimensions; K: 10 (retrieve top K values); no quantization.

Results:

- Recall: ~90%
- Throughput: 12,763 QPS
- P99 Latency: 7.8 ms
- Cost: $1,643/month (1 year, full upfront)

Estimating the AWS cost for this case is not trivial. The write path includes DynamoDB (storage + ops), DynamoDB Streams, S3 (storage, API), OpenSearch (data nodes, master nodes, EBS), and the OSIS pipeline. To read more on the pricing of Amazon Zero ETL, see Implementing search on Amazon DynamoDB data using zero-ETL integration with Amazon OpenSearch service.

Code Examples

Note: The exact JSON format might change in the next few months.

1. Enabling a Vector Index
You can enable vector indexing during CreateTable or via UpdateTable. Note the new VectorSecondaryIndexUpdates parameter.

```json
// Adding a vector index to an existing table
{
  "TableName": "ProductCatalog",
  "AttributeDefinitions": [
    {"AttributeName": "ProductEmbedding", "AttributeType": "V"}
  ],
  "VectorSecondaryIndexUpdates": [
    {
      "Create": {
        "IndexName": "VectorIdx",
        "VectorAttribute": {
          "AttributeName": "ProductEmbedding",
          "Dimensions": 768
        },
        "IndexOptions": {
          "SimilarityFunction": "COSINE",
          "M": 32,
          "ef_construction": 256
        }
      }
    }
  ]
}
```

Pro Tip: You will get the best results with ScyllaDB’s optimized “V” (Vector) type. Although you can use standard DynamoDB lists, the “V” type stores data as a tight array of 32-bit floats, which saves storage while boosting performance.

2. Performing a Vector Search

To search, use the Query operation with the ScyllaDB VectorSearch parameter.

```json
{
  "TableName": "ProductCatalog",
  "IndexName": "VectorIdx",
  "VectorSearch": {
    "QueryVector": [0.12, 0.05, ..., 0.88],
    "Oversampling": 1.5
  },
  "Limit": 10,
  "ReturnVectorSearchSimilarity": "SIMILARITY"
}
```
Example Use Cases

Semantic Product Search: Instead of relying on exact keyword matches, users can find products based on intent. For example, a search for “waterproof rugged hiking gear” can surface relevant items even if those exact words aren’t in the title.

RAG (Retrieval-Augmented Generation): For knowledge bases, precision is non-negotiable. Using the High Recall configuration, ScyllaDB delivers 99.2% recall, so the LLM receives the most accurate context possible for generating responses.

Semantic Deduplication: At the Max Throughput end of the spectrum, ScyllaDB can quickly scan millions of incoming vectors to find near-duplicates. That prevents redundant data from cluttering your system, reducing costs and improving performance.

Conclusion

With ScyllaDB, DynamoDB users now have a “fast track” to AI-ready infrastructure. By unifying storage and vector search into a single API, you eliminate the operational tax of “Zero ETL” without sacrificing the sub-millisecond performance ScyllaDB is known for.

ScyllaDB Vector Search Benchmark: 10M Vectors on a Compact Cluster
Even a small, compact setup achieved up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms

Our 1-billion-vector benchmark demonstrated that ScyllaDB Vector Search can sustain 252,000 QPS with 2 ms P99 latency across a large-scale deployment. But not every workload starts at a billion vectors. Many production use cases (e.g., product catalogs, knowledge bases for RAG, and semantic caches) live comfortably in the 10–100 million range.

This post presents a smaller benchmark: a 10-million-vector dataset of 768-dimensional Cohere embeddings on a compact five-node cluster. It used three modest storage nodes and two memory-optimized search nodes, all running on AWS Graviton. We explore four index configurations that span the recall-throughput spectrum, from near-perfect recall to maximum throughput. The results show that even this small setup can deliver up to 12,840 QPS at k=10 with a serial P99 latency of 5.5 ms, without any quantization.

Architecture at a Glance

First, some background. ScyllaDB Vector Search separates storage and indexing responsibilities while keeping the system unified from the user’s perspective. The ScyllaDB storage nodes hold both the structured attributes and the vector embeddings in the same distributed table. Meanwhile, a dedicated Vector Store service, implemented in Rust and powered by the USearch engine, consumes updates from ScyllaDB via CDC and builds approximate nearest neighbor (ANN) indexes in memory. Queries are issued through standard CQL:

SELECT … ORDER BY vector_column ANN OF ? LIMIT k;

The queries are internally routed to the Vector Store service, which performs the HNSW similarity search and returns the candidate rows. This design allows each layer to scale independently, optimizing for its own workload characteristics and eliminating resource interference. For a detailed architectural deep-dive, see the 1-billion-vector benchmark and the technical blog Building a Low-Latency Vector Search Engine for ScyllaDB.
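The post does not show the benchmark table definition itself. A minimal sketch of what it plausibly looks like, inferring the keyspace, table, and column names from the index statements below and assuming ScyllaDB’s fixed-dimension vector type:

```sql
-- Assumed schema sketch; only the names and the 768-dimension figure come from the post
CREATE TABLE vdb_bench.vdb_bench_collection (
    id     bigint PRIMARY KEY,
    vector vector<float, 768>
);
```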
Benchmark Setup

Here’s a look at the dataset and hardware used for the benchmark.

Dataset:

| Property | Value |
|---|---|
| Vectors | 10,000,000 |
| Dimensions | 768 |
| Embedding model | Cohere |
| Similarity function | COSINE |
| Quantization | None (f32) |

Hardware:

| Role | Instance | vCPUs | RAM | Count |
|---|---|---|---|---|
| Storage nodes | i8g.large | 2 | 16 GB | 3 |
| Search nodes | r7g.2xlarge | 8 | 64 GB | 2 |

With 768-dimensional f32 vectors and M values up to 64, the in-memory index size can be estimated as:

Memory ≈ N × (D × 4 + M × 16) × 1.2

For the largest configuration (M=64): 10M × (768 × 4 + 64 × 16) × 1.2 ≈ 49 GB, which fits comfortably in the 64 GB of a single r7g.2xlarge search node. No quantization is needed at this scale.

Experiments
We tested four HNSW index configurations, progressively lowering graph connectivity (M) and search effort (ef_search) to shift the balance from recall toward throughput.

| Experiment | M | ef_construction | ef_search | k tested |
|---|---|---|---|---|
| #1 (high quality) | 64 | 384 | 192 | 100, 10 |
| #2 (balanced) | 32 | 256 | 128 | 100, 10 |
| #3 (high throughput) | 24 | 256 | 64 | 100, 10 |
| #4 (max throughput) | 20 | 256 | 48 | 10 |

The three HNSW parameters control different aspects of the index:
- M (maximum_node_connections): Maximum edges per node in the HNSW graph. Higher values create a richer, better-connected graph that improves recall, at the cost of more memory and slower inserts and queries.
- ef_construction (construction_beam_width): Controls how thoroughly the algorithm searches for the best neighbors when inserting a new vector. Higher values produce a higher-quality graph but slow down index building. This is a one-time cost.
- ef_search (search_beam_width): The main tuning knob for query performance. Controls the size of the candidate beam during search. Higher values evaluate more candidates, which improves recall but increases query latency.

Since vector index options cannot be changed after creation, each experiment required dropping and recreating the index. Here are the CQL statements used:
```sql
-- Experiment #1: M=64, ef_construction=384, ef_search=192
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
    'search_beam_width': '192',
    'construction_beam_width': '384',
    'maximum_node_connections': '64',
    'similarity_function': 'COSINE'
};

-- Experiment #2: M=32, ef_construction=256, ef_search=128
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
    'search_beam_width': '128',
    'construction_beam_width': '256',
    'maximum_node_connections': '32',
    'similarity_function': 'COSINE'
};

-- Experiment #3: M=24, ef_construction=256, ef_search=64
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
    'search_beam_width': '64',
    'construction_beam_width': '256',
    'maximum_node_connections': '24',
    'similarity_function': 'COSINE'
};

-- Experiment #4: M=20, ef_construction=256, ef_search=48
CREATE CUSTOM INDEX vdb_bench_collection_vector_idx
ON vdb_bench.vdb_bench_collection (vector) USING 'vector_index'
WITH OPTIONS = {
    'search_beam_width': '48',
    'construction_beam_width': '256',
    'maximum_node_connections': '20',
    'similarity_function': 'COSINE'
};
```
The benchmark was run using VectorDBBench with the upcoming ScyllaDB Python driver built on a Rust core (a dev version is available at python-rs-driver). VectorDBBench ramps concurrency from 1 to 150 concurrent search clients and measures QPS, P99, and average latency at each level. A separate serial run of 1,000 queries measures recall and nDCG against brute-force ground truth.

Results
Peak QPS Comparison

To start our analysis, let’s examine the maximum throughput that each index configuration can sustain under peak concurrency. Looking strictly at the highest throughput achieved:

The bar chart highlights the dramatic impact of index parameters at k=10: throughput rises sharply as the index becomes lighter. At k=100, the differences are much smaller; all configurations cluster between 2,300 and 3,000 QPS.

QPS vs Concurrency

The chart below shows how each index configuration scales as concurrency ramps from 1 to 150 clients.

At k=10, the lighter configurations (Experiments #3 and #4) scale nearly linearly up to 60–80 concurrent clients before saturating. Experiment #4 demonstrates the benefit of a leaner graph: it achieves 5.5X higher peak QPS than Experiment #1 at k=10. At k=100, all configurations converge to a narrower throughput band (2,300–3,025 QPS). This shows that retrieving 100 neighbors dominates the per-query cost regardless of index parameters.

P99 and Average Latency vs Concurrency

As expected, increasing throughput adds queuing delay, which leads to higher tail latencies.
Lighter configurations start at dramatically lower baseline latencies. Experiment #4 maintains sub-6 ms P99 latency up to 30 concurrent clients, while Experiment #1 starts above 13 ms even at concurrency 1. All configurations show latency rising proportionally once throughput saturates. This is the expected queuing behavior when the system is at capacity.

QPS vs P99 Latency (Pareto View)

Plotting throughput directly against tail latency provides a Pareto frontier of our benchmark configurations:

This view makes the operational trade-off easier to read than the concurrency charts alone. At k=10, Experiments #3 and #4 push the frontier outward, with much higher QPS at the same or lower tail latency. At k=100, the frontier is tighter, which again shows that returning more neighbors dominates the total cost per query.

Recall vs Peak QPS

Finally, plotting recall helps select the optimal index strategy based on business requirements:

This chart summarizes the core choice in a single picture: should you spend compute on accuracy or throughput? Experiment #1 sits at the high-recall end, Experiment #4 at the high-throughput end, and Experiment #2 emerges as the practical middle ground for workloads that need both.

Scenario Analysis

With the charts above as a visual reference, let’s examine the three main usage scenarios that emerge from the data.

Scenario 1: Maximum Throughput
Experiments #3 (M=24, ef_search=64) and #4 (M=20, ef_search=48) target workloads where throughput is the primary objective and moderate recall is acceptable, for example coarse candidate retrieval stages in recommendation pipelines or semantic deduplication.

At k=10, Experiment #4 reached a peak of 12,840 QPS at concurrency 100, with a serial P99 latency of just 5.5 ms and recall of 92.0%. Experiment #3 achieved 9,719 QPS with marginally better recall at 95.0% and a serial P99 of 6.0 ms. Even at k=100, these lightweight configurations delivered competitive throughput: Experiment #3 peaked at 3,025 QPS (87.9% recall), which is comparable to the heavier configurations. Retrieving 100 neighbors per query inherently requires more work, which limits the throughput range across all configurations.

Scenario 2: High Recall

Experiment #1 (M=64, ef_search=192) prioritizes accuracy for applications that cannot tolerate missed results (e.g., high-fidelity semantic search, retrieval-augmented generation [RAG] pipelines, or compliance-sensitive retrieval). At k=10, the system delivered 99.2% recall and 99.1% nDCG, essentially indistinguishable from exact brute-force search. Peak QPS reached 2,324 with a serial P99 latency of 14.6 ms. At k=100, recall was 96.8% with 2,345 QPS and a serial P99 of 15.2 ms. The higher latency and lower throughput are a direct consequence of the richer graph (64 connections per node) and wider search beam (192 candidates), which evaluate substantially more distance computations per query.

Scenario 3: Balanced

Experiment #2 (M=32, ef_search=128) takes the middle ground, offering strong recall with significantly better throughput than the high-recall configuration. At k=10, it achieved 97.7% recall with 4,897 QPS, roughly double the throughput of Experiment #1, with only a 1.5 percentage-point recall reduction. The serial P99 was 8.7 ms. At k=100, recall was 92.0% with 2,975 QPS and a serial P99 of 9.6 ms. This configuration represents a practical sweet spot for many production deployments where both recall and throughput matter.

Summary Tables

k=100:

| Metric | #1 (M=64, ef_s=192) | #2 (M=32, ef_s=128) | #3 (M=24, ef_s=64) |
|---|---|---|---|
| Peak QPS | 2,345 (c=150) | 2,975 (c=40) | 3,025 (c=40) |
| QPS @ c=10 | 947 | 1,314 | 1,489 |
| Serial P99 latency | 15.2 ms | 9.6 ms | 7.8 ms |
| P99 latency @ c=1 | 15.5 ms | 9.9 ms | 8.1 ms |
| P99 latency @ c=100 | 81.2 ms | 49.9 ms | 49.6 ms |
| Recall | 96.8% | 92.0% | 87.9% |
| nDCG | 97.3% | 93.1% | 89.7% |

k=10:

| Metric | #1 (M=64, ef_s=192) | #2 (M=32, ef_s=128) | #3 (M=24, ef_s=64) | #4 (M=20, ef_s=48) |
|---|---|---|---|---|
| Peak QPS | 2,324 (c=100) | 4,897 (c=80) | 9,719 (c=80) | 12,840 (c=100) |
| QPS @ c=10 | 1,054 | 1,602 | 2,046 | 2,311 |
| Serial P99 latency | 14.6 ms | 8.7 ms | 6.0 ms | 5.5 ms |
| P99 latency @ c=1 | 14.0 ms | 8.5 ms | 6.2 ms | 5.5 ms |
| P99 latency @ c=100 | 81.0 ms | 38.1 ms | 18.0 ms | 12.3 ms |
| Recall | 99.2% | 97.7% | 95.0% | 92.0% |
| nDCG | 99.1% | 97.6% | 94.9% | 92.0% |

Key Takeaways
- k=10 vs k=100: At k=10, lighter index parameters yield massive throughput gains (up to 5.5X) with modest recall loss. At k=100, all configurations converge to a narrow QPS band (~1.3X range) because retrieving more neighbors dominates per-query cost.
- Recall trade-offs are favorable: At k=10, recall drops only 7.2 pp (99.2% to 92.0%) for a 5.5X QPS increase. At k=100, the trade-off is steeper: 8.9 pp for just a 1.3X gain.
- Latency tracks index weight: Serial P99 drops from 14.6 ms to 5.5 ms at k=10, and from 15.2 ms to 7.8 ms at k=100, as lighter graphs require fewer distance computations.
- Saturation points differ: Experiments #1–#3 plateau around c=40–80; Experiment #4 scales further, to c=100, before saturating, reflecting its lower per-query compute cost.

Conclusion
These results show that ScyllaDB Vector Search delivers strong performance even on a compact, five-node cluster with 10 million 768-dimensional vectors. A pair of r7g.2xlarge search nodes provides enough memory to hold the full HNSW index at f32 precision, without requiring any quantization. The three storage nodes with replication factor 3, combined with vector search nodes distributed across availability zones, also provide high availability: the system is designed to tolerate node failures without data loss or service interruption.

Depending on the index configuration, the system can prioritize near-perfect recall (99.2% at k=10) or maximize throughput (12,840 QPS at k=10 with 92% recall), with practical balanced options in between. This 10M scenario represents the accessible end of the scale. For workloads that push into hundreds of millions or billions of vectors, quantization, additional search nodes, and larger instances extend the same architecture. See the ScyllaDB 1-billion-vector benchmark for results at extreme scale, and look for our upcoming 100-million-vector benchmark post.

At k=10, the performance bottleneck resides within the vector index nodes, leaving ScyllaDB with significant headroom. This means you can likely add a Vector Search index to your cluster and continue running a similar workload on your existing ScyllaDB infrastructure, without needing to scale your database nodes.

The full Jupyter notebook with interactive charts and all data is available in this repository. Ready to try it yourself? Follow the ScyllaDB Vector Search Quick Start Guide to get started.