Managing ScyllaDB Background Operations with Task Manager

Learn about Task Manager, which provides a unified way to observe and control ScyllaDB’s background maintenance work.

In each ScyllaDB cluster, there are many background processes that help maintain data consistency, durability, and performance in a distributed environment. For instance, such operations include compaction (which cleans up on-disk data files) and repair (which ensures data consistency in a cluster). These operations are critical for preserving cluster health and integrity. However, some processes can be long-running and resource-intensive. Given that ScyllaDB is used for latency-sensitive database workloads, it’s important to monitor and track these operations. That’s where ScyllaDB’s Task Manager comes in. Task Manager allows administrators of self-managed ScyllaDB to see all running operations, manage them, or get detailed information about a specific operation. And beyond being a monitoring tool, Task Manager also provides a unified way to manage asynchronous operations.

How Task Manager Organizes and Tracks Operations

Task Manager adds structure and visibility into ScyllaDB’s background work. It groups related maintenance activities into modules, represents them as hierarchical task trees, and tracks their lifecycle from creation through completion. The following sections explain how operations are organized, retained, and monitored at both node and cluster levels.

Supported Operations

Task Manager supports the following operations:

Local: compaction, repair, streaming, backup, restore.
Global: tablet repair, tablet migration, tablet split and merge, and node operations (bootstrap, replace, rebuild, remove node, decommission).

Reviewing Active/Completed Tasks

Task Manager is divided into modules: the entities that gather information about operations of similar functionality. Task Manager captures and exposes this data using tasks. Each task covers an operation or part of one (e.g., a task can represent the part of a repair operation running on a specific shard).

Each operation is represented by a tree of tasks. The tree root covers the whole operation. The root may have children, which give more fine-grained control over the operation. The children may have their own children, and so on. Consider the example of a global major compaction task tree:

The root covers the compaction of all keyspaces in a node.
The children of the root task each cover a single keyspace.
The second-degree descendants of the root task each cover a single keyspace on a single shard.
The third-degree descendants of the root task each cover a single table on a single shard, and so on.

You can inspect a task at each depth to see details on the operation’s progress.

Determining How Long Tasks Are Shown

Task Manager can show completed tasks as well as running ones. Completed tasks are removed from Task Manager after some time. To customize how long a task’s status is preserved, modify the task_ttl_in_seconds (aka task_ttl) and user_task_ttl_in_seconds (aka user_task_ttl) configuration parameters. task_ttl applies to operations that are started internally, while user_task_ttl refers to those initiated by the user. When the user starts an operation, the root of the task tree is a user task. Descendant tasks are internal, and such tasks are unregistered after they finish, propagating their status to their parents.

Node Tasks vs Cluster Tasks

Task Manager tracks operations local to a node as well as global cluster-wide operations. A local task is created on the node that the respective operation runs on.
Its status may be requested only from the node on which the task was created. A global task always covers the whole operation. It is the root of a task tree and may have local children. A global task is reachable from each node in a cluster. task_ttl and user_task_ttl are not relevant for global tasks.

Per-Task Details

When you list all tasks in a Task Manager module, it shows brief information about them with task_stats. Each task has a unique task_id and a sequence_number that’s unique within its module. All tasks in a task tree share the same sequence_number.

Task stats also include several descriptive attributes:

kind: either “node” (a local operation) or “cluster” (a global one).
type: what specific operation this task involves (e.g., “major compaction” or “intranode migration”).
scope: the level of granularity (e.g., “keyspace” or “tablet”). Additional attributes such as shard, keyspace, table, and entity can further specify the scope.

Status fields summarize the task’s state and timing:

state: indicates whether the task was created, running, done, failed, or suspended.
start_time and end_time: indicate when the task began and finished. If a task is still running, its end_time is set to epoch.

When you request a specific task’s status, you’ll see more detailed metrics:

progress_total and progress_completed show how much work is done, measured in progress_units.
parent_id and children_ids place the task within its tree hierarchy.
is_abortable indicates whether the task can be stopped before completion.

If the task failed, you will also see the exact error message.

Interacting with Task Manager

Task Manager provides a REST API for listing, monitoring, and controlling ScyllaDB’s background operations. You can also use it to manage the execution of long-running maintenance tasks started with the asynchronous API instead of blocking a client call. If you prefer command-line tools, the same functionality is available through nodetool tasks.

Using the Task Management API

Task Manager exposes a REST API that lets you manage tasks:

GET /task_manager/list_modules – lists all supported Task Manager modules.
GET /task_manager/list_module_tasks/{module} – lists all tasks in a specified module.
GET /task_manager/task_status/{task_id} – shows the detailed status of a specified task.
GET /task_manager/wait_task/{task_id} – waits for a specified task and shows its status.
POST /task_manager/abort_task/{task_id} – aborts a specified task.
GET /task_manager/task_status_recursive/{task_id} – gets statuses of a specified task and all its descendants.
GET/POST /task_manager/ttl – gets/sets task_ttl.
GET/POST /task_manager/user_ttl – gets/sets user_task_ttl.
POST /task_manager/drain/{module} – drains the finished tasks in a specified module.

Running Maintenance Tasks Asynchronously

Some ScyllaDB maintenance operations can take a while to complete, especially at scale. Waiting for them to finish through a synchronous API call isn’t always practical. Thanks to Task Manager, existing synchronous APIs are easily and consistently converted into asynchronous ones. Instead of waiting for an operation to finish, a new API can immediately return the ID of the root task representing the started operation. Using this task_id, you can check the operation’s progress, wait for completion, or abort it if needed. This gives you a unified and consistent way to manage all those long-running tasks.

Nodetool

A task can be managed using nodetool’s tasks command. For details, see the related nodetool docs page.
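
Before walking through an example, here is a hedged sketch of what calling a few of these endpoints with curl might look like. The node address and port 10000 (ScyllaDB’s REST API port) match the examples later in this post; <task_id> is a placeholder for a real task id.

# List all supported Task Manager modules
curl -X GET 'http://127.0.0.1:10000/task_manager/list_modules'

# List the tasks currently tracked by the compaction module
curl -X GET 'http://127.0.0.1:10000/task_manager/list_module_tasks/compaction'

# Inspect, wait for, or abort a specific task
curl -X GET 'http://127.0.0.1:10000/task_manager/task_status/<task_id>'
curl -X GET 'http://127.0.0.1:10000/task_manager/wait_task/<task_id>'
curl -X POST 'http://127.0.0.1:10000/task_manager/abort_task/<task_id>'

# Check how long finished (internal) tasks are retained, in seconds
curl -X GET 'http://127.0.0.1:10000/task_manager/ttl'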
Example: Tracking and Managing Tasks

Preparation

To start, we locally set up a cluster of three nodes with the IP addresses 127.43.0.1, 127.43.0.2, and 127.43.0.3. Next, we create two keyspaces: keyspace1 with replication factor 3 and keyspace2 with replication factor 2. In each keyspace, we create two tables: table1 and table2 in keyspace1, and table3 and table4 in keyspace2. We populate them with data.

Exploring Task Manager

Let’s start by listing the modules supported by Task Manager:

nodetool tasks modules -h 127.43.0.1
["sstables_loader","node_ops","tablets","repair","snapshot","compaction"]

Starting and Tracking a Repair Task

We request a tablet repair on all tokens of table keyspace2.table3:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://127.43.0.3:10000/storage_service/tablets/repair?ks=keyspace2&table=table3&tokens=all'
{"tablet_task_id":"2f06bff0-ab45-11f0-94c2-60ca5d6b2927"}

In response, we get the task id of the respective tablet repair task. We can use it to track the progress of the repair. Let’s check whether the task with id 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 is listed in the tablets module.

nodetool tasks list tablets -h 127.43.0.1
[{"task_id":"88a7ceb0-ab44-11f0-9016-68b61792a9a7","state":"running","type":"intranode_migration","kind":"cluster","scope":"tablet","keyspace":"keyspace1","table":"table1","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:32:08Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"2f06bff0-ab45-11f0-94c2-60ca5d6b2927","state":"running","type":"user_repair","kind":"cluster","scope":"table","keyspace":"keyspace2","table":"table3","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:36:47Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"88ac6290-ab44-11f0-9016-68b61792a9a7","state":"running","type":"intranode_migration","kind":"cluster","scope":"tablet","keyspace":"keyspace2","table":"table4","entity":"","sequence_number":0,"shard":0,"start_time":"2025-10-17T10:32:08Z","end_time":"1970-01-01T00:00:00Z"}]

Apart from the repair task, we can see that there are two intranode migrations running. All the tasks are of kind “cluster”, which means that they cover global operations. All these tasks would be visible regardless of which node we request them from. We can also see the scope of the operations. We always migrate one tablet at a time, so the migration tasks’ scope is “tablet”. For repair, the scope is “table” because we previously started the operation on a whole table. Entity, sequence_number, and shard are irrelevant for global tasks. Since all tasks are running, their end_time is set to a default value (epoch).

Examining Task Status

Let’s examine the status of the tablet repair using its task_id. Global tasks are available on the whole cluster, so we change the requested node… just because we can. 😉
nodetool tasks status 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.3
{"id": "2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "type": "user_repair", "kind": "cluster", "scope": "table", "state": "running", "is_abortable": true, "start_time": "2025-10-17T10:36:47Z", "end_time": "1970-01-01T00:00:00Z", "error": "", "parent_id": "none", "sequence_number": 0, "shard": 0, "keyspace": "keyspace2", "table": "table3", "entity": "", "progress_units": "", "progress_total": 0, "progress_completed": 0, "children_ids": [{"task_id": "52b5bff5-467f-4f4c-a280-95e99adde2b6", "node": "127.43.0.1"},{"task_id": "1eb69569-c19d-481e-a5e6-0c433a5745ae", "node": "127.43.0.2"},{"task_id": "70d098c4-df79-4ea2-8a5e-6d7386d8d941", "node": "127.43.0.3"},...]}

The task status contains detailed information about the tablet repair task. We can see whether the task is abortable (via the task_manager API). Some fields are present but not applicable for this particular task: error, which would be set if the task failed; parent_id, which would be set if it had a parent (impossible for a global task); and progress_units, progress_total, and progress_completed, which would indicate task progress (not yet supported for tablet repair tasks).

There’s also a list of tasks that were created as a part of the global task. The list above has been shortened to improve readability. The key point is that children of a global task may be created on all nodes in a cluster. Those children are local tasks (because global tasks cannot have a parent). Thus, they are reachable only from the nodes where they were created. For example, the status of task 1eb69569-c19d-481e-a5e6-0c433a5745ae should be requested from node 127.43.0.2.

nodetool tasks status 1eb69569-c19d-481e-a5e6-0c433a5745ae -h 127.43.0.2
{"id": "1eb69569-c19d-481e-a5e6-0c433a5745ae", "type": "repair", "kind": "node", "scope": "keyspace", "state": "done", "is_abortable": true, "start_time": "2025-10-17T10:36:48Z", "end_time": "2025-10-17T10:36:48Z", "error": "", "parent_id": "2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "sequence_number": 15, "shard": 0, "keyspace": "keyspace2", "table": "", "entity": "", "progress_units": "ranges", "progress_total": 1, "progress_completed": 1, "children_ids": [{"task_id": "52dedd00-7960-482c-85a1-9114131348c3", "node": "127.43.0.2"}]}

As expected, the child’s kind is “node”. Its parent_id references the tablet repair task’s task_id. The task has completed successfully, as indicated by the state, and its end_time is set. Its sequence_number is 15, which means it is the 15th task in its module. The task’s scope is wider than the parent’s: it could encompass the whole keyspace, but – in this case – it is limited to the parent’s scope. The task’s progress is measured in ranges, and we can see that exactly one range was repaired. This task has one child that is created on the same node as its parent. That’s always true for local tasks.
nodetool tasks status 70d098c4-df79-4ea2-8a5e-6d7386d8d941 -h 127.43.0.3
{"id": "70d098c4-df79-4ea2-8a5e-6d7386d8d941", "type": "repair", "kind": "node", "scope": "keyspace", "state": "done", "is_abortable": true, "start_time": "2025-10-17T10:37:49Z", "end_time": "2025-10-17T10:37:49Z", "error": "", "parent_id": "2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "sequence_number": 25, "shard": 0, "keyspace": "keyspace2", "table": "", "entity": "", "progress_units": "ranges", "progress_total": 1, "progress_completed": 1, "children_ids": [{"task_id": "20e95420-9f03-4cca-b069-6f16bd23dd14", "node": "127.43.0.3"}]}

We may examine other children of the global tablet repair task too. However, we may only check each one on the node where it was created.

Let’s wait until the global task is completed.

nodetool tasks wait 2f06bff0-ab45-11f0-94c2-60ca5d6b2927 -h 127.43.0.2
{"id": "2f06bff0-ab45-11f0-94c2-60ca5d6b2927", "type": "user_repair", "kind": "cluster", "scope": "table", "state": "done", "is_abortable": true, "start_time": "2025-10-17T10:36:47Z", "end_time": "2025-10-17T10:47:30Z", "error": "", "parent_id": "none", "sequence_number": 0, "shard": 0, "keyspace": "keyspace2", "table": "table3", "entity": "", "progress_units": "", "progress_total": 0, "progress_completed": 0, "children_ids": [{"task_id": "52b5bff5-467f-4f4c-a280-95e99adde2b6", "node": "127.43.0.1"},{"task_id": "1eb69569-c19d-481e-a5e6-0c433a5745ae", "node": "127.43.0.2"},{"task_id": "70d098c4-df79-4ea2-8a5e-6d7386d8d941", "node": "127.43.0.3"},...]}

We can see that its state is “done” and its end_time is set.

Working with Compaction Tasks

Let’s start some compactions and have a look at the compaction module.

nodetool tasks list compaction -h 127.43.0.2
[{"task_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","state":"running","type":"major compaction","kind":"node","scope":"keyspace","keyspace":"keyspace1","table":"","entity":"","sequence_number":685,"shard":1,"start_time":"2025-10-17T11:00:01Z","end_time":"1970-01-01T00:00:00Z"},
{"task_id":"0861e058-349e-41e1-9f4f-f9c3d90fcd8c","state":"done","type":"major compaction","kind":"node","scope":"keyspace","keyspace":"keyspace1","table":"","entity":"","sequence_number":671,"shard":1,"start_time":"2025-10-17T10:50:58Z","end_time":"2025-10-17T10:50:58Z"}]

We can see that one of the major compaction tasks is still running. Let’s abort it and check its task tree.
nodetool tasks abort 16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2
nodetool tasks tree 16a6cdcc-bb32-41d0-8f06-1541907a3b48 -h 127.43.0.2
[{"id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","type":"major compaction","kind":"node","scope":"keyspace","state":"failed","is_abortable":true,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:01:14Z","error":" seastar::abort_requested_exception (abort requested)","parent_id":"none","sequence_number":685,"shard":1,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":208,"progress_completed":206,"children_ids":[{"task_id":"9764694a-cb44-4405-b653-95a6c8cebf45","node":"127.43.0.2"},{"task_id":"b6949bc8-0489-48e0-9325-16c6411d0fcc","node":"127.43.0.2"}]},
{"id":"9764694a-cb44-4405-b653-95a6c8cebf45","type":"major compaction","kind":"node","scope":"shard","state":"done","is_abortable":false,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:00:01Z","error":"","parent_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","sequence_number":685,"shard":1,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":0,"progress_completed":0},
{"id":"b6949bc8-0489-48e0-9325-16c6411d0fcc","type":"major compaction","kind":"node","scope":"shard","state":"failed","is_abortable":false,"start_time":"2025-10-17T11:00:01Z","end_time":"2025-10-17T11:01:14Z","error":"seastar::abort_requested_exception (abort requested)","parent_id":"16a6cdcc-bb32-41d0-8f06-1541907a3b48","sequence_number":685,"shard":0,"keyspace":"keyspace1","table":"","entity":"","progress_units":"bytes","progress_total":208,"progress_completed":206}]

We can see that the abort request propagated to one of the task’s children and aborted it. That task now has a failed state and its error field contains abort_requested_exception.

Managing Asynchronous Operations

Beyond examining the running operations, Task Manager can manage asynchronous operations started with the REST API. For example, we may start a major compaction of a keyspace synchronously with /storage_service/keyspace_compaction/{keyspace} or use an asynchronous version of this API:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://127.43.0.1:10000/tasks/compaction/keyspace_compaction/keyspace2'
"4c6f3dd4-56dc-4242-ad6a-8be032593a02"

The response includes the task_id of the operation we just started. This id may be used in Task Manager to track the progress, wait for the operation, or abort it.

Key Takeaways

The Task Manager provides a clear, unified way to observe and control background maintenance work in ScyllaDB.

Visibility: It shows detailed, hierarchical information about ongoing and completed operations, from cluster-level tasks down to individual shards.
Consistency: You can use the same mechanisms for listing, tracking, and managing all asynchronous operations.
Control: You can check progress, wait for completion, or abort tasks directly, without guessing what’s running.
Extensibility: It also provides a framework for turning synchronous APIs into asynchronous ones by returning task IDs that can be monitored or managed through the Task Manager.

Together, these capabilities make it easier to see what ScyllaDB is doing, keep the system stable, and convert long-running operations to asynchronous workflows.

The Cost of Multitenancy

DynamoDB and ScyllaDB share many similarities, but DynamoDB is a multi-tenant database, while ScyllaDB is single-tenant.

The recent DynamoDB outage is a stark reminder that even the most reliable and mature cloud services can experience downtime. Amazon DynamoDB remains a strong and proven choice for many workloads, and many teams are satisfied with its latency and cost. However, incidents like this highlight the importance of architecture, control, and flexibility when building for resilience.

DynamoDB and ScyllaDB share many similarities:

Both are distributed NoSQL databases with the same “ancestor”: the Dynamo paper (although both databases have significantly evolved from the original concept).
A compatible API: The DynamoDB API is one of two supported APIs in ScyllaDB Cloud.
Both use multi-zone deployment for higher HA.
Both support multi-region deployment. DynamoDB uses Global Tables (see this analysis for more). ScyllaDB can go beyond and allow multi-cloud deployments, or on-prem / hybrid deployments.

But they also have a major difference: DynamoDB is a multi-tenant database, while ScyllaDB is single-tenant.

Source: https://blog.bytebytego.com/p/a-deep-dive-into-amazon-dynamodb

Multi-tenancy has notable advantages for the vendor:

Lower infrastructure cost: Since tenants’ peaks don’t align, the vendor can provision for the aggregate average rather than the sum of all peaks, and even safely over-subscribe resources.
Shared burst capacity: Extra capacity for traffic spikes is pooled across all users.

Multi-tenancy also comes with significant technical challenges and is never perfect. All users still share the same underlying resources (CPU, storage, and network) while the service works hard to preserve the illusion of a dedicated environment for each tenant (e.g., using various isolation mechanisms). However, sometimes the isolation breaks and the real architecture behind the curtain is revealed. One example is the Noisy Neighbor issue. Another is that when a shared resource breaks, like the DNS endpoint in the latest DynamoDB outage, MANY users are affected. In this case, all DynamoDB users in a region suffer.

ScyllaDB Cloud takes a different approach: all database resources are completely separated from each other. Each ScyllaDB database is running:

On dedicated VMs
In a dedicated VPC
In a dedicated Security Group
Using a dedicated endpoint and (an optional) dedicated Private Link
With isolated authorization and authentication (per database)
With dedicated Monitoring and Administration (ScyllaDB Manager) servers

When using ScyllaDB Cloud Bring Your Own Account (BYOA), the entire deployment is running on the *user* account, often on a dedicated sub-account. This provides additional isolation. The ScyllaDB Cloud control plane is loosely coupled to the managed databases. Even in the case of a disconnect, the database clusters will continue to serve requests. This design greatly reduces the blast radius of any one issue.

While the single-tenant architecture is more resilient, it does come with a few challenges:

Scaling: To scale, ScyllaDB needs to allocate new resources (nodes) from EC2 and depends on the EC2 API to do so. Tablets and X Cloud have made a great improvement in reducing scaling time.
Workload Isolation: ScyllaDB allows users to control the resource bandwidth per workload with Workload Prioritization (docs | tech talk | demo).

Pricing: Using numerous optimization techniques, like shard-per-core, ScyllaDB achieves extreme performance per node, which allows us to provide lower prices than DynamoDB for most use cases.

To conclude: DynamoDB optimizes for multi-tenancy, whereas ScyllaDB favors stronger tenant isolation and a smaller blast radius.

Cache vs. Database: How Architecture Impacts Performance

Lessons learned comparing Memcached with ScyllaDB.

Although caches and databases are different animals, databases have always cached data, and caches have started to use disks, extending beyond RAM. If an in-memory cache can rely on flash storage, can a persistent database also function as a cache? And how far can you reasonably push each beyond its original intent, given the power and constraints of its underlying architecture?

A little while ago, I joined forces with Memcached maintainer Alan Kasindorf (a.k.a. dormando) to explore these questions. The collaboration began with the goal of an “apples to oranges” benchmark comparing ScyllaDB with Memcached, which is covered in the article “We Compared ScyllaDB and Memcached and… We Lost?” A few months later, we were pleasantly surprised that the stars aligned for P99 CONF. At the last minute, Kasindorf was able to join us to chat about the project – specifically, what it all means for developers with performance-sensitive use cases.

Note: P99 CONF is a highly technical conference on performance and low-latency engineering. We just wrapped P99 CONF 2025, and you can watch the core sessions on-demand.

Cache Efficiency

Which data store uses memory more efficiently? To test it, we ran a simple key-value workload on both systems. The results:

Memcached cached 101 million items before evictions began.
ScyllaDB cached only 61 million items before evictions.

Cache efficiency comparison

What’s behind the difference? ScyllaDB also has its own LRU (Least Recently Used) cache, bypassing the Linux page cache. But unlike Memcached, ScyllaDB supports a wide-column data representation: a single key may contain many rows. This, along with additional protocol overhead, causes a single write in ScyllaDB to consume more space than a write in Memcached.

Drilling down into the differences, Memcached has very little per-item overhead. In the example from the image above, each stored item consumes either 48 or 56 bytes, depending on whether compare and swap (CAS) is enabled. In contrast, ScyllaDB has to handle a lot more (it’s a persistent database, after all!). It needs to allocate space for its memtables, Bloom filters, and SSTable summaries so it can efficiently retrieve data from disk when a cache miss occurs. On top of that, ScyllaDB supports a much richer data model (wide column).

Another notable architectural difference stands out on the performance front: Memcached is optimized for pipelined requests (think batching, as in DynamoDB’s BatchGetItem), considerably reducing the number of roundtrips over the network to retrieve several keys. ScyllaDB is optimized for single (and contiguous) key retrievals under a wide-column representation.

Read-only in-memory efficiency comparison

Following each system’s ideal data model, both ScyllaDB and Memcached managed to saturate the available network throughput, servicing around 3 million rows/s while sustaining below single-digit millisecond P99 latencies.

Disks and IO Efficiency

Next, the focus shifted to disks. We measured performance under different payload sizes, as well as how efficiently each of the systems could maximize the underlying storage. With Extstore and small (1KB) payloads, Memcached stored about 11 times more items (compared to its in-memory workload) before evictions started to kick in, leaving a significant portion of free available disk space. This happens because, in addition to the regular per-key overhead, Memcached stores an additional 12 bytes per item in RAM as a pointer to storage.
As RAM gets depleted, Extstore is no longer effective, and users will no longer observe savings beyond that point.

Disk performance with small payloads comparison

For the actual performance tests, we stressed Extstore against item sizes of 1KB and 8KB. The table below summarizes the results:

Test Type | Payload Size | I/O Threads | GET Rate | P99 Latency
perfrun_metaget_pipe | 1KB | 32 | 188K/s | 4~5 ms
perfrun_metaget | 1KB | 32 | 182K/s | <1 ms
perfrun_metaget_pipe | 1KB | 64 | 261K/s | 5~6 ms
perfrun_metaget | 1KB | 64 | 256K/s | 1~2 ms
perfrun_metaget_pipe | 8KB | 16 | 92K/s | 5~6 ms
perfrun_metaget | 8KB | 16 | 90K/s | <1 ms
perfrun_metaget_pipe | 8KB | 32 | 110K/s | 3~4 ms
perfrun_metaget | 8KB | 32 | 105K/s | <1 ms

We populated ScyllaDB with the same number of items as we used for Memcached. ScyllaDB actually achieved higher throughput – and just slightly higher latency – than Extstore. I’m pretty sure that if the throughput had been reduced, the latency would have been lower. But even with no tuning, the performance is quite comparable. This is summarized below:

Test Type | Payload Size | GET Rate | Server-Side P99 | Client-Side P99
1KB Read | 1KB | 268.8K/s | 2 ms | 2.4 ms
8KB Read | 8KB | 156.8K/s | 1.54 ms | 1.9 ms

A few notable points from these tests:

Extstore required considerable tuning to fully saturate flash storage I/O.
Due to Memcached’s architecture, smaller payloads are unable to fully use the available disk space, providing smaller gains compared to ScyllaDB.
ScyllaDB rates were overall higher than Memcached in a key-value orientation, especially under higher payload sizes. Latencies were better than pipelined requests, but slightly higher than individual GETs in Memcached.

I/O Access Methods Discussion

These disk-focused tests unsurprisingly sparked a discussion about the different I/O access methods used by ScyllaDB vs. Memcached/Extstore. I explained that ScyllaDB uses asynchronous direct I/O. For an extensive discussion of this, read this blog post by ScyllaDB CTO and cofounder Avi Kivity. Here’s the short version: ScyllaDB is a persistent database. When people adopt a database, they rightfully expect that it will persist their data. So, direct I/O is a deliberate choice. It bypasses the kernel page cache, giving ScyllaDB full control over disk operations. This is critical for things like compactions, write-ahead logs, and efficiently reading data off disk.

A user-space I/O scheduler is also involved. It lives in the middle and decides which operation gets how much I/O bandwidth. That could be an internal compaction task or a user-facing query. It arbitrates between them. That’s what enables ScyllaDB to balance persistence work with latency-sensitive operations.

Extstore takes a very different approach: keep things as simple as possible and avoid touching the disk unless it’s absolutely necessary. As Kasindorf put it: “We do almost nothing.” That’s fully intentional. Most operations – like deletes, TTL updates, or overwrites – can happen entirely in memory. No disk access needed. So Extstore doesn’t bother with a scheduler.

Without a scheduler, Extstore performance tuning is manual. You can change the number of Extstore I/O threads to get better utilization. If you roll it out and notice that your disk doesn’t look fully utilized – and you still have a lot of spare CPU – you can bump up the thread count. Kasindorf mentioned that it will likely become self-tuning at some point. But for now, it’s a knob that users can tweak.

Another important piece is how Extstore layers itself on top of Memcached’s existing RAM cache. It’s not a replacement; it’s additive.
You still have your in-memory cache, and Extstore just handles the overflow. Here’s how Kasindorf explained it: “If you have, say, five gigs of RAM and one gig of that is dedicated to these small pointers that point from memory into disk, we still have a couple extra gigs left over for RAM cache.” That means if a user is actively clicking around, their data may never even go to disk. The only time Extstore might need to read from disk is when the cache has gone cold (for instance, a user returning the next day). Then the entries get pulled back in.

Basically, while ScyllaDB builds around persistent, high-performance disk I/O (with scheduling, direct control, and durable storage), Extstore is almost the opposite. It’s light, minimal, and tries to avoid disk entirely unless it really has to.

Conclusion and Takeaways

Across these and the other tests that we performed in the full benchmark, Memcached and ScyllaDB both managed to maximize the underlying hardware utilization and keep latencies predictably low. So which one should you pick? The real answer: it depends. If your existing workload can accommodate a simple key-value model and benefits from pipelining, then Memcached should be more suitable for your needs. On the other hand, if the workload requires support for complex data models, then ScyllaDB is likely a better fit.

Another reason for sticking with Memcached: it easily delivers traffic far beyond what a network interface card can sustain. In fact, in this Hacker News thread, dormando mentioned that he could scale it up past 55 million read ops/sec on a considerably larger server. Given that, you could make use of smaller and/or cheaper instance types to sustain a similar workload, provided the available memory and disk footprint meet your workload needs.

A different angle to consider is the data set size. Even though Extstore provides great cost savings by allowing you to store items beyond RAM, there’s a limit to how many keys can fit per gigabyte of memory. Workloads with very small items should observe smaller gains compared to those with larger items. That’s not the case with ScyllaDB, which allows you to store billions of items irrespective of their sizes.

It’s also important to consider whether data persistence is required. If it is, then running ScyllaDB as a replicated distributed cache provides greater resilience and non-stop operations, with the tradeoff being (as Memcached correctly states) that replication halves your effective cache size. Unfortunately, Extstore doesn’t support warm restarts, so the failure or maintenance of a single node is prone to elevating your cache miss ratios. Whether this is acceptable depends on your application semantics: if a cache miss corresponds to a round trip to the database, then the end-to-end latency will be momentarily higher.

Regardless of whether you choose a cache like Memcached or a database like ScyllaDB, I hope this work inspires you to think differently about performance testing. As we’ve seen, databases and caches are fundamentally different. And at the end of the day, just comparing performance numbers isn’t enough. Moreover, recognize that it’s hard to fully represent your system’s reality with simple benchmarks, and every optimization comes with some trade-offs. For example, pipelining is great, but as we saw with Extstore, it can easily introduce I/O contention.
ScyllaDB’s shard-per-core model and support for complex data models are also powerful, but they come with costs too, like losing some pipelining flexibility and adding memory overhead.  

11X Faster ScyllaDB Backup

Learn about ScyllaDB’s new native backup, which improves backup speed up to 11X by using Seastar’s CPU and IO scheduling.

ScyllaDB’s 2025.3 release introduces native backup functionality. Previously, an external process managed backups independently, without visibility into ScyllaDB’s internal workload. Now, Seastar’s CPU and I/O schedulers handle backups internally, which gives ScyllaDB full control over prioritization and resource usage. In this blog post, we explain why we changed our approach to backup, share what users need to know, and provide a preview of what to expect next.

What We Changed and Why

Previously, SSTable backups to S3 were managed entirely by ScyllaDB Manager and the Scylla Manager Agent running on each node. You would schedule the backup, and Manager would coordinate the required operations (taking snapshots, collecting metadata, and orchestrating uploads). Scylla Manager Agent handled all the actual data movement.

The problem with this approach was that it was often too slow for our users’ liking, especially at the massive scale that’s common across our user base. Since uploads ran through an external process, they competed with ScyllaDB for resources (CPU, disk I/O, and network bandwidth). The rclone process read from disk at the same time that ScyllaDB did – so two processes on the same node were performing heavy disk I/O simultaneously. This contention on the disk could impact query latencies when user requests were being processed during a backup. To mitigate the effect on real-time database requests, we used a systemd slice to control Scylla Manager Agent resources. This solution successfully reduced backup bandwidth, but failed to increase the bandwidth when the pressure from online requests was low.

To optimize this process, ScyllaDB now provides a native backup capability. Rather than relying on an external agent (ScyllaDB Manager) to copy files, ScyllaDB uploads files directly to S3. The new approach is faster and more efficient because ScyllaDB uses its internal IO and CPU scheduling to control the backup operations. Backup operations are assigned a lower priority than user queries. In the event of resource contention, ScyllaDB will deprioritize them so they don’t interfere with the latency of the actual workload.

Note that this new native backup capability is currently available for AWS. It is coming soon for other backup targets (such as GCP Cloud Storage and Azure Storage). To enable native backup, configure the S3 connectivity in each node’s scylla.yaml and set the desired strategy (Native, Auto, or Rclone) in ScyllaDB Manager. Note that the rclone agent is always used to upload backup metadata, so you should still configure the Manager Agent even if you are using native backup and restore.

Performance Improvements

So how much faster is the new backup approach? We recently ran some tests to find out. We ran two tests that are identical in all aspects except for the tool used for backup: rclone in one and native ScyllaDB in the other.

Test Setup

The test uses 6 i4i.2xlarge nodes with a total of 2TB of injected data at RF=3. That means the 2TB of injected data becomes 6TB (RF=3), and these 6TB are spread across 6 nodes, resulting in each node holding 1TB of data.
The backup benchmark then measures how long it takes to back up the entire cluster; the size reported below is the data held by a single node.

Native Backup

Here are the results of the native backup test:

Name | Size | Time
native_backup_1016_2234 | 1.057 TiB | 00:19:18

Data was uploaded at a rate of approximately 900 MB/s.

OS Tx Bytes during backup

The slightly higher values for the OS metrics are due to factors such as TCP retransmits and HTTP headers, which are part of the transmitted bytes but not part of the data itself.

rclone Backup

The same exact test with rclone produced the following results:

Name | Size | Time
rclone_backup_1017_2334 | 1.057 TiB | 03:48:57

Here, data was uploaded at a rate of approximately 80 MB/s.

Next Up: Faster Restore

Next, we’re optimizing restore, which is the more complex part of the backup/restore process. Backups are relatively straightforward: you just upload the data to object storage. But restoring that data is harder, especially if you need to bring a cluster back online quickly or restore it onto a topology that’s different from the original one. The original cluster’s nodes, token ranges, and data distribution might look quite different from the new setup – but during restore, ScyllaDB must somehow map between what was backed up and what the new topology expects.

Replication adds even more complexity. ScyllaDB replicates data according to the specified replication factor (RF), so the backup has multiple copies of the same data. During the restore process, we don’t want to redundantly download or process those copies; we need a way to handle them efficiently. And one more complicating factor: the restore process must understand whether the cluster uses virtual nodes or tablets, because that affects how data is distributed.

Wrapping Up

ScyllaDB’s move to native integration with object storage is a big step forward for the faster backup/restore operations that many of our large-scale users have been asking for. We’ve already sped up backups by eliminating the extra rclone layer. Now, our focus is on making restores equally efficient while handling complex topologies, replication, and data distribution. This will make it faster and easier to restore large clusters. Looking ahead, we’re working on using object storage not only for backup and restore, but also for tiering: letting ScyllaDB read data directly from object storage as if it were on local disk. For a more detailed look at ScyllaDB’s plans for backup, restore, and object storage as native storage, see this video.

Vector search benchmarking: Embeddings, insertion, and searching documents with ClickHouse® and Apache Cassandra®

Welcome back to our series on vector search benchmarking. In part 1, we dove into setting up a benchmarking project and explored how to implement vector search in PostgreSQL from the example code in GitHub. We saw how a hands-on project with students from Northeastern University provided a real-world testing ground for Retrieval-Augmented Generation (RAG) pipelines.

Now, we’re continuing our journey by exploring two more powerful open source technologies: ClickHouse and Apache Cassandra. Both handle vector data differently and understanding their methods is key to effective vector search benchmarking. Using the same student project as our guide, this post will examine the code for embedding, inserting, and retrieving data to see how these technologies stack up.

Let’s get started.

Vector search benchmarking with ClickHouse

ClickHouse is a column-oriented database management system known for its incredible speed in analytical queries. It’s no surprise that it has also embraced vector search. Let’s see how the student project team implemented and benchmarked the core components.

Step 1: Embedding and inserting data

scripts/vectorize_and_upload.py

This is the file that handles Step 1 of the pipeline for ClickHouse. Embeddings in this file (scripts/vectorize_and_upload.py) are used as vector representations of Guardian news articles for the purpose of storing them in a database and performing semantic search. Here’s how embeddings are handled step-by-step (the steps look similar to PostgreSQL).

First up is the generation of embeddings (a minimal sketch of this step appears after the list below). The same SentenceTransformer model used in part 1 (all-MiniLM-L6-v2) is loaded in the class constructor. In the method generate_embeddings(self, articles), for each article:

  • The article’s title and body are concatenated into a text string.
  • The model generates an embedding vector (self.model.encode(text_for_embedding)), which is a numerical representation of the article’s semantic content.
  • The embedding is added to the article’s dictionary under the key embedding.
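
Here is a minimal sketch of that step. The class name GuardianVectorizer is hypothetical (the post doesn’t name the class), and the students’ actual implementation may differ in details:

from sentence_transformers import SentenceTransformer

class GuardianVectorizer:  # hypothetical class name
    def __init__(self):
        # Same model as in part 1; produces 384-dimensional embeddings
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def generate_embeddings(self, articles):
        for article in articles:
            # Concatenate title and body into one text string
            text_for_embedding = f"{article['title']} {article['body']}"
            # Encode the text and attach the vector under the key "embedding"
            article["embedding"] = self.model.encode(text_for_embedding).tolist()
        return articles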

Then the embeddings are stored in ClickHouse as follows (a hedged sketch of the upload step appears after the list).

  • The database table guardian_articles is created with an embedding Array(Float64) NOT NULL column specifically to store these vectors.
  • In upload_to_clickhouse_debug(self, articles_with_embeddings), the script inserts articles into ClickHouse, including the embedding vector as part of each row.
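
Since the upload code is not reproduced in the post, here is a hedged standalone sketch of the flow it describes. The clickhouse-connect client, the connection details, the table engine, and the non-embedding column types are assumptions; the project’s real code is a method named upload_to_clickhouse_debug and may use a different driver:

import clickhouse_connect

def upload_to_clickhouse(articles_with_embeddings):
    # Connection details are placeholders; the project reads them from its own config
    client = clickhouse_connect.get_client(host="localhost", port=8123)

    # Table with an Array(Float64) embedding column, as described above.
    # Engine and other column types are assumptions for this sketch.
    client.command("""
        CREATE TABLE IF NOT EXISTS guardian_articles (
            url String,
            title String,
            body String,
            publication_date DateTime,
            embedding Array(Float64) NOT NULL
        ) ENGINE = MergeTree ORDER BY url
    """)

    # Insert one row per article, including the embedding vector
    rows = [
        [a["url"], a["title"], a["body"], a["publication_date"], a["embedding"]]
        for a in articles_with_embeddings
    ]
    client.insert(
        "guardian_articles",
        rows,
        column_names=["url", "title", "body", "publication_date", "embedding"],
    )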

Step 2: Vector search and retrieval

services/clickhouse/clickhouse_dao.py

The steps to search are the same as for PostgreSQL in part 1. Here’s part of the related_articles method for ClickHouse:

def related_articles(self, query: str, limit: int = 5):
    """Search for similar articles using vector similarity"""
    ...
    query_embedding = self.model.encode(query).tolist()
    search_query = f"""
    SELECT
        url,
        title,
        body,
        publication_date,
        cosineDistance(embedding, {query_embedding}) as distance
    FROM guardian_articles
    ORDER BY distance ASC
    LIMIT {limit}
    """
    ...

When searching for related articles, it encodes the query into an embedding, then performs a vector similarity search in ClickHouse using cosineDistance between stored embeddings and the query embedding, and results are ordered by similarity, returning the most relevant articles.

Vector search benchmarking with Apache Cassandra

Next, let’s turn our attention to Apache Cassandra. As a distributed NoSQL database, Cassandra is designed for high availability and scalability, making it an intriguing option for large-scale RAG applications.

Step 1: Embedding and inserting data

scripts/pull_docs_cassandra.py

As in the above examples, embeddings in this file are used to convert article text (body) into numerical vector representations for storage and later retrieval in Cassandra.

For each article, the code extracts the body and computes the embeddings:

embedding = model.encode(body)
embedding_list = [float(x) for x in embedding]

  • model.encode(body) converts the text to a NumPy array of 384 floats.
  • The array is converted to a standard Python list of floats for Cassandra storage.

Next, the embedding is stored in the vector column of the articles table using a CQL INSERT:

insert_cql = SimpleStatement(""" 
    INSERT INTO articles (url, title, body, publication_date, vector) 
    VALUES (%s, %s, %s, %s, %s) 
    IF NOT EXISTS; 
""") 
result = session.execute(insert_cql, (url, title, body, publication_date, embedding_list))

The schema for the table specifies: vector vector<float, 384>, meaning each article has a corresponding 384-dimensional embedding. The code also creates a custom index for the vector column:

session.execute(""" 
    CREATE CUSTOM INDEX IF NOT EXISTS ann_index ON articles(vector) 
    USING 'StorageAttachedIndex'; 
""")

This enables efficient vector (ANN: Approximate Nearest Neighbor) search capabilities, allowing similarity queries on stored embeddings.

A key part of the setup is the schema and indexing. The Cassandra schema in services/cassandra/init/01-schema.cql defines the vector column.

Because Cassandra is a NoSQL database, its schemas are a bit different from those of typical SQL databases, so it’s worth taking a closer look. This Cassandra schema is designed to support Retrieval-Augmented Generation (RAG) architectures, which combine information retrieval with generative models to answer queries using both stored data and generative AI. Here’s how the schema supports RAG (a consolidated CQL sketch follows this list):

  • Keyspace and table structure
    • Keyspace (vectorembeds): Analogous to a database, this isolates all RAG-related tables and data.
    • Table (articles): Stores retrievable knowledge sources (e.g., articles) for use in generation.
  • Table columns
    • url TEXT PRIMARY KEY: Uniquely identifies each article/document, useful for referencing and deduplication.
    • title TEXT and body TEXT: Store the actual content and metadata, which may be retrieved and passed to the generative model during RAG.
    • publication_date TIMESTAMP: Enables filtering or ranking based on recency.
    • vector VECTOR<FLOAT, 384>: Stores the embedding representation of the article. The new Cassandra vector data type is documented here.
  • Indexing
    • Sets up an Approximate Nearest Neighbor (ANN) index using Cassandra’s Storage Attached Index.

More information about Cassandra vector support is in the documentation.

Step 2: Vector search and retrieval

The retrieval logic in services/cassandra/cassandra_dao.py showcases the elegance of Cassandra’s vector search capabilities.

The code to create the query embeddings and perform the query is similar to the previous examples, but the CQL query to retrieve similar documents looks like this:

query_cql = """ 
    SELECT url, title, body, publication_date 
    FROM articles 
    ORDER BY vector ANN OF ? 
    LIMIT ? 
""" 
prepared = self.client.prepare(query_cql) 
rows = self.client.execute(prepared, (emb, limit))
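
For context, a hedged sketch of how the surrounding method might bind those two placeholders; the model attribute, the sample query text, and the row attribute access are assumptions rather than the project’s exact code:

# Encode the incoming query text, then bind the embedding and limit
# to the two '?' placeholders of the prepared ANN query above.
emb = [float(x) for x in self.model.encode(query)]
rows = self.client.execute(prepared, (emb, limit))
for row in rows:
    print(row.title, row.url)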

What have we learned?

By exploring the code from this RAG benchmarking project we’ve seen distinct approaches to vector search. Here’s a summary of key takeaways:

  • Critical steps in the process:
    • Step 1: Embedding articles and inserting them into the vector databases.
    • Step 2: Embedding queries and retrieving relevant articles from the database.
  • Key design pattern:
    • The DAO (Data Access Object) design pattern provides a clean, scalable way to support multiple databases.
    • This approach could extend to other databases, such as OpenSearch, in the future.
  • Additional insights:
    • It’s possible to perform vector searches over the latest documents, pre-empting queries, and potentially speeding up the pipeline.

What’s next?

So far, we have only scratched the surface. The students built a complete benchmarking application with a GUI (using Streamlit), integrated several other interesting components (e.g., LangChain, LangGraph, FastAPI, and uvicorn), used Grafana and LangSmith for metrics, used Claude to answer questions from the retrieved articles, and added Docker support for the components. They also revealed some preliminary performance results! Here’s what the final system looked like (this and the previous blog focused on the bottom boxes only).

student-built benchmarking application flow chart

In a future article, we will examine the rest of the application code, look at the preliminary performance results the students uncovered, and discuss what they tell us about the trade-offs between these different databases.

Ready to learn more right now? We have a wealth of resources on vector search. You can explore our blogs on ClickHouse vector search and Apache Cassandra Vector Search (here, here, and here) to deepen your understanding.


P99 CONF 2025 Recap: Latency to LLMs

Another year has flown by – in the blink of an eye, I found myself back in the US hosting P99 CONF again. This makes it my third time on stage and the fifth in this incredible series of talks from engineers around the world, all sharing their stories about chasing that elusive P99 latency. Beyond raw speed and tail latency, we explored modern systems programming, kernel innovation, databases and storage at scale, observability, testing, performance insights… and of course, this year’s big wave: artificial intelligence, with vector search and LLMs taking center stage. In this blog post, I’ll help you chart a course through it all (though honestly, every session is worth your time).

Watch 60+ P99 CONF Talks On Demand

Starting with low latency

“Taming tail latency” has been a running theme at P99 for years, and this time PayPal and TigerBeetle both took the stage to show how they deal with unpredictable outliers that wreck user experience. PayPal showed how tiny inefficiencies multiply under load. And (as we’ve come to expect from past optimization talks), TigerBeetle engineered determinism, single-threaded scheduling, predictable I/O, and batching everywhere. ScyllaDB joined the party with their low-latency vector search engine, proving that AI queries don’t have to experience high latency. If your database architecture already handles the long tail at the storage layer, your vector workloads inherit that speed for free. And since I can’t resist some speculative trading (Aussies love a bet), I enjoyed the talks from Maven Securities on lock-free queues for trading systems and Bloomberg on building scalable, end-to-end latency metrics from distributed traces. If you’re chasing nanoseconds instead of milliseconds, check out Steve Heller’s Design Considerations for P99-Optimized Hash Tables.

Rust was everywhere (again)

Turso is rewriting SQLite in Rust. ClickHouse tried converting 1.5 million lines of C++ … or at least part of it. Neon rebuilt its I/O stack with tokio and io_uring, Datadog squeezed out extra juice from Lambda extensions in Rust, and Trigo visualized async abstractions (also Rust). Maybe we’re the unofficial Rust conf? But Go and C++ held their ground too. Miguel Young de la Sota presented a faster protobuf implementation in Go, and Manya Bansal (MIT PhD student) gave a highly detailed talk on how not to program GPUs, with some C++ insights there.

On the database and storage side

Avi Kivity kicked off the conference by sharing how ScyllaDB’s Seastar CPU scheduler prioritizes complex workloads while keeping latency predictable. Nadav Har’El showed how ScyllaDB manages client ingestion to prevent memory blowouts, Dor Laor connected theory to real-world tail behavior, and Felipe explained how tablet replication delivers true elasticity. And Andy Pavlo took us through the very real challenge of both humans and autonomous systems tuning databases. Other databases showed up strong too. Turso is pushing SQLite into new dimensions, DragonflyDB nailed sorted sets with B+ trees, and Qian Li from DBOS shared how they merged app logic and state right into the database. These are all good reminders that performance starts at the data layer.

Performance engineering, testing, and observability

My old mate Ashutosh Agrawal shared how he built and tested systems for 32 million concurrent cricket fans. Also on testing, I loved the double feature on deterministic simulation testing: Resonate’s approach to catching Heisenbugs and Antithesis running fuzzing workloads at scale in near real time.
eBPF continues to fascinate. Cosmic elaborated on reliability versus memory trade-offs, and Tanel Poder showed thread-level observability with some seriously impressive tooling. AWS’s Geoff Blake introduced aperf for profiling the nitty-gritty, Arm unlocked new insights with the PMUv3 plugin, and Raphael Carvalho took us on a wild hunt through a Linux kernel bug triggered by io_uring. Proper detective work, that one.

And then there’s AI and ML, the new frontier

Chip Huyen delivered a standout keynote on LLM inference optimization, tying together hardware, software, and architecture choices. Eshcar Hillel went deep into KV cache offloading, Microsoft’s Magdalen Manohar showed how to make vector search cost-effective and fast, and ScyllaDB’s Pawel Pery explained how we decoupled a Rust-based vector engine for high-performance ANN queries. We wrapped it up with a cracking conversation between Rachel Stephens from RedMonk and Adrian Cockcroft, exploring AI-assisted analytics and that eternal hunt for predictability at the tail.

Hosting this conference is always a privilege. Big thanks to Natalie Estrada, who keeps the whole thing running (and somehow curates killer playlists); Cynthia Dunlop, who finds the best speakers and replies to messages faster than anyone I know; and Felipe and the lounge crew, daring the demo deities and answering audience questions live. Plus the many ScyllaDB engineers who present, coordinate, and quietly make it all happen behind the scenes. And to all 30,000 registrants and 100,000+ participants so far… you’re what makes this community so special. From all of us at ScyllaDB: a huge thank you. See you at the next one.

Join us for P99 CONF’s sister conference, Monster SCALE Summit.

Optimizing Cassandra Repair for Higher Node Density

This is the fourth post in my series on improving the cost efficiency of Apache Cassandra through increased node density. In the last post, we explored compaction strategies, specifically the new UnifiedCompactionStrategy (UCS) which appeared in Cassandra 5.

Now, we’ll tackle another aspect of Cassandra operations that directly impacts how much data you can efficiently store per node: repair. Having worked with repairs across hundreds of clusters since 2012, I’ve developed strong opinions on what works and what doesn’t when you’re pushing the limits of node density.

Building a Movie Recommendation App with ScyllaDB Vector Search

Use ScyllaDB to perform semantic search across movie plot descriptions.

We built a sample movie recommendation app to showcase ScyllaDB’s new vector search capabilities. The sample app gives you a simple way to experience building low-latency semantic search and vector-based applications with ScyllaDB.

Join the Vector Search Early Access Program

In this post, we’ll show how to perform semantic search across movie plot descriptions to find movies by meaning, not keywords. This example also shows how you can add ScyllaDB Vector Search to your existing applications. Before diving into the application, let’s clarify what we mean by semantic search and provide some context about similarity functions.

About vector similarity functions

Similarity between two vectors can be calculated in several ways. The most common methods are cosine similarity, dot product (inner product), and L2 (Euclidean) distance. ScyllaDB Vector Search supports all of these functions. For text embeddings, cosine similarity is the most often used similarity function. That’s because, when working with text, we mostly focus on the direction of the vector, rather than its magnitude. Cosine similarity considers only the angle between the vectors (i.e., the difference in directions) and ignores the magnitude (length of the vector). For example, a short document (1 page) and a longer document (10 pages) on the same topic will still point in similar directions in the vector space even though they are different lengths. This is what makes cosine similarity ideal for capturing topical similarity.

Cosine similarity formula: cos(A, B) = (A · B) / (‖A‖ ‖B‖)

In practice, many embedding models (e.g., OpenAI models) produce normalized vectors. Normalized vectors all have the same length (magnitude of 1). For normalized vectors, cosine similarity and the dot product return the same result. This is because cosine similarity divides the dot product by the magnitudes of the vectors, which are all 1 when vectors are normalized. The L2 function produces different distance values compared to the dot product or cosine similarity, but the ordering of the embeddings remains the same (assuming normalized vectors). Now that you have a better understanding of semantic similarity functions, let’s explain how the recommendation app works.

App overview

The application allows users to input what kind of movie they want to watch. For example, if you type “American football,” the app compares your input to the plots of movies stored in the database. The first result is the best match, followed by other similar recommendations. This comparison uses ScyllaDB Vector Search. You can find the source code on GitHub, along with setup instructions and a step-by-step tutorial in the documentation. For the dataset, we are reusing a TMDB dataset available on Kaggle.

Project requirements

To run the application, you need a ScyllaDB Cloud account and a vector-search-enabled cluster. Right now, you need to use the API to create a vector-search-enabled cluster. Follow the instructions here to get started!

The application depends on a few Python packages:

ScyllaDB Python driver – for connecting and querying ScyllaDB.
Sentence Transformers – to generate embeddings locally without requiring OpenAI or other paid APIs.
Streamlit – for the UI.
Pydantic – to make working with query results easier.

By default, the app uses the all-MiniLM-L6-v2 model so anyone can run it locally without heavy compute requirements. Other than ScyllaDB Cloud, no commercial or paid services are needed to run the example.
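
If you want to install those dependencies yourself, a minimal requirements sketch might look like this; the pip package names are the libraries’ usual distribution names and are an assumption here, so check the repository’s requirements file for the exact list and version pins.

# requirements.txt (hedged sketch; see the GitHub repo for the authoritative list)
scylla-driver
sentence-transformers
streamlit
pydantic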
Configuration and database connection

A config.py file stores ScyllaDB Cloud credentials, including the host address and connection details. A separate ScyllaDB helper module handles the following:

Creating the connection and session
Inserting and querying data
Providing helper functions for clean database interactions

Database schema

The schema is defined in a schema.cql file, executed when running the project’s migration script. It includes:

Keyspace creation (with a replication factor of 3)
Table definition for movies, storing fields like release_date, title, genre, and plot
Vector search index

Schema highlights:

`plot` – text; stores the movie description used for similarity comparison.
`plot_embedding` – vector; embedding representation of the plot, defined using the vector data type with 384 dimensions (matching the Sentence Transformers model).
`Primary key` – id as the partition key, for efficient lookups when querying by id.
`CDC enabled` – required for ScyllaDB Vector Search.
`Vector index` – an Approximate Nearest Neighbor (ANN) index created on the plot_embedding column to enable efficient vector queries.

The goal of this schema is to allow efficient search on the plot embeddings and store additional information alongside the vectors.

Embeddings

An Embedding Creator class handles text embedding generation with Sentence Transformers. The function accepts any text input and returns a list of float values that you can insert into ScyllaDB’s `vector` column.

Recommendations implemented with vector search

The app’s main function is to provide movie recommendations. These recommendations are implemented using vector search. So we create a module called recommender that handles:

Taking the input text
Turning the text into embeddings
Running vector search

Let’s break down the vector search query:

SELECT * FROM recommend.movies
ORDER BY plot_embedding ANN OF [0.1, 0.2, 0.3, …]
LIMIT 5;

User input is first converted to an embedding, ensuring that we’re comparing embedding to embedding. The rows in the table are ordered by similarity using the ANN operator (ANN OF). Results are limited to five similar movies. The SELECT statement retrieves all columns from the table.

In similarity search, we calculate the distance between two vectors. The closer the vectors are in vector space, the more similar their underlying content. Or, in other words, a smaller distance suggests higher similarity. Therefore, ORDER BY sorts results in ascending order, with smaller distances appearing first.

Streamlit UI

The UI, defined in app.py, ties everything together. It takes the user’s query, converts it to an embedding, and executes a vector search. The UI displays the best match and a list of other similar movie recommendations.

Try it yourself!

If you want to get started building with ScyllaDB Vector Search, you have several options:

Explore the source code on GitHub
Use the README to set up the app on your computer
Follow the tutorial to build the app from scratch

And if you have questions, use the forum and we’ll be happy to help.
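
To make the schema description above concrete, here is a hedged CQL sketch of roughly what schema.cql might contain. The keyspace name comes from the query above; the column types (other than the vector column), the replication strategy, and the CDC option syntax are assumptions, so treat schema.cql in the GitHub repository as the authoritative version.

CREATE KEYSPACE IF NOT EXISTS recommend
    WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS recommend.movies (
    id UUID PRIMARY KEY,           -- partition key for lookups by id (type assumed)
    title TEXT,
    genre TEXT,
    release_date DATE,             -- type assumed
    plot TEXT,                     -- description used for similarity comparison
    plot_embedding VECTOR<FLOAT, 384>
) WITH cdc = {'enabled': true};    -- CDC is required for ScyllaDB Vector Search

-- schema.cql also creates the ANN (vector) index on plot_embedding; its exact
-- CREATE statement depends on the ScyllaDB Vector Search release, so it is not
-- reproduced here.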