Planet Cassandra

All your NoSQL Apache Cassandra resources in one place.

Scylla Release 2.1

The Scylla team is pleased to announce the release of Scylla 2.1, a production-ready Scylla Open Source minor release.

Scylla is an open source NoSQL database compatible with Apache Cassandra, with superior performance and consistently low latency. Starting from this release, critical bugs will be fixed in the Scylla 2.1 and 2.0 release series only. If you are still using open source Scylla 1.7 or prior – you are encouraged to upgrade. We will continue to fix bugs and add features to the master branch towards Scylla 2.2 and beyond.

Related Links

Installation updates

Starting from Scylla 2.1, Scylla packages for Ubuntu 16 and Debian 8 are signed. You can find the keys and instructions per Linux distribution here:

New features in Scylla 2.1

  • Time Window Compaction Strategy (TWCS). An improvement on, and simplification of, Date Tiered Compaction Strategy (DTCS) as the go-to strategy for time series data. TWCS uses STCS on “time windowed” groups of SSTables, using the maximum timestamp of each SSTable to decide which time window that SSTable will be part of. #1432
  • CQL additions:
    • Cast functions between types, for example: SELECT avg(cast(count as double)) FROM myTable.
    • Support Duration data type #2240.
    • CQL prepared statements are now properly evicted, so a server can no longer be OOMed just by preparing many unique statements. See #2474.
    • Support non-reserved keywords as columns (without quotes) #2507, for example, a column named “frozen”.
  • CompressionInfo.db files, part of the SSTable format, are stored more efficiently in memory, allowing higher disk:RAM ratios #1946.
  • Transitional Auth – enables rolling out access control without downtime: transitional modes temporarily support applications and users that do not yet have accounts, so services are not interrupted while authentication is phased in.
  • New REST API /storage_service/force_terminate_repair allows aborting a running repair and all the data streams related to it. #2105.
  • Secondary Indexing – experimental. Scylla’s Secondary Index implementation is based on Materialized Views, introduced in Scylla 2.0, and allows the creation of a Secondary Index using an Apache Cassandra compatible syntax: CREATE INDEX ON ks.users (email); More on Scylla’s Secondary Indexing usage and advantages can be found here.
    Experimental Secondary Indexing (SI) limitations in Scylla 2.1:
    • Only data inserted after the index is created is indexed (as per MV limitations).
    • SI only works for regular columns – not for partition keys or their components, clustering key columns, static columns, or collection columns.
    • SI queries don’t support paging and will perform badly for low cardinality indexes (e.g. few unique indexed values that map to a large number of primary keys).
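To illustrate the TWCS feature above: a table opts into the strategy through its compaction settings at creation time. A minimal sketch, assuming a hypothetical ks.sensor_readings table (the one-day window is illustrative, not a recommendation):

```sql
-- Hypothetical time-series table using Time Window Compaction Strategy.
-- SSTables are grouped into one-day windows; STCS runs within each window.
CREATE TABLE ks.sensor_readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};
```

Because data for a given time range ends up confined to a few SSTables, expired time-series data can also be dropped efficiently whole-SSTable at a time.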

Metrics updates from 2.1

Scylla Monitoring stack now includes Scylla 2.1 dashboard.

The following metric names have changed:

Old Metric Name                                New Metric Name
scylla_database_active_reads_streaming         scylla_database_active_reads {reads=streaming}
scylla_database_active_reads_system_keyspace   scylla_database_active_reads {reads=system_keyspace}
scylla_database_queued_reads_streaming         scylla_database_queued_reads {reads=streaming}
scylla_database_queued_reads_system_keyspace   scylla_database_queued_reads {reads=system_keyspace}

The following metrics are new in Scylla 2.1:


For a full list of metrics and their descriptions, run curl against the metrics endpoint from a Scylla node.

The post Scylla Release 2.1 appeared first on ScyllaDB.

Understanding the Java Virtual Machine (JVM) Architecture Part 1

What is garbage collection?

Garbage collection (GC) is a form of automatic memory management. In essence, garbage collection attempts to reclaim garbage, that is, memory occupied by objects that are no longer relevant to the running program, while allowing the developer to focus on the application without having to free memory manually. My particular interest is how the Java Virtual Machine (JVM) affects Apache Cassandra, since Cassandra is built in Java and garbage collection can have a big impact on its performance.

The method used by the Java Virtual Machine (JVM) to track down all the live objects and to make sure that memory from non-reachable objects can be reclaimed is called the Mark and Sweep Algorithm. It comprises two steps:

  • Marking scans through all reachable objects and keeps a ledger in native memory of all such objects.
  • Sweeping makes sure the memory addresses allocated to non-reachable objects are reclaimed so that they can be used for new objects.

Different GC algorithms within the JVM, such as CMS or G1, implement these phases differently, but the concept explained above remains the same for all.
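The collectors in use, and how often each has run, can be inspected at runtime through the standard java.lang.management API. A minimal sketch (collector names such as “G1 Young Generation” depend on the JVM and the flags in use):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInspector {
    public static void main(String[] args) {
        // One MXBean per collector; on a G1 JVM these are typically
        // "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
    }
}
```

Cassandra operators often watch exactly these counters (also exposed over JMX) to spot GC pressure on a node.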

A crucial thing to consider is that in order for the garbage collection to happen the application threads need to be stopped, as you cannot count references to objects if they keep changing during the process. The temporary pause so that the JVM can perform “housekeeping” activities is called a Stop The World pause. These pauses can happen for multiple reasons, with garbage collection being the principal one.

Garbage collection in Java

Whenever sweeping occurs, and blocks of memory are reclaimed, fragmentation ensues. Memory fragmentation behaves much like disk fragmentation and can lead to multiple problems:

  • Writing operations become more inefficient since finding the next suitable block of sufficient size is no longer a trivial operation.
  • When creating new objects, the JVM allocates memory in contiguous blocks. When fragmentation increases to the point where no single available block is big enough to accommodate the newly created object, an allocation error occurs.

To avoid these problems, the JVM performs memory defragmentation during garbage collection. This process moves all live objects close to each other, thereby reducing fragmentation.

Block compaction during a garbage collection

Weak Generational Hypothesis

Garbage collectors make assumptions about the way applications use objects. The most important of these assumptions is the weak generational hypothesis, which states that most objects survive for only a short period of time.

While naïve garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to minimize the work required to reclaim unused (garbage) objects:

  • Most objects soon become unused.
  • Only a small number of references exist from old objects to young objects.

Some objects live longer than others, with lifetimes following a distribution close to the one below. An efficient collection is made possible by exploiting the fact that a majority of objects “die young.”

Weak Generational Hypothesis – Object life cycle

Based on this hypothesis, the JVM memory is divided into generations: memory pools holding objects of different ages.

Most objects exist and die in the pool dedicated to young objects (the Young Generation pool). When the Young Generation pool fills up, a minor collection is triggered in which only the young generation is collected. The cost of such collections is proportional to the number of live objects being collected, and since the weak generational hypothesis states that most objects die young, the result is a very efficient garbage collection.

A fraction of the objects that survive a minor collection get promoted to the Old Generation (also called the Tenured Generation), which is significantly larger than the Young Generation and holds objects that are less likely to be garbage. Eventually, the Tenured Generation will fill up and a major collection will ensue, in which the entire heap is collected. Major collections usually last much longer than minor collections because a significantly larger number of objects are involved.
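The generational pools described above are visible through the same management API; a minimal sketch (pool names such as “G1 Eden Space” or “PS Old Gen” vary with the collector in use):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class PoolInspector {
    public static void main(String[] args) {
        // HEAP pools correspond to the generations: an Eden space,
        // Survivor space(s), and an Old/Tenured generation.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(pool.getName()
                        + ": used=" + pool.getUsage().getUsed() + " bytes");
            }
        }
    }
}
```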

This approach also has some problems though:

  • Objects from different generations may contain references to each other.
  • Since GC algorithms are optimized for objects that either ‘die young’ or ‘will live a long time’, the JVM behaves poorly with objects of ‘medium’ life expectancy.

Memory Pools

Heap memory pools


Young Generation

The Young Generation comprises three spaces:

  • Eden space
  • Two Survivor spaces


Eden is the memory region where objects are allocated when they are created. Since we are usually talking about multi-threaded environments, where multiple threads create many objects simultaneously, Eden is further divided into one or more Thread Local Allocation Buffers (TLABs). These buffers allow each thread to allocate objects in its own buffer, avoiding expensive lock-contention issues.

When allocation is not possible inside a TLAB (not enough room), the allocation is done in a shared Eden space. If no room exists in either the TLAB or the shared Eden space, a Young Generation garbage collection is triggered to free up space. If the garbage collection does not free enough memory inside the Eden pool, the object is allocated in the Old Generation.

After the marking phase of the garbage collection identifies all live objects within Eden, all of them are copied to one of the Survivor spaces, and the entire Eden is cleaned so that it can be used for new object allocation. This approach is called “Mark and Copy”: the live objects are marked, and then copied (not moved) to a Survivor space.

Eden spaces

Survivor Spaces

Adjacent to the Eden space are two Survivor spaces. An important thing to notice is that one of the two Survivor spaces is always empty. Every time a Young Generation garbage collection happens, all of the live objects from the Eden space and from the occupied Survivor space (that is, all live Young Generation objects) are copied to the other Survivor space, leaving the former Survivor space empty.

Survivor Spaces cycle

This garbage collection cycle of copying live objects between the two Survivor spaces is repeated multiple times (up to 15 by default) until objects are deemed old enough to be promoted to the Old Generation, as they are expected to remain in use for a long time.

To determine if an object is deemed “old enough” to be considered ready for promotion to the Old Space, whenever an object survives a GC cycle it has its age incremented. When the age exceeds a certain tenuring threshold, the object will be promoted.

The actual tenuring threshold is dynamically adjusted by the JVM, but one can adjust its upper limit by changing -XX:MaxTenuringThreshold.

  • Setting -XX:MaxTenuringThreshold=0 results in immediate promotion, without copying objects between Survivor spaces.
  • On modern JVMs, -XX:MaxTenuringThreshold is set to 15 GC cycles by default. This is also the maximum value in HotSpot.

Promotion may also happen prematurely if the Survivor space is not large enough to hold all of the live objects in the Young Generation.

Tenured Generation / Old Generation

The Old Generation is usually much larger than the Young Generation, and it holds the objects that are less likely to become unused.

Garbage collection in the Old Generation pool happens less frequently than in the Young Generation, but it also takes more time to complete, resulting in longer Stop the World pauses.

Since objects are expected to live a long time in the Old Generation pool, its cleaning algorithm is slightly different: instead of the Mark and Copy used in the Young Generation, it uses a Mark and Move approach, compacting live objects in place to minimize fragmentation.

PermGen & Metaspace


As of Java 8, the Permanent Generation (PermGen) space, which was created at startup and used to store metadata such as classes and interned strings, is gone. Metadata information has moved to native memory, to an area known as the Metaspace.

The move was necessary because PermGen was really hard to tune, and it was difficult to size correctly. This created a lot of issues for Java developers, since it is very difficult to predict how much space all that metadata will require, resulting in many java.lang.OutOfMemoryError: PermGen space exceptions. The usual workaround was to increase the maximum PermGen size, for example to 256 MB:

JVM configuration
java -XX:MaxPermSize=256m com.mycompany.MyApplication
Move from PermGen to Metaspace in Java 8


With the Metaspace, metadata (such as class definitions) is now located in native memory and does not interfere with regular heap objects. By default, the Metaspace size is limited only by the amount of native memory available to the Java process, saving developers from the memory errors described earlier. The downside is that you still need to worry about the Metaspace footprint.

  • By default, class metadata allocation is limited by the amount of available native memory (capacity will, of course, depend on whether you use a 32-bit or 64-bit JVM, along with OS virtual memory availability).
  • The -XX:MaxMetaspaceSize flag allows you to limit the amount of native memory used for class metadata. If you don’t set it, the Metaspace will dynamically resize depending on current demand.
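The Metaspace footprint can be watched through the same MemoryPoolMXBean interface used for the heap pools; a minimal sketch (on HotSpot the relevant pools are named “Metaspace” and, when compressed class pointers are enabled, “Compressed Class Space”):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceInspector {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                // getMax() returns -1 when no MaxMetaspaceSize limit is set.
                System.out.println(pool.getName()
                        + ": used=" + pool.getUsage().getUsed()
                        + ", max=" + pool.getUsage().getMax());
            }
        }
    }
}
```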

The Four Things Companies Are Still Getting Wrong About Customer Experience

The digital age has shaped customers to expect applications to be relevant to them, always available, instantly responsive, and accessible when and where they need them.

This “always on, always there, always relevant to each customer” way of thinking has redefined how enterprises do business. The best enterprises and brands are not only focusing on delivering hyper-personalized customer experiences at every touchpoint, they are taking the lead in reconfiguring their organizations to operate around “customer experience.”

But with all the focus on customer experience being the new competitive differentiator, there are still areas where companies are missing the mark. These blind spots can have a damaging impact on the customer experience even if brands are doing everything else right:

1) Customer data is siloed

Marketing, customer service, R&D, corporate communications, IT, and sales are still far too often siloed, as far as their data is concerned. Each owns data around the customer, but that data is not centralized with one view of the customer, and that data is not shared between departments. The customer belongs to the brand, but the relationship with the customer is owned by too many disparate functions.

Customer data needs to be centralized and shareable to create a 360-degree view. Everyone that touches the customer experience has ownership in the relationship. Like a multidisciplinary healthcare team that provides care for a patient, the customer should feel that they are interacting with a single, integrated, seamless whole, not with separate, non-communicating functions.

2) CSAT is the primary measure of loyalty

CSAT surveys provide critical data on customer satisfaction and an important metric to gauge how well the brand is doing on customer experience. But, CSAT scores alone do not equate to loyalty and retention.

Today’s customers are loyal to their experiences, not to their brands. They stay with a brand as long as they continue to have the experiences they desire. ThinkJar reports that 67% of customers cite bad experiences as a reason for churn. However, the absence of negative feedback is not a sign of satisfaction. ThinkJar also reported that only one of 26 unhappy customers will complain — the rest simply leave. Which means brands must focus on the unseen, unspoken drivers of loyalty and retention and not rely solely on surveys for an accurate measure.

3) Not every employee is empowered to be a customer experience champion (and they should be)

For an enterprise to truly become customer-centric, everyone must personally own the customer experience in the work they do each day. Companies must be able to clearly articulate what defines their standards for customer experience and share it widely from the boardroom on down to IT, sales associates, and influencers. Customer experience education, training, and accountability are essential for everyone if the customer is to have exceptional levels of customer experience replicated each time they interact with the brand. Remember, replication and consistency are what drive retention.

4) Data is not used for real-time insights

The ability to deliver hyper-personalization depends on your ability to read, interact with, and respond to customer behavior in real time. If your data platform can’t provide a scalable, flexible foundation on which to build amazing customer experience applications, then your enterprise will not be able to generate customer experiences that keep pace with the demands of the Right-Now Customer.

A customer experience-focused data management platform enables real-time personalization that delivers an amazing, tailored customer experience anywhere, on any device, seamlessly across touchpoints. It should scale to handle high volumes and be capable of creating a hyper-personalized, responsive, consistent experience that drives retention and loyalty.

Drive consistent customer experience with real-time data management

To avoid disruption in the customer experience arena, enterprises must be able to create replicable experiences that are highly personalized to each customer in the moment it matters most to that customer. A consistent customer experience is what builds trust between brands and the customer, and what ultimately drives retention. Data management platforms that deliver in real time are key to building a customer-centric organization in today’s digital age.

Read Now

Introducing DSE Analytics Solo

Today, DataStax is announcing the introduction of DataStax Enterprise (DSE) Analytics Solo, a new offering to enable more flexible and cost-effective analytics processing of data stored in DataStax Enterprise.  

DSE Analytics Solo delivers all of the powerful features of DSE Analytics that have allowed numerous companies to blend the functionality of a continuously available, scalable, distributed data layer with a powerful analytic processing engine. Also, it is designed to cover some of the new deployment modes our customers have been leveraging as of late.

DSE Analytics Solo allows customers to deploy DSE Analytics processing on hardware configurations segregated from the DataStax database, ensuring consistent behavior of both engines in a configuration that does not compete for compute resources. This separation of compute and storage is good for processing-intensive analytic workloads. DSE’s traditional collocated configuration, which allows easy addition of analytic processing to the DataStax database without additional hardware, is good when the analysis is less intensive or the database is not as heavily in use.

DSE Analytics Solo enables customers to:

  • Leverage the same highly available, scalable, secure Apache Spark™ deployment that is included with DSE Analytics, including faster overall performance than open source Spark/Cassandra, a fault-tolerant resource manager with secured communications, the ability to create pools of resources grantable to particular users, and a continuously available, scalable, HDFS-compatible distributed file system.
  • Deploy on dedicated hardware, ensuring segregated resources for both the database and the processing engine, and predictable performance of both.
  • Have the flexibility to quickly and cost-effectively add more, or fewer, analytic processing nodes than database nodes, as the use case requires.
  • Deploy analytic processing nodes via the same OpsCenter management suite that manages the database nodes.


DataStax started a new era of big data processing in DSE with the introduction of Apache Spark to the DataStax platform, replacing Apache Hadoop™, in DSE 4.5. Since then, DataStax has invested in improving the integration of Spark in every DSE release, including:

  • A highly-available Spark Resource Manager allowing applications to be submitted at all times to any node, with minimal impact during failures;
  • Secured communications between all Spark processes (Master, Worker, Driver, and Executor) leveraging the security of the DataStax database drivers for all communications;
  • Continuously available, scalable, HDFS-compatible, distributed file system (DSEFS) with no single point of failure, no Zookeeper dependencies, etc;
  • Ability to define pools of resources and grant permissions to specific users to specific workpools, restricting which users can run applications on which resources;
  • Optimizations to more efficiently read from the DataStax database, up to 3x faster;
  • Capability to leverage DSE Search indices for increased performance; and
  • Spark Jobserver for REST submission and management of Spark applications, including the ability to share cached data between applications.

In DSE 5.1, the DSE Spark Resource Manager is able to support a variety of deployment configurations, including:

  • Collocated — The traditional deployment mode where all nodes in the Analytics data center are running both Spark and the DataStax database, and a copy of the data has been replicated to this data center;
  • DSE Analytics-Only Data Center —  In this mode a data center is DSE Analytics enabled but no data is replicated to the data center. So, while the data uses the same set of users and permissions, it is not replicated locally and is read from another data center in the same cluster. One advantage of this configuration is that scaling up or down the size of this DSE Analytics-Only Data Center does not require moving the database data, making those operations much quicker;
  • DSE Analytics-Only Cluster — This takes the DSE Analytics-Only Data Center scenario one step further in decoupling. In this mode, the Spark cluster is completely separate from the DSE cluster. Data is read remotely, as in the DSE Analytics-Only Data Center scenario, but, since it is a separate cluster, the users are not necessarily the same.

The benefits of these non-collocated deployment configurations include:

  • Segregated hardware to remove resource contention between the analytics engine and the database;
  • Allowing configurations with more, or fewer, analytic processing nodes than database nodes;
  • Easier addition, or removal, of analytic processing nodes to the cluster, requiring no database data movement when changing cluster size; and
  • The ability to add multiple DSE Analytics-Only Data Centers to allocate separate analytic processing resources to different sets of users.

These new deployment configuration options allow administrators to choose a scenario that best suits their needs. If the analytic needs are light or the database is not overly busy, then the collocated configuration is probably suitable and is the simplest option.  

If analytic workloads are heavier or more consistent, such as with stream processing scenarios, then a DSE Analytics-Only Data Center configuration is a good choice, since it will protect the database and analytic engine from competing with each other for resources, but will still retain the same user management. If the analytic need is very sporadic and as such the administrator would like it to be very lightly coupled, then a DSE Analytics-Only Cluster configuration could be a good choice.

As an example, a user with a streaming application that analyzes streaming data, processes the incoming records and filters out 99.9% of the data, only persisting 0.1% of the incoming data, may choose a DSE Analytics-Only Data Center configuration with more analytic processing nodes than database nodes. This allows the user to deploy more analytic processing nodes than database nodes, since only 0.1% of the data is persisted. It also allows the user to segregate the processing nodes from the database nodes to remove resource contention, and to use the same set of users in the database and for the Spark cluster.

As another example, a user requiring a weekly report on the data stored in their DataStax database in a cloud deployment could deploy a DSE Analytics-Only Data Center configuration.  This would allow the user to use the analytic engine to produce the reports, and then reduce the size of the analytic cluster (potentially removing it) during the week to reduce cloud hardware instance costs, spinning them back up to run the next report.

It should be noted that the DSE Analytics-Only Data Center configuration is usually suitable for most scenarios that cover the DSE Analytics-Only Cluster configuration, and has some benefits for simpler management, including using the same users and permissions as the database nodes. For more technical information, see the DataStax documentation.

Analytics on Dedicated Nodes

To enable DSE Analytics-Only Data Center and DSE Analytics-Only Cluster deployments, the new DSE Analytics Solo option allows customers to add analytics processing nodes to their existing database nodes, but not store user data on those nodes. DSE Analytics Solo nodes use the same installation methods and configuration mechanisms as DSE Analytics, including OpsCenter, as well as advanced security features such as LDAP, but are licensed to only use DSE’s production-certified Spark processing engine and DSEFS and not to store DataStax database data or to use DSE Search or DSE Graph.  

For more information, please consult the DataStax documentation for DSE Analytics Solo and if you haven’t already, download DSE and check out the new DSE Analytics Solo option for yourself.

Numberly Shares 7 Lessons for Evaluating Scylla

As word about Scylla continues to spread, we’re seeing more and more downloads of our open source software. We’re not always privy to our users’ experiences, but we’re very glad when we have the opportunity to share their results. A recent example of this is from Alexys Jacob of Numberly, who shared his experience evaluating Scylla for production on his personal blog. In the first installment of a two-part series, he describes his preparation for a successful POC-based evaluation with the help of the ScyllaDB team.

Numberly began with MongoDB but found that MongoDB’s primary/secondary architecture hinders performance because of a loss of write throughput. Also, when you want to set up a cluster, it appears to be an abstraction added on top of replica-sets. As Alexys summarizes his MongoDB experience, “To say the least, it is inefficient and cumbersome to operate and maintain.”

Alexys also took a look at using Apache Cassandra for Numberly but was apprehensive about using a Java solution because of the heap sizes and tuning, Java overhead, and garbage collection that can temporarily stop the service. He heard about Scylla, a NoSQL database built in C++, which allows users to get more out of their machines, and decided to put it to the test.

Alexys openly provided feedback on the entire process, which is helpful to any organization evaluating a new technology. If an organization is not clear in what they are trying to achieve during a POC, problems can occur down the road when evaluating a product. Alexys also had good feedback from his experience working with the ScyllaDB staff. Here at ScyllaDB, we love feedback and always want to do better for our users.

Alexy’s blog post covers these seven lessons for evaluating Scylla:

  1. Have some background
  2. Work with a shared reference document
  3. Have monitoring in place
  4. Know when to stop
  5. Plan some high availability tests
  6. POC != production
  7. Make time

Get all the details in the blog post.

We’re looking forward to part two of Alexys’s series when he covers the technical aspects and details of the Scylla POC.

To learn more about Alexys’s journey with evaluating Scylla, check out his blog post. If you are feeling eager to test out Scylla’s performance, you can take it for a Test Drive here.

The post Numberly Shares 7 Lessons for Evaluating Scylla appeared first on ScyllaDB.

DataStax and Oracle: Making “The Moment” Possible

The cloud has transformed customer engagement.  

From geo-locations to shopping history to clicks, likes, and taps, enterprises are using data from every possible touchpoint and channel to solve the greater equation of how to deliver meaningful, individualized experiences that create unwavering customer loyalty.

But most customers aren’t thinking about any of this.

They are not thinking about where their personal data is going or how it is likely being processed through legacy systems, accessed through a variety of applications, sent across a network, or stored in the cloud, making it vulnerable to hackers or fraudsters.

In short, they are not thinking about what is going on under the hood of their car… They are thinking about driving the car and capitalizing on “the moment”.

The moment they deposit their paycheck into their account.

The moment they buy a new bike for their kid.

The moment they hit “like” on an interesting article about hot-air ballooning.

What the collective sum of these moments ultimately translates to is intimacy. The feeling of being known. The feeling that their chosen brands can anticipate their needs and deliver exactly what they want, when and how they want it, regardless of where in the world they are or how many other customers may be expecting the same thing at the same time.

DataStax Enterprise, a data management platform built on the best distribution of Apache Cassandra™ NoSQL database, delivers this seamless experience for enterprises around the world in countless different verticals. From innovators, to digital natives, to industry stalwarts needing to reinvent themselves, DataStax provides the always-on, real-time data layer that makes “the moment” possible for every customer and every company.  McDonald’s, eBay, Sony, UBS, Walmart, and hundreds of other established enterprises rely on DataStax to deliver impactful digital experiences while protecting their data at its most raw and fundamental level.

It is in this spirit that we happily announce our partnership with Oracle Cloud Infrastructure, Oracle’s next-generation, internet-scale infrastructure service. Oracle Cloud Infrastructure is designed from the ground up to help modern enterprises do more while experiencing new levels of speed and flexibility, with substantial savings over on-premises or cloud alternatives.

And the great thing is: you don’t have to worry about uprooting everything to start using it. Oracle can help customers extend their already-leveraged technology investments, tools, and processes by easily moving mission-critical workloads to Oracle Cloud Infrastructure without the time and cost of re-architecture.

Combining our technology with Oracle will undoubtedly further enhance our ability to be the power behind the moment for enterprises seeking to thrive in the right-now economy.

Go to the Oracle Cloud Jump Start page for a free, hands-on lab on DataStax Enterprise (DSE) Graph and CQL on Oracle Cloud Infrastructure. The lab will familiarize you with basic CQL concepts, DSE Graph, and Gremlin, and also provide you with a strong understanding of how DSE can help your organization deliver real-time customer experiences to capitalize on “the moment”.

Scaling Time Series Data Storage — Part I

by Ketan Duvedi, Jinhua Li, Dhruv Garg, Philip Fisher-Ogden


The growth of internet-connected devices has led to a vast amount of easily accessible time series data. Increasingly, companies are interested in mining this data to derive useful insights and make data-informed decisions. Recent technology advancements have improved the efficiency of collecting, storing, and analyzing time series data, spurring an increased appetite to consume this data. However, this explosion of time series data can overwhelm most initial time series data architectures.

Netflix, being a data-informed company, is no stranger to these challenges and over the years has enhanced its solutions to manage the growth. In this 2-part blog post series, we will share how Netflix has evolved a time series data storage architecture through multiple increases in scale.

Time Series Data — Member Viewing History

Netflix members watch over 140 million hours of content per day. Each member provides several data points while viewing a title, and these are stored as viewing records. Netflix analyzes the viewing data and provides accurate, real-time bookmarks and personalized recommendations, as described in these posts:

Viewing history data increases along the following 3 dimensions:

  1. As time progresses, more viewing data is stored for each member.
  2. As member count grows, viewing data is stored for more members.
  3. As member monthly viewing hours increase, more viewing data is stored for each member.

As Netflix streaming has grown to 100M+ global members over its first 10 years, there has been a massive increase in viewing history data. In this blog post we will focus on how we approached the big challenge of scaling storage of viewing history data.

Start Simple

The first cloud-native version of the viewing history storage architecture used Cassandra for the following reasons:

In the initial approach, each member’s viewing history was stored in Cassandra in a single row with row key: CustomerId. This horizontal partitioning enabled effective scaling with member growth and made the common use case of reading a member’s entire viewing history very simple and efficient. However, as member count increased and, more importantly, each member streamed more and more titles, the row sizes as well as the overall data size increased. Over time, this resulted in high storage and operational cost as well as slower performance for members with large viewing histories.

The following figure illustrates the read and write flows of the initial data model:

Figure 1: Single Table Data Model

Write Flow

One viewing record was inserted as a new column when a member started playing a title. That record was updated after the member paused or stopped the title. This single-column write was fast and efficient.

Read Flows

Whole row read to retrieve all viewing records for one member: The read was efficient when the number of records per member was small. As a member watched more titles, the number of viewing records increased. Reading rows with a large number of columns put additional stress on Cassandra that negatively impacted read latencies.

Time range query to read a time slice of a member’s data: This resulted in the same inconsistent performance as above depending on the number of viewing records within the specified time range.

Whole row read via pagination for large viewing history: This was better for Cassandra as it wasn’t waiting for all the data to be ready before sending it back. This also avoided client timeouts. However, it increased overall latency to read the whole row as the number of viewing records increased.

Slowdown Reasons

Let’s look at some of the Cassandra internals to understand why our initial simple design slowed down. As the data grew, the number of SSTables increased accordingly. Since only recent data was in memory, in many cases both the memtables and SSTables had to be read to retrieve viewing history. This had a negative impact on read latency. Similarly, compaction took more IO and time as the data size increased. Read repair and full column repair became slower as rows got wider.

Caching Layer

Cassandra performed very well writing viewing history data, but there was a need to improve the read latencies. To optimize read latencies, at the expense of increased work during the write path, we added an in-memory sharded caching layer (EVCache) in front of Cassandra storage. The cache was a simple key-value store, with the key being CustomerId and the value being the compressed binary representation of viewing history data. Each write to Cassandra incurred an additional cache lookup, and on a cache hit the new data was merged with the existing value. Viewing history reads were serviced by the cache first. On a cache miss, the entry was read from Cassandra, compressed, and then inserted into the cache.
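The read-through and write-merge behavior described above can be sketched in a few lines of Python. The dictionaries standing in for EVCache and Cassandra, the JSON/zlib encoding, and the function names are illustrative assumptions, not Netflix’s actual implementation:

```python
import json
import zlib

cache = {}      # stand-in for EVCache: CustomerId -> compressed viewing history
cassandra = {}  # stand-in for the Cassandra table: CustomerId -> list of records

def write_record(customer_id, record):
    """Write path: persist to Cassandra, then merge into the cache on a hit."""
    cassandra.setdefault(customer_id, []).append(record)
    compressed = cache.get(customer_id)
    if compressed is not None:  # cache hit: merge the new record into the cached value
        records = json.loads(zlib.decompress(compressed))
        records.append(record)
        cache[customer_id] = zlib.compress(json.dumps(records).encode())

def read_history(customer_id):
    """Read path: cache first; on a miss, read Cassandra and populate the cache."""
    compressed = cache.get(customer_id)
    if compressed is None:
        records = cassandra.get(customer_id, [])
        compressed = zlib.compress(json.dumps(records).encode())
        cache[customer_id] = compressed
    return json.loads(zlib.decompress(compressed))
```

Note the trade-off the post describes: every write pays for an extra cache lookup (and possibly a decompress/merge/recompress cycle) so that the common read can be served entirely from memory.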

With the addition of the caching layer, this single-table Cassandra storage approach worked very well for many years. Partitioning based on CustomerId scaled well in the Cassandra cluster. By 2012, the viewing history Cassandra cluster was one of the biggest dedicated Cassandra clusters at Netflix. To scale further, the team needed to double the cluster size. This meant venturing into uncharted territory for Netflix’s usage of Cassandra. Meanwhile, the Netflix business continued to grow rapidly, including an increasing international member base and forthcoming ventures into original content.

Redesign: Live and Compressed Storage Approach

It became clear that a different approach was needed to scale for growth anticipated over the next 5 years. The team analyzed the data characteristics and usage patterns, and redesigned viewing history storage with two main goals in mind:

  1. Smaller Storage Footprint.
  2. Consistent Read/Write Performance as viewing per member grows.

For each member, viewing history data is divided into two sets:

  • Live or Recent Viewing History (LiveVH): Small number of recent viewing records with frequent updates. The data is stored in uncompressed form as in the simple design detailed above.
  • Compressed or Archival Viewing History (CompressedVH): Large number of older viewing records with rare updates. The data is compressed to reduce storage footprint. Compressed viewing history is stored in a single column per row key.

LiveVH and CompressedVH are stored in different tables and are tuned differently to achieve better performance. Since LiveVH has frequent updates and a small number of viewing records, compactions are run frequently and gc_grace_seconds is small to reduce the number of SSTables and the data size. Read repair and full column family repair are run frequently to improve data consistency. Since updates to CompressedVH are rare, manual and infrequent full compactions are sufficient to reduce the number of SSTables. Data is checked for consistency during the rare updates. This obviates the need for read repair as well as full column family repair.

Write Flow

New viewing records are written to LiveVH using the same approach as described earlier.

Read Flows

To get the benefit of the new design, the viewing history API was updated with an option to read recent or full data:

  • Recent Viewing History: For most cases this results in reading from LiveVH only, which limits the data size resulting in much lower latencies.
  • Full Viewing History: Implemented as parallel reads of LiveVH and CompressedVH. Due to data compression and CompressedVH having fewer columns, less data is read thereby significantly speeding up reads.

CompressedVH Update Flow

While reading viewing history records from LiveVH, if the number of records is over a configurable threshold then the recent viewing records are rolled up, compressed, and stored in CompressedVH via a background task. Rolled-up data is stored in a new row with row key: CustomerId. The new rollup is versioned, and after being written it is read back to check for consistency. Only after the consistency of the new version is verified is the old version of rolled-up data deleted. For simplicity there is no locking during rollup, and Cassandra takes care of resolving very rare duplicate writes (i.e., the last writer wins).
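A minimal sketch of this rollup task follows. The threshold values (`ROLLUP_THRESHOLD`, `KEEP_RECENT_DAYS`), the dictionary-backed tables, and the JSON/zlib encoding are all assumptions for illustration; the post does not publish the real values or storage encoding:

```python
import json
import zlib

ROLLUP_THRESHOLD = 10   # assumed configurable threshold on LiveVH record count
KEEP_RECENT_DAYS = 2    # "last couple of days" kept in LiveVH after rollup

live_vh = {}        # CustomerId -> list of recent (uncompressed) records
compressed_vh = {}  # CustomerId -> {version: compressed blob}

def rollup(customer_id, now):
    """Roll older LiveVH records into a new, versioned CompressedVH row."""
    records = live_vh.get(customer_id, [])
    if len(records) <= ROLLUP_THRESHOLD:
        return  # common case: nothing to do
    cutoff = now - KEEP_RECENT_DAYS * 86400
    to_roll = [r for r in records if r["ts"] < cutoff]
    keep = [r for r in records if r["ts"] >= cutoff]
    versions = compressed_vh.setdefault(customer_id, {})
    old_version = max(versions) if versions else 0
    # merge the older records with the previous rollup, if one exists
    merged = to_roll
    if old_version:
        merged = json.loads(zlib.decompress(versions[old_version])) + to_roll
    new_version = old_version + 1
    versions[new_version] = zlib.compress(json.dumps(merged).encode())
    # read back and verify before deleting the previous version (no locking)
    if json.loads(zlib.decompress(versions[new_version])) == merged:
        versions.pop(old_version, None)
        live_vh[customer_id] = keep
```

The read-back verification before deleting the old version mirrors the post’s point: correctness is enforced by checking the new rollup rather than by locking, with last-writer-wins resolving rare races.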

Figure 2: Live and Compressed Data Model

As shown in figure 2, the rolled-up row in CompressedVH also stores metadata such as the latest version, object size, and chunking information (more on that later). The version column stores a reference to the latest version of rolled-up data, so that reads for a CustomerId always return only the latest rolled-up data. The rolled-up data is stored in a single column to reduce compaction pressure. To minimize the frequency of rollups for members with frequent viewing patterns, just the last couple of days’ worth of viewing history records are kept in LiveVH after rollup; the rest are merged with the records in CompressedVH during rollup.

Auto Scaling via Chunking

For the majority of members, storing their entire viewing history in a single row of compressed data resulted in good performance during the read flows. For a small percentage of members with very large viewing history, reading CompressedVH from a single row started to slow down, for reasons similar to those described for the first architecture. There was a need for an upper bound on the read and write latencies for this rare case without negatively impacting the read and write latencies for the common case.

To solve this, we split the rolled-up compressed data into multiple chunks if the data size is greater than a configurable threshold. These chunks are stored on different Cassandra nodes. Parallel reads and writes of these chunks result in an upper bound on the read and write latencies, even for very large viewing data.

Figure 3: Auto Scale via Chunking

Write Flow

As figure 3 indicates, rolled-up compressed data is split into multiple chunks based on a configurable chunk size. All chunks are written in parallel to different rows with row key: CustomerId$Version$ChunkNumber. Metadata is written to its own row with row key: CustomerId after a successful write of the chunked data. This bounds the write latency to two writes for rollups of very large viewing data. In this case the metadata row has an empty data column to enable fast reads of metadata.

To make the common case (compressed viewing data is smaller than the configurable threshold) fast, metadata is combined with the viewing data in the same row to eliminate metadata lookup overhead as shown in figure 2.

Read Flow

The metadata row is first read using CustomerId as the key. For the common case, the chunk count is 1 and the metadata row also has the most recent version of rolled up compressed viewing data. For the rare case, there are multiple chunks of compressed viewing data. Using the metadata information like version and chunk count, different row keys for the chunks are generated and all chunks are read in parallel. This bounds the read latency to two reads.
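The chunked write and read flows can be sketched as follows. The `CustomerId$Version$ChunkNumber` key scheme follows the post, but the `CHUNK_SIZE` value and the dictionary standing in for the Cassandra table are assumptions, and the real writes and reads happen in parallel rather than in a loop:

```python
CHUNK_SIZE = 64 * 1024  # assumed configurable chunk size in bytes

table = {}  # row key -> value; stand-in for the CompressedVH table

def write_rollup(customer_id, version, blob):
    """Split the compressed blob into chunks; write chunks first, metadata last."""
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)] or [b""]
    if len(chunks) == 1:
        # common case: metadata and data share one row, so reads need one lookup
        table[customer_id] = {"version": version, "chunks": 1, "data": blob}
        return
    for n, chunk in enumerate(chunks, start=1):
        table["%s$%d$%d" % (customer_id, version, n)] = chunk
    # metadata row written only after the chunk writes succeed; empty data column
    table[customer_id] = {"version": version, "chunks": len(chunks), "data": b""}

def read_rollup(customer_id):
    """Read the metadata row, then fetch all chunks (in parallel in practice)."""
    meta = table[customer_id]
    if meta["chunks"] == 1:
        return meta["data"]
    version = meta["version"]
    return b"".join(table["%s$%d$%d" % (customer_id, version, n)]
                    for n in range(1, meta["chunks"] + 1))
```

Writing the metadata row last is what bounds the flow to two writes and two reads: readers never see a metadata row pointing at chunks that have not yet landed.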

Caching Layer Changes

The in-memory caching layer was enhanced to support chunking for large entries. For members with large viewing history, it was not possible to fit the entire compressed viewing history in a single EVCache entry. So similar to the CompressedVH model, each large viewing history cache entry is broken into multiple chunks and the metadata is stored along with the first chunk.


By leveraging parallelism, compression, and an improved data model, the team was able to meet all of the goals:

  1. Smaller Storage Footprint via compression.
  2. Consistent Read/Write Performance via chunking and parallel reads/writes. Latency is bound to one read and one write for the common case, and to two reads and two writes for the rare case.
Figure 4: Results

The team achieved ~6X reduction in data size, ~13X reduction in system time spent on Cassandra maintenance, ~5X reduction in average read latency and ~1.5X reduction in average write latency. More importantly, it gave the team a scalable architecture and headroom to accommodate rapid growth of Netflix viewing data.

In the next part of this blog post series, we will explore the latest scalability challenges motivating the next iteration of viewing history storage architecture. If you are interested in solving similar problems, join us.

Scaling Time Series Data Storage — Part I was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Analyzing Cassandra Performance with Flame Graphs

One of the challenges of running large scale distributed systems is being able to pinpoint problems. It’s all too common to blame a random component (usually a database) whenever there’s a hiccup, even when there’s no evidence to support the claim. We’ve already discussed the importance of monitoring tools, graphing and alerting on metrics, and using distributed tracing systems like Zipkin to correctly identify the source of a problem in a complex system.

Once you’ve narrowed down the problem to a single system, what do you do? To figure this out, it’s going to depend on the nature of the problem, of course. Some issues are temporary, like a dead disk. Some are related to a human-introduced change, like a deployment or a wrong configuration setting. These have relatively straightforward solutions. Replace the disk, or rollback the deployment.

What about problems that are outside the scope of a simple change? One external factor that we haven’t mentioned so far is growth. Scale can be a difficult problem to understand because reproducing the issue is often nuanced and complex. These challenges are sometimes measured in throughput (requests per second), size (terabytes), or latency (5ms p99). For instance, if a database server is able to serve every request out of memory, it may get excellent throughput. As the size of the dataset increases, random lookups are more and more likely to go to disk, decreasing throughput. Time Window Compaction Strategy is a great example of a solution to a scale problem that’s hard to understand unless the numbers are there. The pain of compaction isn’t felt until you’re dealing with a large enough volume of data to cause performance problems.

During times of failure we all too often find ourselves thinking of the machine and its processes as a black box: billions of instructions executing every second, with no ability to peer inside and understand its mysteries.

Fortunately, we’re not completely blind as to what a machine is doing. For years we’ve had tools like debuggers and profilers available to us. Oracle’s JDK offers us Java Flight Recorder, which we can use to analyze running processes locally or in production:

mission control

Profiling with Flight Recorder is straightforward, but interpreting the results takes a little bit of work. Expanding the list of nested tables and looking for obvious issues is a bit more mental work than I’m interested in; it would be a lot easier if we could visualize the information. Flight Recorder also requires a commercial license for production use, and only works with the Oracle JDK.

That brings us back to the subject of this post: a way of generating useful visual information called a flame graph. A flame graph lets us quickly identify the performance bottlenecks in a system. They were invented by Brendan Gregg. This is also part one of a very long series of performance tuning posts; we will be referring back to it as we dive deeper into the internals of Cassandra.

Swiss Java Knife

The approach we’ll examine in this post is utilizing the Swiss Java Knife, usually referred to as SJK, to capture the data from the JVM and generate the flame graphs. SJK is a fantastic collection of tools. Aside from generating flame graphs, we can inspect garbage collection statistics, watch threads, and do a variety of other diagnostic tasks. It works on Mac, Linux, and both the Oracle JDK and the OpenJDK.

I’ve downloaded the JAR, put it in my $HOME/bin and set up a shell function to call it easily:

sjk () {
        java -jar ~/bin/sjk-plus-0.8.jar "$@"
}

On my laptop I’m running a workload with cassandra-stress. I’ve prepopulated the database, and started the stress workload with the following command:

cassandra-stress read n=1000000

For the first step of our analysis, we need to capture the stack frames of our running Java application using the stcap feature of SJK. To do this, we need to pass in the process id and the file to which we will dump the data. The dumps are written in a binary format that we’ll be able to query later:

sjk stcap -p 92541 -i 10ms -o dump.std

Then we can analyze the data. If all we have is a terminal, we can print out a histogram of the analysis. This can be pretty useful on its own if there’s an obvious issue. In this case, we can see that a lot of time is spent in sun.misc.Unsafe.park, meaning threads are just waiting around, parked:

$ sjk ssa -f dump.std --histo
Trc     (%)  Frm  N  Term    (%)  Frame
372447  96%  372447       0   0%
309251  80%  309251  309251  80%  sun.misc.Unsafe.park(Native Method)
259376  67%  259376       0   0%  java.util.concurrent.locks.LockSupport.park(
254388  66%  254388       0   0%
 55709  14%  55709        0   0%  java.util.concurrent.ThreadPoolExecutor$
 52374  13%  52374        0   0%  org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6/ Source)
 52374  13%  52374        0   0%  org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(
 44892  11%  44892        0   0%  io.netty.util.concurrent.DefaultThreadFactory$
 44887  11%  44887        0   0%  java.util.concurrent.ThreadPoolExecutor.runWorker(
 42398  11%  42398        0   0%
 42398  11%  42398        0   0%  io.netty.util.concurrent.SingleThreadEventExecutor$
 42398  11%  42398        0   0%
 42398  11%  42398        0   0%
 42398  11%  42398        0   0%
 42398  11%  42398    42398  11% Method)
 42398  11%  42398        0   0%
 42398  11%  42398        0   0%

Now that we have our stcap dump, we can generate a flame graph with the following command:

sjk ssa --flame -f dump.std > flame-sjk.svg

When you open the SVG in a browser, you should end up with an image which looks something like this:


If you open the flame graph on your machine you can mouse over the different sections to see the method call and the percentage of time it’s taking. The wider the bar, the more frequently it appears in the captured stacks. It’s very easy to glance at the graph to understand where the time is spent in our program.

This is not the only technique for generating flame graphs. Brendan Gregg has a long list of links and references I recommend reading at the bottom of his FlameGraph page. I intend to write a utility to export the SJK format to the format that Brendan uses on his blog, as it’s a little nicer to look at, has better mouseover behavior, supports drill-down, and has search. That format also supports differential flame graphs, which are nice if you’re doing performance comparisons across different builds.

I hope you’ve enjoyed this post on visualizing Cassandra’s performance using FlameGraphs. We’ve used this tool several times with the teams we’ve worked with to tune Cassandra’s configurations and optimize performance. In the next post in this series we’ll be examining how to tune garbage collection parameters to maximize throughput while keeping latency to a minimum.

Meltdown's Impact on Cassandra Latency

What impact on latency should you expect from applying the kernel patches for the Meltdown security vulnerability?

TL;DR expect a latency increase of at least 20% for both reads and writes.


The Meltdown vulnerability, formally CVE-2017-5754, allows rogue processes to access kernel memory. Simple demonstrations have already appeared online showing how to expose passwords and SSH private keys from memory. The consequences of this, in particular on shared hosts (i.e., cloud), are considered “catastrophic” by security analysts. Initially discovered in early 2017, the vulnerability was planned to be publicly announced on 9th January 2018. However, due to the attention generated by the frequency of Linux kernel ‘page-table isolation’ (KPTI) patches committed late in 2017, the news broke early, on 3rd January 2018.


Without updated hardware, the Linux kernel patches impact CPU usage. While userspace programs are not directly affected, anything that triggers a lot of interrupts to the CPU, such as a database’s use of IO and network, will suffer. Early reports are showing evidence of CPU usage taking a hit of between 5% and 70%. Because of the potential CPU performance hit and the lack of evidence available, The Last Pickle spent a little time to see what impacts we could record for ourselves.


The hardware used for testing was a Lenovo X1 Carbon (gen 5) laptop. This machine runs an Intel Core i7-5600U CPU with 8GB of RAM. Running on it is Ubuntu 17.10 Artful. The unpatched kernel was version 4.13.0-21, and the patched kernel version 4.13.0-25. A physical machine was used to avoid the performance variances encountered in the different cloud environments.

The Ubuntu kernel was patched according to instructions here and the ppa:canonical-kernel-team/pti repository.


A simple schema, but one typical of many Cassandra usages, was used on top of Cassandra-3.11.1 via a 3 node ccm cluster. The stress execution ran with 32 threads. Running stress, three nodes, and a large number of threads on one piece of hardware was intentional, so as to increase thread/context switching and kernel overhead.

The stress run was limited to 5k requests per second so as to avoid saturation, which occurred around 7k/s. The ratio of writes to reads was 1:1, with reads being split between whole partitions and single rows. The table used TWCS, tuned down to 10-minute windows so as to ensure compactions ran during an otherwise short stress run. The stress ran for an hour against both the unpatched and patched kernels.

ccm stress user profile=stress.yaml ops\(insert=2,by_partition=1,by_row=1\) duration=1h -rate threads=32 throttle=5000/s -graph file=meltdown.html title=Meltdown revision=cassandra-3.11.1-unpatched


The following graphs show that over every percentile a 20%+ latency increase occurs. Sometimes the increase is up around 50%.

Meltdown Cassandra median

Meltdown Cassandra 95th

Meltdown Cassandra 99th

Meltdown Cassandra stats


The full stress results are available here.

Should you use incremental repair?

After seeing a lot of questions surrounding incremental repair on the mailing list, and after observing several outages caused by it, we figured it would be good to write down our advice in a blog post.

Repair in Apache Cassandra is a maintenance operation that restores data consistency throughout a cluster. It is advised to run repair operations at least once every gc_grace_seconds to ensure that tombstones get replicated consistently, avoiding zombie records if you perform DELETE statements on your tables.

Repair also facilitates recovery from outages that last longer than the hint window, or in case hints were dropped. For those operators already familiar with the repair concepts, there were a few back-to-basics moments when the behavior of repair changed significantly in the release of Apache Cassandra 2.2. The introduction of incremental repair as the default along with the generalization of anti-compaction created a whole new set of challenges.

How does repair work?

To perform repairs without comparing all data between all replicas, Apache Cassandra uses Merkle trees, comparing trees of hashed values instead.

Merkle tree

During a repair, each replica will build a Merkle tree, using what is called a “validation compaction”. It is basically a compaction without the write phase, the output being a tree of hashes.

Validation compaction

Merkle trees will then be compared between replicas to identify mismatching leaves, each leaf covering several partitions. No difference check is made on a per-partition basis: if one partition in a leaf is not in sync, then all partitions in the leaf are considered out of sync. Sending over more data than is required is typically called overstreaming. Gigabytes of data can be streamed, even for a single bit of difference. To mitigate overstreaming, people started performing subrange repairs, specifying the start/end tokens to repair in smaller chunks, which results in fewer partitions per leaf.
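To make the overstreaming point concrete, here is a toy Python model of leaf-level comparison. The token-to-leaf mapping and the hashing scheme are gross simplifications, not Cassandra’s actual Merkle tree implementation:

```python
import hashlib

def leaf_hashes(partitions, leaf_count):
    """Bucket partitions into leaves by token, then hash each leaf's contents."""
    leaves = [hashlib.sha256() for _ in range(leaf_count)]
    for token in sorted(partitions):
        leaf = token % leaf_count  # toy token-to-leaf mapping
        leaves[leaf].update(repr((token, partitions[token])).encode())
    return [h.hexdigest() for h in leaves]

def partitions_to_stream(local, remote, leaf_count):
    """Any mismatching leaf forces *all* of its partitions to be streamed."""
    mismatched = {i for i, (a, b) in
                  enumerate(zip(leaf_hashes(local, leaf_count),
                                leaf_hashes(remote, leaf_count))) if a != b}
    return sorted(t for t in local if t % leaf_count in mismatched)
```

With few leaves, a single differing partition drags every other partition in its leaf into the stream; increasing the leaf count over a smaller token range (as subrange repair effectively does) shrinks the set down toward just the partition that actually differs.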

With clusters growing in size and density, performing repairs within gc_grace_seconds started to get more and more challenging, with repairs sometimes lasting for tens of days. Some clever folks leveraged the immutable nature of SSTables and introduced incremental repair in Apache Cassandra 2.1.

What is incremental repair?

The plan with incremental repair was that once some data had been repaired, it would be marked as such and would never need to be repaired again.
Since SSTables can contain tokens from multiple token ranges, and repair is performed by token range, it was necessary to be able to separate repaired data from unrepaired data. That process is called anticompaction.


Once a repair session ends, each repaired SSTable will be split into 2 SSTables: one that contains the data that was repaired in the session (i.e., data that belonged to the repaired token range) and another with the remaining unrepaired data. The newly created SSTable containing repaired data will be marked as such by setting its repairedAt timestamp to the time of the repair session.
When performing validation compaction during the next incremental repair, Cassandra will skip SSTables with a repairedAt timestamp higher than 0, and thus only compare data that is unrepaired.
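A toy sketch of anticompaction and the subsequent validation filtering; the dictionary-based SSTable representation and function names are assumptions for illustration, not Cassandra internals:

```python
def anticompact(sstable, repaired_range, repaired_at):
    """Split one SSTable into a repaired and an unrepaired SSTable after a session."""
    lo, hi = repaired_range
    repaired = {t: v for t, v in sstable["data"].items() if lo <= t < hi}
    unrepaired = {t: v for t, v in sstable["data"].items() if not (lo <= t < hi)}
    out = []
    if repaired:
        out.append({"data": repaired, "repairedAt": repaired_at})
    if unrepaired:
        out.append({"data": unrepaired, "repairedAt": 0})
    return out

def sstables_for_validation(sstables):
    """The next incremental repair's validation skips SSTables with repairedAt > 0."""
    return [s for s in sstables if s.get("repairedAt", 0) == 0]
```

Note that anticompaction rewrites the whole SSTable just to segregate the two sets, which is exactly the overhead the rest of this post complains about.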

Incremental repair

Incremental repair was actually promising enough that it was promoted as the default repair mode in C* 2.2, and anticompaction was since then also performed during full repairs.
To say the least, this was a bit of a premature move from the community as incremental repair has a few very annoying drawbacks and caveats that would make us consider it an experimental feature instead.

The problems of incremental repair

The most nasty one is filed in the Apache Cassandra JIRA as CASSANDRA-9143 with a fix ready for the unplanned 4.0 release. Between validation compaction and anticompaction, an SSTable that is involved in a repair can be compacted away as part of the standard compaction process on one node and not on the others. Such an SSTable will not get marked as repaired on that specific node while the rest of the cluster will consider the data it contained as repaired.
Thus, on the next incremental repair run, all the partitions contained in that SSTable will be seen as inconsistent, which can generate a fairly large amount of overstreaming. This is a particularly nasty bug when incremental repair is used in conjunction with Leveled Compaction Strategy (LCS). LCS is a very intensive strategy in which SSTables get compacted far more often than with STCS or TWCS. LCS creates fixed-size SSTables, which can easily lead to thousands of SSTables for a single table. The way streaming occurs in Apache Cassandra during repair means that overstreaming of LCS tables can create tens of thousands of small SSTables in L0, which can ultimately bring nodes down and affect the whole cluster. This is particularly true when nodes use a large number of vnodes.
We have seen this happen on several customers’ clusters, and it then requires a lot of operational expertise to bring the cluster back to a sane state.

In addition to the bugs related to incorrectly marked SSTables, there is the significant overhead of anticompaction. It came as quite a surprise to users upgrading from 2.0/2.1 to 2.2 when they first tried to run repair. If there is already a lot of data on disk, the first incremental repair can take a lot of time (if not forever) and create a situation similar to the one above, with a lot of SSTables being created due to anticompaction. Keep in mind that anticompaction will rewrite all SSTables on disk to separate repaired and unrepaired data.
While it is no longer necessary to “prepare” the migration to incremental repair, we would strongly advise against running it on a cluster with a lot of unrepaired data without first marking SSTables as repaired. This would require running a full repair first to make sure data is actually repaired, but now even full repair performs anticompaction, so… you see the problem.

For valid reasons, a safety measure has been put in place to prevent SSTables that are undergoing anticompaction from being compacted. The problem is that it also prevents those SSTables from going through validation compaction, which causes repair sessions to fail if an SSTable is being anticompacted. Given that anticompaction also occurs with full repairs, this creates the following limitation: you cannot run repair on more than one node at a time without risking failed sessions due to concurrent access to SSTables. This is true for incremental repair but also for full repair, and it changes a lot about how you have to run repair compared to previous versions.

The only way to perform repair without anticompaction in “modern” versions of Apache Cassandra is subrange repair, which fully skips anticompaction. To perform a subrange repair correctly, you have three options:

Regardless, it is extremely important to note that repaired and unrepaired SSTables can never be compacted together. If you stop performing incremental repairs once you started, you could end up with outdated data not being cleaned up on disk due to the presence of the same partition in both states. So if you want to continue using incremental repair, make sure it runs very regularly, and if you want to move back to full subrange repairs you will need to mark all SSTables as unrepaired using sstablerepairedset.

Note that because subrange repair does not perform anticompaction, it is not possible to perform subrange repair in incremental mode.

Repair: state of the art in late 2017

Here’s our advice at the time of writing this blog post, based on our experience with customers: perform full subrange repairs exclusively for now, and do not run incremental repair. Just pretend that feature does not exist for now.

While the idea behind incremental repair is brilliant, the implementation still has flaws that can cause severe damage to a production cluster, especially when using LCS and DTCS. The improvements and fixes planned for 4.0 will need to be thoroughly tested to prove they fixed incremental repair and allow it to be safely used as a daily routine.

We are confident that future releases will make incremental repair better, allowing the operation to be safe and blazing fast compared to full repairs.