IoT Overdrive Part 2: Where Can We Improve?

In our first blog, “IoT Overdrive Part 1: Compute Cluster Running Apache Cassandra® and Apache Kafka®”, we explored the project’s inception and Version 1.0. Now, let’s dive into the evolution of the project via a walkthrough of Version 1.4 and our plans with Version 2.0.

Some of our initial challenges that need to be addressed include:

  • Insufficient hardware
  • Unreliable WiFi
  • Finding a way to power all the new and existing hardware without as many cables
  • Automating the setup of the compute nodes

Upgrade: Version 1.4

Version 1.4 served as a steppingstone to Version 2.0, codenamed Ericht, which needed to be portable for shows and talks. Scaling up and out was a big necessity for this version, to make it operable at demonstrations in a timely manner.

Why the jump to 1.4? We made a few changes in between that were relatively minor, and so we rolled v1.1–1.3 into v1.4.

Let’s walk through the major changes:


To resolve the insufficient hardware issue, we upgraded the hardware, scaling both vertically and horizontally, and introduced a new method to power the Pi computers. This includes setting the baseline model for a node at a 4GB quad-core computer (for vertical scaling) and increasing the number of nodes (for horizontal scaling).

These nodes are comparable to smaller AWS instances, and the cluster could be compared to a free cluster on the Instaclustr managed service.

Orange Pi 5

We upgraded the 4 Orange Pi worker nodes to Orange Pi 5s with 8GB RAM and added 256GB M.2 storage drives to each Orange Pi for storing database and Kafka data.

Raspberry Pi Workers

The 4 Raspberry Pi workers were upgraded to 8GB RAM Raspberry Pi 4Bs, and the 4GB models were reassigned as Application Servers. We added 256GB USB drives to the Raspberry Pis for storing database and Kafka data.

After all that, the cluster plan looked like this:

Source: Kassian Wren

We needed a way to power all of this; luckily there’s a way to kill two birds with one stone here.

Power Over Ethernet (PoE) Switch

To eliminate power cable clutter and switch to ethernet connections, I used a Netgear 16-port PoE switch. Each Pi has its own 5V/3A adapter, which splits out into power USB-C and ethernet connectors.

This setup allowed for a single power plug to be used for the entire cluster instead of one for each Pi, greatly reducing the need for power strips and slimming down the cluster considerably.

The entire cluster plus the switch consume about 20W of power, which is not small but not unreasonable to power with a large solar panel.

Source: Kassian Wren


I made quite a few tweaks on the software side in v1.4. However, the major software enhancement was the integration of Ansible.

Ansible automated the setup process for both Raspberry and Orange Pi computers, from performing apt updates to installing Docker and starting the swarm.

Making it Mobile

We made the enclosure more transport-friendly with configurations for both personal travel and secure shipping. An enclosure was designed from acrylic plates with screw holes to match the new heat sink/fan cases for the Pi computers.

Using a laser cutter/engraver, we cut the plates out of acrylic and used standoff screws for motherboards to stand the racks two high; although it can be configured to be four-high, it gets tall and ungainly.

The next challenge was coming up with a way to ship it for events.

There are two configurations: one where the cluster travels with me, and the other where the cluster is shipped to my destination.

For shipping, I use one large pelican case (hard-sided suitcase) with the pluck foam layers laid out to cradle the enclosures, with the switch in the layer above.

The “travel with me” configuration is still a work in progress; fitting everything into a case small enough to travel with is interesting! I use a carry-on pelican case and squish the cluster in. You can see the yellow case in this photo:

Source: Kassian Wren

More Lessons Learned

While Version 1.4 was much better, it did bring to light the need for more services and capabilities, and showed how my implementation of the cluster was pushing the limits of Docker.

Now that we’ve covered 1.4 and all of the work we’ve done so far, it’s time to look to the future with version 2.0, codenamed “Ericht.”

Next: Version 2.0 (Ericht)

Version 2.0 focuses on creating a mobile, easy-to-move, easily manageable version of the cluster, with enhanced automation for container deployment and management. It also implements a service to monitor the health of the cluster and pass and store the data using Kafka and Cassandra.

The problems we’re solving with 2.0 are:

  • Container management overhaul
  • Create mobility for the cluster
  • Give the cluster something to do and show
  • Monitor the health and status of the cluster

Hardware for 2.0

A significant enhancement in this new version is the integration of an INA226 voltage/current sensor into each Pi. This addition provides power monitoring and detailed insight into each unit’s energy consumption.
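As a rough illustration of what that power monitoring involves, here is a minimal sketch that converts raw INA226 register values into volts, amps, and watts. The register step sizes (1.25 mV per bit for bus voltage, 2.5 µV per bit for shunt voltage) come from the INA226 datasheet; the 0.1 ohm shunt value and all function names are assumptions for illustration, not details of the actual cluster build.

```python
"""Sketch: turn raw INA226 register readings into power figures."""

BUS_VOLTAGE_LSB_V = 1.25e-3    # bus voltage register step: 1.25 mV per bit
SHUNT_VOLTAGE_LSB_V = 2.5e-6   # shunt voltage register step: 2.5 uV per bit
SHUNT_OHMS = 0.1               # ASSUMPTION: 0.1 ohm shunt resistor on the sensor board


def to_signed16(raw):
    """Interpret a 16-bit register value as a two's-complement integer."""
    return raw - 0x10000 if raw & 0x8000 else raw


def decode_reading(bus_raw, shunt_raw, shunt_ohms=SHUNT_OHMS):
    """Convert raw bus and shunt register values into volts, amps and watts."""
    bus_volts = bus_raw * BUS_VOLTAGE_LSB_V
    shunt_volts = to_signed16(shunt_raw) * SHUNT_VOLTAGE_LSB_V
    amps = shunt_volts / shunt_ohms          # Ohm's law across the shunt
    return {"bus_volts": bus_volts, "amps": amps, "watts": bus_volts * amps}


def energy_wh(watt_samples, interval_s):
    """Approximate energy in watt-hours from evenly spaced power samples."""
    return sum(watt_samples) * interval_s / 3600.0
```

Reading the registers themselves would happen over I2C; that part is left out here because it depends on the wiring and the library in use.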

Health Monitoring

I will be implementing an advanced health monitoring system that tracks factors such as CPU and RAM usage, Docker status, and more. These will be passed into the Cassandra database using Kafka consumers and producers.
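A health event of this kind could look roughly like the sketch below. It is only an assumption about the shape of the data: the field names, the node name, and the JSON encoding are hypothetical, and the Kafka producer call itself is omitted so the example stays self-contained.

```python
"""Sketch: build a node health event and encode it for a Kafka producer."""
import json
import os
import socket
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def build_health_event(node, load_1m, cpu_count, meminfo_text, now=None):
    """Assemble one health sample (hypothetical schema) for a single node."""
    mem = parse_meminfo(meminfo_text)
    used = mem["MemTotal"] - mem["MemAvailable"]
    return {
        "node": node,
        "timestamp": int(time.time() if now is None else now),
        "load_per_core": round(load_1m / cpu_count, 2),
        "ram_used_percent": round(100.0 * used / mem["MemTotal"], 1),
    }


def encode_for_kafka(event):
    """Return (key, value) bytes; keying by node keeps a node's events in one partition."""
    return event["node"].encode(), json.dumps(event, sort_keys=True).encode()


def read_local_event():
    """Collect a sample from the local Linux machine (not called automatically)."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    return build_health_event(
        socket.gethostname(), os.getloadavg()[0], os.cpu_count() or 1, meminfo_text
    )
```

A consumer on the other side would decode the JSON and write one row per event into a Cassandra table, for example partitioned by node and clustered by timestamp.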

Software Updates

Significant updates are planned on the software side of things. As the Docker swarm requires a lot of manual maintenance, I wanted a better way to orchestrate the containers. One way we thought of was Kubernetes.

Kubernetes (K8s)

We’re transitioning to Kubernetes, an open source container orchestration system, to manage our containers. Though Kubernetes is often used for more transient containers than Kafka and Cassandra, operators such as Strimzi for Kafka and K8ssandra for Cassandra help make this integration more feasible and effective.

Moving Ahead

As we continue to showcase and demo this cluster more and more often (we were at Current 2023 and Community Over Code NA!), we are learning a lot. This project’s potential for broader technological applications is becoming increasingly evident.

As I move forward, I’ll continue to document my progress and share my findings here on the blog. Our next post will go into getting Apache Cassandra and Apache Kafka running on the cluster in Docker Swarm.

The post IoT Overdrive Part 2: Where Can We Improve? appeared first on Instaclustr.

Gemini Code Assist + Astra DB: Build Generative AI Apps Faster

Today at Google Cloud Next, DataStax is showcasing its integration with Gemini Code Assist to allow for automatic generation of Apache Cassandra compliant CQL. This integration, a collaboration with Google Cloud, will enable developers to build applications that use Astra DB faster. Developers...

Unlearning Old Habits: From Postgres to NoSQL

Where an RDBMS-pro’s intuition led him astray – and what he learned when our database performance expert helped him get up and running with ScyllaDB NoSQL

Recently, I was asked by the ScyllaDB team if I could document some of my learnings moving from relational to NoSQL databases. Of course, there are many (bad) habits to break along the way, and instead of just documenting these, I ended up planning a 3-part livestream series with my good friend Felipe from ScyllaDB.

ScyllaDB is a NoSQL database that is designed from the ground up with performance in mind. And by that, I mean optimizing for speed and throughput. I won’t go into the architecture details here; there’s a lot of material that you might find interesting if you want to read more in the resources section of this site.

Scaling your database is a nice problem to have; it’s a good indicator that your business is doing well. Many customers of ScyllaDB have been on that exact journey – some compelling event that perhaps has driven them to consider moving from one database to another, all with the goal of driving down latency and increasing throughput. I’ve been through this journey myself, many times in the past, and I thought it would be great if I could re-learn, or perhaps just ditch some bad habits, by documenting the trials and tribulations of moving from a relational database to ScyllaDB. Even better if someone else benefits from these mistakes!

Fictional App, Real Scaling Challenges

So we start with a fictional application I have built. It’s the Rust-Eze Racing application which you can find on my personal GitHub here: If you have kids, or are just a fan of the Cars movies, you’re no doubt familiar with Rust-Eze, which is one of the Piston Cup teams and also a sponsor of the famous Lightning McQueen… Anyway, I digress.

The purpose of this app is built around telemetry, a key factor in modern motor racing.
It allows race engineers to interpret data that is captured from car systems so that they can tune for optimum performance. I created a simple Rust application that could simulate these metrics being written to the database, while at the same time having queries that read different metrics in real time. For the proof of concept, I used Postgres for data storage and was able to get decent read and write throughput from it on my development machine.

Since we’re still in the world of make believe and cars that can talk, I want you to imagine that my PoC was massively successful, and I have been hired to write the whole back end system for the Piston Cup. At this point in the scenario, with overwhelming demands from the business, I start to get nervous about my choice of database:

  • Will it scale?
  • What happens when I have millions or perhaps billions of rows?
  • What happens when I add many more columns and rows with high cardinality to the database?
  • How can I achieve the highest write throughput while maintaining predictable low latency?

All the usual questions a real full-stack developer might start to consider in a production scenario… This leads us to the mistakes I made and the journey I went through, spiking ScyllaDB as my new database of choice.

Getting my development environment up and running, and connecting to the database for the first time

In my first 1:1 livestreamed session with Felipe, I got my development environment up and running, using docker-compose to set up my first ScyllaDB node. I was a bit over-ambitious and set up a 3-node cluster, since a lot of the training material references this as a minimum requirement for production. Felipe suggested it’s not really required for development and that one node is perfectly fine. There are some good lessons in there, though…

I got a bit confused with port mapping and the way that works in general.
In Docker, you can map a published port on the host to a port in the container – so I naturally thought: Let’s map each of the container’s 9042 ports to a unique number on the host to facilitate client communication. I did something like this:

    ports:
      - "9041:9042"

And I changed the port number for 3 nodes such that 9041 was node 1, 9042 was node 2 and 9043 was node 3. Then in my Rust code, I did something like this:

    SessionBuilder::new()
        .known_nodes(vec!["", "", ""])
        .build()
        .await

I did this thinking that the client would then know how to reach each of the nodes. As it turns out, that’s not quite true, depending on what system you’re working on. On my Linux machine, there didn’t seem to be any problem with this, but on macOS there are problems. Docker runs differently on macOS than Linux – Docker uses the Linux kernel, so these routes would always work, but macOS doesn’t have a Linux kernel, so Docker has to run in a Linux virtual machine.

When the client app connects to a node in ScyllaDB, part of the discovery logic in the driver is to ask for other nodes it can communicate with (read more on shard aware drivers here). Since ScyllaDB will advertise other nodes on 9042, they simply won’t be reachable. So on Linux, it looks like it established TCP comms with three nodes:

    ❯ netstat -an | grep 9042 | grep 172.19
    tcp 0 0 ESTABLISHED
    tcp 0 0 ESTABLISHED
    tcp 0 0 ESTABLISHED

But on macOS it looks a little different, with only one of the nodes with established TCP comms and the others stuck in SYN_SENT.

The short of it is, you don’t really need to do this in development if you’re just using a single node! I was able to simplify my docker-compose file and avoid this problem. In reality, production nodes would most likely be on separate hosts/pods, so no need to map ports anyway.
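To make the discovery problem concrete, here is a toy model in Python. It does not talk to Docker or ScyllaDB; it only mimics the behaviour described above. The container IP addresses and the routing rule are simplifying assumptions chosen for illustration.

```python
"""Toy model of driver discovery with per-node published ports (not real networking)."""

CQL_PORT = 9042

# ASSUMPTION: hypothetical container addresses on a Docker bridge network.
NODES = {
    "node1": {"container_ip": "172.19.0.2", "host_port": 9041},
    "node2": {"container_ip": "172.19.0.3", "host_port": 9042},
    "node3": {"container_ip": "172.19.0.4", "host_port": 9043},
}


def can_route(platform, host, port):
    """Simplified rule for whether a client on the host can reach host:port."""
    if host == "127.0.0.1":
        # Published ports are reachable through localhost on both platforms.
        return port in {node["host_port"] for node in NODES.values()}
    # Bridge IPs are routable from a Linux host, but on macOS the containers
    # sit inside a Linux VM, so the host cannot reach those addresses directly.
    return platform == "linux"


def connection_states(platform, contact_node="node1"):
    """Return the TCP state the client would see for each node."""
    states = {}
    contact = NODES[contact_node]
    ok = can_route(platform, "127.0.0.1", contact["host_port"])
    states[contact_node] = "ESTABLISHED" if ok else "SYN_SENT"
    for name, node in NODES.items():
        if name == contact_node:
            continue
        # Peers are advertised at their own address on 9042, not the mapped port.
        ok = can_route(platform, node["container_ip"], CQL_PORT)
        states[name] = "ESTABLISHED" if ok else "SYN_SENT"
    return states
```

In this model, `connection_states("linux")` reports every node as ESTABLISHED, while `connection_states("macos")` leaves the two discovered peers in SYN_SENT, which matches the netstat observation.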
The nice thing about running with one node instead of three is that you’ll also avoid this type of problem, depending on which platform you’re using:

    std::runtime_error The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr

I was able to circumvent this issue by using this flag:

    --reactor-backend=epoll

This option switches Seastar threads to use epoll for event polling, as opposed to the default linux-aio implementation. This may be necessary for development workstations (in particular macOS deployments) where increasing the value for fs.aio-max-nr on the host system may not turn out to be so easy. Note that linux-aio (the default) is still the recommended option for production deployments.

Another mistake I made was using this argument:

    --overprovisioned

As it turns out, this argument needs a value, e.g. 1 to set this flag. However, it’s not necessary since this argument and the following are already set when you’re using ScyllaDB in Docker:

    --developer-mode

The refactored code looks something like this:

    scylladb1:
      image: scylladb/scylla
      container_name: scylladb1
      expose:
        - "19042"
        - "7000"
      ports:
        - "9042:9042"
      restart: always
      command:
        - --smp 1
        - --reactor-backend=epoll

The only additional argument worth using in development is the --smp command line option to restrict ScyllaDB to a specific number of CPUs. You can read more about that argument and other recommendations when using Docker to run ScyllaDB here.

As I write all this out, I think to myself: This seems like pretty basic advice. But you will see from our conversation that these learnings are pretty typical for a developer new to ScyllaDB. So hopefully, you can take something away from this and avoid making the same mistakes as me.

Watch the First Livestream Session On Demand

Next Up: NoSQL Data Modeling

In the next session with Felipe, we’ll dive deeper into my first queries using ScyllaDB and get the app up and running for the PoC.
Join us as we walk through it on April 18 in Developer Data Modeling Mistakes: From Postgres to NoSQL.

Benchmarking MongoDB vs ScyllaDB: IoT Sensor Workload Deep Dive

benchANT’s comparison of ScyllaDB vs MongoDB in terms of throughput, latency, scalability, and cost for an IoT sensor workload

benchANT recently benchmarked the performance and scalability of the market-leading general-purpose NoSQL database MongoDB and its performance-oriented challenger ScyllaDB. You can read a summary of the results in the blog Benchmarking MongoDB vs ScyllaDB: Performance, Scalability & Cost, see the key takeaways for various workloads in this technical summary, and access all results (including the raw data) from the benchANT site.

This blog offers a deep dive into the tests performed for the IoT sensor workload. The IoT sensor workload is based on the YCSB and its default data model, but with an operation distribution of 90% insert operations and 10% read operations that simulate a real-world IoT application. The workload is executed with the latest request distribution patterns. This workload is executed against the small database scaling size with a data set of 250GB and against the medium scaling size with a data set of 500GB.

Before we get into the benchmark details, here is a summary of key insights for this workload.

  • ScyllaDB outperforms MongoDB with higher throughput and lower latency results for the sensor workload except for the read latency in the small scaling size
  • ScyllaDB provides constantly higher throughput that increases with growing data sizes up to 19 times
  • ScyllaDB provides lower (down to 20 times) update latency results compared to MongoDB
  • MongoDB provides lower read latency for the small scaling size, but ScyllaDB provides lower read latencies for the medium scaling size

Throughput Results for MongoDB vs ScyllaDB

The throughput results for the sensor workload show that the small ScyllaDB cluster is able to serve 60 kOps/s with a cluster utilization of ~89% while the small MongoDB cluster serves only 8 kOps/s under a comparable cluster utilization of 85-90%.
For the medium cluster sizes, ScyllaDB achieves an average throughput of 236 kOps/s with ~88% cluster utilization and MongoDB 21 kOps/s with a cluster utilization of 75%-85%.

Scalability Results for MongoDB vs ScyllaDB

Analogous to the previous workloads, the throughput results allow us to compare the theoretical scale up factor for throughput with the actually achieved scalability. For ScyllaDB the maximal theoretical throughput scaling factor is 400% when scaling from small to medium. For MongoDB, the theoretical maximal throughput scaling factor is 600% when scaling size from small to medium.

The ScyllaDB scalability results show that ScyllaDB is able to nearly achieve linear scalability by achieving a throughput scalability of 393% of the theoretically possible 400%. The scalability results for MongoDB show that it achieves a throughput scalability factor of 262% out of the theoretically possible 600%.

Throughput per Cost Ratio

In order to compare the costs/month in relation to the provided throughput, we take the MongoDB Atlas throughput/$ as baseline (i.e. 100%) and compare it with the provided ScyllaDB Cloud throughput/$. The results show that ScyllaDB provides 6 times more operations/$ compared to MongoDB Atlas for the small scaling size and 11 times more operations/$ for the medium scaling size. Similar to the caching workload, MongoDB is able to scale the throughput with growing instance/cluster sizes, but the preserved operations/$ are decreasing.

Latency Results for MongoDB vs ScyllaDB

The P99 latency results for the sensor workload show that ScyllaDB and MongoDB provide constantly low P99 read latencies for the small and medium scaling size. MongoDB provides the lowest read latency for the small scaling size, while ScyllaDB provides the lowest read latency for the medium scaling size. For the insert latencies, the results show a similar trend as for the previous workloads.
ScyllaDB provides stable and low insert latencies, while MongoDB experiences up to 21 times higher update latencies.

Technical Nugget – Performance Impact of the Data Model

The default YCSB data model is composed of a primary key and a data item with 10 fields of strings that results in documents with 10 attributes for MongoDB and a table with 10 columns for ScyllaDB. We analyze how performance changes if a pure key-value data model is applied for both databases: a table with only one column for ScyllaDB and a document with only one field for MongoDB keeping the same record size of 1 KB.

Compared to the data model impact for the social workload, the throughput improvements for the sensor workload are clearly lower. ScyllaDB improves the throughput by 8% while for MongoDB there is no throughput improvement. In general, this indicates that using a pure k-v improves the performance of read-heavy workloads rather than write-heavy workloads.

Continue Comparing ScyllaDB vs MongoDB

Here are some additional resources for learning about the differences between MongoDB and ScyllaDB:

  • Benchmarking MongoDB vs ScyllaDB: Results from benchANT’s complete benchmarking study that comprises 133 performance and scalability measurements that compare MongoDB against ScyllaDB.
  • Benchmarking MongoDB vs ScyllaDB: Caching Workload Deep Dive: benchANT’s comparison of ScyllaDB vs MongoDB in terms of throughput, latency, scalability, and cost for a caching workload
  • A Technical Comparison of MongoDB vs ScyllaDB: benchANT’s technical analysis of how MongoDB and ScyllaDB compare with respect to their features, architectures, performance, and scalability.
  • ScyllaDB’s MongoDB vs ScyllaDB page: Features perspectives from users – like Discord – who have moved from MongoDB to ScyllaDB.
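The scalability percentages quoted in this deep dive follow directly from the reported throughput figures. The short sketch below reproduces that arithmetic; the function names are illustrative, and the efficiency metric (achieved scaling divided by theoretical scaling) is an added convenience rather than a number from the benchANT report.

```python
"""Sketch: recompute the throughput scaling factors from the reported kOps/s."""
import math


def scaling_factor_percent(small_kops, medium_kops):
    """Throughput at the medium size as a percentage of the small size."""
    return 100.0 * medium_kops / small_kops


def scaling_efficiency_percent(achieved_percent, theoretical_percent):
    """Share of the theoretical scaling factor that was actually achieved."""
    return 100.0 * achieved_percent / theoretical_percent


# Reported throughput (kOps/s) and theoretical scaling factors from the text.
scylla = scaling_factor_percent(60, 236)    # about 393%
mongo = scaling_factor_percent(8, 21)       # 262.5%, quoted as 262%

print(math.floor(scylla), math.floor(mongo))
print(round(scaling_efficiency_percent(scylla, 400), 1))
print(round(scaling_efficiency_percent(mongo, 600), 1))
```

Seen this way, ScyllaDB reaches roughly 98% of its theoretical scaling factor, while MongoDB reaches a little under 44%.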

Steering ZEE5’s Migration to ScyllaDB

Eliminating cloud vendor lock-in and supporting rapid growth – with a 5X cost reduction

Kishore Krishnamurthy, CTO at ZEE5, recently shared his perspective on Zee’s massive, business-critical database migration as the closing keynote at ScyllaDB Summit. You can read highlights of his talk below. All of the ScyllaDB Summit sessions, including Mr. Krishnamurthy’s session and the tech talk featuring Zee engineers, are available on demand.

Watch On Demand

About Zee

Zee is a 30-year-old publicly-listed Indian media and entertainment company. We have interests in broadcast, OTT, studio, and music businesses. ZEE5 is our premier OTT streaming service, available in over 190 countries. We have about 150M monthly active users. We are available on web, Android, iOS, smart TVs, and various other television and OEM devices.

Business Pressures on Zee

The media industry around the world is under a lot of bottom-line pressure. The broadcast business is now moving to OTT. While a lot of this business is being effectively captured on the OTT side, the business models on OTT are not scaling up similarly to the broadcast business. In this interim phase, as these business models stabilize, there is a lot of pressure on us to run things very cost-efficiently.

This problem is especially pronounced in India. Media consumption is on the rise in India. The biggest challenge we face is in terms of monetizability: our revenue is in rupees while our technology expenses are in dollars. For example, what we make from one subscription customer in one year is what Netflix or Disney would make in a month.

We are on the lookout for technology and vendor partners who have a strong India presence, who have a cost structure that aligns with the kind of scale we provide, and who are able to provide the cost efficiency we want to deliver.

Technical Pressures on the OTT Platform

A lot of the costs we had on the platform scale linearly with usage and user base.
That’s particularly true with some aspects of the platform like our heartbeat API, which was the primary use case driving our consideration of ScyllaDB. The linear cost escalation limited us in terms of what frequency we could run these kinds of solutions like heartbeat. A lot of our other solutions – like our playback experience, our security, and our recommendation systems – leverage heartbeat in their core infrastructure. Given the cost limitation, we could never scale that up.

We also had challenges in terms of the distributed architecture of the solution we had. We were working towards a multi-tenant solution. We were exploring cloud-neutral solutions, etc.

What Was Driving the Database Migration

Sometime last year, we decided that we wanted to get rid of cloud vendor lock-in. Every solution we were looking for had to be cloud-neutral, and the database choice also needed to deliver on that. We were also in the midst of a large merger. We wanted to make sure that our stack was multi-tenant and ready to onboard multiple OTT platforms. So these were reasons why we were eagerly looking for a solution that was both cloud-neutral and multi-tenant.

Top Factors in Selecting ScyllaDB

We wanted to move away from the master-slave architecture we had. We wanted our solution to be infinitely scalable. We wanted the solution to be multi-region-ready. One of the requirements from a compliance perspective was to be ready for any kind of regional disaster. When we came up with a solution for multi-region, the cost became significantly higher.

We wanted a high-availability, multi-region solution and ScyllaDB’s clustered architecture allowed us to do that, and to move away from cloud vendor lock-in. ScyllaDB allowed us to cut dependencies on the cloud provider. Today, we can run a multi-cloud solution on top of ScyllaDB.

We also wanted to make sure that the migration would be seamless.
ScyllaDB’s clustered architecture across clouds helped us when we were doing our recent cloud migration. It allowed us to make it very seamless.

From a support perspective, the ScyllaDB team was very responsive. They had local support in India, and they had local competency in terms of solution architects who held our hands along the way. So we were very confident we could deliver with their support.

We found that operationally, ScyllaDB was very efficient. We could significantly reduce the number of database nodes when we moved to ScyllaDB. That also meant that the costs came down. ScyllaDB also happened to be a drop-in replacement for the current incumbents like Cassandra and DynamoDB.

All of this together made it an easy choice for us to select ScyllaDB over the other database choices we were looking at.

Migration Impact

The migration to ScyllaDB was seamless. I’m happy to say there was zero downtime.

After the migration to ScyllaDB, we also did a very large-scale cloud migration. From what we heard from the cloud providers, nobody else in the world had attempted this kind of migration overnight. And our migration was extremely smooth. ScyllaDB was a significant part of all the components we migrated, and that second migration was very seamless as well.

After the migration, we moved about 525M users’ data, including their preferences, login details, session information, watch history, etc., to ScyllaDB. We have now hundreds of millions of heartbeats recorded on ScyllaDB. The overall data we store on ScyllaDB is in the tens of terabytes range at this point.

Our overall cost is a combination of the efficiency that ScyllaDB provides in terms of the reduction in the number of nodes we use, and the cost structure that ScyllaDB provides. Together, this has given us a 5x improvement in cost – that’s something our CFO is very happy with.

As I mentioned before, the support has been excellent.
Through both the migrations – first the migration to ScyllaDB and subsequently the cloud migration – the ScyllaDB team was always available on demand during the peak periods. They were available on-prem to support us and hand-hold us through the whole thing.

All in all, it’s a combination: the synergy that comes from the efficiency, the cost-effectiveness, and the scalability. The whole is more than the sum of the parts. That’s how I feel about our ScyllaDB migration.

Next Steps

ScyllaDB is clearly a favorite with the developers at Zee. I expect a lot more database workloads to move to ScyllaDB in the near future.

If you’re curious about the intricacies of using heartbeats to track video watch progress, then catch the talk by Srinivas and Jivesh, where they explained the phenomenally efficient system that we have built.

Watch the Zee Tech Talk

Inside a DynamoDB to ScyllaDB Migration

A detailed walk-through of an end-to-end DynamoDB to ScyllaDB migration

We previously discussed ScyllaDB Migrator’s ability to easily migrate data from DynamoDB to ScyllaDB – including capturing events. This ensures that your destination table is consistent and abstracts much of the complexity involved during a migration. We’ve also covered when to move away from DynamoDB, exploring both technical and business reasons why organizations seek DynamoDB alternatives, and examined what a migration from DynamoDB looks like and how to accomplish it within the DynamoDB ecosystem.

Now, let’s switch to a more granular (and practical) level. Let’s walk through an end-to-end DynamoDB to ScyllaDB migration, building on what we discussed in the two previous articles.

Before we begin, a quick heads up: The ScyllaDB Spark Migrator should still be your tool of choice for migrating from DynamoDB to ScyllaDB Alternator – ScyllaDB’s DynamoDB compatible API. But there are some scenarios where the Migrator isn’t an option:

  • If you’re migrating from DynamoDB to CQL – ScyllaDB’s Cassandra compatible API
  • If you don’t require a full-blown migration and simply want to stream a particular set of events to ScyllaDB
  • If bringing up a Spark cluster is overkill at your current scale

You can follow along using the code in this GitHub repository.

End-to-End DynamoDB to ScyllaDB Migration

Let’s run through a migration exercise where:

  • Our source DynamoDB table contains Items with an unknown (but up to 100) number of Attributes
  • The application constantly ingests records to DynamoDB
  • We want to be sure both DynamoDB and ScyllaDB are in sync before fully switching

For simplicity, we will migrate to ScyllaDB Alternator, our DynamoDB compatible API. You can easily transform it to CQL as you go.

Since we don’t want to incur any impact to our live application, we’ll back-fill the historical data via an S3 data export.
Finally, we’ll use AWS Lambda to capture and replay events to our destination ScyllaDB cluster.

Environment Prep – Source and Destination

We start by creating our source DynamoDB table and ingesting data to it. The script will create a DynamoDB table called source, and ingest 50K records to it. By default, the table will be created in On-Demand mode, and records will be ingested using the BatchWriteItem call serially. We also do not check whether all Batch Items were successfully processed, but this is something you should watch out for in a real production application.

The output will look like this, and it should take a couple minutes to complete:

Next, spin up a ScyllaDB cluster. For demonstration purposes, let’s spin a ScyllaDB container inside an EC2 instance:

Beyond spinning up a ScyllaDB container, our Docker command-line does two things:

  • Exposes port 8080 to the host OS to receive external traffic, which will be required later on to bring historical data and consume events from AWS Lambda
  • Starts ScyllaDB Alternator – our DynamoDB compatible API, which we’ll be using for the rest of the migration

Once your cluster is up and running, the next step is to create your destination table. Since we are using ScyllaDB Alternator, simply run the script to create your destination table:

NOTE: Both source and destination tables share the same Key Attributes. In this guided migration, we won’t be performing any data transformations. If you plan to change your target schema, this would be the time for you to do it. 🙂

Back-fill Historical Data

For back-filling, let’s perform an S3 Data Export. First, enable DynamoDB’s point-in-time recovery:

Next, request a full table export. Before running the below command, ensure the destination S3 bucket exists:

Depending on the size of your actual table, the export may take enough time for you to step away to grab a coffee and some snacks. In our tests, this process took around 10-15 minutes to complete for our sample table.
To check the status of your full table export, replace your table ARN and execute:

Once the process completes, modify the script accordingly (our modified version of the LoadS3toDynamoDB sample code) and execute it to load the historical data:

NOTE: To prevent mistakes and potential overwrites, we recommend you use a different Table name for your destination ScyllaDB cluster. In this example, we are migrating from a DynamoDB Table called source to a destination named dest.

Remember that AWS allows you to request incremental backups after a full export. If you feel like that process took longer than expected, simply repeat it in smaller steps with later incremental backups. This can be yet another strategy to overcome the DynamoDB Streams 24-hour retention limit.

Consuming DynamoDB Changes

With the historical restore completed, let’s get your ScyllaDB cluster in sync with DynamoDB. Up to this point, we haven’t necessarily made any changes to our source DynamoDB table, so our destination ScyllaDB cluster should already be in sync.

Create a Lambda function

Name your Lambda function and select Python 3.12 as its Runtime:

Expand the Advanced settings option, and select the Enable VPC option. This is required, given that our Lambda will directly write data to ScyllaDB Alternator, currently running in an EC2 instance. If you omit this option, your Lambda function may be unable to reach ScyllaDB.

Once that’s done, select the VPC attached to your EC2 instance, and ensure that your Security Group allows Inbound traffic to ScyllaDB. In the screenshot below, we are simply allowing inbound traffic to ScyllaDB Alternator’s port:

Finally, create the function.

NOTE: If you are moving away from the AWS ecosystem, be sure to attach your Lambda function to a VPC with external traffic. Beware of the latency across different regions or when traversing the Internet, as it can greatly delay your migration time.
If you’re migrating to a different protocol (such as CQL), ensure that your Security Group allows routing traffic to ScyllaDB relevant ports. Grant Permissions A Lambda function needs permissions to be useful. We need to be able to consume events from DynamoDB Streams and load them to ScyllaDB. Within your recently created Lambda function, go to Configuration > Permissions. From there, click the IAM role defined for your Lambda: This will take you to the IAM role page of your Lambda. Click Add permissions > Attach policies: Lastly, proceed with attaching the AWSLambdaInvocation-DynamoDB policy. Adjust the Timeout By default, a Lambda function runs for only about 3 seconds before AWS kills the process. Since we expect to process many events, it makes sense to increase the timeout to something more meaningful. Go to Configuration > General Configuration, and Edit to adjust its settings: Increase the timeout to a high enough value (we left it at 15 minutes) that allows you to process a series of events. Ensure that when you hit the timeout limit, DynamoDB Streams won’t consider the Stream to have failed processing, which effectively sends you into an infinite loop. You may also adjust other settings, such as Memory, as relevant (we left it at 1Gi). Deploy After the configuration steps, it is time to finally deploy our logic! The dynamodb-copy folder contains everything needed to help you do that. Start by editing the dynamodb-copy/ file and replace the alternator_endpoint value with the IP address and port relevant to your ScyllaDB deployment. Lastly, run the script and specify the Lambda function to update: NOTE: The Lambda function in question simply issues PutItem calls to ScyllaDB in a serial way, and does nothing else. For a realistic migration scenario, you probably want to handle DeleteItem and UpdateItem API calls, as well as other aspects such as TTL and error handling, depending on your use case. 
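To illustrate what handling deletions and updates could look like, here is a minimal sketch of a handler that replays stream records by event type. It is not the function shipped in the dynamodb-copy folder: the names apply_stream_record and make_handler are hypothetical, and the sketch assumes the stream uses the New Image view type, so every INSERT and MODIFY record carries the full item. The client is expected to behave like a boto3 DynamoDB client, using its put_item and delete_item calls.

```python
def apply_stream_record(client, table_name, record):
    """Replay one DynamoDB Streams record against the destination table."""
    event_name = record["eventName"]
    change = record["dynamodb"]
    if event_name in ("INSERT", "MODIFY"):
        # With the New Image view type the record holds the whole item,
        # so overwriting via PutItem covers both inserts and updates.
        client.put_item(TableName=table_name, Item=change["NewImage"])
    elif event_name == "REMOVE":
        # Deletions only carry the primary key of the removed item.
        client.delete_item(TableName=table_name, Key=change["Keys"])
    else:
        raise ValueError(f"unexpected eventName: {event_name!r}")


def make_handler(client, table_name):
    """Build a Lambda-style handler bound to a client and a table name."""

    def handler(event, context=None):
        records = event.get("Records", [])
        for record in records:
            # Exceptions are not swallowed: a failure surfaces to the
            # caller instead of silently dropping a change.
            apply_stream_record(client, table_name, record)
        return {"applied": len(records)}

    return handler
```

In a real deployment, the client would typically be created with boto3.client("dynamodb", endpoint_url=...) pointing at your Alternator address and port, plus whatever region and credentials your setup expects. Because failures propagate, the timeout and retry behavior described above still applies, so error handling deserves its own design pass.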
Capture DynamoDB Changes

Remember that our application is continuously writing to DynamoDB, and our ultimate goal is to ensure that all records ultimately exist within ScyllaDB, without incurring any data loss.

At this step, we’ll simply enable DynamoDB Streams to capture change events as they go. To accomplish that, simply turn on DynamoDB streams to capture Item level changes:

In View Type, specify that you want a New Image capture, and proceed with enabling the feature:

Create a Trigger

At this point, your Lambda is ready to start processing events from DynamoDB. Within the DynamoDB Exports and streams configuration, let’s create a Trigger to invoke our Lambda function every time an item gets changed:

Next, choose the previously created Lambda function, and adjust the Batch size as needed (we used 1000):

Once you create the trigger, data should start flowing from DynamoDB Streams to ScyllaDB Alternator!

Generate Events

To simulate an application frequently updating records, let’s simply re-execute the initial program:

It will insert another 50K records to DynamoDB, whose Attributes and values will be very different from the existing ones in ScyllaDB:

The program found out the source table already exists and has simply overwritten all its existing records. The new records were then captured by DynamoDB Streams, which should now trigger our previously created Lambda function, which will stream its records to ScyllaDB.

Comparing Results

It may take some minutes for your Lambda to catch up and ingest all events to ScyllaDB (did we say coffee?). Ultimately, both databases should get in sync after a few minutes.

Our final step is simply to compare both database records. Here, you can either compare everything, or just a few selected ones.

To assist you with that, here’s our final program! Simply invoke it, and it will compare the first 10K records across both databases and report any mismatches it finds:

Congratulations! You moved away from DynamoDB! 🙂

Final Remarks

In this article, we explored one of the many ways to migrate a DynamoDB workload to ScyllaDB. Your mileage may vary, but the general migration flow should be ultimately similar to what we’ve covered here.

If you are interested in how organizations such as Digital Turbine or Zee migrated from DynamoDB to ScyllaDB, you may want to see their recent ScyllaDB Summit talks. Or perhaps you would like to learn more about different DynamoDB migration approaches? In that case, watch my talk in our NoSQL Data Migration Masterclass.

If you want to get your specific questions answered directly, talk to us!

Inside Natura &Co Global Commercial Platform with ScyllaDB

Filipe Lima and Fabricio Rucci share the central role that ScyllaDB plays in their Global Commercial Platform

Natura, a multi-brand global cosmetics group including Natura and Avon, spans across 70 countries and an ever-growing human network of over 7 million sales consultants. Ranked as one of the world’s strongest cosmetics brands, Natura’s operations require processing an intensive amount of consumer data to drive campaigns, run predictive analytics, and support its commercial operations.

In this interview, Filipe Lima (Architecture Manager) and Fabricio Pinho Rucci (Data and Solution Architect) share their insights on where ScyllaDB fits inside Natura’s platform, how the database powers its growing operations, and why they chose ScyllaDB to drive their innovation at scale.

“Natura and ScyllaDB have a longstanding partnership. We consider them as an integral part of our team”, said Filipe, alluding to their previous talk covering their migration from Cassandra to ScyllaDB back in 2018. Natura operations have dramatically scaled since then and ScyllaDB scaled alongside them, supporting their growth.

Here are some key moments from the interview…

Who is Natura?

Filipe: Wherever you go inside ANY Brazilian house, you will find Natura cosmetics there. This alone should give you an idea of the magnitude and reach we have. We are really proud to be present in every Brazilian house! To put that into perspective, we are currently the seventh largest country in population, with over 200 million people.

Natura &Co is today one of the largest cosmetics companies in the world. We are made up of two iconic beauty brands:

  • Avon, which should be well-known globally
  • Natura, which has a very strong presence in LATAM

Today Natura’s operations span over a hundred countries, where most of our IT infrastructure is entirely cloud native. Operating at such a large scale does come with its own set of challenges, given our multi-channel presence – such as e-commerce, retail and mainly direct sales, plus managing over seven million sales consultants and brand representatives.

Natura strongly believes in challenging the status quo in order to promote a real and positive social economic impact. Our main values are Cooperation, Co-creation, and Collaboration. Hence why we are Natura &Co.

Given the scale of our operations, it becomes evident that we have several hundreds of different applications and integrations to manage. That brings complexity, and challenges of running our 24/7 mission-critical operations.

What is the Global Commercial Platform? Why do you need it?

Filipe: Before we discuss the Global Commercial Platform (GCP), let me provide you with context on why we needed it.

We started this journey around 5 years ago, when we decided to create a single platform to manage all of our direct selling. At the time, we were facing scaling challenges. We lacked a centralized system interface for keeping and managing our business and data rules. And we relied on a loosely coupled infrastructure that had multiple points of failure. All of this could affect our sales process as we grew.

The main reason we decided to build our own platform, rather than purchase an existing one from within the market, is because Natura’s business model is very specific and unique. On top of it, given our large product portfolio, integrating and re-architecting all of our existing applications to work and complement a third-party solution could become a very time-consuming process.

At the time, we called that program LEGO, with 5 main components to manage our sales force, in addition to e-commerce that serves the end consumer. The LEGO program is defined as five components, each one covering specific parts of the direct selling process – including structuring, channel control, performance and payment of the sales force.

Our five components are as follows:

  • People/Registry (GPP)
  • Relationships (GRP)
  • Order Capture (GSP)
  • Direct Sales Management (GCP)
  • Data Operations (GDP)

The platform responsible for generating data and sales indicators for the other platforms is GCP (Global Commercial Platform). GCP manages, integrates, and handles all rules related to Natura’s commercial models and their relationships, processes KPIs, and processes all intrinsic aspects related to direct selling, such as profits and commission bonuses.

Why and where does ScyllaDB fit?

Fabricio: We have been proud and happy users of ScyllaDB for many years now. Our journey with ScyllaDB started back in 2018. Back then, our old systems were very hard to scale, to the point where they became an impediment to managing our own operations and keeping up with our ongoing innovation.

In 2018 we started this journey of migrating from our previous solution to ScyllaDB. Past that, we shifted to AWS and since then we have been expanding the reach of our platform to other business areas. For example, last year we started using ScyllaDB CDC, and currently we are studying how to implement multi-region deployments for some of our applications.

The main reason why we decided to shift to ScyllaDB was because of its impressive scaling power. Our indicator processing requires real-time execution, with the lowest latency possible. We receive several events per second, and the inability to process them in a timely manner would result in a backlog of requests, ruining our users’ experience.

The fact that ScyllaDB scales linearly, both up and out, was also a key decision factor. We started small and later migrated more workloads to it gradually. Whenever we required more capacity, we simply added more nodes, in a planned and conscious way. “Bill shock” was never a problem for us with ScyllaDB.

Our applications are Global (hence the platform’s acronym), and currently span several countries. Therefore, we could no longer work with maintenance windows incurring downtime. We needed a solution that would be always on and process our workloads in an uninterruptible way. ScyllaDB’s active-active architecture perfectly fits what we were looking for. We plan to cover the Northern Virginia and São Paulo regions on AWS in the near future with a multi-datacenter cluster, and so we can easily ensure strong consistency for our users thanks to ScyllaDB’s tunable consistency.

What else can you tell us about your KPIs and their numbers?

Filipe: One aspect to understand before we talk about the numbers ScyllaDB delivers to us is how our business model works. In a nutshell, Natura is made by people, and for people. We have Beauty Consultants all around the world bringing our products to the consumer market. The reason why the Natura brand is so strong (especially within Brazil), is primarily because we have a culture of dealing with people before we make important decisions, such as buying a car or a house.

What typically happens is this: You have a friend, who is one of our Beauty Consultants. This friend of yours offers you her products. Since you trust your friend and you like the products, you eventually end up trying them out. In the end, you realize that you fell in love with them, and decide to always check in with your friend as time goes by. Ultimately, you also refer this friend of yours to other friends, as people ask which lotion or perfume you’re using, and that’s how it goes.

Now imagine that same situation I described on a much larger scale. Remember: We have over 7 million consultants in our network of people. Therefore, we need to provide these consultants with incentives and campaigns for them to keep on doing the great job they are doing today. This involves, for example, checking whether they are active, if they recovered after a bad period, or whether they simply ceased engaging with us. If the consultant is a new recruit, it is important that we know this as well because every one of them is treated differently in a personalized way. That way, by treating our consultants with respect and appreciation, we leverage our platform to help us and them make the best decisions.

Today ScyllaDB powers over 73K indicators, involving data of over 4 million consultants within 6 countries of Latin America. This includes over USD 120M just in orders and transactions. All of this is achieved on top of a ScyllaDB cluster delivering an average throughput of 120K operations per second, with single-digit millisecond latencies of 6 milliseconds for both reads and writes.

How complex is Natura’s Commercial Platform architecture today?

Fabricio: Very complex, as you can imagine for a business of that size!

GCP is primarily deployed within AWS (heh!). We have several input sources coming from our data producers. These involve Sales, our Commercial Structures, Consultants, Orders, Final Customers, etc. Once these producers send us requests, their submissions enter our data pipelines.

This information arrives in queues (MSK) and is consumed using Spark (EMR), some of it as streaming and some as batch. The data is transformed according to our business logic and eventually reaches our database layer, which is where ScyllaDB is located.

We of course have other databases in our stack, but for APIs and applications requiring real-time performance and low latency, we end up choosing ScyllaDB as the main datastore.

For querying ScyllaDB we developed a centralized layer for our microservices using AWS Lambda and API Gateway. This layer consults ScyllaDB and then provides the requested information to all consumers that require it.

As for more details about our ScyllaDB deployment, we currently have 12 nodes running on top of AWS i3.4xlarge EC2 instances. Out of the 120K operations I previously mentioned, 35K are writes, with an average latency of 3 milliseconds. The rest are reads, with an average latency of 6 milliseconds.
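As a quick back-of-the-envelope check of the figures quoted above, the short calculation below derives the read volume, a throughput-weighted average latency, and the average load per node. It assumes the quoted averages all describe the same time window and that load is spread evenly across the 12 nodes, neither of which the interview states explicitly.

```python
# Figures quoted in the interview.
TOTAL_OPS_PER_SEC = 120_000
WRITE_OPS_PER_SEC = 35_000
WRITE_LATENCY_MS = 3
READ_LATENCY_MS = 6
NODES = 12

# "The rest are reads."
read_ops_per_sec = TOTAL_OPS_PER_SEC - WRITE_OPS_PER_SEC

# Average latency weighted by how many operations of each kind run.
blended_latency_ms = (
    WRITE_OPS_PER_SEC * WRITE_LATENCY_MS + read_ops_per_sec * READ_LATENCY_MS
) / TOTAL_OPS_PER_SEC

# Assumption: operations are spread evenly across all nodes.
ops_per_node = TOTAL_OPS_PER_SEC / NODES

print(f"reads per second: {read_ops_per_sec}")
print(f"blended average latency: {blended_latency_ms} ms")
print(f"operations per second per node: {ops_per_node}")
```

Under those assumptions, reads account for 85K operations per second, the blended average latency comes out at 5.125 ms, and each node serves about 10K operations per second.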

Enhanced Cluster Scaling for Apache Cassandra®

The Instaclustr Managed Platform now supports the zero-downtime scaling-up of local storage (also known as instance store or ephemeral storage) nodes running Apache Cassandra on AWS, Azure and GCP!

Enhanced Vertical Scaling 

NetApp has released an extension to the Cluster Scaling feature on the Instaclustr Managed Platform, which now supports scaling up the local storage of the Cassandra nodes. This enhancement builds upon the existing network block storage scaling option, offering greater flexibility and control over cluster configurations.

Customers can now easily scale up their Cassandra clusters on demand to respond to growing data needs. This development not only provides unparalleled flexibility and performance, but also distinguishes Instaclustr from competitors who lack support for local storage nodes. 

Local Storage 

Local storage refers to the physical storage directly attached to the node or instance. Unlike network-backed storage (such as EBS), local storage eliminates the need for data to travel over the network, leading to lower latency, higher throughput, and improved performance in data-intensive applications.

Moreover, the cost of local storage is included in the instance price. When combined with Reserved Instances and similar pricing concepts, this can make local storage the most cost-effective infrastructure choice for many use cases.

Whether you need to store large volumes of data or run complex computational tasks, the ability to scale up local storage nodes gives you the flexibility to manage node sizes based on your requirements. Scaling local storage nodes with minimal disruption is complex. Instaclustr leverages advanced internal tools for on-demand vertical scaling. Our updated replace tool with “copy data” mode streamlines the process without compromising data integrity or cluster health. 

Additionally, this enhancement gives our customers the capability to switch between local storage (ephemeral) and network-based storage (persistent) while scaling up their clusters. Customer storage needs vary over time, ranging from the high I/O performance of local storage to the cost-effectiveness and durability of network-based storage. Instaclustr provides our customers with a variety of storage options to scale up their workloads based on performance and cost requirements.

Not sure where to start with your cluster scaling needs? Read about the best way to add capacity to your cluster here. 


With this enhancement to the cluster scaling feature, we are expanding Instaclustr’s self-service capabilities, empowering customers with greater control and flexibility over their infrastructure. Scaling up Cassandra clusters has become more intuitive and is just a few clicks away.

This move towards greater autonomy is supported by production SLAs, ensuring scaling operations are completed without data loss or downtime. Cassandra nodes can be scaled up using the cluster scaling functionality available through the Instaclustr console, API, or Terraform provider. Visit our documentation for guidance on seamlessly scaling your storage for the Cassandra cluster. 

While this enhancement allows the scaling up of local storage, downscaling operations are not yet supported via self-service. Should you need to scale down the storage capacity of your cluster, our dedicated Support team is ready to assist.

Scaling up improves performance and operational flexibility, but it will also result in increased costs, so it is important to consider the cost implications. You can review the pricing of the available nodes with the desired storage on the resize page of the console. Upon selection, a summary of current and new costs will be provided for easy comparison. 

Leverage the latest enhancements to our Cluster Scaling feature via the console, API or Terraform provider, and if you have any questions about this feature, please contact Instaclustr Support at any time. 

Unlock the flexibility and ease of scaling Cassandra clusters at the click of a button and sign in now! 

The post Enhanced Cluster Scaling for Apache Cassandra® appeared first on Instaclustr.