Rust and ScyllaDB: 3 Ways to Improve Performance

We’ve been working hard to develop and improve the scylla-rust-driver. It’s an open-source ScyllaDB (and Apache Cassandra) driver for Rust, written in pure Rust with a fully async API using Tokio. You can read more about its benchmark results and how our developers solved a performance regression.

In different benchmarks, the Rust driver proved more performant than other drivers, which gave us the idea of using it as a unified core for other drivers as well.

This blog post is based on the ScyllaDB University Rust lesson. In this post, I’ll cover the essentials of the lesson. You’ll learn about Prepared Statements, Paging, and Retries and see an example using the ScyllaDB Rust driver. The ultimate goal is to demonstrate how some minor changes can significantly improve the application’s performance.

To continue learning about ScyllaDB and Rust with additional exercises and hands-on examples, log in or register for ScyllaDB University (it’s free). You’ll be on the path to new certification and also gain unlimited access to all of our NoSQL database courses.

Starting ScyllaDB in Docker

Clone the example repository from GitHub:


git clone https://github.com/scylladb/scylla-code-samples.git

cd scylla-code-samples/Rust_Scylla_Driver/chat/

To quickly get ScyllaDB up and running, use the official Docker image:


docker run \
  -p 9042:9042/tcp \
  --name some-scylla \
  --hostname rust-scylla \
  -d scylladb/scylla:4.5.0 \
  --smp 1 --memory=750M --overprovisioned 1

Example Application

In this example, you’ll create a console application that reads messages from standard input and puts them into a table in ScyllaDB.

First, create the keyspace and the table:


docker exec -it some-scylla cqlsh

CREATE KEYSPACE IF NOT EXISTS log WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

CREATE TABLE IF NOT EXISTS log.messages (
  id bigint,
  message text,
  PRIMARY KEY (id)
);

Now, look at the main code of the application:
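
The full code lives in the repository you just cloned; what follows is only a rough sketch of its shape (the node address, the id counter, and the tokio-based stdin handling are assumptions of this sketch rather than the exact code):

use scylla::{Session, SessionBuilder};
use std::error::Error;
use tokio::io::{AsyncBufReadExt, BufReader};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Connect to the ScyllaDB node started in Docker above.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Read lines from standard input and insert each one into log.messages.
    let mut lines = BufReader::new(tokio::io::stdin()).lines();
    let mut id: i64 = 0;
    while let Some(message) = lines.next_line().await? {
        session
            .query(
                "INSERT INTO log.messages (id, message) VALUES (?, ?)",
                (id, message),
            )
            .await?;
        id += 1;
    }

    // Finally, read the stored messages back and print them
    // (the read-back is shown in the Paging section below).
    Ok(())
}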

The application connects to the database, reads some lines from the console, and stores them in the table log.messages. It then reads those lines from the table and prints them.
So far, this is quite similar to what you saw in the Getting Started with Rust lesson. Using this application, you’ll see how some minor changes can improve the application’s performance.

Prepared Statements

In every iteration of the while loop, we want to insert new data into the log.messages table. Doing so naively is inefficient, as every call to session.query sends the entire query string to the database, which then has to parse it. To avoid this unnecessary database-side work, you can prepare a query in advance using the session.prepare method. A call to this method returns a PreparedStatement object, which can later be passed to session.execute() to run the desired query.

What Exactly are Prepared Statements?

A prepared statement is a query parsed by ScyllaDB and then saved for later use. One of the valuable benefits of using prepared statements is that you can continue to reuse the same query while modifying variables in the query to match parameters such as names, addresses, and locations.

When asked to prepare a CQL statement, the client library will send a CQL statement to ScyllaDB. ScyllaDB will then create a unique fingerprint for that CQL statement by MD5 hashing it. ScyllaDB then uses this hash to check its query cache and see if it has already seen it. If so, it will return a reference to that cached CQL statement. If ScyllaDB does not have that unique query hash in its cache, it will then proceed to parse the query and insert the parsed output into its cache.

The client will then be able to send and execute a request specifying the statement id (which is encapsulated in the PreparedStatement object) and providing the (bound) variables, as you will see next.

Using Prepared Statements In the Application

Go over the sample code above and modify it to use prepared statements.
The first step is to create a prepared statement (with the help of session.prepare) before the while loop. Next, you need to replace session.query with session.execute inside the while loop.
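
A minimal sketch of that change, assuming the same while loop as in the first sketch above:

use scylla::prepared_statement::PreparedStatement;

// Prepare the insert once, before the loop; ScyllaDB parses it a single time.
let insert_message: PreparedStatement = session
    .prepare("INSERT INTO log.messages (id, message) VALUES (?, ?)")
    .await?;

let mut id: i64 = 0;
while let Some(message) = lines.next_line().await? {
    // Execute the already-prepared statement, sending only its id and the bound values.
    session.execute(&insert_message, (id, message)).await?;
    id += 1;
}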

After these two steps, the app will reuse the prepared statement insert_message instead of sending raw queries. This significantly improves performance.

Paging

Look at the last lines of the application:
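
In the sample app, those last lines look roughly like the following sketch (the tuple type matches the id bigint, message text schema above):

use scylla::IntoTypedRows;

// Fetch the whole result set in one go with an unprepared query.
if let Some(rows) = session
    .query("SELECT id, message FROM log.messages", &[])
    .await?
    .rows
{
    for row in rows.into_typed::<(i64, String)>() {
        let (id, message) = row?;
        println!("{}: {}", id, message);
    }
}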

There is a call to the Session::query method, and an unprepared select query is sent. Since this query is only executed once, it isn’t worth preparing. However, if we suspect that the result will be large, it might be better to use paging.

What is Paging?

Paging is a way to return a lot of data in manageable chunks.

Without paging, the coordinator node prepares a single result entity that holds all the data and returns it. In the case of a large result, this may have a significant performance impact as it might use up a lot of memory, both on the client and on the ScyllaDB server side.

To avoid this, use paging, so the results are transmitted in chunks of limited size, one chunk at a time. After transmitting each chunk, the database stops and waits for the client to request the next one. This is repeated until the entire result set is transmitted.

The client can limit the size of a page by the number of rows it may contain. If a page fills up, reaching its size limit before the client-provided row limit, it’s called a short page or short read.
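
For example, with the Rust driver the page size can be set on a query before running it (a sketch; the 1024-row limit is an arbitrary choice, not a recommendation):

use scylla::query::Query;

let mut select_query = Query::new("SELECT id, message FROM log.messages".to_string());
// Ask the server to return at most 1024 rows per page.
select_query.set_page_size(1024);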

Adding Paging to Our App

As you may have guessed by now, Session::query does not use paging. It fetches the whole result into memory in one go. An alternative Session method uses paging under the hood – Session::query_iter (and Session::execute_iter does the same for prepared statements). The Session::query_iter method takes a query and a value list as arguments and returns an async iterator (stream) over the result Rows. This is how it is used:
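
Here is a sketch of that usage for the sample app’s select (the futures crate’s StreamExt trait is needed to consume the stream):

use futures::stream::StreamExt;

// The driver fetches pages in the background while we consume rows here.
let mut rows_stream = session
    .query_iter("SELECT id, message FROM log.messages", &[])
    .await?
    .into_typed::<(i64, String)>();

while let Some(next_row) = rows_stream.next().await {
    let (id, message) = next_row?;
    println!("{}: {}", id, message);
}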

After the query_iter invocation, the driver starts a background task that fetches subsequent rows. The caller task (the one that invoked query_iter) consumes newly fetched rows by using an iterator-like stream interface. The caller and the background task run concurrently, so one of them can fetch new rows while the other consumes them.

By adding paging to the app, you reduce memory usage and increase the application’s performance.

Retries

After a query fails, the driver might decide to retry it based on the retry policy and on the query itself. The retry policy can be configured for the whole Session or just for a single query.

Available Retry Policies

The driver offers two built-in policies to choose from: FallthroughRetryPolicy, which never retries a failed query, and DefaultRetryPolicy, which retries a query when there is a good chance the retry will succeed.

It is also possible to provide a custom retry policy by implementing the RetryPolicy and RetrySession traits.

Using Retry Policies

The key to enjoying the benefits of retry policies is to provide more information about query idempotency. A query is idempotent if it can be applied multiple times without changing the result of the initial application. The driver will not retry a failed query if it is not idempotent. Marking queries as idempotent is expected to be done by the user, as the driver does not parse query strings.

Mark the app’s select statement as an idempotent one:
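
A minimal sketch of that change, using Query::set_is_idempotent on the select from earlier:

use scylla::query::Query;

// Build the query explicitly so it can be marked as idempotent;
// the retry policy may then safely retry it after a failure.
let mut select_query = Query::new("SELECT id, message FROM log.messages".to_string());
select_query.set_is_idempotent(true);

session.query(select_query, &[]).await?;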

By making this change, you will be able to use retries (provided by the default retry policy) in case of a select statement execution error.

Additional Resources

  • To run the application and see the results, check out the complete lesson on ScyllaDB University.
  • The Rust Driver docs page contains a quick start guide for people who want to use our driver.
  • The P99 CONF conference is happening soon – it’s a great (free and virtual) opportunity to learn about high-performance applications. Register now to save your spot.
  • Check out some other NoSQL courses on ScyllaDB University

Apache Cassandra® 4.0.4 and 3.11.13

Instaclustr is pleased to announce the general availability of Apache Cassandra® version 4.0.4 and 3.11.13 on the Instaclustr platform.

New Apache Cassandra® versions for the 4.0 and 3.0 major release lines are now generally available on Instaclustr’s managed platform. For the 4.0 major version, Java 11 has shifted from experimental to fully supported, with new features introduced in 4.0.1. New and existing customers are recommended to use Apache Cassandra 4.0.4 for their clusters on Instaclustr’s managed platform.

Below are the fixes included in the Apache Cassandra 4.0 and 3.0 major versions. Please contact our Support team for assistance where the following items require changes to the configuration of your Cassandra fleet.

  • Correction to unit values for setting the stream throughput – for customers, this will affect setters and getters in the JMX MBean and the nodetool commands (for details see CASSANDRA-17243). Instaclustr Cassandra customers may see a lower stream throughput than what was applied in previous versions as a result of the unit value correction.
  • Internode messaging improvement – inbound and outbound byte shuffles for internode messaging were renamed (for details see CASSANDRA-15066). For Instaclustr customers on the Cassandra 3.0 major version, the improvements are backward compatible and support upgrades from the Cassandra 3.0 to 4.0 major versions.
  • Update to the jcrypt library from 0.3m to 0.4 causes a ‘fail to start’ error – customers seeking guidance on upgrading from the 3.0 to 4.0 major versions should contact our Support team (for details see CASSANDRA-9384).
  • Packet coalescing is deprecated – The Apache Cassandra project has deprecated packet coalescing and our Cassandra customers can expect a lower throughput as a result. Note that the Apache Cassandra project has advised that packet coalescing will be removed (at the earliest) with the next major release.

For new customers, we recommend using version 4.0.4 for your Cassandra clusters on Instaclustr’s managed platform to gain the full benefit of Cassandra. For existing customers who are currently running an older version and wish to upgrade, please contact our Support team for assistance.


Overheard at Pulsar Summit 2022

Apache Pulsar is an open source event streaming platform that has continually risen in popularity since its first public release in 2016. It graduated from incubator status to a top-level Apache Foundation project in just two years. Since then, the Pulsar community has continued to grow by leaps and bounds, now boasting over 500 contributors.

The scale, timeline, and growth of Apache Pulsar mirror the rise of ScyllaDB in many ways. Both systems cater to highly scalable, low-latency, and high-performance use cases. And it is no surprise there is increasing interest in our respective user bases to see how a big, fast NoSQL database like ScyllaDB can be used by a big, fast event streaming service like Apache Pulsar. In fact, thousands of people signed up to learn exactly how to do just that when StreamNative and ScyllaDB partnered to host a Distributed Data Systems Masterclass in June.

I was honored and pleased to be given an opportunity to speak at the recent Pulsar Summit in San Francisco. I’ll share my own talk another day. For now, I want to give you a glimpse of the many other speakers who presented, and some insight into the momentum this event reflected in the broad adoption of Apache Pulsar and its integration with other open source communities.

Top 5 Apache Software Foundation Project

The keynote of StreamNative CEO Sijie Guo was informative as to just how popular and influential the Apache Pulsar project has become. He pointed out that Pulsar was one of the Top Five Apache Software Foundation (ASF) projects in terms of commits. It continues to broaden in all dimensions of a healthy and booming open source community, from committer base (now over 500 developers, who have made over 10,000 commits), to Slack community size (over 7,000 members), to the thousand-plus organizations who are running Pulsar in production.

He then turned the stage over to StreamNative CTO Matteo Merli, who introduced (pf)SQL. This capability, built on top of Pulsar Functions, will allow filtering, transformation and routing of data, and also allow transformations for Pulsar IO connectors. It won’t replace a dedicated system that provides such a service — for example, Apache Flink — but it will allow users to write more complex components that reside natively within Pulsar.

Apache Beam

Matteo was followed by Google’s Bryon Ellis, who is working on the Apache Beam project. Beam, another ASF top-level project, is a unified system to do both batch and stream data processing pipelines and transformations. Neither batch nor streaming is an afterthought (as Bryon’s slides noted, “Batch is batch, and streaming is streaming”).

What makes Beam somewhat unique is that the I/O connectors are directly part of the project. Given the growing popularity of Pulsar, the Beam team decided to add it to their growing list of supported systems. The great news is that as of June 2022 the first PulsarIO pull request from StreamNative was completed, and you can try it out now on your own machine using the Beam DirectRunner, or by running Beam pipelines in Spark, Flink, or Google Cloud Dataflow. There is still a lot more work to do to cover all of Pulsar’s features, though, so interested community members can follow progress (or offer to contribute) on the related JIRA page here.

Apache Pinot

Next up on the main stage was Xiang Fu, co-founder of StarTree, the team behind the Apache Pinot project. Pinot was designed to do Online Analytical Processing (OLAP) blazingly fast. Designed for near real time ingestion and low latency, Pinot makes for a perfect pairing with Apache Pulsar.

Xiang observed these new data architectures and analytics capabilities allow organizations to fundamentally change how they use and expose data. Whereas in the past analytics was seen as the proprietary purview of the organization itself, increasingly there is a movement to expose data analytics to external partners and customers. This data democratization has to take into account the time-value of data, and the window of opportunity for these events. We’ve moved from a world of days or hours of computation to produce results to timeframes measured in seconds or even milliseconds.

The challenge is the cycle we’ve created. As we improve user experience and increase engagement, we also require collecting and managing more and more events. We’ve moved from basic publish-and-subscribe (pub-sub), to log aggregation, to stream processing (such as Flink or Beam) to real time analytics (such as Pinot). As our data ingestion happens at ever increasing scales, to do so with low latency millions of times per second becomes hard.

Xiang brought up three examples of Pinot being used at scale. The first was LinkedIn (birthplace of the Pinot project), where they ingest a million events per second, run over 200,000 queries against their dataset per second, and query latency is measured in milliseconds. The second was at Uber, with over 200 terabytes of data, 30,000 queries per second, and latencies faster than 100 milliseconds. The third was at Stripe, where the massive trillion-row dataset is over a petabyte in total size. Even then they get subsecond latency for their queries.

Mercado Libre

Ignacio Alvarez of Mercado Libre then spoke about their use of Pulsar. Mercado Libre is an online shopping platform popular across Latin America, with a massive user base. Some estimates put it as high as 140 million active users per month. Supporting a real-time shopping experience for a user base that large translates to 200 million requests per minute (3.3 million events per second). Mercado Libre maintains thirty active Pulsar clusters on AWS Graviton-based servers, allowing them to run cost-effectively. Definitely a use case to pay attention to.

Databricks Data Lakehouse

At this point the event broke into two tracks. I participated in the Ecosystem track, emcee’d by StreamNative’s Tim Spann, and the following reflects what I was able to see there. The first talk in the track was delivered by Databricks’ Nick Karpov, who described the movement from the ETL-oriented data warehouses of the 1980s and the data lakes of the 2010s to the streaming-oriented architecture of the “data lakehouse.” The lakehouse architecture also provides new query engines, plus metadata and governance layers on top of the raw data, making data easier to manage and faster to access effectively.

I found the discussion about Z-order multidimensional clustering for data-skipping particularly interesting. It’s a method to improve on sorting, minimizing the number of files needed to scan and the number of false positive results.

Then again, if you are more of a fan of Change Data Capture (CDC), this is called Change Data Feed in the Databricks Delta table context, and Nick covered that too in his talk, noting that it can either be read as a batch or as an event feed.

Specifically for Apache Pulsar, there are Lakehouse connectors available, allowing you to integrate Lakehouse as both source and sink. It is under active development, providing opportunities for contribution. For example CDC files are not yet implemented, and the Delta-Pulsar-Delta round trip only supports append-only workflows. So stay tuned for updates!

Apache Hudi

Next up was Alexey Kudinkin of Onehouse, the lakehouse-as-a-service provider built on Apache Hudi. While conceptually there are a lot of components you can connect to Hudi, at its core it provides transactions, upserts, and deletes on datalake storage and enables CDC when used with Pulsar.

One particular facet that Alexey drilled down on was comparing copy-on-write (COW) versus merge-on-read (MOR). In COW data is stored in Parquet files in a columnar format; the file is copied and updates are merged into the file on writes. MOR is a bit more complicated, because data is stored in both columnar-oriented Parquet files and row-based Avro formats. In this latter case, data is logged to delta files and later compacted. You can discover more in the Hudi documentation here. Which system you use depends on the various tradeoffs to optimize for in terms of latency, update cost, file size, and write amplification.

Addison Higham, StreamNative’s Chief Architect and Head of Cloud Engineering, joined Alexey onstage to talk about the use cases and best practices in joining Pulsar and Hudi together.

There are currently a few ways to integrate Pulsar and Hudi:

  • Pulsar’s Apache Spark connector
  • Hudi’s DeltaStreamer utility
  • StreamNative’s Lakehouse Sink (currently in Beta)

One of the critical determinations users need to make when combining Hudi and Pulsar is based on the domain of data over time. Are these results you need processed in milliseconds, or over months? The atomicity of the data you are dealing with, or its complexity or sheer scale, will determine whether you should lean towards the event-driven engines of Pulsar or the massive analytics capabilities of Hudi. And there are in-between places, where you may be pre- or post-processing data, such as in Flink, Pinot, Hive, or Spark.

Apache Flink

Speaking of Apache Flink, Caito Scherr of Ververica came to provide insights into using this popular stream processing platform. There are a lot of challenges to master. You can’t pause a stream to fix it. You’re dealing with a lot of data coming at you fast, and you have to ingest it in multiple formats. There are failures to recover from, and there’s the inexorable need to scale.

Hence Flink. Caito noted the name itself means “nimble” in German. It was designed to make stream processing easier by solving a lot of the problems in streaming. Yet there are still conceptual barriers you have to wrap your head around to really “get” Flink, such as running SQL queries on event streams: you aren’t returned a discrete result set; your query is continuous.

If you are interested in getting involved with Flink you can check out their community page, and join their Slack community.

Kafka-on-Pulsar

AWS’ Ricardo Ferreira presented one of those crazy-ideas-that-just-might-work: Kafka-on-Pulsar (KoP). What if you were a Kafka developer and you wanted to write a “Kafka” service that actually had Apache Pulsar at its heart?

You would write your microservice as you usually might in Spring Boot. Or consume CDC data from MySQL using a Debezium connector. Or maybe you want to do stream processing using ksqlDB to transform events – flattening out a nested structure and moving them from JSON to protobuf. You think it’s all Kafka. But meanwhile, hidden away, the backbone is Pulsar.

Ricardo walked us through the method to “wrap” Pulsar in such a way that it looked, smelled, and acted like Kafka to external systems. Not only was his talk entertaining and illuminating, but the extensive GitHub repository – both code and documentation – gives users great hands-on examples they can use in their own environments.

Pulsar Functions and Function Meshes

The talk just before mine was hosted by Neng Lu, StreamNative’s Platform Engineering Lead, Compute. He recapped the serverless function mesh architecture. It consists of a set of Custom Resource Definitions (CRDs) for defining Pulsar Functions and sink or source connectors, plus an operator that constantly reconciles the submitted CRs – creating them, updating them according to user changes, or auto-scaling them if configured. There are a lot of use cases for Pulsar Functions and function meshes, such as filtering/routing, or adding transformations for connectors.

To facilitate the development cycle, Neng described the SQL Abstraction they devised. This abstraction is a simplified way to develop Pulsar Functions pipelines; it is not an interactive tool for running ad-hoc queries. The SQL Abstraction consists of a gateway, a runner, and a CLI component; the gateway comprises a parser-to-runner component and a REST API server for the CLI.

More to Come!

There was a lot more to each of these talks than I could cover here, and there were more talks than I could personally attend. I’m looking forward to seeing the full videos posted online so I can catch up on all the sessions I missed, and see if I missed anything in the sessions I did attend.

Also, I’ll come back at another time to blogify the session I presented myself, which was on Distributed Database Design Decisions to Support High Performance Event Streaming. For now, I just wish to congratulate all the Pulsar Summit 2022 presenters, and the event staff and the organizers at StreamNative who put on such a great event.

And for anyone looking for other cool events to attend, let me remind you that ScyllaDB is sponsoring the upcoming P99 CONF, a free online virtual event October 19th and 20th, all about high performance distributed systems. We’ll have talks on observability and performance tuning, event streaming and distributed databases, and advanced programming and Linux operating system methods. Much like Pulsar Summit, it’s an event you won’t want to miss. Register today!

REGISTER NOW FOR P99 CONF

What Technical Problems are Teams Solving with ScyllaDB NoSQL?

If you’re in the process of selecting a distributed database, you’re probably quite familiar with peer review sites like G2. These are independent forums where users share product feedback free from vendor influence.

But even if you’re not currently researching distributed databases, these sites have some interesting tidbits. For example, G2 prompts reviewers to share what technical problems they are solving with the product and how that’s benefiting them.

Here’s a quick look at some of the recent responses to that question for ScyllaDB. If you want to see more perspectives or take a deeper dive into the complete reviews, visit the G2 ScyllaDB page. Also – if you’re experienced using any flavor of ScyllaDB (open source, cloud, or enterprise), please consider sharing your own insights with the community. Your input not only helps your peers; it also influences our product strategy.

 

What Problems is ScyllaDB Solving and How is That Benefiting You?

Please note we are citing these G2 reviews verbatim; they were not edited for grammar.

“We are solving a significant issue of tech debt with ScyllaDB. We’ve migrated our event-driven architecture where we collect our user’s health stats and data through our platform that was written initially to a MongoDB store that was becoming an unwieldy monolith, to Scylla, which sped up our response times to customers when they wanted that data and also enabled us to store even more information robustly. We’ve unlocked ourselves from this monolithic architecture with Mongo to a highly extendable, modular platform with ScyllaDB providing the performance we need at the scale we want at the heart of it all!”

— Anonymous, Health, Wellness and Fitness

“We previously used Cassandra to back a natural language processing system, storing news articles and metadata for an IR frontend. We evaluated migrating to ScyllaDB and found it an absolute joy to work with. Creating the same schema was easy, and loading the snapshot from Cassandra to ScyllaDB made it really easy to move the data over without having to write one-off custom scripts. We wanted to obtain better read performance as well as reliability, and wanted to future-proof our solution. I think ScyllaDB has enabled us to be more productive in this way.”

— Anonymous, Newspaper Industry

“We have many devices streaming data to the cloud. On the flip side, we have many users querying device data to help run their businesses. With managed ScyllaDB, we focus on delivering value add apps and services for our customers. We know that writes will be consistent and queries fast. ScyllaDB has reliably managed infrastructure – securing and optimizing the cluster, and keeping us up to date.”

— Kevin J, Small Business

 

“I needed ScyllaDB for implementing a Video and Audio engineering codec repository so I could catalog videos and audios for my company, The main benefit is not being stuck to domain modelling like in a SQL database, which is mandatory when you are dealing with video and audio codec metadata.”

— Gustavo H. M. Silva, System Integration Engineer

 

“We are building a next-gen Customer Data Platform using ScyllaDB. Our use-cases vary widely from OLTP to OLAP to real-time data ingestions which we were able to satisfy using ScyllaDB.”

— Shubham P, Lead Software Engineer

 

“Storing time-series-like immutable events to be queried later on is incredible. Time Window Compaction Strategy works nicely for this workload. This enabled us to increase the TTL for our data, allowing for a more complete analysis when querying data while keeping low costs.”

— Rafael A, Software Engineer

 

“Heavy throughput on a 24/7 ecommerce application and ability to be a replacement for current in-production MongoDB. Reliability has been phenomenal as well, but is comparable on both.”

— Internal Consultant, Insurance Industry

 

“Working on projects to separate functions of a cloud ERP application that demand quick query and response. As a DBA I am evaluating the existing solutions on the market and ScyllaDB has been meeting the needs.”

— Lucas F, Database Administrator

 

“We are solving the need for reading/write latency on over 1TB of data in less than 5ms with ScyllaDB. Our main application needs to respond very quickly and we have used ScyllaDB to replace PostgreSQL and Redis. The benefits here are lower cloud costs and less complexity.”

— DBA, Marketing and Advertising Industry

 

“Using ScyllaDB, we are able to keep stateful processing in our data pipeline to update the data model in real-time. Earlier, we were using Redis where our cost was enormous. Using ScyllaDB we reduced our cost by 80%.”

— Priyansh M, SDE 2 (Data & Intelligence)

 

“We currently use it to store a lot of customer data associated with some of our game development and website apps. We require pretty quick and reliable connections due to the nature of our work.”

— Christopher F. A, President & Founder

 

“We have started off using NoSQL in many new application areas in production and POCs. The use cases are real-time data streaming and IoT. We experienced significant performance gains and cost benefits by switching to the ScyllaDB NoSQL setup.”

— DBA, Computer Software Industry

 

“We are collecting a lot of data from different sources like api’s, sensors or telemetry mostly unstructured data. since the data is in higher volume, we are ingesting data as it is in ScyllaDB.”

— Anonymous, Financial Services Industry

 

“I am currently building a streaming app and picked ScyllaDB to test how suitable it will be with storing streaming data. It is pretty neat with support for a wide range of APIs, which gives me the flexibility to build in my own space.”

— Amadou B, SQL Server Database Administrator

 

“We use ScyllaDB in our organization to generate reports. The time taken for report generation in ScyllaDB is significantly faster compared to MongoDB, which we are currently using as our main Production DB.”

— Harshavardhan K, AI and ML Engineer

 

“We are using this as an L1/L2 cache for a tier-1 demand service to serve content to our clients.”

— Anonymous, Leisure, Travel & Tourism Industry

 

“We are trying to improve the performance of the back-end system while using Janus graph for storing data in a graph structure.”

— Ganesh Kumar B, Director of Technology

 

“We mainly use the software to generate reports. It is also helpful for backups and restores, especially in operations equipment. ScyllaDB quickly restores to the latest version when a problem occurs. We can even access previous versions in our database without interrupting the workflow.”

— Maric S, Project Manager

Browse More ScyllaDB NoSQL Use Cases

Intrigued? Take a deeper dive into how ScyllaDB users are tackling tough data-related challenges. We’ve collected our most popular user-contributed content in our NoSQL resource center. The selections range from 2 minute snapshots, to 20 minute tech talks, to 60-minute on-demand webinars and user-authored blogs.

Share Your Own ScyllaDB NoSQL Use Cases

If you’ve got a ScyllaDB use case to share, we invite you to add it to the collection at G2. Also, keep your eyes open for the ScyllaDB Summit 2023 call for speakers this fall. Year after year, the most popular conference sessions are the ones where users explain how they’re solving tough technical challenges with ScyllaDB.

If you’ve already shared a review, thank you! Here’s how ScyllaDB is currently positioned, based on real user feedback as of today.

 G2 Grid® for Wide Column Database Software

 

READ MORE SCYLLADB REVIEWS ON G2

SUBMIT A SCYLLADB REVIEW ON G2

Alan Shimel Chats NoSQL with ScyllaDB VP of Product, Tzach Livyatan

ScyllaDB recently announced ScyllaDB V: the latest evolution of our monstrously fast and scalable NoSQL database.

To explore what’s new and how the latest innovations impact application development teams, DevOps luminary Alan Shimel recently connected with ScyllaDB VP of Product, Tzach Livyatan for a TechStrongTV interview. Alan and Tzach chatted about NoSQL performance and security, the shift to database-as-a-service (DBaaS), the performance impact of Kubernetes, new AWS instances, observability, and more.

Watch the interview in its entirety:

If you prefer to read rather than watch, here’s a transcript (lightly edited for brevity and clarity) of the core conversation.

Intro

Alan: Welcome to another TechStrongTV segment here. Our guest today is Tzach Livyatan. He’s a first-time guest on TechstrongTV, but we’ve covered the company (ScyllaDB) before. Tzach, welcome to TechStrong.

Tzach: Thanks! Thanks for having me, and thanks for pronouncing my name correctly – I appreciate it.

Alan: You know, I work at it. Tzach, a lot of folks out here maybe they’ve heard of ScyllaDB, maybe they haven’t. Maybe they’re not quite sure what you guys do. Why don’t we start with that? Give them a little company background and maybe a little bit of your own personal background.

Tzach: Okay, sure. Let me start with myself. I have a background in Computer Science and have been working in the industry for many, many years – I try not to count anymore. I started as a developer, then moved to product management. I spent many years in the telecom domain in a startup, and later at Oracle’s communication unit. After that, I switched to the NoSQL domain at ScyllaDB.

ScyllaDB’s Sweet Spot in the NoSQL / Distributed Databases World

Tzach: ScyllaDB is a NoSQL database. There are many, many NoSQL databases out there. Each of them has a special, unique sweet spot. ScyllaDB falls roughly into the wide column database category. You might have heard of databases like DynamoDB or Apache Cassandra; we address many of the same use cases as they do.

ScyllaDB is focused on high availability. It can stretch from a 3-node cluster to a 100-node cluster – although you don’t need as many nodes as you would with other databases. Let me give you an example of its high availability. One of our customers has a cluster spanning three different regions, and one of those regions actually burned down in a fire – a well-known cloud provider’s data center completely burned down. But the ScyllaDB deployment just continued to work as is. They had a deployment of 10 nodes in each region, so 20 nodes continued to work without a glitch when one region’s nodes burned down. This is ScyllaDB’s high availability in action.

NoSQL now has a long history, 15 years or more. The first generation of NoSQL databases was very much focused on high availability…but performance was an afterthought on some of them. I don’t want to trash other databases, because all databases have made progress. But a lot of them were initially designed as a project and not as a product. Performance, security, and other administrative/operational capabilities were not highly prioritized back then.

You can think about ScyllaDB as a second generation of NoSQL database that was designed for performance, security, and administration from the ground up. So ScyllaDB was the reinvention, if you will, of Apache Cassandra. We wanted to keep the emphasis on high availability, but we re-invented and re-designed everything from scratch in C++ for performance.

Think of performance in terms of two main metrics: throughput and latency. Each of them can be optimized almost independently. So why is throughput important? If you can, for example, deliver 10X the throughput of another database, which we do, you can reduce the number of servers by a factor of 10, and you save on administration – it’s much easier to administer. Of course, it’s also less expensive. If you’re working on AWS or another cloud, just shrinking the number of servers by 10X is a great improvement.

The second parameter is latency. This is a little bit more tricky because it’s harder to improve. While you can always spend more money and add more nodes or servers to handle your throughput problem, you cannot do the same for latency because latency is very much coupled with architecture.

ScyllaDB was designed from scratch with an architectural approach called shard-per-core. I won’t go too deep into the technical parts, but it’s designed from scratch for good performance and low latency. This is how we are able to provide less than one millisecond on average and less than five milliseconds on P99 regardless of the throughput. You can throw up to a million requests per second, which is a lot, on one ScyllaDB node and still get that sub-millisecond latency. This is very significant. If you want, I can explain why it’s significant, but this is a great improvement over the first generation of NoSQL databases that I mentioned earlier.

The Importance of Throughput and Latency at Scale

Alan: Our audience might not be familiar with ScyllaDB, but they’re very technical. I don’t think we need to explain to anyone how important both throughput and latency are. Today, more than ever, it’s about scale. Companies must scale up given the current market conditions. It’s scale up economically or die, right?

Apache Cassandra is an amazing product, an amazing project. Of course, there have been several companies that have tried to monetize it and productize it. And there’s a couple of models, right? There’s the hosted model. There’s the support and training – you know, the usual open source kind of stuff. And then there’s sort of an open core model where you have freemium functionality. ScyllaDB’s special sauce is obviously around reducing latency and increasing throughput. How does ScyllaDB go to market?

ScyllaDB Product Options: DBaaS, Enterprise, Open Source

Tzach: We offer three options. We have ScyllaDB open source. You can go to GitHub, get the code, and compile it yourself. We have many many open source users. That’s definitely an offering that we are supporting and even encouraging. We encourage people in the community to use ScyllaDB and contribute to it.

We also have an enterprise version, which is a closed source version that closely follows the open source one. About 95% of the code is the same. However, ScyllaDB Enterprise offers some unique features – a lot of them are around security. Let me give you a quick example: encryption at rest. Encryption at rest is a feature that we offer exclusively in ScyllaDB Enterprise. There are other enterprise features like that, for performance, security, and so on.

We also offer ScyllaDB as a service: ScyllaDB Cloud. This is a fully managed ScyllaDB Enterprise version. It’s the same enterprise database, but we manage it for you – either on AWS or GCP. You consume the database, and we take care of all the monitoring, upgrades, security patches, whatever is involved in managing the database. We do it all for you.

In the last year, ScyllaDB Cloud database-as-a-service (DBaaS) has become more and more popular, and it is now the most popular of those three options. I’m sure that’s not surprising to you; it’s just following the industry trend that everyone wants to move to the cloud.

Alan: Look, we live in a “SaaS-y” world, right? If you offer it as a SaaS, people like to consume it like that. They pay as they go, there’s not that big upfront cost, and they don’t have to deal with upgrades and patching and all that good stuff.

What’s New in ScyllaDB’s Latest Release

Alan: All right. I think we’ve done a good job laying all this out now Tzach. Tell us what’s new.

Tzach: We just released ScyllaDB Open Source 5.0, which is our yearly major release and the first milestone in our ScyllaDB V series of releases. We’re very excited about it.

If I can squeeze the history of ScyllaDB (which is almost eight years now) into three phases, the first phase was about performance and the architecture for performance, which I mentioned earlier; the second was about complete functionality, and we are now completely compatible with both Apache Cassandra and Amazon DynamoDB; and the third phase, which we are starting now, is about innovation and going beyond compatibility, introducing new features.

Let me talk about two of the most exciting features.

The first is that we’re introducing a new consensus algorithm with Raft. ScyllaDB is basically an eventual consistency database. But there are two parts to consensus, or consistency, for any database. One is for the data itself, and one is for the metadata. What is metadata? The users, the nodes in the cluster, the schema – everything which is not the data itself. So far, this metadata was propagated in the cluster through Gossip, which is eventually consistent. That was fine a few years back, but today it’s not fast enough or not consistent enough.

With ScyllaDB 5.0 we’re introducing a new consensus algorithm for that. From the user perspective, this means that you can safely update both the schema and the topology of the cluster in a safe way, whenever you want. What do I mean by “change topology?” For example, add nodes. Let’s say that you want to scale your cluster from 3 nodes to 10 nodes. You can now do it in a safe way, and very fast. This serves not only ScyllaDB Open Source and Enterprise, but also our own cloud offering (ScyllaDB Cloud), which can now scale much faster and even shrink faster if you want to.

All of the new ScyllaDB V features, including the performance gain, also apply to Kubernetes, by the way. We have a strong offering on Kubernetes. Everyone is running on Kubernetes these days; it’s not new. We made a lot of effort to make sure that ScyllaDB’s performance on Kubernetes is very close to how it performs running on bare metal and VM. This is not trivial. But, I’m happy to say that we achieved it and we are very happy with that.

Alan: That’s huge

Tzach: The second aspect is performance. Although we are already the fastest NoSQL out there, we keep pushing the limits of performance. I’m happy to say that we achieved a lot of improvement in this release. In particular, we worked with AWS to optimize ScyllaDB for the new I4i instances. I4i is the successor of the i3 and i2 families, which are instances that AWS offers for databases. We got a particularly surprising result on the I4i instances: more than 2X performance compared to the i3. So, for the same number of cores, you get more than 2.5X the throughput and even lower latency. That was a combination of the great hardware that AWS offers (both the storage and network and the CPU) and ScyllaDB being designed from scratch to work with these types of high-end instances.

To conclude, we improved safety and elasticity with Raft and continue to push on performance, especially with the I4i. We introduce new features first on open source, let it stabilize a little bit then we backport to the enterprise version, then backport that to ScyllaDB Cloud. We are now in the first phase of it; in a few weeks, it will be introduced to ScyllaDB Enterprise and ScyllaDB Cloud.

Editor’s note: This Enterprise release has occurred since the interview – see what’s new in ScyllaDB Enterprise. To hear more, including the discussions on Kubernetes and observability, watch the complete interview above. 

 

 

ScyllaDB Student Projects: CQL-over-WebSocket

Introduction: University of Warsaw Student Projects

Since 2019, ScyllaDB has cooperated with the University of Warsaw by mentoring teams of students in their Bachelor’s theses. After SeastarFS, ScyllaDB Rust Driver and many other interesting projects, the 2021/2022 edition of the program brings us CQL-over-WebSocket: a way of interacting with ScyllaDB straight through your browser.

Motivation

The most popular tool for interacting with ScyllaDB is cqlsh. It’s a Python-based command line tool allowing you to send CQL requests, browse their results, and so on. It’s based on the Python driver for Cassandra, and thus brings in a few dependencies: Python itself, the driver library and multiple indirect dependencies. Forcing users to install extra packages is hardly a “batteries included” experience, so we decided to try and create a terminal for interacting with ScyllaDB right from the Web browser.

Sounds easy, but there are a few major roadblocks for implementing such a tool. First of all, browsers do not allow pages to establish raw TCP connections. Instead, all communication is done either by issuing HTTP(S) requests, or via the WebSocket protocol. WebSocket is interesting because it allows full duplex connections directly from the browser and can multiplex streams of messages over TCP. If you are a JavaScript developer, check out this page. Also, this article goes into greater detail on specific differences between HTTP and WebSocket.

Secondly, while Cassandra already has a JavaScript driver, it’s meant for Node.js, and is not directly usable from the browser, not to mention that JavaScript lacks the type safety guarantees provided by more modern languages, e.g., TypeScript.

CQL is stateful in its nature, and clients usually establish long living sessions in order to communicate with the database. That makes HTTP a rather bad candidate for a building block, given its stateless nature and verbosity. CQL is also a binary protocol, so transferring data on top of a text-based one would cause way too much pain for performance-oriented developers.

Disqualifying HTTP makes the choice rather clear — WebSocket it is! Once this crucial design decision had been made, we could move forward to implementing all the layers:

  1. WebSocket server implementation for Seastar
  2. Injecting a WebSocket server into ScyllaDB
  3. Making a TypeScript driver for CQL, intended specifically for web browsers
  4. Implementing an in-browser cqlsh terminal

Part I: WebSocket Server Implementation for Seastar

In order to be able to receive WebSocket communication in ScyllaDB, we first need to provide a WebSocket server implementation for Seastar, our asynchronous C++ framework. Seastar already has a functional HTTP server, which was very helpful, as the WebSocket handshake is based on HTTP. As any other server implementation embedded into Seastar, this one also needed to be latency-friendly and fast.

Protocol Overview

WebSocket protocol is specified in RFC 6455 and the W3C WebSocket Living Standard. We also used a great tutorial on writing WebSocket servers published by Mozilla. In short, a minimal WebSocket server implementation needs to support the following procedures:

  • an HTTP-based handshake, which verifies that the client wishes to upgrade the connection to WebSocket, and also potentially specifies a subprotocol for the communication
  • exchanging data frames, which also includes decoding the frame headers, as well as masking and unmasking the data
  • handling PING and PONG messages, used in a WebSocket connection to provide heartbeat capabilities

Interface

The implementation of Seastar WebSocket server is still experimental and subject to change, but it’s already fully functional and allows the developers to implement their own custom WebSocket servers.

At the core of implementing your own WebSocket server in Seastar, there’s a concept of a “handler”, allowing you to specify how to handle an incoming stream of WebSocket data. Seastar takes care of automatically decoding the frames and unmasking the data, so the developers simply operate on incoming and outgoing streams, as if it was a plain TCP connection. Each handler is bound to a specific subprotocol. During handshake, if a client declares that they want to speak in a particular subprotocol, they will be rerouted to a matching handler. Here’s an example of how to implement an echo server, which returns exactly the same data it receives:

A full demo application implementing an echo server can be found here: https://github.com/scylladb/seastar/blob/master/demos/websocket_demo.cc

And here’s a minimal WebSocket client, which can be used for testing the server, runnable directly in IPython:

Part II: Injecting a WebSocket server into ScyllaDB

Now that we have WebSocket support in Seastar, all that’s left to do server-side is to inject such a server into ScyllaDB. Fortunately, in the case of CQL-over-WebSocket, the implementation is rather trivial. ScyllaDB already has a fully functional CQL server, so all we need to do is reroute all the decoded WebSocket traffic into our CQL server, and then send all the responses back, as-is, via WebSocket straight to the client. At the time of writing this blog post, this support is still not merged upstream, but it’s already available for review: https://github.com/scylladb/scylla/pull/10921

In order to enable the CQL-over-WebSocket server, it’s enough to simply specify the chosen port in scylla.yaml configuration file, e.g.:

cql_over_websocket_port: 8222

After that, ScyllaDB is capable of accepting WebSocket connections which declare their subprotocol to be “cql”.

Part III: TypeScript Driver for CQL

Neither ScyllaDB nor Cassandra had a native TypeScript driver, and the only existing JavaScript implementation was dedicated to server-side Node.js support, which made it unusable from the browser. Thus, we jumped at the opportunity and wrote a TypeScript driver from scratch. The result is here: https://github.com/dfilimonow/CQL-Driver.

While still very experimental and, notably, lacking concurrency support, load balancing policies, and other vital features, it’s also fully functional and capable of sending and receiving CQL requests straight from the browser via a WebSocket connection, which is exactly what we needed in order to implement a browser-only client.

Part IV: In-browser cqlsh Terminal

The frontend of our project is an in-browser implementation of a cqlsh-like terminal. It’s a static webpage based on TypeScript code compiled to JavaScript, coded with React and Material UI. It supports authentication, maintains shell history and presents query results in a table, with paging support.

The source code is available here: https://github.com/gbzaleski/ZPP-ScyllaDB-Front

Ideally, in order to provide full “batteries included” experience, the page could be served by ScyllaDB itself from a local address, but until that happens, here’s some preview screen captures from the current build of the frontend:

Note that since this code isn’t available in a production release of ScyllaDB, it requires a custom-compiled version of ScyllaDB running locally to get this to work. If you are curious about running this yourself, chat me up in the ScyllaDB user Slack community.

Source Code and the Paper

One of the outcomes of this project, aside from upstreaming changes into Seastar and ScyllaDB, is a Bachelor’s thesis. The paper is available here:

DOWNLOAD THE THESIS

and its official description and abstract can be found here:

READ THE ABSTRACT

Congratulations to those students who put in the time and effort to make this project so successful: Bartłomiej Kozaryna, Daniel Filimonow, Andrzej Stalke, and Grzegorz Zaleski. Thanks, team! And, as always, thank you to the instructors and staff of the University of Warsaw for such a great partnership. We’re now looking forward to the 2022/2023 edition of the student projects program, bringing new cool innovations and improvements to our ScyllaDB open source community!

JOIN OUR OPEN SOURCE COMMUNITY

 

Performance Engineering Masterclass for Optimizing Distributed Systems: What to Expect

 

Editor’s Note: The live event has concluded, but you can now access the masterclass videos and certification exam on-demand

ACCESS ON DEMAND

READ HIGHLIGHTS

Performance engineering in distributed systems is a critical skill to master. Because while it’s one thing to get massive terabyte-to-petabyte scale systems up and running, it’s a whole other thing to make sure they are operating at peak efficiency. In fact, it’s usually more than just “one other thing.” Performance optimization of large distributed systems is usually a multivariate problem — combining aspects of underlying hardware, networking, tuning operating systems or finagling with layers of virtualization and application architectures.

Such a complex problem can’t be taught from just one perspective. So, for our upcoming Performance Engineering Masterclass, we’re bringing together three industry experts who want to pass along to you the insights they have learned over their careers. Attendees will benefit from their perspectives based on extensive production and systems testing experience.

During this free, online – and highly interactive – 3-hour masterclass, you will learn how to apply SLO-driven strategies to optimize observability, and ultimately performance, from the front end to the back end of your distributed systems. Specifically, we’ll cover how to measure and optimize performance for latency-sensitive distributed systems built using modern DevOps approaches with progressive delivery. At the end, you will have the opportunity to demonstrate what you learned and earn a certificate that shows your achievement.

You will learn best practices on how to:

  • Optimize your team’s pipelines and processes for continuous performance
  • Define, validate, and visualize SLOs across your distributed systems
  • Diagnose and fix subpar distributed database performance using observability methods

Schedule

The Masterclass program runs for three hours, between 8:30 – 11:30 AM Pacific Daylight Time, June 21st, 2022. The platform will open at 8:00 AM and close at 12:00 PM (noon), so there will be time before and after the scheduled sessions for you to meet and mingle with your fellow attendees, plus to ask questions in our live staffed virtual lounges:

  • 8:00 – 8:30am: Platform opens (Technology Lounges)
  • 8:30 – 8:40am: Welcome & Opening Remarks – Peter Corless, ScyllaDB
  • 8:40 – 9:30am: Introduction to Modern Performance – Leandro Melendez, Grafana k6
  • 9:30 – 10:00am: Efficient Automation with the Help of SRE Concepts – Henrik Rexed, Dynatrace
  • 10:00 – 10:30am: Observability Best Practices in Distributed Databases – Felipe Cardeneti Mendes, ScyllaDB
  • 10:30 – 11:00am: Panel Discussion + Live Q&A
  • 11:00 – 11:30am: Certification Examination and Awards
  • 11:30am – 12:00pm: Platform closes (Technology Lounges)

Agenda

The day’s learning will be centered around three main lessons:

Introduction to Modern Performance

The relentless drive to cloud-native distributed applications means no one can sit back and hope applications are always going to perform at their peak capabilities. Whether your applications are built on microservices or serverless architectures, you need to be able to observe and improve the performance of your always-on, real-time systems. We’ll kick off our masterclass by providing a grounding in current expectations of what performance engineering and load testing entail.

This session defines the modern challenges developers face, including continuous performance principles, Service Level Objectives (SLOs) and Service Level Indicators (SLIs). It will delineate best practices, and provide hands-on examples using Grafana k6, an open source modern load testing tool built on Go and JavaScript.

Efficient Automation with the Help of SRE Concepts

Service Level Objectives (SLOs) are a key part of modern software engineering practices. They help quantify the quality of service provided to end users, and therefore maintaining them becomes important for modern DevOps approaches with progressive delivery.

SLOs have become a powerful tool that allows teams to monitor the reliability of production systems and to validate the test cycles automatically launched through various pipelines. We’ll walk you through how to measure, validate, and visualize these SLOs, using Prometheus, an open source observability platform, to provide concrete examples.

Next, you will learn how to automate your deployment using Keptn, a cloud-native event-based life-cycle orchestration framework. Discover how it can be used for multi-stage delivery, remediation scenarios, and automating production tasks.

Observability Best Practices in Distributed Databases

The final session dives into distributed database observability, showing several real-world situations that may affect workload performance and how to diagnose them. Gaining proper visibility into your database is important in order to predict, analyze, and observe patterns and critical problems that emerge. Managing scenarios such as unbounded concurrency, alerting, queueing, anti-entropy, and typical database background activities is often a production-grade requirement for most mission-critical platforms. However, simply having access to these metrics is not enough. DevOps, database, and application professionals need a deeper understanding of them to know what to watch for.

To show how to put these principles into practice, this session uses the ScyllaDB NoSQL database and ScyllaDB Monitoring Stack built on open source Grafana, Grafana Loki, Alert Manager and Prometheus as an example of instrumentation, observability tools, and troubleshooting methodologies. Discover how to triage problems by priority and quickly gather live insights to troubleshoot application performance.

Your Instructors

A masterclass is distinguished by the experience of its instructors, and we are pleased to introduce our 3 amazing instructors:

Leandro Melendez
Grafana k6

Leandro Melendez, also known by his nom de guerre “Señor Performo,” has decades of experience as a developer, DBA, and project manager. He has spent the past decade focusing on QA for performance and load testing. Now a developer advocate at Grafana k6, Leandro seeks to share his experiences with broader audiences, such as hosting the Spanish-language edition of the PerfBytes podcast. He will also be speaking at the upcoming P99 CONF in October 2022.

Henrik Rexed
Dynatrace

Henrik Rexed is a cloud native advocate with a career of nearly two decades focused on performance engineering. As a Senior Staff Engineer at Dynatrace, he is a frequent speaker at conferences, webinars and podcasts. His current passion is OpenTelemetry, the emerging Cloud Native Computing Foundation (CNCF) observability standard. Henrik will also be a speaker at the upcoming P99 CONF in October 2022.

Felipe Cardeneti Mendes
ScyllaDB

Felipe Cardeneti Mendes is a published author and solutions architect at ScyllaDB. Before ScyllaDB he spent nearly a decade as a systems administrator at one of the largest tech companies on the planet. He focuses on helping customers adapt their systems to meet the challenges of this next tech cycle, whether that means building low latency applications in Rust, or migrating or scaling their systems to the latest hardware, operating systems and cloud environments.

Achieve Certification

At the end of the prepared presentations, there will be an examination offered to all attendees. Those who complete the examination with a passing grade will be presented with a Certificate of Completion for the Masterclass.

Attendees who do not achieve a passing grade during the live event will have a chance to retake the examination at a later date to achieve their certification.

Sponsors

This Masterclass is brought to you for free through a joint collaboration of ScyllaDB, Grafana k6, and Dynatrace.

Access it Now

The world of cloud-native distributed applications, observability and performance optimization is changing in real time, so there is no better time than right now to commit to deepening your knowledge.

ACCESS ON DEMAND

Introducing ScyllaDB Enterprise 2022.1

ScyllaDB is pleased to announce the availability of ScyllaDB Enterprise 2022.1, a production-ready ScyllaDB Enterprise major release. After more than 6,199 commits originating from five open source releases, we’re excited to now move forward with ScyllaDB Enterprise 2022.

ScyllaDB Enterprise builds on the proven features and capabilities of our ScyllaDB Open Source NoSQL database, providing greater reliability through additional rigorous testing. Between ScyllaDB Enterprise 2021 and this 2022 release, we’ve closed 1,565 issues. Plus, it provides a set of unique enterprise-only features such as LDAP authorization and authentication.

ScyllaDB Enterprise 2022 is immediately available for all ScyllaDB Enterprise customers, and will be deployed to ScyllaDB Cloud users as part of their fully-managed service. A 30-day trial version is available for those interested in testing its capabilities against their own use cases.

New Features in ScyllaDB Enterprise 2022

Support for AWS EC2 I4i Series Instances

ScyllaDB now supports the new AWS EC2 I4i series instances. The I4i series provides superior performance over the I3 series due to a number of factors: the 3rd generation Intel Xeon “Ice Lake” processors, the AWS Nitro System hypervisor, and low-latency Nitro NVMe SSDs. ScyllaDB can achieve 2x the throughput and lower latencies on I4i instances compared with comparable I3 instances.

Support for Arm-based Systems

With ScyllaDB Open Source 4.6 we implemented support for Arm-based architectures, including the new AWS Im4gn and Is4gen storage-optimized instances powered by Graviton2 processors. Release 4.6 also supports the low-cost T4g burstable instance for development of cloud-based applications. Since ScyllaDB is now compiled to run on any AArch64 architecture, you can even run it on an Arm-based, M1-powered Mac using Docker for local development.

Change Data Capture (CDC)

This feature allows you to track changes made to a base table in your database for visibility or auditing. Changes made to the base table are stored in a separate table that can be queried with standard CQL. Our CDC implementation uses a configurable Time To Live (TTL) to ensure it does not occupy an inordinate amount of your disk space. Note: This was one of the few features brought in through a maintenance release (2021.1.1). It is new since the last major release of ScyllaDB Enterprise 2021.
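As a rough sketch of what this looks like from application code, here is how you might enable CDC and read the change log back using the ScyllaDB Rust driver. The ks.orders table name is hypothetical, and the exact CDC option syntax may vary slightly between versions:

use scylla::Session;

// Enable CDC on an existing base table and read the change log back over CQL.
async fn cdc_example(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
    // Turn on CDC for the (hypothetical) base table.
    session
        .query("ALTER TABLE ks.orders WITH cdc = {'enabled': true}", ())
        .await?;

    // Changes are mirrored into an auto-created log table named
    // <base_table>_scylla_cdc_log, queryable with standard CQL.
    let result = session
        .query("SELECT * FROM ks.orders_scylla_cdc_log LIMIT 10", ())
        .await?;
    for row in result.rows.unwrap_or_default() {
        println!("{:?}", row);
    }
    Ok(())
}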

LDAP

ScyllaDB Enterprise 2021.1.2 introduced support for LDAP Authorization and LDAP Authentication.

  • LDAP Authentication allows users to manage the list of ScyllaDB Enterprise users and passwords in an external Directory Service (LDAP server), for example, MS Active Directory. ScyllaDB delegates authentication to a third-party utility named saslauthd, which, in turn, supports many different authentication mechanisms. Read the complete documentation on how to configure and use saslauthd and LDAP Authentication.
  • LDAP Authorization allows users to manage the roles and privileges of ScyllaDB users in an external Directory Service (LDAP server), for example, MS Active Directory. To do that, one needs a role_manager entry in scylla.yaml set to com.scylladb.auth.LDAPRoleManager. When this role manager is chosen, ScyllaDB forbids GRANT and REVOKE role statements (CQL commands), as all users get their roles from the contents of the LDAP directory. When LDAP Authorization is enabled and a ScyllaDB user authenticates to ScyllaDB, a query is sent to the LDAP server, whose response sets the user’s roles for that login session. The user keeps the granted roles until logout; any subsequent changes to the LDAP directory only take effect at the user’s next login to ScyllaDB. Read the complete documentation on how to configure and use LDAP Authorization.

Note: This is one of the few features brought in through a maintenance release (2021.1.2). It is new since the last major release of ScyllaDB Enterprise 2021.

Timeout per Operation

There is now new syntax for setting timeouts on individual queries with “USING TIMEOUT”. This is particularly useful when you have queries that are known to take a long time. Until now, you could either increase the timeout value for the entire system (with request_timeout_in_ms) or keep it low and see many timeouts for the longer queries. The new timeout-per-operation syntax lets you define the timeout in a more granular way. Conversely, some queries might have tight latency requirements, in which case it makes sense to set their timeout to a small value. Such queries would time out faster, which means they won’t needlessly hold the server’s resources. You can use the new TIMEOUT parameters for both queries (SELECT) and updates (INSERT, UPDATE, DELETE). Note: This was one of the few features brought in through a maintenance release (2021.1.1). It is new since the last major release of ScyllaDB Enterprise 2021.
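As a rough illustration with the Rust driver, both directions might look like the sketch below; the ks.events table, its columns, and the timeout values are made up for this example:

use scylla::Session;

// Sketch of per-statement timeouts via the USING TIMEOUT extension.
async fn timeout_examples(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
    // Give a known-heavy read more time than the server-wide default...
    session
        .query(
            "SELECT * FROM ks.events WHERE day = ? USING TIMEOUT 30s",
            ("2022-07-01",),
        )
        .await?;

    // ...and fail a latency-critical write fast so it doesn't hold server resources.
    session
        .query(
            "INSERT INTO ks.events (day, id, payload) VALUES (?, ?, ?) USING TIMEOUT 50ms",
            ("2022-07-01", 42i64, "hello"),
        )
        .await?;
    Ok(())
}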

Improvements to CloudFormation

A new variable, VpcCidrIp, allows you to set the CIDR IP range for the VPC. Previously, the range was hard coded to 172.31.0.0/16. In addition, the CloudFormation template was reordered for better readability. Note: This was one of the few features brought in through a maintenance release (2021.1.5). It is new since the last major release of ScyllaDB Enterprise 2021.

I/O Scheduler Improvements

A new I/O scheduler was integrated via a Seastar update. The new scheduler is better at restricting disk I/O in order to keep latency low. This implementation introduces a new dispatcher in the middle of a traditional producer-consumer model. The IO scheduler seeks to find what is known as the effective dispatch rate — the fastest rate at which the system can process data without running into internal queuing jams.

Improved Reverse Queries

Reverse queries are SELECT statements that use the reverse order from the table schema. If no order was defined, the default order is ascending (ASC). For example, imagine rows in a partition sorted by time in ascending order. A reverse query would sort rows in descending order, with the newest rows first. Reverse queries were improved in ScyllaDB Open Source 4.6 and are further improved in 5.0: first, they now return short pages to limit memory consumption; second, they now leverage ScyllaDB’s row-based cache (before 5.0 they bypassed the cache).
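For instance, if a hypothetical ks.readings table clusters rows by ts in ascending order, reading the newest rows first is a reverse query. A minimal sketch with the Rust driver:

use scylla::Session;

// Reverse query sketch: the clustering order is ASC, the query asks for DESC.
async fn latest_readings(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
    // Assumed schema: PRIMARY KEY (sensor_id, ts) WITH CLUSTERING ORDER BY (ts ASC)
    let result = session
        .query(
            "SELECT ts, value FROM ks.readings WHERE sensor_id = ? ORDER BY ts DESC LIMIT 100",
            (17i32,),
        )
        .await?;
    for row in result.rows.unwrap_or_default() {
        println!("{:?}", row);
    }
    Ok(())
}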

New Virtual Tables for Configuration and Nodetool Information

A new system.config virtual table allows querying and updating a subset of configuration parameters over CQL. These updates are not persistent and revert to the scylla.yaml values after a restart. Nodetool command information can also be accessed via virtual tables, including snapshots, protocol servers, runtime info, and a virtual table replacement for nodetool version. Virtual tables allow remote access over CQL, including for ScyllaDB Cloud users.
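As a sketch of what this enables, again through the Rust driver: request_timeout_in_ms is used here purely as an example parameter, only a subset of options accept live updates, and any change reverts to scylla.yaml on restart:

use scylla::Session;

// Read and (non-persistently) update configuration over CQL via system.config.
async fn inspect_config(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
    // Look up a single parameter's current value and where it came from.
    let result = session
        .query(
            "SELECT name, source, value FROM system.config WHERE name = 'request_timeout_in_ms'",
            (),
        )
        .await?;
    println!("{:?}", result.rows);

    // Change it for the running node only; the value is not written back to scylla.yaml.
    session
        .query(
            "UPDATE system.config SET value = '30000' WHERE name = 'request_timeout_in_ms'",
            (),
        )
        .await?;
    Ok(())
}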

Repair-based Node Operations (RBNO)

We leveraged our row-based repair mechanism for other repair-based node operations (RBNO), such as node bootstraps, decommissions, and removals, and allowed it to use off-strategy compaction to simplify cluster management.

SSTable Index Caching

We also added SSTable Index Caching (4.6). Up to this release, ScyllaDB only cached data from SSTable data rows (values), not the indexes themselves. As a result, if the data was not in cache, readers had to touch the disk while walking the index. This was inefficient, especially for large partitions, increasing the load on the disk and adding latency. Now, index blocks can be cached in memory, shared between readers, populated on access, and evicted on memory pressure, reducing the I/O and decreasing latency. More info can be found in Tomasz Grabiec’s ScyllaDB Summit session, “SSTable Index Caching.”

SSTable Index Caching provides faster performance by avoiding unnecessary queries to SSD.

Granular Timeout Controls

Timeouts per operation were also added, allowing more granular control over latency requirements and freeing up server resources (4.4). We expanded upon this by adding Service Level Properties, allowing you to associate attributes to rules and users, so you can have granular control over session properties, like per-service-level timeouts and workload types (4.6).

Guardrails

ScyllaDB is a very powerful tool, with many features and options. In many cases, some of these options, such as experimental or performance-impacting features, or a combination of them, are not recommended for production. Guardrails are a collection of restrictions that make it harder for the user to run non-recommended options in production. A few examples:

  • Prevent users from using SimpleStrategy replication.
  • Warn about or prevent usage of DateTieredCompactionStrategy, which has long since been deprecated in favor of TimeWindowCompactionStrategy.
  • Disable Thrift, a legacy interface, by default.
  • Ensure that all nodes use the same snitch.

ScyllaDB administrators can use our default settings or customize guardrails for their own environment and needs.

Improvements to Alternator

The Amazon DynamoDB-compatible interface has been updated to include a number of new features:

  • Support for Cross-Origin Resource Sharing (CORS)
  • Full support for nested attribute paths
  • Support for attribute paths in ConditionExpression, FilterExpression, and ProjectionExpression

Related Innovations

Kubernetes Operator Improvements

Our Kubernetes story was production-ready with ScyllaDB Operator 1.0, released in January of 2020. Since that time, we have continuously improved on our capabilities and performance tuning with cloud orchestration.

Faster Shard-Aware Drivers

Also during the past two years we have learned a lot about making our shard-aware drivers even faster. Open source community contributions led to a shard-aware Python driver which you can read about here and here. Since then our developers have embraced the speed and safety of Rust, and we are now porting our database drivers to async Rust, which will provide a core of code for multiple drivers with language bindings to C/C++ and Python.

New Release Cycle for Enterprise Customers

ScyllaDB Enterprise 2022.1 is a Long Term Support (LTS) release, as per our new enterprise release policy. It used to be that Enterprise users would need to wait a year between feature updates in major ScyllaDB Enterprise releases. Starting later this year, we intend to provide additional new features in a Short Term Support (STS) release, versioned as 2022.2. This new mix of LTS and STS releases will allow us to deliver new features more than once a year. For you, it means you don’t need to wait as long to see new innovations show up in a stable enterprise release.

For example, let’s look at this chart of the historical releases we made in the era of ScyllaDB Open Source 3.0 to 4.0, and how they tracked to ScyllaDB Enterprise 2019 and 2020.

There was a four month lag between the release of ScyllaDB Open Source 3.0 (January 2019) to the related ScyllaDB Enterprise 2019.1 release (May 2019). And there was a similar three month lag between ScyllaDB Open Source 4.0 and ScyllaDB Enterprise 2020.1.

During the interim between those major releases there were three minor releases of ScyllaDB Open Source — 3.1, 3.2 and 3.3. Enterprise users needed to wait until the next major annual release to utilize any of those innovations — apart from a point feature such as support for IPv6.

In comparison, ScyllaDB Open Source 5.0 was released on July 7th, 2022. And here, within a calendar month, we are announcing ScyllaDB Enterprise 2022.

Plus, when ScyllaDB Enterprise 2022.2 is released later this year, it will include additional production-ready innovations brought in from ScyllaDB Open Source, without requiring users to wait until the next major release in calendar year 2023.

This narrows the delay in production-ready software features for our Enterprise users. For example, this is what the rest of 2022 could look like:

Note: ScyllaDB Enterprise 2022.2 will be a Short Term Support release, which means some Enterprise users may still decide to wait for the next ScyllaDB Enterprise LTS release in 2023. Existing ScyllaDB Enterprise customers are encouraged to discuss their upgrade policies and strategies with their support team.

Get ScyllaDB Enterprise 2022 Now

ScyllaDB Enterprise 2022 is immediately available. Existing ScyllaDB Enterprise customers can work with their support team to manage their upgrades. ScyllaDB Cloud users will see the changes made automatically to their cluster in the coming days.

For those who wish to try ScyllaDB Enterprise for the first time, there is a 30-day trial available.

You can check out the documentation and download it now from the links below. If this is your first foray into ScyllaDB, or into NoSQL in general, remember to sign up for our free online courses in ScyllaDB University.

GET STARTED WITH SCYLLADB ENTERPRISE

Implementing a New IO Scheduler Algorithm for Mixed Read/Write Workloads

Introduction

Being a large datastore, ScyllaDB operates many internal activities competing for disk I/O. There are data writers (for instance, memtable flushing code or the commitlog) and data readers (most likely fetching data from SSTables to serve client requests). Beyond that, there are background activities, like compaction or repair, which can do I/O in both directions. This competition is important to manage: if I/O were simply issued to the disk without any fairness or balancing considerations, a query serving a read request could find itself buried under a pile of background writes. By the time it had the opportunity to run, all that waiting would have translated into increased latency for the read.

While in-disk concurrency must be kept reasonably high to utilize the disk’s internal parallelism, it is better for the database not to send all those requests to the disk in the first place, but to keep the excess queued inside the database. Being inside the database, ScyllaDB can tag those requests and, through prioritization, guarantee quality of service among the various classes of I/O requests the database has to serve: CQL query reads, commitlog writes, background compaction I/O, etc.

That being said, there’s a component in the ScyllaDB low-level engine called the I/O scheduler that operates several I/O request queues and tries to maintain a balance between two inefficiencies — overwhelming the disk with requests and underutilizing it. With the former, we achieve good throughput but end up compromising latency. With the latter, latency is good but throughput suffers. Our goal is to find the balance where we can extract most of the available disk throughput while still maintaining good latency.

Current State / Previous Big Change

The last time we touched this subject, the problem was that disk throughput and IOPS capacity were statically partitioned between shards. Back then, the challenge was to correctly account for and limit a shared resource within the constraints of the library’s shard-per-core design.

A side note: Seastar uses the “share nothing” approach, which means that any decisions made by CPU cores (often referred to as shards) are not synchronized with each other. In rare cases, when one shard needs the other shard’s help, they explicitly communicate with each other.

The essential part of the solution was to introduce a shared pair of counters that helped achieve two goals: prevent shards from consuming more bandwidth than the disk can provide, and act as a “reservation” mechanism whereby each shard could claim capacity and wait for it to become available if needed, all in a “fair” manner.

Just re-sorting the requests, however, doesn’t make the I/O scheduler a good one. Another essential part of I/O scheduling is maintaining the balance between queueing and executing requests. Understanding what overwhelms the disk and what doesn’t turns out to be very tricky.

Towards Modeling the Disk

Most likely, when evaluating a disk, one would look at its four parameters: read/write IOPS and read/write throughput (such as in MB/s). Comparing these numbers to one another is a popular way of claiming one disk is better than another and of estimating the aforementioned “bandwidth capacity” of the drive by applying Little’s Law. With that, the scheduler’s job is to provide a certain level of concurrency inside the disk to get maximum bandwidth from it, but not to make this concurrency too high in order to prevent the disk from queueing requests internally for longer than needed.
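To put purely illustrative numbers on that: Little’s Law says the number of in-flight requests equals throughput multiplied by latency, so a drive rated at, say, 400K read IOPS with roughly 200 µs per request needs about 400,000 × 0.0002 ≈ 80 requests in flight to approach its peak. Push concurrency far beyond that and the extra requests simply queue up inside the disk, adding latency without adding throughput.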

In its early days, ScyllaDB took this concept literally and introduced a configuration option to control the maximum number of requests that can be sent to the disk in one go. Later this concept was refined to take into account the disk bandwidth and the fact that reads and writes come at different costs. So the scheduler maintained not only the “maximum number of requests” but also the “maximum number of bytes” sent to the disk, and evaluated these numbers differently for reads vs. writes.

This model, however, didn’t take into account any potential interference that read and write flows could have on each other. The assumption was that this influence should be negligible. Subsequent experiments with mixed I/O showed that the model could be improved.

The Data

The model was refined by developing a tool, Diskplorer, to collect a full profile of how a disk behaves under all kinds of load. What the tool does is load the disk with reads and writes of different “intensities” (including pure workloads, where one of the dimensions is literally zero) and collect the resulting request latencies. The result is a five-dimensional space showing how disk latency depends on the { read, write } x { iops, bandwidth } values. Drawing such a complex thing on the screen is challenging in itself. Instead, the tool renders a set of flat slices, some examples of which are shown below.

For instance, this is how read request latency depends on the intensity of small reads (challenging disk IOPS capacity) vs intensity of large writes (pursuing the disk bandwidth). The latency value is color-coded, the “interesting area” is painted in cyan — this is where the latency stays below 1 millisecond. The drive measured is the NVMe disk that comes with the AWS EC2 i3en.3xlarge instance.

This drive demonstrates almost perfect half-duplex behavior — increasing the read intensity several times requires roughly the same reduction in write intensity to keep the disk operating at the same speed.

Similarly this is what the plot looks like for yet another AWS EC2 instance — the i3.3xlarge.

This drive demonstrates less than perfect, but still half-duplex, behavior. Increasing read intensity requires writes to slow down much more sharply to keep the required speeds, but the drive is still able to sustain the mixed workload.

Less relevant for ScyllaDB, but still very interesting, is the same plot for a local HDD. The axes and coloring are the same, but the scale differs; in particular, the cyan area outlines the ~100 millisecond region.

This drive is also half-duplex, but with write bandwidth above ~100 MB/s the mixed workload cannot survive, even though a pure write workload at that rate is perfectly possible on it.

The “safety area,” in which we can expect disk request latency to stay below the value we want, is a concave, roughly triangular region with its vertices located at zero and near the disk’s maximum bandwidth and IOPS.

The Math

If we try to approximate the above plots with a linear equation, the result looks roughly like this:

bandwidth_r / B_r + bandwidth_w / B_w + iops_r / O_r + iops_w / O_w ≤ K

where B_r and B_w are the maximum read and write bandwidth values, O_r and O_w are the maximum read and write IOPS values, and K is a constant whose exact value is taken from the collected profile. Taking into account that bandwidth and IOPS are both time derivatives of the respective bytes and ops numbers, i.e.

bandwidth_x = d(bytes_x) / dt

and

iops_x = d(ops_x) / dt

and introducing relations between the maximum bandwidth and IOPS values

M_b = B_r / B_w and M_o = O_r / O_w

the above equation turns into the simpler form

d/dt ( (ops_r + M_o · ops_w) / O_r + (bytes_r + M_b · bytes_w) / B_r ) ≤ K

Let’s measure each request with a two-value tuple, T = {1, bytes} for reads and T = {M_o, M_b · bytes} for writes, and define the normalization operation to be

N(T) = T.ops / O_r + T.bytes / B_r

Now the final equation looks like

d/dt ( Σ N(T) ) ≤ K

Why is this better or simpler than the original one? The thing is that measuring bandwidth (or IOPS) is not simple. If you look at any monitoring tool, be it the plain Linux top command or a sophisticated Grafana stack, you’ll find that all such rate-like values are calculated over some pre-defined time frame, say 1 second. Limiting something that’s only measurable over such a time window is an even more complicated task.

In its final form, the scheduling equation has a great benefit: it only needs to accumulate instantaneous numerical values, the aforementioned request tuples. The only difficulty is that the algorithm must limit not the accumulated value itself, but the rate at which it grows. This task has a well-known solution.

The Code

There’s a well-known algorithm called the token bucket. It originates from telecommunications and networking and is used to make sure that a packet flow doesn’t exceed a configured rate or bandwidth.

In its bare essence, the algorithm maintains a bucket with two inputs and one output. The first input is the flow of packets to be transmitted. The second input is the rate-limited flow of tokens. Rate-limited here means that, unlike packets, which flow into the bucket as they appear, tokens are put into the bucket at some fixed rate, say N tokens per second. The output is the rate-limited flow of packets that have recently arrived in the bucket. Each outgoing packet carries one or more tokens with it, and the number of tokens corresponds to which “measure” is rate-limited. If the token bucket needs to ensure a packets-per-second rate, then each packet takes away one token. If the goal is to limit the output in bytes per second, then each packet must carry as many tokens as its length in bytes.
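As a rough illustration only (not ScyllaDB’s actual Seastar implementation), the core of a token bucket fits in a few dozen lines of Rust; the rate, burst and cost values below are arbitrary placeholders:

use std::time::Instant;

// Minimal token bucket: `rate` tokens flow in per second, up to `burst` capacity.
struct TokenBucket {
    rate: f64,            // tokens added per second (the configured limit)
    burst: f64,           // bucket capacity; bounds how much can be dispatched at once
    tokens: f64,          // tokens currently available
    last_refill: Instant, // when we last added tokens
}

impl TokenBucket {
    fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last_refill: Instant::now() }
    }

    // Add the tokens that have "flowed in" since the previous call.
    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.burst);
        self.last_refill = now;
    }

    // Try to dispatch a request of the given cost (for the I/O scheduler, the
    // cost would be the normalized tuple N(T) and the refill rate would be K).
    // Returns false if the request must stay queued for now.
    fn try_dispatch(&mut self, cost: f64) -> bool {
        self.refill();
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Placeholder numbers: 1000 cost units per second, burst of 100.
    let mut bucket = TokenBucket::new(1000.0, 100.0);
    let dispatched = (0..500).filter(|_| bucket.try_dispatch(1.0)).count();
    println!("dispatched {dispatched} of 500 requests immediately");
}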

This algorithm can be applied to our needs. Remember, the goal is to rate-limit the outgoing requests’ normalized cost tuples. Accordingly, each request is only dispatched when it can take N(T) tokens with it, and tokens flow into the bucket at the rate of K.

In fact, the classical token bucket algorithm had to be slightly adjusted to play nicely with the unpredictable nature of modern SSDs. The thing is that, despite all the precautions and modeling, disks can still slow down unexpectedly. One of the reasons is background garbage collection performed by the FTL. Continuing to push I/O into the disk at the initial rate while it is running its background hygiene should be avoided as much as possible. To compensate for this, we added a feedback loop to the algorithm: tokens that flow into the bucket do not appear from nowhere (even in a strictly rate-limited manner); instead, they originate from another bucket. In turn, this second bucket gets its tokens from the disk itself as requests complete.

The modified algorithm makes sure the I/O scheduler serves requests at a rate that doesn’t exceed either of two values: the one predicted by the model and the one the disk demonstrates under real load.

Results

The results can be seen from two angles. First, does the scheduler do its job and keep the bandwidth and IOPS inside the “safety area” from the initial profile measurements? Second, does this really help, i.e. does the disk show good latencies or not? To check both, we ran a cassandra-stress test over two versions of ScyllaDB: 4.6 with its old scheduler and 5.0 with the new one. There were four runs in a row for each version; the first run populated ScyllaDB with data, and the subsequent runs queried this data back while being “disturbed” by background compaction (because we know an ordinary workload is too easy for a modern NVMe disk to handle). The querying was performed at different client request rates: 10k, 5k and 7.5k requests per second.

Bandwidth/IOPS Management

Let’s first look at how the scheduler maintains I/O flows and respects the expected limitations.

Population bandwidth: ScyllaDB 4.6 (total) vs 5.0 (split, green — R, yellow — W)

On the pair of plots above you can see bandwidth for three different flows — commitlog (write only), compaction (both reads and writes) and memtable flushing (write only). Note that ScyllaDB 5.0 can report read and write bandwidth independently.

ScyllaDB 4.6 maintains a total bandwidth of 800 MB/s, which is the peak of what the disk can do. ScyllaDB 5.0 doesn’t let the bandwidth exceed a somewhat lower value of 710 MB/s, because the new formula requires that the balanced sum of bandwidth and IOPS, not their individual values, stays within the limit.

Similar effects can be seen during the reading phase of the test on the plots below. Since the reading part involves small read requests coming from CQL request processing, this time it’s worth looking not only at the bandwidth plots, but also at the IOPS ones. Also note that this time there are three different parts of the test: the incoming request rate was 10k, 5k and 7.5k, so there are three distinct sections on the plots. And last but not least, the memtable flushing and commitlog classes are not present here, because this part of the test was read-only, with the compaction job running in the background.

Query bandwidth: ScyllaDB 4.6 (total) vs 5.0 (split, green — R, yellow — W)

The first thing to note is that the 5.0 scheduler inhibits the compaction class much more heavily than 4.6 does, letting the query class get more disk throughput, which is one of the goals we wanted to achieve. Second, just as during the population stage, the net bandwidth with the 5.0 scheduler is lower than with 4.6, again because the new scheduler takes the IOPS value into account together with the bandwidth, not separately.

Looking at the IOPS plots, one may notice that 4.6 seems to provide more or less equal IOPS for the different request rates in the query class, while 5.0’s allocated capacity is aligned with the incoming rate. That observation is correct; the explanation is that 4.6 didn’t manage to sustain even the rate of 5k requests per second, while 5.0 was comfortable at 7.5k and only gave up at 10k. That is a clear indication that the new scheduler does provide the expected I/O latencies.

Latency Implications

Seastar keeps track of two latencies — in-queue and in-disk. The scheduler’s goal is to maintain the latter one. The in-queue can grow as high as it wants; it’s up to upper layers to handle it. For example, if the requests happen to wait too long in the queue, ScyllaDB activates a so-called “backpressure” mechanism which involves several techniques and may end up canceling some requests sitting in the queue.

ScyllaDB 4.6 commitlog class delays (green — queue time, yellow — execution time)

ScyllaDB 5.0 commitlog class delays (green — queue time, yellow — execution time)

The scale of the plot doesn’t make it obvious, but the pop-up at the bottom right shows that the in-disk latency of the commitlog class dropped threefold, from 1.5 milliseconds to about half a millisecond. This is a direct consequence of the overall decreased bandwidth seen above.

When it comes to the querying workload, things get even better, because this time the scheduler keeps bulky writes away from the disk when the system needs short, latency-sensitive reads.

ScyllaDB 4.6 query class delays (green — queue time, yellow — execution time)

ScyllaDB 5.0 query class delays (green — queue time, yellow — execution time)

Yet again, the in-queue latencies are orders of magnitude larger than the in-disk ones, so an idea of the latter can be gathered from the pop-ups at the bottom right. Query in-disk latencies dropped twofold thanks to the suppressed compaction workload.

What’s Next

The renewed scheduler will arrive as part of many other great improvements in 5.0. However, some new features will be added on top. For example, the metrics list will be extended to report the accumulated costs each scheduling class has collected so far. Another important feature is adjusting the rate-limiting factor (the K constant) at runtime to account for disks aging out and for any mis-evaluations we might have made. And finally, to make the scheduler usable on slow drives like persistent cloud disks or spinning HDDs, we’ll need to tune the latency goal the scheduler aims at.

Discover More in ScyllaDB V

ScyllaDB V is our initiative to make ScyllaDB, already the most monstrously fast and scalable NoSQL database available, into even more of a monster to run your most demanding workloads. You can learn more about this in three upcoming webinars:

Webinar: ScyllaDB V Developer Deep Dive Series – Performance Enhancements + AWS I4i Benchmarking

August 4, 2022 | Live Online | ScyllaDB

We’ll explore the new IO model and scheduler, which provide fine-tuned balancing of read/write requests based on your disk’s capabilities. Then, we’ll look at how ScyllaDB’s close-to-the-hardware design taps the full power of high-performance cloud computing instances such as the new EC2 I4i instances.

Register Now >

Webinar: ScyllaDB V Developer Deep Dive Series – Resiliency and Strong Consistency via Raft

August 11, 2022 | Live Online | ScyllaDB

ScyllaDB’s implementation of the Raft consensus protocol translates to strong, immediately consistent schema updates, topology changes, tables and indexes, and more. Learn how the Raft consensus algorithm has been implemented, what you can do with it today, and what radical new capabilities it will enable in the days ahead.

Register Now >

Webinar: ScyllaDB V Developer Deep Dive Series – Rust-Based Drivers and UDFs with WebAssembly

August 18, 2022 | Live Online | ScyllaDB

Rust and Wasm are both perfect for ScyllaDB’s obsession with high performance and low latency. Join this webinar to hear our engineers’ insights on how Rust and Wasm work with ScyllaDB and how you can benefit from them.

Register Now >

 

Data Mesh — A Data Movement and Processing Platform @ Netflix

By Bo Lei, Guilherme Pires, James Shao, Kasturi Chatterjee, Sujay Jain, Vlad Sydorenko

Background

Realtime processing technology (a.k.a. stream processing) is one of the key factors that enable Netflix to maintain its leading position in the competition of entertaining our users. Our previous-generation streaming pipeline solution, Keystone, has a proven track record of serving multiple key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that are not yet covered by Keystone. After evaluating the options, the team decided to create Data Mesh as our next-generation data pipeline solution.

Last year we wrote a blog post about how Data Mesh helped our Studio team enable data movement use cases. A year has passed, Data Mesh has reached its first major milestone, and its scope keeps increasing. As a growing number of use cases onboard to it, we have a lot more to share. We will deliver a series of articles that cover different aspects of Data Mesh and what we have learned from our journey. This article gives an overview of the system. The following ones will dive deeper into different aspects of it.

Data Mesh Overview

A New Definition Of Data Mesh

Previously, we defined Data Mesh as a fully managed, streaming data pipeline product used for enabling Change Data Capture (CDC) use cases. As the system evolves to solve more and more use cases, we have expanded its scope to handle not only the CDC use cases but also more general data movement and processing use cases such that:

  • Events can be sourced from more generic applications (not only databases).
  • The catalog of available DB connectors is growing (for example, CockroachDB and Cassandra).
  • More processing patterns are supported, such as filter, projection, union, join, etc.

As a result, today we define Data Mesh as a general purpose data movement and processing platform for moving data between Netflix systems at scale.

Overall Architecture

The Data Mesh system can be divided into the control plane (Data Mesh Controller) and the data plane (Data Mesh Pipeline). The controller receives user requests, deploys and orchestrates pipelines. Once deployed, the pipeline performs the actual heavy lifting data processing work. Provisioning a pipeline involves different resources. The controller delegates the responsibility to the corresponding microservices to manage their life cycle.

Pipelines

A Data Mesh pipeline reads data from various sources, applies transformations on the incoming events and eventually sinks them into the destination data store. A pipeline can be created from the UI or via our declarative API. On the creation/update request the controller figures out the resources associated with the pipeline and calculates the proper configuration for each of them.

Connectors

A source connector is a Data Mesh managed producer. It monitors the source database’s bin log and produces CDC events to the Data Mesh source fronting Kafka topic. It is able to talk to the Data Mesh controller to automatically create/update the sources.

Previously we only had RDS source connectors to listen to MySQL and Postgres using the DBLog library; now we have added CockroachDB source connectors and Cassandra source connectors. They use different mechanisms to stream events out of the source databases. We’ll publish blog posts that dive deeper into them.

In addition to managed connectors, application owners can emit events via a common library, which can be used in circumstances where a DB connector is not yet available or there is a preference to emit domain events without coupling with a DB schema.

Sources

Application developers can expose their domain data in a centralized catalog of Sources. This allows data sharing as multiple teams at Netflix may be interested in receiving changes for an entity. In addition, a Source can be defined as a result of a series of processing steps — for example an enriched Movie entity with several dimensions (such as the list of Talents) that further can be indexed to fulfill search use cases.

Processors

A processor is a Flink Job. It contains a reusable unit of data processing logic. It reads events from the upstream transports and applies some business logic to each of them. An intermediate processor writes data to another transport. A sink processor writes data to an external system such as Iceberg, ElasticSearch, or a separate discoverable Kafka topic.

We have provided a Processor SDK to help advanced users develop their own processors. Processors developed by Netflix developers outside our team can also be registered to the platform and work with other processors in a pipeline. Once a processor is registered, the platform automatically sets up a default alert UI and metrics dashboard.

Transports

We use Kafka as the transportation layer for the interconnected processors to communicate. The output events of the upstream processor are written to a Kafka topic, and the downstream processors read their input events from there.

Kafka topics can also be shared across pipelines. A topic in pipeline #1 that holds the output of its upstream processor can be used as the source in pipeline #2. We frequently see use cases where some intermediate output data is needed by different consumers. This design enables us to reuse and share data as much as possible. We have also implemented the features to track the data lineage so that our users can have a better picture of the overall data usage.

Schema

Data Mesh enforces schema on all the pipelines, meaning we require all events passing through the pipelines to conform to a predefined template. We’re using Avro as a shared format for all our schemas, as it’s simple, powerful, and widely adopted by the community.

We make schema a first-class citizen in Data Mesh for the following reasons:

  • Better data quality: Only events that comply with the schema can be encoded, which gives consumers more confidence.
  • Finer granularity of data lineage: The platform is able to track how fields are consumed by different consumers and surface it on the UI.
  • Data discovery: Schema describes data sets and enables the users to browse different data sets and find the dataset of interest.

On pipeline creation, each processor in that pipeline needs to define what schema it consumes and produces. The platform handles the schema validation and compatibility check. We have also built automation around handling schema evolution. If the schema is changed at the source, the platform tries to upgrade the consuming pipelines automatically without human intervention.

Future

Data Mesh initially started as a project to solve our Change Data Capture needs. Over the past year, we have observed increasing demand for all sorts of needs in other domains such as machine learning, logging, etc. Today, Data Mesh is still in its early stages, and there are many interesting problems yet to be solved. Below are the highlights of some of the high-priority tasks on our roadmap.

Making Data Mesh The Paved Path (Recommended Solution) For Data Movement And Processing

As mentioned above, Data Mesh is meant to be the next generation of Netflix’s real-time data pipeline solution. As of now, we still have several specialized internal systems serving their own use cases. To streamline the offering, it makes sense to gradually migrate those use cases onto Data Mesh. We are currently working hard to make sure that Data Mesh can achieve feature parity with Delta and Keystone. In addition, we also want to add support for more sources and sinks to unlock a wide range of data integration use cases.

More Processing Patterns And Better Efficiency

People use Data Mesh not only to move data; they often also want to process or transform their data along the way. Another high-priority task for us is to make more common processing patterns available to our users. Since a processor is by default a Flink job, having each simple processor do its work in its own Flink job can be inefficient. We are also exploring ways to merge multiple processing patterns into a single Flink job.

Broader support for Connectors

We are frequently asked by our users whether Data Mesh can get data out of datastore X and land it in datastore Y. Today we support certain sources and sinks, but it’s far from enough. The demand for more types of connectors is enormous; we see a big opportunity ahead of us, and that’s definitely something we want to invest in.

Data Mesh is a complex yet powerful system. We believe that as it gains its maturity, it will be instrumental in Netflix’s future success. Again, we are still at the beginning of our journey and we are excited about the upcoming opportunities. In the following months, we’ll publish more articles discussing different aspects of Data Mesh. Please stay tuned!

The Team

Data Mesh wouldn’t be possible without the hard work and great contributions from the team. Special thanks should go to our stunning colleagues:

Bronwyn Dunn, Jordan Hunt, Kevin Zhu, Pradeep Kumar Vikraman, Santosh Kalidindi, Satyajit Thadeshwar, Tom Lee, Wei Liu


Data Mesh — A Data Movement and Processing Platform @ Netflix was originally published in the Netflix TechBlog on Medium.

Instaclustr support for Apache Cassandra 3.0, 3.11 and 4.0

This blog provides a reference to Instaclustr’s support for Apache Cassandra and can be used to understand each version’s status. The table below displays the dates on which the Apache Cassandra project will end support for the corresponding major versions. Instaclustr will provide our customers with extended support for these versions for 12 months beyond the Apache project’s dates. During the extended support period, Instaclustr’s support will be limited to critical fixes only.

The Apache Cassandra versions supported on Instaclustr’s Managed Platform provided in this blog are also applicable for our Support-only customers.

The Apache Cassandra project has put in an impressive effort to support these versions thus far, and we’d like to take the opportunity to thank all the contributors and maintainers of these older versions.

Our extended support is provided to enable customers to plan their migrations with confidence. Apache Cassandra® 4.0.4 and 3.11.13 are now available on the Instaclustr platform. Read more on the new features and bug fixes here.

If you have any further concerns or wish to discuss migration, please get in touch with our Support team.

The post Instaclustr support for Apache Cassandra 3.0, 3.11 and 4.0 appeared first on Instaclustr.

NoSQL Summer Blockbusters: From Wrangling Rust to HA Trial by Fire

With summer blockbusters returning in full force this year, maybe you’re on a bit of a binge-watching, popcorn-munching kick? If your tastes veer towards the technical, we’ve got a marquee full of on-demand features that will keep you on the edge of your seat:

  • Escaping Jurassic NoSQL
  • Wrangling Rust
  • Daring database engineering feats
  • High availability trial by fire
  • Breaking up with Apache Cassandra

Here’s an expansive watchlist that crosses genres…

ScyllaDB Engineering

Different I/O Access Methods For Linux, What We Chose For ScyllaDB, And Why

When most server application developers think of I/O, they consider network I/O since most resources these days are accessed over the network: databases, object storage, and other microservices. However, the developer of a database must also consider file I/O.

In this video, ScyllaDB CTO Avi Kivity provides a detailed technical overview of the available choices for I/O access and their various tradeoffs. He then explains why ScyllaDB chose asynchronous direct I/O (AIO/DIO) as the access method for our high-performance low latency database and reviews how that decision has impacted our engineering efforts as well as our product performance.

Avi covers:

  • Four choices for accessing files on a Linux server: read/write, mmap, Direct I/O (DIO) read/write, and asynchronous direct I/O (AIO/DIO)
  • The tradeoffs among these choices with respect to core characteristics such as cache control, copying, MMU activity, and I/O scheduling
  • Why we chose AIO/DIO for ScyllaDB and a retrospective on that decision seven years later

WATCH NOW

Understanding Storage I/O Under Load

There’s a popular misconception about I/O that (modern) SSDs are easy to deal with; they work pretty much like RAM but use a “legacy” submit-complete API. And other than keeping in mind a disk’s possible peak performance and maybe maintaining priorities of different IO streams there’s not much to care about. This is not quite the case – SSDs do show non-linear behavior and understanding the disk’s real abilities is crucial when it comes to squeezing as much performance from it as possible.

Diskplorer is an open-source disk latency/bandwidth exploring toolset. By using Linux fio under the hood it runs a battery of measurements to discover performance characteristics for a specific hardware configuration, giving you an at-a-glance view of how server storage I/O will behave under load. ScyllaDB CTO Avi Kivity shares an interesting approach to measuring disk behavior under load, gives a walkthrough of Diskplorer and explains how it’s used.

With the elaborated model of a disk at hand, it becomes possible to build latency-oriented I/O scheduling that cherry-picks requests from the incoming queue while keeping the disk load perfectly balanced. ScyllaDB engineer Pavel Emelyanov also presents the scheduling algorithm developed for the Seastar framework and shares results achieved using it.

WATCH NOW

ScyllaDB User Experiences

Eliminating Volatile Latencies: Inside Rakuten’s NoSQL Migration

Patience with Apache Cassandra’s volatile latencies was wearing thin at Rakuten, a global online retailer serving 1.5B worldwide members. The Rakuten Catalog Platform team architected an advanced data platform – with Cassandra at its core – to normalize, validate, transform, and store product data for their global operations. However, while the business was expecting this platform to support extreme growth with exceptional end-user experiences, the team was battling Cassandra’s instability, inconsistent performance at scale, and maintenance overhead. So, they decided to migrate.

Watch this video to hear Hitesh Shah’s firsthand account of:

  • How specific Cassandra challenges were impacting the team and their product
  • How they determined whether migration would be worth the effort
  • What processes they used to evaluate alternative databases
  • What their migration required from a technical perspective
  • Strategies (and lessons learned) for your own database migration

WATCH NOW

Real-World Resiliency: Surviving Datacenter Disaster

Disaster can strike even the most seemingly prepared businesses. In March 2021, a fire completely destroyed a popular cloud datacenter in Strasbourg, France, run by OVHcloud, one of Europe’s leading cloud providers. Because of the fire, millions of websites went down for hours and a tremendous amount of data was lost.

One company that fared better than others was Kiwi.com, the popular European travel site. Despite the loss of an entire datacenter, Kiwi.com was able to continue operations uninterrupted, meeting their stringent Service Level Agreements (SLAs) without fail.

What was the secret to their survival while millions of other sites caught in the same datacenter fire went dark?

Watch this video to learn about Kiwi.com’s preparedness strategy and get best practices for your own resiliency planning, including:

  • Designing and planning highly available services
  • The Kiwi.com team’s immediate disaster response
  • Long-term recovery and restoration
  • Choosing and implementing highly resilient systems

WATCH NOW

NoSQL Trends and Best Practices

Beyond Jurassic NoSQL: New Designs for a New World

We live in an age of rapid innovation, but our infrastructure shows signs of rust and old age.  Under the shiny exterior of our new technology, some old and dusty infrastructure is carrying a weight it was never designed to hold.  This is glaringly true in the database world, which remains dominated by databases architected for a different age. These legacy NoSQL databases were designed for a different (nascent) cloud, different hardware, different bottlenecks, and even different programming models.

Avishai Ish-Shalom shares his insights on the massive disconnect between the potential of a brave new distributed world and the reality of “modern” applications relying on fundamentally aged databases. This video covers:

  • The changes in the operating environment of databases
  • The changes in the business requirements for databases
  • The architecture gap of legacy databases
  • New designs for a new world
  • How the adoption of modern databases impacts engineering teams and the products you’re building

WATCH NOW

Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline

Numberly operates business-critical data pipelines and applications where failure and latency means “lost money” in the best-case scenario. Most of their data pipelines and applications are deployed on Kubernetes and rely on Kafka and ScyllaDB, with Kafka acting as the message bus and ScyllaDB as the source of data for enrichment. The availability and latency of both systems are thus very important for data pipelines. While most of Numberly’s applications are developed using Python, they found a need to move high-performance applications to Rust in order to benefit from a lower-level programming language.

Learn the lessons from Numberly’s experience, including:

  • The rationale for selecting a lower-level language
  • Strategies for developing using a lower-level Rust code base
  • Observability and analyzing latency impacts with Rust
  • Tuning everything from Apache Avro to driver client settings
  • How to build a mission-critical system combining Apache Kafka and ScyllaDB
  • Feedback from 6 months of Rust in production

WATCH NOW

Latency and Consistency Tradeoffs in Modern Distributed Databases

Just over 10 years ago, Dr. Daniel Abadi proposed a new way of thinking about the engineering tradeoffs behind building scalable, distributed systems. According to Dr. Abadi, this new model, known as the PACELC theorem, comes closer to explaining the design of distributed database systems than the well-known CAP theorem.

Watch this video to hear Dr. Abadi’s reflections on PACELC ten years later, explore the impact of this evolution, and learn how ScyllaDB Cloud takes a unique approach to support modern applications with extreme performance and low latency at scale.

WATCH NOW

Getting the Most out of ScyllaDB University LIVE Summer School 2022

ScyllaDB University LIVE Summer Session for 2022 is right around the corner, taking place on Thursday, July 28th, 8AM-12PM PDT!

REGISTER NOW

As a quick reminder, ScyllaDB University LIVE is a FREE, half-day, instructor-led training event, led by our top engineers and architects. We will have two parallel tracks: a getting-started track focused on NoSQL and ScyllaDB essentials, and a track covering advanced ScyllaDB strategies and best practices. Across the tracks, we’ll have hands-on training and introduce new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with ScyllaDB experts and network with other users.

Participants who complete the training will have access to more free, online, self-paced learning material such as our hands-on labs on ScyllaDB University.

Additionally, those that complete the training will be able to get a certification and some cool swag!

Detailed Agenda and How to Prepare

You can prepare for the event by taking some of our courses at ScyllaDB University. They are completely free and will allow you to better understand ScyllaDB and how the technology works.

Here are the sessions on the ScyllaDB University LIVE agenda, organized by track, along with the topics each one covers. The related ScyllaDB University lessons are worth exploring in advance.

Essentials Track

  • ScyllaDB Essentials: Intro to Scylla, basic concepts, Scylla architecture, hands-on demo
  • ScyllaDB Basics: Basic data modeling, definitions, data types, primary key selection, clustering key, compaction, ScyllaDB drivers overview
  • Build Your First ScyllaDB-Powered App: A hands-on example of how to create a full-stack app powered by Scylla Cloud and implement CRUD operations using NodeJS and Express

Advanced Track

  • Kafka and Change Data Capture (CDC): What is CDC? Consuming data, under the hood, hands-on example
  • Running ScyllaDB on Kubernetes: ScyllaDB Operator, ScyllaDB deployment, Alternator deployment, maintenance, hands-on demo, recent updates
  • Advanced Topics in ScyllaDB: Collections, UDT, MV, secondary indexes, prepared statements, paging, retries, sizing, TTL, troubleshooting

REGISTER NOW FOR SCYLLADB UNIVERSITY LIVE

Benchmarking Petabyte-Scale NoSQL Workloads with ScyllaDB

With the rise of real-time applications reading and writing petabytes of data daily, it’s not surprising that database speed at scale is gaining increased interest. Even if you’re not planning for growth, a surge could occur when you’re least expecting it. Yet scaling latency-sensitive data-intensive applications is not trivial. Teams often learn all too late that the database they originally selected is not up to the task.

Benchmarks performed at petabyte scale can help you understand how a particular database handles the extremely large workloads that your company expects (or at least hopes) to encounter. However, such benchmarks can be challenging to design and execute.

At ScyllaDB, we performed a foundational petabyte benchmark of our high-performance, low-latency database for a number of reasons:

  • To help the increasing number of ScyllaDB users and evaluators with petabyte-scale use cases understand if our database is aligned with their requirements for speed at scale.
  • To establish a baseline against which to measure the performance improvements achieved with the new series of  ScyllaDB V releases and the latest AWS EC2 instances, such as the powerful I4i family.
  • To quantify the latency impact of ScyllaDB Enterprise’s unique workload prioritization capability, which allows admins to allocate “shares” of the available hardware infrastructure to different workloads.

GET SCYLLADB CEO DOR LAOR’S PERSPECTIVE ON THE LATEST RELEASE

This blog provides a summary of the results. If you want to drill down deeper, take a look at the related paper that outlines the configuration process, results (with and without workload prioritization), and lessons learned that might benefit others planning their own petabyte-scale benchmark.

A Quick Look at the Petabyte Benchmark Results

ScyllaDB stored a 1 PB data set using only 20 large machines running two side-by-side mixed workloads at 7.5 million operations per second and single-digit millisecond latency.

The results reflect a storage density of 50 TB/server, which is unparalleled in the industry. The small number of servers contributes to a low total cost of ownership (TCO) as well as operational simplicity.

The ScyllaDB cluster achieved single-digit millisecond P99 latency with 7M TPS

In this benchmark, ScyllaDB demonstrated workload prioritization — one of its unique features that allows users to assign priorities per workload. Workload prioritization enables cluster consolidation and offers an additional level of savings. In this setup, a smaller – but highly important – workload of 1TB was hosted on the same mega 1 PB deployment. Traditionally, such a workload can get starved, since it is 1000x smaller than the large workload. Moreover, the large workload was dominating the cluster with 7M TPS while the smaller one had “only” 280K TPS. Nevertheless, when assigned a priority, the smaller workload reduced its latency by half, to only 1-2 ms for its P99.

To summarize, ScyllaDB allows you to scale to any workload size, and to consolidate multiple workloads into a single operational cluster.

Dive into the Details…and Lessons Learned

But that’s just the tip of the iceberg. If you’re serious about petabyte-scale performance, details about how we got these results – and how you can conduct your own petabyte-scale benchmarks – are critical. That’s where the paper, What We Learned Benchmarking Petabyte-Scale Workloads with ScyllaDB, comes in. Get the PDF for:

  • Details on how we configured the petabyte-scale benchmark
  • A deeper look at workload prioritization and its impact
  • Tips and tricks that might benefit others planning their own petabyte-scale benchmark

READ THE SCYLLADB PETABYTE SCALE BENCHMARK

ScyllaDB V: NoSQL Innovations for Extreme Scale


ScyllaDB is recognized for a number of things. Our KVM roots. A database built on Seastar’s shard-per-core architecture that enables unprecedented performance with unparalleled hardware efficiency. Tackling the toughest database engineering challenges with the world’s top infrastructure engineers. And, of course, the adorable one-eyed monster.

Today, I’m excited to share how we’re taking that mission to new levels with ScyllaDB V: the latest evolution of our monstrously fast and scalable NoSQL database. It represents the fifth generation of our distributed database architecture.

ScyllaDB 5.0 is the first milestone of ScyllaDB V. It introduces a host of functional, performance, and stability improvements. First and foremost, we integrated Raft to provide consistent manageability and consistent operations. We are also using a new IO scheduler that allows us to scale the cluster and cope with any failure mode with a mere 3 ms increase in P99 latency, a real breakthrough after two years of development. Beyond that, ScyllaDB V introduces additional enhancements that make ScyllaDB an even more solid and robust database.

JOIN THE RELEASE WEBINAR

GET STARTED NOW

The evolution to ScyllaDB V: NoSQL innovations at extreme scale

I: Performance

First generation ScyllaDB focused on raw, brute-force performance. Our original aim was to utilize all the cores of a modern CPU with a minimal performance penalty. Before writing the first line of code, we set a goal of 1 million operations per second per server, and we accomplished it when we emerged from stealth mode in late 2015.

Performance remains a focus and we continue to be excited about it. Today, ScyllaDB can run a petabyte-scale deployment with 20 i3en.metal AWS instances doing millions of transactions – all at single-digit millisecond P99 latency.

II: Cassandra Parity

Second generation ScyllaDB was about reaching full API compatibility with the popular Apache Cassandra database. Among many other issues, we solved the garbage collection and compaction challenges that plague Cassandra to this day. This certainly has not gone unnoticed or unappreciated; many of our users came to ScyllaDB with a single request: ‘get me out of Java/GC.’

III: Into the Cloud

Third generation ScyllaDB moved into the cloud, providing the industry’s fastest and most cost-efficient NoSQL database-as-a-service (DBaaS). Dev teams at industry-leading companies like Disney+ Hotstar and Instacart appreciate ScyllaDB Cloud’s speed and scale – without the hassle and toil of self-management. Given the simple startup and seamless scaling, it’s not surprising that usage of ScyllaDB Cloud has surged 198% year over year.

IV: Evolving the Ecosystem

With this accomplished, we turned to making ScyllaDB the best database to integrate into your overall data ecosystem. One of the key revolutions of the past few years has been the relentless drive toward event streaming. Integrating databases with event streaming was initially difficult with existing tech stacks. In response, we provided a shard-aware Kafka sink connector. Then, to make ScyllaDB an efficient event streaming producer, we implemented Change Data Capture (CDC) with an elegant solution for a complex problem. It pairs perfectly with our Kafka Source connector based on Debezium. We also added event streaming to our Cassandra-compatible CQL interface and as Alternator Streams for our DynamoDB-compatible API. Additionally, we delivered a more efficient implementation of Lightweight Transactions (LWT), an entirely new compaction strategy, and other features.

V: Innovations for Extreme Scale

Now we turn the page on a new chapter. ScyllaDB V is focused on innovations for the extreme scale that is driving this next tech cycle.

The Raft consensus algorithm lies at the heart of the ScyllaDB V transition. ScyllaDB 5.0 utilizes Raft to provide transactional schema changes. No more schema conflicts! Subsequent ScyllaDB 5.x releases will allow transactional topology changes, which will simplify operations and improve elasticity. Eventually, Raft will be used to provide immediate consistency instead of the good old eventual consistency.

In ScyllaDB V, the operator is king. The new IO scheduler smooths out the latency created by the enormous streaming load of terabytes traveling between the nodes. Repair, streaming, decommission, failures – all their impacts are now unnoticeable.

In addition, the introduction of repair-based node operations makes node replacement restartable and faster. We also committed an exciting new per-partition rate limit feature to the master branch, allowing you to control access to your database. Traditionally, there were cases where too much load could be pushed into a database (for example, malicious bots could overload individual partitions). With ScyllaDB V, your database is protected and can avoid global and local overloads.
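
As a rough sketch of what that per-partition rate limit looks like at the CQL level, the statement below caps reads and writes per partition on a hypothetical table. Since the feature had only been committed to the master branch at the time of writing, treat the option name and syntax as illustrative rather than final:

-- Hypothetical table; limits are enforced per partition, so a single hot
-- partition (for example, one hammered by a bot) cannot overload the node.
ALTER TABLE ks.user_events WITH per_partition_rate_limit = {
  'max_writes_per_second': 100,
  'max_reads_per_second': 200
};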

Given ScyllaDB’s obsession with high throughput and low latency, we can’t overlook performance. We’ll share all the latest benchmark results in the coming weeks. As a little teaser, here’s the performance comparison of two 3-node, 16 vCPU clusters: one running ScyllaDB 4.4 on AWS i3.4xlarge and the other ScyllaDB 5.0 on the new AWS i4i.4xlarge. Notice the improvement in both throughput and latency.


Why ScyllaDB V?

So why did we decide to focus on these innovations for extreme scale? Just look at what’s driving this next tech cycle:

  • Infrastructure has advanced substantially, offering tremendous power and performance to applications that can take advantage of it.
  • Applications need to work at previously unimaginable scale, stay fast, and be available all the time. They need to work flawlessly across a dizzying array of environments and conditions. And the teams building them need the ability to move seamlessly from MVP to global scale – and to rapidly evolve business-critical applications in production.
  • Data-intensive applications from food delivery, to fitness tracking apps, to communication platforms are now woven into the fabric of our lives. Data is involved in virtually everything that we do.
  • NFTs, cryptocurrency, distributed ledger technology, and the metaverse are taking distributed applications to a new level.

This not only means more data, but also new pressures on the database. Organizations are now performing up to 100x more queries than before, on data sets that are often 10x larger. Data is being enriched, cleaned, streamed, fed into AI/ML pipelines, replicated, and cached from multiple sources. The more data you have, the more you use it, and that means more opportunity to gain advantages from data in this new world of ours.

What does this mean for your database latency? If you have 100x the queries, P99 becomes P36 (with 100 queries needed to fulfill one app response, the per-query P99 is taken to the power of 100). Things break at scale. And costs skyrocket.
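
To spell out the arithmetic behind that claim: if each of the 100 queries in a response independently meets its per-query P99 latency bound with probability 0.99, then the probability that the entire response does is 0.99^100 ≈ 0.366. In other words, only about 36% of end-to-end responses stay within the per-query bound, which is where “P99 becomes P36” comes from.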

That’s why it’s more important than ever to have a database that’s up to the task. And that’s the driving force behind ScyllaDB V. To help fast-growing, fast-moving teams deliver lightning-fast experiences at extreme scale, this generation of ScyllaDB focuses on innovations that:

  • Extract every ounce of power from the latest and greatest infrastructure for better performance, fewer admin headaches, and lower costs
  • Enable massive clusters to be doubled or tripled almost instantly when demand surges
  • Make it feasible to manage thousands of clusters, use meganodes or tiny pods, and do further cluster consolidation and table consolidation within a cluster
  • Remove the performance penalties of working with large partitions and reversed queries, and of achieving the stronger consistency required by many use cases

For more details on how we’re achieving these and other innovations for extreme scale, I invite you to watch Avi Kivity (co-founder and CTO) share his take in the following video:

Also, see the ScyllaDB V page for a high-level overview of key features.

INSIDE SCYLLADB V

I hope that you share my excitement about the many opportunities that ScyllaDB V unlocks. I’m eager to see all the great things that our users accomplish with it. As always, please don’t hesitate to contact us on Slack or through your representatives if you have any questions or feedback!


Upgrades to Our Internal Monitoring Pipeline: Upgrading our Instametrics Cassandra cluster from 3.11.6 to 4.0

In this series of blogs, we have been exploring the various ways we pushed our metrics pipeline (mainly our Apache Cassandra® cluster named Instametrics) to the limit, and how we went about reducing the load it was experiencing on a daily basis. In this blog, we go through how our team tackled a zero-downtime upgrade of our internal cluster from Cassandra 3.11.6 to Cassandra 4.0, so that they could gain real production experience and expertise with the upgrade process. We were even able to utilize our Kafka-based monitoring pipeline to perform some blue-green testing, giving us greater confidence in the stability of the Cassandra 4.0 release.

One of the major projects for our Cassandra team over the last 12 months has been preparing for the release of Cassandra 4.0. This release brought built-in auditing out of the box, significant improvements to streaming speed, Java 11 compatibility, and more. However, before the version was deployed onto our customers’ mission-critical production clusters, we needed to be confident that it would deliver the same level of performance and reliability as the 3.x releases.

The first thing we did was make beta releases available on our managed platform to trial before the project officially released 4.0. This meant that customers could provision a 4.0 cluster and perform initial integration testing of their applications.

It is worth noting that as part of introducing the Cassandra 4.0 beta to the Instaclustr managed platform we did have to do some work on upgrading our tooling. This included ensuring that metrics collection continued to operate correctly, accounting for any metrics which had been renamed, as well as updates to our subrange repair mechanism. We also submitted some patches to the project in order to make the Cassandra Lucene plugin 4.0 compatible. 

We knew that as we got closer to the official project release we would want to have real-world experience upgrading, running, and supporting a Cassandra 4.0 cluster under load. For this, we turned to our own internal Instametrics Cassandra cluster, which stores all of our metrics for all our nodes under management. 

This was a great opportunity to put our money where our mouth was when it came to Cassandra 4.0 being ready for customer use. However, our Cassandra cluster is a critical part of our infrastructure, so we needed to be sure that our upgrade method would cause no application downtime.

How We Configured Our Test Environment

Like most other organizations, Instaclustr maintains a separate but identical environment for our developers to test their changes in before they are released to our production environment. Our production environment supports around 7,000 instances, whereas our test environment usually runs around 70 developer test instances. So, whilst it is functionally identical, the load is not.

Part of this is a duplication of the Instametrics Cassandra Cluster, albeit at a much smaller scale of 3 nodes.

Our plan for testing in this environment was reasonably straightforward:

  1. Create an identical copy of our cluster by restoring from backup to a new cluster using the Instaclustr Managed Platform
  2. Upgrade the restored cluster from 3.x to 4.0, additionally upgrading the Spark add-on
  3. Test our aggregation Spark jobs on the 4.0 cluster, as well as reading and writing from other applications which integrate with the Cassandra cluster
  4. Switch over the test environment from using the 3.x cluster to the 4.0 cluster

Let’s break down these steps slightly, and outline why we chose to do each one.

The first step is all about having somewhere to test, and break, things without affecting the broader test environment. This allowed our Cassandra team to take their time diagnosing any issues without blocking the broader development team, who may require a working metrics pipeline in order to progress their other tickets. The managed platform’s automated restore makes this a trivial task, and means we can test on the exact schema and data of our original cluster.

When it came to upgrading the cluster from 3.x to 4.0, we discussed the best methodology for upgrading Cassandra clusters with our experienced Technical Operations team. The team has handled both major and minor version bumps with Cassandra, and outlined the methodology used on our managed platform, which meant we could confirm that our process would still be applicable to the 4.0 upgrade. We were aware that the schema setting “read_repair_chance” had been removed as part of the 4.0 release, so we updated our schema accordingly.
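
For illustration, that kind of pre-upgrade schema change amounts to zeroing out the read repair options on the 3.x cluster before the version bump, since those table options no longer exist in Cassandra 4.0. The keyspace and table names below are hypothetical stand-ins rather than our actual Instametrics schema:

-- Run against the 3.x cluster before upgrading; repeat for each table
-- that still carries these (now removed) options.
ALTER TABLE metrics.raw_metrics
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.0;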

Finally, it was time to check that our applications, and their various Cassandra drivers, would continue to operate when connecting to Cassandra 4.0. Cassandra 4.0 has upgraded to using native protocol version 5, which some older versions of Cassandra drivers could not communicate with. 

A small amount of work was required to upgrade our metric aggregation Spark jobs to work with a newer version of Spark, which in turn was needed to upgrade our Spark Cassandra driver. Otherwise, all of our other applications continued to work without any additional changes. These included applications using the Java, Python, and Clojure drivers.

Once we had completed our integration testing in our test environment, we switched over all traffic in the test environment from our 3.x cluster to the new 4.0 cluster. We did not copy over any changes that had been applied to the 3.x cluster between restoring the backup and switching over our applications; this was acceptable only because it was a test environment and the data was not of high importance.

We left this as the cluster used by our test environment, in order to see whether any issues would slowly be uncovered over an extended period, and began working on our plan for upgrading our production cluster to the 4.0 release.

Although we had initially intended to upgrade our Cassandra cluster to 4.0.0 as soon as it was available, a nasty bug in the 4.0.0 release led us to delay until a patch release was available.

The silver lining was that, thanks to additional work on our metrics pipeline being deployed, we had extra options for testing a production load on a Cassandra 4.0 cluster. As we covered in an earlier blog, we had deployed a Kafka architecture in our monitoring pipeline, including a microservice that writes all metrics to our Cassandra cluster.

What this architecture allows us to do is have many groups of applications consume the same stream of metrics from Kafka, with minimal performance impact. We have already seen how we had one consumer writing these metrics to Cassandra, and another writing them to a Redis cache.

So, what’s the big benefit here? Well, we can duplicate the writes to a test Cassandra cluster while we perform the upgrade, effectively giving us the ability to rehearse the upgrade on our actual live production data, with no risk of console downtime or customer issues! All we have to do is create an additional Cassandra cluster and an additional group of writer applications.

So, we deployed an additional 3.11.6 Cassandra cluster configured identically to our existing Instametrics Cassandra cluster, and applied the same schema. We then configured a new group of writer applications to perform the exact same operations, on the same data, as our “live” production Cassandra cluster. We also set up a cassandra-stress instance to put a small read load on the cluster. We then left this running for a number of days in order to build up an appropriate amount of data in the cluster for the upgrade.

The Test Upgrade

Now came the fun part! Upgrading our test Cassandra cluster from 3.x to 4.0 while under load. We applied our upgrade procedure, paying careful attention to any application-side errors from either our writers or cassandra-stress. By design, Cassandra should be able to perform any upgrade, major or minor, without any application-side errors if your data model is designed correctly.

We did not experience any issues or errors during our test upgrade, and the process was a success! We did see slightly elevated CPU usage and OS load during the upgrade, but that is to be expected due to upgrading SSTables, running repairs, and a reduction in available nodes during the upgrade.

In order to gain further confidence in the system, we also left this configuration running for a number of days. This was to ascertain whether there were any performance or other issues with longer-running operations such as compactions or repairs. Again, we did not see any noticeable impacts or performance drops across the two clusters when running side by side.

The Real Upgrade

Filled with confidence in our approach, we set out to apply the exact same process to our live production cluster, with much the same effect. There was no application downtime, and no issues with any operation during the upgrade process. Keeping a watchful eye on the cluster longer term, we did not see any application-side latency increases, or any issues other than the elevated CPU usage and OS load we saw on the test cluster.

Once completed, we removed our additional writer infrastructure that had been created.

Wrapping Up

It has now been a number of months since we upgraded our Cassandra cluster to 4.0.1, and we have not experienced any performance or stability issues. Cassandra 4.0.1 continues to be our recommended version that customers should be using for their Cassandra workloads.
