ScyllaDB Developer Hackathon: Docker-ccm

Our most recent ScyllaDB developer conference and hackathon was a virtual event. As a hackathon team, we wanted to scratch one of our own itches and make testing a simpler, more straightforward task. While many people these days are focusing on Kubernetes, we’ll explain below why we implemented this in Docker and are not aiming for Kubernetes.

What’s CCM (a.k.a history lesson)

Cassandra Distributed Tests (known as dtests, which you can find at cassandra-dtest) are the heart of functional testing for Cassandra and Scylla. They are based on the Cassandra Cluster Manager (CCM). CCM is also used in driver integration tests, to run them against a real Cassandra/Scylla cluster.

CCM is the script/library that allows a user to create, manage, and remove a cluster. In Cassandra’s case, CCM can download a Cassandra version from a git repository or from the official tarballs.

For the past few years we only supported using CCM out of the Scylla source tree, meaning you had to first compile Scylla locally to get dtests working. Later we had dbuild, which helped with the building part, but it was still a long process.

Then came the relocatable packages, which eased the process, but the housekeeping of handling a few running clusters at the same time was still an issue for our CI. It was also a constant struggle for any new team member joining (how to get a version, and how to set it up correctly for tests to run).

Why Docker?

Ever since I first saw cassandra-dtest, our fork of it, and our fork of CCM (scylla-ccm), I thought Docker could make it a much more pleasant experience for those who are using it for testing.

Packaging was a real struggle, from where to store it on S3, to how to look up the correct version you want, to how to squeeze the poor install script into putting the different parts into place.

And all that from three different tarballs (which later became one); it’s a very fragile process.

One docker pull command from Docker Hub eliminates almost all of it. All the packages are nicely installed and configured by the battle-proven package manager, and not by the assortment of hacks we had in place.
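For illustration, grabbing a specific Scylla release image (the tag here is just an example) is a single command:

docker pull scylladb/scylla:4.1.8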

With our current approach (a relocatable package, or running out of the source tree), every Scylla “node” is a bunch of local processes that need a lot of attention and housekeeping: selecting local addresses, making sure ports won’t conflict, and keeping tabs on each process ID.

Docker is a perfect match for those things, since each container has its own network address and you can treat it almost as a real remote machine.

A nice side effect is helping newcomers get up and running fast, since it also removes some of the dependencies we need to install, like Java and friends. Trading those for a working Docker installation might prove a bit more problematic (yes, Fedora, we are looking at you…), but we tend to use Docker more and more in our daily routine.

Why Not Kubernetes?

When we came up with the idea, some people suggested we might also support Kubernetes. In theory that sounds like a fantastic idea, but incorporating scylla-operator into the mix comes with a long list of challenges.

Since CCM is mainly used for functional tests, you want the test to control what is going on during the test. The nature of a Kubernetes operator defies the notion that you are in control, since it keeps doing what it’s built for: reconciling your cluster to the requirements you’ve set.

By the way, don’t worry, we are working very hard on testing the scylla-operator, but in the context of scylla-cluster-tests, where we aim for more realistic environments and run it, for example, on top of GKE with bigger, more taxing loads. That whole effort is worth its own blog post (or even a series of them).

Team

Since it was a pain mostly felt by the testing team, I had Fabio Gelcer, Shlomi Balalis, Alex Bykov and Oren Efraimov, all from our QA team, Yaron Kaikov from our release engineering team, and Aleksandar Janković from the Scylla Manager team joining me.

Using Docker-ccm

We use CCM mainly in two ways: either via the command line, or via environment variables when invoking dtest runs.

Command-line usage now supports a --docker option (instead of the previous --version option):

ccm create scylla-docker-cluster -n 3 --scylla --docker scylladb/scylla:4.1.8

Similarly, for dtest runs, a new environment variable was introduced:

SCYLLA_DOCKER_IMAGE=scylladb/scylla:latest
nosetest ...

We are working on a dedicated branch for that effort:

https://github.com/scylladb/scylla-ccm/tree/docker-ccm

We got lots of tests up and running in dtest; for example, a 1.5-hour suite of 256 sstableloader tests, run straight out of the IDE.

During the process we integrated with GitHub Actions to run integration tests on each PR (something we didn’t yet have for our CCM fork).

Some PRs are still open (on the `docker-ccm` branch).

We played with the idea of also adding scylla-manager support to the mix, but it didn’t make it in the three days we had.

Future Plans

The main target for this was using it in our dtest suite, which has around 1,800 tests that run daily. So we need to vet our work with all of those tests. That’s our next target to achieve before we think about switching all of our test jobs to use our new Docker based CCM.

We’re also considering switching our Continuous Integration (CI) runs to Docker-ccm, since running the whole dtest suite takes about 6-8 hours in our current setup. We are also thinking of experimenting with something we’ve been contemplating for a long time to make the runs quicker: distributing the tests across a large number of EC2 machines with the spot-fleet Jenkins plugin. (Yet again, something we’ll surely elaborate on once we have it fully operational and in daily use.)

Next Steps

Hope you enjoyed this view into the kinds of work that we do behind the scenes. If you want to be part of ScyllaDB’s next internal hackathon, check out our job opportunities!

CAREERS AT SCYLLADB


The post ScyllaDB Developer Hackathon: Docker-ccm appeared first on ScyllaDB.

Consuming CDC with Java and Go

If you are a regular visitor to our blog, you have probably heard about Change Data Capture (CDC) in Scylla. It’s a feature that allows you to track and react to changes made to data in your cluster. In Scylla 4.3, we are confident in saying that CDC is production-ready. Along with that, we are releasing libraries for Java and Go which will simplify writing applications that read from Scylla’s CDC.

We believe these two languages will appeal best to our highly opinionated developer base. Java is the older, more established, and more broadly used language. Go is newer, with built-in concurrency support and fast performance.

In this blog post, we will cover both scylla-cdc-java, a Java library, and scylla-cdc-go, a Go library. These two libraries serve as frameworks for writing applications that process changes in the CDC log. You will learn what kind of challenges they help you avoid, and how to use them to write applications that print changes happening to a table in real-time.

If you want to read the reasoning behind our approach, read the next section, “Why Use a Library?” Or, if you want to jump right in and get started using your favorite programming language, skip ahead to the “Getting Started with Java” or “Getting Started with Go” sections below.

Why Use a Library?

Scylla’s design of CDC is based on the concept of CDC log tables. For every table whose changes you wish to track, an associated CDC log table is created. We refer to this new table as the CDC log table and the original table as a base table. Every time you modify your data in the base table — insert, update or delete — this fact is recorded by inserting one or more rows to the corresponding CDC log table.

This approach makes it possible to use tools that already exist in order to read from a CDC log. Everything is accessible through CQL and the schema of CDC log tables is documented by us, so it’s possible to write an application consuming CDC with the help of a driver (or even cqlsh).

However, the CDC log format is more complicated than a single queue of events. You need to know the design of Scylla CDC well in order to implement an application that is performant and robust. Fortunately, our libraries will handle those concerns for you. You can use their convenient API so that you can concentrate on writing the business logic of your application.

Challenges

Streams

A CDC log table does not form a single queue of events – instead, it is divided into multiple queues. This partitioning is defined by a set of streams. Each stream defines a set of partition keys such that all partitions in that stream are stored on the same set of replicas, and on the same shard. In turn, for each stream, the CDC log table has a partition that contains events about changes to partitions within that stream.

Thanks to this layout of data, Scylla can co-locate entries in a CDC log table with affected partitions in the base table. More specifically, if a partition is modified, the information will be put into the CDC log and will be stored on the same node as the partition in the base table. This reduces the number of nodes participating in a write to the base table, improves consistency between the base table and CDC table, and enables better performance of the cluster.

Generations

To make things even more complicated, the topology of the cluster may change during its lifetime. Because it modifies the token ring, this can break the co-location property of the CDC log data. In order to maintain good performance and consistency, Scylla changes the partitioning scheme of the CDC log after such an event. A new “generation” will be computed – a new set of stream IDs that will be used in CDC logs. At a scheduled point in time, Scylla stops writing to partitions marked with previous stream IDs – the old generation – and starts using the new set of stream IDs.

The scylla-cdc-java and scylla-cdc-go libraries manage the complexity of generations and streams. They guarantee that, within a single stream, your application will process changes in order. They also make sure that all changes from streams of the previous generation are processed before moving to reading from streams of the next generation. This is necessary to ensure that no record is missed.

If you are interested in learning more about generations and streams, check out our documentation on CDC streams and generations.

Getting Started with Java

Let’s see how to use the Java library. We will build an application that prints changes happening to a table in real-time. You can see the final code here.

Installing the library

The latest version of the Scylla CDC Java library is available here. Please follow the installation instructions and add the library as a dependency to your Java project.

Setting up the CDC consumer

First, we establish a connection to the Scylla cluster using the Scylla Java Driver. We’re using version 3.x of the driver, but the newer 4.x branch works as well:
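A minimal connection sketch with the 3.x driver might look like this (the contact point address is a placeholder):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Build a cluster object pointing at one of the Scylla nodes (placeholder address)
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .build();
// Open a session that the CDC consumer will use
Session session = cluster.connect();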

Having established a connection, we have to specify which tables of the CDC log we want to read. The provided names should be of the base tables, not the CDC log tables (e.g. ks.t not ks.t_scylla_cdc_log):

To consume changes, we specify a class that implements the RawChangeConsumer interface (here by using a lambda). The consumer returns a CompletableFuture, so you can react to CDC changes and perform some I/O or processing.
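A minimal sketch of such a consumer, assuming RawChangeConsumer is a functional interface whose single method takes a RawChange and returns a CompletableFuture (check the library’s javadoc for the exact signature):

// Hypothetical sketch: print every change and report completion immediately
// (assumes: import java.util.concurrent.CompletableFuture;)
RawChangeConsumer changeConsumer = change -> {
    printChange(change);                          // our own method, defined later
    return CompletableFuture.completedFuture(null);
};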

The CDC consumer is started multi-threaded, with a configurable number of threads. Each thread will read a distinct subset of the CDC log (partitioned based on Vnodes). Those multiple threads will cumulatively read the entire CDC log. All changes related to the same row (more generally the same partition key) will appear on the same thread. Note that after a topology change (adding or removing nodes from the Scylla cluster) this mapping will be reset.

Next, we create an instance of RawChangeConsumerProvider which returns a RawChangeConsumer for each thread. We could write the provider in two ways:

  1. A single consumer shared by all threads. With such a provider, a single consumer will receive rows read from all worker threads that read the CDC log. Note that the consumer should be thread-safe (see the sketch after this list).
  2. A separate consumer for each thread. With such a provider, a separate consumer will be created for each worker thread. Those multiple consumers will cumulatively read the entire CDC log. Because each consumer receives changes from a single worker thread, they don’t have to be thread-safe. Note that after a topology change (adding or removing a node from the Scylla cluster), consumers are recreated (see the sketch after this list).
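As a rough sketch only (the exact shape of RawChangeConsumerProvider is an assumption here; consult the library’s documentation for the real interface), the two provider styles could look roughly like this:

// Variant 1 (assumed shape): one thread-safe consumer shared by all worker threads
RawChangeConsumerProvider sharedProvider = threadId -> sharedThreadSafeConsumer;

// Variant 2 (assumed shape): a fresh consumer created for each worker thread
RawChangeConsumerProvider perThreadProvider = threadId -> new PrintingChangeConsumer();

Here sharedThreadSafeConsumer and PrintingChangeConsumer are hypothetical names standing in for your own consumer implementations.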

Finally, we can build a CDCConsumer instance and start it! When we are finished consuming changes, we should call the stop() method.
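Putting it together, a heavily hedged sketch of this final step (the builder method names below are assumptions, not the confirmed API; in the real library the session and table names are also passed to the builder):

// Hypothetical builder sketch — method names are assumptions
CDCConsumer cdcConsumer = CDCConsumer.builder()
        .withConsumerProvider(perThreadProvider)  // provider from the previous step
        .build();
cdcConsumer.start();
// ... consume changes for as long as needed ...
cdcConsumer.stop();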

Consuming CDC changes

Let’s implement the printChange(RawChange change) method and see what information is available about the change. The RawChange object represents a single row of CDC log. First, we get information about the change id: its stream id and time:

Those accessors correspond to cdc$stream_id and cdc$time columns.

We can get the operation type (if it was an INSERT, UPDATE, etc.):

In each RawChange there is information about the schema of the change – column names, data types, whether the column is part of the primary key:

There are two types of columns inside ChangeSchema: the CDC log columns (such as cdc$stream_id and cdc$time) and the columns coming from the base table.

CDC log columns can be easily accessed by RawChange helper methods (such as getTTL(), getId()). Let’s concentrate on non-CDC columns (those are from the base table) and iterate over them:

We can also read the value of a given cell (column) in the change:

If we know the type of a given cell, we can get the value as a specific type:

Full Example

You can read the full source code here. You can run it using the following commands:
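The exact commands live in the example’s README; a typical Maven invocation would look something like the following (the use of exec:java and the argument handling are assumptions):

# Hypothetical invocation — adjust to match the example project's main class and arguments
mvn compile exec:java -Dexec.args="$SOURCE"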

Where SOURCE is the IP address of the cluster.

Getting Started with Go

Now let’s look at how this same sort of CDC reader application can be implemented in Go. You can read the source code for this example here.

Installing the library

To install the library, simply run the following command:
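Assuming the module path matches the repository name (github.com/scylladb/scylla-cdc-go), installation is a single go get:

go get github.com/scylladb/scylla-cdc-go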

If you use Go modules, make sure to run the command from your project’s directory.

The library uses gocql to read CDC from the cluster. For optimal performance, make sure you use our gocql fork. The fork has some Scylla-specific optimizations which result in better latency. Recently, it has gained optimizations for CDC, too – we will elaborate on that in a later blog post. You can learn how to switch to our fork here.

Setting up the CDC consumer

Like in the case of scylla-cdc-java, first we need to establish a connection to the Scylla cluster. To do that, we create a gocql session:
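A minimal gocql session sketch (the contact point is a placeholder):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Connect to the cluster; the contact point is a placeholder
	cluster := gocql.NewCluster("127.0.0.1")
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	// ... the CDC reader configuration (next step) uses this session ...
}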

Next, we need to prepare a configuration for the scylla-cdc-go library. It is necessary to provide at least a session, a list of fully qualified base table names, and a change consumer factory; we will come back to the last one in a moment. Table names need to be fully qualified and point to the base tables, not CDC log tables (e.g. ks.t, not ks.t_scylla_cdc_log). For good measure, we also provide a logger, so that the library tells us what it is doing.
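A hedged sketch of that configuration (the field names Session, TableNames, ChangeConsumerFactory and Logger follow the description above but are assumptions; the package documentation has the authoritative definition):

// Hypothetical sketch of the reader configuration — field names are assumptions
cfg := &scyllacdc.ReaderConfig{
	Session:               session,                      // the gocql session from above
	TableNames:            []string{"ks.t"},             // fully qualified base table names
	ChangeConsumerFactory: factory,                      // covered in the next section
	Logger:                log.New(os.Stderr, "", log.Ldate|log.Ltime),
}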

Now, we need to define a consumer type. Each instance of the consumer will be associated with a single CDC stream and will process changes from that stream in chronological order.

As mentioned before, the library processes generations one after another. When it starts processing a generation, it stops consumers for the old generation, and creates a new consumer for each stream in the new one. The library manages the lifetime of change consumers for you, therefore you need to provide a change consumer factory.

When the library starts processing a generation, it spawns a number of goroutines. Each goroutine periodically polls a fixed subset of streams, and feeds fetched changes to the consumers associated with those streams. This means that each consumer is associated with a single goroutine for its entire lifetime.

Keeping the multi-goroutine model in mind, there are two approaches to writing consumers and consumer factories:

  1. If your consumer is simple and stateless, you can model it as a function. In that case, you can easily create a factory for it by using a library-provided helper function (see the sketch after this list).
  2. If your consumer needs to keep some state or run custom logic on its creation or deletion, then you need to put in more work: you create a type for both the consumer and the factory, implementing scyllacdc.ChangeConsumer and scyllacdc.ChangeConsumerFactory respectively.
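As a sketch of the first, function-based approach (the consumer signature, the Delta field and the factory helper name are assumptions drawn from the descriptions in this post; check the scylla-cdc-go documentation for the real API):

// Hypothetical sketch — signature and field names are assumptions
func printerConsumer(ctx context.Context, tableName string, c scyllacdc.Change) error {
	// Print the delta rows of the change (row groups are described in the next section)
	for _, row := range c.Delta {
		fmt.Printf("[%s] %v\n", tableName, row)
	}
	return nil
}

// Inside main(), wrap the function into a consumer factory with an assumed library helper:
//   factory := scyllacdc.MakeChangeConsumerFactoryFromFunc(printerConsumer)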

Before our application can run, we need to do one more thing: actually start the reading process. We create a CDC reader object using the configuration we prepared earlier and then start it:
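A hedged sketch of that step (the NewReader constructor and Run method names are assumptions):

// Hypothetical sketch — constructor and run method names are assumptions
reader, err := scyllacdc.NewReader(context.Background(), cfg)
if err != nil {
	log.Fatal(err)
}
// Run blocks, polling CDC streams and feeding changes to the consumers
if err := reader.Run(context.Background()); err != nil {
	log.Fatal(err)
}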

Consuming CDC changes

No matter which method from the previous section you use, consuming changes boils down to analyzing a scyllacdc.Change object. Let’s use the first approach — we will implement the printerConsumer method and see what kind of information the change object offers.

A scyllacdc.Change object corresponds to a set of rows from the CDC table sharing the same cdc$stream_id and cdc$time values. Both of those parameters are available as fields of the change object:

A change partitions its rows into three groups: Delta, Preimage and Postimage. The Delta rows contain information about the change itself – which rows and columns were modified, and what kinds of modifications were applied to them. Preimage and Postimage rows represent the state of modified rows before and after the change. The last two groups will only appear if you enabled them in your CDC options for that table.

Rows from each group are represented by the scyllacdc.ChangeRow type. It provides a number of convenience methods.

First, you can use (*ChangeRow).Columns() to learn about the columns of the CDC log. It contains information about both CDC-specific and non-CDC-specific columns:

Next, there are methods for retrieving information about changes from the row. First, for partition keys and clustering keys, it is recommended to use (*ChangeRow).GetValue(column string) which directly returns the value of the column.

However, information about changes made to regular or static columns is usually split across multiple CDC log columns. For example, a column of name “v” and type “int” will be represented as two columns: “v” and “cdc$deleted_v”. Instead of fetching each column separately with GetValue, you can use convenience functions which already do it for you. In case of an int column, (*ChangeRow).GetAtomicChange(column string) will be the right function to use, but there are variants for non-atomic types such as tuples, collections and user defined types, too — refer to the documentation to learn more about them.

Finally, change row implements the Stringer interface. You can use it in fmt.Printf to have the row pretty-printed to the standard output:
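For example (the row variable stands for whichever *scyllacdc.ChangeRow you are handling):

// The %v verb picks up the row's String() method and pretty-prints it
fmt.Printf("%v\n", row)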

Saving progress

Sometimes, your application will have to be stopped either due to a planned or an unplanned event. It might be desirable to keep track of how much data was processed in each stream and regularly save progress. The scylla-cdc-go library provides optional facilities which will help you avoid repeating unnecessary work in case your application was stopped. You can configure the library to either not save progress at all, store progress in a Scylla cluster, or use a user-defined mechanism for saving information about the progress.

Good to Go!

That’s it! It is a functional, albeit simplistic example of an application which reads from the CDC log. The application polls CDC streams, processes changes from each stream in order, and executes your callback for each change. It also takes care of topology changes out of the box.

Again, you can find all the code for this example here. You can run it using the following commands:
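The exact commands are in the example’s README; a typical invocation would look something like this (the binary name and argument handling are assumptions):

# Hypothetical invocation — adjust to the example's actual module layout
go build -o cdc-printer .
./cdc-printer $SOURCE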

Where SOURCE is the IP address of the cluster.

Further Reading

In this blog, we have explained what problems the scylla-cdc-java and scylla-cdc-go libraries solve and how to write a simple application with each. If you would like to learn more, check out the links below:

  • Replicator example application in the scylla-cdc-java repository. It is an advanced application that replicates a table from one Scylla cluster to another one using the CDC log and scylla-cdc-java library.
  • Example applications in the scylla-cdc-go repository. The repository currently contains three examples: “simple-printer”, which prints changes from a particular schema; “printer”, which is the same as the example presented in this blog; and “replicator”, a relatively complex application that replicates changes from one cluster to another.
  • API reference for scylla-cdc-go. Includes slightly more sophisticated examples which, unlike the example in this blog, cover saving progress.
  • CDC documentation. Knowledge about the design of Scylla’s CDC can be helpful in understanding the concepts in the documentation for both the Java and Go libraries. The parts about the CDC log schema and representation of data in the log is especially useful.
  • Change Data Capture (CDC) lesson in Scylla University.
  • ScyllaDB users slack. We will be happy to answer your questions about the CDC on the #cdc channel.

We hope all that talk about consuming data has managed to whet your appetite for CDC!

Happy and fruitful coding!


The post Consuming CDC with Java and Go appeared first on ScyllaDB.

Security Advisory: CVE-2020-17516

Earlier this week, a vulnerability was published to the Cassandra users mailing list describing a flaw in the way Apache Cassandra nodes perform encryption in some configurations.

The vulnerability description is reproduced here:

“Description:

When using ‘dc’ or ‘rack’ internode_encryption setting, a Cassandra instance allows both encrypted and unencrypted connections. A misconfigured node or a malicious user can use the unencrypted connection despite not being in the same rack or dc, and bypass mutual TLS requirement.”

The recommended mitigations are:

  • Users of ALL versions should switch from ‘dc’ or ‘rack’ to ‘all’ internode_encryption setting, as they are inherently insecure
  • 3.0.x users should additionally upgrade to 3.0.24
  • 3.11.x users should additionally upgrade to 3.11.10

By default, all Cassandra clusters running in the Instaclustr environment are configured to use ‘internode_encryption’ set to all. To confirm that our clusters are unaffected by this vulnerability, Instaclustr has checked the configuration of all Cassandra clusters in our managed service fleet and none are using the vulnerable configurations ‘dc’ or ‘rack’. 

Instaclustr restricts access to Cassandra nodes to only those IP addresses and port combinations required for cluster management and customer use, further mitigating the risk of compromise. 

In line with the mitigation recommendation, Instaclustr is developing a plan to upgrade all 3.0.x and 3.11.x Cassandra clusters to 3.0.24 and 3.11.10. Customers will be advised when their clusters are due for upgrade. 

Instaclustr recommends that our Support Only customers check their configurations to ensure that they are consistent with this advice, and upgrade their clusters as necessary to maintain a good security posture. 

Should you have any questions regarding Instaclustr Security, please contact us by email security@instaclustr.com.

If you wish to discuss scheduling of the upgrade to your system or have any other questions regarding the impact of this vulnerability, please contact support@instaclustr.com


To report an active security incident, email support@instaclustr.com.

The post Security Advisory: CVE-2020-17516 appeared first on Instaclustr.

Introducing the New Scylla Monitoring Advisor

Scylla Advisor is the newest member of the Scylla Monitoring stack. The Advisor focuses on highlighting important information, potential problems, configuration issues, and data model suggestions. What sets it apart is its focus on potential issues rather than a general overview of system status.

The Advisor Section

The Advisor section is part of the overview dashboard. It has two parts, the Advisor table and the balance section, and they play two different roles.

The Advisor Table

The table on the left holds issues found by the Advisor. An issue describes something the Advisor found that is a potential problem.

Here are a few examples:

  • Large Cells: Large cells are usually an indication of a problem in the data model. Though not forbidden, they are an anti-pattern and should be avoided for performance reasons. Scylla identifies large cells (it also identifies large rows and large partitions), prints a warning to the logs, and stores it in a dedicated table. The Advisor adds a warning to the table with a navigation link to the CQL dashboard.
  • Non-Prepared Statements: In general, you should avoid unprepared statements. They have a performance impact and can pose security risks.

One more thing about the Advisor table section: the Advisor uses low-priority alerts that, by default, are only shown on the dashboard. You can configure the Alertmanager to also send those alerts to an email, a Slack channel, etc.

Advisor – Balance Section

In a typical system, we expect all nodes and shards to behave the same under a normal workload.

The Balance section looks for outliers in different categories that are known to indicate potential problems.

Here are a few examples:

  • All shards should have the same number of connections. If this is not the case, either there are not enough connections open or the driver is not Scylla-optimized.
  • We expect a uniform distribution of the traffic. If this is not true, either you are using a driver that is not shard-aware, or there is a hot partition and you need to change the data model.

How the Advisor Works

Now that we understand what the Advisor does, let’s look at how it works. The Advisor uses low-priority alerts for the Advisor table and metrics for the balance section. The two new additions to the Scylla Monitoring Stack are Grafana Loki and recording rules.


Grafana Loki

Grafana Loki is used to generate metrics and alerts based on logs. Loki is a log collection system that serves multiple purposes:

  • It can generate alerts based on log lines and send them to the Alertmanager. These alerts contain information taken from the log line itself and are hard to produce another way.
  • It serves as a metric source for Prometheus. Metrics are good at showing history and changes and are space-efficient to store.
  • And third, Loki acts as a Grafana data source. That means you can search Scylla’s logs directly from the dashboard using the dashboards explorer feature.

Prometheus Recording Rules

Prometheus recording rules are a way to precompute new metrics from calculations over one or more existing metrics. Using recording rules simplifies alert creation.
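As an illustration of the format (the rule below is a generic example based on a standard node_exporter metric, not one of the rules shipped with the Monitoring Stack):

groups:
  - name: example_rules
    rules:
      # Precompute the per-instance non-idle CPU rate so dashboards and alerts can reuse it
      - record: instance:node_cpu_utilisation:rate5m
        expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))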

Next steps

Now you’ve seen the changes made in Scylla Monitoring Stack 3.6 to make it even better. The next step is yours! Download Scylla Monitoring Stack 3.6 directly from GitHub. It’s free and open source. If you try it, we’d love to hear your feedback, either by contacting us privately or by sharing your experience with your fellow users on our Slack channel.

DOWNLOAD SCYLLA MONITORING STACK

The post Introducing the New Scylla Monitoring Advisor appeared first on ScyllaDB.

Apache Cassandra Changelog #3 | January 2021

Our monthly roundup of key activities and knowledge to keep the community informed.


Release Notes

Released

Apache Cassandra 4.0-beta4 (pgp, sha256 and sha512) was released on December 30. Please pay attention to release notes and let the community know if you encounter problems. Join the Cassandra mailing list to stay updated.

Changed

The current status of Cassandra 4.0 GA can be viewed on this Jira board (ASF login required). RC is imminent with testing underway. Read the latest summary from the community here.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

The Cassandra community welcomed one new PMC member and five new committers in 2020! Congratulations to Mick Semb Wever who joined the PMC and Jordan West, David Capwell, Zhao Yang, Ekaterina Dimitrova, and Yifan Cai who accepted invitations to become Cassandra committers!

Changed

The Kubernetes SIG is discussing how to extend the group’s scope beyond the operator, as well as sharing an update on current operator merge efforts in the latest meeting. Watch here.


User Space

Keen.io

Under the covers, Keen leverages Kafka, Apache Cassandra NoSQL database and the Apache Spark analytics engine, adding a RESTful API and a number of SDKs for different languages. Keen enriches streaming data with relevant metadata and enables customers to stream enriched data to Amazon S3 or any other data store. - Keen.io

Monzo

Suhail Patel explains how Monzo prepared for the recent crowdfunding (run entirely through its app, using the very same platform that runs the bank) which saw more than 9,000 people investing in the first five minutes. He covers Monzo’s microservice architecture (on Go and Kubernetes) and how they profiled and optimized key platform components such as Cassandra and Linkerd. - Suhail Patel

In the News

ZDNet - Meet Stargate, DataStax’s GraphQL for databases. First stop - Cassandra

CIO - It’s a good day to corral data sprawl

TechTarget - Stargate API brings GraphQL to Cassandra database

ODBMS - On the Cassandra 4.0 beta release. Q&A with Ekaterina Dimitrova, Apache Cassandra Contributor

Cassandra Tutorials & More

Intro to Apache Cassandra for Data Engineers - Daniel Beach, Confessions of a Data Guy

Impacts of many columns in a Cassandra table - Alex Dejanovski, The Last Pickle

Migrating Cassandra from one Kubernetes cluster to another without data loss - Flant staff

Real-time Stream Analytics and User Scoring Using Apache Druid, Flink & Cassandra at Deep.BI - Hisham Itani, Deep.BI

User thread: Network Bandwidth and Multi-DC replication (Login required)



Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

The Instaclustr LDAP Plugin for Cassandra 2.0, 3.0, and 4.0

LDAP (Lightweight Directory Access Protocol) is a common vendor-neutral and lightweight protocol for organizing authentication of network services. Integration with LDAP allows users to unify the company’s security policies when one user or entity can log in and authenticate against a variety of services. 

There is a lot of demand from our enterprise customers to be able to authenticate to their Apache Cassandra clusters against LDAP. As the leading NoSQL database, Cassandra is typically deployed across the enterprise and needs this connectivity.

Instaclustr has previously developed our LDAP plugin to work with the latest Cassandra releases. However, with Cassandra 4.0 right around the corner, it was due for an update to ensure compatibility. Instaclustr takes a great deal of care to provide cutting-edge features and integrations for our customers, and our new LDAP plugin for Cassandra 4.0 showcases this commitment. We always use open source and maintain a number of Apache 2.0-licensed tools, and have released our LDAP plugin under the Apache 2.0 license. 

Modular Architecture to Support All Versions of Cassandra

Previously, the implementations for Cassandra 2.0 and 3.0 lived in separate branches, which resulted in some duplicative code. With our new LDAP plugin update, everything lives in one branch and we have modularized the whole solution so it aligns with earlier Cassandra versions and Cassandra 4.0.

The modularization of our LDAP plugin means that there is the “base” module that all implementations are dependent on. If you look into the codebase on GitHub, you see that the implementation modules consist of one or two classes at maximum, with the rest inherited from the base module. 

This way of organizing the project is beneficial from a long-term maintenance perspective. We no longer need to keep track of all changes and apply them to the branched code for the LDAP plugin for each Cassandra version. When we implement changes and improvements to the base module, all modules are updated automatically and benefit.

Customizable for Any LDAP Implementation

This plugin offers a default authenticator implementation for connecting to an LDAP server and authenticating a user against it. It also offers a way to implement custom logic for specific use cases. In the module, we provide the most LDAP server-agnostic implementation possible, but there is also scope for customization to meet specific LDAP server nuances.

If the default solution needs to be modified for a particular customer use case, it is possible to add in custom logic for that particular LDAP implementation. The implementation for customized connections is found in “LDAPPasswordRetriever” (DefaultLDAPServer being the default implementation from which one might extend and override appropriate methods). This is possible thanks to the SPI mechanism. If you need this functionality you can read more about it in the relevant section of our documentation.

Enhanced Testing for Reliability

Our GitHub build pipeline now tests the LDAP plugins for each supported Cassandra version on each merged commit. This update provides integration tests that spin up a standalone Cassandra node as well as an LDAP server as part of the JUnit tests. These are started in Docker as part of a Maven build before the actual JUnit tests run.

This testing framework lets us make sure that any changes don’t break the authentication mechanism. This is verified by actually logging in via the usual mechanism as well as via LDAP.

Packaged for Your Operating System

Last but not least, we have now added Debian and RPM packages of our plugin for each Cassandra version release. Until now, a user of this plugin had to install the JAR file into the Cassandra libraries directory manually. With the introduction of these packages, you no longer need to perform this manual step. The plugin’s JAR, along with the configuration file, will be installed in the right place if the official Debian or RPM Cassandra package is installed too.

How to Configure LDAP for Cassandra

In this section we will walk you through the setup of the LDAP plugin and explain the most crucial parts of how the plugin works.

After placing the LDAP plugin JAR on Cassandra’s classpath, either by copying it over manually or by installing a package, you will need to modify a configuration file in /etc/cassandra/ldap.properties.
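A hedged illustration of what that file contains (the property keys and values below are placeholders; the plugin’s README documents the exact names):

# Placeholder keys and values — consult the plugin documentation for the real property names
ldap_uri = ldap://ldap.example.com:389/dc=example,dc=com
service_dn = cn=admin,dc=example,dc=com
service_password = example-password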

There are also changes that need to be applied to cassandra.yaml. For Cassandra 4.0, please be sure that your authenticator, authorizer, and role_manager are configured as follows:

authenticator: LDAPAuthenticator
authorizer: CassandraAuthorizer
role_manager: LDAPCassandraRoleManager

Before using this plugin, an operator of a Cassandra cluster should configure the system_auth keyspace to use NetworkTopologyStrategy.
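For example (the data center names and replication factors below are illustrative and should match your cluster’s topology):

ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};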

How the LDAP Plugin Works With Cassandra Roles

The LDAP plugin works via a “dual authentication” technique. If a user tries to log in with a role that already exists in Cassandra, separate from LDAP, it will authenticate against that role. However, if that role is not present in Cassandra, the plugin will reach out to the LDAP server and try to authenticate against it. If that succeeds, from the user’s point of view it looks as if the role was in Cassandra the whole time, as it logs the user in transparently.

If your LDAP server is down, you will not be able to authenticate with the specified LDAP user. You can enable caching for LDAP users, available in the Cassandra 3.0 and 4.0 plugins, to take some load off the LDAP server when authentication happens frequently.

The Bottom Line

Our LDAP plugin meets the enterprise need for a consolidated security and authentication policy. 100% open source and supporting all major versions of Cassandra, the plugin works with all major LDAP implementations and can be easily customized for others. 

The plugin is part of our suite of supported tools for our support customers and Instaclustr is committed to actively maintaining and developing the plugin. Our work updating it to support the upcoming Cassandra 4.0 release is part of this commitment. You can download it here and feel free to get in touch with any questions you might have. Cassandra 4.0 beta 2 is currently in preview on our managed platform and you can use our free trial to check it out.

The post The Instaclustr LDAP Plugin for Cassandra 2.0, 3.0, and 4.0 appeared first on Instaclustr.

Announcing: Stargate 1.0 in Astra; REST, GraphQL, & Schemaless JSON for Your Cassandra Development

Apache Cassandra Changelog #2 | December 2020

Our monthly roundup of key activities and knowledge to keep the community informed.


Release Notes

Released

Apache #Cassandra 4.0-beta3, 3.11.9, 3.0.23, and 2.2.19 were released on November 4 and are in the repositories. Please pay attention to release notes and let the community know if you encounter problems. Join the Cassandra mailing list to stay updated.

Changed

Cassandra 4.0 is progressing toward GA. There are 1,390 total tickets and remaining tickets represent 5.5% of total scope. Read the full summary shared to the dev mailing list and take a look at the open tickets that need reviewers.

Cassandra 4.0 will be dropping support for older distributions of CentOS 5, Debian 4, and Ubuntu 7.10. Learn more.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

The community weighed options to address reads inconsistencies for Compact Storage as noted in ticket CASSANDRA-16217 (committed). The conversation continues in ticket CASSANDRA-16226 with the aim of ensuring there are no huge performance regressions for common queries when you upgrade from 2.x to 3.0 with Compact Storage tables or drop it from a table on 3.0+.

Added

CASSANDRA-16222 is a Spark library that can compact and read raw Cassandra SSTables into SparkSQL. By reading the sstables directly from a snapshot directory, one can achieve high performance with minimal impact to a production cluster. It was used to successfully export a 32TB Cassandra table (46bn CQL rows) to HDFS in Parquet format in around 70 minutes, a 20x improvement on previous solutions.

Changed

Great news for CEP-2: Kubernetes Operator, the community has agreed to create a community-based operator by merging the cass-operator and CassKop. The work being done can be viewed on GitHub here.

Released

The Reaper community announced v2.1 of its tool that schedules and orchestrates repairs of Apache Cassandra clusters. Read the docs.

Released

Apache Cassandra 4.0-beta-1 was released on FreeBSD.

User Space

Netflix

“With these optimized Cassandra clusters in place, it now costs us 71% less to operate clusters and we could store 35x more data than our previous configuration.” - Maulik Pandey

Yelp

“Cassandra is a distributed wide-column NoSQL datastore and is used at Yelp for both primary and derived data. Yelp’s infrastructure for Cassandra has been deployed on AWS EC2 and ASG (Autoscaling Group) for a while now. Each Cassandra cluster in production spans multiple AWS regions.” - Raghavendra D Prabhu

In the News

DevPro Journal - What’s included in the Cassandra 4.0 Release?

JAXenter - Moving to cloud-native applications and data with Kubernetes and Apache Cassandra

DZone - Improving Apache Cassandra’s Front Door and Backpressure

ApacheCon - Building Apache Cassandra 4.0: behind the scenes

Cassandra Tutorials & More

Users in search of a tool for scheduling backups and performing restores with cloud storage support (archiving to AWS S3, GCS, etc) should consider Cassandra Medusa.

Apache Cassandra Deployment on OpenEBS and Monitoring on Kubera - Abhishek Raj, MayaData

Lucene Based Indexes on Cassandra - Rahul Singh, Anant

How Netflix Manages Version Upgrades of Cassandra at Scale - Sumanth Pasupuleti, Netflix

Impacts of many tables in a Cassandra data model - Alex Dejanovski, The Last Pickle

Cassandra Upgrade in production : Strategies and Best Practices - Laxmikant Upadhyay, American Express

Apache Cassandra Collections and Tombstones - Jeremy Hanna

Spark + Cassandra, All You Need to Know: Tips and Optimizations - Javier Ramos, ITNext

How to install the Apache Cassandra NoSQL database server on Ubuntu 20.04 - Jack Wallen, TechRepublic

How to deploy Cassandra on Openshift and open it up to remote connections - Sindhu Murugavel


Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Announcing the Astra Service Broker: Tradeoff-Free Cassandra in Kubernetes

[Webcast] The 411 on Storage Attached Indexing in Apache Cassandra

How to Keep Pirates “Hackers” Away From Your Booty “Data” with Cassandra RBAC

Sizing Matters: Sizing Astra for Apache Cassandra Apps

Apache Cassandra Changelog #1 (October 2020)

Introducing the first Cassandra Changelog blog! Our monthly roundup of key activities and knowledge to keep the community informed.


Release Notes

Updated

The most current Apache Cassandra releases are 4.0-beta2, 3.11.8, 3.0.22, 2.2.18 and 2.1.22 released on August 31 and are in the repositories. The next cut of releases will be out soon―join the Cassandra mailing list to stay up-to-date.

We continue to make progress toward the 4.0 GA release with the overarching goal of it being at a state where major users should feel confident running it in production when it is cut. Over 1,300 Jira tickets have been closed and less than 100 remain as of this post. To gain this confidence, there are various ongoing testing efforts involving correctness, performance, and ease of use.

Added

With CASSANDRA-15013, the community improved Cassandra's ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially going out of memory.

Added

The Harry project is a fuzz testing tool that aims to generate reproducible workloads that are as close to real-life as possible, while being able to efficiently verify the cluster state against the model without pausing the workload itself.

Added

The community published its first Apache Cassandra Usage Report 2020 detailing findings from a comprehensive global survey of 901 practitioners on Cassandra usage to provide a baseline understanding of who, how, and why organizations use Cassandra.

Community Notes

Updates on new and active Cassandra Enhancement Proposals (CEPs) and how to contribute.

Changed

CEP-2: Kubernetes Operator was introduced this year and is an active discussion on creation of a community-based operator with the goal of making it easy to run Cassandra on Kubernetes.

Added

CEP-7: Storage Attached Index (SAI) is a new secondary index for Cassandra that builds on the advancements made with SASI. It is intended to replace the existing built-in secondary index implementations.

Added

Cassandra was selected by the ASF Diversity & Inclusion committee to be included in a research project to evaluate and understand the current state of diversity.

User Space

Bigmate

"In vetting MySQL, MongoDB, and other potential databases for IoT scale, we found they couldn't match the scalability we could get with open source Apache Cassandra. Cassandra's built-for-scale architecture enables us to handle millions of operations or concurrent users each second with ease – making it ideal for IoT deployments." - Brett Orr

Bloomberg

"Our group is working on a multi-year build, creating a new Index Construction Platform to handle the daily production of the Bloomberg Barclays fixed income indices. This involves building and productionizing an Apache Solr-backed search platform to handle thousands of searches per minute, an Apache Cassandra back-end database to store millions of data points per day, and a distributed computational engine to handle millions of computations daily." - Noel Gunasekar

In the News

Solutions Review - The Five Best Apache Cassandra Books on Our Reading List

ZDNet - What Cassandra users think of their NoSQL DBMS

Datanami - Cassandra Adoption Correlates with Experience

Container Journal - 5 to 1: An Overview of Apache Cassandra Kubernetes Operators

Datanami - Cassandra Gets Monitoring, Performance Upgrades

ZDNet - Faster than ever, Apache Cassandra 4.0 beta is on its way

Cassandra Tutorials & More

A Cassandra user was in search of a tool to perform schema DDL upgrades. Another user suggested https://github.com/patka/cassandra-migration to ensure you don't get schema mismatches if running multiple upgrade statements in one migration. See the full email on the user mailing list for other recommended tools.

Start using virtual tables in Apache Cassandra 4.0 - Ben Bromhead, Instaclustr

Benchmarking Apache Cassandra with Rust - Piotr Kołaczkowski, DataStax

Open Source BI Tools and Cassandra - Arpan Patel, Anant Corporation

Build Fault Tolerant Applications With Cassandra API for Azure Cosmos DB - Abhishek Gupta, Microsoft

Understanding Data Modifications in Cassandra - Sameer Shukla, Redgate


Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Building a Low-Latency Distributed Stock Broker Application: Part 4

In the fourth blog of the “Around the World” series, we built a prototype of the application, designed to run in two georegions.

Recently I re-watched “Star Trek: The Motion Picture” (The original 1979 Star Trek Film). I’d forgotten how much like “2001: A Space Odyssey” the vibe was (a drawn out quest to encounter a distant, but rapidly approaching very powerful and dangerous alien called “V’ger”), and also that the “alien” was originally from Earth and returning in search of “the Creator”—V’ger was actually a seriously upgraded Voyager spacecraft!

Star Trek: The Motion Picture (V’ger)

The original Voyager 1 and 2 had only been recently launched when the movie came out, and were responsible for some remarkable discoveries (including the famous “Death Star” image of Saturn’s moon Mimas, taken in 1980). Voyager 1 has now traveled further than any other human artefact and has since left the solar system! It’s amazing that after 40+ years it’s still working, although communicating with it now takes a whopping 40 hours in round-trip latency (which happens via the Deep Space Network—one of the stations is at Tidbinbilla, near our Canberra office).

Canberra Deep Space Communication Complex (CSIRO)

Luckily we are only interested in traveling “Around the World” in this blog series, so the latency challenges we face in deploying a globally distributed stock trading application are substantially less than the 40 hours latency to outer space and back. In Part 4 of this blog series we catch up with Phileas Fogg and Passepartout in their journey, and explore the initial results from a prototype application deployed in two locations, Sydney and North Virginia.

1. The Story So Far

In Part 1 of this blog series we built a map of the world to take into account inter-region AWS latencies, and identified some “georegions” that enabled sub 100ms latency between AWS regions within the same georegion. In Part 2 we conducted some experiments to understand how to configure multi-DC Cassandra clusters and use java clients, and measured latencies from Sydney to North Virginia. In Part 3 we explored the design for a globally distributed stock broker application, and built a simulation and got some indicative latency predictions. 

The goal of the application is to ensure that stock trades are done as close as possible to the stock exchanges the stocks are listed on, to reduce the latency between receiving a stock ticker update, checking the conditions for a stock order, and initiating a trade if the conditions are met. 

2. The Prototype

For this blog I built a prototype of the application, designed to run in two georegions, Australia and the USA. We are initially only trading stocks available from stock exchanges in these two georegions, and orders for stocks traded in New York will be directed to a Broker deployed in the AWS North Virginia region, and orders for stocks traded in Sydney will be directed to a Broker deployed in the AWS Sydney region. As this Google Earth image shows, they are close to being antipodes (diametrically opposite each other, at 15,500 km apart, so pretty much satisfy the definition of  traveling “Around the World”), with a measured inter-region latency (from blog 2) of 230ms.

Sydney to North Virginia (George Washington’s Mount Vernon house)

The design of the initial (simulated) version of the application was changed in a couple of ways to: 

  • Ensure that it worked correctly when instances of the Broker were deployed in multiple AWS regions
  • Measure actual rather than simulated latencies, and
  • Use a multi-DC Cassandra Cluster. 

The prototype isn’t a complete implementation of the application yet. In particular it only uses a single Cassandra table for Orders—to ensure that Orders are made available in both georegions, and can be matched against incoming stock tickers by the Broker deployed in that region. 

Some other parts of the application are “stubs”, including the Stock Ticker component (which will eventually use Kafka), and checks/updates of Holdings and generation of Transactions records (which will eventually also be Cassandra tables). Currently only the asynchronous, non-market, order types are implemented (Limit and Stop orders), as I realized that implementing Market orders (which are required to be traded immediately) using a Cassandra table would result in too many tombstones being produced—as each order is deleted immediately upon being filled (rather than being marked as “filled” in the original design, to enable the Cassandra query to quickly find unfilled orders and prevent the Orders table from growing indefinitely). However, for non-market orders it is a reasonable design choice to use a Cassandra table, as Orders may exist for extended periods of time (hours, days, or even weeks) before being traded, and as there is only a small number of successful trades (10s to 100s) per second relative to the total number of waiting Orders (potentially millions), the number of deletions, and therefore tombstones, will be acceptable. 

We now have a look at the details of some of the more significant changes that were made to the application.

Cassandra

I created a multi-DC Cassandra keyspace as follows:

CREATE KEYSPACE broker WITH replication = {'class': 'NetworkTopologyStrategy', 'NorthVirginiaDC': '3', 'SydneyDC': '3'};

In the Cassandra Java driver, the application.conf file determines which Data Center the Java client connects to. For example, to connect to the SydneyDC the file has the following settings:

datastax-java-driver {
    basic.contact-points = [
        "1.2.3.4:9042"
    ]

    basic.load-balancing-policy {
        class = DefaultLoadBalancingPolicy
        local-datacenter = "SydneyDC"
    }
}

The Orders table was created as follows (note that I couldn’t use “limit” for a column name as it’s a reserved word in Cassandra!):

CREATE TABLE broker.orders (
    symbol text,
    orderid text,
    buyorsell text,
    customerid text,
    limitthreshold double,
    location text,
    ordertype text,
    quantity bigint,
    starttime bigint,
    PRIMARY KEY (symbol, orderid)
);

For the primary key, the partition key is the stock symbol, so that all outstanding orders for a stock can be found when each stock ticker is received by the Broker, and the clustering column is the (unique) orderid, so that multiple orders for the same symbol can be written and read, and a specific order (for an orderid) can be deleted. In a production environment using a single stock symbol partition may result in skewed and unbounded partitions which is not recommended.

The prepared statements for creating, reading, and deleting orders are as follows:

PreparedStatement prepared_insert = Cassandra.session.prepare(
"insert into broker.orders (symbol, orderid, buyorsell, customerid, limitthreshold, location, ordertype, quantity, starttime) values (?, ?, ?, ?, ?, ?, ?, ?, ?)");
                 
PreparedStatement prepared_select = Cassandra.session.prepare(
        "select * from broker.orders where symbol = ?");

PreparedStatement prepared_delete = Cassandra.session.prepare(
        "delete from broker.orders where symbol = ? and orderid = ?");

I implemented a simplified “Place limit or stop order” operation (see Part 3), which uses the prepared_insert statement to create each new order, initially in the Cassandra Data Center local to the Broker where the order was created from, which is then automatically replicated in the other Cassandra Data Center. I also implemented the “Trade Matching Order” operation (Part 3), which uses the prepared_select statement to query orders matching each incoming Stock Ticker, checks the rules, and then if a trade is filled deletes the order.
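As an illustration, executing those prepared statements boils down to binding the values and running the bound statement with the driver (the local variable names here are illustrative, not taken from the project):

// Bind the order fields positionally and execute the insert (variable names are illustrative)
BoundStatement boundInsert = prepared_insert.bind(
        symbol, orderid, buyOrSell, customerid,
        limitThreshold, location, orderType, quantity, startTime);
Cassandra.session.execute(boundInsert);

// Find all outstanding orders for a stock symbol when a ticker arrives
ResultSet matching = Cassandra.session.execute(prepared_select.bind(symbol));

// Remove an order once it has been filled
Cassandra.session.execute(prepared_delete.bind(symbol, orderid));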

Deployment

I created a 3-node Cassandra cluster in the Sydney AWS region, and then added another identical Data Center in the North Virginia AWS region using Instaclustr Managed Cassandra for AWS. This gave me 6 nodes in total, running on t3.small instances (5 GB SSD, 2 GB RAM, 2 CPU cores). This is a small developer-sized cluster, but it is adequate for a prototype, and very affordable (2 cents an hour per node for AWS costs) given that the Brokers are currently only single threaded so don’t produce much load. We’re more interested in latency at this point of the experiment, and we may want to increase the number of Data Centers in the future. I also spun up an EC2 instance (t3a.micro) in each of the AWS regions, and deployed an instance of the Stock Broker on each (it only used 20% CPU). Here’s what the complete deployment looks like:

3. The Results

For the prototype, the focus was on demonstrating that the design goal of minimizing latency for trading stop and limit orders (asynchronous trades) was achieved. For the prototype, the latency for these order types is measured from the time of receiving a Stock Ticker, to the time an Order is filled. We ran a Broker in each AWS region concurrently for an hour, with the same workload for each, and measured the average and maximum latencies. For the first configuration, each Broker is connected to its local Cassandra Data Center, which is how it would be configured in practice. The results were encouraging, with an average latency of 3ms, and a maximum of 60ms, as shown in this graph.  

During the run, across both Brokers, 300 new orders were created each second, 600 stock tickers were received each second, and 200 trades were carried out each second. 

Given that I hadn’t implemented Market Orders yet, I wondered how I could configure and approximately measure the expected latency for these synchronous order types between different regions (i.e. Sydney and North Virginia). The latency for Market orders in the same region will be comparable to the non-market orders. The solution turned out to be simple: just re-configure the Brokers to use the remote Cassandra Data Center, which introduces the inter-region round-trip latency that would also be encountered with Market Orders placed in one region and traded immediately in the other region. I could also have achieved a similar result by changing the consistency level to EACH_QUORUM (which requires a majority of nodes in each data center to respond). Not surprisingly, the latencies were higher, rising to a 360ms average and 1200ms maximum, as shown in this graph with both configurations (Stop and Limit Orders on the left, and Market Orders on the right):

So our initial experiments are a success, and validate the primary design goal, as asynchronous stop and limit Orders can be traded with low latency from the Broker nearest the relevant stock exchanges, while synchronous Market Orders will take significantly longer due to inter-region latency. 

Write Amplification

I wondered what else can be learned from running this experiment? We can understand more about resource utilization in multi-DC Cassandra clusters. Using the Instaclustr Cassandra Console, I monitored the CPU Utilization on each of the nodes in the cluster, initially with only one Data Center and one Broker, and then with two Data Centers and a single Broker, and then both Brokers running. It turns out that the read load results in 20% CPU Utilization on each node in the local Cassandra Data Center, and the write load also results in 20% locally.  Thus, for a single Data Center cluster the total load is 40% CPU. However, with two Data Centers things get more complex due to the replication of the local write loads to each other Data Center. This is also called “Write Amplification”.

The following table shows the measured total load for 1 and 2 Data Centers, and the predicted load for up to 8 Data Centers. It shows that for more than 3 Data Centers you need bigger nodes (or bigger clusters); a four-core node instance type would be adequate for 7 Data Centers and would run at about 80% CPU utilization.

  Number of Data Centers Local Read Load Local Write Load Remote Write Load Total Write Load Total Load (all values % CPU per node)
  1 20 20 0 20 40
  2 20 20 20 40 60
  3 20 20 40 60 80
  4 20 20 60 80 100
  5 20 20 80 100 120
  6 20 20 100 120 140
  7 20 20 120 140 160
  8 20 20 140 160 180
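
As a rough sketch (not part of the original experiment), the predicted figures in the table can be reproduced with a few lines of Python, assuming each additional Data Center adds another 20% of remote write load per node:

LOCAL_READ = 20   # % CPU per node for the local read load
LOCAL_WRITE = 20  # % CPU per node for the local write load

def predicted_load(num_dcs):
    # Every other Data Center replicates its local writes to this node's DC.
    remote_write = (num_dcs - 1) * LOCAL_WRITE
    total_write = LOCAL_WRITE + remote_write
    return remote_write, total_write, LOCAL_READ + total_write

for n in range(1, 9):
    print(n, *predicted_load(n))  # Remote Write, Total Write, and Total Load columns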

Costs

The total cost to run the prototype includes the Instaclustr Managed Cassandra nodes (3 nodes per Data Center x 2 Data Centers = 6 nodes), the two AWS EC2 Broker instances, and the data transfer between regions (AWS only charges for data out of a region, not in, but the prices vary depending on the source region). For example, data transfer out of North Virginia is 2 cents/GB, but Sydney is more expensive at 9.8 cents/GB. I computed the total monthly operating cost to be $361 for this configuration, broken down into $337/month (93%) for Cassandra and EC2 instances, and $24/month (7%) for data transfer, to process around 500 million stock trades. Note that this is only a small prototype configuration, but can easily be scaled for higher throughputs (with incrementally and proportionally increasing costs).

Conclusions

In this blog we built and experimented with a prototype of the globally distributed stock broker application, focusing on testing the multi-DC Cassandra part of the system. This enabled us to significantly reduce the impact of planetary-scale latencies (from seconds to low milliseconds) and to ensure greater redundancy (across multiple AWS regions) for the real-time stock trading function. Some parts of the application remain as stubs, and in future blogs I aim to replace them with suitable functional (e.g. streaming, analytics) and non-functional (e.g. failover) capabilities from a selection of Kafka, Elasticsearch, and maybe even Redis!


Understanding the Impacts of the Native Transport Requests Change Introduced in Cassandra 3.11.5

Summary

Recently, Cassandra made changes to the Native Transport Requests (NTR) queue behaviour. Through our performance testing, we found the new NTR change to be good for clusters that have a constant load causing the NTR queue to block. Under the new mechanism the queue no longer blocks, but throttles the load based on the queue size setting, which by default is 10% of the heap.

Compared to the previous Native Transport Requests queue-length limit, this improves how Cassandra handles traffic when queue capacity is reached. The “back pressure” mechanism handles an overloaded NTR queue more gracefully, resulting in a significant lift in operations without clients timing out. In summary, clusters with later versions of Cassandra can handle more load before hitting hard limits.

Introduction

At Instaclustr, we are responsible for managing the Cassandra versions that we release to the public. This involves reviewing Cassandra release changes, followed by performance testing. In cases where major changes have been made to Cassandra’s behaviour, further research is required. So, without further delay, let’s introduce the change to be investigated.

Change:
  • Prevent client requests from blocking on executor task queue (CASSANDRA-15013)
Versions affected:
  • Cassandra 3.11.5 and later

Background

Native Transport Requests

Native transport requests (NTR) are any requests made via the CQL Native Protocol. CQL Native Protocol is the way the Cassandra driver communicates with the server. This includes all reads, writes, schema changes, etc. There are a limited number of threads available to process incoming requests. When all threads are in use, some requests wait in a queue (pending). If the queue fills up, some requests are silently rejected (blocked). The server never replies, so this eventually causes a client-side timeout. The main way to prevent blocked native transport requests is to throttle load, so the requests are performed over a longer period.

Prior to 3.11.5

Prior to 3.11.5, Cassandra used the following configuration settings to set the size and throughput of the queue:

  • native_transport_max_threads sets the maximum number of threads for handling requests. Each thread pulls requests from the NTR queue.
  • cassandra.max_queued_native_transport_requests sets the queue size (default 128). Once the queue is full, the Netty threads block while waiting for the queue to have free space.
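
As a sketch of where these two knobs live (illustrative values, not the configuration of the test cluster), the thread count is a cassandra.yaml setting, while the queue bound is a JVM system property passed at startup:

# cassandra.yaml
native_transport_max_threads: 128

# JVM startup option (for example in jvm.options)
-Dcassandra.max_queued_native_transport_requests=128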

Once the NTR queue is full, requests from all clients stop being accepted. There is no strict ordering by which blocked Netty threads will process requests, so in 3.11.4 latency becomes effectively random once all Netty threads are blocked.

Figure 1: Native Transport Requests - Cassandra 3.11.4

Change After 3.11.5

In 3.11.5 and above, instead of blocking when the NTR queue fills as previously described, Cassandra throttles. The NTR queue is throttled based on the heap size: native transport requests are limited by the total memory they occupy rather than by their number. Requests are paused once the queue is full.

  • native_transport_max_concurrent_requests_in_bytes: a global limit on the total size of in-flight NTR requests, in bytes (default: heap size / 10)
  • native_transport_max_concurrent_requests_in_bytes_per_ip: a per-endpoint limit on the total size of in-flight NTR requests, in bytes (default: heap size / 40)

Maxed Queue Behaviour

From previously conducted performance testing of 3.11.4 and 3.11.6, we noticed similar behaviour while the traffic pressure had not yet reached the point of saturating the NTR queue. In this section, we will discuss the expected behaviour when saturation does occur and the breaking point is reached.

In 3.11.4, when the queue is maxed out, client requests are refused. For example, trying to make a connection via cqlsh yields an error, as shown in Figure 2.

Cassandra 3.11.4 - queue maxed out, client requests refused
Figure 2: Timed out request

Or, on a client that tries to run a query, you may see a NoHostAvailableException.

Where a 3.11.4 cluster previously saw blocked NTRs, after an upgrade to 3.11.6 NTRs are no longer blocked. The reason is that 3.11.6 doesn’t place a limit on the number of NTRs but rather on the total memory they occupy, so when the new size limit is reached, NTRs are paused. The default settings in 3.11.6 result in a much larger effective NTR queue than the small limit of 128 requests in 3.11.4 (in normal situations where payloads are not extremely large).

Benchmarking Setup

This testing procedure requires the NTR queue on a cluster to be at max capacity, with enough load to block requests at a constant rate. To achieve this, we used 12 active test boxes to create multiple client connections to the test cluster. Once the cluster’s NTR queue was in constant contention, we monitored the performance using:

  • Client metrics: requests per second, latency from client perspective
  • NTR Queue metrics: Active Tasks, Pending Tasks, Currently Blocked Tasks, and Paused Connections.
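
As a quick server-side cross-check (not part of the metrics pipeline we used for this test), the same thread pool counters can be eyeballed with nodetool, for example:

nodetool tpstats | grep -E 'Pool Name|Native-Transport-Requests'

The Native-Transport-Requests row reports the Active, Pending, Completed, Blocked, and All time blocked counts, while paused connections are exposed as a separate client-level metric rather than in tpstats.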

For testing purposes we used two testing clusters with details provided in the table below:

Cassandra Version Cluster Size Instance Type Cores RAM Disk
3.11.4 3 m5xl-1600-v2 4 16 GB 1600 GB
3.11.6 3 m5xl-1600-v2 4 16 GB 1600 GB
Table 1: Cluster Details

To simplify the setup we disabled encryption and authentication. Multiple test instances were set up in the same region as the clusters. For testing purposes we used 12 KB blob payloads. To give each cluster node a balanced mixed load, we kept the number of test boxes generating write load equal to the number of instances generating read load. We ran the load against the cluster for 10 mins to temporarily saturate the queue with read and write requests and cause contention for the Netty threads.

Our test script used cassandra-stress to generate the load; you can also refer to Deep Diving cassandra-stress – Part 3 (Using YAML Profiles) for more information.

In the stressSpec.yaml, we used the following table definition and queries:

table_definition: |
 CREATE TABLE typestest (
       name text,
       choice boolean,
       date timestamp,
       address inet,
       dbl double,
       lval bigint,
       ival int,
       uid timeuuid,
       value blob,
       PRIMARY KEY((name,choice), date, address, dbl, lval, ival, uid)
 ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
   AND comment='A table of many types to test wide rows'
 
columnspec:
 - name: name
   size: fixed(48)
   population: uniform(1..1000000000) # the range of unique values to select for the field 
 - name: date
   cluster: uniform(20..1000)
 - name: lval
   population: gaussian(1..1000)
   cluster: uniform(1..4)
 - name: value
   size: fixed(12000)
 
insert:
 partitions: fixed(1)       # number of unique partitions to update in a single operation
                                 # if batchcount > 1, multiple batches will be used but all partitions will
                                 # occur in all batches (unless they finish early); only the row counts will vary
 batchtype: UNLOGGED               # type of batch to use
 select: uniform(1..10)/10       # uniform chance any single generated CQL row will be visited in a partition;
                                 # generated for each partition independently, each time we visit it
 
#
# List of queries to run against the schema
#
queries:
  simple1:
     cql: select * from typestest where name = ? and choice = ? LIMIT 1
     fields: samerow             # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  range1:
     cql: select name, choice, uid  from typestest where name = ? and choice = ? and date >= ? LIMIT 10
     fields: multirow            # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  simple2:
     cql: select name, choice, uid from typestest where name = ? and choice = ? LIMIT 1
     fields: samerow             # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)

Write loads were generated with:

cassandra-stress user no-warmup 'ops(insert=10)' profile=stressSpec.yaml cl=QUORUM duration=10m -mode native cql3 maxPending=32768 connectionsPerHost=40 -rate threads=2048 -node file=node_list.txt

Read loads were generated by changing the ops clause to:

'ops(simple1=10,range1=1)'
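
Putting that together, the full read command (the write command above with only the ops clause changed, not copied verbatim from the original test scripts) was of the form:

cassandra-stress user no-warmup 'ops(simple1=10,range1=1)' profile=stressSpec.yaml cl=QUORUM duration=10m -mode native cql3 maxPending=32768 connectionsPerHost=40 -rate threads=2048 -node file=node_list.txt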

Comparison

3.11.4 Queue Saturation Test

The active NTR queue reached max capacity (at 128) and remained in contention under load. Pending NTR tasks remained above 128 throughout. At this point, timeouts were occurring when running 12 load instances to stress the cluster. Each node had 2 load instances performing reads and another 2 performing writes. 4 of the read load instances constantly logged NoHostAvailableExceptions as shown in the example below.

ERROR 04:26:42,542 [Control connection] Cannot connect to any host, scheduling retry in 1000 milliseconds
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ec2-18-211-4-255.compute-1.amazonaws.com/18.211.4.255:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-18-211-4-255.compute-1.amazonaws.com/18.211.4.255] Timed out waiting for server response), ec2-35-170-231-79.compute-1.amazonaws.com/35.170.231.79:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-35-170-231-79.compute-1.amazonaws.com/35.170.231.79] Timed out waiting for server response), ec2-35-168-69-19.compute-1.amazonaws.com/35.168.69.19:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-35-168-69-19.compute-1.amazonaws.com/35.168.69.19] Timed out waiting for server response))

The client results we got from this stress run are shown in Table 2.

Box Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
1 700.00 2,862.20 2,078.30 7,977.60 11,291.10 19,495.10 34,426.80
2 651.00 3,054.50 2,319.50 8,048.90 11,525.90 19,528.70 32,950.50
3 620.00 3,200.90 2,426.40 8,409.60 12,599.70 20,367.50 34,158.40
4 607.00 3,312.80 2,621.40 8,304.70 11,769.20 19,730.00 31,977.40
5 568.00 3,529.80 3,011.50 8,216.60 11,618.20 19,260.20 32,698.80
6 553.00 3,627.10 3,028.30 8,631.90 12,918.50 20,115.90 34,292.60
Writes 3,699.00 3,264.55 2,580.90 8,264.88 11,953.77 19,749.57 34,426.80
7 469.00 4,296.50 3,839.90 9,101.60 14,831.10 21,290.30 35,634.80
8 484.00 4,221.50 3,808.40 8,925.50 11,760.80 20,468.20 34,863.10
9 Crashed due to timeout
10 Crashed due to timeout
11 Crashed due to timeout
12 Crashed due to timeout
Reads 953.00 4,259.00 3,824.15 9,092.80 14,800.40 21,289.48 35,634.80
Summary 4,652.00 3,761.78 3,202.53 8,678.84 13,377.08 20,519.52 35,634.80
Table 2: 3.11.4 Mixed Load Saturating The NTR Queue

* To calculate the total write operations, we summed the op rates from the 6 write instances. For the maximum write latency we took the maximum value across all instances, and for the remaining latency columns we averaged the results. The write results are summarised in the “Writes” row of Table 2. We did the same for the read results, which are recorded in the “Reads” row. The last row of the table summarises the “Writes” and “Reads” rows.
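
As an illustration of that aggregation (a sketch only; the per-instance figures would be collected from the cassandra-stress output, and the dictionary keys here are hypothetical):

# Summarise a list of per-instance results: op rates are summed, the maximum
# latency takes the overall maximum, and the other latency columns are averaged.
def summarise(instances):
    n = len(instances)
    avg = lambda key: sum(i[key] for i in instances) / n
    return {
        "op_rate": sum(i["op_rate"] for i in instances),
        "mean": avg("mean"),
        "median": avg("median"),
        "p95": avg("p95"),
        "p99": avg("p99"),
        "p999": avg("p999"),
        "max": max(i["max"] for i in instances),
    }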

The 6 write load instances finished normally, but the read instances struggled. Only 2 of the read load instances were able to send traffic through normally; the other clients received so many timeout errors that they crashed. Another observation we made is that the Cassandra timeout metrics, under client-request-metrics, did not capture any of the client timeouts we observed.

Same Load on 3.11.6

Next, we proceeded to test 3.11.6 with the same load. Using the default NTR settings, all test instances were able to finish the stress test successfully.

Box Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
1 677.00 2,992.60 2,715.80 7,868.50 9,303.00 9,957.30 10,510.90
2 658.00 3,080.20 2,770.30 7,918.80 9,319.70 10,116.70 10,510.90
3 653.00 3,102.80 2,785.00 7,939.80 9,353.30 10,116.70 10,510.90
4 608.00 3,340.90 3,028.30 8,057.30 9,386.90 10,192.20 10,502.50
5 639.00 3,178.30 2,868.90 7,994.30 9,370.10 10,116.70 10,510.90
6 650.00 3,120.50 2,799.70 7,952.40 9,353.30 10,116.70 10,510.90
Writes 3,885.00 3,135.88 2,828.00 7,955.18 9,347.72 10,102.72 10,510.90
7 755.00 2,677.70 2,468.30 7,923.00 9,378.50 9,982.40 10,762.60
8 640.00 3,160.70 2,812.30 8,132.80 9,529.50 10,418.70 11,031.00
9 592.00 3,427.60 3,101.70 8,262.80 9,579.80 10,452.20 11,005.90
10 583.00 3,483.00 3,160.40 8,279.60 9,579.80 10,435.40 11,022.60
11 582.00 3,503.60 3,181.40 8,287.90 9,588.20 10,469.00 11,047.80
12 582.00 3,506.70 3,181.40 8,279.60 9,588.20 10,460.60 11,014.20
Reads 3,734.00 3,293.22 2,984.25 8,194.28 9,540.67 10,369.72 11,047.80
Summary 7,619.00 3,214.55 2,906.13 8,074.73 9,444.19 10,236.22 11,047.80
Table 3: 3.11.6 Mixed Load

Default Native Transport Requests (NTR) Setting Comparison

Taking the summary row from both versions (Table 2 and Table 3), we produced Table 4.

Version Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
3.11.4 4,652.00 3,761.78 3,202.53 8,678.84 13,377.08 20,519.52 35,634.80
3.11.6 7,619.00 3,214.55 2,906.13 8,074.73 9,444.19 10,236.22 11,047.80
Table 4: Mixed Load 3.11.4 vs 3.11.6


Figure 2: Latency 3.11.4 vs 3.11.6

Figure 2 shows the latencies from Table 4. From the results, 3.11.6 had slightly better average latency than 3.11.4. Furthermore, in the worst case, where contention is high, 3.11.6 handled request latency better than 3.11.4, as shown by the difference in maximum latency. Not only did 3.11.6 have lower latency, it was also able to process many more requests because the queue no longer blocks.

3.11.6 Queue Saturation Test

The default native_transport_max_concurrent_requests_in_bytes is set to 1/10 of the heap size. The Cassandra max heap size of our cluster is 8 GB, so the default queue size for our cluster is 0.8 GB. This turns out to be too large for this cluster size, as the cluster runs into CPU and other bottlenecks before NTR saturation is reached.
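
As a quick back-of-the-envelope check of those defaults against the 8 GB heap (a sketch, not output captured from the cluster):

heap_bytes = 8 * 1024**3           # 8 GB max heap on the test cluster
global_limit = heap_bytes // 10    # native_transport_max_concurrent_requests_in_bytes: ~0.8 GB
per_ip_limit = heap_bytes // 40    # native_transport_max_concurrent_requests_in_bytes_per_ip: ~0.2 GB
print(global_limit, per_ip_limit)  # 858993459 214748364 (bytes)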

So we took the reverse approach to investigate full-queue behaviour: we set the queue size to a much lower number. In cassandra.yaml, we added:

native_transport_max_concurrent_requests_in_bytes: 1000000

This means we set the global queue size to be throttled at 1 MB. Once Cassandra was restarted and all nodes were online with the new settings, we ran the same mixed load on this cluster; the results are shown in Table 5.

3.11.6 Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
Write: Default setting 3,885.00 3,135.88 2,828.00 7,955.18 9,347.72 10,102.72 10,510.90
Write: 1MB setting 2,105.00 5,749.13 3,471.82 16,924.02 26,172.45 29,681.68 31,105.00
Read: Default setting 3,734.00 3,293.22 2,984.25 8,194.28 9,540.67 10,369.72 11,047.80
Read: 1MB setting 5,395.00 2,263.13 1,864.55 5,176.47 8,074.73 9,693.03 15,183.40
Summary: Default setting 7,619.00 3,214.55 2,906.13 8,074.73 9,444.19 10,236.22 11,047.80
Summary: 1MB setting 7,500.00 4,006.13 2,668.18 11,050.24 17,123.59 19,687.36 31,105.00

Table 5: 3.11.6 native_transport_max_concurrent_requests_in_bytes default and 1MB setting 

During the test, we observed a lot of paused connections and discarded requests—see Figure 3. For a full list of Instaclustr exposed metrics see our support documentation.

NTR Test - Paused Connections and Discarded Requests
Figure 3: 3.11.6 Paused Connections and Discarded Requests

After setting native_transport_max_concurrent_requests_in_bytes to a lower number, we started to see paused connections and discarded requests, and write latency increased, resulting in fewer processed write operations, as shown in Table 5. The increased write latency is illustrated in Figure 4.

Cassandra 3.11.6 Write Latency Under Different Settings
Figure 4: 3.11.6 Write Latency Under Different Settings

On the other hand, read latency decreased (see Figure 5), resulting in a higher number of read operations being processed.

Cassandra 3.11.6 Read Latency Under Different Settings
Figure 5: 3.11.6 Read Latency Under Different Settings
Cassandra 3.11.6 Operations Rate Under Different Settings
Figure 6: 3.11.6 Operations Rate Under Different Settings

As illustrated in Figure 6, the total number of operations decreased slightly with the 1MB setting, but the difference is very small and the read and write effects almost “cancel each other out”. However, when we look at each type of operation individually, we can see that rather than each getting an equal share of the channel, as with the default setting’s effectively unlimited queue, the lower queue size penalizes writes and favors reads. While our testing identified this outcome, further investigation will be required to determine exactly why this is the case.

Conclusion

In conclusion, the new NTR change offers an improvement over the previous NTR queue behaviour. Through our performance testing we found the change to be good for clusters that have a constant load causing the NTR queue to block. Under the new mechanism the queue no longer blocks, but throttles the load based on the amount of memory allocated to requests.

The results from testing indicated that the changed queue behaviour reduced latency and provided a significant lift in the number of operations without clients timing out. Clusters with our latest version of Cassandra can handle more load before hitting hard limits. For more information feel free to comment below or reach out to our Support team to learn more about changes to 3.11.6 or any of our other supported Cassandra versions.


Apache Cassandra Usage Report 2020

Apache Cassandra is the open source NoSQL database for mission critical data. Today the community announced findings from a comprehensive global survey of 901 practitioners on Cassandra usage. It’s the first of what will become an annual survey, providing a baseline understanding of who uses Cassandra, how, and why.

“I saw zero downtime at global scale with Apache Cassandra. That’s a powerful statement to make. For our business that’s quite crucial.” - Practitioner, London

Key Themes

Cassandra adoption is correlated with organizations in a more advanced stage of digital transformation.

People from organizations that self-identified as being in a “highly advanced” stage of digital transformation were more likely to be using Cassandra (26%) compared with those in an “advanced” stage (10%) or “in process” (5%).

Optionality, security, and scalability are among the key reasons Cassandra is selected by practitioners.

The top reasons practitioners use Cassandra for mission critical apps are “good hybrid solutions” (62%), “very secure” (60%), “highly scalable” (57%), “fast” (57%), and “easy to build apps with” (55%).

A lack of skilled staff and the challenge of migration deters adoption of Cassandra.

Thirty-six percent of practitioners currently using Cassandra for mission critical apps say that a lack of Cassandra-skilled team members may deter adoption. When asked what it would take for practitioners to use Cassandra for more applications and features in production, they said “easier to migrate” and “easier to integrate.”

Methodology

Sample. The survey consisted of 1,404 interviews of IT professionals and executives, including 901 practitioners who are the focus of this usage report, and was conducted from April 13-23, 2020. Respondents came from 13 geographies (China, India, Japan, South Korea, Germany, United Kingdom, France, the Netherlands, Ireland, Brazil, Mexico, Argentina, and the U.S.) and the survey was offered in seven languages corresponding to those geographies. While margin of sampling error cannot technically be calculated for online panel populations where the relationship between sample and universe is unknown, the margin of sampling error for equivalent representative samples would be +/- 2.6% for the total sample, +/- 3.3% for the practitioner sample, and +/- 4.4% for the executive sample.

To ensure the highest quality respondents, surveys include enhanced screening beyond title and activities of company size (no companies under 100 employees), cloud IT knowledge, and years of IT experience.

Rounding and multi-response. Figures may not add to 100 due to rounding or multi-response questions.

Demographics

Practitioner respondents represent a variety of roles as follows: Dev/DevOps (52%), Ops/Architect (29%), Data Scientists and Engineers (11%), and Database Administrators (8%) in the Americas (43%), Europe (32%), and Asia Pacific (12%).

Cassandra roles

Respondents include both enterprise (65% from companies with 1k+ employees) and SMEs (35% from companies with at least 100 employees). Industries include IT (45%), financial services (11%), manufacturing (8%), health care (4%), retail (3%), government (5%), education (4%), telco (3%), and 17% were listed as “other.”

Cassandra companies

Cassandra Adoption

Twenty-two percent of practitioners are currently using or evaluating Cassandra with an additional 11% planning to use it in the next 12 months.

Of those currently using Cassandra, 89% are using open source Cassandra, including both self-managed (72%) and third-party managed (48%).

Practitioners using Cassandra today are more likely to use it for more projects tomorrow. Overall, 15% of practitioners say they are extremely likely (10 on a 10-pt scale) to use it for their next project. Of those, 71% are currently using or have used it before.

Cassandra adoption

Cassandra Usage

People from organizations that self-identified as being in a “highly advanced” stage of digital transformation were more likely to be using Cassandra (26%) compared with those in an “advanced” stage (10%) or “in process” (5%).

Cassandra predominates in very important or mission critical apps. Among practitioners, 31% use Cassandra for their mission critical applications, 55% for their very important applications, 38% for their somewhat important applications, and 20% for their least important applications.

“We’re scheduling 100s of millions of messages to be sent. Per day. If it’s two weeks, we’re talking about a couple billion. So for this, we use Cassandra.” - Practitioner, Amsterdam

Cassandra usage

Why Cassandra?

The top reasons practitioners use Cassandra for mission critical apps are “good hybrid solutions” (62%), “very secure” (60%), “highly scalable” (57%), “fast” (57%), and “easy to build apps with” (55%).

“High traffic, high data environments where really you’re just looking for very simplistic key value persistence of your data. It’s going to be a great fit for you, I can promise that.” - Global SVP Engineering

Top reasons practitioners use Cassandra

For companies in a highly advanced stage of digital transformation, 58% cite “won’t lose data” as the top reason, followed by “gives me confidence” (56%), “cloud native” (56%), and “very secure” (56%).

“It can’t lose anything, it has to be able to capture everything. It can’t have any security defects. It needs to be somewhat compatible with the environment. If we adopt a new database, it can’t be a duplicate of the data we already have.… So: Cassandra.” - Practitioner, San Francisco

However, 36% of practitioners currently using Cassandra for mission critical apps say that a lack of Cassandra-skilled team members may deter adoption.

“We don’t have time to train a ton of developers, so that time to deploy, time to onboard, that’s really key. All the other stuff, scalability, that all sounds fine.” – Practitioner, London

When asked what it would take for practitioners to use Cassandra for more applications and features in production, they said “easier to migrate” and “easier to integrate.”

“If I can get started and be productive in 30 minutes, it’s a no brainer.” - Practitioner, London

Conclusion

We invite anyone who is curious about Cassandra to test the 4.0 beta release. There will be no new features or breaking API changes in future Beta or GA builds, so you can expect the time you put into the beta to translate into transitioning your production workloads to 4.0.

We also invite you to participate in a short survey about Kubernetes and Cassandra that is open through September 24, 2020. Details will be shared with the Cassandra Kubernetes SIG after it closes.

Survey Credits

A volunteer from the community helped analyze the report, which was conducted by ClearPath Strategies, a strategic consulting and research firm, and donated to the community by DataStax. It is available for use under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).