Scylla University Live

Following the success of our online 2021 Scylla Summit Training Day, we will host our first-ever Scylla University Live event. Featuring helpful new coursework, Scylla University Live will take place on Thursday, April 22, 2021, 8:45 AM – 1 PM, Pacific Standard Time (PST). Register now and mark your calendars!

The event will be online and instructor-led. It will include two parallel tracks – one for Developers and Scylla Cloud users and one for DevOps and Architects. You can bounce back and forth between tracks or drop in for the sessions that most interest you.

SIGN UP FOR SCYLLA UNIVERSITY LIVE

The sessions will be led by many of the same Scylla engineers and developers who build our products. They will be available for questions during and after the sessions.

After the training sessions, you will have the opportunity to take accompanying Scylla University courses to get some hands-on experience, complete quizzes, receive certificates of completion, and earn some exclusive swag.

Agenda and Schedule

Time (PST)      | Developers & Scylla Cloud Users    | DevOps & Architects
08:45-09:00am   | Welcome & Agenda
09:00-09:55am   | Getting Started with Scylla        | Best Practices for Change Data Capture
09:55-10:00am   | Break/Passing Period
10:00-10:55am   | How to Write Better Apps           | Scylla on Kubernetes: CQL and DynamoDB-compatible APIs
10:55-11:00am   | Break/Passing Period
11:00-11:55am   | Lightweight Transactions in Action | Advanced Monitoring and Admin Best Practices
11:55am-12:00pm | Break/Passing Period
12:00pm-1:00pm  | Birds of a Feather Roundtables

Bring any questions you have to the sessions. We aim to make them as live and interactive as possible.

We will also host a Birds of a Feather networking session where you can meet your fellow database monsters and chat with the Scylla team.

Details

  • Getting Started with Scylla: This session will introduce Scylla’s architecture, basic concepts, and basic data modeling. Topics covered include Scylla terminology, Scylla components, data replication, consistency levels, the write and read paths (memtables and SSTables), and basic data modeling. We’ll also show a quick demo of how easy it is to spin up a cluster with Docker.
  • How to Write Better Apps: In this class, we will touch on Advanced Data Modeling and Data Types, Scylla shard-aware drivers, Materialized Views, Global and Local Secondary Indexes, and Compaction Strategies. We will also talk about tips and best practices for writing applications for Scylla.
  • Lightweight Transactions in Action: In this class, you will get an overview of LWT: when and why this feature should be used, how it’s implemented in Scylla, and best practices for using it. You’ll also learn how to configure and use Scylla’s “compare and set” operations.
  • Best Practices for Change Data Capture: This session will start with an overview of how Scylla implements CDC as standard CQL tables, the different ways it can be configured to record pre- and post-images as well as the differences between them, and then show how to consume this data via various methods.
  • Scylla on Kubernetes: CQL and DynamoDB-compatible APIs: Here, you will learn about using the Scylla Operator for Kubernetes for cloud-native deployments, including how to take advantage of the three recently released Helm charts.
  • Advanced Monitoring and Admin Best Practices: This session will start with an overview of our Scylla Monitoring Stack, which is your first resource for understanding what’s happening in your cluster and your application and for getting alerts when something goes wrong. Scylla exports thousands of different metrics. We will show how to use Scylla Monitoring to make sense of all that data and provide a complete picture of how well your hardware is set up and used, the current and historical state of the cluster, and how well your app is written.

Get Started in Scylla University

If you haven’t done so yet, we recommend you first complete the Scylla Essentials and Mutant Monitoring System courses on Scylla University to better understand Scylla and how the technology works.

Hope to see you at Scylla University Live!

SIGN UP FOR SCYLLA UNIVERSITY LIVE

The post Scylla University Live appeared first on ScyllaDB.

Kiwi.com: Nonstop Operations with Scylla Even Through the OVHcloud Fire

Disasters can strike any business on any day. This particular disaster, a fire at the OVHcloud Strasbourg datacenter, struck recently and the investigation and recovery are still ongoing. This is an initial report of one company’s resiliency in the face of that disaster.

Overview of the Incident

At 0:47 CET on Wednesday, March 10, 2021, a fire began in a room at the SBG2 datacenter of OVHcloud, the popular French cloud provider, in the city of Strasbourg. Within hours the fire had been contained, but not before wreaking havoc. The fire destroyed SBG2 almost entirely and gutted four of twelve rooms in the adjacent SBG1 datacenter. Additionally, combating the fire required proactively switching off the other two datacenters, SBG3 and SBG4.

Netcraft estimates this disaster knocked out 3.6 million websites spread across 464,000 domains. Of those, 184,000 websites across nearly 60,000 domains were in the French country code Top Level Domain (ccTLD) .FR — about 1 in 50 servers for the entire .FR domain. As Netcraft stated, “Websites that went offline during the fire included online banks, webmail services, news sites, online shops selling PPE to protect against coronavirus, and several countries’ government websites.”

OVHcloud’s Strasbourg SBG2 Datacenter engulfed in flames. (Image: SDIS du Bas Rhin )

Kiwi.com Keeps Running

However, one company that had their servers deployed in OVHcloud fared better than others: Kiwi.com, the popular online travel site. Scylla, the NoSQL database Kiwi.com had standardized upon, was designed from the ground up to be highly available and resilient, even in the face of disaster.

Around 01:12 CET, about a half hour after the fire initially broke out, Kiwi.com’s monitoring dashboards produced alerts as nodes went down and left the cluster. There were momentary traffic spikes as these nodes became unresponsive, but soon the two other OVHcloud European datacenters used by Kiwi.com took over requests bound for Strasbourg.

Out of a thirty node distributed NoSQL cluster, ten nodes became suddenly unavailable. Other than a brief blip around 1:15, Kiwi.com’s Scylla cluster continued working seamlessly. Load on the remaining online nodes rose from ~25% before the outage to ~30-50% three hours later. (Source: Kiwi.com)

Kiwi.com had just lost 10 server nodes out of 30 nodes total, but the remaining Scylla database cluster was capable of rebalancing itself and handling the load. Plus, because Scylla is datacenter topology aware and kept multiple copies of data geographically distributed, their database kept running with zero data loss.

According to Kiwi.com’s Milos Vyletel, “As we designed Scylla to be running on three independent locations — every location at least 200 kilometers from another — Kiwi.com survived without any major impact of services.”

The multi-local OVHcloud infrastructure enabled Kiwi.com to build out a robust and scalable triple replicated Scylla database in three datacenters all in separate locations. The secure OVH vRack synchronised the connection of the three sites via a reliable private network, allowing the cluster optimal replication and scalability across multiple locations.

Indeed, Kiwi.com had done their disaster planning years before, even joking about their resiliency by having their initial Scylla cluster launch party in a Cold War era nuclear fallout shelter. Now their planning, and their technology choice, had paid back in full.

As dawn broke, the fire was out, but the extensive damage to the OVHcloud Strasbourg datacenter was clear. (Image: AP Photo/Jean-Francois Badias)

With the dawning of a new day, load on Kiwi.com’s database picked up, which taxed the remaining servers, yet Scylla kept performing. As Milos informed the ScyllaDB support team, “Scylla seems fine. A bit red but everything works as designed.”

The Road to Disaster Recovery

In total, ten production nodes, plus two other development servers, located in SBG2 were lost to Kiwi.com and are unrecoverable. The next step is to wait for the other OVHcloud SBG buildings to be brought back up, at which point Kiwi.com will refresh their hardware with new servers. Kiwi.com is also considering using this opportunity to update the servers in their other datacenters.

Lessons Learned

Milos provided this advice from Kiwi.com’s perspective: “One thing we have learned is to test full datacenter outages on a regular basis. We always wanted to test it on one product, as one of the devs was pushing us to do, but never really had taken the time.”

“Fortunately, we sized our Scylla cluster in a way that two DCs were able to handle the load just fine. We applied the same principles to other (non-Scylla) clusters as well, but over time as new functionality was added we have not been adding new capacity for various reasons — COVID impact being the major one over this last year or so. We are kind of pushing limits on those clusters — we had to do some reshuffling of servers to accommodate for the lost compute power.

“The bottom line is it is more expensive to have data replicated on multiple geographically distributed locations, providing enough capacity to survive a full DC outage, but when these kinds of situations happen it is priceless to be able to get over it with basically no downtime whatsoever.”

The post Kiwi.com: Nonstop Operations with Scylla Even Through the OVHcloud Fire appeared first on ScyllaDB.

A Shard-Aware Scylla C/C++ Driver

We are happy to announce the first release of a shard-aware C/C++ driver (connector library). It’s an API-compatible fork of Datastax cpp-driver 2.15.2, currently packaged for x86_64 CentOS 7 and Ubuntu 18.04 (with more to come!). It’s also easily compilable on most Linux distributions. The driver still works with Apache Cassandra and DataStax Enterprise (DSE), but when paired with Scylla enables shard-aware queries, delivering even greater performance than before.

GET THE SCYLLA SHARD-AWARE C/C++ DRIVER

Why?

The Scylla C/C++ driver was forked from the Datastax driver with a view to adding Scylla-specific features. It’s a rich, fast, async, battle-proven piece of C++ code, and its interface is in pure C, which opens a lot of possibilities, like creating Python or Erlang bindings. Its newest feature, which will be discussed in this post, is shard-awareness.

We’ve already written a neat introduction to this concept so, instead of repeating all of that here, we’ll just briefly remind you of the key topics. In short: shard-awareness is the ability of the client application to query specific CPUs within the Scylla cluster. As you know, Scylla follows a shard-per-core architecture, which means that every node is a multithreaded process in which each thread performs relatively independent work. It also means that each piece of data stored in your Scylla DB is bound to specific CPU(s). Unsurprisingly, every CQL connection is also served by a specific CPU, as you can see by querying the system.clients table:

cqlsh> SELECT address, port, client_type, shard_id FROM system.clients;

 address   | port  | client_type | shard_id
-----------+-------+-------------+----------
 127.0.0.1 | 46730 |         cql |    2
 127.0.0.1 | 46732 |         cql |    3
 127.0.0.1 | 46768 |         cql |    0
 127.0.0.1 | 46770 |         cql |    1

(4 rows)

Each CQL connection is bound to a specific Scylla shard (column shard_id). The contents of this table are local (unique to every node).

By default, scylla-cpp-driver opens a CQL connection to every shard on every node, so it can communicate directly with any CPU in the cluster. Now — by locally hashing the partition key in your query — the driver determines which shard, on which node, would be the best one to execute your statement. Then it’s just a matter of sending the query down the right CQL connection. The best part is that all of this happens behind the scenes: client code is unaffected, so to get this ability you’ll just need to re-link your C/C++ application with libscylla-cpp-driver!

Note: The driver doesn’t do client-side parsing of queries, so it must rely on Scylla to identify the partition keys in your queries. That’s why, for now, only prepared statements are shard-aware.

Note: To benefit from shard-awareness, token-aware routing needs to be enabled on the Cluster object. Don’t worry; it’s on by default.
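
To illustrate, here is a minimal sketch of a shard-aware query through the driver's C API, assuming an already-connected CassSession; the keyspace, table, and bound value are hypothetical, and error handling is omitted for brevity. Nothing shard-specific appears in the client code: you prepare, bind, and execute as usual.

// Prepare once; with token awareness enabled, the driver hashes the bound
// partition key and routes the statement to the shard that owns it.
CassFuture* prep_future = cass_session_prepare(session,
    "SELECT * FROM my_keyspace.my_table WHERE pk = ?");  // hypothetical table
const CassPrepared* prepared = cass_future_get_prepared(prep_future);
cass_future_free(prep_future);

CassStatement* statement = cass_prepared_bind(prepared);
cass_statement_bind_string(statement, 0, "some-partition-key");  // hypothetical key

CassFuture* result_future = cass_session_execute(session, statement);
// ... check cass_future_error_code() and consume the result, then clean up:
cass_future_free(result_future);
cass_statement_free(statement);
cass_prepared_free(prepared);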

How?

Scylla nodes extend the CQL SUPPORTED message with information about their shard count, current shard ID, sharding algorithm, etc. Knowing all this, the client can take one of two approaches to establish the pools of per-shard connections: “basic” and “advanced”.

In the basic mode, the driver just opens connections until all shards are reached and the goal set by cass_cluster_set_core_connections_per_host() is met. This works because a Scylla node assigns CQL connections to the least busy shards, so with sufficiently many connections all shards will eventually be hit. The downside is that some connections are usually opened unnecessarily, which led us to the idea of “advanced” shard selection.

In the advanced mode, available since Scylla Open Source 4.3, the CQL SUPPORTED message contains a field named SCYLLA_SHARD_AWARE_PORT. If set, it indicates that Scylla listens for CQL connections on an additional port (by default 19042). This port works very much like the traditional 9042, with only one important difference: TCP connections arriving at this port are routed to specific shards depending on the client-side port number (a.k.a. the source port or local port). Clients can explicitly choose the target shard by setting the local port number to a value of N × node_shard_count + desired_shard_number, where the allowed range for N is platform-specific. A call to:

cass_cluster_set_local_port_range(cluster, 50000, 60000);

tells the driver to use advanced shard-awareness and to bind its local sockets to ports from the range [50000, 60000). That’s all you need to do.
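
For intuition, the mapping from local port to shard is plain modular arithmetic. The helper below is illustrative only and not part of the driver's API; it mirrors the formula above and matches the example output later in this post, where an 8-shard node maps ports 50000-50007 to shards 0-7 and port 50008 back to shard 0.

// Illustrative only: local_port = N * node_shard_count + desired_shard_number,
// so the shard targeted by a given local port is simply the remainder.
static int shard_for_local_port(int local_port, int node_shard_count) {
  return local_port % node_shard_count;
}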

Note: Of course, the new port also has a secure (SSL) counterpart.

Note: Advanced shard selection falls back to the basic mode automatically in case of failure. Failure can happen for a number of reasons, such as when a client hides behind a NAT gateway or when SCYLLA_SHARD_AWARE_PORT is blocked. The fallback slows down the process of reconnecting, so we don’t recommend relying on it.

Installation

Our C/C++ driver releases are hosted on GitHub. If you are lucky enough to run CentOS 7, Ubuntu 18.04, or their relatives, you can simply download and install our packages along with their dependencies.

# Installation: CentOS 7
sudo yum install -y epel-release
sudo yum -y install libuv openssl zlib
wget https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-2.15.2-1.el7.x86_64.rpm https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-devel-2.15.2-1.el7.x86_64.rpm
sudo yum localinstall -y scylla-cpp-driver-2.15.2-1.el7.x86_64.rpm scylla-cpp-driver-devel-2.15.2-1.el7.x86_64.rpm

# Installation: Ubuntu 18.04
wget https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver_2.15.2-1_amd64.deb https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-dev_2.15.2-1_amd64.deb
sudo apt-get update
sudo apt-get install -y ./scylla-cpp-driver_2.15.2-1_amd64.deb ./scylla-cpp-driver-dev_2.15.2-1_amd64.deb

For those working on cutting-edge Linux distributions we recommend compiling from source. Once you have the tools (cmake, make, g++) and dependencies (libuv-dev, zlib-dev, openssl-dev) installed, please check out the latest tagged revision and proceed according to the linked instructions.

Example

In this short example we’ll just open a number of connections and exit after a minute of sleep.

// conn.cpp - demonstration of advanced shard selection in C++
#include <cassandra.h>
#include <iostream>
#include <thread>
#include <chrono>
 
int main(int argc, char* argv[]) {
  CassCluster* cluster = cass_cluster_new();
  CassSession* session = cass_session_new();
   
  cass_cluster_set_num_threads_io(cluster, 2);
  cass_cluster_set_contact_points(cluster, "127.0.0.1");
  cass_cluster_set_core_connections_per_host(cluster, 7);
  // Now enable advanced shard-awareness
  cass_cluster_set_local_port_range(cluster, 50000, 60000);
   
  CassFuture* connect_future = cass_session_connect(session, cluster);
   
  if (cass_future_error_code(connect_future) == CASS_OK) {
    std::cout << "Connected\n";
    std::this_thread::sleep_for(std::chrono::seconds(60)); // [1]
  } else {
    std::cout << "Connection ERROR\n";
  }
   
  // Cleanup
  cass_future_free(connect_future);
  cass_session_free(session);
  cass_cluster_free(cluster);
  return 0;
}

If you installed the driver from packages, compilation is a breeze:

g++ conn.cpp -lscylla-cpp-driver -o conn

If you built the driver from sources but you hesitate to install it system-wide, you can still compile the example with:

g++ conn.cpp /cpp-driver/build/libscylla-cpp-driver.so -Wl,-rpath,/cpp-driver/build/ -I /cpp-driver/include/ -o conn

Assuming that you have a running 1-node Scylla cluster on localhost, the code above should greet you with “===== Using optimized driver!!! =====“. Then it would establish a session with 2 threads, each owning 7 connections (or the closest greater multiple of shard count on that instance of Scylla). While the sleep [1] takes place, we can use cqlsh to peek at the client-side port numbers and at the distribution of connections among shards:

cqlsh> SELECT address, port, client_type, shard_id FROM system.clients;
 
 address   | port  | client_type | shard_id
-----------+-------+-------------+----------
 127.0.0.1 | 37534 |         cql | 0
 127.0.0.1 | 37536 |         cql | 1
 127.0.0.1 | 38207 |         cql | 4
 127.0.0.1 | 50000 |         cql | 0
 127.0.0.1 | 50001 |         cql | 1
 127.0.0.1 | 50002 |         cql | 2
 127.0.0.1 | 50003 |         cql | 3
 127.0.0.1 | 50004 |         cql | 4
 127.0.0.1 | 50005 |         cql | 5
 127.0.0.1 | 50006 |         cql | 6
 127.0.0.1 | 50007 |         cql | 7
 127.0.0.1 | 50008 |         cql | 0
 127.0.0.1 | 50009 |         cql | 1
 127.0.0.1 | 50010 |         cql | 2
 127.0.0.1 | 50011 |         cql | 3
 127.0.0.1 | 50012 |         cql | 4
 127.0.0.1 | 50013 |         cql | 5
 127.0.0.1 | 50014 |         cql | 6
 127.0.0.1 | 50015 |         cql | 7

(19 rows)

Results of running conn.cpp against an 8-shard, single-node cluster of Scylla 4.3. There are two connections from cqlsh and one “control connection” from scylla-cpp-driver. Then 16 pooled connections follow (8 per each of 2 client threads), because the requested number of 7 was rounded up to the shard count. Please observe the advanced shard-awareness in action: the value in the port column determines shard_id of a pooled connection.

Summary

Shard-awareness has been present in Java, Go and Python drivers for a while, and we’re now actively developing a shard-aware Rust driver. Today we are delighted to welcome the C/C++ driver as a member of this honorable group. If you use the Datastax cpp-driver with Scylla you should switch to scylla-cpp-driver right now — it’s basically a free performance boost for your applications. And there is more to come: CDC partitioner, LWT support, RAII interface… The future will definitely bring these and a few other reasons to follow our driver!

GET THE SHARD-AWARE C/C++ DRIVER FOR SCYLLA

The post A Shard-Aware Scylla C/C++ Driver appeared first on ScyllaDB.

Zillow: Optimistic Concurrency with Write-Time Timestamps

Dan Podhola is a Principal Software Engineer at Zillow, the most-visited real estate website in the U.S. He specializes in performance tuning of high-throughput backend database services. We were fortunate to have him speak at our Scylla Summit on Optimistic Concurrency with Write-Time Timestamps. If you wish, you can watch the full presentation on-demand:

WATCH THE ZILLOW PRESENTATION NOW

Dan began by describing his team’s role at Zillow. They are responsible for processing property and listing records — what is for sale or rent — and mapping those to common Zillow property IDs, then translating different message types into a common interchange format so their teams can talk to each other using the same type of data.

They are also responsible for deciding what’s best to display. He showed a high-level diagram of what happens when they receive a message from one of their data providers. It needs to be translated into a common output format.

“We fetch other data that we know about that property that’s also in that same format. We bundle that data together and choose a winner — I use the term ‘winner’ lightly here — and we send that bundle data out to our consumers.”

The Problem: Out-of-Order Data

The biggest problem that Zillow faced, and the reason for Dan’s talk, was that they have a highly threaded application. They consume messages from queues fed by two different data producers, and the messages can be received out of order. They cannot go backwards in time to look into the data, or, as Dan put it, “it would cause bad things to happen.” It would throw off analytics and many other systems that consume the data.

As an example, they might get a for-sale listing message that says a certain property has changed price, followed by a message that the property has sold. If they processed the sold message first, but then processed the price change which showed it was still for sale, it would create confusion.

It might require manual intervention to identify and fix, such as from a consumer complaining. “That’s not a great experience for anybody.” Especially not their analytics team.

This is the visual representation Dan showed to depict the problem. The message that was generated at time 2 has a later timestamp and should be accepted and processed. “It goes all the way through the system. At that point, we publish it. It’s golden. That’s what we want.” However, an earlier message from time 1 might be received from a backfilled queue. If it goes through the whole system and gets published, it’s already out-of-date.

“What’s the solution? You may already be thinking, ‘Well, just don’t write the older message.’ That’s a good solution and I agree with you. So I drew a diagram here that shows a decision tree that I added in the middle. It says ‘if the message is newer than the previous message then write it, otherwise don’t — just skip to the fetch.’”

Problems with Other Methods

“Now you might be wondering, ‘Well, why fetch at all? Just stop there.’” If you’re grabbing a transaction lock and doing this in a SQL server type of way, that’s a good solution.

However, Dan then differentiated why you wouldn’t just implement this in the traditional SQL manner of doing things. That is, spinning up a SQL server instance, grabbing a lock and running a stored procedure that returns a result code. “Was this written? Was this not written? Then act on that information.”

The problem with the traditional SQL method, Dan pointed out, is the locking: “you don’t have the scalability of something like Scylla.” He also pointed out that one could implement this with Scylla’s Lightweight Transactions (LWT), but those require three times the round trips of a normal transaction, and Zillow wanted to avoid them for performance reasons.

“So the second method is ‘store everything.’ That means we get every record and we store it, no matter when it was given to us. Then we have the application layer pull all the data at once. It does the filtering and then does action on it.”

However, this does lead to data bloat: “The problem with that is we have to trim data because we don’t need it all. We just didn’t want to have to deal with running a trim job or paying to store all that data we just didn’t need. Our system just needs to store the latest, most current state.”

Insert Using Timestamp

“Then finally there’s the ‘overwrite only-if-newer, but without a lock method.’”

The downside with this, Dan pointed out, is that you don’t know whether your write has actually been committed to the database, so you have to republish the results every single time.

Dan then showed the following CQL statement:

INSERT … USING TIMESTAMP ?

“‘USING TIMESTAMP’ allows you to replace the timestamp that Scylla uses [by default] to decide which record to write, or not write, as it were. This is normally a great thing that you would want. You have a couple of nodes. Two messages come in. Those messages have different timestamps — though only the latest one will survive [based on] the consistency.”

“You can use USING TIMESTAMP to supply your own timestamp, and that’s what we’re doing here. This allows us to pick a timestamp that we decided as the best arbiter for what is written and what is not. The upside is that it solves our problem without having to use LWT or an application lock, [or a cache] like Redis.”

“The downside is that we don’t know if the record was actually written to the database. So the workaround for that is to pretend that the record always has been written.”


“We have to ensure that we get the latest documents back when we ask for all the documents. For us to do that we have to write it QUORUM and we have to read it QUORUM — for at least just these two statements. This allows us to avoid conditions where you might write just to a single node and then you fetch — and that should even be in QUORUM. If you don’t write a QUORUM you might pull from the other two nodes and get an older result set, which would mean that we could flicker and send an old result that we don’t want. QUORUM here solves that problem. We didn’t use ALL because if one of the nodes goes down we wouldn’t be able to process at all.”
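
As a rough sketch of the pattern Dan describes (the table and columns below are hypothetical, not Zillow's actual schema), the producer-supplied fetch timestamp becomes the write timestamp, and both the write and the read run at QUORUM:

-- Hypothetical table holding only the latest state per property
CREATE TABLE listings_current (
    property_id bigint PRIMARY KEY,
    status      text,
    price       decimal,
    updated_at  timestamp
);

CONSISTENCY QUORUM;  -- cqlsh shorthand; drivers set consistency per statement

-- The producer's fetch-time timestamp (microseconds since the epoch) is supplied
-- as the write timestamp, so an older message can never overwrite a newer row.
INSERT INTO listings_current (property_id, status, price, updated_at)
VALUES (12345, 'FOR_SALE', 450000, '2021-03-15 12:00:00+0000')
USING TIMESTAMP 1615809600000000;

-- Reads also run at QUORUM so the latest accepted write is always visible.
SELECT * FROM listings_current WHERE property_id = 12345;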

Putting It All Together

“This is my message flow diagram. You can see at the top there’s a Data Service that has its own data set. It publishes to its own real time, its own backfill queue. There’s a more complicated service down at the bottom that mostly streams data in from Flink but has a manual data retrieval service that allows people to correct issues or manually request individual messages or properties. And then there’s a Flink state on the bottom that uses some magic to fill a backfill queue.”

“Another interesting thing here is that the listing processor — which is the service that my team has written — takes messages from those queues. It actually publishes them differently. It’ll publish real-time messages to a Kinesis stream and it’ll publish backfill messages to an s3 bucket.”

“We have some additional required producer actions here. These different systems operate in different ways and might theoretically have older timestamps. Backfill would be a great example. You generate a message. It goes into your backfill bucket and then you publish a real time change. Well, if we happen to be processing the real time change, and then the backfill later, that message would have an older timestamp.”

“Now if they were produced at roughly the same time and their system had different ways internally of how it generated that message then the message generation time could be wrong.

So we’ve required our producers to give us the timestamp in their messages that is the time that they fetch the data from their database. Because their database is their arbiter of what’s the most recent record that was in there that they send to us.”

“This also solves our DELETE scenario. If we were to manually DELETE a record from Scylla using one of Scylla’s timestamps, in order for us to get a fresher message on the DELETE we have to ask for a fresh message timestamp. Going to that service that requests the individual messages we would get a fresh timestamp that’s newer than the one that Scylla has for its DELETE.

Performance

“Our performance has been great,” Dan said. For Zillow’s normal real-time workloads Scylla has been ample. “We run on three i3.4xlarge’s for Scylla. Our service actually runs on a single c5.xlarge. It has 25 consumer threads. It autoscales on EBS, so if we happen to have a small spike in real time we’ll scale up to three c5.xlarge’s, turn through that data and we’ll be done. The real beauty of this is even just on three nodes [for Scylla] we can scale up to 35 of the c5.xlarge instances and we’ll process over 6,500 records per second plus the real-time workload.”

“No one will even notice that we’re processing the entirety of Zillow’s property and listings data in order to correct some data issue or change a business rule. The beauty of that is we can process the entirety of the data at Zillow that we care about in less than a business day and, again, no performance hit to real-time data.”

If you want to check out Scylla for yourself, you can download it now, or if you want to discuss your needs, feel free to contact us privately, or join our community on Slack.

WATCH THE ZILLOW PRESENTATION NOW

The post Zillow: Optimistic Concurrency with Write-Time Timestamps appeared first on ScyllaDB.

Join Apache Cassandra for Google Summer of Code 2021

I have been involved with Apache Cassandra for the past eight years, and I’m very proud to mention that my open source journey started a little more than a decade ago with my participation in the Google Summer of Code (GSoC).

GSoC is a program sponsored by Google to promote open source development, where post-secondary students submit project proposals to open source organizations. Selected students receive community mentorship and a stipend from Google to work on the project for ten weeks during the northern hemisphere summer. Over 16,000 students from 111 countries have participated so far! More details about the program can be found on the official GSoC website.

The Apache Software Foundation (ASF) has been a GSoC mentor organization since the beginning of the program 17 years ago. The ASF acts as an “umbrella” organization, which means that students can submit project proposals to any subproject within the ASF. Apache Cassandra mentored a successful GSoC project in 2016 and we are participating again this year. The application period opens on March 29, 2021 and ends on April 13, 2021. It’s a highly competitive program, so don’t wait until the last minute to prepare!

How to Get Involved

Getting Started

The best way to get started if you’re new to Apache Cassandra is to get acquainted by reading the documentation and setting up a local development environment. Play around with a locally running instance via cqlsh and nodetool to get a feel for how to use the database. If you run into problems or roadblocks during this exercise, don’t be shy to ask questions via the community channels like the developers mailing list or the #cassandra-dev channel on the ASF Slack.

GSoC Project Ideas

Once you have a basic understanding of how the project works, browse the GSoC ideas list to select ideas that you are interested in working on. Browse the codebase to identify components related to the idea you picked. You are welcome to propose other projects if they’re not on the ideas list.

Writing a proposal

Write a message on the JIRA ticket introducing yourself and demonstrating your interest in working on that particular idea. Sketch a quick proposal on how you plan to tackle the problem and share it with the community for feedback. If you don’t know where to start, don’t hesitate to ask for help!

Useful Resources

There are many good resources on the web on preparing for GSoC, particularly the ASF GSoC Guide and the Python community notes on GSoC expectations. The best GSoC students are self-motivated and proactive, and following the tips above should increase your chances of getting selected and delivering your project successfully. Good luck!

Apache Cassandra Changelog #5 | March 2021

Our monthly roundup of key activities and knowledge to keep the community informed.

Release Notes

Released

We are expecting 4.0rc to be released soon, so join the Cassandra mailing list to stay up-to-date.

For the latest status on Cassandra 4.0 GA please check the Jira board (ASF login required). We are within line-of-sight to closing out beta scope, with the remaining tickets representing 2.6% of the total scope. Read the latest summary from the community here.

Proposed

The community has been discussing release cadence after 4.0 reaches GA. An official vote has not been taken on this yet, but the current consensus is one major release every year. Also under discussion are bleeding-edge snapshots (where stability is not guaranteed) and the duration of support for releases.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

We are pleased to announce that Paulo Motta has accepted the invitation to become a PMC member! This invite comes in recognition of all his contributions to the Apache Cassandra project over many years.

Added

Apache Cassandra is taking part in the Google Summer of Code (GSoC) under the ASF umbrella as a mentoring organization. We will be posting a separate blog soon detailing how post-secondary students can get involved.

Proposed

With 4.0 approaching completion, the idea of a project roadmap is also being discussed.

Changed

The Kubernetes SIG is looking at ways to invite more participants by hosting two meetings to accommodate people in different time zones. Watch here.

Cassandra Kubernetes SIG Meeting 2021-02-11

A community website dedicated to cass-operator is also in development focused on documentation for the operator. Going forward, the Kubernetes SIG is discussing release cadence and looking at six releases a year.

K8ssandra 1.0, an open source production-ready platform for running Apache Cassandra on Kubernetes, was also released on 25 February and announced on its new community website. Read the community blog to find out more and what’s next. K8ssandra now has images for Cassandra 3.11.10 and 4.0-beta4 that run rootless containers with Reaper and Medusa functions.

User Space

Instana

“The Instana components are already containerized and run in our SaaS platform, but we still needed to create containers for our databases, Clickhouse, Cassandra, etc., and set up the release pipeline for them. Most of the complexity is not in creating a container with the database running, but in the management of the configuration and how to pass it down in a maintainable way to the corresponding component.” - Instana

Flant

“We were able to successfully migrate the Cassandra database deployed in Kubernetes to another cluster while keeping the Cassandra production installation in a fully functioning state and without interfering with the operation of applications.” - Flant

Do you have a Cassandra case study to share? Email cassandra@constantia.io.

In the News

CRN: Top 10 Highest IT Salaries Based On Tech Skills In 2021: Dice

TechTarget: Microsoft ignites Apache Cassandra Azure service

Dynamic Business: 5 Ways your Business Could Benefit from Open Source Technology

TWB: Top 3 Technologies which are winning the Run in 2021

Cassandra Tutorials & More

Data Operations Guide for Apache Cassandra - Rahul Singh, Anant

Introduction to Apache Cassandra: What is Apache Cassandra - Ksolves

What’s New in Apache Cassandra 4.0 - Deepak Vohra, Techwell

Reaper 2.2 for Apache Cassandra was released - Alex Dejanovski, The Last Pickle

What’s new in Apache Zeppelin’s Cassandra interpreter - Alex Ott

Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Apache Cassandra Changelog #4 | February 2021

Our monthly roundup of key activities and knowledge to keep the community informed.

Release Notes

Released

Apache Cassandra 3.0.24 (pgp, sha256 and sha512). This is a security-related release for the 3.0 series and was released on February 1. Please read the release notes.

Apache Cassandra 3.11.10 (pgp, sha256 and sha512) was also released on February 1. You will find the release notes here.

Apache Cassandra 4.0-beta4 (pgp, sha256 and sha512) is the newest version which was released on December 30. Please pay attention to the release notes and let the community know if you encounter problems with any of the currently supported versions.

Join the Cassandra mailing list to stay updated.

Changed

A vulnerability rated Important was found when using the dc or rack internode_encryption setting. More details of CVE-2020-17516 Apache Cassandra internode encryption enforcement vulnerability are available on this user thread.

Note: The mitigation for 3.11.x users requires an update to 3.11.10 not 3.11.24, as originally stated in the CVE. (For anyone who has perfected a flux capacitor, we would like to borrow it.)

The current status of Cassandra 4.0 GA can be viewed on this Jira board (ASF login required). RC is imminent with testing underway. The remaining tickets represent 3.3% of the total scope. Read the latest summary from the community here.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

Apache Cassandra will be participating in the Google Summer of Code (GSoC) under the ASF umbrella as a mentoring organization. This is a great opportunity to get involved, especially for newcomers to the Cassandra community.

We’re curating a list of JIRA tickets this month, which will be labeled as gsoc2021. This will make them visible in the Jira issue tracker for participants to see and connect with mentors.

If you would like to volunteer to be a mentor for a GSoC project, please tag the respective JIRA ticket with the mentor label. Non-committers can volunteer to be a mentor as long as there is a committer as co-mentor. Projects can be mentored by one or more co-mentors.

Thanks to Paulo Motta for proposing the idea and getting the ticket list going.

Added

Apache Zeppelin 0.9.0 was released on January 15. Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems, and it supports Apache Cassandra among others. The release notes for the Cassandra CQL Interpreter are available here.

Changed

For the GA of Apache Cassandra 4.0, any claim of support for Python 2 will be dropped from the updated documentation. We will also introduce a warning when running in Python 2.7. Support for Python 3 will be backported to at least 3.11, due to existing tickets, but we will undertake the work needed to make packaging and internal tooling support Python 3.

Changed

The Kubernetes SIG is discussing how to encourage more participation and to structure SIG meetings around updates on Kubernetes and Cassandra. We also intend to invite other projects (like OpenEBS, Prometheus, and others) to discuss how we can make Cassandra and Kubernetes better. As well as updates, the group discussed handling large-scale backups inside Kubernetes and using S3 APIs to store images. Watch here.

Cassandra Kubernetes SIG Meeting 2021-01-14

User Space

Backblaze

“Backblaze uses Apache Cassandra, a high-performance, scalable distributed database to help manage hundreds of petabytes of data.” - Andy Klein

Witfoo

Witfoo uses Cassandra for big data needs in cybersecurity operations. In response to the recent licensing changes at Elastic, Witfoo decided to blog about its journey away from Elastic to Apache Cassandra in 2019. - Witfoo.com

Do you have a Cassandra case study to share? Email cassandra@constantia.io.

In the News

The New Stack: What Is Data Management in the Kubernetes Age?

eWeek: Top Vendors of Database Management Software for 2021

Software Testing Tips and Tricks: Top 10 Big Data Tools (Big Data Analytics Tools) in 2021

InfoQ: K8ssandra: Production-Ready Platform for Running Apache Cassandra on Kubernetes

Cassandra Tutorials & More

Creating Flamegraphs with Apache Cassandra in Kubernetes (cass-operator) - Mick Semb Wever, The Last Pickle

Apache Cassandra : The Interplanetary Database - Rahul Singh, Anant

How to Install Apache Cassandra on Ubuntu 20.04 - Jeff Wilson, RoseHosting

The Impacts of Changing the Number of VNodes in Apache Cassandra - Anthony Grasso, The Last Pickle

CASSANDRA 4.0 TESTING - Charles Herring, Witfoo

Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Security Advisory: CVE-2020-17516

Earlier this week, a vulnerability was published to the Cassandra users mailing list which describes a flaw in the way Apache Cassandra nodes perform encryption in some configurations. 

The vulnerability description is reproduced here:

“Description:

When using ‘dc’ or ‘rack’ internode_encryption setting, a Cassandra instance allows both encrypted and unencrypted connections. A misconfigured node or a malicious user can use the unencrypted connection despite not being in the same rack or dc, and bypass mutual TLS requirement.”

The recommended mitigations are:

  • Users of ALL versions should switch from ‘dc’ or ‘rack’ to the ‘all’ internode_encryption setting, as the former are inherently insecure (see the example configuration after this list)
  • 3.0.x users should additionally upgrade to 3.0.24
  • 3.11.x users should additionally upgrade to 3.11.10
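
For reference, a minimal cassandra.yaml fragment that enforces internode encryption on all connections might look like the sketch below; the keystore and truststore paths and passwords are placeholders for your own values.

server_encryption_options:
    internode_encryption: all
    keystore: /etc/cassandra/conf/keystore.jks
    keystore_password: <keystore-password>
    truststore: /etc/cassandra/conf/truststore.jks
    truststore_password: <truststore-password>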

By default, all Cassandra clusters running in the Instaclustr environment are configured to use ‘internode_encryption’ set to all. To confirm that our clusters are unaffected by this vulnerability, Instaclustr has checked the configuration of all Cassandra clusters in our managed service fleet and none are using the vulnerable configurations ‘dc’ or ‘rack’. 

Instaclustr restricts access to Cassandra nodes to only those IP addresses and port combinations required for cluster management and customer use, further mitigating the risk of compromise. 

In line with the mitigation recommendation, Instaclustr is developing a plan to upgrade all 3.0.x and 3.11.x Cassandra clusters to 3.0.24 and 3.11.10. Customers will be advised when their clusters are due for upgrade. 

Instaclustr recommends that our Support Only customers check their configurations to ensure that they are consistent with this advice, and upgrade their clusters as necessary to maintain a good security posture. 

Should you have any questions regarding Instaclustr Security, please contact us by email at security@instaclustr.com.

If you wish to discuss scheduling of the upgrade to your system or have any other questions regarding the impact of this vulnerability, please contact support@instaclustr.com.


To report an active security incident, email support@instaclustr.com.

The post Security Advisory: CVE-2020-17516 appeared first on Instaclustr.

Apache Cassandra Changelog #3 | January 2021

Our monthly roundup of key activities and knowledge to keep the community informed.

Release Notes

Released

Apache Cassandra 4.0-beta4 (pgp, sha256 and sha512) was released on December 30. Please pay attention to release notes and let the community know if you encounter problems. Join the Cassandra mailing list to stay updated.

Changed

The current status of Cassandra 4.0 GA can be viewed on this Jira board (ASF login required). RC is imminent with testing underway. Read the latest summary from the community here.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

The Cassandra community welcomed one new PMC member and five new committers in 2020! Congratulations to Mick Semb Wever who joined the PMC and Jordan West, David Capwell, Zhao Yang, Ekaterina Dimitrova, and Yifan Cai who accepted invitations to become Cassandra committers!

Changed

The Kubernetes SIG is discussing how to extend the group’s scope beyond the operator, as well as sharing an update on current operator merge efforts in the latest meeting. Watch here.

User Space

Keen.io

Under the covers, Keen leverages Kafka, Apache Cassandra NoSQL database and the Apache Spark analytics engine, adding a RESTful API and a number of SDKs for different languages. Keen enriches streaming data with relevant metadata and enables customers to stream enriched data to Amazon S3 or any other data store. - Keen.io

Monzo

Suhail Patel explains how Monzo prepared for the recent crowdfunding (run entirely through its app, using the very same platform that runs the bank) which saw more than 9,000 people investing in the first five minutes. He covers Monzo’s microservice architecture (on Go and Kubernetes) and how they profiled and optimized key platform components such as Cassandra and Linkerd. - Suhail Patel

In the News

ZDNet - Meet Stargate, DataStax’s GraphQL for databases. First stop - Cassandra

CIO - It’s a good day to corral data sprawl

TechTarget - Stargate API brings GraphQL to Cassandra database

ODBMS - On the Cassandra 4.0 beta release. Q&A with Ekaterina Dimitrova, Apache Cassandra Contributor

Cassandra Tutorials & More

Intro to Apache Cassandra for Data Engineers - Daniel Beach, Confessions of a Data Guy

Impacts of many columns in a Cassandra table - Alex Dejanovski, The Last Pickle

Migrating Cassandra from one Kubernetes cluster to another without data loss - Flant staff

Real-time Stream Analytics and User Scoring Using Apache Druid, Flink & Cassandra at Deep.BI - Hisham Itani, Deep.BI

User thread: Network Bandwidth and Multi-DC replication (Login required)

Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

The Instaclustr LDAP Plugin for Cassandra 2.0, 3.0, and 4.0

LDAP (Lightweight Directory Access Protocol) is a common, vendor-neutral, and lightweight protocol for organizing authentication of network services. Integration with LDAP allows organizations to unify their security policies, so that one user or entity can log in and authenticate against a variety of services.

There is a lot of demand from our enterprise customers to be able to authenticate to their Apache Cassandra clusters against LDAP. As the leading NoSQL database, Cassandra is typically deployed across the enterprise and needs this connectivity.

Instaclustr has previously developed our LDAP plugin to work with the latest Cassandra releases. However, with Cassandra 4.0 right around the corner, it was due for an update to ensure compatibility. Instaclustr takes a great deal of care to provide cutting-edge features and integrations for our customers, and our new LDAP plugin for Cassandra 4.0 showcases this commitment. We always use open source and maintain a number of Apache 2.0-licensed tools, and have released our LDAP plugin under the Apache 2.0 license. 

Modular Architecture to Support All Versions of Cassandra

Previously, the implementations for Cassandra 2.0 and 3.0 lived in separate branches, which resulted in some duplicative code. With our new LDAP plugin update, everything lives in one branch and we have modularized the whole solution so it aligns with earlier Cassandra versions and Cassandra 4.0.

The modularization of our LDAP plugin means that there is the “base” module that all implementations are dependent on. If you look into the codebase on GitHub, you see that the implementation modules consist of one or two classes at maximum, with the rest inherited from the base module. 

This way of organizing the project is beneficial from a long-term maintenance perspective. We no longer need to keep track of all changes and apply them to the branched code for the LDAP plugin for each Cassandra version. When we implement changes and improvements to the base module, all modules are updated automatically and benefit.

Customizable for Any LDAP Implementation

This plugin offers a default authenticator implementation for connecting to an LDAP server and authenticating a user against it. It also offers a way to implement custom logic for specific use cases. In the module, we provide the most LDAP server-agnostic implementation possible, but there is also scope for customization to meet specific LDAP server nuances. 

If the default solution needs to be modified for a particular customer use case, it is possible to add in custom logic for that particular LDAP implementation. The implementation for customized connections is found in “LDAPPasswordRetriever” (DefaultLDAPServer being the default implementation from which one might extend and override appropriate methods). This is possible thanks to the SPI mechanism. If you need this functionality you can read more about it in the relevant section of our documentation.

Enhanced Testing for Reliability

Our GitHub build pipeline now tests the LDAP plugins for each supported Cassandra version on each merged commit. This update provides integration tests that spin up a standalone Cassandra node as well as an LDAP server as part of the JUnit tests. These are started in Docker as part of a Maven build before the actual JUnit tests run. 

This testing framework lets us make sure that any changes don’t break the authentication mechanism, which is verified by actually logging in via the usual mechanism as well as via LDAP.

Packaged for Your Operating System

Last but not least, we have now added Debian and RPM packages with our plugin for each Cassandra version release. Until now, a user of this plugin had to install the JAR file into Cassandra’s libraries directory manually. With the introduction of these packages, you no longer need to perform this manual step. The plugin’s JAR, along with the configuration file, will be installed in the right place if the official Debian or RPM Cassandra package is installed too.

How to Configure LDAP for Cassandra

In this section we will walk you through the setup of the LDAP plugin and explain the most crucial parts of how the plugin works.

After placing the LDAP plugin JAR on Cassandra’s classpath—either by copying it over manually or by installing a package—you will need to modify a configuration file in /etc/cassandra/ldap.properties.

There are also changes that need to be applied to cassandra.yaml. For Cassandra 4.0, please be sure that your authenticator, authorizer, and role_manager are configured as follows:

authenticator: LDAPAuthenticator
authorizer: CassandraAuthorizer
role_manager: LDAPCassandraRoleManager

Before using this plugin, an operator of a Cassandra cluster should configure the system_auth keyspace to use NetworkTopologyStrategy.
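
For example, a minimal sketch of that change (the datacenter name and replication factor are placeholders; adjust them to your topology):

-- Placeholder datacenter name and replication factor; adjust to your topology.
ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

-- After changing the replication settings, repair system_auth
-- (e.g. nodetool repair system_auth) so existing roles reach the new replicas.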

How the LDAP Plugin Works With Cassandra Roles

The LDAP plugin works via a “dual authentication” technique. If a user tries to log in with a role that already exists in Cassandra, separate from LDAP, it will authenticate against that role. However, if that role is not present in Cassandra, the plugin will reach out to the LDAP server and try to authenticate against it. If it is successful, from the user’s point of view it looks like the role was in Cassandra the whole time, as it logs the user in transparently. 

If your LDAP server is down, you will not be able to authenticate with the specified LDAP user. You can enable caching for LDAP users—available in the Cassandra 3.0 and 4.0 plugins—to take some load off the LDAP server when authentication happens frequently.

The Bottom Line

Our LDAP plugin meets the enterprise need for a consolidated security and authentication policy. 100% open source and supporting all major versions of Cassandra, the plugin works with all major LDAP implementations and can be easily customized for others. 

The plugin is part of our suite of supported tools for our support customers and Instaclustr is committed to actively maintaining and developing the plugin. Our work updating it to support the upcoming Cassandra 4.0 release is part of this commitment. You can download it here and feel free to get in touch with any questions you might have. Cassandra 4.0 beta 2 is currently in preview on our managed platform and you can use our free trial to check it out.

The post The Instaclustr LDAP Plugin for Cassandra 2.0, 3.0, and 4.0 appeared first on Instaclustr.

Apache Cassandra Changelog #2 | December 2020

Our monthly roundup of key activities and knowledge to keep the community informed.

Release Notes

Released

Apache Cassandra 4.0-beta3, 3.11.9, 3.0.23, and 2.2.19 were released on November 4 and are in the repositories. Please pay attention to the release notes and let the community know if you encounter problems. Join the Cassandra mailing list to stay updated.

Changed

Cassandra 4.0 is progressing toward GA. There are 1,390 total tickets and remaining tickets represent 5.5% of total scope. Read the full summary shared to the dev mailing list and take a look at the open tickets that need reviewers.

Cassandra 4.0 will be dropping support for older distributions of CentOS 5, Debian 4, and Ubuntu 7.10. Learn more.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

The community weighed options to address reads inconsistencies for Compact Storage as noted in ticket CASSANDRA-16217 (committed). The conversation continues in ticket CASSANDRA-16226 with the aim of ensuring there are no huge performance regressions for common queries when you upgrade from 2.x to 3.0 with Compact Storage tables or drop it from a table on 3.0+.

Added

CASSANDRA-16222 is a Spark library that can compact and read raw Cassandra SSTables into SparkSQL. By reading the sstables directly from a snapshot directory, one can achieve high performance with minimal impact to a production cluster. It was used to successfully export a 32TB Cassandra table (46bn CQL rows) to HDFS in Parquet format in around 70 minutes, a 20x improvement on previous solutions.

Changed

Great news for CEP-2 (Kubernetes Operator): the community has agreed to create a community-based operator by merging cass-operator and CassKop. The work being done can be viewed on GitHub here.

Released

The Reaper community announced v2.1 of its tool that schedules and orchestrates repairs of Apache Cassandra clusters. Read the docs.

Released

Apache Cassandra 4.0-beta-1 was released on FreeBSD.

User Space

Netflix

“With these optimized Cassandra clusters in place, it now costs us 71% less to operate clusters and we could store 35x more data than our previous configuration.” - Maulik Pandey

Yelp

“Cassandra is a distributed wide-column NoSQL datastore and is used at Yelp for both primary and derived data. Yelp’s infrastructure for Cassandra has been deployed on AWS EC2 and ASG (Autoscaling Group) for a while now. Each Cassandra cluster in production spans multiple AWS regions.” - Raghavendra D Prabhu

In the News

DevPro Journal - What’s included in the Cassandra 4.0 Release?

JAXenter - Moving to cloud-native applications and data with Kubernetes and Apache Cassandra

DZone - Improving Apache Cassandra’s Front Door and Backpressure

ApacheCon - Building Apache Cassandra 4.0: behind the scenes

Cassandra Tutorials & More

Users in search of a tool for scheduling backups and performing restores with cloud storage support (archiving to AWS S3, GCS, etc) should consider Cassandra Medusa.

Apache Cassandra Deployment on OpenEBS and Monitoring on Kubera - Abhishek Raj, MayaData

Lucene Based Indexes on Cassandra - Rahul Singh, Anant

How Netflix Manages Version Upgrades of Cassandra at Scale - Sumanth Pasupuleti, Netflix

Impacts of many tables in a Cassandra data model - Alex Dejanovski, The Last Pickle

Cassandra Upgrade in production : Strategies and Best Practices - Laxmikant Upadhyay, American Express

Apache Cassandra Collections and Tombstones - Jeremy Hanna

Spark + Cassandra, All You Need to Know: Tips and Optimizations - Javier Ramos, ITNext

How to install the Apache Cassandra NoSQL database server on Ubuntu 20.04 - Jack Wallen, TechRepublic

How to deploy Cassandra on Openshift and open it up to remote connections - Sindhu Murugavel

Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Apache Cassandra Changelog #1 (October 2020)

Introducing the first Cassandra Changelog blog! Our monthly roundup of key activities and knowledge to keep the community informed.


Release Notes

Updated

The most current Apache Cassandra releases are 4.0-beta2, 3.11.8, 3.0.22, 2.2.18, and 2.1.22, released on August 31 and available in the repositories. The next cut of releases will be out soon; join the Cassandra mailing list to stay up-to-date.

We continue to make progress toward the 4.0 GA release with the overarching goal of it being at a state where major users should feel confident running it in production when it is cut. Over 1,300 Jira tickets have been closed and less than 100 remain as of this post. To gain this confidence, there are various ongoing testing efforts involving correctness, performance, and ease of use.

Added

With CASSANDRA-15013, the community improved Cassandra's ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially going out of memory.

Added

The Harry project is a fuzz testing tool that aims to generate reproducible workloads that are as close to real-life as possible, while being able to efficiently verify the cluster state against the model without pausing the workload itself.

Added

The community published its first Apache Cassandra Usage Report 2020 detailing findings from a comprehensive global survey of 901 practitioners on Cassandra usage to provide a baseline understanding of who, how, and why organizations use Cassandra.

Community Notes

Updates on new and active Cassandra Enhancement Proposals (CEPs) and how to contribute.

Changed

CEP-2: Kubernetes Operator was introduced this year and is an active discussion on creation of a community-based operator with the goal of making it easy to run Cassandra on Kubernetes.

Added

CEP-7: Storage Attached Index (SAI) is a new secondary index for Cassandra that builds on the advancements made with SASI. It is intended to replace the existing built-in secondary index implementations.

Added

Cassandra was selected by the ASF Diversity & Inclusion committee to be included in a research project to evaluate and understand the current state of diversity.

User Space

Bigmate

"In vetting MySQL, MongoDB, and other potential databases for IoT scale, we found they couldn't match the scalability we could get with open source Apache Cassandra. Cassandra's built-for-scale architecture enables us to handle millions of operations or concurrent users each second with ease – making it ideal for IoT deployments." - Brett Orr

Bloomberg

"Our group is working on a multi-year build, creating a new Index Construction Platform to handle the daily production of the Bloomberg Barclays fixed income indices. This involves building and productionizing an Apache Solr-backed search platform to handle thousands of searches per minute, an Apache Cassandra back-end database to store millions of data points per day, and a distributed computational engine to handle millions of computations daily." - Noel Gunasekar

In the News

Solutions Review - The Five Best Apache Cassandra Books on Our Reading List

ZDNet - What Cassandra users think of their NoSQL DBMS

Datanami - Cassandra Adoption Correlates with Experience

Container Journal - 5 to 1: An Overview of Apache Cassandra Kubernetes Operators

Datanami - Cassandra Gets Monitoring, Performance Upgrades

ZDNet - Faster than ever, Apache Cassandra 4.0 beta is on its way

Cassandra Tutorials & More

A Cassandra user was in search of a tool to perform schema DDL upgrades. Another user suggested https://github.com/patka/cassandra-migration to ensure you don't get schema mismatches if running multiple upgrade statements in one migration. See the full email on the user mailing list for other recommended tools.

Start using virtual tables in Apache Cassandra 4.0 - Ben Bromhead, Instaclustr

Benchmarking Apache Cassandra with Rust - Piotr Kołaczkowski, DataStax

Open Source BI Tools and Cassandra - Arpan Patel, Anant Corporation

Build Fault Tolerant Applications With Cassandra API for Azure Cosmos DB - Abhishek Gupta, Microsoft

Understanding Data Modifications in Cassandra - Sameer Shukla, Redgate


Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.

Building a Low-Latency Distributed Stock Broker Application: Part 4

In the fourth blog of the “Around the World” series we built a prototype of the application, designed to run in two georegions.

Recently I re-watched “Star Trek: The Motion Picture” (the original 1979 Star Trek film). I’d forgotten how much the vibe was like “2001: A Space Odyssey” (a drawn-out quest to encounter a distant, but rapidly approaching, very powerful and dangerous alien called “V’ger”), and also that the “alien” was originally from Earth and returning in search of “the Creator”—V’ger was actually a seriously upgraded Voyager spacecraft!

Star Trek: The Motion Picture (V’ger)

The original Voyager 1 and 2 had only been recently launched when the movie came out, and were responsible for some remarkable discoveries (including the famous “Death Star” image of Saturn’s moon Mimas, taken in 1980). Voyager 1 has now traveled further than any other human artefact and has since left the solar system! It’s amazing that after 40+ years it’s still working, although communicating with it now takes a whopping 40 hours in round-trip latency (which happens via the Deep Space Network—one of the stations is at Tidbinbilla, near our Canberra office).

Canberra Deep Space Communication Complex (CSIRO)

Luckily we are only interested in traveling “Around the World” in this blog series, so the latency challenges we face in deploying a globally distributed stock trading application are substantially less than the 40 hours latency to outer space and back. In Part 4 of this blog series we catch up with Phileas Fogg and Passepartout in their journey, and explore the initial results from a prototype application deployed in two locations, Sydney and North Virginia.

1. The Story So Far

In Part 1 of this blog series we built a map of the world to take into account inter-region AWS latencies, and identified some “georegions” that enabled sub 100ms latency between AWS regions within the same georegion. In Part 2 we conducted some experiments to understand how to configure multi-DC Cassandra clusters and use java clients, and measured latencies from Sydney to North Virginia. In Part 3 we explored the design for a globally distributed stock broker application, and built a simulation and got some indicative latency predictions. 

The goal of the application is to ensure that stock trades are done as close as possible to the stock exchanges the stocks are listed on, to reduce the latency between receiving a stock ticker update, checking the conditions for a stock order, and initiating a trade if the conditions are met. 

2. The Prototype

For this blog I built a prototype of the application, designed to run in two georegions, Australia and the USA. We are initially only trading stocks available from stock exchanges in these two georegions: orders for stocks traded in New York will be directed to a Broker deployed in the AWS North Virginia region, and orders for stocks traded in Sydney will be directed to a Broker deployed in the AWS Sydney region. As this Google Earth image shows, they are close to being antipodes (diametrically opposite each other, about 15,500 km apart, so they pretty much satisfy the definition of traveling “Around the World”), with a measured inter-region latency (from blog 2) of 230ms.

Sydney to North Virginia (George Washington’s Mount Vernon house)

The design of the initial (simulated) version of the application was changed in a few ways to:

  • Ensure that it worked correctly when instances of the Broker were deployed in multiple AWS regions
  • Measure actual rather than simulated latencies, and
  • Use a multi-DC Cassandra Cluster. 

The prototype isn’t a complete implementation of the application yet. In particular it only uses a single Cassandra table for Orders—to ensure that Orders are made available in both georegions, and can be matched against incoming stock tickers by the Broker deployed in that region. 

Some other parts of the application are “stubs”, including the Stock Ticker component (which will eventually use Kafka), and checks/updates of Holdings and generation of Transactions records (which will eventually also be Cassandra tables). Currently only the asynchronous, non-market order types (Limit and Stop orders) are implemented. I realized that implementing Market orders (which must be traded immediately) using a Cassandra table would produce too many tombstones, because each order is deleted immediately upon being filled (rather than being marked as “filled” as in the original design) so that the Cassandra query can quickly find unfilled orders and the Orders table doesn’t grow indefinitely. However, for non-market orders a Cassandra table is a reasonable design choice: Orders may exist for extended periods of time (hours, days, or even weeks) before being traded, and because there are only a small number of successful trades per second (10s to 100s) relative to the total number of waiting Orders (potentially millions), the number of deletions, and therefore tombstones, will be acceptable.

Let’s now look at the details of some of the more significant changes made to the application.

Cassandra

I created a multi-DC Cassandra keyspace as follows:

CREATE KEYSPACE broker WITH replication = {'class': 'NetworkTopologyStrategy', 'NorthVirginiaDC': '3', 'SydneyDC': '3'};

In the Cassandra Java driver, the application.conf file determines which Data Center the Java client connects to. For example, to connect to the SydneyDC the file has the following settings:

datastax-java-driver {
    basic.contact-points = [
        "1.2.3.4:9042"
    ]

    basic.load-balancing-policy {
        class = DefaultLoadBalancingPolicy
        local-datacenter = "SydneyDC"
    }
}
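With that file on the classpath, the driver picks up the configuration automatically when the session is created. As a minimal sketch (assuming the DataStax Java driver 4.x, which this configuration format corresponds to), the Cassandra helper class referenced in the prepared statements below could simply be:

import com.datastax.oss.driver.api.core.CqlSession;

// Sketch only: CqlSession.builder().build() loads application.conf from the classpath,
// so the contact points and local-datacenter setting above are applied automatically.
public class Cassandra {
    public static final CqlSession session = CqlSession.builder().build();
}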

The Orders table was created as follows (note that I couldn’t use “limit” for a column name as it’s a reserved word in Cassandra!):

CREATE TABLE broker.orders (
    symbol text,
    orderid text,
    buyorsell text,
    customerid text,
    limitthreshold double,
    location text,
    ordertype text,
    quantity bigint,
    starttime bigint,
    PRIMARY KEY (symbol, orderid)
);

For the primary key, the partition key is the stock symbol, so that all outstanding orders for a stock can be found when each stock ticker is received by the Broker. The clustering column is the (unique) orderid, so that multiple orders for the same symbol can be written and read, and a specific order (by orderid) can be deleted. Note that in a production environment, partitioning solely by stock symbol may result in skewed and unbounded partitions, which is not recommended.

The prepared statements for creating, reading, and deleting orders are as follows:

// Create a new order
PreparedStatement prepared_insert = Cassandra.session.prepare(
        "insert into broker.orders (symbol, orderid, buyorsell, customerid, limitthreshold, location, ordertype, quantity, starttime) values (?, ?, ?, ?, ?, ?, ?, ?, ?)");

// Find all outstanding orders for a symbol
PreparedStatement prepared_select = Cassandra.session.prepare(
        "select * from broker.orders where symbol = ?");

// Delete a specific order once it has been filled
PreparedStatement prepared_delete = Cassandra.session.prepare(
        "delete from broker.orders where symbol = ? and orderid = ?");

I implemented a simplified “Place limit or stop order” operation (see Part 3), which uses the prepared_insert statement to create each new order, initially in the Cassandra Data Center local to the Broker where the order was created, from which it is then automatically replicated to the other Cassandra Data Center. I also implemented the “Trade Matching Order” operation (Part 3), which uses the prepared_select statement to query orders matching each incoming Stock Ticker, checks the rules, and, if a trade is filled, deletes the order with the prepared_delete statement.
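As a rough illustration of how these statements fit together (the symbol, ids, and values below are hypothetical and not the actual Broker code), the two operations boil down to something like this:

import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

// Place a new limit order: written to the local Data Center and replicated to the remote one.
Cassandra.session.execute(prepared_insert.bind(
        "AAPL", "order-123", "BUY", "customer-42", 150.0, "US", "LIMIT", 100L, System.currentTimeMillis()));

// On each incoming stock ticker, look up all outstanding orders for that symbol,
// check the trade rules (stubbed here), and delete each order that is filled.
ResultSet orders = Cassandra.session.execute(prepared_select.bind("AAPL"));
for (Row order : orders) {
    Cassandra.session.execute(prepared_delete.bind("AAPL", order.getString("orderid")));
}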

Deployment

I created a 3-node Cassandra cluster in the Sydney AWS region, and then added another identical Data Center in the North Virginia AWS region using Instaclustr Managed Cassandra for AWS. This gave me 6 nodes in total, running on t3.small instances (5 GB SSD, 2 GB RAM, 2 CPU Cores). This is a small developer-sized cluster, but it is adequate for a prototype and very affordable (2 cents an hour per node for AWS costs), given that the Brokers are currently only single-threaded so don’t produce much load. We’re more interested in latency at this point of the experiment, and we may want to increase the number of Data Centers in the future. I also spun up an EC2 instance (t3a.micro) in each of the two AWS regions, and deployed an instance of the Stock Broker on each (it only used 20% CPU). Here’s what the complete deployment looks like:

3. The Results

For the prototype, the focus was on demonstrating that the design goal of minimizing latency for trading stop and limit orders (asynchronous trades) was achieved. The latency for these order types is measured from the time a Stock Ticker is received to the time an Order is filled. We ran a Broker in each AWS region concurrently for an hour, with the same workload for each, and measured the average and maximum latencies. For the first configuration, each Broker was connected to its local Cassandra Data Center, which is how it would be configured in practice. The results were encouraging, with an average latency of 3ms and a maximum of 60ms, as shown in this graph.

During the run, across both Brokers, 300 new orders were created each second, 600 stock tickers were received each second, and 200 trades were carried out each second. 

Given that I hadn’t implemented Market Orders yet, I wondered how I could configure and approximately measure the expected latency for these synchronous order types between different regions (i.e. Sydney and North Virginia). The latency for Market orders within the same region will be comparable to that of the non-market orders. The solution turned out to be simple: just re-configure the Brokers to use the remote Cassandra Data Center, which introduces the inter-region round-trip latency that would also be encountered with Market Orders placed in one region and traded immediately in the other region. I could also have achieved a similar result by changing the consistency level to EACH_QUORUM (which requires a majority of nodes in each data center to respond). Not surprisingly, the latencies were higher, rising to a 360ms average and a 1200ms maximum, as shown in this graph with both configurations (Stop and Limit Orders on the left, and Market Orders on the right):

So our initial experiments are a success, and validate the primary design goal, as asynchronous stop and limit Orders can be traded with low latency from the Broker nearest the relevant stock exchanges, while synchronous Market Orders will take significantly longer due to inter-region latency. 

Write Amplification

I wondered what else could be learned from running this experiment. We can understand more about resource utilization in multi-DC Cassandra clusters. Using the Instaclustr Cassandra Console, I monitored the CPU Utilization on each of the nodes in the cluster, initially with only one Data Center and one Broker, then with two Data Centers and a single Broker, and finally with both Brokers running. It turns out that the read load results in 20% CPU Utilization on each node in the local Cassandra Data Center, and the write load also results in 20% locally. Thus, for a single Data Center cluster the total load is 40% CPU. However, with two Data Centers things get more complex due to the replication of the local write load to each other Data Center. This is also called “Write Amplification”.

The following table shows the measured total load for 1 and 2 Data Centers, and predicted load for up to 8 Data Centers, showing that for more than 3 Data Centers you need bigger nodes (or bigger clusters). A four CPU Core node instance type would be adequate for 7 Data Centers, and would result in about 80% CPU Utilization.  

  Number of Data Centers   Local Read Load   Local Write Load   Remote Write Load   Total Write Load   Total Load
  1                        20                20                 0                   20                 40
  2                        20                20                 20                  40                 60
  3                        20                20                 40                  60                 80
  4                        20                20                 60                  80                 100
  5                        20                20                 80                  100                120
  6                        20                20                 100                 120                140
  7                        20                20                 120                 140                160
  8                        20                20                 140                 160                180
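The pattern is simple: reads stay local, but every Data Center must also apply the writes replicated from all of the other Data Centers, so the total load grows linearly with the number of Data Centers. A small sketch of the arithmetic behind the table (loads expressed as % CPU of the 2-core nodes used here):

public class WriteAmplification {
    // Loads are % CPU of the 2-core nodes used in this prototype (20% read, 20% write locally).
    public static int totalLoadPercent(int dataCenters, int localReadLoad, int localWriteLoad) {
        int remoteWriteLoad = (dataCenters - 1) * localWriteLoad; // writes replicated from the other DCs
        return localReadLoad + localWriteLoad + remoteWriteLoad;  // e.g. 3 DCs: 20 + 20 + 40 = 80
    }
}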

Costs

The total cost to run the prototype includes the Instaclustr Managed Cassandra nodes (3 nodes per Data Center x 2 Data Centers = 6 nodes), the two AWS EC2 Broker instances, and the data transfer between regions (AWS only charges for data out of a region, not in, but the prices vary depending on the source region). For example, data transfer out of North Virginia is 2 cents/GB, but Sydney is more expensive at 9.8 cents/GB. I computed the total monthly operating cost to be $361 for this configuration, broken down into $337/month (93%) for Cassandra and EC2 instances, and $24/month (7%) for data transfer, to process around 500 million stock trades. Note that this is only a small prototype configuration, but can easily be scaled for higher throughputs (with incrementally and proportionally increasing costs).

Conclusions

In this blog we built and experimented with a prototype of the globally distributed stock broker application, focussing on testing the multi-DC Cassandra part of the system, which enabled us to significantly reduce the impact of planetary-scale latencies (from seconds to low milliseconds) and ensure greater redundancy (across multiple AWS regions) for the real-time stock trading function. Some parts of the application remain as stubs, and in future blogs I aim to replace them with suitable functionality (e.g. streaming, analytics) and non-functional capabilities (e.g. failover) from a selection of Kafka, Elasticsearch and maybe even Redis!

The post Building a Low-Latency Distributed Stock Broker Application: Part 4 appeared first on Instaclustr.

Understanding the Impacts of the Native Transport Requests Change Introduced in Cassandra 3.11.5

Summary

Recently, Cassandra made changes to the Native Transport Requests (NTR) queue behaviour. Through our performance testing, we found the new NTR change to be good for clusters that have a constant load causing the NTR queue to block. Under the new mechanism the queue no longer blocks, but throttles the load based on queue size setting, which by default is 10% of the heap.

Compared to the Native Transport Requests queue length limit, this improves how Cassandra handles traffic when queue capacity is reached. The “back pressure” mechanism more gracefully handles the overloaded NTR queue, resulting in a significant lift of operations without clients timing out. In summary, clusters with later versions of Cassandra can handle more load before hitting hard limits.

Introduction

At Instaclustr, we are responsible for managing the Cassandra versions that we release to the public. This involves performing a review of Cassandra release changes, followed by performance testing. In cases where major changes have been made in the behaviour of Cassandra, further research is required. So without further delay let’s introduce the change to be investigated.

Change:
  • Prevent client requests from blocking on executor task queue (CASSANDRA-15013)
Versions affected:
  • Cassandra 3.11.5 and later

Background

Native Transport Requests

Native transport requests (NTR) are any requests made via the CQL Native Protocol, which is the way Cassandra drivers communicate with the server. This includes all reads, writes, schema changes, etc. There are a limited number of threads available to process incoming requests. When all threads are in use, some requests wait in a queue (pending). If the queue fills up, some requests are silently rejected (blocked). The server never replies, so this eventually causes a client-side timeout. The main way to prevent blocked native transport requests is to throttle the load, so the requests are spread over a longer period.

Prior to 3.11.5

Prior to 3.11.5, Cassandra used the following configuration settings to set the size and throughput of the queue:

  • native_transport_max_threads sets the maximum number of threads for handling requests. Each thread pulls requests from the NTR queue.
  • cassandra.max_queued_native_transport_requests sets the queue size (default 128). Once the queue is full, the Netty threads block waiting for the queue to have free space.

Once the NTR queue is full, requests from all clients are no longer accepted. There is no strict ordering by which the blocked Netty threads will process requests, so in 3.11.4 latency becomes effectively random once all Netty threads are blocked.

Figure 1: Native Transport Request Overview

Change After 3.11.5

In 3.11.5 and above, instead of blocking when the NTR queue fills as previously described, Cassandra throttles. The NTR queue is throttled based on heap size: native transport requests are limited by the total memory they occupy rather than by their number. Requests are paused once the queue is full.

  • native_transport_max_concurrent_requests_in_bytes: a global limit on the memory occupied by in-flight NTR requests, in bytes (default heapSize / 10).
  • native_transport_max_concurrent_requests_in_bytes_per_ip: a per-endpoint limit on the memory occupied by in-flight NTR requests, in bytes (default heapSize / 40).

Maxed Queue Behaviour

From previously conducted performance testing of 3.11.4 and 3.11.6, we noticed similar behaviour when traffic pressure had not yet reached the point of saturating the NTR queue. In this section, we discuss the expected behaviour when saturation does occur and the breaking point is reached.

In 3.11.4, when the queue has been maxed, client requests will be refused. For example, when trying to make a connection via cqlsh, it will yield an error, see Figure 2.

Figure 2: Timed out request

Or, on a client that tries to run a query, you may see a NoHostAvailableException.

Where a 3.11.4 cluster previously got blocked NTRs, after an upgrade to 3.11.6 NTRs are no longer blocked. The reason is that 3.11.6 doesn’t place a limit on the number of NTRs but rather on the total memory they occupy. Thus, when the new size limit is reached, NTRs are paused. The default settings in 3.11.6 result in a much larger NTR queue than the small limit of 128 in 3.11.4 (in normal situations where the payload size is not extremely large).

Benchmarking Setup

This testing procedure requires the NTR queue on a cluster to be at max capacity, with enough load to start blocking requests at a constant rate. To do this, we used 12 active test boxes to create multiple client connections to the test cluster and stress it. Once the cluster NTR queue was in constant contention, we monitored the performance using:

  • Client metrics: requests per second, latency from client perspective
  • NTR Queue metrics: Active Tasks, Pending Tasks, Currently Blocked Tasks, and Paused Connections.

For testing purposes we used two testing clusters with details provided in the table below:

Cassandra Version   Cluster Size   Instance Type   Cores   RAM     Disk
3.11.4              3              m5xl-1600-v2    4       16 GB   1600 GB
3.11.6              3              m5xl-1600-v2    4       16 GB   1600 GB
Table 1: Cluster Details

To simplify the setup we disabled encryption and authentication. Multiple test instances were set up in the same region as the clusters. For testing purposes, we used 12 KB blob payloads. To give each cluster node a balanced mixed load, we kept the number of test boxes generating write load equal to the number generating read load. We ran the load against the cluster for 10 minutes to temporarily saturate the queue with read and write requests and cause contention for the Netty threads.

Our test script used cassandra-stress to generate the load; you can also refer to Deep Diving Cassandra-stress – Part 3 (Using YAML Profiles) for more information.

In the stressSpec.yaml, we used the following table definition and queries:

table_definition: |
 CREATE TABLE typestest (
       name text,
       choice boolean,
       date timestamp,
       address inet,
       dbl double,
       lval bigint,
        ival int,
       uid timeuuid,
       value blob,
       PRIMARY KEY((name,choice), date, address, dbl, lval, ival, uid)
 ) WITH compaction = { 'class':'LeveledCompactionStrategy' }
   AND comment='A table of many types to test wide rows'
 
columnspec:
 - name: name
   size: fixed(48)
   population: uniform(1..1000000000) # the range of unique values to select for the field 
 - name: date
   cluster: uniform(20..1000)
 - name: lval
   population: gaussian(1..1000)
   cluster: uniform(1..4)
 - name: value
   size: fixed(12000)
 
insert:
 partitions: fixed(1)       # number of unique partitions to update in a single operation
                                 # if batchcount > 1, multiple batches will be used but all partitions will
                                 # occur in all batches (unless they finish early); only the row counts will vary
 batchtype: UNLOGGED               # type of batch to use
 select: uniform(1..10)/10       # uniform chance any single generated CQL row will be visited in a partition;
                                 # generated for each partition independently, each time we visit it
 
#
# List of queries to run against the schema
#
queries:
  simple1:
     cql: select * from typestest where name = ? and choice = ? LIMIT 1
     fields: samerow             # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  range1:
     cql: select name, choice, uid  from typestest where name = ? and choice = ? and date >= ? LIMIT 10
     fields: multirow            # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
  simple2:
     cql: select name, choice, uid from typestest where name = ? and choice = ? LIMIT 1
     fields: samerow             # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)

Write loads were generated with:

cassandra-stress user no-warmup 'ops(insert=10)' profile=stressSpec.yaml cl=QUORUM duration=10m -mode native cql3 maxPending=32768 connectionsPerHost=40 -rate threads=2048 -node file=node_list.txt

Read loads were generated by changing the ops setting to:

'ops(simple1=10,range1=1)'

Comparison

3.11.4 Queue Saturation Test

The active NTR queue reached max capacity (at 128) and remained in contention under load. Pending NTR tasks remained above 128 throughout. At this point, timeouts were occurring when running 12 load instances to stress the cluster. Each node had 2 load instances performing reads and another 2 performing writes. 4 of the read load instances constantly logged NoHostAvailableExceptions as shown in the example below.

ERROR 04:26:42,542 [Control connection] Cannot connect to any host, scheduling retry in 1000 milliseconds
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ec2-18-211-4-255.compute-1.amazonaws.com/18.211.4.255:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-18-211-4-255.compute-1.amazonaws.com/18.211.4.255] Timed out waiting for server response), ec2-35-170-231-79.compute-1.amazonaws.com/35.170.231.79:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-35-170-231-79.compute-1.amazonaws.com/35.170.231.79] Timed out waiting for server response), ec2-35-168-69-19.compute-1.amazonaws.com/35.168.69.19:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [ec2-35-168-69-19.compute-1.amazonaws.com/35.168.69.19] Timed out waiting for server response))

The client results we got from this stress run are shown in Table 2.

Box Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
1 700.00 2,862.20 2,078.30 7,977.60 11,291.10 19,495.10 34,426.80
2 651.00 3,054.50 2,319.50 8,048.90 11,525.90 19,528.70 32,950.50
3 620.00 3,200.90 2,426.40 8,409.60 12,599.70 20,367.50 34,158.40
4 607.00 3,312.80 2,621.40 8,304.70 11,769.20 19,730.00 31,977.40
5 568.00 3,529.80 3,011.50 8,216.60 11,618.20 19,260.20 32,698.80
6 553.00 3,627.10 3,028.30 8,631.90 12,918.50 20,115.90 34,292.60
Writes 3,699.00 3,264.55 2,580.90 8,264.88 11,953.77 19,749.57 34,426.80
7 469.00 4,296.50 3,839.90 9,101.60 14,831.10 21,290.30 35,634.80
8 484.00 4,221.50 3,808.40 8,925.50 11,760.80 20,468.20 34,863.10
9 Crashed due to timeout
10 Crashed due to timeout
11 Crashed due to timeout
12 Crashed due to timeout
Reads 953.00 4,259.00 3,824.15 9,092.80 14,800.40 21,289.48 35,634.80
Summary 4,652.00 3,761.78 3,202.53 8,678.84 13,377.08 20,519.52 35,634.80
Table 2: 3.11.4 Mixed Load Saturating The NTR Queue

* To calculate the total write operations, we summed the values from the 6 write instances. For max write latency we used the maximum value across those instances, and for the remaining latency values we averaged the results. The write results are summarised in the Table 2 “Writes” row. We did the same for the read results, recorded in the “Reads” row. The last row in the table summarises the “Writes” and “Reads” rows.

The 6 write load instances finished normally, but the read instances struggled. Only 2 of the read load instances were able to send traffic through normally; the other clients received too many timeout errors, causing them to crash. Another observation is that the Cassandra timeout metrics, under client-request-metrics, did not capture any of the client timeouts we observed.

Same Load on 3.11.6

Next, we proceeded to test 3.11.6 with the same load. Using the default NTR settings, all test instances were able to finish the stress test successfully.

Box Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
1 677.00 2,992.60 2,715.80 7,868.50 9,303.00 9,957.30 10,510.90
2 658.00 3,080.20 2,770.30 7,918.80 9,319.70 10,116.70 10,510.90
3 653.00 3,102.80 2,785.00 7,939.80 9,353.30 10,116.70 10,510.90
4 608.00 3,340.90 3,028.30 8,057.30 9,386.90 10,192.20 10,502.50
5 639.00 3,178.30 2,868.90 7,994.30 9,370.10 10,116.70 10,510.90
6 650.00 3,120.50 2,799.70 7,952.40 9,353.30 10,116.70 10,510.90
Writes 3,885.00 3,135.88 2,828.00 7,955.18 9,347.72 10,102.72 10,510.90
7 755.00 2,677.70 2,468.30 7,923.00 9,378.50 9,982.40 10,762.60
8 640.00 3,160.70 2,812.30 8,132.80 9,529.50 10,418.70 11,031.00
9 592.00 3,427.60 3,101.70 8,262.80 9,579.80 10,452.20 11,005.90
10 583.00 3,483.00 3,160.40 8,279.60 9,579.80 10,435.40 11,022.60
11 582.00 3,503.60 3,181.40 8,287.90 9,588.20 10,469.00 11,047.80
12 582.00 3,506.70 3,181.40 8,279.60 9,588.20 10,460.60 11,014.20
Reads 3,734.00 3,293.22 2,984.25 8,194.28 9,540.67 10,369.72 11,047.80
Summary 7,619.00 3,214.55 2,906.13 8,074.73 9,444.19 10,236.22 11,047.80
Table 3: 3.11.6 Mixed Load

Default Native Transport Requests (NTR) Setting Comparison

Taking the summary row from both versions (Table 2 and Table 3), we produced Table 4.


Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
3.11.4 4652 3761.775 3202.525 8678.839167 13377.08183 20519.52228 35634.8
3.11.6 7619 3214.55 2906.125 8074.733333 9444.191667 10236.21667 11047.8
Table 4: Mixed Load 3.11.4 vs 3.11.6
Figure 2: Latency 3.11.4 vs 3.11.6

Figure 2 shows the latencies from Table 4. From the results, 3.11.6 had slightly better average latency than 3.11.4. Furthermore, in the worst case where contention is high, 3.11.6 handled the latency of a request better than 3.11.4. This is shown by the difference in Latency Max. Not only did 3.11.6 have lower latency but it was able to process many more requests due to not having a blocked queue.

3.11.6 Queue Saturation Test

The default native_transport_max_concurrent_requests_in_bytes is set to 1/10 of the heap size. The Cassandra max heap size of our cluster is 8 GB, so the default queue limit for our cluster is 0.8 GB. This turns out to be too large for this cluster size, as the configuration will run into CPU and other bottlenecks before we hit NTR saturation.

So we took the reverse approach to investigate full-queue behaviour: setting the queue size to a lower number. In cassandra.yaml, we added:

native_transport_max_concurrent_requests_in_bytes: 1000000

This means we set the global queue size to be throttled at 1 MB. Once Cassandra was restarted and all nodes were online with the new settings, we ran the same mixed load on this cluster; the results are shown in Table 5.

3.11.6 Op rate (op/s) Latency mean (ms) Latency median (ms) Latency 95th percentile (ms) Latency 99th percentile (ms) Latency 99.9th percentile (ms) Latency max (ms)
Write: Default setting 3,885.00 3,135.88 2,828.00 7,955.18 9,347.72 10,102.72 10,510.90
Write: 1MB setting 2,105.00 5,749.13 3,471.82 16,924.02 26,172.45 29,681.68 31,105.00
Read: Default setting 3,734.00 3,293.22 2,984.25 8,194.28 9,540.67 10,369.72 11,047.80
Read: 1MB setting 5,395.00 2,263.13 1,864.55 5,176.47 8,074.73 9,693.03 15,183.40
Summary: Default setting 7,619.00 3,214.55 2,906.13 8,074.73 9,444.19 10,236.22 11,047.80
Summary: 1MB setting 7,500.00 4,006.13 2,668.18 11,050.24 17,123.59 19,687.36 31,105.00

Table 5: 3.11.6 native_transport_max_concurrent_requests_in_bytes default and 1MB setting 

During the test, we observed a lot of paused connections and discarded requests—see Figure 3. For a full list of Instaclustr exposed metrics see our support documentation.

Figure 3: 3.11.6 Paused Connections and Discarded Requests

After setting native_transport_max_concurrent_requests_in_bytes to a lower number, we started to see paused connections and discarded requests; write latency increased, resulting in fewer processed write operations, as shown in Table 5. The increased write latency is illustrated in Figure 4.

Figure 4: 3.11.6 Write Latency Under Different Settings

On the other hand, read latency decreased (see Figure 5), resulting in a higher number of read operations being processed.

Figure 5: 3.11.6 Read Latency Under Different Settings
Figure 6: 3.11.6 Operations Rate Under Different Settings

As illustrated in Figure 6, the total number of operations decreased slightly with the 1 MB setting, but the difference is very small and the effects on reads and writes almost cancel each other out. However, when we look at each type of operation individually, we can see that rather than reads and writes getting an equal share of the queue under the default, effectively unlimited, setting, the lower queue size penalizes writes and favors reads. While our testing identified this outcome, further investigation will be required to determine exactly why this is the case.

Conclusion

In conclusion, the new NTR change offers an improvement over the previous NTR queue behaviour. Through our performance testing we found the change to be good for clusters that have a constant load causing the NTR queue to block. Under the new mechanism the queue no longer blocks, but throttles the load based on the amount of memory allocated to requests.

The results from testing indicated that the changed queue behaviour reduced latency and provided a significant lift in the number of operations without clients timing out. Clusters with our latest version of Cassandra can handle more load before hitting hard limits. For more information feel free to comment below or reach out to our Support team to learn more about changes to 3.11.6 or any of our other supported Cassandra versions.

The post Understanding the Impacts of the Native Transport Requests Change Introduced in Cassandra 3.11.5 appeared first on Instaclustr.