AppMonet: Painting the Digital Landscape with Scylla

AppMonet, based in the heart of New York City, connects mobile app developers with the largest brand advertisers and agencies for higher revenue through video ads. With one unified SDK, AppMonet supports all major video brands while maximizing and diversifying apps’ advertising demand sources.

In addition, they use AI to automatically fit ad content to the screen, with a strong call to action (CTA) button that increases ad effectiveness. By offering unique brand demand and rich media formats, AppMonet has become a popular choice for hundreds of global applications.

How AppMonet Works

AppMonet enables publishers to use multiple ad bidding services on the backend with a single small software development kit (SDK). Almost all of the top 100 brand advertising demand sources work with AppMonet.

Beneath every ad served there is a process known as Real-Time Bidding (RTB). This standard, maintained by the Interactive Advertising Bureau (IAB), defines what sort of ads can be placed, such as their size and page placement, acceptable media formats, and how auctions are conducted. For in-app video ads, AppMonet specifically supports pre-caching and playing of both Video Ad Serving Template (VAST) and Video Player-Ad Interface Definition (VPAID) ads. This dramatically increases effective video fill rate and revenue.

The issue for mobile developers has been that each ad network supports its own methods to embed ads in mobile apps. AppMonet, through its unified container technology for video and display content, supports all of the major Supply-Side Platforms (SSPs), including Smaato, AOL, Smart AdServer, InMobi, AerServ, LoopMe, PubNative, MobileFuse, Fyber, Xandr, OpenX, Rubicon, TappX, Mobfox, SpotX, RhythmOne, and AdColony.

AppMonet’s SDK is efficient in terms of both memory and bandwidth. For example, it watches the device’s memory usage to ensure optimal video playback. It also avoids various hacks found elsewhere in the industry, such as a JavaScript video-to-image converter, which can dramatically balloon bandwidth usage.

AppMonet’s Path to Scylla

AppMonet’s path to Scylla Cloud was a migration from their first implementation. Their infrastructure was originally built on Erlang’s Mnesia, a distributed database written by Ericsson for highly distributed telecom work, which seemed like a good fit for their purposes. However, growth and success proved their backend was not capable of scaling to meet production demand. AppMonet needed an alternative that would let them scale by an order of magnitude, and fast.

They benchmarked a number of leading solutions: Redis, Aerospike, Cassandra and Scylla. While they had experience with Redis, they felt it had architectural limitations for their use case. Aerospike was popular in AdTech, but seemed proprietary. They knew of Cassandra, but while it was famous for fast writes, it didn’t seem a good fit for their read-heavy workload.

That’s when they read a blog post about AdGear’s use of Scylla, and learned about their Scylla Summit presentation. According to AppMonet co-founder Nick Jacob, “It was an inspiration. It seemed like a perfect fit.”

Their one-table schema was pretty simple: a compound primary key, easy to model across different databases.

“It was pretty clear, pretty quickly, that Scylla gave us the most flexibility, but also the best performance. For our use case it was pretty obvious that Scylla was the best option.”

How AppMonet Uses Scylla

AppMonet turned to Scylla Cloud to run their backend on their own AWS account. They need to support both real-time writes to the application, as well as batch bulk updates daily and weekly from Apache Spark. They also export data to Presto or Athena to perform critical analytics. To ensure their bulk updates or analytics do not impact production transactions they utilize Scylla’s unique workload prioritization feature.
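As a rough sketch of how that separation can be configured, workload prioritization works through CQL service levels in Scylla Enterprise and Scylla Cloud. The syntax below follows the service-level CQL from Scylla’s documentation, and the role names are hypothetical, not AppMonet’s actual configuration:

```sql
-- Sketch only: service levels assign scheduler shares per workload.
-- Role names (bidding_role, spark_role) are hypothetical.
CREATE SERVICE LEVEL IF NOT EXISTS realtime WITH SHARES = 1000;
CREATE SERVICE LEVEL IF NOT EXISTS analytics WITH SHARES = 100;

-- Attach each service level to the role a given workload authenticates as,
-- so bidding traffic outranks Spark bulk jobs under contention.
ATTACH SERVICE LEVEL realtime TO bidding_role;
ATTACH SERVICE LEVEL analytics TO spark_role;
```

With shares like these, analytics queries only yield capacity when the real-time workload actually needs it; idle capacity remains available to either.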

Another advantage of Scylla Cloud was offloading database management. The Scylla Cloud team handles all of the administration, and AppMonet relies on them for handling incidents and isolating errors, even when the culprit was not the database but their own application code that needed tuning.

“From maintaining the database, we really don’t need to do anything,” Nick noted. “We use the advanced monitoring via the Grafana dashboard. It has been helpful to identify issues.”

AppMonet is deployed on a cluster of i3.4xlarge Amazon Web Services EC2 instances, currently managing 600 GB of data with plenty of room to expand. Nick observed that with Scylla they continue to grow linearly without any dropoff in performance. Currently they run around 20,000 queries per second, or 52 billion requests a month, yet load is only at 18%. “We’re confident we can handle multiple times the workload.”

To serve customers with the right ads in the right location, the team at AppMonet perfected the bidding process. Latency plays a crucial part in making bidding efficient: the time it takes to retrieve user interest data and match it to advertisers defines the effectiveness of the ad. Low-latency data retrieval ensures the ad presented to the user is contextual to the content she is watching.

Most critically of all, Scylla delivers latencies matching AppMonet’s needs.

“Having it be in our AWS account we can do sub-millisecond average latencies on the reads.” Nick confirmed average latencies were 200 µseconds (microseconds). “It’s awesome.”

AppMonet’s Next Steps

AppMonet is a small company trying to grow quickly. With Scylla their future is secure. They wanted a database that handled multi-datacenter replication — a built-in feature of Scylla — to deploy to multiple global datacenters. Nick was bullish. “The product can grow with us.”

The post AppMonet: Painting the Digital Landscape with Scylla appeared first on ScyllaDB.

Scylla University: Coding with Scala, Part 1

This post is based on the Scylla Specific Drivers, Overview, Paging and Shard Awareness lesson in Scylla University, ScyllaDB’s free resource to learn NoSQL database development and Scylla administration. You can find the lesson here in Scylla University.

In a previous lesson, we explained how a Scylla administrator backs up and restores a cluster. As the number of mutants is on the rise, Division 3 decided that we must use more applications to connect to the mutant catalog, and hired developers to create powerful applications that can monitor the mutants. This lesson will explore how to connect to a Scylla cluster using the Phantom library for Scala: a Scala-idiomatic wrapper over the standard Java driver.

When creating applications that communicate with a database such as Scylla, it is crucial that the programming language being used includes support for database connectivity. Since Scylla is compatible with Cassandra, we can use any of the available Cassandra libraries. For example, in Go, there is GoCQL and GoCQLX. In Node.js, there is the cassandra-driver. For the JVM, we have the standard Java driver available. Scala applications on JVM can use it, but a library tailored for Scala’s features offers a more enjoyable and type-safe development experience. Since Division 3 wants to start investing in Scala, let’s begin by writing a sample Scala application.

Creating a Sample Scala Application

The sample application that we will create connects to a Scylla cluster, displays the contents of the Mutant Catalog table, inserts and deletes data, and shows the table’s contents after each action. First, we will go through each section of the code used and then explain how to run the code in a Docker container that accesses the Scylla Mutant Monitoring cluster. You can see the file here: scylla-code-samples/mms/scala/scala-app/src/main/scala/com/scylla/mms/App.scala.

As mentioned, we’re using the Phantom library for working with Scylla. The library’s recommended usage pattern involves modeling the database, table, and data service as classes. We’ll delve into those in a future lesson.

First, let’s focus on how we set up the connection itself. We start by importing the library:
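Assuming a recent Phantom release (which ships its DSL under the com.outworkers packages, an assumption about the version in use), the import is a single line:

```scala
// Phantom's DSL entry point; package name assumes a recent Phantom release.
import com.outworkers.phantom.dsl._
```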

This import brings all the necessary types and implicits into scope. The main entry point for our application is the App object’s main method. In it, we set up the connection to the cluster:

Next, in a list, we specify the DNS names through which the application contacts Scylla and the keyspace to use.
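A sketch of that connection setup, assuming Phantom’s ContactPoints builder and the node hostnames of the Mutant Monitoring cluster set up later in this lesson (both are assumptions, not the exact lesson code):

```scala
// Sketch: connect to the three MMS nodes and select the catalog keyspace.
// Hostnames and builder calls are assumed, not taken from the lesson source.
import com.outworkers.phantom.connectors.{CassandraConnection, ContactPoints}

object Connector {
  val connection: CassandraConnection =
    ContactPoints(Seq("scylla-node1", "scylla-node2", "scylla-node3"))
      .keySpace("catalog")
}
```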

Finally, we instantiate the MutantsDatabase and the MutantsService classes. These classes represent, respectively, the collection of tables in use by our application and a domain-specific interface to interact with those tables.

We’re not operating on low-level types like ResultSet or Row objects with the Phantom library, but rather on strongly-typed domain objects. Specifically, for this lesson, we will be working with a Mutant data type defined as follows:
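A minimal sketch of that data type, mirroring the columns of the mutant_data table created later in this lesson (the Scala field names are assumptions):

```scala
// Sketch of the Mutant model: one field per column of catalog.mutant_data
// (first_name, last_name, address, picture_location). Field names assumed.
case class Mutant(
  firstName: String,
  lastName: String,
  address: String,
  pictureLocation: String
)
```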

You can find the file here: scylla-code-samples/mms/scala/scala-app/src/main/scala/com/scylla/mms/model/Mutant.scala

The MutantsService class contains implementations of the functionality required for this lesson. You can find it here:


Let’s consider how we would fetch all the mutant definitions present in the table:
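A sketch of such a fetch method, assuming Phantom’s select DSL and a mutants table object on the database class (both names are assumptions):

```scala
// Sketch: fetch every row as a strongly-typed Mutant.
// db.mutants stands for the Phantom table object; Future and the DSL
// operators come from the phantom.dsl._ import.
def getAllMutants: Future[List[Mutant]] =
  db.mutants.select.all().fetch()
```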

Phantom provides a domain-specific language for interacting with Scylla. Instead of threading CQL strings through our code, we’re working with a strongly-typed interface that verifies, at compile-time, that the queries we are generating are correct (according to the data type definitions).

In this case, we are using the select method to fetch all Mutant records from Scylla. Note that by default, Phantom uses Scala’s Future data type to represent all Scylla operations. Behind the scenes, Phantom uses non-blocking network I/O to efficiently perform all of the queries. If you’d like to know more about Future, this is a good tutorial.
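Independent of Phantom, the Future composition style looks like this minimal, runnable sketch:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// A Future represents an asynchronous computation; map registers a
// non-blocking callback that runs once the value becomes available.
val lookup: Future[Int] = Future { 21 }      // stands in for an async query
val doubled: Future[Int] = lookup.map(_ * 2) // transforms without blocking

// Await is used only to observe the result at the end of a script;
// application code keeps composing futures instead of blocking.
val result: Int = Await.result(doubled, 5.seconds)
```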

Moving forward, the following method allows us to store an instance of the Mutant data type into Scylla:
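A sketch of the insertion method, using Phantom’s store helper (the method and table names are assumptions):

```scala
// Sketch: store derives the INSERT statement from the Mutant model,
// so no CQL string is written by hand.
def addMutant(mutant: Mutant): Future[ResultSet] =
  db.mutants.store(mutant).future()
```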

Everything is strongly typed, and as such, we’re using actual instances of our data types to interact with Scylla. This gives us increased maintainability of our codebase. If we refactor the Mutant data type, this code will no longer compile, and we’ll be forced to fix it.

And lastly, here’s a method that deletes a mutant with a specific name:
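A sketch of the deletion method, with the predicate expressed through Phantom’s eqs operator (names again assumed):

```scala
// Sketch: eqs only compiles when the argument type matches the column type,
// so mismatched predicates are rejected at compile time.
def deleteMutant(firstName: String, lastName: String): Future[ResultSet] =
  db.mutants.delete
    .where(_.firstName eqs firstName)
    .and(_.lastName eqs lastName)
    .future()
```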

We can see that Phantom’s DSL can represent predicates in a type-safe fashion as well. We’re forced to use a matching type for comparing the firstName and lastName.

We can sequence the calls to these methods in our main entry point using a for comprehension on the Future values returned. The code below is from the following file:


In this example, we execute all the futures sequentially (with some additional callbacks registered on them for printing out the information).
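With plain Future values standing in for the actual service calls (the method names here are illustrative), the shape of that sequencing is:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Illustrative stand-ins for the service methods: each returns a Future,
// just like Phantom's query methods do.
def fetchAllNames(): Future[List[String]] = Future(List("Bob Loblaw", "Bob Zemuda"))
def insertName(name: String): Future[Unit] = Future(())

// The for comprehension sequences the futures: each step starts only
// after the previous one has completed.
val program: Future[List[String]] =
  for {
    before <- fetchAllNames()
    _      <- insertName("Jim Jeffries")
    after  <- fetchAllNames()
  } yield after

val names: List[String] = Await.result(program, 5.seconds)
```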

With the coding part done, let’s set up a Scylla Cluster and then run the sample application in Docker.

Set-up a Scylla Cluster

The example requires a single DC cluster. Follow this procedure to remove previous clusters and set up a new cluster.

Once the cluster is up, we’ll create the catalog keyspace and populate it with data.

The first task is to create the keyspace named catalog for the mutants’ catalog.

docker exec -it mms_scylla-node1_1 cqlsh
CREATE KEYSPACE catalog WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3};

Now that the keyspace is created, it is time to create a table to hold the mutant data.

use catalog;
CREATE TABLE mutant_data (
    first_name text,
    last_name text,
    address text,
    picture_location text,
    PRIMARY KEY((first_name, last_name)));

Now let’s add a few mutants to the catalog with the following statements:

insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Loblaw','1313 Mockingbird Lane', '');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Bob','Zemuda','1202 Coffman Lane', '');
insert into mutant_data ("first_name","last_name","address","picture_location") VALUES ('Jim','Jeffries','1211 Hollywood Lane', '');

Building the Scala Example

If you previously built the Scala Docker image, you can skip directly to the section Running the Scala Example below.

Otherwise, to build the application in Docker, change into the scala subdirectory in scylla-code-samples:

cd scylla-code-samples/mms/scala

Now we can build and run the container:

docker build -t scala-app .
docker run -d --net=mms_web --name some-scala-app scala-app

To connect to the shell of the container, run the following command:

docker exec -it some-scala-app sh

Running the Scala Example

Finally, the sample Scala application can be run:

java -jar scala-app/target/scala-2.13/mms-scala-app-assembly-0.1.0-SNAPSHOT.jar

The output of the application will be:


In this lesson, we explained how to create a sample Scala application that executes a few basic CQL operations on a Scylla cluster using the Phantom library. These were only the basics, and there are more exciting topics that Division 3 wants developers to explore. In the next lesson, we will review the structure of an application using the Phantom library.

In the meantime, please be safe out there and continue to monitor the mutants!

*This lesson was written with the help of Itamar Ravid. Itamar is a distributed systems engineer with a decade of experience in a wide range of technologies. He focuses on JVM-based microservices in Scala using functional programming. Thank you, Itamar!


The post Scylla University: Coding with Scala, Part 1 appeared first on ScyllaDB.

[Webcast] Leave it to Astra: Cassandra-as-a-Service on Google Cloud

Scylla Summit 2021 Call for Speakers

Scylla Summit 2021 will be held as an online event this year, January 12 – 14, 2021. Our own sessions will showcase the latest Scylla features and capabilities, roadmap plans and the view from the top from our executives. We also want to extend this Call for Speakers, inviting you to tell your own story to the global community of your peers. Users’ voices and real-world perspectives are vital for the success of any software industry conference.

What Our Attendees Most Want to Hear

Our attendees appreciate hearing real-world examples from practitioners — developers and devops, architects, technical leaders, and product owners. We’re looking for compelling technical sessions: case studies, operational best practices, and more. Here’s a list of suggested topics, or feel free to recommend others.

Real-world Use Cases

Everything from design and architectural considerations to POCs, to production deployment and operational lessons learned, including…

  • Practical examples for vertical markets — IoT, AI/ML, security, e-commerce, fraud detection, adtech/martech, customer/user data, media/multimedia, social apps, B2B or B2C apps, and more.
  • Migrations — How did you get from where you were in the past to where you are today? What were the limitations of other systems you needed to overcome? How did Scylla address those issues?
  • Hard numbers — Clusters, nodes, CPUs, RAM and disk, data size, growth, throughput, latencies, benchmark and stress test results (think graphs, charts and tables). Your cluster can be any size — large or small. What attendees appreciate hearing most is how you got the best results out of your available resources.
  • Integrations — Scylla is just one of many big data systems running in user environments. What feeds it from upstream, or what does it populate downstream? Kafka? Spark? Parquet? Pandas? Other SQL or NoSQL systems? JanusGraph? KairosDB? Our users love block diagrams and want to understand data flows.
  • War stories — What were the hardest big data challenges you faced? How did you solve them? What lessons can you pass along?

Tips & Tricks

From getting started to performance optimization to disaster management, tell us your DevOps secrets, or unleash your chaos monkey!

  • Computer languages and dev tools — What are your favorite languages and tools? Are you a Pythonista? Doing something interesting in Golang or Rust?
  • Working on Open Source? — Got a GitHub repo to share? Our attendees would love to walk through your code.
  • Seastar — The Seastar infrastructure at the heart of Scylla can be used for other projects as well. What systems architecture challenges are you tackling with Seastar?


  • CFP opens: Mon, Sep 21st, 2020
  • CFP closes (deadline for submissions): Thursday, Oct 22nd, 2020, 5:00 PM Pacific
  • Scylla Summit (event dates): Tue, Jan 12th – Thu, Jan 14th, 2021

Required Information

You’ll be asked to include the following information in your proposal:

  • Proposed title
  • Description
  • Suggested main topic
  • Audience information: Who is this presentation for? What will the audience take away? What prerequisite knowledge would they need?
  • Videos or live demos included in presentation?
  • Length of the presentation (10 or 15 minute slots or longer)

Ten Tips for a Successful Speaker Proposal

Please keep in mind this event is made by and for deeply technical professionals. All presentations and supporting materials must be respectful and inclusive.

  • Be authentic — Your peers want your personal experiences and examples drawn from real-world scenarios
  • Be catchy — Give your proposal a simple and straightforward but catchy title
  • Be interesting — Make sure the subject will be of interest to others; explain why people will want to attend and what they’ll take away from it
  • Be complete — Include as much detail about the presentation as possible
  • Don’t be “pitchy” — Keep proposals free of marketing and sales. We tend to ignore proposals submitted by PR agencies and require that we can reach the suggested participant directly.
  • Be understandable — While you can certainly cite industry terms, try to write a jargon-free proposal that contains clear value for attendees
  • Be deliverable — Sessions have a fixed length, and you will not be able to cover everything. The best sessions are concise and focused.
  • Be specific — Overviews aren’t great in this format; the narrower and more specific your topic is, the deeper you can dive into it, giving the audience more to take home
  • Be cautious — Live demos sometimes don’t go as planned, so we don’t recommend them
  • Be memorable — Leave your audience with takeaways they can bring back to their own organizations and work. Give them something they’ll remember for a good long time.


The post Scylla Summit 2021 Call for Speakers appeared first on ScyllaDB.

Seedless NoSQL: Getting Rid of Seed Nodes in Scylla

Nodes in a Scylla cluster are symmetric, which means any node in the cluster can serve user read or write requests, and no special roles are assigned to a particular node. For instance, no primary nodes vs. secondary nodes or read nodes vs. write nodes.

This is a nice architecture property. However, there is one small thing that breaks the symmetry: that is the seed concept. Seed nodes are nodes that help the discovery of the cluster and propagation of gossip information. Users must assign a special role to some of the nodes in the cluster to serve as seed nodes.

In addition to breaking the symmetric architecture property, do seed nodes introduce any real issues? In this blog post, we will walk through the pains of seed nodes and how we get rid of them to make Scylla easier and more robust to operate than ever before.

What are the real pains of the seed nodes?

First, seed nodes do not bootstrap. Bootstrap is the process by which a newly joining node streams data from existing nodes. Seed nodes, however, skip the bootstrap process and stream no data. This is a popular source of confusion, since users are surprised to see no data streamed to a newly added seed node.

Second, the user needs to decide which nodes will be assigned as the seed nodes. Scylla recommends 3 seed nodes per datacenter (DC) if the DC has more than 6 nodes, or 2 seed nodes per DC if it has fewer than 6. If the number of nodes grows, the user needs to update the seed node configuration and modify scylla.yaml on all nodes.

Third, it is quite complicated to add a new seed node or replace a dead one. A node cannot join the cluster as a seed node directly, because seed nodes do not bootstrap and thus stream no data from existing nodes. The correct way is to add the node as a non-seed node, then promote it to a seed node. Likewise, a dead seed node cannot be replaced with the regular node-replacement procedure. Users must first promote a non-seed node to act as a seed node and update the configuration on all nodes, then perform the replacement procedure.

Those special treatments for seed nodes complicate the administration and operation of a Scylla cluster. Scylla needs to carefully document the differences between seed and non-seed nodes, and users can easily make mistakes even with that documentation.

Can we get rid of those seed concepts to simplify things and make it more robust?

Let’s first take a closer look into what is the special role of seed nodes.

Seed nodes help in two ways:

1) Define the target nodes to talk within gossip shadow round

But what is a gossip shadow round? It is a routine used for a variety of purposes when a cluster boots up:

  • to get gossip information from existing nodes before normal gossip service starts
  • to get tokens and host IDs of the node to be replaced in the replacing operation
  • to get tokens and status of existing nodes for bootstrap operation
  • to get features known by the cluster to prevent an old version of a Scylla node that does not know any of those features from joining the cluster

2) Help to converge gossip information faster

Seed nodes help converge gossip information faster because, in each normal gossip round, a seed node is contacted once per second. As a result, the seed nodes are supposed to have more recent gossip information. Any nodes communicating with seed nodes will obtain the more recent information. This speeds up the gossip convergence.

How do we get rid of the seed concept?

To get rid of seen nodes entirely, you will have to solve for each of the functions seed nodes currently provide:

1) Configuration changes

  • The seeds option

The parameter --seed-provider-parameters seeds= is now used only once, when a new node joins the cluster for the first time. Its only purpose is to find the existing nodes.

  • The auto-bootstrap option

In Scylla, the --auto-bootstrap option defaults to true and is not present in scylla.yaml. This was intentional, to avoid misuse. Setting it to false makes initialization of a fresh cluster faster, by skipping the process of streaming data from existing nodes.

In contrast, the Scylla AMI sets --auto-bootstrap to false, because most of the time when AMI nodes start they are forming a fresh cluster. When adding a new AMI node to an existing cluster, users must set the --auto-bootstrap option to true. It is easy to forget to do so and end up with a new node without any data streamed, which is annoying.

The new solution is that the --auto-bootstrap option will now be ignored so that we have less dangerous options and fewer errors to make. With this change, all new nodes — both seed and non-seed nodes — must bootstrap when joining the cluster for the first time.

One small exception is that the first node in a fresh cluster does not bootstrap, because there are no other nodes to bootstrap from. The node with the smallest IP address among the seeds is selected as the first node automatically and skips the bootstrap procedure. So when starting nodes to form a fresh cluster, always use the same list of nodes in the seeds config option.

2) Gossip shadow round target nodes selection

Before this change, only the seed nodes communicated within the gossip shadow round. The shadow round finishes immediately after any of the seed nodes have responded.

After this change, if the node is a new node that only knows the nodes listed in the seeds config option, it talks to all of those nodes and waits for each of them to respond before finishing the shadow round.

If the node is an existing node, it will talk to all nodes saved in the system.peers, without consulting the seed config option.

3) Gossip normal round target nodes selection

Currently, in each gossip normal round, 3 types of nodes are selected to talk with:

  • Select 10% of live nodes randomly
  • Select a seed node if the seed node is not selected above
  • Select an unreachable node

After this change, the selection has been changed to below:

  • Select 10% of live nodes in a random shuffle + group fashion

For example, suppose there are 20 nodes in the cluster [n1, n2, .., n20]. The nodes are first shuffled randomly, then divided into 10 groups so that each group contains 10% of the live nodes. In each round, the nodes in one of the groups are selected to talk with. Once every group has had a turn, the nodes are shuffled randomly again and divided into 10 new groups, and the procedure repeats. Random shuffling to select target nodes is also used in other gossip implementations, e.g., SWIM.

This method helps converge gossip information in a more deterministic way so we can drop the selection of one seed node in each gossip round.
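As a standalone illustration (not Scylla’s actual C++ implementation), the shuffle-and-group cycle can be sketched as:

```scala
import scala.util.Random

// Sketch of random shuffle + group selection: shuffle the live nodes,
// split them into `groups` groups, and gossip with one group per round.
object GossipTargets {
  // One full cycle: every live node appears in exactly one round's group.
  def roundsForCycle(liveNodes: Seq[String], groups: Int = 10): Seq[Seq[String]] = {
    val shuffled = Random.shuffle(liveNodes)
    // Ceiling division so all nodes fit into the requested number of groups.
    val groupSize = math.ceil(shuffled.size.toDouble / groups).toInt
    shuffled.grouped(groupSize).toSeq
  }
}
```

For 20 live nodes and 10 groups, each round talks to 2 nodes, and every node is contacted exactly once per cycle before the next shuffle.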

  • Select an unreachable node

Unreachable node selection is not changed. It is communicated in order to bring back dead nodes into the cluster.

4) Improved message communication for shadow round

Currently, Scylla reuses the gossip SYN and ACK messages for the shadow round communication. Those messages are one-way and asynchronous, meaning the sender does not wait for a response from the receiver. As a result, the gossip message handlers contain many special cases to support the shadow round.

After this change, a new two-way, synchronous RPC message has been introduced for communication in the gossip shadow round. This dedicated message makes it much easier to track which nodes have responded and to wait until all of them have. It also allows requesting only the gossip application states the sender is actually interested in, which reduces the message size transferred over the network. Thanks to the new RPC message, the special handling of the regular gossip SYN and ACK messages can be eliminated.

In a mixed cluster, a node falls back to the old shadow round method when the new message is not supported, for compatibility reasons. We cannot use a gossip feature bit to decide whether the new method can be used, because the gossip service has not even started when the shadow round is conducted.


Getting rid of the seed concept simplifies Scylla cluster configuration and administration, makes Scylla nodes fully symmetric, and prevents unnecessary operation errors.

The work is merged in Scylla master and will be released in the upcoming Scylla 4.3 release.

Farewell seeds and hello seedless!



The post Seedless NoSQL: Getting Rid of Seed Nodes in Scylla appeared first on ScyllaDB.

Apache Cassandra Usage Report 2020

Apache Cassandra is the open source NoSQL database for mission critical data. Today the community announced findings from a comprehensive global survey of 901 practitioners on Cassandra usage. It’s the first of what will become an annual survey, providing a baseline understanding of who uses Cassandra, and how and why.

“I saw zero downtime at global scale with Apache Cassandra. That’s a powerful statement to make. For our business that’s quite crucial.” - Practitioner, London

Key Themes

Cassandra adoption is correlated with organizations in a more advanced stage of digital transformation.

People from organizations that self-identified as being in a “highly advanced” stage of digital transformation were more likely to be using Cassandra (26%) compared with those in an “advanced” stage (10%) or “in process” (5%).

Optionality, security, and scalability are among the key reasons Cassandra is selected by practitioners.

The top reasons practitioners use Cassandra for mission critical apps are “good hybrid solutions” (62%), “very secure” (60%), “highly scalable” (57%), “fast” (57%), and “easy to build apps with” (55%).

A lack of skilled staff and the challenge of migration deters adoption of Cassandra.

Thirty-six percent of practitioners currently using Cassandra for mission critical apps say that a lack of Cassandra-skilled team members may deter adoption. When asked what it would take for practitioners to use Cassandra for more applications and features in production, they said “easier to migrate” and “easier to integrate.”


Sample. The survey consisted of 1,404 interviews of IT professionals and executives, including the 901 practitioners who are the focus of this usage report, conducted from April 13-23, 2020. Respondents came from 13 geographies (China, India, Japan, South Korea, Germany, United Kingdom, France, the Netherlands, Ireland, Brazil, Mexico, Argentina, and the U.S.) and the survey was offered in seven languages corresponding to those geographies. While margin of sampling error cannot technically be calculated for online panel populations where the relationship between sample and universe is unknown, the margin of sampling error for equivalent representative samples would be +/- 2.6% for the total sample, +/- 3.3% for the practitioner sample, and +/- 4.4% for the executive sample.

To ensure the highest quality respondents, surveys include enhanced screening beyond title and activities of company size (no companies under 100 employees), cloud IT knowledge, and years of IT experience.

Rounding and multi-response. Figures may not add to 100 due to rounding or multi-response questions.


Practitioner respondents represent a variety of roles as follows: Dev/DevOps (52%), Ops/Architect (29%), Data Scientists and Engineers (11%), and Database Administrators (8%) in the Americas (43%), Europe (32%), and Asia Pacific (12%).

Cassandra roles

Respondents include both enterprise (65% from companies with 1k+ employees) and SMEs (35% from companies with at least 100 employees). Industries include IT (45%), financial services (11%), manufacturing (8%), health care (4%), retail (3%), government (5%), education (4%), telco (3%), and 17% were listed as “other.”

Cassandra companies

Cassandra Adoption

Twenty-two percent of practitioners are currently using or evaluating Cassandra with an additional 11% planning to use it in the next 12 months.

Of those currently using Cassandra, 89% are using open source Cassandra, including both self-managed (72%) and third-party managed (48%).

Practitioners using Cassandra today are more likely to use it for more projects tomorrow. Overall, 15% of practitioners say they are extremely likely (10 on a 10-pt scale) to use it for their next project. Of those, 71% are currently using or have used it before.

Cassandra adoption

Cassandra Usage

People from organizations that self-identified as being in a “highly advanced” stage of digital transformation were more likely to be using Cassandra (26%) compared with those in an “advanced” stage (10%) or “in process” (5%).

Cassandra predominates in very important or mission critical apps. Among practitioners, 31% use Cassandra for their mission critical applications, 55% for their very important applications, 38% for their somewhat important applications, and 20% for their least important applications.

“We’re scheduling 100s of millions of messages to be sent. Per day. If it’s two weeks, we’re talking about a couple billion. So for this, we use Cassandra.” - Practitioner, Amsterdam

Cassandra usage

Why Cassandra?

The top reasons practitioners use Cassandra for mission critical apps are “good hybrid solutions” (62%), “very secure” (60%), “highly scalable” (57%), “fast” (57%), and “easy to build apps with” (55%).

“High traffic, high data environments where really you’re just looking for very simplistic key value persistence of your data. It’s going to be a great fit for you, I can promise that.” - Global SVP Engineering

Top reasons practitioners use Cassandra

For companies in a highly advanced stage of digital transformation, 58% cite “won’t lose data” as the top reason, followed by “gives me confidence” (56%), “cloud native” (56%), and “very secure” (56%).

“It can’t lose anything, it has to be able to capture everything. It can’t have any security defects. It needs to be somewhat compatible with the environment. If we adopt a new database, it can’t be a duplicate of the data we already have.… So: Cassandra.” - Practitioner, San Francisco

However, 36% of practitioners currently using Cassandra for mission critical apps say that a lack of Cassandra-skilled team members may deter adoption.

“We don’t have time to train a ton of developers, so that time to deploy, time to onboard, that’s really key. All the other stuff, scalability, that all sounds fine.” – Practitioner, London

When asked what it would take for practitioners to use Cassandra for more applications and features in production, they said “easier to migrate” and “easier to integrate.”

“If I can get started and be productive in 30 minutes, it’s a no brainer.” - Practitioner, London


We invite anyone who is curious about Cassandra to test the 4.0 beta release. There will be no new features or breaking API changes in future beta or GA builds, so the time you put into testing the beta will carry over when you transition your production workloads to 4.0.

We also invite you to participate in a short survey about Kubernetes and Cassandra that is open through September 24, 2020. Details will be shared with the Cassandra Kubernetes SIG after it closes.

Survey Credits

The survey was conducted by ClearPath Strategies, a strategic consulting and research firm, and donated to the community by DataStax; a volunteer from the community helped analyze the report. It is available for use under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

AWS Outposts: Run Fully Managed NoSQL Workloads On-Premises Using Scylla

AWS Outposts is a fully managed on-premises extension of Amazon Web Services, making all of your familiar infrastructure and APIs available right in your own local environment. Ideal for low-latency use cases, AWS Outposts allows you to run EC2 instances, Elastic Block Storage (EBS), and container-based services like Amazon’s EKS for Kubernetes.

Order what you need through your AWS Console, and a team of experts shows up on-site for the installation. Your local on-premises environment shows up as your own private zone, and all of the physical assets are housed in a traditional 80 x 48 x 24 inch 42U rack.

Scylla as an On-Premises Alternative to DynamoDB

Not all Amazon services are available in AWS Outposts, however. Notably, Amazon DynamoDB is currently unable to run in an on-premises environment.

This is where Scylla serves a role. With Scylla’s DynamoDB-compatible API, Alternator, you can run your DynamoDB workloads in your local AWS Outpost, using Scylla Cloud, our fully managed database-as-a-service (DBaaS) running in your own AWS account. Want to use the same data on AWS Outposts and AWS public region? Run Scylla Cloud in a multi-datacenter, hybrid, setup: one DC on-prem, one DC on a public region.

By the way, if you are more familiar with Scylla’s CQL interface, the same Cassandra Query Language used in Apache Cassandra, that’s fine too. You can also use Scylla Cloud on AWS Outposts as an on-premises managed Cassandra-compatible service. (Learn more about the differences between the DynamoDB API and the Cassandra CQL interface here.)

Scylla Cloud has been tested and certified by Amazon as “AWS Outposts ready.” As a member of the AWS Partner Network, ScyllaDB has the experts you can rely on to manage an enterprise-grade NoSQL database on-premises for your critical business needs.

You can provision standard EC2 instance types such as the storage-intensive I3en series, which can store up to 60 terabytes of data. Administrators can also divide larger physical servers into smaller virtual servers for separate workloads and use cases.

Afterwards, you can deploy Scylla Cloud on your AWS account. Once you have, you’re ready to run!

If you need to learn more in detail about how the Scylla DynamoDB-compatible API works, check out our lesson on using the DynamoDB API in Scylla from Scylla University. It’s completely free.

As a last point, you can also use Scylla Enterprise or Scylla Open Source on AWS Outposts if you want to manage your own database.

Next Steps

Contact us to learn more about using Scylla in AWS Outposts, or sign up for our Slack channel to discuss running on Outposts with our engineers and your big data industry peers.


The post AWS Outposts: Run Fully Managed NoSQL Workloads On-Premises Using Scylla appeared first on ScyllaDB.

Better Cassandra Indexes for a Better Data Model: Introducing Storage-Attached Indexing

Improving Apache Cassandra’s Front Door and Backpressure

As part of CASSANDRA-15013, we have improved Cassandra’s ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially going out of memory. In order to better explain the change we have made, let us first understand, at a high level, how an incoming request was processed by Cassandra before the fix, followed by what we changed and the new configuration knobs available.

How inbound requests were handled before

Let us take the scenario of a client application sending requests to a C* cluster. For the purpose of this blog, let us focus on one of the C* coordinator nodes.


Below is a microscopic view of the client-server interaction at the C* coordinator node. Each client connection to a Cassandra node happens over a Netty channel, and for efficiency purposes, each Netty eventloop thread is responsible for more than one Netty channel.


The eventloop threads read requests coming off of netty channels and enqueue them into a bounded inbound queue in the Cassandra node.


A thread pool dequeues requests from the inbound queue, processes them asynchronously and enqueues the response into an outbound queue. There exist multiple outbound queues, one for each eventloop thread to avoid races.




The same eventloop threads that are responsible for enqueuing incoming requests into the inbound queue are also responsible for dequeuing responses off the outbound queue and shipping them back to the client.



Issue with this workflow

Let us take a scenario where there is a spike in operations from the client. The eventloop threads are now enqueuing requests at a much higher rate than the rate at which the requests are being processed by the native transport thread pool. Eventually, the inbound queue reaches its limit and cannot store any more requests.


Consequently, the eventloop threads get into a blocked state as they try to enqueue more requests into an already full inbound queue. They wait until they can successfully enqueue the request in hand, into the queue.


As noted earlier, these blocked eventloop threads are also supposed to dequeue responses from the outbound queue. Given they are in a blocked state, the outbound queue (which is unbounded) grows endlessly with all the responses, eventually resulting in C* going out of memory. This is a vicious cycle: since the eventloop threads are blocked, there is no one to ship responses back to the client; eventually client-side timeouts trigger, and clients may send more requests due to retries. This is an unfortunate situation to be in, since Cassandra is doing all the work of processing these requests as fast as it can, but there is no one to ship the produced responses back to the client.


So far, we have built a fair understanding of how the front door of C* works with regard to handling client requests, and how blocked eventloop threads can affect Cassandra.

What we changed


The essential root cause of the issue is that eventloop threads are getting blocked. Let us not block them by making the bounded inbound queue unbounded. If we are not careful here though, we could have an out of memory situation, this time because of the unbounded inbound queue. So we defined an overloaded state for the node based on the memory usage of the inbound queue.

We introduced two levels of thresholds: one at the node level, and the other, more granular, at the client IP level. The client IP threshold helps isolate rogue client IPs without affecting other well-behaved clients.
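The idea can be sketched in a few lines of Python. This is a toy model only: the class, names, and limits are illustrative, not Cassandra's actual internals.

```python
class InboundByteTracker:
    """Toy model of the two-level overload thresholds described above.
    Names and structure are illustrative, not Cassandra's internals."""

    def __init__(self, node_limit_bytes, per_client_limit_bytes):
        self.node_limit = node_limit_bytes
        self.client_limit = per_client_limit_bytes
        self.node_bytes = 0
        self.per_client = {}

    def try_reserve(self, client_ip, request_bytes):
        """Return True if the request fits under both thresholds;
        False means the node is 'overloaded' for this request."""
        client_bytes = self.per_client.get(client_ip, 0)
        if (self.node_bytes + request_bytes > self.node_limit
                or client_bytes + request_bytes > self.client_limit):
            return False
        self.node_bytes += request_bytes
        self.per_client[client_ip] = client_bytes + request_bytes
        return True
```

Note how a single rogue client can hit its own per-IP limit while other clients continue to be served, until the node-level limit itself is reached.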

These thresholds can be set in the cassandra.yaml file.


These thresholds can be further changed at runtime (CASSANDRA-15519).

Configurable server response to the client as part of backpressure

If C* happens to be in overloaded state (as defined by the thresholds mentioned above), C* can react in one of the following ways:

  • Apply backpressure by setting “Autoread” to false on the netty channel in question (default behavior).
  • Respond back to the client with an OverloadedException (if the client sets the “THROW_ON_OVERLOAD” connection startup option to “true”).

Let us look at the client request-response workflow again, in both these cases.

THROW_ON_OVERLOAD = false (default)

If the inbound queue is full (i.e. the thresholds are met):


C* sets autoread to false on the netty channel, which means it will stop reading bytes off of the netty channel.


Consequently, the kernel socket inbound buffer becomes full since no bytes are being read off of it by netty eventloop.


Once the Kernel Socket Inbound Buffer is full on the server side, things start getting piled up in the Kernel Socket Outbound Buffer on the client side, and once this buffer gets full, client will start experiencing backpressure.



THROW_ON_OVERLOAD = true

If the inbound queue is full (i.e. the thresholds are met), eventloop threads do not enqueue the request into the inbound queue. Instead, the eventloop thread creates an OverloadedException response message and enqueues it into the flusher queue, from which it is shipped back to the client.


This way, Cassandra is able to serve very large throughput, while protecting itself from getting into memory starvation issues. This patch has been vetted through thorough performance benchmarking. Detailed performance analysis can be found here.

Apache Cassandra vs DynamoDB

In this post, we’ll look at some of the key differences between Apache Cassandra (hereafter just Cassandra) and DynamoDB.

Both are distributed databases with similar architectures, and both offer incredible scalability, reliability, and resilience. However, there are also differences, and understanding these differences and their cost implications can help you determine the right solution for your application.

Apache Cassandra is an open source database available at no cost from the Apache Foundation. Installing and configuring Cassandra can be challenging and there is more than one pitfall along the way. However, Cassandra can be installed on any cloud service or at a physical location you choose.

The typical Cassandra installation is a cluster which is a collection of nodes (a node is a single instance of Cassandra installed on a computer or in a Docker container). Nodes can then be grouped in racks and data centers which can be in different locations (cloud zones and regions or physical collocations). You must scale Cassandra as your demand grows and are responsible for the ongoing management tasks such as backups, replacing bad nodes, or adding new nodes to meet demand.

Amazon DynamoDB is a fully managed database as a service. All implementation details are hidden and, from the user’s viewpoint, DynamoDB is serverless. DynamoDB automatically scales throughput capacity to meet workload demands, partitions and repartitions your data as your table size grows, and distributes data across multiple availability zones. However, the service is available only through Amazon Web Services (AWS).

Replica Configuration and Placement

NoSQL data stores like Cassandra and DynamoDB use multiple replicas (copies of data) to ensure high availability and durability. The number of replicas and their placement determines the availability of your data.

With Cassandra, the number of replicas per cluster—the replication factor—and their placement is configurable. A cluster can be subdivided into two or more data centers, which can be located in different cloud regions or physical colocations. The nodes in a data center can be assigned to different racks, which can be mapped to different availability zones in the cloud or to different physical racks.
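As an illustration, the replication map passed to a CREATE KEYSPACE statement can be built per data center; the datacenter names here are hypothetical examples:

```python
def replication_options(dc_replicas):
    """Build the CQL replication map for a multi-datacenter keyspace.
    dc_replicas maps datacenter name -> replica count, e.g.
    {"us_east": 3, "eu_west": 3} (hypothetical names)."""
    pairs = ", ".join(f"'{dc}': {n}" for dc, n in dc_replicas.items())
    return "{'class': 'NetworkTopologyStrategy', " + pairs + "}"

# The result is used as: CREATE KEYSPACE music WITH replication = <map>;
print(replication_options({"us_east": 3, "eu_west": 3}))
```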

In contrast, with DynamoDB, Amazon makes these decisions for you. By default, data is located in a single region and is replicated to three (3) availability zones in that region. Replication to different AWS regions is available as an option, and DynamoDB Streams must be enabled for multi-region replication.

Data Model

The top level data structure in Cassandra is the keyspace which is analogous to a relational database. The keyspace is the container for the tables and it is where you configure the replica count and placement. Keyspaces contain tables (formerly called column families) composed of rows and columns. A table schema must be defined at the time of table creation.

The top level structure for DynamoDB is the table which has the same functionality as the Cassandra table. Rows are items, and cells are attributes. In DynamoDB, it’s possible to define a schema for each item, rather than for the whole table.

Both tables store data in sparse rows—for a given row, they store only the columns present in that row. Every table must have a primary key that uniquely identifies its rows or items; the primary key has two components: 

  • A partition key that determines the placement of the data by subsetting the table rows into partitions. This key is required.
  • A key that sorts the rows within a partition. In Cassandra, this is called the clustering key while DynamoDB calls it the sort key. This key is optional.

Taken together, the components of the primary key ensure that each row in a table is unique. 
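The shared layout can be sketched as a toy model: rows grouped by partition key, kept ordered by the clustering/sort key inside each partition (illustrative only, not either system's storage engine):

```python
from bisect import insort

class Table:
    """Toy model: rows grouped by partition key, sorted within each
    partition by the clustering/sort key."""

    def __init__(self):
        self._partitions = {}

    def insert(self, partition_key, clustering_key, row):
        # insort keeps each partition's rows ordered by clustering key
        insort(self._partitions.setdefault(partition_key, []),
               (clustering_key, row))

    def partition(self, partition_key):
        return self._partitions.get(partition_key, [])
```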


Other key differences:

  • DynamoDB limits the number of tables in an AWS region to 256. If you need more tables, you must contact AWS support. There are no hard limits in Cassandra; the practical limit is around 500 tables.
  • DynamoDB is schemaless. Only the primary key attributes need to be defined at table creation. 
  • DynamoDB charges for read and write throughput and requires you to manage capacity for each table. Read and write throughput and associated costs must be considered when designing tables and applications. 
  • The maximum size of an item in DynamoDB is 400KB. With Cassandra, the hard limit is 2GB; the practical limit is a few megabytes.
  • In DynamoDB, the primary key can have only one attribute as the partition key and one attribute as the sort key. Cassandra allows composite partition keys and multiple clustering columns.
  • Cassandra supports counter, time, timestamp, uuid, and timeuuid data types not found in DynamoDB.

Allocating Table Capacity

Both Cassandra and DynamoDB require capacity planning before setting up a cluster. However, the approaches are different. 

To create a performant Cassandra cluster, you must first make reasonably accurate estimates of your future workloads. Capacity is allocated by creating a good data model, choosing the right hardware, and properly sizing the cluster. Increasing workloads are met by adding nodes.

With DynamoDB, capacity planning is determined by the read/write capacity mode you choose. On-demand capacity requires no capacity planning other than setting an upper limit on each table. You pay only for the read and write requests on the table; capacity is measured in read request units and write request units. On-demand mode is best when you have an unknown workload, unpredictable application traffic, or you prefer the ease of paying for only what you use.

With provisioned capacity, you must specify read and write throughput limits for each table at the time of creation. If you exceed these limits for a table or tables, DynamoDB will throttle queries until usage is below the defined capacity. Auto scaling will adjust your table’s provisioned capacity automatically in response to traffic changes, although there is a lag between the time throttling starts and increased capacity is applied.

The throughput limits are provisioned in units called Read Capacity Units (RCU) and Write Capacity Units (WCU); queries are throttled whenever these limits are exceeded. One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. Transactional read requests require two read capacity units to perform one read per second for items up to 4 KB. If you need to read an item that is larger than 4 KB, DynamoDB must consume additional read capacity units. One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units. Transactional write requests require 2 write capacity units to perform one write per second for items up to 1 KB. For more information, see Managing Settings on DynamoDB Provisioned Capacity Tables.
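Those sizing rules reduce to simple arithmetic. Here is a sketch based on the rules above (round item size up to the 4 KB boundary for reads and the 1 KB boundary for writes, halve for eventually consistent reads, double for transactional requests):

```python
import math

def read_units(item_size_bytes, strongly_consistent=False, transactional=False):
    """RCUs consumed per read, per the sizing rules above (4 KB boundary)."""
    units = math.ceil(item_size_bytes / 4096)
    if transactional:
        return units * 2
    return units if strongly_consistent else units / 2

def write_units(item_size_bytes, transactional=False):
    """WCUs consumed per write (1 KB boundary)."""
    units = math.ceil(item_size_bytes / 1024)
    return units * 2 if transactional else units
```

For example, a strongly consistent read of a 5,000-byte item rounds up to two 4 KB units and so consumes 2 RCUs.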

Provisioned mode is a good option if any of the following are true: 

  • You have predictable application traffic.
  • Application traffic is consistent or ramps gradually.
  • You can forecast capacity requirements to control costs.


Partitioning

Both Cassandra and DynamoDB group and distribute data based on the hashed value of the partition key. Both call these groupings partitions, but they have very different definitions.

In DynamoDB the partition is a storage unit with a maximum size of 10 GB. When a partition fills, DynamoDB creates a new partition, and the user has no control over the process. A partition can contain items with different partition key values. When a sort key is used, all the items with the same partition key value are stored physically close together, ordered by sort key value.

DynamoDB partitions have capacity limits of 3,000 RCU or 1,000 WCU even for on-demand tables. Furthermore, these limits cannot be increased. If you exceed the partition limits, your queries will be throttled even if you have not exceeded the capacity of the table. See Throttling and Hot Keys (below) for more information.

A Cassandra partition is a set of rows that share the same hashed partition key value.  Rows with the same partition key are stored on the same node. Rows within the partition are sorted by the clustering columns. If no clustering column was specified, the partition holds a single row. While it would not be desirable, it would be possible for an application to drive tens of thousands of reads/writes to a single partition. 

See Cassandra Data Partitioning.
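The routing idea the two systems share can be sketched with a toy partitioner. This is illustrative only: real partitioners map the hash token onto token ranges or vnodes rather than taking a simple modulo.

```python
import hashlib

def partition_index(partition_key: str, num_partitions: int) -> int:
    """Toy partitioner: hash the partition key deterministically and
    map the resulting token to one of num_partitions partitions."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % num_partitions
```

The key property is determinism: every coordinator that hashes the same partition key routes to the same partition, with no lookup table required.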

Query Language 

Cassandra provides a SQL-like language called the Cassandra Query Language (CQL) to access data, while DynamoDB uses a JSON-based syntax. The following shows the syntax in each for the query “return all information from the Music table for the song title ‘Lullaby of Broadway’ and the artist ‘Tommy Dorsey’”:

CQL:

    SELECT *
    FROM Music
    WHERE Artist = 'Tommy Dorsey' AND SongTitle = 'Lullaby of Broadway';

DynamoDB:

    get-item {
        TableName: "Music",
        Key: {
            "Artist": "Tommy Dorsey",
            "SongTitle": "Lullaby of Broadway"
        }
    }


Secondary Indexes

By default, Cassandra and DynamoDB queries can use only the primary key columns in the search condition which must include all partition key columns. Non-key columns can be used in a search by creating an index on that column.

Cassandra supports creating an index on most columns, including a clustering column of a compound primary key or the partition key itself. Creating an index on a collection or on the key of a collection map is also supported. However, when used incorrectly a secondary index can hurt performance. A general rule of thumb is to index a column with low cardinality (few distinct values) and to use the index only together with the partition key in the search clause. Because the index table is stored on each node in a cluster, a query using a secondary index can degrade performance if multiple nodes are accessed. 

DynamoDB has local secondary indexes. A local secondary index uses the same partition key as the base table but a different sort key, and is scoped to the base table partition with the same partition key value. Local secondary indexes must be created at the same time the table is created, and a maximum of 5 local secondary indexes may be created per table. 

Materialized Views versus Global Secondary Indexes 

In Cassandra, a Materialized View (MV) is a table built from the results of a query on another table but with a new primary key and new properties. Queries are optimized by the primary key definition. The purpose of a materialized view is to support multiple query patterns against a single table’s data; it is an alternative to the standard practice of creating a new table with the same data when a different query is needed. Data in the materialized view is updated automatically by changes to the source table. However, materialized views are an experimental feature and should not be used in production.

A similar object in DynamoDB is the Global Secondary Index (GSI), which creates an eventually consistent replica of a table. GSIs can be created at table creation time or added later, and each table has a default limit of 20. A GSI must be provisioned for reads and writes, and there are storage costs as well.

Time To Live

Time To Live (TTL) is a feature that automatically removes items from your table after a period of time has elapsed.

  • Cassandra specifies TTL as the number of seconds from the time of creating or updating a row, after which the row expires.
  • In DynamoDB, TTL is a timestamp value representing the date and time at which the item expires.
  • DynamoDB applies TTL at the item level; Cassandra applies it at the column level.
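A small sketch of the relative-versus-absolute difference (helper names here are illustrative, not either system's API):

```python
import time

def cassandra_ttl(lifetime_seconds):
    """Cassandra: TTL is a relative duration in seconds,
    e.g. INSERT ... USING TTL 86400."""
    return lifetime_seconds

def dynamodb_ttl(lifetime_seconds, now=None):
    """DynamoDB: the TTL attribute holds an absolute epoch timestamp
    at which the item expires."""
    base = time.time() if now is None else now
    return int(base + lifetime_seconds)
```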


Consistency

Both Cassandra and DynamoDB are distributed data stores. In a distributed system there is a tradeoff between consistency (every read receives the most recent write or an error) and availability (every request receives a non-error response, but without the guarantee that it contains the most recent write). In such a system there are two possible levels of consistency:

  • Eventual consistency. This implies that all updates reach all replicas eventually. A read with eventual consistency may return stale data until all replicas are reconciled to a consistent state.
  • Strong consistency returns up-to-date data for all prior successful writes but at the cost of slower response time and decreased availability.

DynamoDB supports eventually consistent and strongly consistent reads on a per query basis. The default is eventual consistency. How it is done is hidden from the user.

 Strongly consistent reads in DynamoDB have the following issues: 

  • The data might not be available if there is a network delay or outage.
  • The operation may have higher latency.
  • Strongly consistent reads are not supported on global secondary indexes.
  • Strongly consistent reads use more throughput capacity than eventually consistent reads and are therefore more expensive. See Throttling and Hot Keys (below).

Cassandra offers tunable consistency for any given read or write operation that is configurable by adjusting consistency levels. The consistency level is defined as the minimum number of Cassandra nodes that must acknowledge a read or write operation before the operation can be considered successful. You are able to configure strong consistency for both reads and writes at a tradeoff of increasing latency.
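The well-known rule behind this tuning is that reads are guaranteed to see the latest write when the read and write replica sets must overlap, i.e. R + W > RF. A sketch:

```python
def quorum(replication_factor):
    """Number of replicas forming a QUORUM for a given replication factor."""
    return replication_factor // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Reads see the latest successful write when the read set and write
    set must intersect: R + W > RF (e.g. QUORUM reads and writes, RF=3)."""
    return read_replicas + write_replicas > replication_factor
```

With RF=3, QUORUM (2) reads plus QUORUM (2) writes give strong consistency, while ONE/ONE does not.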

Conflict Resolution

In a distributed system, there is the possibility that a query may return inconsistent data from the replicas. Both Cassandra and DynamoDB resolve any inconsistencies with a “last write wins” solution but with Cassandra, every time a piece of data is written to the cluster, a timestamp is attached. Then, when Cassandra has to deal with conflicting data, it simply chooses the data with the most recent timestamp.

For DynamoDB, “last write wins” applies only to global tables and strongly consistent reads.

Security Features

Cassandra and DynamoDB provide methods for user authentication, authorization, and data access permissions. Both use encryption for client and inter-node communication. DynamoDB also offers encryption at rest. Instaclustr offers encryption at rest for its Cassandra Managed Services.

Performance Issues

Consistency and Read Speed

Choosing strong consistency requires more nodes to respond to a request which increases latency.


Scans

Scans are expensive for both systems. Scans in Cassandra are slow because Cassandra has to scan all nodes in the cluster. Scans are faster in DynamoDB but are expensive because resource use is based on the amount of data read, not the amount returned to the client. If the scan exceeds your provisioned read capacity, DynamoDB will generate errors.

DynamoDB’s Issues

Auto Scaling

Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling Service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, application auto scaling decreases the throughput so that you don’t pay for unused provisioned capacity.

  • Auto scaling does not work well with varying and bursty workloads. The table scales up only based on consumed capacity, triggering its scaling alarms time after time until it reaches the desired level.
  • There can be a lag (5-15 minutes) between the time capacity is exceeded and autoscaling takes effect.
  • Capacity decreases are limited. The maximum is 27 per day (A day is defined according to the GMT time zone).
  • Tables scale based on the number of requests made, not the number of successful requests.
  • Autoscaling cannot exceed hard I/O limits for tables.

Throttling and Hot Keys

In DynamoDB, the provisioned capacity of a table is shared evenly across all the partitions in the table, with the consequence that each partition’s capacity is less than the table’s capacity. For example, in a table provisioned with 400 WCU and 100 RCU and split into 4 partitions, each partition would have a write capacity of 100 WCU and a read capacity of 25 RCU.
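That arithmetic can be expressed directly:

```python
def per_partition_limits(table_rcu, table_wcu, num_partitions):
    """Provisioned table capacity is spread evenly across partitions;
    each partition gets an equal share of RCU and WCU."""
    return table_rcu / num_partitions, table_wcu / num_partitions
```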

So if you had a query send most read/write requests to a single partition—a hot partition— and that throughput exceeded the shared capacity of that partition, your queries would be throttled even though you had unused capacity in the table. If your application creates a hot partition, your only recourse would be either to redesign the data model or to over allocate capacity to the table.

Adaptive capacity can provide up to 5 minutes of grace time by allocating unused capacity from other partitions to the “hot” one provided unused capacity is available and hard limits are not reached. The hard limits on a partition are 3,000 RCU or 1,000 WCU.

Cross-Region Replication

If you want a DynamoDB table to span regions, you’ll need to use global tables, which require careful planning and implementation. The tables in all regions must use on-demand allocation or have auto scaling enabled. The tables in each region must have the same settings: time to live, the number of global and local secondary indexes, and so on.

Global tables support only eventual consistency, which can lead to data inconsistency. In such a conflict, DynamoDB applies a last writer wins (LWW) policy: the data with the most recent timestamp is used.

Migrating an existing local table to a global table requires additional steps. First, DynamoDB Streams is enabled on the original local table to create a changelog. Then an AWS Lambda function is configured to copy the corresponding change from the original table to the new global table.

Cost Explosion 

As highlighted in Managed Cassandra Versus DynamoDB, DynamoDB’s pricing model can easily make it expensive for a fast growing company:

  • The pricing model is based on throughput, therefore costs increase as IO increases. 
  • A hot partition may require you to overprovision a table.
  • Small reads and writes are penalized because measured throughput is rounded up to the nearest 1 KB boundary for writes and 4 KB boundary for reads.
  • Writes are four times more expensive than reads.
  • Strongly consistent reads are twice the cost of eventually consistent reads. 
  • Workloads performing scans or queries can be costly because the read capacity units are calculated on the number of bytes read rather than the amount of data returned.
  • Read heavy workloads may require DynamoDB Accelerator (DAX) which is priced per node-hour consumed. A partial node-hour consumed is billed as a full hour.
  • Cross region replication incurs additional charges for the amount of data replicated by the Global Secondary Index and the data read from DynamoDB Streams.
  • DynamoDB does not distinguish between a customer-facing, production table versus tables used for development, testing, and staging.


Conclusion

DynamoDB’s advantages are: an easy start; absence of the database management burden; sufficient flexibility, availability, and auto scaling; built-in metrics for monitoring; and encryption of data at rest.

Cassandra’s main advantages are: fast writes and reads; constant availability; a SQL-like query language (CQL) instead of DynamoDB’s complex API; reliable cross-data-center replication; and linear scalability and high performance. 

However, the mere technical details of the two databases shouldn’t be the only aspect to analyze before making a choice.

Cassandra versus DynamoDB costs must be compared carefully. If you have an application that mostly writes or one that relies on strong read consistency, Cassandra may be a better choice.

You will also need to consider which technologies you need to supplement the database. If you want open source technologies like Elasticsearch, Apache Spark, or Apache Kafka, Cassandra is your choice. If you plan to make extensive use of AWS products, then it’s DynamoDB.

If you use a cloud provider other than AWS or run your own data centers, then you would need to choose Cassandra.

The post Apache Cassandra vs DynamoDB appeared first on Instaclustr.

Python scylla-driver: how we unleashed the Scylla monster’s performance

At Scylla Summit 2019 I had the chance to meet Israel Fruchter, and we dreamed of adding shard-awareness support to the Python cassandra-driver, which would become known as the scylla-driver.

A few months later, when Israel reached out to me on the Scylla-Users #pythonistas Slack channel with the first draft PR, I was so excited that I jumped in to help!

The efforts we put into the scylla-driver paid off: switching the production applications I work on over to it improved their performance by 15 to 25%!

Before we reached those numbers, and even before we released the scylla-driver to PyPI, the EuroPython 2020 CFP was open, and I submitted a talk proposal that was luckily accepted by the community.

So I had the chance to deep-dive into the architectural differences between Cassandra and Scylla, explain the rationale behind creating the scylla-driver, and give Python code details on how we implemented it and the challenges we faced doing so. Check my talk page for more details.

I also explained that I wrote an RFC on the scylla-dev mailing list which led the developers of Scylla to implement a new connection-to-shard algorithm that will allow clients connecting to a new listening port to select the actual shard they want a connection to.

This is an expected major optimization from the current mostly random way of connecting to all shards and I’m happy to say that it’s been implemented and is ready to be put to use by all the scylla drivers.
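For the curious, the heart of shard-awareness is a deterministic token-to-shard mapping. Here is a minimal Python sketch following Scylla’s documented sharding algorithm; the function and parameter names are illustrative, not taken verbatim from the scylla-driver source.

```python
def token_to_shard(token: int, nr_shards: int, ignore_msb_bits: int = 12) -> int:
    """Map a signed 64-bit Murmur3 token to a shard id in [0, nr_shards)."""
    mask = (1 << 64) - 1
    biased = (token + (1 << 63)) & mask          # shift the token into unsigned range
    biased = (biased << ignore_msb_bits) & mask  # drop the high bits Scylla ignores
    return (biased * nr_shards) >> 64            # scale the 64-bit value into [0, nr_shards)

# Every token maps deterministically to exactly one shard:
for t in (-2**63, -1, 0, 12345678901234567, 2**63 - 1):
    assert 0 <= token_to_shard(t, nr_shards=8) < 8
```

With a mapping like this, a shard-aware driver can send each query on a connection bound to the shard that owns the partition, instead of letting the server bounce it to the right core after the fact.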

I’ve recently been contacted by organizers of PyCon India and other Python-related conferences about giving a talk, so I’ve submitted one to PyCon India, where I hope I’ll be able to showcase even better numbers thanks to the new algorithm!

After my Europython talk we also had very interesting discussions with Pau Freixes about his work on a fully asynchronous Python driver that wraps the C++ driver to get the best possible performance. First results are absolutely amazing so if you’re interested in this, make sure to give it a try and contribute to the driver!

Stay tuned for more amazing query latencies 😉

Benchmarking Cassandra with Local Storage on Azure

Continuing our efforts in adding support for the latest generation of Azure Virtual Machines, Instaclustr is pleased to announce support for the D15v2 and L8v2 types with local storage support for Apache Cassandra clusters on the Instaclustr Managed Platform. So let’s begin by introducing the two forms of storage.

Local storage is non-persistent (i.e. volatile) storage that is internal to the physical Hyper-V host that a VM is running on at any given time. As soon as the VM is moved to another physical host as a result of a deallocate (stop) and start command, a hardware or VM crash, or Azure maintenance, the local storage is wiped clean and any data stored on it is lost.

Remote storage is persistent and comes in the form of managed (or unmanaged) disks. These disks can have different performance and redundancy characteristics. Remote storage is generally used for a VM’s OS disk or for data disks intended to permanently store important data. It lives in an Azure storage account, which VMs attach to over the network, so it remains attached (and its data preserved) as a VM moves from one physical host to another. Because the physical Hyper-V host running the VM is independent of the remote storage account, any Hyper-V host or VM can mount a remotely stored disk repeatedly.

Remote storage has the critical advantage of being persistent, while local storage has the advantage of being faster (because it’s local to the VM, on the same physical host) and included in the VM cost. When running an Azure VM you pay separately for its OS and data disks, whereas local storage is included in the price of the VM, often providing a much cheaper storage alternative to remote storage.

The Azure Node Types


D15_v2s are the latest addition to Azure’s Dv2 series, featuring more powerful CPU and optimal CPU-to-memory configuration making them suitable for most production workloads. The Dv2-series is about 35% faster than the D-series. D15_v2s are a memory optimized VM size offering a high memory-to-CPU ratio that is great for distributed database servers like Cassandra, medium to large caches, and in-memory data stores such as Redis.

Instaclustr customers can now leverage these benefits with the release of the D15_v2 VMs, which provide 20 virtual CPU cores and 140 GB RAM backed with 1000 GB of local storage.


L8_v2s are part of the Lsv2-series, featuring high throughput, low latency, and directly mapped local NVMe storage. L8s, like the rest of the series, were designed to cater for high-throughput and high-IOPS workloads, including big data applications, SQL and NoSQL databases, data warehousing, and large transactional databases. Examples include Cassandra, Elasticsearch, and Redis. In general, applications that can benefit from large in-memory databases are a good fit for these VMs.

L8s offer 8 virtual CPU cores and 64 GB of RAM backed by 1388 GB of local storage. L8s provide roughly equivalent compute capacity (cores and memory) to a D13v2 at about ¾ of the price, offering speed and efficiency at the cost of forgoing persistent storage.


Prior to the public release of these new instances on Instaclustr, we conducted Cassandra benchmarking for:

VM Type        CPU Cores   RAM      Storage Type   Disk Size (Type)
DS13v2         8           56 GB    remote         2046 GiB (SSD)
D15v2          20          140 GB   local          1000 GiB (SSD)
L8s w/local    8           64 GB    local          1388 GiB (SSD)
L8s w/remote   8           64 GB    remote         2046 GiB (SSD)

All tests were conducted using Cassandra 3.11.6.

The results are designed to outline the performance of utilising local storage and the benefits it has over remote storage. Testing is split into two groups, fixed and variable testing. Each group runs two different sets of payload:

  1. Small – smaller, more frequent payloads which are quick to execute but are requested in large numbers.
  2. Medium – larger payloads which take longer to execute and are potentially capable of slowing Cassandra’s ability to process requests.

The fixed testing procedure is:

  1. Insert data to fill disks to ~30% full.
  2. Wait for compactions to complete.
  3. Run a sequence of tests consisting of read, write, and mixed read/write operations at a fixed rate of operations.
  4. Measure the operational latency for that fixed rate of operations.

The variable testing procedure is:

  1. Insert data to fill disks to ~30% full.
  2. Wait for compactions to complete.
  3. Target a median latency of 10ms.
  4. Run a sequence of tests comprising read, write, and mixed read/write operations, broken down into multiple types.
  5. Run the tests and measure the results, including operations per second and median operation latency.
  6. Use quorum consistency for all operations.
  7. Incrementally increase the thread count and re-run the tests until the target median latency is hit while staying under the allowable pending compactions.
  8. Wait for compactions to complete after each test.
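The variable procedure above is essentially a search for the highest throughput that stays within the latency target. A minimal Python sketch of that loop follows; `run_stress` is a hypothetical stand-in for a real benchmark run (such as cassandra-stress), and the numbers in the toy model are made up for illustration.

```python
TARGET_MEDIAN_MS = 10  # the 10ms median latency target from the procedure above

def find_max_throughput(run_stress, start_threads=8, step=8, max_threads=512):
    """Raise the client thread count until median latency exceeds the target.

    Returns (threads, ops_per_sec, median_ms) for the last run that stayed
    within the latency target, or None if even the first run exceeded it.
    """
    best = None
    threads = start_threads
    while threads <= max_threads:
        ops_per_sec, median_ms = run_stress(threads)
        if median_ms > TARGET_MEDIAN_MS:
            break  # latency target exceeded; stop scaling up
        best = (threads, ops_per_sec, median_ms)
        threads += step
    return best

# Toy model: throughput saturates at 60k ops/s while latency grows with concurrency.
def fake_run(threads):
    return min(threads * 1000, 60000), 2 + threads // 8

print(find_max_throughput(fake_run))  # -> (64, 60000, 10)
```

The real harness also gates on pending compactions, but the structure is the same: scale concurrency, measure, stop once the latency budget is spent.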

As with any generic benchmark, performance may vary from these results for different data models or application workloads. However, we have found this to be a reliable test for comparison of relative performance that will match many practical use cases.


When comparing performance between instance types with local and remote storage it is important to take note of the latency of the operations. The latency of read operations indicates how long a request takes to retrieve information from the disk and return it back to the client.

In the fixed small and medium read tests, local storage offered better latency results in comparison to instances with remote storage. This is more noticeable in the medium read tests, where the tests had a larger payload and required more interaction with the locally available disks. Both L8s (with local storage) and D15_v2s offered a lower latency to match the operation rate given to the cluster.

When running variable-small-read tests under the latency target, the operation rate for D15v2s reached roughly 2.5 times that of DS13v2s with remote storage. Likewise, L8s with local storage outperformed L8s with remote storage, delivering close to three times the throughput at half the median latency.

variable-read-small            L8s w/local   L8s w/remote   DS13_v2   DS15_v2
Operation Rate                 19,974        7,198          23,985    61,489
Latency mean (avg, ms)         22.1          48.3           21.9      19.4
Latency median (ms)            20.1          40.3           19.9      17.7
Latency 95th percentile (ms)   43.6          123.5          43.3      39.2
Latency 99th percentile (ms)   66.9          148.9          74.3      58.9
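The relative claims in this section can be checked directly against the table; a quick sketch using the numbers as printed:

```python
# Operation rates and median latencies from the variable-read-small table above.
ops = {"L8s_local": 19974, "L8s_remote": 7198, "DS13_v2": 23985, "DS15_v2": 61489}
median_ms = {"L8s_local": 20.1, "L8s_remote": 40.3, "DS13_v2": 19.9, "DS15_v2": 17.7}

# D15v2 (local storage) vs DS13v2 (remote storage): throughput advantage.
d15_vs_ds13 = ops["DS15_v2"] / ops["DS13_v2"]

# L8s local vs remote: throughput gain and median latency ratio.
l8_ops_gain = ops["L8s_local"] / ops["L8s_remote"]
l8_lat_ratio = median_ms["L8s_local"] / median_ms["L8s_remote"]

print(f"D15v2 vs DS13v2 throughput: {d15_vs_ds13:.2f}x")          # ~2.56x
print(f"L8s local vs remote throughput: {l8_ops_gain:.2f}x")       # ~2.77x
print(f"L8s local vs remote median latency: {l8_lat_ratio:.2f}x")  # ~0.50x
```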

In the medium read tests, instances with local storage outperformed instances with remote storage. Both L8s (with local storage) and D15s had a far better ops/s and latency result, even with a significantly higher operation rate. This makes a very convincing argument for local storage over remote storage when seeking optimal performance.

variable-read-medium           L8s w/local   L8s w/remote   DS13_v2   DS15_v2
Operation Rate                 235           76             77        368
Latency mean (avg, ms)         4.2           13             12.9      2.7
Latency median (ms)            3.2           8.4            9.5       2.5
Latency 95th percentile (ms)   4.9           39.7           39.3      3.4
Latency 99th percentile (ms)   23.5          63.4           58.8      4.9

Looking at writes, on the other hand, D15s outperformed due to their large pool of CPU cores. While differences were less obvious in the small tests, they were more pronounced in the medium tests. Further investigation will be conducted to determine why this is the case.

Based on the graph for variable medium testing, D15s outperformed in all categories. D15v2s have both a larger number of cores, to outpace the competition under heavy write loads, and local storage, offering faster disk-intensive reads. This was additionally supported by strong performance in the mixed medium testing results.

L8s with local storage took second place, performing better than DS13v2s in the read and mixed tests, while DS13v2 nodes slightly edged out L8s in writes for larger payloads. The mixed results showed a substantial difference in performance between the two, with L8s taking the lead thanks to local storage providing faster disk-intensive reads.


Based on the results from this comparison, we find that local storage offers amazing performance results for disk intensive operations such as reads. D15v2 nodes, with their large number of cores to perform CPU intensive writes and local storage to help with disk intensive reads, offer top tier performance for any production environment.

Furthermore, L8s with local storage offer a cost-efficient solution at around ¾ of the price of D13v2s, with a better price-performance ratio, notably in read operations. This is especially beneficial for a production environment which prioritizes reads over writes.
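That price-performance claim can be sketched from the variable-read operation rates above, assuming the roughly ¾ relative price of an L8 versus a DS13v2 quoted earlier (relative prices here are illustrative; check the Instaclustr Console for actual pricing):

```python
L8_RELATIVE_PRICE = 0.75  # L8s cost about 3/4 the price of a DS13v2 (per the text above)

# Variable-read operation rates from the tables above.
ops_small = {"L8s_local": 19974, "DS13_v2": 23985}
ops_medium = {"L8s_local": 235, "DS13_v2": 77}

def price_perf_gain(ops):
    """Ops/s per relative dollar for L8s local, compared to DS13v2."""
    return (ops["L8s_local"] / L8_RELATIVE_PRICE) / ops["DS13_v2"]

print(f"small reads:  {price_perf_gain(ops_small):.2f}x ops per dollar")   # ~1.11x
print(f"medium reads: {price_perf_gain(ops_medium):.2f}x ops per dollar")  # ~4.07x
```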

In order to assist customers in upgrading their Cassandra clusters from currently used VM types to D15v2 or L8 nodes, Instaclustr’s technical operations team has built several tried and tested node replacement strategies to provide zero-downtime, non-disruptive migrations for our customers. Reach out to our support team if you are interested in these new instance types for your Cassandra cluster.

You can access pricing details through the Instaclustr Console when you log in, or contact our Support team.

The post Benchmarking Cassandra with Local Storage on Azure appeared first on Instaclustr.

Cassandra and Kubernetes: SIG Update and Survey

Five operators for Apache Cassandra have been created that have made it easier to run containerized Cassandra on Kubernetes. Recently the major contributors to these operators came together to discuss the creation of a community-based operator, with the intent of making it easy to run C* on K8s. One of the project’s organizational goals is that the end result will eventually become part of the Apache Software Foundation or the Apache Cassandra project.

The community created a special interest group (SIG) to set goals for what the operator should do at different levels to find a path for creating a standard community-based operator. The Operator Framework suggests five maturity levels for operator capabilities starting from basic installation to auto-pilot.

Operator Capability Maturity Levels


The five Cassandra Kubernetes operators all come from different backgrounds, so the first major goal is to develop a common understanding as to what an operator needs to do and at which level. This first step involves collaborating on a Custom Resource Definition (CRD) that will set the syntax / schema which will be used to create Cassandra clusters on Kubernetes. Once this is done, a software extension can be developed in a variety of languages including Go, Java, or using the Operator SDK in Helm or Ansible without making changes to Kubernetes.

We’re not starting from zero, as the creators of the five operators are actively participating in the SIG. Hopefully much of the agreed-upon CRD can be implemented with code fragments leveraged from the existing projects. The major operators out publicly today are those by Sky UK, Orange Telecom, Instaclustr, Elassandra, and DataStax (list sourced from the awesome-cassandra project):

  • CassKop - Cassandra Kubernetes Operator - This Kubernetes operator by Orange automates Cassandra operations such as deploying a new rack-aware cluster, adding/removing nodes, configuring the C* and JVM parameters, and upgrading JVM and C* versions. Written in Go. This was also one of the first operators released, and it is the only one that can support multiple Kubernetes clusters using Multi-CassKop.
  • Cassandra Operator - A Kubernetes operator by SkyUK that manages Cassandra clusters inside Kubernetes. Well designed and organized. This was among the first operators to be released.
  • Instaclustr Kubernetes Operator for Cassandra - The Cassandra operator by Instaclustr manages Cassandra clusters deployed to Kubernetes and automates tasks related to operating a Cassandra cluster.
  • Cass Operator - DataStax’s Kubernetes Operator supports Apache Cassandra as well as DSE containers on Kubernetes. Cassandra configuration is managed directly in the CRD, and Cassandra nodes are managed via a RESTful management API.
  • Elassandra Operator - The Elassandra Kubernetes Operator automates the deployment and management of Elassandra clusters deployed in multiple Kubernetes clusters.

If you’re interested in catching up on what the SIG has been talking about, you can watch the YouTube videos of the sessions and read up on the working documents:

As with any Kubernetes operator, the goal is to create a robot which makes the manual work of setting up complex configurations of containers in Kubernetes easier. An operator can also be seen as a translator between the logical concepts of the software and the concrete Kubernetes resources such as nodes, pods, and services. Combined with controllers, operators can abstract out operations such that human operators can focus on problems related to their industry or domain. As mentioned above, the different operator capability levels offer a roadmap to creating a robust operator for Cassandra users that is easy to use, set up, maintain, upgrade, and expand.

When a platform needs Cassandra, it’s probably exhausted the other potential datastores available because it needs high availability and fault tolerance, at high speeds, around the world. Kubernetes is a technology that can match well with Cassandra’s capabilities because it shares the features of being linearly scalable, vendor neutral, and cloud agnostic. There is a healthy debate about whether Cassandra belongs in Kubernetes — and whether databases belong in Kubernetes at all — because other orchestration tools are good enough, though the growing user base of Kubernetes in hobby and commercial realms suggests that we need to provide an operator that can keep up with the demand.

Most likely if someone is thinking about moving Cassandra workloads from public cloud, on-premises VMs, or even on-premises bare metal servers to either a public or private cloud hosted K8s, they’ll want to evaluate whether or not the existing architecture could run and be performant.

As part of the SIG, we’re also coming up with reference architectures on which to test the operator. Here are some of the common and most basic reference architectures that are likely candidates.

  • Single Workload in Single Region
    • 1 DC in 1 region, with 3 nodes (3 total)
    • DC expands to 6 (6 total)
    • DC contracts to 3 (3 total)

Single Workload / Datacenter in a Single Region

  • Multi-Workload in Single Region
    • 2 DCs, both in the same region, with 3 nodes in each DC (6 total)
    • Both DCs expand to 6 each (12 total)
    • Both DCs contract to 3 each (6 total)
    • Add a third DC in the same region with 3 nodes (9 nodes)
    • Remove third DC

Multiple Workloads / Datacenters in a Single Region

  • Single Workload in Multi-Regions
    • 2 DCs, 1 in each region, with 3 nodes in each DC (6 total)
    • Both DCs expand to 6 each (12 total)
    • Both DCs contract to 3 each (6 total)
    • Add a third DC in a 3rd region with 3 nodes (9 total)
    • Remove third DC

Although each organization is different, these scenarios, or combinations of them, account for roughly 80% of pure Apache Cassandra use cases. The SIG would love to know more about Cassandra users’ use cases for Kubernetes. Please take this short survey, which will remain open through September 17, 2020.

Join the biweekly meetings to stay informed.

Dial C* for Operator - Cass Operator Meet Reaper

Reaper is a critical tool for managing Apache Cassandra. Kubernetes-based deployments of Cassandra are no exception to this. Automation is the name of the game with Kubernetes operators. It therefore makes sense that Cass Operator should have tight integration with Reaper. Fortunately, Cass Operator v1.3.0 introduced support for Reaper. This post will take a look at what that means in practice.

Note: If you want to try the examples in this post, install Cass Operator using the instructions in the project’s README.


Before we dive into the details, let’s take a moment to talk about Kubernetes pods. If you think a pod refers to a container, you are mostly right. A pod actually consists of one or more containers that are deployed together as a single unit. The containers are always scheduled together on the same Kubernetes worker node.

Containers within a pod share network resources and can communicate with each other over localhost. This lends itself very nicely to the proxy pattern. You will find plenty of great examples of the proxy pattern implemented in service meshes.

Containers within a pod also share storage resources. The same volume can be mounted within multiple containers in a pod. This facilitates the sidecar pattern, which is used extensively for logging, among other things.

The Cassandra Pod

Now we are going to look at the pods that are ultimately deployed by Cass Operator. I will refer to them as Cassandra pods since their primary purpose is running Cassandra.

Consider the following CassandraDatacenter:

# example-cassdc.yaml

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: example
spec:
  clusterName: example
  serverType: cassandra
  serverVersion: 3.11.6
  managementApiAuth:
    insecure: {}
  size: 3
  allowMultipleNodesPerWorker: true
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi

Create the CassandraDatacenter as follows:

$ kubectl apply -f example-cassdc.yaml

Note: This example as well as the later one specify serverVersion: 3.11.6 for the Cassandra version. Cassandra 3.11.7 was recently released, but Cass Operator does not yet support it. See this ticket for details.

Note: Remember to create the server-storage StorageClass.

It might take a few minutes for the Cassandra cluster to fully initialize. The cluster is ready when the Ready condition in the CassandraDatacenter status reports True, e.g.,

$ kubectl -n cass-operator get cassdc example -o yaml
...
status:
  cassandraOperatorProgress: Ready
  conditions:
  - lastTransitionTime: "2020-08-10T15:17:59Z"
    status: "False"
    type: ScalingUp
  - lastTransitionTime: "2020-08-10T15:17:59Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2020-08-10T15:17:59Z"
    status: "True"
    type: Ready

Three (3) pods are created and deployed, one per Cassandra node.

$ kubectl -n cass-operator get pods
NAME                            READY   STATUS    RESTARTS   AGE
example-example-default-sts-0   2/2     Running   0          4h18m
example-example-default-sts-1   2/2     Running   0          4h18m
example-example-default-sts-2   2/2     Running   0          133m

Each row in the output has 2/2 in the Ready column. What exactly does that mean? It means that there are two application containers in the pod, and both are ready. Here is a diagram showing the containers deployed in a single Cassandra pod:

Cassandra Pod

This shows three containers, the first of which is labeled as an init container. Init containers have to run to successful completion before any of the main application containers are started.

We can use a JSONPath query with kubectl to verify the names of the application containers:

$ kubectl -n cass-operator get pod example-example-default-sts-0 -o jsonpath={.spec.containers[*].name} | tr -s '[[:space:]]' '\n'
cassandra
server-system-logger

The cassandra container runs the Management API for Apache Cassandra, which manages the lifecycle of the Cassandra instance.

server-system-logger is a logging sidecar container that exposes Cassandra’s system.log. We can conveniently access Cassandra’s system.log using the kubectl logs command as follows:

$ kubectl logs example-example-default-sts-0 -c server-system-logger

The Cassandra Pod with Reaper

Here is another CassandraDatacenter specifying that Reaper should be deployed:

# example-reaper-cassdc.yaml

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: example-reaper
spec:
  clusterName: example-reaper
  serverType: cassandra
  serverVersion: 3.11.6
  managementApiAuth:
    insecure: {}
  size: 3
  reaper:
    enabled: true
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi

The only difference from the first CassandraDatacenter is these two lines:

  reaper:
    enabled: true

This informs Cass Operator to deploy Reaper in sidecar mode. One of the main benefits of deploying Reaper in sidecar mode is security. Reaper only needs local JMX access to perform repairs. There is no need for remote JMX access or JMX authentication to be enabled.

Once again three pods are created and deployed, one per Cassandra node.

$ kubectl -n cass-operator get pods
NAME                                          READY   STATUS    RESTARTS   AGE
example-reaper-example-reaper-default-sts-0   3/3     Running   1          6m5s
example-reaper-example-reaper-default-sts-1   3/3     Running   1          6m5s
example-reaper-example-reaper-default-sts-2   3/3     Running   1          6m4s

Now, each pod reports 3/3 in the Ready column. Here is another diagram to illustrate which containers are deployed in a single Cassandra pod:

Cassandra Pod with Reaper sidecar

Now we have the reaper application container in addition to the cassandra and server-system-logger containers.

Reaper Schema Initialization

In sidecar mode, Reaper automatically uses the Cassandra cluster as its storage backend. Running Reaper with a Cassandra backend requires first creating the reaper_db keyspace before deploying Reaper. Cass Operator takes care of this for us with a Kubernetes Job. The following kubectl get jobs command lists the Job that gets deployed:

$ kubectl -n cass-operator get jobs
NAME                                COMPLETIONS   DURATION   AGE
example-reaper-reaper-init-schema   1/1           12s        45m

Cass Operator deploys a Job whose name is of the form <cassandradatacenter-name>-reaper-init-schema. The Job runs a small Python script that creates the reaper_db keyspace.

The output from kubectl get pods earlier showed one restart for each pod. Those restarts were for the reaper containers. This happened because the reaper_db keyspace had not yet been initialized.

We can see this in the log output:

$ kubectl -n cass-operator logs example-reaper-example-reaper-default-sts-1 -c reaper | grep ERROR -A 1
ERROR  [2020-08-10 20:28:19,965] [main] i.c.ReaperApplication - Storage is not ready yet, trying again to connect shortly...
com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'reaper_db' does not exist

The restarts are perfectly fine as there are no ordering guarantees with the start of application containers in a pod.

Accessing the Reaper UI

Reaper provides a rich UI that allows you to do several things including:

  • Monitor Cassandra clusters
  • Schedule repairs
  • Manage and monitor repairs

Cass Operator deploys a Service to expose the UI. Here are the Services that Cass Operator deploys.

$ kubectl -n cass-operator get svc
NAME                                             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)             AGE
cass-operator-metrics                            ClusterIP   <none>        8383/TCP,8686/TCP   8h
cassandradatacenter-webhook-service              ClusterIP   <none>        443/TCP             8h
example-reaper-example-reaper-all-pods-service   ClusterIP   None          <none>        <none>              14m
example-reaper-example-reaper-service            ClusterIP   None          <none>        9042/TCP,8080/TCP   14m
example-reaper-reaper-service                    ClusterIP     <none>        7080/TCP            10m
example-reaper-seed-service                      ClusterIP   None          <none>        <none>              14m

The Service we are interested in has a name of the form <clusterName>-reaper-service, which here is example-reaper-reaper-service. It exposes port 7080.

One of the easiest ways to access the UI is with port forwarding.

$ kubectl -n cass-operator port-forward svc/example-reaper-reaper-service 7080:7080
Forwarding from -> 7080
Forwarding from [::1]:7080 -> 7080
Handling connection for 7080

Here is a screenshot of the UI:

Reaper UI

Our example-reaper cluster shows up in the cluster list because it gets automatically registered when Reaper runs in sidecar mode.

Accessing the Reaper REST API

Reaper also provides a REST API in addition to the UI for managing clusters and repair schedules. It listens for requests on the ui port which means it is accessible as well through example-reaper-reaper-service. Here is an example of listing registered clusters via curl:

$ curl -H "Content-Type: application/json" http://localhost:7080/cluster

Wrap Up

Reaper is an essential tool for managing Cassandra. Future releases of Cass Operator may make some settings such as resource requirements (i.e., CPU, memory) and authentication/authorization configurable. It might also support deploying Reaper with a different topology. For example, instead of using sidecar mode, Cass Operator might provide the option to deploy a single Reaper instance. This integration is a big improvement in making it easier to run and manage Cassandra in Kubernetes.

Apache Cassandra Benchmarking: 4.0 Brings the Heat with New Garbage Collectors ZGC and Shenandoah

[Webcast] 10 Clever Astra Demos: Build Cloud-Native Apps with the Astra Cassandra DBaaS