Project Circe February Update

Project Circe is our 2021 initiative to improve Scylla by adding greater capabilities for consistency, performance, scalability, stability, manageability and ease of use. For this installment of our monthly updates on Project Circe, we’ll take a deep dive into the Raft consensus protocol and the part it will play in Scylla, as well as provide a roundup of activities across our software development efforts.

Raft in Scylla

At Scylla Summit 2021, ScyllaDB engineering team lead Konstantin “Kostja” Osipov presented on the purpose and implementation of the Raft consensus protocol in Scylla. Best known for his work on Lightweight Transactions (LWT) in Scylla using a more efficient implementation of the Paxos protocol, Kostja began with a roundup of those activities, including our recently conducted Jepsen testing to see how our Lightweight Transactions behaved under various stresses and partitioned state conditions.

Watch the full presentation online, or read on if you prefer.

WATCH THE TALK: RAFT IN SCYLLA

What is Raft?

Kostja noted that the purpose of implementing Raft in Scylla extends far beyond enablement of transactions. He also differentiated it from Paxos, which is leaderless, by noting that Raft is a leader-based log replication protocol. “A very crude explanation of what Raft does is it elects a leader once, and then the leader is responsible for making all the decisions about the state of the database. This helps avoid extra communication between replicas during individual reads and writes. Each node maintains the state of who the current leader is, and forwards requests to the leader. Scylla clients are unaffected, except now the leader does some more work than the replicas, so the load distribution may be uneven. This means Scylla will need to support multiple Raft groups — Raft instances — to evenly distribute the load between nodes.”

“Raft is built around the notion of a replicated log,” Kostja explained. “When the leader receives a request, it first stores an entry for it in its durable local log. Then it makes sure this local log is replicated to all of the followers, the replicas. Once the majority of replicas confirm they have persisted the log, the leader applies the entry and instructs the replicas to do the same. In the event of leader failure, a replica with the most up-to-date log becomes the leader.”

“Raft defines not only how the group makes a decision, but also the protocol for adding new members and removing members from the group. This lays a solid foundation for Scylla topology changes: they translate naturally to Raft configuration changes, assuming there is a Raft group for all of the nodes in the cluster, and thus no longer need a proprietary protocol.”

“Schema changes translate to simply storing a command in the global Raft log and then applying the change on each node which has a copy of the log.”

“Because of the additional state stored at each peer, it’s not as straightforward to do regular reads and writes using Raft. Maintaining a separate leader for each partition would be too much overhead, considering individual partition updates may be rare.”

“This is why Scylla is working on a new partitioner alongside Raft, which would reduce the total number of tokens or partitions, while still keeping the number high enough to guarantee even distribution of work. It will also allow balancing the data between partitions more flexibly. We will call such partitions tablets. Scylla will run an independent Raft instance for each tablet.”
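To make the replicated-log flow described above concrete, here is a minimal, illustrative sketch of a leader committing an entry once a majority of the cluster has persisted it. This is a toy model for explanation only, not Scylla’s Raft implementation; the class and method names are hypothetical.

class ToyRaftLeader:
    """Toy model of Raft log replication: commit once a majority acknowledges."""

    def __init__(self, followers):
        self.followers = followers        # handles to the other replicas in the group
        self.log = []                     # durable, ordered log of commands
        self.commit_index = -1            # highest log index known to be committed

    def propose(self, command):
        # 1. The leader appends the command to its own durable log.
        self.log.append(command)
        index = len(self.log) - 1

        # 2. It replicates the entry and counts acknowledgements (itself included).
        acks = 1 + sum(1 for f in self.followers if f.append_entry(index, command))

        # 3. Once a majority of the cluster has persisted the entry, it is committed
        #    and applied to the state machine on the leader and the replicas.
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:
            self.commit_index = index
            for node in [self] + self.followers:
                node.apply(command)
        return self.commit_index >= index

    def apply(self, command):
        print("applying:", command)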

Topology Changes

“Presently, topology changes in Scylla are eventually consistent. Let’s use node addition as an example. A node wishing to join the cluster advertises itself to the rest of the members through gossip.

“For those of you not familiar with gossip, it’s an epidemic protocol which is good for distributing some infrequently changing information at low cost. It’s commonly used for failure detection. So when the cluster is healthy it imposes very low overhead on the network. And the state of a failing node is disseminated across the cluster reasonably quickly. Several seconds would be a typical interval.”

“Since gossip is not too fast, the joining node waits by default for 30 seconds to let the news spread. Nodes begin forwarding reads and writes to the joining node as soon as they become aware of it. Once the joining node has waited out the interval and starts receiving updates, it can begin data rebalancing.”

“Node removal or decommission works similarly, except the node gossiping about the node being removed is not the node being removed, because that node is already basically dead. So this is very similar to real life. Not all of the gossip spread about us is spread by us. Most of it is not spread by us.”

“This poses some challenges. If some of the nodes in the cluster are not around while a node is joining it will not be aware of the new topology. So as soon as it is back, and before gossip is actually reaching it with news about the new joined node, it will assume the old topology [is still valid] and serve reads and writes using the old topology. This is not terrible, but repair will be needed to bring the writes that were served using this old topology back to the new node.”

“Another issue is that information dissemination through gossip is fairly slow. When we reason about how we could add multiple nodes to the cluster concurrently, we think about splitting node addition, or any topology change operation, into multiple steps, and communicating between the nodes during each step independently. Relying upon gossip in that case would be impractical: it would require a thirty-second interval for each step.”

Raft handles these challenges by including topology changes (called “configuration changes”) in its protocol core. This part of the Raft protocol is also widely adopted and has undergone extensive scrutiny, so it should naturally be preferred to Scylla’s proprietary gossip-based solution inherited from Cassandra.

“It approaches configuration [topology] changes very similarly to regular reads and writes,” Kostja noted. “The first thing Raft does about the configuration change is [the leader] stores information about it in the log. The first entry stored in the log is special. It informs all of the followers and replicas. They now need to begin to work in this new joint configuration. They need to take this new configuration into account while still ensuring they also reach all of the nodes from the old configuration in case they become deleted.”

“Then, when the leader knows that the majority of replicas in the cluster have received this joint configuration, it stores another auxiliary entry in the log that informs the nodes to switch entirely to the new configuration. This approach guarantees that the majority of the cluster moves through the configuration change together, even while some nodes still know only the old configuration and others already use the new one.”

“The worst case that can happen is that some nodes in this majority will use the joint configuration and other nodes will use the old configuration, or some nodes use joint configuration and some nodes use new configuration. But since joint configuration includes both old and new, these configurations are compatible so this preserves linearizability of configuration changes.”

“In Scylla we plan to use Raft for configuration changes as the first step of any topology change. So a joining node will first be added to the global Raft group, and a leaving node will be removed from it last. Then we can use the global Raft log to consistently store the information about the actual range movement, token additions, token removals and so on.”
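The joint-consensus rule described above boils down to a simple predicate: while the cluster is in the joint configuration, a decision commits only if it has a majority in the old member set and a majority in the new one. Here is a small sketch for illustration only, not Scylla code:

def has_majority(voters, acks):
    # True if the acknowledging nodes form a majority of the given voter set.
    return len(set(acks) & set(voters)) > len(voters) // 2

def is_committed(old_cfg, new_cfg, acks, joint):
    if joint:
        # Joint configuration C(old,new): a majority is required in BOTH memberships,
        # so neither the old group nor the new group can decide on its own.
        return has_majority(old_cfg, acks) and has_majority(new_cfg, acks)
    return has_majority(new_cfg, acks)

# Example: adding node D to a cluster of A, B, C
old_cfg, new_cfg = ["A", "B", "C"], ["A", "B", "C", "D"]
print(is_committed(old_cfg, new_cfg, acks=["A", "B"], joint=True))       # False: not a majority of the new set
print(is_committed(old_cfg, new_cfg, acks=["A", "B", "D"], joint=True))  # True: majority of both sets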

Schema Changes

“Schema changes are operations such as creating and dropping keyspaces, tables, user-defined types or functions. If they use Raft, they can also benefit from linearizability.”

“Currently schema changes in Scylla are eventually consistent. Each Scylla node has a full copy of the schema. Requests to change the schema are validated against the local copy and then applied locally. A new data item might be added immediately following the schema change, that is, before any other nodes even know about the new schema.”

“There is no coordination between changes at different nodes, and any node is free to propose a change. The changes eventually propagate through the cluster, and the last-timestamp-wins rule is used to resolve conflicts if two changes against the same object happen concurrently.”

“Data manipulation is aware of this possible schema inconsistency. A specific request carries a schema version with it. Scylla is able to execute requests with divergent history. So it will fetch the necessary schema version just to execute the request. This guarantees the schema changes are fully available even in the presence of severe network failures.”

“It has some downsides as well. It is possible to submit changes that conflict: define a table that uses a UDT, then drop that UDT. New features — such as triggers or user-defined functions (UDFs) — aggravate the consistency problem.”

“Schema changes using Raft also benefit from linearizability. After switching them to Raft any node would still be able to propose a change. The change would be forwarded to the leader. The leader validates it against the latest version of the schema. Then it will store the entry for the schema change in the Raft log, make sure it’s disseminated among the majority of the nodes, and only then will it apply the change on all of the nodes in linearizable order.”

“With this approach, changes will form a linear history and divergent changes will be impossible. It should open the way for more complex but safe dependencies between schema objects such as triggers, functional indexes, and so on.”

“Replicas that were down while the cluster has been making schema changes will first catch up with the entire history of the schema changes and only then will start serving reads and writes. There is also a downside: it will not be possible to do a schema change on an isolated node, or a node which is isolated from the majority.”

“Still possible, even with this approach, is that the node gets a request — say an eventually consistent write — that uses an old version of the schema, or a version of the schema the node is still not aware of. In this case the node which doesn’t have the schema will still have to pull it like we do today.”
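As a concrete illustration of the last-timestamp-wins rule used by today’s eventually consistent schema changes, and of the kind of conflict it cannot catch, here is a deliberately simplified sketch; it is not Scylla’s actual schema-merge code, which works at a finer granularity:

def merge_schema(local, remote):
    # Last-write-wins merge of per-object schema entries: for each object name,
    # the entry with the higher timestamp wins. There is no cross-object validation.
    merged = dict(local)
    for name, (ts, definition) in remote.items():
        if name not in merged or ts > merged[name][0]:
            merged[name] = (ts, definition)
    return merged

# Two nodes apply conflicting changes concurrently: one creates a table using a UDT,
# the other drops that UDT. Each object is resolved independently, so the merged
# schema can end up referencing a type that no longer exists.
node_a = {"ks.address": (1001, "CREATE TYPE address (...)"),
          "ks.users":   (1002, "CREATE TABLE users (addr frozen<address>, ...)")}
node_b = {"ks.address": (1003, "DROP TYPE address")}
print(merge_schema(node_a, node_b))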

Tablets

“Finally, the ultimate feature enabled by Raft is fast and efficient, strongly consistent tables. A tablet is a database’s unit of data distribution and load balancing, a concept first introduced in Google Bigtable in 2006. Let’s see how they work.”

“Today’s Scylla partitioning strategy is not pluggable. Compare it with the replication strategy: you can change how many replicas you have, where these replicas are located, and which consistency level you use for each read and write. The Scylla partitioner is not like that. All you can do is define a partition key; the partition key is mapped to a token, and tokens map to a specific replica set or shard.”

“Thanks to hashing and the use of vNodes, the data is evenly distributed across the cluster. Most write and read workloads produce an even load on all [nodes] of the cluster, even workloads such as time series. So hotspots are unlikely.”

“Despite the excellent randomization provided by hashing, oversized partitions and very hot tokens are possible: you cannot select the partition key perfectly, so some keys will naturally have more data or will be more frequently accessed. Frequent range scans, even over small tables, require streaming from many nodes, even if each of them streams very little data.”

“With tablets we would like to introduce a partitioning strategy which is based on data size, not only on the number of partitions or on the partition hash. Tablets split the entire primary key range and make sure that every tablet contains a roughly equal amount of data. When a tablet becomes too big it is split. When it becomes too small it’s coalesced — two small adjacent tablets are merged into one. This is called dynamic load balancing.”

“Another good thing about tablets is that even in very large clusters there can’t be too many of them. For example, a hundred thousand tablets is 64 terabytes of data. This means we can have a reasonable number of Raft groups.”

“Every tablet will have its own Raft log. If a Raft log is used for reads and writes we cannot accept client-side timestamps, because there is a single linearizable order for all writes to the table. We provide serial consistency for all queries. Writes do not require a read before the write, as LWT does. There is no need to repair, because Raft automatically repairs itself. There still may be a need to repair the Raft log itself, but this is different from repairing the actual data.”

“Those of you who are familiar with the consistency of materialized views know that it’s very hard to make materialized views consistent in the presence of eventually consistent writes to the base table. This problem will also be solved with Raft and tablets.”
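Below is a toy sketch of the size-based split/merge policy described above. The thresholds and the data layout are made up purely to illustrate tablets growing, splitting when oversized, and coalescing when undersized; they are not Scylla parameters.

MAX_TABLET_BYTES = 10 * 1024**3   # hypothetical split threshold (10 GiB)
MIN_TABLET_BYTES = 1 * 1024**3    # hypothetical merge threshold (1 GiB)

def rebalance(tablets):
    # tablets: sorted list of (first_token, last_token, size_in_bytes) tuples.
    out = []
    for first, last, size in tablets:
        if size > MAX_TABLET_BYTES:
            # Split an oversized tablet at the midpoint of its token range.
            mid = (first + last) // 2
            out.append((first, mid, size // 2))
            out.append((mid, last, size - size // 2))
        elif out and out[-1][2] + size < MIN_TABLET_BYTES:
            # Coalesce two small adjacent tablets into one.
            prev_first, _, prev_size = out[-1]
            out[-1] = (prev_first, last, prev_size + size)
        else:
            out.append((first, last, size))
    return out

# Example: one oversized tablet followed by two tiny neighbours
print(rebalance([(0, 100, 12 * 1024**3), (100, 110, 200 * 1024**2), (110, 130, 300 * 1024**2)]))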

Project Circe Progress for February 2021

Recent Raft Developments

Since our Scylla Summit there have been Raft-related commits into our repository to bring these plans to fruition.

  • Raft Joint Consensus has been merged. This is the ability to change a Raft group from one set of nodes to another. This is needed to change cluster topology or to migrate data to different nodes.
  • Raft now integrates with the Scylla RPC subsystem; Raft itself is modular and requires integration with the various Scylla service providers.

DynamoDB-Compatible API Improvements

In line with the broad set of capabilities outlined in Project Circe, we’re also making improvements to our Amazon DynamoDB-compatible interface, called Alternator.

  • We now support nested attributes #5024
  • Alternator: support Cross-Origin Resource Sharing (CORS) #8025
  • Support limiting the number of concurrent requests in Alternator #7294

Scylla Operator for Kubernetes

Scylla Operator 1.1 is almost released (currently at the release candidate stage), with many bug fixes and three Helm Charts, which are especially useful for users interested in customizing their Scylla setups:

  • scylla – allows for customization and deployment of Scylla clusters.
  • scylla-manager – allows for customization and deployment of Scylla Manager.
  • scylla-operator – allows for customization and deployment of Scylla Operator.

To help you start using new Helm Charts, we added an example run to our documentation.


Scylla University: New Lessons for February 2021

In my previous blog post, I wrote about the top students for 2020, the Scylla Summit Training Day, getting course completion certificates, and other news. In this blog post I’ll talk about new lessons added to Scylla University since our June 2020 update.

New Features

  • New CDC lesson: Change Data Capture, or CDC, is a feature that allows you to query recent changes made to data in specified tables. CDC lets users build streaming data pipelines for real-time data processing and analysis, reacting immediately to modifications occurring in the database. Some of the topics covered in this lesson are:
    • An overview of Change Data Capture, what exactly is it, what are some common use cases, what does it do, and an overview of how it works
    • How can that data be consumed? Different options for consuming the data changes, including normal CQL, a layered approach, and integrators
    • How does CDC work under the hood? Covers an example of what happens in the DB on different operations to allow CDC
    • A summary of CDC: it’s easy to integrate and consume, it uses plain CQL tables, it’s robust, it’s replicated in the same way as regular data, it has a reasonable overhead, and it does not overflow if the consumer fails to act, since the data is TTL’ed. The summary also includes a comparison with other NoSQL CDC implementations in Cassandra, DynamoDB, and MongoDB.
  • Workload Prioritization: using Service Level CQL commands, database administrators working on Scylla Enterprise can set a different workload prioritization (level of service) for each workload without sacrificing latency or throughput. Each service level can also be attached to your organization’s various roles, ensuring that each role is granted the level of service it requires.
  • Materialized Views + Secondary Indexes: Includes nine topics, three quizzes, MV lab, and SI lab. In Scylla (and Apache Cassandra), data is divided into partitions, rows, and values, which can be identified by a partition key. Sometimes the application needs to find a value by the value of another column. Doing this efficiently without scanning all of the partitions requires indexing, the focus of this lesson. There are three indexing options available in Scylla: Materialized Views, Global Secondary Indexes, and Local Secondary Indexes. They are all covered in this lesson, along with comparing them, examples of when to use each, quizzes, and hands-on labs.
  • New LWT lesson: There are cases when it is necessary to modify data based on its current state: that is, to perform an update that is executed only if a row does not exist or contains a particular value. Lightweight Transactions (LWTs) provide this functionality by only allowing data changes to occur if the condition provided evaluates as true. The conditional statements provide linearizable semantics, thus allowing data to remain consistent. A basic rule of thumb is that any statement with an IF clause is a conditional statement. A batch that has at least one conditional statement is a conditional batch. Conditional statements and conditional batches are executed atomically, as an LWT. This lesson provides an overview of LWT, an example of how it’s used, and a comparison with Apache Cassandra.

Language Lessons

  • CPP Part 2: Using Prepared Statements with the CPP (C++) Driver to connect to a Scylla cluster and perform queries.
  • Scala lesson part one and part two: How to use the Phantom Scala driver to create a sample Scala application that executes a few basic CQL operations with a Scylla cluster.

Operations and Management

  • Security: This lesson covers security features and the way that Scylla handles security. By the end of this lesson, you’ll understand why security is essential in Scylla, the different security features, and how it works. Some of the topics covered in this lesson are:
    • Why is it important to secure your data? Business value is increasingly tied to data. Security properties such as Identity, Authentication, Confidentiality, Availability, Integrity, and Non-repudiation
    • How to manage identities with users? Identity, Authentication, Users and passwords, and Availability
    • What is Authentication, and how it limits access to the cluster to identified clients? Authentication is the process where login accounts and their passwords are verified, and the user is allowed access to the database.
    • Users and passwords are created with roles using a GRANT statement. This procedure enables Authentication on the Scylla servers. However, once complete, all clients (applications using Scylla/Apache Cassandra drivers) will stop working until they are updated to work with Authentication as well.
    • The concepts of roles and permissions, Confidentiality, Non-repudiation
    • What is Authorization? How are users granted permissions which entitle them to access or change data on specific keyspaces, tables, or an entire data center? Role-Based Access Control reduces lists of authorized users to a few roles assigned to multiple users. It also includes an example.
    • Encryption In Transit, which is: Client to Node, Node to Node, and an overview of Encryption At Rest, which includes data stored in Tables, System, and Providers.
    • Encryption at Rest, or how to encrypt user data as stored on disk? This is invisible to the client and available on Scylla Enterprise. It uses disk block encryption and has a minimal impact on performance.
    • Auditing enables us to know who did, looked at, or changed what, and when, by logging the activities a user performs on the Scylla cluster.
    • The importance of ensuring that Scylla runs in a trusted network environment: limiting access to IPs/ports by role, following the principle of least privilege, avoiding public IPs if possible, and using a VPC if possible. Security is an ongoing process: routinely upgrade to the latest Scylla and OS versions, routinely check for network exposure, routinely replace keys and passwords, use 2FA (Scylla Cloud), and apply the available security features.
  • Using Spark with Scylla: By using Spark together with Scylla, users can deploy analytics workloads on the information stored in the transactional system. This lesson gives an overview of Scylla, Spark, and how they can work together.
  • Configuration and Where to Run Scylla: covers Scylla configuration, setup, and best practices. By the end of this lesson, you’ll better understand the practical aspects of installing and configuring Scylla.
  • How to Write Better Apps: In this lesson, you’ll learn how to write better applications. This is an intermediate to advanced level lesson. By the end of this lesson, you’ll have a better understanding of application development, caveats, performance, and what you should or shouldn’t do.
  • Scylla Drivers: Three new lessons were added to the course. They cover an overview of the Scylla Token Ring architecture, Scylla-specific (shard-aware) drivers, why it is important to use them, and what paging and shard awareness are.
  • Alternator: The course is updated with new lessons and quizzes, covering Alternator, Scylla’s DynamoDB-compatible API, in action and how it works, implementation details, and a hands-on lab to help you get started.
  • Manager Repair Tombstones: This lesson deals with what repair is and why it is needed, what tombstones are and why they matter, and Scylla Manager. Scylla Manager is a centralized cluster administration and recurrent tasks automation tool; it can schedule tasks such as repairs and backups. Scylla Repair is a process that runs in the background and synchronizes the data between nodes so that eventually all the replicas hold the same data. Data stored on nodes can become inconsistent with other replicas over time, which is why repairs are a necessary part of database maintenance. Using Scylla repair makes the data on a node consistent with the other nodes in the cluster. The best practice is to have Scylla Manager schedule and run the repairs for you.
  • Admin Procedures and Tools: In this lesson, you’ll learn how to administer a Scylla cluster. It covers essential tools and procedures, best practices, common pitfalls, and tips for successfully running a cluster.
  • New Scylla Monitoring lesson: It’s extensive, including nine topics, one lab, and two quizzes. Scylla Monitoring is a full stack for monitoring a Scylla cluster and for alerting. The stack contains open source tools, including Prometheus and Grafana, along with custom Scylla dashboards and tooling.

Next Steps

That’s a lot of new content for you to check out. Scylla University now covers most Scylla-related topics for Developers, DBAs, and Architects.

Visit Scylla University and start learning. It’s free, online, and self-paced!

SIGN UP FOR SCYLLA UNIVERSITY!


Prometheus Backfilling: Recording Rules and Alerts 

For many Prometheus users who rely on recording rules and alerts, a known limitation is that both are only generated on the fly at runtime. This has two downsides. First, any new recording rule will not be applied to your historical data. Second, and even more troubling, you cannot test your rules and alerts against your historical data.

There is active work inside Prometheus to change this, but it’s not there yet. In the short term, to meet this requirement we created a simple utility to produce OpenMetrics data to fill in the gaps. I will cover the following topics in this blog post:

  • Generating OpenMetrics from Prometheus
  • Backfilling alerts and recording rules

Introduction

While Prometheus can load existing data, it does not backfill recording rules and alerts. Starting with Prometheus release 2.25 it is possible to backfill Prometheus using OpenMetrics files.

To better understand the problem, consider the following case: one of our customers wanted tighter and more granular alerts on their p99 latencies.

Before applying those new rules, they wanted to understand what impact those alerts would have. In other words, if those rules had been in place, would the alerts have fired often enough, or would they have fired too often? To make things even more interesting, recording rules are also used in the dashboards.

Since the customer was running a production system there was no problem obtaining a week’s worth of Prometheus data. Then all the experiments were done on a separate Prometheus server with those data copies and not on the production one.

I used promutil.py, a small utility python script that you can download from our repository here.

Setup

While I use the Scylla Monitoring Stack for testing because it’s easier for me, you don’t need to; in general, I suggest using the Prometheus Docker container.

  1. Create a data directory — we’ll assume it’s called data.
  2. Place your existing Prometheus TSDB data in that directory.
  3. Download promutil.py.
  4. Have your Prometheus rules file ready as prometheus.rules.yml.

Run Prometheus with Docker

docker run -d  -v "$PWD/data:/prometheus/data" -p 9090:9090 --name prom prom/prometheus:v2.25.0

This will run Prometheus using the data in its data directory; you can connect to the server over HTTP on port 9090 (e.g., http://{ip}:9090, where “ip” is your server’s IP address).

You can check that your data is there by looking at a known metric. Typically you would need to look back a day or two, depending on how old your data is.

Run promutil.py

The promutil.py utility can generate an OpenMetrics output file from Prometheus. You can run ./promutil.py help to see the different options.

We will use a range query. You can supply a specific query as a parameter, but instead we will use the prometheus.rules.yml file, which adds each of the rules in that file to the output file.

A range query needs a start and an end; promutil.py accepts any two of start, end, and duration.

Start and end can either be absolute (example: 2021-01-29T23:58:55.980Z) or relative (examples: 8s, 10h, 3d). For example, this is how we generate 3 days of metrics that end 2 days ago:

./promutil.py --rules prometheus.rules.yml rquery -d 3d --end 2d --out-file /data/metrics.txt

Note that we placed the output file inside the data directory. This is important because we can access it from inside the Prometheus Docker container.

Generate the Prometheus Blocks

Run promtool inside the container:

docker exec -it prom promtool tsdb create-blocks-from openmetrics /prometheus/data/metrics.txt /prometheus/data/

Restart the Prometheus server and that’s it — your rules are there!

Alerts

For testing, promutil.py can also be used to handle alerts. While it will not generate the alerts for historical data, it will create a metric for each of the alerts named alert:{alert_name}.

Labels in the alert will be added to the generated metric. You can now look at the graph of that metric and see the points that match the alert criteria.
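For example, once the blocks are loaded you can pull a generated alert:{alert_name} series back out through the standard Prometheus HTTP API and inspect which timestamps matched. A small sketch (the alert name here is made up; adjust the server address, time range, and step to your data):

import requests

PROM = "http://localhost:9090"

resp = requests.get(f"{PROM}/api/v1/query_range", params={
    "query": "alert:HighReadLatency",     # hypothetical alert name -> its alert:{alert_name} metric
    "start": "2021-01-25T00:00:00Z",
    "end":   "2021-01-28T00:00:00Z",
    "step":  "60s",
})
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    print(series["metric"])                              # labels copied from the alert rule
    print(len(series["values"]), "points matched the alert condition")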

Bonus – Ad Hoc Alerts

Unrelated to backfilling: if you regularly keep track of systems, you know the feeling of looking at a system and trying to figure out what’s going on. You can use promutil.py for such cases as well. Keep a rule file with as many alerts as you want. Running promutil.py on this file will output only the metrics that match those alerts. In other words, it’s like asking, “have any of these conditions been met in the last few days (or hours, etc.)?”

Take Away

Prometheus finally supports backfilling metrics through the use of OpenMetrics format. You can use our promutil.py to generate historical values for recording rules and alerts from a running Prometheus server. You can also use the promutil.py for ad hoc alerts, making it a handy new tool when diagnosing a system.

DISCOVER MORE IN OUR SCYLLA MONITORING HANDS-ON LAB


Expedia Group: Our Migration Journey to Scylla

Expedia Group, the multi-billion-dollar travel brand, presented at our recent Scylla Summit 2021 virtual event. Singaram “Singa” Ragunathan and Dilip Kolosani presented their technical challenges, and how Scylla was able to solve them.

Currently there are multiple applications at Expedia built on top of Apache Cassandra. “Which comes with its own set of challenges,” Singa noted. He highlighted four top issues:

  • Garbage Collection: The first well-known issue is with Java Virtual Machine (JVM) Garbage Collection (GC). Singa noted, “Apache Cassandra, written in Java, brings in the onus of managing garbage collection and making sure it is appropriately tuned for the workload at hand. It takes a significant amount of time and effort, as well as expertise required, to handle and tune the GC pause for every specific use case.”
  • Burst Traffic & Infrastructure Costs: The next two interrelated issues for Expedia are burst traffic and the overprovisioning it leads to. “With burst traffic or a sudden peak in the workload there is significant disturbance to the p99 response time. So we end up having buffer nodes to handle this peak capacity, which results in more infrastructure costs.”
  • Infrequent Releases: “Another significant worry” for Expedia, according to Singa, was Cassandra’s infrequent release schedule. “According to the past years’ history, the number of Apache Cassandra releases has significantly slowed down.”

Showing a comparative timeline between Cassandra and Scylla, Singa continued, “We would like to compare the open source commits in Cassandra versus Scylla in a timeline chart here, and highlight the number of releases that Scylla has gone through in the same past three-year period. As you can see, it gives enough confidence towards Scylla that, given an issue or bug with a specific release, it will soon be addressed with a patch. In contrast, with Apache Cassandra one might have to wait longer.”

Timeline created by Expedia showing the update frequency of Cassandra compared to Scylla.

“So why did we end up with Scylla?” Peace of mind, both operationally and expense-wise, was key. “Thanks to the C++ backend of Scylla we no longer have to worry about ‘stop-the-world’ GC pauses. Also, we were able to store more data per node, and achieve more throughput per node, thereby saving significant dollars for the company.”

For Singa, another key issue was ease of migration. “From an Apache Cassandra code base, it’s frictionless for developers to switch over to Scylla. For the use cases that we tried, there weren’t any data model changes necessary. And the Scylla driver was compatible, a drop-in replacement for the Cassandra driver dependency. With a few tweaks to our automation framework that provisions an Apache Cassandra cluster, we were able to provision a Scylla Open Source cluster.”

Lastly, in terms of being able to entrust Expedia’s business to Scylla, “A clear roadmap and support from ScyllaDB’s Slack community comes in very handy.”

Expedia Geosystem on Scylla

“The candidate application chosen for this proof-of-concept (POC) is our geosystem that provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, like hotel location info, third party data, etc. This rich geography dataset enables different types of data searches using a simple REST API while guaranteeing single-digit millisecond p99 read response time.”

Singa then described the prior existing architecture: “To speed up API responses, we are using a multilayered cache, with Redis as the first layer and Cassandra as the second layer.” The goal was to replace both Redis and Cassandra with Scylla.

Dilip then described the Scylla cluster that they ran tests on:

  • 24 nodes
  • 25 TB of data
  • i3.2xlarge AWS instances
  • Scylla Open Source 4.1.4

“The idea is to test if a lower capacity Scylla cluster can match the performance of our existing Cassandra cluster or not,” Dilip explained. “We didn’t face any major challenges migrating from Cassandra to Scylla. We are not using any fancy features like secondary indexes, materialized views and lightweight transactions, so we kept our data model and application drivers as-is while migrating from Cassandra to Scylla.”

In the performance comparison between Cassandra and Scylla, Dilip showed how write latency was essentially flat and negligible for both, “But the real winner here is p99 read latency for Scylla, which is consistently around 5 ms throughout the day.” He showed how Cassandra latency was “spiky in nature, and it varies from 20 to 80 ms throughout the day depending on the traffic pattern.”

For throughput, “Scylla was also able to deliver throughput close to 3x compared with Apache Cassandra. So for applications that require high throughput and single-digit read latency, Scylla is recommended over Apache Cassandra.” Their benchmark also helped them prove that Scylla would provide a 30% infrastructure cost savings.

Next Steps

Expedia’s roadmap for 2021 is to replace the Cassandra/Redis architecture with Scylla, since it by itself can support single-digit millisecond latencies.

You can learn more about Expedia’s benchmark results and their plans for Scylla by watching their full Scylla Summit presentation.

If you have plans of your own for Scylla, we’d love to hear about them. You can contact us directly, or join our Slack channel to engage with our engineers and the Scylla community.

WATCH EXPEDIA’S SCYLLA SUMMIT PRESENTATION


Scylla Developer Hackathon: Rust Driver

Scylla’s internal developer conference and hackathon this past year was a lot of fun and very productive. One of the projects we put our efforts into was a new shard-aware Rust driver. We’d like to share with you how far we’ve already gotten, and where we want to take the project next.

Motivation

The CQL protocol, used both by Scylla and Cassandra, already has some drivers for the Rust programming language on the market – cdrs, cassandra-rs and others. Still, we have rigorous expectations towards the drivers, and in particular we really wanted the following:

  • asynchronicity
  • support for prepared statements
  • token-aware routing
  • shard-aware routing (Scylla-specific optimization)
  • paging support

Also, it would be nice to have the driver written in pure Rust, without having to use any unsafe code. Since none of the existing drivers fulfilled our strict expectations, it was clear what we had to do — write our own async CQL driver in pure Rust! And the ScyllaDB hackathon was the perfect opportunity to do just that.

After intensive work during the hackathon, we completed the first version of scylla-rust-driver and made it available as open source at https://github.com/scylladb/scylla-rust-driver.

The Team

Here’s our hackathon team, discussing crucial design decisions for the new most popular CQL driver for Rust!

Rust driver hackathon team members beginning with Piotr Sarna in the upper left and clockwise: Pekka Enberg, Piotr Dulikowski, and Kamil Braun.

Design and API example

Our driver exposes a set of asynchronous methods which can be used to establish a CQL session, send all kinds of CQL queries, receive and parse results into native Rust types, and much more. At the core of our driver there’s a struct which represents a CQL session. After establishing a connection to the cluster, the aforementioned session can be used to execute all kinds of requests.

Here’s what you can currently do with the API:

  • connect to a cluster
  • refresh cluster topology information
    • how many nodes are in the cluster
    • how many nodes are up
    • which nodes are responsible for which data partitions
  • perform a raw CQL query
    • not paged or with custom page size
  • prepare a CQL statement
  • execute a prepared statement
    • not paged or with custom page size

To see more comprehensive examples, take a look at https://github.com/scylladb/scylla-rust-driver/tree/main/examples

A snapshot of the documentation is available here:

https://psarna.github.io/scylla-rust-driver-docs/scylla/

Implementing the driver with Rust async/await and Tokio

The Rust language already has built-in support for asynchronous programming through the async/await mechanism. Additionally, we decided to base the driver on the Tokio framework, which provides an asynchronous runtime for Rust along with many useful features.

Our first step was to implement connection pools used to connect to Scylla/Cassandra clusters and to ensure the driver can handle both unprepared and prepared CQL statements. In order to provide that, we meticulously followed the CQL v4 protocol specification and implemented the initial request types: STARTUP, QUERY, PREPARE and EXECUTE.
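To give a feel for what following the CQL v4 specification means in practice, here is a rough sketch of how a STARTUP request frame is laid out, based on the public protocol spec. This is illustrative Python, not code from the driver:

import struct

def cql_string_map(options):
    # [string map] = [short n] followed by n key/value pairs,
    # where each [string] = [short length][UTF-8 bytes].
    body = struct.pack(">H", len(options))
    for key, value in options.items():
        for s in (key, value):
            encoded = s.encode("utf-8")
            body += struct.pack(">H", len(encoded)) + encoded
    return body

def startup_frame(stream_id=0):
    # Frame header (9 bytes in protocol v4): version, flags, stream id, opcode, body length.
    body = cql_string_map({"CQL_VERSION": "3.0.0"})
    header = struct.pack(">BBhBi",
                         0x04,        # version: request frame, protocol v4
                         0x00,        # flags: none
                         stream_id,   # stream id: used to match responses to requests
                         0x01,        # opcode: STARTUP
                         len(body))
    return header + body

print(startup_frame().hex())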

Having such a solid footing, we split the work to also provide proper paging and the ability to fetch topology information from the cluster. The latter was needed to make our driver token-aware and shard-aware.

Token awareness allows the driver to route the request straight to the right coordinator node which owns the particular partition, which avoids inter-node communication and generally lowers the overhead. Shard awareness is one step further and is only supported when using the driver to connect to Scylla. The idea is that the request ends up not only on the right node, but also on the right CPU core, thus avoiding inter-core communication and minimizing the overhead even further. Read more about Scylla shard awareness and its positive effect on performance in a great blog which described this optimization for a Python driver.
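As a rough illustration of that idea, a shard-aware client maps each token not only to a replica but also to a CPU core on that replica. A simplified sketch follows (the real Scylla algorithm also takes a configurable number of ignored most-significant bits into account):

def shard_for_token(token, nr_shards):
    # Map a signed 64-bit murmur3 token to a shard index on the owning node:
    # bias the token into unsigned space, then scale it into [0, nr_shards).
    unsigned = (token + 2**63) % 2**64
    return (unsigned * nr_shards) >> 64

# Example: with 8 shards per node, tokens spread across the cores
for token in (-2**63, -1, 0, 2**63 - 1):
    print(token, "-> shard", shard_for_token(token, 8))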

Interlude: fixing murmur3 by implementing it with bugs

Wait, what? That’s right, during the hackathon we ended up needing to rewrite a murmur3 hashing algorithm with bugs in order to stay fully compatible with Apache Cassandra!

When performing token awareness tests, I noticed that around 30% of all requests ended up on a wrong node. That shouldn’t happen, so we quickly started an investigation. We meticulously checked:

  1. That the topology information fetched from Scylla is indeed correct,
    and consistent with the output of `nodetool describering`
  2. That the token computations return correct results on the first 100 keys,
    which makes it highly unlikely that token computation is to blame

… and we shouldn’t have stopped at checking just 100 keys! It turns out that the first failure only happened once we reran the test on the first 10,000 keys. Further investigation showed that a similar problem had occurred for our Golang friends: https://github.com/gocql/gocql/issues/1033.

In short, Cassandra’s murmur3 implementation handwritten in Java operates on signed integers, while the original algorithm used unsigned ones. That creates some subtle differences when shifting the values, which in turn translates to around 30% of tokens being calculated inconsistently with the Cassandra way.
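The kind of discrepancy involved is easy to demonstrate: Java longs are signed 64-bit values, so a plain right shift sign-extends, while the reference murmur3 works on unsigned values. A toy illustration (not the actual hash code):

MASK64 = (1 << 64) - 1

def as_java_long(x):
    # Reinterpret the low 64 bits of x as a signed Java long.
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x

x = 0x80000000000000F0               # top bit set: "negative" when treated as a Java long

unsigned_shift = (x & MASK64) >> 33                 # like Java's >>> : zeros shifted in
signed_shift = (as_java_long(x) >> 33) & MASK64     # like Java's >>  : sign bit copied in

print(hex(unsigned_shift))   # 0x40000000
print(hex(signed_shift))     # 0xffffffffc0000000 -- a different value feeds the next hashing step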

We had no choice but to stop using a comfy crate from crates.io which provided us with a neat murmur3 algorithm implementation and instead we spent the whole night rewriting the algorithm by hand, bugs included™!

Results

We ran two simple benchmarks to see how scylla-rust-driver compares to other existing drivers.

The benchmark’s goal was to send 10 million prepared statements as fast as possible, given a fixed concurrency of 1,024. The usage of token-aware and shard-aware routing was allowed, if supported by the driver. All drivers were compared against the same 3-node Scylla cluster, each node having 2 shards. We compared against GoCQL (enhanced by us with shard awareness) and cdrs.

                    gocql               scylla-rust-driver    cdrs
Writes              real 0m59.658s      real 0m18.310s        real 12m34.761s
                    user 15m21.846s     user 1m44.170s        user 2m14.253s
                    sys 2m32.438s       sys 0m36.318s         sys 6m21.757s
Reads               real 1m6.276s       real 0m23.928s        real 12m50.929s
                    user 17m35.803s     user 1m52.654s        user 3m7.008s
                    sys 2m46.497s       sys 0m41.791s         sys 6m43.048s
Mixed               real 1m3.409s       real 0m19.715s        real 13m28.133s
(reads and writes)  user 17m23.127s     user 1m51.372s        user 2m44.918s
                    sys 2m29.905s       sys 0m35.209s         sys 6m42.705s

Output of the Linux time command for processing 10 million prepared statements with a fixed concurrency of 1,024, using different drivers.

Source code of all benchmarks:

Future plans

The future of our project is very bright. As a matter of fact, it’s already scheduled for another year of hands-on work! A team of four talented students from the University of Warsaw will continue developing the driver from where we left off, as part of the ZPP program. This is the second time Scylla is proudly taking part in the program as a mentor. Here are the other projects from last year:

We also have an official roadmap, updated version of which can always be found in our repository (https://github.com/scylladb/scylla-rust-driver):

Done:

  • driver-side metrics
    • number of sent requests
    • latency percentiles of sent requests
    • number of errors
  • handling topology changes and presenting them to the user
  • CQL batch statements
  • custom error types
    • robust handling of various errors (e.g. repreparing statements)

In progress:

  • CQL authentication support
  • TLS support
  • configurable load balancing algorithms
  • configurable retry policies
  • query builders

Backlog:

  • CQL tracing
  • [additional] performance benchmarks against other drivers
    • cdrs
    • gocql
  • handling events pushed by the server
  • speculative execution
  • expanding the documentation
  • more correctness tests
  • more integration tests – preferably using Scylla’s ccm framework and Python
  • preparing the work to be published on crates.io


What’s New at Scylla University for February 2021

We had a busy 2020, and a lot has happened since my last update. Usage was significantly higher than in the prior year, and we had thousands of new users consuming Scylla University material.

We added lots of content to Scylla University covering many of the new features of Scylla. I’ll write about the new lessons in my next blog post. In this post, I’ll cover the top students for 2020 and Training Day, which was part of Scylla Summit, along with some other news.

Dean’s List

Of the many active trainees using Scylla University, the top users were:

  1. Lucrezia Geusa, Software Developer, RDSLab, “I really enjoyed the online training course and it was really useful for me. I’m also glad to have added the certificate to my Linkedin profile.”
  2. Lalith Kumar Nimmala, Technology Lead, Infosys, “Scylla University provided a frictionless experience for me to get started and level up quickly. Thanks to the training team at ScyllaDB for being thoughtful in preparing the content and organizing it for the benefit of learners like me.”
  3. Ian Harding, DBA, Zillow, “I really enjoyed the online free training provided by Scylla.  The interface is simple and the content is “Just right” when it comes to the amount of detail provided.  It tells you “why” as well as “how” which I appreciate!”
  4. Julianna Hansen, Senior Software Developer, Project Lead, Platforms LLC, “Scylla University has tremendously helped me learn the fundamentals of NoSQL DBs! The latest courses that paired with the Scylla Summit have furthered my understanding especially about Materialized Views vs Secondary Indexes.”
  5. Oliver Allan, DBA, Metro Bank, “I have found Scylla University to be a fantastic learning resource, clear explanations and has helped me immensely so thank you for providing this great resource!”
  6. Satyam Mittal, Database Software Engineer, Grab, “The content at Scylla University is very good quality and quizzes at the end of the section encourage me to figure more things out and actually learn. It not only helped me in learning specific concepts about Scylla but also helped me in figuring out differences with other NoSQL databases. I will be looking at learning more courses from Scylla University in my free time.”

Cool Scylla swag is on its way to you.

As a reminder, points are awarded for:

  • Completing lessons including quizzes and hands-on labs, with a high score
  • Submitting a feedback form, for example, this one.
  • General suggestions, feedback, and input (just shoot me a message on the #scylla-university channel on Slack)

Training Day

This year, due to the circumstances and for the first time, Scylla Summit was virtual and took place at the beginning of 2021. The Training Day had the highest attendance to date, and all the material was published in two courses on Scylla University: one for the Developer Track and one for the Administrator Track. If you complete one of these courses, you’ll get a course completion certificate and some cool swag. These courses will be available until the end of February.

Once you get a certificate, you can share it on your LinkedIn profile to show off your achievements. To see your certificates go to your profile page.

Next Steps

If you haven’t done so yet, create a user account and start learning, it’s free!

Join the #scylla-university channel on Slack for more training related updates and discussions.

START LEARNING!


Disney+ Hotstar: Powered by Scylla Cloud

Disney+ Hotstar was originally launched in 2015 as Hotstar, the streaming service for Star India, which was later acquired by The Walt Disney Company. In March 2020 the digital service was rebranded as Disney+ Hotstar and subscriptions soared. By November 2020 they had increased to 18.5 million paid subscriptions, becoming the fastest growing segment of Disney+ global subscribers. They also expanded into Indonesia in September 2020.

In addition, Disney+ Hotstar provides an ad-supported content tier, which is growing even faster than paid subscriptions. Total paid and unpaid subscribers account for over 300 million users. Disney+ Hotstar’s coverage of last year’s Indian Premier League cricket competition (IPL20) set a record for concurrent streaming viewership of any sporting event, with an audience of more than 25 million subscribers.

We were thrilled to have the engineering team of Disney+ Hotstar join us to present at our online Scylla Summit 2021 this January. The speakers included Vamsi Subash Achanta, the Architect behind their Scylla deployment, and Balakrishnan Kaliyamoorthy, their Senior Data Engineer.

Use Case: Continue Watching

Their “Continue Watching” feature uses Scylla to track every show for every user, remembering the timestamp where they last left off; if a user finishes one episode in a series, it prompts them to watch the next episode. It can even alert users to continue watching when a new episode of a favorite show becomes available. This feature works cross-platform, so you might begin watching on your computer and then continue watching from your mobile or tablet device.

Moving to Scylla

Prior to their adoption of Scylla, their infrastructure was built on a combination of Redis and Elasticsearch, connected to an event processor for Kafka streaming data. Their Redis cluster held 500 GB of data, and the Elasticsearch cluster held 20 TB. Their key-value data ranged from 5kb to 10kb per event.

This presented the team with a few problems. The first was that multiple data stores also meant maintaining multiple data models. The data was also scaling rapidly, which meant that costs were rising dramatically.

The first redesign decision was to adopt a new data model. For the user content table, the userid acted as the partition key and the content ID as the clustering key, plus a timestamp and additional fields.

The team considered a number of alternatives, from Apache Cassandra and Apache HBase to Amazon DynamoDB to Scylla. Why did they choose Scylla? Two important reasons: first and foremost, consistently low latencies for both reads and writes, which would ensure a snappy user experience. Secondly, Scylla Cloud, our fully managed database as a service (NoSQL DBaaS), offered a much lower cost than the other options they considered.

Performance monitoring results of Scylla showing sub-millisecond p99 latencies, and average read and write latencies in the range of 150 – 200 µseconds (microseconds).

Balakrishnan (“Bala”) gave an overview of their migration process. They began with saving a Redis snapshot in an RDB format file, which was then converted into Comma Separated Value (CSV) for uploading into Scylla using cqlsh. One thing Bala cautioned was to watch for maximum useful concurrency of your writes to ensure you do not end up with write timeouts.

A similar process was applied to the Elasticsearch migration.

Once Scylla Cloud had been loaded with the historical data from both Redis and Elasticsearch, it was kept in sync by modifying their processor application, to ensure that all new writes also were made to Scylla, and an upgrade to the API server so that all reads could be made from Scylla as well.

At that point, writes and reads could be cut out from the legacy Redis and Elasticsearch systems, leaving Scylla to handle ongoing traffic. This completely avoided any downtime.

The Disney+ Hotstar team had also done some work with Scylla Open Source, and needed to move that data into their managed Scylla Cloud environment as well. There were two different processes they could use: SSTableloader or the Scylla Spark Migrator.

SSTableloader uses a nodetool snapshot of each server in a cluster, and then uploads the snapshots to the new database in Scylla Cloud. This can be run in batches, or all at once. Bala noted that this migration process slowed down noticeably when they had a secondary (composite) key.

To avoid this slowdown the team implemented the Scylla Spark Migrator instead.

In this process, the data was first backed up to S3 storage, then put onto a single node Scylla Open Source instance; a process known as unirestore. From there it was pumped into Scylla Cloud using the Scylla Spark Migrator. Bala found our blog, Deep Dive into the Scylla Spark Migrator, particularly helpful in setting up his migration.

Advantages of Scylla Cloud

Disney+ Hotstar is now running on Scylla Cloud. So beyond the improved performance, predictable low latencies, and better TCO, they are also relieved of the burden of administrative tasks like backups, upgrades and repairs. Now they can focus on scaling their business.

If you’d like to learn more about the advantages of Scylla Cloud for your own organization, please feel free to contact us.

Watch the Presentation

For now, feel free to watch their presentation in full from our on-demand Scylla Summit 2021 site. Once you sign up you can also watch all of the other presentations from that event.

WATCH DISNEY+ HOTSTAR AT SCYLLA SUMMIT


Apache Cassandra Changelog #4 | February 2021

Our monthly roundup of key activities and knowledge to keep the community informed.


Release Notes

Released

Apache Cassandra 3.0.24 (pgp, sha256 and sha512). This is a security-related release for the 3.0 series and was released on February 1. Please read the release notes.

Apache Cassandra 3.11.10 (pgp, sha256 and sha512) was also released on February 1. You will find the release notes here.

Apache Cassandra 4.0-beta4 (pgp, sha256 and sha512) is the newest version which was released on December 30. Please pay attention to the release notes and let the community know if you encounter problems with any of the currently supported versions.

Join the Cassandra mailing list to stay updated.

Changed

A vulnerability rated Important was found when using the dc or rack internode_encryption setting. More details of CVE-2020-17516 Apache Cassandra internode encryption enforcement vulnerability are available on this user thread.

Note: The mitigation for 3.11.x users requires an update to 3.11.10, not 3.11.24 as originally stated in the CVE. (For anyone who has perfected a flux capacitor, we would like to borrow it.)

The current status of Cassandra 4.0 GA can be viewed on this Jira board (ASF login required). RC is imminent with testing underway. The remaining tickets represent 3.3% of the total scope. Read the latest summary from the community here.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

Apache Cassandra will be participating in the Google Summer of Code (GSoC) under the ASF umbrella as a mentoring organization. This is a great opportunity to get involved, especially for newcomers to the Cassandra community.

We’re curating a list of JIRA tickets this month, which will be labeled as gsoc2021. This will make them visible in the Jira issue tracker for participants to see and connect with mentors.

If you would like to volunteer to be a mentor for a GSoC project, please tag the respective JIRA ticket with the mentor label. Non-committers can volunteer to be a mentor as long as there is a committer as co-mentor. Projects can be mentored by one or more co-mentors.

Thanks to Paulo Motta for proposing the idea and getting the ticket list going.

Added

Apache Zeppelin 0.9.0 was released on January 15. Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems, and it supports Apache Cassandra among others. The release notes for the Cassandra CQL Interpreter are available here.

Changed

For the GA of Apache Cassandra 4.0, any claim of support for Python 2 will be dropped from the updated documentation. We will also introduce a warning when running in Python 2.7. Support for Python 3 will be backported to at least 3.11 due to existing tickets, but we will undertake the work needed to make packaging and internal tooling support Python 3.

Changed

The Kubernetes SIG is discussing how to encourage more participation and how to structure SIG meetings around updates on Kubernetes and Cassandra. We also intend to invite other projects (like OpenEBS, Prometheus, and others) to discuss how we can make Cassandra and Kubernetes work better together. As well as updates, the group discussed handling large-scale backups inside Kubernetes and using S3 APIs to store images. Watch here.


User Space

Backblaze

“Backblaze uses Apache Cassandra, a high-performance, scalable distributed database to help manage hundreds of petabytes of data.” - Andy Klein

Witfoo

Witfoo uses Cassandra for big data needs in cybersecurity operations. In response to the recent licensing changes at Elastic, Witfoo decided to blog about its journey away from Elastic to Apache Cassandra in 2019. - Witfoo.com

Do you have a Cassandra case study to share? Email cassandra@constantia.io.

In the News

The New Stack: What Is Data Management in the Kubernetes Age?

eWeek: Top Vendors of Database Management Software for 2021

Software Testing Tips and Tricks: Top 10 Big Data Tools (Big Data Analytics Tools) in 2021

InfoQ: K8ssandra: Production-Ready Platform for Running Apache Cassandra on Kubernetes

Cassandra Tutorials & More

Creating Flamegraphs with Apache Cassandra in Kubernetes (cass-operator) - Mick Semb Wever, The Last Pickle

Apache Cassandra : The Interplanetary Database - Rahul Singh, Anant

How to Install Apache Cassandra on Ubuntu 20.04 - Jeff Wilson, RoseHosting

The Impacts of Changing the Number of VNodes in Apache Cassandra - Anthony Grasso, The Last Pickle

CASSANDRA 4.0 TESTING - Charles Herring, Witfoo



Cassandra Changelog is curated by the community. Please send submissions to cassandra@constantia.io.