So it took four posts to describe all of the setup prior to implementing any business logic or data access code! All that work made me wonder if maybe the rest would be easy in comparison.
This post is all about writing the business logic and reading and writing data to DataStax Enterprise (DSE) using the Cassandra Query Language (CQL).
DataStax Accelerate is right around the corner and comes with a plethora of awesome speakers and informational talks. For those of you wearing a developer hat, the “Building Modern Applications” track was built for you.
This track features talks on building applications with Spring Boot in Java, Go, and microservices in Python (all on Apache Cassandra), plus a discussion on how serverless functions make everything a microservice.
Want to figure out how to design for time series applications at scale? Then come watch Alice Lottini and Patrick Callaghan give you all the deets. Maybe architectures with Kafka and Cassandra is your thing; Cliff Gilmore of Confluent will bring the knowledge.
Like data modeling? Who doesn’t!? If that’s the case, you won’t want to miss Hackolade CEO Pascal Desmaret’s talk, “Data modeling Is Dead… Long Live Schema Design!”.
Finally, how about a talk on leveraging DataStax Enterprise to build a feature store at Condé Nast, or an in-depth look at data as a service from the Chief Revenue Officer of CTO Boost, Keith Loo? Boom!
As you can see, there’s something for all stripes and sizes. Don’t miss this opportunity to experience some of the top names in the Apache Cassandra world send a few terabytes worth of knowledge into your brain nodes!
At Instaclustr, our vision is to provide organizations the freedom to confidently deliver applications at the highest levels of scale and reliability through open source technologies. Today, we are proud to announce the Instaclustr Certification Framework for Open Source Software and our first certified technology, Instaclustr Certified Apache Cassandra, as the latest important step towards realising this vision.
Over Instaclustr’s 5+ year history of providing a managed service for Apache Cassandra, we have incrementally increased the sophistication of the testing we apply to Cassandra before releasing it to our Managed Platform. Several months ago we realized we had an opportunity to substantially increase the rigor of this testing, provide open visibility into the process and its outputs, and deliver real value to our Managed Platform customers, our Enterprise Support customers, and the wider Apache Cassandra community. This effort has resulted in the Instaclustr Certification Framework for Open Source Software, a rigorous testing and evaluation program that will apply to each of the open source, data-layer technologies Instaclustr supports.
The Instaclustr Certification Framework formalises the overall process of considering open source for inclusion in the Instaclustr platform. It has two key stages: assessment, where we consider the governance, adoption, and liveliness of the project; and certification, where we develop a test plan and undertake independent technical testing of a specific software release or version. Both phases result in a detailed, publicly available report of the work we have undertaken.
For this initial release, we have published the following documents:
- Instaclustr Certification Framework for Open Source Software
- Project Assessment Report for Apache Cassandra
- Technology Testing Plan for Apache Cassandra
- Certification Report for Apache Cassandra
Key highlights of the Certification Report include:
- Performance testing (latency and throughput) comparing the current version to the previous two released versions for multiple use cases
- 24-hour soak testing (including repairs and replaces), with garbage collection time measured
- Shakedown testing against popular drivers
We view the current testing approach as a starting point and expect to continue building on it over time. One particular area of focus that didn’t make the cut for the first release is testing of much larger clusters.
We’re excited to be releasing this work and really hope it will be valuable to the wider community. Please leave a comment below or drop us an email at firstname.lastname@example.org if you have any comments or questions.
Nate McCall, Apache Cassandra Project Chair, will set the tone for us in his Technical Keynote on Thursday morning, as he walks us through the highlights of the 4.0 release and updates on the Cassandra community.
Building on Nate’s talk, I’m excited to be hosting an “Innovating with Apache Cassandra” track dedicated entirely to the open source project, where we’ll dig deeper into key features of the upcoming release:
The ScyllaDB team announces the release of Scylla Open Source 3.0.6, a bugfix release of the Scylla Open Source 3.0 stable branch. Scylla Open Source 3.0.6, like all past and future 3.x.y releases, is backward compatible and supports rolling upgrades.
- Scylla Open Source 3.0
- Get Scylla Open Source 3.0.6 – Docker, binary packages, and EC2 AMI
- Get started with Scylla
- Upgrade from Scylla Open Source 3.x.y to Scylla Open Source 3.x.z
- Upgrade from 2.3.x to 3.0.x
- Please let us know if you encounter any problems.
Issues solved in this release:
- CQL: In some cases, when using CQL batch operations, the storage proxy prints exception errors with wrong IP addresses #3229. For example: storage_proxy – exception during mutation write to 0.0.96.16
- CQL: Secondary Indexes not working correctly for clustering keys when using UPDATE #4144
- nodetool cleanup may double the used disk space while running #3735
- Schema changes: a race condition when dropping and creating a table with the same name #3797
- Schema changes: when creating a table, there is no real time validation of duplicate ID property #2059
- Schema changes: schema change statement can be delayed indefinitely when there are constant schema pulls #4436
- std::bad_alloc may be thrown even though there is evictable memory #4445
- In rare cases, resharding may cause scylla to exit #4425
A wide variety of companies around the world are leveraging open source databases and hybrid or multi-cloud to deliver modern applications with Apache Cassandra. Increasingly, these organizations are looking to consume Cassandra services on clouds to boost the productivity of development teams and reduce the operational overhead of setup, ongoing management, and performance optimization.
DataStax Distribution of Apache Cassandra is a production-ready database that is 100% open source compatible and supported by DataStax, the Apache Cassandra experts. By making our offering available on Azure, we are providing developers with a simple, cost-effective way to build, deploy, and scale Cassandra-based applications in the cloud.
Easy to get started
DataStax Distribution of Apache Cassandra clusters on Azure are pre-configured and designed to deliver a frictionless environment for developing highly available applications while minimizing upfront configuration and cost. Our simple deployment using best practices for Azure means you reduce the time it takes to get started from days to minutes. Our production-ready Cassandra clusters can be deployed in a few easy steps and scale seamlessly as requirements grow. Your cluster will automatically spread across Azure fault domains for built-in high availability. Also provided is a virtual machine that helps kickstart your application development with docs, clients, support, training, videos, and more.
Open source compatible
DataStax has been the driving force behind Apache Cassandra, contributing the majority of the Cassandra code and still maintaining all Cassandra drivers today. DataStax Distribution of Apache Cassandra is 100% compatible with open source Cassandra, delivering a production-certified database, drivers, bulk loading utility, Kafka connector, and professional support. Developers can build applications with the same data distribution and performance capabilities whether on-premises or in the cloud, providing confidence while maintaining open source standards.
Open source Cassandra is easy to adopt, but as deployments scale, the need for increased support can have an adverse impact on your ability to build new applications and features. DataStax Distribution of Apache Cassandra provides a unique Cassandra support experience that’s unmatched in the market. As an example, our engineers and DBAs are available to help design, configure, manage, scale, optimize, and secure your cluster. We constantly monitor your database, help with technology integration and provide advice on running Cassandra in production.
DataStax Distribution of Apache Cassandra on Azure is the easiest way to run production-ready Cassandra in the cloud. Together, DataStax and Microsoft enable developers to quickly build, deploy and scale Cassandra-based applications in the cloud with an easy-to-use, open source compatible solution that comes with unparalleled support.
DataStax Distribution of Apache Cassandra on Azure
Scylla Enterprise is a NoSQL database that offers the horizontal scale-out and fault-tolerance of Apache Cassandra, yet delivers 10X the throughput and consistent, low single-digit latencies. Implemented from scratch in C++, Scylla’s close-to-the-hardware design significantly reduces the number of database nodes you require and self-optimizes to dynamic workloads and various hardware combinations.
With Scylla Enterprise 2019.1 we’ve introduced new capabilities, as well as inherited a rich set of new features from our Scylla Open Source 3.0 release for more efficient querying, reduced storage requirements, lower repair times, and better overall database performance. Already the industry’s most performant NoSQL database, Scylla now includes production-ready features that surpass the capabilities of Apache Cassandra.
Scylla Enterprise 2019.1 is now available for download for existing customers.
If you’re not yet a customer, you can register for a 30-day trial.
New Features in 2019.1
Scylla Enterprise 2019.1 includes all the Scylla Enterprise features of previous releases, including Auditing and In-Memory Tables. It also now includes an innovative new feature that’s exclusive to Scylla Enterprise:
Workload Prioritization (Technology Preview)
Scylla Enterprise will enable its users to safely balance multiple real-time workloads within a single database cluster. For example, Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP), which have very different data access patterns and characteristics, can be run concurrently on the same cluster. OLTP involves many small and varied transactions, including mixed writes, updates, and reads, with a high sensitivity to latency. In contrast, OLAP emphasizes the throughput of broad scans across datasets. By introducing capabilities that isolate workloads, Scylla Enterprise will uniquely support simultaneous OLTP and OLAP workloads without sacrificing latency or throughput. However, workload prioritization can be used to differentiate between any two or more different workloads (for example, different OLTP workloads with different priorities or SLAs). More on Workload Prioritization
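As a rough sketch, workload prioritization is exposed through CQL service levels. The statements below follow the syntax documented for Scylla Enterprise; the share values and role names are hypothetical:

```sql
-- Give OLTP traffic five times the scheduler shares of analytics traffic
CREATE SERVICE_LEVEL oltp WITH SHARES = 1000;
CREATE SERVICE_LEVEL olap WITH SHARES = 200;

-- Attach each service level to the role the workload authenticates as
ATTACH SERVICE_LEVEL oltp TO oltp_role;
ATTACH SERVICE_LEVEL olap TO analytics_role;
```

With these in place, when the cluster is saturated, queries executed under oltp_role receive roughly five times the resources of those executed under analytics_role.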
Features Inherited from Scylla Open Source
The following features that were previously introduced in Scylla Open Source are now part of the Enterprise branch and are included with the Scylla 2019.1 release:
Materialized Views automate the tedious and inefficient chores created when an application maintains several tables with the same data organized differently. Data is divided into partitions that can be found by a partition key. Sometimes the application needs to find a partition or partitions by the value of another column. Doing this efficiently without scanning all of the partitions requires indexing.
People have been using Materialized Views, also calling them denormalization, for years as a client-side implementation. In such implementations, the application maintained two or more views and two or more separate tables with the same data but under a different partition key. Every time the application wanted to write data, it needed to write to both tables, and reads were done directly (and efficiently) from the desired table. However, ensuring any level of consistency between the data in the two or more views required complex and slow application logic.
Scylla’s Materialized Views feature moves this complexity out of the application and into the servers. As a result, the implementation is faster (fewer round trips to the applications) and more reliable. This approach makes it much easier for applications to begin using multiple views into their data. The application just declares the additional views, Scylla creates the new view tables, and on every update to the base table the view tables are automatically updated as well. Writes are executed only on the base table directly and are automatically propagated to the view tables. Reads go directly to the view tables.
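As an illustration (keyspace, table, and column names are hypothetical), declaring a view is a single CQL statement, after which the server keeps it in sync on every write to the base table:

```sql
-- Base table: scores looked up by user
CREATE TABLE games.scores (
    user text,
    game text,
    score int,
    PRIMARY KEY (user, game)
);

-- Same data, looked up by game; maintained automatically by the server
CREATE MATERIALIZED VIEW games.scores_by_game AS
    SELECT * FROM games.scores
    WHERE game IS NOT NULL AND user IS NOT NULL
    PRIMARY KEY (game, user);
```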
As usual, the Scylla version is compatible – in features and CQL syntax – with the Apache Cassandra version (where it is still in experimental mode).
Scylla Enterprise 2019.1 now supports production-ready global secondary indexes that can scale to any size distributed cluster, unlike the local-indexing approach adopted by Apache Cassandra.
The secondary index uses a Materialized View index under the hood in order to make the index independent of the number of nodes in the cluster. Secondary Indexes are (mostly) transparent to the application.
Queries have access to all the columns in the table, and you can add and remove indexes without changing the application. Secondary Indexes can also have less storage overhead than Materialized Views, because Secondary Indexes need to duplicate only the indexed column and primary key, not the queried columns as a Materialized View does.
For the same reason, updates can be more efficient with Secondary Indexes because only changes to the primary key and indexed column cause an update in the index view. In the case of a Materialized View, an update to any of the columns that appear in the view requires the backing view to be updated.
As always, the decision whether to use Secondary Indexes or Materialized Views really depends on the requirements of your application. If you need maximum performance and are likely to query a specific set of columns, you should use Materialized Views. However, if the application needs to query different sets of columns, Secondary Indexes are a better choice because they can be added and removed with less storage overhead depending on application needs.
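A minimal sketch with a hypothetical table: the index is declared once, and queries on the indexed column are then routed through its backing view:

```sql
CREATE TABLE app.users (
    id uuid PRIMARY KEY,
    name text,
    email text
);

-- Global index on a non-key column
CREATE INDEX users_by_email ON app.users (email);

-- Served via the index, regardless of how many nodes hold the data
SELECT * FROM app.users WHERE email = 'alice@example.com';
```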
Global secondary indexes minimize the amount of data retrieved from the database, providing many benefits:
- Results are paged and customizable
- Filtering is supported to narrow result sets
- Keys, rather than data, are denormalized
- Supports more general-purpose use cases than Materialized Views
CQL: Enable ALLOW FILTERING for regular and primary key columns
Allow filtering is a way to make a more complex query, returning only a subset of matching results. Because the filtering is done on the server, this feature also reduces the amount of data transferred over the network between the cluster and the application. Such filtering may incur processing impacts to the Scylla cluster. For example, a query might require the database to filter an extremely large data set before returning a response. By default, such queries are prevented from execution, returning the following message:
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
Unpermitted queries include those that restrict:
- Non-partition key fields
- Parts of primary keys that are not prefixes
- Partition keys with something other than an equality relation (though you can combine SI with ALLOW FILTERING to support inequalities; >= or <=; see below)
- Clustering keys with a range restriction and then by other conditions (see this blog)
However, in some cases (usually due to data modeling decisions), applications need to make queries that violate these basic rules. Queries can be appended with the ALLOW FILTERING keyword to bypass this restriction and utilize server-side filtering.
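For example, against a hypothetical users table whose only key is id, restricting a regular column is rejected until the keyword is added:

```sql
-- Rejected by default: name is neither a partition nor a clustering key
SELECT * FROM app.users WHERE name = 'Alice';

-- Accepted: the cluster filters server-side and returns only matching rows
SELECT * FROM app.users WHERE name = 'Alice' ALLOW FILTERING;
```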
The benefits of filtering include:
- Cassandra query compatibility
- Spark-Cassandra connector query compatibility
- Query flexibility against legacy data sets
Hinted handoffs are designed to help when any individual node is temporarily unresponsive due to heavy write load, network weather, hardware failure, or any other factor. Hinted handoffs also help in the event of short-term network issues or node restarts, reducing the time for scheduled repairs, and resulting in higher overall performance for distributed deployments.
Technically, a ‘hint’ is a record of a write request held by the coordinator until an unresponsive replica node comes back online. When a write is deemed successful but one or more replica nodes fail to acknowledge it, Scylla will write a hint that is replayed to those nodes when they recover. Once the node becomes available again, the write request data in the hint is written to the replica node.
Hinted handoffs deliver the following benefits:
- Minimizes the difference between data in the nodes when nodes are down—whether for scheduled upgrades or for all-too-common intermittent network issues.
- Reduces the amount of data transferred during repair.
- Reduces the chances of checksum mismatch (during read-repair) and thus improves overall latency.
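Hinted handoff is controlled from scylla.yaml using the same parameter names Cassandra uses; a typical configuration (values shown here are illustrative) might look like:

```yaml
# scylla.yaml
hinted_handoff_enabled: true
# Stop collecting hints for a node that has been down longer than 3 hours
max_hint_window_in_ms: 10800000
```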
Scylla Enterprise 2019.1 now has support for a more performant storage format (SSTable), which is not only compatible with Apache Cassandra 3.x but also reduces storage volume by as much as 3X. The older 2.x format used to duplicate the column name next to each cell on disk. The new format eliminates the duplication and the column names are stored once, within the schema.
The newly introduced format is identical to that used by Apache Cassandra 3.x, while remaining backward-compatible with prior Scylla SSTable formats. New deployments of Scylla Enterprise 2019.1 will automatically use the new format, while existing files remain unchanged.
This new storage format delivers important benefits, including:
- Can read existing Apache Cassandra 3.x files when migrating
- Faster than previous versions
- Reduced storage footprint of up to 66%, depending on the data model used
- Range delete support
Scylla Enterprise 2019.1 builds on earlier improvements by extending stateful paging to support range scans as well. As opposed to other partition queries, which read a single partition or a list of distinct partitions, range scans read all of the partitions that fall into the range specified by the client. Since the precise number and identity of partitions in a given range cannot be determined in advance, the query must read data from all nodes containing data for the range.
To improve range scan paging, Scylla Enterprise now has a new control algorithm for reading all data belonging to a range from all shards, which caches the intermediate streams on each of the shards and directs paged queries to the matching, previously used, cached results. The new algorithm is essentially a multiplexer that combines the output of readers opened on affected shards into a single stream. The readers are created on-demand when the partition scan attempts to read from the shard. To ensure that the read won’t stall, the algorithm uses buffering and read-ahead.
The new algorithm delivers the following benefits:
- Improved system responsiveness
- Throughput of range scans improved by as much as 30%
- Amount of data read from the disk reduced by as much as 40%
- Disk operations lowered by as much as 75%
Role-Based Access Control (RBAC) is a method of reducing lists of authorized users to a few roles assigned to multiple users. RBAC is sometimes referred to as role-based security.
Use the GoogleCloudSnitch for deploying Scylla on the Google Cloud Engine (GCE) platform across one or more regions. The region is treated as a datacenter and the availability zones are treated as racks within the datacenter. All communication occurs over private IP addresses within the same logical network.
To use the GoogleCloudSnitch, add the snitch to the scylla.yaml file, which is located under /etc/scylla/, on all nodes in the cluster.
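Concretely, the change is a one-line setting, sketched here:

```yaml
# /etc/scylla/scylla.yaml, on every node in the cluster
endpoint_snitch: GoogleCloudSnitch
```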
Scylla Enterprise 2019.1 comes with a helping hand for discovering and investigating large partitions present in a cluster: the system.large_partitions table. Sometimes you have such issues without realizing it. It’s useful to be able to see which tables have large partitions and how many of them exist in a cluster. Aside from table name and size, system.large_partitions contains information on the offending partition key, when the compaction that led to the creation of this large partition occurred, and its sstable name (which makes it easy to locate its filename).
The large partition warning threshold defaults to 100MiB, which implies that each larger partition will be registered in the system.large_partitions table the moment it is written, either because of a memtable flush or as a result of compaction. The threshold can be configured with an already existing parameter in scylla.yaml.
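Once a large partition has been flushed or compacted, it can be found with a plain CQL query. The column names below follow the description above, though the exact schema may vary by release:

```sql
SELECT keyspace_name, table_name, partition_size,
       partition_key, sstable_name, compaction_time
FROM system.large_partitions;
```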
Iotune is a storage benchmarking tool that runs as part of the scylla_setup script. Iotune runs a short benchmark on the Scylla storage and uses the results to set the Scylla io_properties.yaml configuration file (formerly called io.conf). Scylla uses these settings to optimize I/O performance, specifically by setting the maximum storage bandwidth and maximum concurrent requests. The new iotune output matches the IO scheduler configuration, is time-limited (2 minutes), and produces more consistent results than the previous version.
Scylla supports Cassandra-style datetime functions.
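For example, the standard CQL datetime functions can be combined in ordinary statements (the table is hypothetical):

```sql
CREATE TABLE app.events (
    id timeuuid PRIMARY KEY,
    created_at timestamp
);

-- now() yields a timeuuid; toTimestamp() converts it to a timestamp
INSERT INTO app.events (id, created_at) VALUES (now(), toTimestamp(now()));

-- Convert back to epoch milliseconds on read
SELECT toUnixTimestamp(created_at) FROM app.events;
```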
Streaming is used during node recovery to populate restored nodes with data replicated from running nodes. The Scylla streaming model reads data on one node, transmits it to another node, and then writes to disk. The sender creates SSTable readers to read the rows from SSTables on disk and sends them over the network. The receiver receives the rows from the network and writes them to a memtable. The rows in memtable are flushed into SSTables periodically or when the memtable is full.
In Scylla Enterprise 2019.1, stream synchronization between nodes bypasses memtables, significantly reducing the time to repair, add and remove nodes. These improvements result in higher performance when there is a change in the cluster topology, improving streaming bandwidth by as much as 240% and reducing the time it takes to perform a “rebuild” operation by 70%.
Scylla’s new streaming improvements provide the following benefits:
- Lower memory consumption. The saved memory can be used to handle your CQL workload instead.
- Better CPU utilization. No CPU cycles are used to insert and sort memtables.
- Bigger SSTables and fewer compactions.
Row-level Cache Eviction
The cache is capable of freeing individual rows to satisfy memory reclamation requests. Rows are freed starting from the least recently used ones, with insertion counting as a use. For example, a time-series workload, which inserts new rows in the front of a partition, will cause eviction from the back of the partition. More recent data in the front of the partition will be kept in cache.
CPU Scheduler and Compaction Controllers
With Scylla’s thread-per-core architecture, many internal workloads are multiplexed on a single thread. These internal workloads include compaction, flushing memtables, serving user reads and writes, and streaming. The CPU scheduler isolates these workloads from each other, preventing, for example, a compaction using all of the CPU and preventing normal read and write traffic from using its fair share. The CPU scheduler complements the I/O scheduler, which solves the same problem for disk I/O. Together, these two are the building blocks for the compaction controller. More on using Control Theory to keep compactions Under Control.
You can read more details about Scylla Enterprise 2019.1 in the Release Notes.
The Last Pickle will be presenting at DataStax Accelerate, May 21-23, 2019, at National Harbor in the US.
DataStax Accelerate is the largest Apache Cassandra conference since the last Apache Cassandra Summit which took place in 2016 in San Jose.
We are very happy to be a bronze sponsor of the conference and will be giving two talks:
Wed May 22nd at 4:40pm in Track 1 - Repair Improvements in Apache Cassandra 4.0 - by Alexander Dejanovski
The Anti-Entropy process used by nodetool repair is the way of ensuring consistency of data on disk. Incremental repair was introduced in Cassandra 2.1 to speed up the operation, but in Cassandra 3.11 it is still subject to bugs that can severely impact the stability of a production cluster. In this talk, we will detail how incremental repair was fixed for the upcoming 4.0 release of Apache Cassandra.
Thu May 23rd at 11:40am in Track 1 - 10 Easy Ways To Tune Your Cassandra Cluster - by Jon Haddad
There’s a direct correlation between database performance and how much it costs to run. If you’re using the default Cassandra and OS configuration, you’re not getting the most out of your cluster. Not even close. In this talk, Jon Haddad, Principal Consultant at The Last Pickle, will show you how to understand where your cluster bottlenecks are, then 10 easy ways to improve its performance and cut costs.
In addition, we will have a booth in the expo hall where you can meet us. Feel free to join us if you want to have demos of the OSS tools we contribute to (tlp-cluster, tlp-stress, Reaper, cstar) or just have a chat about Apache Cassandra.
I feel like this talk is the perfect pairing with my talk, “Lightening A Spark with Machine Learning”. While my talk will focus on the practical “what and how” of machine learning with Apache Spark and Apache Cassandra, Russell’s will focus mostly on the “why”. Russell will cover the best and most accessible use cases for using distributed analytics with Spark and DataStax Analytics.
Russell will also talk about advanced use cases using Spark’s streaming service. I will be sitting in the front row to make sure I get to ask the first question: “How is Spark streaming different than Apache Kafka!?” Russell will also be covering some of the “what and how” that my talk will not be covering, such as using Spark to load data, modify tables, and move data from cluster to cluster. These are the topics I am frequently asked about, so I am excited to finally get my own questions answered!
I cannot wait to hear this talk and I think it’s the perfect pairing to my talk—like peanut butter and jelly! And of course the answer is: “TO SPARK!”
Come check out both our talks at Accelerate!
If you are building modern applications that scale in all sorts of ways, then you’re probably into Apache Cassandra. If you’re attracted to multi-data center replication, fault tolerance, tunable consistency, and the distribution of data across clusters in a masterless architecture, then you are definitely into Cassandra. As a Cassandra enthusiast, you probably need to attend DataStax Accelerate, the world’s premier Cassandra conference, which is taking place May 21–23 near Washington, D.C. This is a great opportunity to spend a couple of focused days learning from others and meeting people just like yourself.
Cassandra has come a long way since it was open sourced more than 10 years ago. I myself have a long history with Cassandra, some of it painful in a “nobody gets it” kind of way. But we all know where the story is headed!
We have collectively pushed through the fear, uncertainty, and doubt to arrive at an interesting point in the history of NoSQL databases and databases in general. The real value behind the transformational technology in Cassandra is leading the migration, intentional or not, to hybrid and multi-cloud computing environments.
Hybrid cloud popularity is building as organizations become more aware of the growing technical debt around data usage and their databases. What’s your cloud strategy? How has that radically changed in the past few years? As of today, it’s hardly ever a bulk migration to one cloud. It’s a mix of old and new, which has put a lot of burden on engineers struggling to make it work.
As an open source project built with scaling in mind, Cassandra is uniquely positioned to be the go-to database of the hybrid cloud future, and DataStax expertise will play a key role in allowing companies to take full advantage of it. We get what you are trying to do and we are here to help.
As a part of that commitment, we are hosting a gathering place for the community of engineers trying to build that future. At Accelerate, we’re looking forward to bringing together all kinds of Cassandra enthusiasts to meet one another, share ideas, and learn how some of today’s leading enterprises are using Cassandra to change the world.
You do not want to miss this event. Just take a quick peek at some Accelerate sessions to give you an idea of what to expect:
Learn how Yahoo! Japan uses Cassandra at scale, spinning up as many as 5,000 servers at any time. Yahoo! Japan has been a driving force for Cassandra in Japan and hosts many awesome community events. They bring a deep expertise you won’t want to miss.
Apache Cassandra 4.0 is almost here! This session explores the new features and performance improvements you can expect, as well as the thinking that inspired the architecture of the release. There is no better source of 4.0 information than Cassandra committer Dinesh Joshi.
Are you getting the most out of your Cassandra cluster? Probably not. Discover how to pinpoint bottlenecks and learn 10 simple ways to boost performance while cutting costs. This is a talk by Jon Haddad, who has been a leader in Cassandra performance for years. You may see him on the user mailing list talking about this topic often. Here is your chance to meet him in person and get some firsthand knowledge.
Instagram has been using Cassandra for years, and their deployment is still growing fast. Learn how the company has worked to improve its infrastructure over the years, increasing the efficiency and reliability of their clusters along the way.
Modern companies are connecting Cassandra and Kafka to stream data from microservices, databases, IoT events, and other critical systems to answer real-time questions and make better decisions. Find out why Cassandra and Kafka should be core components of any modern data architecture.
Everything is an event. To speed up applications, you need to think reactive. The DataStax team has been developing Cassandra drivers for years, and in our latest version of the enterprise driver, we introduced reactive programming. Learn how to migrate a CRUD Java service into reactive and bring home a working project.
See how FamilySearch uses DataStax Enterprise to solve large-scale family history problems, handling three unique workloads in production.
And that’s just the tip of the iceberg. Check out other Cassandra sessions you won’t want to miss here.
See you in D.C.!
In this article, we will compare Scylla Cloud and Google Cloud Bigtable, two different managed solutions. The TL;DR is the following: We show that Scylla Cloud is 1/5th the cost of Cloud Bigtable under optimal conditions (perfect uniform distribution) and that when applied with real-world, unoptimized data distribution, Scylla performs 26x better than Cloud Bigtable. You’ll see that Scylla manages to maintain its SLA while Cloud Bigtable fails to do so. Finally, we investigate the worst case for both offerings, where a single hot row is queried. In this case, both databases fail to meet the 90,000 ops SLA but Scylla processes 195x more requests than Cloud Bigtable.
In this benchmark study we will simulate a scenario in which the user has a predetermined workload with the following SLA: 90,000 operations per second (ops), half of those being reads and half being updates, with the need to survive zone failures. We set business requirements so that 95% of the reads need to be completed in 10ms or less, and we will determine the minimum cost of running such a cluster in each of the offerings.
A Nod to Our Lineage
Competing against Cloud Bigtable is a pivotal moment for us, as Scylla is, in a way, also a descendant of Bigtable. Scylla was designed to be a drop-in-replacement for Cassandra, which, in turn, was inspired by the original Bigtable and Dynamo papers. In fact, Cassandra is described as the “daughter” of Dynamo and Bigtable. We’ve already done a comparison of Scylla versus Amazon DynamoDB. Now, in this competitive benchmark, Scylla is tackling Google Cloud Bigtable, the commercially available version of Bigtable, the database that is still used internally at Google to power their apps and services. You can see the full “family tree” in Figure 1 below.
Figure 1: The “Family Tree” for Scylla.
Our goal was to perform 90,000 operations per second in the cluster (50% updates, 50% reads) while keeping read latencies at 10ms or lower for 95% of the requests. We want the cluster to be present in three different zones, leading to higher availability and lower, local latencies. This is important not only to protect against entire-zone failures, which are rare, but also to reduce latency spikes and timeouts of Bigtable. This article does a good job of describing how Cloud Bigtable zone maintenance jobs can impact latencies for the application. Both offerings allow for tunable consistency settings, and we use eventual consistency.
For Google Cloud Bigtable, increasing the replication factor means adding replication clusters. Cloud Bigtable's guidance claims that each node (per replica cluster) should be able to do 10,000 operations per second at 6ms latencies, although it does not specify at which percentile and rightfully makes the disclaimer that those numbers are workload-dependent. Still, we use them as a basis for analysis and will start our tests by provisioning three clusters of 9 nodes each. We will then increase the total size of the deployment by adding a node to each cluster until the desired SLAs are met.
For Scylla, we will select a 3-node cluster of AWS i3.4xlarge instances. The selection is based on ScyllaDB’s recommendation of leaving 50% free space per instance and the fact that this is the smallest instance capable of holding the data generated in the population phase of the benchmark— each i3.4xlarge can store a total of 3.8TB of data and should be able to comfortably hold 1TB at 50% utilization. We will follow the same procedure as Cloud Bigtable and keep adding nodes (one per zone) until our SLA is met.
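The provisioning arithmetic above can be sketched as a quick back-of-envelope calculation. This assumes Bigtable's published figure of roughly 10,000 ops/s per node, which, as noted, is workload-dependent:

```python
import math

# Back-of-envelope sizing, assuming ~10,000 ops/s per Cloud Bigtable node
# (the published figure cited above); real capacity is workload-dependent.
def nodes_per_cluster(target_ops: int, ops_per_node: int = 10_000) -> int:
    return math.ceil(target_ops / ops_per_node)

per_cluster = nodes_per_cluster(90_000)  # starting point: 9 nodes per cluster
total_nodes = per_cluster * 3            # 3 replica clusters -> 27 nodes
print(per_cluster, total_nodes)          # 9 27
```

This is only the starting point; as the results below show, the actual node count needed to meet the SLA turned out to be higher.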
The results are summarized in Table 1. As we can see, Cloud Bigtable is not able to meet the desired 90,000 operations per second with the initial setup of 9 nodes per cluster. We report the latencies for completeness, but they are uninteresting since the cluster was clearly bottlenecked and operating over capacity. During this run, we can verify that this is indeed the case by looking at Cloud Bigtable’s monitoring dashboard. Figure 2 shows the average CPU utilization across the nodes, already well above the recommended threshold of 70%. Figure 3 shows CPU utilization at the hottest node, which is already at its limit.
Google Cloud Bigtable is still unable to meet the desired amount of operations with clusters of 10 nodes, and is finally able to do so with 11 nodes. However, the 95th percentile for reads is above the desired goal of 10 ms so we take an extra step. With clusters of 12 nodes each, Cloud Bigtable is finally able to achieve the desired SLA.
Scylla Cloud had no issues meeting the SLA at its first attempt.
The above was actually a fast-forward version of what we encountered. Originally, we didn’t start with the perfect uniform distribution but with the more real-life-like Zipfian distribution. Over and over, we received only 3,000 operations per second instead of the desired 90,000. We thought something was wrong with the test until we cross-checked everything and switched to uniform distribution testing. Since Zipfian test results mimic real-life behavior, we ran additional tests and received the same original poor result (as described in the Zipfian section below).
For uniform distribution, the results are shown in Table 1, and the average and hottest node CPU utilizations are shown just below in Figures 2 and 3, respectively.
| |OPS|Maximum Latency P95 (microseconds)|Cost per replica/AZ per year ($USD)|Total Cost for 3 replicas/AZ per year ($USD)|
|Cloud Bigtable, 9 nodes per cluster (total 27 nodes)| | | | |
|Cloud Bigtable, 10 nodes per cluster (total 30 nodes)| | | | |
|Cloud Bigtable, 11 nodes per cluster (total 33 nodes)| | | | |
|Cloud Bigtable, 12 nodes per cluster (total 36 nodes)| | | | |
|Scylla Cloud, 1 node per AZ (total 3 nodes)| | | | |
Table 1: Scylla Cloud is able to meet the desired SLA with just one instance per zone (for a total of three). For Cloud Bigtable 12 instances per cluster (total of 36) are needed to meet the performance characteristics of our workload. For both Scylla Cloud and Cloud Bigtable, costs exclude network transfers. Cloud Bigtable price was obtained from Google Calculator and used as-is, and for Scylla Cloud prices were obtained from the official pricing page, with the network and backup rates subtracted. For Scylla Cloud, the price doesn’t vary with the amount of data stored up to the instance’s limit. For Cloud Bigtable, price depends on data that is actually stored up until the instance’s maximum. We use 1TB in the calculations in this table.
Figure 2: Average CPU load on a 3-cluster 9-node Cloud Bigtable instance, running Workload A
Figure 3: Hottest node CPU load on a 3-cluster 9-node Cloud Bigtable instance, running Workload A
Behavior under Real-life, Non-uniform Workloads
Both Scylla Cloud and Cloud Bigtable behave best under uniform data and request distribution, and all users are advised to strive for that. But no matter how much work is put into guaranteeing good partitioning, real-life workloads often deviate from it, whether permanently or temporarily, and that affects performance in practice.
For example, a user profile application can see patterns in time where groups of users are more active than others. An IoT application tracking data for sensors can have sensors that end up accumulating more data than others, or having time periods in which data gets clustered. A famous case is the dress that broke the internet, where a lively discussion among tens of millions of Internet users about the color of a particular dress led to issues in handling traffic for the underlying database.
In this section, we will keep the cluster size determined in the previous phase constant and study how both offerings behave under such scenarios.
To simulate real-world conditions, we changed the request distribution in the YCSB loaders to a Zipfian distribution. We have kept all other parameters the same, so the loaders are still trying to send the same 90,000 requests per second (with 50% reads, 50% updates).
Zipfian distribution was originally used to describe word frequency in languages. However, this distribution curve, known as Zipf’s Law, has also shown correlation to many other real-life scenarios. It can often indicate a form of “rich-get-richer” self-reinforcing algorithm, such as a bandwagon or network effect, where the distribution of results is heavily skewed and disproportionately weighted. For instance, in searching over time, once a certain search result becomes popular, more people click on it, and thus, it becomes an increasingly popular search result. Examples include the number of “likes” different NBA teams have on social media, as shown in Figure 4, or the activity distribution among users of the Quora website.
When these sort of result skews occur in a database, it can lead to incredibly unequal access to the database and, resultantly, poor performance. For database testing, this means we’ll have keys randomly accessed in a heavy-focused distribution pattern that allows us to visualize how the database in question handles hotspot scenarios.
Figure 4: The number of “likes” NBA teams get on Facebook follows a Zipfian distribution.
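To make the skew concrete, here is a small pure-Python sketch of Zipfian sampling. The key count, exponent, and request count are arbitrary illustrative choices, not the benchmark's parameters:

```python
import random
from collections import Counter

# Sample 100,000 requests over 1,000 keys with Zipfian popularity (s = 1.0).
def zipf_weights(n: int, s: float = 1.0) -> list:
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]

random.seed(42)
hits = Counter(random.choices(range(1_000),
                              weights=zipf_weights(1_000), k=100_000))

# The "rich-get-richer" skew: the 10 hottest keys draw a large share of traffic.
top10_share = sum(count for _, count in hits.most_common(10)) / 100_000
print(f"top 10 of 1,000 keys receive {top10_share:.0%} of all requests")
```

With these parameters, roughly 1% of the keys attract well over a third of all requests, which is exactly the kind of hotspotting the benchmark below exercises.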
The results of our Zipfian distribution test are summarized in Table 2. Cloud Bigtable is not able to sustain the 90,000 requests per second that the clients send; it can process only 3,450 requests per second. Scylla Cloud, on the other hand, actually gets slightly better latencies. This result is surprising at first and deserves a deeper explanation, but to understand why, it will be helpful to first look at our next test scenario, a single hot row. We’ll then address these surprising results in the section “Why Did Zipfian Latencies Go Down?”.
| |Throughput (OPS)|p95 read latency (ms)|p95 update latency (ms)|
|Google Cloud Bigtable|3,450|1,601|122|
|Scylla Cloud|90,000|3|1|
Table 2: Results of Scylla Cloud and Cloud Bigtable under the Zipfian distribution. Cloud Bigtable is not able to sustain the full 90,000 requests per second. Scylla Cloud sustained 26x the throughput, with read latencies over 500x lower and update latencies over 100x lower than Cloud Bigtable’s.
A Few Words about Consistency/Cost
Scylla Cloud leverages instance-local ephemeral storage for its data, meaning parts of the dataset will not survive hardware failures in the case of a single replica. Running a single replica is therefore not acceptable for data durability reasons, and it isn’t even a choice in Scylla Cloud’s interface.
Due to its choice of network storage, Cloud Bigtable, on the other hand, does not lose any local data when an individual node fails, meaning it is reasonable to run it without any replicas. Still, if an entire zone fails, service availability will be disrupted. Databases also have to undergo node-local maintenance under the hood, which can temporarily disrupt availability in single-replica setups, as this article does a good job of explaining.
Still, it’s fair to say that not all users require such a high level of availability and could run single-zone Cloud Bigtable clusters. These users would be forced to run more zones in Scylla Cloud in order to withstand node failure without data loss, even if availability is not a concern. However, even if we compare the total Scylla Cloud cost of $44,640.00 per year with the cost of running Cloud Bigtable in a single zone (sacrificing availability) of $70,416.96, or $140,833.92 for two zones, Scylla Cloud is still a fraction of the cost of Cloud Bigtable.
Note that we did not conduct tests to see precisely how many Cloud Bigtable nodes would be needed to achieve the same required 90,000 ops/second throughput under Zipfian distribution. We had already scaled Cloud Bigtable to 36 nodes (versus the 3 for Scylla) to simply achieve 90,000 ops/second under uniform distribution at the required latencies. However, Cloud Bigtable’s throughput under Zipfian distribution was 1/26th that of Scylla.
Theoretically, presuming Cloud Bigtable throughput scaling continued linearly under a Zipfian distribution scenario, it would have required over 300 nodes (as a single replica/AZ) to more than 900 nodes (triple-replicated) to achieve the 90,000 ops the SLA required. The cost for a single-replicated cluster of that scale would have been approximately $1.8 million annually, or nearly $5.5 million if triple-replicated. That would be 41x or 123x the cost of Scylla, respectively. Presuming a middle-ground situation where Cloud Bigtable was deployed in two availability zones, the cost would be approximately $3.6 million annually, 82x the cost of Scylla Cloud running triple-replicated. We understand these are only hypothetical projections and welcome others to run these Zipfian tests themselves to see how many nodes would be required to meet the 90,000 ops/second test requirement on Cloud Bigtable.
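The hypothetical projection above follows directly from the measured numbers, under the stated (and generous) assumption of linear scaling:

```python
# Assuming (generously) linear scaling, project the Cloud Bigtable node count
# needed to serve 90,000 ops/s under the Zipfian workload. Purely hypothetical,
# as the text cautions; no cluster of this size was actually tested.
measured_ops = 3_450     # achieved by the 36-node, triple-replicated deployment
target_ops = 90_000
scale = target_ops / measured_ops          # roughly 26x

single_replica_nodes = round(12 * scale)   # one 12-node cluster, scaled up
triple_replica_nodes = round(36 * scale)   # all three clusters, scaled up
print(single_replica_nodes, triple_replica_nodes)  # 313 939
```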
A Single Hot Row
We want to understand how each offering behaves under the extreme case of a single hot row being accessed over a certain period of time. To do that, we kept the same YCSB parameters, meaning the client still tries to send 90,000 requests per second, but set the key population of the workload to a single row. Results are summarized in Table 3.
|Single Hot Row|Throughput (OPS)|p95 read latency (ms)|p95 update latency (ms)|
|Google Cloud Bigtable|180|7,733|5,365|
|Scylla Cloud|35,400|73|13|
Table 3: Scylla Cloud and Cloud Bigtable results when accessing a single hot row. Both databases see their throughput reduced for this worst-case scenario, but Scylla Cloud is still able to achieve 35,400 requests per second. Given the data anomaly, neither Scylla Cloud nor Cloud Bigtable were able to meet the SLA, but Scylla was able to perform much better, with nearly 200 times Cloud Bigtable’s anemic throughput and multi-second latency. This Scylla environment is 1/5th the cost of Cloud Bigtable.
We see that Cloud Bigtable is capable of processing only 180 requests per second, and the latencies, not surprisingly, shoot up. As we can see in Figure 5, from Cloud Bigtable monitoring, although the overall CPU utilization is low, the hottest node is already bottlenecked.
Figure 5: While handling a single hot row, Cloud Bigtable shows its hottest node at 100% utilization even though overall utilization is low.
Scylla Cloud is also not capable of processing requests at the rate the client sends. However, its rate drops to a baseline of 35,400 requests per second.
This result clearly shows that Scylla Cloud has a far more efficient engine underneath. While higher throughput can always be achieved by stacking processing units side by side, a more efficient engine guarantees better behavior during worst-case scenarios.
Why Did Zipfian Latencies Go Down?
We draw the reader’s attention to the fact that while the uniform distribution provides the best-case scenario for resource distribution, if the dataset is larger than memory (which it clearly is in our case), it also provides the worst-case scenario for the database’s internal cache, as most requests have to be fetched from disk.
In the Zipfian distribution case, the number of keys being queried is reduced, the database’s ability to cache those values improves, and, as a result, latencies get better. Serving requests from cache is not only cheaper in general but also easier on the CPU.
With each request being cheaper due to their placement in the cache, the number of requests that each processing unit can serve also increases (note that the Single Hot Row scenario provides the best case for cache access, with a 100% hit rate). As a combination of those two effects, the busiest CPU in the Scylla Cloud nodes is working less than in the uniform case and is actually further away from the bottleneck than it was in the uniform case.
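A toy LRU simulation illustrates the effect described above. The keyspace size, cache size, and Zipf exponent are illustrative assumptions, not the benchmark's values:

```python
import random
from collections import OrderedDict

# Hit rate of an LRU cache holding 10% of the keyspace, under two access patterns.
def hit_rate(accesses, cache_size):
    cache, hits = OrderedDict(), 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(accesses)

random.seed(7)
n_keys, n_ops, cache_size = 10_000, 50_000, 1_000
uniform = [random.randrange(n_keys) for _ in range(n_ops)]
zipfian = random.choices(range(n_keys),
                         weights=[1 / r for r in range(1, n_keys + 1)], k=n_ops)

print(f"uniform: {hit_rate(uniform, cache_size):.0%}, "
      f"zipfian: {hit_rate(zipfian, cache_size):.0%}")
```

Under the uniform pattern, the hit rate hovers near the cache-to-keyspace ratio (about 10%), while the Zipfian pattern is served overwhelmingly from cache, which is why each request gets cheaper.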
We don’t mean to imply that Scylla Cloud will always perform better in Zipfian cases than uniform. The results will depend on which set of many, many real-life competing factors wins. But we do believe that our results for both the Zipfian and Single Hot Row scenario clearly show that Scylla Cloud performs better under real-life conditions than the Cloud Bigtable offering.
Optimize Your Data Model
All distributed databases prefer near-perfect access distribution. In some cases, like the hot row case, that may be impossible, and some cases are hard to plan for. However, it is possible to optimize your data model by making the primary key a composite key, adding another column to it. For instance, if your primary key is a customer ID, you can make a composite key of the ID and another field: location, date, or sub-department. If your key is an IoT device, add the date to it at a year/month/day/hour granularity of your choice.
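As a sketch of the idea, an IoT device key can be bucketed by day. The key format and helper below are hypothetical, not a Scylla or Bigtable API:

```python
from datetime import datetime, timezone

# Hypothetical composite-key helper: split a potentially hot device key into
# per-day buckets so writes spread across partitions. Day granularity is an
# illustrative choice; hotter keys warrant finer buckets (e.g. per hour).
def composite_key(device_id: str, ts: datetime) -> str:
    return f"{device_id}#{ts:%Y/%m/%d}"

ts = datetime(2019, 6, 28, 14, 30, tzinfo=timezone.utc)
print(composite_key("sensor-17", ts))  # sensor-17#2019/06/28
```

The trade-off is the one described next: reading a month of one device's data now means scanning roughly thirty keys instead of one.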
Data model optimization comes at a price beyond the development time: when you need to scan all of the items that belong to a customer ID, or all of the events from an IoT device, you will need to scan multiple composite keys, which increases read times, reduces consistency, and complicates development.
Of course, it is much better when the database can handle these complexities on your behalf!
When both Scylla Cloud and Google Cloud Bigtable are exposed to a synthetic, well-behaved, uniform lab workload, Scylla Cloud is 1/5th the cost of Cloud Bigtable. However, under the more likely real-world Zipfian distribution of data, which we see in our own customer experience as well as in academic research and empirical validation, Cloud Bigtable comes nowhere near meeting the desired SLA and is therefore not practical.
To recap our results, Scylla Cloud is 1/5th the expense while providing 26x the performance of Cloud Bigtable, with better latency and better behavior in the face of hot rows. Scylla met its SLA of 90,000 OPS with low-latency response times, while Bigtable couldn’t do more than 3,450 OPS. As an exercise for the reader, try to execute or compute the total cost of ownership of a real-life Bigtable workload at 90,000 OPS. Would it be 5x (the Scylla savings) multiplied by 26x? Even if you decrease the Bigtable replication to two zones or even a single zone, the difference is enormous.
We explained how to optimize the data model with composite keys as a workaround for Bigtable’s limitation. However, that approach requires development time and makes scans more expensive, less consistent, and more complicated. And you may still hit a hot row, which cannot return more than 180 requests per second with Bigtable.
As you can see, Scylla has a notable advantage on every metric in every test we conducted. We invite you to judge how much better it would be in a scenario similar to your own.
Cloud Bigtable is an offering available exclusively on the Google Cloud Platform (GCP), which locks the user in to Google as both the database provider and infrastructure service provider — in this case, to its own public cloud offering.
Scylla Cloud is a managed offering of the Scylla Enterprise database. It is available on Amazon Web Services (AWS) at the moment, and is soon coming to GCP and other public clouds. Beyond Scylla Cloud, Scylla also offers Open Source and Enterprise versions, which can be deployed to private clouds, on-premises, or co-location environments. While Scylla Cloud is ScyllaDB’s own public cloud offering, Scylla provides greater flexibility and does not lock users in to any particular deployment option.
Yet while vendors like us can make claims, the best judge of performance is your own assessment based on your specific use case. Give us a try on Scylla Cloud and please feel free to drop in to ask questions via our Slack channel.
In the following sections, we will provide a detailed discussion of what is behind the choices made in this comparison.
Google Cloud Bigtable Replication Settings and Test Setup
In Cloud Bigtable terminology, replication is achieved by adding additional clusters in different zones. To withstand the loss of two of them, we set three replicas, as shown below in Figure 7:
Figure 7: Example setting of Google Cloud Bigtable in three zones. A single zone is enough to guarantee that node failures will not lead to data loss, but is not enough to guarantee service availability/data-survivability if the zone is down.
Cloud Bigtable allows for different consistency models, as described in its documentation, according to the cluster routing model. By default, data is eventually consistent and multi-cluster routing is used, which makes for the highest availability of data. Since eventual consistency is our goal, we kept Cloud Bigtable’s settings at their defaults, so multi-cluster routing is used. Figure 8 shows the explanation of routing that can be seen in Cloud Bigtable’s routing selection interface.
Figure 8: Google Cloud Bigtable settings for availability. In this test, we want to guarantee maximum availability across three zones.
We verified that the cluster is, indeed, set up in this mode, as can be seen in Figure 9 below:
Figure 9: Google Cloud Bigtable is configured in multi-cluster mode.
Scylla Cloud Replication Settings and Test Setup
For Scylla, consistency is set on a per-request basis and is not a cluster-wide property, as described in Scylla’s documentation (referred to as “tunable consistency”). Unlike Cloud Bigtable, in Scylla’s terminology all nodes are part of the same cluster. Replication across availability zones is achieved by adding nodes in different racks and setting the replication factor to match. We set up the Scylla cluster in a single datacenter (us-east-1) and set the number of replicas to three (RF=3), placing them in three different availability zones within the AWS us-east-1 region. This setup can be seen in Figure 10 below:
Figure 10: Scylla Cloud will be set up in 3 availability zones. To guarantee data durability, Scylla Cloud requires replicas, but that also means that any Scylla Cloud setup is able to maintain availability of service when availability zones go down.
Eventual consistency is achieved by setting the consistency of the requests to LOCAL_ONE for both reads and writes. This will cause Scylla to acknowledge write requests as successful when one of the replicas respond, and serve reads from a single replica. Mechanisms such as hinted handoff and periodic repairs are used to guarantee that data will eventually be consistent among all replicas.
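The behavior can be sketched with a toy in-memory model. This is purely illustrative, not driver code: a write is acknowledged after one replica stores it, a read from another replica can be stale, and a repair step converges them:

```python
# Toy model of LOCAL_ONE writes/reads with eventual consistency. Purely
# illustrative; real hinted handoff and repair are far more involved.
class ReplicaSet:
    def __init__(self, n: int = 3):
        self.replicas = [{} for _ in range(n)]

    def write_local_one(self, key, value, coordinator: int = 0):
        self.replicas[coordinator][key] = value  # ack once one replica responds
        return "acked"

    def read_local_one(self, key, replica: int):
        return self.replicas[replica].get(key)   # single-replica read, may be stale

    def repair(self, key):
        value = next(r[key] for r in self.replicas if key in r)
        for r in self.replicas:
            r[key] = value                       # repair / hinted-handoff analogue

rs = ReplicaSet()
rs.write_local_one("user1", "v1")
print(rs.read_local_one("user1", 0))  # v1 (the replica that took the write)
print(rs.read_local_one("user1", 2))  # None (stale until repair runs)
rs.repair("user1")
print(rs.read_local_one("user1", 2))  # v1
```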
We used the YCSB benchmark running across multiple zones in the same region as the cluster. To achieve the desired distribution of 50% reads and 50% updates, we use YCSB’s Workload A, which is pre-configured with that ratio. We keep the workload’s default record size (1kB) and load one billion records, enough to generate approximately 1TB of data on each database.
We then run the load for a total of 1.5 hours. At the end of the run, YCSB produces a latency report that includes the 95th-percentile latency for each client. Aggregating percentiles is a challenge in its own right. Throughout this report, we use the highest 95th percentile reported by any client as our number. While we understand this is not the “real” 95th percentile of the distribution, it maps well to a real-world situation where the clients are independent and we want to guarantee that no client sees a percentile higher than the desired SLA.
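The aggregation rule described above, computing each client's own p95 and then reporting the worst one, can be sketched as follows. The latency samples are made up, and the nearest-rank percentile method is an illustrative choice:

```python
import math

# Nearest-rank p95: the smallest sample that is >= 95% of all samples.
def p95(samples):
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

# Hypothetical per-client latency samples in milliseconds.
clients = {
    "client-1": [4, 5, 6, 7, 30],
    "client-2": [3, 4, 5, 6, 9],
}
# Report the worst client's p95 rather than merging all raw samples.
reported = max(p95(samples) for samples in clients.values())
print(reported)  # 30
```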
We will start the SLA investigation by using a uniform distribution, since this guarantees good data distribution across all processing units of each database while being adversarial to caching. This allows us to make sure that both databases are exercising their I/O subsystems and not relying only on in-memory behavior.
1. Google Cloud Bigtable Clients
Our instance has 3 clusters, all in the same region. We spawned 12 small (4 CPUs, 8GB RAM) GCE machines, 4 in each of the 3 zones where the Cloud Bigtable clusters are located. Each machine ran one YCSB client with 50 threads and a target of 7,500 ops per second, for a total of 90,000 operations per second. The command used is:
~/YCSB/bin/ycsb run googlebigtable -p requestdistribution=uniform -p columnfamily=cf -p recordcount=1000000000 -p operationcount=125000000 -p maxexecutiontime=5400 -s -P ~/YCSB/workloads/$WORKLOAD -threads 50 -target 7500
2. Scylla Cloud Clients
Our Scylla cluster has 3 nodes spread across 3 different Availability Zones. We spawned 12 c5.xlarge machines, 4 in each AZ, running one YCSB client with 50 threads and a target of 7,500 ops per second; for a total of 90,000 ops per second. For this benchmark, we used a YCSB version that handles prepared statements, which means all queries will be compiled only once and then reused. We also used the Scylla-optimized Java driver. Although Scylla is compatible with Apache Cassandra drivers, it ships with optimized drivers that increase performance through Scylla-specific features.
~/YCSB/bin/ycsb run cassandra-cql -p hosts=$HOSTS -p recordcount=1000000000 -p operationcount=125000000 -p cassandra.readconsistencylevel=LOCAL_ONE -p maxexecutiontime=5400 -s -P workloads/$WORKLOAD -threads 50 -target 7500
Availability and Consistency Models
When comparing different database offerings it is important to make sure they are both providing similar data placement availability and consistency guarantees. Stronger consistency and higher availability are often available but come at a cost, so benchmarks should take this into consideration.
Both Cloud Bigtable and Scylla have flexible settings for replication and consistency options, so the first step is to understand how those are set for each offering. Cloud Bigtable does not lose any local data when an individual node fails, meaning it is reasonable to run it without any replicas. Still, nothing will save you from an entire-zone failure, so for disaster recovery it is highly recommended to have at least one more availability zone.
Scylla Cloud, on the other hand, utilizes instance-local ephemeral storage for its data, meaning local-node data will not survive hardware failures. This means that running a single replica is not acceptable for data durability reasons, but as a nice side effect, any standard Scylla Cloud setup is already replicated across availability zones, and the service will be kept available in the face of availability-zone failures.
To properly compare such different solutions, we will start with a user-driven set of requirements and will compare the cost of both solutions. We will simulate a scenario in which the user wants the data available in three different zones, with all zones in the same region. This ensures that all communications are still low latency but can withstand failure of up to two zones without becoming unavailable as long as the region is still available.
Across the different zones, we will assume a scenario in which the user aims for eventual consistency. This should lead to the best-performing setup possible for both offerings, given the restriction of maintaining three replicas that we imposed above.
The post Going Head-to-Head: Scylla Cloud vs. Google Cloud Bigtable appeared first on ScyllaDB.
Got a Real-Time Big Data Story to Share?
Scylla Summit 2019 is coming to San Francisco this November 5-6. Our sessions will cover real-world Scylla use cases, roadmap plans, technical product demonstrations, integrations with NoSQL and other Big Data ecosystem components, training sessions, the latest Scylla news, and much more.
Everyone has a story about their journey to find the best database solution for their specific use case. What’s yours? Are you a developer, architect, designer, or product manager leveraging the close-to-the-hardware design of Scylla or Seastar? Are you an executive, entrepreneur, or innovator faced with navigating the impact of NoSQL databases on your organization? If so, come and share your experiences to help build community and your own profile within it.
Best of all, speakers receive complimentary admission to both the 2-day summit and the pre-summit training!
What Our Attendees Most Want to Hear
Our user and customer talks are the key aspect of Scylla Summit! We’re looking for compelling case studies, technical sessions, technical and organizational best practices, and more. See below for a list of suggested topics, and feel free to recommend others. The deadline for submissions is Friday, June 28th, 2019.
Real-world Use Cases
Everything from design and architectural considerations to POCs, to production deployment and operational lessons learned, including…
- Practical examples for vertical markets — Ad Tech, FinTech, fraud detection, customer/user data, e-commerce, media/multimedia, social, B2B or B2C apps, IoT, and more.
- Migrations — How did you get from where you were in the past to where you are today? What were the limitations of other systems you needed to overcome? How did Scylla address those issues?
- Hard numbers — Clusters, nodes, CPUs, RAM and disk, data size, growth, OPS, throughput, latencies, benchmark and stress test results (think graphs, charts and tables)
- War stories — What were the hardest Big Data challenges you faced? How did you solve them? What lessons can you pass along?
Tips & Tricks
From getting started to performance optimization to disaster management, tell us your devops secrets, or unleash your chaos monkey!
- Computer languages and dev environments — What are your favorite languages and tools? Are you a Pythonista? Doing something interesting in Golang or Rust?
- Is this Open Source? — Got a Github repo to share? Our attendees would love to walk your code.
- Integrations into Big Data ecosystem — Share your stack! Kafka, Spark, other SQL and NoSQL systems; time series, graph, in-memory integrations (think “block diagrams”).
- Seastar — The Seastar infrastructure, which is the heart of our Scylla database, can be used for other projects as well. What sort of systems architecture challenges are you tackling with Seastar?
Event Details and Call for Speakers Schedule
- Dates: Tuesday-Wednesday, November 5-6, 2019 (Pre-Summit Training Day on November 4)
- Location: Parc 55 Hotel, 55 Cyril Magnin Street, San Francisco, CA
- Speaker Submission Deadline: Friday, June 28th, 2019, 5:00 PM
- Speakers receive complimentary admission to the 2-day summit and 1-day pre-summit training (on Monday, November 4, 2019)
- Code of Conduct
You’ll be asked to include the following information in your proposal:
- Proposed title
- Suggested main topic
- Audience information
- Who is this presentation for?
- What will the audience take away?
- What prerequisite knowledge would they need?
- Videos or live demos included in presentation?
- Length of the presentation (30-45 minutes recommended)
Eight Tips for a Successful Speaker Proposal
Help us understand why your presentation is the right one for Scylla Summit 2019. Please keep in mind this event is made by and for deeply technical professionals. All presentations and supporting materials must be respectful and inclusive.
- Be authentic — Your peers need original ideas with real-world scenarios, relevant examples, and knowledge transfer
- Be catchy — Give your proposal a simple and straightforward but catchy title
- Be interesting — Make sure the subject will be of interest to others; explain why people will want to attend and what they’ll take away from it
- Be complete — Include as much detail about the presentation as possible
- Don’t be “pitchy” — Keep proposals free of marketing and sales. We tend to ignore proposals submitted by PR agencies and require that we can reach the suggested participant directly.
- Be understandable — While you can certainly cite industry terms, try to write a jargon-free proposal that contains clear value for attendees
- Be deliverable — Sessions have a fixed length, and you will not be able to cover everything. The best sessions are concise and focused. Overviews aren’t great in this format; the narrower your topic is, the deeper you can dive into it, giving the audience more to take home
- Be cautious — Live demos sometimes don’t go as planned, so we don’t recommend them
Super Early Bird Registration for Scylla Summit 2019 is open!
Anomalia Machina 10 – Final Results: Massively Scalable Anomaly Detection with Apache Kafka and Cassandra
In this tenth and final blog of the Anomalia Machina series, we tune the anomaly detection system and succeed in scaling the application out from 3 to 48 Cassandra nodes, achieving some impressive numbers: 574 CPU cores (across the Cassandra, Kafka, and Kubernetes clusters), 2.3 million writes/s into Kafka (peak), and 220,000 anomaly checks per second (sustainable), which is a massive 19 billion anomaly checks a day.
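The headline daily figure follows directly from the sustained rate:

```python
# Sanity check of the headline numbers: 220,000 anomaly checks per second,
# sustained around the clock.
checks_per_second = 220_000
checks_per_day = checks_per_second * 60 * 60 * 24
print(f"{checks_per_day:,}")  # 19,008,000,000 -- roughly 19 billion a day
```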
1. The Scaling Journey
Odysseus’s final challenge was to regain his throne! Odysseus finally reached his homeland of Ithaca only to find his palace overrun with a crowd of 108 suitors who were drinking his wine, slaughtering his cattle, and courting his wife, Penelope. If he had rushed in without thinking, it would probably have ended badly. Instead, disguised as a beggar, he planned to recapture his throne. Penelope announced to the suitors that she would marry the man who could string the bow of Odysseus and shoot an arrow through 12 axes placed in a row. The suitors all tried and failed. When the “beggar” tried, he accomplished the feat! Throwing off his disguise, Odysseus fought and eliminated all the suitors.
Likewise, rather than attempting to jump straight to the end of the story and run the Anomalia Machina application at massive scale, we thought it prudent to scale it out on increasingly bigger clusters. The aim was to (a) discover how to scale it before committing a larger amount of resources, and (b) document how well it scales with increasing cluster sizes.
Odysseus eliminates Penelope’s suitors
For the initial scalability testing, I simplified the approach to focus on Cassandra scalability and application tuning on my Kubernetes cluster. To do this I used a small production Kafka cluster (3 nodes with 4 cores each) to “replay” the same events that I had previously sent to it. Event reprocessing is a rich use case for Kafka that we explored in the blog Exploring the Apache Kafka “Castle” Part B: Event Reprocessing. Kafka consumers also place a very low load on Kafka clusters compared with producers, so this ensured that Kafka was not a bottleneck and the results were repeatable as I scaled the rest of the system.
To get the Anomalia Machina application ready to scale there were a few things I improved from the previous blog. Given the likely increase in the number of Pods to monitor with Prometheus, the previous approach of running the Prometheus server on a laptop and manually adding each Pod IP address to the configuration file was no longer workable. To fix this I deployed Prometheus to the Kubernetes cluster and automated Pod monitoring using the Prometheus Operator.
Prometheus Kubernetes Operator
The first step is to install the Prometheus Operator and get it running. I did this by copying the yaml file:
to my local machine, and running:
kubectl apply -f bundle.yaml
The Prometheus operator works by dynamically detecting and monitoring Pods with labels that match a selector. Some assembly is required.
- Service Objects monitor Pods (with labels that match the selector)
- ServiceMonitors discover Service Objects
- Prometheus objects specify which ServiceMonitors should be included
- To access the Prometheus instance it must be exposed to the outside world (e.g. by using a Service of type NodePort)
- When it doesn’t work the first time, you most likely need to create Role-based access control rules for both Prometheus and Prometheus Operator
See this documentation for all the steps and examples.
But with Prometheus now running in the Kubernetes cluster, it was no longer easy for it to monitor the Kafka load generator (the Kafka producer application) running on standalone AWS EC2 instances. I also wanted to ensure sufficient resources for the Kafka producer, so an obvious solution was to deploy it into the Kubernetes cluster as well. This turned out to be easy: just a simple copy/edit of the existing Kubernetes deployment artefacts to create a new Deployment type from the producer jar file. The Kafka producer load can now be easily increased by scaling the number of Pods. This also enables unified monitoring of both the producer application and the detector pipeline application by the Prometheus Operator. We’re now ready to continue with the scaling journey.
Pre-tuning: “La Jamais Contente”, first automobile to reach 100 km/h in 1899 (electric, 68hp)
There are a few knobs to twiddle to tune the anomaly detector pipeline part of the application. Each Pod has 2 Kafka consumers (in a thread pool) reading events from the Kafka cluster and another thread pool which performs Cassandra writes and reads and runs the anomaly detection algorithm. A single Cassandra connection is initially started per Pod, but this can be increased automatically by the Cassandra Java driver, if the connection becomes overloaded (it didn’t).
There are therefore 2 parameters that are critical for tuning the application with increasing scale: (1) the number of Kafka consumers, and (2) the number of detector threads per Pod. The number of partitions for the Kafka topic must be greater than or equal to the number of Kafka consumers to ensure that every consumer receives data (any consumers in excess of the number of partitions will be idle). The number of detector threads per Pod is critical, as throughput peaks at a “magic” number and drops off if there are more or fewer threads.
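The consumer/partition constraint can be sketched in a few lines of Python (a toy illustration, not part of the benchmark code):

```python
def active_consumers(num_consumers: int, num_partitions: int) -> int:
    # Each partition is assigned to at most one consumer in a group,
    # so any consumers beyond the partition count sit idle.
    return min(num_consumers, num_partitions)

# A 200-partition topic keeps up to 200 consumers busy...
print(active_consumers(200, 200))  # 200
# ...but adding 50 more consumers gains nothing: 50 of them are idle.
print(active_consumers(250, 200))  # 200
```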
I initially assumed that I could scale-out by the relatively simple process of (1) tuning the detector thread pool at low load (1 Pod) for a 3 node cluster, and then (2) increasing the number of Pods for each bigger cluster until maximum throughput was obtained. I used this approach for increasing cluster sizes, doubling the number of nodes from 3 to 6, 12, and 24 nodes. Surprisingly this gave sub-linear scalability as shown by this graph.
Given the theoretical perfect linear scalability of Cassandra with more nodes, something was obviously wrong with this application tuning approach.
To gain more visibility into what was actually going on, I exposed more metrics in Prometheus, including the throughput for the Kafka consumers, the detector thread pool, and different detector events (detector run or not run, i.e. >= 50 rows returned from Cassandra or < 50 rows), and the number of rows of data read from Cassandra. This made it possible to confirm that the business metric was correct (only reporting when >= 50 rows were available for an ID), and to check and tune the thread pool so that the consumer and detector rates stayed in a steady state (neither getting ahead of or behind the other), as an imbalance can be suboptimal.
I then reran the benchmark, and using the increased visibility the extra metrics gave, improved the tuning approach for smaller iterations of cluster sizes (adding only 3 nodes at a time rather than doubling the number as above). Adding extra nodes to an existing Cassandra cluster is easy with an Instaclustr managed Cassandra cluster, as there is a button on the console to automatically add extra nodes. This also sped up the benchmarking process as I didn’t have to reload the data into Cassandra at the start of each run (as I did for a new cluster each time), as it was already in place.
Note that I kept the Kafka cluster the same size for these experiments (as load from the Kafka consumers was minimal, < 10% CPU), but I did increase the size of the Kubernetes cluster by increasing the number of Kubernetes worker nodes each time CPU utilisation went over 50% to ensure the application resources weren’t a bottleneck.
The following graphs show the post-tuning results for clusters from 3 to 21 nodes, with linear extrapolation to 50 nodes. The number of expected Pods for 50 nodes is approximately 100.
A similar graph but this time showing the predicted number of Kafka consumers (2 x Pods) for a 50 node cluster to be around 200.
This graph shows the results of tuning the number of detector threads per pod with increasing Cassandra nodes, and linear extrapolation predicts more than 250 threads per Pod for a 50 node Cassandra cluster.
Note that there was up to +/-10% variation in results across runs with identical configurations, and that the tuning may not be 100% optimal for some of the middle-sized clusters (i.e. slightly too many pods and insufficient threads per pod). However, it is likely that extrapolation over 7 data points gives reasonable predictions for bigger clusters.
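The extrapolation itself is just a least-squares line through the measured (cluster size, tuned value) points. A minimal sketch in Python, using hypothetical measurements consistent with the roughly 2 Pods per node implied above (the real data points are in the graphs):

```python
def fit_line(points):
    # Ordinary least-squares fit of y = a*x + b through (x, y) points.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical (Cassandra nodes, tuned Pod count) pairs, ~2 Pods per node.
measured = [(3, 6), (6, 13), (9, 18), (12, 23), (15, 30), (18, 37), (21, 42)]
a, b = fit_line(measured)
print(round(a * 50 + b))  # extrapolated Pod count for a 50-node cluster: 100
```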
Did all the tuning help? Yes, from 3 to 21 node Cassandra clusters we now have significantly better scalability compared with the first attempt, now close to perfect linear scalability (within the +/-10% error variation) as shown in this graph.
Observation: Apache Cassandra is perfectly linearly scalable (as it’s a shared-nothing architecture, there are no shared resources that can become bottlenecks with increased nodes), but you need to put some effort into application optimisation. Cassandra will handle large numbers of connections, but for good scalability try to minimise the total number of Cassandra connections by optimising the use of each connection.
Post-tuning: Fast-forward 120 years… “Pininfarina Battista” the fastest car in the world, 0-100 kph in 2 seconds, top speed 350 kph (electric, 1,900hp).
2. Results at Massive Scale
- 2.3 Million writes/s into Kafka (peak)
- 220,000 anomaly checks per second (sustainable)
- 400 million events checked for anomalies in 30 minutes
- 19 Billion anomaly checks a day
- 6.4 Million events/s total “system” throughput (peak)
- 574 CPU cores across Cassandra, Kafka, Kubernetes
For the final run, we revisited the original “Kafka as a buffer” use case (to decouple event producers from consumers). We want a Kafka cluster that will process at least 2 Million writes/s for a few minutes to cope with load spikes, while enabling the rest of the anomaly detection pipeline to scale and run at maximum capacity to process the backlog of events as fast as possible.
Based on the experience of tuning the application up to a 21 node Cassandra cluster we hopefully have sufficient experience to tackle the final challenge and scale up to something even bigger – a 48 node cluster.
2.1 Kafka Cluster Sizing
Based on the predictions above, it looked like we needed 200 partitions on the Kafka side. I therefore spun up some different sized Kafka clusters and experimented with increasing producer throughputs and numbers of partitions.
Using a 6 node (4 CPU cores per node) Kafka cluster as a starting point, it’s apparent that the write throughput drops significantly with increasing partitions (this turns out to be due to the number of partitions being written to, rather than just in existence: writing to only 6 out of 600 partitions results in the same throughput as if there were only 6 partitions).
Using bigger Kafka node sizes (8 cores per node) gets us into the target (>= 2M write/s) range for 200 partitions (right hand orange bar), but the cluster is close to maxed out, so we decided to use a 9 node (8 CPU cores per node) Kafka cluster, as we don’t want the Kafka cluster to be a bottleneck.
Initial testing revealed that 9 Kafka producer Pods were sufficient to exceed the writes/s target of 2M/s for the 9×8 Kafka cluster with 200 partitions.
2.2 48 Node Cassandra Cluster
To get the final results we spun up a 48 node Cassandra cluster on AWS using the Instaclustr console. We then tuned the application thread pool (thread pool 2 in the diagram in section 2.4) and increased the number of Pods while monitoring the application metrics in Prometheus, and the Kafka and Cassandra cluster metrics using the Instaclustr console. We reached the maximum anomaly checks/s with 100 Pods, with 300 detector threads per Pod (slightly more than predicted, giving a total of 30,000 detector pipeline application threads), and with the Cassandra cluster running close to flat out at 97% CPU (higher than recommended for a production cluster), and Kafka with some headroom at 66% CPU.
To test the Kafka as a buffer use case we switched from replaying existing Kafka events to reading new events, ramped up the Kafka producer load over 2 minutes, and held the load at maximum for a further 2 minutes before terminating.
After a few false starts, I found it useful to use an open source Kubernetes tool, Weaveworks Scope, to see that everything was working as expected. It is easy to connect to the Kubernetes cluster and supports different views and filtering of nodes. This view shows the main Services (some of which I’d had problems with configuring previously) and shows that Prometheus is correctly deployed and monitoring 100 Consumer Pods and 9 Producer Pods via the Prometheus operator.
Here are the specifications of the final system.
Cluster Details (all running in AWS, US East North Virginia)
Instaclustr managed Kafka – EBS: high throughput 1500, 9 x r4.2xlarge-1500 (1,500 GB Disk, 61 GB RAM, 8 cores), Apache Kafka 2.1.0, Replication Factor=3
Instaclustr managed Cassandra – Extra Large, 48 x i3.2xlarge (1769 GB (SSD), 61 GB RAM, 8 cores), Apache Cassandra 3.11.3, Replication Factor=3
AWS EKS Kubernetes Worker Nodes – 2 x c5.18xlarge (72 cores, 144 GB RAM, 25 Gbps network), Kubernetes Version 1.10, Platform Version eks.3
2.3 Raw Results in Prometheus and Grafana
Here are the raw results. The average latency of the detector pipeline (from reading an event from Kafka, to deciding if it is an anomaly or not) was under 460ms for this test as shown in this Prometheus graph.
The next graph shows the Kafka producer ramping up (from 1 to 9 Kubernetes Pods), with 2 minutes load time, peaking at 2.3M events/s (this time in Grafana). Note that because each metric was being retrieved from multiple Pods I had to view them as stacked graphs to get the total metric value for all the Pods.
This graph shows the anomaly check rate reaching 220,000 events/s and continuing (until all the events are processed). Prometheus is gathering this metric from 100 Kubernetes Pods.
2.4 Is this a good result?
Ten years ago it was considered “impractical to present an entire series of transactions” to an anomaly detection system; the recommendation instead was to use aggregated historical data. However, we’ve demonstrated that current technology is more than up to the task of detecting anomalies from the raw transactions, rather than having to rely on aggregated data.
How do our results compare with more recent results? Results published in 2018, for a similar system, achieved 200 anomaly checks/s using 240 cores. They used supervised anomaly detection, which required training the classifiers (once a day), so they used Apache Spark (for ML, feature engineering, and classification) as well as Kafka and Cassandra. Taking into account resource differences, our result is around 500 times higher throughput, with faster real-time latency. They had more overhead due to the “feature engineering” phase, and their use of Spark to run the classifier introduced up to 200s latency, making it unsuitable for real-time use. With a detection latency under 1s (average 500ms), our solution is fast enough to provide real-time anomaly detection and blocking. If the incoming load exceeds the capacity of the pipeline for brief periods of time, the processing time increases, and potentially anomalous transactions detected then may need to be handled differently.
To summarise, the maximum Kafka writes/s reached 2.3M/s, while the rest of the pipeline managed a sustainable 220,000 anomaly checks/s.
The numbers are actually bigger than this if we take into account all the events in the complete system (i.e. all the events flowing between the distributed systems). In the previous blog we showed that for every anomaly check decision, there are many other events contributing to it. For the load spike scenario, we need to take into account the bigger Kafka load spike (Y, blue line, 2.3M/s) and the smaller detector pipeline rate (Z, orange line, 220,000/s):
The peak total system throughput calculation is slightly different from the previous steady-state calculation as the load spike (Y) is produced by the Kafka load generator (step 1) and written to the Kafka cluster (step 2) until the rest of the pipeline catches up and processes them (steps 3-10, at maximum rate Z).
The peak system throughput is therefore (2 x Peak Kafka load) + (8 x anomaly checks/s) = (2 x 2.3M) + (8 x 0.22) = 6.4 Million events/s as shown in this graph:
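The same arithmetic, spelled out (the events-per-step weights come from the pipeline analysis in the previous blog; treat this as a back-of-the-envelope check rather than a measurement):

```python
# Peak total system throughput for the load-spike scenario.
Y = 2_300_000  # peak Kafka load (writes/s): produced (step 1) and written (step 2)
Z = 220_000    # detector pipeline rate (anomaly checks/s), driving steps 3-10

total = 2 * Y + 8 * Z
print(total)  # 6360000 events/s, i.e. ~6.4 Million
```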
Here are some more big numbers. The 4 minutes of events (ramp up and load) into Kafka produces 400 million events to be checked, and it takes the anomaly detection pipeline 30 minutes to process all the events. The following graph shows this scenario (note that rates are in Millions per minute):
A more realistic scenario is to assume that on top of the load spike, there is an average background load of say 50% of the pipeline capacity running continuously (110,000 events/s). It then takes 60 minutes to clear all the events due to the load spike as shown in this graph:
Under what circumstances would this be useful? Imagine we have an SLA in place to process say 99% of events per week in under 1s, with an upper bound of 1-hour latency. Assuming load spike events like this are relatively infrequent (e.g. once a week) then these scenarios can satisfy a 99.4% SLA (1 hour is 0.6% of a week).
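These backlog and SLA numbers are easy to reproduce (a quick sketch; the rates are taken from the scenarios above, and the 50% background load is the stated assumption):

```python
backlog = 400_000_000  # events produced during the 4-minute ramp-up and load spike
capacity = 220_000     # sustainable anomaly checks/s
background = 110_000   # assumed steady background load (50% of capacity)

# Time to clear the backlog with the whole pipeline, vs. only the spare half.
print(backlog / capacity / 60)                 # ~30 minutes
print(backlog / (capacity - background) / 60)  # ~60 minutes

# One 1-hour backlog event per week, expressed as an SLA figure:
week = 7 * 24 * 3600
print(round(100 * (1 - 3600 / week), 1))       # 99.4 (% of the week under 1s)
```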
And for our final and Biggest Number, the following graph shows that our largest Anomalia Machina system with 48 Cassandra nodes has more than sufficient capability to process 19 Billion anomaly checks a day.
2.6 How Big is Anomalia Machina?
How big is our final Anomalia Machina “machine”? Here’s a graph showing the business metric vs the number of cores used for the Cassandra, Kafka, and Kubernetes clusters and the total system.
The complete machine for the biggest result (48 Cassandra nodes) has 574 cores in total. This is a lot of cores! Managing the provisioning and monitoring of this sized system by hand would be an enormous effort. With the combination of the Instaclustr managed Cassandra and Kafka clusters (automated provisioning and monitoring), and the Kubernetes (AWS EKS) managed cluster for the application deployment it was straightforward to spin up clusters on demand, run the application for a few hours, and delete the resources when finished for significant cost savings. Monitoring over 100 Pods running the application using the Prometheus Kubernetes operator worked smoothly and gave enhanced visibility into the application and the necessary access to the benchmark metrics for tuning and reporting of results.
The system (irrespective of size) was delivering an approximately constant 400 anomaly checks per second per core.
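Checking that per-core figure (a rough calculation from the totals above):

```python
checks_per_second = 220_000  # sustainable anomaly checks/s at 48 Cassandra nodes
total_cores = 574            # Cassandra + Kafka + Kubernetes combined

# ~383 checks/s per core, i.e. roughly the constant 400 quoted above.
print(checks_per_second / total_cores)
```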
It is worth noting that the Cassandra cluster is more than 5 times bigger than the Kafka cluster, even though the Kafka cluster is processing an order of magnitude larger load spike (2.3M/s) than the Cassandra cluster (220,000/s). It is obviously more efficient (easier, cheaper, more elastic) to use “Kafka as a buffer” to cope with load spikes than to increase the size of the Cassandra cluster by an order of magnitude (i.e. from 48 to 480 nodes!) in a hurry. However, it is possible to dynamically resize a Cassandra cluster given sufficient warning. Instaclustr’s dynamic resizing for Apache Cassandra enables vertical scaling up or down in minutes (20-30 minutes for a complete cluster, but the capacity starts to increase almost immediately). The biggest increase in capacity is from r4.large (2 cores) to r4.4xlarge (16 cores), giving a capacity increase of 8 times. This would be sufficient for this scenario if used in conjunction with Kafka as a buffer, and would result in significantly faster processing of the event backlog. I tried this on a smaller cluster, resizing one node at a time (concurrent resizing is also an option), and it worked flawlessly. For this to work you need to have (1) created a resizable Instaclustr Cassandra cluster, (2) with sufficient nodes to enable vertical scaling to satisfy the target load, and (3) enabled elastic scaling of the application on Kubernetes (this is another challenge).
2.7 Affordability at Scale
We have proven that our system can scale well to process 19 Billion events a day, more than adequate for even a large business. So, what is the operational cost to run an anomaly detection system of this size? This graph shows that it only costs around $1,000 a day for the basic infrastructure using on-demand AWS instances.
This graph also shows that the system can easily be scaled up or down to match different business requirements, and the infrastructure costs will scale proportionally. For example, the smallest system we ran still checked 1.5 Billion events per day, for a cost of only $100/day for the AWS infrastructure.
Admittedly, the total cost of ownership would be higher (including R&D of the anomaly detection application, ongoing maintenance of the application, Managed service costs, etc). Assuming a more realistic $10,000 a day total cost (x10 the infrastructure cost), the system can run anomaly checks on 1.9 Million events per dollar spent.
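The cost-efficiency arithmetic, as a quick sketch (the $10,000/day total cost of ownership is the assumption stated above, not a measured figure):

```python
checks_per_day = 19_000_000_000        # anomaly checks/day at full scale
infrastructure_cost = 1_000            # $/day, on-demand AWS instances
total_cost = 10 * infrastructure_cost  # assumed total cost of ownership, $/day

print(checks_per_day / total_cost)           # 1,900,000 checks per dollar
print(checks_per_day / infrastructure_cost)  # 19,000,000 checks per infrastructure dollar
```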
Just as Homer’s epic was an exercise in unbounded imagination (e.g. Heroes, gods, monsters such as Polyphemus the one-eyed cannibal Cyclops, the alluring Sirens, Scylla the six-headed sea snake, Calypso the Nymph), the size and use cases of a scalable Anomaly Detector system are only limited by your imagination! In this series we have demonstrated that a combination of open source technologies (Kafka, Cassandra, Kubernetes, Prometheus) and Instaclustr managed Kafka and Cassandra clusters can scale to detect anomalies hidden in Billions of events a day, provides a significant return/cost ratio and actual dollar savings, and is applicable across many application areas. Do you have a Homeric scale imagination?
I had a chance encounter with “Kubernetes” (helmsman) related technology recently. The Fijian museum in Suva has the last Fijian ocean going double-hulled canoe (Drua – the Ratu Finau, 14m hull). Here’s a photo with its steering oar (uli), which is over 3m long, but could be managed by one helmsman.
The steering oars of the older bigger canoes (36m hulls) were even more massive (as long as this canoe) and needed up to 4 helmsmen to handle them (with the help of ropes) and keep them on course.
The post Anomalia Machina 10 – Final Results: Massively Scalable Anomaly Detection with Apache Kafka and Cassandra appeared first on Instaclustr.
We just released Scylla Monitoring Stack version 2.3. The new version comes with dashboards to support the coming Scylla Enterprise 2019.1 release and for the Scylla Manager 1.4 release.
Making the Scylla Monitoring Stack more robust
Scylla Monitoring Stack 2.3 improves the way Scylla Monitoring works with templates and makes some of the magic of dashboard generation more visible and explicit.
Scylla Monitoring Stack uses Grafana for its front-end dashboards. Grafana dashboard definitions are verbose and hard to maintain. To make dashboard maintenance easier, we use a hierarchical template mechanism. You can read more about it in the blog post here.
We use a python script to generate the dashboards from the template. This created a dependency on python for the solution to work.
As of Scylla Monitoring Stack 2.3, the dashboards are available pre-generated with the release, which means that, by default, you no longer have a Python dependency.
Making changes to the dashboards
If you make changes to the dashboards, you will need to run generate-dashboards.sh for the changes to take effect. Note that generate-dashboards.sh changes the dashboards in place, that the Grafana server picks up the changes without a restart, and that the script does depend on Python.
Docker and Permissions
Using Docker Containers is an easy way to install the different servers of the Scylla Monitoring Stack. Containers bring a layer of isolation with the intent to provide additional security. In practice, many users run into issues when a process inside the container needs to access and modify files outside of the container. This happens with the Prometheus data directory, and now, with Grafana dashboards and plugins.
Many users facing this problem tend to use workarounds that bypass Linux security, for example running as root (with sudo), opening directory permissions to everyone, or disabling SELinux.
All of these workarounds are inadvisable. We made multiple changes to the way we run the containers so that they are not necessary.
Best Practices for using Scylla Monitoring Stack:
- Do not use root, use the same user for everything (e.g. centos)
- Add that user to the docker group (See here)
- Use the same user when downloading and extracting Scylla Monitoring Stack
- If you used sudo in the past, it is preferable to change the directory and file ownership instead of granting excessive root permissions
Controlling the alerts configuration files from the command line
It is now possible to override the Prometheus alert file and the alertmanager configuration files from the command line.
The Prometheus alert file describes which alerts will be triggered and when. To specify the Prometheus alert file, use the -R command line option with start-all.sh.
The Alertmanager configuration describes how to handle alerts. To specify the Alertmanager config file, use the -r command line option with start-all.sh. For example:
./start-all.sh -r rules.yml
As of Scylla Monitoring Stack 2.3 all directory and files can be passed either as a relative path or as an absolute path.
Generating Prometheus configuration with genconfig.py
genconfig.py is a utility that can generate the scylla_servers files for Prometheus. In Scylla Monitoring Stack 2.3 there are multiple enhancements to it.
- It now supports dc and cluster name.
- You can use the output from nodetool status as input; this will make sure that your datacenters are configured correctly.
- It no longer creates the node_exporter file that was deprecated.
New panels in existing dashboards
CQL Optimization: Cross shard
Scylla uses a shared-nothing model that shards all requests onto individual cores. Scylla runs one application thread-per-core, and depends on explicit message passing, not shared memory between threads. This design avoids slow, unscalable lock primitives and cache bounces.
Ideally, each request to a Scylla node reaches the right core (shard), avoiding internal communication between cores. This is not always the case, for example, when using a non-shard-aware Scylla driver (see more here).
New panels in the cql optimization dashboard were added to help identify cross-shard traffic.
Per-machine Dashboard: Disk Usage Over time
In answer to user requests to show disk usage as a graph over time, the disk size panel shows aggregated disk usage (by instance, DC, or cluster).
Now you’ve seen the changes that were made in Scylla Monitoring Stack 2.3 to make it easier to run and more secure. The next step is yours! Download Scylla Monitoring Stack 2.3 directly from Github. It’s free and open source. If you try it, we’d love to hear your feedback, either by contacting us privately or sharing your experience with your fellow users on our Slack channel.
The Scylla team is pleased to announce the release of Scylla Monitoring Stack 2.3.
- Scylla Open Source versions 2.3 and 3.0
- Scylla Enterprise versions 2018.x and 2019.x (upcoming release)
- Scylla Manager 1.3.x, 1.4.x (upcoming release)
- Download Scylla Monitoring 2.3
- Scylla Monitoring Stack Docs
- Upgrade from Scylla Monitoring 2.x to 2.3
New in Scylla Monitoring Stack 2.3
- Scylla enterprise dashboards for 2019.1 (#538)
Getting ready for Scylla Enterprise 2019.1, the dashboards for Scylla Enterprise 2019.1 are included.
- Scylla manager dashboard for 1.4 (#557)
Getting ready for Scylla Manager 1.4, the dashboard for Scylla Manager 1.4 is included.
- Dashboards are precompiled in the release
Scylla Monitoring Stack uses templates for a simpler dashboard representation. A Python script is used to generate the dashboards from the templates. As of Scylla Monitoring version 2.3, the dashboards are packaged pre-compiled as part of the release. This removes the dependency on Python for users who do not make changes to the dashboards. In addition, the starting script will no longer generate new dashboards on change, but will only issue a warning that a dashboard was changed and ask the user to run the generation script.
- Add cross_shard_ops panel to cql optimization (#553)
Scylla uses a shared-nothing model that shards all requests onto individual cores. Scylla runs one application thread-per-core, and depends on explicit message passing, not shared memory between threads. This design avoids slow, unscalable lock primitives and cache bounces. Ideally, each request to a Scylla node reaches the right core (shard), avoiding internal communication between cores. This is not always the case, for example, when using a non-shard-aware Scylla driver (see more here). New panels in the cql optimization dashboard were added to help identify cross-shard traffic.
- Cluster name in use is shown in the dashboard (#533)
To simplify monitoring multi-cluster installation the cluster name is now shown on the dashboards header.
- genconfig.py with multi dc support (#513)
The genconfig.py utility can accept datacenter name, it can also be used with nodetool output for simpler configuration.
- Add a storage usage over time panel (#466)
Add a panel to the per-machine dashboard that shows the disk usage over time.
- Upgrade Prometheus to 2.7.2 (#456)
Prometheus container now uses Prometheus version 2.7.2, see Prometheus releases for more information
- Show more information about compaction (#491)
Two panels were added to the per-server dashboard, one that shows the percentage of CPU used by compaction, and one that shows the compaction shares over time.
- Alertmanager and Prometheus alerts can be configured from the command line
It is now possible to override the Prometheus alert file and the Alertmanager configuration files from the command line. To specify the Alertmanager config file, use the -r command line argument with start-all.sh:
./start-all.sh -r rules.yml
To specify the Prometheus alert file, use the -R command line argument with start-all.sh. For example:
./start-all.sh -R prometheus.rules.yml
- Warn users when starting docker as root and make grafana volume
Users should avoid running Docker containers as root. When a user would start the Monitoring stack as root, a warning will be issued.
- Prometheus data directory to accept relative path (#527)
No need to specify the full path to the Prometheus data directory.
- Prometheus.rules.yaml: Fix the alert rule that warns when there is no CQL connectivity (#541)
- Not all 2018.1 uses cluster and dc (#540)
The threat of cybersecurity is real, pervasive and omnipresent. Real-time global cybersecurity attack maps, such as those maintained by Kaspersky Labs and Fortinet show there are no breaks, vacations or downtime allowance in the world of online and computer security.
The scale, variety, complexity, velocity and ferocity of attacks have compounded year over year. The costs associated with such attacks also continue to increase, and the cost of cybercrime is estimated to exceed $6 trillion annually by 2021. Beyond the criminal costs, there are also risks that range from national security to personal safety.
Veramine is one company tackling the national security threat. Awarded contracts from the U.S. Air Force and U.S. Department of Homeland Security to defend against cyberthreats, Veramine is also a commercially available service for enterprises.
Veramine provides advanced capabilities for reactive intrusion response and proactive threat detection. Using endpoint telemetry to feed a central server, Veramine scours huge amounts of network data to identify attacker activity. With its advanced detection engine, advanced rule-based and machine-learning algorithms, Veramine can identify Mimikatz-style password dumping, kernel-mode exploitation (local EoP), process injection, unauthorized lateral movement, and other attacker activity.
For Veramine, cybersecurity analysis begins with good data. As such, the company’s goal is to collect as much data from the enterprise network as possible, including both servers and desktops. Events collected by the platform are enriched with context information from the system. For example, every network connection that’s recorded is associated with its originating process, user, time, and associated metadata. The end result is a huge and ever-growing data set.
“Even if the performance were only as good as Cassandra, and in fact it’s much better, Scylla would still be a significant improvement.”
— Jonathan Ness, CEO, Veramine
According to Jonathan Ness, CEO of Veramine, “We’re trying to collect everything you would ever want to know about what goes on on a computer and centralize that data and run algorithms on it.”
All that data needs to be stored somewhere. Since the data is so sensitive, very few Veramine customers permit it to be stored in the cloud. Veramine needs to provide low-latency big data capabilities in on-premises datacenters. Given the sensitive nature of the data collected, Veramine personnel were unable to directly access databases to help with support.
Veramine began using Postgres, but quickly realized that a NoSQL database was more appropriate to their use case. They switched to Cassandra, but soon realized that it was not up to the task.
“The problem was every week it was crashing, so we created all this infrastructure just to keep Cassandra alive,” said Ness. Veramine went so far as to parse Cassandra logs in an attempt to predict when garbage collection would happen, and then apply throttling to avoid crashing the database. Without direct access to customer environments, Cassandra soon became a nightmare. The team set out to find a replacement.
What was needed was a low-latency NoSQL database with extremely low administrative overhead and high stability. Initial attempts with PostgreSQL did not meet the challenge, and while Cassandra was able to scale, it had operational management problems. That was when the Veramine team turned to Scylla.
Veramine saw instant results from using Scylla. “We started using Scylla two years ago,” said Ness. “We fell in love with Scylla because it doesn’t crash and we don’t have to manage it.” Since Scylla is a feature-complete, drop-in replacement for Cassandra, the migration was quick and painless. “Our code didn’t change much going from Cassandra to Scylla.”
According to Ness, a big benefit of Scylla is developer productivity. Scylla lets the team focus on business logic rather than on custom code around the datastore. Veramine’s Scylla clusters running in production are surprisingly small compared to the Cassandra clusters they replaced.
Ness summed up Veramine’s Scylla journey: “Even if the performance were only as good as Cassandra, and in fact it’s much better, Scylla would still be a significant improvement due to its stability and lower administrative overhead.”
The post Veramine Turns to Scylla to Manage Big Data for Enterprise Cybersecurity appeared first on ScyllaDB.