Maximizing Disk Utilization with Incremental Compaction

 Introduction

“We found having about 50% free disk space is a good rule of thumb.”

You’ve heard this advice since you first laid hands on Cassandra. And over time, we’ve all gotten used to it. Yet it still sounds awful, right? It’s the sad reality for all Size-Tiered Compaction Strategy (STCS) users, even those on DataStax Enterprise or Scylla. You need to leave at least half of the disk free so that you’re guaranteed your cluster won’t run out of disk space when you need to run compactions.

That translates into burning a lot of money, because huge amounts of disk space are set aside due to this limitation. Let’s do some math: suppose you have a cluster with 1000 nodes, each with a 1TB disk. If you need to keep each of those nodes under 50% of total disk usage, at least 500TB will end up being wasted. It’s a terrible limitation that increases storage cost by a factor of 2.

That’s why we came up with a new compaction approach, named Incremental Compaction, that solves this problem by considerably reducing the aforementioned space overhead with a hybrid technique that combines properties from both Size-Tiered and Leveled compaction strategies.

Incremental Compaction Strategy (ICS) was created to take full advantage of this new compaction approach and it is exclusively available in newer Scylla Enterprise releases (2019.1.4 and above).

Space overhead in Size-Tiered Compaction Strategy (STCS)

A compaction that runs on behalf of a table that uses STCS potentially has a 100% space overhead. It means that compaction may temporarily use disk space which can be up to the total size of all input SSTables. That’s because compaction, as-is, cannot release disk space for any of the input data until the whole operation is finished, and it needs additional space for storing all the output data, which can be as large as the input data.

Let’s say that a user is provided with a 1TB disk and the size of their table using STCS is roughly 0.5TB. Compacting all SSTables in that table together via nodetool compact, for example, could temporarily increase disk usage by ~0.5TB to store the output data, potentially causing Scylla to run out of disk space.

That scenario is not triggered only by nodetool compact; it can also happen when compacting SSTables in the largest size tiers, an automatic process that happens occasionally. That’s why keeping each node under 50% disk usage is a major concern for all users of this compaction strategy.

To further understand how STCS works with regards to space amplification, we recommend taking a look at this blog post.

Incremental compaction as a solution for temporary space overhead in STCS

We fixed the temporary space overhead of STCS by applying the incremental compaction approach to it, which resulted in the creation of Incremental Compaction Strategy (ICS). The compacted SSTables, which become increasingly larger over time with STCS, are replaced with sorted runs of SSTable fragments, together called “SSTable runs” – a concept borrowed from Leveled Compaction Strategy (LCS).

Each fragment is an SSTable of roughly fixed size (aligned to partition boundaries) that holds a unique range of keys, a portion of the whole SSTable run. Note that since the SSTable runs in ICS hold exactly the same data as the corresponding SSTables created by STCS, they become increasingly longer over time (holding more fragments), in the same way that SSTables grow in size with STCS, yet the size of the individual ICS fragments remains the same.

For example, when compacting two SSTables (or SSTable runs) holding 7GB each: instead of writing up to 14GB into a single SSTable file, we’ll break the output SSTable into a run of 14 x 1GB fragments (fragment size is 1GB by default).

In other words, this new compaction approach takes SSTable runs as input and outputs a new run, each composed of one or more fragments. The compaction procedure is also modified to release an input fragment as soon as all of its data is safely stored in a new output fragment.

For example, when compacting 2 SSTables together, each 100GB in size, the worst-case temporary space requirement with STCS would be 200GB. ICS, on the other hand, would have a worst-case requirement of roughly 2GB (with the default fragment size of 1GB) for exactly the same scenario. That’s because with the incremental compaction approach, those 2 SSTables would actually be 2 SSTable runs, each 100 fragments long, making it possible to release roughly 1 input SSTable fragment for each new output SSTable fragment, both of which are 1GB in size.
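
To make this concrete, here is a minimal back-of-the-envelope model (our own illustration, not ScyllaDB code) that compares the worst-case temporary space needed to compact two 100GB inputs under both strategies, assuming the default 1GB fragment size and that ICS can delete one input fragment for every output fragment it seals:

# Back-of-the-envelope model, not ScyllaDB code: worst-case temporary disk
# space needed to compact two 100GB inputs under STCS vs. ICS.

def stcs_temp_space(input_sizes_gb):
    # STCS cannot delete any input SSTable until the whole output is written,
    # so the temporary overhead can reach the total size of all inputs.
    return sum(input_sizes_gb)

def ics_temp_space(input_sizes_gb, fragment_gb=1):
    # ICS releases an input fragment for each output fragment it seals, so at
    # any instant only about one input plus one output fragment of extra
    # space is in flight, regardless of how long the SSTable runs are.
    return 2 * fragment_gb

inputs = [100, 100]  # two SSTable runs, 100GB each
print("STCS worst case:", stcs_temp_space(inputs), "GB")  # 200 GB
print("ICS worst case: ", ics_temp_space(inputs), "GB")   # ~2 GB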

Given that ICS inherits its size-tiered nature from STCS, it provides the same low write amplification as STCS. ICS also has the same read amplification as STCS, even though the number of SSTables in a table increases compared to STCS. That’s because an SSTable run is composed of non-overlapping fragments, so we’re able to filter out all SSTables that don’t overlap with a given partition range in logarithmic complexity (using interval trees; a simplified sketch of this lookup appears after the note below), without requiring any additional disk I/O. So in terms of read and write performance, STCS and ICS are essentially the same. ICS has the same relatively high space amplification as STCS, due to accumulation of data within tiers when there are overwrites or fine-grained deletes in the workload, but when it comes to temporary space requirements, ICS doesn’t suffer from the same problem as STCS at all. That makes ICS far more efficient than STCS space-wise, allowing its users to further utilize the disk.

  • Note: ICS may have better read performance when there are overwrites or fine-grained deletes in the workload, because compaction output can be used for data queries much earlier, thanks to the incremental approach.
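
As a side note on the read path described above, here is a small sketch of the idea (ours, simplified to a binary search over sorted fragment boundaries; Scylla’s actual implementation uses interval trees): because the fragments of a run cover disjoint, sorted key ranges, only one fragment per run can hold a given key, so reads never need to probe every fragment.

import bisect

# Illustration only, not Scylla code. Each fragment of a run is described by
# the first key it holds; the ranges are sorted and do not overlap.
fragment_first_keys = [0, 1000, 2000, 3000]            # a run of 4 fragments
fragment_names = ["frag-0", "frag-1", "frag-2", "frag-3"]

def fragment_for_key(key):
    # Binary search finds the single candidate fragment in O(log n),
    # without touching the others.
    idx = bisect.bisect_right(fragment_first_keys, key) - 1
    return fragment_names[idx] if idx >= 0 else None

print(fragment_for_key(2500))                          # frag-2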

Watch this Scylla webinar to further understand how this new compaction approach works!



How much can I further utilize the disk with incremental compaction?

In order to know how much further you can utilize the disk, it’s important to first understand how the temporary space requirement works in practice for incremental compaction.

Given that incremental compaction can release fragments at roughly the same rate it produces new ones, a single compaction job will have, on average, a space overhead of 2GB (with the default 1GB fragment size). To calculate the worst-case space requirement for compaction, you need to multiply the maximum number of ongoing compactions by the space overhead for a single compaction job. The maximum number of ongoing compactions can be figured out by multiplying the number of shards by log4 of (disk size per shard).

For example, on a setup with 10 shards and a 1TB disk, the maximum number of compactions will be 33 (10 * log4(1000/10)), which results in a worst-case space requirement of 66GB. Since the space overhead is a logarithmic function of the disk size, when increasing the disk size by a factor of 10 to 10TB, the maximum number of compactions will be 50 (10 * log4(10000/10)), resulting in a worst-case space requirement of 100GB, which is only 1.5x (log(1000)/log(100)) larger than that of the 1TB disk. As you can see, the worst-case space requirement increased only by a logarithmic factor even though the disk size increased by a factor of 10.
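
If you want to plug in your own numbers, the arithmetic above can be wrapped in a small helper. This is just our sketch of the rule of thumb, not an official sizing tool, and it assumes the disk size per shard is measured in units of the fragment size (which matches the examples above):

import math

def ics_worst_case_gb(shards, disk_gb, fragment_gb=1):
    # Maximum ongoing compactions: shards * log4(per-shard disk size, counted
    # in fragments); each compaction needs ~2 fragments of temporary space.
    per_shard_fragments = disk_gb / shards / fragment_gb
    max_compactions = shards * math.log(per_shard_fragments, 4)
    return max_compactions * 2 * fragment_gb

print(round(ics_worst_case_gb(10, 1_000)))    # ~66 GB: 1TB disk, 10 shards
print(round(ics_worst_case_gb(10, 10_000)))   # ~100 GB: 10TB disk, 10 shards

The fragment_gb parameter also makes it easy to see the effect of a smaller fragment size, which is discussed further below.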

On the setup with the 1TB disk, the disk could theoretically be filled up to 93%, given that compaction would temporarily increase usage by at most 6%. On the setup with the 10TB disk, it could theoretically be filled up to 98%, given that compaction would temporarily increase usage by at most 1%. Note that additional disk capacity needs to be reserved for system usage, like the commitlog, for example.

Now, consider a real-life example with 14 shards and a 1TB disk. The worst-case space requirement works out to 86GB. We expect disk usage to be temporarily increased by at most 8%, so we could theoretically use up to 90% of the disk.

In an experiment, we compared the temporary space consumption of ICS vs. STCS. The data set filled about 82% of the disk, and ICS compaction temporarily increased usage by a maximum of 6% (lower than the worst-case requirement), leading to a maximum of 88%, as depicted by the blue curve in the graph.

The yellow line shows that the temporary requirement for incremental compaction is reduced and bounded by a constant determined by the number of shards and the fragment size. Contrast that with the green line, which shows that the temporary requirement for STCS is a function of the data size.

If we had used a fragment size of 500MB instead of the default 1000MB, the worst-case space requirement would have been reduced from 86GB to 43GB, increasing the utilization by 4% for a 1TB disk.

If we had a 10TB disk instead, reducing the fragment size to 500MB might have been pointless, because saving less than 100GB in worst-case space requirement increases the utilization by less than 1%.

Incremental compaction guarantees that, for a given configuration, the worst-case temporary space requirement for compaction can be calculated as a function of the number of shards, the disk size, and the fragment size. The exact utilization percentage that results — 80%, 90%, etc. — depends on the disk size. Since the space overhead is proportional to the logarithm of the disk size, its percentage decreases significantly for larger disks.

Major (manual) compaction is not a nightmare anymore

Major (manual) compaction has long been the nightmare, in terms of space requirements, for virtually all compaction strategies. That has been a big problem for users, because this compaction type is very important for getting rid of redundant data; many STCS users rely on running it periodically to avoid disk usage issues.

As you probably know, major compaction is basically about compacting all SSTables together. Given that STCS has a 100% space overhead, it can be seen from the purple curve above that during STCS major compaction the disk usage doubles, which correlates with a proportional increase in temporary space (in green). ICS major compaction, however, resulted in a space overhead of only ~5%, as shown by the blue line, making the operation feasible even when disk usage was at 80%! Therefore, major compaction is not a nightmare for ICS users at all.

Migrating from Size-tiered to Incremental Compaction Strategy

ICS is available from Scylla Enterprise 2019.1.4. Enterprise users can migrate to ICS by merely running the ALTER TABLE command on their STCS tables, as shown in the example below:

ALTER TABLE foo.bar WITH compaction =
    {'class': 'IncrementalCompactionStrategy'};

Note that the benefits aren’t instantaneous, because pre-existing SSTables created with STCS cannot be released early by incremental compaction the way fragments can. That means compacting them will still suffer from the 100% space overhead. Users will only fully benefit from ICS once all pre-existing SSTables have been transformed into SSTable runs. That can be done by running nodetool compact on the ICS tables; otherwise the timing is unpredictable, because it depends on regular compactions eventually picking up all the SSTables that were originally created by STCS. Please keep in mind that during migration, the space required for the nodetool compact operation to complete is the same as with STCS, which can be up to double the size of the table.

Given that the goal is to push disk utilization further, it’s extremely important to compact with ICS all of your large tables that were previously created with STCS (system tables aren’t relevant); otherwise they can affect the overall temporary space requirement, potentially leading to disk usage issues. Therefore we recommend running major compaction after the schema is changed to use ICS, before storing any substantial amount of additional data in the table.

Configuring ICS

ICS inherits all strategy options from STCS, so you can control its behavior exactly like you would do with STCS. In addition, ICS has a new property called sstable_size_in_mb that controls the fragment size of SSTable runs. It is set to 1000 (1GB) by default.

ICS’s temporary space requirement is proportional to the fragment size, so lower sstable_size_in_mb decreases the temporary space overhead and higher values will increase it. For example, if you set sstable_size_in_mb to 500 (0.5GB), the temporary space requirement is expected to be cut by half. However, shrinking sstable_size_in_mb causes the number of SSTables to inflate, and with that, the SSTables in-memory metadata overhead will increase as well. So if you consider changing sstable_size_in_mb, we recommend keeping the average number of SSTables per shard roughly around 100 to 150.

  • Note: Number of SSTables per shard is estimated by: (data size in mb / sstable_size_in_mb / number of shards).
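
As a quick illustration of that rule of thumb (with hypothetical figures, not a recommendation for any particular cluster), a small helper like the following makes the trade-off visible:

def sstables_per_shard(data_size_mb, sstable_size_in_mb, shards):
    # Rule of thumb from the note above.
    return data_size_mb / sstable_size_in_mb / shards

data_mb = 2_000_000        # assume ~2TB of data on the node (hypothetical)
shards = 14                # hypothetical shard count

for fragment_mb in (1000, 500, 250):
    est = sstables_per_shard(data_mb, fragment_mb, shards)
    print(f"sstable_size_in_mb={fragment_mb}: ~{est:.0f} SSTables per shard")

# With these numbers, the default 1000MB lands near the recommended 100-150
# range, while 250MB pushes the per-shard SSTable count well above it.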

Conclusion

If you’re a Scylla Enterprise user who relies on STCS, we recommend switching to ICS immediately. Storage at scale is very expensive nowadays, so it’s very important to maximize its usage. Incremental Compaction Strategy helps users increase disk utilization to the max, without increasing either read or write amplification. This translates into using fewer nodes, or storing more data on the existing cluster. So incremental compaction allows us to finally say:

Adios <50% disk usage limitation!


Comcast: Sprinting from Cassandra to Scylla

Comcast’s Philip Zimich said on stage at Scylla Summit 2019 that the Xfinity X1 platform saw such dramatically improved performance with Scylla over Apache Cassandra, especially on long-tail latency, that they were also going to rip out the caching layer for its UI. The performance Scylla provides is “a massive, massive gain compared to where we are today.” In his twenty-minute presentation, Philip, Senior Director of Software Development & Engineering at Comcast, ventured back in history through Comcast’s move from Oracle to NoSQL via Apache Cassandra and its subsequent shift from Cassandra to Scylla, and laid out the roadmap through to 2022. He showed how, with Scylla, the Xfinity X1 team will save on their costs and reduce their node count by an order of magnitude (from 962 Cassandra nodes to only 78 Scylla nodes), while ensuring their user base still has ample room to grow.

Philip leads the architecture, development and operations of Comcast’s X1 Scheduler system. The X1 Scheduler powers the DVR and program reminder experience on the Comcast X1 platform, a cable and streaming video service that supports more than 31 million set-top boxes and “second screen” devices used on a monthly basis by 15 million households. The X1 Scheduler processes more than 2 billion RESTful calls daily. To meet that scale, it uses multiple datastore technologies, including Cassandra, MongoDB, Elasticsearch and Scylla.

Click the link below to hear his talk in full. You can start with this short preview:

In the full video you’ll learn:

  • The issues Comcast faced in their use of Cassandra and DataStax
  • How Scylla’s performance has created a “snappier” UI for Comcast users
  • How they anticipate saving over 60% of their Cassandra operating costs once the migration is complete.
  • The methodology Comcast used through their evaluation, risk analysis and testing processes, plus
  • Details of their benchmarking, migration and deployment plans

WATCH THE FULL VIDEO


ScyllaDB’s Top Blogs for 2019

As we look back at 2019, it was a year of growth and progress at ScyllaDB. Since we maintain an active blog site, we can track many of our highlights from the year in our blog posts. What was most popular in our blog reflects the strongest interests of our community. So let’s look at the top ten blogs of 2019:

10. Scylla Alternator: The Open Source DynamoDB-compatible API

From our foundation, Scylla has been known as the faster alternative to Apache Cassandra. In 2019 we also became known as the open source alternative to Amazon DynamoDB! The community became immediately interested in our Project Alternator API. With it you can now run your DynamoDB-compatible apps on any cloud vendor, or even on your own private cloud. It was a huge game changer for us, and yet it only comes in at the #10 spot!

9. Isolating Workloads with Systemd Slices

The engineers at ScyllaDB aren’t just database experts. They’re also Linux OS experts. Since they understand how Linux works at deep, fundamental levels they can ensure Scylla leverages system primitives to operate in a fast and efficient manner. This blog looks specifically at SystemD slices and cgroups, a few of the Linux primitives behind tools like Docker, showing how Scylla uses them to prioritize database performance over helper apps running on the same machine.

8. Best Practices for Scylla Applications

Our users chose Scylla because they want to get the most out of their big data. So it’s no surprise that they flocked to this article on how to get the most out of Scylla. It’s chock full of tips on monitoring, data modeling, batching, collections, parallelism and more.

7. Scylla and Elasticsearch, Part Two: Practical Examples to Support Full-Text Search Workloads

Scylla and Elasticsearch are both best-of-breed solutions. We’ve had a lot of customer interest in using them together side-by-side for a complete big data solution. This how-to article, which included fully-working Python code, became a design pattern that drew a lot of interest from the community.

6. Managed Cassandra on AWS, Our Take

Amazon Web Services released their Managed Cassandra service in December. It’s a sort of chimera — half Cassandra on the front-end, and half DynamoDB on the back-end. While its UI/UX and serverless setup are nice, it is comparatively pricey, plus there are some key missing features for people familiar with Apache Cassandra.

5. AWS New I3en Meganode – Bigger Nodes for Bigger Data

Scylla is a ‘greedy’ application that loves scaling vertically: the more CPU and the more disk you can throw at it, the happier it runs. We found that the new AWS EC2 I3en instances make Scylla very happy. With up to 60 terabytes of fast storage and 100 Gbps network interface cards (NICs), the I3en series is designed to take on the heaviest-weight big data workloads.


4. Powering a Graph Data System with Scylla + JanusGraph

JanusGraph is a powerful open source graph database solution that uses a pluggable back end datastore. We’re finding more and more JanusGraph users are turning to Scylla as that back end for its performance and scalability characteristics. This article was guest written by Ryan Stauffer of Enharmonic, one of the leading practitioners in the industry.

3. The Complex Path for a Simple Portable Python Interpreter, or Snakes on a Data Plane

While Python is theoretically a portable language, in practice it takes a lot of forethought and effort to bundle up all the dependencies to ship a Python interpreter for a hassle-free experience. Find out how (and why) we did it without cython, Nuitka or PyInstaller.


2. Is Arm ready for server dominance?

Premiered at AWS re:Invent in December 2019, the Arm-powered Graviton2-based servers were definitely a shot across the bow for other major chip vendors. Our assessment was based on concrete test results showing that Arm is now poised for a major breakout in the server market:

“Major disruptive shifts in technology don’t come often. Displacing strong incumbents is hard and predictions about industry-wide replacement are, more often than not, wrong. But this kind of major disruption does happen. In fact we believe the server market is seeing tectonic changes which will eventually favor Arm-based servers. Companies that prepare for it are ripe for extracting value from this trend.”

This article, which included some of the first released test results for the new Amazon servers, caught the attention of many and trended on Hacker News all that day, putting it at the number two spot for the year.

1. Introducing Scylla Open Source 3.0

In the top spot for 2019 was our blog highlighting all the features of Scylla Open Source 3.0. From production-ready materialized views to global secondary indexes, streaming improvements, hinted handoffs, filtering and more, these features not only brought us to parity with Cassandra but surpassed it in functionality in significant ways.

Those were our popular blogs from 2019. As we cast our gaze forward for 2020, we hope you’ll be pleased with our new directions and deliverables. But also, we want to stay closely aligned with your interests. Is there a topic you believe we should cover in 2020? Let us know! We’d love to hear your suggestions.


Apache Cassandra 4.0 – Stability and Testing

The last major version release of Apache Cassandra was 3.11.0, and that was more than 2 years ago, in 2017. So what has the Cassandra developer community been doing over the last 2 years? Well let me tell you, it’s good, real good. It’s Apache Cassandra 4.0! The final release is not here yet, but with the release of the first alpha version, we now have a pretty solid idea of the features and capabilities that will be included in the final release.

In this series of blog posts, we’ll take a meandering tour of some of the important changes, cool features, and nice ergonomics that are coming with Apache Cassandra 4.0.

The first blog of this series focuses on stability and testing.

Apache Cassandra 4.0: Stability and Testing

One of the explicit goals for Apache Cassandra 4.0 was to be the “most stable major release of Cassandra ever” (https://instac.io/37KfiAb).

As those who’ve run Cassandra in production know, it was generally advisable to wait for up to 5 or 6 minor versions before switching production clusters to a new major version. This resulted in adoption only occurring later in the support cycle for a given major version. All in all, this was not a great user experience, and frankly a pretty poor look for a database, which is the one piece of your infrastructure that really needs to operate correctly.

In order to support a stable and safe major release, a significant amount of effort was put into improving Apache Cassandra testing. 

The first of these is the ability to run multi-node/coordinator tests in a single JVM (https://issues.apache.org/jira/browse/CASSANDRA-14821). This allows distributed behavior to be tested with Java unit tests for quicker, more immediate feedback, rather than having to leverage the longer-running, more intensive DTests. This paid off immediately, identifying typically hard-to-catch distributed bugs such as https://issues.apache.org/jira/browse/CASSANDRA-14807 and https://issues.apache.org/jira/browse/CASSANDRA-14812. It also resulted in a number of folks backporting it to earlier versions to assist in debugging tricky issues.

See here for an example of one of the new tests, checking that a write fails properly during a schema disagreement: https://instac.io/2ulqNQ7

Interestingly, the implementation makes nice use of distinct Java class loaders to get around Cassandra’s horrid use of singletons everywhere, allowing it to fire up multiple Cassandra instances in a single JVM.

From the ticket: “In order to be able to pass some information between the nodes, a common class loader is used that loads up Java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like.

Each Cassandra Instance, with its distinct class loader is using serialization and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc.”

On top of this, the community has started adopting QuickTheories as a library for introducing property-based testing. Property-based testing is a nice middle ground between unit tests and fuzzing: it allows you to define a range of inputs and explore that input space (and beyond) in a repeatable and reproducible manner.
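
To give a feel for the style (using Python’s Hypothesis library purely as an illustration; the Cassandra work itself uses the Java QuickTheories library), a property-based test states an invariant and lets the framework generate and shrink the inputs:

from hypothesis import given, strategies as st

def merge_sorted(a, b):
    # Toy function under test: merge two already-sorted lists.
    return sorted(a + b)

@given(st.lists(st.integers()), st.lists(st.integers()))
def test_merge_keeps_elements_and_order(a, b):
    merged = merge_sorted(sorted(a), sorted(b))
    assert merged == sorted(a + b)                            # nothing lost or added
    assert all(x <= y for x, y in zip(merged, merged[1:]))    # still sorted

# Run under pytest; Hypothesis generates hundreds of input pairs and shrinks
# any failing case to a minimal, reproducible counterexample.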

Currently, there are two test classes in trunk that have adopted property-based testing: EncodingStatsTest and ChecksummingTransformerTest. However, community members are using it in their own internal validation test frameworks for Cassandra and have been contributing bugs and patches back to the community as well.

Moving beyond correctness testing, a significant amount of effort has gone into performance testing, especially with the change to adopt Netty as the framework for internode messaging. So far testing has included, but definitely has not been limited to:

Probably the best indication of the amount of work that has gone into testing the Netty rewrite can be seen in CASSANDRA-15066 (https://issues.apache.org/jira/browse/CASSANDRA-15066), which is well worth a read if you are into that kind of thing.


Scylla Enterprise Release 2019.1.4

Scylla Enterprise Release Notes

The ScyllaDB team announces the release of Scylla Enterprise 2019.1.4, which is a production-ready Scylla Enterprise patch release. As always, Scylla Enterprise customers are encouraged to upgrade to Scylla Enterprise 2019.1.4 in coordination with the Scylla support team.

The focus of Scylla Enterprise 2019.1.4 is improving stability and robustness, reducing storage requirements with the new Incremental Compaction Strategy and enabling IPv6. More below.


New features in Scylla Enterprise 2019.1.4

Incremental Compaction Strategy (ICS)

The new compaction strategy is an improvement on the default Size-Tiered Compaction Strategy (STCS). While it shares the same read and write amplification factors as STCS, it fixes STCS’s doubling of temporary disk usage by breaking huge SSTables into smaller chunks, which together are called an SSTable run.

While STCS forces you to keep 50% of your storage reserved for temporary compaction allocation, ICS reduces most of this overhead, allowing you to use more of the disk — up to 80% — for regular usage. This can translate into using fewer nodes, saving more than a third of your overall storage costs, or storing far more data on your existing cluster. For example, a typical node with 4TB and 16 shards will have less than 20% temporary space amplification with ICS, allowing the user to run at up to 80% capacity. In a follow-up blog post we will provide a deep dive into ICS disk usage savings.

ICS is only available in Scylla Enterprise.


IPv6

As you might have heard, there are no more IPv4 addresses available. As such you can now run Scylla Enterprise with IPv6 addresses for client-to-node and node-to-node communication.

Scylla now supports IPv6 global scope addresses and unique local addresses for all IPs: seeds, listen_address, broadcast_address, etc. Note that the Scylla Monitoring stack and Scylla Manager 2.0 already support IPv6 addresses as well. Make sure to enable enable_ipv6_dns_lookup in scylla.yaml (see below).

Example from scylla.yaml

enable_ipv6_dns_lookup: true
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "fcff:69:46::7b"
listen_address: fcff:69:46::7b
broadcast_rpc_address: fcff:69:46::7b
rpc_address: fcff:69:46::7b
…

See related Scylla Open Source issue #2027

Fixed issues in this release are listed below, with open source references, if present:

  • Stability: non-graceful handling of end-of-disk space state may cause Scylla to exit with a coredump #4877
  • Stability: core dump on OOM during cache update after memtable flush, with !_snapshot->is_locked()’ failed error message #5327
  • Stability: Adding a DC with MV might fail with assertion _closing_state == state::closed #4948
  • Oversized allocation warning in reconcilable_result, for example, when paging is disabled #4780
  • Stability: running manual operations like nodetool compact will crash if the controller is disabled #5016
  • Stability: Under heavy read load, the read execution stage queue size can grow without bounds #4749
  • Stability: repair: assert failure when a local node fails to produce checksum #5238
  • CQL: One second before expiration, TTLed columns return as null values #4263, #5290
  • Stability: long-running cluster sees bad gossip generation when a node restarts #5164 (similar to CASSANDRA-10969)
  • CQL: wrong key type used when creating non-frozen map virtual column #5165
  • CQL: using queries with paging, ALLOW FILTERING and aggregation functions return intermediate aggregated results, not the full one #4540


Scylla Monitoring Stack 3.1

Scylla Monitoring Stack Release Notes

The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.1.

Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.1 supports:

  • Scylla Open Source versions 2.3, 3.0, 3.1 and the upcoming 3.2
  • Scylla Enterprise versions 2018.x and 2019.x
  • Scylla Manager 1.4.x and Scylla Manager 2.0


New in Scylla Monitoring Stack 3.1

  • Scylla Open Source Version 3.2 dashboards
  • Scylla Manager Version 2.0 dashboards.
  • Grafana 6.5.1
    More on Grafana release can be found here
  • Prometheus 2.14
    More on Prometheus release can be found here
  • Add the ad hoc filter to all dashboards #773
    The ad hoc filter is a flexible filter that allows specifying multiple conditions. When used, it affects all graphs; from the ad hoc drop-down you can choose the labels you would like to filter by, together with the condition.
    Example: The following shows how to limit all graphs to a specific shard and IP.
  • Report-a-problem button #691
    You can now report an issue with the monitoring by clicking a button on the top right corner of the dashboard. The relevant version and dashboard will be added to the newly opened ticket.
  • Adding annotations for server restart #712
    Annotations are marker lines that show an event occurrence. Annotations can be turned on and off from the top menu.
    The following shows what the server restart annotation looks like.
  • CQL Optimization warning for potential hazard with Consistency Level #706 #579
    You can read more about the CQL optimization here.
    The new consistency level part helps identify potential hazards with the Consistency level.
    Specifically: ANY and ALL should be avoided.
    When running in multi DC, ONE and QUORUM should be avoided, use LOCAL_ONE or LOCAL_QUORUM instead.
  • Experimental – Scylla Manager Consul integration #470
    Scylla Manager 2.0 has a Consul-like API that allows Scylla Monitoring to get the cluster server information directly from it. Users of Scylla Manager can set Scylla Monitoring to read the cluster server configuration directly from Scylla Manager instead of setting it manually in a file. You can read more about it in the configuration guide here.

Bug Fixes

  • Migrate single-stat panel to gauge #774
  • Remove mountpoint from io queue #770
  • 3.1 OS metrics – doesn’t show the correct disk in pie graphs #768
  • MV section graphs are unrelated to MVs #775


Scylla Summit 2019

I’ve had the pleasure of attending and presenting again at the Scylla Summit in San Francisco, and the honor of being awarded the “Most innovative use case of Scylla” award.

It was a great event, full of friendly people and passionate conversations. Peter did a great full write-up of it already so I wanted to share some of my notes instead…

This is a curated set of topics that I happened to question or discuss in depth, so this post is not meant to be taken as full coverage of the conference.

Scylla Manager version 2

The upcoming version of scylla-manager is dropping its dependency on SSH setup which will be replaced by an agent, most likely shipped as a separate package.

On the features side, I was a bit puzzled by the fact that ScyllaDB is advertising that its manager will provide a repair scheduling window so that you can control when it’s running or not.

Why did it strike me, you ask?

Because MongoDB does the same thing within its balancer process and I always thought of this as a patch to a feature that the database should be able to cope with by itself.

And that database-does-it-better-than-you motto is exactly one of the promises of Scylla, the boring database, so smart at handling workload impacts on performance that you shouldn’t have to start playing tricks to mitigate them… I don’t want this time window feature of scylla-manager to be a trojan horse heralding the demise of that promise!

Kubernetes

They were almost late on this one, but are working hard to play well with the new toy of the tech world. Helm charts are also being worked on!

The community-developed Scylla operator by Yannis is now being worked on and backed by ScyllaDB. It can deploy a cluster and scale it up and down.

Few things to note:

  • it’s using a configmap to store the scylla config
  • no TLS support yet
  • no RBAC support yet
  • Kubernetes networking takes a lighter network performance hit than what was seen on Docker
  • use placement strategies to dedicate kubernetes nodes to scylla!

Change Data Capture

Oh boy this one was awaited… but it’s now coming soon!

I inquired about its performance impact, since every operation will be written to a table. Clearly my questioning was a bit alpha, since CDC is still being worked on.

I had the chance to discuss ideas with Kamil, Tzach and Dor: one of the things my colleague Julien asked for was the ability for CDC to generate an event when a tombstone is written, so we could actually know when a specific piece of data expired!

I want to stress a few other things too:

  • default TTL on CDC table is 24H
  • expect I/O impact (logical)
  • TTL tombstones can have a hidden disk space cost, and nobody was able to tell me whether the CDC table is going to be configured with a lower gc_grace_period than the default 10 days, so that’s something we need to keep in mind and check for
  • there was no plan to add user information that would allow us to know who actually did the operation, so that’s something I asked for because it could be used as a cheap and open source way to get auditing!

LightWeight Transactions

Another long-awaited feature is also coming, thanks to the amazing work and knowledge of Konstantin. We had a great conversation about the differences between the Paxos-based LWT implementation currently being worked on and a possible later Raft-based one.

So yes, the first LWT implementation will be using Paxos as its consensus algorithm. This will make the LWT feature very consistent, while being slower than what could be achieved using Raft. That’s why ScyllaDB has plans for another implementation that could be faster, with fewer data consistency guarantees.

User Defined Functions / Aggregations

This one is bringing the Lua language inside Scylla!

To be precise, it will be a Lua JIT, as its footprint is low and Lua can be cooperative enough, but the ScyllaDB people made sure to monitor its violations (when it should yield but does not) and act strongly upon them.

I got into implementation details with Avi, this is what I noted:

  • lua function return type is not checked at creation but at execution, so expect runtime errors if your lua code is bad
  • since lua is lightweight, there’s no need to assign a core to lua execution
  • I found UDA examples, like top-k rows, to be very similar to the Map/Reduce logic
  • UDF will allow simpler token range full table scans thanks to syntax sugar
  • there will be memory limits applied to result sets from UDA, and they will be tunable

Text search

Dejan is the text search guy at ScyllaDB and the one who kindly implemented the LIKE feature we asked for, which will be released in the upcoming 3.2 version.

We discussed ideas and projected use cases to make sure that what’s going to be worked on will be used!

Redis API

I’ve always been frustrated about Redis because while I love the technology I never trusted its clustering and scaling capabilities.

What if you could scale your Redis like Scylla without giving up on performance? That’s what the implementation of the Redis API backed by Scylla will get us!

I’m desperately looking forward to seeing this happen!

Why MistAway Turned to Scylla Cloud for Its IoT Systems

Mosquitos in the United States are a major problem. Besides simply detracting from the enjoyment of outdoor activities, they can also pose serious health risks. Nearly 1,700 cases of malaria are diagnosed annually. Meanwhile other mosquito-borne diseases such as Zika, dengue, chikungunya, and yellow fever are actually on the rise due to changing habitat and the trend in warming temperatures.

Over the course of history, there have been multiple phases in addressing this plague of insects. The period c. 1900 – 1942 was defined as the “mechanical” era, where drainage and habitat denial were seen as the primary methods to control mosquitos. The period from c. 1942 – 1972 was the “chemical” era, marked by a prevalent use of pesticides, beginning with an over-reliance on the highly toxic DDT and ending with its ban. The period afterwards to the present has been defined as the “integrated mosquito management” era, using a mix of both mechanical and chemical means, in which public health officials have sought to balance all relevant public health risks: both from mosquito-borne diseases themselves and from any means employed to keep them in check.

U.S. Public Health Service posters (1920) Source: NPR

Mosquitos are Wireless

Most recently we have entered a fourth phase: the “Internet of Things (IoT)” era of mosquito management. From mosquito trapping to misting, modern systems are increasingly Internet-enabled and data-driven.

MistAway, headquartered in Houston, Texas, is the market leader in outdoor insect control misting systems, accounting for nearly half of the installed base. Their products comprise a number of components:

  • Drum or tankless reservoirs for the chemical solution
  • A variety of organic and synthetic chemical solutions
  • Related misting nozzles, valves, pipes and joins to dispense the chemical mists
  • Standard and optional sensors and electronics for scheduling, flow control, wind direction, leak control, and wireless communications to ensure the system is operating appropriately

MistAway’s programmable system typically dispenses an insecticide for less than a minute three times daily to control mosquitoes.

iMistAway is their data-driven, wireless, Internet-enabled management and monitoring service. It can monitor outages in the system, low fluid levels, leaks and other failures. It gathers time-series data reported back from all dealer-installed systems at end-user sites, conducts analyses, and then sends out alerts to users and their dealer network in case action needs to be taken and/or an on-site service call needs to be made. The mobile iMist app allows users and their authorized dealers to control MistAway systems from anywhere at any time.

The web and mobile interfaces of the iMistAway system allow the manufacturer, as well as dealers (left) and end users (right) to know precisely how their systems are working.

MistAway Moves to Scylla Cloud

In his talk at Scylla Summit 2019, Kevin Johnson of MistAway described how they initially deployed the backend database for their solution and then migrated to Scylla Cloud. The company initially used MySQL.

“The real benefit in our mind is that the dealers are able to use this to provide better and more efficient service to the homeowner,” according to Kevin. It prevents having to do needless but regularly scheduled “truck-rolls” every thirty or sixty days and heads off calls from angry customers being assaulted by mosquitos. With iMistAway you only roll a truck out to a house when you need to, and you have foreknowledge of the precise issue.

Each system provides seventy channels of data points. One of those is the status for the tank. For instance, if a tank runs empty prematurely and the status is EMPTY, it can send an alert to be refilled.

MistAway’s Internet of Things (IoT) architecture requires receivers on the individual components, a gateway at the customer location to connect to the cloud, and a back-end platform that ingests data, performs analytics, and processes messages to dealers and end users.

Originally the analytics system was stored in a third-party SQL database, but the vendor was moving away from their existing business model. This left MistAway in the lurch. While the vendor offered to move the database in-house, the MistAway staff would have to manage it themselves with little support. Their vendor gave a deadline of three months to make the transition to migrate their entire history of five years of data streaming from over 3,000 systems — a number that has only grown since. Kevin admitted this made him feel pretty overwhelmed.

The migration plan required shifting existing historical data as well as stream the current time-series data from live systems into the new database. This was when Kevin considered Scylla Cloud, a managed database service. “We needed the high performance of tools like Scylla but we really don’t want to have to manage everything.” Kevin described what it felt like faced with a hard deadline, “We’ve got a time crunch. We’ve got to get this thing solved. Let’s check out these Scylla guys and see if they’re legit.”

Once the engagement began Kevin was impressed and described the process as a “zero stress migration.” Scylla conducted sizing reviews, and optimized for MistAway’s business, taking into account time-of-day and seasonal effects. Scylla Cloud had an easy-to-digest price (“makes it a really easy convince-your-boss exercise”), and was easy to signup and get started. Describing the user experience, “Couple clicks. Couple minutes. You can spin up a cluster where you want it, how you want it.”

Scylla also provided step-by-step documentation of the migration process, “which was very helpful.” He even noted that the day of the migration he was unexpectedly unavailable. But Scylla took the initiative, along with the former vendor, and made a seamless migration.

“We had zero downtime. And for that, I think Scylla’s number one.”

Making the move from SQL to NoSQL required the team to completely remodel the schema. Tables to support their queries were created for different kinds of information: alarms (both current and historical), datapoints (both current and historical), reactions and current states:

alarms_current
alarms_history
change_history
datapoints_current
datapoints_history
reaction_receipt
states_current

The remodeling of data into current alarms and datapoints meant that Kevin could get the real-time state of all the deployed systems. “In my legacy platform these are two things you couldn’t do without getting up to take a break or get a cup of coffee. Now, with Scylla, we can do these full table scans. My boss loves to login as a superadmin, pull up the entire fleet and run analysis and see how the fleet is doing health-wise.”

Kevin’s own favorite table is datapoints_history. The partitioning key consists of a node’s ID, the channel that’s being updated, and the date the readings occur. This produces fairly even partitions throughout the cluster. “Then we use the clustering key of timestamp, which allows them to compare the levels from now to nine months ago to get seasonal reports.”
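
A tiny sketch of that layout (our reconstruction from the talk, with hypothetical names and values; not MistAway’s actual schema) shows how bucketing by node, channel and date bounds each partition, while the timestamp keeps rows ordered within it:

from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (node_id, channel, timestamp, value)
readings = [
    ("node-42", "tank_level", datetime(2019, 8, 1, 9, 30), 0.71),
    ("node-42", "tank_level", datetime(2019, 8, 1, 21, 30), 0.68),
    ("node-42", "tank_level", datetime(2019, 8, 2, 9, 30), 0.65),
]

partitions = defaultdict(list)
for node_id, channel, ts, value in readings:
    partition_key = (node_id, channel, ts.date().isoformat())   # bounded per day
    partitions[partition_key].append((ts, value))               # ts = clustering key

for key, rows in sorted(partitions.items()):
    print(key, sorted(rows))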

Scylla, Take the Wheel!

In production now for more than a year, Kevin has dealt with zero maintenance. “I don’t want to deal with repairs. I don’t want to deal with compactions. Scylla handles that for me.”

Support is provided over a dedicated Slack channel, which Kevin describes as “Probably my favorite aspect of Scylla Cloud.” He also appreciates the “really clean web app,” which he uses every morning. He logs in to DataDog, Kibana, and the Scylla Cloud dashboard to see how the cluster responds to the effects of the iMistAway platform. “This actually has really been really educational for me as well.” Kevin also appreciated having a dedicated account manager to be the “one-stop shop for any issues.”

“Overall, it’s been kind of boring. It’s low drama, it’s low stress. And that’s the way we need it to run our business. Over the past year, we have had zero hours downtime on our cluster — so we’ve always been up. And we’ve submitted zero support tickets.” (Technically, he pointed out, not true: he opened one ticket to get help to produce slides for his presentation at the Scylla Summit!) “Everything’s been good so far.”

Coming back to the Slack-based support, he rued the way other companies typically handle customers: opening a ticket and getting put into a backlog. Sitting in a phone queue and leaving a message. “With Scylla, there’s none of that. I can jump right into the Slack channel. I’ve got direct access to the pros. Whether I have a question, just a curiosity, or even just a really good joke I can jump in there and drop it in. So that gives us a lot of confidence in our cluster.”

Even when their traffic fluctuates, “our cluster doesn’t budge.” They have high availability baked in by placing their nodes in different availability zones (AZs). “My data is always available when I need it.” And the exercise in data remodeling helped response times tremendously. “Our queries are really fast… What it equals is when our users need to access the app and get their data, it’s there. Instantaneously.”

An example of how MistAway was able to detect the exact moment when a unit went offline from their datapoints_history table.

Kevin painted the image of what this means to their success. “You got to think about it. Our dealers are technicians… They’re outside with an iPad… It’s hot. If the system’s down they’re getting assaulted by mosquitos. The dog’s barking. The customer’s over their shoulder. So even if I can draw a chart really fast, they don’t want to have to sit there and interpret it. So what I can do now with Scylla is that each time a datapoint comes in I can run a little function in the background over 90 days or a year of data. Tens of thousands of data points.” And from that analysis he can send an alarm — “Your unit is offline and we think it’s because you probably just lost power.”

The dealer can then contact the customer and have them unplug the system and plug it back in. “Saves you a truck roll. Saves the customer an invoice. And it gets the misting system back up and running as quickly as possible. Something we could not do in our legacy platform.”

Kevin also showed how easy it was to now produce end-of-season reports. “This is just showing off the speed of how we were able to draw charts… This is something that you would have to do for four hundred units. If you have a big fleet, it could take you hours. Now? It’s instantaneous.”

Kevin’s Takeaways

Running on Scylla Cloud, “you can focus on value creation. Focus on your application,” Kevin emphasized. “Focus on creating value for your customers and things that bring you revenue. And leave the devops to the pros.”

“One big thing that was important for us as a small company was that we didn’t have to staff up. So we didn’t actually have to go hire and bring in the talent to help run this cluster. And I’m not the Ph.D. at this. I’m really not the Ph.D. at anything, but because I had the Scylla team behind me I was successful and we are running our platform now.”

“The last thing that’s really affected me personally is that there’s just a lot less stress. A lot less anxiety running this platform,” Kevin confessed. “You know, migrations are scary. Right? You’re trying to get your data across. You’re trying to get your platform going. You’ve got live paying customers that you don’t want to upset. So it can be a little bit scary. But with Scylla Cloud handling your data it takes a lot of the stress and anxiety away.”



Want to watch Kevin Johnson’s full video and also see his slides? Check out his Tech Talk page. After that, you can check out all the other Scylla Summit 2019 tech talks. They’re all online now.

And if you would like a lot less stress in your own life, check out Scylla Cloud. You can even take Scylla for a free test drive!

TAKE SCYLLA FOR A TEST DRIVE
