Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases

Modern database systems are designed to handle a large number of connections, each serving an equally large number of simultaneous requests. This is especially important in databases supporting microservices architectures that scale horizontally and have clients coming and going as demand requires.

However, the fact that modern databases can handle large numbers of connections doesn’t make them immune to overload. At some point, there is such a thing as too many. Moreover, overload increases latency and eventually generates timeouts. And if one is not careful with retry policies, retries can degrade the server even further.

To help you get the most out of your big data applications, let’s explore the effects of concurrency in distributed databases and look at the tools you need to correctly configure your infrastructure for maximum performance, including client-side parallelism and timeout settings.

Connections, Pools, and Parallelism

One common source of confusion when talking about parallelism is to conflate the number of connections with request-level parallelism. Is increasing the number of connections by itself enough to increase database activity?

Whether synchronous or asynchronous, a request/response exchange between a client and a server presupposes the existence of a network connection between them. One way to see a connection is as a two-way road: it represents a path between two endpoints capable of accommodating traffic in both directions. By the same token, requests and responses can be regarded as vehicles travelling from one point to the other on these roads.

Connection creation and destruction is an expensive operation both in terms of resources and time, so reusing connections for multiple exchanges is a common practice among database servers. A connection pool does essentially that: it keeps a pool of established client/server connections and dynamically allocates them on demand whenever requests are triggered by the client application. The same reasoning applies to thread pools: reusing threads to dispatch requests and receive responses is more efficient than creating and destroying them on the fly.

Having our traffic analogy in mind, it is easy to understand why an excessive number of connections or threads can easily saturate any database server. The optimal size of a client/server connection pool is intrinsically dependent on the server’s physical capacity, just like the number of lanes in a highway depends on the size of the cities being interconnected by it. Also, the number of in-flight requests should be enough to keep the lanes busy, but not to the point of jamming the traffic. The diagram below illustrates these concepts.

Scaling Workloads

Properly sizing the workload that a client should submit to a server can be a tricky task. In this section we introduce an iterative process to assess the server’s ability to handle requests, which helps define upper limits on connections and in-flight requests.

In our examples, we will use Scylla, a real-time NoSQL database that specializes in very high throughput use cases. Scylla’s internal architecture leverages a shared-nothing model that assigns independent shards to individual cores, thanks to the Seastar framework it is built upon. Because of this architecture, the relationship between connections and the server’s physical attributes is even more visible.

The tests are executed against a 3-node cluster running on i3.2xlarge AWS VMs (8 cores, 61 GB RAM and NVMe storage volumes). The workload is simultaneously generated by two c5n.4xlarge AWS VMs (16 cores, 42 GB RAM and EBS storage volumes) running the cassandra-stress tool. The total number of threads shown in the results below is the sum of the number of threads configured on each VM. As for the results, throughput is aggregated and latency is averaged.

Scaling Threads

The first part consists of fixing the number of connections so that we can observe the effects of parallelism in isolation. We initially set the number of connections to the total number of server cores – 8 per server, remembering that we have two stress tool instances running in parallel – and gradually increase the number of client threads. Since cassandra-stress uses synchronous requests, each execution thread sequentially dispatches one request at a time. Hence, increasing the number of threads effectively increases the number of in-flight requests to the database.

The WRITE workloads are started at each loader respectively with the following commands:

cassandra-stress write n=48000000 -pop seq=1..48000000 -rate threads=<count> -node <IP> -mode native cql3 connectionsPerHost=4
cassandra-stress write n=48000000 -pop seq=48000000..96000000 -rate threads=<count> -node <IP> -mode native cql3 connectionsPerHost=4

The READ workloads are started at each loader VM with the following command:

cassandra-stress read n=60000000 -rate threads=<count> -node <IP> -mode native cql3 connectionsPerHost=4

As can be seen in Chart 1, peak throughput happens at about 2664 WRITE threads and 1512 READ threads. Besides peaking at a higher thread count, WRITE operations also present higher peak throughput than READ operations. Beyond the peak point, cluster performance gradually degrades, an indication of server saturation.

Chart 1: Throughput rates observed while increasing the number of cassandra-stress threads

It is worth noting that more throughput is not always better: Latencies monotonically increase (get worse) with concurrency, both in the average and the high tail of the distribution, as shown in charts 2 and 3.

Chart 2: Latency rates observed while increasing the number of cassandra-stress threads

Chart 3: Read latency rates around the throughput turning point

Therefore, depending on the desired quality of service from the database cluster, concurrency has to be judiciously balanced to reach appropriate throughput and latency values. This relationship can be expressed as the effective queue size, as defined by Little’s Law, which establishes that:

L = λW

where λ is the average throughput, W is the average latency, and L is the total number of requests either being processed or sitting in a queue at any given moment once the cluster reaches steady state. Usually, throughput and average latency are part of the service level that the user controls. In other words, you know how much throughput your database system needs to provide, and which average latency it should sustain.

In the example above, if we want a system to serve 500,000 requests per second at 2.5ms average latency, the best concurrency is around 1250 in-flight requests. As we approach the saturation limit of the system — around 600,000 requests/s for read requests — further increases in concurrency no longer increase throughput, since this is the physical limit of the database. Every new in-flight request only translates into increased latency.

In fact, if we approximate 600,000 requests/s as the physical capacity of this database, we can calculate the expected average latency at a particular concurrency point. For example, at 6120 in-flight requests, our average latency is expected to be 6120 / 600,000 = 10ms. That is indeed very close to what we see in the graphs above.
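
To make the arithmetic concrete, here is a minimal sketch in plain Java (the numbers are simply the ones from the example above, not additional measurements):

// L = lambda * W: concurrency needed to sustain a target throughput at a target latency.
double targetConcurrency = 500_000 /* req/s */ * 0.0025 /* s */;   // = 1250 in-flight requests
// W = L / lambda: expected latency once throughput is pinned at the ~600,000 req/s saturation point.
double expectedLatencyMs = 6_120 / 600_000.0 * 1000;               // ≈ 10.2 ms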

Note that, as seen in the throughput graph above, in real-life systems throughput at peak may not stay constant but may be slightly reduced as contention increases.

Unbounded Request Flows

Now that we understand how the number of in-flight requests relates to throughput and latency, let’s take a look at some scenarios that can lead to overload and how to avoid them.

A thread pool is an important mechanism for reusing threads and attenuating the overhead of creating and destroying them recurrently. An important side effect of thread pools is that they offer an out-of-the-box mechanism to constrain client-side load by defining the maximum size the pool should have. Failure to specify a pool size limit can lead to excessive concurrency, which leads to server performance degradation.

Thread-pool parameters can usually be configured (see this example for Scylla and Cassandra), and relying on their default values is often not a good idea. Consider for example the Java code block below:

ExecutorService executor = Executors.newFixedThreadPool(500);
Cluster cluster = Cluster.builder().addContactPoint("host").build();
Session session = cluster.connect("keyspace");
PreparedStatement stmt = session.prepare("INSERT INTO test (id) VALUES (?)");

List<Future<ResultSet>> futures = new ArrayList<>();
futures.add(executor.submit(() -> {
        // Each task blocks on the synchronous execute(), so at most 500
        // requests (the pool size) are ever in flight from this client.
        return session.execute(stmt.bind(123));
}));

Here, a fixed-size thread pool is created and a synchronous task is submitted. If more tasks like that are triggered at the same time, they are executed in parallel by the pool. If more than 500 tasks are active at the same time (the pool size), only 500 are executed by the pool and the remaining ones are queued until a running task completes. This mechanism effectively constrains the load this client subjects the cluster to: at most 500 concurrent requests. We could do the same for whichever concurrency we want to generate for this client.

But is this enough to restrict concurrency? Consider the following code block:

ExecutorService executor = Executors.newFixedThreadPool(500);
Cluster cluster = Cluster.builder().addContactPoint("host").build();
Session session = cluster.connect("keyspace");
PreparedStatement stmt = session.prepare("INSERT INTO test (id) VALUES (?)");

List<ResultSetFuture> futures = new ArrayList<>();
executor.submit(() -> {
        // The task returns as soon as executeAsync() is called; nothing here
        // limits how many statements can be in flight at once.
        futures.add(session.executeAsync(stmt.bind(123)));
});

The differences are subtle, but the behavior is fundamentally different from the previous example. In this case, each task triggers an asynchronous statement execution. The tasks themselves are short-lived and simply signal the client driver to send a new statement to the cluster. From the server’s perspective, this is similar to creating an additional thread with a synchronous statement execution, but on an unbounded pool.
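
One client-side remedy, not shown in the snippet above and offered here only as a sketch (it reuses the session and stmt from the previous block and assumes Guava’s MoreExecutors is available alongside the driver), is to gate asynchronous submissions on a semaphore that is released when each response arrives:

Semaphore inFlight = new Semaphore(500); // cap on in-flight asynchronous requests

inFlight.acquire(); // blocks the submitting thread once 500 requests are outstanding
ResultSetFuture future = session.executeAsync(stmt.bind(123));
// Release the permit when the response (or an error) comes back.
future.addListener(inFlight::release, MoreExecutors.directExecutor());

With this in place, no matter how fast tasks are submitted, at most 500 statements from this client are outstanding at any moment.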

Beyond such application-level guards, and especially when dealing with asynchronous requests, how can we make sure that the server is always healthy and never overloaded?

Scaling Connections

Let’s take a look at how throughput and latency vary with the number of connections opened to the database. Here, the number of threads is locked to the best configuration found in the previous section — remembering that we have two stress tool instances running — and the number of connections is initially set to half the original configuration:

cassandra-stress write n=48000000 -rate threads=<fixed count> -node <IP> -mode native cql3 connectionsPerHost=<count>
cassandra-stress write n=48000000 -rate threads=<fixed count> -node <IP> -mode native cql3 connectionsPerHost=<count>

Chart 4 illustrates the results. As we can see, we need connections — the lanes — so our requests can flow. If there are too few connections we won’t achieve peak throughput. Having too many is not, on its own, enough to generate overload, as overload is caused by excess in-flight requests. But the extra lanes may very well be the avenue through which an excessive number of in-flight requests can now reach the server.

The biggest benefit of being aware of the number of connections comes from the fact that we can configure the connection pool properties, such as the maximum number of in-flight requests per connection. Take, for instance, this example from the Scylla and Cassandra Java drivers:

PoolingOptions poolingOptions = new PoolingOptions()
    .setCoreConnectionsPerHost(HostDistance.LOCAL, 10)
    .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
    .setMaxRequestsPerConnection(HostDistance.LOCAL, 10)
    .setPoolTimeoutMillis(100);

HostDistance.LOCAL is a Scylla- and Cassandra-specific concept indicating that this setting applies to the local datacenter (as opposed to HostDistance.REMOTE). In this example we allow for ten connections by default with a maximum of ten, meaning the pool size will never dynamically increase. We then set the maximum number of requests per connection to ten as well, meaning that the maximum number of in-flight requests coming from this client will be one hundred.

Chart 4: Throughput rates observed when increasing the number of connections

If the client generates more than that, either because asynchronous requests are generated too fast or because there are too many threads, the requests will sit in a client-side queue for up to 100ms — according to setPoolTimeoutMillis — and time out after that.

Careful setting of Connection Pool parameters is a powerful way for application developers to make sure that reasonable concurrency limits to the database are always respected, no matter what happens.
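
As an illustration only (a minimal sketch assuming the same Java driver API as the fragment above), the pooling options take effect once they are attached to the Cluster object at build time:

// At most 10 connections x 10 requests per connection = 100 in-flight requests
// from this client to local-datacenter nodes.
Cluster cluster = Cluster.builder()
    .addContactPoint("host")
    .withPoolingOptions(poolingOptions)
    .build();
Session session = cluster.connect("keyspace");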

Timed out Requests and Retries

When dealing with distributed systems, requests may inevitably time out. Timeouts can happen for a variety of reasons, from intermittent network interruptions to server overload. If and when requests do time out, is it a good idea to retry them?

To demonstrate that, let’s consider the following scenario: a heavy READ workload with moderately sized payloads (>500 bytes per operation) targeting a 3-node Scylla cluster. In Chart 5 we can see that the CPU load on the system is already high, and yet the average latencies, as seen in Chart 6, are below 5ms for this workload.

Chart 5: Cluster load with default timeout configurations

Chart 6: Cluster latencies with default timeout configurations

At such high load it is possible that some requests, even at the tail of the latency distribution, may time out. In the specific case of Scylla, the server configuration tells us when requests will time out on the server:

[centos@ip-10-0-0-59 scylla]$ egrep '^[^#]+timeout_in_ms:' /etc/scylla/scylla.yaml
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
range_request_timeout_in_ms: 10000
request_timeout_in_ms: 10000

Because this is a read workload, we can see that the server will time out requests after 5 seconds. Workloads run under an SLA, and oftentimes what we see is that after a certain threshold is crossed, users retry the request even if the server hasn’t replied yet.

We can simulate a scenario in which the client wants an SLA in the order of single-digit milliseconds, and retries requests that take more than that without changing the server configuration. This seems harmless enough if this happens with the occasional request due to intermittent failures. But Chart 7 shows us what happens if the latency is higher than expected due to overload:

Chart 7: Cluster latencies after lowering operation timeout parameter

As you can see, average latencies skyrocketed 15-fold. Also, the shape of the latency curve clearly illustrates a cascading effect. As we saw throughout this article, an excessive number of in-flight requests can harm the server. We saw in Charts 1 and 2 that after the saturation point latency increases without a corresponding throughput increase.

And if the client-side timeouts are lower than the server-side timeouts, this is exactly what happens here: new requests arrive from the client before the server has had the opportunity to retire the old ones. The increased workload injected by timed-out operations pushes latencies further up, which in turn results in higher timeout rates, and so on.

For this reason, it is mandatory to use client-side timeouts that are equal to or greater than the server timeouts. If performance requirements are more strict and lower timeouts are needed, both servers and clients should be configured accordingly.
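
As a companion sketch (again assuming the Java driver 3.x API rather than anything mandated by the setup described here), the client-side read timeout can be kept at or above the server’s read_request_timeout_in_ms from scylla.yaml:

// Keep the client-side timeout at or above the server's read timeout
// (read_request_timeout_in_ms: 5000 in the configuration shown earlier).
SocketOptions socketOptions = new SocketOptions()
    .setReadTimeoutMillis(6000);

Cluster cluster = Cluster.builder()
    .addContactPoint("host")
    .withSocketOptions(socketOptions)
    .build();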

Next Steps

In this article you read about some of the main aspects of concurrency control and how they should be taken into account when building and tuning a client application in a client/server architecture. You also learned how to assess the server’s ability to accommodate increasing workloads, and our best recommendations on how to avoid common pitfalls when designing and sizing a client application.

While we used Scylla as our reference database for testing we encourage you to perform your own testing with other databases using similar methodologies.

Designing scalable systems, dealing with multi-threaded programming and managing concurrency control are hard tasks. Hopefully this article will help alleviate some of the burdens when making your applications efficient and scalable.

If you have more questions about how best to take advantage of concurrency and timeouts in Scylla, or if you wish to share the results of your own testing, please contact us or drop by our public Slack channel. We’d love to hear from you!


Overheard at Scylla Summit 2019

“It was five years ago when Dor Laor and Avi Kivity had a dream…” So began the opening remarks at Scylla Summit 2019. The dreams of ScyllaDB’s founders have since manifested and made their impact all across the Big Data industry.

The world’s leading Scylla practitioners gathered in San Francisco earlier this month to hear about the latest Scylla product developments and production deployments. Once again your intrepid reporter tried to keep up with the goings-on, live tweeting, along with others under the #ScyllaSummit hashtag about the event’s many sessions.

It’s impossible to pack two days and dozens of speakers into a few thousand words, so I’ll give just the highlights. We’ll get the SlideShare links and videos posted for you over the coming days.

Pre-Summit Programs: Training Day and the Seastar Summit

The day before the official opening of Scylla Summit was set aside for our all-day training program. The training program included novice and advanced administration tracks that were well attended. There were courses on data modeling, debugging, migrating, using Scylla Manager, the Scylla Operator for Kubernetes, and more!

Between sessions students flock around ScyllaDB’s Customer Success Manager Tomer Sandler to learn how to migrate to Scylla.

Training Day was held concurrently with a separate-but-related developer conference: the first-ever Seastar Summit, which featured speakers from Red Hat, Lightbits Labs, and Vectorized.io as well as ScyllaDB. It was a unique opportunity for those building solutions based on the Seastar engine to engage directly with peers.

Keynotes

ScyllaDB CEO Dor Laor

The Tuesday keynotes kicked off with ScyllaDB CEO Dor Laor providing a review of how we got to this point in history in terms of scalability. He recounted the C10k problem — the problem with optimizing network sockets to handle 10,000 concurrent sessions. The C10k barrier has since been shattered by orders of magnitude — it is now the so-called “C10M problem” — with real-world examples of 10 million, 12 million or even 40 million concurrent connections.

In an analogous vein, there is also the so-called $1 million engineering problem, which has at its core the following axiom: “it’s easy to fall into a cycle where the first response to any problem is to spend more money.”

Dor dubbed this issue the “D10M” problem, where customers are spending $10 million or more on their cloud provider bill. He asserted large organizations using Big Data could literally save millions of dollars a year if they just had performant systems that scaled to their needs.

To that point, Dor brought to the stage ScyllaDB’s VP of Field Engineering, Glauber Costa, and Packet’s Solution Architect James Malachowski to reveal a new achievement in scalability. They had created a test scenario simulating 1 million IoT sensors sampling temperature data every minute over the course of a year. That resulted in a data set of 536 billion temperature readings — 1.44 billion data points per day.

Against that data, they created a query to check for average, min and max temperatures every minute across a given timespan.

To give an idea of how large such a dataset is, if you were to analyze it at 1 million rows per second it would take 146 hours — almost a week. Yet organizations don’t want to wait for hours, never mind days, to take action against their data. They want immediacy of insights, and the capability to take action in seconds or minutes.

Packet’s Solution Architect James Malachowski (left) and ScyllaDB VP of Field Engineering Glauber Costa (right) describe the architecture needed to scale to 1,000,000,000 reads per second

This was why ScyllaDB partnered with Packet to run the bare-metal instances needed to analyze that data as fast as possible. Packet is a global bare metal cloud built for enterprises. Packet ran the Scylla database on a cluster of 83 instances. This cluster was capable of scanning three months of data in less than two minutes at a speed of 1.1 billion rows per second!

Scanning the entire dataset from disk (i.e., without leveraging any caching) took only 9 minutes.

As Glauber put it, “Bare metal plus Scylla are like peanut butter and jelly.”

Packet’s bare-metal cluster that achieved a billion reads-per-second comprised 83 servers with a total of 2800 physical cores, 34 TB of RAM and 314 TB of NVMe.

And while your own loads may be nowhere near as large, the main point was that if Scylla can scale to that kind of load, it can performantly handle pretty much anything you throw at it.

Dor then retook the stage and noted how Jay Kreps, CEO of Confluent, recently scratched out the last word in Satya Nadella’s famous quote, declaring instead, “All companies are software.”

Yet even if “all companies are software,” he also noted that for many companies software isn’t at all easy. So a major goal of ScyllaDB is to make it EASY for software companies to adopt, use and build upon Scylla. He then outlined the building blocks of Scylla that help address this issue.

  • Scylla Cloud allows users to deploy a highly-performant scalable NoSQL database without having to hire a team of experts to administer the back end.
  • Project Alternator, ScyllaDB’s DynamoDB-compatible API, is now available in beta on Scylla Cloud. Dor noted how with provisioned pricing 120,000 operations per second would cost $85 per hour on DynamoDB whereas running Scylla Alternator a user could do the same workload for $7.50 an hour — an order of magnitude cheaper.
  • With workload prioritization, you can now automatically balance workloads across a single cluster, providing greater cost savings by minimizing wasteful overprovisioning.

Beyond that, there are needs for workhorse features coming soon from Scylla, such as

  • Database backup with Scylla Manager
  • Lightweight Transactions (LWT)
  • Change Data Capture (CDC) for database updates and
  • User Defined Functions (UDFs) which will support data transformations

While Dor observed the trend to think of “all companies are software,” he also recognized that companies are still at their heart driven by people, highlighting the case of a Scylla user. He finished by making a bold assertion of his own. If all companies are software, then “Scylla is a damned good choice for your software.”

ScyllaDB CTO Avi Kivity

It was then time for Avi Kivity, ScyllaDB’s CTO to take the stage. Avi emphasized how Scylla was doubling down on density. While today’s cloud servers are capable of scaling to 60 terabytes of storage, he pointed out how features like Scylla’s innovative incremental compaction strategy will allow users to get the most out of those large storage systems. Also, to safeguard your data, Scylla now supports encryption at rest.

What Avi gets most excited about are the plans for new features. For instance User Defined Functions (UDFs) and User Defined Aggregates (UDAs). Also now that Scylla has provided Cassandra and DynamoDB APIs, Avi noted that there’s also work afoot on a Redis API (#5132) that allows for disk-backed persistence.

Avi clarified there are also going to be two implementations for Lightweight Transactions (LWT). First, a Paxos-based implementation for stricter guarantees, and then, in due time, a Raft implementation for higher throughput.

Avi also spoke about the unique nature of Scylla’s Change Data Capture (CDC) implementation. Instead of being a separate interface, it will be a standard CQL-readable table for increased integration with other systems.

He finished with a review of Scylla release roadmaps for 2020.

ScyllaDB CTO Avi Kivity showed the Release Schedule for Scylla Enterprise, Scylla Open Source and Scylla Manager for 2020

Philip Zimich, Comcast X1

Next up to speak was Comcast’s Philip Zimich, who presented the scope and scale of the use cases behind Comcast’s video and DVR services. When Comcast’s X1 platform team began to consider Scylla they had grown their business to 15 million households and 31 million devices. Their data had grown to 19 terabytes per datacenter spanning 962 nodes of Cassandra. They make 2.4 billion RESTful calls per day, their business logic persisting both recordings and recording instructions: everything from the DVRs and their recording history to back office data, recording intents, reminders, lookup maps and histories.

Their testing led the Xfinity team to spin up a 200-node cluster, first on Cassandra and then on Scylla, to simulate multiple times the normal peak production load of a single datacenter. Their results were startling. Cassandra is known for fast writes; in Comcast’s testing it was able to achieve 22,000 writes per second, yet Scylla was able to exceed 26,500 writes per second — an improvement of 20%. On reads the difference was even more dramatic. Cassandra was able to manage 144,000 reads per second while Scylla achieved 553,000 reads per second — an improvement of over 280%.

Comcast’s testing showed that Scylla could improve their read transactions by 2.8x and lower their long-tail (p999) latencies by over 22x

Difference in latencies were similarly dramatic. Median reads and writes for Scylla were both sub-millisecond. Scylla’s p999s were in the single-digit millisecond range. Under all situations latencies for Scylla were far better than for Cassandra — anywhere between 2x to 22x faster.

Latencies (in milliseconds)    Cassandra    Scylla    Improvement
Reads   Median                     3.959     0.523           7.5x
        p90                       22.338     0.982          22.7x
        p99                       84.318     3.839            22x
        p999                     218.15      9.786          22.3x
Writes  Median                     1.248     0.609             2x
        p90                        4.094     0.913           4.5x
        p99                       31.914     2.556          12.5x
        p999                     117.344     7.152          16.4x

With performance testing complete Comcast moved forward with their migration to Scylla. Scylla’s architectural advantages allowed Comcast to scale their servers vertically, minimizing the number of individual nodes they need to administer.

Comcast’s dramatic node count reduction, from 962 nodes of Cassandra to 78 of Scylla

When fully deployed, Comcast will shrink their deployment radically from 962 servers on Cassandra to only 78 nodes on Scylla. This new server topology gives them all the capacity they need to support their user base but without increasing their costs, capping their spending and sustaining their planned growth through 2022.

Martin Strycek, Kiwi.com

Last year when the global travel booking giant Kiwi.com took the stage at Scylla Summit they were still in the middle of their migration and described that stage of their implementation as “taking flight with Scylla.” At this year’s Summit Martin continued the analogy to “reaching cruising altitude” and updated the crowds regarding their progress.

Martin described how they were able to use two specific features of Scylla to make all the difference for their production efficiency.

The first feature that improved Kiwi.com’s performance dramatically was enabling BYPASS CACHE for full table scan queries. With 90 million rows to search against, bypassing the cache allowed them to drop the time to do a full table scan from 520 seconds down to 330 seconds – a 35% improvement.

The second feature was the SSTable 3.0 (“mc”) format. Enabling this allowed Kiwi.com to shrink their storage needs from 32 TiB of total data down to 22 TiB on disk — a 31% reduction.

The amount of disk used per server shrank once Kiwi.com enabled SSTable 3.0 (“mc”) format.

Enabling these features was a smooth, error-free operation for the Kiwi.com team. Martin finished his presentation by thanking ScyllaDB, especially Glauber, for making the upgrade experience entirely uneventful: “Thank you for a boring database.”

We take that as high praise!

Glauber Costa

After announcing the winners of the Scylla User Awards for 2019 the keynotes continued with Glauber Costa returning to the stage to share tips and tricks for how to be successful with Scylla.

First off he drew distinctions for long-time Cassandra admins between what to remember from their prior experience (data model and consistency issues) and what they’ll need to forget about — such as trying to tune the system the exact same way as before — because many of the operational aspects of Cassandra may work completely differently or may not even exist in Scylla.

In terms of production hardware, Glauber suggested NVMe if minimizing latency is your main goal. SSD is best if you need high throughput. But forget about using HDDs or any network interface below 1 Gbps if you care about performance. And, generally, avoid using a larger number of smaller nodes if the result is the same total amount of resources; in practice, however, it is acceptable to use smaller nodes to smooth out cluster expansion.

Another key point Glauber touched upon was rack awareness. It is best to “run as many racks as you have replicas.” So if you have a replication factor of three, use three racks for your deployment. This provides perfect balance and perfect resiliency.

These are just two of many topics that Glauber touched upon, and we’ll work to get you the full video of his presentation soon. For now, here are his slides:

Alexys Jacob, Numberly

Known to the developer community as @ultrabug, Alexys is the CTO of Numberly. His presentation was his production experience comparison of MongoDB and Scylla. Numberly uses both of these NoSQL databases in their environment, and Alexys wished to contrast their purpose, utility, and architecture.

For Alexys, it’s not an either-or situation with NoSQL databases. Each is designed for specific data models and use cases. Alexys’ presentation highlighted some of the commonalities between both systems, then drilled down into the differences.

His operational takeaways on the two systems were unsparing but fair. While he hammered MongoDB’s claims about sharding-based clustering — which, in his view, had both poor TCO and poor operability — he said that replica sets should be enough, and gave MongoDB a moderate TCO rating (“vertical scaling is not a bad thing!”) and a good operations rating (“Almost no tuning needed”).

He gave Scylla a good rating for TCO due to its clean and simple topology, maximized hardware utilization and capability to scale horizontally (as well as vertically). For operations, what Alexys wanted was even more automation.

Alexys observed there are complex and mandatory background maintenance operations for Scylla. For example, while compactions are seamless, repairs are still only “semi-seamless.”

From Alexys’ perspective, MongoDB favors flexibility over performance, while Scylla favors consistent performance over versatility.



Breakout Sessions

In their session SmartDeployAI spoke about democratizing Machine Learning and AI by making deployment of resources easy through Kubernetes

This year Scylla Summit boasted over thirty breakout sessions, which spanned from the afternoon of the first day to the afternoon of the second. In due time we’ll share the video and slides for each, but for now, a few highlights.

  • JanusGraph was quite prevalent this year, with back-to-back sessions hosted by our friends at Expero and Enharmonic, and use cases presented by Zeotap for adtech and FireEye for cybersecurity.
  • Kubernetes was also top of mind for many, with presentations from Arrikto’s Yannis Zarkardis about the Scylla Operator, as well as SmartDeployAI talking about using Kubernetes to make efficient workflow pipelines for Machine Learning.
  • Streaming is now nearly ubiquitous. We were pleased to have Confluent’s Jeff Bean on hand for our die-hard Kafka fans, as well as CapitalOne’s Glen Gomez Zuazo who gave an overview of streaming technologies that included Kafka but also touched on other options like RabbitMQ and NATS.
  • Scylla continues to make new inroads across various industries from IoT (Augury, Mistaway and Nauto), to retail and delivery services (Fanatics, iFood) to security (Lookout, Reversing Labs, and FireEye), utilities/energy (SkyElectric), to consumer video (Comcast and Tubi).
  • Scylla Cloud user Mistaway showcased how using Scylla’s database-as-a-service allowed them to keep their focus on their application and their customers, not their infrastructure.

SkyElectric spoke about the incredible opportunities of providing renewable energy – solar and wind — to power the world using their solutions built upon Scylla Open Source

There were also many sessions from ScyllaDB’s own engineering team, highlighting our major new features and capabilities, best practices, security, sizing, Scylla Manager, Scylla Monitoring Stack and more. Look for the video and slides from those sessions in the days ahead.

Day Two General Sessions

The second day of the conference reconvened after lunch as a single general session.

Goran Cvijanovic, ReversingLabs — You have twenty billion objects you’ve analyzed. Any of them could be malware (loaded with a virus, a trojan horse, or other malicious payload), or so-called “goodware” (known to be benign). What database will be able to keep up a level of analytics to prevent the next major data breach? That’s the problem facing ReversingLabs. They analyze all those objects for what they call a “file reputation” which determines if it is safe, suspicious, or known to be dangerous. The need to be fast and right every time is why they put Scylla at the heart of their TitaniumCloud offering.

ReversingLabs’ Goran Cvijanovic described the scale and scope of their file reputation system

The ReversingLabs backend data store needed to support protobuf format natively, exhibit extremely low latencies (<2 milliseconds), and allow for highly variable record sizes (ranging anywhere from 1k of data up to 500 MB). On top of it all, it needed to be highly available and support replication.

ReversingLabs results showed average latencies for writes below 6 milliseconds, and average reads below 1 millisecond. Even their p99 latencies were 12 ms for writes and 7 ms for reads

Goran emphasized that one of the key takeaways is to test your chunk size with data compression. In ReversingLabs’ case, it meant they were able to use 49% less storage. (If you want to learn how to take advantage of Scylla’s data compression feature yourself, make sure to read our blog series on compression: Part One and Part Two.)

Richard Ney, Lookout — Richard Ney is the principal architect behind Lookout’s ingestion pipeline and query services for the security of tens of millions of mobile devices. Their Common Device Service receives data at different intervals for all sorts of attributes related to a mobile device: its software and hardware, its filesystem, settings, permissions and configuration, a binary manifest of what is installed, and various analyses of risk.

Lookout’s Richard Ney shared how a simple mobile device, multiplied tens of millions of times, becomes a Big Data security challenge! His talk was on how Lookout solved that challenge with Scylla.

Their existing design had Spark streaming ingestion jobs flowing through this Common Device Service to DynamoDB and Elasticsearch. But the long-term issue was that, as the system grew, costs increased significantly. Particularly costs associated with DynamoDB. As well, DynamoDB has limits on the primary key and sort key; it was not designed for time series data.

What Lookout is now seeking to do is to replace their current architecture with Scylla plus Kafka, with an eye to leveraging Scylla’s recently announced Change Data Capture (CDC) to flow data through Kafka Connect into Elasticsearch and other downstream services. They also seek to employ incremental compaction with Scylla Enterprise.

Richard described Lookout’s test environment, which emulated 38 million devices generating over 100,000 messages per second. He then noted that the default Kafka partitioner (Murmur2 hash) was not very efficient. The Lookout team implemented their own Murmur3 hash (the same as is used within Scylla) with a consistent jump hash using the Google guava library.

The bottom line for Lookout was that “the cost benefits over the current architecture flow increased significantly as our volume increased.” Richard showed the cost analysis for DynamoDB versus Scylla. In an on-demand scenario, supporting 38 million devices would cost Lookout over $304,000 per month compared to $14,500 for Scylla; a 95% savings over DynamoDB! Even moving to provisioned pricing, the cost of DynamoDB (more than $55,600 per month) would still eclipse Scylla. Scylla would still be 74% cheaper.

Keeping in mind the growth of the mobile market as well as Dor’s allusion to the “million dollar engineering challenge,” Richard extrapolated costs out even further, to a 100,000,000 device scenario. In that case costs for provisioned DynamoDB would rise to $146,000 a month — roughly $1.88 million annually. Whereas for Scylla, at around $38,300 a month, the annualized cost would be less than $460,000. That would mean a savings of over $1.4 million annually.

Shlomi Livne — ScyllaDB’s VP of R&D was up next with a session on writing applications for Scylla. His conceptual process model had a number of steps, each of which could lead back to prior steps. In general:

  1. Think about the queries you are going to run
  2. Only then create a data model
  3. Use cassandra-stress or another tool to validate performance while you…
  4. Develop
  5. Scale test, and finally
  6. Deploy

During this iterative process, Shlomi encouraged the audience to look for opportunities for CQL optimization and to also pay heed to disk access patterns. Look at the amount of I/O operations, and the overall amount of bytes read. There are two elements read off disk: the data stored in the SSTables, and the index. Everything else is read from memory.

Shlomi also proposed paying careful attention to your memory-to-disk ratio when you choose your instance types. For example the AWS EC2 i3 family has a memory-to-disk ratio of 1:30. Whereas the newer i3en family has a memory-to-disk ratio of 1:78. Thus, on the latter instance type you will get more queries served off disk.

Shlomi then led the audience on a comparison of doing a single partition scan versus range scans for performance, and gave examples of how current features such as BYPASS CACHE and PER PARTITION LIMIT, and upcoming features like GROUP BY and LIKE will help users get even more efficiency out of their full table and range scans. The bottom line is that optimized full scans can reduce the overall amount of disk access compared to aggregated single-partition scans.

Calle Wilund — Change Data Capture is a compelling new feature presented by our engineer Calle Wilund. It is a way to record changes to the tables of a database, which can then be read asynchronously by consumers. Basically a way to track diffs to records in your Scylla database. What is unique about Scylla’s implementation is that CDC is implemented as a standard CQL-readable table. It does not require a separate interface or application to read them.

A table comparing how Change Data Capture (CDC) is implemented in Scylla versus other NoSQL databases.

There are a number of use cases for CDC, including database mirroring and replication, or to enable specific applications like fraud detection or a Kafka pipeline. The way it is implemented is as a per-table log in the form of a CQL-readable table. The CDC log is colocated with the original data and has a default Time To Live (TTL) of 24 hours to ensure it doesn’t bloat your disk use uncontrollably.

We’ll provide more information about this feature and its implementation in a future blog, as well as our documentation. For now, if you want to keep an eye on our progress, make sure to check out issue #4985 in Github and peek in on all of the related subtasks.

Konstantin Osipov — Lightweight Transactions (LWT) that enable “Compare-and-Set” are another long-awaited feature in Scylla (see #1359). Our Software Team Lead Konstantin “Kostja” Osipov gave the audience an update on our implementation status. The good news from Kostja is that you can now try it yourself using Scylla’s nightly builds. There is even a quickstart guide on how to use them in Docker. However, LWT is an experimental feature (use --experimental) unsuitable for production; it is not yet available in a tested maintenance release.

Kostja also noted that while ScyllaDB is doing its best to make our implementation match Cassandra, there are some differences. For example, Scylla supports per-core partitioning, so you would need to use a shard-aware driver for optimal performance.

Kostja made clear that LWT has a performance cost. Throughput will be lower and latencies higher. Four round trips are very costly. Especially when working across multiple regions. Right now, ScyllaDB is focusing on improving performance with LWT using a Paxos consensus algorithm. There are also plans in the future to implement Raft.

As with CDC, we’ll have a full blog and documentation of our LWT implementation in the future.

Nadav Har’El — Project Alternator, the free and open source DynamoDB-compatible API for Scylla, was presented to our audience by our Distinguished Engineer Nadav Har’el. What was new to those who have been paying close attention to this project was that we are now offering Alternator on Scylla Cloud in beta.

An intensive test of DynamoDB vs. Alternator at 120,000 ops shows that running Scylla on EC2 VMs could be an order of magnitude cheaper than DynamoDB.

Ask Us Anything!

The summit ended with its traditional Ask Me Anything session with Dor, Avi and Glauber onstage. We’d like to thank everyone who attended, and to all our speakers who made the event such a great success.

Even though Scylla Summit is over, we still invite you to ask us anything! Want more details on a feature we announced? Curious to know how to get the same advantages of Scylla’s performance and scalability in your own organization? Feel free to contact us, or drop into our Slack and let us know your thoughts.


Winners of the Scylla Summit 2019 User Awards

The Envelope Please…

There are many highlights from our 2019 Scylla Summit this week. One of our favorites from Day 1 was our Scylla User Awards, our opportunity to recognize the impressive things our users are doing with Scylla.

This year we had winners in nine categories. I’m glad for the chance to share them here, along with a bit of color on their use case.

Best Use of Scylla with Spark: Tubi

Streaming company Tubi has a backend service that uses pre-computed machine learning model results to personalize its user home pages. Hundreds of millions of them are generated per day in Spark. Scylla is used for the persistence layer. Tubi chose Scylla because they needed “a NoSQL solution that is very fault-tolerant, fast reads/writes, and easy to set up plus maintain.”

Best Use of Scylla with Kafka: Grab

Southeast Asia’s leading super app, Grab developed a microservices architecture based on data streaming with Apache Kafka. These streams not only power Grab’s business, they provide a vital source of intelligence. Grab’s engineering teams aggregate and republish the streams using a low-latency metadata store built on Scylla Enterprise.

Best Use of Scylla with a Graph Database: FireEye

Cybersecurity company FireEye built its Graph Store using JanusGraph, which uses both ScyllaDB and ElasticSearch for backend storage. They have more than 600 million vertices and over 1.2 billion edges, occupying over 2.5TB of space. After comparing NoSQL databases to serve as a backend, FireEye found Scylla to be roughly 10X faster than other options.

Best Use of Scylla Cloud: Dynamic Yield

Dynamic Yield bills itself as an omnichannel personalization platform built by a team that is known to “eat, sleep and breathe data.” In their industry-leading platform, they combined Scylla with Apache Flink and Apache Spark to power their Unified Customer Profile, which also made GDPR compliance far easier for them to implement.

Greatest Node Reduction: Comcast

Telecommunications and entertainment leader Comcast Xfinity has been migrating from Cassandra to Scylla with tremendous benefits. To date, Comcast has replaced 962 nodes of Cassandra with just 78 nodes of Scylla. That’s a 92% reduction in nodes! The significantly smaller footprint brings great cost savings in both hardware and administration.

Best Real-Time Use Case: Opera

Web browser company Opera lets you synchronize your browsing data (bookmarks, open tabs, passwords, history, etc.) between your devices. After migrating to Scylla, Opera managed to reduce their P99 read latencies from 5 seconds to 4ms (a 99.92% reduction), and their P99 write latencies from 500ms to 4ms (a 99.2% reduction). This helps them push updates to connected browsers much faster than ever before. Beyond latency improvements, Opera shared that “migrating to Scylla helped us to sleep well at night.”

Best Analytics Use Case: Kiwi.com

Travel leader Kiwi.com won in this category for cohabitating analytics and operational stores on the same box, and reducing their footprint 30% by using Bypass Cache. Their data, based on flight bookings, is constantly and rapidly changing. In fact, Kiwi.com experiences a 100% turnover in their dataset every ten days. Their need to do analytics against their ever-changing data via full table scans requires them to have a database that can meet their analytics workloads without impacting their live customer transactions.

Community Member of the Year: Yannis Zarkadas, Arrikto

Yannis Zarkadas from data management company Arrikto was winner of Best Scylla Community Member for his work on the Kubernetes Operator for Scylla. He said: “Working on the Kubernetes Operator as part of my diploma thesis at Arrikto was a great experience. The ScyllaDB team was always there if I needed advice or suggestions, and the Operator turned out even better than I’d hoped. Scylla and Kubernetes are a natural fit.”

Most Innovative Use of Scylla: Numberly

AdTech pioneer Numberly has combined Scylla with Kafka Connect, Kafka Streams, Apache Spark and Python Faust, built on Gentoo Linux and deployed on bare-metal across multiple datacenters, all managed with Kubernetes. All of that resulted in reengineering a calculation process that used to take 72 hours but can now be delivered in just 10 seconds.

We congratulate all of our winners, and also thank everyone who has been making Scylla such a vital part of their enterprises and a vibrant open source software community. If you have created your own groundbreaking applications built on Scylla, we’d love to hear more about it! Contact us privately or join us on Slack and tell us all about it!


Medusa - Spotify’s Apache Cassandra backup tool is now open source

Spotify and The Last Pickle (TLP) have collaborated over the past year to build Medusa, a backup and restore system for Apache Cassandra which is now fully open sourced under the Apache License 2.0.

Challenges Backing Up Cassandra

Backing up Apache Cassandra databases is hard, not complicated. You can take manual snapshots using nodetool and move them off the node to another location. There are existing open source tools such as tablesnap, or the manual processes discussed in the previous TLP blog post “Cassandra Backup and Restore - Backup in AWS using EBS Volumes”. However they all tend to lack some features needed in production, particularly when it comes to restoring data - which is the ultimate test of a backup solution.

Providing disaster recovery for Cassandra has some interesting challenges and opportunities:

  • The data in each SSTable is immutable allowing for efficient differential backups that only copy changes since the last backup.
  • Each SSTable contains data that a node is responsible for, and the restore process must make sure it is placed on a node that is also responsible for that data. Otherwise it may be unreachable by clients.
  • Restoring to different cluster configurations, changes in the number of nodes or their tokens, requires that data be re-distributed into the new topology following Cassandra’s rules.

Introducing Medusa

Medusa is a command line backup and restore tool that understands how Cassandra works.
The project was initially created by Spotify to replace their legacy backup system. TLP was hired shortly after to take over development, make it production ready and open source it.
It has been used on small and large clusters and provides most of the features needed by an operations team.

Medusa supports:

  1. Backup a single node.
  2. Restore a single node.
  3. Restore a whole cluster.
  4. Selective restore of keyspaces and tables.
  5. Support for single token and vnodes clusters.
  6. Purging the backup set of old data.
  7. Full or incremental backup modes.
  8. Automated verification of restored data.

The command line tool uses Python version 3.6 and needs to be installed on all the nodes you want to back up. It supports all versions of Cassandra after 2.1.0 and, thanks to the Apache libcloud project, can store backups on a number of storage platforms.

Backup A Single Node With Medusa

Once medusa is installed and configured a node can be backed up with a single, simple command:

medusa backup --backup-name=<backup name>

When executed like this Medusa will:

  1. Create a snapshot using the Cassandra nodetool command.
  2. Upload the snapshot to your configured storage provider.
  3. Clear the snapshot from the local node.

Along with the SSTables, Medusa will store three meta files for each backup:

  • The complete CQL schema.
  • The token map, a list of nodes and their token ownership.
  • The manifest, a list of backed up files with their md5 hash.

Full And Differential Backups

All Medusa backups copy only new SSTables from the nodes, reducing the network traffic needed. Medusa then has two ways of managing the files in the backup catalog, which we call full and differential backups. For differential backups, each new backup keeps only references to SSTables, so that only a single instance of each SSTable exists no matter how many backups it appears in. Differential backups are the default and, in operations at Spotify, reduced the backup size for some clusters by up to 80%.

Full backups create a complete copy of all SSTables on the node each time they run. Files that have not changed since the last backup are copied within the backup catalog into the new backup (and not copied off the node again), in contrast to the differential method, which only creates references to those files. Full backups are useful when you need to take a complete copy and have all the files in a single location.

Cassandra Medusa Full Backups

Differential backups take advantage of the immutable SSTables created by the Log Structured Merge Tree storage engine used by Cassandra. In this mode Medusa checks whether an SSTable has previously been backed up, and only copies the new files (just like always). However, all SSTables for the node are then stored in a single common folder, and the backup manifest contains only metadata files and references to the SSTables.

Cassandra Medusa Full Backups

Backup A Cluster With Medusa

Medusa currently lacks an orchestration layer to run a backup on all nodes for you. In practice we have been using crontab to do cluster wide backups. While we consider the best way to automate this (and ask for suggestions) we recommend using techniques such as:

  • Scheduled via crontab on each node.
  • Manually on all nodes using pssh.
  • Scripted using cstar.

Listing Backups

All backups with the same “backup name” are considered part of the same backup for a cluster. Medusa can provide a list of all the backups for a cluster, when they started and finished, and if all the nodes have completed the backup.

To list all existing backups for a cluster, run the following command on one of the nodes:

$ medusa list-backups
2019080507 (started: 2019-08-05 07:07:03, finished: 2019-08-05 08:01:04)
2019080607 (started: 2019-08-06 07:07:04, finished: 2019-08-06 07:59:08)
2019080707 (started: 2019-08-07 07:07:04, finished: 2019-08-07 07:59:55)
2019080807 (started: 2019-08-08 07:07:03, finished: 2019-08-08 07:59:22)
2019080907 (started: 2019-08-09 07:07:04, finished: 2019-08-09 08:00:14)
2019081007 (started: 2019-08-10 07:07:04, finished: 2019-08-10 08:02:41)
2019081107 (started: 2019-08-11 07:07:04, finished: 2019-08-11 08:03:48)
2019081207 (started: 2019-08-12 07:07:04, finished: 2019-08-12 07:59:59)
2019081307 (started: 2019-08-13 07:07:03, finished: Incomplete [179 of 180 nodes])
2019081407 (started: 2019-08-14 07:07:04, finished: 2019-08-14 07:56:44)
2019081507 (started: 2019-08-15 07:07:03, finished: 2019-08-15 07:50:24)

In the example above the backup called “2019081307” is marked as incomplete because 1 of the 180 nodes failed to complete a backup with that name.

It is also possible to verify that all expected files are present for a backup, and their content matches hashes generated at the time of the backup. All these operations and more are detailed in the Medusa README file.

Restoring Backups

While orchestration is lacking for backups, Medusa coordinates restoring a whole cluster so you only need to run one command. The process connects to nodes via SSH, starting and stopping Cassandra as needed, until the cluster is ready for you to use. The restore process handles three different use cases.

  1. Restore to the same cluster.
  2. Restore to a different cluster with the same number of nodes.
  3. Restore to a different cluster with a different number of nodes.

Case #1 - Restore To The Same Cluster

This is the simplest case: restoring a backup to the same cluster. The topology of the cluster has not changed and all the nodes that were present at the time the backup was created are still running in the cluster.

Cassandra Medusa Full Backups

Use the following command to run an in-place restore:

$ medusa restore-cluster --backup-name=<name of the backup> \
                         --seed-target node1.domain.net

The seed target node will be used as a contact point to discover the other nodes in the cluster. Medusa will discover the number of nodes and token assignments in the cluster and check that it matches the topology of the source cluster.

To complete this restore each node will:

  1. Download the backup data into the /tmp directory.
  2. Stop Cassandra.
  3. Delete the commit log, saved caches and data directory including system keyspaces.
  4. Move the downloaded SSTables into the data directory.
  5. Start Cassandra.

The schema does not need to be recreated as it is contained in the system keyspace, and copied from the backup.

Case #2 - Restore To A Different Cluster With Same Number Of Nodes

Restoring to a different cluster with the same number of nodes is a little harder because:

  • The destination cluster may have a different name, which is stored in system.local table.
  • The nodes may have different names.
  • The nodes may have different token assignments.

Cassandra Medusa Full Backups

Use the following command to run a remote restore:

$ medusa restore-cluster --backup-name=<name of the backup> \
                         --host-list <mapping file>

The host-list parameter tells Medusa how to map from the original backup nodes to the destination nodes in the new cluster, which is assumed to be a working Cassandra cluster. The mapping file must be a comma-separated values (CSV) file (without a header row) with the following columns:

  1. is_seed: True or False, indicating whether the destination node is a seed node, so that the seed nodes can be restored and started first.
  2. target_node: Host name of a node in the target cluster.
  3. source_node: Host name of a source node to copy the backup data from.

For example:

True,new_node1.foo.net,old_node1.foo.net
True,new_node2.foo.net,old_node2.foo.net
False,new_node3.foo.net,old_node3.foo.net

In addition to the steps listed for Case 1 above, when restoring to a remote cluster the following steps are taken:

  1. The system.local and system.peers tables are not modified, in order to preserve the cluster name and prevent the target cluster from connecting to the source cluster.
  2. The system_auth keyspace is restored from the backup, unless the --keep-auth flag is passed to the restore command.
  3. Token ownership is updated on the target nodes to match the source nodes by passing the -Dcassandra.initial_token JVM parameter when the node is restarted, which causes ownership to be updated in the local system keyspace (a sketch of the parameter is shown below).
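
For reference, cassandra.initial_token is a standard Cassandra startup property. Set by hand, it would look something like the sketch below, typically appended to cassandra-env.sh; the token list is a placeholder, and Medusa fills in the real values automatically during the restore, so you should not normally need to set it yourself:

JVM_OPTS="$JVM_OPTS -Dcassandra.initial_token=<comma-separated tokens owned by the matching source node>"   # placeholder; Medusa sets the real tokens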

Case #3 - Restore To A Different Cluster With A Different Number Of Nodes

Restoring to a different cluster with a different number of nodes is the hardest case to deal with because:

  • The destination cluster may have a different name, which is stored in the system.local table.
  • The nodes may have different names.
  • The nodes may have different token assignments.
  • Token ranges can never be the same as there is a different number of nodes.

The last point is the crux of the matter. We cannot get the same token assignments because we have a different number of nodes, and the tokens are assigned to evenly distribute the data between nodes. However, the SSTables we have backed up contain data aligned to the token ranges defined in the source cluster. The restore process must ensure the data is placed on the nodes that are replicas according to the new token assignments, or data will appear to have been lost.

To support restoring data into a different topology Medusa uses the sstableloader tool from the Cassandra code base. While slower than copying the files from the backup, sstableloader is able to “repair” the data into the destination cluster. It does this by reading the token assignments and streaming the parts of each SSTable that match the new token ranges to all the replicas in the cluster.
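
Run by hand against a single table, an sstableloader invocation looks roughly like the sketch below; the keyspace, table, and download path are hypothetical, and the -d option lists contact points in the destination cluster. Medusa drives these invocations for you during restore-cluster, so the command is shown only to illustrate what happens under the hood. Note that sstableloader infers the keyspace and table from the last two components of the path, so the downloaded SSTables need to be laid out accordingly.

$ sstableloader -d target_node1.domain.net,target_node2.domain.net /tmp/<backup download path>/my_keyspace/my_table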


Use the following command to run a restore to a cluster with a different topology:

$ medusa restore-cluster --backup-name=<name of the backup> \
                         --seed-target target_node1.domain.net

Restoring data using this technique has some drawbacks:

  1. The restore will take significantly longer.
  2. The amount of data loaded into the cluster will be the size of the backup set multiplied by the Replication Factor. For example, a backup of a cluster with a Replication Factor of 3 will have 9 copies of the data loaded into it. The extra replicas will be removed by compaction; however, the total on-disk load during the restore process will be higher than it will be at the end of the restore. See below for a further discussion.
  3. The current schema in the cluster will be dropped and a new one created using the schema from the backup. By default Cassandra will take a snapshot when the schema is dropped, a feature controlled by the auto_snapshot configuration setting, and this snapshot will not be cleaned up by Medusa or Cassandra. If there is an existing schema with data, it will take extra disk space. This is a sane safety precaution, and a simple workaround is to manually ensure the destination cluster does not have any data in it; the note below shows how to clear the snapshot if needed.
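
If the auto-snapshot taken when the old schema is dropped does end up consuming noticeable disk space, it can be removed later with standard Cassandra tooling. A sketch, to be run on each node once you are sure the snapshotted data is no longer needed (the tag is a placeholder, discoverable with listsnapshots):

$ nodetool listsnapshots                                          # find the tag of the auto-snapshot
$ nodetool clearsnapshot -t <tag of the dropped schema snapshot>  # remove just that snapshot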

A few extra words on the amplification of data when restoring using sstableloader. The backup contains the replicated data; let’s say we have a Replication Factor of 3, so, roughly speaking, there are 3 copies of each partition. Those copies are spread across the SSTables we collected from each node. As we process each SSTable, sstableloader repairs the data back into the cluster, sending it to the 3 new replicas. So the backup contains 3 copies, we process each copy, and we send each copy to the 3 new replicas, which means in this case:

  • The restore sends nine copies of data to the cluster.
  • Each node gets three copies of data rather than one.

The following sequence of operations will happen when running this type of restore:

  1. Drop the schema objects and re-create them (once for the whole cluster)
  2. Download the backup data into the /tmp directory
  3. Run the sstableloader for each of the tables in the backup

Available Now On GitHub

Medusa is now available on GitHub and can be installed from PyPI. With this blog post and the README file in the repository, you should be able to take a backup within minutes of getting started. As always, if you have any problems, create an issue in the GitHub project to get some help. Medusa has been in use for several months at Spotify, storing petabytes of backups in Google Cloud Storage (GCS), and we thank Spotify for donating the software to the community to allow others to have the same confidence that their data is safely backed up.

One last thing, contributions are happily accepted especially to add support for new object storage providers.

Also, we’re looking to hire a Cassandra expert in America.

Q&A with SmartDeployAI’s Timo Mechler and Charles Adetiloye on Simplifying the Creation of ML Workflow Pipelines

As we prepare for Scylla Summit 2019, we are producing a series of blogs highlighting this year’s featured presenters. And a reminder, if you’re not yet registered for Scylla Summit, please take the time to register now!

Today we are speaking with Timo Mechler, Product Manager, and Charles Adetiloye, Machine Learning Platform Engineer, both of SmartDeployAI, who will be co-presenting the session Simplifying the Creation of ML Workflow Pipelines for IoT Application on Kubernetes with Scylla.

Machine Learning, workflow pipelines, IoT, Kubernetes… That’s quite a buzzword compliant title! You sure you can fit all that into a single Scylla Summit session?

Timo: Well, we sure hope so! 😃 While the title is buzzword heavy, the technologies we are talking about are at the top of everybody’s mind in the Enterprise IT space today. We wanted to show the audience how we have used these technologies together to build a novel software platform that we hope will continue to democratize the field of artificial intelligence (AI).

Charles: Yeap, it sure sounds like lots of buzzwords, because a lot of people are still trying to figure out all these new things; the technology frameworks are evolving so rapidly in different directions that it’s quite difficult to establish the ground truth most of the time! We have seen this over and over across our various client engagements. But the value proposition we bring to the table is a clear understanding of all these technology components (not just buzzwords to us 😊), from architecture to implementation, integration, and deployment at scale.

One more thing, if I may add: we don’t just jump on these new technologies (for instance, Kubernetes, Scylla, etc.) as they show up; we do a comparative analysis with other parallel technologies. We check the long-term viability and, most importantly, evaluate how well we can use them to build a better product to serve our customers. It took a little while for us to come around to Scylla, but we are loving it right now!

Help our audience understand the nature of your talk. What specifically will you be covering? Who would benefit most by attending your session?

Timo: We will be talking about the creation of IoT data pipelines for artificial intelligence and machine learning and how we are running these on Kubernetes with the help of Scylla to drive efficiency and performance at scale.

Charles: Our talk will be centered around operationalizing pipelines for IoT systems with AI capabilities for real-time inferencing. It’s very easy to fall into the trap of building a Monster-AI pipeline system that is not easily scalable or cloneable (i.e., you can’t instantiate a new pipeline on the fly), or that requires database tuning every other day! We will be talking about how we have solved all these problems by leveraging Kubernetes and Scylla for the efficient utilization of computing resources and storage.

Tell us more about how SmartDeployAI evolved. When and how did it get its start?

Timo: SmartDeployAI grew out of our own experience consulting in the big data space by building data pipelines and analytical models (e.g. AI and ML) for clients. We noticed that these clients often lacked the necessary infrastructure or DevOps know-how to be able to put these types of models and pipelines into production and then run them at scale in their enterprise. This served as our “lightbulb moment,” and we decided to focus our energy on building a collaborative and intuitive software solution that would make it easy for anyone to run AI & ML at scale, without the need for specialized (and hard to find) DevOps skills or a multi-million dollar budget.

Charles: Yeap, Timo said it well. This is the result of all our years of consulting engagements in this space. I like to use this analogy: the journey from point A to B could be a straight-line dash, but it’s very disturbing when you see lots of companies expend resources and time and still not make it to the destination. Our goal is to help our clients eliminate these distractions and get them to the destination with minimal effort and at a fraction of the cost.

What do you believe is the most compelling technical aspect of the architecture?

Charles: Its simplicity and intuitiveness. We treat the components of the pipelines we build like “Lego” blocks with the right level of functional abstraction.

Where can people learn more about SmartDeployAI?

Timo: A good place to start would be our website – www.smartdeploy.ai. We are also active on Social Media and have our own Slack Channel (http://bit.ly/ai-pipelines) for those that are interested in learning more about our platform and/or collaborating with us on building more data pipelines for AI & ML.

Outside of your day-to-day jobs, what interests or hobbies do each of you have?

Timo: Well, being a father of two young children does keep me busy, but I do like to step away from the keyboard as much as I can and enjoy a good CrossFit workout. I used to work in finance and still like to keep a pulse on the markets as time allows. Other than that, I’m a big fan of classical music; I enjoy attending live performances, and I’m continuing to grow my own personal collection of recordings.

Charles: Being a dad keeps me busy, and spending time with family is what I do most of the time when I am away from the keyboard. Other than that, I love things that can fly: planes and drones! Working on AI and MLOps has taken a lot of my time in the last few years, but I need to go back and get my VFR license so that I can get back to flying my favorite plane, the Cessna 172.

Thank you both for taking the time to talk to me today. We’re looking forward to your Scylla Summit talk!

REGISTER NOW FOR SCYLLA SUMMIT 2019

The post Q&A with SmartDeployAI’s Timo Mechler and Charles Adetiloye on Simplifying the Creation of ML Workflow Pipelines appeared first on ScyllaDB.

Q&A with SAS’ David Blythe on Changing All Four Tires While Driving AdTech at Full Speed

As we prepare for Scylla Summit 2019, we are producing a series of blogs highlighting this year’s featured presenters. And a reminder, if you’re not yet registered for Scylla Summit, please take the time to register now!

Today we are speaking with David Blythe, Principal Software Developer at SAS. His presentation at Scylla Summit 2019 is entitled Changing All Four Tires while Driving an Ad Tech Engine at Full Speed.

SAS’s implementation of NoSQL is somewhat unique. Rather than expose the raw CQL interface common to Cassandra, DataStax Enterprise, and Scylla, you created a C++ API to abstract the underlying database. What was the strategy behind such a decision?

The abstraction is just a good object-oriented practice, shielding the application level code from the particulars of the implementation. It afforded us the ability to create other implementations we use in unit and local testing, with no need for an actual NoSQL database. And, while it wasn’t the original motivation, it has also allowed us to more easily change our NoSQL infrastructure (three times!) without disturbing application level code.

Your analogy to “changing all four tires while driving at full speed” refers to your database migration. I’m imagining a pit crew trying to do that at the Raleigh Speedway. As fans of racing know, the crew makes all the difference to a winning team. Tell us more about SAS’ “pit crew.” Who’s on your team?

First, let me broaden that analogy. A database migration is just one of the tires. A server code upgrade may be another. Turning features on or off for a customer is another. Our ad serving ecosystem has many moving parts, and we have to be able to upgrade or service those parts while continuously fulfilling ad requests. Oh, and a pit stop would be a luxury. We have to do it while barreling through Turn 4.

We have a surprisingly small team of talented engineers who develop and manage the entire system: 3 front end developers, 3 back end developers, 2 testers, 3 operations people, our development manager, and our product manager. There are 2 dedicated tech service people, and they are included in our daily scrums so that we have continuous visibility into customer issues. The developers share on-call duties with the operations staff, so we get to feel the pain first hand if something we’ve created goes sideways. It’s a great organization.

In your presentation, you repeat on a few slides the short but potent phrase, “No down time.” Tell us why uptime is so vital for SAS and your users. What’s at stake?

Our customers are web publishers and, of course, make most of their revenue from selling advertising space on their sites. The product we are responsible for, “CI 360 Match”, manages the delivery of ads into their sites, starting and stopping campaigns at the right time, pacing them to meet their goals, serving them to the right audience; in short, making real-time decisions about which is the best ad to serve “right now”. That’s how our customers maximize their revenue. They cannot afford periods when SAS cannot serve, because it has a direct impact on their bottom line. We have to be running 24/7.

The data we keep in NoSQL also has to be available 24/7. It has the information about the web site visitor–their demographics, behavior, and ad serving history. That data is needed to target the ads properly and make the best decisions. Without it, the ads served have less value to our customers.

Things are a little more relaxed on the front end, but not much. Our customers use the front end to enter ad purchases and set up business rules for their delivery. They also do searches for unsold inventory and generate reports. Down time is not as significant as for ad serving, but if it occurs during their work day, it can really interfere with their business processes. We have to be very careful.

Tell us about SAS’ deployment.

CI 360 Match is a hosted product deployed in Amazon AWS. Ten years ago, when the product began as part of a startup, we were deploying static instances in one US region. Over time, we have spread to Europe, Australia, and Japan. There are no longer static instances for ad serving or front end, but autoscaling clusters. Our use of automation, both within AWS and in our own tooling, has been critical to getting where we are and making the system manageable with a small staff. We’re running 200-300 instances in AWS at this point.

We deploy new front end and back end code weekly. It’s a great pleasure to be in an environment where we can measure time to new feature deployment in days–not weeks or months.

What is one thing that most people in the tech industry don’t know about you that you’d like to share?

I like to sing. You won’t find me on American Idol, but I have fun in our church choir. It’s about the only time during the week when all this tech stuff goes out of my head!

Thanks for taking the time to talk with me today, David. I’m sure our users will be eager to attend your session to find out more in depth!

REGISTER NOW FOR SCYLLA SUMMIT 2019

The post Q&A with SAS’ David Blythe on Changing All Four Tires While Driving AdTech at Full Speed appeared first on ScyllaDB.

Scylla Open Source Release 3.1.1

Scylla Open Source Release Notes

The ScyllaDB team announces the release of Scylla Open Source 3.1.1, a bugfix release of the Scylla 3.1 stable branch. Scylla Open Source 3.1.1, like all past and future 3.x.y releases, is backward compatible and supports rolling upgrades.

The main issue fixed in this release is in the upgrade path from 3.0.x to 3.1.x: if TTL is used, internode read or write requests between nodes of different releases fail (for example, during a rolling upgrade) #4855

Note: if and only if you installed a fresh Scylla 3.1.0, you must add the following line to scylla.yaml of each node before upgrading to 3.1.1:

enable_3_1_0_compatibility_mode: true

This is not relevant if your cluster was upgraded to 3.1.0 from an older version, or if you are upgrading from or to any other Scylla release.

If you have doubts, please contact us using the user mailing list.


Other issues fixed in this release

  • Stability: Fixed the handling of schema alterations and evictions in the cache, which could result in a node crash #5127 #5128 #5134 #5135

The post Scylla Open Source Release 3.1.1 appeared first on ScyllaDB.

Join Us at Seastar Summit!

Seastar, the advanced open-source C++ framework, continues to gain traction. Most well known for being the engine at the heart of Scylla, the monstrously fast NoSQL database, Seastar is also now at the core of other exciting new projects. And now we’ll be bringing together developer practitioners from around the world to the first-ever Seastar Summit.

This event will be held on November 4, 2019 at the Parc 55 Hotel in San Francisco, concurrently with the Scylla Summit Training Day.

“Seastar is an engine designed to take advantage of today’s massive multi-core server architectures,” says ScyllaDB CTO and Co-Founder, Avi Kivity. “It allows you to tune and control almost every part of the runtime: futures, IO, and CPU shares scheduling. But it is intrinsically safe. You worry about correctness and construction, while the framework worries about efficient execution.”

As an example of its inherent capabilities, Avi points to the workload prioritization feature now available in Scylla, built with Seastar under the hood. With every computation preemptable at sub-millisecond resolution, Scylla can allocate system resources via shares assigned to different users and roles. This allows, for instance, OLAP- and OLTP-oriented workloads to run on the same cluster without either bringing the entire cluster to its knees.

Seastar Summit provides developers an opportunity to hear from world-leading practitioners from companies including ScyllaDB, Red Hat, and Vectorized as they discuss their technical use cases, detailed implementations, and best practices for development with Seastar.

Sessions include:

  • Keynote by ScyllaDB’s Avi Kivity
  • Futures Under the Hood by ScyllaDB’s Rafael Ávila de Espíndola
  • Update on the Seastarized Ceph by Red Hat’s Kefu Chai
  • Ceph::errorator – throw/catch-free, compile time-checked errors for seastar::future by Red Hat’s Radoslaw Zarzynski
  • A generic, C++17, static reflection RPC system with no IDL, designed for large payloads by Vectorized.io’s Alexander Gallego
  • Seastar RPC by ScyllaDB’s Gleb Natapov
  • Seastar Metrics by ScyllaDB’s Amnon Heiman

REGISTER NOW FOR SEASTAR SUMMIT

The post Join Us at Seastar Summit! appeared first on ScyllaDB.