Streaming Real-Time Chat Messages into ScyllaDB with Apache Pulsar

At Scylla Summit 2022, I presented “FLiP Into Apache Pulsar Apps with ScyllaDB”. Using the same content, in this blog we’ll demonstrate step by step how to build real-time messaging and streaming applications using a variety of OSS libraries, schemas, languages, frameworks, and tools together with ScyllaDB. We’ll also introduce options spanning MQTT, WebSockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, the Pulsar protocol, and more. You will learn how to quickly deploy an app to a production cloud cluster with StreamNative and build your own fast applications using the Apache Pulsar and ScyllaDB integration.

Before we jump into the how, let’s review why this integration supports speedy application development. ScyllaDB is an ultra-fast, low-latency, high-throughput, open source NoSQL platform that is fully compatible with Cassandra. Populating ScyllaDB tables using the Scylla-compatible Pulsar IO sink doesn’t require any complex or specialized coding, and the sink makes it easy to load data into ScyllaDB using a simple configuration file that points to Pulsar topics and streams all events directly to ScyllaDB tables.

Now, let’s build a streaming real-time chat message system utilizing ScyllaDB and Apache Pulsar!

Why Apache Pulsar for Streaming Event Based Applications

Let’s start the process to create a chat application that publishes messages to an event bus anytime someone fills out a web form. After the message is published, sentiment analysis is performed on the “comments” text field of the payload, and the result of the analysis is output to a downstream topic.

Event-driven applications, like our chat application, use a message bus to communicate between loosely-coupled, collaborating services. Different services communicate with each other by exchanging messages asynchronously. In the context of microservices, these messages are often referred to as events.

The message bus receives events from producers, filters the events, and then pushes the events to consumers without tying the events to individual services. Other services can subscribe to the event bus to receive those events for processing (consumers).

Apache Pulsar is a cloud-native, distributed messaging and event-streaming platform that acts as a message bus. It supports common messaging paradigms with its diverse subscription types and consumption patterns.

As a feature required for our integration, Pulsar supports IO Connectors. Pulsar IO connectors enable you to create, deploy, and manage connectors utilizing simple configuration files and basic CLI tools and REST APIs. We will utilize a Pulsar IO Connector to sink data from Pulsar topics to ScyllaDB.

Pulsar IO Connector for ScyllaDB

First, we download the Cassandra connector and deploy it to our Pulsar cluster. This process is documented in the Pulsar IO Cassandra Sink connector documentation.

Next, we download the pulsar-io-cassandra-X.nar archive to our connectors directory. ScyllaDB is fully compatible with Cassandra, so we can use that connector to stream messages to it.

When using a Pulsar IO connector like the ScyllaDB one I used for my demo, you can specify the configuration details inside a YAML file like the one shown below.

configs:
  roots: "172.17.0.2:9042"
  keyspace: "pulsar_test_keyspace"
  columnFamily: "pulsar_test_table"
  keyname: "key"
  columnName: "col"

The main configuration shown above is in YAML format and lists the root server (contact point) with its port, a keyspace, a column family (table), and the key and column names to populate.

First, we will need to create a topic to consume from.

bin/pulsar-admin topics create persistent://public/default/chatresult2

When you deploy the connector, you pass in these configuration properties via a command-line call, as shown below.

bin/pulsar-admin sinks create --tenant public --namespace default --name "scylla-test-sink" --sink-type cassandra --sink-config-file conf/scylla.yml --inputs chatresult2

For new data, create a keyspace, table, and index, or use existing ones.

CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
CREATE INDEX ON pulsar_test_table (col);
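
Once the sink is running and the table exists, you can confirm that events are landing in ScyllaDB by querying the table directly. Below is a minimal, hypothetical check using the Python driver (cassandra-driver, which also works with ScyllaDB); the contact point matches the roots address from the sink configuration above.

```python
# Hypothetical check that the Pulsar sink is writing rows into ScyllaDB.
# Assumes a ScyllaDB node reachable at 172.17.0.2 (as in the sink config)
# and that `pip3 install cassandra-driver` has been run.
from cassandra.cluster import Cluster

cluster = Cluster(["172.17.0.2"])
session = cluster.connect("pulsar_test_keyspace")

for row in session.execute("SELECT key, col FROM pulsar_test_table LIMIT 10"):
    print(row.key, row.col)

cluster.shutdown()
```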

Adding ML Functionality with a Pulsar Function

In the previous section, we discussed why Apache Pulsar is well-suited for event-driven applications. In this section, we’ll cover Pulsar Functions, a lightweight serverless computing framework (similar to AWS Lambda). We’ll leverage a Pulsar Function to deploy our ML model to transform or process messages in Pulsar. The diagram below illustrates our chat application example.

Keep in mind: Pulsar Functions give you the flexibility to use Java, Python, or Go for implementing your processing logic. You can easily use alternative libraries for your sentiment analysis algorithm.

The code below is a Pulsar Function that runs sentiment analysis on our stream of events. (The function runs once per event.)

from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        # The Function context gives us a logger and the current message id.
        logger = context.get_logger()
        logger.info("Message Content: {0}".format(input))
        msg_id = context.get_message_id()

        # Parse the incoming JSON chat message and score the comment with VADER.
        fields = json.loads(input)
        sid = SentimentIntensityAnalyzer()
        ss = sid.polarity_scores(fields["comment"])
        logger.info("Polarity: {0}".format(ss['compound']))

        # Map the compound polarity score to a simple sentiment label.
        sentimentVal = 'Neutral'
        if ss['compound'] == 0.00:
            sentimentVal = 'Neutral'
        elif ss['compound'] < 0.00:
            sentimentVal = 'Negative'
        else:
            sentimentVal = 'Positive'

        # Enrich the record with the sentiment and return it as JSON
        # for the output topic.
        row = {}
        row['id'] = str(msg_id)
        row['sentiment'] = str(sentimentVal)
        row['userInfo'] = str(fields["userInfo"])
        row['comment'] = str(fields["comment"])
        row['contactInfo'] = str(fields["contactInfo"])
        json_string = json.dumps(row)
        return json_string

Here, we use the VADER Sentiment NLP library to analyze the user’s sentiment about the comment. We enrich our input record with the sentiment and then write it in JSON format to the output topic.

We use the Pulsar context for logging. We could also push data values into state storage or record some metrics; for this example, we’ll just log.
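
If we wanted to go beyond logging, the Function context also exposes metrics and state helpers. The sketch below is illustrative only: the metric name and counter key are made up, it assumes the Python Functions SDK’s record_metric and counter APIs, and counters require state storage to be enabled on the Pulsar cluster.

```python
# A rough sketch (separate from the demo function above) of using the
# Function context for metrics and state in addition to logging.
from pulsar import Function

class ChatWithMetrics(Function):
    def process(self, input, context):
        logger = context.get_logger()
        logger.info("Received: {0}".format(input))

        # Record a custom metric that is reported with the function's stats.
        context.record_metric("comments-scored", 1)

        # Keep a running count of processed messages in Pulsar's state store.
        context.incr_counter("msg-count", 1)
        logger.info("Processed so far: {0}".format(context.get_counter("msg-count")))

        return input
```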

Deploy Our Function

Below is the deployment command; you can find all of the options and tools in the project’s GitHub directory. We also have to make sure our NLP library is installed on all of our nodes.

bin/pulsar-admin functions create --auto-ack true \
  --py pulsar-pychat-function/src/sentiment.py --classname "sentiment.Chat" \
  --inputs "persistent://public/default/chat2" --log-topic "persistent://public/default/chatlog2" \
  --name Chat --namespace default --output "persistent://public/default/chatresult2" --tenant public

pip3 install vaderSentiment
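
With the topic, Function, and sink deployed, we can push a test message through the pipeline before wiring up the web page. Here is a minimal sketch using the Pulsar Python client; the broker URL pulsar://pulsar1:6650 and the message contents are assumptions for illustration.

```python
# Publish one hypothetical chat message to the function's input topic.
# Assumes a broker at pulsar://pulsar1:6650 and `pip3 install pulsar-client`.
import json
import pulsar

client = pulsar.Client("pulsar://pulsar1:6650")
producer = client.create_producer("persistent://public/default/chat2")

message = {
    "userInfo": "Tim",
    "contactInfo": "tim@example.com",
    "comment": "ScyllaDB and Pulsar work great together!",
}
producer.send(json.dumps(message).encode("utf-8"))

client.close()
```

If everything is wired up, the sentiment-enriched result appears on the chatresult2 topic and, through the sink, in the pulsar_test_table table.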

Let’s Run Our Chat Application

Now that we have built our topic, Function, and sink, let’s build our application. The full web page (datatable.html) is in the GitHub directory, but I’ll show you the critical portions here. For this Single Page Application (SPA), I am using jQuery and DataTables, which are included from their public CDNs.

<form action="/datatable.html" method="post" enctype="multipart/form-data" id="form-id">
<div id="demo" name="demo"></div>
<p><label>User: </label><input name="user" type="text" id="user-id" size="75" value="" maxlength="100"/></p>
<p><label>Question: </label><input type="text" name="other-field" id="other-field-id" size="75" maxlength="200" value=""/></p>
<p><label>Contact Info: </label><input name="contactinfo" type="text" id="contactinfo-id" size="75" maxlength="100" value=""/></p>
<p><input type="button" value="Send to Pulsar" onClick="loadDoc()" /></p>
</form>

In the above HTML Form, we let users add a comment to our chat.

Now we are using JavaScript to send the form data as JSON to a Pulsar topic via WebSockets. WebSockets are a supported protocol for Apache Pulsar. The WebSocket URL is ws://pulsar1:8080/ws/v2/producer/persistent/public/default/chat2.

Here, ws is the protocol, pulsar1 is the Pulsar server, port 8080 is our HTTP/REST port, producer is what we are doing, persistent is our topic type, public is our tenant, default is our namespace, and chat2 is our topic. We populate an object, convert it to a JSON string, and encode that payload as a Base64-encoded ASCII string. Then we add that encoded string as the payload field of a new JSON message that also carries properties and context for our Pulsar message. This envelope format is what the Pulsar WebSocket API requires in order to turn the request into a regular message on our Pulsar topic.
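
Outside the browser, the same envelope can be exercised with a few lines of Python. The sketch below uses the websocket-client package and a made-up chat message purely to show the payload/properties/context format; the browser-side JavaScript our page actually uses follows.

```python
# A non-browser sketch of the producer payload format described above.
# Assumes the same WebSocket endpoint and `pip3 install websocket-client`.
import base64
import json
import websocket  # websocket-client package

uri = "ws://pulsar1:8080/ws/v2/producer/persistent/public/default/chat2"
ws = websocket.create_connection(uri)

chat = {"userInfo": "Tim", "contactInfo": "tim@example.com",
        "comment": "Hello from Python!"}
frame = {
    "payload": base64.b64encode(json.dumps(chat).encode("utf-8")).decode("ascii"),
    "properties": {"key": str(1234567890)},  # arbitrary example property
    "context": "1",
}
ws.send(json.dumps(frame))
print(ws.recv())  # the broker replies with an ack containing the messageId
ws.close()
```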

<script>
function loadDoc() {
  var xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
      document.getElementById("demo").innerHTML = '';
    }
  };
var wsUri = "ws://pulsar1:8080/ws/v2/producer/persistent/public/default/chat2";

websocket = new WebSocket(wsUri);

const pulsarObject = {
  userInfo: document.getElementById('user-id').value.substring(0,200),
  contactInfo: document.getElementById('contactinfo-id').value.substring(0,200),
  comment: document.getElementById('other-field-id').value.substring(0, 200)};
const jsonStr = JSON.stringify(pulsarObject);
var payloadStr = btoa(jsonStr);
const propertiesObject = {key: Date.now() }
var data = JSON.stringify({ "payload": payloadStr, "properties": propertiesObject, "context": "cs" });

websocket.onopen = function(evt) {
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.send(data);
  }
};
websocket.onerror = function(evt) {console.log('ERR', evt)};
websocket.onmessage = function(evt) {}
websocket.onclose = function(evt) {
  if (evt.wasClean) { console.log(evt);
  } else { console.log('[close] Connection died');
  }
};
}
var form = document.getElementById('form-id');
form.onsubmit = function() {
  var formData = new FormData(form);  
  var action = form.getAttribute('action');
  loadDoc();
  return false;
}
</script>

In the above code, we’ll grab the value of the fields from the form, stop the form from reloading the page, and then send the data to Pulsar.

Now, let’s consume any messages sent to the result topic of our Sentiment Pulsar function.

In the below code we consume from a Pulsar topic: ws://pulsar1:8080/ws/v2/consumer/persistent/public/default/chatresult2/chatrreader?subscriptionType=Shared&receiverQueueSize=500.

This URI differs somewhat from the producer URI: it uses the consumer endpoint with a subscription name (chatrreader), a subscriptionType of Shared, and a receiverQueueSize of 500.

JavaScript:

$(document).ready(function() {
   var t = $('#example').DataTable();

var wsUri = "ws://pulsar1:8080/ws/v2/consumer/persistent/public/default/chatresult2/chatrreader?subscriptionType=Shared&receiverQueueSize=500";
websocket = new WebSocket(wsUri);
websocket.onopen = function(evt) {
console.log('open');
};
websocket.onerror = function(evt) {console.log('ERR', evt)};
websocket.onmessage = function(evt) {

   var dataPoints = JSON.parse(evt.data);
   if ( dataPoints === undefined || dataPoints == null || dataPoints.payload === undefined || dataPoints.payload == null ) {
      return;
   }
   if (IsJsonString(atob(dataPoints.payload))) {
      var pulsarMessage = JSON.parse(atob(dataPoints.payload));
      if ( pulsarMessage === undefined || pulsarMessage == null ) {
         return;
      }
      var sentiment = "";
      if ( !isEmpty(pulsarMessage.sentiment) ) {
         sentiment = pulsarMessage.sentiment;
      }
      var publishTime = "";
      if ( !isEmpty(dataPoints.publishTime) ) {
         publishTime = dataPoints.publishTime;
      }
      var comment = "";
      if ( !isEmpty(pulsarMessage.comment) ) {
         comment = pulsarMessage.comment;
      }
      var userInfo= "";
      if ( !isEmpty(pulsarMessage.userInfo) ) {
         userInfo = pulsarMessage.userInfo;
      }
      var contactInfo= "";
      if ( !isEmpty(pulsarMessage.contactInfo) ) {
         contactInfo = pulsarMessage.contactInfo;
      }
         t.row.add( [ sentiment, publishTime, comment, userInfo, contactInfo ] ).draw(true);
   }
};

} );

For messages consumed over WebSockets, we have to Base64-decode the payload, parse the JSON into an object, and then use the DataTables row.add method to append the new rows to our results table. This happens whenever a message is received.
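
For completeness, here is a rough non-browser version of the same consumption loop, again using the websocket-client package. Note that the Pulsar WebSocket consumer API expects each message to be acknowledged by sending its messageId back, as the sketch does.

```python
# A rough Python equivalent of the consumer page above.
# Assumes the same consumer endpoint and `pip3 install websocket-client`.
import base64
import json
import websocket  # websocket-client package

uri = ("ws://pulsar1:8080/ws/v2/consumer/persistent/public/default/"
       "chatresult2/chatrreader?subscriptionType=Shared&receiverQueueSize=500")
ws = websocket.create_connection(uri)

while True:
    frame = json.loads(ws.recv())
    result = json.loads(base64.b64decode(frame["payload"]))
    print(frame.get("publishTime"), result.get("sentiment"), result.get("comment"))
    # Acknowledge so Pulsar can remove the message from the subscription backlog.
    ws.send(json.dumps({"messageId": frame["messageId"]}))
```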

Conclusion

In this blog, we explained how to use Apache Pulsar to build simple streaming applications regardless of the data source. We chose to add a ScyllaDB-compatible sink to our chat application; however, we could do the same for any data store that has a Pulsar IO sink.

You can find the source code in the GitHub repo Scylla FLiPS The Stream With Apache Pulsar.

If you’d like to see this process in action, view the original on-demand recording.

WATCH THE ON-DEMAND RECORDING

Resources & References

More on Pulsar

  1. Learn Pulsar Fundamentals: While this blog did not cover Pulsar fundamentals, there are great resources available to help you learn more. If you are new to Pulsar, we recommend taking the on-demand self-paced Pulsar courses or testing your Pulsar knowledge with the Fundamentals TestOut.
  2. Spin up a Pulsar Cluster in Minutes: If you want to try building microservices without having to set up a Pulsar cluster yourself, sign up for StreamNative Cloud today. StreamNative Cloud is the simple, fast, and cost-effective way to run Pulsar in the public cloud.
  3. Continued Learning: If you are interested in learning more about Pulsar functions and Pulsar, take a look at the following resources:

This article originally appeared on the StreamNative Engineering blog here and is republished with permission.

 

Exploring Phantom Jams in your Data Flow

“Think back to some time when you were driving down the freeway, clipping right along about 5 miles above the limit even though the highway was busy, and then suddenly everyone slowed to a crawl and you had to stand on the brakes. Next came a quarter, a half, even a full mile of stop-and-go traffic. Finally, the pack broke up and you could come back to speed. But there was no cause to be seen. What happened to the traffic flow?”

— Charles Wetherell, Etudes for Programmers

In the real world a “phantom traffic jam,” also called a “traffic wave,” is when there is a standing compression wave in automobile traffic, even though there is no accident, lane closure, or other incident that would lead to the traffic jam. A similar phenomenon is possible in computer networking or storage I/O, where your data can slow down for no apparent reason. Let’s look into what might cause such “phantom jams” in your data.

Introducing a Dispatcher to the Producer-Consumer Model

Consider there’s a thread that sends some messages into the hardware, say network packets into the NIC or I/O requests into the disk. The thread, called Producer, does this at some rate, P messages per second. The hardware, in turn, is capable of processing the messages at some other rate, C messages per second. This time, however, the Producer doesn’t have direct access to the Consumer’s input. Instead, the messages are put into some intermediate buffer or queue and then a third component, called Dispatcher, pushes those messages into their destination. Like this:

The Producer-Dispatcher-Consumer model

Being a software component, the Dispatcher may have limited access to the CPU power and is thus only “allowed” to dispatch the messages from its queue at certain points in time. Let’s assume that the dispatcher wakes up D times per second and puts several messages into the Consumer before getting off the CPU.

The Dispatcher component is not as artificial as it may seem. For example, in the Linux kernel there’s the I/O scheduler and the network traffic shaper. In ScyllaDB, the database for data-intensive apps that require high performance and low latency, there’s Seastar’s reactor, and so on. Having such an interposer allows the system architect to achieve various goals like access rights, priority classes, buffering, congestion control and many others.

Obviously, if P > C (i.e., the Producer generates messages at a higher rate than the Consumer can process them), then some queue will grow without bound: either the Dispatcher’s queue or the Consumer’s internal one. So from here on we’ll always assume that C > P, i.e., the Consumer is fast enough to keep the whole system operating.

The Dispatcher, on the other hand, is allowed to work at any rate it wants, since it may put any number of messages into the Consumer per wake-up. Its math looks like this: it takes L = 1 / C seconds for the Consumer to process a single message, and the Dispatcher’s sleep time between wake-ups is G = 1 / D seconds, so it needs to put at least G / L = C / D messages into the Consumer per wake-up so as not to disturb the message flow. For example, with C = 200,000 messages per second and D = 2,000 wake-ups per second, that’s at least 100 messages per wake-up.

Why not dispatch the whole queue accumulated so far? It’s possible, but there’s a drawback. If the Consumer is dumb and operates in FIFO mode (and if it’s hardware, it most likely is dumb), dispatching everything limits the Dispatcher’s flexibility in shaping the incoming flow of messages. And since under ideal conditions the Consumer won’t process more than C / D requests per wake-up interval anyway, it makes sense to dispatch at most some other amount (larger than C / D) of requests per wake-up.

Adding some jitter to the data-flow chain

In the real world, however, none of the components is able to do its job perfectly. The Producer may try to generate messages at the given rate, but that rate only holds in the long run; the real delays between individual messages will vary, averaging 1 / P. Similarly, the Consumer will process C messages per second on average, but the real delay between individual messages will vary around the 1 / C value. And, of course, the same is true for the Dispatcher: it will be woken up at unequal intervals, some below the 1 / D goal, some above, some well above it. In the academic world there’s a nice model of this jittery behavior called the Poisson point process, in which events occur continuously, sequentially, and independently at some given rate. Let’s disturb each of the components we have and see how it affects the system’s behavior.

To play with this model, here’s a simple simulation tool (written in C++) that implements the Producer, the Dispatcher, and the Consumer and runs them over a virtual time axis for a given duration. Since it was initially written to model the way the Seastar library dispatches I/O requests, it uses a hard-coded Dispatcher wake-up rate of 2kHz and applies a multiplier of 1.5 to the maximum number of requests it is allowed to put into the Consumer per wake-up, as described above (both constants can be overridden with command-line options).

The tool CLI is as simple as

$ sim <run duration in seconds>
      <producer delays> <producer rate>
      <dispatcher delays>
      <consumer delays> <consumer rate>

The “delays” arguments define how the simulator models the pauses between events in the respective component. At the end, the tool prints statistics about the time it took for requests to be processed: the mean value, the 95th and 99th percentiles, and the maximum.

First, let’s validate the initial assumption about perfect execution, i.e., that if all three components run at precisely fixed rates, the system is stable and processes all the requests. Since it’s a simulator, it can provide perfectly even intervals between events. We’ll run the simulation at a rate of 200k messages per second for 10 virtual minutes. The command to run is

$ sim 600 uniform 200000 uniform uniform 200000

The resulting timings would be

mean p95 p99 max
0.5ms 0.5ms 0.5ms 0.5ms

Since the dispatcher runs at a 2kHz rate, it can keep requests queued for about half a millisecond. The Consumer rate of 200k messages per second doesn’t contribute much delay, so from the above timings we can conclude that the system is working as it should: all the messages generated by the Producer are consumed by the Consumer in a timely manner, and the queue doesn’t grow.

Disturbing individual components

Next, let’s disturb the Producer and simulate the whole thing as if the input was not regular. Going forward, it’s better to run a series of experiments with increasing Producer rates and show the results on a plot. Here it is (mind the logarithmic vertical axis)

$ sim 600 poisson 200000 uniform uniform 200000

Pic.1 Messages latencies, jitter happens in the Producer

A disturbing, though not very unexpected, thing happened. At extreme rates, the system cannot deliver the 0.5ms delay, and requests spend more time in the Dispatcher’s queue than they do in the ideal case: up to 100ms. At rates up to 90% of the peak, the system seems to work, though the expected message latency is now somewhat larger than in the ideal case.

Well, how does the system work if the Consumer is disturbed? Let’s check

$ sim 600 uniform $rate uniform poisson 200000

Pic.2 Messages latencies, jitter happens in the Consumer

Pretty much the same, isn’t it? As the Producer rate reaches the Consumer’s maximum, the system delays requests, but it still works and doesn’t accumulate requests without bound.

As you might have guessed, the most astonishing result happens if the Dispatcher is disturbed. Here’s how it looks.

$ sim 600 uniform $rate poisson uniform 200000

Pic.3 Messages latencies, jitter happens in the Dispatcher

Now this is really weird. Despite the Dispatcher trying to compensate for the jitter by putting 1.5 times more requests into the Consumer per tick than the ideal model estimates, disturbing the Dispatcher has the largest effect. The request timings are 1,000 times what they were when the Consumer or the Producer experienced the jitter. In fact, if we check the simulation progress in more detail, it becomes clear that at a rate of 160k requests per second the system effectively stops keeping up with the incoming message flow and the Dispatcher queue grows without bound.

To emphasize the difference between the effect the jitter has on each of the components, here’s how the maximum latency changes with the increase of the Producer rate. Different plots correspond to the component that experiences jitter.

Pic.4 Comparing jitter effect on each of the components

The Dispatcher seemed to be the least troublesome component: it didn’t have any built-in limiting factors, and it even tried to over-dispatch the Consumer with requests in case of any delay in service. Still, the imperfection of the Dispatcher has the heaviest effect on system performance. Not only does jitter in the Dispatcher render higher delays in message processing, but the maximum throughput of the system is also mainly determined by the Dispatcher, not by the Consumer or the Producer.
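
A quick back-of-the-envelope check helps explain why. The sketch below is not the C++ simulator above, just an independent Python illustration using the same rates: with a fixed 0.5ms tick, Poisson arrivals from the Producer almost never exceed the 1.5 × C / D = 150 per-tick budget, but when the tick length itself is jittered, a substantial fraction of wake-up intervals accumulate more than 150 messages, and the backlog starts to compound.

```python
# Fraction of dispatcher wake-up intervals that exceed the 1.5 * C / D budget,
# with jitter applied either to the Producer or to the Dispatcher.
import random

C = 200_000                 # producer/consumer rate, events per second
D = 2_000                   # dispatcher wake-ups per second
BUDGET = int(1.5 * C / D)   # 150 requests per wake-up
TICKS = 100_000

def overshoot_fraction(jitter_producer, jitter_dispatcher):
    t_next_msg, t, over = 0.0, 0.0, 0
    for _ in range(TICKS):
        gap = random.expovariate(D) if jitter_dispatcher else 1.0 / D
        t_end = t + gap
        count = 0
        while t_next_msg < t_end:
            count += 1
            t_next_msg += random.expovariate(C) if jitter_producer else 1.0 / C
        over += count > BUDGET
        t = t_end
    return over / TICKS

print("over budget, Producer jitter only:  ", overshoot_fraction(True, False))
print("over budget, Dispatcher jitter only:", overshoot_fraction(False, True))
```

With these parameters, the first fraction comes out near zero, while the second is roughly e^-1.5, i.e., about one wake-up in five blows through the budget.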

Discovering the “effective dispatch rate”

This observation brings us to the idea of the effective dispatch rate: the maximum number of requests per second such a system may process. This value is determined by the Consumer rate and depends on the behavior of the Dispatcher, which, in turn, has two parameters: the dispatch frequency and the over-dispatch multiplier. Using the simulation tool, it is possible to obtain this rate for different frequencies (actually, since the tool is Seastar-centric, it accepts the dispatch period duration rather than the frequency itself) and multipliers. The plot below shows how the effective rate (the Y-axis) changes with the dispatch multiplier value (the X-axis) for three different dispatch periods: a quarter, half, and two milliseconds.

Pic. 5 Effective dispatch rate of different Dispatcher wake-up periods

As seen on the plot, the frequency of Dispatcher wake-ups doesn’t matter. What affects the effective dispatch rate is the over-dispatching level: the more requests the Dispatcher puts into the Consumer, the better it works. It seems like that’s the solution to the problem. However, before jumping to this conclusion, we should note that the more requests sit in the Consumer simultaneously, the longer it takes to process them, and the worse the end latencies may become. Unfortunately, this is where the simulation reaches its boundary. If we check this with the simulator, we’ll see that even at a multiplier of 4.0 the end latency doesn’t become worse. But that’s because what grows is the in-Consumer latency, and since it’s much lower than the in-Dispatcher one, the end latency doesn’t change a lot. The ideal Consumer is linear: it takes exactly twice as much time to process twice as many requests. Real hardware consumers (disks, NICs, etc.) do not behave like that. Instead, as the internal queue grows, the resulting latency goes up worse than linearly, so over-dispatching the Consumer will render higher latencies.

The Producer-Consumer model is quite a popular programming pattern. Although the effect was demonstrated in a carefully prepared artificial environment, it may reveal itself at different scales in real systems as well. Fighting it can indeed be troublesome, but in this particular case it’s worth saying that forewarned is forearmed. The key requirements for uncovering it are elaborate monitoring facilities and a profound understanding of the system’s components.

Discover More About ScyllaDB

ScyllaDB is a monstrously fast and scalable NoSQL database designed to provide optimal performance and ease of operations. You can learn more about its design by checking out its architecture on our website, or download our white paper on the Seven Design Principles behind ScyllaDB.

DOWNLOAD THE WHITE PAPER

Wasmtime: Supporting UDFs in ScyllaDB with WebAssembly

WebAssembly, also known as WASM, is a binary format for representing executable code, designed to be easily embeddable into other projects. It turns out that WASM is also a perfect candidate for a user-defined functions (UDFs) back-end, thanks to its ease of integration, performance, and popularity. ScyllaDB already supports user-defined functions expressed in WebAssembly in experimental mode, based on an open-source runtime written natively in Rust — Wasmtime.

In fact, we also just added Rust support to our build system in order to make future integrations even smoother!

Choosing the Right Engine

WebAssembly is a format for executable code designed first and foremost to be portable and embeddable. As its name suggests, it’s a good fit for web applications, but it’s also generally a good choice for an embedded language, since it’s quite fast.

One of WebAssembly’s core features is isolation. Each module is executed in a sandboxed environment separate from the host application. Such a limited trust environment is really desired for an embedded language because it vastly reduces the risk of somebody running malicious code from within your project.

WASM is a binary format, but it also specifies a human-readable text format called the WebAssembly Text format (WAT).

To integrate WebAssembly into a project, one needs to pick an engine. The most popular engine is Google’s V8, which is implemented in C++, supports JavaScript, and provides a very rich feature set. It’s also (unfortunately) quite heavy and not very easy to integrate with asynchronous frameworks like Seastar, which is a building block of ScyllaDB.

Fortunately, there’s also Wasmtime, a smaller (but not small!) project implemented in Rust. It only supports WebAssembly, not JavaScript, which also makes it more lightweight. It has good support for asynchronous environments and has C++ bindings, making it a good fit for injecting into ScyllaDB for a proof-of-concept implementation.

In ScyllaDB, we selected Wasmtime due to its being lighter than V8 and its potential for being async-friendly. While we currently use the existing C++ bindings provided by Wasmtime, we plan to implement this whole integration layer in Rust and then compile it directly into ScyllaDB.

Coding in WebAssembly

So, how would one create a WebAssembly program?

WebAssembly Text (WAT) format

First, modules can be coded directly in WebAssembly text format. It’s not the most convenient way, at least for me, due to WASM’s limited type system and specific syntax with lots of parentheses. But it’s possible, of course. All you need in this case is a text editor. Being in love with Lisp wouldn’t hurt either.

```wat

(module
   (func $fib (param $n i64) (result i64)
      (if
         (i64.lt_s (local.get $n) (i64.const 2))
         (return (local.get $n))
      )
      (i64.add
         (call $fib (i64.sub (local.get $n) (i64.const 1)))
         (call $fib (i64.sub (local.get $n) (i64.const 2)))
      )
   )
   (export "fib" (func $fib))
)

```

C++

C and C++ enthusiasts can compile their language of choice to WASM with the clang compiler.

```cpp
int fib(int n) {
   if (n < 2) {
      return n;
   }
   return fib(n - 1) + fib(n - 2);
}
```
```sh
clang -O2 --target=wasm32 --no-standard-libraries -Wl,--export-all -Wl,--no-entry fib.c -o fib.wasm
wasm2wat fib.wasm > fib.wat

```

The binary interface is well defined, and the resulting binaries are also quite well optimized underneath. The code is compiled to WebAssembly with the use of LLVM representation, which makes many optimizations possible.

Rust

Rust also has the ability to produce WASM output in its ecosystem, and a wasm32 target is already supported in cargo, the official Rust build toolchain.

```rust
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn fib(n: i32) -> i32 {
   if n < 2 {
      n
   } else {
      fib(n - 1) + fib(n - 2)
   }
}
```
```sh
rustup target add wasm32-unknown-unknown
cargo build --target wasm32-unknown-unknown
wasm2wat target/wasm32-unknown-unknown/debug/fib.wasm > fib.wat

```

AssemblyScript

There’s also AssemblyScript, a TypeScript-like language that compiles directly to WebAssembly. AssemblyScript is especially nice for quick experiments because it’s a scripting language. It’s also the only language that was actually invented and designed with WebAssembly as a compilation target in mind.

```assemblyscript
export function fib(n: i32): i32 {
   if (n < 2) {
      return n
   }
   return fib(n - 1) + fib(n - 2)
}
```
```sh
asc fib.ts --textFile fib.wat --optimize
```

User-Defined Functions

Why do we need WebAssembly? Our first use case for ScyllaDB involves user-defined functions (UDFs). UDF is a Cassandra Query Language (CQL) feature that allows defining functions in a given language and then calling them when querying the database. The function is applied to the arguments by the database itself, and only the result is returned to the client. UDFs also make it possible to express nested calls and other more complex operations.

Here’s how you can use a user-defined function in CQL:

```cql

cassandra@cqlsh:ks> SELECT id, inv(id), mult(id, inv(id)) FROM t;

 id | ks.inv(id) | ks.mult(id, ks.inv(id))
----+------------+-------------------------
  7 |   0.142857 |                       1
  1 |          1 |                       1
  0 |   Infinity |                     NaN
  4 |       0.25 |                       1

(4 rows)
```

UDFs are cool enough by themselves, but a more important purpose is enabling User Defined Aggregates (UDAs). UDAs are custom accumulators that combine data from multiple database rows into potentially complex outputs. UDAs consist of two functions: one for accumulating the result for each argument and another for finalizing and transforming the result into the output type.

The code example below shows an aggregate that computes the average length of all requested strings. The functions below are coded in Lua, which is yet another language we support.

First, let’s create all the building blocks — functions for accumulating partial results and transforming the final result:

```cql
CREATE FUNCTION accumulate_len(acc tuple<bigint,bigint>, a text)
   RETURNS NULL ON NULL INPUT
   RETURNS tuple<bigint,bigint>
   LANGUAGE lua as 'return {acc[1] + 1, acc[2] + #a}';

CREATE OR REPLACE FUNCTION present(res tuple<bigint,bigint>)
   RETURNS NULL ON NULL INPUT
   RETURNS text
   LANGUAGE lua as
'return "The average string length is " .. res[2]/res[1] .. "!"';
```

…and now, let’s combine them all into a user-defined aggregate:

```cql
CREATE OR REPLACE AGGREGATE avg_length(text)
   SFUNC accumulate_len
   STYPE tuple<bigint,bigint>
   FINALFUNC present INITCOND (0,0);
```

Here’s how you can use the aggregate after it’s created:

```cql
cassandra@cqlsh:ks> SELECT * FROM words;

 word
------------
     monkey
 rhinoceros
        dog
(3 rows)

cassandra@cqlsh:ks> SELECT avg_length(word) FROM words;

 ks.avg_length(word)
-----------------------------------------------
 The average string length is 6.3333333333333!
(1 rows)
```

One function accumulates partial results by storing the total number of strings and the total sum of their lengths. The finalizing function divides one by the other to produce the result, in this case rendered as text. As you can see, the potential here is quite large: user-defined aggregates allow using database queries in a more powerful way, for instance, by gathering complex statistics or transforming whole partitions into different formats.
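
To make the SFUNC/FINALFUNC contract concrete, here is a plain-Python model of how the avg_length aggregate folds over the rows shown above. It is only an illustration of the mechanism; ScyllaDB executes the Lua functions defined earlier, not this code.

```python
# Plain-Python model of the avg_length user-defined aggregate.

def accumulate_len(acc, a):
    """SFUNC: acc is (row_count, total_length); a is the next string."""
    count, total = acc
    return (count + 1, total + len(a))

def present(acc):
    """FINALFUNC: turn the accumulated state into the output value."""
    count, total = acc
    return "The average string length is {}!".format(total / count)

state = (0, 0)                                  # INITCOND
for word in ["monkey", "rhinoceros", "dog"]:    # the rows in the words table
    state = accumulate_len(state, word)
print(present(state))   # -> The average string length is 6.333...!
```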

Enter WebAssembly

To create a user-defined function in WebAssembly, we first need to write or compile a function into the WASM text format. The function body is then simply registered in a CQL CREATE FUNCTION statement. That’s it!

```cql
CREATE FUNCTION ks.fib (input bigint)
RETURNS NULL ON NULL INPUT
RETURNS bigint LANGUAGE xwasm
AS '(module
   (func $fib (param $n i64) (result i64)
      (if
         (i64.lt_s (local.get $n) (i64.const 2))
         (return (local.get $n))
      )
      (i64.add
         (call $fib (i64.sub (local.get $n) (i64.const 1)))
         (call $fib (i64.sub (local.get $n) (i64.const 2)))
      )
   )
   (export "fib" (func $fib))
   (global (;0;) i32 (i32.const 1024))
   (export "_scylla_abi" (global 0))
   (data $.rodata (i32.const 1024) "\\01")
)'
```
```cql

cassandra@cqlsh:ks> SELECT n, fib(n) FROM numbers;
 n | ks.fib(n)
---+-----------
 1 |         1 
 2 |         1
 3 |         2
 4 |         3
 5 |         5
 6 |         8
 7 |        13
 8 |        21
 9 |        34
(9 rows)
```

Note that the declared language here is xwasm, which stands for “experimental WASM.” Support for this language is currently still experimental in ScyllaDB.
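
From application code, calling the UDF works like calling any built-in function. Below is a minimal sketch using the Python driver (cassandra-driver, which also works with ScyllaDB); it assumes the ks.fib function and a ks.numbers table like the one queried above already exist, and that the cluster runs with the experimental WASM UDF support enabled.

```python
# Hypothetical application-side call of the WASM-backed fib() UDF.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])      # contact point is an assumption
session = cluster.connect("ks")

for row in session.execute("SELECT n, fib(n) FROM numbers"):
    print(row[0], row[1])

cluster.shutdown()
```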

The current design doc is maintained here. You’re welcome to take a look at it: https://github.com/scylladb/scylla/blob/master/docs/design-notes/wasm.md

Our Roadmap

WebAssembly support is in active development, and here are some of our most important goals.

Helper Libraries for Rust and C++

Writing functions directly in WAT format is cumbersome and not trivial, because ScyllaDB expects the functions to follow our application binary interface (ABI) specification. In order to hide these details from developers, we’re in the process of implementing helper libraries for Rust and C++, which seamlessly provide ScyllaDB bindings. With our helper libraries, writing a user-defined function will be no harder than writing a regular native function in your language of choice.

Rewriting the User-Defined Functions Layer in Rust

We currently rely on Wasmtime’s C++ bindings to expose a WASM runtime for user-defined functions to run on. These C++ bindings have certain limitations though. Specifically, they lack support for asynchronous operations, which is present in Wasmtime’s original Rust implementation.

The choice is abundantly clear — let’s rewrite it in Rust! Our precise plan is to move the entire user-defined functions layer to Rust, where we can fully utilize Wasmtime’s potential. With such an implementation, we’ll be able to run user-defined functions asynchronously, with strict latency guarantees; we’ll only provide a thin compatibility layer between our Seastar and Rust’s async model to enable polling Rust futures directly from ScyllaDB. The rough idea for binding Rust futures straight into Seastar is explained here.

We already added Rust support to our build system. The next step is to start rewriting the user-defined functions engine as a native Rust implementation; then we can compile it right into ScyllaDB.

Join the ScyllaDB Community

WASM support is only one of a huge number of projects we have underway at ScyllaDB. If you want to keep abreast of all that’s happening, there are a few ways to get further plugged in:

ScyllaDB Cloud Updates for April 2022

New free trials and features for ScyllaDB Cloud

Dev teams at industry-leading companies like Disney+ Hotstar, Crypto.com, and Instacart have turned to ScyllaDB Cloud, our fully-managed database-as-a-service (DBaaS). They appreciate its speed and scale – without the hassle and toil of self-management. Given the simple startup and seamless scaling, it’s not surprising that usage of ScyllaDB Cloud has surged 198% year over year.

With ScyllaDB now deployed across hundreds of projects, we power all sorts of data-intensive applications, and we want to make it easier than ever for you to deploy your own applications.

Through our interactions with users and customers, we’ve found developers want to discover and experience ScyllaDB Cloud in two distinct ways. Application developers want to deploy a small instance for the purposes of exercising our developer documentation, code examples, and community, to see if ScyllaDB Cloud meets their needs as builders.

Other users, who tend to be DevOps, want to deploy production clusters to validate our performance benchmarks for their specific use case. They also tend to want to see exactly how ScyllaDB’s management and monitoring capabilities operate in the Cloud, under real-world conditions.

To meet the needs of both groups, we’re launching two free trials: a small developer instance that’s free for 30 days, and a production cluster that’s free for 48 hours. Neither is meant for actual production applications, and they don’t come with our usual SLA commitment and support.

Developer Free Trial: ScyllaDB Cloud running on a t3.micro sandbox instance, free for 30 days. This trial is designed to help you evaluate whether ScyllaDB Cloud is right for your use case, your development style, and your team.

Production Evaluation: Designed to help you evaluate your application running against a ScyllaDB cluster on a high-performance AWS i3.2xlarge instance. This has a few other limitations:

  • No backup or repair
  • Does not support multiple data centers
  • Does not support cluster resize

Given the relatively short duration of the evaluation, it’s a good idea to prepare beforehand. Our solutions architects are standing by to help you to design an appropriate scenario for your testing, so that you can get the results you need within the 48-hour window. Contact us if you need anything to help bring your ideas to fruition.

To access the free trials, register for an account on ScyllaDB Cloud.

Once you’ve logged into your ScyllaDB Cloud account, select “Create a New Cluster” and scroll down to the options for instance types. You’ll see the following:

Selecting your free trial on ScyllaDB Cloud

If you need guidance, we have a brief video that explains how to Get Started with ScyllaDB Cloud.

For more detail on connecting your app to ScyllaDB Cloud, watch our new video, Connect to ScyllaDB Cloud.

A New ScyllaDB Cloud Documentation Portal

To go along with our new free trials, we have also launched a brand new Cloud Documentation Portal. Here you can learn how to get set up, explore the features and benefits of ScyllaDB Cloud, and follow tutorials that show you how to build apps that get the most out of ScyllaDB’s capabilities.

Support for Google Cloud Flexible Storage

We have also added support for flexible storage on Google Cloud instances, to provide better customization and lower costs. Google enables you to attach local SSDs to most machine types available on Compute Engine, so we made it easy for you to do so for your ScyllaDB Cloud instances. You can learn more about this capability on the Google Cloud Site.

Before we added this feature, adding more storage to ScyllaDB Cloud meant adding more instances. Now, users who are running ScyllaDB Cloud on Google Cloud can easily add flexible storage options via ScyllaDB Cloud’s browser-based user interface. All of the options provided adhere to ScyllaDB’s preferred RAM-to-storage ratio of 1:80.

Learn More About ScyllaDB on ScyllaDB University

For those not already familiar with ScyllaDB, it is a wide-column NoSQL database that’s API-compatible with both Apache Cassandra CQL and DynamoDB. While there are many Cassandra-compatible databases in the industry (of which we believe we are the best of breed), we are the first and currently the only company in the industry to offer a DynamoDB-compatible managed database on a public cloud apart from AWS. (Learn more about our Alternator API below.) You can learn more about ScyllaDB Cloud on our free online ScyllaDB University.

Get Started on ScyllaDB Cloud Today!

Already today 82% of our customers run ScyllaDB on public clouds, and we believe this announcement will accelerate ScyllaDB’s adoption within the Google Cloud community. Built with a close-to-the-hardware, shared-nothing design, ScyllaDB Cloud empowers organizations to build and operate real-time applications at global scale — all for a fraction of the cost of other DBaaS options.

ScyllaDB Cloud is now available in 20 geographical regions served by Google Cloud, from key US regions (Virginia, Ohio, California, and Oregon), to locations in Asia, Europe, South America, the Middle East, and Australia. ScyllaDB Cloud will be deployed to the n2-highmem series of servers, known for their fast, locally-attached SSD storage.

REGISTER FOR SCYLLADB CLOUD

Direct I/O Writes: The Path to Storage Wealth

This post is by 2021 P99 CONF speaker Glauber Costa, founder and CEO of ChiselStrike. It first appeared as an article in ITNext. To hear more from Glauber and many more latency-minded engineers, sign up for P99 CONF updates.

GET P99 CONF UPDATES

P99CONF CALL FOR SPEAKERS

I have previously written about how major changes in storage technology are changing conventional knowledge on how to deal with storage I/O. The central thesis of the article was simple: as fast NVMe devices become commonplace, the impact of the software layer gets bigger. Old ideas and APIs, designed for a time in which storage accesses were in the hundreds of milliseconds, should be revisited.

In particular, I investigated the idea that Buffered I/O, where the operating system caches data pages on behalf of the user, should always be better than Direct I/O, where no such caching happens. Once we employ modern APIs, that is simply not the case. As a matter of fact, in the example using the Glommio io_uring asynchronous executor for Rust, Direct I/O reads performed better than Buffered I/O in most cases.

But what about writes? In this article, we’ll take a look at writes, how they differ from reads, and show that, much like credit card debt, Buffered I/O writes only provide the illusion of wealth on cheap money. At some point, you still have to foot the bill. Real wealth, on the other hand, comes from Direct I/O.

How Do Reads and Writes Differ?

The fact that reads and writes differ in their characteristics should surprise no one: that’s a common thing in computer science, and is what is behind most of the trend toward immutable data structures in recent years.

However, there is one open secret about storage devices in particular that is nothing short of mind-blowing:

It is simply not possible to issue atomic writes to a storage device. Or at least not in practice. This stackoverflow article does a good job summarizing the situation, and I also recommend this LWN.net article that talks about changes some Linux Filesystem developers are discussing to ameliorate the situation.

For SSDs the situation is quite hopeless. For NVMe, it is a bit better: there is a provision in the spec for atomic writes, but even if all devices implemented it (which they don’t), there is still a big contingent of devices that software has to run on where this is simply not available.

For this reason, writes to the middle of a file are very rare in applications, and even when they do happen, they tend to come accompanied by a journal, which is sequential in nature.

There are two immediate consequences of this:

  • Append-only data structures vastly dominate storage writes. Most write-optimized modern stores are built on top of LSM trees, and even workloads that use more traditional data structures like B-Trees will have a journal and/or other techniques to make sure data is reliably written.
  • There is usually a memory buffer that is used to accumulate writes before they are passed into the file: this guarantees some level of control over the state of the file when the write happens. If we were to write directly to an mmap’d file, for instance, flushes could come at any time and we simply would have no idea what state the file is in. Although it is true that we can force a maximum time for a sync with specialized system calls like msync, the operating system may have to force a flush due to memory pressure at any point before that.

What this means is that coalescing, which is the usual advantage of buffering, doesn’t apply to writes. For most modern data structures, there is little reason to keep a buffer in memory waiting for the next access: what is sent to the file is likely never touched again, except for future reads, and at that point the calculations in my read article apply. The next write is likely for the next position in the file.

This tips the scale even more in favor of Direct I/O. In anticipation of using the recently written pages in the future, Buffered I/O may use an immense amount of memory in the operating system page cache. And while it is true that this is cached memory, that memory needs to be written to the device first before it can be discarded. If the device is not fast enough, we can easily run out of memory. This is a problem I have written about in the past.

Since we can write an entire terabyte-large file while keeping only a couple of kilobytes in memory, Direct I/O is the undisputed way to write to files.
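
To make this concrete, here is a minimal, Linux-only sketch of a Direct I/O write in Python. It is not how Glommio does it (Glommio is Rust and asynchronous); it just shows the mechanical requirements: the O_DIRECT flag, a buffer/offset/length aligned to the device block size (4096 bytes is assumed here), and an explicit fdatasync to flush the device cache, a point we will come back to below.

```python
# Minimal Linux-only Direct I/O write sketch. Assumes a 4096-byte logical
# block size; real code should query the device instead of hard-coding it.
import mmap
import os

BLOCK = 4096
path = "datafile.bin"   # hypothetical; must live on a filesystem that supports O_DIRECT

# mmap hands back page-aligned memory, satisfying O_DIRECT's alignment rules.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"x" * BLOCK)

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
try:
    os.write(fd, buf)    # bypasses the page cache...
    os.fdatasync(fd)     # ...but the device cache still needs an explicit flush
finally:
    os.close(fd)
```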

But How Much Does Direct I/O Cost?

Much like reads, you need to make sure you are measuring the right thing to realize the advantage of Direct I/O. And how to do that is far from obvious.

Recently, one of our users opened an issue on our GitHub page, noting that despite what we advertise, Direct I/O writes consumed a lot more CPU than buffered writes. So why is that?

The reason is: Buffered writes are like a loan: you can get your asset for cheap now, but you then have to pay it back in the future, with interest. When you issue a Direct I/O write, you are paying most of the costs related to the transaction right away, and in the CPU that dispatched the I/O — which is predictable. The situation is different for Buffered I/O: the only cost to be paid immediately are the very cheap memory writes.

The actual work to make the data persistent is done in kernel threads. Those kernel threads are free to run on other CPUs, so in a simple system that is far from its saturation point, this can give the user the illusion of cheaper access.

Much like a loan, there are certainly cases in which this can work in your favor. However, in practice, that will happen at an unpredictable, and potentially inconvenient, time in the future.

Aside from this unpredictability, in order to make the right decision one needs to be at least aware of the fact that the total cost of the loan may be higher. More often than not, at or close to saturation all your CPUs are busy, in which case the total cost is what matters.

If we use the time command to measure the Direct I/O vs Buffered version of the same code provided by the user, and focus on system and user times, we have:

Direct I/O:

user 0m7.401s
sys 0m7.118s

And Buffered I/O:

user 0m3.771s
sys 0m11.102s

So there we have it: all that the Buffered I/O version did was switch user time to system time. And because that system time is consumed by kernel threads, which may be harder to see, we can get the illusion that buffered writes are consuming less CPU.

But if we sum up user and system times, we can clearly see that in reality we’re eventually paying interest on our loan: Buffered writes used 1.7% more CPU than Direct I/O writes. This is actually not very far from current monthly interest rates on my credit card. Whether this is a shocking coincidence or a big conspiracy is up to you, the reader, to decide.

But Which is Faster?

Many users would be happy to pay some percentage of CPU time to get faster results. But if we look at the real time in the examples above, Direct I/O is not only cheaper, but faster.

You will notice in the example code that the user correctly issued a call to close. By default, Glommio’s stream close implies a sync. But not only can that be disabled, most of the time in other languages and frameworks this is not the case. In particular, for POSIX, close does not imply a sync.

What that means is that even after you write all your buffers, and close your file, your data may still not safely be present in the device’s media! What can be surprising, however, is that data is not safely stored even if you are using Direct I/O! This is because Direct I/O writes the data immediately to the device, but storage devices have their own internal caches. And in the event of a power loss data can still be lost if those caches are not persisted.

At this point it is fair to ask: if a sync is necessary for both buffered writes and Direct I/O, is there really an advantage to Direct I/O? To investigate that behavior we can use Glommio’s example storage benchmark.

At first, we will write a file that is smaller than memory and not issue a sync. It is easy to have the impression that Buffered I/O is faster. If we write a 4GiB file in a server with 64GiB of DRAM, we see the following:

Buffered I/O: Wrote 4.29 GB in 1.9s, 2.25 GB/s
Direct I/O: Wrote 4.29 GB in 4.4s, 968.72 MB/s

Buffered I/O is more than twice as fast! That is because the file is so small compared to the size of memory that it can just sit in the page cache the whole time. However, at this point your data is not safely committed to storage at all. If we account for the time-to-safety until our call to sync returns, the setup costs, lack of parallelism, mapping, and other costs discussed when analyzing reads start to show:

Buffered I/O: Wrote 4.29 GB in 1.9s, 2.25 GB/s
Buffered I/O: Closed in 4.7s, Amortized total 642.54 MB/s
Direct I/O: Wrote 4.29 GB in 4.4s, 968.72 MB/s
Direct I/O: Closed in 34.9ms, Amortized total 961.14 MB/s

As we can see, Buffered I/O loans provided us with the illusion of wealth. Once we had to pay the bill, Direct I/O is faster, and we are richer. Syncing a Direct I/O file is not free, as previously noted, but 35ms later we can predictably guarantee the data is safely stored. Compare that to the more than 4s for Buffered I/O.

Things start to change as the file gets bigger. That is because there is more pressure on the operating system’s virtual memory. As the file grows in size, the operating system is no longer able to afford the luxury of waiting until the end to issue a flush. If we now write a 16 GiB, a 32 GiB, and a 64 GiB file, we see that even the illusory difference between Buffered and Direct I/O starts to fade away:

Buffered I/O: Wrote 17.18 GB in 10.4s, 1.64 GB/s
Buffered I/O: Closed in 11.8s, Amortized total 769.58 MB/s
Buffered I/O: Wrote 34.36 GB in 29.9s, 1.15 GB/s
Buffered I/O: Closed in 12.2s, Amortized total 814.85 MB/s
Buffered I/O: Wrote 68.72 GB in 69.4s, 989.7 MB/s
Buffered I/O: Closed in 12.3s, Amortized total 840.59 MB/s

In all the cases above Direct I/O kept writing at around 960MB/s, which is the maximum throughput of this particular device.

Once the file gets bigger than memory, then there is no more pretending: Direct I/O is just faster, from whichever angle we look at it.

Buffered I/O: Wrote 107.37 GB in 113.3s, 947.17 MB/s
Buffered I/O: Closed in 12.2s, Amortized total 855.03 MB/s
Direct I/O: Wrote 107.37 GB in 112.1s, 957.26 MB/s
Direct I/O: Closed in 43.5ms, Amortized total 956.89 MB/s

Conclusion

Having access to credit is not bad. It is, many times, crucial for building wealth. However, we need to pay attention to total costs and make sure the interest rates are reasonable, so that we are building real, not illusory, wealth.

When writing to files on modern storage, the same applies. We can write them for cheap at first, but we are bound to pay the real cost — with interest — later. Whether or not that is a good thing is certainly situational. But with high interest rates and a potential for memory spiraling out of control if you write faster than what the device can chew, Buffered I/O can easily become subprime. Direct I/O, with its fixed memory usage, and cheaper CPU costs, is AAA.

I hope this article will empower you to make better choices so you can build real storage wealth.

WATCH GLAUBER’S P99 CONF SESSION: RUST IS SAFE, BUT IS IT FAST

What’s Next for ScyllaDB and NoSQL? Q&A with CEO Dor Laor

ScyllaDB CEO Dor Laor recently connected with veteran tech journalist George Anadiotis to follow up on all the announcements and innovations presented at ScyllaDB Summit 2022 (now available on-demand). George shared his analysis in a great article on ZDNet: Data is going to the cloud in real-time, and so is ScyllaDB 5.0.

George also captured the complete conversation in his podcast series, Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis, which is now in its third season.

Here’s the ScyllaDB podcast episode in its entirety:

If you prefer to read rather than listen, see our previous blog for a transcript covering what ScyllaDB has been up to since Dor and George last connected in 2020, and see below for insights into what’s on the horizon for ScyllaDB, NoSQL, and distributed databases.

READ PART 1


The interview has been edited for brevity and clarity.


The Growth of Database-as-a-Service (ScyllaDB Cloud)

George Anadiotis: So let’s shift to the business side of things. Are there any metrics you can share on year-over-year growth, use cases, that sort of thing?

Dor Laor: Right now the most growth we have is around our database as a service (DBaaS), ScyllaDB Cloud, which grew 200% last year after growing 200% in 2020. We’ve had amazing growth of that service, and it will continue to grow fast. This year, we predict 140% growth of the database-as-a-service. It will become half of our business and continue to expand.

Just in terms of use cases, people like to consume services in general. It’s hard to find talent to run a distributed database. It’s also very expensive. In general, the vendors who know the best practices and maintain their own automation for it will bring you a better result.

We’ve been expanding our AWS DBaaS offerings with options to run as a service in our account and in the customer’s own account. We also expanded to GCP and officially made it GA (Generally Available) in 2021. We’ll be on the Google Cloud marketplace quite soon too. We’ll also get to Azure as well.

Evolving Database Use Cases

Dor Laor: Going back to your use case question, we have a use case agnostic implementation. We have a specific edge with time series data, but for most users, if you are telco or streaming or e-commerce, there are many similarities from the database perspective – and that’s not just with ScyllaDB.

Across industries, we see an explosion of data that’s pushing databases to their limits. This is really good for us because this is where we shine. We are seeing people come to us from various databases: relational and a variety of NoSQL, not just Cassandra or DynamoDB.

There are really cool use cases that are super exciting for us – for example, delivery services like Instacart. I’m an Instacart user, and it’s really nice to serve the brand that serves you. And there are many other delivery services around the world, from food delivery to regular deliveries. It’s also nice to see different crypto use cases. With crypto, there is a blockchain solution where a database is not needed, but a database is still used for fraud detection. That’s a really common use case for ScyllaDB in various industries: blockchain, crypto, deliveries, and many other companies that offer any type of service. The prevalence of fraud is one downside of the spread of technology, but we can assist.

We also see a lot of NFT use cases with ScyllaDB, as well as a lot of mobile apps and social media use cases. Some are new so I don’t want to call them out, but they are companies you read about in the news. There are some, like Discord, that everybody knows, including my son. There are others that are super well known in other regions, like India, where we have really good adoption.

Greenfield vs Brownfield Database Adoption

George Anadiotis: You said that well, people are coming to you from all directions, some obviously from Cassandra and from DynamoDB, and then some from relational or other NoSQL databases. Could you say how many of your use cases are greenfield versus brownfield?

Dor Laor: Today, the majority of use cases still come from brownfield. Over time, we see that the greenfield portion is growing. As ScyllaDB becomes more well-known, people are choosing it from the very beginning if they expect to have a data-intensive application (versus starting with something else and switching to ScyllaDB when they hit a wall and are hurting their business). Since there are so many databases in the world and so many companies hit the wall with scale, we continue to see a good amount of conversions.

Another example from ScyllaDB Summit 2022 is Amdocs, an Israeli vendor that provides software for many, many telcos. They decided to adopt ScyllaDB and they considered three other databases: one of them was Cassandra, and two are ones that they didn’t disclose. At least one of those two was relational. They shared a matrix of their different requirements from availability to performance and community size, et cetera, et cetera. Each database they evaluated had advantages here and there, but only one database managed to meet their performance requirements and managed to finish their tests on time. Two databases couldn’t perform at all and Cassandra could perform, but was extremely far from their performance requirements and TCO requirements. That’s our promise, affordable performance at scale. So they selected ScyllaDB.

The ScyllaDB Roadmap: Immediate Consistency and Serverless ScyllaDB Cloud

George Anadiotis: Could you share more about what’s on your roadmap (you already hinted about that earlier) and how you see the overall database and streaming landscape evolving?

Dor Laor: From a roadmap perspective, we’re doing multiple things. We’ll continue to insert the Raft consistency protocol into all aspects of our core database, and it will improve operational aspects, elasticity, and end-user convenience to have immediate consistency at no cost.

And we’re doing a big effort now around our database-as-a-service. Today, our database-as-a-service is automated in a way such that every tenant is a single tenant. Basically, as a tenant, you come, you decide what your workload is, then we map it into some type of a cluster size, let’s say six servers of a given instance type and size. And then you start from there, maybe you grow the cluster or shrink it over time. Having a single tenant workload has lots of advantages. You’re on your own, it’s very clear what you’re getting in terms of cost and performance. And you’re also not influenced by security or by “noisy neighbors.”

However, there are a lot of advantages in a serverless type of consumption, where we use multi-tenancy and we converge multiple cases together, which allows us to separate compute and storage. That gives more flexibility, better pricing, and also faster elasticity. There are actually servers under the serverless implementation — not a big surprise — and those larger servers are already pre-provisioned, so elasticity can be really immediate. So, there can be lots of benefits for the end user. We’re moving in that direction, and we chose to do it with Kubernetes. We already have a Kubernetes operator, and we’re using it and eating our own dog food within our ScyllaDB Cloud implementation. This is quite a gigantic change for us, lots of work, lots of interesting details, and eventually another really good option for end users.

So these are the two main roadmap items. We also have a really great driver project with Rust, and we managed to wrap Rust for other languages, even C++. The C++ implementation will be faster with Rust under the hood because that Rust implementation is so good.

We’re doing more things too, but these are the two big projects: the consensus protocol with consistency and the multitenant serverless implementation.

Watch Dor’s keynote from ScyllaDB Summit


The Evolving Database Landscape

George Anadiotis: Okay. And what about your overall projection of where you see the database landscape in a couple of years?

Dor Laor: I don’t necessarily have a big headline for you. The demand for database-as-a-service is currently extremely strong. Cloud also provides more options. It’s not just having the same compute environment on demand. There is other infrastructure that a vendor like us can use, so things can become more sophisticated and a database vendor can provide more services, not just the database. I’m not saying that we will do that: there is so much depth in a database that you can do that all day long. But, for example, it does allow more options like tiered storage, where some of the data doesn’t necessarily need to be kept online all the time with the same SLA as the hotter data. People do want to keep more and more data over time. That, for example, will be an important trend in the future.

Also, more options to integrate real time in analytical projects. It’s possible to go and offer more and more analytical capabilities and even have the ability to integrate with other analytical vendors – not necessarily do it all within the same vendor. So there are many possibilities.

It is exciting. And as the amount of data grows, the hardware offerings by the cloud providers also improve dramatically. Also, all of the new usages, like the metaverse: it’s just crazy. Everything will be digitized. Databases will continue to evolve. It’s a good time to be working on these workloads.

George Anadiotis: Yes, more things going digital means more data to be managed, means more demand for database vendors. So, as you said, it’s a good time to be in that business. Good luck with all your plans going forward, and thanks for the update and the conversation. Pleasure as always.

Read George’s latest ZDNet article on ScyllaDB

 

What’s New with ScyllaDB? Q&A with CEO Dor Laor

After all the announcements and innovations presented at the recent ScyllaDB Summit (now available on-demand), veteran tech journalist George Anadiotis caught up with ScyllaDB CEO Dor Laor. George shared his analysis in a great article on ZDNet: Data is going to the cloud in real-time, and so is ScyllaDB 5.0.

George also captured the complete conversation in his podcast series, Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis, which is now in its third season.

Here’s the ScyllaDB podcast episode in its entirety:

And for those of you who prefer to read rather than listen, here’s a transcript of the first part of the conversation. The focus here is on what ScyllaDB has been up to since Dor and George last connected in 2020. We’ll cover the second part, focused on what’s next for ScyllaDB and NoSQL, in a later blog.

Update: Part 2 is now published. 

Read Part 2

The interview has been edited for brevity and clarity.

 

ScyllaDB’s Roots

George Anadiotis: Could you start by saying a few words about yourself and ScyllaDB?

Dor Laor: I’m the CEO and co-founder at ScyllaDB, a NoSQL database startup. I’m now a CEO, but I’m still involved with the technology. Engineering is in my roots. My first job in the industry was to develop a terabit router, which was a super exciting project at the time. I met my existing co-founder, Avi Kivity, at another startup called Qumranet. That startup pivoted several times. The second pivot was around the open source Xen hypervisor, and then we switched to another solution that Avi invented, the KVM hypervisor. I believe that many are familiar with KVM. It powers Google Cloud and AWS…it was really a fantastic project. After that, Avi and I co-founded ScyllaDB.

There are a lot of parallels between that hypervisor startup and this database startup. With both, we had the luxury of starting after existing solutions were established, and we took an open source first approach.

We stumbled across Apache Cassandra many years ago in 2014, and we saw many things that we liked in the design. Cassandra is modeled after Amazon’s Dynamo and Google’s Bigtable. But, we recognized that there was a big potential for improvement: Cassandra is written in Java and it’s not utilizing many modern optimizations that you can apply in operating systems. We wanted to apply our KVM experience to the database domain, so we decided to rewrite Cassandra from scratch. That was late 2014, and this is what we’ve been doing since. Over time, we completed implementing the Cassandra API. We also added the DynamoDB API, which we call Alternator. And we also added many additional features on top of that.

George Anadiotis: Great, thanks for the recap and introduction. As a side comment, I think your effort was the first one that I was aware of in which someone set out to reimplement an existing product, maintaining API compatibility and trying to optimize the implementation. After that, especially lately, I’ve seen others walking down this route as well. Apparently, it’s becoming somewhat of a trend, and I totally see the reasoning behind that.

What ScyllaDB Has Been Working on Over the Past Two Years

George Anadiotis: The last time we spoke was about two years ago, around the time that you released ScyllaDB 4.0. Now, I think you’re about to release version 5.0. Could you tell us what you’ve been up to over those two years, on the technical side and also on the business side?

Dor Laor: Let’s start with the technical side, and we can also mix it up because many times, the technical side is a response to the business problem.

Two years ago we launched ScyllaDB 4.0, which achieved full API compatibility with Cassandra, including the lightweight transaction API, which used to have Paxos under the hood. We also completed the Jepsen test that certified that API. We were glad to have all of these abilities and we’re proud to be a better solution across the board, both with API compatibility and with performance improvements and operational improvements. Also, we introduced unique Change Data Capture on top of the Cassandra capabilities and we released the first version of Alternator, the DynamoDB API.

Ever since, we’ve continued to develop in multiple areas. We’ve continued to improve all aspects of the database. A database is such a complex product. After years of work, we’re still amazed by the complexity of it, the wide variety of use cases, and what’s going on in production.

IO Scheduling

Dor Laor: In terms of performance, we improved things like IO scheduling. We have a unique IO scheduler. We guarantee low latency for queries on one hand, but on the other hand, we have gigantic workloads generated from streaming, etc., so lots of data needs to be shuffled around. It’s very intense and that can hurt latency. That’s why we have an IO scheduler. We’ve been working on this IO scheduler for the past six years, and we’re continuing to optimize it.

Over time, we’ve made it increasingly complex. It matches underlying hardware more and more. That scheduler, for example, controls for every shard, every CPU core – we have a unique design of a shard per core architecture. Every shard and every CPU core is independent, and it’s doing its own IO for networking and for storage. That’s why every shard needs to be isolated from the other shards. If there is pressure on one shard, the other shards shouldn’t feel it.

Disks are built in such a way that if you do only writes, then you get one level of performance. If you do only reads, you’ll get a similar level of performance, but not identical. Several years back, we had a special cap for writes and for reads in the IO scheduler. But if you do mixed IO, then it’s not a simple function of mixing those two with the same proportion. It can be a greater hit with mixed IO.

We just released the new IO scheduler with better control over mixed IO – and most workloads have a certain amount of mixed IO. This greatly improves performance, improves the latency of our workloads and reduces the level of compactions, which is important for data stored in a log-structured merge tree. It’s also better with repair and streaming operations.

IO scheduler details

Large Partitions

Dor Laor: Another major performance improvement is that we improved large partitions. In ScyllaDB, Cassandra, and other wide column stores, a partition is divided into many cells, many columns that can be indexed. But, it can reach millions of cells, or even more, in a single partition. It can be tricky for the database, and for the end user, to control these big partitions. So, we improved the indexing around these partitions and we cached those indexes.

We already had indexes; now, we added caching of those indexes. We basically solved the problem of large partitions. Cassandra had this problem; it’s half-solved in Cassandra, but it can still be a challenge. In ScyllaDB, it was half-solved as well. We have users with a hundred gigabyte partition (a single partition). We knew about those users because they reported problems. Now with the new solution, even a hundred gigabyte partition will just work, so all of the operations will be smoother.

Large partition support details

Operational Improvements with Repair Based Node Operations

Dor Laor: In addition to those performance improvements, we’ve also been working on plenty of operational improvements. The major one is what we call repair based node operations. So node operations are when you add nodes, you decommission nodes, you replace nodes. All of these operations need to stream data back and forth from the other replicas, so it’s really heavyweight. And after those operations, you also need to run repair, which basically compares the hashes in the source and the other replicas to see that the data matches.

The simplification that we added is called repair based node operations. So we’re doing repair, and repair fixes all of the differences and takes care of the streaming too. The first advantage is that there’s only one operation. It’s not streaming and repair, there’s just one repair. This means simplification and elimination of more actions. And the second advantage, which is even bigger, is that a repair is stateful and can be restarted. If you’re working with really large nodes, say 30-terabyte nodes, and something happened to a node in the middle or you just need to reboot or whatever, then you can just continue from the previous state and not restart the whole operation and spend two hours again for nothing.

Repair based node operations details

Consensus Algorithms and Consistency

Dor Laor: Another major improvement is around consistency: our shift from being an eventually consistent database to an immediately consistent database.

Agreement between nodes, or consensus, in a distributed system is complicated but desirable. ScyllaDB gained Lightweight Transactions (LWT) through Paxos but this protocol has a cost of 3X round trips. Raft allows us to execute consistent transactions without a performance penalty. Unlike LWT, we’re integrating Raft with most aspects of ScyllaDB, making a leap forward in manageability and consistency.

Now, the first user-visible value is in the form of transactional schema changes. Before, ScyllaDB tracked its schema changes using gossip and automatically consolidated schema differences. However, there was no way to heal a conflicting Data Definition Language (DDL) change. Having transaction schema changes eliminates schema conflicts and allows full automation of DDL changes under any condition.

Next is making topology changes transactional using Raft. Currently, ScyllaDB and Cassandra can scale only one node at a time. ScyllaDB can utilize all of the available machine resources and stream data at 10GB/s, thus new nodes can be added quite quickly. However, it can take a long time to double or triple the whole cluster capacity. That’s obviously not the elasticity you’d expect. Transactional node range ownership will allow many levels of freedom, and we plan on improving more aspects of range movements towards tablets and dynamic range splitting for load balancing.

Beyond crucial operational advantages, end users will be able to use the new Raft protocol, gaining strong transaction consistency with zero performance penalty.

We’ve been working on this for quite some time, and we have more to come. We covered Raft in great length at ScyllaDB Summit – I encourage you to go and watch those presentations with much better speakers than myself. 🙂

Making Schema Changes Safe with Raft

The Future of Consensus in ScyllaDB 5.0 and Beyond

Change Data Capture and Event Streaming

George Anadiotis: If there was any doubt when you mentioned initially that you still like to be technical and hands-on, I think that that goes to prove it. Let me try and take you a little bit more to things that may catch the attention of less technically inclined people.

One important thing is Change Data Capture (CDC). I saw that you seem to have many partners that are actively involved in streaming: you work with Kafka, Redpanda, Pulsar, and I think a number of consultancies as well. The use case from Palo Alto Networks seemed very interesting to me because they basically said, “We’re using ScyllaDB for our streaming needs, for our messaging needs instead of a streaming platform.” I was wondering if you could say a few words on the CDC feature, how it has evolved, and how it’s being used in use cases.

Dor Laor: CDC, Change Data Capture, is a wonderful feature that allows users to collect changes to their data in a relatively simple manner. You can figure out what was written recently (for example, what is the highest score in the most recent hour) or consume those changes in a report without traversing through the entire data set. Change data capture is implemented in a really nice, novel way where you can enable change capture on a table and all of the changes within a certain period will be written to a new table that will have just these changes. The table will TTL itself; it erases itself after the period expires. We have client libraries that know how to consume this in several languages, which we developed over time. This is a really simple way to connect to Kafka, Redpanda, and others. Our DynamoDB API also implements the DynamoDB streaming in a similar way on top of our streaming solution.

You mentioned Palo Alto Networks. I’m not all that familiar with their details, but I think that they’re not using change data capture because they know more about the pattern of the data that they use. They wanted to eliminate streaming due to cost and complexity. They manage their solution on their own, and they have an extreme scale: they have thousands of clusters, and each cluster has its own database and needs its own servers… so it’s expensive. You can understand why they want to eliminate so many additional streaming clusters.

They decided to use ScyllaDB directly, and they shared how they do it in their ScyllaDB Summit presentation. I believe that their pattern is such that instead of having ScyllaDB automatically create a change data capture table, they expect changes within a certain time limit, just recently, and they decided that they can just query those time changes on their own. I don’t know enough about the details to say what’s better, using off-the-shelf CDC or implementing this on your own. In their case, it’s probably possible to do it either way. Maybe they made the best decision to do it directly. If you know your data pattern, that’s the best. Usually, CDC will be implemented for users who don’t know what was written to the database. That’s why they need change data capture to figure out what happened across an entire half a petabyte data set.

George Anadiotis: Or potentially in cases where there is really no pattern in terms of how the data is coming in. It’s irregular, so CDC can help trigger updates in that scenario.

Dor Laor: Exactly. Otherwise, it’s really impossible other than to do the full scan. And even if you do a full scan, CDC allows you to know what was the previous value of the data – so it’s even more helpful than a regular full scan.

Database Benchmarking

George Anadiotis: All right. Another topic that caught my eye from the ScyllaDB Summit agenda was around benchmarking. There was one presentation where the presenter was detailing the type of benchmarking that you do, how you did it, and the results. Could you summarize how the progress that you have outlined on the technical front has translated to the performance gains and what was the process of verifying that?

Dor Laor: So benchmarking is hard and we’re sometimes doing more than one flavor of benchmarking. We have an open source project that automates things. We need to use multiple clients when we hit the database and it’s important to compute the latency correctly. That tool automatically builds a histogram for every client combined in the right way.

We did several benchmarks over the past two years. At ScyllaDB Summit, we presented the process and results of benchmarking our database at the petabyte level. We also presented a benchmark that compared the i3, Intel’s x86 solution, with the Arm instances by AWS to figure out what’s more cost effective. Also, AWS has another instance family based on newer x86 machines: the i4is. The bottom line is that the new i4is are more than twice as performant as the i3s, so that’s a huge boost.

AWS benchmarking details

The second part of this blog series, covering ScyllaDB’s roadmap and Dor’s insights on what’s next for distributed databases, is now available.

Read Part 2

Getting Started with ScyllaDB Cloud Using Node.js Part 2: CRUD operations

In Part 1, we saw how to create a ScyllaDB cluster in the cloud and connect to the database using CQLSH and ScyllaDB Drivers. In this article, we will explore how to create, read, update and delete data using NodeJS and ScyllaDB Cloud.

You can also find the video series associated with this article on YouTube.

Create

Let’s now implement our callback function in the post middleware. Replace the router.post middleware with the following code:

router.post('/', async (req, res, next) => {
   try {
      const { name } = req.body;
      const itemId = cassandra.types.Uuid.random();
      const query = 'INSERT INTO items(id, name, completed) VALUES (?, ?, ?)';
      await cluster.execute(query, [itemId, name, false]);
      res.status(200).send({ itemId });
   } catch (err) {
      next(err);
   }
});

Let’s go over the above code. First, we destructure the name field from the request’s body. Second, we generate a random itemId that we can return after running the INSERT query.

Third, we create a query constant that holds the CQL statement to be executed against our database. The values are replaced with question mark placeholders (“?”); the actual values are passed as an array to the execute function.

Finally, we return the itemId.

Let’s test our code using Postman. We will send a POST request to the http://localhost:3001/api/items endpoint. In the Body tab, select raw and JSON, and add a name field containing the name of the task.

You should receive status 200 and a random itemId, as illustrated below.
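The exact value will differ on every request since the UUID is generated randomly; the response body should look something like this (the UUID here is just a made-up example):

{
   "itemId": "5b38d1f8-3c2a-4f0e-9c1d-2a7e4b6f8d21"
}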

Let’s now implement the code to create items from our front-end.

In the client directory, we will first install a package named axios that will allow us to send HTTP requests to our server.

npm i axios

In the components/TodoList.js file, import axios, then locate the onItemCreate function and add the following code inside the useCallback hook:

import axios from 'axios';

const BASE_URL = 'http://localhost:3001/api/items';

const TodoList = () => {
   const [items, setItems] = useState([]);

   const onItemCreate = useCallback(
      async (newItem) => {
         // POST the new item to the items endpoint exposed by the Express server
         const res = await axios.post(BASE_URL, newItem);
         // The server responds with { itemId }; axios exposes the body on res.data
         setItems([...items, { ...newItem, id: res.data.itemId }]);
      },
      [items]
   );
//...

In your browser, add a new item and click Send.

In Postman, send a GET request to http://localhost:3001/api/items to see the list of all items. You should see every item you created.

Read

Now that we have successfully created an item and saved it to our database, let’s create the GET middleware that will allow us to read from our database and display all the items in the browser.

In the items.js file, locate the router.get middleware and replace it with the following code:

// Get all items
router.get('/', async (_, res, next) => {
   try {
      const query = 'SELECT * FROM items';
      const result = await cluster.execute(query);
      res.json(result.rows);
   } catch (err) {
      next(err);
   }
});

In the code above, we have created a query constant to select all items from the items table. We then execute the query and save the result. Finally, we send the items back using result.rows.
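For reference, the endpoint responds with a JSON array of rows. With the “my first task” item created through cqlsh in Part 1, plus anything you added through Postman, the response looks roughly like this:

[
   {
      "id": "0536170a-c677-4f11-879f-3a246e9b032d",
      "name": "my first task",
      "completed": false
   }
]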

Let’s now display all the items in the browser. In the TodoList.js component, add the following code:

import React, { useState, useCallback, useEffect } from 'react';
// ...
const TodoList = () => {
   // ...
   useEffect(() => {
      // Fetch all items from the API once, when the component first mounts
      const getItems = async () => {
         const res = await axios.get(BASE_URL);
         setItems(res.data);
      };
      getItems();
   }, []);
   // ...
};

Save the file. You should expect all your items to be displayed in the browser.

So far, we implemented the post and get middleware to create a new item and read from the database. Next, we will implement the update and delete middlewares to complete the CRUD operations.

Update

Great! So far we have created items and read them from the database. We will now rewrite the router.put middleware to update items.

In the server project, locate the items.js file and update the put middleware with the following:

// Update item
router.put('/:id', async (req, res, next) => {
   try {
      const { id } = req.params;
      const { completed } = req.body;
      const query = 'UPDATE items SET completed=? WHERE id=?';
      await cluster.execute(query, [completed, id]);
      res.status(200).send();
   } catch (err) {
      next(err);
   }
});

Let’s explain the above code. We are sending the itemId as a parameter in the URL. We are likewise sending the completed field in the body. Since we only need to update the completed field, we do not send the entire item object.

Just like before in Create and Read, we create a query constant that uses a CQL statement to update the items table. The question marks (“?”) in the CQL statement represent the parameters that are passed as an array to the execute function. Finally, we send status 200 once the query has executed successfully.
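As a side note (not required for this tutorial), the cassandra-driver can also prepare statements for you: passing { prepare: true } as a third argument to execute lets the driver cache the prepared statement and serialize the bound parameters with the correct CQL types. A minimal variation of the update call might look like this:

// Hypothetical variation of the same update, using a prepared statement
const query = 'UPDATE items SET completed=? WHERE id=?';
await cluster.execute(query, [completed, id], { prepare: true });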

In the client application, in the TodoList.js component, locate the onItemUpdate function and replace it with the following code:

const onItemUpdate = useCallback(
   async (item) => {
      // Only the completed field needs to be sent to the server
      await axios.put(`${BASE_URL}/${item.id}`, { completed: item.completed });
      const index = items.findIndex((i) => i.id === item.id);
      setItems([...items.slice(0, index), item, ...items.slice(index + 1)]);
   },
   [items]
);

We’re using axios to send a PUT request with the item ID as parameter and the completed object in the request body.

We can test the above by using Postman to send a GET request to the API and checking that the item’s completed field was updated.

Delete

Finally, we can implement the delete middleware in our backend, then handle the request in the frontend.

// Delete item
router.delete('/:id', async (req, res, next) => {
   try {
      const { id } = req.params;
      const query = 'DELETE FROM items WHERE id=?';
      await cluster.execute(query, [id]);
      res.status(200).send();
   } catch (err) {
      next(err);
   }
});

Just like in the update callback function, we use the item ID as a parameter and a WHERE clause in the CQL query to locate the item we want to delete.

In the TodoList component in the client project, locate the onItemDelete function and replace it with the following:

const onItemDelete = useCallback(
    async (item) => {
        await axios.delete(`${BASE_URL}/${item.id}`);
        const index = items.findIndex((i) => i.id === item.id);
        setItems([...items.slice(0, index), ...items.slice(index + 1)]);
    },
    [items]
);

We use axios.delete to send a DELETE request to the API, passing item.id as a parameter in the URL. Once the request succeeds, we update the frontend and remove the item from the items state.

Congratulations! If you followed along, you’ve come a long way! In this series, we created a Todo application that can Create, Read, Update and Delete items stored in ScyllaDB Cloud, using NodeJS, Express, and the cassandra-driver.

I hope you enjoyed the articles and found them useful to quickly get started with ScyllaDB Cloud. Please feel free to share your feedback with us and let us know what you want to see in the future.

LEARN MORE IN SCYLLADB UNIVERSITY

P99 CONF 2022 Call for Speakers

P99 CONF was born out of the need for developers to create and maintain low-latency, high-performance, highly available applications that can readily, reliably scale to meet their ever-growing data and business demands.

P99 CONF is a cross-industry vendor-neutral free virtual event for engineers and by engineers. We believe there is a growing community of engineers, developers and DevOps practitioners who are each seeking to find or, more so, to create their own innovative solutions to the most intractable big data problems, so we want to bring them all together in a community. That is what P99 CONF is all about.

When we launched P99 CONF last year, these were our principles and beliefs, and the community proved us right. Thousands of professionals around the world attended live, and thousands more were able to watch our sessions on-demand in the weeks and months to follow. In our inaugural event, we had speakers from across the spectrum of the computing industry with talks spanning the following topics:

  • Advanced Programming Language Methods: Rust, C++, Java, Go
  • Advanced Operating System Methods: IO Rings, eBPF, tracing
  • Performance Optimization: measuring latency, minimizing OS noise, tuning, flamegraphs
  • Distributed Databases: ScyllaDB, Redis, Percona
  • Event Streaming Systems: Apache Kafka, Apache Pulsar, Redpanda
  • Distributed Storage Systems: Ceph, object compaction
  • Kubernetes, Unikernels, Hardware Architecture and more

Call For Speakers

For P99 CONF to truly succeed, it must be driven by leaders and stakeholders in the developer community, and we welcome your response to our Call for Speakers.

Are you a developer, architect, DevOps, SRE with a novel approach to high performance and low latency? Perhaps you’re a production-oriented practitioner looking to share your best practices for service level management, or a new approach to system management or monitoring that helps to shave valuable milliseconds off application performance.

BE A SPEAKER

All Things P99

P99 CONF is sort of a self-selecting audience title. If you know you know. If you are asking yourself “Why P99?” or “What is a P99?” this show may not be for you. But for those who do not know but are still curious, P99 is a latency threshold, often expressed in milliseconds (or even microseconds), under which 99% of network transactions complete. The last 1% of transactions represent the long-tail connections that can lead to retries, timeouts, and frustrated users. Given that even a single modern complex web page may be created out of dozens of assets and multiple database calls, hitting a snag due to even one of those long-tail events is far more likely than most people think. This conference is for professionals of all stripes who are looking to bring their latencies down as low as possible for reliable, fast applications of all sorts — consumer apps, enterprise architectures, or the Internet of Things (IoT).
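For the curious, here is a rough sketch of how a p99 can be computed from a batch of request latencies (a simplified illustration with made-up numbers; real monitoring systems typically use streaming histograms rather than sorting raw samples):

// Simplified sketch: compute a percentile from an array of latency samples (in ms)
function percentile(samples, p) {
   const sorted = [...samples].sort((a, b) => a - b);
   const index = Math.ceil((p / 100) * sorted.length) - 1;
   return sorted[Math.max(0, index)];
}

const latencies = [12, 15, 14, 13, 250, 16, 14, 15, 13, 12]; // made-up values
console.log(`p99: ${percentile(latencies, 99)} ms`); // the slow outlier dominates the tail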

P99 CONF is the place where industry leading engineers can present novel approaches for solving complex problems, efficiently, at speed. There’s no other event like this. A conference by engineers for engineers. P99 CONF is for a highly technical developer audience, focused on technology, not products, and will be vendor / tool agnostic. Vendor pitches are not welcome.

Sign Up Today!

Last year, thousands of people registered for our premier event. So even if you’re not up to speaking, we’d love to see you there! If your title includes terms like “IT Architect,” “Software Engineer,” “Developer,” “SRE,” or “DevOps,” this is your tribe. Your boss is not invited.

P99 CONF 2022 will be held free and online October 19-20, 2022. You can sign up today for updates and reminders as the event approaches.

SIGN UP TODAY

Speaker FAQs

1. What type of content are you looking for? Is there a theme?

We’re looking for compelling technical sessions, novel algorithms, exciting optimizations, case studies, and best practices.

P99 CONF is designed to highlight the engineering challenges and creative solutions required for low-latency, high performance, high availability distributed computing applications. We’d like to share your expertise with a highly technical audience of industry professionals. Here are the categories we are looking for talks in:

  • Development — Techniques in programming languages and operating systems
  • Architecture — High performance distributed systems, design patterns and frameworks
  • Performance — Capacity planning, benchmarking and performance testing
  • DevOps — Observability & optimization to meet SLAs
  • Use Cases — Low-latency applications in production and lessons learned

Example topics include…

  • Operating systems techniques — eBPF, io_uring, XDP and friends
  • Framework development — Most exciting new enhancements of golang, Rust and JVMs
  • Containerization, virtualization and other animals
  • Databases, streaming platforms, and distributed computing technologies
  • Hardware advances — CPUs, SoCs, network and storage I/O
  • Tools — Observability tools, debugging, tracing and whatever helps your night vision
  • Storage methods, filesystems, block devices, object store
  • Methods and processes — Capacity planning, auto scaling and SLO management

2. Does my presentation have to include p99s, specifically?

No. It’s the name of the conference, not a strict technical requirement.

You are free to look at average latencies, p95s or even p99.9s. Or you might not even be focused on latencies per se, but on handling massive data volumes, throughputs, I/O mechanisms, or other intricacies of high performance, high availability distributed systems.

3. Is previous speaking experience required?

Absolutely not, first time speakers are welcome and encouraged!

Have questions before submitting? Feel free to ask us at community@scylladb.com.

4. If selected, what is the time commitment?

We are looking for 15-20 minute sessions, which will be pre-recorded, plus live Q&A during the event. We expect the total time commitment to be 4-5 hours.

You’ll need to attend a 30 min speaker briefing and a 1 hour session recording appointment, all done virtually. Additionally, we ask that all speakers log onto the virtual conference platform 30 minutes before their session and join for live Q&A on the platform during and following their session. And, of course, all speakers are encouraged to attend the entire online conference to see the other sessions as well.

5. What language should the content be in?

While we hope to expand language support in the future, we ask that all content be in English.

6. What makes for a good submission?

There’s usually not one thing alone that makes for a good submission; it’s a combination of general worthiness of the subject, novelty of approach to a solution, usability/applicability to others, deep technical insights, and/or real-world lessons learned.

Session descriptions should be no more than 250 words long. We’ll look at submissions using these criteria:

Be authentic — Your peers want your personal experiences and examples drawn from real-world scenarios

Be catchy — Give your proposal a simple and straightforward but catchy title

Be interesting — Make sure the subject will be of interest to others; explain why people will want to attend and what they’ll take away from it

Be complete — Include as much detail about the presentation as possible

Don’t be “pitchy” — Keep proposals free of marketing and sales.

Be understandable — While you can certainly cite industry terms, try to write a jargon-free proposal that contains clear value for attendees

Be deliverable — Sessions have a fixed length, and you will not be able to cover everything. The best sessions are concise and focused.

Be specific — Overviews aren’t great in this format; the narrower and more specific your topic is, the deeper you can dive into it, giving the audience more to take home

Be cautious — Live demos sometimes don’t go as planned, so we don’t recommend them

Be rememberable — Leave your audience with take-aways they’ll be able to take back to their own organizations and work. Give them something they’ll remember for a good long time.

Be original — We’ll reject submissions that contain material that is vague, inauthentic, or plagiarized.

Be commercial-free — No vendor pitches will be accepted.

7. If selected, can I get help on my content & what equipment do I need?

Absolutely, our content team is here to help.

We will want to touch base with you 1-2 times to help you with any content questions you may have and offer graphic design assistance. This conference will be a virtual event and we’ll provide you with a mic and camera if needed (and lots of IT help along the way). Need something else? Let us know and we’ll do our best.

BE A SPEAKER

The NoSQL Developer’s Study Guide for ScyllaDB University LIVE

I’m excited about the upcoming ScyllaDB University LIVE event, which is happening next week.

AMERICAS – Tuesday, March 22nd – 9AM-1PM PT | 12PM-4PM ET | 1PM-5PM BRT

EMEA / APAC – Wednesday, March 23rd – 8:00-12:00 UTC | 9AM-1PM CET | 1:30PM-5:30PM IST

REGISTER FOR SCYLLA UNIVERSITY LIVE

To help you prepare and get the most out of it, this guide tells you more about what to expect from each session and recommends lessons you can take in advance on ScyllaDB University.

As I mentioned in my previous blog post, the ScyllaDB University LIVE event will be online and instructor-led. It will include two parallel tracks – one for beginners and one for more experienced participants. You can bounce back and forth between tracks or drop in for the sessions that most interest you. However, please be aware that there won’t be an on-demand equivalent.

After the two tracks, we will host a roundtable where you’ll have a chance to ask some of our leading experts and engineers questions. It will also be an opportunity to network with your fellow database monsters.

As you’ll see below, the topics we’ll cover are a mix of advanced topics and topics for people just getting started with NoSQL and ScyllaDB. Even if you previously attended other ScyllaDB University LIVE events, you’ll learn something new since we’re covering some topics we’ve never addressed before.

The labs that you will see in the event (as well as other hands-on labs from ScyllaDB University) can be run on ScyllaDB Cloud, our managed database as a service offering. You can create a ScyllaDB cluster and connect to it with just a few clicks.

If you have already registered for the event, you can take advantage of our ScyllaDB Cloud – Sandbox cluster Free Trial for one month!

Agenda and Recommended Lessons: NoSQL Data Modeling, Database Drivers, and More

The sessions are split across two tracks, an Essentials Track and an Advanced Track:
ScyllaDB Essentials

Covers an introduction to NoSQL and ScyllaDB, basic concepts, and architecture. You will also get an overview of ScyllaDB terminology, ScyllaDB components, data replication, consistency level, and the write and read paths. It includes two hands-on demos.

Suggested learning material:

Hands-on Migration to ScyllaDB Cloud

You’ll learn how to migrate from your existing database to ScyllaDB Cloud. By the end of this session, you’ll know the different migration options and when you should use each. We’ll cover hands-on examples for migrating from Cassandra and DynamoDB.

Suggested learning material:

ScyllaDB Basics

This session will start with Basic Data Modeling, covering key concepts like the partition key,  clustering key, and collections, how to use them and why they are important. It then discusses the ScyllaDB drivers, how they can be used, and why they deliver better performance compared to non-ScyllaDB Drivers. Finally, the session covers compaction.

Suggested learning material:

Best Practices in Highly Performant Applications

If you have some experience in creating NoSQL applications but want to learn how to do it better, then this session is for you. It starts with the ScyllaDB shard-aware drivers, what they are and why you should use them. Next, it touches on compaction strategies and how to choose the right one, paging, retries, common pitfalls, and how to avoid them.

Suggested learning material:

Build Your First ScyllaDB-Powered App

In this session, we’ll walk through an extensive hands-on example of how to develop your first application from scratch using ScyllaDB Cloud. The Git repository and different steps to run the lab yourself will be available on ScyllaDB University.

Suggested learning material:

Advanced Data Modeling

This talk covers advanced data modeling concepts like Materialized Views, TTL, Counters, Secondary Indexes, Lightweight Transactions. It also includes a hands-on example.

Suggested learning material:

Certificate of Completion

Participants who complete the training will have access to more free, online, self-paced learning material such as our hands-on labs on ScyllaDB University. Additionally, those who complete the training will get a certification and some cool swag!

See you there!

REGISTER FOR SCYLLA UNIVERSITY LIVE

 

Getting Started with ScyllaDB Cloud Using Node.js Part 1

This article is the first of a series that walks you through how to get started building a backend application with ScyllaDB.

In this article, we will review the basics of ScyllaDB, then create and deploy a cluster on AWS using ScyllaDB Cloud.

What’s a CRUD App?

CRUD stands for Create, Read, Update and Delete. In this article, we will build a simple application that will connect to our database and do just that using NodeJS and ScyllaDB Cloud.

Why Should You Care?

JavaScript is and has been the most popular programming language for the past decade. With the evolution of JavaScript compilers, frameworks, and languages such as TypeScript, NodeJS has become one of the most popular choices to build backend applications and REST APIs. ScyllaDB, on the other hand, is one of the hottest NoSQL databases on the market today because of its lightspeed performance.

Whether you’re a junior or an experienced developer, this article will help you understand the building blocks of a NodeJS application and how to connect it to a ScyllaDB cluster running in the cloud.

Getting Started

To start, we will first open the terminal and run the following command to clone the project starter repository:

git clone https://github.com/raoufchebri/getting-started.git

Let’s then change directory:

cd getting-started/nodejs/starter

The project is split into two applications: client and server.

Client

The client application is a simple ReactJS Todo application that allows users to add, mark as complete, and remove tasks.

To run the application, move to the client directory and run the following commands:

npm install
npm start

Once started, the application will run on http://localhost:3000.

The client application is not connected to any server. If you refresh your browser, you will notice all the tasks that you added disappear. In this article, we will create and connect our client app to the server and store the data in ScyllaDB Cloud.

Server

Let’s now move to the server directory and implement the CRUD functions.

The application is structured as follows:

- src
  - api
    - index.js
    - items.js
  - index.js
- .env
- package.json

The package.json file lists all of the project dependencies:

"dependencies": {
    "cassandra-driver": "^4.6.3",
    "cors": "^2.8.5",
    "dotenv": "^16.0.0",
    "express": "^4.17.2",
    "nodemon": "^2.0.15"
}

In the server directory, let’s install all project dependencies by running the following command:

npm install

We will then start the application by running:

npm run dev

By default, the application will run on http://localhost:3001. If for any reason you would like to change the port, you can do so by updating the .env file and re-run the application.

Open http://localhost:3001 on your browser, and you should see the following message:

{
     "message": "Welcome to ScyllaDB 😉"
}

http://localhost:3001/api should return the following object:

{
    "message": "Welcome to API 🚀"
}

Let’s now move to the /api/items.js file and explore the code:

const express = require('express');
const router = express.Router();

// Create one item
router.post('/', async (req, res, next) => {
    try {
        res.json({ message: 'Create one item' });
    } catch (err) {
        next(err);
    }
});

// Get all items
router.get('/', async (_, res, next) => {
    try {
        res.json({ message: 'Get all item' });
    } catch (err) {
        next(err);
    }
});

// Delete one item
router.delete('/:id', async (req, res, next) => {
   try {
      const { id } = req.params;
      res.json({ message: `Delete one item ${id}` });
   } catch (err) {
      next(err);
   }
});

// Update one item
router.put('/:id', async (req, res, next) => {
   try {
      const { id } = req.params;
      res.json({ message: `Update one item ${id}` });
   } catch (err) {
      next(err);
   }
});

module.exports = router;

In items.js, we have defined the post, get, put and delete middlewares with their respective callback functions.

We can test the API by sending a GET request through the following command:

curl -v http://localhost:3001/api/items

You should expect the following result:

*   Trying ::1:3001...
* Connected to localhost (::1) port 3001 (#0)
> GET /api/items HTTP/1.1
> Host: localhost:3001
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Content-Length: 26
< ETag: W/"1a-wtLtpakh/aHZijHym0cXyb81o1k"
< Date: Mon, 14 Feb 2022 10:22:12 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"message":"Get all item"}

The GET request runs the callback function implemented in the router.get middleware in items.js.

Create a ScyllaDB Cloud Cluster

The next step is to create an account on https://cloud.scylladb.com. You can use the free trial account to follow along. The home page shows the running clusters. You will probably have an empty page if you are new to ScyllaDB Cloud. Let’s go ahead and create a new cluster by clicking on the “Create a New Cluster” card.

On the new cluster page, enter the name of your cluster. I will name mine “my scylla cluster”. I will also leave the provider to AWS and use the ScyllaDB Account to deploy the cluster.

If you scroll down a little bit, you will see the list of machines available to you. If you are using a free trial account, you will likely use a t3.micro, which is plenty for our example here. If you want to explore other machines, be aware of the cost per node when you create your cluster. Before deploying your app on ScyllaDB Cloud, you can use the calculator at https://price-calc.gh.scylladb.com/ to get an estimate of the cost associated with a cluster.

Let’s scroll to the bottom of the page and create the cluster. You might take this opportunity to have a small break and get a cup of coffee, as this process usually takes a few minutes.

Great! You’re back energized and ready to explore. First, click on the connect button or go to the connect tab on your cluster’s main page. You will notice the IP addresses of your nodes along with your credentials. That information is critical to access your database and should not be made public or accessible in your app. We will use the dotenv library and a .env file to hide our credentials later on.

Create a Keyspace and Table using CQLSH

Let’s now test it out. Click on the “Cqlsh” item on the left side menu and follow the instructions to connect to your cluster.

I will use docker since I do not have cqlsh installed on my machine.

docker run -it --rm --entrypoint cqlsh scylladb/scylla -u [USERNAME] -p [PASSWORD] [NODE_IP_ADDRESS]

Note that you will need to replace the USERNAME, PASSWORD, and NODE_IP_ADDRESS with your own.

The above command instructs docker to create a new container with the scylladb/scylla image and run cqlsh, which is short for the Cassandra Query Language Shell. You will then be at the shell prompt.

Let’s create a new keyspace. A keyspace is a top-level container for tables that also defines how the data is replicated across the cluster. I’m using the code provided in the connect tab under Cqlsh.

CREATE KEYSPACE todos WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_US_EAST_1' : 3} AND durable_writes = true;

Let’s then use the newly created keyspace.

USE todos;

Then create table items. Note that in ScyllaDB, it is mandatory to have a PRIMARY KEY. More about that on ScyllaDB University in this lesson.

CREATE TABLE items ( id uuid PRIMARY KEY, name text, completed boolean);

Let’s now insert some values.

INSERT INTO items (id, name, completed) VALUES (uuid(), 'my first task',false);

And run a SELECT query to make sure we correctly saved the data.

SELECT * FROM items;

 id                                   | completed | name
--------------------------------------+-----------+---------------
 0536170a-c677-4f11-879f-3a246e9b032d |     False | my first task

Connect the Server App to ScyllaDB Cloud

In the server directory, locate and open the .env file, and provide the cluster information that you can find on ScyllaDB Cloud:

USERNAME=""
PASSWORD=""
DATA_CENTER=""
NODE_IP=""
KEYSPACE="todos"

The reason we’re using environment variables is to hide sensitive information that you might pass to the client.

In the items.js file, let’s import the cassandra-driver and the dotenv config like so:

const cassandra = require('cassandra-driver');
require('dotenv').config();

The reason we’re using the cassandra-driver package is that it is fully compatible with ScyllaDB, which makes it convenient to migrate from Apache Cassandra to ScyllaDB.

We will also add our environment variables to the items.js file and create an instance of Client using the following code:

const { NODE_IP, DATA_CENTER, USERNAME, PASSWORD, KEYSPACE } =
   process.env;

const cluster = new cassandra.Client({
   contactPoints: [NODE_IP],
   localDataCenter: DATA_CENTER,
   credentials: { username: USERNAME, password: PASSWORD },
   keyspace: KEYSPACE,
});
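If you want to verify the connection before moving on, a quick sanity check like the one below (a hypothetical helper, not part of the starter project) queries the node’s system table and logs the release version:

// Optional sanity check: confirm the driver can reach the cluster
async function testConnection() {
   const result = await cluster.execute('SELECT release_version FROM system.local');
   console.log('Connected, release version:', result.rows[0].release_version);
}

testConnection().catch(console.error);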

Et voilà!

Congratulations! You’ve created your first ScyllaDB cluster on the cloud.

In this article, we cloned the todo-app project and installed its dependencies. We looked at both the client and server structure and tested the frontend and the API. We also created a cluster using ScyllaDB Cloud and connected to it using cqlsh and, from NodeJS, the cassandra-driver. In the next article, we will see how to implement the CRUD operations.

GO TO PART 2

Upgrades to Our Internal Monitoring Pipeline: Using Redis™ as a Cassandra® Cache

Monitoring at scale

In this series of blogs, we explore the various ways we pushed our metrics pipeline—mainly our Apache Cassandra® cluster named Instametrics—to the limit, and how we went about reducing the load it was experiencing on a daily basis.

The Problem Space

Instaclustr hosts hundreds of clusters, running thousands of nodes, each of which is reporting metrics every 20 seconds. From operating system level metrics like cpu, disk and memory usage, to application specific metrics like Cassandra read latency or Kafka® consumer lag.

Instaclustr makes these metrics available on our metrics API for customers to query. As we continue to grow and add more customers and products, the underlying infrastructure to make these metrics available needs to scale to support the expanding number of nodes and metrics.

Our Monitoring Pipeline

We collect metrics on each node with our monitoring application which is deployed on the node. This application is responsible for periodically gathering various metrics and converting them to a standard format, ready to be shipped.

We then ship the metrics off to our central monitoring infrastructure where it is processed by our fleet of monitoring servers where various operations are performed such as:

  • Calculating new metrics when necessary; turning running counters into differential metrics, mapping various cloud service file system paths into common paths, and the like.
  • Processing the metrics, applying rules, and generating reports or alerts for our technical operations team to respond to.
  • Storing the metrics, the primary data store is our Apache Cassandra cluster named Instametrics—itself running as a managed cluster on our platform, supported by our Technical Operations staff just like all of our clusters.

In a previous blog, our Platform team described how we improved our metrics aggregation pipeline to take a large amount of strain off our Instametrics Cassandra cluster. They achieved this by introducing a Kafka streaming pipeline.

Taking a Load off the API

As we continued to scale out to new product offerings, adding more customers and more nodes, we started looking at where our monitoring pipeline needed further improvement.

One of the major concerns was our Monitoring REST API, which had continued to see response latency grow as we gained more customers who were eager to consume the metrics as quickly as we could produce them. Metrics are published every 20 seconds, and some customers like to ingest them as quickly as they are available. 

As part of the effort to reduce the latencies and errors being reported to customers, we did some investigation into what was causing the issues. After shoring up the capacity of the API servers themselves, we did some analysis and came to the conclusion that Cassandra is a relatively expensive way (in terms of $ and CPU load) to serve the read-heavy workload we had for fresh metrics. If we wanted to continue servicing all the requests from our Instametrics cluster with the desired latency, it would need to be scaled further to appropriately handle the load. We had to service load coming from multiple dimensions: significant write load from storing the metrics, significant load from the original Spark-based metrics rollup calculations, and finally read load from the API requests.

Our Platform team had already started the work on reducing the metric aggregation calculations, but we also wanted to further reduce the read load on our Cassandra cluster. We ideally wanted something that would continue to scale as we added clusters and products, deliver an even better user experience, and reduce the strain on our Cassandra cluster.

Enter Redis

For solving the read load, Redis was an easy decision to make. Our Instaclustr for Redis offering was created to help companies solve these exact types of problems. We defined a plan using our Terraform provider, created a Redis cluster configured and networked to our core applications, and it was all ready to serve metrics requests.

The challenge then became how to get the metrics there.

Getting the Data Model Correct

We had always anticipated that the data stored in Redis would be slightly different from the data stored in Instametrics.

Our Cassandra cluster stores all raw metrics for 2 weeks, but storing that amount of data in Redis would have been cost prohibitive. This is because while Cassandra stores this information on disk, Redis stores it in memory. That makes for significantly faster query times, but much more expensive storage.

However, while it’s possible for customers to ask for a metric from 2 hours or 2 days ago, we know that the majority of API load comes from customers who are constantly querying for the latest metric available, which they often feed into their own monitoring pipelines. So we really only need to have the latest minute or so of data in Redis to serve the vast majority of API requests.

We also know that not every customer uses the monitoring API; many simply view metrics on our console as needed. Those who do store metrics themselves may not be doing so 24×7, but only in certain situations. We can save on data transfer costs, CPU, and memory overhead if we only cache metrics for clusters that will actually be read through the API.

Taking into account that the majority of our API requests are for the most recent metrics, we apply a 15-minute Time To Live (TTL) to all Redis records, and only cache metrics for customers who have actively used the monitoring API within the last hour. A sketch of this policy follows.
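To make the policy concrete, here is a minimal Java sketch of the write side using the Lettuce client, assuming a hypothetical recentlyActiveApiClusters set that is refreshed elsewhere. It illustrates the caching rules above; it is not our production code, which at this stage still lived in the Clojure pipeline.

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.sync.RedisCommands;

import java.time.Duration;
import java.util.Set;

public class MetricCachePolicy {
    // Policy from the post: 15-minute TTL, cache only for recently active API users.
    private static final Duration METRIC_TTL = Duration.ofMinutes(15);

    private final RedisCommands<String, String> redis;
    private final Set<String> recentlyActiveApiClusters; // hypothetical: refreshed every few minutes

    public MetricCachePolicy(RedisClient client, Set<String> recentlyActiveApiClusters) {
        this.redis = client.connect().sync();
        this.recentlyActiveApiClusters = recentlyActiveApiClusters;
    }

    // Cache a raw metric only if someone is likely to read it back through the API.
    public void maybeCache(String clusterId, String metricKey, long timestampMillis, String jsonValue) {
        if (!recentlyActiveApiClusters.contains(clusterId)) {
            return; // nobody is reading these via the API, so don't spend Redis memory on them
        }
        // Store the sample in a sorted set scored by timestamp, refreshing the TTL on every write.
        redis.zadd(metricKey, (double) timestampMillis, jsonValue);
        redis.expire(metricKey, METRIC_TTL.toSeconds());
    }
}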

The First Attempt

The simplest solution, with minimal required changes to our existing stack, was to introduce dual writes in the monitoring pipeline. In addition to writing raw metrics into our Cassandra cluster, we would additionally write them into our Redis cluster.

This wasn't entirely risk free: the monitoring servers carry a constant, delicately balanced load. If the pipeline wasn't cleared fast enough, a feedback loop would form in which the backlog grew faster than it could be drained, and the pipeline would fail quickly.

But this pipeline is pretty well instrumented, and we can see when requests are backing up before it becomes a huge problem.

So we wrote a Redis metrics shipper, wired it into our processing engine behind a feature flag, enabled it on a small subset of the monitoring servers, and observed.

The average latency of monitoring boxes with the feature enabled is shown in blue, and the others in purple. As the graph shows, our latencies spiked significantly, which caused the upstream queues to back up heavily. We decided to turn off the experiment and reassess.

We also reached out to the Instaclustr Technical Operations team, who had a quick look at the Redis cluster for us, and reached the conclusion that it was humming along with minimal issues. The bottleneck was not with the Redis cluster itself. 

This graph shows the CPU load. The periodic spikes are due to AOF rewriting, which we will write about in a subsequent article; without them, the cluster sat at around 10% CPU load while ingesting roughly 30% of the total metrics.

All in all, a reasonable first attempt, but improvements had to be made!

Attempt 2

We were observing much more CPU being consumed on our monitoring servers than anticipated, so we went looking for improvements. After a bit of poking around, we found a function that was moderately expensive when called once per metric collection, but whose impact was amplified when it was called 2-3 times due to the new Redis destination.

We consolidated the call so it was back to only once per cycle and turned everything on again.

As you can see, the average latency of our processing engine decreased markedly, even with the Redis metrics shipper turned on! Our Redis cluster was again humming along with no issues: each node was processing 90,000 operations per second and sitting at around 70% CPU, with plenty of memory headroom available.

Job done! It was time to start using these metrics in the API.

Attempt 2: Continued

No such luck.

Having solved the processing latency problem, we thought we were in the clear. But after leaving the servers alone for a few days, we observed sporadic CPU spikes causing the servers to spiral out of control, crash and then restart.

We were seeing a correlated increase in application stream latency, indicating that the servers were gradually slowing down before crashing and restarting.

This was observed on the busiest servers first, but eventually all of them experienced similar symptoms.

We had to turn off metrics shipping to Redis, again, and go back to the drawing board.

The issue was that we were running out of optimization options. Our monitoring pipeline is written in Clojure, and the options available for client libraries and support are constrained. We often end up calling out to native Java libraries to get the full functionality we require, but that can come with its own set of issues.

At this stage, we were a bit stuck.

Kafka to the Rescue

Thankfully, we were not the only team working on improving the monitoring pipeline. One of our other teams was getting close to the final implementation of their Kafka metrics shipper.

The raw metrics would go to a Kafka cluster first, before reaching their final destination, our Instametrics Cassandra cluster.

Once the metrics are in Kafka, the possibility space opens up significantly. We can afford to ingest the metrics at a slower pace, since we are no longer blocking a time-critical incoming queue; we can scale consumers more easily; and we can rewind and replay records when necessary if there is a failure.

While they were working on getting it completely stable, we were writing the second version of our Redis metrics shipping service. This time we were consuming the metrics out of Kafka, and were able to build a small Java application running on a familiar set of tools and standards.
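The core of such a consumer is small. Below is a hedged sketch of what the Kafka-to-Redis path might look like; the topic name, consumer group, and host addresses are placeholders rather than our actual configuration.

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.sync.RedisCommands;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RedisMetricsWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");   // placeholder broker address
        props.put("group.id", "redis-writer");          // hypothetical consumer group
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        RedisCommands<String, String> redis =
                RedisClient.create("redis://redis-cache:6379").connect().sync();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("raw-metrics")); // hypothetical topic carrying raw metrics
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // key = metric name, value = JSON sample; score the sorted set by timestamp
                    redis.zadd(record.key(), (double) record.timestamp(), record.value());
                    redis.expire(record.key(), 15 * 60); // 15-minute TTL
                }
                consumer.commitSync(); // at-least-once is fine: rewriting a sample is harmless
            }
        }
    }
}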

This really validated our choice of Kafka as the core of our metrics pipeline. We are already seeing the benefits of having multiple teams, creating multiple applications, all able to consume the same stream of messages.

Removing the above constraints around processing time on our monitoring instances meant we could push out this microservice with minimal time and effort, and have all the instrumentation we need using JMX metrics and logging tools.

Attempt 3: Redis Writers as Dedicated Kafka Consumer Applications

Developing an application to read from Kafka and write to Redis was reasonably straightforward, and it wasn’t too long before we had something we were ready to begin testing. 

We deployed the new Redis writer application to our test environment and let it soak for two weeks to check for stability and correctness. Our test environment has very little monitoring load compared to the production environment; however, with both the Redis writers and the Redis cluster remaining stable after two weeks, we decided to push ahead and deploy the writers to production to test against the production workload.

This was another one of the major benefits of building our monitoring infrastructure on Kafka. We saw this as an extremely low risk deployment: even if the Redis writer was unable to keep up, needed to be stopped, or was just generally buggy, it would have placed only a tiny amount of additional load on our Kafka cluster, with no dependency on the rest of the system.

After deploying the Redis writer application to production, it became obvious that the writers could not keep up with the amount of traffic. CPU utilization was maxed out, and consumer lag for the Redis writer consumer group was rapidly increasing. The overall throughput was only a fraction of what the original Riemann-based solution was able to achieve.

Problem 3a: Excessive CPU Usage by the Writers

The next step was to figure out exactly why our Redis writer was unable to meet our performance expectations. We began profiling with async-profiler, which revealed that 72% of CPU time was spent performing linear searches through lists of recently active object IDs: essentially the code path that works out whether a customer's metrics should be stored in Redis. That's right, almost three-quarters of the CPU was spent deciding whether we should cache a metric, and only a quarter was spent actually saving the metric to Redis. This was made worse by usage of the Java Stream API in a way that generated a large number of invokeinterface JVM instructions, contributing 24% of the 72% total. For lists containing thousands of IDs, the fix is to use hash tables instead, as sketched below.
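In outline, the change looks like the following sketch (illustrative names, not the actual code): the per-metric linear scan over a list is replaced with a constant-time membership check against a hash-based set.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ActiveIdLookup {
    // Before: a linear scan for every single metric sample, O(n) per lookup,
    // written via the Stream API much like the hot path the profiler flagged.
    static boolean shouldCacheSlow(List<String> recentlyActiveIds, String clusterId) {
        return recentlyActiveIds.stream().anyMatch(id -> id.equals(clusterId));
    }

    // After: build a set once per refresh, then each lookup is O(1) on average.
    static boolean shouldCacheFast(Set<String> recentlyActiveIds, String clusterId) {
        return recentlyActiveIds.contains(clusterId);
    }

    public static void main(String[] args) {
        List<String> activeList = List.of("cluster-a", "cluster-b", "cluster-c");
        Set<String> activeSet = new HashSet<>(activeList); // one-off conversion when the list refreshes

        System.out.println(shouldCacheSlow(activeList, "cluster-b")); // true, but O(n) per call
        System.out.println(shouldCacheFast(activeSet, "cluster-b"));  // true, O(1) per call
    }
}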

Problem 3b: Redis Caching Cluster Runs Out of Memory

While we were working on problem 3a, a would-be disaster struck! Our monitoring system alerted the on-call support team to an outage of the internal Redis caching cluster, which was quickly determined to be caused by the cluster running out of memory. How were we running out of memory when we were processing less data than before? And how did we run out of memory when our Redis clusters were configured with what we thought were reasonable memory limits, along with a least-recently-used (LRU) eviction policy?

Analysis of the new Redis writer code revealed a bug in the TTL-based expiration logic that rendered it almost completely useless for any non-trivial volume of data: TTLs were only applied during a one-minute window every 30 minutes, so most of the data ended up with no TTL at all, leading to uncontrolled growth in memory usage. The batching was an unnecessary optimization, so the fix was simply to update the TTLs on every metric write, which is an easy enough change. But this led us to another, larger question: why did the fallback memory limit mechanism not work?

The used-memory metric reported by Redis tells us that the memory limit was respected, at least until the cluster started falling over. What was surprising was the inverted spikes in available memory as reported by the operating system, sometimes reaching all the way down to zero!

We compared the timestamps of the spikes to the Redis logs and discovered they were caused by append-only file (AOF) rewrites. Further research revealed a known general problem: Redis peak memory usage can be much higher than the configured max memory limit (redis#6646). Redis forks the main process to take a consistent snapshot of the database, needed for the AOF rewrite. Forking is typically space-efficient because it uses copy-on-write (COW), but if the workload is write-heavy, a significant proportion of the original memory pages needs to be copied. For this workload, we would need to restrict the Redis max memory to less than half of the total system memory, and even then we needed to perform testing to be sure that this would prevent Redis from running out of memory.
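As a rough illustration of that mitigation, the relevant Redis settings can be applied with CONFIG SET (shown here through Lettuce). The values are examples only, not our production configuration, and on a managed cluster they would normally be set through the provider rather than ad hoc.

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.sync.RedisCommands;

public class RedisMemorySettings {
    public static void main(String[] args) {
        RedisCommands<String, String> redis =
                RedisClient.create("redis://redis-cache:6379").connect().sync();

        // Leave well over half of system RAM unused by Redis so the forked AOF-rewrite child
        // has room for copy-on-write page duplication under a write-heavy load.
        redis.configSet("maxmemory", "12gb");               // example value for a node with 32 GB of RAM
        redis.configSet("maxmemory-policy", "allkeys-lru"); // evict least-recently-used keys as a backstop

        System.out.println(redis.configGet("maxmemory"));
    }
}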

Problem 3c: Inefficient Metrics Format in Redis

Our application stored metrics in Redis as JSON objects in sorted sets. After a few iterations of the solution, we had ended up with a legacy model that duplicated the key name in each value. For a typical metric value, the key name accounted for about half of the memory usage.

For example, here is a key for a CPU load metric of a node:

{46e4157b-e6de-42e1-9c37-5fe5e8d1e676}/metrics/cpuUtilization

And here is a value that could be stored in that key:

{"service":"{46e4157b-e6de-42e1-9c37-5fe5e8d1e676}/metrics/cpuUtilization","time":1623814124123,"value":0.0}

If we remove all the redundant information, we can reduce this down to:

{"time":1623814124123}

In addition to removing the service name, we can also remove the value if it is the default. With both of these optimizations, we can reduce memory usage by approximately half.
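A small sketch of the trimmed-down serialization is shown below; the field names follow the example above, and treating 0.0 as the omittable default is our reading of the example rather than a documented rule.

public class CompactMetricValue {
    // Build the JSON stored as the sorted-set member. The service name already lives in the
    // Redis key, so it is never repeated in the value, and a default reading of 0.0 is omitted.
    static String toJson(long timeMillis, double value) {
        if (value == 0.0d) {
            return "{\"time\":" + timeMillis + "}";
        }
        return "{\"time\":" + timeMillis + ",\"value\":" + value + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(1623814124123L, 0.0));  // {"time":1623814124123}
        System.out.println(toJson(1623814124123L, 42.5)); // {"time":1623814124123,"value":42.5}
    }
}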

Attempt 4: Bugfix, Optimize, and Tune

After the problems were fixed, CPU usage dropped and throughput increased, but the ever-increasing consumer lag barely slowed down. We still just weren't processing as many messages as we needed to keep up with the incoming event rate.

The low CPU usage, along with the lack of any other obvious resource bottlenecks, suggested that some sort of thread contention could be happening. The Redis writer uses multiple Kafka consumer threads, but all threads shared the same instance of the Lettuce Redis client, which is what the Lettuce documentation recommends. Going against the recommendation, we tried refactoring the Redis writer so each consumer thread gets its own Lettuce client, as sketched below.
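A minimal sketch of that refactor, assuming each thread also runs its own Kafka consumer loop (elided here):

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;

public class PerThreadRedisWriter implements Runnable {
    private final String redisUri;

    public PerThreadRedisWriter(String redisUri) {
        this.redisUri = redisUri;
    }

    @Override
    public void run() {
        // Each consumer thread builds its own client and connection instead of sharing
        // a single Lettuce instance across the whole application.
        RedisClient client = RedisClient.create(redisUri);
        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            // ... poll Kafka and write metrics using connection.sync() ...
        } finally {
            client.shutdown();
        }
    }

    public static void main(String[] args) {
        int consumerThreads = 4; // illustrative; in practice sized to the Kafka partition count
        for (int i = 0; i < consumerThreads; i++) {
            new Thread(new PerThreadRedisWriter("redis://redis-cache:6379"), "redis-writer-" + i).start();
        }
    }
}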

Success! Immediately after deploying the new Redis writer, throughput doubled, and the consumer lag started racing downwards for the first time.

Note that a higher load is sustained while the writer is catching up. Once it has processed the backlog of metrics, the CPU drops significantly to around 15%. At this point, all we have left to do is to downsize the Redis writer instances to best match the CPU usage between the Redis writer and Redis cluster, while leaving plenty of headroom for future growth. 

Turning on Reading from the API

So now we have a pipeline that continually caches the last 15 minutes of metrics for any customers who have used the API recently. But that won't have any impact unless we extend our API to query Redis!

The final piece of work that needed to be completed was allowing our API instances to query metrics from Redis! 

In the end, our API logic simply filters metric requests based on time: if the requested range falls within the last 15 minutes, it queries Redis first. Redis is fast at reads, and it is extremely fast at saying that it doesn't have a cached value. So rather than trying to programmatically determine whether a particular recent metric is cached, we try Redis, and if it's not there we query Cassandra. Taking this "fail fast" approach to metrics retrieval adds only a very minor latency increase in the worst case.
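In sketch form, the read path looks roughly like this; RedisMetricStore and CassandraMetricStore are hypothetical interfaces standing in for our actual query code.

import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class MetricReadPath {
    private static final Duration CACHE_WINDOW = Duration.ofMinutes(15);

    private final RedisMetricStore redis;          // hypothetical wrapper around the Redis queries
    private final CassandraMetricStore cassandra;  // hypothetical wrapper around the Cassandra queries

    public MetricReadPath(RedisMetricStore redis, CassandraMetricStore cassandra) {
        this.redis = redis;
        this.cassandra = cassandra;
    }

    public List<MetricSample> read(String metricKey, Instant from, Instant to) {
        // Only requests for the last 15 minutes can possibly be served from the cache.
        if (from.isAfter(Instant.now().minus(CACHE_WINDOW))) {
            List<MetricSample> cached = redis.query(metricKey, from, to);
            if (!cached.isEmpty()) {
                return cached; // cache hit
            }
            // A miss comes back from Redis almost instantly, so falling through is cheap.
        }
        return cassandra.query(metricKey, from, to);
    }

    // Minimal supporting types so the sketch is self-contained.
    interface RedisMetricStore { List<MetricSample> query(String key, Instant from, Instant to); }
    interface CassandraMetricStore { List<MetricSample> query(String key, Instant from, Instant to); }
    record MetricSample(long timeMillis, double value) {}
}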

The initial deployment of the API feature worked quite well, and we saw a reduction in reads from our Cassandra cluster. However, we did have some edge cases causing issues with a small number of metrics, requiring us to turn off the feature and develop a fix. The final solution was deployed on the 27th of October.

The first graph shows the reduction in the number of requests hitting our Instametrics Cassandra cluster from our API, showing that we have almost eliminated these reads.

The next graph shows the number of reads that were moved to our Redis cluster (note that this metric was only introduced on the 25th of October).

Interestingly, this hasn't had a large impact on our API latencies: we are still reporting very similar P95 (blue) and P50 (purple) latencies.

This can actually be explained by two things: 

  1. Our Cassandra cluster was very large at this point, over 90 i3.2xlarge nodes, which include extremely fast local storage. This meant that read requests were still being serviced in a reasonable time.
  2. The Redis cluster is much smaller than our Cassandra cluster, and we can still make some performance improvements. One is moving from AOF persistence to diskless persistence, which should further improve performance for a large, write-heavy workload like ours.

At this point, the major benefit of our Redis caching of metrics is the impact it has had on our Cassandra cluster health. When we started work on Redis caching, we had a 90-node i3.2xlarge Cassandra cluster. This was resized to a 48-node i3.4xlarge cluster in early September to provide additional processing capacity.

The first improvement the cluster saw was the Kafka-based rollups, released on the 28th of September, followed by the Redis caching close to one month later on the 27th of October.

You can see from the graphs below the significant improvement that both of these releases had on CPU utilization, OS load, and the number of reads being performed on the Cassandra cluster.

In the end, this enabled us to downsize our Cassandra cluster from 48 i3.4xlarge nodes to 48 i3en.2xlarge nodes in mid-November. This represents a large saving in infrastructure costs, while maintaining our newfound cluster health and read latencies.

Everything has been running smoothly on our Redis writers for the past few months, with no major rework needed to maintain a stable caching pipeline, and we are continuing to see really promising customer impact.

Stay tuned for an upcoming blog, where we explain how having a Redis cache enabled us to build our new Prometheus Autodiscovery Endpoint, which makes it really easy for customers using Prometheus to scrape all available metrics. 

 


The post Upgrades to Our Internal Monitoring Pipeline: Using Redis™ as a Cassandra® Cache appeared first on Instaclustr.

ScyllaDB University LIVE – Spring 2022

The ScyllaDB University LIVE event is taking place next month. We will host the event on two consecutive days, each designed for a different time zone:

  • AMERICAS – Tuesday, March 22nd – 9AM-1PM PT | 12PM-4PM ET | 1PM-5PM BRT
  • EMEA and APAC – Wednesday, March 23rd – 8:00-12:00 UTC | 9AM-1PM CET | 1:30PM-5:30PM IST

It’s your chance to take a free half-day of online, instructor-led training from some of the leading engineers and experts. We’ll host two parallel tracks, an Essentials track for beginners and an Advanced Topics and Migrations track for people with previous experience.

Sign up for ScyllaDB University LIVE

The topics covered in this event are ScyllaDB Essentials, ScyllaDB Basics, Your First ScyllaDB Application, Hands-on Migration to ScyllaDB Cloud, Best Practices in Highly Performant Applications, and Advanced Data Modeling. More details are below.

The sessions are live only and will not be available on demand, but the hands-on labs and slide decks presented will be available on ScyllaDB University right after the event. Users who complete the accompanying courses will also get a certificate of completion and some cool swag.

Bring any questions you have to the sessions. We aim to make them as interactive and engaging as possible. After the sessions, we’ll host a roundtable discussion where you’ll have a chance to network with your colleagues and ask our leading experts questions.

Detailed Agenda

  • ScyllaDB Essentials: This session covers an introduction to NoSQL and ScyllaDB, basic concepts, and architecture. You will also get an overview of ScyllaDB terminology, ScyllaDB components, data replication, consistency levels, and the write and read paths. It includes two hands-on demos. In the first, you'll see how easy it is to spin up a cluster with Docker and perform some basic queries. In the second, we'll see what happens when we conduct operations in our cluster with different replication factors and consistency levels while simulating a situation where some of the nodes are down.
  • ScyllaDB Basics: This session will start with Basic Data Modeling, covering key concepts like the Partition Key and the Clustering key, Collections, how to use them and why they are important. It then discusses the ScyllaDB Drivers, how they can be used, and why they deliver better performance compared to non-ScyllaDB Drivers. Finally, the session covers Compaction, including an explanation of how it works, the different compaction strategies, and how you can choose the best compaction strategy based on your use case.
  • Build your first ScyllaDB Powered App: In this session, you’ll see an extensive hands-on example of how to develop your first application from scratch using ScyllaDB Cloud. The Git repository and different steps to run the lab yourself will be available on ScyllaDB University.
  • Hands-on Migration to ScyllaDB Cloud: In this talk, you’ll learn how to migrate from your existing database to ScyllaDB Cloud. The discussion starts with an overview of the different migration tools and options, online migrations, offline migrations, and the standard steps for each process. By the end of this talk, you’ll know the different migration options and when you should use each. The talk includes hands-on examples for migrating from Cassandra and from DynamoDB.
  • Best Practices in Highly Performant Applications: If you have some experience in creating NoSQL applications but want to learn how to do it better, then this session is for you. It starts with the ScyllaDB Shard Aware Drivers, what they are and why you should use them. Next, it touches on compaction strategies and how to choose the right one, paging, retries, common pitfalls, and how to avoid them.
  • Advanced Data Modeling: This talk covers advanced data modeling concepts like Materialized Views, TTL, Counters, Secondary Indexes, and Lightweight Transactions. It also includes a hands-on example.

Get Started in ScyllaDB University

Before the event, and if you haven’t done so yet, we recommend you complete the ScyllaDB Essentials course on ScyllaDB University to better understand ScyllaDB and how the technology works.

Hope to see you at ScyllaDB University Live!

Sign up for ScyllaDB University LIVE

Security Advisory: CVE-2021-44521

On Friday, February 11, 2022, Instaclustr was advised that a new CVE for Apache Cassandra® had been published. This CVE affects users that have User Defined Functions (UDF) enabled with certain configurations. Instaclustr investigated our configurations and has confirmed that this CVE does not affect our services, as we do not have UDF enabled. Self-hosted users are urged to double-check their configurations and modify them accordingly, or upgrade as advised below. Please note that upgrading Cassandra mitigates this specific CVE, but the affected configuration is still considered unsafe.

If you have any queries regarding this vulnerability and how it relates to Instaclustr services, please contact security@instaclustr.com.

Advisory for CVE-2021-44521:

When running Apache Cassandra with all of the following configuration settings:

  • enable_user_defined_functions: true
  • enable_scripted_user_defined_functions: true
  • enable_user_defined_functions_threads: false

it is possible for an attacker to execute arbitrary code on the host. The attacker would need to have enough permissions to create user-defined functions in the cluster to be able to exploit this. Note that this configuration is documented as unsafe, and will continue to be considered unsafe after this CVE.

This issue is being tracked as CASSANDRA-17352.

Mitigation:

Set enable_user_defined_functions_threads: true (this is the default)

or

  • 3.0 users should upgrade to 3.0.26
  • 3.11 users should upgrade to 3.11.12
  • 4.0 users should upgrade to 4.0.2

Credit: This issue was discovered by Omer Kaspi of the JFrog Security vulnerability research team.

The post Security Advisory: CVE-2021-44521 appeared first on Instaclustr.

We’re Porting Our Database Drivers to Async Rust

Our Rust driver started as a humble hackathon project, but it eventually grew to become our fastest and safest Cassandra Query Language (CQL) driver. We happily observed in our benchmarks that our ScyllaDB Rust Driver beats even the reference C++ driver in terms of raw performance, and that gave us an idea: why not unify all drivers to use Rust underneath?

Register for Our Rust Virtual Workshop

Benefits of a Unified Core

Database drivers are cool and all, but there’s one fundamental problem with them: a huge amount of reinventing the wheel, implementing the same logic over and over for every possible language that people write applications in. Implementing the same thing in so many languages also increases the chances of some of the implementations having subtle bugs, performance issues, bit rot, and so on. It sounds very tempting to deduplicate as much as possible by reusing a common core — in our case, that would be the Rust driver.

Easier Maintenance

When most of the logic is implemented once, maintainers can focus on this central implementation with extensive tests and proper review. It takes much less manpower to keep a single project up to date than to manage one project per language. Once a new feature is added to ScyllaDB, it's often the case that only the core needs to be updated, and all the derivative implementations automatically benefit from it.

Fewer Bugs

It goes without saying that deduplication helps reduce the number of bugs, because there’s simply less code where a bug can occur. Additionally, backporting urgent fixes also gets substantially easier, because the same fix wouldn’t have to be carefully rewritten in each supported language.

What’s even better is that all existing drivers already have their own test suites – for unit tests, integration tests, and so on. A single core implementation would therefore be tested by many independent test harnesses and lots of cases. Sure, the majority of them will overlap, but there’s no such thing as a perfect test suite, so using several ones reduces the probability of missing a bug and generally improves test coverage. And all the tests are already there, for free!

Performance

Some drivers are slow due to their outdated design; some are faster than others because they're implemented in a language that imposes less overhead. Some, like our Rust driver, are the fastest. Similarly to how Python relies on modules compiled in C to make things faster, our CQL drivers could benefit from a Rust core. A lightweight API layer would ensure that the drivers remain backward compatible with their previous versions, but the new versions will delegate as much work as possible straight to the Rust driver, trusting that it will perform the job faster and more safely.

Rust's asynchronous model is a great fit for implementing high-performance, low-latency database drivers, because it's scalable and allows utilizing high concurrency in your applications. Contrary to what other languages implement, Rust abstracts away the layer responsible for running asynchronous tasks. This layer is called the runtime, and being able to pick your own runtime, or even implement one, is a very powerful tool for developers. After careful research, we picked Tokio as our runtime of choice, due to its very active open source community, focus on performance, rich feature set (including complete implementations of network streams, timers, etc.), and lots of fantastic utilities like tokio-console.

Language Bindings

Writing code in one language in order to use it in another is common practice and there are lots of tools available for the job. Rust is no exception — its ecosystem is generally very developer-friendly, and there are many crates that make bindings with other languages effortless.

C/C++

Binding with C/C++ applications doesn’t actually require much effort anyway. Rust uses LLVM for code generation, and the output executables, libraries and object files are more or less linkable with any C/C++ project. Still, there are a few good practices when using Rust and C/C++ in a single project.

First of all, make sure that name mangling won't make it hard for the linker to find the functions you compiled in Rust. Anyone who has ever written functions in C++ and used them from C will be familiar with the keyword `extern "C"`, and the same trick applies to Rust: https://doc.rust-lang.org/std/keyword.extern.html. Simply mark the functions you mean to export with `extern "C"`, and their names will not be mangled in any way. The linker will then have an easier job matching the Rust parts with your C++ object files and executables.

For an even smoother developer experience, the cxx crate can be used to reduce the amount of boilerplate code and make the bindings more robust.

Python

The Python CQL driver is extremely popular among ScyllaDB/Cassandra users, but, well, Python is not exactly well known for its blazing speed or scalability for high concurrency applications (ref: https://en.wikipedia.org/wiki/Global_interpreter_lock).

Fortunately, due to its dynamic typing and the interpreter being very lenient, it's also really easy to provide bindings to a Python application. The PyO3 crate looks like it has great potential for simplifying the development of native Python modules.

Challenges

Even though there are lots of advantages for unifying the implementation of multiple drivers, one must also consider the drawbacks. First of all, each tiny bug in the Rust core now has global scope – it would affect all the derivative drivers. Then, the glue code provided to bind our Rust driver with the target language is also a potential place where a bug can hide. Relying on third party libraries for bindings also adds yet another dependency to each driver.

Driverless?

It's become popular lately to embrace the "driverless" way and expose an interface implemented over a well-known protocol like gRPC or HTTP(S). It's an interesting idea, and certain applications and developers could definitely benefit from that approach, but going through yet another layer of protocols creates overhead (multiple rounds of serialization/deserialization, parsing the protocol frames, and so on), and users should be able to opt in to the better performance that native CQL drivers provide.

What’s Already Done

Porting the CQL C++ driver was already mostly done during our internal hackathon: https://github.com/hackathon-rust-cpp/cpp-rust-driver. While it's still a work in progress, it's also very promising, because the compatibility layer is quite thin for C++, partly because the ABIs of both languages share many similarities.

Summary

Unifying the drivers is a big and complicated task and we’ve only just begun the journey, but we have very high hopes for the future performance and robustness of all our CQL drivers. Stay tuned for updates!

Watch the ScyllaDB Summit 2022 Talks on Rust!

We had a number of talks at ScyllaDB Summit 2022 on Rust:

  • ScyllaDB Rust Driver: One Driver to Rule Them All
  • Learning Rust the Hard Way for a Production Kafka + ScyllaDB Pipeline
  • ORM and Query Building in Rust

Register now to watch all these sessions and more on-demand

WATCH SCYLLA SUMMIT VIDEOS ON DEMAND NOW

Learn to Build Low-Latency Apps in Rust!

If you want to learn to build your own low-latency apps in Rust, check out our upcoming virtual workshop on Thursday, March 3, 2022! Save your seat! Register now!

REGISTER FOR OUR VIRTUAL WORKSHOP

Learning Rust the hard way for a production Kafka+ScyllaDB pipeline

This is the web version of the talk I gave at [Scylla Summit 2022](https://www.scyllad...