Onsite and Online Live Training

Scylla offers a variety of ways to ramp up your knowledge of Scylla administration, data modeling, and the usage of various features. You can read through our Scylla documentation, take free courses in Scylla University, read our blogs or watch our on-demand webinars.

While the above options are very informative and approachable, many people still want the human touch: personalized and (as needed) customized knowledge delivery from our Scylla experts through onsite & online live training sessions.

Our Scylla Enterprise offering includes the option of paid personal onsite or online live training. One (or more) of our solutions architects and/or training product owners can deliver these lessons right in your offices, or as online sessions using video conferencing tools like Zoom or Webex.

Onsite training can be tailored to your team’s specific needs and availability, typically ranging from one to three full days, while our online training usually comprises three sessions of a few hours each. The training focuses on Scylla administration topics and can also include hands-on training exercises.

Here are some of the topics we cover:

We truly enjoy meeting and engaging with our customers around the world. We also get to learn a lot from you about how important your big data is, and how you are using Scylla to solve your greatest challenges.

Here’s a short list of some of the onsite/online training sessions we delivered in the last 18 months:

By the way, if you are planning or thinking about attending Scylla Summit this year, that’s another great option to get training for yourself or your team. We have pre-Summit training courses for attendees ranging from beginner to advanced. Our training classes may fill up, though, so don’t hesitate! Register now.

But if you can’t make it to Scylla Summit this year, never fear! We can still deliver Scylla training live over the Internet, or our experts can come straight to your door.

CONTACT US FOR ONSITE OR ONLINE TRAINING


Q&A with Brian Hall and Ryan Stauffer

As we prepare for Scylla Summit 2019, we’re highlighting some of this year’s featured presenters. And a reminder, if you’re not yet registered for Scylla Summit, please take the time to register now!

Today we’ll share our conversation with Brian Hall of Expero, and Ryan Stauffer of Enharmonic. Together they will present back-to-back sessions entitled Incorporating JanusGraph into Your Scylla Ecosystem and Powering a Graph Data System with Scylla + JanusGraph. These two practitioners bring tremendous experience in getting the most out of your graph data.

Countless different technologies are employed today in big data and data analytics. What paths did you follow to get involved with graph databases specifically?

Ryan: A few years ago, when I was working at a large Private Equity-backed holding company, I was working on bringing data together from across different subsidiaries into a single unified data model. It became clear that thinking in terms of rows, columns, and tables was taking us down some very awkward and indirect routes to data insights. On the other hand, a graph data model maps directly to real-world business concepts. If the model is laid out clearly, it can be much easier for non-engineer decision-makers to conceptualize. I’ve experimented and iterated with different pieces in the graph “tech stack,” but since then I’ve viewed graph databases as a core part of a solution for how to ask better questions and reach deeper insights.

Brian: At Expero, we build products and applications for our clients… and we bring the tools and technologies required to do the job. Only sometimes do they require graph databases. And that’s exactly how we got into this business. A client came to us with a problem that we all thought needed a graph database. As part of project work, we evaluated several for them to consider and then recommended what technology they should pick and why. They allowed us to write up and publish our findings. This was back in 2014, when not many people were talking about them, and boom! Our phone started ringing off the hook. Voila! We were in the graph database business.

JanusGraph is a well-known graph database solution. Its growth is in great part due to its flexibility via pluggable back-end storage systems. What makes Scylla a different and better back-end than the other options you can use under JanusGraph?

Brian: While there are options for the back-end storage, Scylla stands out for a couple of reasons: obviously first and foremost – performance. I don’t believe anyone would dispute that. Secondly, it really is simple to set up and maintain. While there are countless settings and knobs that can be adjusted on some of the other options, you don’t have to spend years learning how to tune Scylla. It’s performant right away. Lastly, unlike some others, you have some measure of cloud provider portability if that’s important to you.

Ryan: Brian already hit on performance and ease of setup – which I certainly agree with, and which on their own make Scylla the go-to storage backend choice for JanusGraph. I like being able to sidestep the JVM for this piece of the deployment (as opposed to dealing with HBase or Cassandra). I also feel very strongly about the commitment from the Scylla team. They provide an incredible breadth of knowledge and technical insight into their product and how to operate it – all the way down to hardcore performance metrics, and explaining “why” things work the way they do. It’s all very reassuring. And when hiccups do arise around deployments or ops, I’ve found them to be incredibly accessible and skilled at helping to quickly resolve issues.

Gremlin and TinkerPop have now been around for a decade. GraphQL has been around since 2012. OpenCypher since 2015. Yet while these APIs aren’t new, there still seems to be confusion as to which to use, or not use. What’s your take?

Ryan: There’s no perfect answer to this question, and it’s certainly not going to be decided here. There’s a Big Question that underpins all of this: What are you trying to accomplish? Are you just looking for a new form of API to access your existing data? If so, you should probably at least investigate GraphQL (but, spoiler alert, it really has little to do with an actual “graph” model of data, nothing to do with graph databases, and nothing to do with graph algorithms).

On the other hand, if you’re looking for an expressive language that you can use to explore a graph at scale, I believe Gremlin is the way to go. I love the conceptual backing of Gremlin and TinkerPop. The language itself is expressive, and you can choose to write either imperatively or declaratively based on context and preference. You can also build and submit traversals in a variety of languages (Python, Ruby, etc.), which lets you avoid string manipulation while also streamlining automated query-building and testing.

Brian: Practically speaking, when you select your graph database, you are implicitly also selecting your query and traversal language. Arguably, by selecting open standards you allow for graph database portability in the future and avoid vendor lock-in. By choosing JanusGraph, you are also choosing Gremlin/TinkerPop.

There is currently a movement afoot within the vendor community to lay down a universal query language, GQL, with many in the community participating. While that initiative is still very early, it would be good to align around a more consistent syntax and processing paradigm. We would then benefit from shared skillsets, as happened with classic ANSI SQL: RDBMS vendors ultimately all agreed on selects, joins and unions, and competed on more advanced features. I’m happy I don’t have to learn the syntax for a two-table customer/address join in every new SQL engine. I’d love the graph database market to get to the same place.

People in the Scylla community are likely aware of your prior work, from Ryan’s blog on Powering a Graph Data System with Scylla + JanusGraph, to Brian’s webinar on Architecting a Corporate Compliance Platform with Graph and NoSQL Databases. What will you present at Scylla Summit that goes beyond what people have seen or read from you both in the past?

Ryan: My goal will be to highlight just how quickly you can get up and running with a pretty powerful deployment of JanusGraph and Scylla. It’s easy to get stuck in documentation, or worry about a particular configuration setting, but really it doesn’t take much to get set up in a way that lets you start throwing meaningfully large amounts of data at your graph. I’m also really interested to hear what questions and experiences people in the audience have had. I find the fact that you can get everyone together in the same room to be the biggest draw of an event like this.

Brian: Since we’ll be tag teaming this event, we can cast the net a bit more broadly. I’ll be covering a lot more of the “art of the possible” with JanusGraph on top of Scylla. Since this will be primarily a cohort of Scylla folks who may know little to nothing about graphs in general, I want people to be introduced to graph and what it can do for them. There are a lot of mind-blowing things that are possible if you just know where to look.

Lastly, we’d like to get to know a bit about you both outside of technology. What interests or hobbies do you each have beyond creating Big Data solutions?

Ryan: I think I’m an explorer at heart. My most recent big adventure was Iceland this past summer, but I also love adventuring around San Francisco. Sometimes that means seeking out new running routes (my current favorite is tracing the Embarcadero from Oracle Park, then cutting in and taking the stairs up to Coit Tower), and sometimes that means just relaxing at a new restaurant or bar. I also have a few computer music projects that help me reset my thoughts.

Brian: Primarily travel, food and wine with my friends and family. Recent journeys include Kenya, Tanzania and Spain, and I’m looking forward to Austria and Hungary in the near future. To paraphrase Mark Twain: “travel is fatal to bigotry, prejudice and narrow-mindedness… happy traveling!”

Thank you both for taking the time to speak with me today. I’m sure our attendees will find your presentations incredibly valuable!

REGISTER FOR SCYLLA SUMMIT 2019


Compression in Scylla, Part Two

READ PART ONE OF THIS BLOG

In the first part of this blog we learned a bit about compression theory and how some of the compression algorithms work. In this part we focus on practice, testing how the different algorithms supported in Scylla perform in terms of compression ratios and speeds.

I’ve run a couple of benchmarks to compare: disk space taken by SSTables, compression and decompression speeds, and the number and durations of reactor stalls during compression (explained later). Here’s how I approached the problem.

First we need some data. If you have carefully read the first part of the blog, you know that compressing random data does not really make sense and the obtained results wouldn’t resemble anything encountered in the “real world”. The data set used should be structured and contain repeated sequences.

Looking for a suitable and freely available data set, I found the list of actors from IMDB in a nice ~1GB text file (ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/actors.list.gz). The file contains lots of repeating names, words, and indentation — exactly what I need.

Now the data had to be inserted into the DB somehow. For each tested compression setting I created a simple table with an integer partition key, an integer clustering key, and a text column:

create table test_struct (a int, b int, c text, primary key (a, b)) with compression = {...};

Then I distributed ~10MB of data equally across 10 partitions, inserting it in chunks into rows with an increasing value of b. 10MB is more than enough — the tables will be compressed in small chunks anyway.
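
For reference, that loading step can be sketched with the Python driver; this is an illustrative reconstruction, not the exact harness used (the keyspace name "ks", the file name, and the per-row chunk size are assumptions):

from cassandra.cluster import Cluster  # pip install cassandra-driver; works with Scylla

# Spread ~10MB of the actors file over 10 partitions, inserting it in chunks
# into rows with an increasing clustering key b. Assumes a local Scylla node
# and that the table from the snippet above lives in a keyspace named "ks".
session = Cluster(["127.0.0.1"]).connect("ks")
insert = session.prepare("insert into test_struct (a, b, c) values (?, ?, ?)")

chunk_size = 10 * 1024                      # ~10KB of text per row (an assumption)
with open("actors.list", encoding="latin-1") as f:
    data = f.read(10 * 1024 * 1024)
for i in range(0, len(data), chunk_size):
    row = i // chunk_size
    session.execute(insert, (row % 10, row, data[i:i + chunk_size]))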

Each of the following benchmarks was repeated for several commonly used values of chunk length: 4KB, 16KB, and 64KB. I ran them for each of the compression algorithms, and also with compression disabled. For ZStandard, Scylla allows you to specify different compression levels (from 1 to 22); I tested levels 1, 3, 5, 7, and 9.
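
If you want a rough standalone feel for how these algorithms compare on this kind of data outside of Scylla, their Python bindings make a quick check easy (a sketch assuming the third-party lz4, python-snappy and zstandard packages are installed; zlib, which implements DEFLATE, ships with Python). Keep in mind Scylla compresses in small chunks, so the absolute ratios will differ:

import time
import zlib

import lz4.frame       # pip install lz4
import snappy          # pip install python-snappy
import zstandard       # pip install zstandard

# ~10MB of the IMDB actors file used in these benchmarks (an assumption:
# any large, structured text file will do).
with open("actors.list", "rb") as f:
    data = f.read(10 * 1024 * 1024)

codecs = {
    "lz4":     lz4.frame.compress,
    "snappy":  snappy.compress,
    "deflate": lambda d: zlib.compress(d, 6),
    "zstd-1":  zstandard.ZstdCompressor(level=1).compress,
    "zstd-9":  zstandard.ZstdCompressor(level=9).compress,
}

for name, compress in codecs.items():
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name:8s} ratio={len(data) / len(out):5.2f} time={elapsed:6.3f}s")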

Space

The first benchmark measures disk space taken by SSTables. Running a fresh Scylla instance on 3 shards, I inserted the data, flushed the memtables, and measured the size of the SSTable directory. Here are the results.

The best compression ratios were achieved with DEFLATE and higher levels of ZStandard. LZ4 and Snappy performed great too, halving the space taken by uncompressed tables when 64 KB chunk lengths were used. Unfortunately the differences between higher levels of ZStandard are pretty insignificant in this benchmark, which should not be surprising considering the small sizes of chunks used; it can be seen though that the differences grow together with chunk length.

Compression speed

The purpose of the next benchmark is to compare compression speed. I did the same thing as in the first benchmark, except now I measured the time it took to flush the memtables, and repeated the operation ten times for each compression setting. On the error-bar chart below, dots represent means and bars represent standard deviations.

What I found most conspicuous was that flushing was sometimes faster when using LZ4 and Snappy compared to no compression. The first benchmark provides a possible explanation: in the uncompressed case we have to write twice as much data to disk as in the LZ4/Snappy case, and writing the additional data takes longer than the compression itself (which happens in memory, before SSTables are written to disk), even with fast disks (my benchmarks were running on top of an M.2 SSD drive).

Another result that caught my eye was that in some cases compression happened to be significantly slower for 16KB chunks. I don’t have an explanation for this.

It looks like ZStandard on level 1 put up a good fight, trailing LZ4 by less than 10 percentage points in the 64KB case, followed by DEFLATE, which performed slightly worse in general (its mean was better in the 4KB case, but there’s a big overlap once we consider standard deviations). Unfortunately, ZStandard became significantly slower at higher levels.

Decompression speed

Let’s see how decompression affects the speed of reading from SSTables. I first inserted the data for each tested compression setting to a separate table, restarted Scylla to clean the memtables, and performed a select * from each table, measuring the time; then repeated that last part nine times. Here are the results.

In this case ZStandard, even on level 1, behaves (much) worse than DEFLATE, which in turn is pretty close to what LZ4 and Snappy offer. Again, as before, reading compressed files is faster with LZ4/Snappy than in the uncompressed case, possibly because the files are simply smaller. Interestingly, using greater chunks makes reading faster for most of the tested algorithms.

Reactor stalls

The last benchmark measures the number and length of reactor stalls. Let me remind you that Scylla runs on top of Seastar, a C++ framework for asynchronous programming. Applications based on Seastar have a sharded, “shared-nothing” architecture, where each thread of execution runs on a separate core, which is exclusively used by that thread (if we take other processes in the system out of consideration). A Seastar programmer explicitly defines units of work to be performed on a core. The machinery which schedules units of work on a core in Seastar is called the reactor.

For one unit of work to start, the previous unit of work has to finish; there is no preemption as in a classical threaded environment. It is the programmer’s job to keep units of work short — if they want to perform a long computation, then instead of doing it in a single shot, they should perform a part of it and schedule the rest for the future (in a programming construct called a “continuation”) so that other units have a chance to execute. When a unit of work computes for an undesirably long time, we call that situation a “reactor stall”.
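
The idea is easier to see in code. Here is a Python/asyncio analogy (not Seastar itself, whose API differs): one short, bounded unit of work per iteration, with an explicit yield in between, so a long computation never monopolizes the loop:

import asyncio

async def compress_all(chunks, compress_one):
    """Process chunks one bounded unit of work at a time, yielding between
    units so other tasks get a chance to run (no preemption, as in Seastar)."""
    results = []
    for chunk in chunks:
        results.append(compress_one(chunk))  # one short unit of work
        await asyncio.sleep(0)               # yield control to the event loop
    return results

async def main():
    # A hypothetical stand-in for a real compressor: reverse each chunk.
    print(await compress_all([b"abc", b"def"], lambda c: c[::-1]))

asyncio.run(main())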

When does a computation become “undesirably long”? That depends on the use case. Sometimes Scylla is used as a data warehouse, where the number of concurrent requests to the database is low; in that case we can give ourselves a little leeway. Many times Scylla is used as a real-time database serving millions of requests per second on a single machine, in which case we want to avoid reactor stalls like the plague.

The compression algorithms used in Scylla weren’t implemented from scratch by Scylla developers — there is little point in doing that. We use the fantastic open-source implementations available, often coming from the authors of the algorithms themselves. A lot of work was put into these implementations to make sure they are correct and fast. However, using external libraries comes with a cost: we temporarily lose control over the reactor when we call a function from such a library. If the function is costly, this might introduce a reactor stall.

I measured the counts and lengths of reactor stalls that happen during SSTable flushes performed for the first two benchmarks. Any computation that took 1ms or more was considered a stall. Here are the results.

Fortunately, reactor stalls aren’t a problem with the fast algorithms, including ZStandard on level 1. They sometimes appear when using DEFLATE and higher levels of ZStandard, but only with greater chunk lengths, and still not too often. In general the results are pretty good.

Conclusions

Use compression. Unless you are using a really (but REALLY) fast hard drive, using the default compression settings will be even faster than disabling compression, and the space savings are huge.

When running a data warehouse where data is mostly being read and only rarely updated, consider using DEFLATE. It provides very good compression ratios while maintaining high decompression speeds; compression can be slower, but that might be unimportant for your workload.

If your workload is write-heavy but you really care about saving disk space, consider using ZStandard on level 1. It provides a good middle-ground between LZ4/Snappy and DEFLATE in terms of compression ratios and keeps compression speeds close to LZ4 and Snappy. Be careful however: if you often want to read cold data (from the SSTables on disk, not currently stored in memory, so for example data that was inserted a long time ago), the slower decompression might become a problem.

Remember, the benchmarks above are artificial and every case requires dedicated tests to select the best possible option. For example, if you want to decrease your tables’ disk overhead even more, perhaps you should consider using even greater chunk lengths where higher levels of ZStandard can shine — but always run benchmarks first to see if that’s better for you.


Compression in Scylla, Part One

In this two-part blog we’ll focus on the problem of storing as much information as we can in the least amount of space possible. This first part deals with the basics of compression theory and the implementations in Scylla. The second part will look at actual compression ratios and performance.

First, let’s look at a basic example of compression. Here’s some information for you:

Piotr Jastrzębski is a developer at ScyllaDB. Piotr Sarna is a developer at ScyllaDB. Kamil Braun is a developer at ScyllaDB.

And here’s the same information, but taking less space:

Piotr Jastrzębski#0. Piotr Sarna#0. Kamil Braun#0.0: is a developer atScyllaDB.

I compressed the data using a lossless algorithm. If you knew the algorithm I used, you’d be able to retrieve the original string from the latter string, i.e. decompress it. For our purposes we will only consider lossless algorithms.

We would like to apply compression to the files we store on our disk to make them smaller with the possibility of retrieving the original files later.

In the first part of this blog we focus on the theory behind compression: what makes compression possible, and what sometimes doesn’t; what are the general ideas used in the algorithms supported by Scylla; and how is compression used to make SSTables smaller.

In the second part we’ll look at a couple of benchmarks that compare the different supported algorithms to help us understand which ones are better suited for which situations: why should we use one for cases where latency is important, and why should we use the other for cases where lowering space usage is crucial.

Lossless compression

Unfortunately, one does not simply compress any file and expect that the file will take less space after compression.

Consider the set of all strings of length 5 over the alphabet {0, 1}:

A = {00000, 00001, 00010, 00011, …, 11110, 11111}.

Suppose that we have 32 (2^5) files on our disk, each taking 5 bits, each storing a different sequence from the above set. We would like to take our latest, bleeding-edge technology, better-than-everyone-else compression algorithm, and apply it to our files so that each file takes 4 or fewer bits after compression. And, of course, we want the compression to be lossless.

In other words, we want to have an injective function from our set A to the set of strings of length <= 4 over {0, 1}:

B = {ε, 0, 1, 00, 01, 10, 11, 100, …, 1110, 1111} (where ε is the empty string)

The function is the compression algorithm. Injectivity means that no two elements from A will be mapped to the same element in B, and that’s what makes the compression lossless. But that’s just not possible, because the cardinality of A is 32, while the cardinality of B is 31. A German mathematician named Dirichlet once said (paraphrasing): if you try to put n pigeons into n – 1 holes, then at least one hole will be shared by two pigeons. That rule is known as the pigeonhole principle.
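
The counting argument is easy to verify directly (a tiny Python check, nothing more):

from itertools import product

# 32 possible inputs of length 5, but only 31 possible outputs of length
# <= 4 (even counting the empty string) -- so no injective mapping exists.
A = ["".join(bits) for bits in product("01", repeat=5)]
B = ["".join(bits) for n in range(5) for bits in product("01", repeat=n)]
print(len(A), len(B))  # 32 31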

The pigeonhole principle as illustrated on Wikipedia.

Therefore any lossless algorithm that makes some inputs smaller after compression, makes other inputs at least as big after compression as the input itself. So which inputs shall be made smaller?

Complexity

Try to understand the difference between the following two strings. First:

0000000011111111,

Second:

001100000100101.

To create the first string, I invented a simple rule: eight 0s first, then eight 1s. For the second string, I threw a coin 16 times (really).

A person asked on the street would say that the second string is “more random” than the first. An experienced mathematician would say that the second string has greater Kolmogorov complexity (probably).

Kolmogorov complexity, named after the Russian mathematician Kolmogorov, is a fun concept. The (Kolmogorov) complexity of a string is the length of the shortest program which outputs the string and terminates. By “outputs the string” we mean a program that takes no input and outputs the string each time it is executed (so we don’t count random programs which sometimes output a different string). To be completely precise we would have to specify what exactly a “program” means (usually we use Turing machines for that). Intuitively, strings of lower complexity are those that can be described using fewer words or symbols.

Returning to our example, try to describe our two strings in English in as few words as possible. Here’s my idea for the first one: “eight zeros, eight ones”. And for the second one, I’d probably say “two zeros, two ones, five zeros, one, two zeros, one, zero, one”. Send me a message if you find something much better. 😊

Intuitively, strings of high complexity should be harder to compress. Indeed, compression tries to achieve exactly what we talked about: creating shorter descriptions of data. Of course I could easily take a long, completely random string and create a compressor which outputs a short file for this particular string by hard-coding the string inside the compressor. But then, as explained in the previous section, other strings would have to suffer (i.e. result in bigger files), and the compressor itself would become more complex by remembering this long string. If we’re looking to develop compressors that behave well on “real-world” data, we should aim to exploit structure in the encountered strings — because the structure is what brings complexity down.

But the idea of remembering data inside the compressor is not completely wrong. Suppose we knew in advance we’ll always be dealing with files that store data related to a particular domain of knowledge — we expect these files to contain words from a small set. Then we don’t care if files outside of our interest get compressed badly. We will see later that this idea is used in real-world compressors.

Unfortunately, computing Kolmogorov complexity is an undecidable problem (you can find a short and easy proof on Wikipedia). That’s why the mathematician from above would say “probably.”

To finish the section on complexity, here’s a puzzle: in how many words can we describe the integer that cannot be described in fewer than twelve words?

Compressing SSTables

Scylla uses compression to store SSTable data files. The latest release supports 3 algorithms: LZ4, Snappy, and DEFLATE. Recently another algorithm, Zstandard, was added to the development branch. We’ll briefly describe each one and compare them in later sections.

What compression algorithm is used, if any, is specified when creating a table. By default, if no compression options are specified for a table, SSTables corresponding to the table will be compressed using LZ4. The following commands are therefore equivalent:

create table t (a int primary key, b int) with compression =
{'sstable_compression': 'LZ4Compressor'};

create table t (a int primary key, b int);

You can check the compression options used by the table using a describe table statement. To disable compression you’d do:

create table t (a int primary key, b int) with compression =
{'sstable_compression': ''};

A quick reminder: in Scylla, rows are split into partitions using the partition key. Inside SSTables, partitions are stored continuously: between two rows belonging to one partition there cannot be a row belonging to a different partition.

This enables performing queries efficiently. Suppose we have an uncompressed table t with integer partition key a, and we’ve just started Scylla, so there’s no data loaded into memtables. When we perform a query like the following:

select * from t where a = 0;

then in general we don’t need to read each of the SSTables corresponding to t entirely to answer that query. We only need to read each SSTable from the point where partition a = 0 starts to the point where the partition ends. Obviously, to do that, we need to know the positions of partitions inside the file. Scylla uses partition summaries and partition indexes for that.

Things get a bit more complicated when compression is involved (but not too much). If we compressed an entire SSTable as one blob, we would have to decompress it entirely to read the desired data. This is because the concept of order of data does not exist inside the compressed file. In general, no piece of data in the original file corresponds to a particular piece in the compressed file — the resulting file can only be understood by the decompression algorithm as one whole entity.

Therefore Scylla splits SSTables into chunks and compresses each chunk separately. The size of each chunk (before compression) is specified during table creation using the chunk_length_in_kb parameter:

create table t (a int primary key, b int) with compression =
{'sstable_compression': 'LZ4Compressor', 'chunk_length_in_kb': 64};

By the way, the default chunk length is 4 kilobytes.

Now, to read a partition, an additional data structure is used: the compression offset map. Given a position in the uncompressed data (such as one obtained from the partition index), it tells which compressed chunk corresponds to (contains) that position.

There is a tradeoff that the user should be aware of. When dealing with small partitions, large chunks may significantly slow down reads, since to read even a small partition, the entire chunk containing it (or chunks, if the partition happens to start at the end of one chunk) must be decompressed. On the other hand, small chunks may decrease the space efficiency of the algorithm (i.e., the compressed data will take more space in total). This is because many compression algorithms (like LZ4) perform better the more data they see (we explain below why that is the case).
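
To make the chunks-plus-offset-map idea concrete, here is a toy Python sketch (pure illustration, not Scylla’s on-disk format): compress fixed-size chunks independently, remember where each compressed chunk starts, and serve a read by decompressing only the chunks covering the requested range:

import zlib

CHUNK = 4 * 1024  # 4KB, matching the default chunk_length_in_kb

def compress_chunked(data):
    # Compress each fixed-size chunk independently and record where every
    # compressed chunk starts -- a toy "compression offset map".
    blob, offsets = bytearray(), []
    for i in range(0, len(data), CHUNK):
        offsets.append(len(blob))
        blob += zlib.compress(data[i:i + CHUNK])
    offsets.append(len(blob))  # sentinel marking the end of the last chunk
    return bytes(blob), offsets

def read_at(blob, offsets, pos, length):
    # Serve a read of `length` bytes at uncompressed position `pos` by
    # decompressing only the chunks that cover the requested range.
    out = b""
    while length > 0:
        n = pos // CHUNK  # index of the chunk containing `pos`
        chunk = zlib.decompress(blob[offsets[n]:offsets[n + 1]])
        piece = chunk[pos % CHUNK:][:length]
        out += piece
        pos, length = pos + len(piece), length - len(piece)
    return out

data = b"abcdefgh" * 2048  # 16KB of repetitive data
blob, offsets = compress_chunked(data)
assert read_at(blob, offsets, 5000, 100) == data[5000:5100]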

Algorithms supported by Scylla

LZ4 and Snappy focus on compression and decompression speed, so they are well suited for low-latency workloads. At the same time they provide quite good compression ratios. Both algorithms were released in 2011. LZ4 was developed by Yann Collet, while Snappy was developed at and released by Google.

Both belong to the LZ77 family of algorithms, meaning they use the same core approach for compression as the LZ77 algorithm (described in a paper published in 1977). The differences mostly lie in the format of the compressed file and in the data structures used during compression. Here I’ll only briefly describe the basic idea used in LZ77.

The string

abcdefgabcdeabcde

would be described by a LZ77-type algorithm as follows:

abcdefg(5,7)(5,5)

How the shortened description is then encoded in the compressed file depends on the specific algorithm.

The decompressor reads the description from left to right, remembering the decompressed string up to this point. Two things can happen: either it encounters a literal, like abcdefg, in which case it simply appends the literal to the decompressed string, or it encounters a length-distance pair (l,d), in which case it copies l characters at offset d behind the current position, appending them to the decompressed string.

In our example, here are the steps the algorithm would make:

  1. Start with an empty string,
  2. Append abcdefg (current decompressed string: abcdefg),
  3. Append 5 characters at 7 letters behind the current end of the decompressed string, i.e., append abcde (current string: abcdefgabcde),
  4. Append 5 characters at 5 letters behind the current end of the decompressed string, i.e., append abcde (current string: abcdefgabcdeabcde).
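
The decoding procedure is short enough to write out in full; here is a small Python sketch of an LZ77-style decoder, run on the exact example above:

def lz77_decode(tokens):
    # A str token is a literal; a (length, distance) token copies `length`
    # characters starting `distance` behind the current end of the output.
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)
        else:
            length, distance = tok
            start = len(out) - distance
            for i in range(length):
                out.append(out[start + i])  # copies may overlap the region being written
    return "".join(out)

print(lz77_decode(["abcdefg", (5, 7), (5, 5)]))  # abcdefgabcdeabcde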

We see that the compressor must find suitable candidates for the length-distance pairs. Therefore a LZ77-type algorithm would read the input string from left to right, remembering some amount of the most recent data (called the sliding window), and use it to replace encountered duplicates with length-distance pairs. Again, the exact data structures used are a detail of the specific algorithm.

Now we can also understand why LZ4 and Snappy perform better on bigger files (up to a limit): the chance to find a duplicated sequence of characters to be replaced with a length-distance pair increases the more data we have seen during compression.

DEFLATE and ZStandard also build on the ideas of LZ77. However, they add an extra step called entropy encoding. To understand entropy encoding, we first need to understand what prefix codes (also called prefix-free codes) are.

Suppose we have an alphabet of symbols, e.g. A = {0, 1}. A set of sequences of symbols from this alphabet is called prefix-free if no sequence is a prefix of another. So, for example, {0, 10, 110, 111} is prefix-free, while {0, 01, 100} is not, because 0 is a prefix of 01.

Now suppose we have another alphabet of symbols, like B = {a, b, c, d}. A prefix code of B in A is an injective mapping from B to a prefix-free set of sequences over A. For example:

a -> 0, b -> 10, c -> 110, d -> 111

is a prefix code from B to A.

The idea of entropy encoding is to take an input string over one alphabet, like B, and create a prefix code into another alphabet, like A, such that frequent symbols are assigned short codes while rare symbols are assigned long codes.

Say we have a string like “aaaaaaaabbbccd”. Then the encoding written earlier would be a possible optimal encoding for this string (another optimal encoding would be a -> 0, b -> 10, c -> 111, d -> 110).

It’s easy to see how entropy coding might be applied to the problem of file compression: take a file, split it into a sequence of symbols (for example, treating every 8-bit sequence as a single symbol), compute a prefix code having the property mentioned above, and replace all symbols with their codes. The resulting sequence, together with information on how to translate the codes back to the original alphabet, will be the compressed file.
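
Here is a minimal Python sketch of Huffman coding, the entropy-encoding method used by DEFLATE (for texts with at least two distinct symbols; on our example string it produces an optimal code with the same code lengths as the ones above, though the 0/1 labels may differ):

import heapq
from collections import Counter

def huffman_codes(text):
    # Build (frequency, tiebreak, {symbol: code}) entries; the unique integer
    # tiebreak keeps the heap from ever comparing two dicts.
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # least frequent subtree
        f2, _, c2 = heapq.heappop(heap)   # second least frequent
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes("aaaaaaaabbbccd"))
# e.g. {'a': '1', 'b': '00', 'c': '011', 'd': '010'} -- same code lengths as
# the codes above, so equally optimal.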

Both DEFLATE and ZStandard begin with running their LZ77-type algorithms, then apply entropy encoding to the resulting shortened descriptions of the input files. The alphabet used for input symbols and the entropy encoding method itself is specific to each of the algorithms. DEFLATE uses Huffman coding, while ZStandard combines Huffman coding with Finite State Entropy.

DEFLATE was developed by Phil Katz and patented in 1991 (the patent expired in 2019). However, a later specification in RFC 1951 said that “the format can be implemented readily in a manner not covered by patents.“ Today it is a widely adopted algorithm implemented, among other places, in the zlib library and used in the gzip compression program. It achieves high compression ratios but can be a lot slower during compression than LZ4 or Snappy (decompression remains fast though) — we will see an example of this in benchmarks included in part two of this blog. Therefore, in Scylla, DEFLATE is a good option for workloads where latency is not so important but we want our SSTables to be as small on disk as possible.

In 2016 Yann Collet, author of LZ4, released the first version of zstd, the reference C implementation of the ZStandard algorithm. It can achieve compression ratios close to those of DEFLATE, while being competitive on the speed side (although not as good as LZ4 or Snappy).

Remember how we talked about storing data inside the compressor to make it perform better with files coming from a specific domain? ZStandard enables its users to do that with the option of providing a compression dictionary, which is then used to enhance the duplicate-removing part of the algorithm. As you’ve probably guessed, this is mostly useful for small files. Of course the same dictionary needs to be provided for both compression and decompression.

Conclusions

We have seen why not everything can be compressed due to the pigeonhole principle and looked at Kolmogorov complexity, which can be thought of as a mathematical formalization of the intuitive concept of “compressibility”. We’ve also described how compression is applied to SSTables in a way that retains the possibility of reading only the required parts during database queries. Finally, we’ve discussed the main principles behind the LZ77 algorithm and the idea of entropy encoding, both of which are used by the algorithms supported in Scylla.

I hope you learned something new when reading this post. Of course we’ve only seen the tip of the iceberg; I encourage you to read more about compression and related topics if you found the topic interesting.

There is no one size that fits all, so make sure to read the second part of the blog, where we run a couple of benchmarks to examine the compression ratios and speeds of the different algorithms. It will help you better understand which algorithm you should use to get the most out of your database.

CONTINUE TO PART TWO


Monitoring Scylla with Datadog: A Tale about Datadog – Prometheus integration

As an open-source project, our Scylla NoSQL database has an open-source monitoring stack, the Scylla Monitoring Stack. It uses Prometheus as the reporting and collecting layer and Grafana for dashboard creation. Prometheus pulls metrics directly from each Scylla server.

The Scylla Monitoring Stack

A question we keep getting from customers is: “How can I monitor Scylla with Datadog?” And more generally “How do you integrate Datadog and Prometheus?”

I went to find out and returned with two answers. You can sort out the first from Datadog documentation, but there’s an alternative and more surprising answer — read to the end to see it. If you want to use this solution for your other applications, you can replace “Scylla” below with your application.

Before we go any further, a few words about Prometheus.

Prometheus essentials

Prometheus is an open-source monitoring solution. It uses a pull-based approach: the Prometheus server “scrapes” its targets periodically. Prometheus reads metrics directly from applications that expose a Prometheus API (like Scylla does), or uses exporters for applications that do not.

The Prometheus protocol is text-based and human-readable, so if at any point you have an issue with the data, you can point your browser at the endpoint or use curl to fetch the metrics directly. For example, you can look at Scylla’s metrics by calling:

$ curl http://localhost:9180/metrics
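
Since the format is just text, consuming it programmatically is easy too. A small Python sketch (assuming the same local endpoint as the curl example, and that sample lines carry no trailing timestamps, which is the common case):

import urllib.request

# Fetch Scylla's Prometheus endpoint and print each scylla_* sample,
# skipping the "# HELP" and "# TYPE" comment lines.
with urllib.request.urlopen("http://localhost:9180/metrics") as resp:
    for line in resp.read().decode().splitlines():
        if line.startswith("scylla_"):
            name_and_labels, value = line.rsplit(" ", 1)
            print(name_and_labels, "=", value)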

Prometheus Datadog Integration – Local Agent

The first approach, which you will find if you check Datadog documentation, is to use a local agent (installed on the same node as the Scylla database server).

Datadog agent version 6.1.0 and onwards can read Prometheus protocol. On each of the hosts that run an application that you want to monitor (Scylla in our case) run the Datadog agent and let the agent read from the application.

For this blog post, we assume that you already have a Datadog account. You can register for a free 14-day trial if you don’t.

Now follow these steps:

  1. Install the Datadog agent on each Scylla server as described here.
  2. Create a Prometheus configuration file for the agent.
    sudo cp
    /etc/datadog-agent/conf.d/prometheus.d/conf.yaml.example
    /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
  3. Edit the configuration file:
    Point the prometheus_url to Scylla’s Prometheus endpoint: http://localhost:9180/metrics
    Set the metrics to scylla_* and set the namespace to scylla. An example of a minimal config file:

    init_config:

    instances:
      - prometheus_url: http://localhost:9180/metrics
        namespace: scylla
        metrics:
          - scylla_*
  4. Restart the agent as described here.
  5. You should now be able to use the Scylla metrics in your dashboard.

Please note that installing an agent on each server can compete with Scylla for resources. If you do choose this method, it’s recommended to limit the Datadog agents by using systemd slices.

Reading directly from a Prometheus Server

If you have Datadog agents running on each of your nodes, the previous section could be good enough. But, if you are running a Prometheus server it is sometimes useful to read the metrics from the Prometheus server and not from the endpoints.

Happily enough, there is such an option using Prometheus federation. Federation allows one Prometheus server to read metrics that are stored in another server.

To use Prometheus federation with Datadog, you need the Datadog agent only on the machine that runs the Prometheus server. Install and configure the Datadog agent like in the previous section, but this time set the prometheus_url to the federate endpoint.

An example of such a configuration file:

init_config:

instances:
  - prometheus_url: http://localhost:9090/federate?match[]={job=~"scylla"}
    namespace: scylla
    metrics:
      - scylla_*

In the example, we matched on job equal to scylla, but it would work for any other monitored application with different parameters.

Note that you must supply a match parameter for the federate endpoint to work.

A Scylla dashboard example in Datadog

To create a Datadog dashboard follow the instructions here.

When you add a graph, Datadog will show the available metrics; Scylla’s metrics start with ‘scylla’.

As a reference and suggestion, you can look at Scylla Monitoring Dashboards for the metrics we think are more useful.

Conclusion (and which option to choose)

There are two alternatives for reporting Scylla metrics to Datadog:

  1. Using a local agent on each Scylla server
  2. Using one agent on Scylla Monitoring Stack, reading from Prometheus

Although both methods work, we recommend the second one for the following reasons:

  • Simpler to deploy (only one agent)
  • Does not put Scylla at risk by running an additional process on each DB server. Such an agent may compete with Scylla for server resources, bandwidth, etc.

Please note, both methods can work in parallel with the Scylla Monitoring Stack.

We recommend having the Scylla Monitoring Stack in place, even if you are using Datadog. The Scylla Monitoring dashboards are very rich and useful, and will help our support team debug any issue you might have.


Announcing the 2019 Scylla User Awards

We’re excited to once again honor the amazing things our users are doing with Scylla. Our 2019 Scylla User Awards will recognize user accomplishments and contributions to the Scylla community across 9 categories.

Do you have an especially interesting Scylla use case? Perhaps you’re handling massive amounts of data? Or maybe your migration to Scylla significantly reduced the number of nodes you need? Doing something really cool with Scylla and Kafka, Spark or a graph database?

ENTER TO WIN THE 2019 SCYLLA USER AWARDS!

If so, please tell us all about yourself and your use case. Submit a valid nomination and we’ll send you a limited-edition Scylla hoodie in your size.

Winners will be announced at Scylla Summit 2019. Oh, and award winners can attend Scylla Summit at no charge!

Nominations will be accepted until October 17, 2019.

2019 Scylla User Award Categories

  • Most Innovative Use of Scylla
  • Biggest Node Reduction with Scylla
  • Best Scylla Cloud Use Case
  • Best Real-Time Use Case
  • Biggest Big Data Use Case
  • Scylla Community Member of the Year
  • Best Use of Scylla with Kafka
  • Best Use of Scylla with Spark
  • Best Use of Scylla with a Graph Database

You can nominate yourself in as many categories as you like. We look forward to seeing all the great things you’re doing with Scylla!

ENTER TO WIN THE 2019 SCYLLA USER AWARDS!

Winners from the 2018 Scylla User Awards


Isolating workloads with Systemd slices

Introduction

Modern software often comprises many independent pieces working in tandem toward a common goal. That’s obviously true for microservices, which are built from the ground up with the communication paradigm in mind, but it can be just as true for things that look more like a monolith, like databases.

For infrastructure software like databases, there is usually one piece of software that is more important than the others (the server itself), but it is often accompanied by other small, focused applications like agents, backup tools, etc.

But once we have many entities running in the same system, how do we make sure they are properly isolated? The common answer these days seems to be “just use Docker.” But that comes with two main drawbacks:

  • Docker and other container technologies ship an entire Operating System image, which is often overkill
  • Docker and other container technologies force communication between isolated entities through a network abstraction (albeit a virtual one), which may not always be suitable and can create a security risk

Figure 1: How docker was born

In this article we’ll explore another way, recently adopted by Scylla, to achieve isolation between internal tasks of a complex database server: systemd slices.

What are systemd slices?

If we want to achieve a result like Docker’s, without paying Docker’s full complexity price, it pays to ask ourselves: what is Docker made of?

Docker and other Linux container technologies rely on two pieces of infrastructure exposed by the kernel: namespaces and cgroups. Linux namespaces are not relevant to our investigation, but for completeness: they allow a process to establish a virtual view of particular resources in the system. For instance, by using the network namespace, a process would not be exposed to the actual network devices, but would instead see virtualized devices that only exist in that private namespace. Other examples of namespaces are the process namespace, user namespace, etc.

Cgroups are the Linux way of organizing groups of processes: roughly speaking a cgroup is to a process what a process is to a thread: one can have many threads belonging to the same process, and in the same way one can join many processes inside the same cgroup.

By now, cgroups are nothing new: systemd uses cgroups heavily, and makes sure that every service that comprises the system is organized in its own cgroup. That allows systemd to better organize those services and more safely deal with operations like starting and stopping a service.

But how does that relate to resources? Cgroups on their own don’t provide any isolation. However, one can attach one or more controllers to a particular cgroup. Controllers are hierarchical entities that control what share of a resource the processes in the cgroups attached to them can consume. Examples of controllers are the CPU controller, the memory controller, the blkio controller, etc.

Systemd relies on controllers for its basic functions. For instance, when many users log in, systemd will do its best — through controllers — to make sure that no single user can monopolize the resources of the system. Less widely known, however, is that the same functionality is exposed to any service that registers itself with systemd, through slices. Slices allow one to create a hierarchical structure in which relative shares of resources are defined for the entities that belong to those slices. Nothing more than the raw power of cgroup controllers gets exposed.

Applying that to the Scylla NoSQL database

Why would a database need something like that? Unlike microservices, databases are long-lived pieces of infrastructure that usually deal with a large number of connections and thrive on proximity to the hardware. While that is true for most databases, it is especially true for Scylla, which is built on a modern shard-per-core architecture and relies heavily on kernel bypass and other hardware optimization techniques to squeeze out every drop of the hardware resources available.

Scylla will carefully pre-allocate memory, bind particular threads to particular CPUs, and generally assume that it owns the entire hardware where it is running for best performance. However, the Scylla server rarely runs alone: in a real-life deployment there are other entities that may run alongside it, like:

  • scylla-jmx, a compatibility layer that exposes the Scylla RESTful control plane as Java Management Extensions (JMX), for compatibility with Cassandra tools,
  • node_exporter, a prometheus-based metrics agent,
  • other ad-hoc agents such as backup-tools,
  • arbitrary commands and processes started by the administrator.

A common example of how such an arrangement may play out is depicted in Figure 2, where a Scylla database runs on a node that also runs a backup agent periodically copying files to a remote destination, a security agent imposed by the IT admin, a Kubernetes sidecar and a Prometheus agent collecting metrics about the operating system’s health.

Figure 2: An example of the Scylla server and auxiliary applications running in a node.

Usually those auxiliary services are lightweight and provided by ScyllaDB itself, so their memory and general resource usage are already accounted for. But in situations where either the hardware resources are not as plentiful or other user-provided utilities are used, auxiliary services can use too many resources and create performance issues in the database.

To demonstrate that, we ran a benchmark on a machine with 64 vCPUs where we read random keys from the database for 15 minutes, expecting to fetch them mostly from the cache — which stresses the CPU in isolation. First, the database was running alone on the machine (the baseline); in a second run we fired up 16 CPU hogs of the form:

/bin/sh -c "while true; do true; done" &

While running the command true in a loop is not really common practice (we certainly hope), this could represent other software, like some reporting or backup tool, that does intense computations for a brief period of time.

How does that affect our throughput and latency?

As we can see, from roughly 1 million reads per second with a sub-millisecond p95 (server-side latencies), the benchmark is now only able to sustain 400 thousand reads per second, with a much higher p95 of over 7 milliseconds.

That’s certainly bad news to any application connected to this database.

Systemd slices in practice

Systemd slices work in a hierarchical structure. Each slice has a name, with parts of the hierarchy separated by the “-” character. In our example, we created two slices: scylla-server and scylla-helper. That means a top-level entity called “scylla” will be created, and every relative share assignment will then be done with respect to it.

Figure 3: Systemd slices hierarchy: there is a “scylla” slice that sits at the top level, and all shares definition will be applied with respect to that. The database processes will be inside “scylla-server”, and all agents and helper processes will be inside “scylla-helper.”

We then define the following slices:

Scylla-helper:

[Unit]
Description=Slice used to run companion programs to Scylla. Memory, CPU and IO restricted
Before=slices.target

[Slice]
MemoryAccounting=true
IOAccounting=true
CPUAccounting=true

CPUWeight=10
IOWeight=10

MemoryHigh=4%
MemoryLimit=5%
CPUShares=10
BlockIOWeight=10

Scylla-server:

[Unit]
Description=Slice used to run Scylla. Maximum priority for IO and CPU
Before=slices.target

[Slice]
MemoryAccounting=true
IOAccounting=true
CPUAccounting=true
MemorySwapMax=1
CPUShares=1000
CPUWeight=1000

We will not get into the details of what each of those means, since the systemd documentation is quite extensive. But let’s compare, for instance, the “CPUWeight” parameter: by using 1000 for the server and 10 for the helper slice, we are informing systemd that we believe the server is 100 times more important when there is CPU contention. If the database is not fully loaded, the helper slice will be able to use all remaining CPU. But when contention happens, the server will get roughly 100 times more CPU time.
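
If you want to check what systemd actually programmed into the kernel, you can peek at the cgroup filesystem. A hedged Python sketch (it assumes cgroup v2 mounted at /sys/fs/cgroup and that units have already started under these slices; paths differ under cgroup v1):

from pathlib import Path

# Print the cpu.weight value the kernel sees for each slice in the hierarchy.
for name in ("scylla.slice",
             "scylla.slice/scylla-server.slice",
             "scylla.slice/scylla-helper.slice"):
    f = Path("/sys/fs/cgroup") / name / "cpu.weight"
    print(name, "->", f.read_text().strip() if f.exists() else "not active")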

With the slices defined, we then add a Slice directive to the service’s unit files. For instance, the scylla-server.service unit file now has this:

[Service]
...
Slice=scylla-server.slice

So what is the result of repeating the same benchmark, with the same heavy CPU user, but now constraining it to the scylla-helper slice, while the server is placed into the scylla-server slice?

As we can see, the impact on the database is now much lower. The database is still able to sustain about the same number of requests (around a million reads per second), and the p95 latencies are just a bit higher, but still under one millisecond.

We have successfully leveraged the systemd slices infrastructure to protect the central piece of the database infrastructure — the database server — from its auxiliary agents using too many resources. And while our example focused on CPU utilization, systemd slices can also control memory usage, swap preference, I/O, etc.

Bonus content – ad hoc commands

Systemd slices are a powerful way to protect services that are managed by systemd from each other. But what if I just want to run some command, and am worried that it may use up precious resources from the main service?

Thankfully, systemd provides a way to run ad-hoc commands inside an existing slice. So the cautious admin can use that too:

sudo systemd-run --uid=$(id -u scylla) --gid=$(id -g scylla) -t --slice=scylla-helper.slice /path/to/my/tool

And voila, resource protection achieved.

Conclusion

Systemd slices expose the cgroups interface — which underpins the isolation infrastructure used by Docker and other Linux container technologies — in an elegant and powerful way. Services managed by systemd, like databases, can use this infrastructure to isolate their main components from auxiliary helper tools that may use too many resources.

We have demonstrated how Scylla successfully uses systemd slices to protect its main server processes from other tools like translation layers and backup agents. This level of isolation is native to systemd and can be leveraged without the need to wrap the solution in heavy container technologies like Docker.


Scylla Summit 2019 Agenda and Speakers Announced

Whether you are a current Scylla community member or you have a background with Cassandra, NoSQL or other database technologies, you’ll learn a lot by joining us at Scylla Summit 2019. This year’s event will be held November 5-6 at the Parc 55 Hilton Hotel in San Francisco, where you get two full days of technical use cases, Scylla internals and best practices.

Before diving into our sessions, I’ll share five key reasons to attend:

  1. Bring the latest innovations back to your team
  2. Network with the best brains in the industry
  3. Get Trained at our Pre-Summit Training Day (November 4)
  4. Influence our roadmap
  5. Hear from your colleagues

In fact, this year’s Scylla Summit promises to be our best conference yet. We’ve got quite a lineup! With dozens of speakers from our community of users, there will be a lot to learn from your industry peers and colleagues in terms of use cases, migrations, and real-world performance results.

  • Comcast on Scaling User Experience — Philip Zimich will share how Comcast’s X1 Scheduler uses Scylla to support DVR and program reminder functions across 30 million set top boxes and 15 million more “second screens,” scaling to two billion RESTful calls daily.
  • Kiwi.com Reaches Cruising Altitude with Scylla — Martin Strýček from Kiwi.com will speak about getting through the turbulence of their cloud-based deployment to smooth performance.
  • Opera on Using Python and Django to Connect to Tens of Millions on the Web — Rafał Furmański and Jerzy Kozera will share a tale of scaling Scylla to support their browser synchronization feature.
  • Capital One on Big Data Tools — Glen Gomez Zuazo will speak about getting the most out of streaming data using technologies like Apache Kafka, CNCF NATS and Amazon SQS.
  • SAS Institute Makes the Switch — David Blythe will talk about the transition of the SAS Intelligent Advertising ad serving platform from DataStax Cassandra to Scylla for its real-time visitor data storage.
  • Numberly Compares Scylla and MongoDB — Alexys Jacob compares these two NoSQL leaders, and explains how and why Numberly employs both these systems for their own best advantage in their ad tech architecture.
  • TubiTV on Machine Learning — Alexandros Bantis will explain how they re-architected their Ranking Service with Scylla to improve Machine Learning experimentation speed and how their next version should improve performance a full order of magnitude.
  • Lookout on Security for 100 Million Devices — Richard Ney will describe the challenge of scaling a mobile device security system to handle telemetry ingestion for over 100 million devices using Kafka Connect and Scylla.
  • iFood’s Road to Scylla — Thales Biancalana will describe how this Brazilian food delivery giant evolved from PostgreSQL to Amazon DynamoDB, but eventually turned to Scylla to keep scaling their business while maintaining consistent low latencies.
  • JanusGraph Best Practices — Enharmonic’s Ryan Stauffer and Expero’s Brian Hall will provide insights into and best practices for using JanusGraph on top of Scylla.

And those are just some of the highlights. We have many more exciting sessions on the full schedule.

Plus our own engineers will share new features and capabilities for Scylla, from Project Alternator — the new Amazon DynamoDB-compatible API — to the new incremental compaction strategy, lightweight transactions (LWT) using consensus algorithms (Paxos and Raft), Change Data Capture (CDC) and much more.

  • ScyllaDB CEO Dor Laor will provide an overview of our five-year journey to today, and highlight exciting news we’re sharing with the community.
  • Our CTO, Avi Kivity, will present on the topics of User-Defined Functions (UDFs), User-Defined Aggregates (UDAs), and what the future holds in store as proposed extensions to Scylla.
  • ScyllaDB VP of Field Engineering, Glauber Costa will host a session on “How to be Successful with Scylla.” Drawn from conversations held across our community, including our mailing list, Slack channel and support tickets, Glauber will answer some of the most frequently asked questions on how to best employ Scylla in your environment.

So reserve a spot now. We look forward to meeting you there!

Grab your tickets by 9/30 to get Early Bird pricing!

REGISTER NOW FOR SCYLLA SUMMIT 2019
