Mutant Monitoring System Day 15 – Coding with Java Part 3


This is part 15 of a series of blog posts that provides a story arc for Scylla Training.

In the previous post, we explained how to do data analytics on a Scylla cluster using Apache Spark, Hive, and Superset. Division 3 has decided to explore the Java programming language a bit further and came across an interesting feature of Scylla that would allow us to store files in the database. With this ability, we can store images of the mutants in the catalog keyspace. With the images stored, Division 3 can see what the mutant looks like whenever they want and even share the image and tracking details with local law enforcement officials if needed.

A table in Scylla supports a wide array of data types such as timestamp, text, integer, UUID, blob, and more. The blob datatype stores binary data into a table. For Division 3’s use case, we will add a blob column to the catalog.mutant_data table and store images for each mutant there using a Java application. Since Scylla is a distributed system with fault protection and resiliency, storing files in Scylla will have the same benefits as our existing data based on the replication factor of the keyspace. To get started, we will first need to bring up the Scylla cluster.

Starting the Scylla Cluster

The MMS Git repository has been updated to provide the ability to automatically import the keyspaces and data. If you have the Git repository already cloned, you can simply do a “git pull” in the scylla-code-samples directory.

git clone https://github.com/scylladb/scylla-code-samples.git
cd scylla-code-samples/mms

Modify docker-compose.yml and add the following line under the environment: section of scylla-node1:

- IMPORT=IMPORT

Now we can build and run the containers:

docker-compose build
docker-compose up -d

After roughly 60 seconds, the existing MMS data will be automatically imported. When the cluster is up and running, we can run our application code.

Building and Running the Java Example

The Java sample application that we are using was modified from the DataStax blob example located on the Cassandra driver’s GitHub page. To build the application in Docker, change into the java-datatypes subdirectory in scylla-code-samples:

cd scylla-code-samples/mms/java-datatypes

Now we can build the container:

docker build -t java .

To run the container and connect to the shell, run the following command:

docker run -it --net=mms_web --name java java sh

Finally, the sample Java application can be run:

java -jar App.jar


Let’s dive a little bit deeper to see what the code is doing. When the application is run, it will add two columns to the catalog.mutant_data table: b and m. Column b is the blob column where the binary file is stored and column m is used to record the file’s name.

In the container, there is an image file for each mutant that will be read by the Java application and stored in Scylla according to their names.

The application reads each image file into a memory buffer and then inserts it into the table using a prepared statement.
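The complete application lives in the scylla-code-samples repository; purely as a rough sketch of that insert path (not the exact code shipped in the container), here is how it might look with the DataStax Java driver. The contact point, mutant name, and image file name are assumptions; the catalog.mutant_data table and the b and m columns come from the description above.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StoreImage {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("scylla-node1").build();
             Session session = cluster.connect()) {
            PreparedStatement insert = session.prepare(
                "INSERT INTO catalog.mutant_data (first_name, last_name, b, m) VALUES (?, ?, ?, ?)");
            // Read the image into memory and bind it to the blob column as a ByteBuffer.
            Path image = Paths.get("jim_jeffries.png");   // hypothetical file name
            ByteBuffer bytes = ByteBuffer.wrap(Files.readAllBytes(image));
            session.execute(insert.bind("Jim", "Jeffries", bytes, image.getFileName().toString()));
        }
    }
}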

The final step for the Java application is to read the data back from Scylla and write each image to /tmp. The SELECT query fetches the blob and file name columns along with the primary key columns (first_name and last_name).
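Again as a hedged sketch rather than the application’s exact code, the read-back path might look roughly like this with the same driver:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FetchImages {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("scylla-node1").build();
             Session session = cluster.connect()) {
            for (Row row : session.execute(
                    "SELECT first_name, last_name, m, b FROM catalog.mutant_data")) {
                ByteBuffer blob = row.getBytes("b");   // the image bytes
                String name = row.getString("m");      // the stored file name
                if (blob == null || name == null) {
                    continue;                          // skip mutants without a stored image
                }
                try (FileChannel out = new FileOutputStream("/tmp/" + name).getChannel()) {
                    out.write(blob.duplicate());       // duplicate() keeps the buffer position intact
                }
            }
        }
    }
}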

To verify that everything worked properly, use docker cp to copy the newly written images out of the container to your machine.

Using your favorite image viewer, you can verify that each image is correct.

Conclusion

In this blog post, we went over the different data types that someone can use in their database tables and learned how to store binary files in Scylla with a simple Java application. By being able to store images in Scylla, Division 3 will be able to quickly see what a mutant looks like and can share the details with local law enforcement if needed. With the ability to store files in the Mutant Monitoring System, the possibilities are endless for how Division 3 can continue to evolve the system. Please be safe out there and continue to monitor the mutants!

The post Mutant Monitoring System Day 15 – Coding with Java Part 3 appeared first on ScyllaDB.

Scylla Open Source Release 2.1.4


Today we released Scylla Open Source 2.1.4, a bugfix release of the Scylla 2.1 stable branch. Release 2.1.4, like all past and future 2.x.y releases, is backward compatible and supports rolling upgrades.

Critical Patch

Scylla 2.1.4 fixes a possible data loss when using Leveled Compaction Strategy #3513. The issue causes Scylla to miss a small fraction of data in a full table scan. This was originally observed in decommission (which performs a full table scan internally), where some data (<1% in a test) was not streamed.

In addition to a full scan query, scans are used internally as part of compaction and streaming, including decommission, adding a node, and repairs. Our investigation into the matter concluded that Scylla can cause data loss while running any of these actions.

The issue is limited to tables using LCS and does not affect tables using other compaction strategies. If you are using LCS, you should upgrade to Scylla 2.1.4 ASAP. 

Action to Take

The problem may be mitigated by restoring backups of the relevant table. If you are using LCS and have relevant backups, please contact our support team for additional information on how to run the restore procedure.

How This Happened

We take data integrity very seriously and are investigating why this issue was not identified earlier. Our initial findings are that a low-level optimization around disjoint SSTable merging introduced the bug in the 2.1 release. It surfaced only in our 2.2 testing because it happened very rarely with 2.1-based code. The Scylla cluster test suite did detect the issue, but quorum-based persistence, together with the behavior of the suite itself, papered over it. One of the roles of this suite is to run disruptors (corruption emulation, node and data center failures) against the cluster to trigger corruptions and repairs, and the suite incorrectly concluded that the missing data was part of that disruptor activity. We are now working to improve the cluster test suite’s ability to detect errors.

Please contact us with any questions or concerns. We will publish a full root cause analysis report as soon as possible and disclose enhancements to prevent such a case in the future.

Related Links

Get Scylla 2.1.4 – Docker, binary packages. AMI will be published soon.
Get started with Scylla 2.1
Upgrade from 2.1.x to 2.1.y
Please let us know if you encounter any problems.

Additional bugs fixed in this release

  • Scylla AMI error: “systemd: Unknown lvalue ‘AmbientCapabilities’”. The issue is solved by moving to a new CentOS 7.4.1708 base image #3184
  • Upgrading to latest version of RHEL kernel causes Scylla to lose access to the RAID 0 data directory #3437 (detailed notice has been sent to all relevant customers)
  • Wrong Commit log error handling may cause a core dump #3440
  • Closing a secure connection (TLS) may cause a core dump #3459
  • When using TLS for interconnect connections, shutting down a node generates “system_error (error system:32, Broken pipe)” errors on other nodes #3461
  • Ec2MultiRegionSnitch does not (always) honor or prefer the local DC, which results in redundant requests to the remote DC #3454

Next Steps

  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.
  • Take Scylla for a Test drive. Our Test Drive lets you quickly spin-up a running cluster of Scylla so you can see for yourself how it performs.

The post Scylla Open Source Release 2.1.4 appeared first on ScyllaDB.

Scylla Enterprise Release 2018.1.3

Today we released Scylla Enterprise 2018.1.3, a production-ready Scylla Enterprise minor release. Scylla Enterprise 2018.1.3 is a bug fix release for the 2018.1 branch, the latest stable branch of Scylla Enterprise.

More about Scylla Enterprise here.

Critical Patch

Scylla Enterprise 2018.1.3 fixes possible data loss when using Leveled Compaction Strategy.  The issue causes Scylla to miss a small fraction of data in a full table scan. This was originally observed in decommission (which performs a full table scan internally), where some data (<1% in a test) was not streamed.

In addition to full scan query, scans are used internally as part of compaction and streaming, including decommissioning, adding a node, and repairs. Our investigation into the matter concluded that Scylla can cause data loss while running any of these actions. The issue is limited to tables using LCS and does not affect tables using other compaction strategies.

If you are using LCS, you should upgrade to Scylla Enterprise 2018.1.3 ASAP.

Action to Take

The problem may be mitigated by restoring backups of the relevant table. If you are using LCS and have relevant backups, please contact our support team for additional information on how to run the restore procedure.

How This Happened

We take data integrity very seriously and are investigating why this issue was not identified earlier. Our initial findings are that a low-level optimization around disjoint SSTable merging introduced the bug in the 2.1 release. It surfaced only in our 2.2 testing because it happened very rarely with 2.1-based code. The Scylla cluster test suite did detect the issue, but quorum-based persistence, together with the behavior of the suite itself, papered over it. One of the roles of this suite is to run disruptors (corruption emulation, node and data center failures) against the cluster to trigger corruptions and repairs, and the suite incorrectly concluded that the missing data was part of that disruptor activity. We are now working to improve the cluster test suite’s ability to detect errors.

Please contact us with any questions or concerns. We will publish a full root cause analysis report as soon as possible and disclose enhancements to prevent such a case in the future.

Related Links

Scylla Enterprise customers are encouraged to upgrade to Scylla Enterprise 2018.1.3 in coordination with the Scylla support team.

Additional Issues Solved in This Release (with Open Source Issue Reference When Applicable)

  • Ec2MultiRegionSnitch does not (always) honor or prefer the local DC, which results in redundant requests to the remote DC #3454
  • When using TLS for interconnect connections, shutting down a node generates “system_error (error system:32, Broken pipe)” errors on other nodes #3461

Next Steps

  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.
  • Take Scylla for a Test drive. Our Test Drive lets you quickly spin-up a running cluster of Scylla so you can see for yourself how it performs.

The post Scylla Enterprise Release 2018.1.3 appeared first on ScyllaDB.

Cassandra CQL Cheatsheet

Every now and then I find myself looking for a couple of commands I use often. For some other software/technologies there is often a “cheatsheet” that lists the most-used (and some more obscure) commands of that software/technology.

I tried to find one for CQL and since I didn’t find one… I created a CQL one! This is not an extensive, exhaustive list; it’s just the commands I tend to use the most, plus some related ones.

Suggestions are accepted! Leave your suggestions in the comments below!

Also, Printable version coming soon!

DISCLAIMER: This is for the latest Cassandra version (3.11.2)

Without further conversation, here it is:

CQLSH Specific

Login

$ cqlsh [node_ip] -u username -p password

Use Color

$ cqlsh [node_ip] -C -u username -p password

Execute command

$ cqlsh -e 'describe cluster'

Execute from file

$ cqlsh -f cql_commands.cql

Set consistency

$cqlsh> CONSISTENCY QUORUM ;

Run commands from file

$cqlsh> SOURCE '/home/cjrolo/cql_commands.cql' ;

Capture output to file

$cqlsh> CAPTURE '/home/cjrolo/cql_output.cql' ;

Enable Tracing

$cqlsh> TRACING ON;

Vertical Printing of Rows

$cqlsh> EXPAND ON;

Print tracing session

$cqlsh> SHOW SESSION 898de000-6d83-11e8-9960-3d86c0173a79;

Full Reference:

https://cassandra.apache.org/doc/latest/tools/cqlsh.html

 

CQL Commands

Create Keyspace

CREATE KEYSPACE carlos WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};

Alter Keyspace

ALTER KEYSPACE carlos WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

Drop Keyspace

DROP KEYSPACE carlos;

Create Table

CREATE TABLE carlos.foobar (

  foo int PRIMARY KEY,

  bar int

);

Alter Table

ALTER TABLE carlos.foobar WITH compaction = { 'class' : 'LeveledCompactionStrategy'} AND read_repair_chance = 0;

Drop Table

DROP TABLE foobar;

Create Role

CREATE ROLE admins WITH LOGIN = true AND SUPERUSER = true;

Create User

CREATE USER carlos WITH PASSWORD 'some_password' NOSUPERUSER;

Assign Role

GRANT admins TO carlos;

Revoke Role

REVOKE admins FROM carlos;

List Roles

LIST ROLES;

Use Keyspace

USE carlos;

Insert

INSERT INTO foobar (foo, bar) VALUES (0, 1);

Insert with TTL

INSERT INTO foobar (foo, bar) VALUES (1, 2) USING TTL 3600;

Update

UPDATE foobar SET bar = 42 WHERE foo = 1;

Select

SELECT * FROM foobar WHERE foo=0;

Delete

DELETE FROM foobar WHERE foo = 1;

Full Reference

https://cassandra.apache.org/doc/latest/cql/index.html

Kubernetes and Scylla: 10 Questions and Answers

We recently hosted the webinar, Steering the Sea Monster: Integrating Scylla with Kubernetes. We received a lot of great questions during the live webinar, so here they are along with our answers. Miss the live webinar? You can access the on-demand version here.


What’s the difference between Helm and the YAML manifest files?

Kubernetes manifest files (written in YAML) describe the specification of Kubernetes API objects like StatefulSets and Services. Helm is a packaging system that can combine and automate the deployment of many manifest files and help an administrator manage them effectively. Helm calls groups of manifests (and their configuration) “Helm Charts”.

If I lose a Scylla pod of a StatefulSet, and rather than trying to bring that Node back, I create a new Node, does the StatefulSet replace the old Scylla pod from the old node by rescheduling it on a new node? Or does it generate a new pod replica in the sequence?

The new Scylla pod will replace the downed pod. For example, let’s imagine that we initially had scylla-0, scylla-1, and scylla-2 pods. If pod scylla-1 goes down, the StatefulSet will create an additional pod and it will still be called scylla-1.

Are multi-datacenters supported when Scylla is deployed with Kubernetes?

Yes, multi-datacenters are supported with Scylla on Kubernetes. To deploy a Scylla cluster in each region, you are required to create a StatefulSet per region and each region can be scaled separately. The StatefulSet will include the definition of the end-point-snitch, seed nodes information, and prefix information per datacenter.

How does Kubernetes know that a pod is ready?

A pod is ready when Kubernetes has completed a status probe, in our case nodetool status. Once nodetool reports that the pod is up and running, Kubernetes will mark it as ready.

The StatefulSet is responsible for creating DNS records for nodes?

Yes, the StatefulSet is responsible for setting an internal DNS server to register the different instances that the StatefulSet consists of. For example, if the StatefulSet includes four instances and the pod moves to a different host, the DNS information will be updated accordingly.

Is there a performance penalty when running Scylla on Kubernetes + containers?

At the time of writing this blog post, there is a performance degradation using Scylla in Docker containers. For Kubernetes, we are using the Scylla default Docker container as the base image. From recent testing conducted, we have seen 25% to 40% degradation of performance. Scylla is actively working to improve its performance in Docker containers.

Why use Docker for Scylla if performance is not as good as using the AMI?

Some users prefer to have a single deployment platform that can be used across multiple infrastructure systems, e.g., cloud, on-prem. Kubernetes provides the users with a single view of the platform. AMIs are specific to AWS deployments. While it is possible to create preconfigured images for many on-prem and cloud solutions, Kubernetes and Docker offer a single image that can be used on any infrastructure.

How would you upgrade Scylla in this deployment?

Users are required to take a snapshot of the current data and configuration of the image. Once the persistent storage directory is backed up, we can remove the current container and attach the upgraded version to the pre-existing persistent storage used in the previous container.

Will the Helm Chart be available?

Yes, the Helm Charts are currently available from Scylla code examples, and we are working to submit the charts into the main Helm project repositories.

How is the deployment/decommission affected by the amount of data?

In the case of decommissioning or adding pods, Scylla will redistribute the data. Streaming data between nodes leaves less CPU, I/O, and network bandwidth for normal operations. The more data there is, the more throughput is required from the I/O and network system to maintain the SLA.

Our next webinar, ‘Analytics Showtime: Powering Spark with Scylla’ is on June 27th where we will cover best practices, use cases, and performance tuning. We hope you can join us! REGISTER NOW

Next Steps

  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.
  • Take Scylla for a Test drive. Our Test Drive lets you quickly spin-up a running cluster of Scylla so you can see for yourself how it performs.

The post Kubernetes and Scylla: 10 Questions and Answers appeared first on ScyllaDB.

Taming the Beast: How Scylla Leverages Control Theory to Keep Compactions Under Control


From a bird’s eye view, the task of a database is simple: The user inserts some data and later fetches it. But as we look closer things get much more complex. For example, the data needs to go to a commit log for durability, it needs to get indexed, and it’s rewritten many times so it can easily be fetched.

All of those tasks are internal processes of the database that will compete for finite resources like CPU, disk, and network bandwidth. However, the payoffs for privileging one or the other aren’t always clear. One example of such internal processes are compactions, which are a fact of life in any database, like Scylla, with a storage layer based on Log Structured Merge (LSM) trees.

LSM trees consist of append-only immutable files originating from the database writes. As writes keep happening, the system is potentially left with data for a key present in many different files, which makes reads very expensive. Those files are then compacted in the background by the compaction process according to a user-selected compaction strategy. If we spend fewer resources compacting existing files, we will likely be able to achieve faster write rates. However, reads are going to suffer as they will now need to touch more files.

What’s the best way to set the amount of resources to use for compactions? A less-than-ideal option is to push the decision to the user in the form of tunables. The user can then select in the configuration file the bandwidth that will be dedicated to compactions. The user is then responsible for a trial-and-error tuning cycle to try to find the right number for the matching workload.

At ScyllaDB, we believe that this approach is fragile. Manual tuning is not resilient to changes in the workload, many of which are unforeseen. The best rate for your peak load, when resources are scarce, may not be the best rate outside of business hours, when resources are plentiful. And even if the tuning cycle does somehow find a good rate, the process significantly increases the cost of operating the database.

In this article, we’ll discuss the approach ScyllaDB prescribes for solving this problem. We borrow from the mathematical framework of industrial controllers to make sure that compaction bandwidth is automatically set to a fair value while maintaining a predictable system response.

A Control Systems Primer

While we can’t magically determine the optimal compaction bandwidth by looking at the system, we can set user-visible behaviors that we would like the database to abide by. Once we do that, we can then employ Control Theory to make sure that all pieces work in tandem at specified rates so that the desired behavior is achieved. One example of such a system is your car’s cruise control. While it is impossible to guess the individual settings for each part of the car that would combine to cause the car to move at the desired speed, we can instead simply set the car’s cruising speed and expect the individual parts to adjust to make that happen.

In particular, we will focus in this article on closed-loop control systems—although we use open-loop control systems in Scylla as well. For closed-loop control systems we have a process being controlled, and an actuator, which is responsible for moving the output to a specific state. The difference between the desired state and the current state is called the error, which is fed back to the inputs. For this reason, closed-loop control systems are also called feedback control systems.

Let’s look at another example of a real-world closed-loop control system: we want the water in a tank to stay at or around a certain level. A valve works as the actuator, and the difference between the current level and the desired level is the error. The controller will open or close the valve so that more or less water flows from the tank. How much the valve should open is determined by the transfer function of the controller. Figure 1 shows a simple diagram of this general idea.

Figure 1: An industrial controller, controlling the water level in a tank. The current level is measured and fed to a feedback-loop controller. Based on that an actuator will adjust the rate of water flow in the tank. Systems like this are commonly used in industrial plants. We can leverage that knowledge to control database processes as well.

The Pieces of the Puzzle: Schedulers and the Backlog Controller

Inside Scylla, the foundation of our control processes are the Schedulers embedded in the database. We discussed the earliest of them, the I/O Scheduler, extensively in a three-part article (part 1, part 2, and part 3). The Schedulers work as the actuators in a control system. By increasing the shares of a certain component we increase the rate at which that process is executed—akin to the valve in Figure 1 allowing more (or less) fluid to pass through. Scylla also embeds a CPU Scheduler, which plays a similar role for the amount of CPU used by each of the database’s internal processes.

To design our controller it’s important to start by reminding ourselves of the goal of compactions. Having a large amount of uncompacted data will lead to both read amplification and space amplification. We have read amplification because each read operation will have to read from many SSTables, and space amplification because overlapping data will be duplicated many times. Our goal is to get rid of this amplification.

We can then define a measurement of how much work remains to be done to bring the system to a state of zero amplification. We call that the backlog. Whenever x new bytes are written into the system, we generate f(x) bytes of backlog into the future. Note that there isn’t a one-to-one relationship between x and f(x), since we may have to rewrite the data many times to achieve a state of zero amplification. When the backlog is zero, everything is fully compacted and there is no read or space amplification.

We are still within our goal if we allow the system to accumulate backlog, as long as the backlog doesn’t grow out of bounds. After all, there isn’t a need to compact all work generated by incoming writes now. The amount of bandwidth used by all components will always add up to a fixed quantity—since the bandwidth of CPU and disk are fixed and constant for a particular system. If the compaction backlog accumulates, the controller will increase the bandwidth allocated to compactions, which will cause the bandwidth allocated to foreground writes to be reduced. At some point, a system in a steady state should reach equilibrium.

Different workloads have a different steady-state bandwidth. Our control law will settle on different backlog measures for them; a high-bandwidth workload will have a higher backlog than a low-bandwidth workload. This is a desirable property: higher backlog leaves more opportunity for overwrites (which reduce the overall amount of write amplification), while a low-bandwidth write workload will have a lower backlog, and therefore fewer SSTables and smaller read amplification.

With this in mind, we can write a transfer function that is proportional to the backlog.

Determining Compaction Backlog

Compaction of existing SSTables occurs in accordance with a specific compaction strategy, which selects which SSTables have to be compacted together, and which and how many should be produced. Each strategy will do different amounts of work for the same data. That means there isn’t a single backlog controller—each compaction strategy has to define its own. Scylla supports the Size Tiered Compaction Strategy (STCS), Leveled Compaction Strategy (LCS), Time Window Compaction Strategy (TWCS) and Date Tiered Compaction Strategy (DTCS).

In this article, we will examine the backlog controller used to control the default compaction strategy, the Size Tiered Compaction Strategy. STCS is described in detail in a previous post. As a quick recap, STCS tries to compact together SSTables of similar size. If at all possible, we wait until 4 SSTables of similar sizes are created and then compact them together. As we compact SSTables of similar sizes, we may create much larger SSTables that will belong to the next tier.

In order to design the backlog controller for STCS, we start with a few observations. The first is that when all SSTables are compacted into a single SSTable, the backlog is zero as there is no more work to do. As the backlog is a measure of work still to be done for this process, this follows from the definition of the backlog.

The second observation is that the larger the SSTables, the larger the backlog. This is also easy to see intuitively. Compacting 2 SSTables of 1MB each should be much easier on the system than compacting 2 SSTables of 1TB each.

A third and more interesting observation is that the backlog has to be proportional to the number of tiers that a particular SSTable still has to move through until it is fully compacted. If there is a single tier, we’ll have to compact the incoming data being currently written with all the SSTables in that tier. If there are two tiers, we’ll compact the first tier, but then we’ll have to compact every byte present in that tier, even the ones that were already sealed in earlier SSTables, into the next tier.

Figure 2 shows that in practice. As we write a new SSTable, we are creating a future backlog as that new SSTable will have to be compacted with the ones in its tier. But if there is a second tier, then there will be a second compaction eventually, and the backlog has to take that into account.

Figure 2: SSTables shown in blue are present in the system already when a new SSTable is being written. Because those existing SSTables live in two different Size Tiers, the new SSTable creates a future backlog roughly twice as big as if we had a single tier—since we will have to do two compactions instead of one.

Note that this is valid not only for the SSTable that is being written at the moment as a result of data being flushed from memory—it is also valid for SSTables being written as a result of other compactions moving data from earlier tiers. The backlog for the table resulting from a previous compaction will be proportional to the number of levels it still has to climb to the final tier.

It’s hard to know how many tiers we have. That depends on many factors, including the shape of the data. But because the tiers in STCS are determined by SSTable sizes, we can estimate an upper bound for the number of tiers ahead of a particular SSTable from the logarithmic relationship between the total size of the table for which the backlog is calculated and the size of that SSTable, since the tiers are exponentially larger in size.

Consider, for example, a constant insert workload with no updates that continually generates SSTables sized 1GB each. They get compacted into SSTables of size 4GB, which will then be compacted into SSTables of size 16GB, etc.

When the first two tiers are full, we will have four SSTables of size 1GB and four more SSTables of size 4GB. The total table size is 4 * 1 + 4 * 4 = 20GB and the table-to-SSTable ratios are 20/1 and 20/4 respectively. We will use the base 4 logarithm, since there are 4 SSTables being compacted together, to yield for the 4 small SSTables:

    \[ tier = log_4(20/1) = 2.1 \]

and for the large ones

    \[ tier = log_4(20 / 4) = 1.1 \]

So we know that the first SSTable, sized 1GB, belongs to the first tier while the SSTable sized 4GB belongs in the second.

Once that is understood, we can write the backlog of any existing SSTable that belongs to a particular table as:

    \[ B_i = S_i log_4( \frac{\sum _{i=0}^{N} S_i }{S_i}) \]

where \(B_i\) is the backlog for SSTable \(i\) and \(S_i\) is the size of SSTable \(i\). The total backlog for a table is then

    \[ B = \sum_{i=0}^{N}{S_i log_4( \frac{\sum _{i=0}^{N} S_i }{S_i}) } \]

To derive the formula above, we used SSTables that are already present in the system. But it is easy to see that it is valid for SSTables that are undergoing writes as well. All we need to do is notice that \(S_i\) is, in fact, the partial size of the SSTable, the number of bytes written so far.

The backlog increases as new data is written. But how does it decrease? It decreases when bytes are read from existing SSTables by the compaction process. We will then adjust the formula above to read:

    \[ B = \sum_{i=0}^{N}{S_i log_4( \frac{\sum _{i=0}^{N} (S_i - C_i) }{(S_i - C_i)}) } \]

where \(C_i\) is the number of bytes already read by compaction from SSTable \(i\). It is 0 for SSTables that are not undergoing compaction.

Note that when there is a single SSTable, \(S_i = \sum_{i=0}^{N} S_i\), and since \(\log_4(1) = 0\), there is no backlog left, which agrees with our initial observations.
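To make the estimate concrete, here is a small Java sketch that computes this backlog from per-SSTable sizes and the bytes already consumed by in-flight compactions. It is only an illustration of the formula above, not Scylla’s actual implementation, and the example sizes reuse the hypothetical two-tier scenario (four 1GB and four 4GB SSTables) described earlier.

public class StcsBacklog {
    // log base 4, matching the 4-SSTable tiering used in the example above
    static double log4(double x) {
        return Math.log(x) / Math.log(4);
    }

    // sizes[i]     = S_i, bytes of SSTable i (partial size if it is still being written)
    // compacted[i] = C_i, bytes already read from SSTable i by compaction (0 if idle)
    static double backlog(long[] sizes, long[] compacted) {
        double totalRemaining = 0;
        for (int i = 0; i < sizes.length; i++) {
            totalRemaining += sizes[i] - compacted[i];
        }
        double backlog = 0;
        for (int i = 0; i < sizes.length; i++) {
            double remaining = sizes[i] - compacted[i];
            if (remaining > 0) {
                backlog += sizes[i] * log4(totalRemaining / remaining);
            }
        }
        return backlog;
    }

    public static void main(String[] args) {
        long GB = 1L << 30;
        // Two full tiers: four 1GB SSTables and four 4GB SSTables, no compactions in flight.
        long[] sizes = {GB, GB, GB, GB, 4 * GB, 4 * GB, 4 * GB, 4 * GB};
        long[] compacted = new long[sizes.length];
        System.out.printf("backlog ~ %.1f GB of future compaction work%n", backlog(sizes, compacted) / GB);
    }
}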

The Compaction Backlog Controller in Practice

To see this in practice, let’s take a look at how the system responds to an ingestion-only workload, where we are writing 1kB values into a fixed number of random keys so that the system eventually reaches steady state.

We will ingest data at maximum throughput, making sure that even before any compaction starts, the system is already using 100% of its resources (in this case, it is bottlenecked by the CPU), as shown in Figure 3. As compactions start, the internal compaction process uses very little CPU time in proportion to its shares. Over time, the CPU time used by compactions increases until it reaches steady state at around 15%. The proportion of time spent compacting is fixed, and the system doesn’t experience fluctuations.

Figure 4 shows the progression of shares over time for the same period. Shares are proportional to the backlog. As new data is flushed and compacted, the total disk space fluctuates around a specific point. At steady state, the backlog sits at a constant place where we are compacting data at the same pace as new work is generated by the incoming writes.

A very nice side effect of this is shown in Figure 5. Scylla CPU and I/O Schedulers enforce the amount of shares assigned to its internal processes, making sure that each internal process consumes resources in the exact proportion of their shares. Since the shares are constant in steady state, latencies, as seen by the server, are predictable and stable in every percentile.

Figure 3: Throughput of a CPU in the system (green), versus percentage of CPU time used by compactions (yellow). In the beginning, there are no compactions. As time progresses the system reaches steady state as the throughput steadily drops.

Figure 4: Disk space assigned to a particular CPU in the system (yellow) versus shares assigned to compaction (green). Shares are proportional to the backlog, which at some point will reach steady state

Figure 5: 95th, 99th, and 99.9th percentile latencies. Even under 100% resource utilization, latencies are still low and bounded.

Once the system is in steady state for some time, we will suddenly increase the payload of each request, leading the system to ingest data faster. As the rate of data ingestion increases, the backlog should also increase. Compactions will now have to move more data around.

We can see the effects of that in Figure 6. With the new ingestion rate, the system is disturbed as the backlog grows faster than before. However, the compaction controller will automatically increase shares of the internal compaction process and the system will reach a new equilibrium.

In Figure 7 we revisit what happens to the percentage of CPU time assigned to compactions now that the workload changes. As requests get more expensive the throughput in request/second naturally drops. But, aside from that, the larger payloads will lead to compaction backlog accumulating quicker. The percentage of CPU used by compactions increases until a new equilibrium is reached. The total drop in throughput is a combination of both of those effects.

Having more shares, compactions are now using more of the system resources, disk bandwidth, and CPU. But the amount of shares is stable and doesn’t have wide fluctuations leading to a predictable outcome. This can be observed through the behavior of the latencies in Figure 8. The workload is still CPU-bound. There is now less CPU available for request processing, as more CPU is devoted to compactions. But since the change in shares is smooth, so is the change in latencies.

Figure 6: The ingestion rate (yellow line) suddenly increases from 55MB/s to 110MB/s, as the payload of each request increases in size. The system is disturbed from its steady state position but will find a new equilibrium for the backlog (green line).

Figure 7: Throughput of a CPU in the system (green), versus percentage of CPU time used by compactions (yellow) as the workload changes. As requests get more expensive the throughput in request/second naturally drops. Aside from that, the larger payloads will lead to compaction backlog accumulating quicker. The percentage of CPU used by compactions increases.

Figure 8: 95th, 99th and 99.9th percentile latencies after the payload is increased. Latencies are still bounded and move in a predictable way. This is a nice side effect of all internal processes in the system operating at steady rates.

Conclusions

At any given moment, a database like ScyllaDB has to juggle the admission of foreground requests with background processes like compactions, making sure that the incoming workload is not severely disrupted by compactions, nor that the compaction backlog is so big that reads are later penalized.

In this article, we showed that isolation between incoming writes and compactions can be achieved by the Schedulers, yet the database is still left with the task of determining the share of resources that incoming writes and compactions will use.

Scylla steers away from user-defined tunables in this task, as they shift the burden of operation to the user, complicating operations and being fragile against changing workloads. By borrowing from the strong theoretical background of industrial controllers, we can provide an Autonomous Database that adapts to changing workloads without operator intervention.

As a reminder, Scylla 2.2 is right around the corner and will ship with the Memtable Flush controller and the Compaction Controller for the Size Tiered Compaction Strategy. Controllers for all compaction strategies will soon follow.

Next Steps

  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.
  • Take Scylla for a Test drive. Our Test Drive lets you quickly spin-up a running cluster of Scylla so you can see for yourself how it performs.

 

The post Taming the Beast: How Scylla Leverages Control Theory to Keep Compactions Under Control appeared first on ScyllaDB.

Cassandra backups using nodetool

Cassandra nodetool provides several types of commands to manage your Cassandra cluster. See my previous posts for an orientation to Cassandra nodetool and using nodetool to get Cassandra information. My colleague has provided an in-depth analysis of backup strategies in Cassandra that you can review to learn more about ways to minimize storage cost and time-to-recovery, and to maximize performance. Below I will cover the nodetool commands used in scripting these best practices for managing Cassandra full and incremental backups.

Snapshots

The basic way to backup Cassandra is to take a snapshot. Since sstables are immutable, and since the snapshot command flushes data from memory before taking the snapshot, this will provide a complete backup.

Use nodetool snapshot to take a snapshot of sstables. You can specify a particular keyspace as an optional argument to the command, like nodetool snapshot keyspace1. This will produce a snapshot for each table in the keyspace, as shown in this sample output from nodetool listsnapshots:

Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
1528233451291 keyspace1 standard1 1.81 MiB 1.81 MiB
1528233451291 keyspace1 counter1 0 bytes 864 bytes

The first column is the snapshot name, to refer to the snapshot in other nodetool backup commands. You can also specify tables in the snapshot command.

The output at the end of the list of snapshots — for example, Total TrueDiskSpaceUsed: 5.42 MiB — shows, as the name suggests, the actual size of the snapshot files, as calculated using the walkFileTree Java method. Verify this by adding up the files within each snapshots directory under your data directory keyspace/tablename (e.g., du -sh /var/lib/cassandra/data/keyspace1/standard1*/snapshots).

To make the snapshots more human readable, you can tag them. Running nodetool snapshot -t 2018June05b_DC1C1_keyspace1 keyspace1 results in a more obvious snapshot name as shown in this output from nodetool listsnapshots:

2018June05b_DC1C1_keyspace1 keyspace1 standard1 1.81 MiB 1.81 MiB
2018June05b_DC1C1_keyspace1 keyspace1 counter1 0 bytes 864 bytes

However, if you try to use a snapshot name that exists, you’ll get an ugly error:

error: Snapshot 2018June05b_DC1C1_keyspace1 already exists.
-- StackTrace --
java.io.IOException: Snapshot 2018June05b_DC1C1_keyspace1 already exists....

The default snapshot name is already a timestamp (number of milliseconds since the Unix epoch), but it’s a little hard to read. You could get the best of both worlds by doing something like (depending on your operating system): nodetool snapshot -t keyspace1_$(date +"%s") keyspace1. I like how the results of listsnapshots sort that way, too. In any case, with inevitable snapshot automation, the human-readable factor becomes largely irrelevant.

You may also see snapshots in this listing that you didn’t take explicitly. By default, auto_snapshot is turned on in the cassandra.yaml configuration file, causing a snapshot to be taken anytime a table is truncated or dropped. This is an important safety feature, and it’s recommended that you leave it enabled. Here’s an example of a snapshot created when a table is truncated:

cqlsh> truncate keyspace1.standard1;

root@DC1C1:/# nodetool listsnapshots
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
truncated-1528291995840-standard1 keyspace1 standard1 3.57 MiB 3.57 MiB

To preserve disk space (or cost), you will want to eventually delete snapshots. Use nodetool clearsnapshot with the -t flag and the snapshot name (recommended, to avoid deleting all snapshots). Specifying -- and the keyspace name will additionally limit the deletion to the keyspace specified. For example, nodetool clearsnapshot -t 1528233451291 -- keyspace1 will remove just the two snapshot files listed above, as reported in this sample output:

Requested clearing snapshot(s) for [keyspace1] with snapshot name [1528233451291]

Note that if you forget the -t flag or the -- delimiter you will get undesired results. Without the -t flag, the command will not read the snapshot name, and without the -- delimiter, you will end up deleting all snapshots for the keyspace. Check syntax carefully.

The sstables are not tied to any particular instance of Cassandra or server, so you can pass them around as needed. (For example, you may need to populate a test server.) If you put an sstable in your data directory and run nodetool refresh, it will load into Cassandra. Here’s a simple demonstration:

cqlsh> truncate keyspace1.standard1

cp /var/lib/cassandra/data/keyspace1/standard1-60a1a450690111e8823fa55ed562cd82/snapshots/keyspace1_1528236376/* /var/lib/cassandra/data/keyspace1/standard1-60a1a450690111e8823fa55ed562cd82/

cqlsh> select * from keyspace1.standard1 limit 1;

key | C0 | C1 | C2 | C3 | C4
-----+----+----+----+----+----
(0 rows)

nodetool refresh keyspace1 standard1

cqlsh> select count(*) from keyspace1.standard1;
count
7425

This simple command has obvious implications for your backup and restore automation.

Incrementals

Incremental backups are taken automatically — generally more frequently than snapshots are scheduled — whenever sstables are flushed from memory to disk. This provides a more granular point-in-time recovery, as needed.

There’s not as much operational fun to be had with incremental backups. Use nodetool statusbackup to show if they are Running or not. By default, unless you’ve changed the cassandra.yaml configuration file, they will not be running. Turn them on with nodetool enablebackup and turn them off with nodetool disablebackup.

A nodetool listbackups command doesn’t exist, but you can view the incremental backups in the data directory under keyspace/table/backups. The backups/snapshots nomenclature is truly confusing, but you could think of snapshots as something you do, and backups as something that happens.

Restoring from incrementals is similar to restoring from a snapshot — copying the files and running nodetool refresh — but incrementals require a snapshot.

These various nodetool commands can be used in combination in scripts to automate your backup and recovery processes. Don’t forget to monitor disk space and clean up the files created by your backup processes.

Remember that if you’d like to try out these commands locally, you can use the ccm tool or the Cassandra Docker cluster here.

Why Your Database Isn’t Working for eCommerce

In eCommerce, your database ultimately powers the customer experience. Today’s customers expect seamless, always-available, easy shopping experiences. You fail at any point in this journey and they won’t be back. Legacy relational databases are no longer capable of keeping up with today’s demands.

Likewise, as customer expectations scale fast, the ability to deliver faster and better than your competitors is critical to survival. The front-end experience relies on your database’s ability to access multiple systems of record in real time to deliver a robust, high-performance experience. Unfortunately, many businesses are struggling to do this—and those who fail to transform will become irrelevant.

Here are five ways your eCommerce database may be failing your business:

1.   It’s not resilient and doesn’t scale fast

Legacy database systems were not designed to be as resilient and scalable as today’s ecommerce demands. They cannot keep up with the personalization and millions of interactions that differentiate the best ecommerce businesses.

2. It can’t handle mixed workloads

Many databases can’t handle OLTP/Database CRUD, OLAP, analytics, and search workloads simultaneously. Without this ability, you fall behind.

3. It isn’t multi-model

Your database needs to be multi-model and able to handle all types of data, including tabular, key-value, JSON, and graph, if it’s going to be able to support the demands of the Right-Now Economy.

4. It isn’t comprehensive

One of the headaches of legacy databases is that they do not provide a comprehensive platform. Today’s best platforms are robust and include such things as advanced performance, NodeSync, advanced database features, AlwaysOn SQL, advanced security, graph, analytics, search, OpsCenter, and studio features that give you broad functionality from one trusted vendor.

5. It requires on-premises management

Legacy databases require on-premises management and expertise. Today’s best platforms offer managed services to shift workloads and prioritize resources.

Your data is the single most important resource you have and managing it in today’s digitally transforming world is daunting. Legacy approaches with relational databases simply are failing, and cloud strategies can quickly get confusing, limiting, and competitively dangerous.

DataStax powers the Right-Now Enterprise with the always-on, distributed cloud database for hybrid cloud, built on Apache Cassandra™ and designed for real-time applications at massive scale. We make it possible for your company to exceed expectations through consumer and enterprise applications that provide responsive and meaningful engagement to each customer wherever they go.

Evaluating Databases for eCommerce (eBook)

Learn the specific things you need to look for in a database if you’re going to be using it for eCommerce applications.

READ NOW

ScyllaDB Coming to a City Near You!

Come Talk Real-Time Big Data With Us…

We’ve got a busy schedule in the weeks ahead. Come join us at one of these exciting events.

Spark+AI Summit – San Francisco, CA – June 5-6
First up, we are sponsors at Spark+AI Summit in San Francisco June 5-6. Visit us at booth #318 to see a live demo of Scylla serving 1/2 million write requests per second with consistent millisecond latency. Plus play our fun iPad game to get a Scylla t-shirt and a chance to win more cool swag.


 

Montréal Big Data Meetup – Montréal, QC – June 5
Next stop is Montreal, where we will sponsor the Montréal Big Data Meetup. Come hear the session, ‘Real-time big data at scale: 1 million queries per second with single-digit millisecond latencies’, by Mina Naguib from AdGear and Glauber Costa from ScyllaDB. The talk will present the architecture and design decisions which allow Scylla users to achieve high throughput and predictably low latencies while lowering the overall TCO of the system. You will learn about a real example of how that played out in practice at AdGear. Join us June 5th from 6:15-8:15. Food, drinks, and swag provided!

BDM58 (NEW LOCATION!): 1M queries/s with single-digit millisecond latency

Tuesday, Jun 5, 2018, 6:15 PM

Maison Notman
51 Sherbrooke Ouest & St-Laurent (Clark) Montréal, QC


 

Fullstack LX – Lisbon, Portugal – June 5
June 5th is a busy night! ScyllaDB engineer Duarte Nunes is presenting ‘NoSQL at Ludicrous Speed’ at this month’s Fullstack LX in Lisbon. Attend the meetup to get the low-level details of how Scylla employs a totally asynchronous, shared nothing programming model, relies on its own memory allocators, and meticulously schedules all its IO requests, plus learn how Scylla fully utilizes the underlying hardware resources.

Fullstack LX June edition

Tuesday, Jun 5, 2018, 6:30 PM

Talkdesk
Rua Tierno Galvan, Torre 3 das Amoreiras, 15º Piso 1070-274 Lisboa, PT


 

SQL NYC, The NoSQL & NewSQL Database Meetup – New York, NY – June 19th
Later this month we’ll be in New York for the SQL NYC NoSQL Database Meetup. Join this informative session to hear Oracle Data Cloud’s Principal Software Engineer discuss his team’s use cases for a redesigned streaming system. He will cover the challenges they encountered while prototyping various data stores to meet their needs and the client-side changes that ultimately made the project a success. Plus snacks and swag!

⏩ Choosing the Right NoSQL Datastore: Journeying From Apache Cassandra to Scylla

Tuesday, Jun 19, 2018, 6:30 PM

Galvanize, 2nd Floor
303 Spring St. New York, NY


The post ScyllaDB Coming to a City Near You! appeared first on ScyllaDB.

Lessons from Building Observability Tools at Netflix

Our mission at Netflix is to deliver joy to our members by providing high-quality content, presented with a delightful experience. We are constantly innovating on our product at a rapid pace in pursuit of this mission. Our innovations span personalized title recommendations, infrastructure, and application features like downloading and customer profiles. Our growing global member base of 125 million members can choose to enjoy our service on over a thousand types of devices. If you also consider the scale and variety of content, maintaining the quality of experience for all our members is an interesting challenge. We tackle that challenge by developing observability tools and infrastructure to measure customers’ experiences and analyze those measurements to derive meaningful insights and higher-level conclusions from raw data. By observability, we mean analysis of logs, traces, and metrics. In this post, we share the following lessons we have learned:

  • At some point in business growth, we learned that storing raw application logs won’t scale. To address scalability, we switched to streaming logs, filtering them on selected criteria, transforming them in memory, and persisting them as needed.
  • As applications migrated to having a microservices architecture, we needed a way to gain insight into the complex decisions that microservices were making. Distributed request tracing is a start, but is not sufficient to fully understand application behavior and reason about issues. Augmenting the request trace with application context and intelligent conclusions is also necessary.
  • Besides analysis of logging and request traces, observability also includes analysis of metrics. By exploring metrics anomaly detection and metrics correlation, we’ve learned how to define actionable alerting beyond just threshold alerting.
  • Our observability tools need to access various persisted data types. Choosing which kind of database to store a given data type depends on how each particular data type is written and retrieved.
  • Data presentation requirements vary widely between teams and users. It is critical to understand your users and deliver views tailored to a user’s profile.

Scaling Log Ingestion

We started our tooling efforts with providing visibility into device and server logs, so that our users can go to one tool instead of having to use separate data-specific tools or logging into servers. Providing visibility into logs is valuable because log messages include important contextual information, especially when errors occur.

However, at some point in our business growth, storing device and server logs didn’t scale because the increasing volume of log data caused our storage cost to balloon and query times to increase. Besides reducing our storage retention time period, we addressed scalability by implementing a real-time stream processing platform called Mantis. Instead of saving all logs to persistent storage, Mantis enables our users to stream logs into memory, and keep only those logs that match SQL-like query criteria. Users also have the choice to transform and save matching logs to persistent storage. A query that retrieves a sample of playback start events for the Apple iPad is shown in the following screenshot:

Mantis query results for sample playback start events

Once a user obtains an initial set of samples, they can iteratively refine their queries to narrow down the specific set of samples. For example, perhaps the root cause of an issue is found from only samples in a specific country. In this case, the user can submit another query to retrieve samples from that country.

The key takeaway is that storing all logs in persistent storage won’t scale in terms of cost and acceptable query response time. An architecture that leverages real-time event streams and provides the ability to quickly and iteratively identify the relevant subset of logs is one way to address this problem.
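Mantis itself exposes an SQL-like query language, so the snippet below is not Mantis code; it is only a plain Java illustration of the underlying idea of filtering and projecting an in-memory stream of events instead of persisting every log line. The event fields are made up for the example.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LogStreamFilter {
    public static void main(String[] args) {
        // A tiny in-memory stream of log events; the field names are invented for the example.
        Stream<Map<String, String>> events = Stream.of(
            Map.of("type", "playback_start", "device", "ipad", "country", "US"),
            Map.of("type", "playback_error", "device", "tv",   "country", "BR"),
            Map.of("type", "playback_start", "device", "ipad", "country", "BR"));

        // Keep only the matching events and project the fields we care about,
        // instead of writing every raw log line to persistent storage.
        List<String> samples = events
            .filter(e -> "playback_start".equals(e.get("type")) && "ipad".equals(e.get("device")))
            .map(e -> e.get("country"))
            .collect(Collectors.toList());

        System.out.println(samples);   // prints [US, BR]
    }
}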

Distributed Request Tracing

As applications migrated to a microservices architecture, we needed insight into the complex decisions that microservices are making, and an approach that would correlate those decisions. Inspired by Google’s Dapper paper on distributed request tracing, we embarked on implementing request tracing as a way to address this need. Since most inter-process communication uses HTTP and gRPC (with the trend for newer services to use gRPC to benefit from its binary protocol), we implemented request interceptors for HTTP and gRPC calls. These interceptors publish trace data to Apache Kafka, and a consuming process writes trace data to persistent storage.
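Netflix has not published the interceptor code in this post, so the following is only a minimal, hypothetical sketch of the general shape such a component could take, assuming the grpc-java ServerInterceptor API and the standard Kafka producer client; the broker address, topic name, and record format are placeholders.

import io.grpc.ForwardingServerCall;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;
import java.util.UUID;

// Emits one trace record per gRPC call to a Kafka topic, where a separate consumer persists it.
public class TraceInterceptor implements ServerInterceptor {
    private final KafkaProducer<String, String> producer;

    public TraceInterceptor() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");   // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    @Override
    public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
            ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
        String traceId = UUID.randomUUID().toString();
        long startNanos = System.nanoTime();
        // Wrap the call so latency and status can be recorded when it closes.
        ServerCall<ReqT, RespT> traced =
                new ForwardingServerCall.SimpleForwardingServerCall<ReqT, RespT>(call) {
            @Override
            public void close(Status status, Metadata trailers) {
                long micros = (System.nanoTime() - startNanos) / 1_000;
                String record = String.format(
                        "{\"traceId\":\"%s\",\"method\":\"%s\",\"status\":\"%s\",\"micros\":%d}",
                        traceId, call.getMethodDescriptor().getFullMethodName(), status.getCode(), micros);
                producer.send(new ProducerRecord<>("trace-events", traceId, record));   // placeholder topic
                super.close(status, trailers);
            }
        };
        return next.startCall(traced, headers);
    }
}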

The following screenshot shows a sample request trace in which a single request results in calling a second tier of servers, one of which calls a third-tier of servers:

Sample request trace

The smaller squares beneath a server indicate individual operations. Gray-colored servers don’t have tracing enabled.

A distributed request trace provides only basic utility in terms of showing a call graph and basic latency information. What is unique in our approach is that we allow applications to add additional identifiers to trace data so that multiple traces can be grouped together across services. For example, for playback request traces, all the requests relevant to a given playback session are grouped together by using a playback session identifier. We also implemented additional logic modules called analyzers to answer common troubleshooting questions. Continuing with the above example, questions about a playback session might be why a given session did or did not receive 4K video, or why video was or wasn’t offered with High Dynamic Range.

Our goal is to increase the effectiveness of our tools by providing richer and more relevant context. We have started implementing machine learning analysis on error logs associated with playback sessions. This analysis does some basic clustering to display any common log attributes, such as Netflix application version number, and we display this information along with the request trace. For example, if a given playback session has an error log, and we’ve noticed that other similar devices have had the same error with the same Netflix application version number, we will display that application version number. Users have found this additional contextual information helpful in finding the root cause of a playback error.

In summary, the key learning from our effort is that tying multiple request traces to a logical concept (a playback session, in this case) and providing additional context based on the constituent traces enables our users to quickly determine the root cause of a streaming issue that may span multiple systems. In some cases, we can take this a step further by adding logic that determines the root cause and presents a plain-English explanation in the user interface.

Analysis of Metrics

Besides analysis of logging and request traces, observability also involves analysis of metrics. Because asking users to examine many individual logs is overwhelming, we extended our offering by publishing log error counts to Atlas, our metrics monitoring system, which lets users quickly see macro-level error trends across multiple dimensions such as device type and customer geographic location. An alerting system notifies users when a given metric exceeds a defined threshold. In addition, when using Mantis, a user can define metrics derived from matching logs and publish them to Atlas.
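
For example, publishing a dimensioned error counter might look like the following sketch, which assumes the open-source Spectator client library that feeds Atlas; the metric name and tags are illustrative, and a production setup would wire an Atlas-backed registry rather than DefaultRegistry.

import com.netflix.spectator.api.DefaultRegistry;
import com.netflix.spectator.api.Id;
import com.netflix.spectator.api.Registry;

// Publish a log-error count as a counter with dimensions (tags) so that
// Atlas can slice error trends by device type and country.
public class ErrorCountPublisher {

    private final Registry registry;

    public ErrorCountPublisher(Registry registry) {
        this.registry = registry;
    }

    // Metric and tag names are illustrative assumptions.
    public void recordPlaybackError(String deviceType, String country) {
        Id errorCount = registry.createId("playback.logErrors")
                .withTag("deviceType", deviceType)
                .withTag("country", country);
        registry.counter(errorCount).increment();
    }

    public static void main(String[] args) {
        ErrorCountPublisher publisher = new ErrorCountPublisher(new DefaultRegistry());
        publisher.recordPlaybackError("ipad", "US");
    }
}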

We have also implemented statistical algorithms that detect anomalies in metric trends by comparing the current trend against a baseline, and we are working on correlating metrics across related microservices. From our work on anomaly detection and metrics correlation, we have learned how to define actionable alerting that goes beyond basic threshold alerting. We’ll discuss these efforts in a future blog post.
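
As a simplified illustration of baseline comparison, the sketch below flags the current value of a metric when it falls more than k standard deviations from the mean of a baseline window. The windowing and threshold are assumptions; the production algorithms are more involved.

import java.util.List;

// Flag the current value if it deviates from the baseline mean by more
// than k standard deviations (a basic z-score style check).
public class BaselineAnomalyDetector {

    static boolean isAnomalous(List<Double> baseline, double currentValue, double k) {
        double mean = baseline.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double variance = baseline.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average()
                .orElse(0.0);
        double stdDev = Math.sqrt(variance);
        // Guard against a flat baseline where the standard deviation is zero.
        return stdDev > 0 && Math.abs(currentValue - mean) > k * stdDev;
    }

    public static void main(String[] args) {
        List<Double> baseline = List.of(100.0, 102.0, 98.0, 101.0, 99.0);
        System.out.println(isAnomalous(baseline, 140.0, 3.0)); // true: far outside baseline
    }
}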

Data Persistence

We store the data used by our tools in Cassandra, Elasticsearch, and Hive. We choose a database primarily based on how our users want to retrieve a given data type and on the write rate. For observability data that is always retrieved by primary key and a time range, we use Cassandra. When data needs to be queried by one or more arbitrary fields, we use Elasticsearch, since multiple fields within a given record can be easily indexed. Finally, we observed that recent data, roughly up to the last week, is accessed far more frequently than older data, since most of our users troubleshoot recent issues. To serve users who need older data, we also persist the same logs in Hive, but for a longer retention period.
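
For the Cassandra case, the read path looks roughly like the following sketch using the DataStax Java driver: fetch records for one primary key constrained to a time range. The keyspace, table, and column names are invented for illustration, and the snippet assumes a reachable cluster with default connection settings.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

import java.time.Instant;

// Retrieve observability records by primary key and time range, the access
// pattern that led us to Cassandra. Schema names are illustrative only.
public class SessionLogReader {

    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            PreparedStatement stmt = session.prepare(
                    "SELECT event_time, payload FROM observability.session_logs "
                  + "WHERE session_id = ? AND event_time >= ? AND event_time < ?");

            Instant end = Instant.now();
            Instant start = end.minusSeconds(3600); // last hour of events
            ResultSet rows = session.execute(stmt.bind("session-123", start, end));

            for (Row row : rows) {
                System.out.println(row.getInstant("event_time") + " " + row.getString("payload"));
            }
        }
    }
}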

Cassandra, Elasticsearch, and Hive have their own advantages and disadvantages in terms of cost, latency, and queryability. Cassandra provides the highest per-record write and read rates, but is restrictive for reads because you must decide up front what to use for a row key (a unique identifier for a given record) and, within each row, what to use for a column key, such as a timestamp. In contrast, Elasticsearch and Hive provide more flexibility for reads: Elasticsearch allows you to index any field within a record, and Hive’s SQL-like query language lets you match against any field. However, since Elasticsearch is primarily optimized for free-text search, its indexing overhead during writes demands more compute nodes as the write rate increases. For example, for one of our observability data sets, we initially stored data in Elasticsearch so we could easily index more than one field per record, but as the write rate increased, indexing lagged to the point that either the data wasn’t available when users queried for it or it took too long to be returned. As a result, we migrated that data set to Cassandra, which gave us shorter write-ingestion and data-retrieval times, at the cost of defining data retrieval around the three unique keys that serve our current retrieval use cases.

For Hive, since records are stored in files, reads are much slower than in Cassandra and Elasticsearch because Hive must scan those files. In terms of storage and compute cost, Hive is the cheapest, because multiple records can be kept in a single file and the data isn’t replicated. Elasticsearch is most likely the next most expensive option, depending on the write-ingestion rate, and it can be configured with replica shards to increase read throughput. Cassandra is most likely the most expensive, since it encourages replicating each record to more than one replica to ensure reliability and fault tolerance.

Tailoring User Interfaces for Different User Groups

As usage of our observability tools has grown, users have continually asked for new features. Some of those requests involve displaying data in views customized for specific user groups, such as device developers, server developers, and Customer Service. On a given page in one of our tools, some users want to see every type of data the page offers, whereas others want only a subset. We addressed this by making pages customizable via persisted user preferences. For example, in a given table of data, users want to choose which columns they see, so for each user we store a list of visible columns for that table, as sketched below. Another example involves a log type with large payloads: loading those logs for a customer account increases page load time, and since only a subset of users care about this log type, we made loading these logs a user preference.
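
A minimal sketch of the visible-columns preference, kept in memory here for brevity; in practice the preference would be persisted in a datastore and keyed by user and table.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stores, per user, the list of columns that user has chosen to display.
public class UserViewPreferences {

    private final Map<String, List<String>> visibleColumnsByUser = new ConcurrentHashMap<>();
    private final List<String> defaultColumns;

    public UserViewPreferences(List<String> defaultColumns) {
        this.defaultColumns = List.copyOf(defaultColumns);
    }

    public void saveVisibleColumns(String userId, List<String> columns) {
        visibleColumnsByUser.put(userId, List.copyOf(columns));
    }

    public List<String> visibleColumnsFor(String userId) {
        return visibleColumnsByUser.getOrDefault(userId, defaultColumns);
    }
}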

Examining a given log type may require domain expertise that not all users have. For example, understanding the data in a given log from a Netflix device requires knowledge of certain identifiers, error codes, and string keys. Our tools minimize the specialized knowledge required to effectively diagnose problems by joining identifiers with the data they refer to and by providing descriptions of error codes and string keys.

In short, our learning here is that customized views and helpful context provided by visualizations that surface relevant information are critical in communicating insights effectively to our users.

Conclusion

Our observability tools have empowered many teams within Netflix to better understand the experience we are delivering to our customers and quickly troubleshoot issues across various facets such as devices, titles, geographical location, and client app version. Our tools are now an essential part of the operational and debugging toolkit for our engineers. As Netflix evolves and grows, we want to continue to provide our engineers with the ability to innovate rapidly and bring joy to our customers. In future blog posts, we will dive into technical architecture, and we will share our results from some of our ongoing efforts such as metrics analysis and using machine learning for log analysis.

If any of this work sounds exciting to you, please reach out to us!

— Kevin Lew (@kevinlew15) and Sangeeta Narayanan (@sangeetan)


Lessons from Building Observability Tools at Netflix was originally published in the Netflix TechBlog on Medium.