Scylla Enterprise Release 2019.1.5

Scylla Enterprise Release Notes

The ScyllaDB team announces the release of Scylla Enterprise 2019.1.5, a production-ready Scylla Enterprise patch release. As always, Scylla Enterprise customers are encouraged to upgrade to Scylla Enterprise 2019.1.5 in coordination with the Scylla support team.

Scylla Enterprise 2019.1.5 focuses on improving stability and robustness, reducing memory footprint, and fixing bugs. The most urgent fix is for users who have both Hinted Handoff and Counters enabled. More below.

Fixed issues in this release are listed below, with open source references, if present:

  • Correctness: Hinted handoff (HH) sends counter hints as counter updates when a node targeted by the hint does not exist. This may cause wrong counter values when HH is enabled, Counters are used, and nodes are down. #5833 #4505
  • Correctness: wrong encoding of a negative value of type varint. More details in #5656
  • Correctness: Materialized Views: virtual columns in a schema may not be propagated correctly #4339
  • CQL: error formats field name as a hex string instead of text #4841
  • Stability: Running nodetool clearsnapshot can cause a failure, if a new snapshot is created at the exact same time #4554 #4557
  • Stability: using an invalid time UUID can cause Scylla to exit. For example:
    select toDate(max(mintimeuuid(writetime(COLUMN)))) #5552
  • Stability: out of memory in cartesian product IN queries, where each column filter is multiplied by all the other filters. For example:
    create table tab (
    pk1 int, pk2 int, pk3 int, pk4 int, pk5 int, pk6 int, pk7 int, pk8 int, pk9 int,
    primary key((pk1, pk2, pk3, pk4, pk5, pk6, pk7, pk8, pk9))
    );
    
    select * from tab where pk1 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk2 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk3 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk4 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk5 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk6 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk7 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk8 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    and pk9 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    
    ServerError: std::bad_alloc

    #4752

  • Stability: IPv6 – seed nodes do not connect to peers with the scope field #5225
  • Stability: a graceful shutdown fails, producing an error, when Tracing is on
    [with Service = tracing::tracing]: Assertion `local_is_initialized()' #5243
  • Stability: running a misformatted ALTER command on a UDT will cause a crash #4837
  • Stability: Replace dead node for a seed is allowed but does not work #3889
  • Stability: Immediate abort on OS errors EBADF (Bad file number) and ENOTSOCK (Socket operation on non-socket). These errors usually hint at a deeper root cause, and aborting as soon as possible is both safer and makes the root cause easier to analyze.
  • Stability: downgrade the assert in row append() to an error. The assertion, introduced in 2019.1.4, proved to be too strict, aborting in cases which are not fatal.
  • Stability: storage_proxy: limit resources consumed in cross-shard operations
  • Performance: New configuration option, enable_shard_aware_drivers, allows disabling shard-aware drivers from the server (Scylla) side
  • Reduce memory footprint: Scylla keeps SSTable metadata in memory #4915
  • Reduce memory footprint: cell locking data structures consume 64KiB of RAM per table #4441
  • Install: Scylla installs the wrong Java version on Ubuntu 18.04 #4548


Golang and Scylla – Using the GoCQLX Package

We recently published a new lesson in Scylla University on using GoCQLX to interact with a Scylla cluster. This is a summary of that lesson.

The GoCQLX package is an extension to the GoCQL driver. It improves developer productivity without sacrificing any performance.

It’s inspired by sqlx, a tool for working with SQL databases, but it goes beyond what sqlx provides. For example, as illustrated in the short sketch after this list, it offers:

  • Builders for creating queries
  • Support for named parameters in queries
  • Support for binding parameters from struct fields, maps, or both
  • Scanning query results into structs based on field names
  • Convenient functions for common tasks such as loading a single row into a struct or all rows into a slice (list) of structs
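
To make a few of these features concrete before diving into the lesson’s code, here is a minimal sketch (not taken from the lesson) that combines the query builder, named parameters bound from a map, and struct scanning. Only the catalog.mutant_data table and its column names come from the lesson; the findMutant helper and the Mutant struct are illustrative.

import (
    "github.com/gocql/gocql"
    "github.com/scylladb/gocqlx"
    "github.com/scylladb/gocqlx/qb"
)

// Mutant mirrors the columns selected below; the db tags map fields to columns.
type Mutant struct {
    FirstName string `db:"first_name"`
    LastName  string `db:"last_name"`
    Address   string `db:"address"`
}

// findMutant builds a SELECT with the qb builder, binds the named parameters
// from a map, and scans all matching rows into a slice of structs.
func findMutant(session *gocql.Session, first, last string) ([]Mutant, error) {
    stmt, names := qb.Select("catalog.mutant_data").
        Columns("first_name", "last_name", "address").
        Where(qb.Eq("first_name"), qb.Eq("last_name")).
        ToCql()

    var rows []Mutant
    err := gocqlx.Query(session.Query(stmt), names).
        BindMap(qb.M{"first_name": first, "last_name": last}).
        SelectRelease(&rows)
    return rows, err
}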

GoCQLX is fast. Its performance is comparable to GoCQL. You can find some benchmarks here.

Creating a Sample GoCQLX Application

The sample application that we will go over is very similar to the one we saw in the lesson Golang and Scylla Part 1. It will connect to a Scylla cluster, display the contents of the Mutant Catalog table, insert and delete data, and show the contents of the table after each action. This blog post goes through each section of the code. For more details, and to run the code yourself, check out the full lesson.

For this application, the code lives in the main package and is stored in a file called main.go.
In the application, we first need to import the GoCQLX packages, along with the GoCQL driver and a few helpers:

import (
    "goapp/internal/log"
    "goapp/internal/scylla"
    "github.com/gocql/gocql"
    "github.com/scylladb/gocqlx"
    "github.com/scylladb/gocqlx/qb"
    "github.com/scylladb/gocqlx/table"
    "go.uber.org/zap"
)

The createStatements() function simply sets up the different queries we want to use later in the program.

var stmts = createStatements()
func createStatements() *statements {
    m := table.Metadata{
        Name: "mutant_data",
        Columns: []string{"first_name", "last_name", "address", "picture_location"},
        PartKey: []string{"first_name", "last_name"},
    }
    tbl := table.New(m)
    deleteStmt, deleteNames := tbl.Delete()
    insertStmt, insertNames := tbl.Insert()
    // Normally a select statement such as this would use `tbl.Select()` to select by
    // primary key but now we just want to display all the records...
    selectStmt, selectNames := qb.Select(m.Name).Columns(m.Columns...).ToCql()
    return &statements{
        del: query{
            stmt:  deleteStmt,
            names: deleteNames,
        },
        ins: query{
            stmt:  insertStmt,
            names: insertNames,
        },
        sel: query{
            stmt:  selectStmt,
            names: selectNames,
        },
    }
}

We set up the different static queries once for later use, as they do not change; this also means they don’t have to be recomputed every time. We use the “gocqlx/table” package to create a representation of the table, and the queries we need are generated from it. Beyond the initial configuration, we don’t have to worry about CQL syntax or getting variable names right.

type query struct {
    stmt string
    names []string
}

type statements struct {
    del query
    ins query
    sel query
}

The query and statements structs are convenient constructs to allow us to access the different queries in a modular and unified way. Although not required, you can group them to reduce clutter in the code.

func deleteQuery(session *gocql.Session, firstName string, lastName string, logger *zap.Logger) {
    logger.Info("Deleting " + firstName + "......")
    r := Record{
        FirstName: firstName,
        LastName: lastName,
    }
    err := gocqlx.Query(session.Query(stmts.del.stmt), stmts.del.names).BindStruct(r).ExecRelease()
    if err != nil {
        logger.Error("delete catalog.mutant_data", zap.Error(err))
    }
}

The deleteQuery function performs a delete in the database. It uses a Record struct to identify which row to delete. Gocqlx takes the values in the struct and sends them on as parameters to the underlying prepared statement that is sent to the database.

func insertQuery(session *gocql.Session, firstName, lastName, address, pictureLocation string, logger *zap.Logger) {
    logger.Info("Inserting " + firstName + "......")
    r := Record{
        FirstName: firstName,
        LastName: lastName,
        Address: address,
        PictureLocation: pictureLocation,
    }
    err := gocqlx.Query(session.Query(stmts.ins.stmt), stmts.ins.names).BindStruct(r).ExecRelease()
    if err != nil {
        logger.Error("insert catalog.mutant_data", zap.Error(err))
    }
}

The function insertQuery works in the same way as the deleteQuery function. It uses a Record instance as data for the insert query using the BindStruct function.

Note the similarity between the different queries. They are almost identical and the only difference is the amount of data in the insert Record instance.

func selectQuery(session *gocql.Session, logger *zap.Logger) {
    logger.Info("Displaying Results:")
    var rs []Record
    err := gocqlx.Query(session.Query(stmts.sel.stmt), stmts.sel.names).SelectRelease(&rs)
    if err != nil {
        logger.Warn("select catalog.mutant", zap.Error(err))
        return
    }
    for _, r := range rs {
        logger.Info("\t" + r.FirstName + " " + r.LastName + ", " + r.Address + ", " + r.PictureLocation)
    }
}

The selectQuery function reads all the available records into the local variable rs. Note that this query loads the entire result set into memory; if the table may hold more rows than comfortably fit in memory, use an iterator-based approach instead.
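
As a rough sketch of that alternative (within the same main.go, reusing the stmts and Record definitions from this article), the loop below streams rows one at a time instead of materializing the whole slice. The gocqlx.Iter / StructScan / Close calls are our reading of the gocqlx iterator API, so verify them against the package version you use.

func selectQueryIter(session *gocql.Session, logger *zap.Logger) {
    logger.Info("Displaying Results:")
    iter := gocqlx.Iter(session.Query(stmts.sel.stmt))
    var r Record
    for iter.StructScan(&r) {
        logger.Info("\t" + r.FirstName + " " + r.LastName + ", " + r.Address + ", " + r.PictureLocation)
    }
    if err := iter.Close(); err != nil {
        logger.Warn("select catalog.mutant_data", zap.Error(err))
    }
}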

type Record struct {
    FirstName string `db:"first_name"`
    LastName string `db:"last_name"`
    Address string `db:"address"`
    PictureLocation string `db:"picture_location"`
}

The “Record” struct is a convenient example that illustrates some data you might have in a database. Note the struct tags that carry the database column names; they ensure that each field is mapped to the correct column.

func main() {
    logger := log.CreateLogger("info")

    cluster := scylla.CreateCluster(gocql.Quorum, "catalog", "scylla-node1", "scylla-node2", "scylla-node3")
    session, err := gocql.NewSession(*cluster)
    if err != nil {
        logger.Fatal("unable to connect to scylla", zap.Error(err))
    }
    defer session.Close()

    selectQuery(session, logger)
    insertQuery(session, "Mike", "Tyson", "12345 Foo Lane", "http://www.facebook.com/mtyson", logger)
    insertQuery(session, "Alex", "Jones", "56789 Hickory St", "http://www.facebook.com/ajones", logger)
    selectQuery(session, logger)
    deleteQuery(session, "Mike", "Tyson", logger)
    selectQuery(session, logger)
    deleteQuery(session, "Alex", "Jones", logger)
    selectQuery(session, logger)
}

The main function is the entry point of every Go program. Here we first connect to the database by using the two functions scylla.CreateCluster and gocql.NewSession. This is the code for the scylla.CreateCluster function (located here: mms/go/internal/scylla/cluster.go):

func CreateCluster(consistency gocql.Consistency, keyspace string, hosts ...string) *gocql.ClusterConfig {
    retryPolicy := &gocql.ExponentialBackoffRetryPolicy{
        Min: time.Second,
        Max: 10 * time.Second,
        NumRetries: 5,
    }
    cluster := gocql.NewCluster(hosts...)
    cluster.Keyspace = keyspace
    cluster.Timeout = 5 * time.Second
    cluster.RetryPolicy = retryPolicy
    cluster.Consistency = consistency
    cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())
    return cluster
}

The scylla.CreateCluster function is a small helper that keeps the driver configuration out of the way. gocql.NewSession connects to the database and sets up a session that can be used to issue queries.

Conclusion

This blog explains how to create a sample Go application that executes a few basic CQL statements against a Scylla cluster using the GoCQLX package. It’s only an overview; for more details, including the code and instructions for running it, check out the full lesson at Scylla University.

The course Using Scylla Drivers covers drivers in different languages such as Python, Node.js, Golang and Java and how to use them for application development. If you haven’t done so yet, register as a user for Scylla University and start learning. It’s free!

REGISTER NOW FOR SCYLLA UNIVERSITY


Nauto: Achieving Consistency in an Eventually Consistent Environment

Today we feature a guest blog by Rohit Saboo, Machine Learning Engineering Lead at Nauto. Rohit has spoken at multiple Scylla Summits, and in 2018 his work at Nauto was awarded the Scylla User Award for More Interesting Technical Use Case.

How we build trips consistently in a distributed system

Early on, I noticed that one of our services would often have errors retrieving data from the database. Upon investigation, we discovered that some of the queries were being run with full consistency, which in this case meant that if any single node in a three-node database cluster was unreachable, the query would fail. The CAP theorem in distributed systems states that you cannot have it all: a system that is fully consistent will not be available in the event of a network partition, and if we want it to be available, we cannot guarantee full consistency.

For most applications that do not involve a bank account, having less than full consistency is often more than enough. Still, the engineer who wrote the code and had previously only worked with fully consistent, SQL systems exclaimed, “How can we run a production system without full consistency!”

This leads one to question whether the limits imposed by the CAP theorem are truly absolute, or whether it is possible to work around them for specific applications.

The Problem

One of Nauto’s technologies is a “trip builder” that processes and analyzes large volumes of data in the cloud to enable a number of product features. The trip builder needs to have low latency, high availability, and good reliability. It also needs to be cost-effective and scalable with low operational overhead.

It needs to process data by batching it up into chunks, which even at a small scale amounts to over 2,000 write operations per second, each operation carrying the trip route and associated data for its time segment.
Additionally, because of how distributed systems operate, a large percentage of the data does not arrive in order, and when recovering from errors the latest information sometimes needs to be sent first. As a result, a solution that simply appends data at the end does not work for us.

Where the data arrives, as a percentage of all data:

  • end of trip: 46%
  • in-between a trip: 34%
  • beginning of trip: 19%
  • between two trips, causing them to merge: 1%

Achieving a few of these goals is perhaps easy; achieving all of them together is not.

Solution

We built a distributed time-series merging solution, modeled on the ACID 2.0 paradigm described below, that provides strong application-level guarantees without being limited by the CAP theorem, using Scylla as the backing store.

While designing the solution, one of the decisions we needed to make was which database would store all the data. We explored various technologies based on publicly available benchmarks, and concluded that traditional SQL solutions such as Postgres would not scale (let alone achieve other goals). We considered sharded Postgres, but the management/operational overhead brought on by such a system would not scale well. Other solutions we considered could not meet one or more of our requirements for reliability, cost, or throughput. We decided to go with a Cassandra-like solution and settled on Scylla for reasons of reliability, cost, latency, and maintainability.

When we consider NoSQL solutions, we also have to decide how we trade off between consistency and availability. Do we really need to be consistent? What does consistency really mean for our data? We will explore later how in our particular problem, we can provide strong theoretical guarantees that the solution is perfectly safe under eventual consistency.

The GPS points, along with the vehicle state, are ingested by an edge service, which forwards the data to Kafka. The Kafka partitions are sharded by the device id. Multiple instances of the trip-building service consume the data from these partitions. Each of these instances internally subdivides the work across several workers. The work is distributed using a different hash function and a slightly different sharding scheme, which relies on both the device id and the day bucket. (Using the same sharding scheme with a number of workers that is not coprime with the number of Kafka partitions would have resulted in some workers not receiving any data.) This setup ensures that all data relevant to a particular time range (except for the edge case of cross-day trips) is delivered to the same worker, i.e., two different workers never process data meant for the same trip (time range), barring rare conditions such as network partitions causing worker disconnects followed by Kafka rebalances and lagging workers. (Network partitions do tend to happen, because it’s relatively easy to saturate the network bandwidth when working at high throughputs on one of the smaller AWS instances.)
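
As a small illustration of that worker-assignment scheme (the hash function, key format, and helper name below are ours, not Nauto’s), a worker can be chosen from the device id and the day bucket roughly like this:

import (
    "hash/fnv"
    "time"
)

// workerFor picks the in-process worker for a data point. Hashing both the
// device id and the day bucket means all points for one device on one day land
// on the same worker, while using a different function than the Kafka
// partitioner. FNV and the "id|day" key format are illustrative choices only.
func workerFor(deviceID string, ts time.Time, numWorkers int) int {
    dayBucket := ts.UTC().Format("2006-01-02")
    h := fnv.New32a()
    h.Write([]byte(deviceID + "|" + dayBucket))
    return int(h.Sum32()) % numWorkers
}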

This algorithm reads and writes from the trips table, whose schema is:

CREATE TABLE trips (
    version int,
    id text,
    bucket timestamp,
    end_ms timestamp,
    start_ms timestamp,
    details blob,
    summary blob,
    PRIMARY KEY ((version, id, bucket), end_ms, start_ms))
WITH CLUSTERING ORDER BY (end_ms DESC, start_ms DESC)

Where

  • version allows us to run multiple versions or experiments at the same time while maintaining data separation.
  • id is the device id (or id for the timeseries).
  • bucket is the day bucket.
  • start_ms, end_ms are the trip start and end times respectively.
  • details contains the detailed information related to the trip, e.g., the location coordinates for the route. We save this as a compressed, serialized protocol buffer rather than JSON because it both saves space and keeps individual cell sizes from becoming too large. It could also have been saved as compressed, serialized flatbuffers if one were very sensitive to client-side deserialization costs.
  • summary contains summarized information about the trip, such as total distance traveled. The summary field avoids the need to load details each time we retrieve a trip that doesn’t actually need the path. Again, we use compressed, serialized protocol buffers to save this (a small compression sketch follows this list).
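
A minimal sketch of that compression step (gzip is chosen here purely for illustration; the post only says the serialized protocol buffers are compressed):

import (
    "bytes"
    "compress/gzip"
)

// compressBlob gzips an already-serialized payload (e.g. the output of a
// protocol buffer Marshal call) before it is written to the details or
// summary blob column.
func compressBlob(serialized []byte) ([]byte, error) {
    var buf bytes.Buffer
    zw := gzip.NewWriter(&buf)
    if _, err := zw.Write(serialized); err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}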

The partition key is chosen to include the version, id, and the day bucket because we don’t need to access data across two different partitions at the same time for this algorithm. The clustering key is chosen to make the common case of in-order arrival of data as optimal as possible.

The table has a replication factor of 3. All writes to the table use consistency level QUORUM, so we are sure that at least two replicas have the data. If we also wanted to guarantee that reads always return the latest write, we could read at QUORUM as well; however, that is not important for us, so we read with a consistency level of ONE.
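
Expressed with the gocql driver, that policy looks roughly like the sketch below. The helper names and CQL strings are ours; only the trips table layout and the QUORUM-write / ONE-read choice come from the post.

import (
    "time"

    "github.com/gocql/gocql"
)

// saveTrip writes at QUORUM so that at least two of the three replicas
// acknowledge the write before it is considered successful.
func saveTrip(s *gocql.Session, version int, id string, bucket, end, start time.Time, details, summary []byte) error {
    const insertCQL = `INSERT INTO trips (version, id, bucket, end_ms, start_ms, details, summary)
                       VALUES (?, ?, ?, ?, ?, ?, ?)`
    return s.Query(insertCQL, version, id, bucket, end, start, details, summary).
        Consistency(gocql.Quorum).
        Exec()
}

// loadTrips reads at ONE; the merge step tolerates stale or duplicate
// segments, so the stronger (and slower) QUORUM read is unnecessary.
func loadTrips(s *gocql.Session, version int, id string, bucket time.Time) *gocql.Iter {
    const selectCQL = `SELECT end_ms, start_ms, summary FROM trips
                       WHERE version = ? AND id = ? AND bucket = ?`
    return s.Query(selectCQL, version, id, bucket).
        Consistency(gocql.One).
        Iter()
}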

The Merging Step

Let’s follow the algorithm through an example. Each sequence of timestamped location coordinates forms a trip segment. For illustrative purposes, we will use batch sizes much larger than normal. Let’s say the device has sent the following trip segment so far.

After a while, we receive some additional data for this trip, but there’s a large break in between because data was sent out of order due to some connectivity issue. We consider two trip segments to be part of the same trip only if they are within five minutes of each other.

We query the table to see if there is a trip segment that ended within five minutes of this trip segment starting or that starts within five minutes of this trip segment ending. This constraint can be simplified by the following observation: If there’s only a single thread operating on trip segments for a device at a time, we should have at most one trip segment before and one trip segment after within range of being merged. This is because if we have more than these, they would have been merged in some previous step. However, given that we are working with eventual consistency, we may end up with more than two trips for merging with potential overlap. We will address this consistency issue later. For now, we can use this observation to simplify by only looking for the first three trips (one more than strictly necessary so that at each operation, we allow for one additional trip not previously merged to get merged) that end after five minutes before the current trip segment start time.

In the example, there’s nothing to merge the new trip segment with. Therefore, we save it as it is. Next, we get a new trip segment for which both of the previously received trip segments fall within its 5 minute neighborhood.

Therefore, we create a new combined trip, save it back, and delete the former ones. We first save the combined trip and then delete the older ones, so that if anything fails in between, all we are left with is a couple of trips that overlap with some larger trip; they were not deleted this time and will get deleted sometime later. Why not use logged batches here? Logged batches are expensive and this operation happens a lot, so we choose not to use them.

We could define the merging function as taking two trip segments and producing a merged trip segment m.

f(a,b) → m

This function is

  • Associative:
    If there are three trips to merge, say a, b, and c, both
    f(f(a,b), c) and f(a, f(b, c))
    will yield the same result: it doesn’t matter which two trips we merge first.
  • Commutative:
    f(a, b) = f(b, a).
    It doesn’t matter in which order we give the trips, the merged trip is always the same.
  • Idempotent:
    f(a, f(a, b)) = f(a, b)
    The merged path always de-duplicates and orders the time series data before saving it.
  • Deterministic:
    f(a, b) is always the same.
    There’s no non-determinism in the merging process and the output is always the same.

Such a function is also sometimes referred to as having ACID 2.0 properties. For such a function, starting with any set of inputs, we will always arrive at the same output irrespective of the path of operations chosen. A program implementing this function is also referred to as monotonic. Monotonicity follows because we don’t need any coordination between the distributed executions of this function to guarantee a particular execution order in order to arrive at the correct output.
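
A stripped-down sketch of such a merge over raw location samples (the Point type, its fields, and the function name are illustrative; the real merge works on full trip segments):

import (
    "sort"
    "time"
)

// Point is one timestamped location sample.
type Point struct {
    TS       time.Time
    Lat, Lng float64
}

// mergeSegments is f(a, b) -> m: concatenate the two segments, order by
// timestamp, and drop duplicate timestamps. Because the output depends only on
// the set of input points, the function is associative, commutative,
// idempotent, and deterministic.
func mergeSegments(a, b []Point) []Point {
    m := append(append([]Point{}, a...), b...)
    sort.Slice(m, func(i, j int) bool { return m[i].TS.Before(m[j].TS) })
    out := m[:0]
    for i, p := range m {
        if i == 0 || !p.TS.Equal(out[len(out)-1].TS) {
            out = append(out, p)
        }
    }
    return out
}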

Let’s take an example with two instances and three trips. Let’s call these trips T0 (10:00 am — 10:10 am), T1 (10:10 am — 10:20 am), and T2 (10:20 am — 10:30 am), and the workers as W0 and W1.

Running this coordination-free, let’s look at some possible orderings of operations:

Case 1:

  • W0 loads T0 and T1.
  • W0 writes merged trip of T0 and T1 (T01).
  • W0 deletes T0 and T1.
  • W1 loads T01 and T2
  • W1 writes merged trip of T01 and T2 (T012).
  • W1 deletes T01 and T2.

Case 2:

  • W0 loads T1 and T2.
  • W0 writes merged trip of T1 and T2 (T12).
  • W0 deletes T1 and T2.
  • W1 loads T0 and T12.
  • W1 writes merged trip of T0 and T12 (T012).
  • W1 deletes T0 and T12.

Case 3: There’s a “race” between W0 and W1.

  • W0 loads T0 and T1.
  • W1 loads T1 and T2.
  • W0 writes merged trip of T0 and T1 (T01).
  • W1 writes merged trip of T1 and T2 (T12).
  • W0 deletes trips T0 and T1.
  • W1 deletes trips T1 and T2. The delete operation for T1 may fail at this point, but that’s okay.
  • W0 now loads T01 and T12.
  • W0 writes merged trip T012. (Our merge operation de-duplicates the data inside, so the overlap between T01 and T12 is harmless.)

The article “Relational Transducers for Declarative Networking” by Ameloot, Neven, et al. [pdf] substantiates this method.

If we apply this function at all reads and writes to this table, including reads where we are surfacing this data in various frontends, we are guaranteed to have consistent data, where data consistency includes correctness, no duplicates, and being in-order, but does not include having the latest data.

Scylla Deployment

Our Scylla cluster is composed of three i3.4xlarge EC2 instances spread across three availability zones, the same zones in which our cloud services run. AWS inter-availability-zone network bandwidth, as well as bandwidth on smaller instances, often tends to be limited. Additionally, the high update rate created heavy internode traffic as Scylla managed replication, causing us to run into network bandwidth limits. (This also happened to be quite expensive and was several multiples of the cost of everything else.) At the time of writing this article, Scylla does not enable internode compression by default; we enabled it to stop us from hitting bandwidth limits and to reduce network costs. The cluster has its own VPC, with limited ports exposed only to the relevant VPCs that run services needing access to Scylla.

For each region, we run a completely isolated cluster to ensure that data related to customers in that region stays within the physical boundary of the region. To simplify management and operation of these isolated Scylla clusters, we use CloudFormation with a custom AMI, based on Scylla’s public AMI, that can bring up the entire cluster with a single CloudFormation command, giving us an identical setup across environments. The CloudFormation template also includes Scylla Manager and the Grafana/Prometheus Scylla monitoring stack running on a separate instance. Backups are done using the cassandras3 tool, which is configured to run via a cron schedule on each machine at midnight local time.

Overall, we have been able to find a low management overhead, low latency, cost-effective solution by using Scylla for our DB layer for our algorithms.

Conclusion

In conclusion, we have seen how formulating our problem as a monotonic function helped us build a coordination-free algorithm on top of Scylla that achieves all of our goals: high availability, good reliability, cost-effectiveness, and scalability with low operational overhead.

About Nauto

NautoⓇ is the only real-time AI-powered, Driver Behavior Learning Platform to help predict, prevent, and reduce high-risk events in the mobility ecosystem. By analyzing billions of data points from over 400 million AI-processed video miles, Nauto’s machine learning algorithms continuously improve and help to impact driver behavior before events happen, not after. Nauto has enabled the largest commercial fleets in the world to avoid more than 25,000 collisions, resulting in nearly $100 million in savings. Nauto is located in North America, Japan, and Europe. Learn more at nauto.com or on LinkedIn, Facebook, Twitter and YouTube.

If you’ve found this blog illuminating, please go to the original posted on Medium and give Rohit fifty claps. Meanwhile, if you have more questions about Scylla itself, you can contact us through our website, or drop in and join the conversation on Slack.


Scylla Open Source Release 3.1.3

Scylla Open Source Release Notes

The ScyllaDB team announces the release of Scylla Open Source 3.1.3, a bugfix release of the Scylla 3.1 stable branch. Scylla Open Source 3.1.3, like all past and future 3.x.y releases, is backward compatible and supports rolling upgrades.

Please note that the latest stable Scylla release is 3.2.1, and 3.3 is in the RC phase. Once 3.3 is released, we will stop supporting the 3.1.x releases.

Issues fixed in this release

  • Stability: updates of MVs, for example after a node restart, can take too many IO resources from online requests, increasing their latency. #4615
  • Stability: User Defined Types (UDT) in columns with descending order may fail #4672
  • Stability: ALTER of a nested UDT will fail when a user tries to insert data into the new type #5049
  • Stability: running a misformatted ALTER command on a UDT will cause a crash #4837
  • Stability: a graceful shutdown fails, producing an error, when Tracing is on
    [with Service = tracing::tracing]: Assertion `local_is_initialized()' #5243
  • Stability (ARM): possible Use-after-move in make_multishard_streaming_reader() #5419
  • CQL: error formats field name as a hex string instead of text #4841
  • Performance: Materialized views use scan readers for single-partition scans during read-before-write which causes read amplification #5418
  • Install: ubuntu/debian missing scyllatop files #5518
[1] Note: if and only if you installed a fresh Scylla 3.1.0, you must add the following line to scylla.yaml of each node before upgrading to 3.1.3:
enable_3_1_0_compatibility_mode: true

This is not relevant if your cluster was upgraded to 3.1.0 from an older version, or you are upgrading from or to any other Scylla releases, like 3.1.1.

If you have doubts, please contact us using the user mailing list.


Instaclustr Announces PCI-DSS Certification

Instaclustr is very pleased to announce that we have achieved PCI-DSS certification for our Managed Apache Cassandra and Managed Apache Kafka offerings running in AWS. PCI-DSS (Payment Card Industry – Data Security Standard) is a mandated standard for many financial applications and we increasingly see the PCI-DSS controls adopted as the “gold standard” in other industries where the highest standards of security are crucial. PCI-DSS certification adds to our existing SOC2 accreditation to provide the levels of security assurance required by even the most demanding business requirements. 

Overall, this certification effort was the most significant single engineering project in the history of Instaclustr, requiring several person years of engineering effort to implement well over 100 changes touching every aspect of our systems over the course of several months. We’re very proud that, despite this level of change, impact to our customers has been absolutely minimal and we’re able to deliver another very significant piece of background infrastructure, allowing a wider range of customers to focus their efforts on building innovative business applications based on open source data technologies.

While PCI-DSS compliance may not be required by all customers, and is only supported on selected Instaclustr products, most of the security enhancements we have implemented will result in improved levels of security for all our Managed Service customers, regardless of product or platform. The most significant of these changes are:

  • Tightening of our admin access environment with technical controls to prevent egress of data via our admin systems.
  • Improved logging and auditing infrastructure.
  • Tightened operating system hardening and crypto standards.
  • Addition of a WAF (Web Application Firewall) in front of our console and APIs.
  • More automated scanning, and tightened resolution policies, for code dependency vulnerabilities.
  • More frequent security scanning of our central management systems.
  • More developer security training.

Customers wishing to achieve full PCI-DSS compliance will need to opt in when creating a cluster, as PCI compliance enforces a range of more restrictive security options (for example, password complexity in the Instaclustr console and use of Private Network Clusters), and enabling the required additional logging on the cluster incurs a performance penalty of approximately 5%. There is also a set of customer responsibilities that customers must implement for full compliance. Additional technical controls activated for PCI compliant clusters include:

  • Logging of all user access to the managed applications (Cassandra, Kafka)
  • Locked-down outbound firewall rules
  • Second approver system for sudo access for our admins

For full details please see our support page.

Customers with existing clusters who wish to move to full PCI compliance should contact support@instaclustr.com who will arrange a plan to apply the new controls to your cluster.

We will be publishing more detail on many of these controls in the coming weeks and holding webinars to cover the Cassandra and Kafka specific implementation details which we expect will be of broad interest. In the meantime, should you have any interest in any further information please contact your Instaclustr Customer Success representative or sales@instaclustr.com who will be able to arrange technical briefings.


Introducing the Kafka Scylla Connector

Apache Kafka is an increasingly foundational component of enterprise Big Data architectures. Data streaming and event-driven systems have rapidly supplanted batch-driven processes since Kafka was first invented by engineers at LinkedIn in 2011. It was then brought into the Apache Software Foundation (ASF) as open-source Apache Kafka as well as made into a commercial product by the good folks over at Confluent. Over the ensuing decade, the drive for real-time stream processing has seen Kafka spread to tens of thousands of companies, including the majority of the Fortune 100.

Apache Kafka is capable of delivering reliable, scalable, high-throughput data streams across a myriad of data sources and sinks. A great number of our open source users and enterprise customers — like IBM, General Electric, Nauto, Grab, and Lookout — use Scylla and Kafka together. For those looking for more information, you can read about using Kafka and Scylla together to, for example, create a scalable backend for an IoT service.

New Shard-Aware Kafka Connector for Scylla

Because Scylla is an API-compatible implementation of Apache Cassandra, to date users who wished to connect Scylla to Kafka have been able to use the Kafka Cassandra Connector (also known as the Confluent Cassandra Sink Connector). Users could simply swap out their Cassandra database with Scylla transparently with no code changes.

However, the disadvantage of that approach was that performance was not as good as it could be, since the connector did not take full advantage of Scylla’s underlying shard-per-core, shared-nothing architecture.

To remedy this, we are now introducing a new shard-aware Kafka sink connector for Scylla. This will allow you to use Apache Kafka and the Confluent platform to their fullest advantages.

Using Scylla and Apache Kafka Together

Scylla’s high performance NoSQL database is a natural fit with Apache Kafka. Both support the massive scalability and high throughput required in modern event streaming data architectures.

Users are guaranteed flexibility, either via open source, enterprise-grade or cloud-hosted fully managed solutions for both technologies. For example, consider the following combinations:

  • Scylla Open Source and open source Apache Kafka
  • Scylla Enterprise and the Confluent Platform
  • Scylla Cloud and Confluent Cloud managed cloud services

With the new Kafka Scylla Connector you now have the final missing piece with all the flexibility and performance you need to build the next generation of Big Data applications.

Learn More in Our Webinar

Learn more about the advantages of using Scylla together with the Confluent Cloud platform in our upcoming webinar Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL, this Thursday, February 20. Register now!

Further Reading

DOWNLOAD THE KAFKA SCYLLA CONNECTOR NOW!


Apache Cassandra 4.0 – Netty Transport

In this series of blog posts, we’ll take a meandering tour of some of the important changes, cool features, and nice ergonomics that are coming with Apache Cassandra 4.0. In Part 1 we focused on Cassandra 4.0 stability and testing, and in Part 2 on virtual tables. In this part we will learn about the Netty transport framework.

One of the headline features of Apache Cassandra 4.0 is the refactor of internode messaging to use Java’s non-blocking NIO capability (https://issues.apache.org/jira/browse/CASSANDRA-8457 and https://issues.apache.org/jira/browse/CASSANDRA-15066) via the Netty library.

This allows Cassandra to move away from an N-threads-per-peer model to a single thread pool for all connections, dramatically reducing performance issues related to thread signalling, coordination, and context switching.

Moving to the Netty framework has also enabled other features like zero copy streaming for SSTables.

Performance improvements have been apparent from day one, both in throughput and latency. The big win, however, is the significant reduction in tail latency, with reductions of 40% or more in P99 latencies seen in initial testing.

Of course patches and improvements are being made all the time during the beta process, so benchmarking will need to be revalidated against each workload and within your own environment, but progress is promising!

https://issues.apache.org/jira/browse/CASSANDRA-14746


But Don’t Take It from Us…

We’re always glad to hear about the innovative things our users are doing with Scylla and the benefits they’re getting. Our use cases come in all shapes and sizes, from very large organizations that have migrated to Scylla from another database to start-ups that use us right out of the gate.

With that in mind, I’m glad to share some of the videos we’ve recently posted to our website and YouTube channel. These testimonials are all less than three minutes long. We recorded them at our Scylla Summit last fall, where we asked several of our customers one simple question: “What has been your experience with Scylla?”

Here’s what they had to say…

“We’ve reduced our P99, three 9 and four 9 latencies by 95%.”

Comcast has realized a number of benefits from migrating to Scylla — a snappier UI, less administration and a big reduction in nodes. Hear about this and more from Phil Zimich, Senior Director of Engineering at Comcast:



“I have a baby at home; I don’t want a database that requires constant attention.”

Southeast Asia’s leading super app, Grab, uses Scylla for a growing set of use cases. Here you can see what Aravind Srinivasan, Engineering Lead at Grab, has to say about Scylla’s performance and self-optimizing capabilities:

“We did not change our applications one bit. The same drivers, the same commands we were issuing before with Cassandra were working with Scylla with absolutely no changes.”

We often talk about how easy it is to migrate from Cassandra or DataStax to Scylla. But don’t listen to us. David Blythe, Principal Software Engineer at SAS Institute will tell you all about it:

“When we did performance tests for a back-end to JanusGraph we found that Scylla was giving 10 times faster performance than our existing system.”

Leading cyber security company FireEye was looking for a better alternative to their combination of a custom graph database and PostgreSQL. Senior Manager Krishna Palati and DevOps Engineer Rahul Gaikwad share their selection process and why they chose Scylla here:

“Now we can sleep well at night.”

Web browser pioneer Opera uses Scylla to synch browser experiences across desktop and mobile devices for millions of users. You can hear from Engineering Manager Rafal Furmanski and DevOps Engineer Piotr Olchawala on why they chose Scylla and how important reliability of the system has been to them.

“Scylla has reduced latency from 300 ms to 10 ms, and the database clusters had no errors all year”

Alex Bantis, Backend Scala Developer at Tubi, describes how Scylla helped Tubi to unify ML-driven apps with Scala and Spark. Hear about his journey from not knowing Scylla to deploying it for a growing number of use cases:

“Scylla Cloud is pretty much zero maintenance and zero stress!”

Mistaway Systems has come to rely on our fully managed DBaaS so they can focus instead on their applications. Hear from Software Engineer Kevin Johnson about the ease of using Scylla Cloud:

“One person can manage a production Scylla cluster for the whole company”

Another great IOT case study can be found at GPS Insight. Hear from Doug Stuns, Senior Database Architect, about how Scylla’s scalability is “off the charts” here:

“We were very pleased with the operational characteristics of Scylla. We were able to meet requirements with far fewer servers than expected.”

Mobile endpoint security company Lookout evaluated a number of different NoSQL options before selecting Scylla. Hear from Richard Ney, Principal at Lookout, on how Scylla compared with DynamoDB in cost and performance:

That’s just a sample of some of our latest customer testimonials. You’ll find a complete set, along with written case studies on our Users page.

Scylla Summit Sessions

Want more details on our customer use cases and the benefits our users are gaining? You’ll be glad to know that we recorded all the presentations from our annual user conference, Scylla Summit. You’ll find them all on the Tech Talks page of our website, along with Keynotes from our executives and engineering talks from our technical staff.

Got a Scylla Story to Tell?

Got an interesting Scylla use case you’d like to share? We’re glad to work with you to do a blog post, a case study or testimonial. Just reach out to us on info@scylladb.com.
