ApacheCon 2019: DataStax Announces Cassandra Monitoring Free Tier, Unified Drivers, Proxy for DynamoDB & More for the Community

It’s hard to believe that we’re celebrating the 20th year of the Apache Software Foundation. But here we are—and it’s safe to say open source has come a long way over the last two decades.

We just got back from ApacheCon, where DataStax—one of the major forces behind the powerful open source Apache Cassandra™ database—was a platinum sponsor this year. 

We don’t know about you, but we couldn’t be more excited about what the future holds for software development and open source technology in particular.

During CTO Jonathan Ellis’ keynote, we announced three exciting new developer tools for the Cassandra community:

  • DataStax Insights (Cassandra Performance Monitoring)
  • Unified Drivers
  • DataStax Spring Boot Starter for the Java Driver

 

While we’re at it, in my talk “Happiness is a hybrid cloud with Apache Cassandra,” I announced our preview release for another open source tool: DataStax Proxy for DynamoDB™ and Apache Cassandra.

This tool enables developers to run their AWS DynamoDB™ workloads on Cassandra. With this proxy, developers can run DynamoDB workloads on-premises to take advantage of the hybrid, multi-model, and scalability benefits of Cassandra.

These tools highlight our commitment to open source and will help countless Cassandra developers build transformative software solutions and modern applications in the months and years ahead. 

Let’s explore each of them briefly.

1. DataStax Insights (Cassandra Performance Monitoring)

Everyone who uses Cassandra—whether they’re developers or operators—stands to benefit from DataStax Insights, a next-generation performance management and monitoring tool available for DataStax Constellation, DataStax Enterprise, and open source Cassandra 3.x and higher.

We’re now offering free sign-ups for DataStax Insights (put simply: Cassandra monitoring), giving Cassandra users an at-a-glance health index and a single view of all of their clusters. The tool also helps users optimize their clusters, using AI to recommend solutions to issues, highlight anti-patterns, and identify performance bottlenecks, among other things.

DataStax Insights is free for all Cassandra users for up to 50 nodes and includes one week of rolling retention. Interested in joining the DataStax Insights early access program? We’re taking sign-ups now (more on that below). 


2. Unified Drivers

Historically, DataStax has maintained two sets of drivers: one for DataStax Enterprise and one for open source Cassandra users. Moving forward, we are merging these two sets into a single unified DataStax driver for each supported programming language, including C++, C#, Java, Python, and Node.js. As a result, each unified driver will work with both open source Cassandra and DataStax products.

This move simplifies driver choice, making it easier to determine which driver to use when building applications. At the same time, developers using the open source version of Cassandra will now have free access to advanced features that initially shipped with our premium solutions. Further, developers who previously used two different sets of drivers will now only need one driver for their applications across any DataStax platform and open source Cassandra. The unified drivers also bring enhanced load balancing and reactive streams support.
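For example, with the unified Java driver the same few lines connect to either an open source Cassandra cluster or a DataStax cluster; only the contact point changes. Here is a minimal sketch (the address, port, and data center name are placeholders):

  import java.net.InetSocketAddress;
  import com.datastax.oss.driver.api.core.CqlSession;

  public class UnifiedDriverExample {
      public static void main(String[] args) {
          // One driver, one API, whether the target is open source Cassandra or DSE.
          try (CqlSession session = CqlSession.builder()
                  .addContactPoint(new InetSocketAddress("127.0.0.1", 9042)) // placeholder contact point
                  .withLocalDatacenter("datacenter1")                        // placeholder data center name
                  .build()) {
              String version = session
                  .execute("SELECT release_version FROM system.local")
                  .one()
                  .getString("release_version");
              System.out.println("Connected; server reports version " + version);
          }
      }
  }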

3. Spring Boot Starter 

The DataStax Java Driver Spring Boot Starter, which is now available in DataStax Labs, streamlines the process of building standalone Spring-based applications with Cassandra and DataStax databases.

Developers will appreciate that this tool centralizes familiar configuration in one place while providing easy access to the Java Driver in Spring applications.
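As a rough sketch of what that looks like, assuming the starter auto-configures the driver’s CqlSession as a Spring bean (the service class and table below are invented for illustration), application code can simply inject the session:

  import com.datastax.oss.driver.api.core.CqlSession;
  import com.datastax.oss.driver.api.core.cql.Row;
  import com.datastax.oss.driver.api.core.cql.SimpleStatement;
  import org.springframework.stereotype.Service;

  @Service
  public class UserLookupService {

      private final CqlSession session; // injected; assumed to be provided by the starter's auto-configuration

      public UserLookupService(CqlSession session) {
          this.session = session;
      }

      public String findEmail(String userId) {
          // "users" is a hypothetical table; replace with your own schema.
          Row row = session.execute(
              SimpleStatement.newInstance("SELECT email FROM users WHERE id = ?", userId)).one();
          return row == null ? null : row.getString("email");
      }
  }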

It’s just one more way that makes the application development process easier.

4. DataStax Proxy for DynamoDB™ and Apache Cassandra™

With the DataStax Proxy for DynamoDB and Cassandra, developers can run DynamoDB workloads on-premises, taking advantage of the hybrid, multi-model, and scalability benefits of Cassandra.

The proxy is designed to enable users to back their DynamoDB applications with Cassandra. We determined that the best way to help users leverage this new tool and to help it flourish was to make it an open source Apache 2 licensed project.

The code consists of a scalable proxy layer that sits between your app and the database. It provides compatibility with the DynamoDB SDK which allows existing DynamoDB applications to read and write data to Cassandra without application changes.


Sign up for the DataStax Insights early access program today!

Are you interested in optimizing your on-premises or cloud-based Cassandra deployments using a platform that lets novices monitor and fine-tune their cluster performance like experts? 

If so, you may want to give DataStax Insights a try. 

We’re currently accepting sign-ups to our early access program. Click the button below to get started!

GET STARTED

DataStax Labs

DataStax Labs provides the Apache Cassandra™ and DataStax communities with early access to product previews and enhancements that are being considered for future production software, including tools, aids, and partner software designed to increase productivity. When you try out some of our new Labs technologies, we would love your feedback—good or bad—so let us know!

Making Ad Tech Work at Scale: MediaMath Innovates with Scylla


As it enters its 13th year of business, MediaMath is leading the charge to create an accountable, addressable, transparent supply chain that is more aligned to brands’ interests. MediaMath provides a globally scaled, enterprise-grade ad tech platform that delivers personalized content across “touchpoints” that include display ads, mobile, video, advanced TV, native, audio, and digital out-of-home.

Supporting advertisers in 42 countries around the world, MediaMath has customers in all verticals, including retail, consumer packaged goods, travel, and finance. Notable customers include IBM, Uber and a wide range of marquee clients who use MediaMath to elevate their marketing.

MediaMath plays a key role in the ad tech landscape by unifying two underlying platform technologies: a demand-side platform (DSP) and a data management platform (DMP).

A DSP is a system that enables marketers and agencies to buy media, globally, from a multitude of sources through open and private markets. A DSP enables marketers to optimize campaigns based on performance indicators like effective cost per click (eCPC), and effective cost per action (eCPA).

A DMP is essentially a data warehouse: it ingests, sorts, and stores information and presents it to marketers and publishers. DMPs are used for modeling, analyzing, and segmenting online customers in digital marketing.

MediaMath bridges data management and demand-side platforms

By unifying DMP and DSP, MediaMath is able to bridge data management and media activation. These combined capabilities enable programmatic buying strategies that scale campaigns and improve their overall performance.

MediaMath provides several other offerings. MediaMath Audiences is a data solution that identifies the best customers using predictive modeling. MediaMath Brain, a machine-learning algorithm, increases advertiser ROI by making millions of buying decisions per second in real time.

Programmatic advertising at global scale presents significant technical hurdles. MediaMath’s customers expect a real-time response to campaign activity along with segmented audiences at scale. The sheer data volume from media touchpoints can be staggering. The underlying technologies need to support massive throughput, measured in transactions per second. Bid-matching analytics must support queries with real-time operational performance characteristics.

Initially, MediaMath had 60 nodes running Apache Cassandra on AWS servers. The number of nodes and the complexity of Cassandra were too much, operationally, for the MediaMath team. Just keeping Cassandra up and running took three full-time site reliability engineers. When nodes went down, restoring them was an intensive manual process. Combined with the testing and maintenance drudgery around compactions, JVM garbage collection, and constant tuning, Cassandra soon wore out its welcome.

MediaMath found out about Scylla on Reddit. An engineer on the team read about this new drop-in replacement for Cassandra. Noting how Scylla is implemented in C++ instead of Java, he evangelized it inside MediaMath.

Deciding to take a look, the team settled on two fundamental criteria. First, the new database needed to deliver on its claim of Cassandra compatibility. Second, it needed to prove out performance benchmarks. “We wanted to make sure that what we were switching onto was not going to incur a lot of development work on our side, and that worked out great,” said Knight Fu, Director of Engineering at MediaMath.

The installation and evaluation process went smoothly. “Of all of the database migrations I’ve worked on in my career, Scylla was the smoothest, for sure,” Knight noted. “There weren’t any tooling changes that were required for our automation either, which was a huge plus. From our point of view, one day it was Cassandra, and the next day it was Scylla. Seemed as though it was as easy as that.”

A key driver in the decision to go with Scylla is its lower operational overhead. Ultimately, Scylla enabled MediaMath to realize efficiency gains and pivot resources to focus on other important projects.

Today, MediaMath runs 17 Scylla nodes on i3.metal AWS instances, handling about 200,000 events per second and performing about a million reads per second. Read latency is consistently below the company’s required 10-millisecond threshold.

“With Scylla, our uptime was tremendous throughout the last holiday. We saw 99.9999% availability with Scylla.”

One of the main benefits for MediaMath is Scylla’s compatibility with Cassandra. “Our access pattern into the database was relatively new, so being able to keep our schemas was a huge benefit. Not having to retool monitoring and active node management when going from Cassandra to Scylla was also really valuable.”

Emera Trujillo is Senior Product Manager at MediaMath. From her perspective, Scylla has helped MediaMath enhance the data management platform service for their customers. “Thanks to Scylla, we’re increasing data retention in our segmentation product, which lets us deliver new value to clients without increasing prices,” she said.

Want to learn more about Scylla first-hand and hear other success stories from our growing user base? Come meet us at Scylla Summit 2019!

REGISTER FOR SCYLLA SUMMIT

The post Making Ad Tech Work at Scale: MediaMath Innovates with Scylla appeared first on ScyllaDB.

Top 5 Reasons to Choose Apache Cassandra Over DynamoDB

Overview

DynamoDB and Apache Cassandra are both very popular distributed data store technologies. Both are used successfully in many applications and production-proven at phenomenal scale. 

At Instaclustr, we live and breathe Apache Cassandra (and Apache Kafka!). We have many customers at all levels of size and maturity who have built successful businesses around Cassandra-based applications. Many of those customers have undertaken significant evaluation exercises before choosing Cassandra over DynamoDB and several have migrated running applications from DynamoDB to Cassandra. 

This blog distills the top reasons that our customers have chosen Apache Cassandra over DynamoDB.

Reason 1: Significant Cost of Write to DynamoDB

For many use cases, Apache Cassandra can offer a significant cost saving over DynamoDB. This is particularly true for write-heavy requirements: the cost of a write to DynamoDB is five times the cost of a read (reflected directly in your AWS bill), while for Apache Cassandra writes are several times cheaper than reads (reflected in system resource usage). To put that ratio in perspective, a workload split evenly between reads and writes would spend roughly five-sixths of its DynamoDB request costs on the writes alone.

Reason 2: Portability

DynamoDB is available in AWS and nowhere else. For multi-tenant SaaS offerings where only a single instance of the application will ever exist, being all-in on AWS is not a major issue. However, many applications, for a lot of good reasons, still need to be installed and managed on a per-customer basis, and many customers (often the largest ones!) will not want to run on AWS. Choosing Cassandra allows your application to run anywhere you can run a Linux box.

Reason 3: Design Without Having to Worry About Pricing Models

DynamoDB’s pricing is complex, with two different pricing models and multiple pricing dimensions. Applying the wrong pricing model, or designing your architecture without considering pricing, can result in order-of-magnitude differences in cost. It also means that a seemingly innocuous change to your application can dramatically impact your bill. With Apache Cassandra, you have your infrastructure and you know your management fees; once you have completed performance testing and know that your infrastructure can meet your requirements, you know your costs.

Reason 4: Multi-Region Functionality

Apache Cassandra was the first NoSQL technology to offer active-active multi-region support. While DynamoDB has added Global Tables, these have a couple of key limitations when compared to Apache Cassandra. The most significant in many cases is that you cannot add replicas to an existing global table. So, if you set up in two regions and then decide to add a third you need to completely rebuild from an empty table. With Cassandra, adding a region to a cluster is a normal, and fully online, operation. Another major limitation is that DynamoDB only offers eventual consistency across Global Tables, whereas Apache Cassandra’s tunable consistency levels can enforce strong consistency across multiple regions.
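As a rough illustration of what this looks like in practice with the DataStax Java driver (the keyspace, table, and data center names below are placeholders, not a recipe for your cluster): extending a keyspace’s replication to a new data center is a single schema change, and cross-region consistency is just a per-statement setting.

  import com.datastax.oss.driver.api.core.CqlSession;
  import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
  import com.datastax.oss.driver.api.core.cql.SimpleStatement;

  public class MultiRegionSketch {
      public static void expandAndWrite(CqlSession session) {
          // Add a third region by extending the keyspace's replication map; the new
          // data center is then populated with the normal, fully online rebuild process.
          session.execute(
              "ALTER KEYSPACE app WITH replication = {'class': 'NetworkTopologyStrategy', "
            + "'us_east': 3, 'eu_west': 3, 'ap_south': 3}");

          // Tunable consistency: EACH_QUORUM requires a quorum of replicas in every
          // data center to acknowledge the write.
          session.execute(
              SimpleStatement.newInstance(
                      "INSERT INTO app.events (id, payload) VALUES (uuid(), 'example')")
                  .setConsistencyLevel(DefaultConsistencyLevel.EACH_QUORUM));
      }
  }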

Reason 5: Avoiding Vendor Lock-In

Apache Cassandra is true open source software, owned and governed by the Apache Software Foundation to be developed and maintained for the benefit of the community and able to be run in any cloud or on-premise environment. DynamoDB is an AWS proprietary solution that not only locks you in to DynamoDB but also locks your application to the wider AWS ecosystem. 

While these are the headline reasons that people make the choice of Apache Cassandra over DynamoDB, there are also many advantages at the detailed functional level such as:

  • DynamoDB’s capacity is limited per partition, with a maximum of 1,000 write capacity units and 3,000 read capacity units per partition. Cassandra’s capacity is distributed per node, which typically provides a per-partition limit orders of magnitude higher than this.
  • Cassandra’s CQL query language provides a simple learning curve for developers familiar with SQL.
  • DynamoDB only allows single-value partition and sort (called clustering in Cassandra) keys, while Cassandra supports multi-part keys. A minor difference, but another way Cassandra reduces application complexity.
  • Cassandra supports aggregate functions, which in some use cases can provide significant efficiencies. (Both of these are illustrated in the sketch after this list.)
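To make those last two points concrete, here is a minimal sketch using the DataStax Java driver. The keyspace, table, and values are invented for illustration (and it assumes the keyspace already exists); it simply shows a multi-part primary key (a composite partition key plus a clustering column) and a server-side aggregate query.

  import com.datastax.oss.driver.api.core.CqlSession;

  public class MultiPartKeyExample {
      public static void run(CqlSession session) {
          // Composite partition key (sensor_id, day) plus a clustering column (event_time).
          // In DynamoDB you would have to concatenate sensor_id and day into a single
          // partition-key attribute yourself.
          session.execute(
              "CREATE TABLE IF NOT EXISTS metrics.readings ("
            + "  sensor_id text, day date, event_time timestamp, temperature double,"
            + "  PRIMARY KEY ((sensor_id, day), event_time))");

          // Built-in aggregates are evaluated server-side over the partition.
          session.execute(
              "SELECT count(*), avg(temperature) FROM metrics.readings "
            + "WHERE sensor_id = 'sensor-42' AND day = '2019-09-01'");
      }
  }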

 

The post Top 5 Reasons to Choose Apache Cassandra Over DynamoDB appeared first on Instaclustr.

Scylla Monitoring Stack 3.0.1 – Alternator Dashboard

Scylla Monitoring Stack Release Notes

The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.0.1.

Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.0 supports:

  • Scylla Open Source versions 2.3, 3.0 and 3.1
  • Scylla Enterprise versions 2018.x and 2019.x
  • Scylla Manager 1.4.x

Scylla Monitoring 3.0.1 adds support for ScyllaDB’s new Amazon DynamoDB compatible API (Project Alternator). To use the new Alternator dashboard:

  1. Upgrade Scylla Monitoring Stack to version 3.0.1
  2. Run start_all.sh with the -v master option (Alternator is not part of a formal Scylla release yet).

Scylla Monitoring 3.0.1 does not fix or add any other functionality to 3.0 beyond the new dashboard.

New Metrics in the Alternator dashboard

High-level counters:

  • scylla_alternator_total_operations – total number of operations via the Alternator API

Data Plane counters:

  • scylla_alternator_operation – number of requests for operations:
    • GetItem
    • PutItem
    • UpdateItem
    • DeleteItem
    • BatchWriteItem
    • Query
    • Scan

Data Plane latencies:

  • scylla_alternator_op_latency – Latency histogram for operations:
    • GetItem
    • PutItem
    • UpdateItem
    • DeleteItem

Control Plane counters:

  • scylla_alternator_operation – number of requests for operations:
    • CreateTable
    • DeleteTable
    • DescribeTable
    • ListTables
    • DescribeEndpoints

The post Scylla Monitoring Stack 3.0.1 – Alternator Dashboard appeared first on ScyllaDB.

DataStax Proxy for DynamoDB™ and Apache Cassandra™ – Preview

Yesterday at ApacheCon, our very own Patrick McFadin announced the public preview of an open source tool that enables developers to run their AWS DynamoDB™ workloads on Apache Cassandra. With the DataStax Proxy for DynamoDB and Cassandra, developers can run DynamoDB workloads on premises, taking advantage of the hybrid, multi-model, and scalability benefits of Cassandra.

The Big Picture

Amazon DynamoDB is a key-value and document database which offers developers elasticity and a zero-ops cloud experience. However, the tight AWS integration that makes DynamoDB great for cloud is a barrier for customers that want to use it on premises.

Cassandra has always supported key-value and tabular data sets, so supporting DynamoDB workloads just meant that DataStax customers needed a translation layer on top of their existing storage engine.

Today we are previewing a proxy that provides compatibility with the DynamoDB SDK, allowing existing applications to read/write data to DataStax Enterprise (DSE) or Cassandra without any code changes. It also provides the hybrid + multi-model + scalability benefits of Cassandra to DynamoDB users.
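To make “without any code changes” concrete, here is a minimal sketch with the AWS SDK for Java: the only thing that moves is the client’s endpoint. The proxy address, port, table, and dummy credentials below are placeholders and assumptions, not values taken from the project’s documentation.

  import java.util.HashMap;
  import java.util.Map;

  import com.amazonaws.auth.AWSStaticCredentialsProvider;
  import com.amazonaws.auth.BasicAWSCredentials;
  import com.amazonaws.client.builder.AwsClientBuilder;
  import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
  import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
  import com.amazonaws.services.dynamodbv2.model.AttributeValue;

  public class ProxySketch {
      public static void main(String[] args) {
          // A standard DynamoDB SDK client, pointed at the proxy instead of AWS.
          AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
              .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                  "http://localhost:8080", "us-east-1"))          // placeholder proxy endpoint and region
              .withCredentials(new AWSStaticCredentialsProvider(
                  new BasicAWSCredentials("dummy", "dummy")))     // assumed to be ignored by the proxy
              .build();

          // Everything below is unchanged DynamoDB application code; the data lands in Cassandra.
          Map<String, AttributeValue> item = new HashMap<>();
          item.put("artist", new AttributeValue("Nirvana"));      // hash key (hypothetical schema)
          item.put("song", new AttributeValue("Lithium"));        // sort key
          client.putItem("music", item);
      }
  }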

If you’re just here for the code you can find it in GitHub and DataStax Labs: https://github.com/datastax/dynamo-cassandra-proxy/

Possible Scenarios

Application Lifecycle Management: Many customers develop on premises and then deploy to the cloud for production. The proxy enables customers to run their existing DynamoDB applications using Cassandra clusters on-prem.

Hybrid Deployments: DynamoDB Streams can be used to enable hybrid workload management and transfers from DynamoDB cloud deployments to on-prem Cassandra-proxied deployments. This is supported in the current implementation and, like DynamoDB Global Tables, it uses DynamoDB Streams to move the data. For hybrid transfer to DynamoDB, check out the Cassandra CDC improvements, which could be leveraged, and stay tuned to the DataStax blog for updates on our Change Data Capture (CDC) capabilities.

What’s in the Proxy?

The proxy is designed to enable users to back their DynamoDB applications with Cassandra. We determined that the best way to help users leverage this new tool and to help it flourish was to make it an open source Apache 2 licensed project.

The code consists of a scalable proxy layer that sits between your app and the database. It provides compatibility with the DynamoDB SDK which allows existing DynamoDB applications to read and write data to Cassandra without application changes.

How It Works

We made a few key decisions when designing the proxy. As always, these are in line with the design principles that guide development for both Cassandra and our DataStax Enterprise product.

Why a Separate Process?

We could have built this as a Cassandra plugin that would execute as part of the core process but we decided to build it as a separate process for the following reasons:

  1. Ability to scale the proxy independently of Cassandra
  2. Ability to leverage k8s / cloud-native tooling
  3. Developer agility and to attract contributors—developers can work on the proxy with limited knowledge of Cassandra internals
  4. Independent release cadence, not tied to the Apache Cassandra project
  5. Better AWS integration story for stateless apps (i.e., leverage CloudWatch alarm, autoscaling, etc.)

Why Pluggable Persistence?

On quick inspection, DynamoDB’s data model is quite simple. It consists of a hash key¹, a sort key, and a JSON structure which is referred to as an item. Depending on your goals, the DynamoDB data model can be persisted in Cassandra Query Language (CQL) in different ways (one possible mapping is sketched after the list below). To allow for experimentation, we have built the translation layer in a pluggable way that allows for different translators. We continue to build on this scaffolding to test out multiple data models and determine which are best suited for:

  1. Different workloads
  2. Different support for consistency / linearization requirements
  3. Different performance tradeoffs based on SLAs
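As an illustration of what one such translator might do (a hypothetical mapping for the sake of the example, not necessarily the scheme the proxy ships with), the simplest possible translation keys a table by the hash and sort keys and stores the whole item as JSON text:

  import com.datastax.oss.driver.api.core.CqlSession;
  import com.datastax.oss.driver.api.core.cql.SimpleStatement;

  public class NaiveTranslatorSketch {
      public static void writeItem(CqlSession session, String table,
                                   String hashKey, String sortKey, String itemJson) {
          // Hypothetical shape: hash key -> partition key, sort key -> clustering key,
          // and the entire DynamoDB item serialized as a single JSON text column.
          session.execute(
              "CREATE TABLE IF NOT EXISTS dynamo." + table + " ("
            + "  hash_key text, sort_key text, item text,"
            + "  PRIMARY KEY ((hash_key), sort_key))");

          session.execute(SimpleStatement.newInstance(
              "INSERT INTO dynamo." + table + " (hash_key, sort_key, item) VALUES (?, ?, ?)",
              hashKey, sortKey, itemJson));
      }
  }

A mapping like this is trivial to write to, but it makes per-attribute reads, conditional updates, and secondary indexes awkward, which is exactly the kind of tradeoff the pluggable translators are meant to explore.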

Conclusion

If you have any interest in running DynamoDB workloads on Cassandra, take a look at the project. Getting started is easy and spelled out in the readme and DynamoDB sections. Features supported by the proxy are quickly increasing and collaborators are welcome.

https://github.com/datastax/dynamo-cassandra-proxy/

All product and company names are trademarks or registered trademarks of their respective owners. Use of these trademarks does not imply any affiliation with or endorsement by the trademark owner.


¹ Often in the DynamoDB documentation, this key is referred to as a partition key, but since these are not one-to-one with DynamoDB partitions we will use the term hash key instead.

Migrating from DynamoDB to Scylla’s DynamoDB-compatible API

This week we announced our Project Alternator, open-source software that will enable application- and API-level compatibility between Scylla and Amazon DynamoDB. Available for use with Scylla Open Source, Alternator will let DynamoDB users easily migrate to an open source database that runs anywhere — on any cloud platform, on-premises, on bare-metal, virtual machines or Kubernetes — without having to change their client code.

Of course, applications operate on data, and the data that is in DynamoDB has to first be migrated to Scylla so that users can take full advantage of this new capability. Scylla already supports migrations from Apache Cassandra by means of the scylla-migrator project, an Apache Spark-based tool that we’ve previously described.

This article presents the extensions made to the Scylla Migrator to also support data movement from an existing DynamoDB installation to Scylla. The idea is simple: a cluster of Spark workers reads the source database (DynamoDB) in parallel and writes in parallel to the target database, Scylla.

Running the Migrator

The latest copy of the scylla-migrator, which already contains the DynamoDB-compatible API bindings, can be found here. For the latest steps, always refer to the bundled README.dynamodb.md.

The first step is to build it. Make sure a recent copy of sbt is properly installed on your machine, and run build.sh.

Next, the connection properties for DynamoDB have to be configured. The migrator ships with an example file, so we can start by copying the example to a configuration file:

cp config.dynamodb.yaml.example config.dynamodb.yaml

Most importantly, we will configure the source and target sections to describe our source DynamoDB cluster and target Scylla cluster.

The first thing to notice is that the target configuration is simpler; since we will be connecting to a live Scylla cluster, not to an AWS service, there is no need to provide access keys (for public clusters, usernames and passwords can be provided instead).

The next step is to submit the Spark application. The application is submitted in the standard way; the migrator uses the spark.scylla.config variable to pass on its parameters.

An Example Migration

To test and demonstrate the migrator, we have created an example table in DynamoDB and filled it with data.

Be sure to pay attention to the throughput variables and the number of splits and mappers, and set them based on how much of your provisioned capacity you can afford to consume. The scan_segments setting controls the split size (how many tasks there will be, dividing your data into smaller chunks for processing), and max_map_tasks controls the maximum number of map tasks for the map-reduce job that migrates the data.

Tuning those two variables up will generally require more resources on the Spark worker side, so make sure you allocate them, e.g. by using the spark-submit option --conf "spark.executor.memory=100G" and a similar setting for CPUs.

Since the migrator is just a Spark application, it can be monitored using the Spark web UI:

Figure 1: Workers and Running Applications

Figure 2: Active Stages

Verifying the Migration

To verify the migration is successful, we can first match the count of keys from DynamoDB. In this case, we have 7,869,600 keys:

Figure 3: Table Details

When finished, the spark tasks will print messages to their logs with the number of keys that each task transferred:

These counts, as we expect, sum up to 7,869,600, the same number of keys we had in the source.

We have also queried some random keys from both databases, since they now accept the same API. We can see that all of the randomly sampled keys are present on both sides:

Script:

Results:

Future work

Much like the DynamoDB-compatible API in Scylla itself, the migrator is a work in progress. In live migrations without downtime, there is usually a need to perform dual writes, a technique in which the application is temporarily and surgically changed so that new writes are sent to both databases while the old data is scanned and written to the new database. However, DynamoDB supports a well-rounded streams interface through which an application can listen for changes. The streams API can be used to facilitate migrations without explicitly coding for dual writes. That work is in progress in the migrator and is expected to debut soon.

Next Steps

If you want to learn more about Project Alternator, we have a webinar that is scheduled for September 25, 10 AM Pacific, 1 PM Eastern. And you can also check out the Project Alternator Wiki.

REGISTER NOW FOR THE ALTERNATOR WEBINAR

READ THE PROJECT ALTERNATOR WIKI

The post Migrating from DynamoDB to Scylla’s DynamoDB-compatible API appeared first on ScyllaDB.

Why Developing Modern Applications Is Getting Easier

Historically, software was monolithic. In most cases, development teams would have to rewrite or rebuild an entire application to fix a bug or add a new feature. Building applications with any sense of speed or agility was largely out of the question, which is why software suites like Microsoft Office were generally released once a year.

Much has changed over the last decade or so. In the age of lightning-fast networks and instant gratification, leading software development teams are adopting DevOps workflows and prioritizing CI/CD so they can pump out stronger software releases much faster and much more frequently. 

Monthly, weekly, or even more frequent releases, for example, are becoming something closer to the norm. 

This accelerated release process is the result of the fact that—over several years—it’s become much easier to develop applications. 

Today, many engineering teams are utilizing new technologies to build better applications in less time, developing software with agility. Let’s take a look at four of the key technologies that have largely transformed the development process in recent years. 

1. Microservices

Microservices enable development teams to build applications that—you guessed it—are made up of several smaller services. 

Compared to the old-school monolithic approach, microservices speed up the development process considerably. Engineers can scale microservices independently of one another; updating or adding a feature no longer requires an entire rewrite of an application. 

Beyond that, microservices also bring more flexibility to developers. For example, developers can use their language of choice, building one service in Java and another in Node.js. 

The speed, flexibility, and agility microservices bring to the table have made it much easier to develop modern applications. Add it all up, and it comes as no surprise that a recent survey found that 91 percent of companies are using or plan to use microservices today.

2. Containers

Containers (think Docker) go hand-in-hand with microservices. Using containers, developers can create, deploy, and run applications in any environment.

At a very basic level, containers let developers “package” an application’s code and dependencies together as one unit. Once that package has been created, it can quickly be moved from a container to a laptop to a virtual server and back again. Containers enable developers to start, create, copy, and spin down applications rapidly.

It’s even easier to build modern applications with containers when you use Kubernetes to manage containerized workloads and services.

3. Open source tools

Docker and Kubernetes are both open source. So are Apache Cassandra™, Prometheus, and Grafana. There’s also Jenkins, which helps developers accelerate CI/CD workflows. With Jenkins, engineering teams can use automation to safely build, test, and deploy code changes, making it easier to integrate new features into any project.

Open source tools simplify the development process considerably. With open source, engineering teams get access to proven technologies that are built collaboratively by developers around the world to improve the coding process.

Not only does open source provide access to these tools, but popular open source projects also have robust user communities that developers can turn to when they get stuck on something.

4. Hybrid cloud

More and more companies are building applications in hybrid cloud environments because it enables them to leverage the best of what both the public and private cloud have to offer. 

For example, with hybrid cloud, you get the scalability of the public cloud while being able to use on-premises or private cloud resources to keep sensitive data secure (e.g., for HIPAA or GDPR compliance). What’s more, hybrid cloud also increases availability. In the event one provider gets knocked offline, application performance remains unchanged—so long as you have the right database in place.

The same sentiment holds true for multi-cloud or intercloud environments where organizations use several different cloud vendors to take advantage of each of their strengths, avoid vendor lock-in, or reduce the risk of service disruption. 

How does your development process compare?

If you’re not using microservices, containers, open source tools, and hybrid cloud environments to build applications, it’s time to reconsider your approach. 

The rise of these new technologies has given development teams the ability to pivot at a moment’s notice, incorporating user feedback to build new features and respond to incidents quickly and effectively.

Give them a try. It’s only a matter of time before you’ll start wondering why you didn’t think of it sooner.

Four Key Technologies That Enable Microservices (white paper)

READ NOW

Scylla Alternator: The Open Source DynamoDB-compatible API

About Scylla’s Alternator Project

Alternator is an open source project that gives Scylla compatibility with Amazon DynamoDB™.

Our goal is that any application written for Amazon DynamoDB could be run, unmodified, against Scylla with Alternator enabled. Originally, Scylla began as a re-implementation of Apache Cassandra, and it has since proven to be a solid database engine with key performance and TCO benefits over Cassandra. However, we always considered Cassandra to be just a starting point. Now a five-year-old project, Scylla is able to scale to hundreds of machines, petabytes of data, and many regions and availability zones.

Scylla can easily run millions of operations per second at ≤1 msec latency for 99% of requests on a single node. We can map different workloads to roles and prioritize them, thus allowing users to combine analytics and real-time transactional workloads on the same cluster.

Now that we’ve met our first major goal of creating a better Cassandra (along with a fully managed cloud version of Scylla), we’ve decided it’s time to add new APIs. A DynamoDB-compatible API makes a lot of sense as the Dynamo paper played a major role in the design of DynamoDB, Cassandra and, of course, Scylla. (Aren’t we all one big competitive family?) The three databases have a high availability design and offer similar functionality even prior to this recent enhancement.

There are three key benefits Scylla’s Alternator brings to DynamoDB users:

  • Cost: DynamoDB is very convenient when consumed as a service. When you run it at small scale, it just works and you don’t have to worry about database administration. At scale, however, the sub-cent per-operation charges add up to a shockingly large amount of money. Scylla’s efficiency allows you to use significantly fewer resources for the same task or workload. According to our benchmark, one can expect to save 80% to 93% to support the same workload (5x to 14x less expensive)!
  • Performance: Scylla was implemented in modern C++ with a lot of expertise dedicated to details such as instructions per cycle, lock-free data structures, a log-structured cache, heat-based load balancing, workload prioritization, userspace schedulers, and much more. It all adds up to huge improvements in latency and throughput, and it lets Scylla scale vertically to modern servers with hundreds of cores per machine. Even when data is not evenly balanced (“hot node” and “hot partition” problems) you can still enjoy solid performance. It’s also worth noting that Scylla does not require an expensive cache such as DAX in front of it.
  • Openness! Scylla is open source. You can run it everywhere for free. Scylla can run in any possible deployment: on-prem, hybrid cloud, multi-cloud, containerized, virtualized, bare metal, etc. You won’t be locked in to one vendor’s expensive cloud platform. Read more below.

Alternator Webinar

My co-founder Avi Kivity and I are going to give an in-depth look into Scylla Alternator via a webinar on Wednesday, September 25, 2019 at 10 AM Pacific, 1 PM Eastern. You won’t want to miss it!

REGISTER NOW FOR THE ALTERNATOR WEBINAR

Open Source First

Our goal at Scylla is to become the default NoSQL database by giving developers more and better choices for their deployments. With years of open source development experience, we are strong believers in this path and have thus decided to release our Alternator code to the world first, before an official product release. This is a proven method to get a better product out the door with constant, bidirectional feedback from knowledgeable developers.

A global project should be open, a property that all parties enjoy and benefit from. In a world where cloud providers routinely commercialize Open Source Software (OSS), leaving little space for the OSS vendor who then, in turn, begins to blur the lines of open source licenses, Scylla remains devoted to OSS. Our chosen OSS licensing model, AGPL, encourages contribution back plus provides reasonable support for the retention and longevity of the OSS business model.

It’s not just about avoiding lock-in: by being OPEN, one can trace the system. You can analyze the query path. Scylla has wonderful observability, with hundreds of different Prometheus metrics. You no longer need to treat the database as a black box. Plus, if you need a feature or see a bug and want to contribute, you can extend or fix the code! Now, if your team doesn’t have that sort of technical depth, you can always run on Scylla Cloud, our fully managed cloud platform. But if you do have a deep bench on your team (and many of our customers do), we give you all the tooling you need to get maximum performance out of your database.

Open source traditionally disrupts commercial vendors. Not only is disruption what we do best; truth be told, we also enjoy it!

The Current Status of Alternator

Alternator is still in development. It is not yet generally available and we haven’t created a product release with it. However, it is part of the Scylla source code right now.

Our alternator.md and design doc provide detailed information about what is and is not yet supported today. In short, most standard applications will just work: the JSON HTTP API is mostly implemented, indexing works, multiple availability zones are supported, and much more already works. There are consistency differences that arise from the fact that DynamoDB itself has a leader/follower model versus the active-active model that Scylla implements, which may be an issue in certain cases. For anyone looking to use Alternator, we advise you to first take a close look at the documentation and at your code.
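For a sense of what running unmodified means in practice, here is a minimal sketch with the AWS SDK for Java v2, where the only change to an existing application is the endpoint override. The node address and port are placeholders for wherever your Alternator-enabled Scylla node is listening, and the dummy credentials are an assumption.

  import java.net.URI;

  import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
  import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
  import software.amazon.awssdk.regions.Region;
  import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
  import software.amazon.awssdk.services.dynamodb.model.ListTablesResponse;

  public class AlternatorSketch {
      public static void main(String[] args) {
          // A standard DynamoDB client, pointed at a Scylla node with Alternator enabled.
          DynamoDbClient ddb = DynamoDbClient.builder()
              .endpointOverride(URI.create("http://scylla-node-1:8000"))   // placeholder host and port
              .region(Region.US_EAST_1)                                    // required by the SDK; otherwise unused
              .credentialsProvider(StaticCredentialsProvider.create(
                  AwsBasicCredentials.create("none", "none")))             // assumed placeholder credentials
              .build();

          // Plain DynamoDB SDK calls from here on.
          ListTablesResponse tables = ddb.listTables();
          System.out.println("Tables served by Alternator: " + tables.tableNames());
      }
  }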

Roadmap

Within the next few months we plan to harden the code, bringing it to production quality by putting it through our robust quality assurance cycles. We will also completely implement all of the nitty-gritty API differences. In the future, we will offer a Scylla Enterprise release containing the Alternator software and also release a version to run on Scylla Cloud.

Future updates to our managed DBaaS will run on Azure and GCP, but you won’t have to wait: as soon as Alternator is released, you will be able to run it on your own Amazon, Azure, and Google Cloud instances. We also plan on having a General Availability (GA) version for our Kubernetes operator so you can fully deploy and manage a DynamoDB-compatible database wherever you wish.

We will also address the load balancing differences between Scylla and DynamoDB clients. Unlike Scylla, where the client is topology aware and can access any node, DynamoDB clients receive a load-balanced DNS translation whose time-to-live (TTL) is roughly 4 seconds.

When Scylla’s workload prioritization feature is enabled, developers can assign more resources to crucial workloads, cap less important workloads, and so, for example, mix analytics and real time operational loads on the same cluster.

At our Scylla Summit (November 5-6, 2019) we plan to release Scylla’s first Lightweight Transaction (LWT) feature. Initially based on Cassandra’s Paxos, LWT will allow us to add consistency to Alternator. In parallel, we are working on a Raft version that has a leader/follower model and better performance.

We’re also going to announce our own streaming feature, Change Data Capture (CDC), and we may add compatibility with DynamoDB Streams later on.

In another upcoming release, Scylla will introduce User Defined Functions (UDFs) and, soon after, map-reduce computation based on them. UDFs are far more efficient than serverless alternatives since the functions are executed on the server.

To make the switch as easy as possible, we’re working on online migration tools that will be relatively simple to use: just start streaming the changes from DynamoDB and run a full scan. The Scylla Spark Migrator project will be enhanced to support the DynamoDB-compatible API.

Beyond DynamoDB Compatibility

Interested in more protocols? Drop us a GitHub pull request or open a GitHub issue. As Scylla looks to apply our expertise beyond traditional NoSQL environments, it makes more and more sense to add new functionality and protocols. For example, we’ve already been asked (nicely) for a Redis API (“[feature] add redis protocol for drop-in replacement redis”) and, guess what? There is already work in progress to merge the Pedis project (“Parallel Redis,” built on Seastar) into Scylla.

You can find out more and stay current with the status on the new Project Alternator home page. And for those who want more details, you can register for our upcoming webinar, all about Project Alternator.

We have longer-term plans, but we’ll leave those for another day and get back to coding. If you’ve enjoyed reading this, add a GitHub star or take us for a spin using docker-compose with our step-by-step instructions.

READ MORE ABOUT SCYLLA’S PROJECT ALTERNATOR

The post Scylla Alternator: The Open Source DynamoDB-compatible API appeared first on ScyllaDB.