What’s Next on ScyllaDB’s Path to Strong Consistency
We explored the Raft capabilities introduced in ScyllaDB 5.2 in the blog ScyllaDB’s Path to Strong Consistency: A New Milestone. Now let’s talk about what’s coming next: Raft-based consistent and centralized topology management. Topology changes will be safe and fast: we won’t be bound by ring delay any longer, and operator mistakes won’t corrupt the cluster.
Limitations We Plan to Address
Right now, our documentation says that:
- You can’t perform more than one topology operation at a time, even when the cluster is empty
- A failure during a topology change may lead to lost writes
- Topology operations take time, even if streaming is quick
The topology change algorithm uses a sleep period, configured in ring_delay, to ensure that the new topology propagates through the cluster before it starts being used. During this period, all reads and writes based on the old topology are expected to have quiesced. However, if that is not the case, the topology operation moves on anyway, so a lost or resurrected write can corrupt the database state.
Cassandra calls this approach “probabilistic propagation.” In most cases, the cluster is aware of the new state by the end of ring delay. The approach’s key advantage is liveness: the algorithm operates within any subset of the original cluster. However, as discussed in the previous blog post, the price paid is slower information dissemination and consistency issues. And such extreme liveness is unnecessary anyway: given the scale ScyllaDB typically operates at, it’s OK to require a majority of nodes to be alive at any given time. Using Raft to propagate token ring information lets us establish a global order of state change events, detect when concurrent operations intersect, and provide transactional semantics for topology changes.
I am very happy that the Cassandra community has also recognized the problem and there is Cassandra work addressing the same issues.
Moving Topology Data to Raft
Here’s what we achieve by moving topology data to Raft:
- Raft group includes all cluster members
- Token metadata is replicated using Raft
- No stale topology
Let’s explore in more detail.
The Centralized Coordinator
In our centralized topology implementation, we were able, to a significant extent, to address all of the above. The key change that enabled the transition was a turnaround in the way topology operations are performed.
Before, the node joining the cluster would drive the topology change forward; if anything happened to that node, the DBA had to step in and address the failure. In the new implementation, a centralized process (which we call the topology change coordinator) runs alongside the Raft cluster leader node and drives all topology changes.
Before, the state of the topology, such as tokens, was propagated through gossip and eventually consistent. Now, the state is propagated through Raft, replicated to all nodes and is strongly consistent. A snapshot of this data is always available locally, so that starting nodes can quickly obtain the topology information without reaching out to the leader of the Raft group.
Linearizable Token Metadata
Even if nodes start concurrently, token metadata changes are now linearized. Also, the view of token metadata at each node does not depend on the availability of the owner of the tokens: node C is fully aware of node B’s tokens (even if node B is down) when it bootstraps.
Automatic Coordinator Failover
But what if the coordinator fails? Topology changes are multi-step sagas. If the coordinator dies, a saga could stall, the topology change could remain incomplete, and locks could be left held, blocking other topology changes. Automatic topology changes (e.g. load balancing) cannot wait for admin intervention, so we need fault tolerance.
A new coordinator follows the Raft group0 leader and takes over from where the previous one left off. The state of the process is kept in fault-tolerant, linearizable storage. As long as a quorum is alive, we can make progress.
Change in the Data Plane
There’s also a new feature, which we call fencing, on the read/write path. Each read or write in ScyllaDB is now signed with the current topology version of the coordinator performing the read or write. If the replica has newer, incompatible topology information, it responds with an error, and the coordinator refreshes its ring and re-issues the query to new, correct replicas. If the replica has older information, which is incompatible with the coordinator’s version, it refreshes its topology state before serving the query.
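To make this more concrete, here is a minimal sketch of the replica-side check. The class, field, and method names are invented for illustration; this shows the shape of the fencing idea, not ScyllaDB’s actual implementation.
// Illustrative sketch of replica-side fencing; the names are invented for this example.
public class FencingSketch {
    record Request(long topologyVersion, String query) {}
    record Response(boolean ok, String detail) {}

    static long localTopologyVersion = 5;

    static Response handle(Request request) {
        if (request.topologyVersion() < localTopologyVersion) {
            // The coordinator signed the request with stale topology: reject it so the
            // coordinator refreshes its ring and re-issues the query to the right replicas.
            return new Response(false, "stale topology; replica is at version " + localTopologyVersion);
        }
        if (request.topologyVersion() > localTopologyVersion) {
            // The replica is behind: refresh its topology state before serving the query.
            localTopologyVersion = request.topologyVersion(); // stands in for a real refresh
        }
        return new Response(true, "executed: " + request.query());
    }

    public static void main(String[] args) {
        System.out.println(handle(new Request(4, "SELECT ..."))); // rejected; coordinator must refresh
        System.out.println(handle(new Request(6, "SELECT ..."))); // replica catches up, then serves
    }
}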
We plan to extend the fencing all the way down to ScyllaDB drivers, which can now protect their queries with the most recent fencing token they know. This will make sure that drivers never perform writes based on an outdated topology, and, for schema changes, will make sure that a write into a newly created table never fails with “no such table” because the schema didn’t propagate yet.
With fencing in place, we are able to avoid waiting for ring delays during topology operations, making them quick, more testable, and more reliable. We also eliminate consistency anomalies that were previously possible during topology changes.
Summary: What to Expect Next
So let me summarize what we plan to deliver next:
- While you can’t perform multiple topology operations concurrently, *requesting* multiple operations will now also be safe. The centralized coordinator will queue the operations if they are compatible.
- Incorrect operations, such as removing and replacing the same node at the same time, will be rejected.
- If a node that is joining the cluster dies, the coordinator will notice and abort the join automatically.
In a nutshell, our main achievement is that topology changes will be safe and fast: we won’t be bound by ring delay any longer, and operator mistakes won’t corrupt the cluster.
ScyllaDB’s Path to Strong Consistency: A New Milestone
In ScyllaDB 5.2, we made Raft generally available and use it for the propagation of schema changes. Quickly assembling a fresh cluster, performing concurrent schema changes, updating nodes’ IP addresses – all of this is now possible.
And these are just the beginning of visible user changes for Raft-enabled clusters; next on the list are safe topology changes and automatically changing data placement to adjust to the load and distribution of data. Let’s focus on what’s new in 5.2 here, then we’ll cover what’s next in a follow up blog.
Why Raft?
If you’re wondering what the heck Raft is and what it has to do with strong consistency, strongly consider watching these two tech talks from ScyllaDB Summit 2022.
Making Schema Changes Safe with Raft
The Future of Consensus in ScyllaDB 5.0 and Beyond
But if you want to read a quick overview, here it goes…
Strong vs Eventual Consistency
Understanding a strongly consistent system is easy. It’s like observing two polite people talking: there is only one person talking at a time and you can clearly make sense of who talks after who and capture the essence of the conversation. In an eventually consistent system, changes such as database writes are allowed to happen concurrently and independently, and the database guarantees that eventually there will be an order in which they all line up.
Eventually consistent systems can accept writes to nodes that are partitioned away from the rest of the cluster. Strongly consistent systems require a majority of the nodes to acknowledge an operation (such as the write) in order to accept it. So, the tradeoff between strong and eventual consistency is in requiring the majority of the participants to be available to make progress.
ScyllaDB, as a Cassandra-compatible database, started as an eventually consistent system, and that made perfect business sense. In a large cluster, we want our writes to be available even if a link to the other data center is down.
Metadata Consistency
Apart from storing user data, the database maintains additional information about it, called metadata. Metadata consists of topology (nodes, data distribution) and schema (table format) information.
A while ago, ScyllaDB recognized that there is little business value in using the eventually consistent model for metadata. Metadata changes are infrequent, so we do not need to demand extreme availability for them. Yet we want to change metadata reliably and automatically to bring elasticity, which is hard to do with an eventually consistent model underneath.
Raft for Metadata Replication
This is when we embarked on a journey that involved Raft: an algorithm and a library we implemented to replicate any kind of information across multiple nodes.
Here’s how it works.
Suppose you had a program or an application that you wanted to make reliable. One way to do that is to execute that program on a collection of machines and ensure they all execute it in exactly the same way. A replicated log can help to ensure that these state machines (programs or applications that take inputs and produce outputs) execute exactly the same commands.
A client of the system that wants to execute a command passes it to one of these machines.
That command – let’s call it X – gets recorded in the log of the local machine, and it’s then passed to the other machines and recorded in their logs as well. Once the command has been safely replicated in the logs, it can be passed to the state machines for execution. And when one of the state machines is finished executing the command, the result can be returned back to the client program.
And you can see that as long as the logs on the state machines are identical, and the state machines execute the commands in the same order, we know they are going to produce the same results. So Raft is the algorithm that keeps the replicated log identical and in sync on all cluster members. Its consensus module ensures that the command is replicated and then passed to the state machine for execution.
The system makes progress as long as any majority of the servers are up and can communicate with each other (for example, 2 out of 3, or 3 out of 5).
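If you prefer code to prose, here is a toy sketch of the replicated state machine idea. It leaves out everything that makes Raft hard (leader election, log matching, quorum replication) and only shows the invariant that matters: identical logs applied deterministically produce identical state.
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the replicated-state-machine idea; not the actual Raft library.
public class ReplicatedLogSketch {
    static class StateMachine {
        private long counter = 0;                 // the "application state"
        private final List<String> log = new ArrayList<>();

        void append(String command) { log.add(command); }

        void applyAll() {
            // Deterministic execution of each command, in log order.
            for (String command : log) {
                if (command.startsWith("add ")) {
                    counter += Long.parseLong(command.substring(4));
                }
            }
        }

        long state() { return counter; }
    }

    public static void main(String[] args) {
        List<String> replicatedLog = List.of("add 5", "add 7", "add -2");

        StateMachine nodeA = new StateMachine();
        StateMachine nodeB = new StateMachine();
        // Raft's job is to make sure every node ends up with this same log.
        replicatedLog.forEach(nodeA::append);
        replicatedLog.forEach(nodeB::append);

        nodeA.applyAll();
        nodeB.applyAll();
        System.out.println(nodeA.state() + " == " + nodeB.state()); // 10 == 10
    }
}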
Again – we strongly recommend that you watch the videos above if you want more details.
The Benefits of Raft in ScyllaDB 5.2
In ScyllaDB 5.2, our first use of Raft – for propagation of schema changes – is generally available (GA). This brings you:
- A --consistent-cluster-management command-line option and a consistent_cluster_management scylla.yaml option
- A procedure to recover a cluster after a loss of a majority
- IP address change support
Let’s review each in turn.
Consistent Cluster Management
We’ve branched off ScyllaDB 5.2 with the consistent_cluster_management feature enabled for new clusters. We also implemented an online upgrade procedure to Raft for existing clusters, and machinery to recover existing clusters from a disaster.
What are the benefits of --consistent-cluster-management in ScyllaDB 5.2?
- It is now safe to perform concurrent schema change statements. Change requests don’t conflict, get overridden by “competing” requests, or risk data loss. Schema propagation happens much faster since the leader of the cluster is actively pushing it to the nodes. You can expect the nodes in a healthy cluster to learn about a new schema within a few milliseconds (it used to be a second or two).
- If a node is partitioned away from the cluster, it can’t perform schema changes. That’s the main difference, or limitation, from the pre-Raft clusters that you should keep in mind. You can still perform other operations with such nodes (such as reads and writes) so data availability is unaffected. We see results of the change not only in simple regression tests, but in our longevity tests which execute DDL. There are fewer errors in the log and the systems running on Raft are more stable when DDL is running.
Going GA means that this option is enabled by default in all new clusters. We achieve this by shipping a new default scylla.yaml with our installation.
If you’re upgrading an existing ScyllaDB installation, set consistent_cluster_management: true in all your scylla.yaml files, perform a rolling restart, and Raft will be enabled cluster-wide.
# Use Raft to consistently manage schema information in the cluster.
# Refer to https://docs.scylladb.com/master/architecture/raft.html for
# more details.
consistent_cluster_management: true
You can watch the progress of the upgrade operation in the scylla log. Look for raft_upgrade markers; as soon as the upgrade finishes, the group0_upgrade_state key in system.scylla_local changes to “use_post_raft_procedures”:
cqlsh> SELECT * FROM system.scylla_local WHERE key IN ('group0_upgrade_state', 'raft_group0_id');
key | value
----------------------+--------------------------------------
group0_upgrade_state | use_post_raft_procedures
raft_group0_id | a5e9e860-cccf-11ed-a674-c759a640dbb0
(2 rows)
cqlsh>
Support for Majority Loss
The new Raft-based support for disaster recovery provides a way to salvage data when you permanently lose the majority of your cluster. As a preface, note that a majority loss is considered a rare event and that the Raft recovery procedure should be followed only when the affected nodes are absolutely unrecoverable. Otherwise, follow the applicable node restore process for your situation. Be sure to check the Raft manual recovery procedure for additional details.
In a nutshell, we added a way to drop the state of consistent cluster management on each node and establish it from scratch in a new cluster. The key takeaway is that your Raft-based clusters, even though they don’t accept schema changes in a minority, are still usable even if your majority is permanently lost. We viewed this as a big concern for the users who are performing upgrades from their existing clusters. To help users see the current cluster configuration as recorded in Raft, we added a new system.raft_state table (it is actually a system-generated view) registering all the current members of the cluster:
cqlsh> SELECT * FROM system.raft_state ;
group_id | disposition | server_id | can_vote
--------------------------------------+-------------+--------------------------------------+----------
a5e9e860-cccf-11ed-a674-c759a640dbb0 | CURRENT | 44d1cfcf-26b3-40e9-bea9-99a2c639ed40 | True
a5e9e860-cccf-11ed-a674-c759a640dbb0 | CURRENT | 75c2f8c4-98fe-4b78-95fc-10fa29b0b5c5 | True
a5e9e860-cccf-11ed-a674-c759a640dbb0 | CURRENT | ad30c21d-805d-4e96-bb77-2588583160cc | True
(3 rows)
cqlsh>
Changing IPs
Additionally, we added IP address change support. To operate seamlessly in Kubernetes environments, nodes need to be able to start with existing data directories but different IP addresses. We added support for that in Raft mode as well.
If the IP addresses of just one or a few of your nodes change, you can simply restart those nodes with their new addresses and it will work. There’s no need to change the configuration files (e.g. seed lists).
You can even restart the cluster with all node IPs changed. Then, of course, the existing nodes need to learn each other’s new IP addresses somehow. To do so, update the seeds section of your scylla.yaml with the new node IPs; it will be used to discover the new peers, and the contents of system.peers will be updated automatically.
Moving to Raft-Based Cluster Management in ScyllaDB 5.2: What Users Need to Know
As mentioned above, all new clusters will be created with Raft enabled by default from 5.2 on. Upgrading from 5.1 will use Raft only if you explicitly enable it (see the upgrade docs). As soon as all nodes in the cluster opt in to using Raft, the cluster will automatically migrate those subsystems to using Raft (you should validate that this is the case).
Once Raft is enabled, every cluster-level operation – like updating schema, adding and removing nodes, and adding and removing data centers – requires a quorum to be executed.
For example, in the following use cases, the cluster does not have a quorum and will not allow updating the schema:
- A cluster with 1 out of 3 nodes available
- A cluster with 2 out of 4 nodes available
- A cluster with two data centers (DCs), each with 3 nodes, where one of the DCs is not available
This is different from the behavior of a ScyllaDB cluster with Raft disabled.
Nodes might be unavailable due to network issues, node issues, or other reasons. To reduce the chance of quorum loss, it is recommended to have 3 or more nodes per DC, and 3 or more DCs for a multi-DC cluster. Starting from 5.2, ScyllaDB provides guidance on how to handle nodetool and topology change failures.
To recover from a quorum loss, it’s best to revive the failed nodes or fix the network partitioning. If this is impossible, see the Raft manual recovery procedure.
Common Questions
To close, let me publicly address a few questions we’ve heard quite often:
How and when will ScyllaDB support multi-key transactions?
We haven’t been planning on distributed transactions yet, but we’re watching the development of the Accord algorithm with interest and enthusiasm. It’s quite possible that ScyllaDB will add transaction support, following in Cassandra’s footsteps, if their implementation is successful.
When will we be able to do concurrent topology changes?
Concurrent topology changes is a broad term; here, I will try to break it up.
First, it implies being able to bootstrap all nodes in a fresh cluster concurrently and quickly assemble such a cluster. This is what we plan to achieve in 5.3, at least in experimental mode.
Second, it implies being able to concurrently stream data from/to multiple nodes in the cluster, including dynamic streaming data for the purposes of load balancing. This is part of our tablets effort, and it is coming after 5.3.
Finally, it’s being able to add (or remove) multiple nodes at a time. Hopefully, with the previous features in place, that won’t be necessary. You will be able to start two nodes concurrently, and they will be added to the cluster quickly and safely but serially. Later, the dynamic load balancing will take care of populating them with data concurrently.
Will Raft eventually replace Gossip?
To a large extent, but not completely.
Gossip is an epidemic communication protocol. The word “epidemic” means that a node does not communicate with its peers directly, but instead chooses one or a few peers to talk to at a given time – and it exchanges the full body of information it possesses with those peers. The information is versioned, so whenever a peer receives an incoming ping, it updates each key for which the ping carries a newer revision. The peer then uses the updated data for its own pings, and this is how the information spreads quickly across all nodes, in an “epidemic” fashion.
The main advantage of Gossip is that it issues a linear (O(CLUSTER_SIZE)) number of messages per round rather than the quadratic (O(CLUSTER_SIZE^2)) number that would be needed if each node tried to reach out to all other nodes at once. As a downside, Gossip takes more time to disseminate information: typically, new information reaches all members in a logarithmic (O(log(CLUSTER_SIZE))) number of rounds.
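Here is a toy sketch of a single gossip round, just to illustrate the epidemic exchange of versioned state. The data model, cluster size, and round count are made up for the example and have nothing to do with ScyllaDB’s actual gossiper.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy sketch of epidemic gossip: each node keeps versioned key/value state and,
// once per round, merges state with one random peer, keeping the higher version.
public class GossipSketch {
    record Versioned(String value, long version) {}

    static class Node {
        final Map<String, Versioned> state = new HashMap<>();

        void merge(Map<String, Versioned> remote) {
            remote.forEach((key, incoming) -> state.merge(key, incoming,
                    (local, other) -> other.version() > local.version() ? other : local));
        }
    }

    // One gossip round: every node exchanges its full state with a single random peer.
    static void gossipRound(Node[] cluster, Random random) {
        for (Node node : cluster) {
            Node peer = cluster[random.nextInt(cluster.length)];
            if (peer == node) continue;
            peer.merge(node.state);
            node.merge(peer.state);
        }
    }

    public static void main(String[] args) {
        Node[] cluster = new Node[8];
        for (int i = 0; i < cluster.length; i++) cluster[i] = new Node();
        cluster[0].state.put("schema_version", new Versioned("v42", 7));

        Random random = new Random(1);
        // The update typically reaches all nodes in O(log N) rounds.
        for (int round = 1; round <= 5; round++) {
            gossipRound(cluster, random);
        }
        long informed = Arrays.stream(cluster)
                .filter(n -> n.state.containsKey("schema_version")).count();
        System.out.println(informed + " of " + cluster.length + " nodes know the new schema version");
    }
}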
In pre-Raft ScyllaDB, Gossip was used to disseminate information about node liveness; for example, node UP/DOWN state and the so-called application state: different node properties, such as:
- Its IP address
- The product version and supported features on that node
- The current schema version on that node
- The tokens the node owns
- CDC generations (keys used by CDC clients to fetch the latest updates)
…
Thanks to its low overhead, Gossip is excellent for propagation of infrequently changed data (e.g., node liveness). Most of the time, all nodes are alive, so having to ping every node from every other node every second is excessive. Yet this is exactly what Gossip started doing in ScyllaDB 5.0. At some point, it was decided that epidemic failure detection, which takes log(CLUSTER_SIZE) rounds on average to detect a node failure, is too slow (Gossip sends its pings every second, so for a large cluster, failure detection could take several seconds). So, direct failure detection was adopted: every node started to ping every other node every second. To exchange the application state, Gossip still used the epidemic algorithm.
To summarize, at some point we started living in the worst of all worlds: paying the price of quadratic (O(N^2) ) number of messages for failure detection, and still having slow propagation of key application data, such as database schema.
Even worse, Raft requires its own kind of failure detection – so in 5.2, another direct failure detector was implemented in addition to Gossip. It also sends O(N^2) pings every 100 ms.
Then, the journey of moving things out of application state to Raft began. In 5.2 we started with the schema version. For 5.3, we have moved out the tokens, and are working on CDC generations and cluster features.
Hopefully, eventually almost no application data will be propagated through Gossip. But will it allow us to get rid of the Gossip subsystem altogether?
The part of Gossip that pings other nodes and checks their liveness is used by Raft itself, so it will stay. The part that disseminates some of the most basic node information, such as its IP address and supported features, may also have to stay in Gossip. We would like to be able to exchange this data even before the node joins the cluster and Raft becomes available. Gossip is one of the first subsystems initialized at node start and, as such, it is fundamental to bootstrap. Our goal here at ScyllaDB is to make sure it is used efficiently and doesn’t stand in the way of fast cluster assembly.
Is there any API/tooling/nodetool command to infer and manage Raft, such as discovering which node is the active leader, which nodes are unreachable or managing Raft state?
Raft is internal to ScyllaDB and does not need to be explicitly managed. All the existing nodetool commands continue to work and should be used for topology operations. We currently do not expose the active leader. With tablets, each one will have its own Raft leader. The state of failure detection is available, as before, in nodetool status. For the detailed nodetool failure-handling procedure, I recommend looking at our manual.
ScyllaDB Open Source 5.2: With Raft-Based Schema Management
ScyllaDB Open Source 5.2 is now available. It is a feature-based minor release that builds on the ScyllaDB Open Source 5.0 major release from 2022. ScyllaDB 5.2 introduces Raft-based strongly consistent schema management, DynamoDB Alternator TTL, and many more improvements and bug fixes. It also resolves over 100 issues, bringing improved capabilities, stability and performance to our NoSQL database server and its APIs.
DOWNLOAD SCYLLADB OPEN SOURCE 5.2 NOW
In this blog, we’ll share the TL;DR of what the move to Raft cluster management means for our users (two more detailed Raft blogs will follow), then highlight additional new features that our users have been asking about most frequently. For the complete details, read the release notes.
Strongly Consistent Schema Management with Raft
Consistent Schema Management is the first Raft based feature in ScyllaDB, and ScyllaDB 5.2 is the first release to enable Raft by default. It applies to DDL operations that modify the schema – for example, CREATE, ALTER, or DROP for KEYSPACE, TABLE, INDEX, UDT, MV, etc.
Unstable schema management has been a problem in past Apache Cassandra and ScyllaDB versions. The root cause is the unsafe propagation of schema updates over gossip: concurrent schema updates can lead to schema collisions.
Once Raft is enabled, all schema management operations are serialized by the Raft consensus algorithm. Quickly assembling a fresh cluster, performing concurrent schema changes, updating nodes’ IP addresses – all of this is now possible with ScyllaDB 5.2.
More specifically:
- It is now safe to perform concurrent schema change statements. Change requests don’t conflict, get overridden by “competing” requests, or risk data loss. Schema propagation happens much faster since the leader of the cluster is actively pushing it to the nodes. You can expect the nodes in a healthy cluster to learn about a new schema within a few milliseconds (it used to be a second or two).
- If a node is partitioned away from the cluster, it can’t perform schema changes. That’s the main difference, or limitation, from the pre-Raft clusters that you should keep in mind. You can still perform other operations with such nodes (such as reads and writes) so data availability is unaffected. We see results of the change not only in simple regression tests, but in our longevity tests which execute DDL. There are fewer errors in the log and the systems running on Raft are more stable when DDL is running.
For more details, see the related Raft blog.
Moving to Raft-Based Clusters: Important User Impacts
Starting with the 5.2 release, all new clusters will be created with Raft enabled by default. Upgrading from 5.1 will use Raft only if you explicitly enable it (see the upgrade docs). As soon as all nodes in the cluster opt in to using Raft, the cluster will automatically migrate those subsystems to using Raft (you should validate that this is the case).
Once Raft is enabled, every cluster-level operation – like updating schema, adding and removing nodes, and adding and removing data centers – requires a quorum to be executed. For example, in the following use cases, the cluster does not have a quorum and will not allow updating the schema:
- A cluster with 1 out of 3 nodes available
- A cluster with 2 out of 4 nodes available
- A cluster with two data centers (DCs), each with 3 nodes, where one of the DCs is not available
This is different from the behavior of a ScyllaDB cluster with Raft disabled.
Nodes might be unavailable due to network issues, node issues, or other reasons. To reduce the chance of quorum loss, it is recommended to have 3 or more nodes per DC, and 3 or more DCs for a multi-DC cluster. To recover from a quorum loss, it’s best to revive the failed nodes or fix the network partitioning. If this is impossible, see the Raft manual recovery procedure.
The docs provide more details on handling failures in Raft.
TTL for ScyllaDB’s DynamoDB API (Alternator)
ScyllaDB Alternator is an Amazon DynamoDB-compatible API that allows any application written for DynamoDB to be run, unmodified, against ScyllaDB. ScyllaDB supports the same client SDKs, data modeling and queries as DynamoDB. However, you can deploy ScyllaDB wherever you want: on-premises, or on any public cloud. ScyllaDB provides lower latencies and solves DynamoDB’s high operational costs. You can deploy it however you want via Docker or Kubernetes, or use ScyllaDB Cloud for a fully-managed NoSQL database-as-a-service solution.
In ScyllaDB 5.0, we introduced Time To Live (TTL) to Alternator as an experimental feature. In ScyllaDB 5.2, we promoted it to production ready. Like in DynamoDB, Alternator items that are set to expire at a specific time will not disappear precisely at that time but only after some delay. DynamoDB guarantees that the expiration delay will be less than 48 hours (though for small tables, the delay is often much shorter). In Alternator, the expiration delay is configurable – it defaults to 24 hours but can be set with the --alternator-ttl-period-in-seconds configuration option.
MORE ON SCYLLADB AS A DYNAMODB ALTERNATIVE
Large Collection Detection
ScyllaDB has traditionally recorded large partitions, large rows, and large cells in system tables so they can be identified and addressed. Now, it also records collections with a large number of elements since they can also cause degraded performance.
Additionally, we introduced a new configurable warning threshold:
compaction_collection_elements_count_warning_threshold – how many elements are considered a “large” collection (default is 10,000 elements).
The information about large collections is stored in the large_cells table, with a new collection_elements column that contains the number of elements of the large collection.
Automating Away the gc_grace_seconds Parameter
Optional automatic management of tombstone garbage collection, replacing gc_grace_seconds, is now promoted from an experimental feature to production ready. This drops tombstones more frequently if repairs are made on time, and prevents data resurrection if repairs are delayed beyond gc_grace_seconds. Tombstones older than the most recent repair will be eligible for purging, and newer ones will be kept. The feature is disabled by default and needs to be enabled via ALTER TABLE.
For example:
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
MORE ON REPAIR BASED TOMBSTONE GARBAGE COLLECTION
Preventing Timeouts When Processing Long Tombstone Sequences
Previously, the paging code required that pages had at least one row before filtering. This could cause an unbounded amount of work if there was a long sequence of tombstones in a partition or token range, leading to timeouts. ScyllaDB will now send empty pages to the client, allowing progress to be made before a timeout. This prevents analytics workloads from failing when processing long sequences of tombstones.
Secondary Index on Collections
Secondary indexes can now index collection columns. This allows you to index individual keys as well as values within maps, sets, and lists.
For example:
CREATE TABLE test (id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY (id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);
SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
Additional Improvements
The new release also introduces numerous improvements across:
- CQL API
- Amazon DynamoDB Compatible API (Alternator)
- Correctness
- Performance and stability
- Operations
- Deployment and installations
- Tools
- Configuration
- Monitoring and tracing
For complete details, see the release notes.
The State of NoSQL: Trends & Tradeoffs
If you’re hitting performance and scalability plateaus with SQL and are considering a migration to NoSQL, the recent discussion hosted by TDWI is a great place to get up to speed, fast. ScyllaDB VP of Product Tzach Livyatan was invited to join TDWI Senior Research Director James Kobielus to explore some of the most common NoSQL questions that they’ve been hearing recently. You can watch the complete video here:
Here are a few highlights from the NoSQL trends and tradeoffs discussed.
What’s driving wide column NoSQL database adoption (e.g., Cassandra, Bigtable, ScyllaDB)?
- What we were calling “Big Data” 10 or 15 years ago has become bigger and bigger. The number of data-generating sensors and devices is only increasing, and people need to store all this data in a way that allows them to make it valuable.
- Databases and many other systems are becoming much more distributed. You have customers all around the world, and all of them want low latency, all of them want fast responses. As a result, splitting the data between countries or regions is often critical, but it’s not simple. You might partition the data between countries and have each country access only its own data. Someone might have an account in the US, but want to access their account while traveling in Europe. You would need global synchronization between all these regions, but it doesn’t always have to be completely synchronous. Asynchronous might be fine in many cases.
- We’ve also noticed a significant increase in organizations with use cases that require low latency – across domains like gaming, IoT, and media streaming. Wide column databases have proven to be a very effective way to meet low latency expectations, especially at high throughputs.
Learn more about wide column databases
What are some of the tradeoffs of NoSQL?
There are probably thousands of different NoSQL databases out there, all with slightly different approaches to negotiating tradeoffs. NoSQL started as a relaxation of the traditional relational database (RDBMS) model – Oracle, MySQL, and such. Some properties were relaxed and some properties were gained. For example, transactions were relaxed – NoSQL does not support the same level of transactions as an RDBMS. But you gain more availability and more distribution of data. Some databases relax the schema (for example, MongoDB, a document database) to gain more flexibility on the application side. There are many tradeoffs in NoSQL: distribution, transactions, latency, scale, flexibility, and more. Different NoSQL databases take different positions on these tradeoffs.
The CAP theorem is another big tradeoff. If you want your system to always be available, you have to give up on another property, which is full consistency. Imagine that you have two servers in two data centers, located in two different regions. What happens if one goes down and you can’t guarantee that the data is the same on both servers? In this case, databases that prioritize consistency limit availability – they prevent reads and writes that they cannot guarantee will be consistent. But databases that prioritize availability will continue serving reads and writes, even if the two servers might not be in sync for a short amount of time. In many cases, this is fine. Consider your cable or your Netflix – you always want to have some service, even if in some rare cases you’re not getting the latest update from your watch history.
How does NewSQL (Distributed SQL) compare with NoSQL? Is the distinction between SQL and NoSQL blurring?
When NoSQL was born, many requirements were relaxed to gain availability – for example, transaction support and full SQL (with JOINs, etc.). But in recent years, some databases have actually managed to combine distribution and, to some extent, availability with support for transactions and full SQL.
However, if you do want to support SQL, you must pay for it – usually in latency. This latency comes from the fact that you need to send a lot of messages between the distributed system nodes to reach a consensus, and all these messages require processing, which impacts latency. The end result is that these databases – some of which are quite cool – still cannot compete with NoSQL’s excellent performance. But, you gain things like support for transactions. Again, it’s a tradeoff. PACELC, an extension to the CAP theorem, formalizes this tradeoff.
Will a Kubernetes-based cloud native orchestration model become standard for databases in the next few years?
The trend is obvious: Everyone is moving to Kubernetes, and now databases are finally joining the party. Kubernetes was created and designed largely for stateless applications. When using stateful applications, it’s still a challenge to some extent, but the market has spoken and everyone is moving to Kubernetes.
For a database that focuses on performance, it can be a challenge to retain that level of performance and low latency on Kubernetes because it adds latency – as does any kind of abstraction layer that you add to the system. It’s certainly not trivial; but we’ve worked on it quite a bit at ScyllaDB and we’ve shown that it is possible.
Top Mistakes with ScyllaDB: Intro & Infrastructure
If you have use cases that require high throughput and predictable low latency, there is a good chance that you will eventually stumble across ScyllaDB (and – hopefully – that you landed here!) whether you come from another NoSQL database or are exploring alternatives to your existing relational model. No matter which solution you end up choosing, there is always a learning curve, and chances are that you may stumble upon some roadblocks which could have been avoided.
Although it is very easy to get up to speed with ScyllaDB, there are important considerations and practices to follow before getting started. Fairly often, we see users coming from a different database using the very same practices they learned in that domain – only to find out later that what worked well with their previous database might inhibit their success with another.
ScyllaDB strives to provide the most seamless user experience when using our “monstrously fast and scalable” database. Under the hood, we employ numerous advancements to help you achieve unparalleled performance – for example, our state-of-the-art I/O scheduler, self-tuning scheduling groups, and our close-to-the-metal shared-nothing approach. And to benefit from all the unique technology that ScyllaDB has to offer, there are some considerations to keep in mind.
Despite all the benefits that ScyllaDB has to offer, it is fundamentally “just” a piece of software (more precisely, a distributed database). And, as with all complex software, it’s possible to take it down paths where it won’t deliver the expected results. That’s why we offer extensive documentation on how to get started. But we still see many users stumbling at critical points in the getting started process. So, we decided to create this blog series sharing very direct and practical advice for navigating key decisions and tradeoffs and explaining why certain moves are problematic.
This blog series is written by field individuals who have spent years helping thousands of users – just like you – daily and seeing the same mistakes being made over and over again. Some of these mistakes might cause issues immediately, while others may come back to haunt you at some point in the future. It’s not a full-blown guide on ScyllaDB troubleshooting (that would be another full-fledged documentation project). Rather, it aims to make your landing painless so that you can get the most out of ScyllaDB and start off on a solid foundation that’s performant as well as resilient.
We’ll be releasing new blogs in this series (at least) monthly, then compile the complete set into a centralized guide. If you want to get up to speed FAST, you might also want to look at these tips and tricks.
What are you hoping to get out of ScyllaDB?
Mistake: Assuming that one size fits all
Before we dive into considerations, recommendations, and mistakes, it’s helpful to take a step back and try to answer a simple question: What are you hoping to get out of ScyllaDB? Let’s unravel the deep implications of such a simple question.
Fundamentally, a database is a very boring computer program. Its purpose is to accept input, persist your data, and (whenever necessary) return an output with the requested data you’re looking for. Although its end goal is simple, what makes a database really complex is how it will fit your specific requirements.
Some use cases prioritize extremely low read latencies, while others focus on ingesting very large datasets as fast as possible. The “typical” ScyllaDB use cases land somewhere between these two. To give you an idea of what’s feasible, here are a few extreme examples:
- A multinational cybersecurity company manages thousands of ScyllaDB clusters with many nodes, each with petabytes of data and very high throughput.
- A top travel site is constantly hitting millions of operations per second.
- A leading review site requires AGGRESSIVELY low (1~3ms) p9999 latencies for their ad business.
However, ultra-low latencies and high throughput are not the only reason why individuals decide to try out ScyllaDB. Perhaps:
- Your company has decided to engage in a multi-cloud strategy, so you require a cloud-agnostic / vendor-neutral database that scales across your tenants.
- Your application simply has a need to retain and serve Terabytes or even Petabytes of data.
- You are starting a new small greenfield project that does not yet operate at “large scale,” but you want to be proactive and ensure that it will always be highly available and that it will scale along with your business.
- Automation, orchestration, and integration with Kubernetes, Terraform, Ansible, (you name it) are the most important aspects for you right now.
These aren’t theoretical examples. They are all based on real-world use cases of organizations using ScyllaDB in production. We’re not going to name names, but we’ve recently worked closely with:
- An AdTech company using ScyllaDB Alternator (our DynamoDB-compatible API) to move their massive set of products and services from AWS to GCP.
- A blockchain development platform company using ScyllaDB to store and serve Terabytes of blockchain transactions on their on-premises facilities.
- A company that provides AI-driven maintenance for industrial equipment; they anticipated future growth and wanted to start with a small but scalable solution.
- A risk scoring information security company using ScyllaDB because of how it integrates with their existing automations (Terraform and Ansible build pipeline).
Just as there are many different reasons for considering ScyllaDB, there is no single formula or configuration that will fit everyone. The better you understand your use case needs, the easier your infrastructure selection path is going to be. That being said, we do have some important recommendations to guide you before you begin your journey with us.
Infrastructure Selection
Selecting the right infrastructure is complex, especially when you try to define it broadly. After all, ScyllaDB runs on a variety of different environments, CPU architectures, and even across different platforms.
Deciding where to run ScyllaDB
Mistake: Departing from the official recommendations without a compelling reason and awareness of the potential impacts
As a quick reference, we recommend running ScyllaDB on storage-optimized instances with local disks, such as (but not limited to) AWS i4i and i3en, GCP N2 and N2D with local SSDs, and Azure Lsv3 and Lasv3-series. These are well-balanced instance types with production-grade CPU:Memory:Disks ratios.
Note that ScyllaDB’s recommendations are not hard requirements. Many users have been able to meet their specific expectations while running ScyllaDB beyond our official recommendations (e.g., on network-attached or hard drive disks, within their favorite container orchestration solution of choice, with 1G of memory, under slow and unreliable network links…). The possibilities are endless.
Mistake: Starting with too much, too soon
ScyllaDB is very flexible. It will run on almost any relatively recent hardware (in containers, in your laptop, in your Raspberry Pi) and it supports any topology you require. We encourage you to try ScyllaDB in a small under-provisioned environment before you take your workload on to production. When the time comes and you are comfortable moving to the next stage, be sure to consider our deployment guidelines. This will not only improve your performance, but also save you frustration, wasted time, and wasted money.
Planning your cluster topology
Mistake: Putting all your eggs in a single basket
We recommend that you start with at least 3 nodes per region, and that your nodes have a homogeneous configuration. The reason for 3 nodes is fairly simple: It is the minimum number required in order to provide a quorum and to ensure that your workload will continuously run without any potential data loss in the event of a single node failure.
The recommendation of a homogeneous configuration requires a bit more of an explanation. ScyllaDB is a distributed database where all nodes are equal. There is no concept of primary, nor standby. Therefore, ScyllaDB nodes are leaderless. An interesting aspect of a distributed system is the fact that it will always run as fast as its slowest replica. Having a heterogeneous cluster may easily introduce cluster-wide contention if you happen to overwhelm your “less capable” replicas.
Your node placement is equally important. After all, no one wants to be woken up in the middle of the night because your highly available database became totally inaccessible due to something simple (e.g., a power outage occurred, a network link went down). When planning your topology, ensure that your nodes are colocated in a way that ensures a quorum during a potential infrastructure outage. For example, for a 3 node cluster, you will want each node to be placed in a separate availability zone. For a 6 node cluster, you will want to keep the same 3 availability zones, and ensure that you have 2 nodes in each.
Planning for the inevitable failure
Mistake: Hoping for the best instead of planning for the worst
Infrastructure fails, and there is nothing that our loved database – or any database – can do to prevent external failures from happening. It is simply out of our control. However, you DO have control over how much the inevitable failure will impact your workload.
For example, a single node failure in a 3 node cluster means that you have temporarily lost around 33% of your processing power, which may or may not be acceptable for your use case. That same single failure in a 6 node cluster would mean roughly a 17% compute loss.
While having many smaller nodes spreads out compute and data distribution, it also increases the probability of hitting infrastructure failures. The key here is to ensure that you keep the right balance: You might not want to rely solely on a few nodes, nor on too many.
Keeping it balanced as you scale out
Mistake: Topology imbalances
When you feel it is time to scale out your topology, ensure that you do so in increments that are a multiple of 3. Otherwise, you will introduce a scenario known as imbalance into your cluster. An imbalance is essentially a situation where some nodes of your cluster take more requests and own more data than others and thus become heavily loaded. Remember the homogeneous recommendation? Yes, it also applies to the way you place your instances. We will explain more about how this situation may happen when we start talking about replication strategies in a later blog.
On-prem considerations
Mistake: Resource sharing
Although the vast majority of ScyllaDB deployments are on the cloud (and if that’s your case, take a look at ScyllaDB Cloud), we understand that many organizations may need to deploy it inside their own on-premises facilities. If that’s your case – and high throughput and low latencies are a key concern – be aware that you want to avoid noisy neighbors at all costs. Similarly, you want to avoid overcommitting compute resources, such as CPU and memory, with other guests. Instead, ensure that you dedicate these to your database from your hypervisor’s perspective.
Infrastructure specifics
All right! Now that we understand the basics, let’s discuss some specifics, including numbers, for core infrastructure components – starting with the CPU.
CPUs
Mistake: Per core throughput is static
As we explained in our benchmarking best practices and in our sizing article, we estimate a single physical core to deliver around 12,500 operations per second after replication, considering a payload of around 1KB. The actual number you will see depends on a variety of factors, such as the CPU make and model, the underlying storage bandwidth and IOPS, the application access patterns and concurrency, the expected latency goals, and the data distribution – among others.
Note that we said a single physical core. If you are deploying ScyllaDB in the cloud, then you probably know that vCPU (virtual CPU) terminology is used when describing instance types. ARM instances – such as the AWS Graviton2 – define a single vCPU as a physical core since there is no concept of Simultaneous Multithreading (SMT) for this processor family. Conversely, on Intel-based instances, a vCPU is a single thread of a physical core. Therefore, a single physical core on SMT-enabled processors is equivalent to 2 vCPUs in most cloud environments.
Also note the “after replication” qualifier. With a distributed database, you expect your data to be highly available and durable via replication. This means that every write operation you issue against the database is replicated to its peer replicas. As a result, depending on your replication settings, the effective throughput per core that you will be able to achieve from the client-side perspective (before replication) will vary.
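As a back-of-the-envelope illustration of how vCPUs, physical cores, and replication interact: the instance size and replication factor below are arbitrary, and 12,500 ops/sec per core is only the estimate quoted above, so treat the output as a rough sizing sketch rather than a promise.
// Rough sizing sketch based on the estimates above; real throughput depends on
// hardware, payload size, access patterns, and latency goals.
public class ThroughputEstimate {
    public static void main(String[] args) {
        int vcpus = 16;                    // e.g., an SMT-enabled cloud instance (assumption)
        int physicalCores = vcpus / 2;     // 2 vCPUs per physical core with SMT
        int opsPerCoreAfterReplication = 12_500;
        int replicationFactor = 3;         // assumption for the example

        long clusterSideOps = (long) physicalCores * opsPerCoreAfterReplication;
        // Each client write turns into replicationFactor internal writes,
        // so the client-side ("before replication") estimate is divided by RF.
        long clientSideWrites = clusterSideOps / replicationFactor;

        System.out.println("Cluster-side ops/sec: " + clusterSideOps);                 // 100000
        System.out.println("Client-side writes/sec (RF=3): " + clientSideWrites);      // ~33333
    }
}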
Memory
Mistake: Being too conservative with memory
The memory allocated for ScyllaDB nodes is also important. In general, there is no such thing as “too much memory.” Here, the primary considerations are how much data you are planning to store per node and what your latency expectations are.
ScyllaDB implements its own specialized internal cache. That means that our database does not require complex memory tuning, and that we do not rely on the Linux kernel caching implementation. This gives us visibility into how many rows and partitions are being served from the cache, and which reads have to go to disk before being served. It also gives us fine-grained control over cache evictions and over when and how flushes to disk happen, lets us prioritize some requests over others, and so on. As a result, the more memory allocated to the database, the larger ScyllaDB’s cache will be. This results in fewer round trips to disk, and thus lower latencies.
We recommend at least 2GB of memory per logical core, or 16GB in total (whichever number is higher). There are some important aspects around these numbers that you should be aware of.
As part of ScyllaDB’s shared-nothing architecture, we statically partition the process available memory to its respective CPUs. As a result, a virtual machine with 4 vCPUs and 16GB of memory will have a bit less than 4GB memory allocated per core. It is a bit less because ScyllaDB will always leave some memory for the Operating System to function. On top of that, ScyllaDB also shards its data per vCPU, which essentially means that – considering the same example – your node will have 4 shards, each one with roughly 4GB of memory.
Mistake: Ignoring the importance of swap
We recommend that you DO configure swap for your nodes. This is a very common oversight and we even have a FAQ entry for it. By default, ScyllaDB calls mlock() when it starts, a system call that locks a specified memory range and prevents it from being swapped out. The swap space recommendation helps you avoid the system running out of memory.
Disk/RAM ratios
Mistake: Ignoring or miscalculating the disk/RAM ratio
Considering an evenly spread data distribution, you can safely store a ratio of up to 100:1 Disk/RAM. Using the previous example again, with 16GB of memory, you can store around 1.6TB of data on that node. That means that every vCPU can handle about 400GB of data.
It is important for you to understand the differences here. Even though you can store ~1.6TB node-wise, it does not mean that a single vCPU alone will be able to handle the same amount of data. Therefore, if your use case is susceptible to imbalances, you may want to take that into consideration when analyzing your vCPU, Memory, and Data Set sizes altogether.
But what happens if we get past that ratio? And why does that ratio even exist? Let’s start with the latter. The ratio exists because ScyllaDB needs to cache metadata in order to ensure that disk lookups are done efficiently. If the core’s memory ends up with no room to store these components, or if these components take a majority of the RAM space allocated for the core, then we get to the answer of the former question: Well, what will happen is that your node may not be able to allocate all the memory it needs, and strange (bad) things may start to happen.
In case you are wondering: Yes, we are currently investigating approaches to ease up on that limit. However, it is important that you understand the reasoning behind it.
One last consideration concerning memory: You do not necessarily want to live on the bleeding-edge. In fact, we recommend that you do not. The actual Disk/RAM ratio will highly depend on the nature of your use case. That is, if you require constant low p99 latencies and your use case frequently reads from the cache (hot data), then smaller ratios will benefit you the most. The higher your ratio gets, the less cache space you will have. We consider a 30:1 Disk/RAM ratio to be a good starting point.
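To put the ratios into numbers, here is a small worked example using the 16GB, 4-vCPU node from above. The 100:1 limit and 30:1 starting point come from the discussion above; the rest is simple arithmetic, not a hard rule.
// Rough Disk/RAM sizing sketch using the ratios discussed above.
public class DiskRamRatio {
    public static void main(String[] args) {
        double ramGb = 16;            // memory on the node (example from the text)
        int vcpus = 4;                // one shard per vCPU

        double maxRatio = 100;        // upper bound discussed above
        double startingRatio = 30;    // suggested starting point for latency-sensitive workloads

        System.out.printf("Per-shard memory: ~%.1f GB%n", ramGb / vcpus);                  // ~4 GB
        System.out.printf("Max data per node (100:1): ~%.1f TB%n", ramGb * maxRatio / 1000); // ~1.6 TB
        System.out.printf("Starting point (30:1): ~%.0f GB%n", ramGb * startingRatio);       // ~480 GB
    }
}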
Networking
Mistake: Overlooking the importance of networking
The larger your deployment gets, the more network-intensive it will be. As a rule of thumb, we recommend starting with a network speed of 10 Gbps or more. This is commonly available within most deployments nowadays.
Having enough networking bandwidth not only impacts application latency. It is also important for communication within the database. As a distributed database, ScyllaDB nodes require constant communication to propagate and exchange information, and to replicate data among its replicas. Moreover, maintenance activities such as replacing nodes, topology changes, and performing repairs and backups can become very network-intensive as these require a good amount of data transmission.
Here’s a real-life example. A MarTech company reported elevated latencies during their peak hours, even though their cluster resources had no apparent contention. Upon investigation, we realized that the ScyllaDB cluster was behind a 2 Gbps link, and that the amount of network traffic generated during their peak hours was enough for some nodes to max out their allocated bandwidth. As a result, the entire cluster’s speed hit a wall.
Avoid hitting a wall when it comes to networking, and be sure to also extend that recommendation to your application layer. If – and when – network contentions are introduced anywhere in your infrastructure, you’ll pay the price in the form of latency due to queueing.
What else?
We are not close to being done yet; there is still a long way to go!
You have probably noticed that we skipped one rather important infrastructure component: storage. This is likely the infrastructure component that most people will have trouble with. Storage plays a crucial role with regard to performance. In the next blog, we will discuss disk types, best practices, common misconceptions, filesystem types, RAID setups, ScyllaDB tunable parameters, and what makes disks so special in the context of databases.
Pro Tip: Felipe will be sharing top data modeling mistakes in our upcoming NoSQL Data Modeling Masterclass. We welcome you to join – and bring your toughest questions!
Tutorial: Spring Boot & Time Series Data in ScyllaDB
The following tutorial walks you through how to use Spring Boot apps with ScyllaDB for time series data, taking advantage of shard-aware drivers and prepared statements.
It’s based on a new ScyllaDB University (self-paced free training) lab. Some parts are omitted for brevity. To see the full lab, visit ScyllaDB University. Also, if you take this lab and the entire course on ScyllaDB University, you can get credit for it – and also access dozens of other lessons, organized across role-based paths (e.g., for developers, architects, and DBAs).
TAKE THE LAB IN SCYLLADB UNIVERSITY
About this Spring Boot & Time Series Data Tutorial
This tutorial provides a step-by-step demonstration of how to use the popular Spring Boot framework to build a sample stock application; ScyllaDB is used to store the stock price (time series data). The application has several APIs that support the create, get, and delete operations on stocks with the date and price. Additionally, an HTTP API tool, Postman, is used to interact with our application and to test the API functionality.
By the end of the tutorial, you’ll have a running Spring Boot app that serves as an HTTP API server with ScyllaDB as the underlying data storage. You’ll be able to create stocks with prices and dates and get a list of stock items. And you’ll learn how ScyllaDB can be used to store time series data.
Note that ScyllaDB University offers a number of videos with additional background that’s helpful for this lesson. For example:
- A close look at the data model used in this application
- How to connect Spring to the ScyllaDB cluster
- How to test the application via Postman
- Options for creating a UI
- What should be done differently for a real app in production
There’s also a video where you can watch a ScyllaDB engineer “live coding” with Spring Boot.
A Quick Introduction to Spring Boot
Spring Boot is an open-source micro framework maintained by a company called Pivotal. It provides Java developers with a platform to get started with an auto-configurable production-grade Spring application. With it, developers can get started quickly without wasting time preparing and configuring their Spring application.
Spring Boot is built on top of the Spring framework, and many dependencies can be plugged into a Spring application – for example Spring Kafka, Spring LDAP, Spring Web Services, and Spring Security. With plain Spring, however, developers have to configure each of these building blocks using XML configuration files or annotations; Spring Boot’s auto-configuration removes much of that work.
A Quick Introduction to Postman
Postman is an API tool for building and using APIs. It simplifies each step of the API lifecycle and streamlines collaboration so you can create better APIs. It enables you to easily explore, debug, and test your APIs while also enabling you to define complex API requests for HTTP, REST, SOAP, GraphQL, and WebSockets.
The API client automatically detects the language of the response, links, and formats text inside the Body to make inspection easy. The client also includes built-in support for authentication protocols like OAuth 1.0/2.0, AWS Signature, Hawk, and many more.
Through the API client, you can organize requests into Postman Collections so you can reuse them instead of building everything from scratch. Your collections can also contain JavaScript code to tie requests together or automate common workflows, and you can use scripting to visualize your API responses as charts and graphs.
Set Up a ScyllaDB Cluster
You can run this lab with a cluster created using ScyllaDB Cloud or Docker. The following steps use Docker.
If you choose to run the cluster on ScyllaDB Cloud, skip the docker-compose steps below.
Get the sample project code from the GitHub repo and start a three-node ScyllaDB cluster using docker-compose:
git clone https://github.com/scylladb/scylla-code-samples
cd scylla-code-samples/spring/springdemo-custom
docker-compose -f docker-compose-spring.yml up -d
Wait one minute or so and check that the cluster is up and running:
docker exec -it scylla-node1 nodetool status
Time Series Data Model
You can learn more about compaction strategies here and specifically about TWCS in this lesson.
Once the cluster status is ready, create the data model using cqlsh for the stock application. You can also see the data schema in the sample repo.
docker exec -it scylla-node1 cqlsh
CREATE KEYSPACE IF NOT EXISTS springdemo WITH replication = {'class':'NetworkTopologyStrategy', 'replication_factor':3} AND durable_writes = false;
CREATE TABLE IF NOT EXISTS springdemo.stocks (symbol text, date timestamp, value decimal, PRIMARY KEY (symbol, date)) WITH CLUSTERING ORDER BY (date DESC);
ALTER TABLE springdemo.stocks WITH compaction = { 'class' : 'TimeWindowCompactionStrategy', 'compaction_window_unit' : 'DAYS', 'compaction_window_size' : 31 };
ALTER TABLE springdemo.stocks WITH default_time_to_live = 94608000;
exit
The above creates a keyspace called springdemo with a replication factor of 3, equal to the cluster size. The stocks table contains the columns symbol, date, and value to store the time series stock data. It also sets the compaction strategy of the stocks table to TWCS, with a 31-day compaction window.
Also, since it is assumed you are only interested in the last three years of quotes, a default TTL of 94,608,000 seconds (three years) is set on the table.
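As a rough illustration of how this schema could map to application code, here is a hedged Spring Data for Apache Cassandra entity sketch. The lab’s actual classes may differ (it may use prepared statements with the driver directly), so treat the class and field names as assumptions; the annotations assume the spring-boot-starter-data-cassandra dependency is on the classpath.
import java.math.BigDecimal;
import java.time.Instant;
import org.springframework.data.cassandra.core.cql.Ordering;
import org.springframework.data.cassandra.core.cql.PrimaryKeyType;
import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

// Hypothetical mapping of the springdemo.stocks table; not the lab's actual code.
@Table("stocks")
public class Stock {

    // Partition key: all price points for a symbol live in the same partition.
    @PrimaryKeyColumn(name = "symbol", type = PrimaryKeyType.PARTITIONED)
    private String symbol;

    // Clustering key, stored newest-first to match CLUSTERING ORDER BY (date DESC).
    @PrimaryKeyColumn(name = "date", type = PrimaryKeyType.CLUSTERED, ordering = Ordering.DESCENDING)
    private Instant date;

    @Column("value")
    private BigDecimal value;

    // Getters and setters omitted for brevity.
}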
Time-Window Compaction Strategy compacts SSTables within each time window using the Size-tiered Compaction Strategy (STCS). SSTables from different time windows are never compacted together. You set the TimeWindowCompactionStrategy parameters when you create a table using a CQL command.
Time-Window Compaction Strategy (TWCS)
It works as follows:
- A time window is configured. The window is determined by the compaction window size (compaction_window_size) and the time unit (compaction_window_unit).
- SSTables created within the same time window are compacted using the Size-tiered Compaction Strategy (STCS).
- Once a time window ends, all SSTables created during that window are compacted into a single SSTable.
- The resulting SSTable is never compacted with SSTables from other time windows.
For example, if the time window is one day, then at the end of each day the SSTables accumulated for that day are compacted into one SSTable.
Build the App With the Example Code
The Spring Boot application has a default configuration file named application.yml in the /spring/springdemo-custom/src/main/resources/ directory. In it, you can add custom config settings such as the ScyllaDB nodes’ IP addresses and port, credentials, and the name of your data center (DC1 by default).
Edit application.yml with your system settings.
Then, build the application:
./gradlew build
Launch the Spring Boot application:
java -jar ./build/libs/springdemo-custom-0.0.1-SNAPSHOT.jar
The application runs as an HTTP server and, by default, listens on port 8082.
Test the Application’s API
Download the Postman app from https://www.postman.com/downloads/, and install it to your dev environment. You can import an existing API definition into Postman. API definitions can be imported from a local file or directory, a URL, raw text, a code repository, or an API gateway.
The API definition file is /spring/springdemo-custom/Springdemo-custom.postman_collection.json.
Launch Postman and import the file by clicking Collections, Import, File, and Upload Files. This file contains the stock application’s API definition.
After importing the JSON file, you will see a list of collections used to interact with the stock application’s API with predefined data requests as an example. The first two are used to send HTTP POST requests to create the AAPL and MSFT stocks with a stock value and date. After creating the stocks, you can then use the list API to get a list of stocks. All the stock data is stored in the created ScyllaDB table stocks.
Click Body to see the Post data to be sent through the HTTP API to the Java sample application. You can see that the JSON includes the symbol, date, and value items. Note that if your Java sample application isn’t running in localhost, you need to replace localhost with the actual IP on which your application is running. Click Send to use Postman to send the HTTP POST request to the application as below.
This is the response after you send the request to create the AAPL stock:
Next, select create stock MSFT and click Send to create that stock.
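If you’d rather script these checks than click through Postman, something along the following lines could work with Java’s built-in HttpClient. The endpoint path and JSON field names below are assumptions based on the description above, so adjust them to match the sample app’s actual API as defined in the imported Postman collection.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StockApiSmokeTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint and payload – check the imported Postman collection
        // for the exact path and body the sample application expects.
        String json = "{\"symbol\":\"AAPL\",\"date\":\"2023-02-14T00:00:00Z\",\"value\":150.75}";
        HttpRequest create = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8082/api/v1/stocks"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = client.send(create, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}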
Summary
In this lab, you built an application with the Spring Boot framework that stores time series stock data to ScyllaDB. It uses the TimeWindowCompactionStrategy. Check out the full lesson on ScyllaDB University; it’s more detailed, offers additional resources, and contains an extra part where you can see the data by querying the ScyllaDB cluster.
You can discuss this lab on the ScyllaDB community forum.
Enabling DataStax Astra DB Data Searchability with Elasticsearch
For over a decade, developers and enterprise IT managers have relied on Apache Cassandra® for its unmatched at-scale read and write performance. However, manipulating this data at scale has often proven to be a challenge, owing to Cassandra’s log-style, row-oriented architecture. To address this,...
How Numberly Replaced Kafka with a Rust-Based ScyllaDB Shard-Aware Application
At Numberly, the Omnichannel Delivery team owns all the types of messages we support and operate for our clients, from the well-known and established email, to the still emerging Rich Communication Services (RCS) – and don’t forget the over-the-top (OTT) platforms such as WhatsApp.
The team recently got the chance to build a “platform to rule them all” with the goal of streamlining the way all our components send and track messages, whatever their form. The general logic is as follows: Clients or programmatic platforms send messages or batches of messages using REST API gateways that are responsible for validating and rendering the message payload. Then, those gateways will all converge towards a Central Message Routing Platform that will implement full-featured scheduling, accounting, tracing and of course routing of the messages using the right platform or operator connectors.
Moving from a dedicated platform per channel to one
Looking at the Central Messaging Platform
High Constraints
Putting all your eggs in one basket is always risky, right? Making this kind of move places a lot of constraints on our platform requirements. It has to be very reliable, first, in terms of being highly available and resilient because it will become a single point of failure for all our messages. Second, in terms of being able to scale fast to match the growth of one or multiple channels at once as our routing needs change.
Strong Guarantees
High availability and scale look easy when compared to our observability and idempotence requirements. When you imagine all your messages going through a single place, the ability to trace what happened to every single one of them, or a group of them, becomes a real challenge. Even worse, one of the greatest challenges out there, even more so in a distributed system, is the idempotence guarantee that we lacked so far on the other pipelines. Guaranteeing that a message cannot be sent twice is easier said than done.
Design Thinking and Key Concepts
We split up our objectives into three main concepts that we promised to strictly respect to keep up with the constraints and guarantees of our platform.
Reliability
- Simple: few share-(almost?)-nothing components
- Low coupling: keep remote dependencies to a minimum
- Coding language: efficient with explicit patterns and strict paradigms
Scale
- Application layer: easy to deploy and scale with strong resilience
- Data bus: a high-throughput, highly resilient, horizontally scalable message bus with time- and order-preserving capabilities
- Data querying: low-latency, one-or-many query support
Idempotence
- Processing isolation: workload distribution should be deterministic
Architecture Considerations
The Possible Default Choice
Considering Numberly’s stack, the first go-to architecture could have been something like this:
- Application layers running on Kubernetes
- Kafka as a message-passing bus from Gateway APIs
- Kafka as a log of messages to be processed and sent
- ScyllaDB as a storage layer to query the state of individual messages or group of messages
- Redis as a hot cache for some optimizations
- Kafka as a messaging bus between our Central Message Routing Platform to individual channel routing agents
On paper, it sounds like a solid and proven design, right?
A Not-So-Default Choice After All
This apparently simple go-to architecture has caveats that break too many of the concepts we promised to stick with.
Reliability
High availability with low coupling: We would rely on and need to design our reliability upon three different data technologies, each of which could fail for different reasons that our platform logic should handle.
Scalability
While we are lucky to have a data technology to match each scalability constraint we set, the combination of the three does not match our reliability + idempotence requirements. Their combination adds too much complexity and too many points of failure to be efficiently implemented together:
- Easy to deploy: Kubernetes would do the job all right.
- Data horizontal scaling: While ScyllaDB would scale for sure, Kafka scaling with its partitions logic requires caution, and Redis does not scale that well out of the box.
- Data low-latency querying: ScyllaDB and Redis are the clear winners here, while Kafka is obviously not designed to “query” a piece of data easily.
- Data-ordered bus: That’s where Kafka excels and where Redis exposes a queuing capability that will scale hazardously. ScyllaDB on the other hand might be able to act as an ordered bus if we give it some thought.
Idempotence
As expected, idempotence becomes a nightmare when you imagine achieving it on such a complex ecosystem mixing many technologies.
Deterministic workload distribution: Can you achieve it when adding ScyllaDB+Kafka+Redis?
The Daring Architecture: Replacing Kafka with ScyllaDB
So, we decided to be bold and make a big statement: we’ll only use ONE data technology to hold everything together! ScyllaDB was the best suited to face the challenge:
- It’s highly available
- It scales amazingly
- It offers ridiculously fast queries for both single and range queries
This means that ScyllaDB can also be thought of as a distributed cache, effectively replacing Redis. Now replacing Kafka as an ordered-data bus is not so trivial using ScyllaDB, but it seems doable. The biggest question still on our plate was, “How can we get a deterministic workload distribution, if possible, for free?” That’s where I got what turned out to be a not-so-crazy idea after all: “What if I used ScyllaDB’s shard-per-core architecture inside my own application?”
Let’s take a quick detour and explain ScyllaDB shard-per-core architecture.
ScyllaDB Shard-Per-Core Architecture
ScyllaDB’s low-level design uses a shard-per-core architecture to deterministically distribute and process data. The main idea is that the partition key in your data table design determines not only which node is responsible for a copy of the data, but also which CPU core gets to handle its I/O processing.
You got it right: ScyllaDB distributes the data in a deterministic fashion down to a single CPU core.
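To make this concrete, here is a minimal sketch, written in Java purely for illustration, of the kind of token-to-shard mapping that ScyllaDB and its shard-aware drivers publicly document: the Murmur3 token is biased into the unsigned range, a configurable number of most-significant bits is ignored, and the result is scaled into the node’s shard count. The tokens and shard count below are made-up values, not anything from Numberly’s system.
import java.math.BigInteger;

public final class ShardMapper {
    private static final BigInteger TWO_64 = BigInteger.ONE.shiftLeft(64);

    // Maps a signed 64-bit Murmur3 token to a shard index: bias the token into
    // the unsigned range, drop the configured number of most-significant bits,
    // then scale the result into [0, nrShards).
    static int shardOf(long token, int nrShards, int ignoreMsbBits) {
        BigInteger biased = BigInteger.valueOf(token)
                .add(BigInteger.ONE.shiftLeft(63))   // shift the token range to [0, 2^64)
                .shiftLeft(ignoreMsbBits)
                .mod(TWO_64);                        // keep only the low 64 bits
        return biased.multiply(BigInteger.valueOf(nrShards))
                .shiftRight(64)
                .intValueExact();
    }

    public static void main(String[] args) {
        // Two made-up tokens, mapped onto a node with 16 shards and 12 ignored
        // most-significant bits (the common default for the cluster setting).
        System.out.println(shardOf(-4069959284402364209L, 16, 12));
        System.out.println(shardOf(8545985263710388649L, 16, 12));
    }
}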
So, my naive idea was to distribute our messaging platform processing using the exact same logic of ScyllaDB:
The expected effect would be to actually align ScyllaDB’s per-CPU core processing with our application’s and benefit from all the latency/scaling/reliability that comes with it.
The 100% Shard-Aware Application
That’s how we effectively created a 100% shard-aware application. It brings amazing properties to the table:
- Deterministic workload distribution
- Super-optimized data processing capacity aligned from the application to the storage layer
- Strong latency and isolation guarantees per application instance (pod)
- Infinite scale following ScyllaDB’s own ability to grow seamlessly
Building a Shard-Aware Application
Selecting the Right Programming Language
Now that we got our architecture inspiration, it was time to answer the perpetual question: “Which language to use?”
- We need a modern language that reflects our desire to build a reliable, secure and efficient platform.
- The shard calculation algorithm requires fast hashing capabilities and a great low-level synergy with the ScyllaDB driver.
Once we established that, Rust was a no-brainer.
The Deterministic Data Ingestion
Incoming messages are handled by a component that we call the ingester. For each message we receive, after the usual validations, we calculate the shard to which the message belongs as it will be stored in ScyllaDB. For this, we use the ScyllaDB Rust driver internal functions (which we contributed).
More precisely, we compute a partition key that matches ScyllaDB’s storage replica nodes and CPU core from our message partition key, effectively aligning our application’s processing with ScyllaDB’s CPU core.
Once this partition key is calculated to match ScyllaDB’s storage layer, we persist the message with all its data in the message table, and at the same time add its metadata to a table named “buffer” with the calculated partition key.
The Deterministic Data Processing
That’s how the data is stored in ScyllaDB. Now, let’s talk about the second component, which we call “schedulers.” Schedulers will consume the ordered data from the buffer table and effectively proceed with the message-routing logic. Following the shard-to-component architecture, a scheduler will exclusively consume the messages of a specific shard just like a CPU core is assigned to a slice of ScyllaDB data.
A scheduler will fetch a slice of the data that it is responsible for from the buffer table.
At this point, a scheduler will have the IDs of the messages it should process. It then fetches the message details from the message table.
The scheduler then processes and sends the message to the right channel it is responsible for.
Each component of the platform is responsible for a slice of messages per channel by leveraging ScyllaDB’s shard-aware algorithm. We obtain 100% aligned data processing from the application’s perspective down to the database.
Replacing Kafka with ScyllaDB
Replacing Kafka as an ordered-data bus is not so trivial using ScyllaDB, but it was surely doable. Let’s get a deeper view into how it works from the scheduler component perspective.
We store messages’ metadata as a time series in the buffer table, ordered by ScyllaDB’s time of ingestion (this is an important detail). Each scheduler keeps a timestamp offset of the last message it successfully processed. This offset is stored in a dedicated table. When a scheduler starts, it fetches the timestamp offset of the shard of data it is assigned to.
A scheduler is an infinite loop fetching the messages it is assigned to within a certain, configurable time window. In fact, a scheduler doesn’t fetch data strictly starting from the last timestamp offset, but from a slightly older timestamp, overlapping the previous window. That does mean a single message may be fetched multiple times, but this is handled by our idempotence business logic and optimized by an in-memory cache. Overlapping the previous time range allows us to prevent missing any message due to a potential write latency or subtle time skew between nodes, since we rely on ScyllaDB’s timestamps.
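To illustrate the loop described above, here is a conceptual sketch, written in Java purely for illustration (Numberly’s schedulers are written in Rust). The storage calls are stubbed out, and the overlap duration, method names, and in-memory dedup cache are assumptions rather than Numberly’s actual implementation.
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ShardScheduler {
    private final int shard;                                  // the shard this scheduler instance owns
    private final Duration overlap = Duration.ofSeconds(30);  // assumed safety overlap
    private final Set<String> recentlyProcessed = new HashSet<>(); // naive idempotence cache
    private Instant offset;                                   // last successfully processed timestamp

    public ShardScheduler(int shard, Instant initialOffset) {
        this.shard = shard;
        this.offset = initialOffset;                          // loaded from the offsets table on startup
    }

    // One iteration of the infinite loop: fetch an overlapping window, dedupe,
    // process, then advance and persist the offset.
    public void runOnce() {
        Instant from = offset.minus(overlap);                 // deliberately re-read a little of the past
        Instant to = Instant.now();
        for (BufferedMessage m : fetchBuffer(shard, from, to)) {
            if (!recentlyProcessed.add(m.id())) {
                continue;                                     // already handled – idempotence at work
            }
            process(m);                                       // routing logic, channel connectors, etc.
        }
        offset = to;
        persistOffset(shard, offset);                         // stored in a dedicated offsets table
    }

    // --- stubs standing in for the ScyllaDB queries and the routing logic ---
    record BufferedMessage(String id) {}
    private List<BufferedMessage> fetchBuffer(int shard, Instant from, Instant to) { return List.of(); }
    private void process(BufferedMessage m) {}
    private void persistOffset(int shard, Instant offset) {}
}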
Retrospective
Reaching our goal was not easy. We failed many times, but finally made it and proved that our original idea was not only working, but also was convenient to work with while being amazingly efficient.
What We Learned
The first thing we want to emphasize is that load testing is more than useful. Early in development, we set up load tests sending tens of thousands of messages per second. Our goal was to test our data schema design at scale and our idempotence guarantee. It allowed us to spot multiple issues, sometimes non-trivial ones (like when the execution delay between the statements of our insertion batch was greater than our fetch time window). Yeah, a nightmare to debug…
By the way, our first workload was a naive insert-and-delete, and load testing made large partitions appear very quickly.
Fortunately, we also learned about compaction strategies, especially the Time-Window Compaction Strategy, which we are using now. Treating message buffering as time series processing allowed us to get rid of the large partitions issue.
We Contributed to the ScyllaDB Rust Driver
To make this project possible, we contributed to the ScyllaDB ecosystem, especially to the Rust driver, with a few issues and pull requests. For example, we added code to compute the replica nodes of a primary key, as we needed it to compute the shard of a message:
- optimize PreparedStatement::compute_partition_key
- add ClusterData::get_endpoints
- connection_pool/node: expose node sharder
- session: add keyspaces_to_fetch configuration
- zero-copy and lazy rows deserialization enhancement
We hope it will help you if you want to use this cool sharding pattern in your own shard-aware application at some point.
We also discovered some ScyllaDB bugs, so of course we worked with ScyllaDB support to have them fixed (thanks for your responsiveness).
What We Wish We Could Do
As in all systems, everything is not perfect, and there are some points we wish we could do better. Obviously, ScyllaDB is not a message queuing platform, and we miss Kafka’s long polling. Currently, our architecture does regular fetching of each shard buffer, which consumes a lot of needless bandwidth. But we are working on optimizing this.
Also, we encountered some memory issues where we suspected the ScyllaDB Rust driver. We didn’t take much time to investigate, but it made us dig into the driver code, where we spotted a lot of memory allocations.
As a side project, we started to think about some optimizations; actually, we did more than think, because we wrote a whole prototype of an (almost) allocation-free ScyllaDB Rust driver.
We may make it the subject of a future article, perhaps with the Rust driver outperforming the Go driver once again.
Going Further With ScyllaDB Features
So we bet on ScyllaDB, and that’s a good thing because it has a lot of other features that we want to benefit from. For example, change data capture: using the CDC Kafka source connector, we could stream our message events to the rest of the infrastructure without touching our application code. Observability made easy. We are also looking forward to ScyllaDB’s path toward strongly consistent tables with Raft as an alternative to lightweight transactions (LWT). Currently, we are using LWT in a few places, especially for dynamic shard workload attribution, so we can’t wait to test this feature!
This Breakthrough Design Has Been Awarded
We are very proud to have won the ScyllaDB Innovation Award: Top Technical Achievement for this work and results. The design is now in production at Numberly. Feel free to get in touch with us if you want to know more, or even better, if you’d like to join one of our amazing tech teams.
ScyllaDB User Talk Takeaways: CEO Dor Laor’s Perspective
Much has changed since the first ever ScyllaDB Summit, when ~100 people gathered in San Jose, CA in 2016. The ScyllaDB database, company, and community have all developed quite substantially. But one thing remains the same: my favorite part of the event is hearing about what our users are achieving with ScyllaDB.
I want to share some of the highlights from these great tech talks by our users, and encourage you to watch the ones that are most relevant to your situation. As you read/watch, you will notice that some of the same themes appear across multiple talks:
- Sheer scale, latency, and throughput stories
- Migration from other good databases that couldn’t tackle all of the performance, availability, and replication needs
- NVMe caching (not a surprise, we plan on doing it ourselves)
- Elimination of event streaming/queuing by using ScyllaDB
ZeroFlucs
About: ZeroFlucs’ technology enables bookmakers and wagering service providers to offer same-game betting. Their simulation-based architecture enables unique and innovative bet types that maximize profits for bookmakers as well as engage end users.
Use Case: Customer Experience
Takeaways: ZeroFlucs first caught our attention by publishing a blog on Charybdis: a cleverly named library that they created to “simplify and enhance the developer experience when working with ScyllaDB in Go.” We invited them to ScyllaDB Summit to share more about this “other sea monster” as well as their broader ScyllaDB use case – and their Director of Software Engineering, Carly Christensen, joined us to share their story.
In short, ZeroFlucs uses a really smart approach to figuring out exactly how data should be replicated to support low latency for their global usage patterns without racking up unnecessary storage costs. Their business is processing sports betting data, and it’s extremely latency sensitive. Content must be processed in near real-time, constantly, and in a region local to both the customer and the data. And they face incredibly high throughput and concurrency requirements – events can update dozens of times per minute and each one of those updates triggers tens of thousands of new simulations (they process ~250,000 in-game events per second).
They considered MongoDB, Cassandra, and Cosmos DB, and ultimately selected ScyllaDB due to it meeting their performance and regional distribution requirements at a budget-friendly cost. Carly walked us through the ScyllaDB setup they use to keep data local to customers, and also talked about the Charybdis (open source) library that they created to orchestrate the management of keyspaces across their many global services.
Discord
About: Discord is “a voice, video and text app that helps friends and communities come together to hang out and explore their interests — from artists and activists, to study groups, sneakerheads, plant parents, and more. With 150 million monthly users across 19 million active communities, called servers, Discord has grown to become one of the most popular communications services in the world.”
Use Case: Customer Experience
Migration: MongoDB to Cassandra, then to ScyllaDB
Takeaways: ScyllaDB and Discord have worked closely for many years as both organizations grew. I was thrilled when they finally moved 100% from Cassandra to ScyllaDB – and even more excited to hear that they would be sharing their experiences with our community in two great ScyllaDB Summit talks.
First, Bo Ingram shared how Discord stores trillions of messages on ScyllaDB. He explained their reasons for moving from Apache Cassandra to ScyllaDB, their migration strategy, the role of Rust, and how they designed a new storage topology – using a hybrid-RAID1 architecture – for extremely low read latency on GCP. There’s even a great World Cup example at the end!
Stephen Ma also took a technical deep dive into Discord’s approach to more quickly and simply onboarding new data storage use cases. Discord developed a key-key-value store service that abstracts many ScyllaDB-specific complexities – like schema design and performance impacts from tombstones and large partitions – from developers. (Note: ScyllaDB has addressed many of those shortcomings in our recent releases.)
Discord just wrote a great blog about this massive achievement. Give it a read!
Watch Discord’s talk (Stephen)
Epic Games
About: Epic Games develops Unreal Engine, the 3D engine that powers the world’s leading games (e.g., Fortnite) and is used across industries such as film and television, architecture, automotive, manufacturing, and simulation. With 7.5 million active developers, the Unreal Engine is one of the world’s most popular game development engines.
Migration: DynamoDB to ScyllaDB
Takeaways: Epic Games uses ScyllaDB as a binary cache in front of NVMe and S3 to power global distribution of large game assets used by Unreal Cloud DDC (the asset caching system for Unreal Engine). The cache is used to accelerate game “cook time” – taking texture + mesh data and converting it to a specific platform like PlayStation 5 or Xbox. A developer does this once, then it’s replicated across the world.
They store structured objects with blobs that are referenced by a content hash. The largest payload is stored within S3, whereas the content hash is stored in ScyllaDB. When Unreal Engine goes to ScyllaDB to fetch the metadata, they get sub-millisecond responses. They selected ScyllaDB for a number of reasons. They started looking for alternatives to DynamoDB in search of cloud agnosticity. DynamoDB was simple to adopt, but they later realized the value of having a database that was not tied to any particular cloud provider and can also work on-premises. Looking at ScyllaDB, they found that the lower latency was a better match for their performance-sensitive workload and the cost was much lower as well.
Numberly
About: Numberly is a digital data marketing technologist and expert helping brands connect and engage with their customers using all digital channels available. They are proud to be an independent company with solid internationally recognized expertise in both marketing and technology – and they just celebrated their 23rd anniversary!
Use case: Recommendation & Personalization
Migration: Kafka + Redis to ScyllaDB
Takeaways: Numberly’s team are longtime ScyllaDB power users and dedicated community contributors. They continuously push the boundaries of what is possible, and their latest “crazy idea” to replace Kafka and Redis with a (Rust-based) ScyllaDB shard-aware application is no exception. It’s an impressive example of how, in many situations, you don’t need a Kafka queue (or Redis cache) and can instead use a lookup into ScyllaDB itself.
With ScyllaDB as their only backend, they managed to reduce operational costs while benefiting from core architectural paradigms like predictable data distribution, processing capacity, and idempotence. I have to admit that I am personally quite proud to see this cool new use of ScyllaDB’s shard-per-core architecture. 🙂 I hope it inspires others to use it for their own trailblazing projects.
ShareChat
About: ShareChat provides India’s leading multilingual social media platform (ShareChat) and India’s biggest short video platform (Moj). ShareChat serves 180M Monthly Active Users and Moj serves 300M Monthly Active Users. Together, they are changing the way the next billion users will interact on the internet.
Migration: Their cloud provider’s DBaaS + some Cassandra instances to ScyllaDB
Use Case: Customer Experience
Takeaways: This is a team with a lot of achievements to share! Just a couple months ago, ShareChat’s Geetish Nayak provided an overview of their rapid ScyllaDB ramp up and expansion to ~50 use cases in a detailed webinar. And now they shared two more technical talks at ScyllaDB Summit.
Charan Movva talked about how ShareChat handles the aggregations of a post’s engagement metrics/counters at scale with sub-millisecond P99 latencies for reads and writes (note that they operate at the scale of 55k-60k writes/sec and 290k-300k reads/sec). By rethinking how they handle events – using ScyllaDB with their existing Kafka Streams – they were able to maintain ultra-low latency while reducing database costs by at least 50%.
Anuraj Jain and Chinmoy Mahapatra also introduced us to the live migration framework they designed to move ~100TB of raw data for their various use cases over to ScyllaDB without downtime or even any noticeable latency hits. It’s a very rich framework, with support for both counter and non-counter use cases, support for Go, Java, and Node.js, fallback, recovery, auditing, and support for many corner cases. If you’re thinking of moving data into ScyllaDB, definitely take a look. I’m truly amazed by what this entire team has achieved so far and look forward to partnering with them on what’s next.
Watch ShareChat’s talk (Charan)
Watch ShareChat’s talk (Anuraj and Chinmoy)
SecurityScorecard
About: SecurityScorecard aims to make the world a safer place by transforming the way thousands of organizations understand, mitigate, and communicate cybersecurity. Their rating platform is an objective, data-driven and quantifiable measure of an organization’s overall cybersecurity and cyber risk exposure.
Migration: Redis, Aurora, and Presto + HDFS to ScyllaDB and Presto + S3
Use case: Fraud & threat detection
Takeaways: You can read the explanation below or just skip to their results slide, which speaks for itself.
This is the first migration from Redis and Aurora to ScyllaDB that I’m aware of. Some data modeling work was required, but there were strong payoffs. Their previous data architecture served them well for a while, but couldn’t keep up with their growth. Their platform API queried one of three data stores: Redis (for faster lookups of 12M scorecards), Aurora (for storing 4B measurement stats across nodes), or a Presto cluster on HDFS (for complex SQL queries on historical results).
As data and requests grew, challenges emerged. Aurora and Presto latencies spiked under high throughput. The largest possible instance of Redis still wasn’t sufficient, and they didn’t want the complexity of working with Redis Cluster. To reduce latencies at their new scale, they moved to ScyllaDB Cloud and developed a new scoring API that routed less latency-sensitive requests to Presto + S3 storage. Look at the slide below to see the impact:
Watch SecurityScorecard’s talk
Strava
About: Strava is the largest sports community in the world. Strava’s platform enables 100M+ athletes across 195 countries to connect and compete by sharing data collected via 400+ types of devices throughout the day.
Use Case: IoT
Migration: Cassandra to ScyllaDB
Takeaways: As an avid mountain biker, I’m quite a fan of Strava – so it was fun to see the architecture behind the Strava platform’s segment tracking, matched rides, performance metrics, route discovery, etc. After an overview of the overall architecture, Phani Teja talked about a few of the use cases they recently moved from Cassandra to ScyllaDB. Horton is a flexible scalar value store for activity data such as distance, max/avg speed, max/avg heart rate, bikeID, shoeID, and so on. Segments is a write-heavy use case that stores athletes’ progress reports on segments and loads them into their feeds. And Neogeo is a read-heavy (with a fair volume of writes) use case that stores encoded map styles for static images.
I was pleased to see that they are using the powerful new I4i instances, which ScyllaDB can take great advantage of. They use i4i.large all the way through i4i.8xlarge, and they vary those instances based on the load prediction.
Level Infinite
About: Level Infinite is the global game publishing division of Tencent Games, the world’s leading video game platform. Some of the titles they are currently running include PUBG Mobile, Arena of Valor, and Tower of Fantasy.
Use case: Analytics
Takeaways: This talk explains how Level Infinite uses ScyllaDB along with Apache Pulsar to solve the problem of dispatching events to numerous gameplay sessions. It’s for a very interesting use case: monitoring and responding to risks that can occur in Tencent games – for example, cheating and harmful content.
Fully aware that using Cassandra as a distributed queue has historically been considered an antipattern, they recognized the potential to defy expectations here and they architected a great solution. In particular, I encourage you to look at their “pseudo-now” approach to finding new events even when events are not committed to ScyllaDB in the order indicated by the event id.
Coralogix
About: DevOps and Security engineers around the world use Coralogix’s real-time streaming analytics pipeline to monitor their stacks and fix production problems faster. Their ML algorithms continuously monitor data patterns and flows between system components and trigger dynamic alerts.
Use Case: Data ingest and storage
Migration: Postgres to ScyllaDB
Takeaways: The spoiler alert here is that Coralogix shrank query processing times for their next generation distributed query engine from 30 seconds (!!!) to 86 ms by moving from PostgreSQL to ScyllaDB. The engine queries semi-structured data and the underlying data is stored in object storage (e.g., EBS, S3) using a specialized Parquet format. It was originally designed as a stateless query engine on top of the underlying object storage, but reading Parquet metadata that way was too much of a latency hit.
So, they developed a “metastore” that would take the Parquet metadata needed to execute a large query and put it into a faster storage system that they could query quickly. The work is technically impressive – listen to the different components of Parquet’s metadata that needed to be queried together to retrieve data from 50,000 Parquet files.
They moved from first discovering ScyllaDB to getting into production with terabytes of data within just 2 months (and this was a SQL to NoSQL migration requiring data modeling work vs a simpler Cassandra or DynamoDB migration).
Beyond ScyllaDB Adoption/Migration Stories
To conclude, I also want to encourage you to watch two other awesome talks presented by ScyllaDB users.
Optimizely
Brian Taylor from Optimizely shared their tips for getting the most out of ScyllaDB’s “amazing” concurrency (his word, not mine – but I do agree). It’s a really smart technical presentation; I think anyone considering or using ScyllaDB should watch it very carefully.
iFood
Also, if you’re working with (or curious about) Quarkus, take a look at the nice live-coding Quarkus + ScyllaDB session by iFood’s João Martins!
Free Hands-On NoSQL Training in Asia-Friendly Time Zones
Note: This event has concluded – but there are many more live events to come! Also, visit ScyllaDB University for self-paced training.
ScyllaDB has earned a bit of a reputation for holding free educational – and engaging – virtual events: NoSQL masterclasses, developer workshops, ScyllaDB University LIVE, ScyllaDB Summit, and P99 CONF (a broader event on “all things performance.”) One of the key ways we keep them engaging is to ensure that they’re interactive: attendees are connecting with the presenters and fellow attendees in realtime. But, the reality of a live virtual event is that it’s just not convenient for everyone.
We have heard many requests to replay our events in more Asia-friendly timezones. We’re NOT going to do that. It won’t provide the same level of experience for our attendees. Instead, we’re doing something that we hope you’ll like much better. We’re holding a special 3-hour event on April 19 that’s designed specifically for our many friends across India, Singapore, Malaysia, Australia, and neighboring areas: ScyllaDB Labs – Introduction to NoSQL.
Save Your Spot – Attendance is Limited
This is a free 3-hour event that will help you jumpstart your NoSQL mastery in a supportive, collaborative environment with our top ScyllaDB experts + your peers across the region. First, you’ll learn the core NoSQL concepts and strategies used by gamechangers like Disney+ Hotstar, ShareChat, Discord, Ola Cabs, Tencent Games, and Grab. Then, you will log into our special ScyllaDB lab environment, where you can get real hands-on experience applying these strategies through the exercises we’ve designed for this event. A team of ScyllaDB experts will be just a click away to answer all of your questions and guide you along the path to success.
You’ll leave knowing how to deploy and interact with ScyllaDB, the monstrously fast and scalable NoSQL database. Specifically, you will learn:
- The best (and worst) uses of different NoSQL options
- Critical considerations for NoSQL data modeling and architecture
- How to set up and interact with a 3-node distributed database cluster
- Do’s and don’ts for successful SQL to NoSQL migrations
Here’s a look at how the day will be structured:
- 10:00-10:05am IST: Welcome
- 10:05-11:00am IST: Getting Started session
- 11:00am-12:00pm IST: Working with ScyllaDB session
- 12:00-1:00pm IST: Hands-on labs and Q&A
Guy Shtub’s Perspective
Here’s a little more detail on the event from its creator and host Guy Shtub, head of ScyllaDB University and training…
I often hear from people who are interested in NoSQL and in ScyllaDB and want to attend our LIVE training events… but they can’t make it because of the time difference. I’m excited to hold this event – our first in an Asia-friendly time zone – and hope that this will enable many more people to attend.
This event marks another first for us: it is more experiential and applied than our typical events. You’ll have a chance to run the hands-on labs during the event, with myself and other experts ready to help you if you have any questions.
I’ll start with a brief welcome talk, describing the different sessions and welcoming everyone. Afterward, I’ll kick off with the first talk, “Getting Started with ScyllaDB,” which covers the basics of NoSQL and provides an intro to ScyllaDB.
After that, I’ll dive into ScyllaDB Architecture, covering concepts like Node, Cluster, Replication Factor, Tokens, Consistency Level, and more. Finally, if we have enough time, I’ll go over the read and write paths in ScyllaDB so that you can see what happens when we read or write data at different consistency levels.
The next talk (by Tzach Livyatan, our VP of Product) is titled “Working with ScyllaDB.” This is a fast track to getting started with ScyllaDB. It covers best practices for NoSQL data modeling, CQL, Partition Key, Clustering Key, selecting the right compaction strategy for your workload type, selecting and working with drivers, and more.
After the two talks, you’ll get to put the theory into practice by running the following hands-on labs:
- Quick Wins Lab – See how easy it is to get ScyllaDB up and running and to perform some basic queries.
- High Availability Lab – Using a hands-on example, this lab demonstrates how high availability works in ScyllaDB. You’ll try setting the Replication Factor and Consistency Level in a three-node cluster and see how they affect read and write operations when all of the nodes in the cluster are up – and also when some of them become unavailable.
- Basic Data Modeling Lab – Learn the basics and key concepts of NoSQL data modeling, understand the importance of the primary key, and perform related queries.
- Materialized Views and Indexes Lab 1 – Create a base table and different Materialized Views (MV) for that base table, execute updates to the base table, and learn how to query the MV.
- Materialized Views and Indexes Lab 2 – Experience Global and Local Secondary indexes in action, with example use cases and an explanation of when to use each.
- Project Alternator Lab – Get started with Alternator, an open-source project that makes ScyllaDB compatible with Amazon DynamoDB.
As you run the labs, our team of experts will be available to assist and answer any questions you might have.
You can prepare for the event by taking some of our courses at ScyllaDB University. They are completely free and will allow you to better understand ScyllaDB and how the technology works. I recommend starting with the ScyllaDB Essentials course.
Have any questions or ideas for the event? Discuss the event on the ScyllaDB Community Forum.
We hope to see you there!
Register to Access the Free NoSQL Training
A New Hands-On Lab to Guide You Through App Migrations
Migrating to new technology while keeping the business operational is a challenge faced by many organizations. This is particularly true for enterprises that use Apache Cassandra® or other CQL-compatible databases, which are typically relied upon for continuous availability. Eliminating downtime...
Apache HBase® vs. Apache Cassandra®: An In-Depth Comparison of NoSQL Databases
Apache HBase® and Apache Cassandra® are both open source NoSQL databases well-equipped to handle incredible amounts of data–but that’s where the similarities end.
In this blog, discover the architectures powering these technologies, when and how to use them, and which option may prove to be the better choice for your operations.
What is Apache HBase?
Apache HBase is an open source NoSQL database built on top of the Hadoop Distributed File System (HDFS). It is a fault-tolerant, distributed system designed for random reads/writes of large volumes of data.
What is Apache Cassandra®?
Apache Cassandra is an open source, non-relational (or NoSQL) database that supports continuous availability, tremendous scale, and data distribution across multiple data centers and cloud availability zones.
HBase vs. Cassandra: The Difference
Both HBase and Cassandra were born from the same requirement: storing and processing immense amounts of sparse data. However, they meet this requirement by different means.
Architectures
HBase
HBase is built on top of HDFS, which is used to store the data that is being processed by HBase. HDFS is a distributed file system used by Apache Hadoop®.
HBase acts as an in-memory layer for reading and writing the data to HDFS.
An HBase deployment requires servers performing three roles: HMaster, Region Server, and Apache ZooKeeper. These roles can be performed by a single node, but for a cluster of any reasonable size they should be assigned to individual servers.
HMaster Server
The HMaster is responsible for region assignment. In HBase, a region is a group of lexicographically adjacent rows. The HMaster is responsible for distributing regions evenly across the available region servers and keeping their sizes roughly equivalent.
Additionally, the HMaster handles the DDL commands such as creating and deleting databases.
Finally, the HMaster server is constantly observing the region servers and handling load balancing and failover operations for the cluster.
Region Server
The primary role of these servers is to serve read and write requests. Each region server is responsible for the data that falls within its assigned region. A single region server can serve data for multiple regions, into the thousands.
Apache ZooKeeper
ZooKeeper is used as the central source of truth for the health of the cluster, and the current state of the tenant servers within it. When a change in configuration is observed in the cluster, such as a region server becoming unavailable, it will notify the HMaster which can take the appropriate action.
ZooKeeper itself is often deployed in a cluster formation of at least 3 nodes, where it can effectively perform fault tolerant configuration management via consensus among its servers.
Apache Cassandra®
A Cassandra cluster is made up of homogeneous servers, called nodes, connected to each other and all performing the same role. Cassandra is a masterless system, meaning no single node is the sole source of truth for the data within the cluster.
Data in a Cassandra cluster is split between the nodes using the partition key, which is hashed; each row is then assigned to the node or nodes responsible for serving that hash.
Cluster administration and failover is handled internally by the cluster nodes themselves, communicating via the gossip protocol. This system can coordinate shuffling data to new nodes, marking nodes as offline and failing over in the event of an outage.
Data Models
HBase
At the top level of the HBase data model is the column-oriented table. Each table has a set of row keys, which can be thought of as the primary key in a traditional relational database.
Within a row, columns are grouped into related sets of data called column families.
This data structure allows related information to be stored close to each other on disk which makes reads more efficient.
Example read:
get '<table name>','row1'

> get 'CustomerOrders', '1'
COLUMN           CELL
 User:FName      timestamp=1675123184293, value=Rafa
 User:LName      timestamp=1675123184293, value=Lopez
 Orders:Client   timestamp=1675123388213, value=Lopez
 Orders:Client   timestamp=1675123388214, value=Apple
4 row(s) in 0.0260 seconds
Apache Cassandra
Cassandra’s data model is best described as a partitioned row store. At the top level of the Apache Cassandra data model is the Keyspace. Within a keyspace are column-families (aka tables).
In Cassandra’s partitioned row store, rows within a column-family are stored together on disk.
Example read:
select <columns> from <table>;
cqlsh> select * from User;

 id | fname | lname
----+-------+-------
  1 |  Rafa | Lopez
  2 | Alice | Smith

(2 rows)
HBase vs. Apache Cassandra Security
HBase Security
HBase also supports user authentication and authorization. Client encryption is achieved in conjunction with Kerberos authentication on the cluster.
Authorization can be restricted down to the cell level, if required.
Cassandra Security
Cassandra supports basic security features such as user authentication and authorization. Users can be assigned roles, which can restrict access to records, down to specific rows in a column family if required.
Client connections support SSL (Secure Sockets Layer) encryption, and internode communication can also be SSL encrypted.
Since Cassandra 4, audit logging can be enabled on the cluster to allow administrators to see who performed what command on the data.
Performance
Read and Write
Cassandra is designed for large-scale data ingestion, and it writes data simultaneously to the commit log and an in-memory cache (the memtable), making it the faster of the two technologies for writing data.
HBase writes must be negotiated through ZooKeeper, then the HMaster to determine where the data should be written, which negatively impacts its write speed.
Conversely, the masterless design of Cassandra makes the read path slower, as reads may need to be coordinated across the replica nodes that hold the data, depending on the consistency level.
HBase can take advantage of the HDFS underpinnings, which include bloom filters and caches, which allows its read performance to outperform Cassandra given a similar data set.
Transactions
ACID transactions are not possible in HBase by default, but they are made possible when combined with Apache Phoenix, although this feature is still in beta.
Cassandra does not support ACID transactions. You can update a record using compare and set, which Cassandra also calls lightweight transactions, but no rollback capability exists.
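For illustration, a compare-and-set update might look like the following with the DataStax Java driver (4.x); the keyspace, table, and column names are hypothetical. The IF clause turns the update into a lightweight transaction, and wasApplied() reports whether the conditional write actually took effect.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class CompareAndSetExample {
    public static void main(String[] args) {
        // Connects using the driver's configuration defaults (contact points, datacenter, etc.).
        try (CqlSession session = CqlSession.builder().withKeyspace("demo").build()) {
            // Hypothetical table: accounts(id text PRIMARY KEY, owner text).
            // The update is applied only if the current value matches the expected one.
            ResultSet rs = session.execute(SimpleStatement.newInstance(
                    "UPDATE accounts SET owner = ? WHERE id = ? IF owner = ?",
                    "alice", "acct-42", "bob"));
            System.out.println("applied = " + rs.wasApplied());
        }
    }
}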
Query Language
Cassandra provides a SQL-like language called CQL. With CQL you can select, insert, update, and delete records with very similar syntax to SQL. However, care needs to be taken with large data sets, as poorly optimized queries can significantly impact cluster performance.
CQL can be used in conjunction with many other Apache Cassandra client libraries, or directly with the Cassandra cluster.
The HBase shell is the closest analogue to a query language for HBase. You can use put(), scan(), create(), and other commands to interact with your data. Apache Phoenix can be added to HBase which does give it a SQL-like query language.
A better way to use your HBase cluster is to use the Java API. While this does require you to use Java, it allows a much richer way to create, insert, and update your data in HBase.
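For example, a minimal write-and-read against the CustomerOrders table from the shell example above might look roughly like this with the HBase Java client, assuming the table and its User column family already exist:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("CustomerOrders"))) {

            // Write: row key "1", column family "User", qualifier "FName".
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("User"), Bytes.toBytes("FName"), Bytes.toBytes("Rafa"));
            table.put(put);

            // Read the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("1")));
            byte[] fname = result.getValue(Bytes.toBytes("User"), Bytes.toBytes("FName"));
            System.out.println("User:FName = " + Bytes.toString(fname));
        }
    }
}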
Key Similarities Between HBase and Cassandra
Scalability
HBase can be scaled up by adding nodes to the cluster. Most of the time, we would be adding nodes to serve as region servers. HBase automatically splits data sets into new regions when they get too large, and adding region servers allows HBase to distribute the load more effectively.
It is possible to add additional HMaster nodes, but only one HMaster is active at a time, so the new nodes would exist for failover purposes only.
Cassandra can also be scaled by adding additional nodes to the cluster. Cassandra uses a consistent hash to evenly partition the data in the cluster to the nodes within. Adding new nodes to a cluster immediately distributes data to that node. The amount of data depends on numerous factors such as how much total data exists in the cluster, number of nodes, replication settings, and others.
Both HBase and Cassandra were designed to be distributed, scalable databases and can scale to hundreds of nodes effectively.
Replication Capabilities
In HBase, data replication is handled by HDFS. HDFS replication settings can be any number, typically 3. A replication factor of 3 means there is 1 primary copy of data, which is then replicated to 2 additional servers.
HDFS Replication is rack aware, meaning when it assigns replicas it will prioritize servers that are on different networks or racks. This ensures that the system can tolerate a single network outage and not lose data.
In Cassandra, replication is handled internally. Replication settings are set on individual keyspaces, and they can also be made rack aware like HDFS.
In contrast to HBase, Cassandra doesn’t have a primary replica or master node for a particular record. If the keyspace has a replication factor of 3, a write command will be issued to all 3 nodes responsible for that data.
Unique to Cassandra is the concept of the data center. A data center in Cassandra is how we define a set of nodes and racks in a geographical region. A Cassandra cluster can consist of one or more data centers, and replication can be configured between them.
What this means in practice is that you can have a single database with different regional localities, which allows clients to achieve lower latency while still maintaining data consistency between the regions.
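As a small illustration of multi-data-center replication, a keyspace can specify a replica count per data center. The keyspace and data center names below are hypothetical, and the statement is shown via the DataStax Java driver for consistency with the other examples in this comparison:
import com.datastax.oss.driver.api.core.CqlSession;

public class MultiDcKeyspaceExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Hypothetical data center names; each DC keeps 3 replicas of every row,
            // so clients in either region can read locally (e.g., at LOCAL_QUORUM)
            // while writes are replicated across both regions.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS orders WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'us_east': 3, 'eu_west': 3}");
        }
    }
}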
Common Use Cases
Both Cassandra and HBase are built to handle, store, and distribute huge data sets as their primary objective. They do differ in some areas, which may help inform which system is the best fit for you.
Application of HBase
The best reason to use HBase is to take advantage of the Hadoop and HDFS underpinnings. The main drawcard is the support for MapReduce, which is a method of processing and analyzing a huge scale of data in parallel.
The applications of Hadoop and HDFS are manifold, spanning industries such as finance, healthcare, and telecom. These industries have huge amounts of data across millions of customers, which presents a large data set to store and analyze.
Application of Cassandra
Sometimes you just need to store a lot of data quickly, and that’s where Cassandra shines. Use cases like IoT (Internet of Things) devices, internet messaging, or application metrics where we need to store large amounts of data frequently but may need to access it infrequently.
That’s not to say Cassandra is not suitable for read loads. Effectively designed data models and queries can make Cassandra a great place to store and retrieve data.
Which is Better—HBase or Cassandra?
As with most things, there is not a “best” option, per se—but that does not necessarily mean that both technologies will offer equal value for you in the long run, either.
HBase and Cassandra are certainly excellent choices for big data, NoSQL databases. They are both scalable, highly available, and consistent databases that can store large amounts of data in an efficient and inexpensive way with no vendor lock-in.
If you anticipate using the features offered by Hadoop such as MapReduce, then HBase does make itself an obvious choice. But if you do not need those specific features—and don’t anticipate needing them in the future—then Cassandra can prove to be the more attractive option for several reasons.
If you have other analytics tools already or want a streamlined infrastructure deployment, Cassandra can fill that requirement. Equally, if you require data replication between geographical regions, Cassandra natively supports these deployments and is the logical choice.
But with Cassandra, you also get an incredible array of support on top of an already-robust community. This includes managed platforms to offload your entire Cassandra operations—complete with the flexibility to operate on-prem or in the cloud—leaving you to focus entirely on your applications. Expert and diverse help is also easy to come by for those inevitable situations when you run into trouble with your Cassandra clusters—a situation with which Instaclustr’s Cassandra consultants are very familiar.
Ultimately, it will come down to your particular data problems to decide which tech could be the best option for you. While both offer incredible capabilities—some distinct, others similar—if you don’t envision ever needing any of the features unique to HBase, then the wealth of additional resources and community expertise that comes with Cassandra certainly makes it the worthwhile option for long-term value.
The post Apache HBase® vs. Apache Cassandra®: An In-Depth Comparison of NoSQL Databases appeared first on Instaclustr.
ScyllaDB vs MongoDB vs PostgreSQL: Tractian’s Benchmarking & Migration
This article was written independently by João Pedro Voltani, Head of Engineering at TRACTIAN, and originally published on the TRACTIAN blog. It was co-authored by João Granzotti, Partner & Head of Data at TRACTIAN.
At the beginning of 2022, TRACTIAN faced the challenge of upgrading our real-time machine learning environment and analytical dashboards to support an aggressive increase in our data throughput, as we had managed to grow our customer base and data volume by 10x during 2021.
We recognized that in order to stay ahead in the fast-paced world of real-time machine learning (ML), we needed a data infrastructure that was flexible, scalable, and highly performant. We believed that ScyllaDB would provide us with the capabilities we lacked to support our real-time ML environment, enabling us to push our product and algorithms to the next level.
But you are probably wondering: why was ScyllaDB our best fit? We’d like to take you on the journey of transforming our engineering process to focus on improving our product’s performance. We’ll cover why we ultimately decided to use ScyllaDB, the positive outcomes we’ve seen as a result, and the obstacles we encountered during the transition.
How We Compared NoSQL Databases: MongoDB vs ScyllaDB vs PostgreSQL
When talking about databases, many options come to mind. However, we started by deciding to focus on those with the largest communities and applications. This left three direct options: two market giants and a newcomer that has been surprising the competitors. We looked at four characteristics of those databases — data model, query language, sharding, and replication — and used these characteristics as decision criteria for our next steps.
First off, let’s give you a deeper understanding of the three databases using the defined criteria:
MongoDB NoSQL
- Data Model: MongoDB uses a document-oriented data model where data is stored in BSON (Binary JSON) format. Documents within a collection can have different fields and structures, providing a high degree of flexibility. The document-oriented model enables basically any data modeling or relationship modeling.
- Query Language: MongoDB uses a custom query language called MongoDB Query Language (MQL), which is inspired by SQL but with some differences to match the document-oriented data model. MQL supports a variety of query operations, including filtering, grouping, and aggregation.
- Sharding: MongoDB supports sharding, which is the process of dividing a large database into smaller parts and distributing the parts across multiple servers. Sharding is performed at the collection level, allowing for fine-grained control over data placement. MongoDB uses a config server to store metadata about the cluster, including information about the shard key and shard distribution.
- Replication: MongoDB provides automatic replication, allowing data to be automatically synchronized between multiple servers for high availability and disaster recovery. Replication is performed using a replica set, where one server is designated as the primary and the others as secondary members. A secondary member can take over as primary in case of a failure, providing automatic failover.
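To make the document model and MQL feel concrete, here is a minimal sketch using the official Python driver, pymongo; the connection string, collection, and field names are illustrative assumptions, not TRACTIAN's actual schema.

```python
# pip install pymongo
from pymongo import MongoClient

# Connection string, database, collection, and fields are illustrative assumptions.
client = MongoClient("mongodb://localhost:27017")
readings = client["monitoring"]["sensor_readings"]

# Documents in the same collection may have different shapes.
readings.insert_one({"sensor_id": "s-42", "temperature": 71.3, "tags": ["spindle"]})

# MQL filtering...
hot = list(readings.find({"temperature": {"$gt": 70}}))

# ...and a simple aggregation: average temperature per sensor.
pipeline = [{"$group": {"_id": "$sensor_id", "avg_temp": {"$avg": "$temperature"}}}]
for row in readings.aggregate(pipeline):
    print(row["_id"], row["avg_temp"])
```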
ScyllaDB NoSQL
- Data Model: ScyllaDB uses a wide column-family data model, which is similar to Apache Cassandra. Data is organized into columns and rows, with each column having its own value. This model is designed to handle large amounts of data with high write and read performance.
- Query Language: ScyllaDB uses the Cassandra Query Language (CQL), which is similar to SQL but with some differences to match the wide column-family data model. CQL supports a variety of query operations, including filtering, grouping, and aggregation.
- Sharding: ScyllaDB uses sharding, which is the process of dividing a large database into smaller parts and distributing the parts across multiple nodes [and down to individual cores]. The sharding is performed automatically, allowing for seamless scaling as the data grows. ScyllaDB uses a consistent hashing algorithm to distribute data across the nodes [and cores], ensuring an even distribution of data and load balancing.
- Replication: ScyllaDB provides automatic replication, allowing data to be automatically synchronized between multiple nodes for high availability and disaster recovery. Replication happens within the cluster, with replica nodes holding copies of each piece of data. The replication factor can be configured, allowing control over the number of copies of the data stored in the cluster.
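Likewise, here is a rough sketch of the wide column model and CQL from Python; it uses the cassandra-driver API (the scylla-driver fork exposes the same interface), and the contact point, keyspace, table, and replication settings are assumptions for illustration only.

```python
# pip install scylla-driver   (the cassandra-driver package exposes the same API)
from cassandra.cluster import Cluster

# Contact point, keyspace, and table are illustrative assumptions.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# SimpleStrategy keeps the sketch short; production clusters normally use
# NetworkTopologyStrategy with a per-datacenter replication factor.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS monitoring
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS monitoring.sensor_readings (
        sensor_id    text,
        reading_time timestamp,
        temperature  double,
        PRIMARY KEY (sensor_id, reading_time)  -- partition key, then clustering key
    )
""")

# Reads and writes are addressed by partition key, which is what makes them fast.
session.execute(
    "INSERT INTO monitoring.sensor_readings (sensor_id, reading_time, temperature) "
    "VALUES (%s, toTimestamp(now()), %s)",
    ("s-42", 71.3),
)
for r in session.execute(
    "SELECT reading_time, temperature FROM monitoring.sensor_readings WHERE sensor_id = %s",
    ("s-42",),
):
    print(r.reading_time, r.temperature)
```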
PostgreSQL
- Data Model: PostgreSQL uses a relational data model, which organizes data into tables with rows and columns. The relational model provides strong support for data consistency and integrity through constraints and transactions.
- Query Language: PostgreSQL uses the Structured Query Language (SQL), which is the standard language for interacting with relational databases. SQL supports a wide range of query operations, including filtering, grouping, and aggregation.
- Sharding: PostgreSQL does not natively support sharding, but it can be achieved through extensions and third-party tools. Sharding in PostgreSQL can be performed at the database, table, or even row level, allowing for fine-grained control over data placement.
- Replication: PostgreSQL provides synchronous and asynchronous replication, allowing data to be synchronized between multiple servers for high availability and disaster recovery. Replication can be performed using a variety of methods, including streaming replication, logical replication, and file-based replication.
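And for comparison, a minimal relational sketch with psycopg2, showing the same kind of data handled inside a SQL transaction; the connection parameters and table are again illustrative.

```python
# pip install psycopg2-binary
import psycopg2

# Connection parameters and the table are illustrative assumptions.
conn = psycopg2.connect("dbname=monitoring user=postgres password=postgres host=localhost")

with conn:  # the block is one transaction: commit on success, rollback on error
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS sensor_readings (
                sensor_id    text NOT NULL,
                reading_time timestamptz NOT NULL DEFAULT now(),
                temperature  double precision,
                PRIMARY KEY (sensor_id, reading_time)
            )
        """)
        cur.execute(
            "INSERT INTO sensor_readings (sensor_id, temperature) VALUES (%s, %s)",
            ("s-42", 71.3),
        )
        cur.execute("SELECT sensor_id, avg(temperature) FROM sensor_readings GROUP BY sensor_id")
        for sensor_id, avg_temp in cur.fetchall():
            print(sensor_id, avg_temp)

conn.close()
```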
What Were Our Conclusions from the Benchmark?
In terms of performance, ScyllaDB is optimized for high throughput and low latency, using a shared-nothing, shard-per-core architecture to squeeze the most out of the underlying hardware.
MongoDB is optimized for ease of use and flexibility, offering a more accessible, developer-friendly experience, and it has a huge community to help with future issues.
PostgreSQL, on the other hand, is optimized for data integrity and consistency, with a strong emphasis on transactional consistency and ACID (Atomicity, Consistency, Isolation, Durability) compliance. It is a popular choice for applications that require strong data reliability and security. It also supports various data types and advanced features such as stored procedures, triggers, and views.
When choosing between PostgreSQL, MongoDB and ScyllaDB, it is essential to consider your specific use case and requirements. If you need a powerful and reliable relational database with advanced data management features, then PostgreSQL may be the better choice. However, if you need a flexible and easy-to-use NoSQL database with a large ecosystem, then MongoDB may be the better choice.
But we were looking for something very specific: a highly scalable, high-performance NoSQL database. The answer was simple: ScyllaDB was the better fit for our use case.
MongoDB vs ScyllaDB vs PostgreSQL Performance Compared
After the research process, our team was skeptical about making a choice that would shape the future of our product based on written information alone. We started digging to validate our decision in practical terms.
First, we built an environment to replicate our data acquisition pipeline, but we did it aggressively. We created a script to simulate a data flow bigger than the current one. At the time, our throughput was around 16,000 operations per second, and we tested the database with 160,000 operations per second (so basically 10x).
To be sure, we also tested the write and read response times for different formats and data structures; some were similar to the ones we were already using at the time.
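We are not reproducing the actual test scripts here, but the general shape of such a load test can be sketched as follows: hammer the database from a pool of workers, record per-request latencies, and report throughput and a latency percentile. The write_one placeholder and the worker counts are hypothetical; plug in whatever operation and target rate you want to simulate.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def write_one(i: int) -> float:
    """Placeholder for one database write; returns its latency in milliseconds."""
    start = time.perf_counter()
    # ... issue the real INSERT / insert_one against the database under test here ...
    return (time.perf_counter() - start) * 1000

def run_load(total_ops: int = 100_000, workers: int = 64) -> None:
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(write_one, range(total_ops)))
    elapsed = time.perf_counter() - started
    p90 = statistics.quantiles(latencies, n=100)[89]  # 90th percentile cut point
    print(f"throughput: {total_ops / elapsed:,.0f} ops/s, p90 latency: {p90:.3f} ms")

if __name__ == "__main__":
    run_load()
```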
You can see below our results with the new, optimized configuration using ScyllaDB versus the configuration we had with MongoDB (our old setup), applying the tests described above:
MongoDB vs ScyllaDB P90 Latency (Lower is Better)
MongoDB vs ScyllaDB Request Rate / Throughput (Higher is Better)
The results were overwhelming! With similar infrastructure costs, we achieved much better latency and capacity; the decision was clear and validated. A massive database migration now lay ahead of us.
Migrating from MongoDB to ScyllaDB NoSQL
As soon as we decided to start the implementation, we faced real-world difficulties. Some things are important to mention.
In this migration, we added new information and formats, affecting all production services that consume this data directly or indirectly. They would have to be refactored by adding adapters in the pipeline or recreating part of the processing and manipulation logic.
During the migration journey, both services and databases had to be duplicated, since we could not use an outage window to swap between old and new versions while validating our pipeline. That is part of what you have to deal with in critical real-time systems: an outage is never permitted, even when you are fixing or updating the system.
The reconstruction process also had to run through the data science models, so that they could take advantage of the new format, increasing accuracy and computational performance.
Given these guidelines, we created two groups. One was responsible for administering and maintaining the old database and architecture. The other group performed a massive re-processing of our data lake and refactored the models and services to handle the new architecture.
The complete process, from designing the structure to the final deployment and swap of the production environment, took six months. During this period, adjustments and significant corrections were necessary. You never know what lessons you'll learn along the way.
NoSQL Migration Challenges
ScyllaDB can achieve this kind of performance because it is designed to take advantage of high-end hardware and very specific data modeling. The final results were astonishing, but it took us some time to reach them. Hardware has a significant impact on performance: ScyllaDB is optimized for modern multi-core processors and uses all available CPU cores to process data, exploiting hardware acceleration technologies such as AVX2 and AES-NI. Performance also depends on the type and speed of the storage devices, including SSDs and NVMe drives.
In our early testing, we messed up some hardware configurations, leading to performance degradation. When those problems were fixed, we stumbled upon another problem: the data modeling.
ScyllaDB uses the Cassandra data model, which heavily dictates the performance of your queries. If you make incorrect assumptions about the data structures, queries, or data volume (as we did at the beginning), performance will suffer.
In practice, the first data format we proposed ended up exceeding the maximum recommended size for a ScyllaDB partition in some cases, which made the database perform poorly.
Our main difficulty was understanding how to translate our old data model into one that performs well on ScyllaDB. We had to restructure the data into multiple tables and partitions, sometimes duplicating data to achieve better performance. A hypothetical example of this kind of restructuring is sketched below.
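For instance, a common fix for oversized partitions is to add a bucketing column, such as the day, to the partition key so that one sensor's history is spread across many small partitions. The schema below is a hypothetical illustration of the idea, not our production model.

```python
from cassandra.cluster import Cluster

# Reuses the illustrative cluster and 'monitoring' keyspace from the earlier sketch.
session = Cluster(["127.0.0.1"]).connect()

# Hypothetical remodel: bucket each sensor's readings by day so no partition grows unbounded.
session.execute("""
    CREATE TABLE IF NOT EXISTS monitoring.sensor_readings_by_day (
        sensor_id    text,
        day          date,         -- bucketing column added to the partition key
        reading_time timestamp,
        temperature  double,
        PRIMARY KEY ((sensor_id, day), reading_time)
    ) WITH CLUSTERING ORDER BY (reading_time DESC)
""")

# Queries now always name the bucket, keeping every partition small and cheap to read.
rows = session.execute(
    "SELECT reading_time, temperature FROM monitoring.sensor_readings_by_day "
    "WHERE sensor_id = %s AND day = %s",
    ("s-42", "2022-06-01"),
)
for r in rows:
    print(r.reading_time, r.temperature)
```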
Our Lessons Learned Comparing and Migrating NoSQL Databases
In short, we learned three lessons during this process: some came from our successes and others from our mistakes.
In the process of researching and benchmarking the databases, it became clear that many of the features offered by the different databases serve specific applications. Your specific use case will dictate the best database for your application, and that truth is only discovered by carrying out practical tests and simulating the production environment under stress. We invested a lot of time in this process, and our choice of the most appropriate database has paid off.
When starting a large project, it is crucial to be prepared for a change of route in the middle of the journey. If you developed a project that did not change after its conception, you probably didn’t learn anything during the construction process or you didn’t care about the unexpected twists. Planning cannot completely predict all real-world problems, so be ready to adjust your decisions and beliefs along the way.
You shouldn't be afraid of big changes. Many people were against the changes we were proposing because of the risk they brought and the inconvenience they caused developers (swapping a tool the team already knew well for one that was completely unknown to them).
Ultimately, the decision was driven by its impact on our product, not by its impact on our engineering team, even though it was one of the most significant engineering changes we have made to date.
It doesn’t matter what architecture or system you are using. The real concern is whether it will be able to take your product into a bright future or not.
This is, in a nutshell, our journey in building one of the bridges to the future of TRACTIAN's product. If you have any questions or comments, feel free to contact us.
How Mongoose Will Bring JSON-Oriented Developers to Apache Cassandra
Apache Cassandra® is becoming the best database for handling JSON documents. If you're a Cassandra developer who finds that statement provocative, read on. In a previous post, I discussed using data APIs and data modeling to mold Cassandra into a developer experience more idiomatic to the way...
ScyllaDB in 2023: Incremental Changes are Just the Tip of the Iceberg
This article was written by tech journalist George Anadiotis.
Is incremental change a bad thing? The answer, as with most things in life, is “it depends.” In the world of technology specifically, the balance between innovation and tried-and-true concepts and solutions seems to have tipped in favor of the former. Or at least, that’s the impression the headlines give. Good thing there’s more to life than headlines.
Innovation does not happen overnight, and is not applied overnight either. In most creative endeavors, teams work relentlessly for long periods until they are ready to share their achievements with the world. Then they go back to their garage and keep working until the next milestone is achieved. If we were to peek in the garage intermittently, we’d probably call what we’d see most of the time “incremental change.”
The ScyllaDB team works with its garage doors up and is not necessarily after making headlines. The team believes that incremental change is nothing to shun if it leads to steady progress. Compared to the release of ScyllaDB 5.0 at ScyllaDB Summit 2022, “incremental change” could be the theme of ScyllaDB Summit 2023 in February. But this is just the tip of the iceberg, as there's more than meets the eye here.
I caught up with ScyllaDB CEO and co-founder Dor Laor to discuss what kept the team busy in 2022, how people are using ScyllaDB, as well as trends and tradeoffs in the world of high-performance compute and storage.
Note: In addition to reading the article, you can hear the complete conversation in this podcast:
Data is Going to the Cloud in Real Time, and so is ScyllaDB
The ScyllaDB team has its ears tuned to what its clients are doing with their data. What I noted in 2022 was that data was going to the cloud in real time, and so was ScyllaDB 5.0. Following up, I wondered whether those trends had continued at the pace they showed previously.
The answer, Laor confirmed, is simple: absolutely yes. ScyllaDB Cloud, the company’s database-as-a-service, has been growing over 100% year over year in 2022. In just three years since its introduction in 2019, ScyllaDB Cloud is now the major source of revenue for ScyllaDB, exceeding 50%.
“Everybody needs to have the core database, but the service is the easiest and safest way to consume it. This theme is very strong not just with ScyllaDB, but also across the board with other databases and sometimes beyond databases with other types of infrastructure. It makes lots of sense,” Laor noted.
Similarly, ScyllaDB's support for real-time updates via its change data capture (CDC) feature is seeing lots of adoption. All CDC events go to a table that can be read like a regular table. Laor noted that this makes CDC easy to use, including in conjunction with the Kafka connector. Furthermore, CDC opens the door to another possibility: using ScyllaDB not just as a database, but also as an alternative to Kafka.
“It's not that ScyllaDB is a replacement for Kafka. But if you have a database plus Kafka stack, there are cases that instead of queuing stuff in Kafka and pushing them to the database, you can just also do the queuing within the database itself,” Laor said.
This is not because Kafka is bad per se. The motivation here is to reduce the number of moving parts in the infrastructure. Palo Alto Networks did this and others are following suit too. Numberly is another example in which ScyllaDB was used to replace both Kafka and Redis.
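For readers unfamiliar with ScyllaDB CDC, here is a rough sketch of what reading changes "like a regular table" looks like; the keyspace, table, and columns are illustrative, and the log table follows ScyllaDB's <table>_scylla_cdc_log naming.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()  # illustrative contact point
# Assumes a 'monitoring' keyspace already exists.

# Enable CDC on a (hypothetical) table; ScyllaDB then maintains a companion log table.
session.execute("""
    CREATE TABLE IF NOT EXISTS monitoring.events (
        id      uuid PRIMARY KEY,
        payload text
    ) WITH cdc = {'enabled': true}
""")

# Every write to monitoring.events also lands in monitoring.events_scylla_cdc_log,
# which is read like any other table (and is what the Kafka connector consumes).
for change in session.execute("SELECT * FROM monitoring.events_scylla_cdc_log LIMIT 10"):
    print(change)
```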
High Performance and Seamless Migration
Numberly was one of the many use cases presented in ScyllaDB Summit 2023. Others included the likes of Discord, Epic Games, Optimizely, ShareChat and Strava. Browsing through those, two key themes emerge: high performance and migration.
Migration is a typical adoption path for ScyllaDB as Laor shared. Many users come to ScyllaDB from other databases in search of scalability. As ScyllaDB sees a lot of migrations, it offers support for two compatible APIs, one for Cassandra and one for DynamoDB. There are also several migration tools, such as Spark Migrator, scanning the source database and writing to the target database. CDC may also help there.
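As a small illustration of what the DynamoDB-compatible API (Alternator) means for migrations, existing DynamoDB client code can often be repointed at a ScyllaDB cluster by changing only the endpoint; the URL, port, and table below are assumptions that depend on how Alternator is deployed.

```python
# pip install boto3
import boto3

# Point the standard DynamoDB SDK at a ScyllaDB Alternator endpoint instead of AWS.
# The URL/port, dummy credentials, and table are illustrative and deployment-specific.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://scylla-node:8000",
    region_name="none",
    aws_access_key_id="none",
    aws_secret_access_key="none",
)

table = dynamodb.Table("user_events")  # assumed to exist with 'user_id' as partition key
table.put_item(Item={"user_id": "u-1", "event": "login"})
print(table.get_item(Key={"user_id": "u-1"})["Item"])
```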
While each migration has its own intricacies, when organizations like Discord or ShareChat migrate to ScyllaDB, it’s all about scale. Discord migrated trillions of messages. ShareChat migrated dozens of services. Things can get complicated and users will make their own choices. Some users rewrite their stack without keeping API compatibility, or even rewrite parts of their codebase in another programming language like Go or Rust.
Either way, ScyllaDB is used to dealing with this, Laor said. Another thing ScyllaDB is used to dealing with is delivering high performance. After all, this was the premise it was built on. Epic Games, Optimizely and Strava all presented high-performance use cases. Laor pointed out that, as an avid gamer and mountain bike rider, having ScyllaDB be part of the Epic Games stack and power Strava was gratifying.
Epic Games are the creators of the Unreal game engine. Its evolution reflects the way that modern software has evolved. Back in the day, using the Unreal game engine was as simple as downloading a single binary file. Nowadays, everything is distributed. Epic Games works with game makers by providing a reference architecture and a recommended stack, and makers choose how to consume it. ScyllaDB is used as a distributed cache in this stack, providing fast access over objects stored in AWS S3.
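The general shape of such a caching layer, a classic cache-aside read path rather than Epic Games' actual code, can be sketched as: look the object up in ScyllaDB first, and only on a miss fetch it from S3 and backfill the cache. The keyspace, bucket, and TTL are illustrative assumptions.

```python
import boto3
from cassandra.cluster import Cluster

s3 = boto3.client("s3")
session = Cluster(["127.0.0.1"]).connect()  # illustrative ScyllaDB contact point
# Assumes a 'cache' keyspace exists, created like the keyspaces in the other sketches.

session.execute("""
    CREATE TABLE IF NOT EXISTS cache.objects (
        key  text PRIMARY KEY,
        body blob
    )
""")

def get_object(bucket: str, key: str) -> bytes:
    # 1. Try the low-latency cache first.
    row = session.execute("SELECT body FROM cache.objects WHERE key = %s", (key,)).one()
    if row:
        return row.body
    # 2. Cache miss: fall back to S3, which is cheap and durable but slower.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    # 3. Backfill the cache with a TTL so entries age out on their own.
    session.execute(
        "INSERT INTO cache.objects (key, body) VALUES (%s, %s) USING TTL 3600",
        (key, body),
    )
    return body
```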
A Sea of Storage, Raft and Serverless
ScyllaDB is increasingly being used as a cache. For Numberly, that happened because ScyllaDB does not need an external cache, which made Redis obsolete. For Epic Games, the need was to add a fast-serving layer on top of S3.
S3 works great and is elastic and economical, but if your application has stringent latency requirements, then you need a cache, Laor pointed out. This is something a lot of people in the industry are aware of, including ScyllaDB engineering. As Laor shared, there is an ongoing R&D effort in ScyllaDB to use S3 for storage too. As he put it:
“S3 is a sea of extremely cheap storage, but it’s also slow. If you can marry the two, S3 and fast storage, then you manage to break the relationship between compute and storage. That gives you lots of benefits, from extreme flexibility to lower TCO.”
This is a key project that ScyllaDB’s R&D is working on these days, but not the only one. Yaniv Kaul, who just joined ScyllaDB as vice president of R&D coming from Red Hat, where he was senior director of engineering, has a lot to keep him busy. The team is growing, and recently ScyllaDB held its R&D Summit bringing everyone together to discuss what’s next.
ScyllaDB comes in two flavors, open source and enterprise. There are not that many differences between the two, primarily security features and a couple of performance and TCO (total cost of ownership)-related features. However, the enterprise version also comes with 2½ years of support and is the one that the DBaaS offering is based on. The current open source version is 5.1, while the current enterprise version is 2022.2.
In the upcoming open source version 5.2, ScyllaDB will have consistent transactional schema operations based on Raft. In the next release, 5.3, transactional topology changes will also be supported. Strong metadata consistency is essential for sophisticated users who programmatically scale the cluster and issue Data Definition Language (DDL) changes.
In addition, these changes will enable ScyllaDB to pass the Jepsen test with many topology and schema changes. More importantly, this paves the way toward changes in the way ScyllaDB shards data, making it more dynamic and leading to better load balancing.
Many of these efforts come together in the push toward serverless. There is a big dedicated team in ScyllaDB working on Kubernetes, which is already used in the DBaaS offering. This work will also be leveraged in the serverless offering. This is a major direction ScyllaDB is headed toward and a cross-team project. A free trial based on serverless was made available at the ScyllaDB Summit and will become generally available later this year.
P99, Rust and Beyond
Laor does not live in a ScyllaDB-only world. As someone with a hypervisor and Linux background from his Red Hat days, he appreciates the nuances of P99. P99 latency is the 99th latency percentile: 99% of requests will be faster than the given latency number, and only 1% of requests will be slower than your P99 latency.
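As a quick illustration of the arithmetic, computing a P99 from a batch of measured latencies is just a matter of sorting them and taking the value 99% of the way through (the nearest-rank method); the sample data below is made up.

```python
import math

def p99(latencies_ms):
    """Return the 99th percentile: the value that 99% of samples fall at or below."""
    ordered = sorted(latencies_ms)
    index = max(0, math.ceil(0.99 * len(ordered)) - 1)  # nearest-rank method
    return ordered[index]

samples = [1.2, 1.3, 1.1, 1.4, 2.0, 1.2, 9.5, 1.3, 1.2, 1.1] * 100  # made-up data
print(p99(samples))  # only ~1% of requests are slower than this value
```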
P99 CONF is also the name of an event on all things performance, organized by the ScyllaDB team but certainly not limited to them. P99 CONF is for developers who obsess over P99 percentiles and high-performance, low-latency applications. It's where developers can get down in the weeds, share their knowledge and focus on performance in real time.
P99 CONF is not necessarily a ScyllaDB event or a database event. Laor said that participants are encouraged to present databases, including competitors, as well as talk about operating systems development, programming languages, special libraries and more. It’s not even just about performance. It’s also about performance predictability, availability and tradeoffs. In his talk, Laor emphasized that it’s expensive to have a near-perfect system, and not everybody needs a near-perfect system all the time.
Among the many distinguished P99 CONF speakers, one who stood out was Bryan Cantrill. Cantrill has had stints at Sun Microsystems, Oracle and Joyent, and has a loyal following among developers and beyond. In his P99 CONF 2021 talk, Cantrill shared his experience and thinking on “Rust, Wright's Law and The Future of Low-Latency Systems.” In it, Cantrill praised Rust as being everything he has come to expect from C and C++ and then some.
ScyllaDB's original premise was to be a faster implementation of the Cassandra API, written in C++ as opposed to Java. The Seastar library that was developed as part of this effort has taken on a life of its own and is attracting users and contributors far and wide (such as Redpanda and Ceph). Dor weighed in on the “what is the best programming language today” conversation, always a favorite among developers.
Although ScyllaDB has invested heavily in its C++ codebase and perfected it over time, Laor also gives Rust a lot of credit. So much, in fact, that he said it’s quite likely that if they were to start the implementation effort today, they would have done it using Rust. Not so much for performance reasons, but more for the ease of use. In addition, many ScyllaDB users like Discord and Numberly are moving to Rust.
Even though a codebase that has stood the test of time is not something any wise developer would want to get rid of, ScyllaDB is embracing Rust too. Rust is the language of choice for the new generation of ScyllaDB’s drivers. As Laor explained, going forward the core of ScyllaDB’s drivers will be written in Rust. From that, other language-specific versions will be derived. ScyllaDB is also embracing Go and Wasm for specific parts of its codebase.
To come full circle, there's a lot of incremental change going on. Perhaps if the garage door weren't up, and we only got to see the finished product in carefully orchestrated demos, those changes would make more of an impression. Apparently, that's not what matters most to Laor and ScyllaDB.