25 February 2025, 7:25 am by
Datastax Technical HowTo's
With IBM’s planned acquisition of DataStax, we’re entering a new
chapter that promises even more innovation and investment in
Cassandra’s future.
20 February 2025, 4:13 pm by
ScyllaDB
Build a high-performance distributed Rust backend—without losing the expressiveness and ease of use of Ruby on Rails and SQL. Editor’s note: This post was originally published on Goran’s blog.
Ruby on Rails
(RoR) is one of the most renowned web frameworks. When combined
with SQL databases, RoR transforms into a powerhouse for developing
back-end (or even full-stack) applications. It resolves numerous
issues out of the box, sometimes without developers even realizing
it. For example, with the right callbacks, complex business logic
for a single API action is automatically wrapped within a
transaction, ensuring ACID (Atomicity, Consistency, Isolation,
Durability) compliance. This removes many potential concerns from
the developer’s plate. Typically, developers only need to define a
functional data model and adhere to the framework’s conventions —
sounds easy, right? However, as with all good things, there are
trade-offs. In this case, it’s performance. While the RoR and RDBMS
combination is exceptional for many applications, it struggles to
provide a suitable solution for large-scale systems. Additionally,
using frameworks like RoR alongside standard relational databases
introduces another pitfall: it becomes easy to develop poor data
models. Why? Simply because SQL databases are highly flexible,
allowing developers to make almost any data model work. We just
utilize more indexing, joins, and preloading to avoid the dreaded
N+1 query problem. We’ve all fallen into this trap at some point.
What if we could build a high-performance, distributed, Rust-based
backend while retaining some of the expressiveness and ease-of-use
found in RoR and SQL? This is where ScyllaDB and
Charybdis ORM come
into play. Before diving into these technologies, it’s essential to
understand the fundamental differences between traditional
Relational Database Management Systems (RDBMS) and ScyllaDB NoSQL.
LSM vs. B+ Tree ScyllaDB, like Cassandra, employs a Log-Structured
Merge Tree (LSM) storage engine, which optimizes write operations
by appending data to in-memory structures called memtables and
periodically flushing them to disk as SSTables. This approach
allows for high write throughput and efficient handling of large
volumes of data. By using a partition key and a hash function,
ScyllaDB can quickly locate the relevant SSTables and memtables,
avoiding global index scans and focusing operations on specific
data segments. However, while LSM trees excel at write-heavy
workloads, they can introduce read amplification since data might
be spread across multiple SSTables. To mitigate this,
ScyllaDB/Cassandra uses Bloom filters and optimized indexing
strategies. Read performance may occasionally be less predictable
compared to B+ trees, especially for certain read patterns.
Traditional SQL Databases: B+ Tree Indexing In contrast,
traditional SQL databases like PostgreSQL and MySQL (InnoDB) use B+
Tree indexing, which provides O(log N) read operations by
traversing the tree from root to leaf nodes to locate specific
rows. This structure is highly effective for read-heavy
applications and supports complex queries, including range scans
and multi-table joins. While B+ trees offer excellent read
performance, write operations are slower compared to LSM trees due
to the need to maintain tree balance, which may involve node splits
and more random I/O operations. Additionally, SQL databases benefit
from sophisticated caching mechanisms that keep frequently accessed
index pages in memory, further enhancing read efficiency.
Horizontal Scalability ScyllaDB/Cassandra: Designed for Seamless
Horizontal Scaling ScyllaDB/Cassandra are inherently built for
horizontal scalability through their shared-nothing architecture.
Each node operates independently, and data is automatically
distributed across the cluster using consistent hashing. This
design ensures that adding more nodes proportionally increases both
storage capacity and compute resources, allowing the system to
handle growing workloads efficiently. The automatic data
distribution and replication mechanisms provide high availability
and fault tolerance, ensuring that the system remains resilient
even if individual nodes fail. Furthermore, ScyllaDB/Cassandra
offer tunable consistency levels, allowing developers to balance
between consistency and availability based on application
requirements. This flexibility is particularly advantageous for
distributed applications that need to maintain performance and
reliability at scale. Traditional SQL Databases: Challenges with
Horizontal Scaling Traditional SQL databases, on the other hand,
were primarily designed for vertical scalability, relying on
enhancing a single server’s resources to manage increased load.
While replication (primary-replica or multi-primary) and sharding
techniques enable horizontal scaling, these approaches often
introduce significant operational complexity. Managing data
distribution, ensuring consistency across replicas, and handling
failovers require careful planning and additional tooling.
Moreover, maintaining ACID properties across a distributed SQL
setup can be resource-intensive, potentially limiting scalability
compared to NoSQL solutions like ScyllaDB/Cassandra. Data Modeling
To harness ScyllaDB’s full potential, there is one fundamental
rule: data modeling should revolve around queries. This means
designing your data structures based on how you plan to access and
query them. At first glance, this might seem obvious, prompting the
question: Aren’t we already doing this with traditional RDBMSs? Not
entirely. The flexibility of SQL databases allows developers to
make nearly any data model work by leveraging joins, indexes, and
preloading techniques. This often masks underlying inefficiencies,
making it easy to overlook suboptimal data designs. In contrast,
ScyllaDB requires a more deliberate approach. You must carefully
select
partition and clustering keys to ensure that queries are scoped
to single partitions and data is ordered optimally. This eliminates
the need for extensive indexing and complex joins, allowing
ScyllaDB’s Log-Structured Merge (LSM) engine to deliver high
performance. While this approach demands more upfront effort, it
leads to more efficient and scalable data models. To be fair, it
also means that, as a rule, you usually have to provide more
information to locate the desired data. Although this can initially
appear challenging, the more you work with it, the more you
naturally develop the intuition needed to create optimal models.
Charybdis Now that we have grasped the fundamentals of data
modeling in ScyllaDB, we can turn our attention to
Charybdis. Charybdis
is a Rust ORM built on top of the
ScyllaDB Rust
Driver, focusing on ease of use and performance. Out of the
box, it generates nearly all available queries for a model and
provides helpers for custom queries. It also supports automatic
migrations, allowing you to run commands to migrate the database
structure based on differences between model definitions in your
code and the database. Additionally, Charybdis supports partial
models, enabling developers to work seamlessly with subsets of
model fields while implementing all traits and functionalities that
are present in the main model. Sample User Model
Note: We will always query a user by id, so we simply added id as the partition key, leaving the clustering key empty.
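A rough sketch of what such a model can look like with the charybdis_model macro (the non-key fields here are illustrative assumptions, and the attribute syntax may differ slightly between Charybdis versions):

use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};

// Queried only by id, so id is the sole partition key and the
// clustering keys are left empty.
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
)]
pub struct User {
    pub id: Uuid,
    pub username: Text,
    pub email: Text,
    pub created_at: Timestamp,
}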
Installing and Running Migrations First, install the migration tool:
cargo install charybdis-migrate
Within your src/ directory, run the migration:
migrate --hosts <host> --keyspace <your_keyspace> --drop-and-replace (optional)
This command will create the users table with fields defined in your model. Note that for migrations to work, you need to use types or aliases defined within charybdis::types::*.
Basic Queries for the User Model
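A rough sketch of the generated query helpers (method names follow the Charybdis documentation; exact signatures may vary between versions, and the field values are illustrative):

use charybdis::operations::*;
use charybdis::types::Uuid;
use chrono::Utc;
use scylla::CachingSession;

async fn user_crud(db: &CachingSession) -> Result<(), Box<dyn std::error::Error>> {
    let mut user = User {
        id: Uuid::new_v4(),
        username: "goran".into(),
        email: "goran@example.com".into(),
        created_at: Utc::now(),
    };

    user.insert().execute(db).await?;           // INSERT
    let _found = User::find_by_id(user.id)      // SELECT by primary key
        .execute(db)
        .await?;
    user.email = "new@example.com".into();      // UPDATE a column
    user.update().execute(db).await?;
    user.delete().execute(db).await?;           // DELETE
    Ok(())
}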
Sample Models for a Reddit-Like Application In a
Reddit-like application, we have communities that have posts, and
posts have comments. Note that the following sample is available within the Charybdis examples repository.
Community Model Post Model
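A rough sketch of the two models, reconstructed from the key structure described later in this post (community_id as the partition key of posts, with created_at and id as clustering keys); the non-key fields are illustrative assumptions:

use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};

#[charybdis_model(
    table_name = communities,
    partition_keys = [id],
    clustering_keys = [],
)]
pub struct Community {
    pub id: Uuid,
    pub title: Text,
    pub description: Text,
    pub created_at: Timestamp,
}

#[charybdis_model(
    table_name = posts,
    partition_keys = [community_id],
    clustering_keys = [created_at, id],
)]
pub struct Post {
    pub community_id: Uuid,
    pub created_at: Timestamp,
    pub id: Uuid,
    pub title: Text,
    pub description: Text,
}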
Actix-Web Services for Posts
Note: The insert_cb method triggers the before_insert callback within our trait, assigning a new id and created_at to the post. Retrieving All Posts for a Community
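A rough sketch of what these two services can look like (the generated finder name, the callback extension argument, and the serde derives on Post are assumptions; error handling is simplified):

use actix_web::{get, post, web, HttpResponse};
use charybdis::operations::*;
use charybdis::types::Uuid;
use futures::TryStreamExt;
use scylla::CachingSession;

// Create a post. insert_cb runs the model callbacks, so before_insert
// assigns a fresh id and created_at. The `&()` extension argument is an
// assumption; pass whatever your Callbacks impl expects.
#[post("/posts")]
async fn create_post(
    db: web::Data<CachingSession>,
    payload: web::Json<Post>,
) -> actix_web::Result<HttpResponse> {
    let mut post = payload.into_inner();
    post.insert_cb(&())
        .execute(&db)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    Ok(HttpResponse::Ok().json(post))
}

// Retrieve all posts for a community: a single-partition query using a
// finder generated from the partition key (exact generated name assumed).
#[get("/communities/{community_id}/posts")]
async fn community_posts(
    db: web::Data<CachingSession>,
    community_id: web::Path<Uuid>,
) -> actix_web::Result<HttpResponse> {
    let posts: Vec<Post> = Post::find_by_community_id(*community_id)
        .execute(&db)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?
        .try_collect()
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    Ok(HttpResponse::Ok().json(posts))
}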
Updating a Post’s Description
To avoid potential inconsistency issues, such as concurrent requests to update a post’s description and other fields, we use the automatically generated partial_<model>! macro. The partial_post! is automatically generated by the charybdis_model macro. The first argument is the new struct name of the partial model, and the others are a subset of the main model fields that we want to work with. In this case, UpdateDescriptionPost behaves just like the standard Post model but operates on a subset of model fields. For each partial model, we must provide the complete primary key, and the main model must implement the Default trait. Additionally, all traits on the main model that are defined below the charybdis_model will automatically be included for all partial models. Now we can have
an Actix service dedicated to updating a post’s description:
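A rough sketch, with the partial model declared via the generated macro and an assumed handler shape (serde derives propagated from the main model; error handling simplified):

use actix_web::{put, web, HttpResponse};
use charybdis::operations::*;
use scylla::CachingSession;

// New struct name first, then the subset of fields we want to work with.
// The full primary key (community_id, created_at, id) must be included.
partial_post!(UpdateDescriptionPost, community_id, created_at, id, description);

#[put("/posts/description")]
async fn update_post_description(
    db: web::Data<CachingSession>,
    payload: web::Json<UpdateDescriptionPost>,
) -> actix_web::Result<HttpResponse> {
    let post = payload.into_inner();
    // Updates only the description column for the given primary key.
    post.update()
        .execute(&db)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    Ok(HttpResponse::Ok().json(post))
}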
Note: To update a post, you must provide all components of the primary key (community_id, created_at, id). Final
Notes A full working sample is available within the
Charybdis examples repository. Note: We defined our models
somewhat differently than in typical SQL scenarios by using three
columns to define the primary key. This is because, in designing
models, we also determine where and how data will be stored for
querying and data transformation. ACID Compliance Considerations
While ScyllaDB offers exceptional performance and seamless
horizontal scalability for many applications, it is not suitable
for scenarios where ACID (Atomicity, Consistency, Isolation,
Durability) properties are required. Sample Use Cases Requiring ACID Integrity
• Bank Transactions: Ensuring that fund transfers are processed atomically to prevent discrepancies and maintain financial accuracy.
• Seat Reservations: Guaranteeing that seat allocations in airline bookings or event ticketing systems are handled without double-booking.
• Inventory Management: Maintaining accurate stock levels in e-commerce platforms to avoid overselling items.
For some critical applications, the lack of inherent ACID
guarantees in ScyllaDB means that developers must implement
additional safeguards to ensure data integrity. In cases where
absolute transactional reliability is non-negotiable, integrating
ScyllaDB with a traditional RDBMS that provides full ACID
compliance might be necessary. In upcoming articles, we will
explore how to handle additional scenarios and how to leverage
eventual consistency effectively for the majority of your web
application, as well as strategies for maintaining strong
consistency when required by your data models in ScyllaDB.
18 February 2025, 1:39 pm by
ScyllaDB
Monster SCALE Summit speakers have amassed a rather
impressive list of publications, including quite a few books. This
blog highlights 10+ of them. If you’ve seen the
Monster
SCALE Summit agenda, you know that the stars have aligned
nicely. In just two half days, from anywhere you like, you can
learn from 60+ outstanding speakers – all exploring extreme scale
engineering challenges from a variety of angles. Distributed
databases, event streaming, AI/ML, Kubernetes, Rust…it’s all on the
agenda. If you read the bios of our speakers, you’ll note that many
have written books. This blog highlights eleven of those Monster
SCALE Summit speakers’ books – plus two new books by past
conference speakers. Once you register for the conference (it’s
free + virtual), you’ll gain 30-day full access to the complete
O’Reilly library (thanks to O’Reilly, a conference media sponsor).
And Manning Publications is also a media sponsor. They are offering
the Monster SCALE community a nice 50% discount on all Manning
books. One more bonus: conference attendees who participate in the
speaker chat will be eligible to win book bundles, courtesy of
Manning.
See the agenda and
register – it’s free Designing Data-Intensive Applications, 2nd
Edition
By Martin Kleppmann and Chris Riccomini
O’Reilly ETA: December 2025 Data is at the center of many
challenges in system design today. Difficult issues such as
scalability, consistency, reliability, efficiency, and
maintainability need to be resolved. In addition, there’s an
overwhelming variety of tools and analytical systems, including
relational databases, NoSQL datastores, plus data warehouses and
data lakes. What are the right choices for your application? How do
you make sense of all these buzzwords? In this second edition,
authors Martin Kleppmann and Chris Riccomini build on the
foundation laid in the acclaimed first edition, integrating new
technologies and emerging trends. You’ll be guided through the maze
of decisions and trade-offs involved in building a modern data
system, from choosing the right tools like Spark and Flink to
understanding the intricacies of data laws like the GDPR.
• Peer under the hood of the systems you already use, and learn to use them more effectively
• Make informed decisions by identifying the strengths and weaknesses of different tools
• Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
• Understand the distributed systems research upon which modern databases are built
• Peek behind the scenes of major online services, and learn from their architectures
Martin and Chris
are presenting “Designing Data-Intensive Applications in 2025”
Think Distributed Systems
Dominik Tornow
ETA: Fall 2025
Manning
(use code SCALE2025 for 50% off) All modern software is
distributed. Let’s say that again—all modern software is
distributed. Whether you’re building mobile utilities,
microservices, or massive cloud native enterprise applications,
creating efficient distributed systems requires you to think
differently about failure, performance, network services, resource
usage, latency, and much more. This clearly-written book guides you
into the mindset you’ll need to design, develop, and deploy
scalable and reliable distributed systems. In Think Distributed Systems you’ll find a beautifully illustrated collection of mental models for:
• Correctness, scalability, and reliability
• Failure tolerance, detection, and mitigation
• Message processing
• Partitioning and replication
• Consensus
Dominik is presenting
“The Mechanics of Scale” Latency: Reduce Delay in
Software Systems
Pekka Enberg ETA: Summer 2025
Manning (use
code SCALE2025 for 50% off) Slow responses can kill good software.
Whether it’s recovering microseconds lost while routing messages on
a server or speeding up page loads that keep users waiting, finding
and fixing latency can be a frustrating part of your work as a
developer. This one-of-a-kind book shows you how to spot,
understand, and respond to latency wherever it appears in your
applications and infrastructure. This book balances theory with
practical implementations, turning academic research into useful
techniques you can apply to your projects. In Latency you’ll learn:
• What latency is—and what it is not
• How to model and measure latency
• Organizing your application data for low latency
• Making your code run faster
• Hiding latency when you can’t reduce it
Pekka
presented “Patterns
of Low Latency” at P99 CONF 2024. And his Turso co-founder
Glauber Costa will be presenting “Who Needs One Database Anyway?”
at Monster SCALE Summit. Writing for Developers: Blogs
That Get Read
By Piotr Sarna and Cynthia Dunlop
January 2025
Amazon |
Manning (use code SCALE2025 for 50% off) This book is a
practical guide to writing more compelling engineering blog posts.
We discuss strategies for nailing all phases of the technical
blogging process. And we have quite a bit of fun exploring the core
blog post patterns that are most common across engineering blogs
today, like “The Bug Hunt,” “How We Built It,” “Lessons Learned,”
“We Rewrote It in X,” “Thoughts on Trends,” etc. Each “pattern”
chapter includes an analysis of real-world examples as well as
specific dos/don’ts for that particular pattern. There’s a section
on moving from blogging into opportunities such as article writing,
conference speaking, and book writing. Finally, we wrap with a
critical (and often amusing) look at generative AI blogging uses
and abuses. Oh…and there’s also a foreword by Bryan Cantrill and an
afterword by Scott Hanselman! Readers will learn how to:
• Pinpoint topics that make intriguing posts
• Apply popular blog post design patterns
• Rapidly plan, draft, and optimize blog posts
• Make your content clearer and more convincing to technical readers
• Tap AI for revision while avoiding misuses and abuses
• Increase the impact of all your technical communications
Piotr is presenting “A Dist
Sys Programmer’s Journey Into AI” ScyllaDB in Action
Bo Ingram October 2024
Amazon |
Manning
(use code SCALE2025 for 50% off) |
ScyllaDB
(free chapters) ScyllaDB in Action is your guide to everything you
need to know about ScyllaDB, from your very first queries to
running it in a production environment. It starts you with the
basics of creating, reading, and deleting data and expands your
knowledge from there. You’ll soon have mastered everything you need
to build, maintain, and run an effective and efficient database.
This book teaches you ScyllaDB the best way—through hands-on
examples. Dive into the node-based architecture of ScyllaDB to
understand how its distributed systems work, how you can
troubleshoot problems, and how you can constantly improve
performance. You’ll learn how to: • Read, write, and delete data in
ScyllaDB • Design database schemas for ScyllaDB • Write performant
queries against ScyllaDB • Connect and query a ScyllaDB cluster
from an application • Configure, monitor, and operate ScyllaDB in
production
Bo’s colleagues Ethan Donowitz and Vicki Niu are
both presenting at Monster SCALE Summit. Data
Virtualization in the Cloud Era
Dr. Daniel Abadi and Andrew
Mott July 2024
O’Reilly Data virtualization had been held back by complexity
for decades until recent advances in cloud technology, data lakes,
networking hardware, and machine learning transformed the dream
into reality. It’s becoming increasingly practical to access data
through an interface that hides low-level details about where it’s
stored, how it’s organized, and which systems are needed to
manipulate or process it. You can combine and query data from
anywhere and leave the complex details behind. In this practical
book, authors Dr. Daniel Abadi and Andrew Mott discuss in detail
what data virtualization is and the trends in technology that are
making data virtualization increasingly useful. With this book,
data engineers, data architects, and data scientists will explore
the architecture of modern data virtualization systems and learn
how these systems differ from one another at technical and
practical levels. By the end of the book, you’ll understand:
• The architecture of data virtualization systems
• Technical and practical ways that data virtualization systems differ from one another
• Where data virtualization fits into modern data mesh and data fabric paradigms
• Modern best practices and case study use cases
Daniel
is presenting “Two Leading Approaches to Data Virtualization: Which
Scales Better?” Bonus: Read Daniel Abadi’s
article on
the PACELC theorem. Database Performance at Scale
By Felipe Cardeneti Mendes, Piotr Sarna, Pavel Emelyanov,
and Cynthia Dunlop October 2023
Amazon |
ScyllaDB
(free) Discover critical considerations and best practices for
improving database performance based on what has worked, and
failed, across thousands of teams and use cases in the field. This
book provides practical guidance for understanding the
database-related opportunities, trade-offs, and traps you might
encounter while trying to optimize data-intensive applications for
high throughput and low latency. Whether you’re building a new
system from the ground up or trying to optimize an existing use
case for increased demand, this book covers the essentials. The
ultimate goal of the book is to help you discover new ways to
optimize database performance for your team’s specific use cases,
requirements, and expectations.
• Understand often overlooked factors that impact database performance at scale
• Recognize data-related performance and scalability challenges associated with your project
• Select a database architecture that’s suited to your workloads, use cases, and requirements
• Avoid common mistakes that could impede your long-term agility and growth
• Jumpstart teamwide adoption of best practices for optimizing database performance at scale
Felipe is presenting “ScyllaDB is No Longer ‘Just a Faster Cassandra’” Piotr is presenting “A Dist Sys Programmer’s
Journey Into AI” Algorithms and Data Structures for
Massive Datasets
Dzejla Medjedovic, Emin Tahirovic, and
Ines Dedovic May 2022
Amazon |
Manning (use code SCALE2025 for 50% off) Algorithms and Data
Structures for Massive Datasets reveals a toolbox of new methods
that are perfect for handling modern big data applications. You’ll
explore the novel data structures and algorithms that underpin
Google, Facebook, and other enterprise applications that work with
truly massive amounts of data. These effective techniques can be
applied to any discipline, from finance to text analysis. Graphics,
illustrations, and hands-on industry examples make complex ideas
practical to implement in your projects—and there are no mathematical proofs to puzzle over. Work through this one-of-a-kind guide, and
you’ll find the sweet spot of saving space without sacrificing your
data’s accuracy. Readers will learn:
• Probabilistic sketching data structures for practical problems
• Choosing the right database engine for your application
• Evaluating and designing efficient on-disk data structures and algorithms
• Understanding the algorithmic trade-offs involved in massive-scale systems
• Deriving basic statistics from streaming data
• Correctly sampling streaming data
• Computing percentiles with limited space resources
Dzejla
is presenting “Read- and Write-Optimization in Modern Database
Infrastructures” Kafka: The Definitive Guide, 2nd
Edition
By Gwen Shapira, Todd Palino, Rajini Sivaram, and Krit Petty November 2021
Amazon |
O’Reilly Engineers from Confluent and LinkedIn responsible for
developing Kafka explain how to deploy production Kafka clusters,
write reliable event-driven microservices, and build scalable
stream processing applications with this platform. Through detailed
examples, you’ll learn Kafka’s design principles, reliability
guarantees, key APIs, and architecture details, including the
replication protocol, the controller, and the storage layer. You’ll
learn:
• Best practices for deploying and configuring Kafka
• Kafka producers and consumers for writing and reading messages
• Patterns and use-case requirements to ensure reliable data delivery
• Best practices for building data pipelines and applications with Kafka
• How to perform monitoring, tuning, and maintenance tasks with Kafka in production
• The most critical metrics among Kafka’s operational measurements
• Kafka’s delivery capabilities for stream processing systems
Gwen is presenting “The Nile Approach: Re-engineering
Postgres for Millions of Tenants” The Missing README: A
Guide for the New Software Engineer
by Chris Riccomini and
Dmitriy Ryaboy
Amazon |
O’Reilly August 2021 For new software engineers, knowing how to
program is only half the battle. You’ll quickly find that many of
the skills and processes key to your success are not taught in any
school or bootcamp. The Missing README fills in that gap—a
distillation of workplace lessons, best practices, and engineering
fundamentals that the authors have taught rookie developers at top
companies for more than a decade. Early chapters explain what to
expect when you begin your career at a company. The book’s middle
section expands your technical education, teaching you how to work
with existing codebases, address and prevent technical debt, write
production-grade software, manage dependencies, test effectively,
do code reviews, safely deploy software, design evolvable
architectures, and handle incidents when you’re on-call. Additional
chapters cover planning and interpersonal skills such as Agile
planning, working effectively with your manager, and growing to
senior levels and beyond. You’ll learn:
• How to use the legacy code change algorithm, and leave code cleaner than you found it
• How to write operable code with logging, metrics, configuration, and defensive programming
• How to write deterministic tests, submit code reviews, and give feedback on other people’s code
• The technical design process, including experiments, problem definition, documentation, and collaboration
• What to do when you are on-call, and how to navigate production incidents
• Architectural techniques that make code change easier
• Agile development practices like sprint planning, stand-ups, and retrospectives
Chris and Martin
Kleppmann are presenting “Designing Data-Intensive Applications in
2025” The DynamoDB Book
By Alex Debrie
April 2020
Amazon |
Direct
DynamoDB is a highly available, infinitely scalable NoSQL database
offering from AWS. But modeling with a NoSQL database like DynamoDB
is different than modeling with a relational database. You need to
intentionally design for your access patterns rather than creating
a normalized model that allows for flexible querying later. The
DynamoDB Book is the authoritative resource in the space, and it’s
the recommended resource within Amazon for learning DynamoDB. Rick
Houlihan, the former head of the NoSQL Blackbelt team at AWS, said
The DynamoDB Book is “definitely a must read if you want to
understand how to correctly model data for NoSQL apps.” The DynamoDB Book takes a comprehensive approach to teaching DynamoDB,
including: Discussion of key concepts, underlying infrastructure
components, and API design; Explanations of core strategies for
data modeling, including one-to-many and many-to-many
relationships, filtering, sorting, aggregations, and more; 5 full
walkthrough examples featuring complex data models and a large
number of access patterns.
Alex is presenting “DynamoDB Cost
Optimization Considerations and Strategies” RESTful Java
Patterns and Best Practices: Learn Best Practices to Efficiently
Build Scalable, Reliable, and Maintainable High Performance Restful
Services
By Bhakti Mehta
Amazon, September 2014. This book provides an overview of the
REST architectural style and then dives deep into best practices
and commonly used patterns for building RESTful services that are
lightweight, scalable, reliable, and highly available. It’s
designed to help application developers get familiar with REST. The
book explores the details, best practices, and commonly used REST
patterns as well as gives insights on how Facebook, Twitter,
PayPal, GitHub, Stripe, and other companies are implementing
solutions with RESTful services.
13 February 2025, 4:55 pm by
ScyllaDB
How strategic database migration + data (re)modeling
improved latencies and cut database costs 5X ZEE is
India’s largest media and entertainment business, covering
broadcast TV, films, streaming media, and music. ZEE5 is
their premier OTT streaming service, available in over 190
countries with ~150M monthly active users. And every user’s
playback experience, security, and recommendations rely upon a
“heartbeat API” that processes a whopping 100B+ heartbeats per day.
The engineers behind the system knew that continued business growth
would stress their infrastructure (as well as the people reviewing
the database bills). So, the team decided to rethink the system
before it inflicted any heart attacks. TL;DR, they designed a
system that’s loved internally and by users. And Jivesh Threja
(Tech Lead) and Srinivas Shanmugam (Principal Architect) joined us
on Valentine’s Day last year to share their experiences. They
outlined the technical requirements for the replacement (cloud
neutrality, multi-tenant readiness, simplicity of onboarding new
use cases, and high throughput and low latency at optimal costs)
and how that led to ScyllaDB. Then, they explained how they
achieved their goals through a new stream processing pipeline, new
API layer, and data (re)modeling. The initial results of their
optimization:
5X cost savings (from $744K to $144K
annually) and single-digit millisecond P99 read latency.
Wrapping up, they shared lessons learned that could benefit anyone
considering or using ScyllaDB. Here are some highlights from that
talk… What’s a Heartbeat? “Heartbeat” refers to a request that’s
fired at regular intervals during video playback on the ZEE5 OTT
platform. These simple requests track what users are watching and
how far they’ve progressed in each video. They’re essential for
ZEE5’s “continue watching” functionality, which lets users pause
content on one device then resume it on any device. They’re also
instrumental for calculating key metrics, like concurrent
viewership for a big event or the top shows this week. Why Change?
ZEE5’s original heartbeat system was a web of different databases,
each handling a specific part of the streaming experience. Although
it was technically functional, this approach was expensive and
locked them into a specific vendor ecosystem. The team recognized
an opportunity to streamline their infrastructure – and they went
for it. They wanted a system that wasn’t locked into any particular
cloud provider, would cost less to operate, and could handle their
massive scale with consistently fast performance – specifically,
single-digit millisecond responses. Plus, they wanted the
flexibility to add new features easily and the ability to offer
their system to other streaming platforms. As Srinivas put it: “It
needed to be multi-tenant ready so it could be reused for any OTT
provider. And it needed to be easily extensible to new use cases
without major architectural changes.” System Architecture, Before
and After Here’s a look at their original system architecture with
multiple databases: DynamoDB to store the basic heartbeat data
Amazon RDS to store next and previous episode information Apache
Solr to store persistent metadata One Redis instance to cache
metadata Another Redis instance to store viewership details
The ZEE5 team considered
four main database options for this project: Redis, Cassandra,
Apache Ignite, and ScyllaDB. After evaluation and benchmarking,
they chose ScyllaDB. Some of the reasons Srinivas cited for this
decision: “We don’t need an extra cache layer on top of the
persistent database. ScyllaDB manages both the cache layer and the
persistent database within the same infrastructure, ensuring low
latency across regions, replication, and multi-cloud readiness. It
works with any cloud vendor, including Azure, AWS, and GCP, and now
offers managed support with a turnaround time of less than one
hour.” The new architecture simplifies and flattens the previous
system architecture structure.
Now, all heartbeat events
are pushed into their heartbeat topic, processed through stream
processing, and ingested into ScyllaDB Cloud using ScyllaDB
connectors. Whenever content is published, it’s ingested into their
metadata topic and then inserted into ScyllaDB Cloud via metadata
connectors. Srinivas concludes: “With this new architecture,
we successfully migrated workloads from DynamoDB, RDS, Redis, and
Solr to ScyllaDB. This has resulted in a
5x cost reduction,
bringing our monthly expenses down from $62,000 to around
$12,000.” Deeper into the Design Next, Jivesh shared more
about their low-level design… Real-time stream processing pipeline
In the real-time stream processing pipeline, heartbeats are sent to
ScyllaDB at regular intervals. The heartbeat interval is set to 60
seconds, meaning that every frontend client sends a heartbeat every
60 seconds while a user is watching a video. These heartbeats pass
through the playback stream processing system, business logic
consumers transform that data into the required format – then the
processed data is stored in ScyllaDB. Scalable API layer The first
component in the scalable API layer is the heartbeat service, which
is responsible for handling large volumes of data ingestion. Topics
process the data, then it passes through a connector service and is
stored in ScyllaDB. Another notable API layer service is the
Concurrent Viewership Count service. This service uses ScyllaDB to
retrieve concurrent viewership data – either per user or per asset
(e.g., per ID). For example, if a movie is released, this service
can tell how many users are watching the movie at any given moment.
Metadata management use case One of the first major challenges ZEE5
faced was managing metadata for their massive OTT platform.
Initially, they relied on a combination of three different
databases – Solr, Redis, and Postgres – to handle their extensive
metadata needs. Looking to optimize and simplify, they redesigned
their data model to work with ScyllaDB instead – using ID as the
partition key, along with materialized views. Here’s a look at
their metadata model:
CREATE TABLE keyspace.meta_data ( id text, title text, show_id text, …, …, PRIMARY KEY((id), show_id) ) WITH compaction = {'class': 'LeveledCompactionStrategy'};
In
this model, the ID serves as the partition key. Since this table
experiences relatively few writes (a write occurs only when a new
asset is released) but significantly more reads, they used Leveled
Compaction Strategy to optimize performance. And, according to
Jivesh, “Choosing the right partition and clustering keys helped us
get a single-digit millisecond latency.” Viewership count use case
Viewership Count is another use case that they moved to ScyllaDB.
Viewership count can be tracked per user or per asset ID. ZEE5
decided to design a table where the user ID served as the partition
key and the asset ID as the sort key – allowing viewership data to
be efficiently queried. They set ScyllaDB’s TTL to match the
60-second heartbeat interval, ensuring that data automatically
expires after the designated time. Additionally, they used
ScyllaDB’s Time-Window Compaction Strategy to efficiently manage
data in memory, clearing expired records based on the configured
TTL. Jivesh explained, “This table is continuously updated with
heartbeats from every front end and every user. As heartbeats
arrive, viewership counts are tracked in real time and
automatically cleared when the TTL expires. That lets us
efficiently retrieve live viewership data using ScyllaDB.” Here’s
their viewership count data model:
CREATE TABLE
keyspace.USER_SESSION_STREAM ( USER_ID text, DEVICE_ID text,
ASSET_ID text, TITLE text, …, PRIMARY KEY((USER_ID), ASSET_ID) )
WITH default_time_to_live = 60 and compaction = { 'class' :
'TimeWindowCompactionStrategy' };
ScyllaDB Results and
Lessons Learned The following load test report shows a throughput
of 41.7K requests per second. This benchmark was conducted during
the database selection process to evaluate performance under high
load. Jivesh remarked, “Even with such a high throughput, we could
achieve a microsecond write latency and average microsecond read
latency. This really gave us a clear view of what ScyllaDB could do
– and that helped us decide.” He then continued to share some facts
that shed light on the scale of ZEE5’s ScyllaDB deployment: “We
have around 9TB on ScyllaDB. Even with such a large volume of data,
it’s able to handle latencies within microseconds and a
single-digit millisecond, which is quite tremendous. We have a
daily peak concurrent viewership count of 1 million. Every second,
we are writing so much data into ScyllaDB and getting so much data
out of it. We process more than 100 billion heartbeats in a day.
That’s quite huge.” The talk wrapped with the following lessons learned:
• Data modeling is the single most critical factor in achieving single-digit millisecond latencies.
• Choose the right quorum setting and compaction strategy. For example, does a heartbeat need to be written to every node before it can be read, or is a local quorum sufficient? Selecting the right quorum ensures the best balance between latency and SLA requirements.
• Choose Partition and Clustering Keys wisely – it’s not easy to modify them later.
• Use Materialized Views for faster lookups and avoid filter queries. Querying across partitions can degrade performance.
• Use prepared statements to improve efficiency.
• Use asynchronous queries for faster query processing. For instance, in the metadata model, 20 synchronous queries were executed in parallel, and ScyllaDB handled them within milliseconds. (A sketch of prepared, asynchronous queries follows this list.)
• Zone-aware ScyllaDB clients help reduce cross-AZ (Availability Zone) network costs. Fetching data within the same AZ minimizes latency and significantly reduces network expenses.
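For illustration only (this is not ZEE5’s code), here is a minimal sketch of the prepared-statement, consistency, and async-query points using the ScyllaDB Rust driver; the table mirrors the viewership model shown earlier, and the API follows the 0.x scylla crate (method names differ in newer releases):

use scylla::statement::Consistency;
use scylla::{Session, SessionBuilder};

async fn live_assets_for_user(user_id: &str) -> Result<usize, Box<dyn std::error::Error>> {
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Prepare once and reuse: the statement is parsed a single time.
    let mut stmt = session
        .prepare("SELECT asset_id FROM keyspace.user_session_stream WHERE user_id = ?")
        .await?;
    // Pick the quorum that matches your latency/consistency trade-off.
    stmt.set_consistency(Consistency::LocalQuorum);

    // Calls are async, so many such queries can run concurrently.
    let result = session.execute(&stmt, (user_id,)).await?;
    Ok(result.rows_num()?)
}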
12 February 2025, 7:18 pm by
ScyllaDB
Given that we’re hosting
Monster SCALE Summit…with
tech talks on
extreme-scale engineering…many of
which feature our
monstrously fast and scalable
database, a big announcement is probably expected? We hope
this meets your super-sized expectations. Monster SCALE Summit 2025
will be featuring 60+ tech talks including: Just-added keynotes by
Kelsey Hightower and Rachel Stephens + Adam Jacob Previously-teased
keynotes by Avi Kivity, Martin Kleppmann + Chris Riccomini, Gwen
Shapira, and Dor Laor Engineering talks by gamechangers like Uber,
Slack, Canva, Atlassian, Wise, and Booking.com 14 talks by ScyllaDB
users such as Medium, Disney+, ShareChat, Yieldmo, Clearview AI,
and more – plus two talks by Discord The latest from ScyllaDB
engineering: including object storage, vector search, and “ScyllaDB
X Cloud” Like other ScyllaDB-hosted conferences (e.g.,
P99 CONF), this conference will be
free and virtual so that everyone can participate.
See the agenda and
register – it’s free Mark your calendar for March 11 and 12
because – in addition to all those great talks – you can… Chat
directly with speakers and connect with ~20K of your peers
Participate in some monster scale global distributed system
challenges – with prizes for winners, of course Learn from
ScyllaDB’s top experts, who are eager to answer your toughest
database performance questions in our lively lounge – and preparing
special interactive training courses for the occasion Win
conference swag, sea monster plushies, book bundles, and other cool
giveaways It’s a lot. But hey, it’s Monster SCALE Summit. 🙂
Details, Details Beyond what’s on the agenda, here’s some
additional detail on a few recently-added sessions (see more in our
“tiny peek” blog post). How Discord Performs Database Upgrades
at Scale
Ethan Donowitz, Senior Software Engineer,
Persistence Infrastructure at Discord Database upgrades
are high-risk but high-reward. Upgrading to a newer version can
make your database faster, cheaper, and more reliable; however,
without thorough planning and testing, upgrades can be risky.
Because databases are stateful, it is often not possible to roll
back if you encounter problems after the upgrade due to backwards
incompatible changes across versions. While new versions typically
mean improved query latencies, changes in query planning or cache
behavior across versions can cause unexpected differences in
performance in places one might not expect. Discord relies on
ScyllaDB to serve millions of reads per second across many
clusters, so we needed a comprehensive strategy to sufficiently
de-risk upgrades to avoid impact to our users. To accomplish this,
we use what we call “shadow clusters.” A shadow cluster contains
roughly the same data as its corresponding cluster in production,
and traffic to the primary cluster is mirrored to the shadow
cluster. Running a real production workload on a shadow cluster can
expose differences in performance and resource usage across
versions. When mirroring reads, we also have the ability to perform
“read validations,” where the results for a query issued to the
primary cluster and the shadow cluster are checked for equality.
This gives us confidence that data has not been corrupted due to differences in behavior across versions. Testing with shadow clusters has been paramount to de-risking complicated upgrades for one of the most important pieces of infrastructure at Discord. How Discord Indexes Trillions of Messages: Scaling Search Infrastructure
Vicki Niu, Senior Software Engineer at Discord
When Discord first built messages search in 2017, we designed our infrastructure to handle billions of messages sent by millions of users. As our platform grew to trillions of messages, our search system failed to keep up. We thus set out to rebuild our message search platform to meet these new scaling needs using our learnings and some new technologies. This talk will share how we scaled Discord’s message search infrastructure using Rust, Kubernetes, and a multi-cluster Elasticsearch architecture to achieve better performance, operability, and reliability, while also enabling new search features for Discord users. Route It Like It’s
Hot: Scaling Payments Routing at American Express
Benjamin
Cane, Distinguished Engineer at American Express In 2023,
there were over 723 billion credit card transactions. Whenever
someone taps, swipes, dips, or clicks a credit or debit card, a
payment switch ensures the transaction arrives safely and securely
at the correct financial institution. These payment switches are the
backbone of the worldwide payments ecosystem. Join the American
Express Payment Acquiring and Network team as they share their
experiences from building their Global Transaction Router, which is
responsible for switching and routing payments at the scale of
American Express. They will explore how they’ve designed, built,
and operated this Global Transaction Router to perform during
record-breaking shopping holidays, ticket sales, and unexpected
customer behavior. The audience will leave with a deep
understanding of the unique challenges of a payments switch (e.g., routing ISO 8583 transactions as fast as possible), some of our design choices (e.g., using containers and avoiding logging), and a deep dive into a few implementation challenges (e.g., inefficient use of Goroutines and Channels) we found along the way. How Yieldmo
Cut Database Costs and Cloud Dependencies Fast
Todd
Coleman, Chief Architect and Co-founder at Yieldmo
Yieldmo’s business relies on processing hundreds of billions of
daily ad requests with subsecond latency responses. Our services
initially depended on DynamoDB, and we valued its simplicity and
stability. However, DynamoDB costs were becoming unsustainable,
latencies were not ideal, and we sought greater flexibility in
deploying services to other cloud providers. In this session, we’ll
walk you through the various options we considered to address these
challenges and share why and how we ultimately moved forward with
ScyllaDB’s DynamoDB-compatible API.
See more
session details
4 February 2025, 1:19 pm by
ScyllaDB
Let’s focus on the performance-related complexities that
teams commonly face with write-heavy workloads and discuss your
options for tackling them Write-heavy database workloads
bring a distinctly different set of challenges than read-heavy
ones. For example:
• Scaling writes can be costly, especially if you pay per operation and writes are 5X more costly than reads
• Locking can add delays and reduce throughput
• I/O bottlenecks can lead to write amplification and complicate crash recovery
• Database backpressure can throttle the incoming load
While cost matters –
quite a lot, in many cases – it’s not a topic we want to cover
here. Rather, let’s focus on the performance-related complexities
that teams commonly face and discuss your options for tackling
them. What Do We Mean by “a Real-Time Write Heavy Workload”? First,
let’s clarify what we mean by a “real-time write-heavy” workload.
We’re talking about workloads that:
• Ingest a large amount of data (e.g., over 50K OPS)
• Involve more writes than reads
• Are bound by strict latency SLAs (e.g., single-digit millisecond P99 latency)
In
the wild, they occur across everything from online gaming to
real-time stock exchanges. A few specific examples:
• Internet of Things (IoT) workloads tend to involve small but frequent append-only writes of time series data. Here, the ingestion rate is primarily determined by the number of endpoints collecting data. Think of smart home sensors or industrial monitoring equipment constantly sending data streams to be processed and stored.
• Logging and Monitoring systems also deal with frequent data ingestion, but they don’t have a fixed ingestion rate. They may not be strictly append-only, and they can be prone to hotspots, such as when one endpoint misbehaves.
• Online Gaming platforms need to process real-time user interactions, including game state changes, player actions, and messaging. The workload tends to be spiky, with sudden surges in activity. They’re extremely latency sensitive since even small delays can impact the gaming experience.
• E-commerce and Retail workloads are typically update-heavy and often involve batch processing. These systems must maintain accurate inventory levels, process customer reviews, track order status, and manage shopping cart operations, which usually require reading existing data before making updates.
• Ad Tech and Real-time Bidding systems require split-second decisions. These systems handle complex bid processing, including impression tracking and auction results, while simultaneously monitoring user interactions such as clicks and conversions. They must also detect fraud in real time and manage sophisticated audience segmentation for targeted advertising.
• Real-time Stock Exchange systems must support high-frequency trading operations, constant stock price updates, and complex order matching processes – all while maintaining absolute data consistency and minimal latency.
Next, let’s look at key
architectural and configuration considerations that impact write
performance. Storage Engine Architecture The choice of storage
engine architecture fundamentally impacts write performance in
databases. Two primary approaches exist: LSM trees and B-Trees.
Databases known to handle writes efficiently – such as ScyllaDB,
Apache Cassandra, HBase, and Google BigTable – use Log-Structured
Merge Trees (LSM). This architecture is ideal for handling large
volumes of writes. Since writes are immediately appended to memory,
this allows for very fast initial storage. Once the “memtable” in
memory fills up, the recent writes are flushed to disk in sorted
order. That reduces the need for random I/O. For example, here’s
what the ScyllaDB write path looks like: With B-tree structures,
each write operation requires locating and modifying a node in the
tree – and that involves both sequential and random I/O. As the
dataset grows, the tree can require additional nodes and
rebalancing, leading to more disk I/O, which can impact
performance. B-trees are generally better suited for workloads
involving joins and ad-hoc queries. Payload Size Payload size also
impacts performance. With small payloads, throughput is good but
CPU processing is the primary bottleneck. As the payload size
increases, you get lower overall throughput and disk utilization
also increases. Ultimately, a small write usually fits in all the
buffers and everything can be processed quite quickly. That’s why
it’s easy to get high throughput. For larger payloads, you need to
allocate larger buffers or multiple buffers. The larger the
payloads, the more resources (network and disk) are required to
service those payloads. Compression Disk utilization is something
to watch closely with a write-heavy workload. Although storage is
continuously becoming cheaper, it’s still not free. Compression can
help keep things in check – so choose your compression strategy
wisely. Faster compression speeds are important for write-heavy
workloads, but also consider your available CPU and memory
resources. Be sure to look at the
compression chunk size parameter. Compression basically splits
your data into smaller blocks (or chunks) and then compresses each
block separately. When tuning this setting, realize that larger
chunks are better for reads while smaller ones are better for
writes, and take your payload size into consideration. Compaction
For LSM-based databases, the compaction strategy you select also
influences write performance. Compaction involves merging multiple
SSTables into fewer, more organized files, to optimize read
performance, reclaim disk space, reduce data fragmentation, and
maintain overall system efficiency. When selecting compaction
strategies, you could aim for low read amplification, which makes
reads as efficient as possible. Or, you could aim for low write
amplification by avoiding compaction from being too aggressive. Or,
you could prioritize low space amplification and have compaction
purge data as efficiently as possible. For example, ScyllaDB offers
several compaction strategies (and Cassandra offers similar
ones):
• Size-tiered compaction strategy (STCS): Triggered when the system has enough (four by default) similarly sized SSTables.
• Leveled compaction strategy (LCS): The system uses small, fixed-size (by default 160 MB) SSTables distributed across different levels.
• Incremental Compaction Strategy (ICS): Shares the same read and write amplification factors as STCS, but it fixes its 2x temporary space amplification issue by breaking huge SSTables into SSTable runs, which are comprised of a sorted set of smaller (1 GB by default), non-overlapping SSTables.
• Time-window compaction strategy (TWCS): Designed for time series data.
For write-heavy workloads, we warn users to avoid leveled compaction at all costs. That strategy is designed for read-heavy use cases. Using it can result in a regrettable 40x write amplification.
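For reference, here is roughly how a write-heavy time-series table might pick its compaction and compression options from application code (a sketch using the 0.x scylla Rust driver; the keyspace, table, and option values are illustrative, with option names following ScyllaDB’s CQL table options):

use scylla::Session;

// Append-only time-series data: TWCS for compaction, plus a smaller
// compression chunk size, which favors writes over reads.
async fn create_events_table(session: &Session) -> Result<(), Box<dyn std::error::Error>> {
    session
        .query(
            "CREATE TABLE IF NOT EXISTS metrics.events (
                 sensor_id bigint,
                 ts bigint,          -- epoch millis, kept as bigint for simple binds
                 value double,
                 PRIMARY KEY ((sensor_id), ts)
             ) WITH compaction = {'class': 'TimeWindowCompactionStrategy'}
               AND compression = {'sstable_compression': 'LZ4Compressor',
                                  'chunk_length_in_kb': 4}",
            &[],
        )
        .await?;
    Ok(())
}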
Batching In databases like ScyllaDB and Cassandra, batching can actually be a
bit of a trap – especially for write-heavy workloads. If you’re
used to relational databases, batching might seem like a good
option for handling a high volume of writes. But it can actually
slow things down if it’s not done carefully. Mainly, that’s because
large or unstructured batches end up creating a lot of coordination
and network overhead between nodes. However, that’s really not what
you want in a distributed database like ScyllaDB. Here’s how to
think about batching when you’re dealing with heavy writes:
• Batch by the Partition Key: Group your writes by the partition key so the batch goes to a coordinator node that also owns the data. That way, the coordinator doesn’t have to reach out to other nodes for extra data. Instead, it just handles its own, which cuts down on unnecessary network traffic.
• Keep Batches Small and Targeted: Breaking up large batches into smaller ones by partition keeps things efficient. It avoids overloading the network and lets each node work on only the data it owns. You still get the benefits of batching, but without the overhead that can bog things down.
• Stick to Unlogged Batches: Assuming you follow the earlier points, it’s best to use unlogged batches. Logged batches add extra consistency checks, which can really slow down the write.
So, if you’re in a write-heavy situation, structure your batches carefully to avoid the delays that big, cross-node batches can introduce (see the sketch below).
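A minimal sketch of a small, single-partition unlogged batch with the ScyllaDB Rust driver (module paths follow the 0.x scylla crate; the table and columns are the illustrative ones from the sketch above):

use scylla::batch::{Batch, BatchType};
use scylla::Session;

async fn write_readings(session: &Session, sensor_id: i64) -> Result<(), Box<dyn std::error::Error>> {
    // Unlogged: skips the batch log and its extra consistency bookkeeping.
    let mut batch = Batch::new(BatchType::Unlogged);
    batch.append_statement("INSERT INTO metrics.events (sensor_id, ts, value) VALUES (?, ?, ?)");
    batch.append_statement("INSERT INTO metrics.events (sensor_id, ts, value) VALUES (?, ?, ?)");

    // Every row shares the same partition key (sensor_id), so the whole
    // batch stays on the replicas that own that partition.
    let values = (
        (sensor_id, 1_700_000_000_000_i64, 21.5_f64),
        (sensor_id, 1_700_000_001_000_i64, 21.7_f64),
    );
    session.batch(&batch, values).await?;
    Ok(())
}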
Wrapping Up We offered quite a few warnings, but don’t worry. It was easy to
compile a list of lessons learned because so many teams are
extremely successful working with real-time write-heavy workloads.
Now you know many of their secrets, without having to experience
their mistakes. 🙂 If you want to learn more, here are some
firsthand perspectives from teams who tackled quite interesting
write-heavy challenges:
• Zillow: Consuming records from multiple data producers, which resulted in out-of-order writes that could lead to incorrect updates
• Tractian: Preparing for 10X growth in high-frequency data writes from IoT devices
• Fanatics: Heavy write operations like handling orders, shopping carts, and product updates for this online sports retailer
Also, take a look at the following video, where we go into even
greater depth on these write-heavy challenges and also walk you
through what these workloads look like on ScyllaDB.
30 January 2025, 1:28 pm by
ScyllaDB
See the engineering behind real-time personalization at
Tripadvisor’s massive (and rapidly growing) scale What
kind of traveler are you? Tripadvisor tries to assess this as soon
as you engage with the site, then offer you increasingly relevant
information on every click—within a matter of milliseconds. This
personalization is powered by advanced ML models acting on data
that’s stored on ScyllaDB running on AWS. In this article, Dean
Poulin (Tripadvisor Data Engineering Lead on the AI Service and
Products team) provides a look at how they power this
personalization. Dean shares a taste of the technical challenges
involved in delivering real-time personalization at Tripadvisor’s
massive (and rapidly growing) scale. It’s based on the following
AWS re:Invent talk: Pre-Trip Orientation
In Dean’s words …
Let’s start with a quick snapshot of who Tripadvisor is, and the
scale at which we operate. Founded in 2000, Tripadvisor has become
a global leader in travel and hospitality, helping hundreds of
millions of travelers plan their perfect trips. Tripadvisor
generates over $1.8 billion in revenue and is a publicly traded
company on the NASDAQ stock exchange. Today, we have a talented
team of over 2,800 employees driving innovation, and our platform
serves a staggering 400 million unique visitors per month – a
number that’s continuously growing. On any given day, our system
handles more than 2 billion requests from 25 to 50 million users.
Every click you make on Tripadvisor is processed in real time.
Behind that, we’re leveraging machine learning models to deliver
personalized recommendations – getting you closer to that perfect
trip. At the heart of this personalization engine is ScyllaDB
running on AWS. This allows us to deliver millisecond-latency at a
scale that few organizations reach. At peak traffic, we hit around
425K operations per second on ScyllaDB with P99 latencies
for reads and writes around 1-3 milliseconds. I’ll be
sharing how Tripadvisor is harnessing the power of ScyllaDB, AWS,
and real-time machine learning to deliver personalized
recommendations for every user. We’ll explore how we help travelers
discover everything they need to plan their perfect trip: whether
it’s uncovering hidden gems, must-see attractions, unforgettable
experiences, or the best places to stay and dine. This [article] is
about the engineering behind that – how we deliver seamless,
relevant content to users in real time, helping them find exactly
what they’re looking for as quickly as possible. Personalized Trip
Planning Imagine you’re planning a trip. As soon as you land on the
Tripadvisor homepage, Tripadvisor already knows whether you’re a
foodie, an adventurer, or a beach lover – and you’re seeing spot-on
recommendations that seem personalized to your own interests. How
does that happen within milliseconds?
As you browse around Tripadvisor, we start to personalize what
you see using Machine Learning models which calculate scores based
on your current and prior browsing activity. We recommend hotels
and experiences that we think you would be interested in. We sort
hotels based on your personal preferences. We recommend popular
points of interest near the hotel you’re viewing. These are all
tuned based on your own personal preferences and prior browsing
activity. Tripadvisor’s Model Serving Architecture Tripadvisor runs
on hundreds of independently scalable microservices in Kubernetes
on-prem and in Amazon EKS. Our ML Model Serving Platform is exposed
through one of these microservices.
This gateway service abstracts over 100 ML Models from the
Client Services – which lets us run A/B tests to find the best
models using our experimentation platform. The ML Models are
primarily developed by our Data Scientists and Machine Learning
Engineers using Jupyter Notebooks on Kubeflow. They’re managed and
trained using ML Flow, and we deploy them on Seldon Core in
Kubernetes. Our Custom Feature Store provides features to our ML
Models, enabling them to make accurate predictions. The Custom
Feature Store The Feature Store primarily serves User Features and
Static Features. Static Features are stored in Redis because they
don’t change very often. We run data pipelines daily to load data
from our offline data warehouse into our Feature Store as Static
Features.
User Features are served in real time through a platform
called Visitor Platform. We execute dynamic CQL queries against
ScyllaDB, and
we do not need a caching layer because
ScyllaDB is so fast. Our Feature Store serves up to 5
million Static Features per second and half a million User Features
per second. What’s an ML Feature? Features are input variables to
the ML Models that are used to make a prediction. There are Static
Features and User Features. Some examples of Static Features are
awards that a restaurant has won or amenities offered by a hotel
(like free Wi-Fi, pet friendly or fitness center). User Features
are collected in real time as users browse around the site. We
store them in ScyllaDB so we can get lightning fast queries. Some
examples of user features are the hotels viewed over the last 30
minutes, restaurants viewed over the last 24 hours, or reviews
submitted over the last 30 days.
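Purely as an illustration of how such time-bounded user features might be laid out in CQL (this is an assumption for illustration only, not Tripadvisor's actual schema; the keyspace and column names are invented):

CREATE TABLE feature_store.user_feature_events (
    visitor_guid uuid,
    feature_name text,                    -- e.g. 'hotels_viewed'
    occurred_at timestamp,
    item_id bigint,
    PRIMARY KEY ((visitor_guid, feature_name), occurred_at)
) WITH CLUSTERING ORDER BY (occurred_at DESC)
   AND default_time_to_live = 2592000;   -- 30 days; a TTL keeps the online data set bounded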
The Technologies Powering Visitor Platform: ScyllaDB is at the core of Visitor Platform. We use
Java-based Spring Boot microservices to expose the platform to our
clients. This is deployed on AWS ECS Fargate. We run Apache Spark
on Kubernetes for our daily data retention jobs and our offline-to-online jobs, which load data from our offline data warehouse into ScyllaDB so that it's available on the live
site. We also use Amazon Kinesis for processing streaming user
tracking events. The Visitor Platform Data Flow The following
graphic shows how data flows through our platform in four stages:
produce, ingest, organize, and activate.
Data is produced by our website and our mobile apps. Some of
that data includes our Cross-Device User Identity Graph, Behavior
Tracking events (like page views and clicks) and streaming events
that go through Kinesis. Also, audience segmentation gets loaded
into our platform. Visitor Platform’s microservices are used to
ingest and organize this data. The data in ScyllaDB is stored in
two keyspaces: the Visitor Core keyspace, which contains the Visitor Identity Graph, and the Visitor Metric keyspace, which contains Facts and Metrics (the things people did as they browsed the site). We use daily ETL processes to maintain and clean up the
data in the platform. We produce Data Products, stamped daily, in
our offline data warehouse – where they are available for other
integrations and other data pipelines to use in their processing.
Here’s a look at Visitor Platform by the numbers:
Why Two Databases? Our online database is focused on
the real-time, live website traffic. ScyllaDB fills this role by
providing very low latencies and high throughput. We use short term
TTLs to prevent the data in the online database from growing
indefinitely, and our data retention jobs ensure that we only keep
user activity data for real visitors. Tripadvisor.com gets a lot of
bot traffic, and we don’t want to store their data and try to
personalize bots – so we delete and clean up all that data.
Our offline data warehouse retains historical data used for
reporting, creating other data products, and training our ML
Models. We don’t want large-scale offline data processes impacting
the performance of our live site, so we have two separate databases
used for two different purposes. Visitor Platform Microservices We
use 5 microservices for Visitor Platform:
Visitor
Core manages the cross-device user identity graph based on
cookies and device IDs.
Visitor Metric is our query engine, which gives us the ability to expose facts and metrics for specific visitors. We use a domain specific
language called visitor query language, or VQL. This example VQL
lets you see the latest commerce click facts over the last three
hours.
Visitor Publisher and
Visitor
Saver handle the write path, writing data into the
platform. Besides saving data in ScyllaDB, we also stream data to
the offline data warehouse. That’s done with Amazon Kinesis.
Visitor Composite simplifies publishing data in
batch processing jobs. It abstracts Visitor Saver and Visitor Core
to identify visitors and publish facts and metrics in a single API
call. Roundtrip Microservice Latency This graph illustrates how our
microservice latencies remain stable over time.
The average latency is only 2.5 milliseconds, and our P999 is
under 12.5 milliseconds. This is impressive performance, especially
given that we handle over 1 billion requests per day. Our
microservice clients have strict latency requirements. 95% of the
calls must complete in 12 milliseconds or less. If they go over
that, then we will get paged and have to find out what’s impacting
the latencies. ScyllaDB Latency Here’s a snapshot of ScyllaDB’s
performance over three days.
At peak, ScyllaDB is handling 340,000 operations per second
(including writes and reads and deletes) and the CPU is hovering at
just 21%. This is high scale in action! ScyllaDB delivers
microsecond writes and millisecond reads for us. This level of
blazing fast performance is exactly why we chose ScyllaDB.
Partitioning Data into ScyllaDB This image shows how we partition
data into ScyllaDB.
The Visitor Metric keyspace has two tables: Fact and Raw Metrics. The primary key on the Fact table is (Visitor GUID, Fact Type, Created At Date). The composite partition key is (Visitor GUID, Fact Type), and the clustering key is Created At Date, which allows us to sort data within partitions by date. The attributes column contains a JSON object representing the event that occurred. Some example Facts are Search Terms, Page Views, and Bookings. We use ScyllaDB's Leveled Compaction Strategy because it's optimized for range queries, it handles high cardinality very well, and it's better for read-heavy workloads (we have about 2-3X more reads than writes).
Why ScyllaDB? Our solution was originally
built using Cassandra on-prem. But as the scale increased, so did
the operational burden. It required dedicated operations support in
order for us to manage the database upgrades, backups, etc. Also,
our solution requires very low latencies for core components. Our
User Identity Management system must identify the user within 30
milliseconds – and for the best personalization, we require our
Event Tracking platform to respond in 40 milliseconds. It’s
critical that our solution doesn’t block rendering the page so our
SLAs are very low. With Cassandra, we had impacts to performance
from garbage collection. That was primarily impacting the tail
latencies, the P999 and P9999 latencies.
We ran a Proof of Concept with ScyllaDB and found the
throughput to be much better than Cassandra and the operational
burden was eliminated. ScyllaDB gave us a monstrously fast live
serving database with the lowest possible latencies. We wanted a
fully-managed option, so we migrated from Cassandra to ScyllaDB
Cloud, following a dual write strategy. That allowed us to migrate
with zero downtime while handling 40,000 operations or requests per
second. Later, we migrated from ScyllaDB Cloud to ScyllaDB’s “Bring
your own account” model, where you can have the ScyllaDB team
deploy the ScyllaDB database into your own AWS account. This gave
us improved performance as well as better data privacy. This
diagram shows what ScyllaDB’s BYOA deployment looks like.
In the center of the diagram, you can see a 6-node ScyllaDB
cluster that is running on EC2. And then there’s two additional EC2
instances. ScyllaDB Monitor gives us Grafana dashboards as well as
Prometheus metrics. ScyllaDB Manager takes care of infrastructure
automation like triggering backups and repairs. With this
deployment, ScyllaDB could be co-located very close to our
microservices to give us even lower latencies as well as much
higher throughput and performance. Wrapping up, I hope you now have
a better understanding of our architecture, the technologies that
power the platform, and how ScyllaDB plays a critical role in
allowing us to handle Tripadvisor’s extremely high scale.
28 January 2025, 1:55 pm by
ScyllaDB
Build a shopping cart app with ScyllaDB – and learn how to use ScyllaDB's Change Data Capture (CDC) feature to query and export the history of all changes made to the tables. This
blog post showcases one of ScyllaDB’s sample applications: a
shopping cart
app. The project uses
FastAPI as the backend
framework and ScyllaDB as the database. By cloning the
repository and
running the application, you can explore an example of an API
server built on top of ScyllaDB for a CRUD app. Additionally,
you’ll see how to use ScyllaDB’s
Change Data Capture (CDC) feature to query and export the
history of all changes made to the tables. What's inside the shopping cart sample app? The application has two components: an API server and a database. API server: Python + FastAPI. The backend is built with Python and FastAPI, a modern Python web framework known for its speed and ease of use. FastAPI ensures that you have a framework that can deliver relatively high performance if used with the right database. At the same time, due to its exceptional developer experience, you can easily understand the code of the project and how it works even if you've never used it before. The application exposes multiple API endpoints to perform essential operations like: adding products to the cart, removing products from the cart, uploading new products, and updating product information (e.g. price). Database: ScyllaDB. At the core of this application is
ScyllaDB, a low-latency NoSQL database that provides predictable
performance. ScyllaDB excels in handling large volumes of data with
single-digit millisecond latency, making it ideal for large-scale
real-time applications. ScyllaDB acts as the foundation for a high-performance, low-latency app. Moreover, it has additional capabilities that can help you maintain low p99 latency as well as analyze user behavior. ScyllaDB's CDC feature tracks changes in the database and lets you query historical operations. For e-commerce applications, this means you can capture insights into user behavior: What products are being added or removed from the cart, and when? How do users interact with the cart? What does a typical journey look like for a user who actually buys something? These and other insights are invaluable for personalizing the user experience, optimizing the buying journey, and increasing conversion rates. Using ScyllaDB for an e-commerce application: As studies have shown, low latency is critical for achieving high conversion rates and delivering a smooth user experience. For instance, shopping cart operations – such as adding, updating, and retrieving products – require high performance to prevent cart abandonment. Data modeling, being the foundation for
high-performance web applications, must remain a top priority. So let's start with the process of creating a performant data model. Design a Shopping Cart data model: We emphasize a practical "query-first" approach to NoSQL data modeling: start with your application's queries, then design your schema around them. This method ensures your data model is optimized for your specific use cases and that the database can provide reliable, single-digit p99 latency at any scale. Let's review the specific CRUD operations and queries a typical shopping cart application performs. Products: list, add, edit and remove products.
GET /products?limit=? -> SELECT * FROM product LIMIT {limit}
GET /products/{product_id} -> SELECT * FROM product WHERE id = ?
POST /products -> INSERT INTO product () VALUES ()
PUT /products/{product_id} -> UPDATE product SET ? WHERE id = ?
DELETE /products/{product_id} -> DELETE FROM product WHERE id = ?
Based on these requirements, you can create a table to store products. You can notice which value is most often used to filter products: the product id.
This is a good indicator that product id should be the partition key, or at least part of it. The Product table (a sketch follows below): our application is simple, so a single column will suffice as the partition key. However, if your use case requires additional queries and filtering by additional columns, you can consider using a composite partition key or adding a clustering key to the table.
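As a minimal illustrative sketch of the Product table (the exact schema ships with the repository; the keyspace name and the columns beyond id are assumptions):

CREATE KEYSPACE IF NOT EXISTS shopping_cart
    WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS shopping_cart.product (
    id uuid PRIMARY KEY,   -- partition key: the value our queries filter by
    name text,
    price decimal,
    img text
);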
user’s cart.GET /cart/{user_id}SELECT * FROM cart_items WHERE
user_id = ? AND cart_id = ?POST /cart/{user_id}INSERT INTO cart()
VALUES ()Here we don’t need cart id because the user can only have
one active cart at a time. (You could also build another endpoint
to list past purchases by the user – that endpoint would require
the cart id as well)DELETE /cart/{user_id}DELETE FROM cart_items
WHERE user_id = ? AND cart_id = ? AND product_id = ?POST checkout
/cart/{user_id}/checkoutUPDATE cart SET is_active = false WHERE
user_id = ? AND cart_id = ?The cart-related operations contain a
The cart-related operations involve slightly more complicated logic behind the scenes. We have two values that we query by: user id and cart id. Those can be used together as a composite partition key. Additionally, one user can have multiple carts – one they're using right now to shop, and possibly others from the past that they've already paid for. For this reason, we need a way to efficiently find the user's active cart. This query requirement will be handled by a secondary index on the is_active column. That gives us the Cart table. We also need to create a table which connects the Product and Cart tables; without this table, it would be impossible to retrieve the products in a cart. That's the Cart_items table, for which we enable Change Data Capture. This feature logs all data operations performed on the table into another table, cart_items_scylla_cdc_log. Later, we can query this log to retrieve the table's historical operations. This data can be used to analyze user behavior, such as the products users add or remove from their carts. Final database schema:
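A rough sketch of the Cart and Cart_items tables under the assumptions above (illustrative names and types; check the repository for the real schema):

CREATE TABLE IF NOT EXISTS shopping_cart.cart (
    user_id uuid,
    cart_id uuid,
    is_active boolean,
    PRIMARY KEY ((user_id, cart_id))
);
-- Secondary index so the active cart can be looked up efficiently
CREATE INDEX IF NOT EXISTS cart_is_active_idx ON shopping_cart.cart (is_active);

CREATE TABLE IF NOT EXISTS shopping_cart.cart_items (
    user_id uuid,
    cart_id uuid,
    product_id uuid,
    quantity int,
    PRIMARY KEY ((user_id, cart_id), product_id)
) WITH cdc = {'enabled': true};   -- changes are logged into cart_items_scylla_cdc_log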
Now that we've covered the data modeling aspect of the project, you can
clone the
repository and get started with building. Getting started. Prerequisites: Python 3.8+ and a ScyllaDB cluster (with ScyllaDB Cloud, or use Docker). Connect to your ScyllaDB cluster using CQLSH and create the schema. Then, install the Python requirements in a new environment, modify config.py to match your database credentials, run the server, and generate sample user data. This script populates your ScyllaDB tables with sample data. This is necessary for the next step, where you will run CDC queries to analyze user behavior. Analyze user behavior with CDC: CDC records every data
change, including deletes, offering a comprehensive view of your
data evolution without affecting database performance. For a
shopping cart application, some potential use cases for CDC
include:Analyzing a specific user’s buying behaviorTracking user
actions leading to checkoutEvaluating product popularity and
purchase frequencyAnalyzing active and abandoned cartsBeyond these
business-specific insights, CDC data can also be exported to
external platforms, such as
Kafka, for further processing and analysis.Here are a couple of
useful tips when working with CDC:The CDC log table contains
timeuuid values, which can be converted to readable timestamps
using the toTimestamp() CQL function.The cdc$operation column helps
filter operations by type. For instance, a value of 2 indicates an
INSERT
operation.The most efficient and scalable way to query CDC data
is to use the
ScyllaDB
source connector and set up an
integration with Kafka.Now, let’s explore a couple of quick
questions that CDC can help answer.How many times did users add
more than 2 of the same product to the cart?How many carts contain
a particular product?Set up ScyllaDB CDC with Kafka ConnectTo
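To make the first question concrete, here is a rough sketch of the kind of ad-hoc CQL you could run against the CDC log, assuming the illustrative cart_items schema above (ALLOW FILTERING appears only because this is an exploratory, illustrative query; the scalable approach is the Kafka integration described next):

SELECT toTimestamp("cdc$time") AS happened_at, user_id, cart_id, product_id, quantity
FROM shopping_cart.cart_items_scylla_cdc_log
WHERE "cdc$operation" = 2 AND quantity > 2
ALLOW FILTERING;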
Set up ScyllaDB CDC with Kafka Connect: To provide a scalable way to analyze ScyllaDB CDC logs, you can use Kafka to receive messages sent by ScyllaDB. Then, you can use an analytics tool, like Elasticsearch, to get insights. To send CDC logs to Kafka, you need to install the ScyllaDB CDC source connector and create a new ScyllaDB connection in Kafka Connect. Install the ScyllaDB source connector on the machine/container that's running Kafka, then use the ScyllaDB-related parameters when you create the connection. Make sure to enable CDC on each table you want to send messages from. You can do this by executing the following CQL:
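For the cart_items table from the sketch above, that would look along these lines:

ALTER TABLE shopping_cart.cart_items WITH cdc = {'enabled': true};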
Try it out yourself: If you are interested in trying out this application
yourself, check out the dedicated documentation site:
shopping-cart.scylladb.com
and the
GitHub
repository. If you have any questions about this project or ScyllaDB, submit a question in the ScyllaDB forum.
22 January 2025, 1:09 pm by
ScyllaDB
Big things have been happening behind the scenes for the premier
Monster SCALE Summit. Ever since we introduced it at P99 CONF, the
community response has been overwhelming. We’re now faced with the
“good” problem of determining how to fit all the selected speakers
into the two half-days we set aside for the event. 😅 If you missed
the intro last year, Monster Scale Summit is a highly technical
conference that connects the community of professionals designing,
implementing, and optimizing performance-sensitive data-intensive
applications. It focuses on exploring “monster scale” engineering
challenges with respect to extreme levels of throughput, data, and
global distribution. The two-day event is free, intentionally
virtual, and highly interactive.
Register – it’s
free and virtual We’ll be announcing the agenda next month. But
we’re so excited about the speaker lineup that we can’t wait to
share a taste of what you can expect. Here’s a preview of 12 of the
60+ sessions that you can join on March 11 and 12… Designing
Data-Intensive Applications in 2025
Martin Kleppmann and
Chris Riccomini (Designing Data-Intensive Applications
book) Join us for an informal chat with Martin Kleppmann
and Chris Riccomini, who are currently revising the famous book
Designing Data-Intensive Applications. We’ll cover how
data-intensive applications have evolved since the book was first
published, the top tradeoffs people are negotiating today, and what
they believe is next for data-intensive applications. Martin and
Chris will also provide an inside look at the book writing and
revision process. The Nile Approach: Re-engineering Postgres for
Millions of Tenants
Gwen Shapira (Nile) Scaling
relational databases is a notoriously challenging problem. Doing so
while maintaining consistent low latency, efficient use of
resources and compatibility with Postgres may seem impossible. At
Nile, we decided to tackle the scaling challenge by focusing on
multi-tenant applications. These applications require not only
scalability, but also a way to isolate tenants and avoid the noisy
neighbor problem. By tackling both challenges, we developed an
approach we call "virtual tenant databases," which gives us
an efficient way to scale Postgres to millions of tenants while
still maintaining consistent performance. In this talk, I’ll
explore the limitations of traditional scaling for multi-tenant
applications and share how Nile’s virtual tenant databases address
these challenges. By combining the best of Postgres' existing capabilities, distributed algorithms, and a new storage layer, Nile
re-engineered Postgres for multi-tenant applications at scale. The
Mechanics of Scale
Dominik Tornow (Resonate HQ) As
distributed systems scale, the complexity of their development and
operation skyrockets. A dependable understanding of the mechanics
of distributed systems is our most reliable parachute. In this
talk, we’ll use systems thinking to develop an accurate and concise
mental model of concurrent, distributed systems, their core
challenges, and the key principles to address these challenges.
We’ll explore foundational problems such as the tension between
consistency and availability, and essential techniques like
partitioning and replication. Whether you are building a new system
from scratch or scaling an existing system to new heights, this
talk will provide the understanding to confidently navigate the
intricacies of modern, large-scale distributed systems. Feature
Store Evolution Under Cost Constraints: When Cost is Part of the
Architecture
Ivan Burmistrov and David Malinge
(ShareChat) At P99 CONF 23, the ShareChat team presented
the scaling challenges for the ML Feature Store so it could handle
1 billion features per second. Once the system was scaled to handle
the load, the next challenge the team faced was extreme cost
constraints: they were required to make the same-quality system much
cheaper to run. Ivan and David will talk about approaches the team
implemented in order to optimize for cost in the Cloud environment
while maintaining the same SLA for the service. The talk will touch
on such topics as advanced optimizations on various levels to bring
down the compute, minimizing the waste when running on Kubernetes,
autoscaling challenges for stateful Apache Flink jobs, and others.
The talk should be useful for those who are either interested in
building or optimizing an ML Feature Store or in general looking
into cost optimizations in the cloud environment. Time Travelling
at Scale
Richard Hart (Antithesis) Antithesis is a
continuous reliability platform that autonomously searches for
problems in your software within a simulated environment. Every
problem we find can be perfectly reproduced, allowing for efficient
debugging of even the most complex problems. But storing and
querying histories of program execution at scale creates monster
large cardinalities. Over a ~10 hour test run, we generate ~1bn
rows. The solution: our own tree-database. 30B Images and Counting:
Scaling Canva’s Content-Understanding Pipelines
Dr. Kerry
Halupka (Canva) As the demand for high-quality, labeled
image data grows, building systems that can scale content
understanding while delivering real-time performance is a
formidable challenge. In this talk, I’ll share how we tackled the
complexities of scaling content understanding pipelines to support
monstrous volumes of data, including backfilling labels for over 30
billion images. At the heart of our system is an extreme label
classification model capable of handling thousands of labels and
scaling seamlessly to thousands more. I’ll dive into the core
components: candidate image search, zero-shot labelling using
highly trained teacher models, and iterative refinement with visual
critic models. You’ll learn how we balanced latency, throughput,
and accuracy while managing evolving datasets and continuously
expanding label sets. I’ll also discuss the tradeoffs we faced—such
as ensuring precision in labelling without compromising speed—and
the techniques we employed to optimise for scale, including
strategies to address data sparsity and performance bottlenecks. By
the end of the session, you’ll gain insights into designing,
implementing, and scaling content understanding systems that meet
extreme demands. Whether you’re working with real-time systems,
distributed architectures, or ML pipelines, this talk will provide
actionable takeaways for pushing large-scale labelling pipelines to
their limits and beyond. How Agoda Scaled 50x Throughput with
ScyllaDB
Worakarn Isaratham (Agoda) In this talk,
we will explore the performance tuning strategies implemented at
Agoda to optimize ScyllaDB. Key topics include enhancing disk
performance, selecting the appropriate compaction strategy, and
adjusting SSTable settings to match our usage profile. Who Needs
One Database Anyway?
Glauber Costa (Turso)
Developers need databases. That’s how you store your data. And
that’s usually how it goes: you have your large fleet of services,
and they connect to one database. But what if it wasn’t like that?
What if instead of one database, one application would create one
million databases, or even more? In this talk, we’ll explore the
market trends that give rise to use cases where this pattern is
beneficial, and the infrastructure changes needed to support it.
How We Boosted ScyllaDB’s Data Streaming by 25x
Asias He
(ScyllaDB) To improve elasticity, we wanted to speed up
streaming – the process of moving data to other nodes when scaling out/in, which used to process every partition. Enter file-based streaming, a new feature that optimizes tablet movement. This new approach streams entire SSTable files without deserializing them into mutation fragments and re-serializing them back into SSTables on
receiving nodes. As a result, significantly less data is streamed
over the network, and less CPU is consumed – especially for
data models that contain small cells. This session will share the
engineering behind this optimization and look at the performance
impact you can expect in common situations. Evolving Atlassian
Confluence Cloud for Scale, Reliability, and Performance
Bhakti Mehta (Atlassian) This session covers the
journey of Confluence Cloud – the team workspace for collaboration
and knowledge sharing used by thousands of companies – and how we
aim to take it to the next level, with scale, performance, and
reliability as the key motivators. This session presents a deep
dive to provide insights into how the Confluence architecture has
evolved into its current form. It discusses how Atlassian deploys,
runs, and operates at scale and all challenges encountered along
the way. I will cover performance and reliability at scale starting
with the fundamentals of measuring everything, redefining metrics so they reflect actual customer pain, and auditing end-to-end experiences. Beyond just dev-ops and best practices, this means
empowering teams to own product stability through practices and
tools. Two Leading Approaches to Data Virtualization: Which Scales
Better?
Dr. Daniel Abadi (University of Maryland)
You have a large dataset stored in location X, and some code to
process or analyze it in location Y. What is better: move the code
to the data, or move the data to the code? For decades, it has
always been assumed that the former approach is more scalable.
Recently, with the rise of cloud computing, and the push to
separate resources for storage and compute, we have seen data
increasingly being pushed to code, flying in the face of conventional
wisdom. What is behind this trend, and is it a dangerous idea? This
session will look at this question from academic and practical
perspectives, with a particular focus on data virtualization, where
there exists an ongoing debate on the merits of push-based vs.
pull-based data processing. Scaling a Beast: Lessons from 400x
Growth in a High-Stakes Financial System
Dmytro Hnatiuk
(Wise) Scaling a system from 66 million to over 25 billion
records is no easy feat—especially when it’s a core financial
system where every number has to be right, and data needs to be
fresh right now. In this session, I’ll share the ups and downs of
managing this kind of growth without losing my sanity. You’ll learn
how to balance high data accuracy with real-time performance,
optimize your app logic, and avoid the usual traps of database
scaling. This isn’t about turning you into a database expert—it’s
about giving you the practical, no-BS strategies you need to scale
your systems without getting overwhelmed by technical headaches.
Perfect for engineers and architects who want to tackle big
challenges and come out on top.
14 January 2025, 9:55 am by
ScyllaDB
How a team of just two engineers tackled real-time
persisted events for hundreds of millions of players With
just two engineers, Supercell took on the daunting task of growing
their basic account system into a social platform connecting
hundreds of millions of gamers. Account management, friend
requests, cross-game promotions, chat, player presence tracking,
and team formation – all of this had to work across their five
major games. And they wanted it all to be covered by a single
solution that was simple enough for a single engineer to maintain,
yet powerful enough to handle massive demand in real-time.
Supercell’s Server Engineer, Edvard Fagerholm, recently shared how
their mighty team of two tackled this task. Read on to learn how
they transformed a simple account management tool into a
comprehensive cross-game social network infrastructure that
prioritized both operational simplicity and high performance.
Note: If you enjoy hearing about engineering
feats like this, join us at Monster Scale
Summit (free + virtual). Engineers from Disney+/Hulu, Slack,
Canva, Uber, Salesforce, Atlassian and more will be sharing
strategies and case studies. Background: Who’s Supercell?
Supercell is the Finland-based company behind the hit games Hay
Day, Clash of Clans, Boom Beach, Clash Royale and Brawl Stars. Each
of these games has generated $1B in lifetime revenue.
Somehow they manage to achieve this with a super small staff. Until
quite recently, all the account management functionality for games
servicing hundreds of millions of monthly active users was being
built and managed by just two engineers. And that brings us to
Supercell ID. The Genesis of Supercell ID Supercell ID was born as
a basic account system – something to help users recover accounts
and move them to new devices. It was originally implemented as a
relatively simple HTTP API. Edvard explained, “The client could
perform HTTP queries to the account API, which mainly returned
signed tokens that the client could present to the game server to
prove their identity. Some operations, like making friend requests,
required the account API to send a notification to another player.
For example, ‘Do you approve this friend request?’ For that
purpose, there was an event queue for notifications. We would post
the event there, and the game backend would forward the
notification to the client using the game socket.” Enter Two-Way
Communication After Edvard joined the Supercell ID project in late
2020, he started working on the notification backend – mainly for
cross-promotion across their five games. He soon realized that they
needed to implement two-way communication themselves, and built it
as follows: Clients connected to a fleet of proxy servers, then a
routing mechanism pushed events directly to clients (without going
through the game). This was sufficient for the immediate goal of
handling cross-promotion and friend requests. It was fairly simple
and didn’t need to support high throughput or low latency. But it
got them thinking bigger. They realized they could use two-way
communication to significantly increase the scope of the Supercell
ID system. Edvard explained, “Basically, it allowed us to implement
features that were previously part of the game server. Our goal was
to take features that any new games under development might need
and package them into our system – thereby accelerating their
development.” With that, Supercell ID began transforming into a
cross-game social network that supported features like friend
graphs, teaming up, chat, and friend state tracking. Evolving
Supercell ID into a Cross-Game Social Network: At this point, the
Social Network side of the backend was still a single-person
project, so they designed it with simplicity in mind. Enter
abstraction. Finding the right abstraction “We wanted to have only
one simple abstraction that would support all of our uses and could
therefore be designed and implemented by a single engineer,”
explained Edvard. “In other words, we wanted to avoid building a
chat system, a presence system, etc. We wanted to build one thing,
not many.” Finding the right abstraction was key. And a
hierarchical key-value store with Change Data Capture fit the bill
perfectly. Here’s how they implemented it: The top-level keys in
the key-value store are topics that can be subscribed to. There’s a
two-layer map under each top-level key –
map(string,
map(string, string)). Any change to the data under a
top-level key is broadcast to all that key’s subscribers. The
values in the innermost map are also timestamped. Each data source
controls its own timestamps and defines the correct order. The
client drops any update with an older timestamp than what it
already has stored. A typical change in the data would be something
like ‘level equals 10’ changes to ‘level equals 11’. As players
play, they trigger all sorts of updates like this, which means a
lot of small writes are involved in persisting all the events.
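One way such a hierarchical, subscribable map could be expressed in CQL – purely an illustrative sketch under assumed names (a social keyspace and topic_entries table), since the talk doesn't spell out the actual schema – is:

CREATE TABLE social.topic_entries (
    topic text,            -- top-level key that clients subscribe to
    outer_key text,        -- first map level, e.g. 'presence'
    inner_key text,        -- second map level, e.g. 'weapon'
    value text,
    source_ts timestamp,   -- source-controlled timestamp; clients drop older updates
    PRIMARY KEY ((topic), outer_key, inner_key)
) WITH cdc = {'enabled': true};

With a layout like this, each small update becomes a single-row write, which matches the write-heavy pattern described above.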
Finding the Right Database They needed a database that would
support their technical requirements and be manageable, given their
minimalist team. That translated to the following criteria: Handles
many small writes with low latency Supports a hierarchical data
model Manages backups and cluster operations as a service ScyllaDB
Cloud turned out to be a great fit. (ScyllaDB Cloud is the
fully-managed version of ScyllaDB, a database known for delivering
predictable low latency at scale). How it All Plays Out For an idea
of how this plays out in Supercell games, let’s look at two
examples. First, consider chat messages. A simple chat message
might be represented in their data model as follows:
<room ID> -> <timestamp_uuid> -> message -> "hi there"
                                 metadata -> …
                                 reactions -> …
Edvard explained, "The top-level key that's subscribed to
is the chat room ID. The next level key is a timestamp-UID, so we
have an ordering of each message and can query chat history. The
inner map contains the actual message together with other data
attached to it.” Next, let’s look at “presence”, which is used
heavily in Supercell’s new (and highly anticipated) game, mo.co.
The goal of presence, according to Edvard: “When teaming up for
battle, you want to see in real-time the avatar and the current
build of your friends – basically the weapons and equipment of your
friends, as well as what they’re doing. If your friend changes
their avatar or build, goes offline, or comes online, it should
instantly be visible in the ‘teaming up’ menu.” Players’ state data
is encoded into Supercell’s hierarchical map as follows: <player
ID> -> “presence” -> weapon -> sword
level
-> 29
status
-> in battle Note that: The top level is the player ID, the
second level is the type, and the inner map contains the data.
Supercell ID doesn’t need to understand the data; it just forwards
it to the game clients. Game clients don’t need to know the friend
graph since the routing is handled by Supercell ID. Deeper into the
System Architecture Let’s close with a tour of the system
architecture, as provided by Edvard. “The backend is split into
APIs, proxies, and event routing/storage servers. Topics live on
the event routing servers and are sharded across them. A client
connects to a proxy, which handles the client’s topic subscription.
The proxy routes these subscriptions to the appropriate event
routing servers. Endpoints (e.g., for chat and presence) send their
data to the event routing servers, and all events are persisted in
ScyllaDB Cloud. Each topic has a primary and a backup shard. The primary shard maintains in-memory sequence numbers for each message so that lost messages can be detected. If the primary goes down, the secondary will forward messages without sequence numbers; when the primary comes back up, it will trigger a refresh of state on the client as well as a reset of the sequence numbers. The API for the
routing layers is a simple post-event RPC containing a batch of
topic, type, key, value tuples. The job of each API is just to
rewrite their data into the above tuple representation. Every event
is written in ScyllaDB before broadcasting to subscribers. Our APIs
are synchronous in the sense that if an API call gives a successful
response, the message was persisted in ScyllaDB. Sending the same
event multiple times does no harm since applying the update on the
client is an idempotent operation, with the exception of possibly
multiple sequence numbers mapping to the same message. When
connecting, the proxy will figure out all your friends and
subscribe to their topics, same for chat groups you belong to. We
also subscribe to topics for the connecting client. These are used
for sending notifications to the client, like friend requests and
cross promotions. A router reboot triggers a resubscription to
topics from the proxy. We use Protocol Buffers to save on bandwidth
cost. All load balancing is at the TCP level to guarantee that
requests over the same HTTP/2 connection are handled by the same
TCP socket on the proxy. This lets us cache certain information in
memory on the initial listen, so we don’t need to refetch on other
requests. We have enough concurrent clients that we don’t need to
separately load balance individual HTTP/2 requests, as traffic is
evenly distributed anyway, and requests are about equally expensive
to handle across different users. We use persistent sockets between
proxies and routers. This way, we can easily send tens of thousands
of subscriptions per second to a single router without an issue.”
But It’s Not Game Over If you want to watch the complete tech talk,
just press play below: And if you want to read more about
ScyllaDB’s role in the gaming world, you might also want to read:
Epic Games: How Epic Games uses ScyllaDB as a
binary cache in front of NVMe and S3 to accelerate global
distribution of large game assets used by Unreal Cloud DDC.
Tencent Games: How Tencent Games built service
architecture based on CQRS and event sourcing patterns with Pulsar
and ScyllaDB.
Discord: How Discord uses ScyllaDB to power
their massive growth, moving from a niche gaming platform to one of
the world’s largest communication platforms.
8 January 2025, 1:20 pm by
ScyllaDB
Monitoring tips that can help reduce cluster size 2-5X
without compromising latency Editor’s note: The
following is a guest post by Andrei Manakov, Senior Staff Software
Engineer at ShareChat. It was originally published
on Andrei’s blog. I had the privilege of giving
a talk at ScyllaDB Summit 2024, where I briefly addressed the
challenge of analyzing the remaining capacity in ScyllaDB clusters.
A good understanding of ScyllaDB internals is required to plan your
computation cost increase when your product grows or to reduce cost
if the cluster turns out to be heavily over-provisioned. In my
experience, clusters can be reduced by 2-5x without latency
degradation after such an analysis. In this post, I provide more
detail on how to properly analyze CPU and disk resources. How Does
ScyllaDB Use CPU? ScyllaDB is a distributed database, and one
cluster typically contains multiple nodes. Each node can contain
multiple shards, and each shard is assigned to a single core. The
database is built on the
Seastar framework and
uses a shared-nothing approach. All data is usually replicated in
several copies, depending on the
replication factor, and each copy is assigned to a specific
shard. As a result, every shard can be analyzed as an independent
unit and every shard efficiently utilizes all available CPU
resources without any overhead from contention or context
switching. Each shard has different tasks, which we can divide into
two categories: client request processing and maintenance tasks.
All tasks are executed by a scheduler in one thread pinned to a
core, giving each one its own CPU budget limit. Such clear task
separation allows
isolation and prioritization of latency-critical tasks for
request processing. As a result of this design, the cluster handles
load spikes more efficiently and provides gradual latency
degradation under heavy load. [
More details about
this architecture].
Another interesting result of this design is that
ScyllaDB supports
workload prioritization. In my experience, this approach
ensures that critical latency is not impacted during less critical
load spikes. I can’t recall any similar feature in other databases.
Such problems are usually tackled by having 2 clusters for
different workloads. But keep in mind that this feature is
available only in ScyllaDB Enterprise.
However, background tasks may occupy all remaining resources, and
overall CPU utilization in the cluster appears spiky. So, it’s not
obvious how to find the real cluster capacity. It’s easy to see
100%
CPU usage with no performance impact. If we increase the
critical load, it will consume the resources (CPU, I/O) from
background tasks. Background tasks’ duration can increase slightly,
but it’s totally manageable. The Best CPU Utilization Metric How
can we understand the remaining cluster capacity when CPU usage
spikes up to 100% throughout the day, yet the system remains
stable? We need to exclude maintenance tasks and remove all these
spikes from consideration. Since ScyllaDB distributes all the
data by shards and every shard has its own core, we take into
account the max CPU utilization by a shard excluding maintenance
tasks (you can find
other task types here). In my experience, you can keep the
utilization up to 60-70% without visible degradation in tail
latency. Example of a Prometheus query:
max(sum(rate(scylla_scheduler_runtime_ms{group!="compaction|streaming"}))
by (instance, shard))/10
(The division by 10 converts the per-second rate of scheduler runtime, reported in milliseconds, into a percentage of one core.)
You can find more details about the
ScyllaDB monitoring stack here. In this
article, PromQL queries are used to
demonstrate how to analyse key metrics effectively.
However, I don’t recommend rapidly downscaling the cluster to the
desired size just after looking at max CPU utilization excluding
the maintenance tasks. First, you need to look at average CPU
utilization excluding maintenance tasks across all shards. In an
ideal world, it should be close to max value. In case of
significant skew, it definitely makes sense to find the root cause.
It can be an inefficient schema with an incorrect
partition key or an incorrect
token-aware/rack-aware configuration in the driver. Second, you
need to take a look at the average CPU utilization of the excluded tasks to account for anything specific to your workload. It's rarely more than 5-10%, but you might need a larger buffer if these tasks use more CPU. Otherwise, compaction will be starved of resources and reads will start to become more expensive in terms of CPU and disk. Third, it's important to downscale your cluster gradually. ScyllaDB has an in-memory row cache which is crucial for its performance. It allocates all remaining memory for the cache, so with a memory reduction the hit rate might drop more than you expect. Hence, CPU utilization can increase non-linearly, and a low cache hit rate can harm your tail latency.
1- (sum(rate(scylla_cache_reads_with_misses{})) /
sum(rate(scylla_cache_reads{})))
I haven’t mentioned RAM in this article as there are
not many actionable points. However, since memory cache is crucial
for efficient reading in ScyllaDB, I recommend always using
memory-optimized virtual machines. The more
memory, the better.
Disk Resources ScyllaDB is a
LSMT-based
database. That means it is optimized for writing by design, and any mutation leads to new data being appended to the disk. The
database periodically rewrites the data to ensure acceptable read
performance. Disk performance plays a crucial role in overall
database performance. You can find more details about the write
path and compaction in the
scylla
documentation. There are 3 important disk resources we will
discuss here: Throughput, IOPs and free disk space. All these
resources depend on the disk type we attached to our ScyllaDB nodes
and their quantity. But how can we understand the limit of the
IOPs/throughput? There are 2 possible options: Any cloud provider or manufacturer usually publishes the performance of their disks; you can find it on their website. For example, NVMe disks from Google Cloud. The actual disk performance can differ from the numbers that manufacturers share, so the best option might be just to measure it. And we can easily get the result: ScyllaDB performs a benchmark during installation on a node
and stores the result in the file
io_properties.yaml. The database uses these limits
internally for achieving
optimal performance.
# io_properties.yaml
disks:
  - mountpoint: /var/lib/scylla/data
    read_iops: 2400000           # IOPS
    read_bandwidth: 5921532416   # throughput
    write_iops: 1200000          # IOPS
    write_bandwidth: 4663037952  # throughput
Disk Throughput
sum(rate(node_disk_read_bytes_total{})) / (read_bandwidth * nodeNumber)
sum(rate(node_disk_written_bytes_total{})) / (write_bandwidth * nodeNumber)
In my experience, I haven't seen any harm with utilization up to 80-90%. Disk IOPs
sum(rate(node_disk_reads_completed_total{})) / (read_iops * nodeNumber)
sum(rate(node_disk_writes_completed_total{})) / (write_iops * nodeNumber)
Disk free space It’s crucial to
have significant buffer in every node. In case you’re running out
of space, the node will be basically unavailable and it will be
hard to restore it. However, additional space is required for many
operations: Every update, write, or delete will be written to the
disk and allocate new space. Compaction requires some buffer during
cleaning the space. Back up procedure. The best way to control disk
usage is to use
Time To Live in the tables if it matches your use case. In this
case, irrelevant data will expire and be cleaned during compaction.
I usually try to keep at least 50-60% of free space.
min(sum(node_filesystem_avail_bytes{mountpoint="/var/lib/scylla"})
by
(instance)/sum(node_filesystem_size_bytes{mountpoint="/var/lib/scylla"})
by (instance))
Tablets Most apps have significant load
variations throughout the day or week. ScyllaDB is not elastic, so you need to provision the cluster for peak load. As a result, you could waste a lot of resources during nights or weekends. But that could change soon. A ScyllaDB cluster distributes data across its nodes, and the smallest unit of data is a partition, uniquely identified by a partition key. A partitioner hash function computes tokens to determine which nodes store the data. Every node has its own token range, and all nodes make a ring. Previously, adding a new node wasn't a fast procedure because it required copying data (a process called streaming) to the new node, adjusting token ranges for neighbors, etc. In addition, it was a manual procedure. However, ScyllaDB introduced tablets in version 6.0, and they provide new opportunities. A
Tablet is a range of tokens in a table and it includes partitions
which can be replicated independently. It makes the overall process
much smoother and it increases elasticity significantly. Adding new
nodes
takes minutes and
a new node starts processing requests even before full data
synchronization. It looks like a significant step toward full
elasticity which can drastically reduce server cost for ScyllaDB
even more. You can
read more about
tablets here. I am looking forward to testing tablets closely
soon. Conclusion Tablets look like a solid foundation for future
pure elasticity, but for now, we’re planning clusters for peak
load. To effectively analyze ScyllaDB cluster capacity, focus on
these key recommendations: Target
max CPU
utilization (excluding maintenance tasks) per shard at
60–70%. Ensure sufficient
free disk
space to handle compaction and backups. Gradually
downsize clusters to avoid sudden cache
degradation.
2 January 2025, 1:09 pm by
ScyllaDB
It’s been a while since my last update. We’ve been busy improving
the existing ScyllaDB training material and adding new lessons and
labs. In this post, I’ll survey the latest developments and update
you on the live training event taking place later this month. You
can discuss these topics (and more!) on the community forum.
Say hello here. ScyllaDB University LIVE Training In addition
to the self-paced online courses you can take on ScyllaDB
University (see below), we host online live training events. These
events are a great opportunity to improve your NoSQL and ScyllaDB
skills, get hands-on practice, and get your questions answered by
our team of experts. The next event is ScyllaDB University LIVE,
which will take place on January 29. As usual, we're planning on having two tracks: an Essentials track and an Advanced track. However,
this time we’ll change the format and make each track a complete
learning path. Stay tuned for more details, and I hope to see you
there.
Save
your spot at ScyllaDB University LIVE ScyllaDB University
Content Updates
ScyllaDB
University is our online learning platform where you can learn
about NoSQL and about ScyllaDB and get some hands-on experience. It
includes many different self-paced lessons, meaning you can study
whenever you have some free time and continue where you left off.
The material is free and all you have to do is create a user
account. We recently added new lessons and updated many existing
ones. All of the following topics were added to the course
S201: Data
Modeling and Application Development. Start learning New in the
How To Write Better Apps Lesson General Data Modeling Guidelines
This lesson discusses key principles of NoSQL data modeling,
emphasizing a query-driven design approach to ensuring efficient
data distribution and balanced workloads. It highlights the
importance of selecting high-cardinality primary keys, avoiding bad
access patterns, and using ScyllaDB Monitoring to identify and
resolve issues such as Hot Partitions and Large Partitions.
Neglecting these practices can lead to slow performance,
bottlenecks, and potentially unreadable data – underscoring the
need for using best practices when creating your data model. To
learn more, you can explore
the complete lesson here. Large Partitions and Collections This
lesson provides insights into common pitfalls in NoSQL data
modeling, focusing on issues like large partitions, collections,
and improper use of ScyllaDB features. It emphasizes avoiding large
partitions due to the impact on performance and demonstrates this
with real-world examples and Monitoring data. Collections should
generally remain small to prevent high latency. The schema used
depends on the use case and on the performance requirements.
Practical advice and tools are offered for testing and monitoring.
You can learn more in
the complete lesson here. Hot Partitions, Cardinality and
Tombstones This lesson explores common challenges in NoSQL
databases, focusing on hot partitions, low cardinality keys, and
tombstones. Hot partitions cause uneven load and bottlenecks, often
due to misconfigurations or retry storms. Having many tombstones
can degrade read performance due to read amplification. Best
practices include avoiding retry storms, using efficient full-table
scans over low cardinality views and preferring partition-level
deletes to minimize tombstone buildup. Monitoring tools and
thoughtful schema design are emphasized for efficient database
performance. You can find
the complete lesson here. Diagnosis and Prevention This lesson
covers strategies to diagnose and prevent common database issues in
ScyllaDB, such as large partitions, hot partitions, and
tombstone-related inefficiencies. Tools like the nodetool
toppartitions command help identify hot partition problems, while
features like per-partition rate limits and shard concurrency
limits manage load and prevent contention. Properly configuring
timeout settings avoids retry storms that exacerbate hot partition
problems. For tombstones, using efficient delete patterns helps
maintain performance and prevent timeouts during reads. Proactive
monitoring and adjustments are emphasized throughout. You can see
the
complete lesson here. New in the Basic Data Modeling Lesson CQL
and the CQL Shell The lesson introduces the Cassandra Query
Language (CQL), its similarities to SQL, and its use in ScyllaDB
for data definition and manipulation commands. It highlights the
interactive CQL shell (CQLSH) for testing and interaction,
alongside a high level overview of drivers. Common data types and
collections like Sets, Lists, Maps, and User-Defined Types in
ScyllaDB are briefly mentioned. The “Pet Care IoT” lab example is
presented, where sensors on pet collars record data like heart rate
or temperature at intervals. This demonstrates how CQL is applied
in database operations for IoT use cases. This example is used in
labs later on. You can watch
the video and complete lesson here. Data Modeling Overview and
Basic Concepts The new video introduces the basics of data modeling
in ScyllaDB, contrasting NoSQL and relational approaches. It
emphasizes starting with application requirements, including
queries, performance, and consistency, to design models. Key
concepts such as clusters, nodes, keyspaces, tables, and
replication factors are explained, highlighting their role in
distributed data systems. Examples illustrate how tables and
primary keys (partition keys) determine data distribution across
nodes using consistent hashing. The lesson demonstrates creating
keyspaces and tables, showing how replication factors ensure data
redundancy and how ScyllaDB maps partition keys to replica nodes
for efficient reads and writes. You can find
the complete lesson here. Primary Key, Partition Key,
Clustering Key This lesson explains the structure and importance of
primary keys in ScyllaDB, detailing their two components: the
mandatory partition key and the optional clustering key. The
partition key determines the data’s location across nodes, ensuring
efficient querying, while the clustering key organizes rows within
a partition. For queries to be efficient, the partition key must be
specified to avoid full table scans. An example using pet data
illustrates how rows are sorted within partitions by the clustering
key (e.g., time), enabling precise and optimized data retrieval.
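A minimal illustration in the spirit of the Pet Care IoT example (the keyspace, table, and column names are illustrative, not the lesson's exact schema):

CREATE TABLE pet_care.heart_rate (
    pet_id uuid,            -- partition key: determines which node(s) own the data
    recorded_at timestamp,  -- clustering key: orders rows within the partition
    bpm int,
    PRIMARY KEY (pet_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);

-- Efficient: the partition key is specified, so no full table scan is needed
SELECT recorded_at, bpm FROM pet_care.heart_rate WHERE pet_id = ? LIMIT 10;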
Find
the complete lesson here. Importance of Key Selection This
video emphasizes the importance of choosing partition and
clustering keys in ScyllaDB for optimal performance and data
distribution. Partition keys should have high cardinality to ensure
even data distribution across nodes and avoid issues like large or
hot partitions. Examples of good keys include unique identifiers
like user IDs, while low-cardinality keys like states or ages can
lead to uneven load and inefficiency. Clustering keys should align
with query patterns, considering the order of rows and prioritizing
efficient retrieval, such as fetching recent data for
time-sensitive applications. Strategic key selection prevents
resource bottlenecks and enhances scalability. Learn more in
the complete lesson. Data Modeling Lab Walkthrough (three
parts) The new three-part video lesson focuses on key aspects of
data modeling in ScyllaDB, emphasizing the design and use of
primary keys. It demonstrates creating a cluster and tables using
the CQL shell, highlighting how partition keys determine data
location and efficient querying while showcasing different queries.
Some tables use a Clustering key, which organizes data within
partitions, enabling efficient range queries. It explains compound
primary keys to enhance query flexibility. Next, an example of a
different clustering key order (ascending or descending) is given.
This enables query optimization and efficient retrieval of data.
Throughout the lab walkthrough, different challenges are presented,
as well as data modeling solutions to optimize performance,
scalability, and resource utilization. You can
watch the walkthrough here and also
take the lab yourself. New in the Advanced Data Modeling Lesson
Collections and Drivers The new lesson discusses advanced data
modeling in ScyllaDB, focusing on collections (Sets, Lists, Maps,
and User-defined types) to simplify models with multi-value fields
like phone numbers or emails. It introduces token-aware and
shard-aware drivers as optimizations to enhance query efficiency.
Token-aware drivers allow clients to send requests directly to
replica nodes, bypassing extra hops through coordinator nodes,
while shard-aware clients target specific shards within replica
nodes for improved performance. ScyllaDB supports drivers in
multiple languages like Java, Python, and Go, along with
compatibility with Cassandra drivers. An entire
course
on Drivers is also available. You can learn more in
the complete lesson here. New in the ScyllaDB Operations Course
Replica level Write/Read Path The lesson explains ScyllaDB’s read
and write paths, focusing on how data is written to Memtables
and persisted as immutable SSTables. Because the SSTables are
immutable, they are compacted periodically. Writes, including
updates and deletes, are stored in a commit log before being
flushed to SSTables. This ensures data consistency. For reads, a
cache is used to optimize performance (also using bloom filters).
Compaction merges SSTables to remove outdated data, maintain
efficiency, and save storage. ScyllaDB offers different compaction
strategies and you can choose the most suitable one based on your
use case. Learn more in
the full lesson. Tracing Demo The lesson provides a practical
demonstration of ScyllaDB’s tracing using a three-node cluster. The
tracing tool is showcased as a debugging aid to track request flows
and replica responses. The demo highlights how data consistency
levels influence when responses are sent back to clients and
demonstrates high availability by successfully handling writes even
when a node is down, provided the consistency requirements are met.
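For a rough sense of how the demo's two knobs, consistency level and tracing, look from application code, here is a sketch against the Rust driver (following its 0.x-era API; method names vary between driver versions, and the keyspace, table, and statement are placeholders, not part of the demo).

```rust
use scylla::query::Query;
use scylla::statement::Consistency;
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any node works as a contact point; the driver discovers the rest.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // QUORUM: the coordinator replies once a majority of replicas ack,
    // so in a three-node cluster the write still succeeds with one node down.
    let mut write = Query::new("INSERT INTO demo.kv (key, value) VALUES (?, ?)");
    write.set_consistency(Consistency::Quorum);
    write.set_tracing(true); // record the request's path for debugging

    let result = session.query(write, ("hello", "world")).await?;
    println!("tracing session id: {:?}", result.tracing_id);
    Ok(())
}
```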
You can find
the complete lesson here.
26 December 2024, 2:35 pm by
ScyllaDB
Let’s look back at the top 10 ScyllaDB blog posts written this year
– plus 10 “timeless classics” that continue to get attention.
Before we start, thank you to all the community members who
contributed to our blogs in various ways – from users sharing best
practices at ScyllaDB Summit, to engineers explaining how they
raised the bar for database performance, to anyone who has
initiated or contributed to the discussion on HackerNews, Reddit,
and other platforms. And if you have suggestions for 2025 blog
topics, please share them with us on our socials. Without further
ado, here are the most-read blog posts that we published in 2024…
We Compared ScyllaDB and Memcached and… We Lost?
By
Felipe Cardeneti Mendes Engineers behind ScyllaDB joined
forces with Memcached maintainer dormando for an in-depth look at
database and cache internals, and the tradeoffs in each. Read:
We
Compared ScyllaDB and Memcached and… We Lost? Related:
Why Databases Cache, but Caches Go to Disk Inside
ScyllaDB’s Internal Cache
By Pavel “Xemul”
Emelyanov Why ScyllaDB completely bypasses the Linux cache
during reads, using its own highly efficient row-based cache
instead. Read: Inside ScyllaDB’s Internal Cache Related:
Replacing Your Cache with ScyllaDB Smooth Scaling: Why
ScyllaDB Moved to “Tablets” Data Distribution
By Avi
Kivity The rationale behind ScyllaDB’s new “tablets”
replication architecture, which builds upon a multiyear project to
implement and extend Raft. Read:
Smooth Scaling:
Why ScyllaDB Moved to “Tablets” Data Distribution Related:
ScyllaDB Fast Forward: True Elastic Scale Rust vs. Zig
in Reality: A (Somewhat) Friendly Debate
By Cynthia
Dunlop A (somewhat) friendly P99 CONF popup debate with
Jarred Sumner (Bun.js), Pekka Enberg (Turso), and Glauber Costa
(Turso) on ThePrimeagen’s stream. Read:
Rust vs. Zig in Reality: A (Somewhat) Friendly Debate Related:
P99 CONF on demand
Database Internals: Working with IO
By Pavel “Xemul”
Emelyanov Explore the tradeoffs of different Linux I/O
methods and learn how databases can take advantage of a modern
SSD’s unique characteristics. Read:
Database Internals: Working with IO Related:
Understanding Storage I/O Under Load How We Implemented
ScyllaDB’s “Tablets” Data Distribution
By Avi
Kivity How ScyllaDB implemented its new Raft-based tablets
architecture, which enables teams to quickly scale out in response
to traffic spikes. Read:
How We
Implemented ScyllaDB’s “Tablets” Data Distribution Related:
Overcoming Distributed Databases Scaling Challenges with
Tablets How ShareChat Scaled their ML Feature Store
1000X without Scaling the Database
By Ivan Burmistrov and
Andrei Manakov How ShareChat engineers managed to meet
their lofty performance goal without scaling the underlying
database. Read:
How ShareChat Scaled their ML Feature Store 1000X without Scaling
the Database Related:
ShareChat’s Path to High-Performance NoSQL with ScyllaDB
New Google Cloud Z3 Instances: Early Performance Benchmarks
By Łukasz Sójka, Roy Dahan ScyllaDB had the
privilege of testing Google Cloud’s brand new Z3 GCE instances in
an early preview. We observed a 23% increase in write throughput,
24% for mixed workloads, and 14% for reads per vCPU – all at a
lower cost compared to N2. Read:
New
Google Cloud Z3 Instances: Early Performance Benchmarks
Related:
A Deep Dive into ScyllaDB’s Architecture Database
Internals: Working with CPUs
By Pavel “Xemul”
Emelyanov Get a database engineer’s inside look at how the
database interacts with the CPU…in this excerpt from the book,
“Database Performance at Scale.” Read:
Database
Internals: Working with CPUs Related:
Database
Performance at Scale: A Practical Guide [Free Book]
Migrating from Postgres to ScyllaDB, with 349X Faster Query
Processing
By Dan Harris and Sebastian Vercruysse
How Coralogix cut processing times from 30 seconds to 86
milliseconds with a PostgreSQL to ScyllaDB migration. Read:
Migrating from Postgres to ScyllaDB, with 349X Faster Query
Processing Related:
NoSQL Migration Masterclass Bonus: Top NoSQL Database
Blogs From Years Past Many of the blogs published in previous years
continued to resonate with the community. Here’s a rundown of 10
enduring favorites:
How io_uring and eBPF Will Revolutionize Programming in
Linux (Glauber Costa): How io_uring and eBPF will
change the way programmers develop asynchronous interfaces and
execute arbitrary code, such as tracepoints, more securely. [2020]
Benchmarking MongoDB vs ScyllaDB: Performance, Scalability
& Cost (Dr. Daniel Seybold): Dr. Daniel Seybold shares
how MongoDB and ScyllaDB compare on throughput, latency,
scalability, and price-performance in this third-party benchmark by
benchANT. [2023]
Introducing “Database Performance at Scale”: A Free, Open Source
Book (Dor Laor): Introducing a new book that provides
practical guidance for understanding the opportunities, trade-offs,
and traps you might encounter while trying to optimize
data-intensive applications for high throughput and low latency.
[2023]
DynamoDB: When to Move Out (Felipe Cardeneti Mendes):
A look at the top reasons why teams decide to leave DynamoDB:
throttling, latency, item size limits, and limited flexibility…not
to mention costs. [2023]
ScyllaDB vs MongoDB vs PostgreSQL: Tractian’s Benchmarking &
Migration (João Pedro Voltani): TRACTIAN shares their
comparison of ScyllaDB vs MongoDB and PostgreSQL, then provides an
overview of their MongoDB to ScyllaDB migration process, challenges
& results. [2023]
Benchmarking Apache Cassandra (40 Nodes) vs ScyllaDB (4
Nodes) (Juliusz Stasiewicz, Piotr Grabowski, Karol
Baryla): We benchmarked Apache Cassandra on 40 nodes vs ScyllaDB on
just 4 nodes. See how they stacked up on throughput, latency, and
cost. [2022]
How Numberly Replaced Kafka with a Rust-Based ScyllaDB
Shard-Aware Application (Alexys Jacob): How Numberly
used Rust & ScyllaDB to replace Kafka, streamlining the way all its
AdTech components send and track messages (whatever their form).
[2023]
Async Rust in Practice: Performance, Pitfalls,
Profiling (Piotr Sarna): How our engineers used
flamegraphs to diagnose and resolve performance issues in our
Tokio-based Rust driver. [2022]
On Coordinated Omission (Ivan Prisyazhynyy): Your
benchmark may be lying to you! Learn why coordinated omission is
a concern, and how we account for it when benchmarking ScyllaDB.
[2021]
Why Disney+ Hotstar Replaced Redis and Elasticsearch with ScyllaDB
Cloud (Cynthia Dunlop) – Get the inside perspective on
how Disney+ Hotstar simplified its “continue watching” data
architecture for scale. [2022]
18 December 2024, 2:00 pm by
ScyllaDB
TL;DR ScyllaDB has decided to focus on a single release stream –
ScyllaDB Enterprise. Starting with the ScyllaDB Enterprise 2025.1
release (ETA February 2025): ScyllaDB Enterprise will change from
closed source to source available. ScyllaDB OSS AGPL 6.2 will stand
as the final OSS AGPL release. A free tier of the full-featured
ScyllaDB Enterprise will be available to the community. This
includes all the performance, efficiency, and security features
previously reserved for ScyllaDB Enterprise. For convenience, the
existing ScyllaDB Enterprise 2024.2 will gain the new source
available license starting from our next patch release (in
December), allowing easy migration of older releases. The source
available Scylla Manager will move to AGPL and the closed source
Kubernetes multi-region operator will be merged with the main
Apache-licensed ScyllaDB Kubernetes operator. Other ScyllaDB
components (e.g., Seastar, Kubernetes operator, drivers) will keep
their current licenses. Why are we doing this? ScyllaDB’s team has
always been extremely passionate about open source, low-level
optimizations, and the delivery of groundbreaking core technologies
– from hypervisors (KVM, Xen), to operating systems (Linux, OSv),
and the ScyllaDB database. Over our 12 years of existence, we
developed an OS, pivoted to the database space, developed Seastar
(the open source standalone core engine of ScyllaDB), and developed
ScyllaDB itself. Dozens of open source projects were created:
drivers, a Kubernetes operator, test harnesses, and various tools.
Open source is an outstanding way to share innovation. It is a
straightforward choice for projects that are not your core
business. However, it is a constant challenge for vendors whose
core product is open source. For almost a decade, we have been
maintaining two separate release streams: one for the open source
database and one for the enterprise product. Balancing the free vs.
paid offerings is a never-ending challenge that involves
engineering, product, marketing, and constant sales discussions.
Unlike other projects that decided to switch to source available or
BSL to protect themselves from “free ride” competition, we were
comfortable with AGPL. We took different paths, from the initial
reimplementation of the Apache Cassandra API, to an open source
implementation of a DynamoDB-compatible API. Beyond the license, we
followed the whole approach of ‘open source first.’ Almost every
line of code – from a new feature, to a bug fix – went to the open
source branch first. We were developing two product lines that
competed with one another, and we had to make one of them
dramatically better. It’s hard enough to develop a single database
and support Docker, Kubernetes, virtual and physical machines, and
offer a database-as-a-service. The value of developing two separate
database products, along with their release trains, ultimately does
not justify the massive overhead and incremental costs required. To
give you some idea of what’s involved, we have had nearly 60 public
releases throughout 2024. Moreover, we have been the only
significant contributor to the source code. Our ecosystem tools
have received a healthy amount of contributions, but not the core
database. That makes sense. The ScyllaDB internal implementation is
a C++, shard-per-core, future-promise code base that is extremely
hard to understand and requires full-time devotion. Thus, in terms
of the code, we operated as a fully open-source-first project. In
reality, however, we benefited from this no more than we would have
from a source-available project. “Behind the
curtain” tradeoffs of free vs paid Balancing our requirements (of
open source first, efficient development, no crippling of our OSS,
and differentiation between the two branches) has been challenging,
to say the least. Our open source first culture drove us to develop
new core features in the open. Our engineers released these
features before we were prepared to decide what was appropriate for
open source and what was best for the enterprise paid offering. For
example, Tablets, our recent architectural shift, was all developed
in the open – and 99% of its end user value is available in the OSS
release. As the Enterprise version branched out of the OSS branch,
it was helpful to keep a unified base for reuse and efficiency.
However, it reduced our paid version differentiation since all
features were open by default (unless flagged). For a while, we
planned for the OSS release to be the latest and greatest, with a
short lifecycle, as both a differentiator and a means of
efficiency. Although maintaining this process required a lot of
effort on our side, it could have been a reasonable mitigation, a
substitute for a feature/functionality gap between free and
paid. However, OSS users didn’t really adopt the latest releases
and didn’t always upgrade; most preferred to stick to
old, end-of-life releases. The result was a lose-lose situation
(for users and for us). Another approach we used was to
differentiate by using peripheral tools – such as Scylla Manager,
which helps to operate ScyllaDB (e.g., running backup/restore and
managing repairs) – and having a usage limit on them. Our
Kubernetes operator is open source and we added a separate closed
source repository for multi-region support for Kubernetes. This is
a complicated path for development and also for our paying users.
The factor that eventually pushed us over the line was that our new
architecture – with Raft, tablets, and native S3 – moves peripheral
functionality into the core database: Our backup and restore
implementation moves from an agent and external manager into the
core database. S3 I/O access for backup and restore (and, in the
future, for tiered storage) is handled directly by the core
database. The I/O operations are controlled by our schedulers,
allowing full prioritization and bandwidth control. Later on,
“point in time recovery” will be provided. This is a large overhaul
unification change, eliminating complexity while improving control.
Repair becomes automatic. Repair is a full-scan background process
that merges inconsistent replica data. Previously, it was
controlled by the external Scylla Manager. The new generation core
database runs its own automatic repair with tablet awareness. As a
result, there is no need for an external peripheral tool; repair
will become transparent to the user, like compaction is today.
These changes are leading to a more complete core product, with
better manageability and functionality. However, they eat into the
differentiators for our paid offerings. As you can see, a
combination of architecture consolidations, together with multiple
release stream efforts, has made our lives extremely complicated
and slowed down our progress. Going forward After a tremendous
amount of thought and discussion on these points, we decided to
unify the two release streams as described at the start of this
post. This license shift will allow us to better serve our
customers as well as provide increased free tier value to the
community. The new model opens up access to previously-restricted
capabilities that: Achieve up to 50% higher throughput and 33%
lower latency via profile-guided optimization Speed up node
addition/removal by 30X via file-based streaming Balance multiple
workloads with different performance needs on a single cluster via
workload prioritization Reduce network costs with ZSTD-based
network compression (with a shard dictionary) for inter-node RPC
Combine the best of Leveled Compaction Strategy and Size-tiered
Compaction Strategy with Incremental Compaction Strategy –
resulting in 35% better storage utilization Use encryption at rest,
LDAP integration, and all of the other benefits of the previous
closed source Enterprise version Provide a single (all open source)
Kubernetes operator for ScyllaDB Enable users to enjoy a longer
product life cycle This was a difficult decision for us, and we
know it might not be well-received by some of our OSS users running
large ScyllaDB clusters. We appreciate your journey and we hope you
will continue working with ScyllaDB. After 10 years, we believe
this change is the right move for our company, our database, our
customers, and our early adopters. With this shift, our team will
be able to move faster, better respond to your needs, and continue
making progress towards the major milestones on our roadmap: Raft
for data, optimized tablet elasticity, and tiered (S3) storage.
Read the
FAQ
16 December 2024, 1:09 pm by
ScyllaDB
The following is an excerpt from Chapter 1 of Database
Performance at Scale (an Open Access book that’s available for
free). Follow Joan’s highly fictionalized adventures with some
all-too-real database performance challenges. You’ll laugh. You’ll
cry. You’ll wonder how we worked this “cheesy story” into a deeply
technical book. Get the
complete book, free Lured in by impressive buzzwords like
“hybrid cloud,” “serverless,” and “edge first,” Joan readily joined
a new company and started catching up with their technology stack.
Her first project recently started a transition from their in-house
implementation of a database system, which turned out to not scale
at the same pace as the number of customers, to one of the
industry-standard database management solutions. Their new pick was
a new distributed database which, contrary to many NoSQL databases,
strives to keep the original
ACID guarantees known in
the SQL world. Due to a few new data protection acts that tend to
appear annually nowadays, the company’s board decided that they
were going to maintain their own datacenter, instead of using one
of the popular cloud vendors for storing sensitive information. On
a very high level, the company’s main product consisted of only two
layers: The frontend, the entry point for users, which actually
runs in their own browsers and communicates with the rest of the
system to exchange and persist information. The everything-else,
customarily known as “backend,” but actually including load
balancers, authentication, authorization, multiple cache layers,
databases, backups, and so on. Joan’s first introductory task was
to implement a very simple service for gathering and summing up
various statistics from the database, and integrate that service
with the whole ecosystem, so that it fetches data from the database
in real-time and allows the DevOps teams to inspect the statistics
live. To impress the management and reassure them that hiring Joan
was their absolutely best decision this quarter, Joan decided to
deliver a proof-of-concept implementation on her first day! The
company’s unspoken policy was to write software in Rust, so she
grabbed the first driver for their database from a brief crates.io
search and sat down to her self-organized hackathon. The day went
by really smoothly, with Rust’s ergonomics-focused ecosystem
providing a superior developer experience. But then Joan ran her
first smoke tests on a real system. Disbelief turned to
disappointment and helplessness when she realized that every third
request (on average) ended up in an error, even though the whole
database cluster reported itself to be in a healthy, operable state. That
meant a debugging session was in order! Unfortunately, the driver
Joan hastily picked for the foundation of her work, even though
open-source on its own, was just a thin wrapper over precompiled,
legacy C code, with no source to be found. Fueled by a strong
desire to solve the mystery and a healthy dose of fury, Joan spent
a few hours inspecting the network communication with
Wireshark, and she made an
educated guess that the
bug must be in the hashing key implementation. In the database
used by the company, keys are hashed to later route requests to
appropriate nodes. If a hash value is computed incorrectly, a
request may be forwarded to the wrong node, which may refuse it and
return an error instead. Unable to verify the claim due to missing
source code, Joan decided on a simpler path — ditching the
originally chosen driver and reimplementing the solution on one of
the officially supported, open-source drivers backed by the
database vendor, with a solid user base and regularly updated
release schedule. Joan’s diary of lessons learned, part I The
initial lessons include: Choose a driver carefully. It’s at the
core of your code’s performance, robustness, and reliability.
Drivers have bugs too, and it’s impossible to avoid them. Still,
there are good practices to follow: Unless there’s a good reason,
prefer the officially supported driver (if it exists); Open-source
drivers have advantages: they’re not only verified by the
community, but also allow deep inspection of their code, and even
let you modify the driver code to gain more insight while debugging;
It’s better to rely on drivers with a well-established release
schedule since they are more likely to receive bug fixes (including
for security vulnerabilities) in a reasonable period of time.
Wireshark is a great open-source tool for interpreting network
packets; give it a try if you want to peek under the hood of your
program. The introductory task was eventually completed
successfully, which made Joan ready to receive her first real
assignment. The tuning Armed with the experience gained working on
the introductory task, Joan started planning how to approach her
new assignment: a misbehaving app. One of the applications
notoriously caused stability issues for the whole system,
disrupting other workloads each time it experienced any problems.
The rogue app was already based on an officially supported driver,
so Joan could cross that one off the list of potential root causes.
This particular service was responsible for injecting data backed
up from the legacy system into the new database. Because the
company was not in a great hurry, the application was written with
low concurrency in mind to have low priority and not interfere with
user workloads. Unfortunately, once every few days something kept
triggering an anomaly. The normally peaceful application seemed to
be trying to perform a denial-of-service attack on its own
database, flooding it with requests until the backend got
overloaded enough to cause issues for other parts of the ecosystem.
As Joan watched metrics presented in a Grafana dashboard, clearly
suggesting that the rate of requests generated by this application
started spiking around the time of the anomaly, she wondered how on
Earth this workload could behave like that. It was, after all,
explicitly implemented to send new requests only when less than 100
of them were currently in progress. Since collaboration was heavily
advertised as one of the company’s “spirit and cultural
foundations” during the onboarding sessions with an onsite coach,
she decided it was best to discuss the matter with her colleague,
Tony. “Look, Tony, I can’t wrap my head around this,” she
explained. “This service doesn’t send any new requests when 100 of
them are already in flight. And look right here in the logs: 100
requests in progress, one returned a timeout error, and…,” she then
stopped, startled at her own epiphany. “Alright, thanks Tony,
you’re a dear – best
rubber duck ever!,” she
concluded and returned to fixing the code. The observation that led
to discovering the root cause was rather simple: the request didn’t
actually *return* a timeout error because the database server never
sent back such a response. The request was simply qualified as
timed out by the driver, and discarded. But the sole fact that the
driver no longer waits for a response for a particular request does
not mean that the database is done processing it! It’s entirely
possible that the request was instead just stalled, taking longer
than expected, and only the driver gave up waiting for its
response. With that knowledge, it’s easy to imagine that once 100
requests time out on the client side, the app might erroneously
think that they are not in progress anymore, and happily submit 100
more requests to the database, increasing the total number of
in-flight requests (i.e., concurrency) to 200. Rinse, repeat, and
you can achieve extreme levels of concurrency on your database
cluster—even though the application was supposed to keep it limited
to a small number! Joan’s diary of lessons learned, part II The
lessons continue: Client-side timeouts are convenient for
programmers, but they can interact badly with server-side timeouts.
Rule of thumb: make the client-side timeouts around twice as long
as server-side ones, unless you have an extremely good reason to do
otherwise (a configuration sketch appears at the end of this
excerpt). Some drivers may be capable of issuing a warning if they
detect that the client-side timeout is smaller than the server-side
one, or even amend the server-side timeout to match, but in general
it’s best to double-check. Tasks with seemingly fixed concurrency
can actually cause spikes under certain unexpected conditions.
Inspecting logs and dashboards is helpful in investigating such
cases, so make sure that observability tools are available both in
the database cluster and for all client applications. Bonus points
for distributed tracing, like
OpenTelemetry integration. With
client-side timeouts properly amended, the application choked much
less frequently and to a smaller extent, but it still wasn’t a
perfect citizen in the distributed system. It occasionally picked a
victim database node and kept bothering it with too many requests,
while ignoring the fact that seven other nodes were considerably
less loaded and could help handle the workload too. At other times,
its concurrency was reported to be exactly twice as large as the
configured limit. Whenever the two anomalies converged
in time, the poor node was unable to handle all requests it was
bombarded with, and had to give up on a fair portion of them. A
long study of the driver’s documentation, which was fortunately
available in
mdBook format and kept
reasonably up-to-date, helped Joan alleviate those pains too. The
first issue was simply a misconfiguration of the non-default load
balancing policy, which tried too hard to pick “the least loaded”
database node out of all the available ones, based on heuristics
and statistics occasionally updated by the database itself.
Unfortunately, this policy was also “best effort,” and relied on
the fact that statistics arriving from the database were always
legit – but a stressed database node could become so overloaded
that it wasn’t sending back updated statistics in time! That led
the driver to falsely believe that this particular server was not
actually busy at all. Joan decided that this setup was a premature
optimization that turned out to be a footgun, so she just restored
the original default policy, which worked as expected. The second
issue (temporary doubling of the concurrency) was caused by another
misconfiguration: an overeager speculative retry policy. After
waiting for a preconfigured period of time without getting an
acknowledgment from the database, drivers would speculatively
resend a request to maximize its chances to succeed. This mechanism
is very useful to increase requests’ success rate. However, if the
original request also succeeds, it means that the speculative one
was sent in vain. In order to balance the pros and cons,
speculative retry should be configured to only resend requests if
it’s very likely that the original one failed. Otherwise, as in
Joan’s case, the speculative retry may act too soon, doubling the
number of requests sent (and thus also doubling concurrency)
without improving the success rate at all. Whew, nothing gives a
simultaneous endorphin rush and dopamine hit like a quality
debugging session that ends in an astounding success (except
writing a cheesy story in a deeply technical book, naturally).
Great job, Joan! The end.
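Both of Joan's fixes ultimately come down to driver configuration. The sketch below shows roughly how they might look with the Rust driver's execution-profile API (names follow the 0.x driver documentation and may differ in other versions; the concrete durations and the statement are illustrative, not recommendations from the book): a client-side timeout about twice the server-side one, plus a speculative retry that only fires once a request is already well past its typical latency.

```rust
use std::sync::Arc;
use std::time::Duration;

use scylla::transport::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::transport::ExecutionProfile;
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fire at most one speculative copy of a request, and only after the
    // original has been outstanding far longer than its typical latency,
    // so retries don't silently double the load on the cluster.
    let speculative = SimpleSpeculativeExecutionPolicy {
        max_retry_count: 1,
        retry_interval: Duration::from_millis(500),
    };

    let profile = ExecutionProfile::builder()
        // Rule of thumb from the story: client-side timeout roughly twice
        // the server-side one (e.g. 10 s against a 5 s server timeout), so
        // the driver doesn't give up on requests the server is still running.
        .request_timeout(Some(Duration::from_secs(10)))
        .speculative_execution_policy(Some(Arc::new(speculative)))
        .build();

    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .default_execution_profile_handle(profile.into_handle())
        .build()
        .await?;

    // Speculative execution applies only to statements explicitly marked
    // idempotent, since resending a non-idempotent write could apply it twice.
    let mut stmt = scylla::query::Query::new("SELECT value FROM demo.kv WHERE key = ?");
    stmt.set_is_idempotent(true);
    let _rows = session.query(stmt, ("hello",)).await?;
    Ok(())
}
```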
Editor’s note: If you
made it this far and can’t get enough of cheesy database
performance stories, see what happened to poor old Patrick in
“
A Tale
of Database Performance Woes: Patrick’s Unlucky Green Fedoras.”
And if you appreciate this sense of humor, see Piotr’s
new
book on writing engineering blog posts.
10 December 2024, 1:05 pm by
ScyllaDB
How Joseph Shorter and Miles Ward led a fast, safe
migration with ScyllaDB’s DynamoDB-compatible API
Digital Turbine is a quiet but powerful player in the mobile ad
tech business. Their platform is preinstalled on Android phones,
connecting app developers, advertisers, mobile carriers, and device
manufacturers. In the process, they bring in $500M annually. And if
their database goes down, their business goes down. Digital Turbine
recently decided to standardize on Google Cloud – so continuing
with their DynamoDB database was no longer an option. They had to
move fast without breaking things. Joseph Shorter (VP, Platform
Architecture at Digital Turbine) teamed up with Miles Ward (CTO at
SADA) and devised a game plan to pull off the move. Spoiler: they
not only moved fast, but also ended up with an approach that was
even faster…and less expensive too. You can hear directly from Joe
and Miles in this conference talk: We’ve captured some highlights
from their discussion below. Why migrate from DynamoDB The tipping
point for the DynamoDB migration was Digital Turbine’s
decision to standardize on GCP following a series of
acquisitions. But that wasn’t the only issue. DynamoDB hadn’t been
ideal from a cost perspective or from a performance perspective.
Joe explained: “It can be a little expensive as you scale, to be
honest. We were finding some performance issues. We were doing a
ton of reads—90% of all interactions with DynamoDB were read
operations. With all those operations, we found that the
performance hits required us to scale up more than we wanted, which
increased costs.” Their DynamoDB migration requirements Digital
Turbine needed the migration to be as fast and low-risk as
possible, which meant keeping application refactoring to a minimum.
The main concern, according to Joe, was “How can we migrate without
radically refactoring our platform, while maintaining at least the
same performance and value, and avoiding a crash-and-burn
situation? Because if it failed, it would take down our whole
company.” They approached SADA, who helped them think through a
few options – including some Google-native solutions and ScyllaDB.
ScyllaDB stood out due to its DynamoDB API, ScyllaDB Alternator.
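The reason the refactoring stayed small is that Alternator speaks the DynamoDB API, so an existing DynamoDB client mostly just needs its endpoint changed. The sketch below illustrates the idea with the Rust AWS SDK (the endpoint URL, table name, and key are placeholders, and builder names shift a bit between SDK releases); it is not Digital Turbine's actual code.

```rust
use aws_sdk_dynamodb::types::AttributeValue;
use aws_sdk_dynamodb::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same SDK, same calls as against DynamoDB; only the endpoint changes.
    // Alternator listens on its own port (8000 here as a placeholder).
    // Region and credentials come from the usual environment configuration.
    let config = aws_config::from_env()
        .endpoint_url("http://scylla-alternator.internal:8000")
        .load()
        .await;
    let client = Client::new(&config);

    // A read that previously went to DynamoDB now hits ScyllaDB unchanged.
    let resp = client
        .get_item()
        .table_name("campaigns") // placeholder table name
        .key("id", AttributeValue::S("123".to_string()))
        .send()
        .await?;
    println!("{:?}", resp.item());
    Ok(())
}
```

The SDK still expects credentials to be configured (for example via environment variables); whether Alternator actually verifies them depends on how its authorization enforcement is set up.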
What the DynamoDB migration entailed In summary, it was “as easy as
pudding pie” (quoting Joe here). But a little more detail: “There
is a DynamoDB API that we could just use. I won’t say there was no
refactoring. We did some refactoring to make it easy for engineers
to plug in this information, but it was straightforward. It took
less than a sprint to write that code. That was awesome. Everyone
had told us that ScyllaDB was supposed to be a lot faster. Our
reaction was, ‘Sure, every competitor says their product performs
better.’ We did a lot with DynamoDB at scale, so we were skeptical.
We decided to do a proper POC—not just some simple communication
with ScyllaDB compared to DynamoDB. We actually put up multiple
apps with some dependencies and set it up the way it actually
functions in AWS, then we pressure-tested it. We couldn’t afford
any mistakes—a mistake here means the whole company would go down.
The goal was to make sure, first, that it would work and, second,
that it would actually perform. And it turns out, it delivered on
all its promises. That was a huge win for us.” Results so far –
with minimal cluster utilization Beyond meeting their primary goal
of moving off AWS, the Digital Turbine team improved performance –
and they ended up reducing their costs a bit too, as an added
benefit. From Joe: “I think part of it comes down to the fact that
the performance is just better. We didn’t know what to expect
initially, so we scaled things to be pretty comparable. What we’re
finding is that it’s simply running better. Because of that, we
don’t need as much infrastructure. And we’re barely tapping the
ScyllaDB clusters at all right now. A 20% cost difference—that’s a
big number, no matter what you’re talking about. And when you
consider our plans to scale even further, it becomes even more
significant. In the industry we’re in, there are only a few major
players—Google, Facebook, and then everyone else. Digital Turbine
has carved out a chunk of this space, and we have the tools as a
company to start competing in ways others can’t. As we gain more
customers and more people say, ‘Hey, we like what you’re doing,’ we
need to scale radically. That 20% cost difference is already
significant now, and in the future, it could be massive. Better
performance and better pricing—it’s hard to ask for much more than
that. You’ve got to wonder why more people haven’t noticed this
yet.”
Learn more
about the difference between ScyllaDB and DynamoDB Compare
costs: ScyllaDB vs DynamoDB