The Reality of AI and Cloud Adoption
When you want to understand where technology is really heading, look past the headlines and listen to the practitioners. The recent Apache Cassandra® user survey offers exactly that kind of ground truth, with 144 power users sharing insights that cut through the market noise about AI adoption and...

Making Effective Partitions for ScyllaDB Data Modeling
Learn how to ensure your tables are perfectly partitioned to satisfy your queries – in this excerpt from the book “ScyllaDB in Action.”

Editor’s note: We’re thrilled to share the following excerpt from Bo Ingram’s informative – and fun! – new book on ScyllaDB: ScyllaDB in Action. It’s available now via Manning and Amazon. You can also access a 3-chapter excerpt for free, compliments of ScyllaDB.

Get the first 3 book chapters, free

You might have already experienced Bo’s expertise and engaging communication style in his blog How Discord Stores Trillions of Messages or ScyllaDB Summit talks How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB and So You’ve Lost Quorum: Lessons From Accidental Downtime. If not, you should 😉 And if you want to learn more from Bo, join him at our upcoming Data Modeling for Performance Masterclass. We’ve ordered boxes of print books and will be giving them out!

Join Bo at the “Data Modeling for Performance” Masterclass

The following is an excerpt from Chapter 3; it’s reprinted here with permission of the publisher. Read more from Chapter 3 here.

***

When analyzing the queries your application needs, you identified several tables for your schema. The next step in schema design is to ask how these tables can be uniquely identified and partitioned to satisfy the queries. The primary key contains the row’s partition key — you learned in chapter 2 that the primary key determines a row’s uniqueness. Therefore, before you can determine the partition key, you need to know the primary key.

PRIMARY KEYS

First, you should check to see if what you’re trying to store contains a property that is popularly used for its uniqueness. Cars, for example, have a vehicle identification number that’s unique per car. Books (including this one!) have an ISBN (international standard book number) to uniquely identify them. There’s no international standard for identifying an article or an author, so you’ll need to think a little harder to find the primary and partition keys. ScyllaDB does provide support for generating unique identifiers, but they come with some drawbacks that you’ll learn about in the next chapter.

Next, you can ask if any fields can be combined to make a good primary key. What could you use to determine a unique article? The title might be reused, especially if you have unimaginative writers. The content would ideally be unique per article, but that’s a very large value for a key. Perhaps you could use a combination of fields — maybe date, author, and title? That probably works, but I find it’s helpful to look back at what you’re trying to query. When your application runs the Read Article
query, it’s trying to read only a single
article. Whatever is executing that query, probably a web server
responding to the contents of a URL, is trying to load that
article, so it needs information that can be stored in a URL. Isn’t
it obnoxious when you go to paste a link somewhere and the link
feels like it’s a billion characters long? To load an article, you
don’t want to have to keep track of the author ID, title, and date.
That’s why using an ID as a primary key is often a strong choice.
Providing a unique identifier satisfies the uniqueness requirement
of a primary key, and they can potentially encode other information
inside them, such as time, which can be used for relative ordering.
You’ll learn more about potential types of IDs in the next chapter
when you explore data types. What uniquely identifies an author?
You might think an email, but people sometimes change their email
addresses. Supplying your own unique identifier works well again
here. Perhaps it’s a numeric ID, or maybe it’s a username, but
authors, as your design stands today, need extra information to
differentiate them from other authors within your database. Article
summaries, like the other steps you’ve been through, are a little
more interesting. For the various article summary tables, if you
try to use the same primary key as articles, an ID, you’re going to
run into trouble. If an ID alone makes an article unique in the
articles
table, then presumably it suffices for the
index tables. That turns out to not be the case. An ID can still
differentiate uniqueness, but you also want to query by at least
the partition key to have a performant query, and if an ID is the
partition key, that doesn’t satisfy the use cases for querying by
author, date, or score. Because the partition key is contained
within the primary key, you’ll need to include those fields in your
primary key (figure 3.12). Taking article_summaries_by_author, your primary key for that table would become the author and the article ID. Similarly, the other two tables would have the date and the article ID for article_summaries_by_date, and article_summaries_by_score would use the score and the article ID for its primary key.
Figure 3.12 You sometimes need to add fields to an already-unique
primary key to use them in a partition key, especially when
denormalizing data.

With your primary keys figured out, you can
move forward to determining your partition keys and potentially
adjusting your primary keys.

PARTITION KEYS

You learned in the last
chapter that the first entry in the primary key serves as the
partition key for the row (you’ll see later how to build a
composite primary key). A good partition key distributes data
evenly across the cluster and is used by the queries against that
table. It stores a relatively small amount of data (a good rule of
thumb is that 100 MB is a large partition, but it depends on the
use case). I’ve listed the tables and their primary key; knowing
the primary keys, you can now rearrange them to specify your
partition keys. What would be a good partition key for each of
these tables?

Table 3.3 Each table has a primary key from which the partition key is extracted

For the articles
and authors
tables, it’s very straightforward. There
is one column in the primary key; therefore, the partition key is
the only column. If only everything was that easy!
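For illustration only, a minimal sketch of those two tables might look like the following (the column names and types are assumptions for the example, not the book’s final schema):

CREATE TABLE articles (
    id uuid,
    title text,
    content text,
    PRIMARY KEY (id)   -- the lone primary key column is also the partition key
);

CREATE TABLE authors (
    id uuid,
    name text,
    PRIMARY KEY (id)
);
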
TIP: By only having the ID as the primary key and
the partition key, you can look up rows only by their ID. A query
where you want to get multiple rows wouldn’t work with this
approach, because you’d be querying across partitions, bringing in
extra nodes and hampering performance. Instead, you want to query
by a partition key that would contain multiple rows, as you’ll do
with article summaries. The article summaries tables, however,
present you with a choice. Given a primary key of an ID and an
author, for article_summaries_by_author,
which should
be the partition key? If you choose ID, Scylla will distribute the
rows for this table around the cluster based on the ID of the
article. This distribution would mean that if you wanted to load
all articles by their author, the query would have to hit nodes all
across the cluster. That behavior is not efficient. If you
partitioned your data by the author, a query to find all articles
written by a given author would hit only the nodes that store that
partition, because you’re querying within that partition (figure
3.13). You almost always want to query by at least the partition
key — it is critical for good performance. Because the use case for
this table is to find articles by who wrote them, the author is the
choice for the partition key. This distribution makes your primary
key author_id, id
because the partition key is the
first entry in the primary key.
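As a rough sketch (the ID type and the extra summary columns are illustrative assumptions), such a table could be declared with author_id leading the primary key:

CREATE TABLE article_summaries_by_author (
    author_id uuid,
    id timeuuid,     -- a time-based ID is one option; ID types are covered in the next chapter
    title text,
    score int,
    PRIMARY KEY (author_id, id)   -- author_id is the partition key; id completes the primary key
);
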
Figure 3.13 The partition key for each table should match the
queries you previously determined to achieve the best performance.
article_summaries_by_date
and
article_summaries_by_score,
in what might be the least
surprising statement of the book, present a similar choice as
article_summaries_by_author
with a similar solution.
Because you’re querying article_summaries_by_date
by
the date of the article, you want the data partitioned by it as
well, making the primary key date, id, or score, id for
article_summaries_by_score.
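A comparable sketch for article_summaries_by_date (again, the column names are illustrative assumptions) would be:

CREATE TABLE article_summaries_by_date (
    published_date date,    -- stand-in name for the article's date
    id timeuuid,
    title text,
    PRIMARY KEY (published_date, id)   -- the date is the partition key; id completes the primary key
);

article_summaries_by_score follows the same pattern, with the score as its partition key.
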
Tip: Right-sizing
partition keys: If a partition key is too large,
data will be unevenly distributed across the cluster. If a
partition key is too small, range scans that load multiple rows
might need several queries to load the desired amount of data.
Consider querying for articles by date — if you publish one article
a week but your partition key is per day, then querying for the
most recent articles will require queries with no results six times
out of seven. Partitioning your data is a balancing exercise.
Sometimes, it’s easy. Author is a natural partition key for
articles by author, whereas something like date might require some
finessing to get the best performance in your application. Another
problem to consider is one partition taking an outsized amount of
traffic — more than the node can handle. In the next chapters, as
you refine your design, you’ll learn how to adjust your partition
key to fit your database just right. After looking closely at the
tables, your primary keys are now ordered correctly to partition
the data. For the article summaries tables, your primary keys are
divided into two categories — partition key and not-partition key.
The leftover bit has an official name and performs a specific
function in your table. Let’s take a look at the not-partition
keys — clustering keys.

CLUSTERING KEYS

A clustering key is the
non-partition key part of a table’s primary key that defines the
ordering of the rows within the table. In the previous chapter,
when you looked at your example table, you noticed that Scylla had
filled in the table’s implicit configuration, ordering the
non-partition key columns — the clustering keys — by ascending
order.

CREATE TABLE initial.food_reviews (
    restaurant text,
    ordered_at timestamp,
    food text,
    review text,
    PRIMARY KEY (restaurant, ordered_at, food)
) WITH CLUSTERING ORDER BY (ordered_at ASC, food ASC);

Within the partition, each row is sorted by the time it was ordered and then by the name of the food, each in an ascending sort (so earlier order times first, then alphabetically by food). Left to its own devices, ScyllaDB always defaults to the
ascending natural order of the clustering keys. When creating
tables, if you have clustering keys, you need to be intentional
about specifying their order to make sure that you’re getting
results that match your requirements. Consider
article_summaries_by_author
. The purpose of that table
is to retrieve the articles for a given author, but do you want to
see their oldest articles first or their newest ones? By default,
Scylla is going to sort the table by id ASC
, giving
you the oldest articles first. When creating your summary tables,
you’ll want to specify their sort orders so that you get the newest
articles first — specifying id DESC.
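A minimal sketch of that clustering order for article_summaries_by_author (column names as assumed in the earlier sketches):

CREATE TABLE article_summaries_by_author (
    author_id uuid,
    id timeuuid,
    title text,
    score int,
    PRIMARY KEY (author_id, id)
) WITH CLUSTERING ORDER BY (id DESC);   -- assumes a time-ordered ID so DESC yields newest first
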
You now have
tables defined, and with those tables, the primary keys and
partition keys are set to specify uniqueness, distribute the data,
and provide an ordering based on your queries. Your queries,
however, are looking for more than just primary keys — authors have
names, and articles have titles and content. Not only do you need
to store these fields, you need to specify the structure of that
data. To accomplish this definition, you’ll use data types. In the
next chapter, you’ll learn all about them and continue practicing
query-first design.

Maximizing Self-Managed Apache Cassandra with Kubernetes
Today’s infrastructure architects need to manage large-scale, enterprise Apache Cassandra® clusters to meet emerging AI data needs. To simplify that layer, we introduced DataStax Mission Control, an all-in-one platform for distributed database operations, observability, and deployment. As Mission...

Database Internals: Optimizing Memory Management
How databases can get better performance by optimizing memory management

The following blog post is an excerpt from Chapter 3 of the Database Performance at Scale book, which is available for free. This book sheds light on often overlooked factors that impact database performance at scale.

Get the complete book, free

Memory management is the central design point in all aspects of programming. Even comparing programming languages to one another always involves discussions about the way programmers are supposed to handle memory allocation and freeing. No wonder memory management design affects the performance of a database so much. Applied to database engineering, memory management typically falls into two related but independent subsystems: memory allocation and cache control. The former is in fact a very generic software engineering issue, so considerations about it are not extremely specific to databases (though they are crucial and are worth studying). In contrast, the latter topic is very broad and affected by usage details and corner cases. As a result, in the database world, cache control has its own flavor.

Allocation

The manner in which programs or subsystems allocate and free memory lies at the core of memory management. There are several approaches worth considering. As illustrated by Figure 3-2, a so-called “log-structured allocation” is known from filesystems, where it puts sequential writes to a circular log on the persisting storage and handles updates the very same way. At some point, the filesystem must reclaim blocks that became obsolete entries in the log area to make more space available for future writes. In a naive implementation, unused entries are reclaimed by rereading and rewriting the log from scratch; obsolete blocks are then skipped in the process.

Figure 3-2: A log-structured allocation puts sequential writes to a circular log on the persisting storage and handles updates the very same way

A memory allocator for naive code can do something similar. In its simplest form, it would allocate the next block of memory by simply advancing a next-free pointer. Deallocation would just need to mark the allocated area as freed. One advantage of this approach is the speed of allocation. Another is the simplicity and efficiency of deallocation if it happens in FIFO order or affects the whole allocation space. Stack memory allocations are released in the order that’s the reverse of allocation, so this is the most prominent and the most efficient example of such an approach.

Using linear allocators as general-purpose allocators can be more problematic because of the difficulty of space reclamation. To reclaim space, it’s not enough to just mark entries as free. This leads to memory fragmentation, which in turn outweighs the advantages of linear allocation. So, as with the filesystem, the memory must be reclaimed so that it only contains allocated entries and the free space can be used again. Reclamation requires moving allocated entries around – a process that changes and invalidates their previously known addresses. In naive code, the locations of references to allocated entries (addresses stored as pointers) are unknown to the allocator. Existing references would have to be patched to make the allocator action transparent to the caller; that’s not feasible for a general-purpose allocator like malloc. Using a log-structured allocator is therefore tied to the choice of programming language. Some languages, like C++, can greatly facilitate this by providing move constructors.
However, passing pointers to libraries that are outside of your control (e.g., glibc) would still be an issue.

Another alternative is adopting a strategy of pool allocators, which provide allocation spaces for allocation entries of a fixed size (see Figure 3-3). By limiting the allocation space that way, fragmentation can be reduced. A number of general-purpose allocators use pool allocators for small allocations. In some cases, those allocation spaces exist on a per-thread basis to eliminate the need for locking and improve CPU cache utilization.

Figure 3-3: Pool allocators provide allocation spaces for allocation entries of a fixed size. Fragmentation is reduced by limiting the allocation space

This pool allocation strategy provides two core benefits. First, it saves you from having to search for available memory space. Second, it alleviates memory fragmentation because it pre-allocates in memory a cache for use with a collection of object sizes. Here’s how it works to achieve that:

- The region for each of the sizes has fixed-size memory chunks that are suitable for the contained objects, and those chunks are all tracked by the allocator.
- When it’s time for the allocator to actually allocate memory for a certain type of data object, it’s typically possible to use a free slot (chunk) within one of the existing memory slabs. (Note: We are using the term “slab” to mean one or more contiguous memory pages that contain pre-allocated chunks of memory.)
- When it’s time for the allocator to free the object’s memory, it can simply move that slot over to the containing slab’s list of unused/free memory slots.
- That memory slot (or some other free slot) will be removed from the list of free slots whenever there’s a call to create an object of the same type (or a call to allocate memory of the same size).

The best allocation approach to pick heavily depends upon the usage scenario. One great benefit of a log-structured approach is that it handles fragmentation of small sub-pools in a more efficient way. Pool allocators, on the other hand, generate less background load on the CPU because of the lack of compacting activity.

Cache control

When it comes to memory management in a software application that stores lots of data on disk, you cannot overlook the topic of cache control. Caching is always a must in data processing, and it’s crucial to decide what and where to cache. If caching is done at the I/O level, for both read/write and mmap, caching can become the responsibility of the kernel. The majority of the system’s memory is given over to the page cache. The kernel decides which pages should be evicted when memory runs low and when pages need to be written back to disk, and it controls read-ahead. The application can provide some guidance to the kernel using the madvise(2) and fadvise(2) system calls.

The main advantage of letting the kernel control caching is that great effort has been invested by the kernel developers over many decades into tuning the algorithms used by the cache. Those algorithms are used by thousands of different applications and are generally effective. The disadvantage, however, is that these algorithms are general-purpose and not tuned to the application. The kernel must guess how the application will behave next. Even if the application knows differently, it usually has no way to help the kernel guess correctly. This results in the wrong pages being evicted, I/O scheduled in the wrong order, or read-ahead scheduled for data that will not be consumed in the near future.
Next, doing the caching at the I/O level interacts with the topic often referred to as IMR – in-memory representation. Unsurprisingly, the format in which data is stored on disk differs from the format in which the same data is allocated in memory as objects. The simplest reason why it’s not the same is byte ordering. With that in mind, if the data is cached once it’s read from the disk, it needs to be further converted or parsed into the objects used in memory. This can be a waste of CPU cycles, so applications may choose to cache at the object level.

Choosing to cache at the object level affects a lot of other design points. With that, the cache management is all on the application side, including cross-core synchronization, data coherence, invalidation, and so on. Next, since objects can be (and typically are) much smaller than the average I/O size, caching millions and billions of those objects requires a collection selection that can handle it (we’ll get to this in a follow-up blog). Finally, caching on the object level greatly affects the way I/O is done.

Read about ScyllaDB’s Caching

NoSQL Data Modeling: Application Design Before Schema Design
Learn how to implement query-first design to build a ScyllaDB schema for a sample application – in this excerpt from the book “ScyllaDB in Action.”

You might have already experienced Bo’s expertise and engaging communication style in his blog How Discord Stores Trillions of Messages or ScyllaDB Summit talks How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB and So You’ve Lost Quorum: Lessons From Accidental Downtime. If not, you should 😉 And if you want to learn more from Bo, join him at our upcoming Data Modeling for Performance Masterclass. We’ve ordered boxes of print books and will be giving them out!

Join Bo at the “Data Modeling for Performance” Masterclass

The following is an excerpt from Chapter 3; it’s reprinted here with permission of the publisher.

***

When designing a database schema, you need to create a schema that is synergistic with your database. If you consider Scylla’s goals as a database, it wants to distribute data across the cluster to provide better scalability and fault tolerance. Spreading the data means queries involve multiple nodes, so you want to design your application to make queries that use the partition key, minimizing the nodes involved in a request. Your schema needs to fit these constraints:

- It needs to distribute data across the cluster
- It should also query the minimal number of nodes for your desired consistency level

There’s some tension in these constraints — your schema wants to spread data across the cluster to balance load between the nodes, but you want to minimize the number of nodes required to serve a query. Satisfying these constraints can be a balancing act — do you have smaller partitions that might require more queries to aggregate together, or do you have larger partitions that require fewer queries but potentially spread the data unevenly across the cluster? In figure 3.1, you can see the cost of a query that utilizes the partition key and queries across the minimal number of nodes versus one that doesn’t use the partition key, necessitating scanning each node for matching data. Using the partition key in your query allows the coordinator — the node servicing the request — to direct queries to nodes that own that partition, lessening the load on the cluster and returning results faster.

Figure 3.1 Using the partition key minimizes the number of nodes required to serve the request.

The aforementioned design constraints are each related to queries. You want your data to be spread across the cluster so that your queries distribute the load amongst multiple nodes. Imagine if all of your data was clustered on a small subset of nodes in your cluster. Some nodes would be working quite hard, whereas others might not be taking much traffic. If some of those heavily utilized nodes became overwhelmed, you could suffer quite degraded performance because, due to the imbalance, many queries could be unable to complete. However, you also want to minimize the number of nodes hit per query to minimize the work your query needs to do; a query that uses all nodes in a very large cluster would be very inefficient. These query-centric constraints necessitate a query-centric approach to design. How you query Scylla is a key component of its performance, and since you need to consider the impacts of your queries across multiple dimensions, it’s critical to think carefully about how you query Scylla.
When designing schemas in Scylla, it’s best to practice an approach called query-first design, where you focus on the queries your application needs to make and then build your database schema around that. In Scylla, you structure your data based on how you want to query it — query-first design helps you do just that.

Your query-first design toolbox

In query-first design, you take a set of application requirements and ask yourself a series of questions that guide you through translating the requirements into a schema in ScyllaDB. Each of these questions builds upon the next, iteratively guiding you through building an effective ScyllaDB schema. These questions include the following:

- What are the requirements for my application?
- What are the queries my application needs to run to meet these requirements?
- What tables are these queries querying?
- How can these tables be uniquely identified and partitioned to satisfy the queries?
- What data goes inside these tables?
- Does this design match the requirements?
- Can this schema be improved?

This process is visualized in figure 3.2, showing how you start with your application requirements and ask yourself a series of questions, guiding you from your requirements to the queries you need to run and, ultimately, to a fully designed ScyllaDB schema ready to store data effectively.

Figure 3.2 Query-first design guides you through taking application requirements and converting them to a ScyllaDB schema.

You begin with requirements and then use the requirements to identify the queries you need to run. These queries are seeking something — those “somethings” need to be stored in tables. Your tables need to be partitioned to spread data across the cluster, so you determine that partitioning to stay within your requirements and database design constraints. You then specify the fields inside each table, filling it out. At this point, you can check two things: Does the design match the requirements? Can it be improved?

This up-front design is important because in ScyllaDB changing your schema to match new use cases can be a high-friction operation. While Scylla supports new query patterns via some of its features (which you’ll learn about in chapter 7), these come at an additional performance cost, and if they don’t fit your needs, they might necessitate a full manual copy of data into a new table. It’s important to think carefully about your design: not only what it needs to be, but also what it could be in the future. You start by extracting the queries from your requirements and expanding your design until you have a schema that fits both your application and ScyllaDB. To practice query-first design in Scylla, let’s take the restaurant review application introduced at the beginning of the chapter and turn it into a ScyllaDB schema.

The sample application requirements

In the last chapter, you took your restaurant reviews and stored them inside ScyllaDB. You enjoyed working with the database, and as you went to more places, you realized you could combine your two great loves — restaurant reviews and databases (if these aren’t your two great loves, play along with me). You decide to build a website to share your restaurant reviews. Because you already have a ScyllaDB cluster, you choose to use that (this is a very different book if you pick otherwise) as the storage for your website. The first step to query-first design is identifying the requirements for your application, as seen in figure 3.4.

Figure 3.4 You begin query-first design by determining your application’s requirements.
After ruminating on your potential website, you identify the features it needs, and most importantly, you give it a name — Restaurant Reviews. It does what it says! Restaurant Reviews has the following initial requirements:

- Authors post articles to the website
- Users view articles to read restaurant reviews
- Articles contain a title, an author, a score, a date, a gallery of images, the review text, and the restaurant
- A review’s score is between 1 and 10
- The home page contains a summary of articles sorted by most recent, showing the title, author name, score, and one image
- The home page links to articles
- Authors have a dedicated page containing summaries of their articles
- Authors have a name, bio, and picture
- Users can view a list of article summaries sorted by the highest review score

You have a hunch that as time goes by, you’ll want to add more features to this application and use those features to learn more about ScyllaDB. For now, these features give you a base to practice decomposing requirements and building your schema — let’s begin!

Determining the queries

Next, you ask what are the queries my application needs to run to meet these requirements?, as seen in figure 3.5. Your queries drive how you design your database schema; therefore, it is critical to understand your queries at the beginning of your design.

Figure 3.5 Next, you take your requirements and use them to determine your application’s queries.

For identifying queries, you can use the familiar CRUD operations — create, read, update, and delete — as verbs. These queries will act on nouns in your requirements, such as authors or articles. Occasionally, you’ll want to filter your queries — you can notate this filtering with by
followed by the filter condition. For example, if
your app needed to load events on a given date, you might use a
Read Events by Date
query. If you take a look at your
requirements, you’ll see several queries you’ll need.
TIP: These aren’t the actual queries you’ll
run; those are written in CQL, as seen in chapter 2, and look more
like SELECT * FROM your_cool_table WHERE
your_awesome_primary_key = 12;
. These are descriptions of
what you’ll need to query — in a later chapter when you finish your
design, you’ll turn these into actual CQL queries. The first
requirement is “Authors post articles to the website,” which sounds
an awful lot like a process that would involve inserting an article
into a database. Because you insert articles in the database via a
query, you will need a Create Article statement. You might be
asking at this point — what is an article? Although other
requirements discuss these fields, you should skip that concern for
the moment. Focus first on what queries you need to run, and then
you’ll later figure out the needed fields. The second requirement
is “Users view articles to read restaurant reviews.” Giving users
unfettered access to the database is a security no-go, so the app
needs to load an article to display to a user. This functionality
suggests a Read Article query (which is different than the user
perusing the article), which you can use to retrieve an article for
a user. The following two requirements refer to the data you need
to store and not a novel way to access them: Articles contain a
title, an author, a score, a date, a gallery of images, the review
text, and the restaurant A review’s score is between 1 and 10
Articles need certain fields, and each article is associated with a
review score that fits within specified parameters. You can save
these requirements for later when you fill out what fields are
needed in each table. The next relevant requirement says “The home
page contains a summary of articles sorted by most recent, showing
the title, author name, score, and one image.” The home page shows
article summaries, which you’ll need to load by their date, sorted
by most recent– Read Article Summaries by Date. Article summaries,
at first glance, look a lot like articles. Because you’re querying
different data, and you also need to retrieve summaries by their
time, you should consider them as different queries. Querying for an article:

- loads a title, author, score, date, a gallery of images, the review text, and the restaurant
- retrieves one specific article

On the other hand, loading the most recent article summaries:

- loads only the title, author name, score, and one image
- loads several summaries, sorted by their publishing date

Perhaps
loads several summaries, sorted by their publishing date Perhaps
they can run against the same table, but you can see if that’s the
case further on in the exercise. When in doubt, it’s best to not
over-consolidate. Be expansive in your designs, and if there’s
duplication, you can reduce it when refining your designs. As you
work through implementing your design, you might discover reasons
for what seems to be unnecessarily duplicated in one step to be a
necessary separation later. Following these requirements is “the
home page links to articles”, which makes explicit that the article
summaries link to the article — you’ll look closer at this one when
you determine what fields you need. The next two requirements are
about authors. The website will contain a page for each
author — presumably only you at the moment, but you have visions of
a media empire. This author page will contain article summaries for
each author — meaning you’ll need to Read Article Summaries
by Author
. In the last requirement, there’s data that each
author has. You can study the specifics of these in a moment, but
it means that you’ll need to read information about the author, so
you will need a Read Author
query. For the last
requirement — ”Users can view a list of article summaries sorted by
the highest review score” — you’ll need a way to surface article
summaries sorted by their scores. This approach requires a
Read Article Summaries by Score
query. TIP:
What would make a good partition key for reading articles sorted by
score? It’s a tricky problem; you’ll learn how to attack it
shortly. Having analyzed the requirements, you’ve determined six
queries your schema needs to support:

- Create Article
- Read Article
- Read Article Summaries by Date
- Read Article Summaries by Author
- Read Article Summaries by Score
- Read Author

You might notice a
problem here — where do article summaries and authors get created?
How can you read them if nothing makes them exist? Requirement
lists often have implicit requirements — because you need to read
article summaries, they have to be created somehow. Go ahead and
add a Create Article Summary and Create Author query to your list.
You now have eight queries from the requirements you created,
listed in table 3.1.
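To connect these descriptions back to chapter 2’s CQL, one of them, Read Article Summaries by Author, might eventually translate into something like the query below (the table and column names are assumptions carried over from the earlier sketches, not the finished schema):

SELECT title, score
FROM article_summaries_by_author
WHERE author_id = 123e4567-e89b-12d3-a456-426614174000;
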
There’s a joke that asks “How do you draw an owl?” — you draw
a couple of circles, and then you draw the rest of the owl.
Query-first design sometimes feels similar. You need to map your
queries to a database schema that is not only effective in meeting
the application’s requirements but is performant and uses
ScyllaDB’s benefits and features. Drawing queries out of
requirements is often straightforward, whereas designing the schema
from those queries requires balancing both application and database
concerns. Let’s take a look at some techniques you can apply as you
design a database schema.

New cassandra_latest.yaml configuration for a top performant Apache Cassandra®
Welcome to our deep dive into the latest advancements in Apache Cassandra® 5.0, specifically focusing on the cassandra_latest.yaml configuration that is available for new Cassandra 5.0 clusters.
This blog post will walk you through the motivation behind these changes, how to use the new configuration, and the benefits it brings to your Cassandra clusters.
Motivation
The primary motivation for introducing cassandra_latest.yaml is to bridge the gap between maintaining backward compatibility and leveraging the latest features and performance improvements. The yaml addresses the following varying needs for new Cassandra 5.0 clusters:
- Cassandra Developers: who want to push new features but face challenges due to backward compatibility constraints.
- Operators: who prefer stability and minimal disruption during upgrades.
- Evangelists and New Users: who seek the latest features and performance enhancements without worrying about compatibility.
Using cassandra_latest.yaml
Using cassandra_latest.yaml is straightforward. It involves copying the cassandra_latest.yaml content to your cassandra.yaml or pointing the cassandra.config JVM property to the cassandra_latest.yaml file.
This configuration is designed for new Cassandra 5.0 clusters (or those evaluating Cassandra), ensuring they get the most out of the latest features in Cassandra 5.0 and performance improvements.
Key changes and features
Key Cache Size
- Old: Evaluated as the minimum of 5% of the heap and 100MB
- Latest: Explicitly set to 0
Impact: Setting the key cache size to 0 in the latest configuration avoids performance degradation with the new SSTable format. This change is particularly beneficial for clusters using the new SSTable format, which doesn’t require key caching in the same way as the old format. Key caching was used to reduce the time it takes to find a specific key in Cassandra storage.
Commit Log Disk Access Mode
- Old: Set to legacy
- Latest: Set to auto
Impact: The auto setting optimizes the commit log disk access mode based on the available disks, potentially improving write performance. It can automatically choose the best mode (e.g., direct I/O) depending on the hardware and workload, leading to better performance without manual tuning.
Memtable Implementation
- Old: Skiplist-based
- Latest: Trie-based
Impact: The trie-based memtable implementation reduces garbage collection overhead and improves throughput by moving more metadata off-heap. This change can lead to more efficient memory usage and higher write performance, especially under heavy load.
create table … with memtable = {'class': 'TrieMemtable', … }
Memtable Allocation Type
- Old: Heap buffers
- Latest: Off-heap objects
Impact: Using off-heap objects for memtable allocation reduces the pressure on the Java heap, which can improve garbage collection performance and overall system stability. This is particularly beneficial for large datasets and high-throughput environments.
Trickle Fsync
- Old: False
- Latest: True
Impact: Enabling trickle fsync improves performance on SSDs by periodically flushing dirty buffers to disk, which helps avoid sudden large I/O operations that can impact read latencies. This setting is particularly useful for maintaining consistent performance in write-heavy workloads.
SSTable Format
- Old: big
- Latest: bti (trie-indexed structure)
Impact: The new BTI format is designed to improve read and write performance by using a trie-based indexing structure. This can lead to faster data access and more efficient storage management, especially for large datasets.
sstable:
  selected_format: bti

default_compression: zstd
compression:
  zstd:
    enabled: true
    chunk_length: 16KiB
    max_compressed_length: 16KiB
Default Compaction Strategy
- Old: STCS (Size-Tiered Compaction Strategy)
- Latest: Unified Compaction Strategy
Impact: The Unified Compaction Strategy (UCS) is more efficient and can handle a wider variety of workloads compared to STCS. UCS can reduce write amplification and improve read performance by better managing the distribution of data across SSTables.
default_compaction:
  class_name: UnifiedCompactionStrategy
  parameters:
    scaling_parameters: T4
    max_sstables_to_compact: 64
    target_sstable_size: 1GiB
    sstable_growth: 0.3333333333333333
    min_sstable_size: 100MiB
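Compaction can also be configured per table. A minimal CQL sketch (hypothetical keyspace and table names) of switching an existing table to UCS might look like:

ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'UnifiedCompactionStrategy', 'scaling_parameters': 'T4'};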
Concurrent Compactors
- Old: Defaults to the smaller of the number of disks and cores
- Latest: Explicitly set to 8
Impact: Setting the number of concurrent compactors to 8 ensures that multiple compaction operations can run simultaneously, helping to maintain read performance during heavy write operations. This is particularly beneficial for SSD-backed storage where parallel I/O operations are more efficient.
Default Secondary Index
- Old: legacy_local_table
- Latest: sai
Impact: SAI is a new index implementation that builds on the advancements made with the SSTable-Attached Secondary Index (SASI). It provides a solution that enables users to index multiple columns on the same table without suffering scaling problems, especially at write time.
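As a quick illustration (hypothetical keyspace, table, and column names), an SAI index is created in CQL like this:

CREATE INDEX articles_author_idx ON my_keyspace.articles (author) USING 'sai';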
Stream Entire SSTables
- Old: implicitly set to True
- Latest: explicitly set to True
Impact: When enabled, it permits Cassandra to zero-copy stream entire eligible SSTables between nodes, including every component. This speeds up the network transfer significantly, subject to throttling specified by
entire_sstable_stream_throughput_outbound
and
entire_sstable_inter_dc_stream_throughput_outbound
for inter-DC transfers.
UUID SSTable Identifiers
- Old: False
- Latest: True
Impact: Enabling UUID-based SSTable identifiers ensures that each SSTable has a unique name, simplifying backup and restore operations. This change reduces the risk of name collisions and makes it easier to manage SSTables in distributed environments.
Storage Compatibility Mode
- Old: Cassandra 4
- Latest: None
Impact: Setting the storage compatibility mode to none enables all new features by default, allowing users to take full advantage of the latest improvements in Cassandra, such as the new SSTable format. This setting is ideal for new clusters or those that do not need to maintain backward compatibility with older versions.
Testing and validation
The cassandra_latest.yaml configuration has undergone rigorous testing to ensure it works seamlessly. Currently, the Cassandra project CI pipeline tests both the standard (cassandra.yaml) and latest (cassandra_latest.yaml) configurations, ensuring compatibility and performance. This includes unit tests, distributed tests, and DTests.
Future improvements
Future improvements may include enforcing password strength policies and other security enhancements. The community is encouraged to suggest features that could be enabled by default in cassandra_latest.yaml.
Conclusion
The cassandra_latest.yaml configuration for new Cassandra 5.0 clusters is a significant step forward in making Cassandra more performant and feature-rich while maintaining the stability and reliability that users expect. Whether you are a developer, an operations professional, or an evangelist/end user, cassandra_latest.yaml offers something valuable for everyone.
Try it out
Ready to experience the incredible power of the cassandra_latest.yaml configuration on Apache Cassandra 5.0? Spin up your first cluster with a free trial on the Instaclustr Managed Platform and get started today with Cassandra 5.0!
The post New cassandra_latest.yaml configuration for a top performant Apache Cassandra® appeared first on Instaclustr.