The Reality of AI and Cloud Adoption

When you want to understand where technology is really heading, look past the headlines and listen to the practitioners. The recent Apache Cassandra® user survey offers exactly that kind of ground truth, with 144 power users sharing insights that cut through the market noise about AI adoption and...

Making Effective Partitions for ScyllaDB Data Modeling

Learn how to ensure your tables are perfectly partitioned to satisfy your queries – in this excerpt from the book “ScyllaDB in Action.”

Editor’s note: We’re thrilled to share the following excerpt from Bo Ingram’s informative – and fun! – new book on ScyllaDB: ScyllaDB in Action. It’s available now via Manning and Amazon. You can also access a 3-chapter excerpt for free, compliments of ScyllaDB.

Get the first 3 book chapters, free

You might have already experienced Bo’s expertise and engaging communication style in his blog How Discord Stores Trillions of Messages or his ScyllaDB Summit talks How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB and So You’ve Lost Quorum: Lessons From Accidental Downtime. If not, you should 😉 And if you want to learn more from Bo, join him at our upcoming Data Modeling for Performance Masterclass. We’ve ordered boxes of print books and will be giving them out!

Join Bo at the “Data Modeling for Performance” Masterclass

The following is an excerpt from Chapter 3; it’s reprinted here with permission of the publisher. Read more from Chapter 3 here.

***

When analyzing the queries your application needs, you identified several tables for your schema. The next step in schema design is to ask how these tables can be uniquely identified and partitioned to satisfy the queries. The primary key contains the row’s partition key — you learned in chapter 2 that the primary key determines a row’s uniqueness. Therefore, before you can determine the partition key, you need to know the primary key.

PRIMARY KEYS

First, you should check whether what you’re trying to store contains a property that is popularly used for its uniqueness. Cars, for example, have a vehicle identification number that’s unique per car. Books (including this one!) have an ISBN (international standard book number) to uniquely identify them. There’s no international standard for identifying an article or an author, so you’ll need to think a little harder to find the primary and partition keys. ScyllaDB does provide support for generating unique identifiers, but they come with some drawbacks that you’ll learn about in the next chapter.

Next, you can ask if any fields can be combined to make a good primary key. What could you use to determine a unique article? The title might be reused, especially if you have unimaginative writers. The content would ideally be unique per article, but that’s a very large value for a key. Perhaps you could use a combination of fields — maybe date, author, and title? That probably works, but I find it’s helpful to look back at what you’re trying to query. When your application runs the Read Article query, it’s trying to read only a single article. Whatever is executing that query, probably a web server responding to the contents of a URL, is trying to load that article, so it needs information that can be stored in a URL. Isn’t it obnoxious when you go to paste a link somewhere and the link feels like it’s a billion characters long? To load an article, you don’t want to have to keep track of the author ID, title, and date. That’s why using an ID as a primary key is often a strong choice. Providing a unique identifier satisfies the uniqueness requirement of a primary key, and IDs can potentially encode other information inside them, such as time, which can be used for relative ordering. You’ll learn more about potential types of IDs in the next chapter when you explore data types.

What uniquely identifies an author?
You might think an email, but people sometimes change their email addresses. Supplying your own unique identifier works well again here. Perhaps it’s a numeric ID, or maybe it’s a username, but authors, as your design stands today, need extra information to differentiate them from other authors within your database.

Article summaries, like the other steps you’ve been through, are a little more interesting. For the various article summary tables, if you try to use the same primary key as articles, an ID, you’re going to run into trouble. If an ID alone makes an article unique in the articles table, then presumably it suffices for the index tables. That turns out to not be the case. An ID can still differentiate uniqueness, but you also want to query by at least the partition key to have a performant query, and if an ID is the partition key, that doesn’t satisfy the use cases for querying by author, date, or score. Because the partition key is contained within the primary key, you’ll need to include those fields in your primary key (figure 3.12). Taking article_summaries_by_author, your primary key for that table would become the author and your article ID. Similarly, the other two tables would have the date and the article ID for article_summaries_by_date, and article_summaries_by_score would use the score and the article ID for its primary key.

Figure 3.12 You sometimes need to add fields to an already-unique primary key to use them in a partition key, especially when denormalizing data.

With your primary keys figured out, you can move forward to determining your partition keys and potentially adjusting your primary keys.

PARTITION KEYS

You learned in the last chapter that the first entry in the primary key serves as the partition key for the row (you’ll see later how to build a composite primary key). A good partition key distributes data evenly across the cluster and is used by the queries against that table. It stores a relatively small amount of data (a good rule of thumb is that 100 MB is a large partition, but it depends on the use case). I’ve listed the tables and their primary keys; knowing the primary keys, you can now rearrange them to specify your partition keys. What would be a good partition key for each of these tables?

Table 3.3 Each table has a primary key from which the partition key is extracted

For the articles and authors tables, it’s very straightforward. There is one column in the primary key; therefore, the partition key is the only column. If only everything were that easy!

TIP: By only having the ID as the primary key and the partition key, you can look up rows only by their ID. A query where you want to get multiple rows wouldn’t work with this approach, because you’d be querying across partitions, bringing in extra nodes and hampering performance. Instead, you want to query by a partition key that would contain multiple rows, as you’ll do with article summaries.

The article summaries tables, however, present you with a choice. Given a primary key of an ID and an author for article_summaries_by_author, which should be the partition key? If you choose the ID, Scylla will distribute the rows for this table around the cluster based on the ID of the article. This distribution would mean that if you wanted to load all articles by their author, the query would have to hit nodes all across the cluster. That behavior is not efficient.
If you partitioned your data by the author, a query to find all articles written by a given author would hit only the nodes that store that partition, because you’re querying within that partition (figure 3.13). You almost always want to query by at least the partition key — it is critical for good performance. Because the use case for this table is to find articles by who wrote them, the author is the choice for the partition key. This distribution makes your primary key author_id, id, because the partition key is the first entry in the primary key.

Figure 3.13 The partition key for each table should match the queries you previously determined to achieve the best performance.

article_summaries_by_date and article_summaries_by_score, in what might be the least surprising statement of the book, present a similar choice as article_summaries_by_author, with a similar solution. Because you’re querying article_summaries_by_date by the date of the article, you want the data partitioned by it as well, making the primary key date, id — or score, id for article_summaries_by_score.

Tip: Right-sizing partition keys: If a partition key is too large, data will be unevenly distributed across the cluster. If a partition key is too small, range scans that load multiple rows might need several queries to load the desired amount of data. Consider querying for articles by date — if you publish one article a week but your partition key is per day, then querying for the most recent articles will require queries with no results six times out of seven. Partitioning your data is a balancing exercise. Sometimes, it’s easy: author is a natural partition key for articles by author, whereas something like date might require some finessing to get the best performance in your application. Another problem to consider is one partition taking an outsized amount of traffic — more than the node can handle. In the next chapters, as you refine your design, you’ll learn how to adjust your partition key to fit your database just right.

After looking closely at the tables, your primary keys are now ordered correctly to partition the data. For the article summaries tables, your primary keys are divided into two categories — partition key and not-partition key. The leftover bit has an official name and performs a specific function in your table. Let’s take a look at the not-partition keys — clustering keys.

CLUSTERING KEYS

A clustering key is the non-partition-key part of a table’s primary key that defines the ordering of the rows within the table. In the previous chapter, when you looked at your example table, you noticed that Scylla had filled in the table’s implicit configuration, ordering the non-partition-key columns — the clustering keys — in ascending order.

CREATE TABLE initial.food_reviews (
    restaurant text,
    ordered_at timestamp,
    food text,
    review text,
    PRIMARY KEY (restaurant, ordered_at, food)
) WITH CLUSTERING ORDER BY (ordered_at ASC, food ASC);

Within the partition, each row is sorted by the time it was ordered and then by the name of the food, each in an ascending sort (so earlier order times first, then alphabetically by food). Left to its own devices, ScyllaDB always defaults to the ascending natural order of the clustering keys. When creating tables, if you have clustering keys, you need to be intentional about specifying their order to make sure that you’re getting results that match your requirements. Consider article_summaries_by_author.
The purpose of that table is to retrieve the articles for a given author, but do you want to see their oldest articles first or their newest ones? By default, Scylla is going to sort the table by id ASC, giving you the oldest articles first. When creating your summary tables, you’ll want to specify their sort orders so that you get the newest articles first — specifying id DESC.

You now have tables defined, and with those tables, the primary keys and partition keys are set to specify uniqueness, distribute the data, and provide an ordering based on your queries. Your queries, however, are looking for more than just primary keys — authors have names, and articles have titles and content. Not only do you need to store these fields, you also need to specify the structure of that data. To accomplish this definition, you’ll use data types. In the next chapter, you’ll learn all about them and continue practicing query-first design.
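
To make the partition and clustering choices concrete, here is a minimal CQL sketch of how article_summaries_by_author might be declared; the keyspace, column names, and types are illustrative assumptions, not the book’s finished schema:

CREATE TABLE reviews.article_summaries_by_author (
    author_id uuid,   -- partition key: keeps an author's summaries together on the same nodes
    id timeuuid,      -- clustering key: a time-based ID also gives relative ordering
    title text,
    score int,
    PRIMARY KEY (author_id, id)
) WITH CLUSTERING ORDER BY (id DESC);  -- newest articles first

With that in place, a query such as SELECT * FROM article_summaries_by_author WHERE author_id = ? reads from a single partition and returns the newest summaries first.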

Maximizing Self-Managed Apache Cassandra with Kubernetes

Today’s infrastructure architects need to manage large-scale, enterprise Apache Cassandra® clusters to meet emerging AI data needs. To simplify that layer, we introduced DataStax Mission Control, an all-in-one platform for distributed database operations, observability, and deployment. As Mission...

Database Internals: Optimizing Memory Management

How databases can get better performance by optimizing memory management

The following blog post is an excerpt from Chapter 3 of the Database Performance at Scale book, which is available for free. This book sheds light on often overlooked factors that impact database performance at scale.

Get the complete book, free

Memory management is the central design point in all aspects of programming. Even comparing programming languages to one another always involves discussions about the way programmers are supposed to handle memory allocation and freeing. No wonder memory management design affects the performance of a database so much.

Applied to database engineering, memory management typically falls into two related but independent subsystems: memory allocation and cache control. The former is in fact a very generic software engineering issue, so considerations about it are not extremely specific to databases (though they are crucial and are worth studying). Opposite to that, the latter topic is itself very broad, affected by the usage details and corner cases. Respectively, in the database world, cache control has its own flavor.

Allocation

The manner in which programs or subsystems allocate and free memory lies at the core of memory management. There are several approaches worth considering. As illustrated by Figure 3-2, a so-called “log-structured allocation” is known from filesystems, where it puts sequential writes to a circular log on the persisting storage and handles updates the very same way. At some point, the filesystem must reclaim blocks that became obsolete entries in the log area to make more space available for future writes. In a naive implementation, unused entries are reclaimed by rereading and rewriting the log from scratch; obsolete blocks are then skipped in the process.

Figure 3-2: A log-structured allocation puts sequential writes to a circular log on the persisting storage and handles updates the very same way

A memory allocator for naive code can do something similar. In its simplest form, it would allocate the next block of memory by simply advancing a next-free pointer. Deallocation would just need to mark the allocated area as freed. One advantage of this approach is the speed of allocation. Another is the simplicity and efficiency of deallocation if it happens in FIFO order or affects the whole allocation space. Stack memory allocations are later released in the order that’s reverse to allocation, so this is the most prominent and the most efficient example of such an approach.

Using linear allocators as general-purpose allocators can be more problematic because of the difficulty of space reclamation. To reclaim space, it’s not enough to just mark entries as free. This leads to memory fragmentation, which in turn outweighs the advantages of linear allocation. So, as with the filesystem, the memory must be reclaimed so that it only contains allocated entries and the free space can be used again. Reclamation requires moving allocated entries around – a process that changes and invalidates their previously known addresses. In naive code, the locations of references to allocated entries (addresses stored as pointers) are unknown to the allocator. Existing references would have to be patched to make the allocator action transparent to the caller; that’s not feasible for a general-purpose allocator like malloc. Using a log-structured allocator is therefore tied to the programming language selection. Some languages, like C++, can greatly facilitate this by providing move constructors.
However, passing pointers to libraries that are outside of your control (e.g., glibc) would still be an issue.

Another alternative is adopting a strategy of pool allocators, which provide allocation spaces for allocation entries of a fixed size (see Figure 3-3). By limiting the allocation space that way, fragmentation can be reduced. A number of general-purpose allocators use pool allocators for small allocations. In some cases, those allocation spaces exist on a per-thread basis to eliminate the need for locking and improve CPU cache utilization.

Figure 3-3: Pool allocators provide allocation spaces for allocation entries of a fixed size. Fragmentation is reduced by limiting the allocation space

This pool allocation strategy provides two core benefits. First, it saves you from having to search for available memory space. Second, it alleviates memory fragmentation because it pre-allocates in memory a cache for use with a collection of object sizes. Here’s how it works to achieve that: The region for each of the sizes has fixed-size memory chunks that are suitable for the contained objects, and those chunks are all tracked by the allocator. When it’s time for the allocator to actually allocate memory for a certain type of data object, it’s typically possible to use a free slot (chunk) within one of the existing memory slabs. (Note: We are using the term “slab” to mean one or more contiguous memory pages that contain pre-allocated chunks of memory.) When it’s time for the allocator to free the object’s memory, it can simply move that slot over to the containing slab’s list of unused/free memory slots. That memory slot (or some other free slot) will be removed from the list of free slots whenever there’s a call to create an object of the same type (or a call to allocate memory of the same size).

The best allocation approach to pick heavily depends upon the usage scenario. One great benefit of a log-structured approach is that it handles fragmentation of small sub-pools in a more efficient way. Pool allocators, on the other hand, generate less background load on the CPU because of the lack of compacting activity.

Cache control

When it comes to memory management in a software application that stores lots of data on disk, you cannot overlook the topic of cache control. Caching is always a must in data processing, and it’s crucial to decide what and where to cache.

If caching is done at the I/O level, for both read/write and mmap, caching can become the responsibility of the kernel. The majority of the system’s memory is given over to the page cache. The kernel decides which pages should be evicted when memory runs low, when pages need to be written back to disk, and controls read-ahead. The application can provide some guidance to the kernel using the madvise(2) and fadvise(2) system calls.

The main advantage of letting the kernel control caching is that great effort has been invested by the kernel developers over many decades into tuning the algorithms used by the cache. Those algorithms are used by thousands of different applications and are generally effective. The disadvantage, however, is that these algorithms are general-purpose and not tuned to the application. The kernel must guess how the application will behave next. Even if the application knows differently, it usually has no way to help the kernel guess correctly. This results in the wrong pages being evicted, I/O scheduled in the wrong order, or read-ahead scheduled for data that will not be consumed in the near future.
Next, doing the caching at the I/O level interacts with the topic often referred to as IMR – in-memory representation. No wonder that the format in which data is stored on disk differs from the form in which the same data is allocated in memory as objects. The simplest reason why it’s not the same is byte ordering. With that in mind, if the data is cached once it’s read from the disk, it needs to be further converted or parsed into the object used in memory. This can be a waste of CPU cycles, so applications may choose to cache at the object level.

Choosing to cache at the object level affects a lot of other design points. With that, the cache management is all on the application side, including cross-core synchronization, data coherence, invalidation, etc. Next, since objects can be (and typically are) much smaller than the average I/O size, caching millions and billions of those objects requires a collection selection that can handle it (we’ll get to this in a follow-up blog). Finally, caching on the object level greatly affects the way I/O is done.

Read about ScyllaDB’s Caching

NoSQL Data Modeling: Application Design Before Schema Design

Learn how to implement query-first design to build a ScyllaDB schema for a sample application – in this excerpt from the book “ScyllaDB in Action.”

The following is an excerpt from Chapter 3; it’s reprinted here with permission of the publisher.

***

When designing a database schema, you need to create a schema that is synergistic with your database. If you consider Scylla’s goals as a database, it wants to distribute data across the cluster to provide better scalability and fault tolerance. Spreading the data means queries involve multiple nodes, so you want to design your application to make queries that use the partition key, minimizing the nodes involved in a request. Your schema needs to fit these constraints:

  • It needs to distribute data across the cluster
  • It should also query the minimal number of nodes for your desired consistency level

There’s some tension in these constraints — your schema wants to spread data across the cluster to balance load between the nodes, but you want to minimize the number of nodes required to serve a query. Satisfying these constraints can be a balancing act — do you have smaller partitions that might require more queries to aggregate together, or do you have larger partitions that require fewer queries but spread the data potentially unevenly across the cluster?

In figure 3.1, you can see the cost of a query that utilizes the partition key and queries across the minimal number of nodes versus one that doesn’t use the partition key, necessitating scanning each node for matching data. Using the partition key in your query allows the coordinator — the node servicing the request — to direct queries to nodes that own that partition, lessening the load on the cluster and returning results faster.

Figure 3.1 Using the partition key minimizes the number of nodes required to serve the request.

The aforementioned design constraints are each related to queries. You want your data to be spread across the cluster so that your queries distribute the load amongst multiple nodes. Imagine if all of your data was clustered on a small subset of nodes in your cluster. Some nodes would be working quite hard, whereas others might not be taking much traffic. If some of those heavily utilized nodes became overwhelmed, you could suffer quite degraded performance because, due to the imbalance, many queries could be unable to complete. However, you also want to minimize the number of nodes hit per query to minimize the work your query needs to do; a query that uses all nodes in a very large cluster would be very inefficient.

These query-centric constraints necessitate a query-centric approach to design. How you query Scylla is a key component of its performance, and since you need to consider the impacts of your queries across multiple dimensions, it’s critical to think carefully about how you query Scylla.
When designing schemas in Scylla, it’s best to practice an approach called query-first design, where you focus on the queries your application needs to make and then build your database schema around that. In Scylla, you structure your data based on how you want to query it — query-first design helps you do exactly that.

Your query-first design toolbox

In query-first design, you take a set of application requirements and ask yourself a series of questions that guide you through translating the requirements into a schema in ScyllaDB. Each of these questions builds upon the next, iteratively guiding you through building an effective ScyllaDB schema. These questions include the following:

  • What are the requirements for my application?
  • What are the queries my application needs to run to meet these requirements?
  • What tables are these queries querying?
  • How can these tables be uniquely identified and partitioned to satisfy the queries?
  • What data goes inside these tables?
  • Does this design match the requirements?
  • Can this schema be improved?

This process is visualized in figure 3.2, showing how you start with your application requirements and ask yourself a series of questions, guiding you from your requirements to the queries you need to run and, ultimately, to a fully designed ScyllaDB schema ready to store data effectively.

Figure 3.2 Query-first design guides you through taking application requirements and converting them to a ScyllaDB schema.

You begin with requirements and then use the requirements to identify the queries you need to run. These queries are seeking something — those “somethings” need to be stored in tables. Your tables need to be partitioned to spread data across the cluster, so you determine that partitioning to stay within your requirements and database design constraints. You then specify the fields inside each table, filling it out. At this point, you can check two things: Does the design match the requirements? Can it be improved?

This up-front design is important because in ScyllaDB changing your schema to match new use cases can be a high-friction operation. While Scylla supports new query patterns via some of its features (which you’ll learn about in chapter 7), these come at an additional performance cost, and if they don’t fit your needs, they might necessitate a full manual copy of data into a new table. It’s important to think carefully about your design: not only what it needs to be, but also what it could be in the future. You start by extracting the queries from your requirements and expanding your design until you have a schema that fits both your application and ScyllaDB.

To practice query-first design in Scylla, let’s take the restaurant review application introduced at the beginning of the chapter and turn it into a ScyllaDB schema.

The sample application requirements

In the last chapter, you took your restaurant reviews and stored them inside ScyllaDB. You enjoyed working with the database, and as you went to more places, you realized you could combine your two great loves — restaurant reviews and databases (if these aren’t your two great loves, play along with me). You decide to build a website to share your restaurant reviews. Because you already have a ScyllaDB cluster, you choose to use that (this is a very different book if you pick otherwise) as the storage for your website.

The first step to query-first design is identifying the requirements for your application, as seen in figure 3.4.

Figure 3.4 You begin query-first design by determining your application’s requirements.
After ruminating on your potential website, you identify the features it needs, and most importantly, you give it a name — Restaurant Reviews. It does what it says! Restaurant Reviews has the following initial requirements:

  • Authors post articles to the website
  • Users view articles to read restaurant reviews
  • Articles contain a title, an author, a score, a date, a gallery of images, the review text, and the restaurant
  • A review’s score is between 1 and 10
  • The home page contains a summary of articles sorted by most recent, showing the title, author name, score, and one image
  • The home page links to articles
  • Authors have a dedicated page containing summaries of their articles
  • Authors have a name, bio, and picture
  • Users can view a list of article summaries sorted by the highest review score

You have a hunch that as time goes by, you’ll want to add more features to this application and use those features to learn more about ScyllaDB. For now, these features give you a base to practice decomposing requirements and building your schema — let’s begin!

Determining the queries

Next, you ask “what are the queries my application needs to run to meet these requirements?”, as seen in figure 3.5. Your queries drive how you design your database schema; therefore, it is critical to understand your queries at the beginning of your design.

Figure 3.5 Next, you take your requirements and use them to determine your application’s queries.

For identifying queries, you can use the familiar CRUD operations — create, read, update, and delete — as verbs. These queries will act on nouns in your requirements, such as authors or articles. Occasionally, you’ll want to filter your queries — you can notate this filtering with “by” followed by the filter condition. For example, if your app needed to load events on a given date, you might use a Read Events by Date query. If you take a look at your requirements, you’ll see several queries you’ll need.

TIP: These aren’t the actual queries you’ll run; those are written in CQL, as seen in chapter 2, and look more like SELECT * FROM your_cool_table WHERE your_awesome_primary_key = 12;. These are descriptions of what you’ll need to query — in a later chapter, when you finish your design, you’ll turn these into actual CQL queries.

The first requirement is “Authors post articles to the website,” which sounds an awful lot like a process that would involve inserting an article into a database. Because you insert articles into the database via a query, you will need a Create Article statement. You might be asking at this point — what is an article? Although other requirements discuss these fields, you should skip that concern for the moment. Focus first on what queries you need to run, and then you’ll later figure out the needed fields.

The second requirement is “Users view articles to read restaurant reviews.” Giving users unfettered access to the database is a security no-go, so the app needs to load an article to display to a user. This functionality suggests a Read Article query (which is different from the user perusing the article), which you can use to retrieve an article for a user.

The following two requirements refer to the data you need to store and not a novel way to access it:

  • Articles contain a title, an author, a score, a date, a gallery of images, the review text, and the restaurant
  • A review’s score is between 1 and 10

Articles need certain fields, and each article is associated with a review score that fits within specified parameters.
You can save these requirements for later, when you fill out what fields are needed in each table.

The next relevant requirement says “The home page contains a summary of articles sorted by most recent, showing the title, author name, score, and one image.” The home page shows article summaries, which you’ll need to load by their date, sorted by most recent – Read Article Summaries by Date. Article summaries, at first glance, look a lot like articles. Because you’re querying different data, and you also need to retrieve summaries by their time, you should consider them as different queries.

Querying for an article:

  • loads a title, author, score, date, a gallery of images, the review text, and the restaurant
  • retrieves one specific article

On the other hand, loading the most recent article summaries:

  • loads only the title, author name, score, and one image
  • loads several summaries, sorted by their publishing date

Perhaps they can run against the same table, but you can see whether that’s the case further on in the exercise. When in doubt, it’s best not to over-consolidate. Be expansive in your designs, and if there’s duplication, you can reduce it when refining your designs. As you work through implementing your design, you might discover reasons why what seems to be unnecessary duplication in one step turns out to be a necessary separation later.

Following these requirements is “the home page links to articles,” which makes explicit that the article summaries link to the article — you’ll look closer at this one when you determine what fields you need.

The next two requirements are about authors. The website will contain a page for each author — presumably only you at the moment, but you have visions of a media empire. This author page will contain article summaries for each author — meaning you’ll need to Read Article Summaries by Author. In the last requirement, there’s data that each author has. You can study the specifics of these in a moment, but it means that you’ll need to read information about the author, so you will need a Read Author query.

For the last requirement — ”Users can view a list of article summaries sorted by the highest review score” — you’ll need a way to surface article summaries sorted by their scores. This approach requires a Read Article Summaries by Score query.

TIP: What would make a good partition key for reading articles sorted by score? It’s a tricky problem; you’ll learn how to attack it shortly.

Having analyzed the requirements, you’ve determined six queries your schema needs to support:

  • Create Article
  • Read Article
  • Read Article Summaries by Date
  • Read Article Summaries by Author
  • Read Article Summaries by Score
  • Read Author

You might notice a problem here — where do article summaries and authors get created? How can you read them if nothing makes them exist? Requirement lists often have implicit requirements — because you need to read article summaries, they have to be created somehow. Go ahead and add a Create Article Summary and Create Author query to your list. You now have eight queries from the requirements you created, listed in table 3.1.

There’s a joke that asks “How do you draw an owl?” — you draw a couple of circles, and then you draw the rest of the owl. Query-first design sometimes feels similar. You need to map your queries to a database schema that is not only effective in meeting the application’s requirements but is performant and uses ScyllaDB’s benefits and features.
Drawing queries out of requirements is often straightforward, whereas designing the schema from those queries requires balancing both application and database concerns. Let’s take a look at some techniques you can apply as you design a database schema.
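
As a preview of where this exercise is heading, here is a minimal sketch of what one of these query descriptions might eventually become in CQL once the schema is designed; the keyspace, table, and column names are illustrative assumptions rather than the book’s final design:

-- "Read Article Summaries by Date": one partition per date, newest summaries first
SELECT title, author_name, score, image
FROM reviews.article_summaries_by_date
WHERE date = ?;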

New cassandra_latest.yaml configuration for a top performant Apache Cassandra®

Welcome to our deep dive into the latest advancements in Apache Cassandra® 5.0, specifically focusing on the cassandra_latest.yaml configuration that is available for new Cassandra 5.0 clusters.  

This blog post will walk you through the motivation behind these changes, how to use the new configuration, and the benefits it brings to your Cassandra clusters. 

Motivation 

The primary motivation for introducing cassandra_latest.yaml is to bridge the gap between maintaining backward compatibility and leveraging the latest features and performance improvements. The file addresses the varying needs of the following groups for new Cassandra 5.0 clusters: 

  1. Cassandra Developers: who want to push new features but face challenges due to backward compatibility constraints. 
  2. Operators: who prefer stability and minimal disruption during upgrades. 
  3. Evangelists and New Users: who seek the latest features and performance enhancements without worrying about compatibility. 

Using cassandra_latest.yaml 

Using cassandra_latest.yaml is straightforward. It involves copying the cassandra_latest.yaml content to your cassandra.yaml or pointing the cassandra.config JVM property to the cassandra_latest.yaml file.  
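
For example, the JVM property approach might look like the following (the file path here is just an illustration; point it at wherever your cassandra_latest.yaml actually lives):

-Dcassandra.config=file:///path/to/cassandra_latest.yaml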

This configuration is designed for new Cassandra 5.0 clusters (or those evaluating Cassandra), ensuring they get the most out of Cassandra 5.0’s latest features and performance improvements. 

Key changes and features 

Key Cache Size 

  • Old: Evaluated as the minimum of 5% of the heap and 100MB
  • Latest: Explicitly set to 0

Impact: Setting the key cache size to 0 in the latest configuration avoids performance degradation with the new SSTable format. This change is particularly beneficial for clusters using the new SSTable format, which doesn’t require key caching in the same way as the old format. Key caching was used to reduce the time it takes to find a specific key in Cassandra storage. 
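
In the YAML itself this amounts to a single line (a sketch; check the shipped file for the exact form of the parameter):

key_cache_size: 0MiB   # 0 disables the key cache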

Commit Log Disk Access Mode 

  • Old: Set to legacy
  • Latest: Set to auto

Impact: The auto setting optimizes the commit log disk access mode based on the available disks, potentially improving write performance. It can automatically choose the best mode (e.g., direct I/O) depending on the hardware and workload, leading to better performance without manual tuning.
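
The corresponding YAML line is (sketch):

commitlog_disk_access_mode: auto   # let Cassandra choose direct I/O, mmap, etc. based on the hardware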

Memtable Implementation 

  • Old: Skiplist-based
  • Latest: Trie-based

Impact: The trie-based memtable implementation reduces garbage collection overhead and improves throughput by moving more metadata off-heap. This change can lead to more efficient memory usage and higher write performance, especially under heavy load.

create table … with memtable = {'class': 'TrieMemtable', … }
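
For reference, the memtable section of cassandra_latest.yaml declares named implementations roughly like this (a sketch; verify against the file shipped with your version):

memtable:
  configurations:
    skiplist:
      class_name: SkipListMemtable
    trie:
      class_name: TrieMemtable
    default:
      inherits: trie    # new tables default to the trie-based memtable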

Memtable Allocation Type 

  • Old: Heap buffers 
  • Latest: Off-heap objects 

Impact: Using off-heap objects for memtable allocation reduces the pressure on the Java heap, which can improve garbage collection performance and overall system stability. This is particularly beneficial for large datasets and high-throughput environments. 
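
In YAML form (sketch):

memtable_allocation_type: offheap_objects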

Trickle Fsync 

  • Old: False 
  • Latest: True 

Impact: Enabling trickle fsync improves performance on SSDs by periodically flushing dirty buffers to disk, which helps avoid sudden large I/O operations that can impact read latencies. This setting is particularly useful for maintaining consistent performance in write-heavy workloads. 
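
Sketched in YAML (the interval shown is the long-standing default and is only an assumption here, not a value cassandra_latest.yaml necessarily changes):

trickle_fsync: true
trickle_fsync_interval: 10240KiB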

SSTable Format 

  • Old: big 
  • Latest: bti (trie-indexed structure) 

Impact: The new BTI format is designed to improve read and write performance by using a trie-based indexing structure. This can lead to faster data access and more efficient storage management, especially for large datasets. 

sstable:
  selected_format: bti
  default_compression: zstd
  compression:
    zstd:
      enabled: true
      chunk_length: 16KiB
      max_compressed_length: 16KiB

Default Compaction Strategy 

  • Old: STCS (Size-Tiered Compaction Strategy) 
  • Latest: Unified Compaction Strategy 

Impact: The Unified Compaction Strategy (UCS) is more efficient and can handle a wider variety of workloads compared to STCS. UCS can reduce write amplification and improve read performance by better managing the distribution of data across SSTables. 

default_compaction:
  class_name: UnifiedCompactionStrategy
  parameters:
    scaling_parameters: T4
    max_sstables_to_compact: 64
    target_sstable_size: 1GiB
    sstable_growth: 0.3333333333333333
    min_sstable_size: 100MiB

Concurrent Compactors 

  • Old: Defaults to the smaller of the number of disks and cores
  • Latest: Explicitly set to 8

Impact: Setting the number of concurrent compactors to 8 ensures that multiple compaction operations can run simultaneously, helping to maintain read performance during heavy write operations. This is particularly beneficial for SSD-backed storage where parallel I/O operations are more efficient. 
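
The corresponding setting (sketch):

concurrent_compactors: 8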

Default Secondary Index 

  • Old: legacy_local_table
  • Latest: sai

Impact: SAI is a new index implementation that builds on the advancements made with SSTable-Attached Secondary Indexes (SASI). It provides a solution that enables users to index multiple columns on the same table without suffering scaling problems, especially at write time. 
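
With sai as the default, a plain CREATE INDEX statement builds a Storage Attached Index; it can also be requested explicitly. A minimal sketch, with an illustrative keyspace, table, and column:

CREATE INDEX ON reviews.articles (author) USING 'sai';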

Stream Entire SSTables 

  • Old: Implicitly set to True
  • Latest: Explicitly set to True

Impact: When enabled, it permits Cassandra to zero-copy stream entire eligible SSTables between nodes, including every component. This speeds up the network transfer significantly, subject to the throttling specified by

entire_sstable_stream_throughput_outbound

and

entire_sstable_inter_dc_stream_throughput_outbound

for inter-DC transfers. 
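
Sketched in YAML (the throughput values shown are assumed defaults; tune them for your network):

stream_entire_sstables: true
entire_sstable_stream_throughput_outbound: 24MiB/s
entire_sstable_inter_dc_stream_throughput_outbound: 24MiB/s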

UUID SSTable Identifiers 

  • Old: False
  • Latest: True

Impact: Enabling UUID-based SSTable identifiers ensures that each SSTable has a unique name, simplifying backup and restore operations. This change reduces the risk of name collisions and makes it easier to manage SSTables in distributed environments. 
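
In YAML form (sketch):

uuid_sstable_identifiers_enabled: true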

Storage Compatibility Mode 

  • Old: Cassandra 4
  • Latest: None

Impact: Setting the storage compatibility mode to none enables all new features by default, allowing users to take full advantage of the latest improvements, such as the new sstable format, in Cassandra. This setting is ideal for new clusters or those that do not need to maintain backward compatibility with older versions. 
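
The corresponding line (sketch):

storage_compatibility_mode: NONE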

Testing and validation 

The cassandra_latest.yaml configuration has undergone rigorous testing to ensure it works seamlessly. Currently, the Cassandra project CI pipeline tests both the standard (cassandra.yaml) and latest (cassandra_latest.yaml) configurations, ensuring compatibility and performance. This includes unit tests, distributed tests, and DTests. 

Future improvements 

Future improvements may include enforcing password strength policies and other security enhancements. The community is encouraged to suggest features that could be enabled by default in cassandra_latest.yaml. 

Conclusion 

The cassandra_latest.yaml configuration for new Cassandra 5.0 clusters is a significant step forward in making Cassandra more performant and feature-rich while maintaining the stability and reliability that users expect. Whether you are a developer, an operations professional, or an evangelist/end user, cassandra_latest.yaml offers something valuable for everyone. 

Try it out 

Ready to experience the incredible power of the cassandra_latest.yaml configuration on Apache Cassandra 5.0? Spin up your first cluster with a free trial on the Instaclustr Managed Platform and get started today with Cassandra 5.0!


P99 CONF 24 Recap: Heckling on the Shoulders of Giants

As I sit here at the hotel breakfast bar contemplating what a remarkable couple of days it’s been for the fourth annual P99 CONF, I feel quite honored to have helped host it. While the coffee is strong and the presentations are fresh in my mind, let’s recap some of the great content we shared and reveal some of the behind-the-scenes efforts that made it all happen.

Watch P99 CONF Talks On Demand

Day 1

While we warmed up the live hosting stage, we had a fright with Error 1016 origin DNS errors on our event platform. As we scrambled to potentially host live on an alternative platform, Cloudflare saved the day and the show was back on the road. DNS issues weren’t going to stop us from launching P99 CONF!

Felipe Cardeneti Mendes got things started in the lounge with hundreds of people asking great questions about ScyllaDB pre-show. Co-founder of ScyllaDB, Dor Laor, opened the show with his keynote about ScyllaDB tablets. In the first few slides we were looking at assembly, then 128 fully utilized CPU cores not long after that. By the end of the presentation, we had throughput north of 1M ops/sec – complete with ScyllaDB’s now famous predictable low latency.

To help set the scene of P99 CONF, we heard from Pekka Enberg, CTO of Turso (sorry, I’m the one who overlooked the company name mistake in the original video). Pekka dove into the patterns of low latency. This generated more great conversation in chat. If you want all the details, then his book, simply titled Latency, is a must-read.

Since parallel programming is hard, we opened up 3 stages for you to choose from following the keynotes. Felipe returned, this time as a session speaker. Proving that not all benchmarks need to be institutionalized cheating, he paired with Alan “Dormando” of Memcached to see how ScyllaDB stacks up from a caching perspective. We also heard from Luc Lenôtre, a talented engineer, who toyed with a kernel written in Rust. Luc showed us lots of flame graphs and low-level tuning of Maestro. Continuing with the Rust theme was Amos Wenger in a very interesting look at making HTTP faster with io_uring.

There were other great talks from well-known companies. For example, Jason Rahman from Microsoft shared his insights on tracing Linux scheduler behavior using ftrace. Also, Christopher Peck from Uber shared their experience tuning with generational ZGC. This reflects much of the P99 CONF content – real-world, production experience taming P99 latencies at scale.

Another expanding theme at this year’s P99 CONF was eBPF. And who better than Liz Rice from Isovalent to kick it off with her keynote, Zero-overhead Container Networking with eBPF and Netkit. I love listening to Liz explain in technical detail the concepts and benefits of using eBPF and will definitely be reading through her book, Learning eBPF, on the long flight home to Australia.

By the way, books! There are now so many authors associated with P99 CONF which, I think, is a testament to the quality and professionalism of the speakers. We were giving away book bundles to members of the community who were top contributors in the chat, answering and asking great questions (huge thank you!). Some of the books on offer – which you can grab for yourself – are:

  • Database Performance at Scale – by Felipe Mendes (ScyllaDB) et al.
  • Latency – by Pekka Enberg (Turso)
  • Think Distributed Systems – by Dominik Tornow (Resonate HQ)
  • Writing for Developers: Blogs that Get Read – by Piotr Sarna (poolside)
  • ScyllaDB in Action – by Bo Ingram (Discord)

And if you’re truly a tech bookworm, see this blog post for an extensive reading list: 14 Books by P99 CONF Speakers: Latency, Wasm, Databases & More.

By mid-afternoon on day 1, Gunnar Morling from Decodable lightened things up with his keynote on creating the 1 billion row challenge. I’m sure you’ve heard of it, and we had another speaker, Shraddha Agrawal, following up with her version in Golang.

We enjoyed lots more great content in the afternoon, including Piotr Sarna (from poolside AI and co-author of Database Performance at Scale + Writing for Developers) taking us back to the long-standing database theme of the conference with performance perspectives on database drivers. Speaking of themes, Wasm returned with book authors Brian Sletten and Ramnivas Laddad looking at WebAssembly on the edge. And the two Adams from Zoo gave us unique insight into building a remote CAD solution that feels local.

And showing that we can finish day 1 just as strong as we started it, Carl Lerche, creator of tokio-rs, returned to P99 CONF for the Day 1 closing keynote. This year, he highlighted how Rust – which is typically used at the infrastructure level for all the features we love, like safety, concurrency, and performance – is also applicable at higher levels in the stack. He also announced the first version of Toasty, an ORM for Rust.

Day 2

The second day kicked off with Andy Pavlo from CMU and his take on the tension between the database and operating system, with a unique research project, Tigger, a database proxy that pushes a database into kernel space using eBPF. Leading on from that, we had Bryan Cantrill, CTO of Oxide and creator of DTrace, reviewing DTrace’s 21-year history with plenty of insights into the origins and evolution of this framework. Bryan has presented at every P99 CONF and is one of the many industry giants whose shoulders you can stand on – or heckle in chat. We love the dynamism that Bryan brings each year to P99 CONF.

More great talks followed with Cameron from Shopify, Cary from Oracle, Vivek from LinkedIn and Richard from Datadog. We were also honored to host the creator of Postgres and Turing Award winner Michael Stonebraker. I also enjoyed Tanel Poder’s take on using eBPF off-CPU sampling to get to the bottom of database wait time. There was also more database content from Lukasz at ScyllaDB with advanced compression techniques.

Avi Kivity led the afternoon’s keynote with his fascinating asymptotic performance journey through optimization and tradeoffs he’s made as CTO of ScyllaDB. Also on ScyllaDB, the ShareChat team described the evolution of their feature store. First they made it fast, as described at P99 CONF 23. Now, they’re making it cheaper to run while keeping that impressive performance.

Another blast of technical content in the afternoon with Benjamin and Tyler from American Express building card payment systems with ultra-low latency. A P99 favorite, Aleksei from TigerBeetle looked at just-in-time compaction with incredible detail. Dominik from Resonate took us through distributed async await patterns in the cloud. Peter from Percona dove into Redis alternatives. Ash from Unum Cloud gave us insights into AI-powered real-time searches taking advantage of SIMD and GPU accelerated implementations.
And Cristian from Uber was all about improving p99 latency in third-party APIs. Jose Fernandez from Netflix capped off the second day with his keynote on noisy neighbor detection with eBPF, which was a strong finish to another fantastic conference.

What you didn’t see on stage, but happened in the recently renamed “Champagne Room,” was a mini happy birthday celebration for Cynthia – complete with a near-miss disaster champagne opening (no electrical equipment was harmed). Conference chat was lively right until the end.

As I finish my second cup of coffee, I think that this is really what makes P99 CONF so special. Not only the ability to watch people from around the world share their real-life P99-related experiences, but having the chance to chat with them too. I look forward to hosting this again next year.

Watch P99 CONF Talks On Demand

5X to 40X Lower DynamoDB Costs – with Better P99 Latency

At our recent events, I’ve been fielding a lot of questions about DynamoDB costs. So, I wanted to highlight the cost comparison aspect of a recent benchmark comparing ScyllaDB and DynamoDB. This involved a detailed price-performance comparison analyzing:

  • How cost compares across both DynamoDB pricing models under various workload conditions
  • How latency compares across this set of workloads

I’ll share details below, but here’s a quick summary: ScyllaDB costs are significantly lower in all but one scenario. In realistic workloads, costs would be 5X to 40X lower — with up to 4X better P99 latency. Here’s a consolidated look at how DynamoDB and ScyllaDB compare on a Uniform distribution (DynamoDB’s sweet spot).

Now, more details on the cost aspect of this comparison. For a deeper dive into the performance aspect, see this DynamoDB vs ScyllaDB price-performance comparison blog as well as the complete benchmark report.

How We Compared DynamoDB Costs vs ScyllaDB Costs

For our cost comparisons, we launched a small 3-node cluster in ScyllaDB Cloud and measured performance on a wide range of workload scenarios. Next, we calculated the cost of running the same workloads on DynamoDB. We used an item size of 1081 bytes, which translates to 2 WCUs per write operation and 1 RCU per read operation on DynamoDB. Our working data set size was 1 TB, with an approximate cost of ~$250/month in DynamoDB. We used the same ScyllaDB cluster through every testing scenario, thus simplifying ScyllaDB Cloud costs.

Hourly rates (on-demand) were used. As ScyllaDB linearly scales with the amount of resources, you can predictably adjust costs to match your desired performance outcome. Annual pricing provides significant cost reduction but is out of the scope of this benchmark.

DynamoDB has two modes for non-annual pricing: provisioned and on-demand pricing. Provisioned mode is recommended if your workloads are reasonably predictable. On-demand pricing is significantly more expensive and is a fit for unpredictable, low-throughput workloads. It is possible to combine modes, add auto-scaling, and so forth. DynamoDB provides considerable flexibility around managing the cost and scale of the aforementioned options, but this also results in considerable complexity. For details on how we calculated costs, refer to the Cost Calculations section at the end of this article.

Throughout all tests, we ensured ScyllaDB had spare capacity at all times by keeping its load below 75%. Given that, note that it is possible to achieve higher traffic than the numbers reported here at no additional cost, in turn allowing for additional growth. The number of operations per second that the ScyllaDB cluster performs for each workload is reported under the X axis in the following graphs.

Provisioned Cost Comparison: DynamoDB vs ScyllaDB

Provisioned mode is recommended if your workloads are reasonably predictable. With DynamoDB, you need to be able to predict per-table capacity following the AWS DynamoDB read/write capacity unit pricing model. With just one exception, DynamoDB’s cost estimates were consistently higher than ScyllaDB – and much more so for the most write-heavy workloads. In the 1 out of 15 cases where DynamoDB turned out to be less expensive, ScyllaDB could actually drive more utilization to win over DynamoDB. However, we wanted to keep the results consistent and fair.
This is not surprising, given that DynamoDB charges 5X more for writes than for reads, while ScyllaDB does not differentiate between operations, and its pricing is based on the actual cluster size.

On-Demand Cost Comparison: DynamoDB vs ScyllaDB

On-demand pricing is best when the application’s workload is unclear, the application’s data traffic patterns are unknown, and/or your company prefers a pay-as-you-go option. However, as the results show, the convenience and flexibility of DynamoDB’s on-demand pricing often come at quite a cost. To see how we calculated costs, refer to the Cost Calculations section at the end of this article.

Here, the same general trends hold true. ScyllaDB cost is fixed across the board, and its cost advantage grows as write throughput increases. ScyllaDB’s cost advantage over on-demand DynamoDB tables is significantly greater when compared to provisioned capacity on DynamoDB. Why? Because DynamoDB’s on-demand pricing is significantly higher than its provisioned capacity counterpart. Therefore, workloads with unpredictable traffic spikes (which would justify not using provisioned capacity) may easily end up with runaway bills compared to costs with ScyllaDB Cloud.

Making ScyllaDB Even More Cost Effective

Unlike DynamoDB (where you provision tables), ScyllaDB is provisioned as a cluster, capable of hosting several tables – and therefore consolidating several workloads under a single deployment. Excess hardware capacity may be shared to power more use cases on that cluster. ScyllaDB Cloud and Enterprise users can also use ScyllaDB’s Workload Prioritization to prioritize specific access patterns and further drive consolidation.

For example, assume there are 10 use cases that require 100K OPS each. With DynamoDB, users would be forced to allocate a provisioned workload per table or to use the rather expensive on-demand mode. This introduces a set of caveats:

  • If every workload consistently reaches its peak capacity, it will likely get throttled by AWS (provisioned mode), or result in runaway bills (on-demand mode).
  • Likewise, the opposite also holds true: If most workloads are consistently idle, provisioned mode results in non-consumed capacity bills.
  • On-demand mode doesn’t guarantee immediate capacity to support traffic surges. This, in turn, causes your applications to experience some degree of throttling.

A standard ScyllaDB deployment is not only more cost effective, but also simplifies management. It allows users to consolidate all workloads within a single cluster and share idle capacity among them. With ScyllaDB Cloud and Enterprise, users further have the flexibility to define priorities on a per-workload basis, allowing the database to make more informed decisions when two or more workloads compete against each other for resources.

Cost Calculation Details

Here’s how we calculated costs across the different databases and pricing models.

DynamoDB Cost Calculations

Provisioned Costs for DynamoDB

With DynamoDB’s provisioned capacity mode, planning is required. You specify the number of reads and writes per second that you expect your application to require. You can make use of auto-scaling to automatically adjust your table’s capacity based on a specific utilization target in order to sustain spikes outside of your projected planning. In provisioned mode, you need to provision DynamoDB with the expected throughput.
You set WCUs (Write Capacity Units) and RCUs (Read Capacity Units), which signify the allowed number of write and read operations per second, respectively. They are priced per hour:

  • One WCU is $0.00065 per hour
  • One RCU is $0.00013 per hour

This yields the following formula for calculating monthly costs (assuming roughly 730 hours in a month):

monthly cost ≈ (provisioned WCUs × $0.00065 + provisioned RCUs × $0.00013) × 730

On-Demand Costs for DynamoDB

With DynamoDB’s on-demand mode, no planning is required. You pay for the actual reads/writes that your application is using (the total number of actual writes or reads, not writes or reads per second). In this mode, you pay by usage and the cost is per request unit (rather than per capacity unit, as in the provisioned mode). AWS charges $1.25 per million write request units (WRU) and $0.25 per million read request units (RRU). Therefore, a single request unit costs one millionth of that:

  • One WRU is $0.00000125 per write
  • One RRU is $0.00000025 per read

This yields the following formula for calculating monthly costs:

monthly cost ≈ monthly WRUs consumed × $0.00000125 + monthly RRUs consumed × $0.00000025

ScyllaDB Cost Calculations

As stated previously, we used ScyllaDB’s on-demand pricing for all cost comparisons in this study. ScyllaDB’s on-demand costs were determined using our online pricing calculator.

From the ScyllaDB Pricing Calculator

This calculator estimates the size and cost of a ScyllaDB Cloud cluster based on the specified technical requirements around throughput and item/data size. Note that in ScyllaDB, the primary aspect driving costs is the cluster size, unlike DynamoDB’s model, which is based on the volume of reads and writes. Once the cluster size is determined, the deployment can often exceed throughput requirements. For comparison, DynamoDB’s provisioned pricing structure requires users to explicitly specify sustained and peak throughput. Overprovisioning equivalent performance in DynamoDB would be significantly pricier compared to ScyllaDB. Without an annual commitment for cost savings, the estimated annual cost for the ScyllaDB Cloud cluster is $54,528, calculated at a monthly rate of $4,544.

Conclusion

As the results indicate, what might begin at a seemingly reasonable cost can quickly escalate into “bill shock” with DynamoDB – especially at high throughputs, and particularly with write-heavy workloads. This makes DynamoDB a suboptimal choice for data-intensive applications anticipating steady or rapid growth.

ScyllaDB’s significantly lower costs – a reflection of ScyllaDB taking full advantage of modern infrastructure for high throughput and low latency – make it a more cost-effective solution for data-intensive applications. ScyllaDB – with its LSM-tree-based storage, unified caching, shard-per-core design, and advanced schedulers – allows you to maximize the advantages of modern hardware, from huge CPU chips to blazing-fast NVMe. Beyond the presented cost savings, ScyllaDB sustains 2X peaks and provides 2X-4X better P99 latency. Additionally, it can further reduce latency when idle – or enable spare resources to be shared across multiple tables. For larger workloads spanning 500K-1M OPS and beyond, this can result in a cost saving in the millions – with better performance and fewer query limitations.