ScyllaDB in Action Book Excerpt: ScyllaDB, a Distributed Database

How does ScyllaDB provide scalability and fault tolerance by distributing its data across multiple nodes? Read what Bo Ingram (Staff Engineer at Discord) has to say in this excerpt from the book “ScyllaDB in Action.”

Editor’s note: We’re honored to share the following excerpt from Bo Ingram’s informative – and fun! – new book on ScyllaDB: ScyllaDB in Action. You might have already experienced Bo’s expertise and engaging communication style in his blog How Discord Stores Trillions of Messages or his ScyllaDB Summit talks How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB and So You’ve Lost Quorum: Lessons From Accidental Downtime. If not, you should 😉

You can purchase the full 370-page book from Manning.com. You can also access a 122-page early-release digital copy for free, compliments of ScyllaDB. The book excerpt includes a discount code for 45% off the complete book.

The following is an excerpt from Chapter 1; it’s reprinted here with permission of the publisher.

***

ScyllaDB runs multiple nodes, making it a distributed system. By spreading data across its deployment, it achieves its desired availability and consistency, which, when combined, differentiate the database from other systems. All distributed systems have a bar to meet: they must deliver enough value to overcome the complexity they introduce. ScyllaDB, designed from the start as a distributed system, achieves its scalability and fault tolerance through this design.

When users write data to ScyllaDB, they start by contacting any node. Many systems follow a leader-follower topology, where one node is designated as a leader, giving it special responsibilities within the system. If the leader dies, a new leader is elected, and the system continues operating. ScyllaDB does not follow this model; each node is as special as any other. Without a centralized coordinator deciding who stores what, each node must know where any given piece of data should be stored. Internally, ScyllaDB can map a given row to the node that owns it, forwarding requests to the appropriate nodes by calculating the owner using the hash ring that you’ll learn about in chapter 3.

To provide fault tolerance, ScyllaDB not only distributes data but also replicates it across multiple nodes. The database stores a row in multiple locations; how many depends on the configured replication factor. In a perfect world, each node acknowledges every request instantly every time, but what happens when it doesn’t? To help with unexpected trouble, the database provides tunable consistency. How you query data depends on what degree of consistency you’re looking to get. ScyllaDB is an eventually consistent database, so you may see inconsistent data as the system converges toward consistency. Developers must keep this eventual consistency in mind when working with the database.

To facilitate the various needs of consistency, ScyllaDB provides a variety of consistency levels for queries, including those listed in table 1.1. With a consistency level of ALL, you can require that all replicas for a key acknowledge a query, but this setting harms availability: you can no longer tolerate the loss of a node. With a consistency level of ONE, you require only a single replica for a key to acknowledge a query, but this greatly increases your chances of inconsistent results. Luckily, some options aren’t as extreme. ScyllaDB lets you tune consistency via the concept of quorums. A quorum is when a group has a majority of members.
Legislative bodies, such as the US Senate, do not operate when the number of members present is below the quorum threshold. When applied to ScyllaDB, quorums give you intermediate forms of consistency. With a QUORUM consistency level, the database requires a majority of replicas for a key to acknowledge a query. If you have three replicas, two of them must accept every read and every write. If you lose one node, you can still rely on the other two to keep serving traffic. You additionally guarantee that a majority of your nodes get every update, preventing inconsistent data if you use the same consistency level when reading.

Once you have picked your consistency level, you know how many replicas you need to execute a successful query. A client sends a request to a node, which serves as the coordinator for that query. The coordinator node reaches out to the replicas for the given key, including itself if it is a replica. Those replicas return results to the coordinator, and the coordinator evaluates them against the requested consistency level. If it finds the result satisfies the consistency requirements, it returns the result to the caller.

The CAP theorem (https://en.wikipedia.org/wiki/CAP_theorem) classifies distributed systems by saying that they cannot provide all three of these properties: consistency, availability, and network partition tolerance, as seen in figure 1.8. For the CAP theorem’s purposes, we define consistency as every request reading the most recent write; it’s a measure of correctness within the database. Availability is whether the system can serve requests, and network partition tolerance is the ability to handle a disconnected node.

Figure 1.8 The CAP theorem says a database can only provide two of three properties: consistency, availability, and partition tolerance.

ScyllaDB is classified as an AP system. According to the CAP theorem, a distributed system must have partition tolerance, so it ultimately chooses between consistency and availability. If a system is consistent, it must be impossible to read inconsistent data. To achieve consistency, it must ensure that all nodes receive all necessary copies of data. This requirement means it cannot tolerate the loss of a node, therefore losing availability.

TIP: In practice, systems aren’t as rigidly classified as the CAP theorem suggests. For a more nuanced discussion of these properties, you can research the PACELC theorem (https://en.wikipedia.org/wiki/PACELC_theorem), which illustrates how systems make partial tradeoffs between latency and consistency.

ScyllaDB is typically classified as an AP system. When faced with a network partition, it chooses to sacrifice consistency and maintain availability. You can see this in its design: ScyllaDB repeatedly makes choices, via quorums and eventual consistency, to keep the system up and running in exchange for potentially weaker consistency. By emphasizing availability, you see one of ScyllaDB’s differentiators against its most popular competition: relational databases.
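To make this concrete, here is a minimal cqlsh sketch of tunable consistency (not from the book; the keyspace and table are hypothetical):

CREATE KEYSPACE IF NOT EXISTS chat
WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

CREATE TABLE IF NOT EXISTS chat.messages (
  id int PRIMARY KEY,
  body text
);

-- cqlsh command: subsequent queries need acknowledgment from a
-- majority of replicas (2 of 3) to succeed
CONSISTENCY QUORUM;

-- This write succeeds even if one of the three replicas is down
INSERT INTO chat.messages (id, body) VALUES (1, 'hello');

-- A QUORUM read overlaps with the majority that accepted the write,
-- so it is guaranteed to observe it
SELECT body FROM chat.messages WHERE id = 1;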

Getting Started with DataStax Enterprise 6.9

DataStax recently announced the latest version of its widely used, enterprise-grade distributed database, DataStax Enterprise 6.9 (DSE 6.9). With the battle-tested Apache Cassandra® at its core, DSE 6.9 gives you all of the stability that your organization has come to expect, while also delivering...

Getting Started with Database-Level Encryption at Rest in ScyllaDB Cloud

Learn about ScyllaDB database-level encryption with Customer-Managed Keys and see how to set up and manage encryption with a customer key, or delegate encryption to ScyllaDB.

ScyllaDB Cloud takes a proactive approach to ensuring the security of sensitive data: we provide database-level encryption in addition to the default EC2 storage-level encryption. With this added layer of protection, customer data is always protected against attacks. Customers can focus on their core operations, knowing that their critical business and customer assets are well-protected. Customers can either use a customer-managed key or let ScyllaDB Cloud manage a key for them.

This article explains how ScyllaDB Cloud protects customer data. It focuses on the technical aspects of ScyllaDB database-level encryption with Customer-Managed Keys (CMK). Specifically, it includes a walkthrough of how to set up and manage encryption on ScyllaDB Cloud clusters with a customer key, or how to delegate encryption to ScyllaDB.

Storage-level encryption

Encryption at rest means data files are encrypted before being written to persistent storage. ScyllaDB Cloud always uses encrypted volumes to prevent data breaches caused by physical access to disks.

Database-level encryption

Database-level encryption is a technique for encrypting all data before it is stored in the database.
The ScyllaDB Cloud feature is based on the proven ScyllaDB Enterprise database-level encryption at rest, extended with Customer Managed Keys (CMK) encryption control. This ensures that the data is securely stored and that the customer is the one holding the key. The keys are stored and protected separately from the database, substantially increasing security.

ScyllaDB Cloud provides full database-level encryption using the Customer Managed Keys (CMK) concept. It is based on envelope encryption: the data is encrypted at rest and decrypted only when it is needed. This is essential for protecting customer data.

Some industries, like healthcare or finance, have strict data security regulations. Encrypting all data helps businesses comply with these requirements, avoiding the need to prove that every table holding sensitive personal data is covered by encryption. It also helps businesses protect their corporate data, which can be even more valuable.

A key feature of CMK is that the customer has complete control of the encryption keys. (Data encryption keys are covered later in this article.) The customer can:

  • Revoke data access at any time
  • Restore data access at any time
  • Manage the master keys needed for decryption
  • Log all access attempts to keys and data

Customers can delegate all key management operations to the ScyllaDB Cloud support team if they prefer; to do this, the customer can choose the ScyllaDB key when creating the cluster. Either way, customer data remains secure and adheres to all privacy regulations.

By default, encryption uses the symmetrical algorithm AES-128, a solid corporate encryption standard covering all practical applications; breaking AES-128 by brute force would take an immense amount of time, on the order of trillions of years. The strength can be increased to AES-256.

Note: Database-level encryption in ScyllaDB Cloud is available for all clusters deployed in Amazon Web Services. Support for other cloud services, like Google Cloud KMS, will come later this year.

Encryption

To ensure all user data is protected, ScyllaDB will encrypt:

  • All user tables
  • Commit logs
  • Batch logs
  • Hinted handoff data

This ensures all customer data is properly encrypted. The first step of the encryption process is to encrypt every record with a data encryption key (DEK). Once the data is encrypted, the DEK is sent to AWS KMS, where the master key (MK) resides. The DEK is then encrypted with the master key, producing an encrypted DEK (EDEK, or a wrapped key). The master key remains in the KMS, while the EDEK is returned and stored with the data. The plaintext DEK used to encrypt the data is destroyed to ensure data protection. A new DEK will be generated the next time new data needs to be encrypted.

Decryption

Because the original non-encrypted DEK is destroyed once the EDEK is produced, the stored data cannot be decrypted directly. The EDEK cannot be used to decrypt the data by itself because the DEK inside it is encrypted; it first has to be unwrapped, and for that, the master key is required again. The EDEK can only be decrypted with the master key (MK) in the KMS. Once the DEK is unwrapped, the data can be decrypted. As you can see, the data cannot be decrypted without the master key, which is protected at all times in the KMS and cannot be “copied” outside of KMS. By revoking the master key, the customer can disable access to the data independently from database or application authorization.
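To illustrate the flow, here is roughly what that envelope-encryption round trip looks like with the AWS CLI. This is a hedged sketch, not ScyllaDB's internal implementation; the key alias and file name are hypothetical:

# 1. Ask KMS for a fresh data encryption key. KMS returns both the
#    plaintext DEK and the same DEK encrypted under the master key (the EDEK).
aws kms generate-data-key --key-id alias/scylla-cluster-key --key-spec AES_128

# 2. Encrypt the data locally with the plaintext DEK, store the EDEK
#    alongside the data, then destroy the plaintext DEK.

# 3. To read the data later, send the stored EDEK back to KMS; only the
#    master key, which never leaves KMS, can unwrap it into a usable DEK.
aws kms decrypt --ciphertext-blob fileb://edek.bin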
Multi-region deployment

Adding new data centers to the ScyllaDB cluster will create additional local keys in those regions. All master keys support multi-region use, and a copy of each key resides locally in each region, ensuring those multi-regional setups are protected from regional outages of the cloud provider and against disaster. The keys are available in the same region as the data center and can be controlled independently.

If you use a Customer Key, AWS KMS will charge $1/month for each cluster, prorated per hour. Each additional DC creates a replica that is counted as an additional key. There is also a cost per key request; ScyllaDB Enterprise uses those requests efficiently, resulting in an estimated monthly cost of up to $1 for a 9-node cluster.

Managing encryption keys adds another layer of administrative work in addition to the extra cost. ScyllaDB Cloud therefore also offers database clusters that can be encrypted using keys managed by ScyllaDB support. They provide the same level of protection, but our support team helps you manage the master keys. The ScyllaDB keys are applied by default and are free to our customers.

Creating a Cluster with Database-Level Encryption

Creating a cluster with database-level encryption requires:

  • A ScyllaDB Cloud account – if you don’t have one, you can create a ScyllaDB Cloud account here
  • 10 minutes with a ScyllaDB Key
  • 20 minutes if creating your own key

To create a cluster with database-level encryption enabled, we will need a master key. We can either create a customer-managed key using the ScyllaDB Cloud UI, or skip this step completely and use a ScyllaDB Managed Key, which lets us skip the next six steps. In both cases, all the data will be protected by strong encryption at the database level.

Create a Customer Managed Key

After logging into the ScyllaDB Cloud portal, select the “Security” option from the user menu. Then, use the “Add Key” option. The Add Key menu will be presented. It allows selection of cloud providers and regions; currently, only the region can be selected. Select the region where you plan to deploy your cluster. For multi-DC setups, choose the region where your first data center will be. For my cluster, it will be US East. Click “Set Key.”

The key is now configured, but it also has to be created in AWS on my behalf so I can have full control over it. This is done using a CloudFormation Stack, which allows you to execute a cloud stack with your AWS permissions that will provision the key directly in AWS KMS. Click Launch Stack in the summary pop-up to open the CloudFormation Stack in a new tab and provision your key in AWS.

The AWS page will request a login and a “StackPrincipal.” The role of the “StackPrincipal” has to be provided in ARN format. Assuming your user has enough permissions, the following CloudShell command will return the ARN as the last line of the response.

[CloudShell-user@ip-10-136-55-116 ~]$ aws sts get-caller-identity
{
    "UserId": "AR**********************************domain.com",
    "Account": "023***********",
    "Arn": "arn:aws:sts::023**************************domain.com"
}

Once you successfully execute the stack, the key should be created, and ScyllaDB will get permission to use it with a cluster of your choice. In ScyllaDB, you will see the key name and a green “available,” meaning the key is ready and can be assigned. You can still manage the key. However, if you disable it, it can no longer encrypt or decrypt the data. The next step is to create a cluster with the new key.
Choose the Dedicated VM or the Free Trial option from the New Cluster menu. Then, select Customer Key, and you can select the key we created. The key will show up only if the cluster and key regions match.

Use ScyllaDB Managed Key

If you prefer to use a ScyllaDB Managed Key, skip all the above steps and choose ScyllaDB Key instead. This is the easier option: it encrypts the data at the database level with the same encryption, but the key is managed by ScyllaDB Cloud agents.

Create an Encrypted Cluster

Once the master key is chosen, click Next and wait a few minutes until the cluster is created with the selected encryption option. You will see that the encryption-at-rest indicator is enabled. That’s it! Your cluster is now using database-level encryption with the selected master key. Transparent database-level encryption in ScyllaDB Cloud significantly boosts the security of your ScyllaDB clusters and backups.

Next Steps

Start using this feature in ScyllaDB Cloud. Get your questions answered in our community forum and Slack channel. Or, use our contact form.

How Does Data Modeling Change in Apache Cassandra® 5.0 With Storage-Attached Indexes?

Data modeling in Apache Cassandra® is an important topic: how you model your data can affect your cluster’s performance, costs, and more. Today I’ll be looking at a new feature in Cassandra 5.0 called Storage-Attached Indexes (SAI), and how it affects the way you model data in Cassandra databases. 

First, I’ll briefly cover what SAIs are (for more information about SAIs, check out this post). Then I’ll look at two use cases where your modeling strategy could change with SAI. Finally, I’ll talk about the benefits and constraints of SAIs. 

What Are Storage-Attached Indexes? 

From the Cassandra 5.0 documentation, Storage-Attached Indexes (SAIs) “[provide] an indexing mechanism that is closely integrated with the Cassandra storage engine to make data modeling easier for users.” Secondary indexing, which is indexing values on properties that are not part of the primary key for that table, has been available in Cassandra in the past (as SASI and 2i). However, SAI will replace that existing functionality: the older indexes are deprecated in 5.0 and tentatively slated for removal in Cassandra 6.0. 

This is because SAI improves upon the older methods in several key ways. For one, according to the developers, SAI is the fastest indexing method for Cassandra clusters, a performance boost that makes indexing viable in production environments. SAI also has lower data storage overhead than prior implementations, which lowers costs in two ways: it reduces the database storage (and thus the operational cost) that indexes require, and it reduces latency when querying indexes, avoiding the loss of user engagement that high latency causes. 

How Do SAIs Work? 

SAIs are implemented as part of the SSTables, or Sorted String Tables, of a Cassandra database: SAI indexes Memtables and SSTables as they are written, and at read time it filters matching rows from both in-memory and on-disk sources and merges the results. I’m not going to go into too much detail here because there are a lot of existing resources on this topic: see the Cassandra 5.0 documentation and the Instaclustr site for examples. 

The main thing to keep in mind is that SAI is attached to Cassandra’s storage engine, which makes it much more performant in terms of speed, scalability, and data storage. This means you can use indexing reliably in production beginning with Cassandra 5.0, which opens up new possibilities for your data models. 

To learn more about how SAIs work, check out this piece from the Apache Cassandra blog. 

What Is SAI For? 

SAI is a filtering engine, and while it does have some functionality overlap with search engines, the documentation states directly that it “is not an enterprise search engine” (source). 

SAI is meant for creating filters on non-primary-key columns or composite partition keys (source), essentially meaning that you can enable a ‘WHERE’ clause on any column in your Cassandra 5.0 database. This makes queries a lot more flexible without sacrificing latency or storage space the way prior methods did. 
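As a quick preview of what that looks like (a sketch with a hypothetical users table; the full email example follows below):

CREATE CUSTOM INDEX users_email_sai_idx ON users (email)
USING 'StorageAttachedIndex';

-- The indexed column can now be filtered without ALLOW FILTERING:
SELECT * FROM users WHERE email = 'pat@example.com';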

How Can We Use SAI When Data Modeling in Cassandra 5.0? 

Because of the increased scalability and performance of SAIs, data modeling in Cassandra 5.0 will most definitely change. 

For instance, you will be able to search collections more thoroughly and easily, so indexing is more of an option when designing your Cassandra queries. SAI also allows new query types, which can improve your existing queries and, because Cassandra’s design paradigm ties table design to the queries you run, change your table design. 

But what if you’re not on a greenfield project and want to use SAIs? No problem! SAI is backwards-compatible, and you can migrate your application one index at a time if you need to. 

How Do Storage-Attached Indexes Affect Data Modeling in Apache Cassandra 5.0? 

Cassandra’s SAI was designed with data modeling in mind (source). It unlocks new query patterns that make data modeling easier in quite a few cases. In the Cassandra team’s words: “You can create a table that is most natural for you, write to just that table, and query it any way you want.” (source) 

I think another great way to look at how SAIs affect data modeling is by looking at some queries that could be asked of SAI-indexed data. This is because Cassandra data modeling relies heavily on the queries that will be used to retrieve the data. I’ll take a look at two use cases: querying on values of non-primary-key columns, and using an index to manage a one-to-many relationship. 

Use Case: Querying on Values of Non-Primary-Key Columns 

You may find that you often search a table for records with a particular value in a particular column. An example may be a search form for a large email inbox with lots of filters. You could find yourself looking at a record like: 

  • Subject 
  • Sender 
  • Receiver 
  • Body 
  • Time sent 

Your table creation may look like: 

CREATE KEYSPACE IF NOT EXISTS inbox
WITH REPLICATION = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 3
};

CREATE TABLE IF NOT EXISTS inbox.emails (
  id int,
  sender text,
  receivers text,
  subject text,
  body text,
  timeSent timestamp,
  PRIMARY KEY (id)
);

If you allow users to search for a particular subject or sender and the data set is large, not having SAI could make query times painful. Without an index, Cassandra will only accept this query with ALLOW FILTERING, which forces a full scan: 

SELECT * FROM inbox.emails WHERE sender = 'sam.example@example.com' ALLOW FILTERING;

To fix this problem, we can create SAI indexes on our sender, receivers, subject, and body fields: 

CREATE CUSTOM INDEX IF NOT EXISTS sender_sai_idx ON inbox.emails (sender)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

CREATE CUSTOM INDEX IF NOT EXISTS receivers_sai_idx ON inbox.emails (receivers)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

CREATE CUSTOM INDEX IF NOT EXISTS body_sai_idx ON inbox.emails (body)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

CREATE CUSTOM INDEX IF NOT EXISTS subject_sai_idx ON inbox.emails (subject)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

Once you’ve established the indexes, you can run the same query and it will automatically use the SAI index to find all emails from the sender ‘sam.example@example.com’, and likewise for matches on subject or body. Note that although the data model changed with the inclusion of the indexes, the SELECT query does not change, and the fields of the table stayed the same as well! 
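For instance, the other indexed columns can now be filtered the same way. Keep in mind these are whole-value equality matches, since SAI is a filtering engine rather than a full-text search engine (the values are hypothetical):

SELECT * FROM inbox.emails WHERE subject = 'Quarterly report';

SELECT * FROM inbox.emails WHERE body = 'Lunch at noon?';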

Use Case: Managing One-To-Many Relationships 

Going back to the previous example, one email could have many recipients. Prior to secondary indexes, you would need to scan the recipient collection in every row of the table in order to query on recipients. This could be solved in a few ways. One is to create a join table for recipients that contains an id, an email id, and a recipient. That becomes complicated once you add the constraint that each recipient should only appear once per email. With SAI, we now have an index-based solution: create an index on a collection of recipients in each row. 

The script to create the table and indices changes a bit: 

CREATE TABLE IF NOT EXISTS inbox.emails (
  id int,
  sender text,
  receivers set<text>,
  subject text,
  body text,
  timeSent timestamp,
  PRIMARY KEY (id)
);

The type of receivers changes from text to set<text>. A set is used because each email address should only occur once per email. This takes the logic you would have had to implement for the join-table solution and moves it into Cassandra. 
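As a quick illustration of those set semantics (the rows are hypothetical):

-- Duplicate addresses in a set literal collapse to a single element
INSERT INTO inbox.emails (id, sender, receivers, subject, body, timeSent)
VALUES (1, 'sam.example@example.com',
        {'pat@example.com', 'pat@example.com'},
        'Hello', 'Hi there!', toTimestamp(now()));

-- Adding an address that is already present is a no-op
UPDATE inbox.emails SET receivers = receivers + {'pat@example.com'} WHERE id = 1;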

The indexing code remains mostly the same, except for the creation of the index for receivers: 

CREATE CUSTOM INDEX IF NOT EXISTS receivers_sai_idx ON inbox.emails (receivers) USING 'StorageAttachedIndex';

That’s it! One line of CQL and there’s now an index on receivers. We can query for emails with a particular receiver: 

SELECT * FROM inbox.emails WHERE receivers CONTAINS 'pat@example.com';

There are many one-to-many relationships that can be simplified in Cassandra with the use of secondary indexes and SAI. 
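SAI also lets you combine predicates on separately indexed columns in a single query, so the sender index and the receivers index can work together (hypothetical values):

SELECT * FROM inbox.emails
WHERE sender = 'sam.example@example.com'
AND receivers CONTAINS 'pat@example.com';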

What Are the Benefits of Data Modeling With Storage-Attached Indexes? 

There are many benefits to using SAI when data modeling in Cassandra 5.0: 

  • Query performance: because of SAI’s implementation, it has much faster query speeds than previous secondary index implementations, and indexed data is generally faster to search than unindexed data. This gives you more flexibility to search within your data and write queries that touch non-primary-key columns and collections. 
  • Move over piecemeal: SAI’s backwards compatibility, coupled with how little your table structure has to change to add SAIs, means you can move your data models over piece by piece. 
  • Data storage overhead: SAI has much lower data overhead than previous secondary index implementations, meaning more flexibility in what you can store in your data models without impacting overall storage needs. 
  • More complex queries/features: SAI allows you to write much more thorough queries against your indexed data, and offers up a lot of new functionality (see the sketch after this list), like: 
    • Vector search 
    • Numeric range queries 
    • AND queries within indexes 
    • Support for map/set/list collections 
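For example, a numeric range query against an SAI index could look like the following sketch (the timeSent index is hypothetical and assumes the emails table from earlier):

CREATE CUSTOM INDEX IF NOT EXISTS timesent_sai_idx ON inbox.emails (timeSent)
USING 'StorageAttachedIndex';

-- Range predicates on the indexed timestamp column
SELECT * FROM inbox.emails
WHERE timeSent >= '2024-01-01' AND timeSent < '2024-02-01';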

What Are the Constraints of Storage-Attached Indexes? 

While there are benefits to SAI, there are also a few constraints, including: 

  • Because SAI is attached to the SSTable mechanism, the performance of queries on indexed columns will be “highly dependent on the compaction strategy in use” (per the Cassandra 5.0 CEP-7) 
  • SAI is not designed for unlimited-size data sets, such as logs; indexing a dataset like that would cause performance issues. The reason is read latency at higher row counts spread across a cluster. It is also related to consistency level (CL): the higher the CL, the more nodes you’ll have to query on larger datasets. (Source) 
  • Query complexity: while you can query as many indexes as you like, when you do so, you incur a cost related to the number of index values processed. Design your queries to select from as few indexes as necessary. 
  • You cannot index multiple columns in one index, as there is a 1-to-1 mapping of an SAI index to a column. You can however create separate indexes and query them simultaneously. 

This is a v1; some features, like the LIKE comparison for strings, the OR operator, and global sorting, are all slated for v2. 

One more constraint is disk usage: SAI uses an extra 20-35% of disk space over unindexed data, though note that it consumes much less than previous indexing implementations (source). Don’t make every column an index if you don’t need to; indexing only what you query saves disk space and maintains query performance. 

Conclusion 

SAI is a very robust solution for secondary indexes, and its addition to Cassandra 5.0 opens the door for several new data modeling strategies: from searching non-primary-key columns, to managing one-to-many relationships, to vector search. To learn more about SAI, read this post from the Instaclustr by NetApp blog, or check out the documentation for Cassandra 5.0. 

If you’d like to test SAI without setting up and configuring Cassandra yourself, Instaclustr has a free trial and you can spin up Cassandra 5.0 clusters today through a public preview! Instaclustr also offers a bunch of educational content about Cassandra 5.0. 

 


Cassandra Lucene Index: Update

**An important update regarding support of Cassandra Lucene Index for Apache Cassandra® 5.0 and the retirement of Apache Lucene Add-On on the Instaclustr Managed Platform.** 

Instaclustr by NetApp has been maintaining the new fork of the Cassandra Lucene Index plug-in since its announcement in 2018. After extensive evaluation, we have decided not to upgrade the Cassandra Lucene Index to support Apache Cassandra® 5.0. This decision aligns with the evolving needs of the Cassandra community and the capabilities offered by Storage-Attached Indexing (SAI) in Cassandra 5.0. 

SAI introduces significant improvements in secondary indexing, while simplifying data modeling and creating new use cases in Cassandra, such as Vector Search. While SAI is not a direct replacement for the Cassandra Lucene Index, it offers a more efficient alternative for many indexing needs.  

For applications requiring advanced indexing features, such as full-text search or geospatial queries, users can consider external integrations, such as OpenSearch®, that offer numerous full-text search and advanced analysis features. 

We are committed to maintaining the Cassandra Lucene Index for currently supported and newer versions of Apache Cassandra 4 (including minor and patch-level versions) for users who rely on its advanced search capabilities. We will continue to release bug fixes and provide necessary security patches for the supported versions in the public repository. 

Retiring Apache Lucene™ Add-On for Instaclustr for Apache Cassandra 

Similarly, Instaclustr is commencing the retirement process of the Apache Lucene add-on on its Instaclustr Managed Platform. The offering will move to the Closed state on July 31, 2024. This means that the add-on will no longer be available for new customers.  

However, it will continue to be fully supported for existing customers with no restrictions on SLAs, and new deployments will be permitted by exception. Existing customers should be aware that the add-on will not be supported for Cassandra 5.0. For more details about our lifecycle policies, please visit our website here.  

Instaclustr will work with existing customers to ensure a smooth transition during this period. Support and documentation will remain in place for our customers running the Lucene add-on on their clusters. 

For those transitioning to, or already using the Cassandra 5.0 beta version, we recommend exploring how Storage-Attached Indexing can help you with your indexing needs. You can try the SAI feature as part of the free trial on the Instaclustr Managed Platform.  

We thank you for your understanding and support as we continue to adapt and respond to the community’s needs. 

If you have any questions about this announcement, please contact us at support@instaclustr.com. 


How Freshworks Cut Database P99 Latency by 95% – with Lower Costs

How Freshworks tackled high tail latencies, Cassandra admin burden, and any little surge causing an increase in timeouts.

Freshworks creates AI-boosted business software that is purpose-built for IT, customer support, sales, and marketing teams to work more efficiently. Given their scale, managing petabytes of data across multiple RDBMS and NoSQL databases was a challenge. Preparing for 10x growth under such circumstances required a strategic approach that would allow them to scale without interrupting business continuity. Spoiler: this approach included ScyllaDB.

In the following video, Sunderjeet Singh (ScyllaDB India Manager) kicks off with an introduction to ScyllaDB and Freshworks. Then, Sreedhar Gade (VP of Engineering at Freshworks) shares how Freshworks architected a solution that enables the company to scale operations while keeping costs under control. Here are highlights from the talk, as shared by Sreedhar Gade…

About Freshworks

Freshworks was founded in 2010 with the goal of empowering millions of companies across the world in multiple domains. The company went public in 2021. Today Freshworks’ revenue is near $600 million. We are relied upon by customers in over 120 countries, and have earned many recognitions across industry verticals.

Technical Challenges

From an application perspective, serving Freshworks’ global customer base requires the team to serve products and data with ultra-low latency and high performance. When using Cassandra, the team faced challenges such as:

  • High tail latencies. Every SaaS product vendor is good at serving with high performance up to the 80th or 90th percentiles. But the long tail is where the performance actually starts getting impacted. Improving this can really improve the customer experience.
  • Administrative burden. We don’t want to keep adding SREs and database engineers in step with our company growth. We want to make sure that we stay lean and mean – but still be able to manage a large fleet of database instances.
  • Any slight surge in traffic led to an increase in timeouts. With a global customer base, traffic patterns are quite unpredictable, and surges can lead to timeouts – unless we’re able to rapidly scale up and down.

Why ScyllaDB

ScyllaDB proved that it could solve these challenges for our former Cassandra use cases. It helps us deliver engaging experiences to our customers across the world. It helps us reduce toil for our engineers. It’s easy to scale up. And more importantly, it’s very cost effective – easy on the eyes for our CFO. 😉

Migrating from Cassandra to ScyllaDB

To start the migration, we enabled zero-downtime dual writes on the Cassandra databases that we wanted to migrate to ScyllaDB. Then, we took a snapshot of the existing Cassandra cluster and created volumes in the ScyllaDB cluster. We started with around 10 TB as part of this project, then moved it forward in different phases. And once the Cassandra migration was done, we used the CDM migrator to validate the migration quality.

The Results So Far

We are currently live with ScyllaDB in one of the regions, and we’ve been able to migrate about 25% of the data (more than two terabytes) as part of this project. We have already achieved a 20x reduction in tail latency – we brought the P99 latency down from one second to 50 milliseconds.

What’s Next with ScyllaDB at Freshworks

There are many more opportunities with ScyllaDB at Freshworks, and we have great plans going forward. 
One of the major projects we’re considering involves taking the text/BLOB data that’s currently stored in MySQL and moving it into ScyllaDB. We expect that will give us cost benefits as well as a performance boost. We are also looking to use ScyllaDB to improve the scalability, performance, and maintenance-related activities across our existing Cassandra workloads, across all our business units and products. This will help ensure that our products can scale 10x and scale on demand.

Why Apache Cassandra 5.0 is a Game Changer for Developers: The New Stack

New features provide an especially inviting playground for teams to do interesting and groundbreaking work, including generative AI initiatives.

The release of open source Apache Cassandra® 5.0, now in open beta with GA expected soon, adds several capabilities that make the NoSQL database even more compelling for enterprises’ mission-critical use cases. Zooming in on the developer level, those new features provide an especially inviting playground for teams to do interesting and groundbreaking work, including, of course, generative AI initiatives.

Cassandra 5.0 also introduces a few improvements to the developer experience itself, making it more efficient — and, frankly, enjoyable — for developers working with the database.

Let’s dig into some of the most important changes in Cassandra 5.0 and how they affect developers.

Read the full blog with our partners at The New Stack!

 


Test Drive Vector Search with DataStax Enterprise 6.9

We recently announced the upcoming release of DataStax Enterprise (DSE) 6.9 – the next iteration in the line of self-managed enterprise-grade products from DataStax built on Apache Cassandra®, offering DSE customers a simple upgrade path that adds vector database capabilities for generative AI use...