Announcing ScyllaDB Operator, with Red Hat OpenShift Certification

OpenShift users gain a trusted, validated path for installing and managing ScyllaDB Operator – backed by enterprise-grade support

ScyllaDB Operator is an open-source project that helps you run ScyllaDB on Kubernetes by managing ScyllaDB clusters deployed to Kubernetes and automating tasks related to operating a ScyllaDB cluster. For example, it automates installation, vertical and horizontal scaling, and rolling upgrades. The latest release (version 1.20) is now Red Hat certified and available directly in the Red Hat OpenShift ecosystem catalog. It also brings new features, stability improvements, and documentation updates.

Red Hat OpenShift Certification and Catalog Availability

OpenShift has become a cornerstone platform for enterprise Kubernetes deployments, and we’ve been working to ensure ScyllaDB Operator feels like a native part of that ecosystem. With ScyllaDB Operator 1.20, we’re taking a significant step forward: the operator is now Red Hat certified and available directly in the Red Hat OpenShift ecosystem catalog. See the ScyllaDB Operator project in the Red Hat Ecosystem Catalog.

This milestone gives OpenShift users a trusted, validated path for installing and managing ScyllaDB Operator – backed by enterprise-grade support. With this release, you can install ScyllaDB Operator through OLM (Operator Lifecycle Manager) using either the OpenShift Web Console or the CLI. For detailed installation instructions and OpenShift-specific configuration examples – including guidance for platforms like Red Hat OpenShift Service on AWS (ROSA) – see the Installing ScyllaDB Operator on OpenShift guide.

IPv6 support

ScyllaDB Operator 1.20 brings native support for IPv6 and dual-stack networking to your ScyllaDB clusters. With dual-stack support, your ScyllaDB clusters can operate with both IPv4 and IPv6 addresses simultaneously. That provides flexibility for gradual migration scenarios or environments requiring support for both protocols.
You can also configure IPv6-first deployments, where ScyllaDB uses IPv6 for internal communication while remaining accessible via both protocols. You can control the IP addressing behavior of your cluster through v1.ScyllaCluster’s .spec.network. The API abstracts away the underlying ScyllaDB configuration complexity so you can focus on your networking requirements rather than implementation details.

With this release, IPv4-first dual-stack and IPv6-first dual-stack configurations are production-ready. IPv6-only single-stack mode is available as an experimental feature under active development; it’s not recommended for production use. See Production readiness for details. Learn more in the IPv6 networking documentation.

Other notable changes

Documentation improvements

This release includes a new guide on automatic data cleanups. It explains how ScyllaDB Operator ensures that your ScyllaDB clusters maintain storage efficiency and data integrity by removing stale data and preventing data resurrection.

Dependency updates

This release also includes regular updates of ScyllaDB Monitoring and the packaged dashboards to support the latest ScyllaDB releases (4.12.1->4.14.0, #3250), as well as its dependencies: Grafana (12.2.0->12.3.2) and Prometheus (v3.6.0->v3.9.1). For more changes and details, check out the GitHub release notes.

Upgrade instructions

For instructions on upgrading ScyllaDB Operator to 1.20, please refer to the Upgrading ScyllaDB Operator section of the documentation.
Supported versions

  • ScyllaDB 2024.1, 2025.1, 2025.3 – 2025.4
  • Red Hat OpenShift 4.20
  • Kubernetes 1.32 – 1.35
  • Container Runtime Interface API v1
  • ScyllaDB Manager 3.7 – 3.8

Getting started with ScyllaDB Operator

  • ScyllaDB Operator Documentation
  • Learn how to deploy ScyllaDB on Google Kubernetes Engine (GKE)
  • Learn how to deploy ScyllaDB on Amazon Elastic Kubernetes Service (EKS)
  • Learn how to deploy ScyllaDB on a Kubernetes Cluster

Related links

  • ScyllaDB Operator source (on GitHub)
  • ScyllaDB Operator image on DockerHub
  • ScyllaDB Operator Helm Chart repository
  • ScyllaDB Operator documentation
  • ScyllaDB Operator for Kubernetes lesson in ScyllaDB University

Report a problem

Your feedback is always welcome! Feel free to open an issue or reach out on the #scylla-operator channel in ScyllaDB User Slack.

Instaclustr product update: March 2026

Here’s a roundup of the latest features and updates that we’ve recently released.

If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.

Major announcements

Introducing AI Cluster Health: Smarter monitoring made simple

Turn complex metrics into clear, actionable insights with AI-powered health indicators—now available in the NetApp Instaclustr console. The new AI Cluster Health page simplifies cluster monitoring, making it easy to understand your cluster’s state at a glance without requiring deep technical expertise. This AI-driven analysis reviews recent metrics, highlights key indicators, explains their impact, and assigns an easy traffic-light health score for a quick status overview.

NetApp introduces Apache Kafka® Tiered Storage support on GCP in public preview

Tiered Storage is now available in public preview for Instaclustr for Apache Kafka on Google Cloud Platform. This feature enables customers to optimize storage costs and improve scalability by offloading older Kafka log segments from local disk to Google Cloud Storage (GCS), while keeping active data local for fast access. Kafka clients continue to consume data seamlessly with no changes required, allowing teams to reduce infrastructure costs, simplify cluster scaling, and extend retention periods for analytics or compliance.

Other significant changes

Apache Cassandra®
  • Released Apache Cassandra 4.0.19 and 5.0.6 into general availability on the NetApp Instaclustr Managed Platform, giving customers access to the latest stability, performance, and security improvements.
  • Multi–data center Apache Cassandra clusters can now be provisioned across public and private networks via the Instaclustr Console, API, and Terraform provider, enabling customers to provision multi-DC clusters from day one.
  • Single CA feature is now available for Apache Cassandra clusters. See Single CA for more details.
  • GCP n4 machine types are now supported for Apache Cassandra in the available regions.
Apache Kafka®
  • Apache Kafka and Kafka Connect 4.1.1 are now generally available.
  • Added Client Metrics and Observability feature support for Kafka in private preview.
  • Single CA feature is now available for Apache Kafka clusters. See Single CA for more details.
ClickHouse®
  • ClickHouse 25.8.11 has been added to our managed platform in General Availability.
  • Enabled the ClickHouse system.session_log table, which plays a key role in tracking session lifecycle and auditing user activities for enhanced session monitoring. This helps with troubleshooting client-side connectivity issues and provides insights into failed connections.
OpenSearch®
  • OpenSearch 2.19.4 and 3.3.2 have been released to general availability.
  • Added support for the OpenSearch Assistant feature in OpenSearch Dashboards for clusters with Dashboards and AI Search enabled.
PostgreSQL®

Instaclustr Managed Platform
  • Added self-service Tags Management feature—allowing users to add, edit, or delete tags for their clusters directly through the Instaclustr console, APIs, or Terraform provider for RIYOA deployments.
  • Added a new region, Germany West Central, for Azure.
Future releases

Apache Kafka®
  • Following the private preview release, Kafka’s Client Telemetry feature is progressing toward general availability soon. Read more here.
ClickHouse®
  • We plan to extend the current ClickHouse integration with FSxN data sources by adding support for deployments across different VPCs, enabling broader enterprise lakehouse architectures.
  • Apache Iceberg and Delta Lake integrations are planned to be available soon for ClickHouse on the NetApp Instaclustr Platform, giving you a practical way to run analytics on open table formats while keeping control of your existing data platforms.
  • We plan to soon introduce fully integrated AWS PrivateLink as a ClickHouse Add-On for secure and seamless connectivity with ClickHouse.
PostgreSQL®
  • We’re aiming to launch PostgreSQL integrated with FSx for NetApp ONTAP (FSxN) along with NVMe support into general availability soon. This enhancement is designed to combine enterprise-grade PostgreSQL with FSxN’s scalable, cost-efficient storage, enabling customers to optimize infrastructure costs while improving performance and flexibility. NVMe support is designed to deliver up to 20% greater throughput vs NFS.
OpenSearch®
  • An AI Search plugin for OpenSearch is being released to GA (currently in public preview) to enhance search experiences using AI‑powered techniques such as semantic, hybrid, and conversational search, enabling more relevant, context‑aware results and unlocking new use cases including retrieval‑augmented generation (RAG) and AI‑driven chatbots.
Instaclustr Managed Platform
  • Following the public preview release, Zero Inbound Access is progressing to General Availability, designed to deliver the most secure management connectivity by eliminating inbound internet exposure and removing the need for any routable public IP addresses, including bastion or gateway instances.

If you have any questions or need further assistance with these enhancements to the Instaclustr Managed Platform, please contact us.

SAFE HARBOR STATEMENT: Any unreleased services or features referenced in this blog are not currently available and may not be made generally available on time or at all, as may be determined in NetApp’s sole discretion. Any such referenced services or features do not represent promises to deliver, commitments, or obligations of NetApp and may not be incorporated into any contract. Customers should make their purchase decisions based upon services and features that are currently generally available. 

The post Instaclustr product update: March 2026 appeared first on Instaclustr.

Integrated Gauges: Lessons Learned Monitoring Seastar’s IO Stack

Why integrated gauges are a better approach for measuring frequently changing values

Many performance metrics and system parameters are inherently volatile or fluctuate rapidly. When using a monitoring system that periodically “scrapes” (polls) a target for its current metric value, the collected data point is merely a snapshot of the system’s state at that precise moment. It doesn’t reveal much about what’s actually happening in that area. Sometimes it’s possible to overcome this problem by accumulating those values somehow – for example, by using histograms or exporting a derived monotonically increasing counter. This article suggests yet another way to extend this approach for a broader set of frequently changing parameters.

The Problem with Instantaneous Metrics

For rapidly changing values such as queue lengths or request latencies, a single snapshot is most often misleading. If the scrape happens during a peak load, the value will appear high. If it happens during an idle moment, the value will appear low. Over time, the sampled data usually does not accurately represent the system’s true average behavior, leading to misinformed alerting or capacity planning.

This phenomenon is apparent when examining synthetically generated data. Take a look at the example below. On that plot, there’s a randomly generated XY series. The horizontal axis represents the event number, the vertical axis represents the frequently fluctuating value. The “value” changes on every event.

Fig.1 Random frequently changing data

Now let’s thin the data out and see what would happen if we show only every 40th point, as if it were a real monitoring system that captures the value at a notably lower rate. The next plot shows what it would look like.

Fig.2 Sampling every 40th point from the data

Apparently, this is a very poor approximation of the first plot. It becomes crystal clear how poor it is if we zoom into the range [80-120] of both plots.
For the latter (dispersed) plot, we’ll see a couple of dots with values of 2 and 0. But the former (original) plot would reveal drastically different information, shown below.

Fig.3 Zoom-in range [80-120] from the data

Now remember that the problem revealed itself at a ratio of real rate to scrape rate of just 40. In real systems, internal parameters may change thousands of times per second while the scrape period is minutes. In that case, you end up scraping less than a fraction of a percent. A better way to monitor frequently changing data is needed…badly.

Histograms

Request latency is one example of a statistic that usually contains many data points per second. Histograms can help you see how slow or fast the requests are – without falling into the problem described above. A monitoring histogram is a type of metric that samples observations and counts them in configurable buckets. Instead of recording just a single average or maximum latency value, a histogram provides a distribution of these values over a given period. By analyzing the bucket counts, you can calculate percentiles, including p50 (known as the median value) or p99 (the value below which 99% of all requests fell). This provides a much more robust and actionable view of performance volatility than simple instantaneous gauges or averages.

Fig.4 Histogram built from data X-points

It’s a static snapshot of the data at the last “time point.” If the values change over time, modern monitoring systems can offer various data views, such as time-series percentiles or heatmaps, to provide the necessary insight into the data. Histograms, however, have a drawback: they require system memory to work, node networking throughput to get reported, and monitoring disk space to be stored. A histogram of N buckets consumes N times more resources than a single value.
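To make the bucket-counting idea concrete, here is a minimal sketch of a cumulative-bucket histogram in the spirit of Prometheus histograms. The bucket bounds and function names are illustrative assumptions, not taken from any real exporter:

```python
import bisect

# Cumulative buckets: each counter covers all observations <= its bound.
BOUNDS = [1, 2, 5, 10, 20, 50, 100]  # upper bucket edges in ms; +Inf implied

def new_histogram():
    # one cumulative count per bound, plus the implicit +Inf bucket
    return [0] * (len(BOUNDS) + 1)

def observe(counts, value):
    """Increment every cumulative bucket whose upper bound covers value."""
    start = bisect.bisect_left(BOUNDS, value)
    for i in range(start, len(counts)):
        counts[i] += 1

def quantile(counts, q):
    """Upper bound of the first bucket covering quantile q (0 < q <= 1)."""
    target = q * counts[-1]  # counts[-1] is the total observation count
    for bound, count in zip(BOUNDS + [float("inf")], counts):
        if count >= target:
            return bound
```

Because the buckets are cumulative, a percentile lookup is just a scan for the first bucket whose count reaches q times the total, which is exactly how p50 and p99 are derived from exported bucket counters.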
Addable latencies

When it comes to reporting latencies, a good compromise between efficiency and informativeness is “counter” metrics coupled with the rate() function.

Counters and rates

Counter metrics are one of the simplest and most fundamental metric types in the Prometheus monitoring system. A counter is a cumulative metric representing a single monotonically increasing value that can only grow or be reset to zero upon a restart of the monitored target. Because a counter itself is just an ever-increasing total, it is rarely analyzed directly. Instead, one derives another meaningful metric from counters, most notably a “rate”.

Here’s a mathematical description of how the Prometheus rate() function works. Internally, the system calculates the total sum of the observed values. After scraping the value several times, the monitoring system calculates the “rate” of the value: the total value difference from the previous scrape, divided by the time passed since then. The rated value thus shows the average value over the specified period of time.

Now let’s get back to latencies. To report average latency over the scraping period using counters, the system should accumulate two of them: the total sum of request latencies (e.g., in milliseconds) and the total number of completed requests. The monitoring system would then “rate” both counters and divide them, thus showing the average per-request latency: average latency = rate(total_latency) / rate(total_requests).

Since the total number of requests served is useful on its own to see the IOPS value, this method is as efficient as exporting the immediate latency value. Yet, it provides a much more stable and representative metric than an instantaneous gauge. We can try to adapt counters to the artificial data that was introduced earlier. The plot below shows what the averaged values for the sub-ranges of length 40 (the sampling step from above) would look like.
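The two-counter approach above can be sketched in a few lines. This is an illustrative toy, not Seastar’s or Prometheus’s actual code: the service exports a monotonic latency sum and a request count, and the monitoring side divides their deltas, which is effectively what dividing the two rate() results does:

```python
# The service side accumulates two monotonically increasing counters.
class LatencyCounters:
    def __init__(self):
        self.latency_sum_ms = 0.0  # total latency of all completed requests
        self.request_count = 0     # total number of completed requests

    def record(self, latency_ms):
        self.latency_sum_ms += latency_ms
        self.request_count += 1

    def snapshot(self):
        """What a scraper would collect at one point in time."""
        return (self.latency_sum_ms, self.request_count)

def avg_latency(prev, cur):
    """Average per-request latency between two scrapes. The scrape-interval
    duration cancels out when dividing the two rates, so it never appears."""
    d_sum = cur[0] - prev[0]
    d_count = cur[1] - prev[1]
    return d_sum / d_count if d_count else 0.0
```

Note that the result reflects every request completed between the two scrapes, not just whichever value happened to be current at scrape time.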
It differs greatly from the sampled plot of Figure 2 and provides a more accurate approximation of the real data set.

Fig.5 Average values for sub-ranges of length 40

Of course, this method has its drawbacks when compared to histograms. It swallows latency spikes that are short compared to the scraping period. The shorter the spike duration, the more likely it would go unnoticed.

Queue length

Summing up individual values is not always possible, though. Queue length is another example of a statistic that’s also volatile and may contain thousands of data points. For example, this could be the queue of tasks to be executed or the queue of IO requests. Whenever an entry is added to the queue, its length increases. When an entry is removed, the length decreases – but it makes little sense to add up the updated queue length elsewhere.

If we return to Figure 3 showing the real values in the range of [80-120], what would the average queue length over that period be? Apparently, it’s the sum of individual values divided by their number. But even though we understand what the “average request latency” is, the idea of “average queue length” is harder to accept, mainly because the X value of the above data is usually not an “event” but rather a “time”. And if, for example, a queue was empty most of the time and then changed to N elements for a few milliseconds, we’d have just two events that changed the queue. Seeing the average queue length of N/2 is counterintuitive.

Some time ago, we researched how the “real” length of an IO queue can be implicitly derived from the net sum of request latencies. Here, we’ll show another approach to getting an idea of the queue length.

Integrated gauge

The approach is a generalization of averaging the sum of individual data points from the data set using counters and the “rate” function.
Let’s return to our synthetic data as it was summed and rated (Figure 5) and look at the corresponding math from a slightly different angle. If we treat each value point as a bar with a width of 1 and a height equal to the value itself, we can see that:

  • The total value is the total area of the bars
  • The difference between two scraped total values is the area of the bars between the two scraping points
  • The number of points in the range equals the distance between those two scraping points

Fig.6 Geometrical interpretation of the exported counters

Figure 6 shows this area in green. The average value is thus the height of the imaginary rectangle whose width equals the scraping period.

If the distance between two adjacent points is not 1 but some duration, the interpretation still works. We can still calculate the area under the plot and divide it by the area width to get its “average” value. But we need to apply two changes. First, the area of an individual bar is now the product of the value itself and the duration of time passed since the previous point. Think of it this way: previously, the duration was 1, so the multiplication was invisible. Now, it is explicit. Second, we no longer need to count the number of points in the range. Instead, the denominator of the averaging function will be the duration between scraping points. In other words, previously the number of points was the sum of ones between scraping points; now, it’s the sum of durations – the total scraping period. It’s now clear that the average value is the result of applying the rate() function to the total value.

Naming problem (a side note)

There are two hard problems in computer science: cache invalidation, naming, and off-by-one errors. Here we had to face one of them – how to name the newly introduced metric? It is the queue size multiplied by the time the queue exists in such a state. A very close analogy comes from project management.
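The two changes above (area becomes value × duration, denominator becomes the scrape interval) can be sketched as a tiny integrated gauge. The interface below is an assumption for illustration, not Seastar’s actual API: the gauge accumulates value-seconds, and applying a rate()-style delta-over-interval to the exported total yields the time-weighted average:

```python
class IntegratedGauge:
    """Accumulates value * time-at-value instead of exporting the
    instantaneous value directly."""

    def __init__(self, now):
        self.total = 0.0        # accumulated "value-seconds"
        self.value = 0.0        # current instantaneous value
        self.last_change = now

    def set(self, value, now):
        # account for the time spent at the previous value first
        self.total += self.value * (now - self.last_change)
        self.value = value
        self.last_change = now

    def export(self, now):
        """The monotonic counter a scraper would collect at time `now`."""
        return self.total + self.value * (now - self.last_change)

def average(prev, cur, interval):
    """rate() over the exported counter: the time-weighted average value."""
    return (cur - prev) / interval

# A queue that is empty, jumps to 10 entries at t=1, and drains at t=2:
g = IntegratedGauge(now=0.0)
g.set(10, now=1.0)
g.set(0, now=2.0)
# Scraped at t=0 and t=4, the average length over the window is 10*1/4:
print(average(0.0, g.export(4.0), 4.0))  # 2.5
```

Note how the short burst still contributes its exact share of the average, whereas a point-in-time scrape at t=4 would have reported 0.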
There’s a parameter called a “human-hour,” and it’s widely used to estimate the resources needed to accomplish some task.

In physics, there are not many units that measure something multiplied by time. But there are plenty of parameters that measure something divided by time, like velocity (distance per second), electric current (charge per second) or power (energy per second). Even though it’s possible to define things the other way around (e.g., charge as current multiplied by time), it’s still charge that’s the primary unit and current that’s secondary. So far I’ve found quite a few examples of such units. In classical physics, one is called “action” and measures energy times time, and another one is “impulse,” which is measured in newton-seconds. Finally, in kinematics there’s a thing called “absement,” which is literally meters times seconds. It describes the measure of an object’s displacement from its initial position.

Integrated queue length

The approach described above was recently applied to the Seastar IO stack – the length of IO classes’ request queues was patched to account for and export the integrated value. Below is a screenshot of a dashboard showing the comparison of both compaction class queue lengths, randomly scraped and integrated.

Interesting (but maybe hard to accept) are the fractional values of the queue length. That’s OK, since the number shows the average queue length over a scrape period of a few minutes.

A notable advantage of the integrated metric can be seen on the bottom two plots that show the length of the in-disk queue. Only the integrated metric (bottom right plot) shows that disk load actually decreased over time, while the randomly scraped one (bottom left plot) just tells us that the disk wasn’t idling…but not more than that. This effect of “hiding” the reality from the observer can also be seen in the plot that shows how the commitlog class queue length was changing over time.
Here we can see a clear rectangular “spike” in the integrated disk queue length plot. That spike was completely hidden by the randomly scraped one. Similarly, the software queue length (upper pair of plots) dynamics are only visible from the integrated counter (upper right plot).

Conclusion

Relying solely on instantaneous metrics, or gauges, to monitor rapidly fluctuating system parameters very often leads to misleading data that poorly reflects the system’s actual behavior over a period of time. While solutions like histograms offer better statistical insights, they incur notable resource overhead.

For metrics where the individual values can be aggregated (like request latencies), converting them into cumulative counters and deriving a rate provides a much more stable and representative average over the scraping interval. This offers a very efficient compromise between informative granularity and resource consumption. For metrics where instantaneous values cannot be simply summed (such as queue lengths), the concept of the integrated gauge offers a generalization with the same efficiency. In essence, by treating the gauge not as a point-in-time value but as a continuously accumulating measure of time-at-value, integrated gauges provide a highly reliable and definitive representation of a parameter’s average behavior across any given measurement interval.

From ScyllaDB to Kafka: Natura’s Approach to Real-Time Data at Scale

How Natura built a real-time data pipeline to support orders, analytics, and operations

Natura, one of the world’s largest cosmetics companies, relies on a network of millions of beauty consultants generating a massive amount of orders, events, and business data every day. From an infrastructure perspective, this requires processing vast amounts of data to support orders, campaigns, online KPIs, predictive analytics, and commercial operations. Natura’s Rodrigo Luchini (Software Engineering Manager) and Marcus Monteiro (Senior Engineering Tech Lead) shared the technical challenges and architecture behind these use cases at Monster SCALE Summit 2025. Fitting for the conference theme, they explained how the team powers real-time sales insights at massive scale by building upon ScyllaDB’s CDC Source Connector.

About Natura

Rodrigo kicked off the talk with a bit of background on Natura. Natura was founded in 1969 by Antônio Luiz Seabra. It is a Brazilian multinational cosmetics and personal care company known for its commitment to sustainability and ethical sourcing. They were one of the first companies to focus on products connected to beauty, health, and self-care. Natura has three core pillars:

  • Sustainability: They are committed to producing products without animal testing and to supporting local producers, including communities in the Amazon rainforest.
  • People: They value diversity and believe in the power of relationships. This belief drives how they work at Natura and is reflected in their beauty consultant network.
  • Technology: They invest heavily in advanced engineering as well as product development.

The Technical Challenge: Managing Massive Data Volume with Real-Time Updates

The first challenge they face is integrating a high volume of data (just imagine millions of beauty consultants generating data and information every single day). The second challenge is having real-time updated data processing for different consumers across disconnected systems.
Rodrigo explained, “Imagine two different challenges: creating and generating real-time data, and at the same time, delivering it to the network of consultants and for various purposes within Natura.” This is where ScyllaDB comes in.

Here’s a look at the initial architecture flow, focusing on the order and API flow. As soon as a beauty consultant places an order, they need to update and process the related data immediately. This is possible for three main reasons. As Rodrigo put it, “The first is that ScyllaDB is a fast database infrastructure that operates in a resilient way. The second is that it is a robust database that replicates data across multiple nodes, which ensures data consistency and reliability. The last but not least reason is scalability – it’s capable of supporting billions of data processing operations.”

The Natura team architected their system as follows: in addition to the order ingestion flow mentioned above, there’s an order metrics flow closely connected to ScyllaDB Cloud (ScyllaDB’s fully managed database-as-a-service offering), as well as the ScyllaDB CDC Source Connector (a Kafka source connector capturing ScyllaDB CDC changes). Together, this enables different use cases in their internal systems. For example, it’s used to determine each beauty consultant’s business plan and also to report data across the direct sales network, including beauty consultants, leaders, and managers. These real-time reports drive business metrics up the chain for accurate, just-in-time decisions. Additionally, the data is used to define strategy and determine what products to offer customers.

Contributing to the ScyllaDB CDC Source Connector

When Natura started testing the ScyllaDB connector, they noticed a significant spike in the cluster’s resource consumption. This continued until the CDC log table was fully processed, then returned to normal. At that point, the team took a step back.
After reviewing the documentation, they learned that the connector operates with small windows (15 seconds by default) for reading the CDC log tables and sending the results to Kafka. However, at startup, these windows are actually based on the table TTL, which ranged from one to three days in Natura’s use case.

Marcus shared: “Now imagine the impact. A massive amount of data, thousands of partitions, and the database reading all of it and staying in that state until the connector catches up to the current time window. So we asked ourselves: ‘Do we really need all the data?’ No. We had already run a full, readable load process for the ScyllaDB tables. What we really needed were just incremental changes – not the last three days, not the last 24 hours, just the last 15 minutes.”

So, as ScyllaDB was adding this feature to the GitHub repo, the Natura team created a new option: scylla.custom.window.start. This let them tell the connector exactly where to start, so they could avoid unnecessary reads and relieve unnecessary load on the database.

Marcus wrapped up the talk with the payoff: “This results in a highly efficient real-time data capture system that streams the CDC events straight to Kafka. From there, we can do anything – consume the data, store it, or move it to any database. This is a gamechanger. With this optimization, we unlocked a new level of efficiency and scalability, and this made a real difference for us.”
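For orientation, here is a hypothetical Kafka Connect configuration sketch showing where an option like scylla.custom.window.start would sit. The connector class, connection settings, and value placeholder below are illustrative assumptions rather than Natura’s actual deployment; consult the scylla-cdc-source-connector documentation for the real property names and accepted value format:

```json
{
  "name": "scylla-cdc-orders",
  "config": {
    "connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
    "scylla.cluster.ip.addresses": "scylla-node-1:9042",
    "scylla.table.names": "shop.orders",
    "scylla.custom.window.start": "..."
  }
}
```

Pointing the start of the CDC read window at a recent timestamp is what lets the connector skip the TTL-deep backfill described above.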

Optimizing a Fast Feature Store for Costs: Lessons Learned

How ShareChat reduced costs for its billion-feature-scale feature store with ScyllaDB

“Great system…now please make it 10 times cheaper.”

That’s not exactly what the ShareChat team wanted to hear after completing a major engineering feat: scaling a real-time feature store 1000X without scaling their database (ScyllaDB). To stretch from supporting 1 million features per second to supporting 1 billion features per second, the team already…

  • Redesigned their database schema to store all features together as protocol buffers and optimized tile configurations (reduced required rows from 2B to 73M/sec)
  • Switched their database compaction strategy from incremental to leveled (doubled database capacity)
  • Forked caching and gRPC libraries to eliminate mutex contention and connection bottlenecks
  • Applied object pooling and garbage collector tuning to reduce memory allocation overhead

You can read about those performance optimizations in Scaling an ML Feature Store From 1M to 1B Features per Second. But ShareChat – an Indian leader in a globally competitive social media market – is always looking to optimize. After reaching this scalability milestone, the team received a follow-up challenge: reducing the feature store’s costs by 10X (without compromising performance, of course). David Malinge (Sr. Staff Software Engineer at ShareChat) and Ivan Burmistrov (then Principal Software Engineer at ShareChat) shared how they approached this new challenge in a keynote at Monster Scale Summit 2025. Watch the complete talk, or read the highlights below.

Note: Monster Scale Summit is a free + virtual conference on extreme scale engineering, with a focus on data-intensive applications.
Learn from luminaries like antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications”; and more than 50 others, including engineers from Discord, Disney, Pinterest, Rivian, LinkedIn, Nextdoor, and American Express. Register (free) and join us March 11-12 for some lively chats.

Background

ShareChat is one of the largest Indian media networks, with over 300 million monthly active users. The app’s popularity stems from its powerful recommendation system – and that’s powered by their feature store. As Malinge explained, “We process more than 2 billion events daily to compute our features and serve close to a billion features per second at peak, with P99 latency under 20 milliseconds. We also read more than 30 billion rows per day in ScyllaDB, so we’re very happy that we’re not charged by the transaction.”

Cloud Cost Optimization: Where Do You Even Start?

When it came time to make this system 10 times cheaper, the team very quickly realized how complicated and confusing cloud cost optimization could be. For example, Burmistrov noted:

  • Cloud billing is cloudy. AWS and GCP each have around 40,000 different SKUs, which makes it really hard to figure out where your money’s actually going. And cloud providers aren’t exactly motivated to make this easier for you to understand and debug.
  • It’s easy to lose track of little things that add up. Maybe you forgot about an instance that’s still running. Maybe there’s an old deployment nobody uses anymore. Or perhaps you have storage buckets filled with data that should have expired by now (e.g., via Time to Live [TTL]). It all accumulates over time.
  • Your cost intuition from on-prem doesn’t necessarily translate to the cloud. Something that was cost-efficient in your own data center might be expensive in the cloud, and what works well on one cloud provider might not be cost-effective on another.
For the first optimization step, the team became sticklers for “hygiene”: clearing out anything they didn’t really need. This involved a deep cleaning for forgotten instances and deployments, unused buckets, data without proper TTLs, and overprovisioning. Having proper attribution was critical here. As Burmistrov explained, “Every workload, every instance, and every resource must be attributed to a specific service and a specific team. Without attribution, cost optimization is a global problem. With attribution, it becomes a local problem. Once each team can clearly see the costs associated with their own services, optimization becomes much easier. The team that owns the service has both the context and the incentive to improve its cost profile. They can see which components are expensive and decide what to do about them.” Cloud Trap 1: Getting Sucked Into Unsustainable Costs Next, they shifted focus to challenges unique to running applications in the cloud. To get the required cost savings, they had to navigate around a few cloud traps. The first trap was the allure and ease of using the cloud provider’s own products – particularly, databases. “They’re great to start with, but at scale they become painful in terms of cost,” explained Burmistrov. Scaling can be particularly costly for solutions that use a metered pricing model (e.g., based on traffic). “We used several GCP databases, including Bigtable and Spanner, but they became expensive and, more importantly, those costs were largely out of our control; we had little leverage over them,” continued Burmistrov. “So we migrated many workloads to ScyllaDB. Today, more than 30 ScyllaDB clusters are deployed across the organization, including for our feature store use case. Besides stability and low latency, ScyllaDB gives us strong cost control. We can choose instance types, merge use cases into a single cluster, and run at high utilization.
ScyllaDB works well at 80%, 90%, even close to 100% utilization, which gives us a very strong cost profile.” Cloud Trap 2: Kubernetes Wastage Next, the team tackled Kubernetes wastage. In Kubernetes, you deploy containers into pods, but you pay for nodes, which are actual VMs. You typically choose node types upfront – for example, nodes with 4 CPUs and 16 GB of memory. However, over time, workloads change, teams change, and these nodes are no longer optimal. Pods cannot fully occupy the nodes, either because of CPU, memory, or scheduling constraints. At that point, you’re paying for resources that aren’t being used. “You may decide, ‘Okay, let’s isolate workloads into separate node pools,’” explained Burmistrov. “But it’s not a solution because, as shown in the image below, we have one node completely wasted.” Simple isolation just moves waste around. The solution: dynamic node allocation based on workloads. Options include open source solutions like Karpenter, cloud-specific ones like GKE Autopilot, or commercial multi-cloud solutions like Cast AI. ShareChat likes Cast AI because it lets them fine-tune the configuration with respect to long-term commitments, spot instances, and other pricing impacts. Cloud Trap 3: Network Costs (“The Cloud Tax”) And then there are network costs, which the ShareChat team calls “the cloud tax.” Massive apps like ShareChat deploy across multiple zones for high availability. However, multiple zones need to communicate with one another, and the traffic between zones comes with its own cost: Network Inter-Zone Egress (NIZE). That’s the outbound network traffic between availability zones within the same region. It’s billed separately by many cloud providers. For example, if you deploy ScyllaDB in three GCP zones with two nodes per zone and use a non-ScyllaDB driver (not recommended), reads may cross zones both at the client level and inside the cluster. For traffic of volume T, you pay for T in NIZE. ScyllaDB drivers help here.
As Burmistrov explained, “Using ScyllaDB’s token-aware routing removes the extra hop inside the cluster, reducing inter-zone traffic. Using token-aware plus zone-aware routing ensures reads stay within the same zone, reducing inter-zone traffic to zero for reads.” Removing the extra hop cuts NIZE down to about 2/3 of T. On the write path, it’s a little different because replication is required. Burmistrov continued, “With three zones, writes generate 2 times T of inter-zone traffic. You can trade availability for cost by using two zones instead of three, reducing this to 1.5 times T. ScyllaDB also lets teams model each zone as a separate data center. In that case, for traffic T, you pay for T of inter-zone traffic. You generally can’t go lower unless you deploy in a single zone.” Tip: Oracle Cloud and Azure don’t charge for inter-zone traffic. If you use one of these cloud providers, you can deploy across three zones with zero inter-zone cost. Database Layer Optimization Another challenge: ShareChat’s generally stable read latencies would spike beyond their SLAs when write-heavy workloads occurred (like a backfill job, for example). The usual option – just scale up the database – would be simple, but the team wanted to reduce costs. Isolating reads and writes into separate data centers could technically improve performance, but that would be even worse from the cost perspective. That approach would double the infrastructure and probably lead to underutilization. Ultimately, they discovered and applied ScyllaDB’s workload prioritization. This lets them control how their various workloads compete for system resources. Malinge explained, “Under the hood, this relies on ScyllaDB’s thread-per-core architecture. Each core runs an async scheduler with multiple queues, each with its own priority. Workload prioritization maps directly to these priorities.
Looking at the image, we have a very high share of compute for serving because we have strong SLAs on the read path, which is user facing. Most of the rest goes to less latency-sensitive workloads from asynchronous jobs, like writing the features to the database. We also keep a tiny share for manual queries (debugging, for example). This enables us to have exactly what we want – different latencies for reads and writes – and that allowed us to have the best possible serving for our features.” Feature Serving Layer Optimization That serving layer is a distributed gRPC (Google Remote Procedure Call) service responsible for caching and handling consumer requests. The team made some cost optimizations here too. One of the most impactful was a clever shortcut: instead of fully deserializing complex Protobuf messages, they treated repeated messages as serialized byte blobs. Since Protobuf encodes embedded messages using the same wire format as bytes, they could simply append these serialized records to merge data. With that, they could skip the heavy lift of fully unpacking and rebuilding the messages from scratch. That optimization alone could be the subject of an entire article; please see the video (starting at 18:15) for a detailed walkthrough of their approach. Malinge’s top lesson learned from optimizing this layer: “The first advice is the Captain Obvious one: don’t go blindly. We set up continuous profiling very early on. Nowadays, there are tons of tools available. Integration is super straightforward. You might recognize the vanilla GCP profiler in the image below. It’s just four lines of code to add in Go programs, and this has really helped us on our quest to reduce costs. We’ve actually unlocked more than a 50% reduction in compute by doing optimizations guided by continuous profiling.” Another lesson learned: if you’re using protobuf at high-RPS, serde is probably burning your CPU. 
Even with ~95% cache hit rates, the hot path was still deserializing protos, merging them, and serializing them again just to stitch responses together. Most of that work was pointless. The team ended up leveraging protobuf’s wire format, where repeated fields are just sequences of records and embedded messages can be treated as raw bytes (no deserialization). They switched to this “lazy” deserialization and merged cached protos without ever touching individual fields. For a practical example, see this repository david-sharechat/lazy-proto. And one parting tip: the wins compound fast. Benchmarking showed lazy merging was about 6 times faster, with about a third of the allocations. In production, that meant lower compute bills and better tail latency. Cost Optimization Actually Fostered Innovation The team’s smart work here underscores that cost optimizations don’t have to compromise the product. They can actually act as a catalyst for better system design. Malinge left us with this: “Our cost savings initiatives actually led to a lot of innovations. As I tried to represent in the left quadrant, there is a very unhealthy way to look at cost savings – one that focuses on shortcuts that negatively impact your product. The key is to stay on the right side of this quadrant, to aim for the sweet spot of reducing costs while improving the product.”

How Agoda Scaled Its Feature Store 50X

Lessons learned on data modeling, cache optimization, and hardware selection Agoda is the Singapore-based arm of Booking Holdings, the world’s leading provider of online travel (the brand behind Booking.com, Kayak, Priceline, etc.). From January 2023 to February 2025, Agoda server traffic grew 50X. That’s fantastic business growth, but also the trigger for an interesting engineering challenge. Specifically, the team had to determine how to scale their ScyllaDB-backed online feature store to maintain 10ms P99 latencies despite this growth. Complicating the situation, traffic was highly bursty, cache hit rates were unpredictable, and cold-cache scenarios could flood the database with duplicate read requests in a matter of seconds. At Monster Scale Summit 2025, Worakarn Isaratham, lead software engineer at Agoda, shared how they tackled the challenge. You can watch his entire talk or read the highlights below. Note: Monster Scale Summit is a free + virtual conference on extreme scale engineering, with a focus on data-intensive applications. Learn from luminaries like antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications” and more than 50 others, including engineers from Discord, Disney, Pinterest, Rivian, LinkedIn, Nextdoor, and American Express. Register (free) and join us March 11-12 for some lively chats. Background: A feature store powered by ScyllaDB and DragonflyDB Agoda operates an in-house feature store that supports both offline model training and online inference. For anyone not familiar with feature stores, Isaratham provided a quick primer. A feature store is a centralized repository designed for managing and serving machine learning features. In the context of machine learning, a feature is a measurable property or characteristic of a data point used as input to models.
The feature store helps manage features across the entire machine learning pipeline — from data ingestion to model training to inference. Feature stores are integral to Agoda’s business. Isaratham explained: “We’re a digital travel platform and some use cases are directly tied to our product. For example, we try to predict what users want to see, which hotels to recommend and what promotions to serve. On the more technical side, we use it for things like bot detection. The model uses traffic patterns to predict whether a user is a bot, and if so, we can block or deprioritize requests. So the feature store is essential for both product and engineering at Agoda. We’ve got tools to help create feature ingestion pipelines, model training, and the focus here: online feature serving.” One layer deeper into how it works: “We’re currently serving about 3.5 million entities per second (EPS) to our users. About half the features are served from cache within the client SDK, which we provide in Scala and Python. That means 1.7 million entities per second reach our application servers. These are written in Rust, running in our internal Kubernetes pods in our private cloud. From the app servers, we first check if features exist in the cache. We use DragonflyDB as a non-persistent centralized cache. If it’s not in the cache, then we go to ScyllaDB, our source of truth.” ScyllaDB is a high-performance database for workloads that require ultra-low latency at scale. Agoda’s current ScyllaDB cluster is deployed as six bare-metal nodes, replicated across four data centers. Under steady-state conditions, ScyllaDB serves about 200K entities per second across all data centers while meeting a service-level agreement (SLA) of 10ms P99 latency. (In practice, their latencies are typically even lower than their SLA requires.) Traffic growth and bursty workloads However, it wasn’t always that smooth and steady. 
Around mid-2023, they hit a major capacity problem when a new user wanted to onboard to the Agoda feature store. Their traffic pattern was super bursty: It was normally low, but occasionally flooded them with requests triggered by external signals. These were cold-cache scenarios, where the cache couldn’t help. Isaratham shared, “Bursts reached 120K EPS, which was 12 times the normal load back then.” Request duplication exacerbated the situation. Many identical requests arrived in quick succession. Instead of one request populating the cache and subsequent requests benefiting, all of them hit ScyllaDB at the same time — a classic cache stampede. They also retried failed requests until they succeeded – and that kept the pressure high. This load involved two data centers. One slowed down but remained online. The other was effectively taken out of service. More details from Worakarn: “On the bad DC, error rates were high and retries took 40 minutes to clear; on the good one, it only took a few minutes. Metrics showed that ScyllaDB read latency spiked into seconds instead of milliseconds.” Diagnosing the bottleneck So, they compared setups and found the difference: the problematic data center used SATA SSDs while the better one used NVMe SSDs. SATA (serial advanced technology attachment) was already old tech, even then. The team’s speed tests suggested that replacing the disks would yield a 10X read performance boost – and better write rates too. The team ordered new disks immediately. However, given that the disks wouldn’t arrive for months, they had to figure out a survival strategy until then. As Isaratham shared, “Capacity tests and projections showed that we would hit limits within eight or nine months even without new load – and sooner with it. So, we worked with users to add more aggressive client-side caching, remove unnecessary requests and smooth out bursts. That reduced the new load from 120K to 7K EPS. 
That was enough to keep things stable, but we were still close to the limit.” Surviving with SATA Given the imminent capacity cap, the team brainstormed ways to improve the situation while still on the existing SATA disks. Since you have to measure before you can improve, getting a clean baseline was the first order of business. “The earlier capacity numbers were from real-world traffic, which included caching effects,” Isaratham detailed. “We wanted to measure cold-cache performance directly. So, we created artificial load using one-time-use test entities, bypassed cache in queries and flushed caches before and after each run. The baseline read capacity on the bad DC was 5K EPS.” With that baseline set, the team considered a few different approaches. Data modeling All features from all feature sets were stored in a single table. The team hoped that splitting tables by feature set might improve locality and reduce read amplification. It didn’t. They were already partitioning by feature set and entity, so the logical reorganization didn’t change the physical layout. Compaction strategy Given a read-heavy workload with frequent updates, ScyllaDB documentation recommends the size-tiered compaction strategy to avoid write amplification. But the team was most concerned about read latency, so they took a different path. According to Worakarn: “We tried leveled compaction to reduce the number of SSTables per read. Tests showed fetching 1KB of data required reading 70KB from disk, so minimizing SSTable reads was key. Switching to leveled compaction improved throughput by about 50%.” Larger SSTable summaries ScyllaDB uses summary files to more efficiently navigate index files. Their size is controlled by the sstable_summary_ratio setting. Increasing the ratio enlarges the summary file and that reduces index reads at the cost of additional memory. The team increased the ratio by 20 times, which boosted capacity to 20K EPS. 
This yielded a nice 4X improvement, so they rolled it out immediately. What a difference a disk makes Finally, the NVMe disks arrived a few months later. This one change made a massive difference. Capacity jumped to 300K EPS, a staggering 50–60X improvement. The team rolled out improvements in stages: first, the summary ratio tweak (for 2–3X breathing room), then the NVMe upgrade (for 50X capacity). They didn’t apply leveled compaction in production because it only affects new tables and would require migration. Anyway, NVMe already solved the problem. After that, the team shifted focus to other areas: improving caching, rewriting the application in Rust and adding cache stampede prevention to reduce the load on ScyllaDB. They still revisit ScyllaDB occasionally for experiments. A couple of examples: New partitioning scheme: They tried partitioning by feature set only and clustering by entity. However, performance was actually worse, so they didn’t move forward with this idea. Data remodeling: The application originally stored one row per feature. Since all features for an entity are always read together, the team tested storing all features in a single row instead. This improved performance by 35%, but it requires a table migration. It’s on their list of things to do later. Lessons learned Isaratham wrapped it up as follows: “We’d been using ScyllaDB for years without realizing its full potential, mainly because we hadn’t set it up correctly. After upgrading disks, benchmarking and tuning data models, we finally reached proper usage. Getting the basics right – fast storage, knowing capacity, and matching data models to workload – made all the difference. That’s how ScyllaDB helped us achieve 50X scaling.”

How ShareChat Cut Recommendation Engine Costs 90%, Step by Step

From managed Google Cloud services to ScyllaDB, Redpanda, Flink, and beyond The engineering team at ShareChat, India’s largest social media platform, decided it was time to take a hard look at the mounting costs of running their recommendation system. The content deduplication service at its core was built on Google Cloud’s managed stack: Bigtable, Dataflow, Pub/Sub. That foundation helped ShareChat get up and running rapidly. However, costs spiked as they scaled to 300 million users generating billions of daily interactions. And the 40 ms P99 latency was not ideal for customer-facing apps. If you’ve followed ScyllaDB for a while, you probably know that the ShareChat engineers never shy away from monster scale engineering challenges. For example:

- Pulling off a zero-downtime migration of massive clusters (65+ TB, over 1M ops/sec)
- Using Kafka Streams for real-time, windowed aggregation of massive engagement events while keeping ScyllaDB writes minimal and reads sub-millisecond
- Scaling their ScyllaDB-based feature store 1000X without scaling the database – and then making it 10X cheaper

Spoiler: They pulled off yet another masterful optimization with their deduplication service (90% cost reduction while lowering P99 latency from 40 ms to 8 ms). And again, they generously shared their strategies and lessons learned with the community – this time at Monster Scale Summit 2025. Watch Andrei Manakov (Senior Staff Software Engineer at ShareChat) detail the step-by-step optimizations, or read the highlights below. Note: Monster Scale Summit is a free + virtual conference on extreme scale engineering, with a focus on data-intensive applications. Learn from leaders like antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications” and more than 50 others, including engineers from Discord, Disney, Pinterest, Rivian, LinkedIn, and Freshworks.
ShareChat will be delivering two talks this year. Register and join us March 11-12 for some lively chats. Background ShareChat is India’s largest social media platform, with more than 180 million monthly active users and 2.5 billion interactions per month. Its sister product, Moj, is a short-video app similar to TikTok. Moj has over 160 million monthly active users and 6 billion video plays daily. Both apps rely on a smart feed of personalized recommendations. Effective deduplication within that feed is critical for engagement. If users keep seeing the same items over and over again, they’re less likely to return. The deduplication service filters out content users have already seen before the ranking models ever score it. It’s a standard part of the feed pipeline: fetch candidates from vector databases, filter them, score and rank using ML models, then return the top results. The system tracks two distinct concepts: Sent posts and Viewed posts. Sent posts are delivered to the user’s device. They may or may not have been seen. Viewed posts are items that the user has actually seen. Why both? Viewed posts are tracked on user devices and sent to the backend periodically, passing through queues before being persisted in the database. This creates a delay between the time the event occurs and when it’s stored. That delay makes it possible to send duplicate posts in sequential feed requests. Sent posts solve this by being stored immediately before the response goes to the client. That prevents duplicates from sneaking into subsequent feed refreshes. So why not just use Sent posts? That would create a different problem: “In a feed response, we send multiple posts at once, but there’s no guarantee that the user will actually see all of them,” Andrei explained. 
“If we rely only on Sent posts, we might deduplicate relevant content that the user never saw…and that can negatively affect feed quality.” The Initial Architecture: Google Cloud, Bigtable, Dataflow, Pub/Sub + Redis The initial architecture was built entirely on Google Cloud. View events flowed into Google Pub/Sub. A Dataflow job aggregated these events per user over time windows and wrote batches into Bigtable. Sent posts lived in Redis as linked-list queues. A deduplication service (written in Go and running on Kubernetes) combined both data sources with feed logic and returned results to the client. It worked. But it was expensive, and latency was a suboptimal 40 milliseconds. Moving to Flink, ScyllaDB, and Redpanda The first optimization step was replacing Google’s managed services with specialized, efficiency-oriented alternatives. Andrei shared, “We replaced Google Pub/Sub with Redpanda, which is Kafka-compatible and more efficient. We replaced Google Bigtable with ScyllaDB, and Google Dataflow with Apache Flink. These changes alone brought costs down to about 60% of the original level.” Dataflow to Flink Dataflow was used for streaming jobs that aggregated events. The team enjoyed having a fully managed service with a simplified API that was nice for prototyping. But at scale, it grew costly. Moving to Apache Flink meant they had to run and maintain their own cluster. Still, it was worth it. They cut the cost of streaming jobs by 93%. Bigtable to ScyllaDB On the database front, the team moved from Bigtable to ScyllaDB after comparing their tradeoffs, similarities, and differences. Andrei explained, “First of all, both databases are optimized for heavy write workload. Also, we are able to use a quite similar schema: partition by user ID, store all posts in a single column.” But there were key differences. Bigtable’s autoscaling was valuable given ShareChat’s traffic variability (nighttime traffic drops to less than 10% of peak).
At that time, ScyllaDB did not support autoscaling, so ShareChat had to provision clusters for peak load and keep additional buffers for spikes. Bigtable also has advanced garbage collection rules that allow limits on rows per partition and TTL definitions. ScyllaDB supports this via TTL, and it’s possible to encounter large partitions if a single user generates many events. For ShareChat’s needs, ScyllaDB was ultimately the best fit. Andrei explained, “ScyllaDB uses a shard-per-core architecture and is highly optimized. It delivers better performance and lower latency than Bigtable. Even a fully provisioned ScyllaDB cluster is significantly cheaper than Bigtable with autoscaling.” One feature proved particularly valuable: workload prioritization. Per Andrei, “ScyllaDB also supports workload prioritization, which is unique to ScyllaDB. It allows resource-based isolation between different workloads. That’s quite critical in our case. The read path directly affects user experience, so we can’t allow any latency spikes there.” If there’s ever any resource contention, workload prioritization can prioritize their reads over writes. PubSub to Redpanda For the event queue, the team also moved from PubSub to Redpanda, which they found to be reliable as well as cost-efficient. It’s a fully Kafka-compatible product, so it was easy to integrate. Like ScyllaDB, Redpanda is written in C++ with a thread-per-core architecture (both are built on Seastar), which makes it highly efficient. In ShareChat’s use case, queue costs were reduced by ~66%. Resource Allocation and Database Efficiency With the new foundational infrastructure in place, the team turned to resource optimization. First came Kubernetes resource tuning. Andrei detailed, “We focused on three areas, starting with Kubernetes resources.
We adjusted HPA settings and increased target CPU utilization, which resulted in higher CPU usage per pod.” That change had a useful side effect: it exposed inefficiencies that had been lurking at lower utilization levels, including garbage collection pressure and mutex contention. The team fixed those issues before moving on. Apache Flink required its own autoscaling adjustments. A Flink job consists of tasks in an execution pipeline, and the autoscaler monitors task busy time and adjusts parallelism accordingly. But scaling events recreate the job and temporarily halt processing. “With large pipelines, this can lead to instability,” Andrei shared. “For some systems, we had to fork the autoscaler, but for this application, it worked smoothly.” They also discovered they could squeeze even more from their ScyllaDB cluster, so they rightsized it: “We provisioned ScyllaDB clusters targeting around 60% CPU utilization for ScyllaDB. But after learning more, we realized that ScyllaDB is built to utilize all available resources and prioritizes critical tasks over non-critical ones. That means it’s easy to reach 100% CPU without any system degradation by allocating the remaining CPU resources to maintenance tasks. We started targeting 60% CPU utilization only for critical tasks and we could reduce the cluster size even more without any latency degradations.” The details on how they managed this are explained in Andrei’s blog post, ScyllaDB: How To Analyse Cluster Capacity. The combination of those steps brought them down to 45% of the initial cost. Questioning Assumptions: Retention Windows Example Next up: domain-specific engineering improvements. “In theory, every system should have a clear problem statement, defined metrics, and well-understood use cases,” Andrei said. “In reality, systems often contain unclear logic, unexplained parameters, and legacy decisions. 
We decided to question these assumptions.” One example that’s easy to explain without too much context is how they chose the right window size parameters for their sliding window queries. The team focused on the retention window parameters (M and N) that defined how much history to keep per user. Their posts are partitioned by user ID and ordered by time. That creates a sliding window. Andrei explained, “The question is how to choose the right values. The answer is A/B testing.” So, they split users into groups with different parameter configurations and measured the impact. It turned out that their previous windows were too large, so they tightened them. The results per Andrei: “After running more than 10 A/B tests, we reduced latency from 40 milliseconds to 8 milliseconds, were able to reduce the database size, and reduced costs to 30% of the initial level.” Storage Size Optimization through Delta-Based Compression The next optimization targeted storage size directly. ScyllaDB read performance depends heavily on cache behavior. Smaller data size means more rows fit into cache, leading to higher cache hit rates, fewer disk reads, and lower CPU usage. The team stores post IDs as integers and evaluated several compression approaches: Bloom filters, Roaring Bitmaps, and delta-based compression. Andrei’s rundown: “Bloom filters are probabilistic and can produce false positives, which would cause duplicates in the feed. Roaring Bitmaps performed poorly for our data distribution. We chose delta-based compression.” With that approach, they sort post IDs, compute deltas between adjacent values, and encode them using Protocol Buffers, which store only meaningful bits for integers. As a result, they reduced storage by 60%, CPU usage by 25%, and overall cost to 20% of the original. Sent Post Optimization: Replacing Redis with ScyllaDB The next optimization involved Sent posts. Previously, Viewed posts were stored in ScyllaDB, while Sent posts were stored in Redis.
But then came another a-ha moment. Andrei shared, “After compression and cluster resizing, we realized that ScyllaDB was cheaper per gigabyte than Redis. Since Viewed posts and Sent posts are conceptually similar and differ mainly in retention limits, we moved Sent posts to ScyllaDB. This reduced costs to 17% and also improved system maintainability by eliminating one database.” Cloud-Specific Cost Optimizations The final phase focused on cloud provider-specific optimizations – more specifically, systematically eliminating waste in resource allocation and network traffic. The first target was compute provisioning. Every cloud provider has different provisioning models for virtual machines. Spot instances can be evicted at any time, but they are ideal for stateless workloads. ShareChat used spot instances with on-demand nodes as backup. Tools like Cast AI helped them manage this setup and significantly reduced costs across the organization. Next was inter-zone traffic. Zones are separate data centers within a region, and traffic between them is usually charged. The team cut cross-zone traffic for microservices using follower fetching for Kafka, token-aware and rack-aware routing for ScyllaDB, and topology-aware routing in Kubernetes:

- For Kafka consumers, enabling the follower fetching feature ensures the consumer connects to a broker in the same zone. That eliminates inter-zone traffic between the consumer and the queue.
- For ScyllaDB, combining token-aware and rack-aware policies in the driver ensures the application connects to a coordinator in the same zone. Since the coordinator will actually have the data, there is less traffic inside the cluster.
- For Kubernetes, topology-aware routing uses a best-effort approach, so approximately 5–15% of inter-zone traffic remains. The team built their own service mesh on Envoy proxy to eliminate that remaining traffic entirely.
The path to 90% cost savings Over the course of a year, ShareChat’s team reduced infrastructure costs to 10% of the original level while cutting latency from 40 milliseconds to 8 milliseconds. To recap how they got here:

| Phase | Action taken | Remaining cost |
|---|---|---|
| Optimize stack | Migrated from GCP managed services to ScyllaDB, Flink & Redpanda | 60% |
| Resource tuning | Tuned Kubernetes HPA, optimized Flink’s native autoscaling, and rightsized the ScyllaDB cluster | 45% |
| Domain logic | A/B testing (e.g., tightening retention sliding windows) | 30% |
| Data efficiency | Implemented delta-based compression for post IDs | 20% |
| Database consolidation | Migrated Sent posts from Redis to ScyllaDB | 17% |
| Cloud ops | Leveraged spot instances and eliminated inter-zone traffic | 10% |

What’s next on the ShareChat optimization agenda? Stop by Monster Scale Summit 2026 (free + virtual) to hear directly from Andrei.

Books by Monster SCALE Summit Speakers: Data-Intensive Applications & Beyond

Monster Scale Summit speakers have amassed a rather impressive list of publications, including quite a few books

If you’ve seen the Monster Scale Summit agenda, you know that the stars have aligned nicely. In just two half days, from anywhere you like, you can learn from 60+ outstanding speakers exploring extreme scale engineering challenges from a variety of angles (distributed databases, real-time data, AI, Rust…). If you read the bios of our speakers, you’ll note that many have written books. This blog highlights 11 Monster Scale reads.

Once you register for the conference (it’s free + virtual), you’ll gain 30-day full access to the complete O’Reilly library (thanks to O’Reilly, a conference media sponsor). Manning Publications is also a media sponsor; they are offering the Monster SCALE community a nice 45% discount on all Manning books. Conference attendees who participate in the speaker chat will be eligible to win print books (courtesy of O’Reilly) and eBook bundles (courtesy of Manning).

March 4 update: There’s also a special Manning offer (50% off!) for Monster Scale bundles.

See the agenda and register – it’s free

Platform Engineering: A Guide for Technical, Product, and People Leaders
By Camille Fournier and Ian Nowland
October 2024
Bookshop.org | Amazon | O’Reilly

Until recently, infrastructure was the backbone of organizations operating software they developed in-house. But now that cloud vendors run the computers, companies can finally bring the benefits of agile customer-centricity to their own developers. Adding product management to infrastructure organizations is now all the rage. But how’s that possible when infrastructure is still the operational layer of the company? This practical book guides engineers, managers, product managers, and leaders through the shifts required to become a modern platform-led organization. You’ll learn what platform engineering is “and isn’t” and what benefits and value it brings to developers and teams.
You’ll understand what it means to approach your platform as a product and learn some of the most common technical and managerial barriers to success. With this book, you’ll:

- Cultivate a platform-as-product, developer-centric mindset
- Learn what platform engineering teams are and are not
- Start the process of adopting platform engineering within your organization
- Discover what it takes to become a product manager for a platform team
- Understand the challenges that emerge when you scale platforms
- Automate processes and self-service infrastructure to speed development and improve developer experience
- Build out, hire, manage, and advocate for a platform team

Camille is presenting “What Engineering Leaders Get Wrong About Scale”

Designing Data-Intensive Applications, 2nd Edition
By Martin Kleppmann and Chris Riccomini
February 2026
Bookshop.org | Amazon | O’Reilly

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there’s an overwhelming variety of tools and analytical systems, including relational databases, NoSQL datastores, plus data warehouses and data lakes. What are the right choices for your application? How do you make sense of all these buzzwords? In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You’ll be guided through the maze of decisions and trade-offs involved in building a modern data system, from choosing the right tools like Spark and Flink to understanding the intricacies of data laws like the GDPR.
- Peer under the hood of the systems you already use, and learn to use them more effectively
- Make informed decisions by identifying the strengths and weaknesses of different tools
- Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
- Understand the distributed systems research upon which modern databases are built
- Peek behind the scenes of major online services, and learn from their architectures

Martin and Chris are featured in “Fireside Chat: Designing Data-Intensive Applications, Second Edition”

The Coder Cafe: 66 Timeless Concepts for Software Engineers
By Teiva Harsanyi
ETA Summer 2026
Manning (use code SCALE2026 for 45%)

Great software developers—even the proverbial greybeards—get a little better every day by adding knowledge and skills continuously. This new book invites you to share a cup of coffee with senior Google engineer Teiva Harsanyi as he shows you how to make your code more readable, use unit tests as documentation, reduce latency, navigate complex systems, and more. The Coder Cafe introduces 66 vital software engineering concepts that will upgrade your day-to-day practice, regardless of your skill level. You’ll find focused explanations—each five pages or less—on everything from foundational data structures to distributed architecture. These timeless concepts are the perfect way to turn your coffee break into a high-impact career boost.

- Generate property-based tests to expose hidden edge cases automatically
- Explore the CAP and PACELC theorems to balance consistency and availability trade-offs
- Design graceful-degradation strategies to keep systems usable under failure
- Leverage Bloom filters to perform fast, memory-efficient membership checks
- Cultivate lateral thinking to uncover unconventional solutions

Teiva is presenting “Working on Complex Systems”

Think Distributed Systems
By Dominik Tornow
August 2025
Bookshop.org | Amazon | Manning (use code SCALE2026 for 45%)

All modern software is distributed.
Let’s say that again—all modern software is distributed. Whether you’re building mobile utilities, microservices, or massive cloud native enterprise applications, creating efficient distributed systems requires you to think differently about failure, performance, network services, resource usage, latency, and much more. This clearly written book guides you into the mindset you’ll need to design, develop, and deploy scalable and reliable distributed systems. In Think Distributed Systems you’ll find a beautifully illustrated collection of mental models for:

- Correctness, scalability, and reliability
- Failure tolerance, detection, and mitigation
- Message processing
- Partitioning and replication
- Consensus

Dominik is presenting “Systems Engineering for Agentic Applications,” which is the topic of his upcoming book

Database Performance at Scale
By Felipe Cardeneti Mendes, Piotr Sarna, Pavel Emelyanov, and Cynthia Dunlop
September 2023
Amazon | ScyllaDB (free book)

Discover critical considerations and best practices for improving database performance based on what has worked, and failed, across thousands of teams and use cases in the field. This book provides practical guidance for understanding the database-related opportunities, trade-offs, and traps you might encounter while trying to optimize data-intensive applications for high throughput and low latency. Whether you’re building a new system from the ground up or trying to optimize an existing use case for increased demand, this book covers the essentials. The ultimate goal of the book is to help you discover new ways to optimize database performance for your team’s specific use cases, requirements, and expectations.
- Understand often overlooked factors that impact database performance at scale
- Recognize data-related performance and scalability challenges associated with your project
- Select a database architecture that’s suited to your workloads, use cases, and requirements
- Avoid common mistakes that could impede your long-term agility and growth
- Jumpstart teamwide adoption of best practices for optimizing database performance at scale

Felipe is presenting “The Engineering Behind ScyllaDB’s Efficiency” and will be hosting the ScyllaDB Lounge

Writing for Developers: Blogs That Get Read
By Piotr Sarna and Cynthia Dunlop
December 2025
Bookshop.org | Amazon | Manning (use code SCALE2026 for 45%)

Think about how many times you’ve read an engineering blog that’s sparked a new idea, demystified a technology, or saved you from going down a disastrous path. That’s the power of a well-crafted technical article! This practical guide shows you how to create content your fellow developers will love to read and share. Writing for Developers introduces seven popular “patterns” for modern engineering blogs—such as “The Bug Hunt”, “We Rewrote It in X”, and “How We Built It”—and helps you match these patterns with your ideas. The book covers the entire writing process, from brainstorming, planning, and revising, to promoting your blog and leveraging it into further opportunities. You’ll learn through detailed examples, methodical strategies, and a “punk rock DIY attitude!”:

- Pinpoint topics that make intriguing posts
- Apply popular blog post design patterns
- Rapidly plan, draft, and optimize blog posts
- Make your content clearer and more convincing to technical readers
- Tap AI for revision while avoiding misuses and abuses
- Increase the impact of all your technical communications

This book features the work of numerous Monster Scale Summit speakers, including Joran Greef and Sanchay Javeria. There’s also a reference to Pat Helland in Bryan Cantrill’s foreword.
ScyllaDB in Action
By Bo Ingram
October 2024
Bookshop.org | Amazon | Manning (use code SCALE2026 for 45%) | ScyllaDB (free chapters)

ScyllaDB in Action is your guide to everything you need to know about ScyllaDB, from your very first queries to running it in a production environment. It starts you with the basics of creating, reading, and deleting data and expands your knowledge from there. You’ll soon have mastered everything you need to build, maintain, and run an effective and efficient database. This book teaches you ScyllaDB the best way—through hands-on examples. Dive into the node-based architecture of ScyllaDB to understand how its distributed systems work, how you can troubleshoot problems, and how you can constantly improve performance. You’ll learn how to:

- Read, write, and delete data in ScyllaDB
- Design database schemas for ScyllaDB
- Write performant queries against ScyllaDB
- Connect and query a ScyllaDB cluster from an application
- Configure, monitor, and operate ScyllaDB in production

Bo’s colleagues Ethan Donowitz and Peter French are presenting “How Discord Automates Database Operations at Scale”

Latency: Reduce Delay in Software Systems
By Pekka Enberg
Bookshop.org | Amazon | Manning (use code SCALE2026 for 45%)

Slow responses can kill good software. Whether it’s recovering microseconds lost while routing messages on a server or speeding up page loads that keep users waiting, finding and fixing latency can be a frustrating part of your work as a developer. This one-of-a-kind book shows you how to spot, understand, and respond to latency wherever it appears in your applications and infrastructure. This book balances theory with practical implementations, turning academic research into useful techniques you can apply to your projects.
In Latency you’ll learn:

- What latency is—and what it is not
- How to model and measure latency
- Organizing your application data for low latency
- Making your code run faster
- Hiding latency when you can’t reduce it

Pekka and his Turso teammates are frequent speakers at Monster Scale Summit’s sister conference, P99 CONF

100 Go Mistakes and How to Avoid Them
By Teiva Harsanyi
August 2022
Bookshop.org | Amazon | Manning (use code SCALE2026 for 45%)

100 Go Mistakes and How to Avoid Them puts a spotlight on common errors in Go code you might not even know you’re making. You’ll explore key areas of the language such as concurrency, testing, data structures, and more—and learn how to avoid and fix mistakes in your own projects. As you go, you’ll navigate the tricky bits of handling JSON data and HTTP services, discover best practices for Go code organization, and learn how to use slices efficiently. This book shows you how to:

- Dodge the most common mistakes made by Go developers
- Structure and organize your Go application
- Handle data and control structures efficiently
- Deal with errors in an idiomatic manner
- Improve your concurrency skills
- Optimize your code
- Make your application production-ready and improve testing quality

Teiva is presenting “Working on Complex Systems”

The Manager’s Path: A Guide for Tech Leaders Navigating Growth and Change
By Camille Fournier
May 2017
Bookshop.org | Amazon | O’Reilly

Managing people is difficult wherever you work. But in the tech industry, where management is also a technical discipline, the learning curve can be brutal—especially when there are few tools, texts, and frameworks to help you. In this practical guide, author Camille Fournier (tech lead turned CTO) takes you through each stage in the journey from engineer to technical manager. From mentoring interns to working with senior staff, you’ll get actionable advice for approaching various obstacles in your path.
This book is ideal whether you’re a new manager, a mentor, or a more experienced leader looking for fresh advice. Pick up this book and learn how to become a better manager and leader in your organization.

- Begin by exploring what you expect from a manager
- Understand what it takes to be a good mentor, and a good tech lead
- Learn how to manage individual members while remaining focused on the entire team
- Understand how to manage yourself and avoid common pitfalls that challenge many leaders
- Manage multiple teams and learn how to manage managers
- Learn how to build and bootstrap a unifying culture in teams

Camille is presenting “What Engineering Leaders Get Wrong About Scale”

The Missing README: A Guide for the New Software Engineer
By Chris Riccomini and Dmitriy Ryaboy
August 2021
Bookshop.org | Amazon | O’Reilly

For new software engineers, knowing how to program is only half the battle. You’ll quickly find that many of the skills and processes key to your success are not taught in any school or bootcamp. The Missing README fills in that gap—a distillation of workplace lessons, best practices, and engineering fundamentals that the authors have taught rookie developers at top companies for more than a decade. Early chapters explain what to expect when you begin your career at a company. The book’s middle section expands your technical education, teaching you how to work with existing codebases, address and prevent technical debt, write production-grade software, manage dependencies, test effectively, do code reviews, safely deploy software, design evolvable architectures, and handle incidents when you’re on-call. Additional chapters cover planning and interpersonal skills such as Agile planning, working effectively with your manager, and growing to senior levels and beyond.
You’ll learn:

- How to use the legacy code change algorithm, and leave code cleaner than you found it
- How to write operable code with logging, metrics, configuration, and defensive programming
- How to write deterministic tests, submit code reviews, and give feedback on other people’s code
- The technical design process, including experiments, problem definition, documentation, and collaboration
- What to do when you are on-call, and how to navigate production incidents
- Architectural techniques that make code change easier
- Agile development practices like sprint planning, stand-ups, and retrospectives

Chris is featured in “Fireside Chat: Designing Data-Intensive Applications, Second Edition” (with Martin Kleppmann)