17 March 2026, 12:23 pm by ScyllaDB
Why integrated gauges are a better approach for measuring frequently changing values

Many performance metrics and system parameters are inherently volatile or fluctuate rapidly.
When using a monitoring system that periodically “scrapes” (polls)
a target for its current metric value, the collected data point is
merely a snapshot of the system’s state at that precise moment. It
doesn’t reveal much about what’s actually happening in that area.
Sometimes it’s possible to overcome this problem by accumulating
those values somehow – for example, by using histograms or
exporting a derived monotonically increasing counter. This article
suggests yet another way to extend this approach for a broader set
of frequently changing parameters.

The Problem with Instantaneous Metrics

For rapidly changing values such as queue lengths or
request latencies, a single snapshot is most often misleading. If
the scrape happens during a peak load, the value will appear high.
If it happens during an idle moment, the value will appear low.
Over time, the sampled data usually does not accurately represent
the system’s true average behavior, leading to misinformed alerting
or capacity planning. This phenomenon is apparent when examining
synthetically generated data. Take a look at the example below. On
that plot, there’s a randomly generated XY series. The horizontal
axis represents the event number, the vertical axis represents the
frequently fluctuating value. The “value” changes on every event.
Fig.1 Random frequently changing data

Now let’s thin the
data out and see what would happen if we show only every 40th
point, as if it were a real monitoring system that captures the
value at a notably lower rate. The next plot shows what it would
look like.
Fig.2 Sampling every 40th point from the data

Clearly,
this is a very poor approximation of the first plot. It becomes
crystal clear how poor it is if we zoom into the range [80-120] of
both plots. For the latter (dispersed) plot, we’ll see a couple of
dots with values of 2 and 0. But the former (original) plot would
reveal drastically different information, shown below.
Fig.3 Zoom-in range [80-120] from the data

Now remember
that the problem revealed itself at the ratio of the real rate to
scrape rate being just 40. In real systems, internal parameters may
change thousands of times per second while the scrape period is
minutes. In that case, you end up scraping less than a fraction of
a percent of the data points. A better way to monitor frequently changing data is needed…badly.

Histograms

Request latency is one example of a
statistic that usually contains many data points per second.
Histograms can help you see how slow or fast the requests are –
without falling into the problem described above. A monitoring
histogram is a type of metric that samples observations and counts
them in configurable buckets. Instead of recording just a single
average or maximum latency value, a histogram provides a
distribution of these values over a given period. By analyzing the
bucket counts, you can calculate percentiles, including p50 (known
as the median value) or p99 (the value below which 99% of all
requests fell). This provides a much more robust and actionable
view of performance volatility than simple instantaneous gauges or
averages.
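The bucket-counting idea can be sketched in a few lines (a toy illustration, not Prometheus’s actual implementation; the bucket bounds and the lower-bound percentile estimate are arbitrary choices here):

```python
import bisect

# Upper bounds of latency buckets, in milliseconds (arbitrary example).
BUCKETS = [1, 2, 5, 10, 20, 50, 100, float("inf")]
counts = [0] * len(BUCKETS)

def observe(latency_ms):
    # Count the observation in the first bucket whose bound covers it.
    counts[bisect.bisect_left(BUCKETS, latency_ms)] += 1

def percentile(p):
    # Walk the cumulative bucket counts until p% of all observations
    # are covered, then report that bucket's upper bound as the estimate.
    total = sum(counts)
    rank = total * p / 100.0
    cumulative = 0
    for bound, count in zip(BUCKETS, counts):
        cumulative += count
        if cumulative >= rank:
            return bound
    return BUCKETS[-1]

for v in [1, 1, 2, 3, 8, 9, 15, 40, 90, 95]:
    observe(v)

print(percentile(50))   # -> 10
print(percentile(99))   # -> 100
```

Note that a histogram reports a bucket bound, not an exact value; the precision of the percentile estimate is limited by how the buckets are laid out.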
Fig.4 Histogram built from the data’s X-points

It’s a static
snapshot of the data at the last “time point.” If the values change
over time, modern monitoring systems can offer various data views,
such as time-series percentiles or heatmaps, to provide the
necessary insight into the data. Histograms, however, have a
drawback: They require system memory to work, node networking
throughput to get reported, and monitoring disk space to be stored.
A histogram of N buckets consumes N times more resources than a
single value.

Addable latencies

When it comes to reporting
latencies, a good compromise solution between efficiency and
informativeness is “counter” metrics coupled with the rate()
function.

Counters and rates

Counter metrics are one of the
simplest and most fundamental metric types in the Prometheus
monitoring system. It’s a cumulative metric that represents a
single monotonically increasing counter whose value can only
increase or be reset to zero upon a restart of the monitored
target. Because a counter itself is just an ever-increasing total,
it is rarely analyzed directly. Instead, one derives another
meaningful metric from counters, most notably a “rate”. Below is a
mathematical description of how the Prometheus rate() function works. Internally, the system calculates the total sum of the observed values:

    total(t) = Σ v_i   (summing all values observed up to time t)

After scraping the value several times, the monitoring system calculates the “rate” function of the value, which takes the total value difference from the previous scrape, divided by the time passed since then:

    rate = (total(t_now) - total(t_prev)) / (t_now - t_prev)

The rated value thus shows the average value over the specified period of time. Now let’s get back to latencies. To report average latency
over the scraping period using counters, the system should
accumulate two of them: the total sum of request latencies (e.g.,
in milliseconds) and the total number of completed requests. The
monitoring system would then “rate” both counters to get their
averages and divide them, thus showing the average per-request
latency. It looks like this:

    avg_latency = rate(latency_sum) / rate(request_count)
                = Δlatency_sum / Δrequest_count

Since
the total number of requests served is useful on its own to see the
IOPS value, this method is as efficient as exporting the immediate
latency value. Yet, it provides a much more stable and
representative metric than an instantaneous gauge. We can try to
apply counters to the artificial data that was introduced earlier.
The plot below shows what the averaged values for the sub-ranges of
length 40 (the sampling step from above) would look like. It
differs greatly from the sampled plot of Figure 2 and provides a
more accurate approximation of the real data set.
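The contrast between raw sampling and counter-based averaging is easy to reproduce on synthetic data like the series discussed here (a sketch with assumed shapes; in Prometheus terms, the second approach is what dividing two rate() results computes):

```python
import random

random.seed(7)
values = [random.randint(0, 10) for _ in range(400)]  # volatile per-event values
STEP = 40  # scrape period, measured in events

# Naive gauge: keep only every 40th point.
sampled = values[::STEP]

# Counter approach: accumulate a running total and a running count,
# then "rate" both over each scrape interval and divide.
def windowed_average(data, step):
    averages = []
    total = count = 0
    prev_total = prev_count = 0
    for i, v in enumerate(data, start=1):
        total += v
        count += 1
        if i % step == 0:
            averages.append((total - prev_total) / (count - prev_count))
            prev_total, prev_count = total, count
    return averages

averaged = windowed_average(values, STEP)
print(sampled[:5])   # snapshots: noisy, depend entirely on scrape timing
print(averaged[:5])  # interval averages: track the true behavior
```

Every point of `averaged` accounts for all 40 values in its window, so no window’s activity is lost entirely; only short spikes within a window get diluted.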
Fig.5 Average values for sub-ranges of length 40

Of
course, this method has its drawbacks when compared to histograms.
It swallows latency spikes that last for a short duration, as
compared to the scraping period. The shorter the spike duration,
the more likely it would go unnoticed.

Queue length

Summing up
individual values is not always possible though. Queue length is
another example of a statistic that’s also volatile and may contain
thousands of data points. For example, this could be the queue of
tasks to be executed or the queue of IO requests. Whenever an entry
is added to the queue, its length increases. When an entry is
removed, the length decreases – but it makes little sense to add up
the updated queue length elsewhere. If we return to Figure 3
showing the real values in the range of [80-120], what would the
average queue length over that period be? Evidently, it’s the sum
of individual values divided by their number. But even though we
understand what the “average request latency” is, the idea of
“average queue length” is harder to accept, mainly because the X axis of such data is usually not an “event” number but rather a point in time. And if, for example, a queue was empty most of the time
and then changed to N elements for a few milliseconds, we’d have
just two events that changed the queue. Seeing the average queue
length of N/2 is counterintuitive. Some time ago,
we researched how the “real” length of an IO queue can be
implicitly derived from the net sum of request latencies. Here,
we’ll show another approach to getting an idea of the queue length.
Integrated gauge

The approach is a generalization of averaging the
sum of individual data points from the data set using counters and
the “rate” function. Let’s return to our synthetic data when it was
summed and rated (Figure 3) and look at the corresponding math from
a slightly different angle. If we treat each value point as a bar
with the width of 1 and the height being equal to the value itself,
we can see that:
- The total value is the total sum of the bars’ areas
- The difference between two scraped total values is the sum of the areas of the bars between the two scraping points
- The number of points in the range equals the distance between those two scraping points
Fig.6 Geometrical interpretation of the exported counters
Figure 6 shows this area in green. The average value is thus the
height of the imaginary rectangle whose width equals the scraping
period. If the distance between two adjacent points is not 1, but
some duration, the interpretation still works. We can still
calculate the area under the plot and divide it by its width
to get its “average” value. But we need to apply two changes.
First, the area of an individual bar is now the product of the
value itself and the duration of time passed from the previous
point. Think of it this way: previously, the duration was 1, so the
multiplication was invisible. Now, it is explicit.
Second, we no longer need to count the number of points in the
range. Instead, the denominator of the averaging function will be
the duration between scraping points. In other words, previously
the number of points was the sum of ones between scraping
points; now, it’s the sum of durations, i.e., the total scraping period:

    avg = Σ (v_i · Δt_i) / (t_now - t_prev)   (summing over points between the two scrapes)

It’s
now clear that the average value is the result of applying the
“rate()” function to the total value.

Naming problem (a side note)
There are two hard problems in computer science: cache
invalidation, naming, and off-by-one errors. Here we had to face
one of them – how to name the newly introduced metrics? It is the
queue size multiplied by the time the queue exists in such a state.
A very close analogy comes from project management. There’s a parameter called “human-hour” and it’s
widely used to estimate the resources needed to accomplish some
task. In physics, there are not many units that measure something
multiplied by time. But there are plenty of parameters that measure
something divided by time, like velocity
(distance-per-second), electric current (charge-per-second) or
power (energy-per-second). Even though it’s possible to define things the other way around (e.g., charge as current multiplied by time), it’s still charge that’s the primary unit and current that’s secondary. So far, I’ve found just a few examples of such units. In
classical physics, one is called “action,” which measures energy-times-time, and another is “impulse,” which is measured in newton-seconds. Finally, in kinematics there’s a thing called “absement,” which is literally meters-times-seconds. That describes the measure of an
object’s displacement from its initial position.

Integrated queue length

The approach described above was recently
applied to
the Seastar IO stack – the length of IO classes’ request queues was
patched to account for and export the integrated value. Below is
the screenshot of a dashboard showing the comparison of both
compaction class queue lengths, randomly scraped and integrated.
Interesting (but maybe hard to accept) are the fractional values of the queue length. That’s OK, since the number shows the
average queue length over a scrape period of a few
minutes. A notable advantage of the integrated metric can be seen
on the bottom two plots that show the length of the in-disk queue.
Only the integrated metric (bottom right plot) shows that disk load
actually
decreased over time, while the randomly scraped
one (bottom left plot) just tells us that the disk wasn’t
idling…but not more than that. This effect of “hiding” the reality
from the observer can also be seen in the plot that shows how the
commitlog class queue length was changing over time.
Here we can see a clear rectangular “spike” in the integrated disk
queue length plot. That spike was completely hidden by the randomly
scraped one. Similarly, the software queue length dynamics (upper pair of plots) are only visible from the integrated counter (upper right plot).

Conclusion

Relying solely on instantaneous metrics, or
gauges, to monitor rapidly fluctuating system parameters very often
leads to misleading data that poorly reflects the system’s actual
behavior over a period of time. While solutions like histograms
offer better statistical insights, they incur notable resource
overhead. For metrics where the individual values can be aggregated
(like request latencies), converting them into cumulative counters
and deriving a rate provides a much more stable and representative
average over the scraping interval. This offers a very efficient
compromise between informative granularity and resource
consumption. For metrics where instantaneous values cannot be
simply summed (such as queue lengths), the concept of the
Integrated Gauge offers a generalization of the same
efficiency. In essence, by treating the gauge not as a
point-in-time value but as a continuously accumulating measure of
time-at-value, integral gauges provide a highly reliable and
definitive representation of a parameter’s average behavior across
any given measurement interval.
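To recap the mechanics in code, here is a minimal sketch of an integrated gauge (an illustration of the idea only, not Seastar’s actual implementation): on every change, the exporter adds the old value multiplied by the time spent at that value to a monotonic total, and the monitoring side applies rate() to that total to recover the time-weighted average:

```python
class IntegratedGauge:
    """Accumulates value * seconds-at-that-value: a time-weighted counter."""

    def __init__(self, clock):
        self.clock = clock          # injected for determinism; time.monotonic in practice
        self.total = 0.0            # integral of the value over time
        self.value = 0              # current instantaneous value
        self.last_change = clock()

    def set(self, value):
        # Fold in the time spent at the previous value, then switch.
        now = self.clock()
        self.total += self.value * (now - self.last_change)
        self.value, self.last_change = value, now

    def scrape(self):
        # Account for the time spent at the current value, then export.
        self.set(self.value)
        return self.total

# Simulated clock: the queue is empty most of the time,
# then holds 100 entries for 0.01 s.
t = iter([0.0, 5.0, 5.01, 10.0]).__next__
g = IntegratedGauge(t)     # t = 0.0: queue length 0
g.set(100)                 # t = 5.0: 100 entries appear
g.set(0)                   # t = 5.01: queue drains
total = g.scrape()         # t = 10.0: scrape the integrated total
print(total / 10.0)        # time-weighted average queue length, about 0.1
```

Dividing the scraped total difference by the scrape period (here, 10 seconds) yields the average queue length over the interval, which is exactly what the rate() function computes on the monitoring side.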
9 March 2026, 12:17 pm by ScyllaDB
How Natura built a real-time data pipeline to support
orders, analytics, and operations

Natura, one of the
world’s largest cosmetics companies, relies on a network of
millions of beauty consultants generating a massive amount of
orders, events, and business data every day. From an infrastructure
perspective, this requires processing vast amounts of data to
support orders, campaigns, online KPIs, predictive analytics, and
commercial operations. Natura’s Rodrigo Luchini (Software
Engineering Manager) and Marcus Monteiro (Senior Engineering Tech
Lead) shared the technical challenges and architecture behind these
use cases at Monster SCALE Summit 2025. Fitting for the conference
theme, they explained how the team powers real-time sales insights
at massive scale by building upon ScyllaDB’s CDC Source Connector.
About Natura

Rodrigo kicked off the talk with a bit of background
on Natura. Natura was founded in 1969 by Antônio Luiz Seabra. It is
a Brazilian multinational cosmetics and personal care company known
for its commitment to sustainability and ethical sourcing. They
were one of the first companies to focus on products connected to
beauty, health, and self-care. Natura has three core pillars:
Sustainability: They are committed to producing products without
animal testing and to supporting local producers, including
communities in the Amazon rainforest. People: They value diversity
and believe in the power of relationships. This belief drives how
they work at Natura and is reflected in their beauty consultant
network. Technology: They invest heavily in advanced engineering as
well as product development.

The Technical Challenge: Managing Massive Data Volume with Real-Time Updates

The first challenge they
face is integrating a high volume of data (just imagine millions of
beauty consultants generating data and information every single
day). The second challenge is having real-time updated data
processing for different consumers across disconnected systems.
Rodrigo explained, “Imagine two different challenges: creating and
generating real-time data, and at the same time, delivering it to
the network of consultants and for various purposes within Natura.”
This is where ScyllaDB comes in. Here’s a look at the initial
architecture flow, focusing on the order and API flow.
As soon as a beauty consultant places an order, they need to
update and process the related data immediately. This is possible
for three main reasons: As Rodrigo put it, “The first is that
ScyllaDB is a fast database infrastructure that operates in a
resilient way. The second is that it is a robust database that
replicates data across multiple nodes, which ensures data
consistency and reliability. The last but not least reason is
scalability – it’s capable of supporting billions of data
processing operations.” The Natura team architected their system as
follows:
In addition to the order ingestion flow mentioned above,
there’s an order metrics flow closely connected to ScyllaDB Cloud
(ScyllaDB’s fully-managed database-as-a-service offering), as well
as the ScyllaDB CDC Source Connector (a Kafka source connector
capturing ScyllaDB CDC changes). Together, this enables different
use cases in their internal systems. For example, it’s used to
determine each beauty consultant’s business plan and also to report
data across the direct sales network, including beauty consultants,
leaders, and managers. These real-time reports drive business
metrics up the chain for accurate, just-in-time decisions.
Additionally, the data is used to define strategy and determine
what products to offer customers.

Contributing to the ScyllaDB CDC Source Connector

When Natura started testing the ScyllaDB
Connector, they noticed a significant spike in the cluster’s
resource consumption. This continued until the CDC log table was
fully processed, then returned to normal. At that point, the team
took a step back. After reviewing the documentation, they learned
that the connector operates with small windows (15 seconds by
default) for reading the CDC log tables and sending the results to
Kafka. However, at startup, these windows are actually based on the
table TTL, which ranged from one to three days in Natura’s use
case. Marcus shared: “Now imagine the impact. A massive amount of
data, thousands of partitions, and the database reading all of it
and staying in that state until the connector catches up to the
current time window. So we asked ourselves: ‘Do we really need all
the data?’ No. We had already run a full, readable load process for
the ScyllaDB tables. What we really needed were just incremental
changes, not the last three days, not the last 24 hours, just the
last 15 minutes.” So, as ScyllaDB was adding this feature to the
GitHub repo, the Natura team created a new option:
scylla.custom.window.start. This let them tell the connector
exactly where to start, so they could avoid unnecessary reads and
relieve unnecessary load on the database. Marcus wrapped up the
talk with the payoff: “This results in a highly efficient real-time
data capture system that streams the CDC events straight to Kafka.
From there, we can do anything—consume the data, store it, or move
it to any database. This is a gamechanger. With this optimization,
we unlocked a new level of efficiency and scalability, and this
made a real difference for us.”
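For illustration, a connector configuration using such an option might look like the following sketch. Property names other than `scylla.custom.window.start` follow the ScyllaDB CDC Source Connector’s documented conventions as best recalled here; treat the exact names, and especially the value format of the custom option, as assumptions to verify against the connector’s README:

```
name=orders-cdc-connector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.cluster.ip.addresses=scylla-node1:9042,scylla-node2:9042
scylla.name=orders-cdc
scylla.table.names=orders.order_events
# Start reading CDC windows 15 minutes back instead of from the table's TTL:
scylla.custom.window.start=<timestamp 15 minutes ago>
```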
4 March 2026, 1:16 pm by ScyllaDB
How ShareChat reduced costs for its billion-feature scale
feature store with ScyllaDB

“Great system…now please
make it 10 times cheaper.” That’s not exactly what the
ShareChat team wanted to hear after completing a major engineering
feat:
scaling a real-time feature store 1000X without scaling their
database (ScyllaDB). To stretch from supporting 1 million
features per second to supporting 1 billion features per second,
the team already…
- Redesigned their database schema to store all features together as protocol buffers and optimized tile configurations (reduced required rows from 2B to 73M/sec)
- Switched their database compaction strategy from incremental to leveled (doubled database capacity)
- Forked caching and gRPC libraries to eliminate mutex contention and connection bottlenecks
- Applied object pooling and garbage collector tuning to reduce memory allocation overhead

You can read about those performance
optimizations in
Scaling an ML Feature Store From 1M to 1B Features per Second.
But ShareChat – an Indian leader in a globally competitive social
media market – is always looking to optimize. After reaching this
scalability milestone, the team received a follow-up challenge:
reducing the feature store’s costs by 10X (without compromising
performance, of course).
David Malinge (Sr. Staff
Software Engineer at ShareChat) and
Ivan Burmistrov (then
Principal Software Engineer at ShareChat) shared how they
approached this new challenge in a keynote at Monster Scale Summit
2025. Watch the complete talk, or read the highlights below.
Note: Monster Scale Summit is a free + virtual
conference on extreme scale engineering, with a focus on
data-intensive applications. Learn from luminaries like
antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications”
and more than 50 others, including engineers from Discord, Disney,
Pinterest, Rivian, LinkedIn, Nextdoor, and American Express.
Register (free) and join us March 11-12 for some lively chats.
Background

ShareChat is one of the largest Indian media networks,
with over 300 million monthly active users. The app’s popularity
stems from its powerful recommendation system – and that’s powered
by their feature store. As Malinge explained, “We process more than
2 billion events daily to compute our features and serve close to a
billion features per second at peak, with P99 latency under 20
milliseconds. We also read more than 30 billion rows per day in
ScyllaDB, so we’re very happy that we’re not charged by the
transaction.”

Cloud Cost Optimization: Where Do You Even Start?
When it came time to make this system 10 times cheaper, the team
very quickly realized how complicated and confusing cloud cost
optimization could be. For example, Burmistrov noted:
- Cloud billing is cloudy. AWS and GCP each have around 40,000 different SKUs, which makes it really hard to figure out where your money’s actually going. And cloud providers aren’t exactly motivated to make this easier for you to understand and debug.
- It’s easy to lose track of little things that add up. Maybe you forgot about an instance that’s still running. Maybe there’s an old deployment nobody uses anymore. Or perhaps you have storage buckets filled with data that should have expired by now (e.g., via Time to Live [TTL]). It all accumulates over time.
- Your cost intuition from on-prem doesn’t necessarily translate to the cloud. Something that was cost-efficient in your own data center might be expensive in the cloud, and what works well on one cloud provider might not be cost-effective on another.

For the first optimization step, the
team became sticklers for “hygiene”: clearing out anything they
didn’t really need. This involved a deep cleaning for forgotten
instances and deployments, unused buckets, data without proper
TTLs, and overprovisioning. Having proper attribution was critical
here. As Burmistrov explained, “Every workload, every instance, and
every resource must be attributed to a specific service and a
specific team. Without attribution, cost optimization is a global
problem. With attribution, it becomes a local problem. Once each
team can clearly see the costs associated with their own services,
optimization becomes much easier. The team that owns the service
has both the context and the incentive to improve its cost profile.
They can see which components are expensive and decide what to do
about them.”

Cloud Trap 1: Getting Sucked Into Unsustainable Costs
Next, they shifted focus to challenges unique to running
applications in the cloud. To get the required cost savings, they
had to navigate around a few cloud traps. The first trap was the
allure and ease of using the cloud provider’s own products –
particularly, databases. “They’re great to start with, but at scale
they become painful in terms of cost,” explained Burmistrov.
Scaling can be particularly costly for solutions that use a metered
pricing model (e.g., based on traffic). “We used several GCP
databases, including Bigtable and Spanner, but they became
expensive and, more importantly, those costs were largely out of
our control, we had little leverage over cost,” continued
Burmistrov. “So we migrated many workloads to ScyllaDB. Today, more
than 30 ScyllaDB clusters are deployed across the organization,
including for our feature store use case. Besides stability and low
latency, ScyllaDB gives us strong cost control. We can choose
instance types, merge use cases into a single cluster, and run at
high utilization. ScyllaDB works well at 80%, 90%, even close to
100% utilization, which gives us a very strong cost profile.”

Cloud Trap 2: Kubernetes Wastage

Next, the team tackled Kubernetes
wastage. In Kubernetes, you deploy containers into pods, but you
pay for nodes, which are actual VMs. You typically choose node
types upfront – for example, nodes with 4 CPUs and 16 GB of memory.
However, over time, workloads change, teams change, and these nodes
are no longer optimal. Pods cannot fully occupy the nodes, either
because of CPU, memory, or scheduling constraints. At that point,
you’re paying for resources that aren’t being used. “You may
decide, ‘Okay, let’s isolate workloads into separate node pools,’”
explained Burmistrov. “But it’s not a solution because, as shown in
the image below, we have one node completely wasted.” Simple
isolation just moves waste around. The solution: dynamic node
allocation based on workloads. Options include open source
solutions like Karpenter, cloud-specific ones like GKE Autopilot,
or commercial multi-cloud solutions like Cast AI. ShareChat likes
Cast AI because it lets them fine-tune the configuration with
respect to long-term commitments, spot instances, and other pricing
impacts.

Cloud Trap 3: Network Costs (“The Cloud Tax”)

And then
there’s network costs, which the ShareChat team calls “the cloud
tax.” Massive apps like ShareChat deploy across multiple zones for
high availability. However, multiple zones need to communicate with
one another, and the traffic between zones comes with its own cost:
Network Inter-Zone Egress (NIZE). That’s the outbound network
traffic between availability zones within the same region. It’s
billed separately by many cloud providers. For example, if you
deploy ScyllaDB in three GCP zones with two nodes per zone and use
a non-ScyllaDB driver (not recommended), reads may cross zones both
at the client level and inside the cluster. For traffic of volume
T, you pay for T in NIZE. ScyllaDB drivers help here. As Burmistrov
explained, “Using ScyllaDB’s token-aware routing removes the extra
hop inside the cluster, reducing inter-zone traffic. Using
token-aware plus zone-aware routing ensures reads stay within the
same zone, reducing inter-zone traffic to zero for reads.” Removing
the extra hop cuts NIZE down to about 2/3 of T. On the write path,
it’s a little different because replication is required. Burmistrov
continued, “With three zones, writes generate 2 times T of
inter-zone traffic. You can trade availability for cost by using
two zones instead of three, reducing this to 1.5 times T. ScyllaDB
also lets teams model each zone as a separate data center. In that
case, for traffic T, you pay for T of inter-zone traffic. You
generally can’t go lower unless you deploy in a single zone.” Tip:
Oracle Cloud and Azure don’t charge for inter-zone traffic. If you
use one of these cloud providers, you can deploy across three zones
with zero inter-zone cost.

Database Layer Optimization

Another
challenge: ShareChat’s generally stable read latencies would spike
beyond their SLAs when write-heavy workloads would occur (like a
backfill job, for example). The usual option – just scale up the
database – would be simple, but the team wanted to reduce costs.
Isolating reads and writes into separate data centers could
technically improve performance, but that would be even worse from
the cost perspective. That approach would double the infrastructure
and probably lead to underutilization. Ultimately, they discovered
and applied ScyllaDB’s workload prioritization. This lets them
control how their various workloads compete for system resources.
Malinge explained, “Under the hood, this relies on ScyllaDB’s
thread-per-core architecture. Each core runs an async scheduler
with multiple queues, each with its own priority. Workload
prioritization maps directly to these priorities. Looking at the
image, we have a very high share of compute for serving because we
have strong SLAs on the read path, which is user facing. Most of
the rest goes to less latency-sensitive workloads from asynchronous
jobs, like writing the features to the database. We also keep a
tiny share for manual queries (debugging, for example). This
enables us to have exactly what we want – different latencies for
reads and writes – and that allowed us to have the best possible
serving for our features.”

Feature Serving Layer Optimization

That
serving layer is a distributed gRPC (Google Remote Procedure Call)
service responsible for caching and handling consumer requests. The
team made some cost optimizations here too. One of the most
impactful was a clever shortcut: instead of fully deserializing
complex Protobuf messages, they treated repeated messages as
serialized byte blobs. Since Protobuf encodes embedded messages
using the same wire format as bytes, they could simply append these
serialized records to merge data. With that, they could skip the
heavy lift of fully unpacking and rebuilding the messages from
scratch. That optimization alone could be the subject of an entire
article; please see the video (starting at 18:15) for a detailed
walkthrough of their approach. Malinge’s top lesson learned from
optimizing this layer: “The first advice is the Captain Obvious
one: don’t go blindly. We set up continuous profiling very early
on. Nowadays, there are tons of tools available. Integration is
super straightforward. You might recognize the vanilla GCP profiler
in the image below. It’s just four lines of code to add in Go
programs, and this has really helped us on our quest to reduce
costs. We’ve actually unlocked more than a 50% reduction in compute
by doing optimizations guided by continuous profiling.” Another
lesson learned: if you’re using protobuf at high-RPS, serde is
probably burning your CPU. Even with ~95% cache hit rates, the hot
path was still deserializing protos, merging them, and serializing
them again just to stitch responses together. Most of that work was
pointless. The team ended up leveraging protobuf’s wire format,
where repeated fields are just sequences of records and embedded
messages can be treated as raw bytes (no deserialization). They
switched to this “lazy” deserialization and merged cached protos
without ever touching individual fields. For a practical example,
see this repository
david-sharechat/lazy-proto.
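The trick rests on a property of the protobuf wire format: an embedded message and a `bytes` field are encoded identically (length-delimited), and concatenating two valid encodings of the same message type merges them, with repeated fields appended. A stdlib-only sketch that hand-encodes the wire bytes to show the equivalence (the field number and record contents are made up for illustration):

```python
def encode_varint(n):
    # Protobuf base-128 varint encoding (for tags and lengths).
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def length_delimited(field_number, payload):
    # Wire type 2 record: tag, payload length, then the raw payload bytes.
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload

# Two cached "messages", each carrying records in repeated field 1.
cached_a = length_delimited(1, b"record-1") + length_delimited(1, b"record-2")
cached_b = length_delimited(1, b"record-3")

# Lazy merge: no decode, no re-encode -- just append the byte blobs.
merged = cached_a + cached_b

# The result is byte-for-byte what encoding all three records at once yields.
full = b"".join(length_delimited(1, r)
                for r in [b"record-1", b"record-2", b"record-3"])
assert merged == full
```

Because the merge is pure byte concatenation, the hot path never allocates decoded message objects, which is where the reported speedup and allocation savings come from.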
And one parting tip: the wins compound fast. Benchmarking showed
lazy merging was about 6 times faster, with about a third of the
allocations. In production, that meant lower compute bills and
better tail latency.

Cost Optimization Actually Fostered Innovation
The team’s smart work here underscores that cost optimizations
don’t have to compromise the product. They can actually act as a
catalyst for better system design. Malinge left us with this: “Our
cost savings initiatives actually led to a lot of innovations. As I
tried to represent in the left quadrant, there is a very unhealthy
way to look at cost savings – one that focuses on shortcuts that
negatively impact your product. The key is to stay on the right
side of this quadrant, to aim for the sweet spot of reducing costs
while improving the product.”
3 March 2026, 1:32 pm by ScyllaDB
Lessons learned on data modeling, cache optimization, and
hardware selection

Agoda is the Singapore wing of
Booking Holdings, the
world’s leading provider of online travel (the brand behind
Booking.com, Kayak, Priceline, etc.). From January 2023 to February
2025, Agoda server traffic spiked by 50 times. That’s fantastic
business growth, but also the trigger for an interesting
engineering challenge. Specifically, the team had to determine how
to scale their ScyllaDB-backed online feature store to maintain
10ms P99 latencies despite this growth. Complicating the situation,
traffic was highly bursty, cache hit rates were unpredictable, and
cold-cache scenarios could flood the database with duplicate read
requests in a matter of seconds. At Monster Scale Summit 2025,
Worakarn Isaratham, lead software engineer at Agoda, shared how they
tackled the challenge. You can watch his entire talk or read the
highlights below.
Note: Monster Scale Summit is a
free + virtual conference on extreme scale engineering, with a
focus on data-intensive applications. Learn from luminaries like
antirez, creator of Redis;
Camille
Fournier, author of “The Manager’s Path” and “Platform
Engineering”;
Martin
Kleppmann, author of “Designing Data-Intensive Applications”
and more than 50 others, including engineers from Discord, Disney,
Pinterest, Rivian, LinkedIn, Nextdoor, and American Express.
Register (free) and join us March 11-12 for some lively chats.
Background: A feature store powered by ScyllaDB and DragonflyDB
Agoda operates an in-house feature store that supports both offline
model training and online inference. For anyone not familiar with
feature stores, Isaratham provided a quick primer. A feature store
is a centralized repository designed for managing and serving
machine learning features. In the context of machine learning, a
feature is a measurable property or characteristic of a data point
used as input to models. The feature store helps manage features
across the entire machine learning pipeline — from data ingestion
to model training to inference. Feature stores are integral to
Agoda’s business. Isaratham explained: “We’re a digital travel
platform and some use cases are directly tied to our product. For
example, we try to predict what users want to see, which hotels to
recommend and what promotions to serve. On the more technical side,
we use it for things like bot detection. The model uses traffic
patterns to predict whether a user is a bot, and if so, we can
block or deprioritize requests. So the feature store is essential
for both product and engineering at Agoda. We’ve got tools to help
create feature ingestion pipelines, model training, and the focus
here: online feature serving.” One layer deeper into how it works:
“We’re currently serving about 3.5 million entities per second
(EPS) to our users. About half the features are served from cache
within the client SDK, which we provide in Scala and Python. That
means 1.7 million entities per second reach our application
servers. These are written in Rust, running in our internal
Kubernetes pods in our private cloud. From the app servers, we
first check if features exist in the cache. We use DragonflyDB as a
non-persistent centralized cache. If it’s not in the cache, then we
go to ScyllaDB, our source of truth.” ScyllaDB is a
high-performance database for workloads that require ultra-low
latency at scale. Agoda’s current ScyllaDB cluster is deployed as
six bare-metal nodes, replicated across four data centers. Under
steady-state conditions, ScyllaDB serves about 200K entities per
second across all data centers while meeting a service-level
agreement (SLA) of 10ms P99 latency. (In practice, their latencies
are typically even lower than their SLA requires.)
Traffic growth and bursty workloads
However, it wasn’t always that smooth and
steady. Around mid-2023, they hit a major capacity problem when a
new user wanted to onboard to the Agoda feature store. Their
traffic pattern was super bursty: It was normally low, but
occasionally flooded them with requests triggered by external
signals. These were cold-cache scenarios, where the cache couldn’t
help. Isaratham shared, “Bursts reached 120K EPS, which was 12
times the normal load back then.” Request duplication exacerbated
the situation. Many identical requests arrived in quick succession.
Instead of one request populating the cache and subsequent requests
benefiting, all of them hit ScyllaDB at the same time — a classic
cache stampede. They also retried failed requests until they
succeeded – and that kept the pressure high. This load involved two
data centers. One slowed down but remained online. The other was
effectively taken out of service. More details from Isaratham: “On
the bad DC, error rates were high and retries took 40 minutes to
clear; on the good one, it only took a few minutes. Metrics showed
that ScyllaDB read latency spiked into seconds instead of
milliseconds.”
Diagnosing the bottleneck
So, they compared setups
and found the difference: the problematic data center used SATA
SSDs while the better one used NVMe SSDs. SATA (Serial Advanced
Technology Attachment) was already old tech, even then. The team’s
speed tests suggested that replacing the disks would yield a 10X
read performance boost – and better write rates too. The team
ordered new disks immediately. However, given that the disks
wouldn’t arrive for months, they had to figure out a survival
strategy until then. As Isaratham shared, “Capacity tests and
projections showed that we would hit limits within eight or nine
months even without new load – and sooner with it. So, we worked
with users to add more aggressive client-side caching, remove
unnecessary requests and smooth out bursts. That reduced the new
load from 120K to 7K EPS. That was enough to keep things stable,
but we were still close to the limit.”
Surviving with SATA
Given
the imminent capacity cap, the team brainstormed ways to improve
the situation while still on the existing SATA disks. Since you
have to measure before you can improve, getting a clean baseline
was the first order of business. “The earlier capacity numbers were
from real-world traffic, which included caching effects,” Isaratham
detailed. “We wanted to measure cold-cache performance directly.
So, we created artificial load using one-time-use test entities,
bypassed cache in queries and flushed caches before and after each
run. The baseline read capacity on the bad DC was 5K EPS.” With
that baseline set, the team considered a few different approaches.
Data modeling
All features from all feature sets were stored in a
single table. The team hoped that splitting tables by feature set
might improve locality and reduce read amplification. It didn’t.
They were already partitioning by feature set and entity, so the
logical reorganization didn’t change the physical layout.
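For context, the layout Isaratham describes corresponds roughly to a partition key that already combines feature set and entity; an illustrative CQL shape (all names invented, not Agoda's actual schema):

```sql
-- Illustrative only: names are invented.
-- Because (feature_set, entity_id) is already the partition key,
-- splitting into one table per feature set leaves the on-disk
-- partitioning effectively unchanged.
CREATE TABLE feature_store.features (
    feature_set text,
    entity_id   text,
    feature     text,
    value       blob,
    PRIMARY KEY ((feature_set, entity_id), feature)
);
```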
Compaction strategy
Given a read-heavy workload with frequent
updates, ScyllaDB documentation recommends the size-tiered
compaction strategy to avoid write amplification. But the team was
most concerned about read latency, so they took a different path.
According to Isaratham: “We tried leveled compaction to reduce the
number of SSTables per read. Tests showed fetching 1KB of data
required reading 70KB from disk, so minimizing SSTable reads was
key. Switching to leveled compaction improved throughput by about
50%.”
Larger SSTable summaries
ScyllaDB uses summary files to more
efficiently navigate index files. Their size is controlled by the
sstable_summary_ratio setting. Increasing the ratio enlarges the
summary file and that reduces index reads at the cost of additional
memory. The team increased the ratio by 20 times, which boosted
capacity to 20K EPS. This yielded a nice 4X improvement, so they
rolled it out immediately.
What a difference a disk makes
Finally,
the NVMe disks arrived a few months later. This one change made a
massive difference. Capacity jumped to 300K EPS, a staggering
50–60X improvement. The team rolled out improvements in stages:
first, the summary ratio tweak (for 2–3X breathing room), then the
NVMe upgrade (for 50X capacity). They didn’t apply leveled
compaction in production because it only affects new tables and
would require migration. Anyway, NVMe already solved the problem.
After that, the team shifted focus to other areas: improving
caching, rewriting the application in Rust and adding cache
stampede prevention to reduce the load on ScyllaDB. They still
revisit ScyllaDB occasionally for experiments. A couple of
examples: New partitioning scheme: They tried partitioning by
feature set only and clustering by entity. However, performance was
actually worse, so they didn’t move forward with this idea. Data
remodeling: The application originally stored one row per feature.
Since all features for an entity are always read together, the team
tested storing all features in a single row instead. This improved
performance by 35%, but it requires a table migration. It’s on
their list of things to do later.
Lessons learned
Isaratham wrapped
it up as follows: “We’d been using ScyllaDB for years without
realizing its full potential, mainly because we hadn’t set it up
correctly. After upgrading disks, benchmarking and tuning data
models, we finally reached proper usage. Getting the basics right –
fast storage, knowing capacity, and matching data models to
workload – made all the difference. That’s how ScyllaDB helped us
achieve 50X scaling.”
2 March 2026, 1:47 pm by
ScyllaDB
From managed Google Cloud services to ScyllaDB, Redpanda,
Flink, and beyond
The engineering team at ShareChat,
India’s largest social media platform, decided it was time to take
a hard look at the mounting costs of running their recommendation
system. The content deduplication service at its core was built on
Google Cloud’s managed stack: Bigtable, Dataflow, Pub/Sub. That
foundation helped ShareChat get up and running rapidly. However,
costs spiked as they scaled to 300 million users generating
billions of daily interactions. And the 40 ms P99 latency was not
ideal for customer-facing apps. If you’ve followed ScyllaDB for a
while, you probably know that the ShareChat engineers never shy
away from monster scale engineering challenges. For example:
Pulling off a
zero-downtime migration of massive clusters (65+ TB, over 1M
ops/sec) Using Kafka Streams for real-time, windowed
aggregation
of massive engagement events while keeping ScyllaDB writes
minimal and reads sub-millisecond Scaling their
ScyllaDB-based feature store 1000X without scaling the database
– and then
making it 10X
cheaper Spoiler: They pulled off yet another masterful
optimization with their deduplication service (90% cost reduction
while lowering P99 latency from 40 ms to 8 ms). And again, they
generously shared their strategies and lessons learned with the
community – this time at Monster Scale Summit 2025. Watch
Andrei
Manakov (Senior Staff Software Engineer at ShareChat) detail
the step-by-step optimizations, or read the highlights below.
Note: Monster Scale Summit is a free + virtual
conference on extreme scale engineering, with a focus on
data-intensive applications. Learn from leaders like antirez,
creator of Redis; Camille Fournier, author of “The Manager’s Path”
and “Platform Engineering”; Martin Kleppmann, author of “Designing
Data-Intensive Applications” and more than 50 others, including
engineers from Discord, Disney, Pinterest, Rivian, LinkedIn, and
Freshworks. ShareChat will be delivering
two talks this
year.
Register and join us March 11-12 for some lively chats.
Background ShareChat is India’s largest social media platform, with
more than 180 million monthly active users and 2.5 billion
interactions per month. Its sister product, Moj, is a short-video
app similar to TikTok. Moj has over 160 million monthly active
users and 6 billion video plays daily. Both apps rely on a smart
feed of personalized recommendations. Effective deduplication
within that feed is critical for engagement. If users keep seeing
the same items over and over again, they’re less likely to return.
The deduplication service filters out content users have
already seen before the ranking models ever score it. It’s a
standard part of the feed pipeline: fetch candidates from vector
databases, filter them, score and rank using ML models, then return
the top results. The system tracks two distinct concepts: Sent
posts and Viewed posts.
Sent posts are delivered to the
user’s device. They may or may not have been seen.
Viewed
posts are items that the user has actually seen.
Why both? Viewed posts are tracked on user devices and sent to
the backend periodically, passing through queues before being
persisted in the database. This creates a delay between the time
the event occurs and when it’s stored. That delay makes it possible
to send duplicate posts in sequential feed requests. Sent posts
solve this by being stored immediately before the response goes to
the client. That prevents duplicates from sneaking into subsequent
feed refreshes. So why not just use Sent posts? That would create a
different problem: “In a feed response, we send multiple posts at
once, but there’s no guarantee that the user will actually see all
of them,” Andrei explained. “If we rely only on Sent posts, we
might deduplicate relevant content that the user never saw…and that
can negatively affect feed quality.”
The Initial Architecture: Google Cloud, Bigtable, Dataflow, Pub/Sub + Redis
The initial
architecture was built entirely on Google Cloud. View events flowed
into Google Pub/Sub. A Dataflow job aggregated these events per
user over time windows and wrote batches into Bigtable. Sent posts
lived in Redis as linked-list queues. A deduplication service
(written in Go and running on Kubernetes) combined both data
sources with feed logic and returned results to the client.
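Conceptually, the read path reduces to a set filter plus an immediate Sent-posts write. A hypothetical sketch — function and variable names are invented, and the real stores are Redis and Bigtable rather than in-memory sets:

```python
# Hypothetical sketch of the deduplication read path: candidates are
# filtered against both Sent and Viewed post IDs, and the posts chosen
# for the response are recorded as Sent immediately, before the reply
# goes out -- closing the window during which async Viewed events
# have not yet been persisted.

def dedup_feed(candidates, sent_posts, viewed_posts, page_size=3):
    seen = sent_posts | viewed_posts
    picked = [p for p in candidates if p not in seen][:page_size]
    sent_posts.update(picked)  # record Sent before responding
    return picked

sent, viewed = {101, 102}, {103}
first = dedup_feed([101, 103, 104, 105, 106], sent, viewed)
assert first == [104, 105, 106]

# A refresh arriving before the Viewed events land still can't repeat posts:
second = dedup_feed([104, 105, 106, 107], sent, viewed)
assert second == [107]
```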
It worked. But it was expensive, and latency was a suboptimal
40 milliseconds.
Moving to Flink, ScyllaDB, and Redpanda
The first
optimization step was replacing Google’s managed services with
specialized, efficiency-oriented alternatives. Andrei shared, “We
replaced Google Pub/Sub with Redpanda, which is Kafka-compatible
and more efficient. We replaced Google Bigtable with ScyllaDB, and
Google Dataflow with Apache Flink. These changes alone brought
costs down to about 60% of the original level.”
Dataflow to Flink
Dataflow was used for streaming jobs that aggregated events. The
team enjoyed having a fully managed service
with a simplified API that was nice for prototyping. But at scale,
it grew costly. Moving to Apache Flink meant they had to run and
maintain their own cluster. Still, it was worth it. They cut the
cost of streaming jobs by 93%.
Bigtable to ScyllaDB
On the database front, the team moved
from Bigtable to ScyllaDB after comparing their tradeoffs,
similarities, and differences. Andrei explained, “First of all,
both databases are optimized for heavy write workload. Also, we are
able to use a quite similar schema: partition by user ID, store all
posts in a single column.” But there were key differences.
Bigtable’s autoscaling was valuable given ShareChat’s traffic
variability (nighttime traffic drops to less than 10% of peak). At
that time, ScyllaDB did not support autoscaling, so ShareChat had
to provision clusters for peak load and keep additional buffers for
spikes. Bigtable also has advanced garbage collection rules that
allow limits on rows per partition and TTL definitions. ScyllaDB
covers only the TTL part, so it’s possible to encounter large
partitions if a single user generates many events.
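The TTL side is a single table property in CQL; an illustrative fragment (the table name and retention period are invented):

```sql
-- Illustrative only: expire rows automatically after 30 days.
ALTER TABLE dedup.viewed_posts
    WITH default_time_to_live = 2592000;  -- seconds
```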
For ShareChat’s needs, ScyllaDB was ultimately the best fit.
Andrei explained, “ScyllaDB uses a shard-per-core architecture and
is highly optimized. It delivers better performance and lower
latency than Bigtable. Even a fully provisioned ScyllaDB cluster is
significantly cheaper than Bigtable with autoscaling.” One feature
proved particularly valuable:
workload prioritization. Per Andrei, “ScyllaDB also supports
workload prioritization, which is unique to ScyllaDB. It allows
resource-based isolation between different workloads. That’s quite
critical in our case. The read path directly affects user
experience, so we can’t allow any latency spikes there.” If there’s
ever any resource contention, workload prioritization can
prioritize their reads over writes.
Pub/Sub to Redpanda
For the
event queue, the team also moved from Pub/Sub to
Redpanda, which they found to be
reliable as well as cost-efficient. It’s a fully Kafka-compatible
product, so it was easy to integrate. Like ScyllaDB, Redpanda is
written in C++ with a thread-per-core architecture (both are built
on Seastar), which makes it highly efficient. In ShareChat’s use
case, queue costs were reduced by ~66%.
Resource Allocation and Database Efficiency
With the new foundational infrastructure in place, the team turned
to resource optimization. First came Kubernetes resource tuning.
Andrei detailed, “We focused on three areas, starting with
Kubernetes resources. We adjusted HPA settings and increased target
CPU utilization, which resulted in higher CPU usage per pod.” That
change had a useful side effect: it exposed inefficiencies that had
been lurking at lower utilization levels, including garbage
collection pressure and mutex contention. The team fixed those
issues before moving on. Apache Flink required its own autoscaling
adjustments. A Flink job consists of tasks in an execution
pipeline, and the autoscaler monitors task busy time and adjusts
parallelism accordingly. But scaling events recreate the job and
temporarily halt processing. “With large pipelines, this can lead
to instability,” Andrei shared. “For some systems, we had to fork
the autoscaler, but for this application, it worked smoothly.” They
also discovered they could squeeze even more from their ScyllaDB
cluster, so they rightsized it: “We provisioned ScyllaDB clusters
targeting around 60% CPU utilization for ScyllaDB. But after
learning more, we realized that ScyllaDB is built to utilize all
available resources and prioritizes critical tasks over
non-critical ones. That means it’s easy to reach 100% CPU without
any system degradation by allocating the remaining CPU resources to
maintenance tasks. We started targeting 60% CPU utilization only
for critical tasks and we could reduce the cluster size even more
without any latency degradations.” The details on how they managed
this are explained in Andrei’s blog post,
ScyllaDB: How To Analyse Cluster Capacity. The combination of
those steps brought them down to 45% of the initial cost.
Questioning Assumptions: Retention Windows Example
Next up:
domain-specific engineering improvements. “In theory, every system
should have a clear problem statement, defined metrics, and
well-understood use cases,” Andrei said. “In reality, systems often
contain unclear logic, unexplained parameters, and legacy
decisions. We decided to question these assumptions.” One example
that’s easy to explain without too much context is how they chose
the right window size parameters for their sliding window queries.
The team focused on the retention window parameters (M and N) that
defined how much history to keep per user. Their posts are
partitioned by user ID and ordered by time. That creates a sliding
window. Andrei explained, “The question is how to choose the right
values. The answer is A/B testing.” So, they split users into
groups with different parameter configurations and measured the
impact.
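A/B assignment like this is typically done by hashing the user ID into stable buckets, so each user consistently sees one configuration across sessions. A minimal sketch — the variant names and window values are invented, not Andrei's actual parameters:

```python
import hashlib

# Hypothetical retention-window variants under test (values invented).
VARIANTS = [
    {"name": "control", "window_days": 30},
    {"name": "tight",   "window_days": 14},
    {"name": "tighter", "window_days": 7},
]

def variant_for(user_id: str) -> dict:
    """Stable assignment: the same user always lands in the same variant."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]

assert variant_for("user-42") == variant_for("user-42")  # deterministic
```

Metrics (latency, feed quality, storage) are then compared per bucket before a winning configuration is rolled out to everyone.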
It turned out that their previous windows were too large, so
they tightened them. The results per Andrei: “After running more
than 10 A/B tests, we reduced latency from 40 milliseconds to 8
milliseconds, were able to reduce the database size, and reduced
costs to 30% of the initial level.”
Storage Size Optimization through Delta-Based Compression
The
next optimization targeted storage size directly. ScyllaDB read
performance depends heavily on cache behavior. Smaller data size
means more rows fit into cache, leading to higher cache hit rates,
fewer disk reads, and lower CPU usage.
The team stores post IDs as integers and evaluated several
compression approaches: Bloom filters, Roaring Bitmaps, and
delta-based compression. Andrei’s rundown: “Bloom filters are
probabilistic and can produce false positives, which would cause
duplicates in the feed. Roaring Bitmaps performed poorly for our
data distribution. We chose delta-based compression.”
With that approach, they sort post IDs, compute deltas between
adjacent values, and encode them using Protocol Buffers, which
store only meaningful bits for integers. As a result, they reduced
storage by 60%, CPU usage by 25%, and overall cost to 20% of the
original.
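The pipeline itself is simple. A minimal sketch, with the Protocol Buffers step replaced by a hand-rolled base-128 varint encoder (the same mechanism protobuf uses for integers); the example IDs are invented:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf-style base-128 varint: small ints take fewer bytes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def compress_ids(post_ids: list[int]) -> bytes:
    """Sort, delta-encode, then varint-encode each delta."""
    ids = sorted(post_ids)
    deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return b"".join(encode_varint(d) for d in deltas)

# Nearby large IDs shrink to one- or two-byte deltas:
ids = [7_000_000_105, 7_000_000_002, 7_000_000_017]
blob = compress_ids(ids)
assert len(blob) < len(ids) * 8  # vs. storing fixed 8-byte integers
```

Because the deltas between a user's post IDs are small, most of them fit in one or two bytes, which is where the 60% storage reduction comes from.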
Sent Post Optimization: Replacing Redis with ScyllaDB
The next
optimization involved Sent posts. Previously, Viewed posts were
stored in ScyllaDB, while Sent posts were stored in Redis. But then
came another a-ha moment. Andrei shared, “After compression and cluster
resizing, we realized that ScyllaDB was cheaper per gigabyte than
Redis. Since Viewed posts and Sent posts are conceptually similar
and differ mainly in retention limits, we moved Sent posts to
ScyllaDB. This reduced costs to 17% and also improved system
maintainability by eliminating one database.”
Cloud-Specific Cost Optimizations
The final phase focused on cloud
provider-specific optimizations – more specifically, systematically
eliminating waste in resource allocation and network traffic. The
first target was compute provisioning. Every cloud provider has
different provisioning models for virtual machines. Spot instances
can be evicted at any time but cost far less, which makes them
ideal for stateless workloads.
ShareChat used spot instances with on-demand nodes as backup. Tools
like
Cast AI helped them manage this
setup and significantly reduced costs across the organization. Next
was inter-zone traffic. Zones are separate data centers within a
region, and traffic between them is usually charged. The team cut
cross-zone traffic for microservices using follower fetching for
Kafka, token-aware and rack-aware routing for ScyllaDB, and
topology-aware routing in Kubernetes:
For Kafka consumers, enabling
the follower fetching feature ensures the consumer connects to a
broker in the same zone. That eliminates inter-zone traffic between
the consumer and the queue.
For ScyllaDB, combining token-aware and
rack-aware policies in the driver ensures the application connects
to a coordinator in the same zone. Since the coordinator will
actually have the data, there is less traffic inside the cluster.
For Kubernetes, topology-aware routing uses a best-effort approach,
so approximately 5–15% of inter-zone traffic remains. The team
built their own service mesh on
Envoy proxy to eliminate that
remaining traffic entirely.
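Of the three routing fixes above, the Kafka one is the most mechanical to enable: follower fetching (KIP-392) pairs a rack-aware replica selector on the brokers with a rack/zone label on each consumer. An illustrative fragment (the zone name is invented):

```properties
# Broker (server.properties): allow consumers to fetch from follower replicas
replica.selector.class=org.apache.kafka.common.replication.RackAwareReplicaSelector

# Consumer: declare the client's zone so an in-zone replica is chosen
client.rack=zone-a
```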
The path to 90% cost savings
Over the course of a year,
ShareChat’s team reduced infrastructure costs to 10% of the
original level while cutting latency from 40 milliseconds to 8
milliseconds. To recap how they got here:
Phase | Action taken | Remaining cost
Optimize stack | Migrated from GCP managed services to ScyllaDB, Flink & Redpanda | 60%
Resource tuning | Tuned Kubernetes HPA, optimized Flink’s native autoscaling, and rightsized the ScyllaDB cluster | 45%
Domain logic | A/B testing (e.g., tightening retention sliding windows) | 30%
Data efficiency | Implemented delta-based compression for post IDs | 20%
Database consolidation | Migrated Sent posts from Redis to ScyllaDB | 17%
Cloud ops | Leveraged spot instances and eliminated inter-zone traffic | 10%
What’s next on the ShareChat optimization agenda? Stop by
optimization agenda? Stop by
Monster Scale
Summit 2026 (free + virtual) to hear directly from Andrei.
24 February 2026, 1:37 pm by
ScyllaDB
Monster Scale Summit speakers have amassed a rather
impressive list of publications, including quite a few
books. If you’ve seen the
Monster Scale
Summit agenda, you know that the stars have aligned nicely. In
just two half days, from anywhere you like, you can learn from 60+
outstanding speakers exploring extreme scale engineering challenges
from a variety of angles (distributed databases, real-time data,
AI, Rust…). If you read the bios of our speakers, you’ll note that
many have written books. This blog highlights 11 Monster Scale
reads. Once you register for the conference (it’s free + virtual),
you’ll gain 30 days of full access to the complete O’Reilly library
(thanks to O’Reilly, a conference media sponsor). And Manning
Publications is also a media sponsor. They are offering the Monster
SCALE community a nice 45% discount on all Manning books.
Conference attendees who participate in the speaker chat will be
eligible to win print books (courtesy of O’Reilly) and eBook
bundles (courtesy of Manning).
March 4 update:
There’s also a special
Manning
offer (50% off!) for Monster Scale bundles. See the agenda and
register – it’s free.
Platform Engineering: A Guide for
Technical, Product, and People Leaders
By Camille
Fournier and Ian Nowland October 2024
Bookshop.org |
Amazon |
O’Reilly
Until recently, infrastructure was the backbone of
organizations operating software they developed in-house. But now
that cloud vendors run the computers, companies can finally bring
the benefits of agile customer-centricity to their own developers.
Adding product management to infrastructure organizations is now
all the rage. But how’s that possible when infrastructure is still
the operational layer of the company? This practical book guides
engineers, managers, product managers, and leaders through the
shifts required to become a modern platform-led organization.
You’ll learn what platform engineering is – and isn’t – and what
benefits and value it brings to developers and teams. You’ll
understand what it means to approach your platform as a product and
learn some of the most common technical and managerial barriers to
success. With this book, you’ll:
Cultivate a platform-as-product, developer-centric mindset
Learn what platform engineering teams are and are not
Start the process of adopting platform engineering within your organization
Discover what it takes to become a product manager for a platform team
Understand the challenges that emerge when you scale platforms
Automate processes and self-service infrastructure to speed development and improve developer experience
Build out, hire, manage, and advocate for a platform team
Camille is presenting “What Engineering Leaders Get Wrong
About Scale”
Designing Data-Intensive Applications, 2nd
Edition
By
Martin Kleppmann and Chris Riccomini
Bookshop.org |
Amazon
| O’Reilly February 2026
Data is at the center of many
challenges in system design today. Difficult issues such as
scalability, consistency, reliability, efficiency, and
maintainability need to be resolved. In addition, there’s an
overwhelming variety of tools and analytical systems, including
relational databases, NoSQL datastores, plus data warehouses and
data lakes. What are the right choices for your application? How do
you make sense of all these buzzwords? In this second edition,
authors Martin Kleppmann and Chris Riccomini build on the
foundation laid in the acclaimed first edition, integrating new
technologies and emerging trends. You’ll be guided through the maze
of decisions and trade-offs involved in building a modern data
system, from choosing the right tools like Spark and Flink to
understanding the intricacies of data laws like the GDPR.
Peer under the hood of the systems you already use, and learn to use them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures
Martin and Chris
are featured in “Fireside Chat: Designing Data-Intensive
Applications, Second Edition”
The Coder Cafe: 66
Timeless Concepts for Software Engineers
By
Teiva Harsanyi ETA Summer 2026
Manning (use
code SCALE2026 for 45% off)
Great software developers—even the
proverbial greybeards—get a little better every day by adding
knowledge and skills continuously. This new book invites you to
share a cup of coffee with senior Google engineer Teiva Harsanyi as
he shows you how to make your code more readable, use unit tests as
documentation, reduce latency, navigate complex systems, and more.
The Coder Cafe introduces 66 vital software engineering
concepts that will upgrade your day-to-day practice, regardless of
your skill level. You’ll find focused explanations—each five pages
or less—on everything from foundational data structures to
distributed architecture. These timeless concepts are the perfect
way to turn your coffee break into a high-impact career boost.
Generate property-based tests to expose hidden edge cases automatically
Explore the CAP and PACELC theorems to balance consistency and availability trade-offs
Design graceful-degradation strategies to keep systems usable under failure
Leverage Bloom filters to perform fast, memory-efficient membership checks
Cultivate lateral thinking to uncover unconventional solutions
Teiva is presenting “Working on Complex Systems”
Think Distributed Systems
By Dominik Tornow August
2025
Bookshop.org |
Amazon |
Manning
(use code SCALE2026 for 45% off)
All modern software is distributed.
Let’s say that again—all modern software is distributed. Whether
you’re building mobile utilities, microservices, or massive cloud
native enterprise applications, creating efficient distributed
systems requires you to think differently about failure,
performance, network services, resource usage, latency, and much
more. This clearly-written book guides you into the mindset you’ll
need to design, develop, and deploy scalable and reliable
distributed systems. In
Think Distributed Systems you’ll
find a beautifully illustrated collection of mental models for:
Correctness, scalability, and reliability
Failure tolerance, detection, and mitigation
Message processing
Partitioning and replication
Consensus
Dominik is presenting “Systems
Engineering for Agentic Applications,” which is the topic of his
upcoming book.
Database Performance at Scale
By Felipe Cardeneti Mendes, Piotr Sarna, Pavel
Emelyanov, and Cynthia Dunlop September 2023
Amazon |
ScyllaDB
(free book)
Discover critical considerations and best practices for
improving database performance based on what has worked, and
failed, across thousands of teams and use cases in the field. This
book provides practical guidance for understanding the
database-related opportunities, trade-offs, and traps you might
encounter while trying to optimize data-intensive applications for
high throughput and low latency. Whether you’re building a new
system from the ground up or trying to optimize an existing use
case for increased demand, this book covers the essentials. The
ultimate goal of the book is to help you discover new ways to
optimize database performance for your team’s specific use cases,
requirements, and expectations.
Understand often overlooked factors that impact database performance at scale
Recognize data-related performance and scalability challenges associated with your project
Select a database architecture that’s suited to your workloads, use cases, and requirements
Avoid common mistakes that could impede your long-term agility and growth
Jumpstart teamwide adoption of best practices for optimizing database performance at scale
Felipe is presenting “The Engineering Behind ScyllaDB’s
Efficiency” and will be hosting the ScyllaDB Lounge
Writing for Developers: Blogs That Get Read
By Piotr Sarna
and Cynthia Dunlop December 2025
Bookshop.org |
Amazon |
Manning (use
code SCALE2026 for 45% off)
Think about how many times you’ve read an
engineering blog that’s sparked a new idea, demystified a
technology, or saved you from going down a disastrous path. That’s
the power of a well-crafted technical article! This practical guide
shows you how to create content your fellow developers will love to
read and share.
Writing for Developers introduces seven
popular “patterns” for modern engineering blogs—such as “The Bug
Hunt”, “We Rewrote It in X”, and “How We Built It”—and helps you
match these patterns with your ideas. The book covers the entire
writing process, from brainstorming, planning, and revising, to
promoting your blog and leveraging it into further opportunities.
You’ll learn through detailed examples, methodical strategies, and
a “
punk
rock DIY attitude!”:
Pinpoint topics that make intriguing posts
Apply popular blog post design patterns
Rapidly plan, draft, and optimize blog posts
Make your content clearer and more convincing to technical readers
Tap AI for revision while avoiding misuses and abuses
Increase the impact of all your technical communications
This book features the work of numerous Monster Scale Summit
speakers, including Joran Greef and Sanchay Javeria. There’s also a
reference to Pat Helland in Bryan Cantrill’s foreword.
ScyllaDB in Action
Bo Ingram October 2024
Bookshop.org |
Amazon |
Manning
(use code SCALE2026 for 45% off) |
ScyllaDB
(free chapters)
ScyllaDB in Action is your guide to
everything you need to know about ScyllaDB, from your very first
queries to running it in a production environment. It starts you
with the basics of creating, reading, and deleting data and expands
your knowledge from there. You’ll soon have mastered everything you
need to build, maintain, and run an effective and efficient
database. This book teaches you ScyllaDB the best way—through
hands-on examples. Dive into the node-based architecture of
ScyllaDB to understand how its distributed systems work, how you
can troubleshoot problems, and how you can constantly improve
performance. You’ll learn how to: Read, write, and delete data in
ScyllaDB Design database schemas for ScyllaDB Write performant
queries against ScyllaDB Connect and query a ScyllaDB cluster from
an application Configure, monitor, and operate ScyllaDB in
production
Bo’s colleagues Ethan Donowitz and Peter French are
presenting “How Discord Automates Database Operations at
Scale” Latency: Reduce Delay in Software Systems
By Pekka Enberg
Bookshop.org |
Amazon |
Manning (use code
SCALE2026 for 45% off) Slow responses can kill good software. Whether
it’s recovering microseconds lost while routing messages on a
server or speeding up page loads that keep users waiting, finding
and fixing latency can be a frustrating part of your work as a
developer. This one-of-a-kind book shows you how to spot,
understand, and respond to latency wherever it appears in your
applications and infrastructure. This book balances theory with
practical implementations, turning academic research into useful
techniques you can apply to your projects. In Latency you’ll learn:
What latency is—and what it is not How to model and measure latency
Organizing your application data for low latency Making your code
run faster Hiding latency when you can’t reduce it
Pekka and
his Turso teammates are frequent speakers at Monster Scale Summit’s
sister conference, P99
CONF 100 Go Mistakes and How to Avoid Them
By Teiva
Harsanyi August 2022
Bookshop.org |
Amazon |
Manning (use code SCALE2026 for 45% off)
100 Go Mistakes and
How to Avoid Them puts a spotlight on common errors in Go code
you might not even know you’re making. You’ll explore key areas of
the language such as concurrency, testing, data structures, and
more—and learn how to avoid and fix mistakes in your own projects.
As you go, you’ll navigate the tricky bits of handling JSON data
and HTTP services, discover best practices for Go code
organization, and learn how to use slices efficiently. This book
shows you how to: Dodge the most common mistakes made by Go
developers Structure and organize your Go application Handle data
and control structures efficiently Deal with errors in an idiomatic
manner Improve your concurrency skills Optimize your code Make your
application production-ready and improve testing quality
Teiva
is presenting “Working on Complex Systems” The
Manager’s Path: A Guide for Tech Leaders Navigating Growth and
Change
By
Camille Fournier May 2017
Bookshop.org |
Amazon |
O’Reilly Managing people is difficult wherever you work. But in
the tech industry, where management is also a technical discipline,
the learning curve can be brutal—especially when there are few
tools, texts, and frameworks to help you. In this practical guide,
author Camille Fournier (tech lead turned CTO) takes you through
each stage in the journey from engineer to technical manager. From
mentoring interns to working with senior staff, you’ll get
actionable advice for approaching various obstacles in your path.
This book is ideal whether you’re a new manager, a mentor, or a
more experienced leader looking for fresh advice. Pick up this book
and learn how to become a better manager and leader in your
organization. Begin by exploring what you expect from a manager
Understand what it takes to be a good mentor, and a good tech lead
Learn how to manage individual members while remaining focused on
the entire team Understand how to manage yourself and avoid common
pitfalls that challenge many leaders Manage multiple teams and
learn how to manage managers Learn how to build and bootstrap a
unifying culture in teams
Camille is presenting “What
Engineering Leaders Get Wrong About Scale” The Missing
README: A Guide for the New Software Engineer
By Chris
Riccomini and Dmitriy Ryaboy
Bookshop.org |
Amazon |
O’Reilly August 2021 For new software engineers, knowing how to
program is only half the battle. You’ll quickly find that many of
the skills and processes key to your success are not taught in any
school or bootcamp.
The Missing README fills in that gap—a
distillation of workplace lessons, best practices, and engineering
fundamentals that the authors have taught rookie developers at top
companies for more than a decade. Early chapters explain what to
expect when you begin your career at a company. The book’s middle
section expands your technical education, teaching you how to work
with existing codebases, address and prevent technical debt, write
production-grade software, manage dependencies, test effectively,
do code reviews, safely deploy software, design evolvable
architectures, and handle incidents when you’re on-call. Additional
chapters cover planning and interpersonal skills such as Agile
planning, working effectively with your manager, and growing to
senior levels and beyond. You’ll learn: How to use the legacy code
change algorithm, and leave code cleaner than you found it How to
write operable code with logging, metrics, configuration, and
defensive programming How to write deterministic tests, submit code
reviews, and give feedback on other people’s code The technical
design process, including experiments, problem definition,
documentation, and collaboration What to do when you are on-call,
and how to navigate production incidents Architectural techniques
that make code change easier Agile development practices like
sprint planning, stand-ups, and retrospectives
Chris is
featured in “Fireside Chat: Designing Data-Intensive Applications,
Second Edition” (with Martin Kleppmann)
23 February 2026, 2:30 pm by
ScyllaDB
Performance pitfalls when aiming for high-performance I/O
on modern hardware and cloud platforms Over the past two
blog posts in this series, we’ve explored real-world performance
investigations on storage-optimized instances with high-performance
NVMe RAID arrays.
Part 1 examined how the
Seastar
IO Scheduler’s fair-queue issues inadvertently throttle
bandwidth to a fraction of the advertised instance capacity.
Part 2 dove into the filesystem layer, revealing how XFS’s
block size and alignment strategies could force read-modify-write
operations and cut a big chunk out of write throughput. This third
and final part synthesizes those findings into a practical
performance checklist. Whether you’re optimizing ScyllaDB, building
your own database system, or simply trying to understand why your
storage isn’t delivering the advertised performance, understanding
these three interconnected layers – disk, filesystem, and
application – is essential. Each layer has its own assumptions of
what constitutes an optimal request. When these expectations
misalign, the consequences cascade down, amplifying latency and
degrading throughput. This post presents a set of delicate pitfalls
we’ve encountered, organized by layer. Each includes concrete
examples from production investigations as well as actionable
mitigation strategies. Mind the Disk — Block Size and Alignment
Matter Physical and Logical sector sizes Modern SSDs, particularly
high-performance NVMe drives, expose two critical properties:
physical sector size and logical sector size. The logical sector
size is what the operating system sees. It typically defaults to
512 bytes for backward compatibility, though many modern drives are
also capable of reporting 4K. The physical sector size reflects how
data is actually stored on the flash chips. It’s the size at which
the drive delivers peak performance. When you submit a write
request that doesn’t align to the physical sector size, the SSD
controller must:
Read the entire physical page containing your data
Modify the portion you’re writing to
Write the entire sector back to the empty/erased flash page
In our investigation of AWS i7i
instances, we were bitten by exactly this problem. First,
IOTune, our disk benchmarking tool, was using 512 byte requests
to run the benchmarks because the firmware in these new NVMes was
incorrectly reporting the physical sector size as 512 bytes when it
was actually 4 KB. This made us measure disks as having up to 25%
lower read IOPS and up to 42% lower write IOPS. The measurements were
used to configure the IO Scheduler, so we ended up using the disks
as if they were a less performant model. That’s a very good way for
your business to lose money. 🙂 If you’re wondering why I only
explained the reasoning for slow writes with requests unaligned to
the physical sector size, it’s because we still don’t fully
understand why read requests are also hit by this issue. It’s an
open question and we’re still researching some leads (which I hope
will get us an answer). Sustained performance If your software
relies on a disk hitting a certain performance number, try to
account for the fact that even dedicated provisioned NVMes have
peak and sustained performance values. It’s well known that elastic
storage (like EBS, for instance) has baseline performance and peak
performance, but it’s less intuitive for dedicated NVMes to behave
like this. Measuring disk performance with sustained (10+ minute)
workloads might show 3-5% lower IOPS/throughput than a short peak
run. That allows your software to
better predict how the disk behaves under sustained load. Mind the
RAID If you’re using a RAID0 for NVMe arrays, be aware that your
app’s parallel architecture might resonate with the way requests
end up distributed over the RAID array. RAIDs are made out of
chunks and stripes. A chunk is a block of data within a single
disk; when the chunk size is exceeded, the driver moves on to the
next disk in the array. A stripe contains all the chunks, one from
each disk that will get written sequentially. A RAID0 with 2 disks
will get written like this: stripe0: chunk0 (disk0), chunk1
(disk1); stripe1; stripe2… Filesystems usually align files to the
RAID stripe boundary. Depending on the write pattern of your
application, you could end up stressing some of the disks more, and
not leveraging the entire power of the RAID array. Key Takeaways
for the Disk Layer
Detection: Always verify
physical and logical sector sizes if you suspect your issue might
be related to this. Don’t blindly trust firmware-reported values;
cross-check with benchmarking tools and adjust your app and
filesystem to use the physical sector size if possible.
Measurement discipline: Increase measurement time
when benchmarking disks. Even dedicated NVMes can have baseline vs.
peak performance.
RAID awareness: RAID
architecture is made out of blocks, addresses, and drivers managing
them. It’s not a magic endpoint that will just amplify your
N-drive array into N times the performance of a single disk. Its
architecture has its own set of assumptions and limitations which,
together with the filesystem’s own limitations and assumptions,
might interfere with your app’s. Mind the Filesystem — Block Size,
Alignment, and Metadata Operations Filesystem Block Size and
Request Alignment Every filesystem has its own block size,
independent of the disk’s physical sector size. XFS, for instance,
can be formatted with block sizes ranging from 512 bytes to 64 KB.
In ScyllaDB, we used to format with 1K block sizes because we
wanted the fastest possible writes for commitlog entries of 1 KB and larger. For older SSDs, the
physical sector size was 512 bytes. On modern 4K-optimized NVMe
arrays, this choice became a liability. We realized that 4K block
sizes would bring us lots of extra write throughput. This
filesystem-level block size affects two critical aspects: how data
is stored and aligned on disk, and how metadata is laid out. Here’s
a concrete example. When Seastar issues 128 KB sequential writes to
a file on 1K-block XFS, the filesystem doesn’t seem to align these
to 4K boundaries (maybe because the SSD firmware reported a
physical sector size of 512 bytes). Using blktrace to inspect the
actual disk I/O, we observed that approximately 75% of requests
aligned to 1K or 2K boundaries. For these requests, the drive
controller would split each of them into at most 3 parts: a head, a
4K-aligned core, and a tail. For the head and tail, the disk would
perform RMW, which is very slow. That would become the dominating
factor for the entire request (consisting of the 3 parts).
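To make this split concrete, here is a minimal Python sketch of the arithmetic (an illustration, not the drive’s actual firmware logic) that divides a write request into a head, a 4K-aligned core, and a tail. A 128 KB write at a 1K-aligned offset, like the requests blktrace exposed on the 1K-block filesystem, produces both a head and a tail fragment, each of which triggers an RMW:

```python
PHYS = 4096  # assumed physical sector size of a 4K-optimized NVMe drive

def split_request(offset: int, length: int, align: int = PHYS):
    """Split a request into (head, core, tail) byte counts.

    The head and tail are the fragments that fall outside aligned
    sector boundaries and force the controller into read-modify-write;
    only the aligned core streams at full speed.
    """
    end = offset + length
    core_start = -(-offset // align) * align  # round offset up to alignment
    core_end = (end // align) * align         # round end down to alignment
    if core_end <= core_start:
        return (length, 0, 0)  # request fits entirely within one sector
    return (core_start - offset, core_end - core_start, end - core_end)

# 128 KB write at a 1K-aligned offset: head and tail fragments -> two RMWs
print(split_request(1024, 128 * 1024))   # -> (3072, 126976, 1024)

# The same write at a 4K-aligned offset: no fragments, no RMW
print(split_request(4096, 128 * 1024))   # -> (0, 131072, 0)
```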
Reformatting the filesystem with 4K block size completely
transformed the alignment distribution of requests, and 100% of
them aligned to 4K. This brought a lot of throughput back for us.
Filesystem Metadata Operations and Blocking Consider this: when a
file grows, XFS must:
Allocate extents from the
freespace tree, requiring B-Tree modifications and mutex locks
Update inode metadata to reflect the new file size
Flush metadata periodically to ensure durability
Update access/change times (atime/ctime) on every write
Each of these operations can block subsequent I/O submissions. In
our research, we discovered that the RWF_NOWAIT flag (requesting
non-blocking async I/O submission) was insufficiently effective
when metadata operations were queued. Writes would be re-submitted
from worker threads rather than the Seastar reactor, adding
context-switch overhead and latency spikes. When the final size of
files is known, it is beneficial to pre-allocate or pre-truncate
the file to that size using functions like fallocate() or
ftruncate(). This practice dramatically improves the alignment
distribution across the file and helps to amortize the overhead
associated with extent allocation and metadata updates. While
effective, fallocate() can be an expensive operation. It can
potentially impact latency, especially if the allocation groups are
already busy. Truncation is significantly cheaper; this alone can
offer substantial benefits. Another helpful technique is using
approaches like Seastar’s sloppy_size=true, where a file’s size is
doubled via truncation whenever the current limit is reached. Key
Takeaways for the Filesystem Layer
Format
Correctly: Format XFS (or other filesystems) with block
sizes matching your SSD physical sector size if possible. Most
modern NVMe drives are 4K-optimized. Go lower only if there are
strong restrictions – such as an inability to issue 4K-aligned reads,
potential read amplification, or benchmarks showing that the disk
performs better with a smaller request size.
Pre-allocation: When file sizes are known,
pre-truncate or pre-allocate files to their final size using
`fallocate()` or truncation. This amortizes extent allocation
overhead and ensures uniform alignment across the file.
Metadata Flushing: Understand the filesystem’s
metadata update behavior. File sizes and access time updates are
expensive. If you’re doing AIO, use RWF_NOWAIT if possible and make
sure it works by tracing the calls with `strace`. Data deletion has
also proved to have very expensive side effects. For example, on older
generation NVMes, TRIM requests that accumulated in the filesystem
would flush all at once, overloading the SSD controller and causing
huge latency spikes for the apps running on that machine. Mind the
Application — Parallelism and Request Size Strategy Parallelism
tuning Modern NVMe storage devices can handle thousands of requests
in flight simultaneously. Factors like the internal queue depth and
the number of outstanding requests the device can accept determine
the maximum achievable bandwidth and latency when properly loaded.
However, application-level concurrency (threads, fibers, async
tasks) must be sufficient to keep these queues full. Generally, the
bandwidth vs. latency relationship has two regimes. If you
measure latency and bandwidth while you increase the app
parallelism, the latency is constant and the bandwidth grows – as
long as the internal disk parallelism is not saturated. Once the
disk’s throughput is saturated, the bandwidth stops growing or grows
very little, while the latency scales almost linearly with the
input increase. The relationship between throughput, latency, and
queue depth follows Little’s Law:
Average Queue Depth = Throughput * Average Latency
For a device delivering 14.5 GB/s
with 128K requests, the number of requests/second the device can
handle is 14.5GB/s divided by 128K – so roughly 113k req/s. If an
individual request latency is, for example, 1ms, the device queue
needs at least 113 outstanding requests to sustain 14.5 GB/s. This
has practical implications: if you tune your application for, say,
40 concurrent requests and later upgrade to a faster device, you’ll
need to increase concurrency or you’ll under-utilize the new
device. Conversely, if you over-provision concurrency, you risk
queue saturation and latency spikes because the SSD controllers
only have so much compute power themselves. In ScyllaDB,
concurrency is expressed as the number of shards (1 per CPU core)
and fibers submitting I/O in parallel. Because we want the database
to perform exceptionally even in mixed workloads, we’re using
Seastar’s IO Scheduler to modulate the number of requests we send
to storage. The Scheduler is configured with a latency goal that it
needs to follow and with the iops/bandwidth the disk can handle at
peak performance. In real workloads, it’s very difficult to match
the load with a static execution model you’ve built yourself, even
with thorough benchmarking and testing. Hardware performance
varies, and read/write patterns are usually surprising. Your best
bet is to use an IO scheduler built to leverage the maximum
potential of the hardware while still obeying a latency limit
you’ve set. Request size strategy Many small I/O requests often
fail to saturate an NVMe drive’s throughput because bandwidth is
the product of IOPS and request size, and small requests hit an
IOPS/latency/CPU ceiling before they hit the PCIe/NAND bandwidth
ceiling. For a given request size, bandwidth is essentially IOPS x
Request size. So, for instance, 4K I/Os would need extremely high
IOPS to reach GB/s-class throughput. A concrete example: 350k IOPS
(which is typical for an AWS i7i.4xlarge, for instance) at 4K
request size will get you ~1.3GB/s (i7i.4xlarge can do 3.5GB/s
easily). This looks slow in throughput terms even though the device
might be doing exactly what the workload allows. NVMe devices reach
peak bandwidth when they have enough outstanding requests (queue
depth) to keep the many internal flash channels busy. If your
workload issues many small requests but mostly one-at-a-time (or
only a couple in flight), the drive spends time waiting between
completions instead of streaming data. As a result, bandwidth stays
low. Another important aspect is that with small I/O sizes, the
fixed per-I/O costs (syscalls, scheduling, interrupts, copying,
etc.) can dominate the latency. That means the kernel/filesystem
path becomes the limiter even before the NVMe hardware does. This
is one reason why real applications often can’t reproduce vendors’
peak numbers. In practice, you usually need high concurrency
(multiple threads/fibers/jobs) and true async I/O to maintain a
deep in-flight queue and approach device limits. But, to repeat one
idea from the section above, also remember that the storage
controller itself has a limited compute capacity. Overloading it
results in higher latencies for both read and write paths. It is
important to find the right request size for your application’s
workload. Go as big as your latency expectations will allow. If
your workload is not very predictable, keep in mind that you’ll
most likely need some dynamic logic that adjusts the I/O patterns
so you can squeeze every last drop of performance out of that
expensive storage. Key Takeaways for the Application Layer
Parallelism tuning: Benchmark your specific
hardware and workload to find the optimal fiber/thread/shard count.
Watch the latency. While you increase the parallelism, you’ll
notice throughput increasing. At some point, throughput will
plateau and latency will start to increase. That’s the sweet spot
you’re looking for.
Request size strategy: Picking
the right request size is important. Seastar defaults to 128K for
sequential writes, which works well on most modern storage (but
validate it via benchmarking). If your device prefers larger or
smaller requests for throughput, the cost is latency – so design
your workload accordingly. In practice, we’ve never seen SSDs that
cannot be saturated for throughput with 128K requests and ScyllaDB
can achieve sub-millisecond latencies with this request size.
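The numbers behind these takeaways can be reproduced with a few lines of arithmetic. Here is a small sketch (using the figures quoted earlier in this post) applying bandwidth = IOPS x request size and Little’s Law:

```python
def required_queue_depth(bandwidth_bps: float, request_size: int,
                         latency_s: float) -> float:
    """Little's Law: average queue depth = throughput (req/s) * latency (s)."""
    iops = bandwidth_bps / request_size
    return iops * latency_s

def achievable_bandwidth(iops: float, request_size: int) -> float:
    """Bandwidth is essentially IOPS * request size."""
    return iops * request_size

# 14.5 GB/s with 128K requests at 1 ms latency needs ~113 requests in flight.
print(round(required_queue_depth(14.5e9, 128_000, 1e-3)))  # -> 113

# 350k IOPS at 4K requests: ~1.3 GiB/s, far below a 3.5 GB/s device limit.
print(achievable_bandwidth(350_000, 4096) / 2**30)
```

If your tuned concurrency (say, 40 requests in flight) falls short of the depth the device needs, bandwidth is left on the table; if it far exceeds it, latency climbs instead.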
Conclusion The path from application buffer to persistent storage
on modern hardware is complex. Often, performance issues are
counterintuitive and very difficult to track down. A 1 KB
filesystem block size doesn’t “save space” on 4K-optimized SSDs; it
wastes throughput by forcing read-modify-write operations. A
perfectly tuned IO Scheduler can still throttle requests if given
incorrect disk properties. Sufficient parallelism doesn’t guarantee
high throughput if request sizes are too small to fill device
queues. By minding the disk, the filesystem, and the application –
and by understanding how they interact – you can fully take
advantage of modern storage hardware and build systems that
reliably deliver the performance that the storage vendor
advertised.
19 February 2026, 2:19 pm by
ScyllaDB
At ScyllaDB’s Monster SCALE Summit 2025, Hightower shared why
facing scale the hard way is the right way “There’s a
saying I like: ‘Some people have good ideas. Some ideas have
people.’ When your idea outlives you, that’s success.” – Kelsey
Hightower Kelsey Hightower’s career is a perfect example of
that. His ideas have taken on a life of their own, extending far
beyond his work at
Puppet,
CoreOS,
KubeCon
and Google. And he continues to scale his impact with his signature
unscripted keynotes as well as the definitive book, “
Kubernetes
Up and Running.” We were thrilled that Hightower joined
ScyllaDB’s Monster SCALE Summit to share his experiences and advice
on engineering at scale. And to scale his insights beyond those who
joined us for the live keynote, we’re capturing some of the most
memorable moments and takeaways here.
Note: Monster SCALE
Summit 2026 will go live March 11-12, featuring antirez, creator of
Redis; Camille Fournier, author of “The Manager’s Path” and
“Platform Engineering”; Martin Kleppmann, author of “Designing
Data-Intensive Applications”; and more than 50 others. The event is
free and virtual; register and join the community for some lively
chats.
Fail Before You Scale The
interview with host Tim Koopmans began with a pointed warning for
attendees of a conference that’s all about scaling: Don’t just
chase scale because you’re fascinated by others’ scaling strategies
and achievements. You need to really experience some pain
personally first. As Hightower put it: “A lot of people go see a
conference talk — I’m probably guilty of this myself — and then try
to ‘do scale things’ before they even have experience with what
they’re doing. Back at Puppet Labs, lots of people wrote shell
scripts with bad error handling. Things went awry when conditions
weren’t perfect. Then they moved on to configuration management,
and those who made that journey could understand the trade-offs.
Those who started directly with Puppet often didn’t.” “Be sure you
have a reason,” he said. “So before you over-optimize for a problem
you may not even have, you should ask: ‘How bad is it really? Where
are the metrics proving that you need a more scalable solution?’
Sometimes you can do nothing and just wait for scaling to become
the default option.” Ultimately, you should hope for the “good
problem” where increasing demand causes you to hit the limits of
your tech, he said. That’s much better than having few customers
and over-engineering for problems you don’t have. Make the Best
Choice You Can … For Now The conversation shifted to what level of
scale teams should target with their initial tech stack and each
subsequent iteration. Should you optimize for a future state that
hasn’t happened yet? Play it safe in case the market changes? “If
you’re not sure whether you’re on the right stack … I promise you,
it’s going to change,” Hightower said. “Make the best choice you
can for now. You can spend all year optimizing for ‘the best
thing,’ but it may not be the best thing 10 years from now. Say you
pick a database, go all in, learn best practices. But put a little
footnote in your design doc: ‘Here’s how we’d change this.’
Estimate the switching cost. If you do that, you won’t get stuck in
sunk cost fallacy later.” Rather than trying to predict the future,
think about how to avoid getting trapped. You don’t want
dependencies or extensions to limit your ability to migrate when
it’s time to take a better (or just different) path. “Change isn’t
failure,” he emphasized. “Plan for it; don’t fear it.” In
Hightower’s view, scaling decisions should start on a whiteboard,
not in code. “When I was at Google, we’d do technical whiteboard
sessions. Draw a line — that’s time. ‘Today, we’re here. Our
platform allows us to do these things. Is that good or bad?’ Then
draw ahead: ‘Where do we want to be in two years?’” He continued,
“Usually that’s driven by teams and customer needs. You can’t do
everything at once. So plot milestones — six months, a year, etc.
You can push things out in time for when new libraries or tools
arrive. If something new shows up that gets you two years ahead
instantly, great. Having a timeline gives you freedom without guilt
that you can’t ship everything today.” Are You Really Prepared For
a 747? Following up on the Google thread, Koopmans asked, “I’d love
to hear practical ways Google avoids over-engineering when
designing for scale.” To illustrate why “Google-scale” solutions
don’t always fit everyone else, Hightower used a memorable analogy:
“I had a customer once say, ‘We want BigQuery on-prem.’ I said,
‘You do? Really? OK, how much money do you have?’ And it was one of
those companies that had plenty of capital, so that wasn’t the
issue. I told them, ‘That would be like going to the airport,
looking out the window, seeing a brand-new 747 and telling the
airline that you want that plane. Even if they let you buy it, you
don’t have a pilot’s license, you don’t know how to fuel it. Where
are you going to park it? Are you going to drive it down your
subdivision, decapitating the roofs of your neighbors’ houses?’
Some things just aren’t meant for everyone.” Ultimately, whether
it’s over-engineering or not depends on the target user. Understand
who they are, how they work and what tools they use, then build
with that in mind. Hightower also warned against treating “best
practices” as universal truths: “One question that most customers
show up with is, ‘What are the best practices?’ Not necessarily the
best practices for me. They just want to know what everyone else is
doing. I think that might be another anti-pattern in the mix, where
you only care about what everyone else is doing and you don’t bring
the necessary context for a good recommendation.” How Leaders
Should Think About Dev Tooling “Serializing engineering culture”
(Hightower’s phrase) like Google did with its google3 monorepo
makes it simple for thousands of new engineers to join the team and
start contributing almost instantly. During his tenure at Google,
everything from gRPC to deployment tools was integrated. Engineers
just opened a browser, added code and reviews would start
automatically. However, there’s a fine line between serializing and
stifling. Hightower believes that prohibiting engineers from even
installing Python on their laptops, for example, is overkill:
“That’s like telling Picasso he can’t use his favorite brush.” He
continued: “Everyone works differently. As a leader, learn what
tools people actually use and promote sharing. Have engineers show
their workflows — the shortcuts, setups and plugins that make them
productive. That’s where creativity and speed come from. Share the
nuance. Most people think their tricks are too small to matter, but
they do. I want to see your dotfiles! You’ll inspire others.” Watch
the Complete Talk As Hightower noted, “Some people have good ideas.
Some ideas have people.” His approach to scale – pragmatic,
context-driven and human – shows why some ideas really do outlive
the people who created them. You can see the full talk below. Fun
fact: it was truly an unscripted interview – Hightower insisted!
The team met him in the hotel lobby that morning, chatted a bit
during a coffee run, prepped the camera angles … and suddenly
Hightower and Koopmans were broadcasting to 20,000 attendees around
the world.