Scylla Summit 2019
I've had the pleasure to attend again and present at the Scylla Summit in San Francisco and the honor to be awarded the...
Scylla: four ways to optimize your disk space consumption
We recently had to face free disk space outages on some of our scylla clusters and we learnt some very interesting things while outlining some improvements t...
Scylla Summit 2018 write-up
It's been almost one month since I had the chance to attend and speak at Scylla Summit 2018 so I'm reliev...
Authenticating and connecting to a SSL enabled Scylla cluster using Spark 2
This quick article is a wrap up for reference on how to connect to ScyllaDB using Spark 2 when authentication and SSL are enforced for the clients on the...
A botspot story
I felt like sharing a recent story that allowed us to identify a bot in a haystack thanks to Scylla...
Evaluating ScyllaDB for production 2/2
In my previous blog post, I shared [7 lessons on our experience in evaluating Scylla](https://www.ultrabug.fr...
Evaluating ScyllaDB for production 1/2
I have recently been conducting a quite deep evaluation of ScyllaDB to find out if we could benefit from this database in some of...
Enhance Apache Cassandra Logging
Cassandra usually outputs all its logs to a system.log file. It uses the old log4j 1.2 for Cassandra 2.0 and, since 2.1, logback, which of course uses a different syntax :)
Logs can be enhanced with some configuration. These explanations work with Cassandra 2.0.x and Cassandra 2.1.x; I haven't tested other versions yet.
I wanted to split the logs into different files, depending on their "source" (repair, compaction, tombstones, etc.), to ease debugging, while keeping the system.log as usual.
For example, to declare two new files to handle, say, the Repair and Tombstones logs:
Cassandra 2.0:
You need to declare each new log file in the log4j-server.properties file.
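Here is a sketch of what the two appender declarations could look like, modeled on the stock rolling file appender (file paths and rotation settings are illustrative):

```
# Repair log file
log4j.appender.REPAIR=org.apache.log4j.RollingFileAppender
log4j.appender.REPAIR.File=/var/log/cassandra/repair.log
log4j.appender.REPAIR.maxFileSize=20MB
log4j.appender.REPAIR.maxBackupIndex=10
log4j.appender.REPAIR.layout=org.apache.log4j.PatternLayout
log4j.appender.REPAIR.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n

# Tombstone log file
log4j.appender.TOMBSTONE=org.apache.log4j.RollingFileAppender
log4j.appender.TOMBSTONE.File=/var/log/cassandra/tombstone.log
log4j.appender.TOMBSTONE.maxFileSize=20MB
log4j.appender.TOMBSTONE.maxBackupIndex=10
log4j.appender.TOMBSTONE.layout=org.apache.log4j.PatternLayout
log4j.appender.TOMBSTONE.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
```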
Cassandra 2.1:
It is in the logback.xml file.
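A sketch of the equivalent logback appender, modeled on the stock SYSTEMLOG appender (again, paths and rotation settings are illustrative; the TOMBSTONE appender is identical apart from the file names):

```
<appender name="REPAIR" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>/var/log/cassandra/repair.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
    <fileNamePattern>/var/log/cassandra/repair.log.%i.zip</fileNamePattern>
    <minIndex>1</minIndex>
    <maxIndex>20</maxIndex>
  </rollingPolicy>
  <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
    <maxFileSize>20MB</maxFileSize>
  </triggeringPolicy>
  <encoder>
    <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
  </encoder>
</appender>
```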
Now that these new files are declared, we need to fill them with logs. To do that, simply redirect some Java classes to the right file. To redirect the class org.apache.cassandra.db.filter.SliceQueryFilter, at loglevel WARN, to the Tombstone file, simply add:
Cassandra 2.0:
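In log4j-server.properties, presumably a logger line pointing at the TOMBSTONE appender declared above:

```
log4j.logger.org.apache.cassandra.db.filter.SliceQueryFilter=WARN,TOMBSTONE
```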
Cassandra 2.1:
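And in logback.xml, a logger element referencing the appender:

```
<logger name="org.apache.cassandra.db.filter.SliceQueryFilter" level="WARN">
  <appender-ref ref="TOMBSTONE" />
</logger>
```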
It's an on-the-fly configuration, so there is no need to restart Cassandra!
Now you will have dedicated files for each kind of log.
A list of interesting Cassandra classes :
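For instance (a non-exhaustive selection; exact package locations move around between versions):

- org.apache.cassandra.db.compaction : compaction activity
- org.apache.cassandra.db.filter.SliceQueryFilter : tombstone read warnings
- org.apache.cassandra.service.GCInspector : JVM garbage collection pauses
- org.apache.cassandra.streaming : streaming between nodes
- org.apache.cassandra.repair : repair sessions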
You can find which Java class a log message comes from by adding "%c" to the log4j/logback "ConversionPattern":
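For example, if your file appender is named R as in the stock log4j config (double-check yours):

```
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %c %F (line %L) %m%n
```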
You can disable "additivity" (i.e., avoid also adding the messages to system.log) in log4j for a specific class by adding:
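A sketch, reusing the SliceQueryFilter logger from above:

```
log4j.additivity.org.apache.cassandra.db.filter.SliceQueryFilter=false
```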
For logback, you can add additivity="false" to the <logger .../> elements.
To migrate from log4j logs to logback.xml, you can look at http://logback.qos.ch/translator/
Sources:
- http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html
- http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configLoggingLevels_t.html
- https://logging.apache.org/log4j/1.2/manual.html
- http://logback.qos.ch/manual/appenders.html
Note: you can add http://blog.alteroot.org/feed.cassandra.xml to your rss aggregator to follow all my Cassandra posts :)
How to change Cassandra compaction strategy on a production cluster
I'll talk about changing the Cassandra CompactionStrategy on a live production cluster. First of all, an extract of the Cassandra documentation:
> Periodic compaction is essential to a healthy Cassandra database because Cassandra does not insert/update in place. As inserts/updates occur, instead of overwriting the rows, Cassandra writes a new timestamped version of the inserted or updated data in another SSTable. Cassandra manages the accumulation of SSTables on disk using compaction. Cassandra also does not delete in place because the SSTable is immutable. Instead, Cassandra marks data to be deleted using a tombstone.
By default, Cassandra uses SizeTieredCompactionStrategy (STCS). This strategy triggers a minor compaction when there are a number of similar-sized SSTables on disk, as configured by the min_threshold table subproperty (4 by default).
Another compaction strategy, available since Cassandra 1.0, is LeveledCompactionStrategy (LCS), based on LevelDB. Since 2.0.11, DateTieredCompactionStrategy is also available.
Depending on your needs, you may need to change the compaction strategy on a running cluster. Changing this setting involves rewriting ALL SSTables to the new strategy, which may take a long time and can be CPU and I/O intensive.
I needed to change the compaction strategy on my production cluster to LeveledCompactionStrategy because of our workload: lots of updates and deletes, wide rows, etc.
Moreover, with the default STCS, the largest SSTable that gets created will progressively not be compacted again until the amount of actual data increases four-fold. So it can take a long time before old data is really deleted!
Note: You can test a new compaction strategy on one new node with the write_survey bootstrap option. See the Datastax blog post about it.
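As a sketch, assuming the usual cassandra-env.sh placement for JVM options:

```
# On the survey node only: bootstrap and receive writes without serving reads
JVM_OPTS="$JVM_OPTS -Dcassandra.write_survey=true"
```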
The basic procedure to change the CompactionStrategy is to alter the table via CQL:
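A minimal sketch (mykeyspace/mytable are placeholder names):

```
ALTER TABLE mykeyspace.mytable
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
```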
If you run the ALTER TABLE to change to LCS like that, all nodes will recompact their data at the same time, so performance problems can occur for hours/days…
A better solution is to migrate node by node!
You need to change the compaction strategy locally and on-the-fly, via JMX, like in write_survey mode. I use jmxterm for that. I think I'll write articles about all these JMX things :)
For example, to change to LCS on the mytable table with jmxterm:
A nice one-liner:
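Something along these lines, piping the command into jmxterm's non-interactive mode (names and jar path as above are illustrative):

```
echo "set -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=mykeyspace,columnfamily=mytable CompactionStrategyClass org.apache.cassandra.db.compaction.LeveledCompactionStrategy" \
  | java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199 -n
```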
On the next commitlog flush, the node will start compacting, rewriting all its mytable SSTables to the new strategy.
You can see the progression with nodetool:
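Presumably with compactionstats; the figures below are made up for illustration:

```
# nodetool compactionstats
pending tasks: 48
          compaction type   keyspace   column family      completed          total    unit   progress
               Compaction  mykeyspace        mytable     1073741824     4294967296   bytes     25.00%
```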
You need to wait for the node to recompact all its SSTables, then change the strategy on instance2, etc.
The transition will be done in multiple compactions if you have lots of data. By default, new SSTables will be 160MB large.
You can monitor your table with nodetool cfstats too:
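The interesting part for LCS is the per-level SSTable count; the output below is illustrative and the exact label varies a bit across versions:

```
# nodetool cfstats mykeyspace.mytable
...
SSTables in each level: [31/4, 10, 0, 0, 0, 0, 0, 0, 0]
...
```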
You can see the 31/4: it means that there are 31 SSTables in L0, whereas Cassandra tries to keep only 4 in L0.
Taken from the code ( src/java/org/apache/cassandra/db/compaction/LeveledManifest.java )
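From memory, the comments in that file explain the level sizing roughly as follows (a paraphrase, not a verbatim excerpt):

```
// Each level is (by default) 10x the size of the previous one, and L0 is
// kept around an ideal of 4 SSTables. A node that has fallen behind on
// compaction may thus look like:
//   L0: 988 [ideal: 4]
//   L1: 117 [ideal: 10]
//   L2:  12 [ideal: 100]
```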
When all nodes have the new strategy, let's go for the global ALTER TABLE. /!\ If a node restarts before the final ALTER TABLE, it will recompact to the default strategy (SizeTiered)!
Et voilà, I hope this article will help you :)
My last Cassandra blog post was one year ago… I have several in mind (JMX things!) so stay tuned!
Replace a dead node in Cassandra
Note (June 2020): this article is old and not really relevant anymore. If you use a modern version of Cassandra, look at the -Dcassandra.replace_address_first_boot option!
I want to share some tips from my experiments with Cassandra (version 2.0.x).
I found some documentation on the Datastax website about replacing a dead node, but it was not suitable for our needs, because in case of a hardware crash, we set up a new node with exactly the same IP (replacement "in place"). Update: the documentation is now up to date on Datastax!
If you try to start the new node with the same IP, Cassandra doesn't start, with:

```
java.lang.RuntimeException: A node with address /10.20.10.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
```
So, we need to use the "cassandra.replace_address" directive (which is not really documented? :() See this commit and this bug report; available since 1.2.11/2.0.0, it's an easier solution and it works.
```
+ - New replace_address to supplant the (now removed) replace_token and
+   replace_node workflows to replace a dead node in place. Works like the
+   old options, but takes the IP address of the node to be replaced.
```
It’s a JVM directive, so we can add it at the end of /etc/cassandra/cassandra-env.sh (debian package), for example:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.20.10.2"
Of course, 10.20.10.2 is the IP of your dead/new node.
Now, start Cassandra, and in the logs you will see:
```
INFO [main] 2014-03-10 14:58:17,804 StorageService.java (line 941) JOINING: schema complete, ready to bootstrap
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: waiting for pending range calculation
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: calculation complete, ready to bootstrap
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: Replacing a node with token(s): [...]
[...]
INFO [main] 2014-03-10 14:58:17,844 StorageService.java (line 941) JOINING: Starting to bootstrap...
INFO [main] 2014-03-10 14:58:18,551 StreamResultFuture.java (line 82) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] Executing streaming plan for Bootstrap
```
The node is in bootstrapping mode and will retrieve data from the cluster. This may take a lot of time.
If the node is a seed node, a warning will indicate that it did not auto bootstrap. This is normal; you need to run a nodetool repair on the node.
On the new node:
```
# nodetool netstats
Mode: JOINING
Bootstrap effef960-6efe-11e3-9a75-3f94ec5476e9
    /10.20.10.1
        Receiving 102 files, 17467071157 bytes total
[...]
```
After some time, you will see some information in the logs!
On the new node:
```
INFO [STREAM-IN-/10.20.10.1] 2014-03-10 15:15:40,363 StreamResultFuture.java (line 215) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] All sessions completed
INFO [main] 2014-03-10 15:15:40,366 StorageService.java (line 970) Bootstrap completed! for the tokens [...]
[...]
INFO [main] 2014-03-10 15:15:40,412 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal
WARN [main] 2014-03-10 15:15:40,413 StorageService.java (line 1378) Not updating token metadata for /10.20.30.51 because I am replacing it
INFO [main] 2014-03-10 15:15:40,419 StorageService.java (line 821) Startup completed! Now serving reads.
```
And on the other nodes:
```
INFO [GossipStage:1] 2014-03-10 15:15:40,625 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal
```
Et voilà, the dead node has been replaced!
Don't forget to REMOVE the modifications from cassandra-env.sh once the bootstrap is complete!
Enjoy!