Cassandra with Node.js and Arduino

Intro

This post continues where this post stopped. The Cassandra setup used for this post is more or less the same so please read this post if you are interested in cassandra setup before continuing with the rest of the post.

Arduino

Learning big data stuff is most exciting when the data represents something from the real world and not something generated with a help of big loop and then randomized data in it. To create data for this example I've used the following components:

  1. arduino uno
  2. Photoresistor GL5528 LDR
  3. 10K OHM NTC Thermistor 5mm
  4. 2x 10k resistor
  5. Protoboard
  6. Wires
Couple of this inexpensive components combined with arduino give us a nice big data sensor / generator. Now it might not seem that complicated but sampling any data at a one second level will hit on the cassandra limitations after one month of sampling if not done right, so having a simple arduino setup is fun and motivating way to tackle learning cassandra stuff. For now let's concentrate on the arduino part. The wiring is shown here:


The Arduino sketch will be on the gitHub, so we'll concentrate on the important parts. The light level in this example is read at analog 0. Reading analog values in arduino results in values ranging from 0-1023. We'll define light level as a mapping from 0-1023 into 0-100. Arduino already has a built in function for this called map. Also, I had some trouble in my initial experiments with Arduino serial communication and reading pin values. The data written to the serial port simply got corrupted after a while. I've read a couple of forums on this subject and found out that it actually helps when one delays execution after reading a pin value for 1ms. Also to keep the things as stable as possible we'll pause the execution for 1 second after writing to serial port as shown here:

  int light = map(analogRead(0), 0, 1023, 0, 100);
  delay(1);

    ....

  sprintf(sOut, "%d,%s", light, deblank(sTemp));

  Serial.println(sOut);
  delay(1000);
 

Node.js and Cassandra

Parsing the messages that come from the measuring devices is pretty repetitive stuff that causes pretty ugly code. I've learned that the hard way. To make parsing of this messages as easy as possible I've written a small utility package for parsing the messages that come from the measuring devices and it's available on npm.

Using serial ports in node.js doesn't take a lot of steps to setup:

  var serial = require( "serialport" );
  var SerialPort = serial.SerialPort;

  var portName = "/dev/tty.usb-something";

  var sp = new SerialPort(portName, {
      baudrate:9600,
      parser:serial.parsers.readline("\n")
  });

  sp.on("data", function ( data ) {
  var arduinoData = translator.parse(data);
  //...
 

To make the data handling easier and more in accordance with cassandra best practices the readings will be partitioned by date when they were recorded.

  CREATE TABLE room_data (
    day text,
    measurementtime timestamp,
    light int,
    temperature float,
    PRIMARY KEY (day, measurementtime)
  ) WITH CLUSTERING ORDER BY (measurementtime DESC);
 

Also the data will probably be more often fetched for recent time stamps with queries that have limits set on them. To make this fetching easier we've added a clustering statement above. Also to get the current light and temperature level we would just have to run the following query (no where combined with now function):

  SELECT * FROM room_data LIMIT 1;
 

After setting up the cassandra and reading the data from the serial port and parsing the data it's time to write this data into the cassandra. Analyzing the data and doing something useful with it will be in some future posts that I'll make but for now I'll stop with writing the data into cassandra:

  client.execute('INSERT INTO room_data ' + 
   '(day, measurementtime, light, temperature)' + 
   ' VALUES (?, dateof(now()), ?, ?)',
   [
    moment().format('YYYY-MM-DD'),
    arduinoData.light,
    arduinoData.temperature
   ],
   function(err, result) {
    if (err) {
     console.log('insert failed', err);
    }
   }
  );
 

On the fifth line I've used moment.js to format current time into string representation of current date used for partitioning in cassandra. The rest of the code is pretty much the usual sql stuff found in other database environments.

I recorder couple of hours worth of data here. Just in case anybody wants a sneak peak without having to setup everything up. I've exported the data out from cassandra trought cql using this command:

  COPY room_data (day, measurementtime, light, temperature) 
   TO 'room_data.csv';
 

The rest of the example is located on gitHub.

Replace a dead node in Cassandra

Note (June 2020): this article is old and not really revelant anymore. If you use a modern version of cassandra, look at -Dcassandra.replace_address_first_boot option !

I want to share some tips about my experimentations with Cassandra (version 2.0.x).

I found some documentations on datastax website about replacing a dead node, but it is not suitable for our needs, because in case of hardware crash, we will set up a new node with exactly the same IP (replace “in place”). Update : the documentation in now up to date on datastax !

If you try to start the new node with the same IP, cassandra doesn’t start with :

java.lang.RuntimeException: A node with address /10.20.10.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

So, we need to use the “cassandra.replace_address” directive (which is not really documented ? :() See this commit and this bug report, available since 1.2.11/2.0.0, it’s an easier solution and it works.

+    - New replace_address to supplant the (now removed) replace_token and
+      replace_node workflows to replace a dead node in place.  Works like the
+      old options, but takes the IP address of the node to be replaced.

It’s a JVM directive, so we can add it at the end of /etc/cassandra/cassandra-env.sh (debian package), for example:

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.20.10.2" 

Of course, 10.20.10.2 = ip of your dead/new node.

Now, start cassandra, and in logs you will see :

INFO [main] 2014-03-10 14:58:17,804 StorageService.java (line 941) JOINING: schema complete, ready to bootstrap
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: waiting for pending range calculation
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: calculation complete, ready to bootstrap
INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: Replacing a node with token(s): [...]
[...]
INFO [main] 2014-03-10 14:58:17,844 StorageService.java (line 941) JOINING: Starting to bootstrap...
INFO [main] 2014-03-10 14:58:18,551 StreamResultFuture.java (line 82) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] Executing streaming plan for Bootstrap

Node is in boostraping mode and will retrieve data from cluster. This may take lots of time.
If the node is a seed node, a warning will indicate that the node did not auto bootstrap. This is normal, you need to run a nodetool repair on the node.

On the new node :

# nodetools netstats

Mode: JOINING
Bootstrap effef960-6efe-11e3-9a75-3f94ec5476e9
    /10.20.10.1
        Receiving 102 files, 17467071157 bytes total
[...]

After some time, you will see some informations on logs !
On the new node :

 INFO [STREAM-IN-/10.20.10.1] 2014-03-10 15:15:40,363 StreamResultFuture.java (line 215) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] All sessions completed
 INFO [main] 2014-03-10 15:15:40,366 StorageService.java (line 970) Bootstrap completed! for the tokens [...]
[...]
 INFO [main] 2014-03-10 15:15:40,412 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal
 WARN [main] 2014-03-10 15:15:40,413 StorageService.java (line 1378) Not updating token metadata for /10.20.30.51 because I am replacing it
 INFO [main] 2014-03-10 15:15:40,419 StorageService.java (line 821) Startup completed! Now serving reads.

And on other nodes :

 INFO [GossipStage:1] 2014-03-10 15:15:40,625 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal

Et voilà, dead node has been replaced !
Don’t forget to REMOVE modifications on cassandra-env.sh after the complete bootstrap !

Enjoy !

Hello Cassandra in node.js

Intro

Since I started to work in a team that deals with BigData stuff I came into contact with Apache Cassandra. After years in the relational world it took me some getting used to the many concepts that the Cassandra relies on. Actually in the relational world the concepts would be heavy anti patterns. I went over a couple of tutorials etc. for intro into the Cassandra data model I would recommend this video by Patrick McFadin:

C* Summit 2013: The World's Next Top Data Model


Basic setup

The easiest way to get the Cassandra is to download it from here: http://planetcassandra.org/Download/StartDownload

I somehow dislike when various applications write to /var/something and having to use the root access to install something unless it's absolutely necessary. So I followed this manual to avoid this problem.


cassandra.yaml

The Cassandra is setup out of the box to support queries coming from cql shell ("cqlsh"). The goal of this blog entry is to show how to make a simple connection from node.js to the Cassandra, so there is a bit of tweaking that has to be done in order to get all this working. The necessary configuration is located in this file:

 install_dir/conf/cassandra.yaml
The properties I had to change were (basically this allows logging in with users other than default):
 authenticator: PasswordAuthenticator
 authorizer: CassandraAuthorizer
After that going into bin directory and running cqlsh will require username & password
 ./cqlsh -u cassandra -p cassandra

Cassandra keyspace setup

 CREATE KEYSPACE test WITH REPLICATION = 
  {'class': 'SimpleStrategy', 'replication_factor': 1};

 --check if it's created with this
 DESCRIBE KEYSPACES;

 USE test;

 CREATE TABLE test_table (
  id text,
  test_value text,
  PRIMARY KEY (id)
 );

 INSERT INTO test_table (id, test_value) VALUES ('1', 'a');

 INSERT INTO test_table (id, test_value) VALUES ('2', 'b');

 INSERT INTO test_table (id, test_value) VALUES ('3', 'c');

 SELECT * FROM test_table;
 
If everything is o.k. you should see something like:
  id  | test_value
 ----+------------
   3 |          c
   2 |          b
   1 |          a

 (3 rows)
Add a testuser to make the hello world example work:
  create user testuser with password 'testuser';

  grant all on test.test_table to testuser;

node-cassandra-cql

I tried several Cassandra connection libraries from gitHub for the node.js and the one that I found most easy to work with (and setup) was node-cassandra-cql by jorgebay. The story with the project is pretty much standard. Going into new project empty directory and initializing it with init and then installing module with npm.

 npm init

 npm install node-cassandra-cql

 #copy hellocassandra.js from
 #https://github.com/msval/hellocassandrainnodejs

 node hellocassandra.js
 

Anyway here's my example on gitHub.