Thursday, August 20, 2015

Collecting Apache Storm time series data with Graphite

Naturally, Apache Storm processes data as tuples over time.  This makes it an ideal framework for streaming data through a pipeline.  However, keeping a record of that activity is not an inherent feature of Storm.  For metrics collection and analysis, I used Graphite.  Graphite is available in Fedora EPEL and is a great option for quickly collecting some metrics and generating graphs with analytical functions overlaid.

The easiest way to collect information is to go straight to the source: the Nimbus server.  Using the NimbusClient class (in the package backtype.storm.utils) together with the Thrift types in backtype.storm.generated, you can retrieve the ClusterSummary, iterate over each TopologySummary's ID to retrieve its TopologyInfo, and read each topology's ExecutorSummary entries to determine emitted and transferred tuples.  You can also get information about errors, threads, and likely other details of Apache Storm topologies that I have not collected in my example code.

First, let's look at how I push data into the carbon-cache daemon.
https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L44

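    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;
    import java.util.Map;

    // Send each metric to carbon-cache over a TCP stream.
    private void graphite(Map<String, ?> metrics) {
        // Graphite expects the timestamp in seconds since the Unix epoch.
        long epoch = System.currentTimeMillis() / 1000;

        // Output stream to the host on the default plaintext port (2003).
        // Replace the host name with your own carbon-cache host.
        // try-with-resources closes the stream and socket even if a write fails.
        try (Socket conn = new Socket("cabon-cache.novalocal", 2003);
             DataOutputStream dos = new DataOutputStream(conn.getOutputStream())) {
            // Graphite syntax: one "<metric_path> <value> <timestamp>" line per metric.
            for (Map.Entry<String, ?> metric : metrics.entrySet()) {
                dos.writeBytes(metric.getKey()
                        + " " + metric.getValue()
                        + " " + epoch + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }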
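The above code simply opens a TCP socket to carbon-cache, writes each entry of a Map of metrics with the current timestamp, and closes the connection.  Easy.  The takeaway here is the syntax: each metric is sent as a String of the form '[metric_path] [value] [time]', terminated by a newline, where the time is in seconds since the Unix epoch.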
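To make the line syntax concrete without needing a running carbon-cache, here is a minimal sketch that only builds the text that would be written to the socket. The class name GraphiteLines, its methods, the metric paths, and the timestamp are all made up for illustration and are not part of the linked code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
class GraphiteLines {

    // Build one plaintext line: "<metric_path> <value> <timestamp>\n".
    static String format(String path, Object value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Build the payload for a whole map of metrics sharing one timestamp.
    static String formatAll(Map<String, ?> metrics, long epochSeconds) {
        StringBuilder payload = new StringBuilder();
        for (Map.Entry<String, ?> m : metrics.entrySet()) {
            payload.append(format(m.getKey(), m.getValue(), epochSeconds));
        }
        return payload.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("storm.mytopology.emitted", 1200L);      // made-up path and value
        metrics.put("storm.mytopology.transferred", 1100L);  // made-up path and value
        System.out.print(formatAll(metrics, 1440072000L));   // made-up timestamp
    }
}
```

Building the payload separately from the socket write also makes the format easy to check in a unit test before pointing it at a real Graphite host.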

It would be redundant for me to display all of the Java code here: https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L125. It essentially takes a Nimbus client connection and builds a model of the Nimbus state, with each topology as a child node of the root context.
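As a rough idea of what turning that state into Graphite metrics can look like, the sketch below flattens per-executor counters into metric paths using only the standard library. It assumes the counters arrive as a nested map of time window to stream name to count, which is the shape I believe Storm's executor stats use for emitted and transferred tuples (with ":all-time" as one of the window keys). The class StatsFlattener, the "storm." prefix, and the path layout are hypothetical, and this is not the linked code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original project.
class StatsFlattener {

    // Sum the counts of every executor for one time window, keyed by metric path.
    // Each element of perExecutor maps: time window -> stream name -> count.
    static Map<String, Long> flatten(String topology, String kind, String window,
                                     Iterable<Map<String, Map<String, Long>>> perExecutor) {
        Map<String, Long> out = new TreeMap<>();
        for (Map<String, Map<String, Long>> stats : perExecutor) {
            Map<String, Long> streams = stats.get(window);
            if (streams == null) {
                continue; // this executor has no data for the requested window
            }
            for (Map.Entry<String, Long> e : streams.entrySet()) {
                String path = "storm." + sanitize(topology)
                        + "." + sanitize(e.getKey()) + "." + kind;
                out.merge(path, e.getValue(), Long::sum);
            }
        }
        return out;
    }

    // Graphite splits paths on '.', and spaces would break the plaintext line,
    // so replace anything outside a conservative character set.
    static String sanitize(String s) {
        return s.replaceAll("[^A-Za-z0-9_-]", "_");
    }

    public static void main(String[] args) {
        // Two made-up executors, each reporting the "default" stream.
        Map<String, Map<String, Long>> a =
                Collections.singletonMap(":all-time", Collections.singletonMap("default", 10L));
        Map<String, Map<String, Long>> b =
                Collections.singletonMap(":all-time", Collections.singletonMap("default", 5L));
        System.out.println(flatten("My Topology", "emitted", ":all-time", Arrays.asList(a, b)));
    }
}
```

The resulting map can be passed directly to a method like graphite(Map) above, since each key is already a valid metric path.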

So, rather than bore you, here is a screenshot!


As you can see, "My Topology" is a test topology fed by a static source, so its graph is clearly linear.  But all in all, you can quickly get some good information.  Feel free to comment, as I find this topic particularly interesting.